So: what’s the deal with the Akaike information criterion vs. the Bayesian information criterion? “Information theory” and “Bayesianism” both have a lot of very devoted adherents, and here they appear, superficially, to give different answers.
They correspond to different priors. AIC has a bit better underlying framework (from an information theory point of view) and I believe better empirical validation.
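For concreteness, the two criteria differ only in how they penalize parameter count; here’s a minimal sketch of the standard formulas (the function names and numbers are mine, purely for illustration):

```python
import numpy as np

def aic(log_likelihood, k):
    # AIC = 2k - 2 ln L: a fixed penalty of 2 per parameter
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # BIC = k ln n - 2 ln L: the penalty grows with sample size n
    return k * np.log(n) - 2 * log_likelihood

# Two hypothetical fits to the same n = 100 data points (numbers made up):
n = 100
print(aic(-120.0, k=3), bic(-120.0, k=3, n=n))   # simpler model
print(aic(-118.5, k=5), bic(-118.5, k=5, n=n))   # more flexible model
```

Read as approximate log posterior odds, the two penalty terms amount to different implicit priors over model complexity, which is the sense in which they “correspond to different priors.”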
Ah, OK. Through Wikipedia I found this paper, which treats AIC as Bayesian with a different (better?) prior; it looks good.
BIC has the advantage that it will converge asymptotically to the true model if the true model lies in the set of models being fitted, although it’s disputable how important this is. And BIC can be derived using a minimum description length approach (can you get AIC this way too?).
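That consistency claim is easy to poke at numerically; here’s a rough simulation sketch, assuming a Gaussian polynomial-regression setup (all names and numbers are mine, not from anything linked above):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_polynomial(x, y, degree):
    """OLS polynomial fit; returns maximized Gaussian log-likelihood and parameter count."""
    X = np.vander(x, degree + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n                       # MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                   # coefficients + noise variance
    return loglik, k

# True model: a quadratic. See which degree each criterion picks as n grows.
for n in (50, 500, 5000):
    x = rng.uniform(-1, 1, n)
    y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.5, n)
    scores = []
    for d in range(1, 7):
        ll, k = fit_polynomial(x, y, d)
        scores.append((d, 2 * k - 2 * ll, k * np.log(n) - 2 * ll))  # (degree, AIC, BIC)
    best_aic = min(scores, key=lambda s: s[1])[0]
    best_bic = min(scores, key=lambda s: s[2])[0]
    print(f"n={n}: AIC picks degree {best_aic}, BIC picks degree {best_bic}")
```

The idea is that BIC’s penalty grows with n, so it increasingly favors the true (smaller) degree, while AIC’s fixed penalty leaves some probability of picking a larger one; any single run may of course come out differently.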
One of the things I am wary of here is the sense that “information theory is magic” – e.g. in the paper linked above:
Their celebrated result, called Kullback-Leibler information, is a fundamental quantity in the sciences […] Clearly, the best model loses the least information relative to other models in the set […]
Using AIC, the models are then easily ranked from best to worst based on the empirical data at hand. This is a simple, compelling concept, based on deep theoretical foundations (i.e., entropy, K-L information, and likelihood theory).
Maybe I just don’t understand information theory, but I’m confused about why I should care that the K-L divergence is “deep” and “fundamental” here. The question at hand is how to select a model based on some sort of estimate of how it will generalize from the training set. In practice I hear people justify using things like AIC by saying “well, obviously, you want the most information,” where “most information” is just a verbal tag we’ve associated with the K-L divergence, and I’m not sure what mathematical weight I should give to it. If AIC does well, and this is because it is based on information theory, I would like to understand that in a nonverbal way: what property of K-L divergence made it a good choice here, ignoring suggestive words like “information”?
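To make the question concrete, here is the quantity being appealed to, in a toy discrete case (my own sketch, not from the paper):

```python
import numpy as np

# Toy discrete example: true distribution p, two candidate models q1 and q2.
p  = np.array([0.5, 0.3, 0.2])
q1 = np.array([0.4, 0.4, 0.2])
q2 = np.array([0.6, 0.2, 0.2])

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return np.sum(p * np.log(p / q))

def expected_loglik(p, q):
    # Expected log-likelihood of a draw from p under model q (negative cross-entropy)
    return np.sum(p * np.log(q))

for name, q in [("q1", q1), ("q2", q2)]:
    print(name, kl(p, q), expected_loglik(p, q))

# KL(p||q) = -H(p) - E_p[log q], and H(p) doesn't depend on q, so ranking
# candidate models by KL is the same as ranking by expected log-likelihood.
```

So within a fixed candidate set, ranking by K-L divergence is the same as ranking by expected out-of-sample log-likelihood; whether that settles the “why K-L rather than something else” question is exactly what I’m unsure about.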
(via vaniver)
