william's blog | 2012-02-04 12:28:34 +0000
===========================================

Understanding the "Bayesian Average"
------------------------------------

Date: March 12, 2009 4:07pm
Author: William Morgan
Labels: stats
URL: http://masanjin.net/blog/bayesian-average.txt

IMDB rates movies using a score they call the true Bayesian estimate [1] (bottom of the page). I'm pretty sure that's a made-up term. A couple of other sites, like BoardGameGeek, use the same thing and call it a "Bayesian average". I think that's a made-up term, too, even though there's a Wikipedia article on it [2].

Nonetheless, the formula is simple, and it has a nice interpretation. Here it is:

$$\frac{Cm + Rv}{m+v}$$

where $$C$$ is the mean vote across all movies, $$v$$ is the number of votes, $$R$$ is the mean rating for the movie, and $$m$$ is the "minimum number of votes required to be listed in the top 250 (currently 1300)".

The nice interpretation is this: pretend that, in addition to the $$v$$ votes that users give a movie, you're also throwing in $$m$$ votes of score $$C$$ each. In effect you're pushing the scores towards the global average, by $$m$$ votes.

Is this arbitrary? Actually, no. It's the mean (i.e. MLE) of the posterior distribution you get when you have a Normal prior with mean $$C$$ and precision $$m$$, and a Normal conditional with variance 1.0.

In other words, you're starting with a belief that, in the absence of votes, a movie/boardgame should be ranked as average, and you're assuming that user votes are normally distributed around the "true" score with variance 1.0. Then you're looking at the posterior distribution (i.e. the probability distribution that arises as a result of those assumptions), and you're picking the most likely value from that, which in the case of Gaussians is the mean.

Let's see how that works. To find the posterior distribution, we could work through the math, or we could just look at the Wikipedia article on conjugate priors [3].
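Before going further, the pseudo-vote interpretation of the formula is easy to check numerically. What follows is only a minimal Python sketch: the function name `bayesian_average` and all of the numbers are made up for illustration, and none of it is IMDB's actual code or data. It just confirms that $$\frac{Cm + Rv}{m+v}$$ equals the plain mean of the real votes plus $$m$$ extra votes of score $$C$$.

```python
from statistics import mean


def bayesian_average(votes, C, m):
    """Return (C*m + R*v) / (m + v) for a list of votes.

    C is the global mean vote and m is the number of pseudo-votes.
    Both are supplied by the caller; the values used below are hypothetical.
    """
    v = len(votes)
    R = mean(votes) if v else 0.0  # R drops out of the formula when v == 0
    return (C * m + R * v) / (m + v)


# Hypothetical data: five real votes, a global mean of 6.9, and m = 10.
votes = [9, 10, 8, 9, 10]
C, m = 6.9, 10

score = bayesian_average(votes, C, m)

# The same number computed literally: add m pseudo-votes of score C, then average.
padded = votes + [C] * m
assert abs(score - mean(padded)) < 1e-9

print(round(score, 3))
```

With only a handful of votes the score stays close to $$C$$; as $$v$$ grows, the $$m$$ pseudo-votes matter less and the score moves towards $$R$$.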
We'll see that the posterior distribution for the mean of a Normal, when the prior on that mean is also a Normal, is a Normal with mean

$$\frac{\tau_0 \mu_0 + \tau \sum_{i=1}^{n} x_i}{\tau_0 + n\tau}$$

where $$\mu_0$$ and $$\tau_0$$ are the mean and precision of the prior, respectively, $$\tau$$ is the precision of the vote distribution, and $$n$$ is the number of votes. In the case of IMDB, we assumed above that $$\tau=1$$, so we have

$$\frac{\tau_0 \mu_0 + \sum_{i=1}^{n} x_i}{\tau_0 + n}$$

Comparing the IMDB equation to this, we can see that $$v$$ above is $$n$$ here, $$C$$ above is $$\mu_0$$ here, $$Rv=\frac{1}{v}\left(\sum_{i=1}^{v} x_i\right) v = \sum_{i=1}^{v} x_i$$ above is $$\sum_{i=1}^{n} x_i$$ here, and $$m$$ above is the hyperparameter $$\tau_0$$.

So we know that even though IMDB says $$m$$ is the "minimum number of votes required to be listed in the top 250 list", that's an arbitrary decision on their part: it can be anything and the formula still works. $$m$$ is the precision of the prior distribution; as it gets bigger, the prior distribution gets "sharper", and thus has more of an effect on the posterior distribution.

Now the assumptions we made to get to this point are almost laughable. If nothing else, we know that Gaussians are unbounded and continuous, and user votes on IMDB are integers in the range of 1-10.

The interesting take-away message here is that even though we made a lot of assumptions above that were laughably wrong, the end result is a reasonable formula with a nice, intuitive meaning.

[1] http://www.imdb.com/chart/top
[2] http://en.wikipedia.org/wiki/Bayesian_average
[3] http://en.wikipedia.org/wiki/Conjugate_prior

Replies
--------

John Henderson, on March 13, 2009 8:26pm:

| Here's my old school wanker interpretation: C is one estimate
| of the thing we want. R is the other estimate. We can smooth
| between them with a good old \lambda. Our estimate will then look
| like
|
| \lambda R + (1-\lambda)C
|
| (Boy I hope my latex works at all.)
|
| But what \lambda do we want? Eh. Let's just pick something. Let's
| compare the number of datapoints we have to the least popular thing in the
| top-250. Yeah, that's the ticket:
|
| \lambda=\frac{v}{m+v}
|
| So, if we've had as many votes as the thing in the bottom of
| the top-250, then we should be half way between them. And if we've had an
| infinite number of votes, then we better be on R. And if we've
| had zero votes, then C. That's nice. Oh, and let's make it
| linear so it's easy and doesn't piss people off by sliding down when a vote
| goes up. Or incrementing less or more at different places.
|
| You really think the IMDB guys went beyond this in their thinking? Could be.
| And if not, they sure should claim it.

William Morgan, on March 13, 2009 8:49pm:

 | I don't know. They use the word "Bayesian" to describe it, which seems like a
 | non-sequitur if all that's happened is that they've read a book on linear
 | interpolation.
 |
 | It's certainly possible that I'm just backfitting a model.

John Henderson, on March 13, 2009 10:20pm:

 | A better question from me should be, can Bayesian Averaging be interpreted as
 | MLE without the Gaussian modeling assumption on the estimators? I suspect the
 | answer is yes because there are many ways to arrive at the linear
 | interpolation form. It seems over-constrained at first glance from subsets of
 | the things I point out in the interpolation post. But I'm too dumb to know
 | better.
 |
 | Maybe the only weak assumption needed is that the model for the estimator have
 | conjugate priors.
 |
 | -John

Brendan O'Connor, on March 17, 2009 3:03am:

| Nice writeup.
|
| Handy!
|
| I will point out another non-bayesian interpretation: it's L2 regularization
| for the mean estimation :)
|
| Brendan

Gustavo Lacerda, on May 18, 2009 6:27pm:

| I think you mean the mode of the posterior distribution (a.k.a. MAP, not
| MLE).

William Morgan, on May 18, 2009 6:41pm:

 | Yeah, that probably would've been more clear, especially since I go on to talk
 | about "picking the most likely value". But in my defense they're the same
 | value here.

Gustavo Lacerda, on May 18, 2009 7:03pm:

 | MAP and MLE in general only coincide when you have a uniform (improper) prior
 | over the whole space.
 |
 | You have an interesting email interface here. But I get the illusion that I'm
 | writing a private email (which should go away if I'm not replying to "William
 | Morgan").

William Morgan, on May 18, 2009 7:29pm:

 | The mode and the mean of a normal distribution are the same. That's all I'm
 | saying.
 |
 | Thanks! But what client are you using? It's quoting stuff in a weird way.
 |
 | Do you mean that because the From: field has my name in it, you get the
 | illusion that you're sending a private email? This is the same approach used
 | by most automated systems that relay email between participants. Evite and
 | JIRA, to name a few.

Gustavo Lacerda, on May 18, 2009 7:59pm:

 | GMail.
 |
 | yes. I'd prefer if it said "William Morgan's blog" or something.

William Morgan, on May 18, 2009 9:06pm:

 | Whoops, looks like it's my fault. Your quoting looks fine; Whisper is just
 | doing something weird with it.
 |
 | It could say something like "William Morgan (via The All-Thing)".

William Morgan, on May 19, 2009 11:17pm:

 | I think I've fixed this.

John Henderson, on May 18, 2009 9:26pm:

 | On May 18, 2009, at 12:29 PM, William Morgan wrote:
 |
 | If you do a goog search on "map mle estimate" right now this comes up
 | #3. Just saying. Fame. Ok now I'll shut up and do my homework. So long since
 | I've been in a discussion about this.
 |
 | -John

Gustavo Lacerda, on May 18, 2009 9:41pm:

 | -- Gustavo Lacerda http://www.optimizelife.com
 |
 | for me it's #9.
 |
 | My #1 is a recitation by my friend Mary:
 | http://www.cs.cmu.edu/~tom/10601_sp08/slides/recitation-mle-nb.pdf

This delicious text version served up by Whisper.