IMDB rates movies using a score they call the true Bayesian estimate (bottom of the page). I’m pretty sure that’s a made-up term. A couple of other sites, like BoardGameGeek, use the same thing and call it a “Bayesian average”. I think that’s a made-up term, too, even though there’s a Wikipedia article on it.
Nonetheless, the formula is simple, and it has a nice interpretation. Here it is:

$$W = \frac{v}{v+m} R + \frac{m}{v+m} C = \frac{vR + mC}{v + m}$$

where $C$ is the mean vote across all movies, $v$ is the number of votes, $R$ is the mean rating for the movie, and $m$ is the “minimum number of votes required to be listed in the top 250 (currently 1300)”.
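Here is a minimal Python sketch of that formula. It assumes a movie’s votes are available as a plain list of numbers. The function name `weighted_rating` and the example numbers are hypothetical; only the default `m = 1300` comes from the quote above.

```python
def weighted_rating(votes, C, m=1300):
    """IMDB-style weighted rating W = (v*R + m*C) / (v + m).

    votes: this movie's individual ratings (hypothetical input format)
    C:     mean vote across all movies
    m:     weight on the global mean (IMDB's "minimum votes", 1300 in the quote)
    """
    v = len(votes)              # number of votes for this movie
    if v == 0:
        return float(C)         # no data: fall back to the global mean
    R = sum(votes) / v          # mean rating for this movie
    return (v * R + m * C) / (v + m)


# Three perfect votes barely move a movie off the global mean.
print(weighted_rating([10, 10, 10], C=6.0, m=1300))  # ≈ 6.009
```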
The nice interpretation is this: pretend that, in addition to the votes that users give a movie, you’re also throwing in $m$ votes of score $C$ each. In effect you’re pushing the scores towards the global average, by $m$ votes.
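The “extra votes” reading can be checked directly. The sketch below pads the real votes with `m` fake votes of score `C` and compares the plain average against the closed-form formula. It assumes an integer `m` and at least one real vote. The helper names and the example votes are made up for illustration.

```python
from statistics import fmean

def with_pseudovotes(votes, C, m):
    """Plain average after adding m fake votes of score C (m must be an integer)."""
    return fmean(list(votes) + [C] * m)

def closed_form(votes, C, m):
    """W = (v*R + m*C) / (v + m), with v = number of votes and R = their mean."""
    v = len(votes)
    R = fmean(votes)
    return (v * R + m * C) / (v + m)

votes = [9, 7, 8, 10]                     # hypothetical votes for one movie
print(with_pseudovotes(votes, C=6, m=5))  # both lines print 64/9
print(closed_form(votes, C=6, m=5))
```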
Is this arbitrary? Actually, no. It’s the mean (i.e. MLE) of the posterior distribution you get when you have a Normal prior with mean $C$ and precision $m$, and a Normal conditional with variance 1.0.
In other words, you’re starting with a belief that, in the absence of votes, a movie/boardgame should be ranked as average, and you’re assuming that user votes are normally-distributed around the “true” score with variance 1.0. Then you’re looking at the posterior distribution (i.e. the probability distribution that arises as a result of those assumptions), and you’re picking the most likely value from that, which in the case of Gaussians is the mean.
Let’s see how that works.
To find the posterior distribution, we could work through the math, or we could just look at the Wikipedia article on conjugate priors. We’ll see that the posterior distribution for the mean of a Normal with known variance, when the prior is also a Normal, is a Normal with mean

$$\mu_{\text{post}} = \frac{\tau_0 \mu_0 + \tau \sum_{i=1}^{n} x_i}{\tau_0 + n\tau}$$

where $\mu_0$ and $\tau_0$ are the mean and precision of the prior, respectively, $\tau$ is the precision of the vote distribution, $x_1, \dots, x_n$ are the votes, and $n$ is the number of votes. In the case of IMDB, we assumed above that $\tau = 1$, so we have

$$\mu_{\text{post}} = \frac{\tau_0 \mu_0 + \sum_{i=1}^{n} x_i}{\tau_0 + n} = \frac{\tau_0 \mu_0 + n\bar{x}}{\tau_0 + n}$$

where $\bar{x}$ is the mean of the votes.

Comparing the IMDB equation to this, we can see that $C$ above is $\mu_0$ here, $R$ above is $\bar{x}$ here, $v$ above is $n$ here, and $m$ above is the hyperparameter $\tau_0$. So we know that even though IMDB says $m$ is the “minimum number of votes required to be listed in the top 250 list”, that’s an arbitrary decision on their part: it can be any positive number and the formula still works. $\tau_0$ is the precision of the prior distribution; as it gets bigger, the prior distribution gets “sharper”, and thus has more of an effect on the posterior distribution.
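To make the correspondence concrete, here is a small sketch of the Normal-Normal posterior mean. It assumes a known vote precision of τ = 1. `normal_posterior_mean` is a hypothetical helper, and the example votes and constants are invented. The sketch checks that plugging in μ0 = C and τ0 = m reproduces the IMDB formula, and it shows the “sharper prior” effect as τ0 grows.

```python
def normal_posterior_mean(xs, mu0, tau0, tau=1.0):
    """Posterior mean of the unknown mean of a Normal with known precision tau,
    under a Normal prior with mean mu0 and precision tau0:
        (tau0*mu0 + tau*sum(xs)) / (tau0 + n*tau)
    """
    n = len(xs)
    return (tau0 * mu0 + tau * sum(xs)) / (tau0 + n * tau)

votes = [9, 7, 8, 10]   # hypothetical votes; their mean is R = 8.5
C, m = 6.0, 5           # hypothetical global mean and prior weight

# With tau = 1, mu0 = C and tau0 = m, this reproduces (v*R + m*C) / (v + m).
v, R = len(votes), sum(votes) / len(votes)
print(normal_posterior_mean(votes, mu0=C, tau0=m), (v * R + m * C) / (v + m))

# A sharper prior (larger tau0) pulls the estimate further toward C = 6.0.
for tau0 in (0.1, 5, 100, 10_000):
    print(tau0, round(normal_posterior_mean(votes, mu0=C, tau0=tau0), 4))
```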
Now the assumptions we made to get to this point are almost laughable. If nothing else, we know that Gaussians are unbounded and continuous, whereas user votes on IMDB are integers from 1 to 10. The interesting take-away message here is that even though we made a lot of assumptions above that were laughably wrong, the end result is a reasonable formula with a nice, intuitive meaning.
Here’s my old school wanker interpretation: $R$ is one estimate of the thing we want. $C$ is the other estimate. We can smooth between them with a good old $\alpha$. Our estimate will then look like

$$W = \alpha R + (1 - \alpha) C$$

(Boy I hope my latex works at all.)

But what $\alpha$ do we want? Eh. Let’s just pick something. Let’s compare the number of datapoints we have to the least popular thing in the top-250. Yeah, that’s the ticket:

$$\alpha = \frac{v}{v + m}$$

So, if we’ve had as many votes as the thing at the bottom of the top-250, then we should be halfway between them. And if we’ve had an infinite number of votes, then we had better be on $R$. And if we’ve had zero votes, then $\alpha = 0$ and we’re on $C$. That’s nice. Oh, and let’s make it linear so it’s easy and doesn’t piss people off by sliding down when a vote goes up. Or incrementing less or more at different places.
You really think the IMDB guys went beyond this in their thinking? Could be. And if not, they sure should claim it.
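Here is a quick sketch of this interpolation view that checks the boundary behaviours described above. `alpha` and `smoothed` are hypothetical helper names, and `m = 1300` is taken from the IMDB quote in the post.

```python
def alpha(v, m=1300):
    """Hypothetical helper: weight on the movie's own mean R after v votes."""
    return v / (v + m)

def smoothed(R, C, v, m=1300):
    """Linear interpolation between the movie's mean R and the global mean C."""
    a = alpha(v, m)
    return a * R + (1 - a) * C

m = 1300
assert alpha(0, m) == 0.0          # zero votes: all weight on C
assert alpha(m, m) == 0.5          # as many votes as the bottom of the top 250: halfway
assert alpha(10**9, m) > 0.999     # huge number of votes: essentially all weight on R
assert all(alpha(v, m) < alpha(v + 1, m) for v in range(10_000))  # weight never drops

print(smoothed(R=9.0, C=6.0, v=m, m=m))  # 7.5, halfway between 9.0 and 6.0
```

The last assertion only shows that the weight α never shrinks as votes arrive. The score W itself can still go down if a new vote is below the current score.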
I don’t know. They use the word “Bayesian” to describe it, which seems like a non sequitur if all that’s happened is that they’ve read a book on linear interpolation.
It’s certainly possible that I’m just backfitting a model.
A better question from me should be, can Bayesian Averaging be interpreted as MLE without the Gaussian modeling assumption on the estimators? I suspect the answer is yes because there are many ways to arrive at the linear interpolation form. It seems over-constrained at first glance from subsets of the things I point out in the interpolation post. But I’m too dumb to know better.
Maybe the only weak assumption needed is that the model for the estimator have conjugate priors.
-John
Nice writeup.
To find the posterior distribution, we could work through the math, or we could just look at the Wikipedia article on conjugate priors.
Handy!
The nice interpretation is this: pretend that, in addition to the votes that users give a movie, you’re also throwing in $m$ votes of score $C$ each. In effect you’re pushing the scores towards the global average, by $m$ votes.
I will point out another non-Bayesian interpretation: it’s L2 regularization for the mean estimation :)
Brendan
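Brendan’s reading can be sketched too. Assume the penalty is centred on the global mean `C` (not on zero) with strength `m`. Then minimizing Σ(x_i - μ)² + m(μ - C)² gives μ = (Σx_i + mC)/(n + m), which is the same weighted rating. The code below compares that closed form with a brute-force grid search. The votes and helper names are made up for illustration.

```python
def penalized_loss(mu, votes, C, m):
    """Squared error on the votes plus an L2 penalty pulling mu toward C."""
    return sum((x - mu) ** 2 for x in votes) + m * (mu - C) ** 2

def penalized_mean(votes, C, m):
    """Closed-form minimizer: setting d/dmu = 0 gives (sum(votes) + m*C) / (n + m)."""
    return (sum(votes) + m * C) / (len(votes) + m)

votes = [9, 7, 8, 10]   # hypothetical votes
C, m = 6.0, 5           # hypothetical global mean and penalty strength

# Brute-force check on a fine grid over the rating range 1..10.
grid = [1 + i * 0.0001 for i in range(90_001)]
best = min(grid, key=lambda mu: penalized_loss(mu, votes, C, m))
print(penalized_mean(votes, C, m), best)
```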
Is this arbitrary? Actually, no. It’s the mean (i.e. MLE) of the posterior distribution you get when you have a Normal prior with mean $C$ and precision $m$, and a Normal conditional with variance 1.0.
I think you mean the mode of the posterior distribution, (a.k.a. MAP, not MLE).
Yeah, that probably would’ve been more clear, especially since I go on to talk about “picking the most likely value”. But in my defense they’re the same value here.
MAP and MLE in general only coincide when you have a uniform (improper) prior over the whole space.
You have an interesting email interface here. But I get the illusion that I’m writing a private email (which should go away if I’m not replying to “William Morgan” <comments@…>).
MAP and MLE in general only coincide when you have a uniform (improper) prior over the whole space.
The mode and the mean of a normal distribution are the same. That’s all I’m saying.
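For this particular model, the point can be checked numerically. The sketch below assumes the same Normal prior with mean `mu0` and precision `tau0` and a vote precision of 1, and the example numbers are invented. It compares three estimates: the MLE (the plain sample mean, which ignores the prior), the closed-form posterior mean, and the posterior mode (MAP) found by grid search.

```python
def log_posterior(mu, votes, mu0, tau0, tau=1.0):
    """Unnormalized log posterior for a Normal mean: Normal prior with mean mu0
    and precision tau0, Normal likelihood with known precision tau."""
    log_prior = -0.5 * tau0 * (mu - mu0) ** 2
    log_lik = -0.5 * tau * sum((x - mu) ** 2 for x in votes)
    return log_prior + log_lik

votes = [9, 7, 8, 10]   # hypothetical votes
mu0, tau0 = 6.0, 5      # hypothetical prior mean and precision

mle = sum(votes) / len(votes)                                 # ignores the prior
post_mean = (tau0 * mu0 + sum(votes)) / (tau0 + len(votes))   # closed form, tau = 1
grid = [1 + i * 0.0001 for i in range(90_001)]
map_est = max(grid, key=lambda mu: log_posterior(mu, votes, mu0, tau0))  # posterior mode

print(mle, post_mean, map_est)
```

The MAP estimate and the posterior mean agree up to the grid spacing because the posterior is Normal. Both differ from the MLE because the prior is not flat.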
You have an interesting email interface here.
Thanks! But what client are you using? It’s quoting stuff in a weird way.
But I get the illusion that I’m writing a private email (which should go away if I’m not replying to “William Morgan” <comments@…>).
Do you mean that because the From: field has my name in it, you get the illusion that you’re sending a private email? This is the same approach used by most automated systems that relay email between participants. Evite and JIRA, to name a few.
MAP and MLE in general only coincide when you have a uniform (improper) prior over the whole space.
The mode and the mean of a normal distribution are the same. That’s all I’m saying.
You have an interesting email interface here.
Thanks! But what client are you using? It’s quoting stuff in a weird way.
GMail.
But I get the illusion that I’m writing a private email (which should go away if I’m not replying to “William Morgan” <comments@…>).
Do you mean that because the From: field has my name in it, you get the illusion that you’re sending a private email? This is the same approach used by most automated systems that relay email between participants. Evite and JIRA, to name a few.
yes. I’d prefer if it said “William Morgan’s blog” or something.
GMail.
Whoops, looks like it’s my fault. Your quoting looks fine; Whisper is just doing something weird with it.
yes. I’d prefer if it said “William Morgan’s blog” or something.
It could say something like “William Morgan (via The All-Thing)”.
Whoops, looks like it’s my fault. Your quoting looks fine; Whisper is just doing something weird with it.
I think I’ve fixed this.
On May 18, 2009, at 12:29 PM, William Morgan <comments@all-thing.net> wrote:
MAP and MLE in general only coincide when you have a uniform (improper) prior over the whole space.
The mode and the mean of a normal distribution are the same. That’s all I’m saying.
You have an interesting email interface here.
Thanks! But what client are you using? It’s quoting stuff in a weird way.
But I get the illusion that I’m writing a private email (which should go away if I’m not replying to “William Morgan” <comments@…>).
Do you mean that because the From: field has my name in it, you get the illusion that you’re sending a private email? This is the same approach used by most automated systems that relay email between participants. Evite and JIRA, to name a few.
If you do a goog search on “map mle estimate” right now this comes up #3. Just saying. Fame. Ok now I’ll shut up and do my homework. So long since I’ve been in a discussion about this.
-John
— Gustavo Lacerda http://www.optimizelife.com
for me it’s #9.
My #1 is a recitation by my friend Mary: http://www.cs.cmu.edu/~tom/10601_sp08/slides/recitation-mle-nb.pdf