william's blog | 2012-02-04 12:09:48 +0000
===========================================

Smoothing users' votes
----------------------

Date: March 31, 2009 11:16pm
Author: William Morgan
Labels: stats
URL: http://masanjin.net/blog/smoothing.txt

In a previous post [1] I described how you can cook up a Bayesian framework that results in IMDB's so-called "true Bayesian estimate", a formula which, on its face, doesn't look particularly Bayesian. As my astute commenters pointed out, this formula has many simpler interpretations that don't require invoking the B word. For example, it's a linear interpolation between two values:

$$\define{\that}{\hat{\theta}}$$

$$\that = \lambda(v) R + (1-\lambda(v))\tau$$

where $$R$$ is the movie's mean vote, $$\tau$$ is some smoothing target, and $$\lambda(v)$$ is the smoothing weight, a function of the number of votes $$v$$. $$\lambda(v)$$ can be any function, as long as it increases with $$v$$, stays between 0 and 1, is 0 when $$v$$ is 0, and approaches 1 as $$v$$ grows. Those constraints give you the right behavior: with no votes, your estimate is exactly $$\tau$$; as you add votes, it approaches $$R$$; and $$\lambda(v)$$ controls how fast that happens.

This formulation naturally leads to the following question: if I'm smoothing like this to deal with paucity-of-data issues, what value of $$\tau$$ should I pick? IMDB uses $$\tau=C$$, the global movie mean. Intuitively that makes sense, but is it the right choice?

What's nice about the expression for $$\that$$ above is that the case we're most interested in is $$v=0$$, i.e. when there are no votes. In that case, $$\that=\tau$$, because of how I've constrained $$\lambda(v)$$. So finding the best $$\tau$$ is equivalent to finding the best $$\that$$ when $$v=0$$.

$$
\define{\risk}{R(\theta, \that)}
\define{\loss}{L(\theta, \that)}
\define{\exp}[1]{E_\theta\left[#1\right]}
$$

Happily, we can answer the question of the best $$\that$$ analytically, at least if we're willing to imagine that each movie has a "true" value $$\theta$$.
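To make the interpolation concrete, here is a minimal Python sketch. It assumes the weight $$\lambda(v) = v/(v+m)$$, the form IMDB's formula uses, with `m` a positive tuning constant; any weight meeting the constraints above would do. As a brute-force preview of the analytic answer, it also scans a grid of candidate $$\tau$$ values against a made-up list of "true" movie scores and keeps the one with the lowest average squared error. The function names and the sample data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def smoothing_weight(v: int, m: float) -> float:
    """Hypothetical weight lambda(v) = v / (v + m).

    It is 0 at v = 0, increases with v, stays in [0, 1) for finite v, and
    approaches 1 as v grows. The constant m > 0 (an assumption) sets how
    many votes it takes for the movie's own mean R to dominate.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    return v / (v + m)


def smoothed_estimate(r: float, v: int, tau: float, m: float) -> float:
    """theta_hat = lambda(v) * R + (1 - lambda(v)) * tau."""
    lam = smoothing_weight(v, m)
    return lam * r + (1 - lam) * tau


def empirical_risk(tau: float, thetas: list[float]) -> float:
    """Average squared-error loss of estimating every theta with tau.

    If thetas are samples from the distribution of theta, this approximates
    the risk E_theta[(theta - tau)^2].
    """
    return mean((theta - tau) ** 2 for theta in thetas)


def best_tau_on_grid(thetas: list[float], lo: float, hi: float, steps: int) -> float:
    """Brute-force the tau on an evenly spaced grid over [lo, hi] with the lowest risk."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda tau: empirical_risk(tau, thetas))


if __name__ == "__main__":
    # Made-up "true" scores for a handful of movies on a 0-10 scale.
    thetas = [6.0, 7.5, 8.0, 5.5, 9.0]
    tau = mean(thetas)

    # With no votes the estimate is exactly tau; with many votes it nears R.
    print(smoothed_estimate(r=8.5, v=0, tau=tau, m=25))
    print(smoothed_estimate(r=8.5, v=10_000, tau=tau, m=25))

    # The grid search should land on the mean of thetas.
    print(best_tau_on_grid(thetas, lo=0.0, hi=10.0, steps=1000), tau)
```

Swapping in a different weight function changes only how quickly the estimate moves from $$\tau$$ toward $$R$$, not where it starts, which is why the choice of $$\tau$$ can be analyzed on its own at $$v=0$$.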
Given $$\theta$$, we can define a loss function $$\loss$$ that describes how bad we think a particular value of $$\that$$ is. But we don't really know what $$\theta$$ is for any movie (if we did, we wouldn't be bothering with any of this). So we generalize a step further and define a risk function $$\risk=\exp{\loss}$$ that quantifies our _expected loss_: the aggregate of the loss function across all possible values of $$\theta$$, weighted by the probability of each value. This gives us the tool we really need to answer the question above: the $$\that$$ that minimizes our risk is the winner.

In the absence of any specific notions about errors, we'll use the standard loss function for real-valued estimates, squared-error loss: $$\loss = (\theta-\that)^2$$. Then it's just a matter of turning the crank:

$$
\array{\arrayopts{\colalign{right center left}}
\risk & = & \exp{\loss} \\
& = & \exp{(\theta-\that)^2} \\
& = & \exp{\theta^2-2\theta\that + \that^2} \\
& = & \exp{\theta^2} - 2\that \exp{\theta} + \that^2
}
$$

The last step uses linearity of expectation and the fact that $$\that$$ is a fixed number, not a function of $$\theta$$. We can drop the first term, since it doesn't depend on $$\that$$ and we're only interested in minimizing the risk as a function of $$\that$$ (equivalently, of $$\tau$$). To find the minimum, set the derivative to zero:

$$
\array{\arrayopts{\colalign{right center left}}
\frac{d}{d\that} \left( -2\that \exp{\theta} + \that^2 \right) & = & 0 \\
-2 \exp{\theta} + 2\that & = & 0 \\
\that & = & \exp{\theta}
}
$$

The second derivative is $$2 > 0$$, so this is indeed a minimum. Unsurprisingly, the best estimate of $$\theta$$ under squared-error loss is the mean of the distribution of $$\theta$$. Since we're interested in the case where $$v=0$$, where $$\that=\tau$$, this implies that the best value to use for $$\tau$$ is that same mean. So IMDB's choice of $$C$$ makes sense: the mean vote over all your movies is a great estimate of the mean of the distribution of $$\theta$$.

A few concluding points:

1. This answer is specific to squared-error loss; if you plug in another loss function, the optimal value for $$\tau$$ might very well change (under absolute-error loss, for example, the risk is minimized by the median of the distribution of $$\theta$$ rather than the mean). And you might actually have a specific model in mind for how "bad" mis-estimates are.
   Maybe over-estimates are worse than under-estimates, or something like that.

2. The distribution of $$\theta$$ is actually left completely vague above. In fact, we never even talk about it; we just use it implicitly in our $$\exp{\cdot}$$ terms. So you should feel free to plug in (the mean of) whatever distribution you believe most accurately represents your product/movie/whatever. IMDB could arguably do better by plugging in per-category means, or something even fancier.

3. IMDB is actually a particularly bad case, because opinions about movies are extremely subjective. If we're serious about modeling very subjective things, we should be talking about multinomial models, Dirichlet priors, and the like.

But the take-home message is: in the absence of a specific loss function that you really believe in, smoothing towards the mean isn't just intuitive, it minimizes your risk.

[1] http://all-thing.net/bayesian-average

Replies
--------

Adrian-Bogdan Morut, on December 15, 2010 10:03pm:

| On Wed, Dec 15, 2010 at 23:37, William Morgan wrote:
|
| I'm not all that fluent in math jargon, so there's a pretty important bit of
| all this that's still unclear to me: what exactly is it that you/they are
| calling the "mean vote over all movies"? Is it A. The mean of all elementary
| scores entered by individual users (regardless of which movies they were for)
| or B. The global mean of all the movie-specific mean scores (a mean of means)?
|
| (And as a sidenote: I'm not sure if you're using some other literature than
| the Wikipedia article on the "Bayesian average", but you seem to have reversed
| the meanings of C and m: they're using "m" for the prior mean and C for the
| number of instances of m that are added to the numerator.)

William Morgan, on January 5, 2011 8:45pm:

| Hi Adrian-Bogdan,
|
| Thanks for the comment!
|
| It's option B, the mean of the score of the movies. That these individual
| scores also happen to be calculated as means of votes is only incidental. The
| same risk function analysis would apply to any method of scoring.
|
| I was going by IMDB's terminology at the bottom of
| http://www.imdb.com/chart/top. You're right that $C$ and $m$ have the opposite
| definitions in the Wikipedia page. Strange...