
Filter out comments that contain incorrect punctuation or misspellings. In this case it’s YouTube comments (and apparently I’m not the only one who has a problem describing them without using the word “cesspool”), but frankly Reddit could use it just as well. The results are compelling: there’s a very strong correlation between comment stupidity and poor spelling or punctuation.

This is brilliant in its simplicity, and it would be interesting to see how this compares to an approach using StupidFilter, which I guess is some kind of binary SVM classifier.

Advantages:

  1. Far more debuggable and understandable than the output of an SVM or a Bayesian classifier.
  2. Much quicker to execute!
  3. Widespread usage would have a positive effect on society as a whole.
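
To make the idea concrete, here is a minimal sketch of what such a rule-based filter might look like. It is not the actual filter discussed above: the tiny KNOWN_WORDS vocabulary, the function names, and the three punctuation rules are all assumptions made up for illustration, and a real version would load a proper dictionary.

```python
import re

# Hypothetical toy vocabulary; a real filter would load a full word list.
KNOWN_WORDS = {
    "this", "video", "is", "great", "the", "best", "i", "love", "it",
}

def misspelled_words(text, vocabulary=KNOWN_WORDS):
    """Return the words in text that are not in the vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in vocabulary]

def punctuation_problems(text):
    """Return a list of human-readable punctuation complaints."""
    stripped = text.strip()
    if not stripped:
        return ["empty comment"]
    problems = []
    if not stripped[0].isupper():
        problems.append("does not start with a capital letter")
    if stripped[-1] not in ".!?":
        problems.append("does not end with . ! or ?")
    if re.search(r"[!?]{2,}", stripped):
        problems.append("repeated ! or ?")
    return problems

def is_acceptable(text, vocabulary=KNOWN_WORDS):
    """Keep a comment only if it passes both checks."""
    return not punctuation_problems(text) and not misspelled_words(text, vocabulary)

comments = ["This video is great.", "omg this vid is teh best!!!"]
kept = [c for c in comments if is_acceptable(c)]
```

Because each check returns the exact words or rules that failed, it is easy to see why a given comment was rejected, which is the debuggability advantage in point 1.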
William Morgan, August 25, 2008.