I’ve done some benchmarking on Whisper. Here are the results, with
a few points of comparison:
| system |
req/s |
ms/req |
delta ms/req |
| nginx static |
13736.04 |
7.280 |
|
| rack/thin |
3065.24 |
32.624 |
25.344 |
| whisper/no logging |
1918.56 |
52.123 |
19.499 |
| whisper |
1833.40 |
54.544 |
2.421 |
Nginx static is nginx serving a static file. We see it can handle 13k
requests per second, and takes about 7ms for a single request. If we add
a simple Thin server on top of that, going through Rack, we immediately
drop requests/second by an order of magnitude, and it takes us an extra
25ms/request. That’s the cost of using Ruby.
Adding Whisper on top of that requires another 19.5 ms/requests,
bringing our rate down to 1919 requests/second, or over 7 times slower
than Nginx serving static files. And if you want logging with that, add
another 2.4 ms/request.
That 2.4ms/request is interesting, because it’s basically the result of
a few puts statements. Yes, Ruby is expensive. The bare Rack/Thin
performance shows the headroom I have on the Ruby side (i.e. without
rewriting the whole thing in C). If a puts is that expensive, then
stripping out a couple debugging statements and caching some regexp
results would probably result in a very noticable improvement in
performance.
But how many requests/second do you need to be able to survive being
Slashdotted? A brief web search suggests a high estimate of “several
hundred”. Let’s say that means 300 req/s. That means that Whisper is
already 6 times the Slashdot effect requirement. So it’s almost definitely not
worth complicated the code for the sake of performance.
Experiment parameters: these are all tests using ab (the Apache
benchmark tool) with 100 concurrent requests, averaged over 50k
requests. The tests were performed by connecting to localhost (i.e.
going over the network stack but not over the network itself), on a
quad-core Intel 2Ghz (Q8200) running 64-bit Linux 2.6.27. YMMV.
Over the past month or so I’ve been spending some time hacking together yet
another blogging platform, to satisfy all my (admittedly weird) blogging
desires. It’s finally at the point where I can host my fascinating insights on
it, so here you go. It’s called Whisper, and you’re looking at it now.
Interesting features:
- No RDBMS. Storing your blog entries in a RDBMS is like driving to work in the
Space Shuttle.
- YAML+Textile, sitting on a disk. Like Hobix, blog posts
and comments are stored on disk in regular files, using a mix of YAML and
Textile. This means you can keep your content under version control, and you
can edit it with whatever editor you desire. Unlike Hobix, the entry content is
stored in a separate file from the metadata, so there’s none of the trickiness
of embedding Textile in YAML.
- Sits directly on top of Rack (or Thin). No intermediate layer to slow
things down. These particular bits are served from Thin over a unix socket to
Nginx.
- Lazy cached dependency graph: every bit of content is cached, built lazily,
and a part of a big dependency graph. That means almost every request is
served directly from memory, and making a change, like adding or updating an
entry, forces a regeneration of only those bits that require it.
Infrequently-requested bits of content eventually expire.
- Markup enhancements: I’ve added some extra processing on top of Textile to do
the things I’ve always wanted to do. Ruby code is automatically
syntax-highlighted, LaTeX math expressions are turned into MathML (via
RiTeX ), etc. Finally I can write purty-lookin’
math and code without a ridiculous amount of effort.
- Threaded comments. Why would you not have this?
- Comments via email. This is still a work in progress, but comments can
currently only be made by entering your email address, and replying to the
resulting email. This allows you to quote, thread, and generally have a
reasonable discussion, which is what email is good at, and what typing shit
into little text areas on your web browser is not. The eventual goal
is to automatically mirror the entire conversation, but right now it just
mirrors individual replies.
- Multiformat support. In addition to HTML and RSS output, there’s a plain
text mode for the hard-core.
- Pagination, labels, per-label and per-author indices, etc.
- The whole thing amounts to a little over 1200 lines of code.
The code’s still a while away from being ready for public consumption, but I’ve
put up a git repo here: git://masanjin.net/whisper.
The next steps are to flesh out the code enough to make it usable by other
people, make a gem, and maybe publish some performance numbers.
There’s a good discussion with lots of interesting details on a recent patch
submission for adding indirect threading to the Python
VM. (And by “discussion” I mean a single,
un-threaded sequence of comments where you have to manually figure out who’s
replying to what, which apparently is what everyone in the world is happy with
nowadays except for me. Email clients have had threading since 1975, bitches,
so get with the fucking program. [Hence, Whisper—ed.]) Pointed to by
programming.reddit.com, which remains
surprisingly useful, as long as you cut yourself once the comment thread
devolves (as it invariably does) into meta-argumentation.
Indirect threading is a vaguely-neat trick that I first learned about around
the time I was getting into the Rubinius code. The idea is that, in the inner
loop of your VM, which is going through and interpreting the opcodes one at a
time (dispatching each to a block of handler code), instead of jumping back to
the top of the loop at the end of handler code section, you jump directly to
the location of the handler code for the next opcode. The big benefit is not so
much that you save a jump per opcode (which maybe is optimized out for you
anyways), but that the CPU can do branch prediction on a per-opcode basis. So
common opcode sequences will all be pipelined together.
But the discussion shows that this kind of thing is very compiler- and
architecture-dependent, and you have to spend a lot of time making sure that
GCC is optimizing the “right way” for your particular architecture, is not
overly-optimizing by collapsing the jumps together, etc. OTOH, the submitter is
reporting a 20% speedup, and this is the very heart of the VM, so it could very
well be worth spending time on such trickery.
More information:
I found a neat little example in one of my introductory stats books about
Bayesian versus maximum-likelihood estimation for the simple problem of
estimating a binomial distribution given only one sample.
I was going to try and show the math but since Blogger is not making it
possible to actually render MathML I’ll just hand-wave instead.
[Fixed in Whisper. —ed.]
So let’s say we’re trying to estimate a binomial distribution parameterized by
, and that we’ve only seen one estimate. For example, someone flips a coin
once, and we have to decide what the coin’s probability of heads is.
The maximum likelhood estimate for is easy: if your single sample is a 1,
then , and if your sample is 0, . (And if you go through the laborious
process of writing the log likelihood, setting the derivative equal to 0, and
solving it, you come up with the general rule of (# of 1’s) / (# of 1’s + # of
0’s), which is kinda what you would expect.)
In the coin case it seems crazy to say, I saw one head, so I’m going to assume
that the coin always turns up heads, but that’s because of our prior
knowledge of how coins behave. If we’re given a black box with a button and two
lights, and you press the button, and one of the lights come on, then maybe
estimating that that light always comes on when you press the button makes a
little more sense.
Finding the Bayesian estimate is slightly more complicated. Let’s use a uniform
prior. Our conditional distribution is and , and if you work
it out, the posterior ends up as and .
Now if we were in the world of classication, we’d take the MAP estimate, which
is a fancy way of saying the value with the biggest probability, or the mode of
the distribution. Since we’re using a uniform prior, that would end up as the
same as the MLE. But we’re not. We’re in the world of real numbers, so we can
take something better: the expected value, or the mean of the distribution.
This is known as the Bayes estimate, and there are some decision-theoretic
reasons for using it, but informally, it makes more sense than using the MAP
estimate: you can take into account the entire shape of the distribution, not
just the mode.
Using the Bayes estimate, we arrive at if the sample was a 1, and
if the sample was a zero. So we’re at a place where Bayesian logic and
frequentist logic arrive at very different answers, even with a uniform
prior.
Up till now we’ve been talking about “estimation theory”, i.e. the art of
estimating shit. But estimation theory is basically decision theory in
disguise, where your decision space is the same as your parameter space: you’re
deciding on a value for , given your input data, and your prior knowledge, if
any.
Now what’s cool about moving to the world of decision theory is that we can
say: if I have to decide on a particular value for , how can I minimize my
expected cost, aka my risk? A natural choice for a cost, or loss, function, is
squared error. If the true value is , I’d like to estimate in such a way
that is minimized. So we don’t have to argue philosophically about
MLE versus MAP versus minimax versus Bayes estimates; we can quantify how well
each of them do under this framework.
And it turns out that, if you plot the risk for the MLE estimate and for the
Bayes estimate under different values of the true value , then MOST of the
time, the Bayes estimate has lower risk than the MLE. It’s only when is
close to 0 or to 1 that MLE has lower risk.
So that’s pretty cool. It seems like the Bayes estimate must be a superior
estimate.
Of course, I set this whole thing up. Those “decision-theoretic reasons” for
choosing the Bayes estimate I mentioned? Well, they’re theorems that show that
the Bayes estimate minimizes risk. And, in fact, the Bayes estimate of the mean
of the distribution is specific to squared-error loss. If we chose another
loss function, we could come up with a potentially very different Bayes
estimate.
But my intention wasn’t really to trick you into believing that Bayes estimates
are awesome. (Though they are!) I wanted to show that:
- Bayes and classical approaches can come up with very different estimates,
even with a uniform prior.
- If you cast things in decision-theoretic terms, you can make some real
quantitative statements about different ways of estimating.
In the decision theory world, you can customize your estimates to minimize
your particular costs in your particular situation. And that’s an idea that I
think is very, very powerful.
It really seems like this should display some kind of equation:
I can’t make it work despite all my xhtml’ing. Blogger fail.
[Fixed in Whisper. —ed.]
Managing that old Hobix blog was way more work than it should’ve been. So,
I’ve started over and outsourced the work to someone else. Let’s see how it
goes. [It didn’t go so well. Hence, Whisper.—ed.]