Sometimes it’s nice to have a simple breakpointing function that will dump you
into an interactive session with all your local variables in place.
There are more sophisticated solutions for the world of multiple servers and
daemonized code, but after some fighting with IRB, I find myself using this
little snippet of code in many projects:
require 'irb'

module IRB
  def IRB.start_with_binding binding
    IRB.setup __FILE__
    w = WorkSpace.new binding
    irb = Irb.new w
    @CONF[:MAIN_CONTEXT] = irb.context
    irb.eval_input
  end
end

def breakpoint binding; IRB.start_with_binding binding end
You can invoke the breakpoint by inserting a breakpoint binding
statement anywhere in your code. Once that line is
reached, you’ll be dumped into an IRB session with local variables intact.
Quitting the session resumes execution.
Obviously with this method I’m having you pass in your binding explicitly.
There are fancier tricks for capturing the binding of the caller (involving
kernel trace functions and continuations), but I’m opting for the simpler
solution here.
Works with Ruby 1.9, of course.
This is a post I’ve been meaning to write for over a year, since I first came
across an article entitled How Not to Sort by Average Rating (made popular on,
amongst other places, Hacker News). Recent mentions of that article have
finally spurred me to finish this off. Well, better late than never.
The Context
Let’s say you have a lot of things. Let’s say your users can vote on how much
they like those things. And let’s say that you want to rank those things, so
that you can list them by how good they are. How should you do it?
You can find examples of this pattern all over the web. Amazon ranks products,
Hacker News ranks links, and Yelp ranks businesses, all based on reviews or
votes from their users.
These sites all rank the same way: they compile user votes into a score, and
items are ranked by that score. So how the score is calculated determines
entirely how the items are ranked.
So how might we generate such a score? The most obvious approach is to simply
use the average of the users’ votes. And, in fact, that is what these sites do.
(Hacker News also mixes in a time-based measure of freshness.)
Unfortunately, there’s a big problem with using the average: when you only have
a few votes, the average can take on extreme values. For example, if you have
only one vote, the average is that vote. (And if you have no votes… well,
what do you do then?) The result is that low-vote items can easily occupy the
extreme points on the list—at the very top or very bottom.
You see this problem every time you go to Amazon and sort by average customer
review. The topmost items are a morass of single-vote products, and you have
to wade through them before you get to the good stuff.
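The instability of the plain average with few votes is easy to see numerically. A quick sketch (the vote arrays here are made-up examples):

```ruby
# Plain average: total of the votes divided by the vote count.
def average votes
  votes.sum.to_f / votes.size
end

# A single 5-star vote...
lone = average([5])                    # => 5.0

# ...outranks an item with fifty votes averaging 4.8, even though
# the second item is almost certainly the better product.
popular = average([5] * 40 + [4] * 10) # => 4.8
```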
So what would be a better ranking? Well, we’d probably like to see an item with
50 5-star votes before we see an item with a single 5-star vote. Intuitively
speaking, the more votes, the more certain we are of an item’s usefulness, so
we should be focusing our attention on high-score, high-vote-count items.
And, although we’re less likely to be concerned with what’s at the bottom of
the list, the symmetric case also seems right: an item with 50 1-star votes
should be ranked lower than one with a single 1-star vote, since the 50-vote
item is more certain to be bad.
Taken together, these two observations suggest that the best ordering is one
where the low-vote items are placed somewhere in the middle, and where
high-confidence items occupy the extremes—good at the top, and bad at the
bottom.
(We could develop this argument more formally, by talking about things like
expected utility / risk, but I’m going to leave it just intuitive for this
post.)
The Problem with Confidence Intervals
The “How Not To” article linked above suggests that the right way to rank
products to avoid the problem with the average is to construct a confidence
interval around the average vote, and rank by the lower bound of that
interval. This confidence interval has the following behavior: when the amount
of data is large, it is a tight bound around the average vote; when the amount
of data is small, it is a very loose bound around the average.
Taking the lower bound of the confidence interval is fine when the number of
votes for an item is large: the lower bound will be close to the average
itself. But, just as with the average, the problem with this approach occurs
when the number of votes is small. In this case, the lower bound of the
confidence interval will be very low.
The result is that new items with few votes will always be ranked alongside the
bad items with many votes, at the bottom of the list. In other words, if you
rank by the lower bound of a confidence interval, you will rank an item with no
votes alongside an item with 1,000 bad votes.
If you use this approach, every new item will start out at the bottom of the
list.
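The bound that article proposes is the lower bound of the Wilson score interval (for binary up/down votes). Here is a sketch, with my own function name and a 95% z-value chosen for illustration:

```ruby
# Lower bound of the Wilson score interval for a binomial proportion.
# pos: number of positive votes; n: total votes; z: normal quantile
# (1.96 corresponds to ~95% confidence).
def wilson_lower_bound pos, n, z = 1.96
  return 0.0 if n.zero?  # no votes: the bound is as low as it gets
  p = pos.to_f / n
  (p + z**2 / (2 * n) -
   z * Math.sqrt((p * (1 - p) + z**2 / (4 * n)) / n)) / (1 + z**2 / n)
end

# A new item with one positive vote scores far below an established one,
# illustrating the "new items start at the bottom" problem:
wilson_lower_bound(1, 1)    # ~0.21
wilson_lower_bound(90, 100) # ~0.83
```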
Enter Bayesian Statistics
Is there a better alternative? Fortunately, this kind of paucity-of-data
problem is tailor-made for Bayesian statistics.
A Bayesian approach gives us a framework to model not only the votes
themselves, but also a prior belief of what we think an item looks like,
before seeing any votes. We can use this prior to do smoothing: take the
average vote, which can be jumpy, and “smooth” it towards the prior, which is
steady. And the smoothing is done in such a way that, with a small number of
votes, the score mostly reflects the prior, and as more votes arrive, the score
moves towards the average vote. In other words, the votes are eventually able to
“override” the prior when there are enough of them.
Being able to incorporate a prior has two advantages. First, it gives us a
principled way of modeling what should happen with zero votes. We are no longer
at the mercy of the confidence interval; we can decide explicitly how low-vote
products should be treated. Second, it provides a convenient mechanism for us
to plug in any extra information that we have on hand. This extra information
becomes the prior belief.
But what kind of extra information do we have? Equivalently, how do we
determine the prior? Consider: when you decide to watch a Coen brothers movie,
you make that decision based on past Coen brothers movies. When you buy a Sony
product, you make a decision based on what you know of the brand.
In general, when you know nothing about an item, you can generalize from
information about related items, items from similar sources, etc. We will do
the same thing to create a prior.
Let’s see how we can use Bayesian statistics to do smoothing towards a
meaningful prior.
Solution #1: the “True Bayesian Average”
The first solution, made popular by IMDB, is the so-called “true Bayesian
average” (although so far as I know that terminology does not actually come
from statistics). Using TBA, to compute a score s, we do:

    s = (C × m + n × v) / (C + n)

where v is the average vote for the item, n is the number of votes, m
is the smoothing target, and C is a tuning parameter that controls
how quickly the score moves away from m as the number of votes increases.
(You can read more about the Bayesian interpretation of the ‘true Bayesian average’.)
This formula has a nice interpretation: C is a pseudo-count, a number of
“pseudo” votes, each for exactly the value m. These votes are automatically
added to the votes for every item, and then we take the average of the
pseudo-votes and “real” votes combined.
In this formula, the prior is m, the smoothing target. What value should we
choose for it? It turns out that if we set m to the average vote over all
items (or over some representative class of items), we get the behavior we
wanted above: low-vote items start life near the middle of the herd, not near
the bottom, and make their way up or down the list as the votes come in.
(You can read a more rigorous argument about why using the global average is a
good target for smoothing.)
The TBA is easy to implement, and it’s trivial to adapt an existing system that
already uses average votes to use it.
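As a concrete sketch (the function name and the example numbers are mine, not IMDB’s), the whole thing is one line of arithmetic:

```ruby
# "True Bayesian average": smooth the item's average vote towards a
# target m, weighting the prior as C pseudo-votes:
#   score = (C * m + sum of votes) / (C + number of votes)
def true_bayesian_average votes, m, c
  (c * m + votes.sum).to_f / (c + votes.size)
end

m = 3.0  # smoothing target, e.g. the global average vote
c = 5    # tuning parameter: how many pseudo-votes the prior is worth

true_bayesian_average([], m, c)        # no votes: exactly the prior, 3.0
true_bayesian_average([5], m, c)       # one 5-star vote nudges it up a bit
true_bayesian_average([5] * 100, m, c) # many votes: close to 5.0
```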
But we can do better.
The Problem with the “True Bayesian Average”
The problem with the TBA is that it assumes a Normal distribution over user
votes.
Taken at face value, we know this assumption is bad for two reasons: one, we
have discrete, not continuous, votes, and two, we have no actual expectation
that votes will be normally distributed, except in the limit. But in reality,
neither of these is a significant problem. We will always have some modeling
assumptions of dubious correctness for the sake of tractability, and these are
within the realm of plausibility.
The bigger problem with the assumption of Normality is that it forces us to
model items as if they had some “true” score, sitting at the mean of a
Normal distribution. But we know some items are simply divisive. Some Amazon
products accrue large numbers of both 5star and 1star reviews. Some movies
are loved and hated in equal measure. Being able to model those kinds of
distributions accurately would be to our advantage, and a Normal distribution
won’t let us do that.
Ultimately, of course, we need to produce an ordering, so we need to condense
everything we know about an item into a single value. But it would be to our
advantage to do this in an explicit, controllable manner.
This suggests we would like a solution which decomposes scoring into three
parts:
- Our prior beliefs about an item;
- The users’ votes on an item; and
- The mapping between the vote histogram and a score.
Such a system could both account for paucity-of-data problems, and provide
us with explicit control on how the items are ranked.
Let’s see how we can do this.
Solution #2: Dirichlet Priors and an Explicit Value Function
To accomplish these goals, we have to forsake the normal distribution for the
multinomial. A multinomial model will let us represent the complete histogram
of the votes received for each item. For example, for Amazon products, a
multinomial model will capture how many votes for one star, how many votes for
two stars, and so on, a product had. If there are k types of votes that
users can assign to an item, we can parameterize (i.e. fully specify) the
corresponding multinomial with a k-dimensional vector.
(I’m glossing over one technicality, which is that a multinomial really
measures only the relative proportion, not the actual counts, of the histogram.
But this detail won’t matter in our case.)
To fit the multinomial into a Bayesian framework, we will model our prior
belief as a Dirichlet distribution. Just as a multinomial distribution
represents a single k-dimensional vector, a Dirichlet distribution is a
probability distribution over all such k-dimensional vectors. In effect,
it is a distribution over all possible vote histograms for an item.
We use the Dirichlet because it is a conjugate prior of the multinomial.
This means that when we use Bayes’s rule to combine the Dirichlet with the
multinomial (we’ll see how to do this below), the resulting distribution is
also a Dirichlet. This property is very convenient because it allows us to keep
the representation to a single form, and use the same technique iteratively—we
start with a Dirichlet, and every new set of votes we incorporate leaves us
with a Dirichlet.
The Dirichlet is a complicated distribution. Luckily, the properties we’re
interested in make it very simple to use.
For one, it is parameterized by a histogram just as well as a multinomial is.
That is, if we write Dir(α1, …, αk) for a Dirichlet distribution and
Mult(n1, …, nk) for a multinomial, Mult(7, 3, …) describes a multinomial
distribution corresponding to a vote histogram for a particular item with 7
one-star votes, 3 two-star reviews, etc., and Dir(7, 3, …) describes a
Dirichlet. Of course, the meaning of the Dirichlet is quite different from
the meaning of the multinomial, and for the sake of brevity we won’t go into
how to interpret it here. But for our purposes, this is a nice property
because it means that specifying a Dirichlet given a vote histogram is trivial.
The other very handy property of Dirichlets is that when they’re combined with
multinomials using Bayes’s Rule, not only is the result a Dirichlet, it’s a
Dirichlet that’s easily specifiable in terms of the two input distributions.
Recall that Bayes’s Rule states:

    P(M|D) = P(D|M) × P(M) / P(D)

where D is the set of observed votes and M is a possible model of the
item. In our case, that means that P(M) is our prior belief, P(D|M) is
what our actual votes look like given the model, P(M|D) is our updated
model, and P(D) is some normalizing constant, which we can ignore for now.
If we call our prior belief the Dirichlet Dir(α1, …, αk) and
our conditional distribution the multinomial Mult(n1, …, nk),
and skip over all the difficult math, then we can turn Bayes’s rule into a
rule for updating our model:

    Dir(α1, …, αk) updated with Mult(n1, …, nk) yields Dir(α1 + n1, …, αk + nk)

In other words, to create the posterior (that is, resulting) distribution, all
we need to do is add the two input histograms. (Note that P(D) falls away.)
So now we have a way of taking our prior information, incorporating the user
votes, and finding the resulting distribution. The resulting distribution is
also a vote histogram for an item, smoothed, just as in the TBA case, towards
the prior belief histogram. (And, in fact, the same pseudo-count analogy
applies: the α’s are the pseudo-votes, the n’s the real votes, and
we’re simply adding them together. Neat, huh?)
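In code, that update step is nothing more than element-wise addition of the two histograms (the numbers here are made up):

```ruby
# Dirichlet prior parameters (pseudo-votes) for a 5-star scale...
prior = [2, 2, 2, 2, 2]
# ...and the observed vote histogram for one item.
votes = [7, 3, 0, 1, 4]

# Bayes's rule with a conjugate prior: posterior = prior + votes.
posterior = prior.zip(votes).map { |a, v| a + v }
# => [9, 5, 2, 3, 6]
```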
But what do we do with that?
The final step of the puzzle is to transform this distribution into a single
score for ranking. The best way to do this is to take a function that describes
the score of a particular vote histogram, and compute the expected value of
that function under our distribution. The expected value represents the
function as evaluated over every possible value in the distribution, weighted
by the probability of seeing that value. In effect, it will capture the
function as applied to the entire distribution, and package it up nicely into a
single number for us, ready to be used as a score.
To compute the expected value, in general, you are required to solve a nasty
integral. Happily, we can take advantage of one final property of the
Dirichlet, which is that, if your Dirichlet is parameterized by
(α1, …, αk), the expected value of the proportion of votes in
category i is simply:

    E[proportion in category i] = αi / (α1 + ⋯ + αk)

In other words, the expected value of the proportion of votes in the i-th
bucket is simply the proportion of that parameter over the total sum of the
parameters.
If we stick to a linear scoring function, we can take advantage of the
linearity of expectation and use this result, avoiding anything more
complicated than multiplication.
For example, sticking with our Amazon case, let’s use the “obvious” function:

    f = 1×(proportion of 1-star votes) + 2×(proportion of 2-star votes) + ⋯ + 5×(proportion of 5-star votes)

where we give each one-star vote a 1, each 2-star vote a 2, and so on, and just
sum these up to produce a score. Because this is linear, the expected value of
this function under our Dirichlet is simply:

    E[f] = (1×α1 + 2×α2 + 3×α3 + 4×α4 + 5×α5) / (α1 + ⋯ + α5)
Of course, this simple function has many issues. For example, are these weights
really what you want in practice? They imply that a five-star score is worth
exactly 5 times a one-star score, which may not be the case. And it does not
do anything special with “divisive” items, which you might want to up- or
down-rank.
But, this framework will allow you to plug in any function you want at that
point, with the caveat that non-linear functions may involve some nasty
integration.
So there you have it! We’ve achieved all three of our desired goals. We can take
a prior belief, any number of user votes (including none at all), and a custom
scoring function, and combine them all to produce a ranking.
The final question is what prior belief we should use. Intuitively (and
reasoning by analogy from the TBA case above), if we use the mean vote
histogram over all items, scaled down by some tuning parameter, we should have
the desired behavior introduced in the first section, where low-vote items
start near the middle of the list, high-vote high-score items are at the
top, and high-vote low-score items at the bottom. Proof of the correctness of
this statement is left as an exercise to the reader. (Hint: see the related
blog post.)
At Long Last, Some Code
Let’s wrap it up with some Ruby code. If you’ve skipped to this point,
congratulations—all of the above was a very long, very discursive way of
arriving at something that is, in fact, very simple:
DEFAULT_PRIOR = [2, 2, 2, 2, 2]

def score votes, prior=DEFAULT_PRIOR
  posterior = votes.zip(prior).map { |a, b| a + b }
  sum = posterior.inject { |a, b| a + b }
  posterior.
    map.with_index { |v, i| (i + 1) * v }.
    inject { |a, b| a + b }.
    to_f / sum
end
If you play around with this method, you can see:
- With no votes, you get a score in the middle of the range, 3.0.
- As you add high scores, the value increases, and as you add low scores, it decreases.
- If the score is, say, 4.0, adding more 4-star votes doesn’t change it.
- If you make the default prior bigger, you need more votes to move away from 3.0.
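To check those claims, here is the score method again (with Ruby’s block parameters written out) plus a few spot checks on made-up vote histograms:

```ruby
DEFAULT_PRIOR = [2, 2, 2, 2, 2]

def score votes, prior = DEFAULT_PRIOR
  # posterior histogram = prior pseudo-votes + observed votes
  posterior = votes.zip(prior).map { |a, b| a + b }
  sum = posterior.inject { |a, b| a + b }
  # expected value of the linear scoring function
  posterior.
    map.with_index { |v, i| (i + 1) * v }.
    inject { |a, b| a + b }.
    to_f / sum
end

score [0, 0, 0, 0, 0]  # => 3.0, the middle of the range
score [0, 0, 0, 0, 10] # => 4.0, pulled up by ten 5-star votes
score [10, 0, 0, 0, 0] # => 2.0, pulled down by ten 1-star votes
```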
Enjoy!
It’s been 12 years since the first MathML spec was released, and math on the
web is still largely unsupported and incredibly complicated to get right. If
that isn’t a spec failure, I don’t know what is.
Personally, after a year of doing my best to do MathML the “right” way, I’ve
given up trying to be correct. I’m now using MathJax
to render math, a solution that is, while absolutely horrible, far less
horrible than before. In particular, people can actually see math.
But William! you might say, all you need to do for math on the web is to
generate MathML. Firefox has supported MathML for years and years! And that’s
true, BUT:
- Random browsers simply don’t support MathML (e.g. Chrome, Safari).
- The browsers that do support it (e.g. Firefox) support it only in strict compliance mode.
And strict compliance mode is an absolute hell for everyone involved. You must
produce valid XHTML and send your content as text/xml instead of text/html.
Any kind of non-conforming XML produces a horrible cryptic red error message
instead of displaying the page. You will begin to live in fear of screwing
things up whenever you make a change to your layout, and god help you if you
have any kind of UGC or templating or any kind of non-trivial content
generation.
MathJax smooths that away. You can embed MathML or even LaTeX math markup
directly into a text/html document, and it will do the magic to
turn it into math in the browser. If the browser has native MathML support,
then great, it will use that. If the browser has web font support, then great,
you get pretty fonts. And if not, the degradation is graceful. And you get the
nice error-robust rendering that makes HTML nice.
I’m still using Ritex to translate LaTeX math
into MathML, because I like the syntax and because I didn’t feel like going
back and translating all the math. I’ve changed
Whisper to emit text/html as the content type. So now I should have
the best of all possible worlds.
Let’s try it:
If you see math above, I have succeeded. If not, I have failed.
I’ve just released Trollop 1.15, which fixes an
irritating misfeature pointed out by Rafael Sevilla: when Trollop runs out of
characters while generating short option names (e.g. when you have a lot of
options), it no longer throws an exception and dies. It just continues peacefully.
Trollop’s reign of domination continues!
If you’re writing a multi-threaded Ruby program that uses ncurses,
you might be curious why your program stops running when you call
Ncurses.getch. Sup has been plagued
by this issue since 2005. Thankfully, I think I finally understand
it.
The problem is that there is a bug in the Ruby ncurses library
such that using blocking input will block all Ruby threads when
it waits for user input, instead of just the calling thread. So
Ncurses.getch will cause everything to grind to a halt. This is
probably due to the library not releasing the GVL when blocking on
stdin.
This bug is present in the latest rubygems version of ncurses,
0.9.1. It has been fixed in the latest libncurses-ruby Debian
packages (1.1-3).
To see if you have a buggy, blocking version of the ruby ncurses
library, run this program:
require 'rubygems'
require 'ncurses'
require 'thread'

Ncurses.initscr
Ncurses.noecho
Ncurses.cbreak
Ncurses.curs_set 0

Thread.new do
  sleep 0.1
  Ncurses.stdscr.mvaddstr 0, 0, "library is GOOD."
end

begin
  Ncurses.stdscr.mvaddstr 0, 0, "library is BAD."
  Ncurses.getch
ensure
  Ncurses.curs_set 1
  Ncurses.endwin
  puts "bye"
end
(I purposely require rubygems in there to load the rubygems
ncurses library if it’s present; you can drop this if you don’t
use rubygems.)
There are two workarounds to this problem. First, you can simply
tell ncurses to use non-blocking input:
Ncurses.nodelay Ncurses.stdscr, true
But if you’re writing a multi-threaded app, you probably aren’t
interested in non-blocking input, unless you want a nasty polling
loop.
The better choice is to add a call to IO.select before getch,
which will block the calling thread until there’s an actual
keypress, and then allow getch to pick it up:
if IO.select [$stdin], nil, nil, 1
  Ncurses.getch
end
IO.select requires a delay, so you’ll have to handle the
periodic nils that it generates. But the background threads should no longer block.
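The timeout/nil behavior is easy to demonstrate without ncurses at all; here’s a self-contained sketch that uses a pipe in place of $stdin:

```ruby
r, w = IO.pipe

# Nothing to read yet: select times out after 0.1s and returns nil.
first = IO.select [r], nil, nil, 0.1

# Once data is available, select returns the ready descriptors.
w.write "x"
second = IO.select [r], nil, nil, 1

first  # => nil
second # => [[r], [], []]
```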
There is one further complication, which is that you won’t be able
to receive the pseudo-keypresses Ncurses emits when the terminal
size changes, since they don’t show up on $stdin and thus the
select won’t pass. The solution is to install your own signal
handler:
trap("WINCH") { ... handle sigwinch ... }
You will still see the resize events coming from getch, but only
once the user presses a key. You can drop them at this point.
That should be enough to make any multi-threaded Ruby ncurses app
able to function. Of course, once everyone’s using a fixed version of
the ncurses libraries, you can do away with the select and set
nodelay to false.
(One last hint for the future: I’ve found it necessary to set nodelay
to false before every call to getch; otherwise a ctrl-c will
magically change it back to non-blocking mode. Not sure why.)
I’ve released git-wtf version bf06ab7. The highlight of this release is
colorized output. ANSI escape sequences are the future of the web.
Also, the feature / integration branch comparison is now only displayed when
-r is supplied.
Check out the git-wtf home page
for an example of the fancy colorization, or just download it now.
Since continuations are so fast in Ruby 1.9, I’ve been playing with a
simple little continuation-based web framework I wrote based on code I ripped
out of Whisper.
Using continuations is really awesome for webapps because you can write
application control flow just like you would write the regular program control
flow. You can write it in a single method, store state in local variables, use
if statements, whatever.
For example, here’s the entire code for a webapp that lets you increment and
decrement a number:
def run
  counter = 0
  while true
    click = html <<EOS
<p>Current counter value: #{counter}.</p>
<p>To increment, #{link_to "click here", :increment}</p>
<p>To decrement, #{link_to "click here", :decrement}</p>
<p>Thanks!</p>
EOS
    case click
    when :increment
      counter += 1
    when :decrement
      counter -= 1
    end
  end
end
Pretty obvious, huh? It’s a loop, and when someone clicks on the increment link,
you increment the counter, and when they click on the decrement link, you
decrement it. The app displays the HTML you see, and provides /increment
and /decrement links. The logic isn’t spread out across different methods and
different classes like it would be in a regular framework.
I added back-button support too, so pressing the back button after clicking
increment brings you to the previous number, and further increments and
decrements do the right thing.
I realize frameworks like Seaside and Wee have been doing this forever, but
it’s new to me and very fun.
I’ve released Ritex 0.3. No API or functionality changes; this is just
a set of miscellaneous tweaks that make Ritex work on Ruby 1.9.
In the last post I talked about some differences between fibers and
continuations. What may not have been clear is
that continuations are more primitive and flexible than fibers are. In fact,
you can implement fibers using continuations.
Here’s how. The basic idea is that we want to maintain two variables with
continuations in them, inside and outside. The first one will transfer
execution into the block of code that forms the fiber. The second will transfer
control back to the outside world.
When the outside world calls #resume, we save our continuation point as
outside, and call the current inside continuation. When, within the block,
#yield is called, we save our current continuation point as inside, and
transfer control back to the current outside.
There are a few more details in terms of passing values from #yield to
#resume, handling the return value of the block, and handling excessive
calls to #resume, but that’s the basic story. Here’s the code:
require 'continuation'

class CFiber
  class Error < StandardError; end

  def initialize &block
    @block = block
    callcc do |cc|
      @inside = cc
      return
    end
    @var = @block.call self
    @inside = nil
    @outside.call
  end

  def resume
    raise Error, "dead cfiber called!" unless @inside
    callcc do |cc|
      @outside = cc
      @inside.call
    end
    @var
  end

  def yield var
    callcc do |cc|
      @var = var
      @inside = cc
      @outside.call
    end
  end
end
This is also runnable on Ruby 1.8—just remove the require.
So why does Ruby 1.9 bother to implement fibers, when we can just use
continuations? I don’t know what the real answer is, but “speed” is at least a
good answer. Let’s do some benchmarking to compare the two:
require 'benchmark'

n = ARGV.shift.to_i

Benchmark.bm do |bm|
  bm.report " fibers" do
    f = Fiber.new do
      x, y = 0, 1
      loop do
        Fiber.yield y
        x, y = y, x + y
      end
    end
    n.times { |i| f.resume }
  end

  bm.report "cfibers" do
    f = CFiber.new do |c|
      x, y = 0, 1
      loop do
        c.yield y
        x, y = y, x + y
      end
    end
    n.times { |i| f.resume }
  end
end
We’ll start with backporting that code to the Ruby 1.8.7 that Ubuntu provides
(ruby 1.8.7 (2008-08-11 patchlevel 72)). For 10000 Fibonacci numbers, we see:

             user     system      total        real
cfibers  0.810000   0.070000   0.880000 (  0.879930)

That’s roughly 11.4kfps (that’s thousand Fibonacci numbers per second) that we
can produce using continuation-based fibers.
Let’s try the ancient Ruby 1.9.0 that Ubuntu provides (Ruby 1.9.0 (2008-06-20
revision 17482)):

             user     system      total        real
 fibers  0.040000   0.000000   0.040000 (  0.037583)
cfibers 18.680000   1.770000  20.450000 ( 20.482006)

Wow, fibers are fast: 250kfps. But things have gotten significantly worse for
cfibers, which clock in at a measly 0.489kfps.
Finally let’s try the latest and greatest Ruby 1.9.1 (ruby 1.9.1p129
(2009-05-12 revision 23412)):

             user     system      total        real
 fibers  0.040000   0.000000   0.040000 (  0.035148)
cfibers  0.150000   0.000000   0.150000 (  0.155890)
Fibers are just as fast as before, but continuations have improved
dramatically—from 11.4kfps to 66.6kfps. Still, native fibers are more than
three times faster.
So perhaps Ruby 1.9.1 is the best of both worlds. When you need fast
nonpreemptive concurrency, you can use native fibers; when you need to
implement your own crazy control structures, you can use continuations and be
assured that they’re still pretty darn fast (at least, as far as Ruby
operations are concerned).
Ruby 1.9 has both fibers and continuations. The two are often mentioned in the
same breath. They do vaguely similar-sounding things, and are implemented in
Ruby 1.9 with similar mechanics underneath the
hood, much
as how continuations and threads were implemented with the same underlying
mechanics in Ruby 1.8
[PDF, p. 14].
But implementation similarities aside, continuations and fibers have very
different semantics. A fiber behaves as a thread without preemption. Like a
thread, you create it, and it eventually dies; unlike a thread, you must
manually call yield and resume to transfer control in and out of it,
instead of just letting the runtime call them for you whenever it feels like
it. Like a thread, when you resume a fiber, you have the same call stack and
heap state (local variables) as when you left.
What’s nice about fibers is that, since you keep explicit control of the order
of execution, you can get thread-like behavior without all the hassle of
mutexes and synchronization. Of course you have to deal with the hassle of
ordering all your operations, but you at least have the option of avoiding the
fun race-condition game that always seems to crop up in threaded programming.
What about continuations? Instead of fibers’ create, kill, yield, and resume
operations, a continuation only really has two operations: capture and resume.
A continuation is captured once, and may be resumed multiple times. When you
resume a continuation, the call stack is reverted to what it looked like when
it was captured, but the heap state stays the same. There’s no exit point or
death for a continuation (at least until Ruby gets bounded continuations);
execution simply continues from the capture point.
What’s nice about continuations is that you can use them to implement control
structures. Loops, exceptions, cross-procedure gotos… almost every control
structure you can come up with can be implemented with continuations. In fact,
you can implement fibers using continuations!
Let’s look at an example. Here’s the fiber-based Fibonacci computation from
the InfoQ article on Fibers in Ruby 1.9:
fib = Fiber.new do
  x, y = 0, 1
  loop do
    Fiber.yield y
    x, y = y, x + y
  end
end

20.times { puts fib.resume }
Here we call yield from within the fiber once we’ve computed a number, which
transfers control to the main function, which prints out the number yielded
and then calls resume to transfer control back to the fiber. A thread version
looks very similar:
require 'thread'

q = SizedQueue.new 1

fib = Thread.new do
  x, y = 0, 1
  loop do
    q.push y
    x, y = y, x + y
  end
end

20.times { puts q.pop }
Since we don’t have explicit control over the scheduling, we implicitly
scheduled the order of operations by using a synchronized SizedQueue data
structure, which blocks the computation thread from computing a new number
until the printing thread is ready to receive it. (There are many ways we
could’ve accomplished this.)
Here’s the version using continuations:
require 'continuation'

c, x, y, i = callcc { |cc| [cc, 0, 1, 1] }
puts y
c.call c, y, x + y, i + 1 if i < 20
You’ll notice there are no loops, and variables are never changed after
assignment. In fact the code is starting to look suspiciously like an inductive
proof, with one line that looks like a base case and another line that looks
like a recursive case. You can see why continuations make functional-programming
enthusiasts get excited!
This implementation works because resuming the continuation (the call to
c.call) replaces the call stack and point of execution with what they were at
the point it was captured (the call to callcc). In contrast, resuming the
fiber moved us back to the point we were at when the fiber called yield, and
so the outer loop in the fiber implementation was necessary.
Beyond call stacks, another major difference between fibers and continuations
is the way the heap is treated. Multiple fibers on the same section of code do
not share local variables. Multiple continuations on the same section of code
do. Here’s a brief example. First, the fibers version:
fib = (0 ... 5).map do |i|
  Fiber.new do
    x = 0
    Fiber.yield x
    x += 1
  end
end

fib.each { |f| puts f.resume }
We create five fibers, and call resume on them once each. As you’d expect,
this prints out a series of 0’s. The variable x is not shared between the
multiple fibers. Of course, the fiber constructor here is a block, and blocks
are closures, so we could make them share state by moving the x = 0 line
outside the map line. But that’s a result of having closures, not of fibers
per se.
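To see the closure-sharing variant concretely, here’s what happens if x is hoisted out of the map (my example, not from the original post):

```ruby
# x lives outside the fiber blocks, so all five fibers close over
# the same variable.
x = 0
fibs = (0...5).map do
  Fiber.new { x += 1; Fiber.yield x }
end

# Each resume returns the value the fiber passed to Fiber.yield;
# the counter carries across fibers because of the shared closure.
results = fibs.map(&:resume)
# => [1, 2, 3, 4, 5]
```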
Let’s try an example with multiple continuations, all jumping into the same point in the code:
require 'continuation'

x = 0
c = callcc { |cc| cc }
d = callcc { |cc| cc } if c
e = callcc { |cc| cc } if c && d
f = callcc { |cc| cc } if c && d && e
x += 1
puts x
c.call if c
d.call if d
e.call if e
f.call if f
We initialize x to 0, create 4 separate continuations, add one to x, and
call the continuations in order. (The postfix if statements ensure that the
continuation variables aren’t set or called more than once. Calling c.call
without arguments will jump back to the c = callcc line and set c to nil.)
Silly, but it illustrates the point: the output is “1 2 3 4 5”, meaning that
the four continuations all share the same heap. When d is called, its x is
the same as the x of c, and even though it was 0 when d was captured, it
has since been modified by the resumption of c. When e is called, its x
is also the same x, and so on. (In fact this whole example depends on this
behavior—each of the continuation variables is only set once, and must
“retain” its value across all re-entries to continuations above it.)
In addition to multiple continuations being able to share state, the converse is
true too: multiple resumes of the same continuation will share state:
require 'continuation'

x = 0
c = callcc { |cc| cc }
x += 1
puts x
c.call c while x < 5
This outputs the same thing as the examples above.
Hopefully that clears up some of the confusion. Here’s the summary:
Fibers | Continuations
Four operations: create, exit, yield, resume. | Two operations: capture and resume.
Upon resume, call stack is wherever it was at the last yield. | Upon resume, call stack is where it was when captured.
Do not share state except via closure. | Multiple continuations and multiple invocations of the same continuation can share state.