Whistlepig: minimalist realtime full-text search

Whistlepig is a minimalist realtime full-text search index. Its goal is to be as small and feature-free as possible, while still remaining useful, performant and scalable to large corpora. If you want realtime full-text search without the frills, Whistlepig may be for you.

Whistlepig is written in ANSI C99. It currently provides a C API and Ruby bindings.

Latest version: 0.12, released 2012-06-09.
Bug reports: GitHub issue tracker

Getting it

Rubygem: gem install whistlepig
Git: git clone git://github.com/wmorgan/whistlepig.git

Realtime search

Roughly speaking, realtime search means:

Whistlepig takes these principles to an extreme. Features that Whistlepig does provide:

Synopsis (Ruby bindings)

require 'rubygems'
require 'whistlepig'

include Whistlepig

index = Index.new "index"

entry1 = Entry.new
entry1.add_string "body", "hello there bob"
docid1 = index.add_entry entry1              # => 1

entry2 = Entry.new
entry2.add_string "body", "goodbye bob"
docid2 = index.add_entry entry2              # => 2

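# The first argument is the default field for terms without a "field:" prefix.
q1 = Query.new "body", "bob"
results1 = index.search q1                   # => [2, 1] (most recently added first)

# Queries can be combined; this one requires both "bob" and "hello".
q2 = q1.and Query.new("body", "hello")
results2 = index.search q2                   # => [1]

# Labels can be attached to a document after it has been indexed.
index.add_label docid2, "funny"

# "~funny" restricts matches to documents carrying the label "funny".
q3 = Query.new "body", "bob ~funny"
results3 = index.search q3                   # => [2]

# An entry may contain several fields.
entry3 = Entry.new
entry3.add_string "body", "hello joe"
entry3.add_string "subject", "what do you know?"
docid3 = index.add_entry entry3              # => 3

# "subject:know" searches the subject field; "hello" uses the default field, body.
q4 = Query.new "body", "subject:know hello"
results4 = index.search q4                   # => [3]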

API Documentation

For the Ruby bindings, see the Whistlepig Ruby API documentation.

For the query language, see the Whistlepig Query documentation.

For the C API, see the code.

Concurrency notes

Recent versions of Whistlepig are multi-process-safe and support concurrent reads and writes. Whistlepig uses pthread read/write locks for this: a single writer locks a segment (and, briefly, the entire index) against readers, while multiple readers can execute in parallel. Services with heavy write traffic will see a degradation in read performance if reads and writes are executed against the same index. Index replication may be useful in this situation.
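To make the locking behaviour described above concrete, here is a minimal sketch of reader/writer semantics in plain Ruby. It is not Whistlepig's code: Whistlepig relies on pthread read/write locks in C and works across processes, whereas this toy only coordinates threads within one process. The class ToyRWLock, its methods with_read_lock and with_write_lock, and the docs array standing in for index contents are all hypothetical names invented for this illustration.

```ruby
# Illustration only: a toy readers-writer lock, NOT Whistlepig's implementation.
# Many readers may hold the lock at once; a writer needs exclusive access.
class ToyRWLock
  def initialize
    @mutex   = Mutex.new
    @cond    = ConditionVariable.new
    @readers = 0       # number of readers currently inside
    @writing = false   # true while a writer holds the lock
  end

  # Readers wait only while a writer is active, so they can overlap each other.
  def with_read_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writing
      @readers += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @readers -= 1
        @cond.broadcast if @readers.zero?
      end
    end
  end

  # A writer waits until there is no other writer and no reader inside.
  def with_write_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writing || @readers > 0
      @writing = true
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @writing = false
        @cond.broadcast
      end
    end
  end
end

lock = ToyRWLock.new
docs = []  # hypothetical stand-in for the contents of an index

# One writer appends three document ids, taking the write lock for each.
writer = Thread.new do
  3.times { |i| lock.with_write_lock { docs << i + 1 } }
end

# Several readers take a snapshot under the read lock.
readers = Array.new(4) do
  Thread.new { lock.with_read_lock { docs.dup } }
end

writer.join
snapshots = readers.map(&:value)
puts "final contents: #{docs.inspect}"
puts "reader snapshots: #{snapshots.inspect}"
```

Each reader sees a consistent prefix of the writes, but every write has to wait for all active readers to leave and blocks new readers while it runs. That is why heavy write traffic against a single index slows reads down.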


Whistlepig is distributed under the terms of the New BSD License. See the file COPYING for details.


Whistlepig is brought to you by William Morgan.