Most programmers are by now familiar with the difference between the number of bytes in a string and the number of characters. Depending on the string’s encoding, the relationship between these two measures can be either trivially computable or complicated and compute-heavy.
With the advent of Ruby 1.9, the Ruby world at last has this distinction formally encoded at the language level: String#bytesize is the number of bytes in the string, and String#size the number of characters.
But when you’re writing console applications, there’s a third measure you have to worry about: the width of the string on the display. ASCII characters take up one column when displayed on screen, but super-ASCII characters, such as Chinese, Japanese and Korean characters, can take up multiple columns. This display width is not trivially computable from the byte size of the character.
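To see that these really are three different numbers, here is a naive illustration (my own sketch, not how any real library does it): it assumes common CJK ranges are double-width and everything else is single-width, whereas real terminals follow the full Unicode East Asian Width tables.

```ruby
# Naive sketch: treat a few common CJK ranges as double-width.
# Real width computation uses the full East Asian Width tables.
def naive_display_width(str)
  str.each_char.sum do |c|
    c.ord.between?(0x2E80, 0xA4CF) ||  # CJK ideographs, kana, etc.
      c.ord.between?(0xAC00, 0xD7A3) ||  # Hangul syllables
      c.ord.between?(0xFF00, 0xFF60) ? 2 : 1  # fullwidth forms
  end
end

s = "abc日本語"
s.bytesize             # => 12 (3 ASCII bytes + 3 three-byte UTF-8 characters)
s.size                 # => 6 characters
naive_display_width(s) # => 9 columns on screen
```

Same string, three different answers: 12 bytes, 6 characters, 9 columns.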
Finding the display width of a string is critical to any kind of console application that cares about the width of the screen, i.e. is not simply printing stuff and letting the terminal wrap. Personally, I’ve been needing it forever:
- Trollop needs it because it tries to format the help screen nicely.
- Sup needs it in a million places because it is a full-fledged console application and people use it for reading mail in all sorts of funny languages.
The actual mechanics of how to compute string width make for an interesting
lesson in UNIX archaeology, but suffice it to say that I’ve travelled the path
for you, with help from Tanaka Akira of
pp fame, and I am happy to announce
the release of the Ruby console gem.
The console gem currently provides these two methods:

- Console.display_width: calculates the display width of a string.
- Console.display_slice: returns a substring according to display offset and display width parameters.
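To illustrate the intended behavior, here is a stand-in sketch of the API — the method signatures are my reading of the description above, and this toy `Console` module uses a simplified double-width rule for CJK rather than the gem's real width logic:

```ruby
# Stand-in sketch, NOT the actual console gem.
# Assumed signatures: display_width(str) and display_slice(str, offset, width),
# where offset and width are measured in display columns.
module Console
  def self.char_width(c)
    # Simplified rule: common CJK ranges count as two columns.
    c.ord.between?(0x2E80, 0xA4CF) || c.ord.between?(0xAC00, 0xD7A3) ? 2 : 1
  end

  def self.display_width(str)
    str.each_char.sum { |c| char_width(c) }
  end

  # Keep only characters that fit entirely within the requested column window.
  def self.display_slice(str, offset, width)
    out, col = "", 0
    str.each_char do |c|
      w = char_width(c)
      out << c if col >= offset && col + w <= offset + width
      col += w
    end
    out
  end
end

Console.display_width("ab漢字")       # => 6 columns
Console.display_slice("ab漢字", 0, 4) # => "ab漢" (the 字 won't fit in 4 columns)
```

Note that slicing by display columns can drop a double-width character that straddles the window edge, which is exactly why a byte- or character-based slice won't do.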
There is one horrible caveat outstanding, which is that I haven’t managed to get it to work on Ruby 1.8. Patches to this effect are most welcome, as are, of course, comments and suggestions.