Found this message in my own commit logs for the console rubygem and I figured I’d post it here in case some other poor fool is stuck writing code with mbrtowc:

Well this is a fun little feature of mbrtowc that I discovered! Give it one bad input and it will barf on all future inputs for the rest of the program, unless you pass in your own cleared shift state object.

The mbrtowc manpage is helpfully vague:

“If the multibyte string starting at s contains an invalid multibyte sequence before the next complete character, mbrtowc() returns (size_t) -1 and sets errno to EILSEQ. In this case, the effects on *ps are undefined.”

and

“If ps is a NULL pointer, a static anonymous state only known to the mbrtowc function is used instead. Otherwise, *ps must be a valid mbstate_t object.”

The reader is left to infer, of course, that the “undefined effects” of the “static anonymous state only known to the mbrtowc function” are that of “breaking all successive calls for the rest of the execution of the program”.

William Morgan, July 7, 2011.

Most programmers are by now familiar with the difference between the number of bytes in a string and the number of characters. Depending on the string’s encoding, the relationship between these two measures can be either trivially computable or complicated and compute-heavy.

With the advent of Ruby 1.9, the Ruby world at last has this distinction formally encoded at the language level: String#bytesize is the number of bytes in the string, and String#length and String#size the number of characters.
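To make the distinction concrete, here is a minimal sketch for Ruby 1.9 or later. The helper name byte_and_char_counts is made up for illustration, and the strings are assumed to be UTF-8 encoded.

```ruby
# Hypothetical helper (not part of any library): returns the byte count
# and the character count of a string as a two-element array.
def byte_and_char_counts(str)
  [str.bytesize, str.length]
end

# ASCII: one byte per character, so the two measures agree.
p byte_and_char_counts("hello")   # => [5, 5]

# Each of these three characters takes three bytes in UTF-8.
p byte_and_char_counts("日本語")  # => [9, 3]
```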

But when you’re writing console applications, there’s a third measure you have to worry about: the width of the string on the display. ASCII characters take up one column when displayed on screen, but super-ASCII characters, such as Chinese, Japanese and Korean characters, can take up multiple columns. This display width is not trivially computable from the byte size of the character.
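As a rough illustration of why width is a separate measure, here is a sketch that approximates display width from code points alone. This is not how the console gem does it. The method name approx_display_width and the range table are assumptions made for this example, the ranges are a coarse approximation of the East Asian wide blocks, and combining and control characters are ignored entirely.

```ruby
# Assumed, coarse table of code point ranges that most terminals draw
# two columns wide. A real implementation needs a fuller table.
WIDE_RANGES = [
  0x1100..0x115F,   # Hangul Jamo initial consonants
  0x2E80..0xA4CF,   # CJK radicals, kana, CJK ideographs, Yi
  0xAC00..0xD7A3,   # Hangul syllables
  0xF900..0xFAFF,   # CJK compatibility ideographs
  0xFE30..0xFE6F,   # CJK compatibility forms
  0xFF00..0xFF60,   # fullwidth forms
  0xFFE0..0xFFE6,   # fullwidth signs
  0x20000..0x3FFFD  # supplementary ideographic planes
]

# Hypothetical helper: counts 2 columns for characters in WIDE_RANGES
# and 1 column for everything else.
def approx_display_width(str)
  str.each_char.inject(0) do |total, ch|
    cp = ch.ord
    total + (WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1)
  end
end

p approx_display_width("hello")   # => 5
p approx_display_width("日本語")  # => 6 (3 characters, 9 bytes)
```

Note that the answer for the second string matches neither its byte size nor its length, which is the whole problem.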

Finding the display width of a string is critical to any kind of console application that cares about the width of the screen, i.e. is not simply printing stuff and letting the terminal wrap. Personally, I’ve been needing it forever:

  1. Trollop needs it because it tries to format the help screen nicely.
  2. Sup needs it in a million places because it is a full-fledged console application and people use it for reading mail in all sorts of funny languages.

The actual mechanics of how to compute string width make for an interesting lesson in UNIX archaeology, but suffice it to say that I’ve travelled the path for you, with help from Tanaka Akira of pp fame, and I am happy to announce the release of the Ruby console gem.

The console gem currently provides these two methods:

  • Console.display_width: calculates the display width of a string
  • Console.display_slice: returns a substring according to display offset and display width parameters.

There is one horrible caveat outstanding, which is that I haven’t managed to get it to work on Ruby 1.8. Patches to this effect are most welcome, as are, of course, comments and suggestions.

Try it out!

William Morgan, May 19, 2010.