The Woes of Wide-Character Support in Terminals

I've been working on a project with a curses UI, and on several occasions, I have been foiled by the lack of wide-character support in the various libraries I wanted to use. One of the first things I needed for my application was a curses text-entry UI element. It would've been nice to use libreadline, but I was unable to devise a sane manner of integrating Python's readline bindings into a curses application that needed to update the UI and accept user input asynchronously. I first prototyped the application using Python's curses.textpad module. In addition to lacking many of the libreadline keybindings I regularly use, it also had absolutely no support for wide, East Asian characters (hereafter "CJK characters," short for "Chinese, Japanese and Korean characters"). While looking for a solution to this problem, I ran into an amusing blog post in Japanese, written back in October of 2009, lamenting similar issues. Ultimately, I created my own text-entry widget that supports CJK characters along with nearly all of the default Emacs bindings that libreadline supports. Once that was finished, I created a bare-bones curses pager that also supports CJK characters.

I've gotten the UI elements into a stable state, so I would like to write unit tests for them. I wasn't able to find any existing frameworks for unit-testing text user interfaces (TUIs) in Python, so I have started writing my own. The test harness initially had support for python3-pexpect and python3-pyte, but when I started writing unit tests with wide characters, I discovered that pexpect didn't support them. When I sent strings containing Japanese characters to the virtual terminal, those characters were silently dropped from the virtual screen:

~$ python
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pexpect.ANSI
>>> terminal = pexpect.ANSI.ANSI(2, 40)
>>> terminal.write("Wikipedia: メインページ\n\rコマドリ (Bird)")
>>> print(*terminal.get_region(1, 1, 2, 40), sep="\n")
Wikipedia:
 (Bird)
>>>

Since pyte did not suffer from the same issue, and I consider CJK support paramount, I removed support for pexpect from my test harness. The first iteration of the TUI test harness simply checked whether the state of the virtual screen after pressing some keys matched an expected state, but in order to fully test the text-entry UI element I created, I also need to validate the position of the cursor. This initially seemed simple enough; the virtual screen class has a cursor object associated with it that holds the coordinates of the cursor on the virtual screen. After adding support for checking the cursor position to the test harness, I discovered that one of my two tests failed while the other passed. Much to my dismay, the test that failed was the one with CJK characters in it. The cursor position reported by pyte differed wildly from what I expected:

======================================================================
FAIL: test_basic_typing_cjk (__main__.LineEditorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_tui.py", line 19, in test_basic_typing_cjk
    self.launch(test_case, size=(40, 1))
  File ".../src/tests/ttyutils.py", line 65, in launch
    size=size
  File ".../src/tests/ttyutils.py", line 116, in scrape
    callback(screen, parent_file, process)
  File ".../src/tests/ttyutils.py", line 52, in callback
    self.assertEqual(expected_cursor, cursor_position)
AssertionError: Tuples differ: (0, 20) != (0, 14)

First differing element 1:
20
14

- (0, 20)
+ (0, 14)

I immediately knew what the issue was: pyte is not aware of the actual width of the characters on the screen and assumes that no character spans more than a single column. On a terminal using monospaced fonts, "ABC 123 メインページ" spans 20 columns, but since there are only 14 glyphs, pyte thinks the cursor is only at column 14. I checked pyte's open issues and found issue #9, reported a few years ago by another user also struggling with the lack of wide-character support.
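To make the 20-versus-14 discrepancy concrete, here is a minimal sketch of how the column count can be approximated using only Python's standard unicodedata module. The display_width helper is a hypothetical name made up for this illustration; it is not part of pyte or of my test harness. It also assumes that only characters classified as wide ("W") or fullwidth ("F") occupy two columns and that ambiguous-width characters occupy one, which will not hold for every terminal and locale.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns needed to render text.

    Hypothetical helper for illustration only; real terminals may disagree,
    particularly for ambiguous-width characters.
    """
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        # East Asian wide ("W") and fullwidth ("F") characters take two
        # columns; everything else is assumed to take one.
        if unicodedata.east_asian_width(char) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


sample = "ABC 123 メインページ"
print(len(sample))            # number of characters: 14
print(display_width(sample))  # approximate number of columns: 20
```

The eight ASCII characters contribute one column each and the six katakana contribute two each, which is where the expected cursor column of 20 comes from, while counting characters alone gives 14.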

My heart goes out to all the other poor souls that just want proper CJK support and more so to anyone that's ever tried to wrestle with right-to-left languages in terminals.