blog dds

2010.08.24

Sane vim Editing of Unicode Files

Being able to use plain alphabetic keys as editing commands is, for many of us, a great strength of the vi editor. It lets us edit without hunting for the movement keys on each particular keyboard and, most of the time, without juggling to combine keys with Ctrl or Alt. However, this advantage turns into a curse when editing files with a non-ASCII keyboard layout. When the keyboard input method is switched to another script (Greek in my case or, say, Cyrillic for others), vi stops responding to its normal commands, because it receives characters it doesn't recognize as commands. Here is how I've dealt with this problem.

The vim reincarnation of vi offers the langmap setting, which maps specific characters of a particular script to the corresponding vi commands. For instance, I would map the Greek letter ξ (xi) to j, because both lie in the same position on a Greek keyboard. I assume that, for the same reason, a Russian user of a so-called Windows keyboard layout would map the Cyrillic letter о to j. Sadly, each encoding of a particular script requires a different langmap command, and non-ASCII files are often encoded in various incompatible ways.
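As a minimal sketch of such a mapping, assuming a UTF-8 encoded vimrc, a vim running with encoding=utf-8, and the standard Greek keyboard layout, the following covers only the four movement keys. It is an illustration, not the complete map a real configuration would need.

```vim
" Assumption: this file is saved as UTF-8, so tell vim how to read it
scriptencoding utf-8
" Illustrative subset only: Greek η ξ κ λ sit on the same keys as h j k l.
" The characters before the semicolon are mapped, in order, to those after it.
set langmap=ηξκλ;hjkl
```

With this in place, pressing ξ in normal mode should move the cursor down just as j does, while text typed in insert mode is left untouched.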

Worse, although vim from version 7.2.109 onward supports multi-byte langmap commands, putting many of them in the same vimrc file can be tricky. This is particularly true for the Windows UCS-2/UTF-16 encoding, which is not compatible with ASCII, and therefore can't be easily combined with the others.
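One way to keep such a vimrc usable on older installations is to guard the multi-byte map with a version check. The following is a sketch under the assumption that the file is UTF-8 encoded; the single ξ to j pair is only a stand-in for a full map.

```vim
" Assumption: this file is saved as UTF-8
scriptencoding utf-8
" Apply a multi-byte langmap only when vim was built with langmap support
" and is at least version 7.2 with patch 109
if has('langmap') && (v:version > 702 || (v:version == 702 && has('patch109')))
	" Stand-in for a complete map
	set langmap=ξ;j
endif
```

On an older vim the block is skipped and the editor simply behaves as it would without a langmap.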

I solved this problem by concatenating various custom-encoded langmap commands into a single vimrc file, and by using a separate file for UCS-2/UTF-16. Here is what my vimrc startup file looks like.

if &encoding == "utf-8"
	set langmap="[a map crafted for UTF-8 input codes]
elseif &encoding == "utf-16le"
	" Can't set the encoding directly; the command does it as follows:
	" Edit a file containing the langmap, paste its line into a buffer
	" and execute the buffer.
	" The buffer ends with a command to return to the original file
	silent execute "e /home/dds/.vim/langmap-greek-ucs-2le.vim"|silent execute "normal \"uyy@u"
else
	" Set encoding to the default for most files I'm editing
	" (ISO-8859-7 and Windows code page 1253)
	set encoding=cp1253
	set langmap="[a map crafted for ISO-8859-7 input codes]
endif

Here are links to my vimrc and langmap-greek-ucs-2le.vim files, in case you find them useful. If you edit with vim in another non-US keyboard layout, I hope you'll add as a comment to this blog a pointer to a place where you've uploaded your corresponding commands.

Last modified: Tuesday, August 24, 2010 1:24 am
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.