blog dds

2008.01.23

The Mysterious TreeMap Type Signature

For my lecture notes on file handling I wrote a small Java program to display the number of characters that fall in each Unicode block, and got bitten by an unexpected runtime error. Angelika Langer, a wizard of Java Generics, kindly provided me with an explanation of the JDK design, which I'd like to share.

Here is the corrected program.

import java.io.*;
import java.util.*;
import java.lang.Character.UnicodeBlock;

/**
 * Count and display for the specified input file
 * the number of characters contained in various Unicode blocks .
 * @author Diomidis Spinellis
 */
class CharCount {
    public static void main(String args[]) {
	if (args.length != 2) {
	    System.err.println("Usage: CharCount file encoding");
	    System.exit(1);
	}

	// Open file
	BufferedReader in = null;
	try {
	    in = new BufferedReader(new InputStreamReader(
	        new FileInputStream(args[0]), args[1]));
	} catch (FileNotFoundException e) {
	    System.err.println("Unable to open file " + args[0] + ": " + e.getMessage());
	    System.exit(1);
	} catch (UnsupportedEncodingException e) {
	    System.err.println("Unsupported encoding " + args[1] + ": " + e.getMessage());
	    System.exit(1);
	}

	// Count characters in blocks
	HashMap<Character.UnicodeBlock, Integer> count
	    = new HashMap<Character.UnicodeBlock, Integer>();
	try {
	    int c;
	    while ((c = in.read()) != -1) {
		Character.UnicodeBlock u = Character.UnicodeBlock.of(c);
		    Integer oldN = count.get(u);
		    if (oldN == null)
			count.put(u, 1);
		    else
			count.put(u, oldN + 1);
	    }
	    in.close();
	} catch (Exception e) {
	    System.err.println("Error reading character: " + e.getMessage());
	    System.exit(1);
	}

	// Display results
	for (Map.Entry<Character.UnicodeBlock, Integer> s : count.entrySet())
	    System.out.println(s.getKey() + ": " + s.getValue());
    }
}
In the original program I used TreeMap instead of HashMap. Given this (very good) clue, can you imagine the problem?

The problem I faced was the following runtime error:

Error reading character:
java.lang.Character$UnicodeBlock cannot be cast to java.lang.Comparable
TreeMap can't be used with UnicodeBlock without explicitly providing a comparator.

Runtime errors that could have been caught at compile time irk me tremendously, because they point to areas where software quality lags due to language, compiler, or library deficiencies. In my case the fact that the key should be comparable is documented in the specification of the library, so why couldn't the compiler detect the problem at runtime?

As I investigated the matter I found in the wonderful Java Generic Tutorial that the error could be avoided if the type signature of TreeMap was

class TreeMap< Key extends Comparable<Key> ,Data>
instead of the JDK's
Class TreeMap<K,V>
Such a signature would cause a compile-time error in my original code, and thereby save me from the runtime error. Why isn't the signature of the JDK like this? Blinded by my quest for static type safety I refused to look at the larger picture, and couldn't find an answer.

I turned to Angelika Langer for help and this is what I found. The JDK implementation can't use an "extends Comparable" signature, for backward compatibility, and (more importantly) to allow its construction with a comparator. In the second case the keys don't need to implement the Comparable interface. I presume that Sun could deprecate TreeMap and extend it into two classes, one with a specified comparator, and one with a type signature requiring keys to implement the Comparable interface.

Read and post comments, or share through   


Creative Commons License Last modified: Wednesday, January 23, 2008 12:34 am
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.