Wednesday 20 April 2011

Locale-based (natural-language) text comparison in Java

The String class cannot compare text from a natural-language perspective.

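Its equals and compareTo methods compare the individual char values in the strings. For two strings name1 and name2, equals returns true only if both have the same length and the char at every index n of name1 is identical to the char at index n of name2. compareTo orders strings lexicographically by those numeric char values, so every uppercase ASCII letter sorts before every lowercase one: "Hat" comes before "cat" because 'H' (72) is less than 'c' (99).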
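A small sketch of this char-value behaviour follows. The class name CharValueComparison and its helper rawCompare are hypothetical names used only for illustration; the calls themselves are the standard String.compareTo and String.equals.

```java
public class CharValueComparison {

    // Returns String.compareTo's result, which is based on raw char (UTF-16) values.
    public static int rawCompare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // 'c' is 99 and 'H' is 72, so "cat" sorts AFTER "Hat" by char value.
        System.out.println(rawCompare("cat", "Hat")); // prints 27 (99 - 72)
        // equals requires identical chars at every index, including case.
        System.out.println("cat".equals("Cat"));      // prints false
        System.out.println("cat".equals("cat"));      // prints true
    }
}
```

For the first differing index k, String.compareTo returns charAt(k) of the first string minus charAt(k) of the second, which is why the result here is a positive number rather than a dictionary-style answer.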

The java.text.Collator class provides natural-language comparisons. Natural-language comparisons depend on locale-specific rules that determine the equality and ordering of characters in a particular writing system. A Collator object understands that people expect "cat" to come before "Hat" in a dictionary. Using a collator comparison, the following code prints cat < Hat.

import java.text.Collator;
import java.util.Locale;

Collator collator = Collator.getInstance(new Locale("en", "US"));
// or equivalently: Collator.getInstance(Locale.US);
int comparison = collator.compare("cat", "Hat");
if (comparison < 0) {
    System.out.printf("%s < %s%n", "cat", "Hat");
} else {
    System.out.printf("%s < %s%n", "Hat", "cat");
}
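The locale-specific rules govern equality as well as ordering, and Collator.setStrength decides which differences count. The sketch below assumes a US-English collator; the class CollatorStrengthDemo and its helper compareAt are hypothetical names. At TERTIARY strength (the default) "cat" and "Cat" still differ by case, while at PRIMARY strength only the base letters are compared, so they are treated as equal.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Compares two strings with a US-English collator at the given strength
    // (Collator.PRIMARY, SECONDARY, TERTIARY or IDENTICAL).
    public static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Dictionary order: "cat" comes before "Hat" because c < h, whatever the case.
        System.out.println(compareAt(Collator.TERTIARY, "cat", "Hat") < 0);  // true
        // TERTIARY still distinguishes case, so these are not equal.
        System.out.println(compareAt(Collator.TERTIARY, "cat", "Cat") == 0); // false
        // PRIMARY compares only base letters, so the case difference is ignored.
        System.out.println(compareAt(Collator.PRIMARY, "cat", "Cat") == 0);  // true
    }
}
```

A lower strength is useful for case-insensitive "same word?" checks, while the default TERTIARY strength keeps the ordering deterministic when words differ only by case.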

Because Collator implements Comparator<Object>, it can be passed to Collections.sort() to sort words according to a locale's rules:

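import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

List<String> boyNames = new ArrayList<String>();
// Define a collator for US English.
Collator collator = Collator.getInstance(Locale.US);
// Sort the list based on the collator
Collections.sort(boyNames, collator);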
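To see the difference the collator makes, the sketch below sorts the same list twice: once by String's natural (char-value) ordering and once with a US-English Collator. The sample names and the class and method names are hypothetical; "\u00e9mile" is "émile", written as a Unicode escape so the source compiles regardless of file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // Sorts a copy of the input by raw char values (String's natural ordering).
    public static List<String> sortByCharValue(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy);
        return copy;
    }

    // Sorts a copy of the input with a locale-sensitive collator.
    public static List<String> sortByLocale(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical sample names.
        List<String> boyNames = Arrays.asList("bob", "Zack", "\u00e9mile", "Alan");
        System.out.println(sortByCharValue(boyNames));         // [Alan, Zack, bob, émile]
        System.out.println(sortByLocale(boyNames, Locale.US)); // [Alan, bob, émile, Zack]
    }
}
```

With char values, uppercase names come first and the accented name lands last; the collator instead gives dictionary order. On Java 8 and later, boyNames.sort(collator) is an equivalent way to sort a modifiable list in place.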

