Don't use String.ToUpper nor String.Length for String Comparison

Two Strings Can Be Equal, Even if Their Lengths Are Not

My favorite example of this is the German sharp-s character, which is used in the German word straße. The length of straße is 6 characters, and the length of the upper case form of the word (STRASSE) is 7 characters. These strings are considered as equal in German, even though the lengths don't match. [Larry Osterman]
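To make the length mismatch concrete, here is a minimal C# sketch (top-level statements, so it assumes C# 9 or later). The helper NaiveEquals is hypothetical and exists only to show the flawed length shortcut. Whether the culture-sensitive comparison on the last line reports the two spellings as equal depends on the globalization data the runtime uses, so that result is printed rather than assumed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper showing the flawed shortcut: reject on a length
// mismatch before doing a culture-aware, case-insensitive comparison.
static bool NaiveEquals(string a, string b, CultureInfo culture) =>
    a.Length == b.Length &&
    string.Compare(a, b, culture, CompareOptions.IgnoreCase) == 0;

string lower = "stra\u00DFe";   // "straße", using U+00DF LATIN SMALL LETTER SHARP S
string upper = "STRASSE";
CultureInfo german = CultureInfo.GetCultureInfo("de-DE");

Console.WriteLine(lower.Length);   // 6
Console.WriteLine(upper.Length);   // 7

// The length shortcut rejects the pair before any linguistic comparison runs.
Console.WriteLine(NaiveEquals(lower, upper, german));   // False

// Without the shortcut, the culture's collation rules decide. The outcome
// depends on the platform's globalization library, so none is assumed here.
Console.WriteLine(string.Compare(lower, upper, german, CompareOptions.IgnoreCase) == 0);
```

The point of the sketch is only that comparing lengths first can never be a safe optimization for linguistic equality: the shortcut returns False regardless of what the culture-aware comparison would have said.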

Ligatures can cause the same problem. I am surprised how common they are, even in English, and that most programmers are not aware of them at all. Ligatures are more common in other languages, but they are also very common in good typography. For example, in printed books look at words with a lowercase f immediately followed by an i (e.g. find, fine, etc.). I've noticed ligatures frequently in English PDF documents as well.
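As a small illustration, the sketch below (C# 9 or later, top-level statements) uses the Unicode "fi" ligature character U+FB01, the kind of character that can end up in text copied out of a PDF. It assumes the runtime has normalization support available, which is the default configuration. The variable names are made up for the example.

```csharp
using System;
using System.Text;

string withLigature = "\uFB01nd";   // U+FB01 LATIN SMALL LIGATURE FI followed by "nd"
string plain = "find";

Console.WriteLine(withLigature.Length);   // 3
Console.WriteLine(plain.Length);          // 4

// Compatibility normalization (NFKC) decomposes the ligature into "f" + "i".
string normalized = withLigature.Normalize(NormalizationForm.FormKC);
Console.WriteLine(normalized.Length);                                          // 4
Console.WriteLine(string.Equals(normalized, plain, StringComparison.Ordinal)); // True
```

Normalizing before comparing is one possible way to treat the two spellings as the same word; it is shown here only to make the length difference visible, not as the one right answer for every application.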

String.ToUpper Uses the Current Culture, not the String's Culture, and it's Slow Too

Chris Taylor has a post that shows just how drastic the performance difference can be in a simple test comparing String.ToUpper with String.Compare (String.Equals in .NET 2.0 should give similar results and, IMHO, is more expressive). Additionally, when no culture is specified, String.ToUpper uses the current culture, which may or may not be the culture in which the string was created.
One esoteric problem is the Turkish İ problem. Turkish has four letter Is: I, İ, ı and i. In Turkish, UC("file") == "FİLE". This introduces some subtle bugs, such as the following:
   // Do not allow file:// URLs
   if (url.ToUpper().Left(4) == "FILE") return ERROR;
   getStuff(url);
[Tim Sneath]
If you call String.Compare(userString, "file:", true) without specifying the locale, then as Tim pointed out, if the userString contains one of the Turkish ‘I’ characters, you won’t match correctly. If you use the invariant culture you will. [Larry Osterman]
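The Turkish case is easy to reproduce. The sketch below (C# 9 or later, top-level statements) passes the culture explicitly instead of changing the thread's current culture, which is a simplification of the quoted code. IsFileUrlNaive and IsFileUrlInvariant are hypothetical names, the URL is made up, and the example assumes the runtime has tr-TR culture data available (it would not in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

// Hypothetical check in the style of the buggy snippet: upper-case with a
// culture (standing in for the current culture), then compare to "FILE:".
static bool IsFileUrlNaive(string url, CultureInfo culture) =>
    url.ToUpper(culture).StartsWith("FILE:", StringComparison.Ordinal);

// Hypothetical fixed check: case-insensitive match using the invariant culture.
static bool IsFileUrlInvariant(string url) =>
    url.StartsWith("file:", true, CultureInfo.InvariantCulture);

CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");
string url = "file://c:/secret.txt";   // made-up input

Console.WriteLine("file".ToUpper(turkish));        // FİLE, with U+0130 (dotted capital I)
Console.WriteLine("file".ToUpperInvariant());      // FILE
Console.WriteLine(IsFileUrlNaive(url, turkish));   // False: the file: check is bypassed
Console.WriteLine(IsFileUrlInvariant(url));        // True
```

For protocol names and other non-linguistic identifiers, StringComparison.OrdinalIgnoreCase is another option that avoids culture rules entirely; the invariant culture is used above because that is what the quoted advice suggests.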

posted @ Saturday, September 25, 2004 2:55 PM

Comments on this entry:

# re: Don't use String.ToUpper nor String.Length for String Comparison

Left by Chris at 3/24/2005 11:21 PM

"culture"? Is that Java slang? When I was young, we called it "current locale".

# re: Don't use String.ToUpper nor String.Length for String Comparison

Left by Scott Willeke at 3/25/2005 12:08 AM

Actually, Java's slang is "Locale" (http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html) :) In .NET / CLI, the System.Globalization.CultureInfo class represents the information used for globalization and localization. In my mind a "locale" represents a particular place, while a "culture" represents the culmination of a society's behaviors and traits. For example, a particular culture may format dates a certain way or use a different default set of characters. While a locale (or "place") will have a "culture", culture information can be necessary even when one does not associate it with a place, which is why the CLI has CultureInfo.InvariantCulture (http://msdn2.microsoft.com/library/system.globalization.cultureinfo.invariantculture). In practice, the traditional Java and Windows concept of a "locale" is used in much the same way as a culture is used in the CLI.
