Two Strings Can Be Equal, Even if Their Lengths Are Not
My favorite example of this is the German sharp-s character (ß), as used in the German word straße. Straße is 6 characters long, but its uppercase form, STRASSE, is 7 characters long. In German the two strings are considered equal, even though their lengths don't match.
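The same effect is easy to reproduce outside .NET. The sketch below uses Java rather than C#, and the class and method names (SharpS, upperInvariant) are hypothetical. It shows that uppercasing straße changes its length, and that a naive character-by-character caseless comparison rejects the pair because of that length difference.

```java
import java.util.Locale;

public class SharpS {
    // Hypothetical helper: uppercase with a fixed, locale-neutral locale.
    static String upperInvariant(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "straße", written with an escape so the source file encoding doesn't matter
        String lower = "stra\u00DFe";
        // ß has no single-character uppercase form; it maps to "SS"
        String upper = upperInvariant(lower);

        System.out.println(lower.length());          // 6
        System.out.println(upper.length());          // 7
        System.out.println(upper.equals("STRASSE")); // true
        // equalsIgnoreCase compares char by char and rejects strings of different length
        System.out.println(lower.equalsIgnoreCase("STRASSE")); // false
    }
}
```

Any comparison that assumes "equal strings have equal lengths" will get this pair wrong.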
Ligatures can cause the same problem. I am surprised how common they are, even in English, and that most programmers are not aware of them at all. Ligatures are more common in some other languages, but they are also very common in good typography: in printed books, for example, look at words where a capital F is immediately followed by a lowercase i (Find, Fine, etc.). I've noticed ligatures frequently in English PDF documents as well.
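In stored text, a ligature shows up as a single compatibility character. Unicode has no separate code point for a capital-F-plus-i ligature, so this Java sketch uses U+FB01 (LATIN SMALL LIGATURE FI) instead. Text copied out of a PDF can contain characters like this. The class, constant, and helper names are hypothetical. The sketch shows that uppercasing expands the ligature into two letters, and that NFKC normalization is one way to fold it back into plain "fi" before comparing.

```java
import java.text.Normalizer;
import java.util.Locale;

public class Ligatures {
    // "find" spelled with U+FB01, a single char that renders like "fi"
    static final String FIND_WITH_LIGATURE = "\uFB01nd";

    // Hypothetical helper: replace compatibility characters (such as ligatures)
    // with their plain equivalents via NFKC normalization.
    static String compatFold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String s = FIND_WITH_LIGATURE;
        System.out.println(s.length());                 // 3
        // Uppercasing expands the one-char ligature into "FI"
        System.out.println(s.toUpperCase(Locale.ROOT)); // FIND
        System.out.println(s.equals("find"));           // false
        String folded = compatFold(s);
        System.out.println(folded + " " + folded.length()); // find 4
    }
}
```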
String.ToUpper Uses the Current Culture, Not the String's Culture, and It's Slow Too
has a post
that shows just how drastic the slowdown can be in a simple test that compares ToUpper-based comparison with String.Compare (String.Equals in .NET 2.0 should give similar results, and IMHO is more expressive). Additionally, without a specific culture, String.ToUpper uses the current culture, which may or may not be the culture that created the string.
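Java has the same trap: a no-argument toUpperCase() uses the JVM's default locale. The sketch below uses hypothetical names. It contrasts "uppercase both sides, then compare", which builds two new strings and depends on the default locale, with equalsIgnoreCase, which compares character by character using Character case mappings and needs no uppercased copies. It makes no timing claims. It only shows how the results differ.

```java
import java.util.Locale;

public class CurrentCulture {
    // Uppercases both sides with the JVM default locale, then compares.
    // Allocates two new strings, and the result depends on Locale.getDefault().
    static boolean equalsViaToUpper(String a, String b) {
        return a.toUpperCase().equals(b.toUpperCase());
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a user whose default locale is Turkish
            Locale.setDefault(Locale.forLanguageTag("tr"));
            // "file" uppercases to "F\u0130LE" under Turkish rules, so this is false
            System.out.println(equalsViaToUpper("file", "FILE")); // false
            // Character-by-character caseless comparison still matches
            System.out.println("file".equalsIgnoreCase("FILE"));  // true
        } finally {
            Locale.setDefault(saved); // restore the original default locale
        }
    }
}
```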
One esoteric problem is the Turkish İ problem. Turkish has four letter I's: I, İ, ı, and i. In Turkish, the uppercase of i is the dotted İ (and the lowercase of I is the dotless ı), so uppercasing "file" yields "FİLE". This introduces subtle bugs such as the following:
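// Do not allow file:// URLs.
// Bug: ToUpper() uses the current culture, so under Turkish "file" becomes "FİLE" and slips through.
if (url.ToUpper().StartsWith("FILE")) return ERROR;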
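If you call String.Compare(userString, "file:", true) without specifying a culture, the current culture is used. Then, as Tim pointed out, if that culture is Turkish and userString contains one of the I characters, you won't get the match you expect (for example, "FILE:" does not match "file:" under Turkish casing rules). If you pass CultureInfo.InvariantCulture instead, it will match.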
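Here is a Java sketch of the file:// check above. The class and method names are hypothetical. Java's Locale.ROOT plays roughly the role of .NET's invariant culture. The buggy version uppercases with the default locale. The safer version pins the locale, so a Turkish default locale can no longer let "file:" URLs slip past.

```java
import java.util.Locale;

public class FileUrlCheck {
    // Buggy: toUpperCase() uses the default locale, so under Turkish
    // "file:" becomes "F\u0130LE:" and the check fails to fire.
    static boolean isFileUrlBuggy(String url) {
        return url.toUpperCase().startsWith("FILE:");
    }

    // Safer: uppercase with a fixed, locale-neutral locale before comparing.
    static boolean isFileUrl(String url) {
        return url.toUpperCase(Locale.ROOT).startsWith("FILE:");
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(Locale.forLanguageTag("tr"));
            String url = "file:///etc/passwd";
            System.out.println(isFileUrlBuggy(url)); // false: the URL is not blocked
            System.out.println(isFileUrl(url));      // true: the URL is blocked
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

A caseless url.regionMatches(true, 0, "file:", 0, 5) would also avoid the default locale, and it does not build an uppercased copy of the whole URL.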
posted @ Saturday, September 25, 2004 2:55 PM