Saturday, April 23, 2011

Programmers Topic: comparing UTF-8 strings in Cocoa

A short hint about something that is easy to miss in Apple's documentation:
Although all Apple OSes use UTF-8 or UTF-16 to store and process strings, you need to know that some Unicode characters have more than one valid representation. For instance, there are two ways to create a German umlaut like ä, ö or ü. First, there is a single precomposed code point for each one (U+00F6 for ö), and second, you can compose it from the base letter plus the combining diaeresis U+0308 (the two umlaut dots), for example 'o' + U+0308.

This is a problem when you compare strings that you did not create yourself (like filenames), because NSString's isEqualToString: method compares literally, code unit by code unit, so 'Blüte' is not necessarily equal to 'Blüte' if the two strings compose the umlaut differently. The same problem affects many other accented characters, such as the French é, à and î.
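
To make the problem concrete, here is a minimal sketch that builds the same word in both forms. The variable names are made up for illustration, and the \u escapes are only there so the two forms stay visibly different in source code:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Precomposed form: U+00FC (LATIN SMALL LETTER U WITH DIAERESIS)
        NSString *precomposed = @"Bl\u00FCte";
        // Decomposed form: 'u' followed by U+0308 (COMBINING DIAERESIS)
        NSString *decomposed = @"Blu\u0308te";

        // Both strings render as "Blüte", but they hold a different
        // number of UTF-16 code units.
        NSLog(@"lengths: %lu and %lu",
              (unsigned long)[precomposed length],
              (unsigned long)[decomposed length]);

        // Literal comparison: the differing code units make this fail.
        BOOL literallyEqual = [precomposed isEqualToString:decomposed];
        NSLog(@"isEqualToString: %@", literallyEqual ? @"YES" : @"NO");
    }
    return 0;
}
```

Both strings look identical when printed, which is exactly why this kind of bug is hard to spot in a debugger or a log file.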

Luckily the solution is quite easy:
use NSString's localizedCompare: method and check if it returns NSOrderedSame.

BOOL isEqual = ([oneFilename localizedCompare: otherFilename] == NSOrderedSame);
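
If you need the comparison to be independent of the user's locale (for hashing or dictionary keys, say), another option is to normalize both strings first and then compare them literally. This is a different technique from the one-liner above, not a replacement for it. The sketch below uses NSString's precomposedStringWithCanonicalMapping (Unicode Normalization Form C); the helper name StringsAreCanonicallyEqual is hypothetical and not part of Foundation:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): normalizes both strings to
// Unicode Normalization Form C, then compares them literally.
static BOOL StringsAreCanonicallyEqual(NSString *a, NSString *b) {
    return [[a precomposedStringWithCanonicalMapping]
            isEqualToString:[b precomposedStringWithCanonicalMapping]];
}

int main(void) {
    @autoreleasepool {
        // Example values: the same word in precomposed and decomposed form.
        NSString *oneFilename = @"Bl\u00FCte";
        NSString *otherFilename = @"Blu\u0308te";

        BOOL isEqual = StringsAreCanonicallyEqual(oneFilename, otherFilename);
        NSLog(@"%@", isEqual ? @"equal" : @"different");
    }
    return 0;
}
```

For a quick check of two filenames, the localizedCompare: one-liner is shorter; the normalized form is mainly useful when you want to store or hash the strings afterwards.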
