On Jun 30, 2005, at 7:02 AM, Chris Little wrote:
The simplest example would be names. Take the contrived example of the
names Éastwôôd and Wu. Lexicographically Éastwôôd should come before Wu.
This is an oversimplification. Although that may be the sort order
for American English, it may not be for other languages (I don't
actually know). The classic Unicode example is 'ø': in most
languages it is considered a variant of 'o', but not in Norwegian
and Danish, where it is a separate letter sorted after 'z'. Either
way, sorts should not be done using code-point values, a mistake
people often make.
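To make the code-point mistake concrete, here is a minimal sketch in Python rather than RB, since it is easy to run anywhere. The helper name fold_key is made up for illustration, and stripping accents is only a rough stand-in for a real localized comparison, not what the system collation actually does. Note that 'ø' has no decomposition, so this trick does not touch it at all, which is one more reason to let the system do the comparing.

```python
import unicodedata

# The two names from the example, written with escapes so the
# precomposed characters are unambiguous: "Éastwôôd" and "Wu".
names = ["Wu", "\u00c9astw\u00f4\u00f4d"]

# A plain sort compares code points: 'É' (U+00C9) is greater than
# 'W' (U+0057), so "Éastwôôd" lands after "Wu".
by_code_point = sorted(names)

def fold_key(s):
    # Hypothetical helper: decompose, drop combining marks, casefold.
    # This approximates an accent-insensitive sort; it is NOT locale-aware.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Sorting on the folded key puts "Éastwôôd" before "Wu".
by_folded = sorted(names, key=fold_key)
```
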
That said, I am a little surprised by the behavior of RB. I wonder if
they have some special-case code for UTF-8 vs. other encodings. I would
expect they are just asking the system to perform the comparison, so
perhaps there is a problem with the way they are passing the data to
the system call. It is also entirely possible the system calls are
broken.
At this point I'm wondering about writing a plug-in that would use
CFStrings on Mac so I could call CFStringCompare with
kCFCompareLocalized.
It would be expensive to create the CFStrings but it wouldn't be lossy.
I would have to research the functions to use on Windows.
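A plug-in seems like overkill. I think this one declare will do it
for you in 2005 (it has to go on a single line, or be continued with
a trailing underscore; the mailer wrapped it):
declare function CFStringCompare Lib "CarbonLib" (str1 as CFString, str2 as CFString, flags as integer) as integer
then:
const kCFCompareLocalized = 32 // value of the CoreFoundation flag
dim res as integer
// res is negative, zero, or positive for less than, equal, or greater than
res = CFStringCompare("Éastwôôd", "Wu", kCFCompareLocalized)
I haven't tested it, so there may be a typo, but it should work.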
--Brady
The La Jolla Underground