on 6/30/05 11:12 AM, Brady Duga at duga at ljug dot com wrote:
>
> On Jun 30, 2005, at 7:02 AM, Chris Little wrote:
>>
>> The simplest example would be names. Take the contrived example of
>> the
>> names Éastwôôd and Wu. Lexicographically Éastwôôd should come
>> before Wu.
>
> This is an oversimplification. Although that may be the sort order
> for American English, it may not be for other languages (don't
> actually know). The classic (Unicode) example is 'ø' - in most
> languages it is considered a variant of 'o', but it is not the case
> in Norwegian and Danish, where it is sorted after 'z'. Even in that
> case, sorts should not be done using code-point values, a mistake
> people often make.
>
> That said, I am a little surprised by the behavior of Rb. I wonder if
> they have some special case code for UTF8 vs other encodings. I would
> expect they are just asking the system to perform the comparison -
> perhaps there is a problem with the way they are passing the data to
> the system call. It is also entirely possible the system calls are
> broken.
It is an oversimplification, but it highlights the problem.
The conclusion I've come to is that StrComp just doesn't do what I want it
to. That it gives different results for UTF-8 and MacRoman encoded strings
is probably a bug, but it's not clear from the documentation which behavior
is the incorrect one. I wouldn't be surprised if the MacRoman comparison
ignores accents while the UTF-8 comparison doesn't.
The StrComp docs imply that at the first point where two strings differ, it
uses the ASCII value of that character to determine ordering. What I need is
locale awareness to decide whether the characters really are different and,
if so, which is greater. It may be that this is what StrComp is supposed to
do but doesn't. I'll have to think some more about what to log.
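To make the distinction concrete, here is a minimal sketch in Python (not
REALbasic, and not a description of how StrComp works internally). It
contrasts a plain code-point sort with a sort on an accent-folded key. The
helper names fold_accents and sort_key are hypothetical, and accent folding
is only a rough stand-in for real locale-aware collation.

```python
import unicodedata


def fold_accents(s):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(s):
    """Hypothetical key: accent-folded, case-folded text first, then the
    original string as a tie-breaker so the ordering stays deterministic."""
    return (fold_accents(s).casefold(), s)


names = ["Wu", "Éastwôôd"]

# Code-point order: 'É' (U+00C9) is greater than 'W' (U+0057),
# so the accented name sorts last.
by_code_point = sorted(names)

# Accent-folded order: the key for "Éastwôôd" starts with "eastwood",
# which sorts before "wu".
by_folded_key = sorted(names, key=sort_key)

print(by_code_point)
print(by_folded_key)
```

Note that this does not cover the 'ø' case from the quoted message: 'ø' has
no canonical decomposition, so fold_accents leaves it unchanged, and nothing
here knows that Danish and Norwegian sort it after 'z'. Handling that needs
collation rules that are tailored per locale.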
Chris
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>