On Feb 28, 2005, at 4:43 PM, Ronald Vogelaar wrote:
> Charles Yeomans wrote:
>> Try testing it when txt and txtSep have different encodings.
> different encodings... when would that happen?
Say you read a file made in an old version of SimpleText, so the file
is encoded in MacRoman. Now say you're told the separator is '•'
(that's a bullet, if it doesn't come through on e-mail). So you
hard-code that, or create a constant, or whatever, in REALbasic.
Only... uh, oh, that literal is stored as UTF-8, where the bullet is
three bytes (E2 80 A2), while in MacRoman it's the single byte A5. So
code that compares the raw bytes would totally fail, while CountFields
would work just fine.
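To make the byte mismatch concrete, here is a minimal sketch. It is in Python rather than REALbasic, purely as an illustration; the function names and the sample data are made up for this example and are not anyone's actual code from this thread.

```python
# Illustration only (Python, not REALbasic): the same bullet character
# is stored as different bytes in MacRoman and in UTF-8.
BULLET = "\u2022"  # the bullet character


def count_fields_bytes(data: bytes, sep: bytes) -> int:
    """Naive field count that compares raw bytes and ignores encodings."""
    return data.count(sep) + 1


def count_fields_text(data: bytes, data_enc: str, sep: bytes, sep_enc: str) -> int:
    """Decode both inputs to text first, then count fields on characters."""
    return len(data.decode(data_enc).split(sep.decode(sep_enc)))


# Hypothetical sample: file contents as read from an old MacRoman file,
# and a separator that came from a UTF-8 source literal.
file_bytes = "one\u2022two\u2022three".encode("mac_roman")
sep_bytes = BULLET.encode("utf-8")

print(BULLET.encode("mac_roman"))                  # b'\xa5'
print(sep_bytes)                                   # b'\xe2\x80\xa2'
print(count_fields_bytes(file_bytes, sep_bytes))   # 1 (separator never matches)
print(count_fields_text(file_bytes, "mac_roman", sep_bytes, "utf-8"))  # 3
```

The byte-level version reports one field because the three-byte UTF-8 separator never appears in the MacRoman data; the encoding-aware version finds all three.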
You could, of course, convert both strings to the same encoding first...
though that's going to slow down your algorithm, and even worse, it's
no guarantee. There is more than one way to encode some characters in
Unicode (an accented letter can be a single precomposed code point, or
a base letter followed by a combining accent), and thus the same
character can have different binary codes even in the same encoding!
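A small sketch of that last point, again in Python only as an illustration. The word "café" and the helper name same_text are assumptions chosen for the example; the usual remedy shown here, normalizing both strings to one Unicode form before comparing, is a general technique and not something REALbasic is claimed to do for you.

```python
import unicodedata

# Two spellings of the same visible text, both valid Unicode.
composed = "caf\u00e9"     # e-acute as one precomposed code point
decomposed = "cafe\u0301"  # plain e followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False
print(len(composed), len(decomposed))                          # 4 5
print(same_text(composed, decomposed))                         # True
```

Both strings are in the same encoding, yet their bytes differ until they are normalized to a common form.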
The encoding thing is something most people don't think about, but it's
constantly waiting to bite you on the arse. It is extremely important
to educate yourself on this matter. Search the archives for Joe's Text
Encoding FAQ (I think that's what it's called) -- or perhaps Joe can
post it for us again, as it's been a while! :-)
-Thomas
Personal web page:
<http://homepage.mac.com/thomasareed/>
My shareware:
<http://www.bitjuggler.com/>
Free REALbasic code:
<http://www.bitjuggler.com/extra/>
There are 10 kinds of people in the world -- those who understand binary
numbers and those who don't.