Re: WritePString weirdness...

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: WritePString weirdness...
From: Joe Strout <joe at inspiringapps dot com>
Date: Sun, 30 Mar 2008 15:24:12 -0600
On Mar 30, 2008, at 2:58 PM, Mark O'Neill wrote:

>> What I think you're seeing is that the copyright symbol is encoded
>> as 2 bytes in UTF-8, which is the default for strings in RB. You
>> can verify with msgbox str( len( CopyrightText)) + " " +
>> str( lenb( CopyrightText))
>
> So if I wanted to write out a "plain text" file, so the copyright
> symbol appears as a copyright symbol, how would I go about encoding
> that?

1. Stand, raise your right hand, and repeat after me: "there are  
hundreds of different ways to represent plain text as a series of  
bytes in a file."

2. Follow this with: "The copyright symbol is not in the ASCII  
character set."

3. Realize that, while ASCII characters tend to be the same in most  
encodings (with the notable exceptions of UTF-16 and UCS-4), non- 
ASCII characters such as the copyright symbol can vary dramatically  
from encoding to encoding.

4. Repeat until all this sinks in.  :)
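
To make points 2 and 3 concrete, here is a minimal sketch in Python (not REALbasic; it is used here only because it runs as-is). The helper name encode_all and the particular encodings chosen are assumptions for illustration, not anything from the thread. It prints the bytes that one copyright symbol becomes under a few encodings, shows that strict ASCII cannot represent it at all, and shows that even a plain "A" changes shape in UTF-16 and UTF-32.

```python
# Illustrative Python, not REALbasic: one string, several byte representations.
COPYRIGHT = "\u00a9"  # the copyright symbol


def encode_all(text, encodings=("utf-8", "latin-1", "cp850", "utf-16-le")):
    """Return {encoding name: bytes} for the given text (hypothetical helper)."""
    return {name: text.encode(name) for name in encodings}


# The same character, written out under four different encodings.
copyright_bytes = encode_all(COPYRIGHT)
for name, raw in copyright_bytes.items():
    print(f"{name:10} {raw.hex(' ')}")

# ASCII has no copyright symbol, so a strict encode fails.
try:
    COPYRIGHT.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("encodable as ASCII:", ascii_ok)

# An ASCII letter is the single byte 0x41 in most encodings,
# but is padded with zero bytes in UTF-16 and UTF-32.
letter_bytes = encode_all(
    "A", ("ascii", "utf-8", "latin-1", "utf-16-le", "utf-32-le")
)
for name, raw in letter_bytes.items():
    print(f"{name:10} {raw.hex(' ')}")
```

The copyright symbol comes out as two bytes in UTF-8, as one byte (with different values) in Latin-1 and code page 850, and as two different bytes again in UTF-16. None of these is more "plain" than the others.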

> Never used encodings before. :)

No, you've always used encodings; you can't represent text on a  
computer without them.  But I think you never understood that you  
were using them before.

It's really quite simple: an app that writes text to a file has to  
decide how to represent that text as bytes.  An app that reads text  
from a file has to decide how to interpret bytes as text.  When those  
decisions are not the same, then the text that's read isn't the same  
as the text that was written.  Nothing mysterious about that.
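
A small sketch of that mismatch, again in Python rather than REALbasic and purely as an illustration: the sample string and the choice of Latin-1 as the reader's wrong guess are assumptions. The writer picks UTF-8; one reader makes the same decision and one does not.

```python
# Illustrative Python, not REALbasic: writer and reader disagree on encoding.
original = "Copyright \u00a9 2008"

# The writing app decides: represent the text as UTF-8 bytes.
written = original.encode("utf-8")

# Character count and byte count differ, because the copyright
# symbol takes two bytes in UTF-8 (compare Len and LenB in RB).
char_count = len(original)
byte_count = len(written)

# A reading app that makes the same decision gets the same text back.
reread = written.decode("utf-8")

# A reading app that assumes Latin-1 interprets each byte as one character.
misread = written.decode("latin-1")

print(char_count, byte_count)
print(reread)
print(misread)
```

The misread version shows a stray "Â" in front of the copyright symbol. The bytes in the file are identical in both cases; only the reader's interpretation changed.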

If you want to increase the chances that well-written text editors  
will correctly guess the encoding of your file, you might consider  
starting it with a BOM -- which you can find described in the FAQ.   
But this is only a hint, which some apps will use and others will  
ignore.  There is no universal solution to this problem, and never  
will be until everybody in the world agrees upon one encoding --  
which, in my opinion, will most likely be UTF-8.
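
As a rough sketch of what a BOM looks like and how a reader might use it as a hint, here is a Python illustration (not REALbasic). The function name sniff_bom and the UTF-8 fallback are assumptions made for this example; real editors use their own heuristics, and many ignore the BOM entirely.

```python
# Illustrative Python, not REALbasic: writing and sniffing a byte order mark.
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 marks are checked first.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data, fallback="utf-8"):
    """Guess a codec name from a leading BOM (hypothetical helper).

    Returns the fallback when no BOM is present, which is only a guess.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback


# "utf-8-sig" writes the three-byte UTF-8 BOM before the text.
with_bom = "\u00a9 2008".encode("utf-8-sig")
print(with_bom.hex(" "))

# The returned codecs strip the BOM again while decoding.
guess = sniff_bom(with_bom)
print(guess, with_bom.decode(guess))
```

Without a BOM the function can only fall back to a default, which is exactly the situation described above: the reader still has to guess.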

Best,
- Joe

--
Joe Strout
Inspiring Applications, Inc.
http://www.InspiringApps.com


