To: "Getting Started" <gettingstarted at lists dot realsoftware dot com>
Subject: reading non-ASCII chars in a binary file (was Re: Getting Started Digest #112)
From: "Joseph J. Strout" <joe at realsoftware dot com>
Date: Tue, 25 Nov 2003 14:31:25 -0600
References: <200311250610 dot 1aoDTc4qo3NZFkN0 at swallow> <3FC3B68F dot 90603 at ix dot netcom dot com>

At 3:07 PM -0500 11/25/03, Peter Gatti wrote:

> Once again, my problem is blocks of text that are taken from the net may occasionally have glyphs in them.

Though it sounds like your strings are fine at this point, I'd still like to know exactly what you mean by "taken from the net" here.

> The problem arises when I want to save the text to a binary file, if there are glyphs in any one block of text, it corrupts the file.

No, it doesn't. The file is fine; the trouble is almost certainly that when you read your data back in, you're not telling RB how to interpret this binary data as text. (Or, as was discussed here recently, your file-reading code is getting confused because it has mixed up bytes and characters.)
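
To illustrate the point that the bytes on disk are fine and only their interpretation goes wrong, here is a minimal sketch. It is written in Python rather than REALbasic, purely as an analogy; the sample string and variable names are made up for the illustration.

```python
# A string containing two non-ASCII characters.
text = "café ½"

# What actually lands in the file: a sequence of bytes.
raw = text.encode("utf-8")

# Character count and byte count differ as soon as non-ASCII text appears.
char_count = len(text)
byte_count = len(raw)

# Reading the same bytes back with the wrong encoding gives garbage,
# even though the bytes themselves are unchanged.
wrong = raw.decode("latin-1")

# Reading them back with the encoding they were written in recovers the text.
right = raw.decode("utf-8")

print(char_count, byte_count)
print(repr(wrong))
print(repr(right))
```

Nothing in the file changed between the two reads; only the encoding that was declared for the data did.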

> So it would seem that I would first need a way to detect if a block of text has a glyph in it, a way to determine which text encoding it is in and finally to apply that text encoding to it.

No. First, make sure you always use the proper byte vs. character functions. For example, the parameter to BinaryStream.Read is a byte count, so if you're going to save a number to be used as that parameter, it had better be the LenB of your string rather than the Len of it. Second, either choose one encoding and always use it, or save the encoding of each string to the file along with its length and its text data. Either way, you must then define the encoding of the strings when you read them back in (the Read method has an optional parameter for exactly that purpose).
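
The second option (saving the encoding and the byte length alongside each string) might look roughly like the sketch below. It is in Python rather than REALbasic, and the record layout and the helper names write_string and read_string are assumptions made up for this illustration, not anything RB provides.

```python
import io
import struct

def write_string(stream, text, encoding="utf-8"):
    """Write one record: encoding name, byte count, then the encoded bytes."""
    name = encoding.encode("ascii")
    payload = text.encode(encoding)
    stream.write(struct.pack(">B", len(name)))     # length of the encoding name
    stream.write(name)                             # e.g. b"utf-8"
    stream.write(struct.pack(">I", len(payload)))  # BYTE count, not len(text)
    stream.write(payload)

def read_string(stream):
    """Read one record and decode it with the encoding stored alongside it."""
    (name_len,) = struct.unpack(">B", stream.read(1))
    encoding = stream.read(name_len).decode("ascii")
    (byte_count,) = struct.unpack(">I", stream.read(4))
    return stream.read(byte_count).decode(encoding)

# Round trip through an in-memory "file".
buf = io.BytesIO()
write_string(buf, "naïve café", "utf-8")
write_string(buf, "plain ASCII", "latin-1")
buf.seek(0)
first = read_string(buf)
second = read_string(buf)
leftover = buf.read()
print(first, "|", second)
```

If the stored count were the character count instead of the byte count, the first read would stop short and every later record would be misaligned, which is the byte vs. character mix-up described above.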

> Come to think of it, it would be nice if edit fields and listbox cells had a method for extracting the text encoding information they use which allows them to properly display these glyphs in the first place.

They do; for example, EditField1.text.Encoding gives you the encoding of EditField1.text (and this will always be UTF-8 in the current version of RB).

Cheers,
- Joe

--
,------------------------------------------------------------------.
|    Joseph J. Strout           REAL Software, Inc.                |
|    joe at realsoftware dot com       http://www.realsoftware.com        |
`------------------------------------------------------------------'
