
Re: Filtering non-alphanumeric characters

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: Filtering non-alphanumeric characters
From: Charles Yeomans <charles at declareSub dot com>
Date: Fri, 30 Dec 2005 11:46:12 -0500

On Dec 30, 2005, at 9:23 AM, Theodore H. Smith wrote:


On 30 Dec 2005, at 13:22, Rob Laveaux wrote:


On 30-dec-2005, at 3:18, "Theodore H. Smith" <delete at elfdata dot com> wrote:

What about the rest of the world? :D

They're out of luck.
The application I'm working on is only targeting the European market. The regex I use covers just about all characters used in the countries where it will be marketed. So that is good enough. Besides it has to communicate with a backend system which has the same restrictions on allowed characters.

This is actually bad practice. What seems like a good idea now could seem like a bad idea later. Even if you DO get away with it this time, you could end up in trouble by applying the same approach again. Better not to pretend that non-European languages don't exist. I can foresee situations where a decision like this comes back to bite you.

If you need non-ASCII punctuation stripped, better to do it in strict accordance with the Unicode spec. If you can't figure out how to do that yourself, you should ask how it is done.

It shouldn't be too hard to process UnicodeData.txt or whatever file it is, to extract the relevant information into a smaller specific table. Then use that table for your character stripping.
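
The table extraction described above could be sketched roughly as follows. This is only an illustration in Python, not REALbasic, and it assumes the UnicodeData.txt layout of semicolon-separated fields with the code point (hex) in field 0 and the general category in field 2. The five sample lines stand in for the real file, and the names build_punctuation_table and strip_punctuation are made up for this sketch.

```python
# Sample lines in the UnicodeData.txt layout (assumed here): fields are
# separated by semicolons; field 0 is the code point in hex, field 2 is the
# general category. In practice the lines would be read from the real file.
SAMPLE = """\
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
"""


def build_punctuation_table(lines):
    """Collect the code points whose general category starts with 'P'
    (Pc, Pd, Ps, Pe, Pi, Pf, Po), i.e. the punctuation categories."""
    table = set()
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith("P"):
            table.add(int(fields[0], 16))
    return table


def strip_punctuation(text, table):
    """Return text with every character listed in the table removed."""
    return "".join(ch for ch in text if ord(ch) not in table)


# Hypothetical usage: build the small table once, then reuse it.
PUNCTUATION = build_punctuation_table(SAMPLE.splitlines())
cleaned = strip_punctuation("\u201cCaf\u00e9!\u201d", PUNCTUATION)
```

Unlike a plain "anything above 127" test, a table built this way drops the curly quotes but leaves the accented letter alone.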

So, how is Rob's approach different from

"I'd rather check for anything above 127, and wait till someone complains that it didn't filter ” or “, and then see if that poses enough problem to make it worth all the extra effort to do it the proper way, which is to do it according to the data in the Unicode data files." ?

--------------
Charles Yeomans
