On Dec 30, 2005, at 9:23 AM, Theodore H. Smith wrote:
On 30 Dec 2005, at 13:22, Rob Laveaux wrote:
On 30-dec-2005, at 3:18, "Theodore H. Smith" <delete at elfdata dot com>
wrote:
What about the rest of the world? :D
They're out of luck.
The application I'm working on is only targeting the European market.
The regex I use covers just about all characters used in the
countries where it will be marketed. So that is good enough. Besides
it has to communicate with a backend system which has the same
restrictions on allowed characters.
This is actually bad practice. What seems like a good idea now could
seem like a bad idea in the future. Even if you DO get away with this
bad practice, you could end up in trouble in the future by applying
the same approach. Better to not pretend that non-European languages
don't exist. I can foresee situations where a decision like this can
come back and bite you.
If you need non-ASCII punctuation stripped, better to do it in strict
accordance with the Unicode spec. If you can't figure out how to do it
yourself, you should ask how to do it in accordance with the Unicode
spec.
It shouldn't be too hard to process UnicodeData.txt or whatever file
it is, to extract the relevant information into a smaller specific
table. Then use that table for your character stripping.
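
A rough sketch of that idea, in Python rather than REALbasic, just to
show the shape of it. It assumes the semicolon-separated layout of
UnicodeData.txt, where the first field is the code point in hex and the
third field is the general category, and it assumes "punctuation" means
every category starting with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po). The names
SAMPLE, punctuation_table and strip_punctuation are made up for this
example, and SAMPLE is only a handful of lines standing in for the real
file.

```python
# A few lines in the UnicodeData.txt layout, standing in for the full file.
# Only field 0 (code point, hex) and field 2 (general category) are used.
SAMPLE = """\
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
"""


def punctuation_table(lines):
    """Collect the code points whose general category starts with 'P'."""
    table = set()
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith("P"):
            table.add(int(fields[0], 16))
    return table


def strip_punctuation(text, table):
    """Drop every character whose code point is in the table."""
    return "".join(ch for ch in text if ord(ch) not in table)


# With the real file you would pass open("UnicodeData.txt") instead.
PUNCT = punctuation_table(SAMPLE.splitlines())
print(strip_punctuation("\u201cCaf\u00e9!\u201d", PUNCT))  # prints Café
```

The point of the small table is that accented letters such as é pass
through untouched while the curly quotes are removed, which a plain
"anything above 127" test cannot do.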
So, how is Rob's approach different from
"I'd rather check for anything above 127, and wait till someone
complains that it didn't filter ” or “, and then see if that poses
enough problem to make it worth all the extra effort to do it the
proper way, which is to do it according to the data in the Unicode data
files." ?
--------------
Charles Yeomans