
Re: RealSQLDatabase text index

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: RealSQLDatabase text index
From: Norman Palardy <npalardy at great-white-software dot com>
Date: Thu, 30 Mar 2006 23:54:01 -0800

On Mar 30, 2006, at 11:36 PM, Guyren Howe wrote:


You could certainly do this in REALbasic. A useful, free resource for the dictionary etc would be WordNet <http://wordnet.princeton.edu/>.

But you're getting into some moderately complex stuff here. See, for example, <http://citeseer.ist.psu.edu/context/20836/0>: this is an area of active research for computer scientists. There are standard approaches, though. A typical one is to index all the substrings of a particular length in each word, along with other information such as the order in which they occur. As you can imagine, the index can wind up being many times the size of the original data.
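
To make the substring approach concrete, here is a minimal sketch in Python rather than REALbasic. The substring length of 3 and the function names (`ngrams`, `build_ngram_index`, `candidates`) are assumptions for illustration, not anything from a particular library. It stores every length-3 substring of each word together with its position, then answers a fragment query by intersecting the word sets and confirming with a plain substring test.

```python
from collections import defaultdict

def ngrams(word, n=3):
    """Return (position, substring) pairs for every length-n substring of word."""
    word = word.lower()
    return [(i, word[i:i + n]) for i in range(len(word) - n + 1)]

def build_ngram_index(words, n=3):
    """Map each n-gram to the set of (word, position) pairs where it occurs."""
    index = defaultdict(set)
    for word in words:
        for pos, gram in ngrams(word, n):
            index[gram].add((word.lower(), pos))
    return index

def candidates(index, fragment, n=3):
    """Words that contain fragment, found via the n-gram index."""
    fragment = fragment.lower()
    grams = [g for _, g in ngrams(fragment, n)]
    if not grams:
        return set()  # fragment shorter than n: this sketch cannot answer it
    # Words that contain every n-gram of the fragment...
    sets = [{w for w, _ in index.get(g, ())} for g in grams]
    hits = set.intersection(*sets)
    # ...then a substring test to discard words with the grams in the wrong order.
    return {w for w in hits if fragment in w}
```

Even in this toy form the size problem shows up: an 8-letter word such as "database" produces 6 index entries, each carrying the word and a position.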

Depending on your project, a compelling alternative might be Lucene: <http://lucene.apache.org/java/docs/>. Lucene is very, very good at this kind of thing.

OTOH, I'd love to see someone put together an all-REALbasic solution for this. Extra bonus points if you implement an index that lets me do grep searches. :-)

I just did a brute-force inverted index with a very basic definition of "word".

Basically, that gave me a pretty decent search, and combined with a LIKE clause it was useful for my case.
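
For anyone who wants to try the same idea, here is a rough sketch of that kind of index, written in Python against SQLite rather than in REALbasic. The table names (`docs`, `words`), the definition of a word as a run of letters or digits, and the use of LIKE for prefix matching are all assumptions for illustration; the original implementation may well have differed.

```python
import re
import sqlite3

# Assumed "very basic" definition of a word: a run of ASCII letters or digits.
WORD = re.compile(r"[a-z0-9]+")

def build(conn, docs):
    """Store each document and one (word, doc_id) row per distinct word in it."""
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE words (word TEXT, doc_id INTEGER, PRIMARY KEY (word, doc_id))"
    )
    for doc_id, body in docs.items():
        conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (doc_id, body))
        for word in set(WORD.findall(body.lower())):
            conn.execute(
                "INSERT INTO words (word, doc_id) VALUES (?, ?)", (word, doc_id)
            )
    conn.commit()

def search(conn, term):
    """Return ids of documents containing a word that starts with term."""
    # Note: '%' or '_' inside term is not escaped in this sketch.
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM words WHERE word LIKE ? ORDER BY doc_id",
        (term.lower() + "%",),
    )
    return [row[0] for row in rows]
```

With an in-memory database (`sqlite3.connect(":memory:")`), `search(conn, "index")` would match both "index" and "indexing", which is roughly the benefit of pairing the word table with LIKE.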
