realbasic-nug
[Top] [All Lists]

Re: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is not unique in RB 2005 v4
From: Roph <Roph at kc dot rr dot com>
Date: Wed, 30 Nov 2005 18:11:38 -0600
Delivered-to: realbasic-nug at lists dot realsoftware dot com
Thread-index: AcX2C8wlCpzMm2H/Edq96wANk0o3Jg==
Thread-topic: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is not unique in RB 2005 v4
On 11/30/05 4:09 PM, "Charles Yeomans" <charles at declareSub dot com> wrote:
  
> I wrote a hash function for my case-sensitive dictionary using a
> standard algorithm lifted from Sedgewick's "Algorithms in C++". Here is
> a version of it.
> 

Umm... either I'm missing something here, or theHash as defined in the code
you sent below consists of an integer composed of the last four bytes of the
string to be hashed in consecutive bytes of the returned integer hash.  That
doesn't look very hashish to me.

Here's the algorithm I'd like to have for a hash that would meet my needs
quite well:
 
Function strHash( inString as String, previousHash as Integer = 0 ) _
    as Integer

    // Return an integer hash for inString.  The optional prevoiusHash
    // argument allows the incremental calculation of a hash on a
    // series of substrings.

    const rotateCt = 7
    // Define rotateCt to be the number of bits by which the hash value
    // will be rotated prior to adding each character byte of the string.
    // Any value that is relatively prime with respect to the number of
    // bits in the hash will work fine.  Since the hash is a 32-bit
    // integer and 32 is a power of 2, then any odd value will work for
    // this purpose.  To get some good hashing distance for smaller
    // strings of say, less than four characters, values like 5, 7, and
    // 9 will work particularly well.   Values like 1, 3, 31, 29 would
    // be poor choices if small strings will be hashed.
 
    const rotateFactor = &h80
                // Factor to rotate low-order bits left by rotateCt
                // Set to (2 ^ rotateCt) i.e. pow(2,rotateCt)

    const rightShiftFactor = &h02000000
                // Set to (2 ^ (32 - rotateCt) )
    const rotateMask = &hFFFFFF80
                // Set to BitXor( &hFFFFFFFF, rotateFactor - 1 )
 
    dim strLen as Integer
    dim theHash as Integer

    theHash = previousHash
    strLen = lenB( inString )

    for index as integer = 1 to strLen
    // Rotate the 32-bit hash rotateCt bits to the left.
    theHash = ( theHash * rotateFactor ) _ // shift low order bits left
        + BitAnd( rotateMask, ( theHash / rotateFactor ) )
        // shift high order bits right by (32 - rotateCt)
    theHash = theHash + MidB( inString, index, 1 )
    next
    
    return theHash
End Function
 

So all we need now is a C-savvy plugin writer who wants to proclaim "I can
name that tune in 5 minutes flat!"  There are lots of obvious optimizations
possible.  Use of "<<" and ">>" in C instead of * and / may generate code
that will run faster depending upon the processor and C compiler.  Use of
MidB to retrieve a byte out of a String in RB is always a bad idea compared
with other available solutions, including the one shown in the code below.

If I wanted to learn how to write a plugin, this looks like an ideal simple
first project.

> Assuming that you use a hash table whose size is prime (see Sedgewick
> for a table of sizes), it works pretty well.

The implication being that you use the low-order bits of the hash as your
table index.  Yes if the hash was well formed, that would make sense.

However, given a hash constructed in the manner shown above, you simply
calculate the table index as:
    
    tableIndex = strHash( aString ) mod ( ubound( table ) + 1 )
    
and it shouldn't matter whether the size of your table is prime or not.
Personally however, my superstious side agrees with Sedgewick and would at
least pick an odd size for the length of the table, if not a prime -- just
for luck  ;-) 

> Obviously a C  implementation could be much faster.

Yep, yep.  Maybe someone will write one.

> Function Hash(s as String) as Integer
>    #pragma DisableBackgroundTasks
> 
>    Static HashBlock as new MemoryBlock(256)
>    If HashBlock.Size < LenB(s) then
>      HashBlock.Size = 2*HashBlock.Size
>    End if
>    HashBlock.StringValue(0, LenB(s)) = s
>    dim N as Integer = LenB(s) - 1
>    dim theHash as Integer
>    For i as Integer = 0 to N
>      theHash = theHash*256 + me.hashBlock.Byte(i)
>    Next
>    Return theHash
> End Function
> 
> Charles Yeomans
>  


_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

<Prev in Thread] Current Thread [Next in Thread>