realbasic-nug
[Top] [All Lists]

Re: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is not unique in RB 2005 v4
From: Roph <Roph at kc dot rr dot com>
Date: Wed, 30 Nov 2005 18:54:59 -0600
Delivered-to: realbasic-nug at lists dot realsoftware dot com
Thread-index: AcX2C8wlCpzMm2H/Edq96wANk0o3JgABg5TV
Thread-topic: Doh! Variant.Hash is case insenstive Re: Variant.Hash for objects is not unique in RB 2005 v4
A couple afterthoughts:

1) I realized there's a bug in this since MidB returns a character, not an
integer.  The line showing:

>     theHash = theHash + MidB( inString, index, 1 )

should instead show:

      theHash = theHash + AscB( MidB( inString, index, 1 ) )

2) This approach will of course end up implicitly caching the encoding as
well as the string contents, or stated more simply, this hash is
encoding-dependent.  No big deal for my needs, but worth keeping in mind.
RB's Variant.hash is presumably encoding-dependent as well.

3) If you knew you were always working with 16-bit characters, say because
your internal encoding was UTF-16 ... (cough, cough, Ruslan, cough, cough)
ooops, excuse me ... then you'd of course want to construct the hash by
adding 16-bit character representations in the loop instead of 8-bit
character values.  In this case, a more appropriate value for rotateCt would
be a number like 13, 15, 17, or 19 to get a nice hashing distance for
strings of length 2.  I'd probably opt for rotateCt = 15

The corrected code including the call to AscB is shown in the edited reply
text below for the 8-bit case:
   
> 
> Here's the algorithm I'd like to have for a hash that would meet my needs
> quite well:
>  
> Function strHash( inString as String, previousHash as Integer = 0 ) _
>     as Integer
> 
>     // Return an integer hash for inString.  The optional prevoiusHash
>     // argument allows the incremental calculation of a hash on a
>     // series of substrings.
> 
>     const rotateCt = 7
>     // Define rotateCt to be the number of bits by which the hash value
>     // will be rotated prior to adding each character byte of the string.
>     // Any value that is relatively prime with respect to the number of
>     // bits in the hash will work fine.  Since the hash is a 32-bit
>     // integer and 32 is a power of 2, then any odd value will work for
>     // this purpose.  To get some good hashing distance for smaller
>     // strings of say, less than four characters, values like 5, 7, and
>     // 9 will work particularly well.   Values like 1, 3, 31, 29 would
>     // be poor choices if small strings will be hashed.
>  
>     const rotateFactor = &h80
>                 // Factor to rotate low-order bits left by rotateCt
>                 // Set to (2 ^ rotateCt) i.e. pow(2,rotateCt)
> 
>     const rightShiftFactor = &h02000000
>                 // Set to (2 ^ (32 - rotateCt) )
>     const rotateMask = &hFFFFFF80
>                 // Set to BitXor( &hFFFFFFFF, rotateFactor - 1 )
>  
>     dim strLen as Integer
>     dim theHash as Integer
> 
>     theHash = previousHash
>     strLen = lenB( inString )
> 
>     for index as integer = 1 to strLen
>     // Rotate the 32-bit hash rotateCt bits to the left.
>     theHash = ( theHash * rotateFactor ) _ // shift low order bits left
>         + BitAnd( rotateMask, ( theHash / rotateFactor ) )
>         // shift high order bits right by (32 - rotateCt)
>     theHash = theHash + AscB( MidB( inString, index, 1 ) )
>     next
>     
>     return theHash
> End Function
>   


_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

<Prev in Thread] Current Thread [Next in Thread>