A couple afterthoughts:
1) I realized there's a bug in this since MidB returns a character, not an
integer. The line showing:
> theHash = theHash + MidB( inString, index, 1 )
should instead show:
theHash = theHash + AscB( MidB( inString, index, 1 ) )
2) This approach will of course end up implicitly caching the encoding as
well as the string contents, or stated more simply, this hash is
encoding-dependent. No big deal for my needs, but worth keeping in mind.
RB's Variant.hash is presumably encoding-dependent as well.
3) If you knew you were always working with 16-bit characters, say because
your internal encoding was UTF-16 ... (cough, cough, Ruslan, cough, cough)
ooops, excuse me ... then you'd of course want to construct the hash by
adding 16-bit character representations in the loop instead of 8-bit
character values. In this case, a more appropriate value for rotateCt would
be a number like 13, 15, 17, or 19 to get a nice hashing distance for
strings of length 2. I'd probably opt for rotateCt = 15
The corrected code including the call to AscB is shown in the edited reply
text below for the 8-bit case:
>
> Here's the algorithm I'd like to have for a hash that would meet my needs
> quite well:
>
> Function strHash( inString as String, previousHash as Integer = 0 ) _
> as Integer
>
> // Return an integer hash for inString. The optional prevoiusHash
> // argument allows the incremental calculation of a hash on a
> // series of substrings.
>
> const rotateCt = 7
> // Define rotateCt to be the number of bits by which the hash value
> // will be rotated prior to adding each character byte of the string.
> // Any value that is relatively prime with respect to the number of
> // bits in the hash will work fine. Since the hash is a 32-bit
> // integer and 32 is a power of 2, then any odd value will work for
> // this purpose. To get some good hashing distance for smaller
> // strings of say, less than four characters, values like 5, 7, and
> // 9 will work particularly well. Values like 1, 3, 31, 29 would
> // be poor choices if small strings will be hashed.
>
> const rotateFactor = &h80
> // Factor to rotate low-order bits left by rotateCt
> // Set to (2 ^ rotateCt) i.e. pow(2,rotateCt)
>
> const rightShiftFactor = &h02000000
> // Set to (2 ^ (32 - rotateCt) )
> const rotateMask = &hFFFFFF80
> // Set to BitXor( &hFFFFFFFF, rotateFactor - 1 )
>
> dim strLen as Integer
> dim theHash as Integer
>
> theHash = previousHash
> strLen = lenB( inString )
>
> for index as integer = 1 to strLen
> // Rotate the 32-bit hash rotateCt bits to the left.
> theHash = ( theHash * rotateFactor ) _ // shift low order bits left
> + BitAnd( rotateMask, ( theHash / rotateFactor ) )
> // shift high order bits right by (32 - rotateCt)
> theHash = theHash + AscB( MidB( inString, index, 1 ) )
> next
>
> return theHash
> End Function
>
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>
|