realbasic-nug
[Top] [All Lists]

Re: Doh! Variant.Hash is case insenstive

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: Doh! Variant.Hash is case insenstive
From: Phil M <phil at mobleybros dot com>
Date: Wed, 30 Nov 2005 20:05:45 -0800
Delivered-to: realbasic-nug at lists dot realsoftware dot com
References: <BFB3CC7E dot 5C9C%Roph at kc dot rr dot com>
On Nov 30, 2005, at 7:39 PM, Roph wrote:

Actually you have to have the Little-Endian value defined because of the UShort. Besides giving a better distribution, it also runs almost twice as fast in REALbasic as grabbing each byte.

Yes, processing the shorts is great. I wonder about the end-case condition however in the event that the string has an odd number of bytes and you map to the string in memory. I'd have to go stare at it some more to be happy about that.

I just added a byte to the string if it needed it. I used the "Ceil (s.Size / 2) * 2" to find the size that I needed and then changed the size of the MemoryBlock (I cannot remember if this feature was introduced in REALbasic 2005):

  s.Size = kMax

This was just a modified Adler-32 Checksum. I think the normal Adler-32 starts with a=1 and b=0 with an option parameter for a to take a different value. I just added the prime numbers for a and b to give short strings a kick-start.

Sure. if you have the Adler32 algorithm or a pointer to it I'd love to see it. If it's really just this simple, that might be a fine algorithm to use, and there are already c-coded plugins for it I believe.

I took it from several different sources, and I think I ended up with one of the clearer (but not as optimized) versions of the algorithm. Here is I think the original algorithm:

http://leithal.cool-tools.co.uk/sourcedoc/mysql509/html/ adler32_8c-source.html

Yep, I knew it was in either eCryptIt or MBS, and I have both. And I've read enough to know it's probably the best of the canned-hash algorithms out there for my purpose. What I don't have is a description of the algorithm to know what it's doing.

Here is the best description I can find right now:

Adler-32 is composed of two sums accumulated per byte: s1 is the sum of all bytes, s2 is the sum of all s1 values. Both sums are done modulo 65521. s1 is initialized to 1, s2 to zero. The Adler-32 checksum is stored as s2*65536 + s1 in network byte order.

The following C code computes the Adler-32 checksum of a data buffer. It is written for clarity, not for speed. The sample code is in the ANSI C programming language. Non C users may find it easier to read with these hints:

  &      Bitwise AND operator.
>> Bitwise right shift operator. When applied to an unsigned quantity, as here, right shift inserts zero bit(s) at the left. << Bitwise left shift operator. Left shift inserts zero bit (s) at the right.
  ++     "n++" increments the variable n.
  %      modulo operator: a % b is the remainder of a divided by b.

#define BASE 65521 /* largest prime smaller than 65536 */
/*
Update a running Adler-32 checksum with the bytes buf[0..len-1] and return the updated checksum. The Adler-32 checksum should be initialized to 1.

Usage example:
  unsigned long adler = 1L;
  while (read_buffer(buffer, length) != EOF) {
    adler = update_adler32(adler, buffer, length);
  }
  if (adler != original_adler) error();
*/

unsigned long update_adler32(unsigned long adler, unsigned char *buf, int len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;
    int n;
    for (n = 0; n < len; n++) {
        s1 = (s1 + buf[n]) % BASE;
        s2 = (s2 + s1)     % BASE;
    }
    return (s2 << 16) + s1;
}


/* Return the adler32 of the bytes buf[0..len-1] */

unsigned long adler32(unsigned char *buf, int len)
{
    return update_adler32(1L, buf, len);
}
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

<Prev in Thread] Current Thread [Next in Thread>