realbasic-nug
[Top] [All Lists]

High Performance Code (Was: Bitwise Shift operators)

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: High Performance Code (Was: Bitwise Shift operators)
From: Frank Condello <developer at chaoticbox dot com>
Date: Sat, 29 Sep 2007 02:51:48 -0400
Delivered-to: listarchive at realsoftware dot com
Delivered-to: realbasic-nug at lists dot realsoftware dot com
References: <BAY107-DAV8132ACB18FF1E0678E2E693B20 at phx dot gbl>
On 28-Sep-07, at 5:35 PM, Daniel Stenning wrote:

> We have Bitwise OR, AND, and XOR operators. We need to be able to  
> shift
> bits left and right. So much high performance code depends on bit  
> shift
> operations.

I've signed on the shifting request but your "high performance code"  
remark gave me pause. I keep telling myself Rb is "fast enough" but  
just the other day I realized "fast enough" isn't always fast enough...

I have a plugin that provides a Carmack style square root  
approximation. The C code looks like this (plus wrappers for the Rb  
global methods):

   static inline float _FastInvSqrtf (float x)
   {
     float xhalf = 0.5f*x;
     int i = *(int*)&x;
     i = 0x5f375a86 - (i>>1);
     x = *(float*)&i;
     x = x*(1.5f-xhalf*x*x);
     return x;
   }

   static inline float _FastSqrtf (float x)
   {
     return x * _FastInvSqrtf(x);
   }

This is pretty quick when called from C code - around 4 times faster  
than a real sqrt - but Rb plugin function calls aren't exactly light  
weight. Within Rb it's still about twice as fast as the built-in  
sqrt, but if you declare out to the system supplied sqrt (to avoid  
Rb's wrapper function) the real sqrt is just a hair slower than this  
approximation - the plugin function call overhead basically kills  
much of the benefit.

So, for fun, I implemented the approximation in pure RB code. I knew  
the bit shift would be a problem but I was curious as to how Rb would  
handle it - here's what I ended up with:

   Function FastSqrt(x As Single) As Single
     #pragma BackgroundTasks False
     #pragma BoundsChecking False
     #pragma NilObjectChecking False
     #pragma StackOverflowChecking False
     Static m As New MemoryBlock(4)
     Static p As Ptr = m
     Dim i As Integer
     Dim y As Single
     p.Single(0) = x
     i = p.Integer(0)
     i = &h5f375a86 - Bitwise.ShiftRight(i,1)
     p.Integer(0) = i
     y = p.Single(0)
     y = y*(1.5-0.5*x*y*y)
     return x * y
   End Function

That runs at about 2X slower than Rb's built-in sqrt, and 3X slower  
than the plugin approximation or sqrt declare (and yes, all profiling  
was done in a release build). Seeing as this was academic at this  
point I commented out the bit shifting line to discount the function  
call overhead - it was still over twice as slow as the plugin method  
despite doing less work and spitting out gibberish.

The moral of this story: If you want high-performance with large  
vectors, stick the math *and* the loop into a plugin. More operators  
are a start, but Rb really needs core optimizations - simple math and  
pointer dereferencing is still WAY slower than equivalent C code.  
Perhaps the fact that Rb doesn't really like 4-byte floats has  
something to do with this, but that's just another excuse.

Frank.
<http://developer.chaoticbox.com/>



_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>


<Prev in Thread] Current Thread [Next in Thread>