realbasic-nug
[Top] [All Lists]

Re: High Performance Code (Was: Bitwise Shift operators)

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: High Performance Code (Was: Bitwise Shift operators)
From: Charles Yeomans <charles at declareSub dot com>
Date: Sat, 29 Sep 2007 12:14:20 -0400
Delivered-to: listarchive at realsoftware dot com
Delivered-to: realbasic-nug at lists dot realsoftware dot com
References: <BAY107-DAV4EF390FE1D7DC2DFB5A4893B30 at phx dot gbl>
On Sep 29, 2007, at 11:02 AM, Daniel Stenning wrote:

> On 29/9/07 07:51, "Frank Condello" <developer at chaoticbox dot com> wrote:
>
>> On 28-Sep-07, at 5:35 PM, Daniel Stenning wrote:
>>
>>> We have Bitwise OR, AND, and XOR operators. We need to be able to
>>> shift
>>> bits left and right. So much high performance code depends on bit
>>> shift
>>> operations.
>>
>> I've signed on the shifting request but your "high performance code"
>> remark gave me pause. I keep telling myself Rb is "fast enough" but
>> just the other day I realized "fast enough" isn't always fast  
>> enough...
>>
>> I have a plugin that provides a Carmack style square root
>> approximation. The C code looks like this (plus wrappers for the Rb
>> global methods):
>>
>>    static inline float _FastInvSqrtf (float x)
>>    {
>>      float xhalf = 0.5f*x;
>>      int i = *(int*)&x;
>>      i = 0x5f375a86 - (i>>1);
>>      x = *(float*)&i;
>>      x = x*(1.5f-xhalf*x*x);
>>      return x;
>>    }
>>
>>    static inline float _FastSqrtf (float x)
>>    {
>>      return x * _FastInvSqrtf(x);
>>    }
>>
>> This is pretty quick when called from C code - around 4 times faster
>> than a real sqrt - but Rb plugin function calls aren't exactly light
>> weight. Within Rb it's still about twice as fast as the built-in
>> sqrt, but if you declare out to the system supplied sqrt (to avoid
>> Rb's wrapper function) the real sqrt is just a hair slower than this
>> approximation - the plugin function call overhead basically kills
>> much of the benefit.
>>
>> So, for fun, I implemented the approximation in pure RB code. I knew
>> the bit shift would be a problem but I was curious as to how Rb would
>> handle it - here's what I ended up with:
>>
>>    Function FastSqrt(x As Single) As Single
>>      #pragma BackgroundTasks False
>>      #pragma BoundsChecking False
>>      #pragma NilObjectChecking False
>>      #pragma StackOverflowChecking False
>>      Static m As New MemoryBlock(4)
>>      Static p As Ptr = m
>>      Dim i As Integer
>>      Dim y As Single
>>      p.Single(0) = x
>>      i = p.Integer(0)
>>      i = &h5f375a86 - Bitwise.ShiftRight(i,1)
>>      p.Integer(0) = i
>>      y = p.Single(0)
>>      y = y*(1.5-0.5*x*y*y)
>>      return x * y
>>    End Function
>>
>> That runs at about 2X slower than Rb's built-in sqrt, and 3X slower
>> than the plugin approximation or sqrt declare (and yes, all profiling
>> was done in a release build). Seeing as this was academic at this
>> point I commented out the bit shifting line to discount the function
>> call overhead - it was still over twice as slow as the plugin method
>> despite doing less work and spitting out gibberish.
>>
>> The moral of this story: If you want high-performance with large
>> vectors, stick the math *and* the loop into a plugin. More operators
>> are a start, but Rb really needs core optimizations - simple math and
>> pointer dereferencing is still WAY slower than equivalent C code.
>
> I can only concur. I tend to put any C code I need into a dylib  
> instead of a
> plugin for so many reasons. Speed and also the drudgery of the plugin
> wrapper code. It would help if RB at least allowed us to place dynamic
> libraries into the plugins folder and then built those libraries  
> directly
> into the final application package.
>
>> Perhaps the fact that Rb doesn't really like 4-byte floats has
>> something to do with this, but that's just another excuse.
>
> Whats the issue with 4 byte floats ?  This is news to me - ( but  
> then a lot
> of such things are.. )
>
> If RS wants RB to gain wider acceptance as more than just a quick  
> RAD IT
> tool for knocking up cross-platform GUI applications they need to  
> sort out
> these core performance issues once and for all. It will help to  
> attract a
> new user base such as the scientific and engineering community.

You write as if a quick RAD IT tool for knocking up cross-platform  
GUI applications is not so much.

>
> RB has taken a step towards allowing us to code for better  
> performance -
> notedly by adding Struct and Ptr datatypes. But its sad to see they  
> haven't
> refined it more.  This is "bootstrap" stuff.  The more they can get  
> these
> basics right the more they can move their internal code over to RB  
> which
> benefits us all. I realise that most RB guys out there might not  
> need this
> fine tuning, but every time we have to dip into C to eke out  
> performance or
> functionality it just reinforces the impression that RB is not really
> "pro-grade".

Only among the stupid.  To write high-performance code, you need to  
be close to the machine (C, for example) or use a language optimized  
for the task (Fortran).

Charles Yeomans
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>


<Prev in Thread] Current Thread [Next in Thread>