On Sep 29, 2007, at 11:02 AM, Daniel Stenning wrote:
> On 29/9/07 07:51, "Frank Condello" <developer at chaoticbox dot com> wrote:
>
>> On 28-Sep-07, at 5:35 PM, Daniel Stenning wrote:
>>
>>> We have Bitwise OR, AND, and XOR operators. We need to be able to
>>> shift
>>> bits left and right. So much high performance code depends on bit
>>> shift
>>> operations.
>>
>> I've signed on the shifting request but your "high performance code"
>> remark gave me pause. I keep telling myself Rb is "fast enough" but
>> just the other day I realized "fast enough" isn't always fast
>> enough...
>>
>> I have a plugin that provides a Carmack style square root
>> approximation. The C code looks like this (plus wrappers for the Rb
>> global methods):
>>
>> static inline float _FastInvSqrtf (float x)
>> {
>> float xhalf = 0.5f*x;
>> int i = *(int*)&x;
>> i = 0x5f375a86 - (i>>1);
>> x = *(float*)&i;
>> x = x*(1.5f-xhalf*x*x);
>> return x;
>> }
>>
>> static inline float _FastSqrtf (float x)
>> {
>> return x * _FastInvSqrtf(x);
>> }
>>
>> This is pretty quick when called from C code - around 4 times faster
>> than a real sqrt - but Rb plugin function calls aren't exactly light
>> weight. Within Rb it's still about twice as fast as the built-in
>> sqrt, but if you declare out to the system supplied sqrt (to avoid
>> Rb's wrapper function) the real sqrt is just a hair slower than this
>> approximation - the plugin function call overhead basically kills
>> much of the benefit.
>>
>> So, for fun, I implemented the approximation in pure RB code. I knew
>> the bit shift would be a problem but I was curious as to how Rb would
>> handle it - here's what I ended up with:
>>
>> Function FastSqrt(x As Single) As Single
>> #pragma BackgroundTasks False
>> #pragma BoundsChecking False
>> #pragma NilObjectChecking False
>> #pragma StackOverflowChecking False
>> Static m As New MemoryBlock(4)
>> Static p As Ptr = m
>> Dim i As Integer
>> Dim y As Single
>> p.Single(0) = x
>> i = p.Integer(0)
>> i = &h5f375a86 - Bitwise.ShiftRight(i,1)
>> p.Integer(0) = i
>> y = p.Single(0)
>> y = y*(1.5-0.5*x*y*y)
>> return x * y
>> End Function
>>
>> That runs at about 2X slower than Rb's built-in sqrt, and 3X slower
>> than the plugin approximation or sqrt declare (and yes, all profiling
>> was done in a release build). Seeing as this was academic at this
>> point I commented out the bit shifting line to discount the function
>> call overhead - it was still over twice as slow as the plugin method
>> despite doing less work and spitting out gibberish.
>>
>> The moral of this story: If you want high-performance with large
>> vectors, stick the math *and* the loop into a plugin. More operators
>> are a start, but Rb really needs core optimizations - simple math and
>> pointer dereferencing is still WAY slower than equivalent C code.
>
> I can only concur. I tend to put any C code I need into a dylib
> instead of a
> plugin for so many reasons. Speed and also the drudgery of the plugin
> wrapper code. It would help if RB at least allowed us to place dynamic
> libraries into the plugins folder and then built those libraries
> directly
> into the final application package.
>
>> Perhaps the fact that Rb doesn't really like 4-byte floats has
>> something to do with this, but that's just another excuse.
>
> Whats the issue with 4 byte floats ? This is news to me - ( but
> then a lot
> of such things are.. )
>
> If RS wants RB to gain wider acceptance as more than just a quick
> RAD IT
> tool for knocking up cross-platform GUI applications they need to
> sort out
> these core performance issues once and for all. It will help to
> attract a
> new user base such as the scientific and engineering community.
You write as if a quick RAD IT tool for knocking up cross-platform
GUI applications is not so much.
>
> RB has taken a step towards allowing us to code for better
> performance -
> notedly by adding Struct and Ptr datatypes. But its sad to see they
> haven't
> refined it more. This is "bootstrap" stuff. The more they can get
> these
> basics right the more they can move their internal code over to RB
> which
> benefits us all. I realise that most RB guys out there might not
> need this
> fine tuning, but every time we have to dip into C to eke out
> performance or
> functionality it just reinforces the impression that RB is not really
> "pro-grade".
Only among the stupid. To write high-performance code, you need to
be close to the machine (C, for example) or use a language optimized
for the task (Fortran).
Charles Yeomans
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>
|