On 28-Sep-07, at 5:35 PM, Daniel Stenning wrote:
> We have Bitwise OR, AND, and XOR operators. We need to be able to
> shift
> bits left and right. So much high performance code depends on bit
> shift
> operations.
I've signed on the shifting request but your "high performance code"
remark gave me pause. I keep telling myself Rb is "fast enough" but
just the other day I realized "fast enough" isn't always fast enough...
I have a plugin that provides a Carmack style square root
approximation. The C code looks like this (plus wrappers for the Rb
global methods):
static inline float _FastInvSqrtf (float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f375a86 - (i>>1);
x = *(float*)&i;
x = x*(1.5f-xhalf*x*x);
return x;
}
static inline float _FastSqrtf (float x)
{
return x * _FastInvSqrtf(x);
}
This is pretty quick when called from C code - around 4 times faster
than a real sqrt - but Rb plugin function calls aren't exactly light
weight. Within Rb it's still about twice as fast as the built-in
sqrt, but if you declare out to the system supplied sqrt (to avoid
Rb's wrapper function) the real sqrt is just a hair slower than this
approximation - the plugin function call overhead basically kills
much of the benefit.
So, for fun, I implemented the approximation in pure RB code. I knew
the bit shift would be a problem but I was curious as to how Rb would
handle it - here's what I ended up with:
Function FastSqrt(x As Single) As Single
#pragma BackgroundTasks False
#pragma BoundsChecking False
#pragma NilObjectChecking False
#pragma StackOverflowChecking False
Static m As New MemoryBlock(4)
Static p As Ptr = m
Dim i As Integer
Dim y As Single
p.Single(0) = x
i = p.Integer(0)
i = &h5f375a86 - Bitwise.ShiftRight(i,1)
p.Integer(0) = i
y = p.Single(0)
y = y*(1.5-0.5*x*y*y)
return x * y
End Function
That runs at about 2X slower than Rb's built-in sqrt, and 3X slower
than the plugin approximation or sqrt declare (and yes, all profiling
was done in a release build). Seeing as this was academic at this
point I commented out the bit shifting line to discount the function
call overhead - it was still over twice as slow as the plugin method
despite doing less work and spitting out gibberish.
The moral of this story: If you want high-performance with large
vectors, stick the math *and* the loop into a plugin. More operators
are a start, but Rb really needs core optimizations - simple math and
pointer dereferencing is still WAY slower than equivalent C code.
Perhaps the fact that Rb doesn't really like 4-byte floats has
something to do with this, but that's just another excuse.
Frank.
<http://developer.chaoticbox.com/>
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>
|