On Mar 25, 2006, at 3:42 PM, Frank Condello wrote:
On 25-Mar-06, at 1:07 PM, Mike Woodworth wrote:
On Mar 25, 2006, at 5:19 AM, Frank Condello wrote:
On 25-Mar-06, at 4:07 AM, Mike Woodworth wrote:
i have a vector class that wraps vDSP calls on the mac. i'm
trying to finish up a few small things on the vImage side... but
I can prolly get the vDSP math stuff up on my website this weekend.
Looking forward to seeing it :) but I wonder how useful that'll
be for us 3D folks. I'll admit I'm not up to speed with the
Accelerate framework (pun intended) but padding out 3D vectors to
16 byte boundaries is usually more of a headache than doing the
scalar math yourself. That's not to say there aren't other areas
where this would be handy...
i donno, we'll have to see. i admit, i've done very little with
rb3d, so you may be right. but i've found vdsp to be drastically
faster than math in a loop, and not that much harder to work with
(besides the need to restructure your loop logic sometimes).
perhaps, you can send me a sample of this math done as scalar
operations, and i'll port it to my class? then we can see if
there's any advantage.
Lo's example should do:
newvec = veca + (vecb - veca) * scalar
Where a "vec" is an array of 3D vectors; 12 bytes, 3 floats per
member. To get this on a vector unit you'd need to pad out the
members to 16 bytes, so there's a few options.
1) Use altivec/SIMD instructions directly in a loop and pad each
vector as it comes, repacking each one on the way out.
2) Pad the arrays as a whole, pass'em to vDSP, then repack'em.
3) Only use 4D vectors and vDSP functions (just ensure the "W"
component is 0).
Option #1 is a bit of work, and you'd need support code to check
for the presence of vector units, and have scalar fallbacks etc.
Option #2 may be less efficient than just using scalar code in the
first place, and option #3 is only appropriate if the 3D API can
handle it - this may not be of much use with Rb3D/Quesa but might
work out if you're using OpenGL directly for example.
I gotta admit I'm getting in a little over my head here :)
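For reference, option #2 above could be sketched roughly like this in C. The helper names (pad3to4, repack4to3, lerp4) are made up for illustration and aren't part of vDSP or any other library, and the plain loop in lerp4 only stands in for whatever vector-unit call would do the real work on the padded data. It shows the extra copies that padding and repacking cost; it doesn't say anything about actual timings.

```c
#include <stddef.h>

/* Hypothetical helper: copy n packed 3-float vectors (12 bytes each)
   into 4-float slots (16 bytes each), setting the extra W lane to 0. */
void pad3to4(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[4 * i + 0] = src[3 * i + 0];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 2];
        dst[4 * i + 3] = 0.0f;
    }
}

/* Hypothetical helper: drop the W lane and repack to 3 floats per vector. */
void repack4to3(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[3 * i + 0] = src[4 * i + 0];
        dst[3 * i + 1] = src[4 * i + 1];
        dst[3 * i + 2] = src[4 * i + 2];
    }
}

/* Stand-in for the vectorized step: out = a + (b - a) * t over n padded
   vectors. A scalar loop is used here only so the sketch is portable. */
void lerp4(const float *a, const float *b, float t, float *out, size_t n)
{
    for (size_t i = 0; i < 4 * n; i++) {
        out[i] = a[i] + (b[i] - a[i]) * t;
    }
}
```

With the W lanes set to 0 on both inputs, the interpolated W lanes stay 0 as well, which is what option #3 relies on if the data is kept 4D all the way through.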
so all 3 floats have the same operation performed? so veca
{xa1,ya1,za1,xa2,ya2,za2,...} + vecb{xb1,yb1,zb1,xb2,yb2,zb2,...} =
{xa1+xb1,ya1+yb1,za1+zb1,xa2+xb2,ya2+yb2,za2+zb2,...} ?
if so, option 2 will likely be *much* faster than scalar math in rb
loops. sure, it might not be much faster than a plugin, but it will
still likely be on par. my time tests on similar vectorized code
usually show a 10x speed increase.
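to spell out what "the same operation on all 3 floats" would mean: if that holds, an array of N 3D vectors can be treated as one flat run of 3*N floats, and newvec = veca + (vecb - veca) * scalar becomes a single element-wise pass. a rough sketch in plain C follows; lerp_flat is a made-up name, and the scalar loop is only a placeholder for a vectorized routine. whether that routine also needs the data padded or aligned is the open question in this thread, and the sketch doesn't settle it.

```c
#include <stddef.h>

/* Hypothetical helper: treat nvec packed 3D vectors as 3 * nvec floats
   and apply newvec = veca + (vecb - veca) * scalar to every element. */
void lerp_flat(const float *veca, const float *vecb, float scalar,
               float *newvec, size_t nvec)
{
    size_t count = 3 * nvec; /* x, y, z of every vector, back to back */
    for (size_t i = 0; i < count; i++) {
        newvec[i] = veca[i] + (vecb[i] - veca[i]) * scalar;
    }
}
```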
mike
--
Mike Woodworth
mike at divergentmedia dot com