PS2
Our main target is the PS2. Up to this point I have mainly been making algorithmic optimizations. We use SIMD in our math library and this helps with performance and code size. However, as you may know, data cache misses are killer on the PS2. So getting the best performance requires drastic measures.
For a stack of 15 boxes arranged in a pyramid shape, the cost is 45% collision and 55% physics. As I described in my GDC paper, the heart of the physics solver is the Projected Gauss-Seidel (PGS) constraint solver. This takes about 65% of the physics cost, or roughly 36% of the total. So, by Amdahl’s Law, if I can double the PGS speed, I’ll cut the total cost by about 18%, an overall boost of about 20%. Not tremendous, but it’s a good start.
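To make the Amdahl's Law arithmetic concrete, here is a small sketch in C. The function names are made up for illustration; the only inputs are the percentages quoted above.

```c
#include <assert.h>

/* Fraction of the whole frame spent in one sub-stage, e.g. PGS inside physics. */
double stage_fraction(double parent_share, double share_of_parent)
{
    return parent_share * share_of_parent;
}

/* Amdahl's Law: overall speedup when a fraction p of the runtime
   is made s times faster and the rest is left unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With physics at 0.55 of the total and PGS at 0.65 of physics, stage_fraction(0.55, 0.65) gives about 0.36, and amdahl_speedup(0.36, 2.0) comes out a little over 1.2. That is where the "about 20%" figure comes from.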
The PGS algorithm suffers from data cache misses. These can be avoided by double buffering with the scratch pad. Another boost can be had by double buffering on the vector unit registers. Basically, the vector unit solves one row of the constraint equations at a time. While one row is being solved, the CPU is loading the next row into an alternate set of registers.
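For anyone who wants to see the shape of the double-buffering idea, here is a rough sketch in portable C. It is not my PS2 code: the dense row-major matrix, the MAX_N cap, and fetch_row are assumptions made for illustration, and fetch_row is just a memcpy standing in for an asynchronous DMA into the scratch pad. On real hardware the fetch of row i+1 would overlap with solving row i; here it only shows the ping-pong structure.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 16  /* hypothetical cap so both row buffers fit in fast local memory */

/* Stand-in for an asynchronous DMA into the scratch pad (here a plain copy). */
static void fetch_row(double *dst, const double *A, int n, int row)
{
    memcpy(dst, A + (size_t)row * n, (size_t)n * sizeof(double));
}

/* One Projected Gauss-Seidel sweep over A x = b with lo[i] <= x[i] <= hi[i].
   A is n-by-n, row-major, with nonzero diagonal entries. */
void pgs_sweep(const double *A, const double *b,
               const double *lo, const double *hi, double *x, int n)
{
    double buf[2][MAX_N];  /* two row buffers: one being solved, one being filled */
    int cur = 0;

    assert(n > 0 && n <= MAX_N);
    fetch_row(buf[cur], A, n, 0);

    for (int i = 0; i < n; ++i) {
        /* Kick off the fetch of the next row into the other buffer. */
        if (i + 1 < n)
            fetch_row(buf[cur ^ 1], A, n, i + 1);

        /* Solve row i for x[i] using the latest values of the other unknowns. */
        const double *row = buf[cur];
        double sum = b[i];
        for (int j = 0; j < n; ++j)
            if (j != i)
                sum -= row[j] * x[j];
        double xi = sum / row[i];

        /* Projection step: clamp to the constraint bounds. */
        if (xi < lo[i]) xi = lo[i];
        if (xi > hi[i]) xi = hi[i];
        x[i] = xi;

        cur ^= 1;  /* swap buffers */
    }
}
```

Calling pgs_sweep repeatedly on the same x is the iteration. The register-level double buffering on the vector unit follows the same pattern, just with register sets in place of the two row buffers.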
Cool stuff. It’s fun programming to the metal. I know I’m not treading on new ground here, as Richard Tonge has written about the optimizations of this kind that he did for the PS2 version of MathEngine.
My question to all the physics heads out there: do optimizations on other platforms, such as PC or Xbox, go to such extremes? Are the days of programming to the metal fading? Will fat L2 caches save us?
Another question: do you know the typical cost in milliseconds of an 11-body ragdoll on the PS2?