PS2

Our main target is the PS2. Up to this point I have mainly been making algorithmic optimizations. We use SIMD in our math library and this helps with performance and code size. However, as you may know, data cache misses are killer on the PS2. So getting the best performance requires drastic measures.

For a stack of 15 boxes arranged in a pyramid shape, the cost is 45% collision and 55% physics. As I described in my GDC paper, the heart of the physics solver is the Projected Gauss-Seidel (PGS) constraint solver. This takes about 65% of the physics cost, which is roughly 36% of the total. So, by Amdahl’s Law, if I can double the PGS speed, I’ll get an overall boost of about 20%. Not tremendous, but it’s a good start.
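
To make the Amdahl's Law estimate concrete, here is a small sketch of the arithmetic using the percentages quoted above. The function name `amdahl_speedup` and the variable names are just illustrative choices, not anything from the engine.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the total time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

physics_share = 0.55  # physics share of the total cost (from the post)
pgs_share = 0.65      # PGS share of the physics cost (from the post)

# Fraction of the *total* cost spent in the PGS solver.
pgs_fraction = physics_share * pgs_share

# Doubling the PGS speed means factor = 2.
overall = amdahl_speedup(pgs_fraction, 2.0)

print(f"PGS fraction of total cost: {pgs_fraction:.4f}")
print(f"Overall speedup: {overall:.3f}x")
print(f"Total time saved: {(1.0 - 1.0 / overall) * 100:.1f}%")
```

With these numbers the solver accounts for about 36% of the total cost, so doubling its speed gives an overall speedup of roughly 1.22x, in line with the "about 20%" figure.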

The PGS algorithm suffers from data cache misses. These can be avoided by double buffering with the scratch pad. Another boost can be had by double buffering on the vector unit registers. Basically, the vector unit solves one row of the constraint equations at a time. While one row is being solved, the CPU is loading the next row into an alternate set of registers.
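
For readers who have not seen it, here is a minimal sketch of a projected Gauss-Seidel iteration in plain Python. It is the generic textbook form (solve `A x = b` row by row, clamping each unknown to its bounds), not the PS2 implementation described above; the names `A`, `b`, `lo`, `hi` and the small test system are assumptions made up for illustration. The inner loop over one row is the part that the vector unit would work on while the next row is being fetched.

```python
def projected_gauss_seidel(A, b, lo, hi, iterations=20):
    """Approximately solve A x = b subject to lo[i] <= x[i] <= hi[i].

    A is a square matrix (list of rows) with a nonzero diagonal.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):  # one constraint row at a time
            row = A[i]
            # Right-hand side minus the off-diagonal contributions,
            # using the most recently updated values of x.
            s = b[i]
            for j in range(n):
                if j != i:
                    s -= row[j] * x[j]
            # Gauss-Seidel update, then projection onto the bounds.
            x[i] = min(max(s / row[i], lo[i]), hi[i])
    return x

# Hypothetical 2x2 system where the first unknown hits its lower bound.
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [-1.0, 2.0]
lo = [0.0, 0.0]
hi = [float("inf"), float("inf")]

x = projected_gauss_seidel(A, b, lo, hi)
print(x)
```

Each row update reads the whole current solution vector plus one row of data, which is why the access pattern matters so much when the rows do not fit in cache.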

Cool stuff. It’s fun programming to the metal. I know I’m not treading on new ground here, as Richard Tonge has written about these kinds of optimizations, which he did for the PS2 version of MathEngine.

My question to all the physics heads out there: do optimizations on other platforms, such as PC or XBox, go to such extremes? Are the days of programming to the metal fading? Will fat L2 caches save us?

Another question: do you know the typical cost in milliseconds of an 11 body ragdoll on the PS2?

6 Responses to “PS2”

  1. Richard Says:

    I can’t really speak for PC or Xbox but I think things are going to get a whole lot more complicated on PS3 and Xbox360. Getting the best out of PS3 especially seems like it’ll require quite a bit of rethinking for most engines.

  2. Erwin Coumans Says:

    In my experience PS2 physics and collision programming is much more complicated than development for PS3.

    It sounds like you haven’t been doing any PS3 work yourself, so what is your source, Richard?

    Erwin Coumans
    Sony Computer Entertainment

  3. Erin Catto Says:

    The conventional wisdom is that it is difficult to find the best way to break up the physics calculations into multiple threads. Certainly some calculations, such as object-object collision and island time stepping, can easily be threaded.

    However, some things, such as island generation and sweep-and-prune, remain serial. These serial portions of the pipeline could end up being the bottleneck.

  4. Erwin Coumans Says:

    I lack that wisdom, unfortunately ;-)

    You state that sweep and prune has to remain serial. Why is that?
    Given that the atomic action would be a swap on one axis, why can’t multiple threads all update the position of their object?
    And if your island generation uses union find, why would “Wait-free Parallel Algorithms for the Union-Find Problem” not work?

    OK, I was a bit disturbed by the Xbox 360/PS3 comparison.
    I don’t see how speeding up physics on multiple cores that all share the same cache is easier than splitting your code and data to work within 256 KB of local memory for each SPU. Breaking physics calculations into multiple threads has to be done anyhow. Even on PS2, if you want to make efficient use of the VU0, you need to think about how you spend your main CPU time during VU0 execution.

  5. Erin Catto Says:

    Well, I think there are several obvious ways to split up the physics pipeline. But don’t tell my employer that. :)

    I don’t have any opinion yet on the whole XBox 360 vs PS3 thing. So I’ll duck that one.

    Any writable data structure that is shared, such as SAP, must be handled with care in a multiprocessor environment. Maybe there are nice solutions to the problem, but they have to be researched and implemented. So there is significant work to be done to get the most out of a multiprocessor environment. However, a well-structured engine should adapt well.

  6. Richard Says:

    I’ve not had much exposure to the PS3, but from what I picked up at GDC it doesn’t seem like splitting the problem up is going to be the tough part.

    Efficiently serializing your data to pass through and out of the SPUs seems like it’ll be the biggest area of change.

    How that’ll compare to PS2 work I’m not sure but it’s going to take a while to figure out.
