PBD performance

Hi,

I would like to confirm the performance I am seeing when running the PBD examples. From the literature, I expected PBD to run fast. I understand that the implementation in iMSTK runs on the CPU, whereas the PBD papers usually report GPU results.
For example, PBD Collision Multiple Objects runs at ~7 fps, PBD One Dragon at ~30 fps, and PBD Stairs at ~17 fps. Is this the expected performance?
I ran the examples on Windows, compiled in Release mode.

Is there any way to run PBD faster?

Thanks

Those numbers look right. Note that there are different variants of the PBD solver. The one in iMSTK is a non-linear Gauss-Seidel solver (the projection of the current constraint depends on the projections of the constraints it depends on). To parallelize this Gauss-Seidel-type solver, we use a coloring approach: we color the constraints, and all constraints of a given color are solved in parallel.
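
To make the coloring idea concrete, here is a minimal C++ sketch of a greedy constraint coloring. It is not the iMSTK implementation: the `Constraint` alias, the `colorConstraints` name, and the assumption that two constraints conflict exactly when they share a particle are all illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation: a constraint is just the two particle
// indices it touches. Real constraints may touch more particles.
using Constraint = std::pair<std::size_t, std::size_t>;

// Greedy coloring (illustrative, not iMSTK code): constraints that share
// a particle never receive the same color. Returns one color per constraint.
std::vector<int> colorConstraints(const std::vector<Constraint>& constraints,
                                  std::size_t numParticles)
{
    // colorsAtParticle[p] holds the colors already used by constraints touching p.
    std::vector<std::vector<int>> colorsAtParticle(numParticles);
    std::vector<int> color(constraints.size(), -1);

    for (std::size_t c = 0; c < constraints.size(); ++c)
    {
        const std::size_t a = constraints[c].first;
        const std::size_t b = constraints[c].second;
        assert(a < numParticles && b < numParticles);

        // True if the candidate color is already taken at either particle.
        auto isUsed = [&](int candidate) {
            const std::vector<int>& ca = colorsAtParticle[a];
            const std::vector<int>& cb = colorsAtParticle[b];
            return std::find(ca.begin(), ca.end(), candidate) != ca.end()
                || std::find(cb.begin(), cb.end(), candidate) != cb.end();
        };

        // Pick the smallest color that is free at both particles.
        int candidate = 0;
        while (isUsed(candidate))
        {
            ++candidate;
        }

        color[c] = candidate;
        colorsAtParticle[a].push_back(candidate);
        colorsAtParticle[b].push_back(candidate);
    }
    return color;
}
```

For a chain of three constraints (0,1), (1,2), (2,3), this assigns colors 0, 1, 0: the first and last constraints touch disjoint particles and could be projected in parallel, while the middle one is solved in a separate pass.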

If you want a faster simulation, I would recommend a Jacobi variant (see: https://mmacklin.com/uppfrta_preprint.pdf), where every constraint is solved in parallel regardless of dependencies and the updates that each node receives are averaged. This variant is better suited to a GPU implementation. Even on the CPU, one should see a substantial speedup with a Jacobi-style solver. We are currently working on a GPU implementation. We are also very keen on adding a CPU version of the Jacobi variant, which should be quick.
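
As a rough illustration of the averaging step, here is a minimal C++ sketch of one Jacobi iteration. It assumes 1D positions, equal masses, and simple distance constraints; the `DistanceConstraint` struct and `jacobiIteration` function are hypothetical names, not iMSTK API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical distance constraint between two particles.
struct DistanceConstraint
{
    std::size_t i;      // first particle index
    std::size_t j;      // second particle index
    double restLength;  // target distance between the two particles
};

// One Jacobi iteration (illustrative sketch): every constraint reads the
// same input positions, corrections are accumulated per particle, and each
// particle then moves by the average of the corrections it received.
void jacobiIteration(std::vector<double>& x,
                     const std::vector<DistanceConstraint>& constraints)
{
    std::vector<double> delta(x.size(), 0.0);
    std::vector<int> count(x.size(), 0);

    // Serial here for clarity. No constraint reads another's result, so the
    // iterations are independent apart from the accumulation into delta/count.
    for (const DistanceConstraint& c : constraints)
    {
        assert(c.i < x.size() && c.j < x.size());
        const double d = x[c.j] - x[c.i];
        const double dist = std::fabs(d);
        if (dist == 0.0)
        {
            continue;  // direction undefined, skip this constraint
        }
        const double dir = d / dist;
        // Equal masses: each particle takes half of the length error.
        const double corr = 0.5 * (dist - c.restLength);
        delta[c.i] += corr * dir;
        delta[c.j] -= corr * dir;
        ++count[c.i];
        ++count[c.j];
    }

    // Apply the averaged correction to every particle that was touched.
    for (std::size_t p = 0; p < x.size(); ++p)
    {
        if (count[p] > 0)
        {
            x[p] += delta[p] / count[p];
        }
    }
}
```

With two particles at 0 and 2 and a rest length of 1, one iteration moves them to 0.5 and 1.5. In a truly parallel version, the accumulation into `delta` and `count` would need atomics or a per-particle gather, since several constraints can write to the same particle.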

Hope this helps.

Thanks for the fast answer.

Is the graph coloring technique used in parallel when running the examples? How do you choose the number of threads?

I look forward to the GPU implementation. I’ll keep an eye on the new iMSTK features.

Thanks

Yes, within each color the constraints are solved in parallel. You can set the number of threads in the application-wide thread pool by calling the ThreadManager::setThreadPoolSize(const size_t nThreads) function.

Let us know if you need help with anything else.


It is also worth mentioning that, by default, it will choose the “optimal” number of threads when spawning the various parallel loops, which is often just the number of cores on the system (or twice that with hyperthreading).
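
If you want to see what that number is likely to be on your machine, standard C++ can report the hardware thread count. This is a plain standard-library sketch, not an iMSTK call, and whether iMSTK arrives at exactly this value internally is an assumption; `defaultThreadCount` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Returns the number of hardware threads reported by the standard library.
// std::thread::hardware_concurrency() may return 0 when the value cannot be
// determined, so a caller-supplied fallback is used in that case.
std::size_t defaultThreadCount(std::size_t fallback = 1)
{
    assert(fallback > 0);
    const unsigned int n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<std::size_t>(n) : fallback;
}
```

The result counts logical cores, so it already includes hyperthreads. It could be used as a starting point when choosing a value for ThreadManager::setThreadPoolSize.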
