Feature Proposals/BulletSim OpenCL
| Basic Information | |
|---|---|
| Proposal Name | BulletSim GPU Acceleration Using OpenCL |
| Date Proposed | September 10, 2013 |
| Status | Draft |
| Proposer | User:Allquixotic |
Introduction and Problem Statement
Right now, physical object movement and collision consume significant CPU time in OpenSimulator. Several popular sims have either disabled physics completely or restricted its use to the bare minimum. Deployments that want to use many physical objects, collide them together in interesting ways, etc. will quickly peg the CPU, which lowers simulator FPS, introduces time dilation, and increases network latency. This is observable even on high-end, current-generation single-processor systems (e.g. a Core i7 4770K), on any platform. Even on hardware that can easily handle a large number of objects, this problem negatively impacts region "density" (how many regions you can fit on a single server).
Physics-based simulations and interactions may continue to be optimized on the CPU by switching from the OpenDynamicsEngine (ODE) physics backend to the Bullet Physics Engine, because Bullet is significantly faster and better optimized than ODE. However, even Bullet has an upper limit of capabilities on the CPU. Furthermore, there is a tradeoff to be made between accuracy/precision and speed, as depicted in the following table. The point is that by raising our computational ceiling, we can achieve better precision, larger scale (more, or more complex, physical objects), or some tradeoff improving both aspects to a lesser degree. The other point is that GPU acceleration is currently the most cost-effective way to raise the performance ceiling (can you afford a supercomputer?).
Caveat: I am well aware that the estimates of scale below for the number of prims are potentially inaccurate by several orders of magnitude in either direction. These are rough estimates. I am also aware that different types of physics interactions and different shapes have vastly different computational costs; for instance, it is much easier to calculate the collision between two cubes than between two tori. The general point, however, remains valid.
Physics Tradeoffs

| Precision | # Physical Prims | Performance On Commodity Hardware | Hardware Class For Acceptable Performance |
|---|---|---|---|
| Baseline: Yay, we have physics! ...somewhat. | | | |
| Poor | Few (10s) | Excellent | Laptop, netbook, or embedded |
| Typical (CPU-only BulletSim or ODE on desktops, single-CPU servers, etc.) | | | |
| Good | Few (10s) | Acceptable | Commodity desktop or small server |
| Poor | Many (hundreds or thousands) | Acceptable | Commodity desktop or small server |
| Not currently possible for most individuals and small businesses | | | |
| Good | Many (hundreds or thousands) | Poor | Large server (multi-CPU) OR GPU-accelerated |
| Poor | Hundreds of thousands or millions | Poor | Large server (multi-CPU) OR GPU-accelerated |
| Possible in the future with GPU acceleration | | | |
| Extreme | Many (hundreds or thousands) | Awful | High-end GPU or multiple GPUs; impractical with non-supercomputer CPUs |
| Good | Hundreds of thousands or millions | Awful | High-end GPU or multiple GPUs; impractical with non-supercomputer CPUs |
Proposal
Since we already have a physics backend that uses the Bullet Physics Engine, and since Bullet upstream is itself developing a GPU-accelerated physics pipeline, "all" we have to do is take advantage of that pipeline in our code. A successful implementation should see reduced CPU usage, the possibility of increased region density, or the ability to lift restrictions on tenants such as "make sure you have no more than 10 physical objects at any time."
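As a rough illustration of what the integration point might look like on the native (C++) side of BulletSim, here is a minimal sketch that uses only the standard OpenCL 1.1 host API to pick a GPU device and create the context and command queue that a GPU physics pipeline would consume. The Bullet 3.x preview classes that would actually receive these handles (for example, b3GpuNarrowPhase and b3GpuRigidBodyPipeline in the bullet3 repository) are only mentioned in comments, because their exact constructor signatures should be taken from the Bullet GPU demos rather than assumed here.

```cpp
// Sketch: select an OpenCL GPU device and create the context and command queue
// that a GPU-accelerated Bullet pipeline would consume. Error handling is minimal.
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(8, platforms, &numPlatforms) != CL_SUCCESS || numPlatforms == 0) {
        std::fprintf(stderr, "No OpenCL platforms found; stay on the CPU physics pipeline.\n");
        return 1;
    }
    if (numPlatforms > 8) numPlatforms = 8;

    // Take the first platform that exposes a GPU device.
    cl_device_id device = nullptr;
    for (cl_uint i = 0; i < numPlatforms && !device; ++i)
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 1, &device, nullptr) != CL_SUCCESS)
            device = nullptr;
    if (!device) {
        std::fprintf(stderr, "No OpenCL GPU device found; stay on the CPU physics pipeline.\n");
        return 1;
    }

    cl_int err = CL_SUCCESS;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    // In a real BulletSim integration, ctx/device/queue would be handed to the
    // Bullet 3.x preview GPU classes (e.g. b3GpuNarrowPhase and
    // b3GpuRigidBodyPipeline in the bullet3 repository). Their constructor
    // arguments should be copied from the Bullet GPU demos, not assumed.

    char name[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, nullptr);
    std::printf("GPU physics would run on: %s\n", name);

    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

Whether this device selection lives in the unmanaged BulletSim glue library, and whether it is exposed through an OpenSim.ini setting, are exactly the kinds of questions raised under Brainstorming below.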
General Observations
- In recent years, both Intel and AMD have started shipping desktop and server CPUs which contain an IGPU (Integrated Graphics Processing Unit).
- The capabilities of IGPUs have been improving rapidly -- in fact, much faster than the CPU portion of the same chip. GPU performance is still roughly following Moore's Law, while CPU performance gains are leveling off dramatically.
- Many computers running OpenSimulator, whether a spare desktop in someone's home or a dedicated server in a datacenter, have either an IGPU or a DGPU (Discrete Graphics Processing Unit) which is, to a greater or lesser extent, underutilized -- meaning that the resources are available, but are sitting entirely or mostly idle.
- The state of the art of graphics drivers has advanced significantly, to the point that IGPUs and DGPUs from AMD, Intel, and Nvidia have OpenCL 1.1 implementations available on all major platforms (Windows, Mac, and Linux).
- The Bullet Physics Engine development community is gradually shifting its focus toward improving and optimizing Bullet for GPUs, with more and more physics operations accelerated on the GPU rather than the CPU.
- While Bullet supports DirectX and CUDA to various extents (these are also APIs for accessing the GPU), it also supports OpenCL. OpenCL is arguably the only industry-standard, general-purpose GPU computing framework available on all the platforms OpenSimulator officially supports (Windows, Mac, and Linux) and on hardware from all the major GPU vendors (Intel, AMD, and Nvidia).
Conclusion
Enabling GPU-accelerated Bullet physics via OpenCL is the obvious path forward to unlocking the next level of scalability and/or precision of physics simulations in OpenSimulator.
Scope of Work
Brainstorming
If you can answer any of these questions, please edit this page and fill in an answer beneath each question!
- Let's assume that we're going to target the latest Bullet code from version control, and evolve our code to match as Bullet evolves. We should do this at least until Bullet 3.0 is released as stable, because major improvements to the OpenCL rigid body pipeline are available in git that are not in the latest stable release as of this writing.
- What parts of Bullet are currently GPU-accelerated in the 3.x preview codebase?
- Of the parts of Bullet that are GPU-accelerated, does OpenSimulator use any of them?
- What configuration or API usage changes are required of OpenSimulator's use of Bullet in order to use the GPU-accelerated features?
    - Can we simply enable Bullet's GPU acceleration by changing a configuration setting, or an initialization flag?
    - Do we have to use entirely new classes in the Bullet C++ API to use the GPU-accelerated features?
    - Do we have to change our entire approach to using Bullet in the BulletSim physics backend to use the GPU-accelerated features?
- What hardware and software configurations should we test on?
- What is the minimum hardware specification that would actually yield a performance improvement over the CPU pipeline?
- Even if we successfully accelerate Bullet to a high degree, are we still bottlenecked by disk, memory bandwidth, locking primitives in OpenSimulator, or the .NET runtime, inhibiting our scalability past a certain point? If so, how far can we go before we hit this wall? Does GPU acceleration buy us anything, or are we already against that wall with the capabilities of CPU-based physics? (A baseline timing sketch follows this list.)
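For the last question in particular, it is worth measuring a CPU-only baseline with the stable Bullet 2.x API before any GPU work begins. The sketch below is one way to do that; the scene, body count, and step count are arbitrary choices for illustration, not recommendations. Comparing the measured per-step cost on target hardware against the simulator's frame budget gives a first indication of how close CPU-based physics already is to the wall.

```cpp
// Sketch: time a fixed number of simulation steps with the stable Bullet 2.x
// CPU pipeline, to establish a baseline before comparing against a GPU build.
#include <btBulletDynamicsCommon.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // Standard CPU-side Bullet world setup.
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btSequentialImpulseConstraintSolver solver;
    btDiscreteDynamicsWorld world(&dispatcher, &broadphase, &solver, &config);
    world.setGravity(btVector3(0, -9.8f, 0));

    // Static ground plane at y = 0.
    btStaticPlaneShape groundShape(btVector3(0, 1, 0), 0);
    btDefaultMotionState groundState;
    btRigidBody ground(0.0f, &groundState, &groundShape);
    world.addRigidBody(&ground);

    // Drop N unit boxes in a loose grid; N is an arbitrary test size.
    const int N = 2000;
    btBoxShape boxShape(btVector3(0.5f, 0.5f, 0.5f));
    btVector3 inertia(0, 0, 0);
    boxShape.calculateLocalInertia(1.0f, inertia);
    std::vector<btDefaultMotionState*> states;
    std::vector<btRigidBody*> bodies;
    for (int i = 0; i < N; ++i) {
        btTransform t;
        t.setIdentity();
        t.setOrigin(btVector3((i % 50) * 1.1f, 2.0f + (i / 50) * 1.1f, (i % 7) * 1.1f));
        states.push_back(new btDefaultMotionState(t));
        bodies.push_back(new btRigidBody(1.0f, states.back(), &boxShape, inertia));
        world.addRigidBody(bodies.back());
    }

    // Time 600 fixed steps at 60 Hz (10 simulated seconds).
    const int steps = 600;
    auto start = std::chrono::steady_clock::now();
    for (int s = 0; s < steps; ++s)
        world.stepSimulation(1.0f / 60.0f, 1);
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("%d bodies, %d steps: %.1f ms total, %.3f ms/step\n",
                N, steps, ms, ms / steps);

    // Remove the stack-allocated body before it goes out of scope; teardown of
    // the heap-allocated bodies is omitted to keep the sketch short.
    world.removeRigidBody(&ground);
    return 0;
}
```

If the per-step time already exceeds the frame budget at prim counts a region realistically needs, the CPU wall is real and GPU acceleration has something to buy us; if not, the bottleneck probably lies elsewhere.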
GPU OpenCL Support
The table below describes the current state of GPU-accelerated OpenCL support on various platforms and hardware (a runtime check sketch follows the table). For its purposes, I arbitrarily define the following:
- "Windows" as "Windows Vista SP2 or later, 32 or 64-bit";
- "Mac" as "Mac OS X 10.7 or later";
- "Linux Proprietary" as "Linux 2.6.32 or later on RHEL 6+, Debian 6+, Ubuntu 12.04+, Fedora 17+, OpenSUSE 12.1+, with a closed-source vendor-supplied GPU kernel module";
- "Linux FOSS" as "Linux 3.10 or later on the latest stable Fedora, Ubuntu, or OpenSUSE, or Debian Testing, with an open-source, Mesa or Gallium3d graphics driver".
| Vendor | Minimum Hardware Generation | Windows | Mac | Linux Proprietary | Linux FOSS |
|---|---|---|---|---|---|
| AMD | Radeon HD5000 or later; FirePro W600 or later | Fully Supported | Fully Supported | Fully Supported | Experimental |
| Nvidia | GeForce 400 Series or later; Quadro 400 Series or later; Tesla | Fully Supported | Fully Supported | Fully Supported | Experimental |
| Intel | "Ivy Bridge" Desktop/Mobile (Core i3/i5/i7 3xxx) or later; "Ivy Bridge" Server (Xeon 1275v2) or later; "Knights Corner" (Xeon Phi) or later; excluding enthusiast and multi-CPU chips (Ivy-E, etc.) | Fully Supported | Fully Supported | N/A | Experimental |
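The table above reflects driver support at the time of writing; the simplest way to confirm what a particular machine actually exposes is to ask the OpenCL runtime directly. Below is a minimal sketch (standard OpenCL 1.1 host API only, no Bullet dependencies, compiled and linked against a vendor OpenCL SDK) that lists every installed platform and device along with the OpenCL version each reports.

```cpp
// Sketch: survey every OpenCL platform and device on a machine and report the
// version strings, to check which row of the support table it actually matches.
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(8, platforms, &numPlatforms);
    if (numPlatforms > 8) numPlatforms = 8;
    if (numPlatforms == 0) {
        std::printf("No OpenCL platforms installed.\n");
        return 0;
    }
    for (cl_uint p = 0; p < numPlatforms; ++p) {
        char pname[256] = {0}, pver[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, nullptr);
        clGetPlatformInfo(platforms[p], CL_PLATFORM_VERSION, sizeof(pver), pver, nullptr);
        std::printf("Platform: %s (%s)\n", pname, pver);

        cl_device_id devices[8];
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &numDevices);
        if (numDevices > 8) numDevices = 8;
        for (cl_uint d = 0; d < numDevices; ++d) {
            char dname[256] = {0}, dver[256] = {0};
            cl_device_type type = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, nullptr);
            clGetDeviceInfo(devices[d], CL_DEVICE_VERSION, sizeof(dver), dver, nullptr);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE, sizeof(type), &type, nullptr);
            std::printf("  %s device: %s (%s)\n",
                        (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "non-GPU", dname, dver);
        }
    }
    return 0;
}
```

If no GPU device shows up, or the reported version is below OpenCL 1.1, the machine falls outside the support matrix above and BulletSim would need to stay on the CPU pipeline there.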