Increasing Performance in OpenSim
What is this wiki page for?
This page is a place to gather performance data to drive forward the effort of improving OpenSim's performance on all platforms. That means reducing CPU and memory usage, fixing memory leaks, increasing the number of avatars per region, and comparing the performance of different modules, physics engines, frameworks, etc. If you do any profiling, performance patches, bot stress testing or other work that aims to analyze or improve OpenSim's efficiency, please post your work and results. Try to be as scientific as possible: make it reproducible and comparable.
If instead you want to read about current tips for improving OpenSimulator performance, then please go to the Performance page.
Note - the information on here may be hopelessly outdated - I haven't had a chance to look through it yet (justincc).
Identified Performance Bottlenecks
Sending Packets Out
This seems to be the current bottleneck in OpenSimulator: its ability to send packets does not scale well to many users. With around 30 avatars (after the latest performance patches; it used to be 20) the PacketQueue, the OpenSimulator "tubes", gets clogged (old YouTube memories, but a perfect analogy). The big problem with using queues for packets is that network data tends to have a fixed expiration time, which does not stretch as OpenSimulator receives more and more requests. Eventually the queue is so full that a viewer's request expires before it reaches the front of the queue. The viewer then sends the request again and ignores the reply to the old request, IF it ever comes.
This is a cycle that only ends when all viewers starve, hit the ping timeout and get disconnected. You look at the console and see OpenSimulator desperately trying to catch up, but no exceptions. It turns out it is trying to catch up on packets, but never quite making it.
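The cycle described above can be reproduced with a toy model. The sketch below is not OpenSimulator code: the function name, the tick-based clock and all parameters are assumptions chosen for illustration. Every request is queued at tick 0, the server answers a fixed number of queue entries per tick in FIFO order, and a viewer re-issues any request that has waited for the timeout, which turns the older queue entry into wasted work.

```python
from collections import deque


def simulate(num_requests, capacity, timeout, ticks):
    """Toy FIFO reply queue (hypothetical model). Returns (useful, wasted)."""
    queue = deque()   # entries: (request_id, tick at which it was issued)
    latest = {}       # request_id -> tick of its most recent issue
    for rid in range(num_requests):   # every request arrives at tick 0
        queue.append((rid, 0))
        latest[rid] = 0

    useful = wasted = 0
    for tick in range(ticks):
        # Viewers re-issue anything that has waited `timeout` ticks.
        for rid, issued in list(latest.items()):
            if tick - issued >= timeout:
                latest[rid] = tick
                queue.append((rid, tick))
        # The server answers up to `capacity` entries, oldest first.
        for _ in range(min(capacity, len(queue))):
            rid, issued = queue.popleft()
            if latest.get(rid) == issued:
                useful += 1
                del latest[rid]   # satisfied, will not be re-issued
            else:
                wasted += 1       # the viewer already gave up on this one
    return useful, wasted


if __name__ == "__main__":
    print(simulate(100, capacity=30, timeout=5, ticks=50))  # backlog fits
    print(simulate(400, capacity=30, timeout=5, ticks=50))  # backlog too deep
```

In this model the first call answers all 100 requests with no waste. The second call answers 150 requests during the first five ticks and then spends every remaining tick on stale entries, so the useful count never rises again even though the server stays fully busy.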
Textures Are So Important?
OpenSimulator has some priority mechanisms to make sure important packets get sent, but they don't seem to work as well as it sounds. Viewers request textures aggressively and insistently. So when a NEW viewer (one with no cache) visits a region with a lot of textures, it receives all the NewPrim and NewAvatar announcements from OpenSimulator (which arrive in a bundle, very fast) and starts requesting every texture for those prims and avatars. Each request generates many ImagePackets in return (sometimes 10, sometimes 50 or more). When OpenSimulator reached 21 avatars, it simply couldn't handle all those texture requests and went into a cycle that led to all viewers timing out.
So it's obviously not worth letting all your clients die just because you can't satisfy their texture needs. A recent change introduced a separate timer for textures, which makes texture sending potentially slower (only potentially because, after all, fewer packets means more bandwidth and less sending time for each texture), but gives the server a breather to keep the clients alive.
The best idea would probably be a self-regulating queue. If it could monitor its own size and processing delay, it could start refusing non-essential packets, such as textures, clouds, etc., whenever it thinks they will affect its ability to respond in time to liveness checks from clients. Another idea (by sdague) is to stop serving NewPrim and NewAvatar packets to the clients so fast. Instead, send them slowly, at a rate that OpenSimulator can handle. This could also be auto-adjusted, which would allow very fast texture download for a single client and sane texture downloading for multiple clients.
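A minimal sketch of the self-regulating queue idea, in Python for brevity. Everything here is an assumption for illustration: the class name, the packet kinds listed as essential, and the idea of estimating delay as queue length divided by a measured drain rate are not taken from the OpenSimulator code base.

```python
from collections import deque

# Hypothetical packet kinds that must never be refused.
ESSENTIAL_KINDS = {"ping", "ack", "agent_update"}


class SelfRegulatingQueue:
    """Outgoing queue that sheds non-essential packets when it falls behind."""

    def __init__(self, drain_rate, max_delay):
        self.drain_rate = drain_rate   # packets sent per second (assumed measured)
        self.max_delay = max_delay     # seconds of backlog we are willing to carry
        self._queue = deque()

    def __len__(self):
        return len(self._queue)

    def estimated_delay(self):
        """Seconds a packet enqueued now would wait before being sent."""
        return len(self._queue) / self.drain_rate

    def offer(self, kind, payload):
        """Enqueue a packet; return False if it was refused."""
        overloaded = self.estimated_delay() >= self.max_delay
        if overloaded and kind not in ESSENTIAL_KINDS:
            return False               # refuse textures, clouds, etc.
        self._queue.append((kind, payload))
        return True

    def take(self):
        """Return the next packet to send, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


if __name__ == "__main__":
    q = SelfRegulatingQueue(drain_rate=10, max_delay=1.0)
    accepted = sum(q.offer("texture", i) for i in range(15))
    print(accepted, q.offer("ping", None), len(q))
```

With a drain rate of 10 packets per second and a one second budget, the queue accepts ten texture packets, refuses the next five, and still accepts the ping. A real implementation would also have to decide what the refused sender does next, which this sketch leaves open.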
One of the most CPU-consuming jobs in OpenSimulator is the physics engine. It's still way behind the OutPacket bottleneck, but it would be the next thing to work on. I (Arthur V.) ran a few tests with bots colliding with each other and with objects inworld. The tests were run with 10 bots for 5 minutes on a standalone region running on my own machine, with the whc scheme (walk around a center) and no scripts (scripts influence CPU usage quite a lot at the start of the OpenSim instance).
prof counts: total/unmanaged: 26571/19250
3224 12.13 %
2915 10.97 % mono(mono_struct_delete_old
2777 10.45 % /lib/tls/i686/cmov/librt.so.1(clock_gettime
2211 8.32 % mono(mono_type_to_unmanaged
1337 5.03 % libode.so
902 3.39 % OpenSim.Region.Framework.Scenes.TerrainUtil:Noise (double,double)
802 3.02 % System.Threading.Timer:SchedulerThread ()
684 2.57 % mono(GC_mark_from
Looking at this profile dump, you will see libode reached a count of 1337 (l33t?) hits, 5% of the total hits for this run, followed by TerrainUtil:Noise with 3% (this seems to be present in most of the profiles). libode was the highest OpenSimulator-related library, behind only garbage collection, mono and librt (clock_gettime? Not sure what it is, probably clock methods such as Environment.TickCount).
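The percentages in the dump are simply each entry's hit count divided by the total hit count. As a small sketch (the function names are hypothetical and the line format is inferred from the dump above, not from any profiler documentation), one entry can be parsed and its share recomputed like this:

```python
import re

# Assumed entry format, inferred from the dump: "<hits> <percent> % <name>"
PROF_LINE = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?)\s*%\s*(.*)$")


def parse_prof_line(line):
    """Split one profile entry into (hits, reported_percent, name)."""
    match = PROF_LINE.match(line)
    if match is None:
        raise ValueError("not a profile entry: %r" % line)
    hits, percent, name = match.groups()
    return int(hits), float(percent), name.strip()


def share_of_total(hits, total_hits):
    """Percentage of all hits, rounded to two decimals like the dump."""
    return round(100.0 * hits / total_hits, 2)


if __name__ == "__main__":
    total = 26571  # from "prof counts: total/unmanaged: 26571/19250"
    hits, reported, name = parse_prof_line("1337 5.03 % libode.so")
    print(name, reported, share_of_total(hits, total))
```

For libode.so the recomputed share matches the reported 5.03 %, which confirms that the percentages are relative to the total count rather than the unmanaged count.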
Looking at the server stats, I saw an average of 2.5% OpenSimulator Sys CPU and 11.5% OpenSimulator User CPU (I think this is divided by 2 because I'm on a dual core; that would make more sense). Looking at top, I saw an average of 30% CPU usage.
prof counts: total/unmanaged: 19762/13931
4186 21.18 %
3362 17.01 % /lib/tls/i686/cmov/librt.so.1(clock_gettime
754 3.82 % System.Threading.Timer:SchedulerThread ()
720 3.64 % mono(GC_mark_from
512 2.59 % System.Collections.Hashtable/Enumerator:MoveNext ()
275 1.39 % /lib/tls/i686/cmov/libpthread.so.0
265 1.34 % /lib/tls/i686/cmov/libpthread.so.0(pthread_cond_signal
239 1.21 % /lib/tls/i686/cmov/libc.so.6(memset
215 1.09 % mono(mono_array_new_specific
202 1.02 % /lib/tls/i686/cmov/libpthread.so.0(pthread_mutex_lock
184 0.93 % /lib/tls/i686/cmov/libpthread.so.0(pthread_cond_broadcast
170 0.86 % OpenMetaverse.Helpers:ZeroEncode (byte,int,byte)
141 0.71 % mono(GC_local_malloc_atomic
137 0.69 % mono(GC_local_gcj_malloc
137 0.69 % (wrapper alloc) object:Alloc (intptr)
117 0.59 % System.Collections.Hashtable:PutImpl (object,object,bool)
112 0.57 % OpenSim.Region.Framework.Scenes.SceneObjectGroup:UpdateMovement ()
With basic physics, the first OpenSimulator hit is OpenSim.Region.Framework.Scenes.SceneObjectGroup:UpdateMovement, with 112 hits (0.57% of the total). This makes it apparent how heavy libode is compared with regular OpenSimulator usage. It's pretty amazing to see the same library call (/lib/tls/i686/cmov/librt.so.1(clock_gettime) now taking 17.01% of the total, with 3362 hits.
Server stats show 2% OpenSimulator Sys CPU and 4% OpenSimulator User CPU (this is probably divided by 2). On top, I saw an average of 22% CPU usage.
prof counts: total/unmanaged: 22013/13728
4202 19.09 %
3272 14.86 % /lib/tls/i686/cmov/librt.so.1(clock_gettime
855 3.88 % OpenMetaverse.Vector3:op_Multiply (OpenMetaverse.Vector3,OpenMetaverse.Quaternion)
759 3.45 % System.Threading.Timer:SchedulerThread ()
733 3.33 % mono(GC_mark_from
593 2.69 % System.Collections.Hashtable/Enumerator:MoveNext ()
495 2.25 % OpenSim.Region.Physics.POSPlugin.POSScene:isColliding (OpenSim.Region.Physics.POSPlugin.POSCharacter,OpenSim.Region.Physics.POSPlugin.POSPrim)
402 1.83 % OpenMetaverse.Quaternion:Inverse (OpenMetaverse.Quaternion)
I ran this just to see whether it gave any interesting results. It seems better than ODE, but not as good as basic. librt is very high again, and the first reference to POS is OpenSim.Region.Physics.POSPlugin.POSScene:isColliding, with 495 hits (2.25% of the total), with some OpenMetaverse methods showing up as well.
Server stats show 4% OpenSimulator Sys CPU and 10% OpenSimulator User CPU. On top, I saw an average of 26% CPU usage.
The difference wasn't as big as I was expecting. Basic physics used nearly 2/3 of the CPU that ODE did, which makes ODE seem pretty efficient, considering all that it does. The difference in CPU consumption between POS and ODE is even smaller. Aside from the fact that ODE sometimes crashes due to its increased complexity, there doesn't seem to be a reason to avoid ODE if you are looking for a performance increase.
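For reference, the comparison can be worked out from the average top figures quoted above (30% for ODE, 22% for basic, 26% for POS). This is only a sketch of the arithmetic; the dictionary and function names are made up for illustration, and the server stats figures would give different ratios.

```python
# Average CPU usage seen in top for each physics engine (figures from the text).
TOP_CPU_PERCENT = {"ODE": 30.0, "basic": 22.0, "POS": 26.0}


def relative_cpu(samples, baseline):
    """Express every sample as a fraction of the baseline engine's usage."""
    base = samples[baseline]
    return {name: round(value / base, 2) for name, value in samples.items()}


if __name__ == "__main__":
    for name, ratio in relative_cpu(TOP_CPU_PERCENT, "ODE").items():
        print(name, ratio)
```

By these figures basic physics sits at about 0.73 of ODE's CPU usage and POS at about 0.87.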
ScienceSim - ScienceSim blog on profiling and performance patches.