LibSecondLife performance problems

During testing we found that OpenSimulator has performance problems, such as denied logins, even though CPU and network load appear to be within reasonable limits. I have checked the code and done some profiling to find the causes. The clearest problem appears to be the way packet handling is done in the ClientStack and libsecondlife.

Current Design
libSL is designed to wrap socket messages in objects. When the server receives a message in UDPServer it uses Packet.BuildPacket to make a new object and sends it to the correct ClientView, which turns this information into an event that changes the state of the simulation. Outgoing messages are created by instantiating a new object and passing it to the output queue. UDPServer then uses the Packet.ToBytes function to get the socket representation.

The problem is that we create a new object (and a queue wrapper object, and perhaps other stuff as well) for each message, and we only use this information for a short time before throwing it away.

Experiments
In earlier experiments it was found that 300-400 GUI-less clients could log onto a system with basic physics and the default (empty) world before new logins were denied. In more realistic tests only about 30 people could log in at the same time. In both cases the CPUs were less than 60% used, and the network was fine. All tests were run in stand-alone mode.

To investigate the communication I used the heap-buddy profiler and some custom scripts on the realistic world (about 7000 prims). The first experiment used 2 clients. Figure 1 shows the number of messages per 10 seconds, with client 1 logging in at 10.00 and client 2 logging in at 12.15. It appears that a new client login results in approx. 20,000 outgoing messages (all the scene data). In addition, the clients already present also get new updates, which increases the message volume further (about 25,000 outgoing messages are generated when client 2 logs in). The number of incoming messages is small in comparison.



The size of each message on the wire is small: on average 120 bytes incoming and 280 bytes outgoing. This means that a new client generates about 5MB of data traffic. However, the current model requires that we create at least two new objects (a packet object and a queue wrapper) for each packet. The size of the objects is hard to determine in C#, but the RegionHandshakePacket object, for instance, holds about 1kB of data. If this is a typical size, then sending 20,000 messages means that we allocate about 20MB for objects that we simply throw away on every login, and even if the actual value is much lower it should be enough to keep the garbage collector busy.
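Working the numbers through makes the gap between wire traffic and heap churn explicit. The short Python calculation below uses only the figures quoted in this section. The 1kB per-object size is the RegionHandshakePacket estimate and is an assumption for other packet types, and the queue wrapper object is not counted.

```python
# Back-of-envelope estimate for one client login.
messages_per_login = 20_000        # outgoing messages triggered by one login
wire_bytes_per_message = 280       # average outgoing size on the socket
object_bytes_per_message = 1_000   # ASSUMPTION: ~1 kB per packet object

wire_total_mb = messages_per_login * wire_bytes_per_message / 1_000_000
heap_total_mb = messages_per_login * object_bytes_per_message / 1_000_000

print(f"wire traffic per login:  {wire_total_mb:.1f} MB")
print(f"short-lived allocations: {heap_total_mb:.1f} MB")
```

Under these assumptions each login produces roughly 5.6 MB of traffic but about 20 MB of short-lived packet objects, all of which the garbage collector has to reclaim.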





Figure 2 shows garbage collection and heap allocations, while Figure 3 shows the garbage collection events and the packet activity. Initially there is much garbage collection caused by the application repeatedly adjusting the heap space (see Figure 2). Then not much happens until a client connects. This causes a flood of garbage collections and heap allocations until the heap reaches its maximum allowed size. Now, adding a client shouldn't increase the data in the server that much (it only adds an avatar object and a ClientView), so it is fairly clear that the memory waste caused by the object wrapping of messages is to blame. When the second client attaches, the heap is already at its maximum size, and a number of garbage collection events are needed to reclaim the lost heap space. In the final phase the two clients are being used, but the message volume is small and the garbage collection events are less frequent.

My data from the big tests on dedicated servers with 30 or more real clients is incomplete, but the garbage collection pattern is shown in Figure 4 (below). Although I cannot compare the pattern to message activity, the figure shows a similar clustering behaviour. The difference is that the servers have more memory, and can increase their heap size. However, once they run out of heap space, they should show a similar delay in the message handling which will eventually lead to a client time-out and login failure.



How to fix it
I was going to draw some UML diagrams but the tools available through yum on Fedora suck. I'll try to explain in words and pseudo code instead. This idea can surely be improved upon, so let me know if you have any suggestions.

The main problem is that we have to reuse packets. I suggest adding a QueueElementPool class, which keeps a collection of QueueElements. A QueueElement contains a packet object and some data needed by the ClientView (e.g. whether we are waiting for an acknowledgment message).

QueueElementPool has two public methods: GetElement(type) and ReturnElement. GetElement will hand out a free element (with a packet of the given type), or create a new element if no element is available. ReturnElement will receive queue elements that are no longer needed and store them for future GetElement requests. We only need one instance of QueueElementPool, since it will allow us to share the free QueueElements across all views.
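The pool described above can be sketched as follows. This is a minimal illustration in Python rather than C#. The Packet stand-in, the awaiting_ack field, the created counter and the lock are assumptions made for the example; only the class name QueueElementPool and the two operations come from the proposal.

```python
from collections import defaultdict, deque
import threading


class Packet:
    """Stand-in for a libSL packet (the real classes are C#)."""
    def __init__(self, packet_type):
        self.type = packet_type
        self.payload = b""


class QueueElement:
    """A packet plus the bookkeeping ClientView needs (field name is an assumption)."""
    def __init__(self, packet):
        self.packet = packet
        self.awaiting_ack = False


class QueueElementPool:
    def __init__(self):
        self._free = defaultdict(deque)  # packet type -> free elements
        self._lock = threading.Lock()    # a single pool is shared by all views
        self.created = 0                 # counts real allocations, for illustration

    def get_element(self, packet_type):
        with self._lock:
            free = self._free[packet_type]
            if free:
                return free.pop()        # reuse an element of the right type
            self.created += 1
        return QueueElement(Packet(packet_type))

    def return_element(self, element):
        element.awaiting_ack = False     # reset state before the next user sees it
        element.packet.payload = b""
        with self._lock:
            self._free[element.packet.type].append(element)


pool = QueueElementPool()
first = pool.get_element("ObjectUpdate")
pool.return_element(first)
second = pool.get_element("ObjectUpdate")
print(second is first, pool.created)
```

Keying the free lists by packet type is what lets a returned element be handed out again without rebuilding its packet. The lock is there because one pool instance would be shared by every ClientView.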

libSL packets are written in this pattern (see also libsecondlife/trunk/data/message_template.msg)

and used like

This means that we can reuse old objects for message sending if QueueElementPool keeps track of the object type.

However, when the server receives a message it uses the Packet.BuildPacket method, which examines the message and instantiates a Packet of the correct type automatically, passing the raw message bytes as an argument to the constructor.

Instead the work done by the packet constructor should be done in a FromBytes function. This can be called from the constructor, so there will be no change for other applications. However, it will allow us to avoid re-instantiation by using a construct like this in the server:
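A minimal sketch of the idea, in Python rather than C#. ChatPacket and its one-byte type header are hypothetical and do not reflect the real libSL wire format. The point is only that the constructor delegates to from_bytes, so existing callers keep working while the server can decode into an object it already holds.

```python
class ChatPacket:
    """Hypothetical packet type; the one-byte header is NOT the libSL format."""
    TYPE_ID = 1

    def __init__(self, data=None):
        self.message = ""
        if data is not None:
            self.from_bytes(data)  # constructor delegates, so old callers still work

    def from_bytes(self, data):
        """Overwrite this instance's fields from raw bytes; no new object is created."""
        if data[0] != self.TYPE_ID:
            raise ValueError("wrong packet type")
        self.message = data[1:].decode("utf-8")
        return self


# Existing style: one new object per datagram.
fresh = ChatPacket(b"\x01hello")

# Pooled style: decode the next datagram into an object the server already has.
reused = fresh.from_bytes(b"\x01world")
print(reused is fresh, reused.message)
```

In the server, the object passed to from_bytes would come from the QueueElementPool rather than from a previous constructor call.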

When ClientView has decoded the message it returns the QueueElement to the QueueElementPool. When ClientView wants to send a message it gets a QueueElement of the correct type from QueueElementPool, and packs the message as before. Once UDPServer has passed the message to the socket it can return the QueueElement to the QueueElementPool.
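The outgoing half of this life cycle can be sketched as follows, again in Python. All names here (Element, client_view_send, udp_server_flush, the allocations counter) are hypothetical, a plain dictionary of lists stands in for the QueueElementPool, and a list stands in for the socket.

```python
from collections import defaultdict

free_elements = defaultdict(list)  # stand-in for QueueElementPool: type -> free elements
allocations = 0                    # how many elements were actually created


class Element:
    def __init__(self, packet_type):
        self.packet_type = packet_type
        self.body = b""


def get_element(packet_type):
    global allocations
    if free_elements[packet_type]:
        return free_elements[packet_type].pop()
    allocations += 1
    return Element(packet_type)


def return_element(element):
    element.body = b""
    free_elements[element.packet_type].append(element)


def client_view_send(out_queue, packet_type, body):
    """ClientView side: borrow an element, fill it in, queue it."""
    element = get_element(packet_type)
    element.body = body
    out_queue.append(element)


def udp_server_flush(out_queue, sent):
    """UDPServer side: write each element out, then hand it back to the pool."""
    while out_queue:
        element = out_queue.pop(0)
        sent.append(element.body)  # stands in for the socket write
        return_element(element)


out_queue, sent = [], []
for i in range(1000):
    client_view_send(out_queue, "ObjectUpdate", b"update %d" % i)
    udp_server_flush(out_queue, sent)
print(len(sent), allocations)
```

Because each element is returned as soon as it has been written out, the number of live elements follows the peak depth of the output queue rather than the total number of messages sent. Elements that are still waiting for an acknowledgment would have to be held back until the acknowledgment arrives, which this sketch leaves out.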

This design allows reuse and avoids unnecessary object instantiations, while letting us continue to use OpenSimulator and libSL with limited modifications. The biggest work will be patching libSL.