[Opensim-dev] Modifying the networking stack
Justin Clark-Casey
jjustincc at googlemail.com
Tue Nov 18 21:29:05 UTC 2014
I'd like to add my multiple tuppenceworth (have been on an offline holiday for a few days).
1. As other core developers have said, there are many potential issues with transporting time-sensitive messages over
TCP rather than UDP. And as Diva says, I don't believe the proportion of 'reliable' UDP messages is high. At steady
state, the biggest proportion is inbound AgentUpdate messages and outbound ImprovedTerseObjectUpdates so it will be very
interesting if these can be largely eliminated.
And again as others have said, there already are mechanisms for sending data over TCP via capabilities [1], both initiated
by the viewer and initiated by the server (via EventQueue polling, as Dahlia mentioned [2]). I echo Diva's advice: if you
want to experiment with moving stuff to TCP, try these if possible rather than creating something new.
In general, if you want to make radical changes I would suggest writing an alternative client stack rather than changing
the existing LLUDP one. It should be possible to configure the use of an alternative. I don't think it's possible to
load multiple stacks at once but any patches to allow this would be welcome from my pov.
2. I did a lot of work to improve performance in the half year before OSCC 2014. I have been up and down the UDP stack
and elsewhere replicating and then fixing various issues. This is currently on a separate branch called "ghosts" but
merging this to OpenSim master is one of my next tasks.
This work had the target of allowing 400 simultaneous connections on a clover area (4 regions in a square) with
reasonable performance. This was achieved with pCampbot bots. At the event itself the peak observed concurrency in
keynotes was 159 (I have yet to write and run scripts to extract data post-facto from the logs, though I do have a start
at [3]). There were no reported major issues with 159 real people, and movement was still fine in the region, though
admittedly almost everyone was sitting at peak.
So it is possible with "ghosts" right now (and shortly "master") to run hundreds of real people smoothly (although this
is with manual throttles [4], there may be an issue with adaptive at high loads). Problems will occur if all avatars
are moving extremely intensely (which can be simulated with the physics switch of pCampbot - this is one example of bot
load being higher than 'real person' load). At that point, both physics and outbound UDP queues get overloaded. Some
work could be done here to increase UDP send capacity, maybe with an additional sending thread when queues get
overloaded or some clever way to eliminate some outbound UDP to reduce movement fidelity if queues are backed up. As
these packets are all ImprovedTerseObjectUpdates, making them TCP won't help at all - it will likely make the situation
much worse, as others have said.
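
To make the "eliminate some outbound UDP if queues are backed up" idea concrete, here is a minimal Python sketch. It is
not OpenSim code and not how the LLUDP stack works today. The class name, the threshold parameter and the idea of keying
on an object id are all assumptions made for illustration. Once the queue is at or past a backlog threshold, a new
movement update for an object that already has one waiting overwrites the waiting one instead of being appended, so
stale positions are never sent.

```python
from collections import OrderedDict


class CoalescingUpdateQueue:
    """Hypothetical outbound queue that coalesces per-object updates under backlog."""

    def __init__(self, backlog_threshold):
        self.backlog_threshold = backlog_threshold  # queue length at which coalescing starts
        self._queue = OrderedDict()   # slot number -> (object_id, update), in send order
        self._latest_slot = {}        # object_id -> slot of its most recently queued update
        self._next_slot = 0
        self.dropped = 0              # number of stale updates overwritten

    def __len__(self):
        return len(self._queue)

    def enqueue(self, object_id, update):
        backed_up = len(self._queue) >= self.backlog_threshold
        if backed_up and object_id in self._latest_slot:
            # Overwrite the waiting update in place; it keeps its position in the queue.
            self._queue[self._latest_slot[object_id]] = (object_id, update)
            self.dropped += 1
            return
        slot = self._next_slot
        self._next_slot += 1
        self._queue[slot] = (object_id, update)
        self._latest_slot[object_id] = slot

    def dequeue(self):
        # Pop the oldest entry (FIFO order).
        slot, (object_id, update) = self._queue.popitem(last=False)
        if self._latest_slot.get(object_id) == slot:
            del self._latest_slot[object_id]
        return object_id, update
```

Below the threshold every update is sent, so fidelity is only reduced when the queue is actually struggling. Whether this
would help in practice would need testing against real load.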
So I would recommend you wait until I merge the "ghosts" branch code to "master" shortly before doing much as this
incorporates the performance changes I made for the conference and which were proved out with actual real people. Some
of these were in UDP but many others were outside. In many cases, they appear to be problems with OpenSim's
thread-happy habits that break down when hundreds of connections contend for the CPU simultaneously rather than normal
CPU capacity issues.
Also, I have been documenting some parts of the stack (as can be seen in the references) but this is still ongoing, and
to some extent this is a process of discovery and test. I'm very happy to document parts on request if I don't get to
them myself (assuming I know enough about the area in question).
Finally, if you have a reliable way to replicate performance issues with just two avatars in a region I would be very
interested in seeing a bug report. I have no doubt that many weaknesses remain in handling inbound UDP but I would be
surprised if they manifest at such a low number, and even more so if this doesn't involve deliberate behaviour.
[1] http://opensimulator.org/wiki/Capabilities
[2] http://opensimulator.org/wiki/Event_queue
[3] https://github.com/justincc/opensimulator-tools/tree/master/analysis/opensimulator-log-analyzer
[4] http://opensimulator.org/wiki/LLUDP_ClientStack#Throttles
On 15/11/14 02:07, Heilmann, Michael wrote:
> Diva
>
> Thank you for the testing workflow, it is always helpful to see how others have had success in testing.
>
> However, this particular endeavor is not a poorly performing grid. We would like to contribute to the scalability of
> the OpenSim project.
>
> Michael Heilmann
>
>> Date: Fri, 14 Nov 2014 13:56:24 -0800
>> From: Diva Canto <diva at metaverseink.com>
>> To: opensim-dev at opensimulator.org
>> Subject: Re: [Opensim-dev] Modifying the networking stack
>> Message-ID: <54667A88.4040300 at metaverseink.com>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> Michael,
>>
>> If I understand it correctly, the problem you are dealing with is a
>> poorly performing grid. With proper configuration, 1 simulator running
>> on a reasonable server should be able to handle 50 real people hanging
>> around without showing signs of distress. That's the kind of performance
>> we have been seeing recently at OSCC and other simulators. These days,
>> an OpenSim simulator can easily handle 100 people removed from the
>> physics scene (sitting). When a simulator performs poorly with 2 users,
>> something is very wrong. My guess would be mono, but it can have other
>> causes too (e.g. a bad kernel, inappropriate machine, etc.).
>>
>> Independent of configuration issues, which I can't really help with, if
>> you want to get a systematic grasp of the performance of OpenSim,
>> especially the network-related aspects, here's my suggestion: (this is
>> what I did last year)
>>
>> 1 - Use WinGridProxy between your viewer and your grid, so as to understand
>> what the traffic really is. WireShark is the wrong tool; WinGridProxy
>> shows you everything. Pay particular attention to AgentUpdate messages
>> because those are, by far, the largest portion of UDP traffic from
>> viewers to the server once the initial login phase is over.
>>
>> 2 - Reconfigure your bot framework to send AgentUpdates at a constant
>> rate of at least 10/sec, or whatever you observe in step 1. Note that
>> libomv bots may or may not send AgentUpdates at a constant rate,
>> depending on how they are configured. That setting is
>> Settings.SEND_AGENT_UPDATES in libomv. By default, libomv bots send
>> 2/sec, and that is given by a timer that runs at
>> Settings.DEFAULT_AGENT_UPDATE_INTERVAL (500ms). 2/sec is insignificant
>> compared to what I've seen real viewers do, so if your bot framework
>> doesn't change that setting, the results will not correlate to
>> performance with real viewers.
>>
>> 3 - Measure load at the server when the bots are sitting down doing
>> nothing (except sending the AgentUpdate messages). If the CPU increases
>> much more than linearly with the number of bots, and you're running a
>> version of OpenSim as of the last 12 months, then there's something
>> wrong with the configuration of your simulator server -- kernel, mono,
>> or opensim -- because that is not what we observe in properly configured
>> OpenSim servers these days. It was, however, what we observed when our
>> server had the wrong kernel that was making mono behave badly.
>>
>> Good luck!
>
>
>
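
The arithmetic behind steps 2 and 3 of Diva's quoted suggestion can be sketched in a few lines of Python. This is only an
illustration: the function names, the 1.25 tolerance and any CPU figures fed into it are assumptions, not measurements.
The first function turns a per-bot AgentUpdate interval into an aggregate inbound rate. The second compares CPU per bot
at the smallest and largest bot counts as a rough test for growth that is worse than linear. It ignores the idle baseline
of the simulator, which a real measurement would need to subtract.

```python
def agent_updates_per_second(bots, interval_ms):
    """Aggregate inbound AgentUpdate rate for `bots` bots each sending every `interval_ms`."""
    return bots * 1000.0 / interval_ms


def grows_faster_than_linear(samples, tolerance=1.25):
    """Rough check on (bot_count, cpu_percent) samples.

    Returns True if CPU per bot at the largest bot count exceeds CPU per bot
    at the smallest bot count by more than `tolerance` (an arbitrary margin).
    """
    per_bot = [cpu / count for count, cpu in sorted(samples)]
    return per_bot[-1] > per_bot[0] * tolerance


if __name__ == "__main__":
    # 100 bots at the 500ms interval quoted above versus a 100ms (10/sec) interval.
    print(agent_updates_per_second(100, 500))
    print(agent_updates_per_second(100, 100))
```

With 100 bots, moving from a 500ms interval to 10 updates per second per bot raises the inbound AgentUpdate rate from 200
to 1000 packets per second, which is why the bot setting matters when comparing against real viewers.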
--
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc