[Opensim-dev] Modifying the networking stack

Tue Nov 18 22:55:56 UTC 2014

I ran multiple IClientAPI stacks in several installations across OpenSim
versions from roughly 0.6.7 to 0.7.1, so unless something has changed since
then that would prevent it, I believe running multiple IClientAPI stacks
simultaneously should still be possible. As mentioned in an earlier message
in this thread, though, I abandoned IClientAPI as a candidate for adding new
protocols because of its complexity, its lack of completeness (only LLUDP
messages are routed through it; other LL-designed traffic such as CAPS is
hard-coded in core and has no equivalent interface), and its constant
revision in core, which made maintaining a non-core module very difficult.

I've also not had any issues with running multiple extension protocols in
region modules while simultaneously still supporting the LL protocol through
the normal code paths.
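
A rough sketch of that arrangement, in Python rather than OpenSim's C#. Every
name here (Scene, ProtocolRegistry, "ext-json") is made up for illustration and
none of it is OpenSim's region module API. It only shows the shape of the
idea: each protocol registers its own decoder, all of them feed the same
scene, and adding one does not touch the others.

```python
from typing import Callable, Dict, List


class Scene:
    """Stand-in for a region scene; records the events delivered to it."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def handle(self, source: str, message: str) -> None:
        self.events.append(f"{source}:{message}")


class ProtocolRegistry:
    """Maps a protocol name to a handler; handlers do not know about each other."""

    def __init__(self, scene: Scene) -> None:
        self.scene = scene
        self.handlers: Dict[str, Callable[[bytes], None]] = {}

    def register(self, name: str, decode: Callable[[bytes], str]) -> None:
        if name in self.handlers:
            raise ValueError(f"protocol {name!r} already registered")

        def handler(payload: bytes) -> None:
            # Decode with this protocol's own rules, then hand off to the shared scene.
            self.scene.handle(name, decode(payload))

        self.handlers[name] = handler

    def dispatch(self, name: str, payload: bytes) -> None:
        self.handlers[name](payload)


scene = Scene()
registry = ProtocolRegistry(scene)
registry.register("lludp", lambda p: p.decode("ascii"))             # the existing path
registry.register("ext-json", lambda p: p.decode("utf-8").strip())  # hypothetical extension
registry.dispatch("lludp", b"AgentUpdate")
registry.dispatch("ext-json", b" move \n")
print(scene.events)
```

Both messages end up in the same scene event list, tagged by the protocol
they arrived on.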

On Tue, Nov 18, 2014 at 1:29 PM, Justin Clark-Casey <
jjustincc at googlemail.com> wrote:

> I'd like to add my multiple tuppenceworth (have been on an offline holiday
> for a few days).
>
> 1.  As other core developers have said, there are many potential issues
> with transporting time-sensitive messages over TCP rather than UDP.  And as
> Diva says, I don't believe the proportion of 'reliable' UDP messages is
> high.  At steady state, the biggest proportion is inbound AgentUpdate
> messages and outbound ImprovedTerseObjectUpdates so it will be very
> interesting if these can be largely eliminated.
>
> And again as others have said, there already are mechanisms for sending
> data over TCP via capabilities [1], both initiated by the viewer and via
> the server (via EventQueue polling as Dahlia mentioned [2]).  I echo Diva's
> advice that if you want to experiment with moving stuff to TCP, you try
> these if possible rather than creating something new.
>
> In general, if you want to make radical changes I would suggest writing an
> alternative client stack rather than changing the existing LLUDP one.  It
> should be possible to configure the use of an alternative.  I don't think
> it's possible to load multiple stacks at once but any patches to allow this
> would be welcome from my pov.
>
> 2.  I did a lot of work to improve performance in the half year before
> OSCC 2014.  I have been up and down the UDP stack and elsewhere replicating
> and then fixing various issues.  This is currently on a separate branch
> called "ghosts" but merging this to OpenSim master is one of my next tasks.
>
> This work had the target of allowing 400 simultaneous connections on a
> clover area (4 regions in a square) with reasonable performance.  This was
> achieved with pCampbot bots.  At the event itself the peak observed
> concurrency in keynotes was 159 (I have yet to run/write scripts to extract
> data post-facto from logs though I forgot I have a start at [3]).  There
> were no reported major issues with 159 real people, movement was still fine
> in the region though admittedly almost everyone was sitting at peak.
>
> So it is possible with "ghosts" right now (and shortly "master") to run
> hundreds of real people smoothly (although this is with manual throttles
> [4], there may be an issue with adaptive at high loads).  Problems will
> occur if all avatars are moving extremely intensely (which can be simulated
> with the physics switch of pCampbot - this is one example of bot load being
> higher than 'real person' load).  At that point, both physics and outbound
> UDP queues get overloaded.  Some work could be done here to increase UDP
> send capacity, maybe with an additional sending thread when queues get
> overloaded or some clever way to eliminate some outbound UDP to reduce
> movement fidelity if queues are backed up.  As these packets are all
> ImprovedTerseObjectUpdates, making them TCP won't help at all - it will
> likely make the situation much worse as others have said.
>
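
One possible reading of "eliminate some outbound UDP to reduce movement
fidelity if queues are backed up", sketched in Python. This is an assumption
about how such a scheme could look, not a description of what the LLUDP stack
does; the class and field names are hypothetical. The idea is that a position
update still waiting in the queue is stale once a newer one for the same
object arrives, so the newer one overwrites it instead of queueing behind it.

```python
from collections import OrderedDict
from typing import Optional, Tuple

Position = Tuple[float, float, float]


class CoalescingUpdateQueue:
    """Outbound queue holding at most one pending position update per object."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[int, Position]" = OrderedDict()
        self.superseded = 0  # updates overwritten before they were sent

    def enqueue(self, object_id: int, position: Position) -> None:
        if object_id in self._pending:
            # A newer position replaces the stale one; the object keeps its place in line.
            self.superseded += 1
        self._pending[object_id] = position

    def dequeue(self) -> Optional[Tuple[int, Position]]:
        if not self._pending:
            return None
        # Oldest-queued object first.
        return self._pending.popitem(last=False)

    def __len__(self) -> int:
        return len(self._pending)


q = CoalescingUpdateQueue()
q.enqueue(1, (0.0, 0.0, 0.0))
q.enqueue(2, (5.0, 5.0, 0.0))
q.enqueue(1, (1.0, 0.0, 0.0))  # supersedes the first update for object 1
print(len(q), q.superseded)
print(q.dequeue())
```

Under light load nothing is overwritten and every update goes out; the queue
only sheds intermediate positions when sending falls behind, which is where
the loss of movement fidelity would come from.
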
> So I would recommend you wait until I merge the "ghosts" branch code to
> "master" shortly before doing much as this incorporates the performance
> changes I made for the conference and which were proved out with actual
> real people.  Some of these were in UDP but many others were outside.  In
> many cases, they appear to be problems with OpenSim's thread-happy habits
> that break down when hundreds of connections contend for the CPU
> simultaneously rather than normal CPU capacity issues.
>
> Also, I have been documenting some parts of the stack (as can be seen in
> the references) but this is still ongoing, and to some extent this is a
> process of discovery and test.  I'm very happy to document parts on request
> if I don't get to them myself (assuming I know enough about the area in
> question).
>
> Finally, if you have a reliable way to replicate performance issues with
> just two avatars in a region I would be very interested in seeing a bug
> report.  I have no doubt that many weaknesses remain in handling inbound UDP
> but I would be surprised if they manifest at such a low number, and even
> more so if this doesn't involve deliberate behaviour.
>
> [1] http://opensimulator.org/wiki/Capabilities
> [2] http://opensimulator.org/wiki/Event_queue
> [3] https://github.com/justincc/opensimulator-tools/tree/
> master/analysis/opensimulator-log-analyzer
> [4] http://opensimulator.org/wiki/LLUDP_ClientStack#Throttles
>
> On 15/11/14 02:07, Heilmann, Michael wrote:
>
>> Diva
>>
>> Thank you for the testing workflow, it is always helpful to see how
>> others have had success in testing.
>>
>> However, this particular endeavor is not a poorly performing grid.  We
>> would like to contribute to the scalability of
>> the OpenSim project.
>>
>> Michael Heilmann
>>
>>> Date: Fri, 14 Nov 2014 13:56:24 -0800
>>> From: Diva Canto <diva at metaverseink.com>
>>> To: opensim-dev at opensimulator.org
>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>> Message-ID: <54667A88.4040300 at metaverseink.com>
>>>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> Michael,
>>>
>>> If I understand it correctly, the problem you are dealing with is a
>>> poorly performing grid. With proper configuration, 1 simulator running
>>> on a reasonable server should be able to handle 50 real people hanging
>>> around without showing signs of distress. That's the kind of performance
>>> we have been seeing recently at OSCC and other simulators. These days,
>>> an OpenSim simulator can easily handle 100 people removed from the
>>> physics scene (sitting). When a simulator performs poorly with 2 users,
>>> something is very wrong. My guess would be mono, but there can be other
>>> causes too (e.g. a bad kernel, inappropriate machine, etc.).
>>>
>>> Independent of configuration issues, which I can't really help with, if
>>> you want to get a systematic grasp of the performance of OpenSim,
>>> especially the network-related aspects, here's my suggestion: (this is
>>> what I did last year)
>>>
>>> 1 - Use WinGridProxy between your viewer and your grid, to understand
>>> what the traffic really is. WireShark is the wrong tool; WinGridProxy
>>> shows you everything. Pay particular attention to AgentUpdate messages
>>> because those are, by far, the largest portion of UDP traffic from
>>> viewers to the server once the initial login phase is over.
>>>
>>> 2 - Reconfigure your bot framework to send AgentUpdates at a constant
>>> rate of at least 10/sec, or whatever you observe in step 1. Note that
>>> libomv bots may or may not send AgentUpdates at a constant rate,
>>> depending on how they are configured. That setting is
>>> Settings.SEND_AGENT_UPDATES in libomv. By default, libomv bots send
>>> 2/sec, and that is given by a timer that runs at
>>> Settings.DEFAULT_AGENT_UPDATE_INTERVAL (500ms). 2/sec is insignificant
>>> compared to what I've seen real viewers do, so if your bot framework
>>> doesn't change that setting, the results will not correlate to
>>> performance with real viewers.
>>>
>>> 3 - Measure load at the server when the bots are sitting down doing
>>> nothing (except sending the AgentUpdate messages). If the CPU increases
>>> much more than linearly with the number of bots, and you're running a
>>> version of OpenSim as of the last 12 months, then there's something
>>> wrong with the configuration of your simulator server -- kernel, mono,
>>> or opensim -- because that is not what we observe in properly configured
>>> OpenSim servers these days. It was, however, what we observed when our
>>> server had the wrong kernel that was making mono behave badly.
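
The arithmetic behind steps 2 and 3 can be sketched as below. The 500 ms
interval and the 2/sec and 10/sec rates come from the message above; the CPU
samples are placeholder numbers invented for the example, not measurements,
and the 1.25 tolerance is an arbitrary assumption. The check also ignores the
simulator's idle baseline load, which a real test would subtract first.

```python
from typing import List, Tuple


def updates_per_second(interval_ms: float) -> float:
    """AgentUpdate rate produced by a timer firing every interval_ms."""
    return 1000.0 / interval_ms


def inbound_rate(bots: int, per_bot_rate: float) -> float:
    """Total AgentUpdates per second arriving at the simulator."""
    return bots * per_bot_rate


def cpu_per_bot(samples: List[Tuple[int, float]]) -> List[float]:
    """samples are (bot_count, cpu_percent) pairs; returns CPU cost per bot."""
    return [cpu / bots for bots, cpu in sorted(samples)]


def grows_faster_than_linear(samples: List[Tuple[int, float]],
                             tolerance: float = 1.25) -> bool:
    """True if per-bot CPU at the largest bot count exceeds the smallest by the tolerance."""
    per_bot = cpu_per_bot(samples)
    return per_bot[-1] > per_bot[0] * tolerance


print(updates_per_second(500))   # timer at 500 ms
print(updates_per_second(100))   # timer at 100 ms
print(inbound_rate(50, 10.0))    # 50 bots at 10 updates/sec each

# Placeholder samples, purely illustrative.
linear = [(10, 5.0), (20, 10.0), (40, 20.0)]
superlinear = [(10, 5.0), (20, 14.0), (40, 48.0)]
print(grows_faster_than_linear(linear), grows_faster_than_linear(superlinear))
```

A bot run at 2 updates/sec puts a fifth of the inbound load on the server
that the same number of bots would at 10/sec, which is why results from the
slower setting would not carry over to real viewers.
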
>>>
>>> Good luck!
>>>
>>
>>
>> _______________________________________________
>> Opensim-dev mailing list
>> Opensim-dev at opensimulator.org
>> http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>>
>>
>
> --
> Justin Clark-Casey (justincc)
> OSVW Consulting
> http://justincc.org
> http://twitter.com/justincc