[Opensim-dev] Modifying the networking stack

Fri Nov 14 21:56:24 UTC 2014

Michael,

If I understand it correctly, the problem you are dealing with is a 
poorly performing grid. With proper configuration, 1 simulator running 
on a reasonable server should be able to handle 50 real people hanging 
around without showing signs of distress. That's the kind of performance 
we have been seeing recently at OSCC and other simulators. These days, 
an OpenSim simulator can easily handle 100 people removed from the 
physics scene (sitting). When a simulator performs poorly with 2 users, 
something is very wrong. My guess would be mono, but there can be other 
causes too (e.g. a bad kernel, an inappropriate machine, etc.).

Independent of configuration issues, which I can't really help with, if 
you want to get a systematic grasp of the performance of OpenSim, 
especially the network-related aspects, here's my suggestion: (this is 
what I did last year)

1 - Use WinGridProxy between your viewer and your grid, so that you can 
see what the traffic really is. Wireshark is the wrong tool; WinGridProxy 
shows you everything. Pay particular attention to AgentUpdate messages 
because those are, by far, the largest portion of UDP traffic from 
viewers to the server once the initial login phase is over.

2 - Reconfigure your bot framework to send AgentUpdates at a constant 
rate of at least 10/sec, or whatever you observe in step 1. Note that 
libomv bots may or may not send AgentUpdates at a constant rate, 
depending on how they are configured. That setting is 
Settings.SEND_AGENT_UPDATES in libomv. By default, libomv bots send 
2/sec, and that is given by a timer that runs at 
Settings.DEFAULT_AGENT_UPDATE_INTERVAL (500ms). 2/sec is insignificant 
compared to what I've seen real viewers do, so if your bot framework 
doesn't change that setting, the results will not correlate to 
performance with real viewers.

3 - Measure load at the server when the bots are sitting down doing 
nothing (except sending the AgentUpdate messages). If CPU usage increases 
much more than linearly with the number of bots, and you're running a 
version of OpenSim from the last 12 months, then there's something 
wrong with the configuration of your simulator server -- kernel, mono, 
or opensim -- because that is not what we observe in properly configured 
OpenSim servers these days. It was, however, what we observed when our 
server had the wrong kernel that was making mono behave badly.
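
As a rough illustration of steps 2 and 3, here is a minimal Python sketch. It is not OpenSim or libomv code: the function names, the 1.5x tolerance, and the sample numbers in the usage note are all made up for the example. It converts a target AgentUpdate rate into a timer interval, and checks whether CPU cost per bot stays roughly flat as bots are added.

```python
def update_interval_ms(updates_per_sec):
    """Timer interval (ms) needed to send AgentUpdates at a constant rate."""
    return 1000.0 / updates_per_sec


def cpu_per_bot(samples, idle_cpu=0.0):
    """Given (bot_count, cpu_percent) pairs, return (bot_count, CPU per bot)
    for each load level, ordered by bot count."""
    return [(n, (cpu - idle_cpu) / n) for n, cpu in sorted(samples) if n > 0]


def grows_faster_than_linear(samples, idle_cpu=0.0, tolerance=1.5):
    """True if CPU per bot at the highest load is more than `tolerance`
    times the CPU per bot at the lowest load (hypothetical threshold)."""
    costs = cpu_per_bot(samples, idle_cpu)
    if len(costs) < 2:
        raise ValueError("need at least two load levels")
    lowest, highest = costs[0][1], costs[-1][1]
    return highest > tolerance * lowest
```

For example, update_interval_ms(10) gives 100.0, versus the 500 ms default mentioned in step 2. With invented measurements such as [(10, 5.0), (20, 10.0), (40, 20.0)] the per-bot cost is flat and the check returns False; with [(10, 5.0), (20, 20.0), (40, 80.0)] it returns True, which is the kind of curve that would point at a configuration problem.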

Good luck!

On 11/14/2014 9:11 AM, Michael Heilmann wrote:
> I agree completely with the fact that many packets are time-sensitive 
> and expire quickly, which is why I was looking for a TCP/UDP pair, not 
> dropping UDP in favour of TCP.  I am concerned, however, that UDP is 
> the only mechanism for asynchronously sending information to the 
> client that may be important, and not expire quickly.
>
> We regularly deal with users who have less than reliable networks, 
> such as cellular wifi access points at remote locations, and cannot 
> assume that our packet loss will be below 1%.
> -- 
> Michael Heilmann
> Research Associate
> Institute for Simulation and Training
> University of Central Florida
>
>
>> Date: Fri, 14 Nov 2014 08:30:07 -0800
>> From: Mic Bowman<cmickeyb at gmail.com>
>> To:opensim-dev at opensimulator.org
>> Subject: Re: [Opensim-dev] Modifying the networking stack
>> Message-ID:
>>          <CAJaF1_HXXSTFj2KhxDXH4-f_8aSYnWon+QON-NWUn1DommLh4A at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> having done a lot of the token bucket & retransmission work in opensim... i
>> started with a strong belief that TCP & reliable streams were the right way
>> to do object updates (for all bulk content like asset transfers, tcp, or
>> http layered on tcp, is definitely the right way). however... what we found
>> is that object updates should not be retransmitted at the packet level
>> which is what you get from tcp (that is... tcp handles retransmission of
>> dropped packets not opensim). the reason is pretty simple... dynamic
>> objects change position repeatedly (even if only in short bursts).
>> retransmission at the packet level would almost always retransmit *OLD*
>> state. for any kind of connection (this was often a problem for
>> international connections) where packet drops were even as high as 1%, we
>> would see multiple updates for the same object queued to send. only the
>> last one has any meaning.
>>
>> so... while the situations where application level retransmission is
>> warranted are rare (*very* rare)... object updates happen to be one of
>> them. moving reliability into a lower level (either layering some protocol
>> on tcp or os implemented reliable udp) will result in lower efficiency for
>> faulty connections (and if your connection isn't faulty then udp works
>> just fine).
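
To make the point above concrete, here is a minimal Python sketch of an outgoing queue that keeps only the newest update per object, so a resend always carries current state rather than a stale packet. This is an illustration only, not the OpenSim implementation; the class name, method names, and the use of position tuples as state are hypothetical.

```python
from collections import OrderedDict


class LatestStateQueue:
    """Outgoing queue that holds at most one pending update per object."""

    def __init__(self):
        self._pending = OrderedDict()  # object_id -> newest known state

    def push(self, object_id, state):
        # A newer update overwrites any older one still waiting to be sent.
        # The object keeps its original place in line, so an object that
        # changes often is not pushed to the back on every update.
        self._pending[object_id] = state

    def pop(self):
        """Return the oldest pending (object_id, state) pair, or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)
```

If object "a" is pushed three times and object "b" once before anything is sent, the queue holds two entries, and the entry for "a" carries only its last state. A packet-level reliable transport would instead deliver all three "a" updates in order, including the two that are already out of date.
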
>>
>> --mic
>>
>>
>> On Fri, Nov 14, 2014 at 7:17 AM, Amit Goel<Amit.Goel at ucf.edu>  wrote:
>>
>>> Agree with Michael here that having TCP is better than using UDP with your
>>> own homegrown TCP implementation on top of it in application software.
>>>
>>> How about Reliable UDP :
>>>
>>> http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/Reliable-UDP-(RUDP)-The-Next-Big-Streaming-Protocol-85316.aspx
>>> http://www.javvin.com/protocolRUDP.html
>>>
>>> I have not studied in detail how it fits in between lossy UDP and
>>> time-consuming TCP.
>>>
>>> Regards
>>>
>>> -- amit
>>> ________________________________________
>>> From:opensim-dev-bounces at opensimulator.org  [
>>> opensim-dev-bounces at opensimulator.org] on behalf of Michael Heilmann [
>>> mheilman at ist.ucf.edu]
>>> Sent: Friday, November 14, 2014 9:23 AM
>>> To:opensim-dev at opensimulator.org
>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>>
>>> Thanks for the responses.  I'll go into a little more detail:
>>>
>>> We have been running several profilers against OpenSimulator on the
>>> MOSES grid, and on my development machine.  The tests were to examine
>>> the loading on the server under several different loads, specifically
>>> mesh and physics loads.  What we found is that no matter what
>>> kind of load we placed on the region, even to the point of it becoming
>>> unresponsive due to physics and mesh, the time spent on scripting and
>>> physics was nowhere near the amount of time spent in
>>> OpenSim.Region.ClientStack.LindenUDP once we had more than one or two
>>> avatars logged in.  We know from previous investigations at our firewall
>>> that network traffic for OpenSim is not that heavy, especially with low
>>> numbers of users.
>>>
>>> I ran several Wireshark captures against a Firestorm viewer logging into
>>> the MOSES public grid ABWIS region, where we hold our office hours.  I
>>> saw that with our current configuration, all traffic between the server
>>> and my client, with the exception of http CAPS and fsapi calls, was UDP
>>> traffic.  This is not immediately concerning, as we have simian serve
>>> our mesh and textures directly. The messages are mostly binary
>>> information, so I could not examine closely, but I did see a lot of
>>> messages containing identical ASCII strings, such as the name of my avatar.
>>>
>>> My primary concern is the amount of time spent handling networking, not
>>> necessarily the networking itself.  But there is at least a portion of
>>> messages on the UDP pipeline that are either reliable, or perhaps should
>>> be; and re-implementing a reliable transport over udp introduces load at
>>> the application layer, instead of letting a low-level reliable transport
>>> such as tcp handle it.  I went to university with a guy who implemented
>>> a java networking library completely over UDP, believing that it was
>>> faster than a normal TCP socket; but he was neglecting that the
>>> networking hardware handles the ACK and retransmission transparently,
>>> and without needing for the messages to be handled manually by the
>>> application.
>>>
>>> This may just be my opinion, but since I was going to be examining the
>>> network stack anyway, and since a client-server scenario typically
>>> benefits from a persistent reliable connection over which the server
>>> can push important events to the client, I thought it would be a good idea.
>>> The points about network throttling and QoS are taken, but wouldn't they
>>> also typically affect the UDP  stream? Working on MOSES I have plenty of
>>> problems dealing with external users who operate on restricted networks,
>>> and they cannot see traffic aside from 80 and 443 without dealing with
>>> their own IT personnel.  The fact that it is HTTP over TCP instead of
>>> raw TCP makes no difference once it is on a non-standard HTTP port.
>>>
>>> I agree that it would be more prudent to look at improving the websocket
>>> code and the http server, rather than replace it with a raw TCP socket,
>>> especially given that there are multiple plugins, such as jsonsimstats,
>>> that use the http functionality directly.
>>>
>>> I hope that explains my position a little better.  I would love to hear
>>> if there are other plans/ideas in the community to address time-sinks
>>> like this one; networking simply appears to us to be a good starting point
>>> for increasing the performance and scalability of the system.
>>>
>>> --
>>> Michael Heilmann
>>> Research Associate
>>> Institute for Simulation and Training
>>> University of Central Florida
>>>
>>>
>>>
>>>> Date: Thu, 13 Nov 2014 13:50:32 -0800
>>>> From: Diva Canto<diva at metaverseink.com>
>>>> To:opensim-dev at opensimulator.org
>>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>>> Message-ID:<546527A8.3040909 at metaverseink.com>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>> What problem are you trying to solve? It's hard to comment without
>>>> knowing what you want to achieve, but here are some random observations
>>>> that you may want to take into account.
>>>>
>>>> As far as I remember, the reliable packets are a very small percentage
>>>> of the UDP traffic at this point, so I'm not sure it's worth creating a
>>>> dedicated TCP channel for them. Moving the HTTP traffic to the same TCP
>>>> connection seems like a bad idea, as the HTTP traffic tends to be
>>>> dominated by big data (textures, assets, etc) which would then get in
>>>> the way of the small packets like AgentOnline, etc. I suspect it would
>>>> make the client less responsive than what it is now.
>>>>
>>>> There is already support for WebSockets in OpenSim. It may not be
>>>> complete, so I would encourage you to build on that. I am aware of WebGL
>>>> clients that use WebSockets with OpenSim, and they have the same problem
>>>> as described above: the big data gets in the way of the small packets,
>>>> making the clients less responsive at points. But since WebGL is
>>>> inevitable, your effort is probably best invested in this than in a TCP
>>>> channel.
>>>>
>>>> Best,
>>>> Diva
>>>>
>>>> On 11/13/2014 1:18 PM, Michael Heilmann wrote:
>>>>> Greetings everyone
>>>>>
>>>>> I and another MOSES developer are going to be looking at the
>>>>> client/server network stack, as well as the processing queues used
>>>>> for incoming and outgoing packets.  I am going to see if I can
>>>>> implement a client stack on opensim and firestorm that uses the
>>>>> traditional TCP/UDP pairing for this type of client<->server
>>>>> relationship.  I have two thoughts, but I am interested in hearing if
>>>>> you have ideas or insight into this particular space.
>>>>>
>>>>> Idea 1:
>>>>>       Add a dedicated tcp port next to the UDP port, and move reliable
>>>>> transport transmissions to the tcp port.  I am uncomfortable
>>>>> increasing the required ports for each region, but the http server is
>>>>> in the way.  I can look to move all communications from http to a tcp
>>>>> socket-server type of deployment, at the expense of simple POST/GET
>>>>> operations
>>>>>
>>>>> Idea 2:
>>>>>       Look into increasing the performance of the http server of the
>>>>> regions, as well as testing/implementing a full websockets
>>>>> implementation, and using the websockets upgrade for consistent client
>>>>> connections.  This could eventually lead to javascript-based clients,
>>>>> and does not remove http functionality.
>>>>>
>>>>> Either idea would see any traffic requiring reliable transport shifted
>>>>> off of the current UDP stack, and onto the tcp reliable transport.
>>>>> Either idea also will require modifications to a client to match.  I,
>>>>> and another developer here, would be developing the client code, the
>>>>> region code, and testing against a MOSES deployment.  As we are MOSES
>>>>> developers, we would be working against simian instead of Robust; so
>>>>> there would be a gap for regular Robust-based grids.
>>>>>
>>>>> If you could lend me your opinions about these ideas, the management
>>>>> queues, associated problems in opensim, etc, I would really appreciate
>>>>> it.  We would be working completely in the open on github, and obeying
>>>>> all licensing.  We would welcome any and all cooperation, and we will
>>>>> cooperate ourselves wherever we are welcome, but we are not interested
>>>>> in avoiding positive changes to maintain SecondLife compatibility.
>>>>>
>>>>> Thanks.
>>>>>
>>> _______________________________________________
>>> Opensim-dev mailing list
>>> Opensim-dev at opensimulator.org
>>> http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>>>
>
>
>
