[Opensim-dev] Modifying the networking stack

Tue Dec 2 21:36:03 UTC 2014

(The reason older messages are suddenly showing up is that I went through the opensim-dev moderation queue, where a lot 
of messages were sitting because the message byte limit was too low.  This has now been corrected.)

Yeah, I would say this is the approach taken by Linden Lab where a lot of heavy traffic is now sent over HTTP via 
capabilities [1] rather than UDP.  The simplest way to send more may be to add additional capabilities understood by the 
viewer.
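
To make the capabilities idea a bit more concrete, here is a minimal sketch in Python (not OpenSimulator's actual C# 
code) of a per-agent registry that hands the viewer a private URL for each capability name it asks for.  The class name 
CapsRegistry, the /CAPS/ path layout, the host name and the "BulkObjectState" capability are all made up for 
illustration; only EventQueueGet is a name taken from this thread.

```python
import uuid


class CapsRegistry:
    """Hypothetical per-agent map of capability names to private URLs."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._urls = {}      # capability name -> URL given to the viewer
        self._handlers = {}  # URL path -> function handling the request

    def register(self, name, handler):
        # A random path means only a viewer that was given the URL can call it.
        path = "/CAPS/{}/{}".format(uuid.uuid4(), name)
        self._urls[name] = self.base_url + path
        self._handlers[path] = handler
        return self._urls[name]

    def seed_response(self, requested):
        # The viewer asks for capabilities by name; unknown names are skipped.
        return {n: self._urls[n] for n in requested if n in self._urls}

    def dispatch(self, path, body):
        # Route an incoming HTTP request body to the registered handler.
        return self._handlers[path](body)


caps = CapsRegistry("http://sim.example.org:9000")
caps.register("EventQueueGet", lambda body: {"events": []})
# "BulkObjectState" is an invented capability standing in for "send more".
caps.register("BulkObjectState", lambda body: {"objects": body["ids"]})

granted = caps.seed_response(["EventQueueGet", "BulkObjectState", "NotOffered"])
```

A viewer that does not know about the new capability simply never asks for it, which is why adding capabilities tends 
to be less disruptive than changing the UDP message set.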

That said, because a lot of the heavy stuff has already been moved to HTTP I'm not sure how much impact there would be 
from moving what is left.

[1] http://opensimulator.org/wiki/Capabilities
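
For readers following the argument quoted below, that lost object updates should be resent at the application level 
with only the newest state rather than packet by packet, here is a rough Python sketch of that idea.  It is an 
illustration under assumptions, not the OpenSimulator implementation: the class name, the send callback and the 
sequence-number bookkeeping are all hypothetical.

```python
class CoalescingUpdateSender:
    """Hypothetical sender that never retransmits a superseded update."""

    def __init__(self, send):
        self.send = send       # callable(seq, object_id, state)
        self.latest = {}       # object_id -> most recent state
        self.newest_seq = {}   # object_id -> seq of the newest packet sent
        self.in_flight = {}    # seq -> object_id still awaiting an ack
        self.next_seq = 0

    def update(self, object_id, state):
        # A newer update overwrites the older state for the same object.
        self.latest[object_id] = state
        self._transmit(object_id)

    def _transmit(self, object_id):
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight[seq] = object_id
        self.newest_seq[object_id] = seq
        self.send(seq, object_id, self.latest[object_id])

    def ack(self, seq):
        self.in_flight.pop(seq, None)

    def on_timeout(self, seq):
        object_id = self.in_flight.pop(seq, None)
        if object_id is None:
            return  # already acknowledged
        if self.newest_seq[object_id] != seq:
            return  # a newer update was sent since; the old one is stale
        # Resend the object's current state under a fresh sequence number.
        self._transmit(object_id)


sent = []
sender = CoalescingUpdateSender(lambda seq, oid, state: sent.append((seq, oid, state)))
sender.update("door", {"x": 1.0})  # goes out as seq 0
sender.update("door", {"x": 2.0})  # goes out as seq 1
sender.on_timeout(0)               # superseded by seq 1, nothing is resent
sender.on_timeout(1)               # newest state is resent as seq 2
```

A TCP stream would have retransmitted the lost seq 0 payload before delivering anything after it; the sketch instead 
drops it because a later update for the same object already exists.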

On 15/11/14 01:41, Heilmann, Michael wrote:
> Once again, I am not suggesting a single serial reliable connection, but a reliable connection in addition to the lossy
> up stream.  I understand what you and MIC are saying, I have a strong background in networking.  Many packets would
> still be best used on a lossy UDP stream.  However, every time you have a packet that needs to be acknowledged, or
> resent, that packet is a candidate for a serial stream.  If it were so time sensitive that a TCP stream would hurt the
> remote user's experience, then it should be reevaluated whether that packet should be resent in the first place.
>
> A tcp socket is not a new protocol, it would use all existing serialization/deserialization mechanisms for existing
> mechanisms.  It is simply a socket that performs acknowledgements and resends at the hardware level, instead of forcing
> the application to do so.
>
> I would define a reliable packet as a packet that would be resent until it is confirmed that the client has received it.
>   I am concerned why I have multiple people replying as though I want to rip out the networking udp/http stack and
> replace it with a single tcp socket. Is it something I said?
>
> Michael Heilmann
>
>> Date: Fri, 14 Nov 2014 13:33:51 -0800
>> From: Dahlia Trimble <dahliatrimble at gmail.com>
>> To: opensim-dev at opensimulator.org
>> Subject: Re: [Opensim-dev] Modifying the networking stack
>> Message-ID: <CAAQTD4XMSZL90JKqtbabg8wF6rhst9F=kqPVWTOuQ1UNgQf5mQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I think the term "reliability" needs to be re-evaluated here. As stated in
>> earlier messages, the point is to transfer state in a timely manner and not
>> waste time and network resources transferring stale, irrelevant state
>> messages. Hence, a "reliable stream" transport which can only send serial
>> data and does it over a lossy connection must recover all data and deliver
>> it in order to the end point and as such is *unreliable* in terms of
>> transferring state in a timely manner. I've seen some MMO games which use
>> TCP for transferring state and some of them seem to do it quite well but
>> they also use a lot of client-side prediction to mitigate the effects of
>> lossy networks. Unfortunately many of these prediction techniques often
>> fail in a system with dynamic content such as OpenSimulator provides; i.e.,
>> one cannot reliably predict an impending collision with an object that has
>> not yet rezzed in the client.
>>
>> While it's been stated that Object Updates form the majority of these
>> state-transferring messages, it is not the case that they are the only
>> messages. AgentUpdate messages form the majority of messages sent from the
>> viewer to the simulator and they are unreliable as adding any reliability
>> to them would introduce delays which would significantly reduce quality of
>> experience for users. Imagine pressing a button to move and having to wait
>> 1-2 seconds to move, or imagine playing a battle game and pressing a fire
>> button only to have your weapon not fire until several seconds later. There
>> are also other messages such as animation state which also need rapid state
>> transmission.
>>
>> Adding a new protocol also involves encoding the data you want and being
>> able to decode it when received. Often such encoding/decoding methods can
>> themselves be a source of significant overhead. The existing serialization
>> code used in libomv (used in OpenSimulator) is fairly efficient at UDP
>> packet encoding and decoding and also provides an efficient LLSD (Linden
>> Lab Structured Data) codec which is used for the messages sent via
>> EventQueueGet. EventQueueGet is also designed to traverse firewalls as it
>> uses HTTP long poll techniques. There has also been much work done in
>> OpenSimulator to pool and reuse these message structures, thereby reducing
>> the overhead of object creation/destruction and garbage collection,
>>
>> Assuming one *could* have a TCP connection that has a perfect transport
>> layer, there is still a single, serial stream which can cause other issues.
>> Once you send several messages you cannot immediately send a time critical
>> message and have it received until all other messages have been processed
>> at the receiving end. This may be somewhat mitigated by using
>> alternate means of transmitting the initial scene, as Mic alluded to in his
>> proxy example, but there will likely be other times when such messages
>> are needed.
>>
>> The LLUDP protocol used to send assets via UDP in a reliable manner. This had
>> the advantage that many assets could be sent simultaneously and
>> continuously re-prioritized as the user would move around in the region.
>> Unfortunately this required a fair amount of resources on the simulator end
>> and was eventually replaced with an HTTP transport which could be offloaded
>> from the simulator to another server such as Apache running on another core
>> or machine. This HTTP solution was never refined to fully implement the
>> dynamic reprioritization or mass simultaneous transfers that the UDP
>> transport allowed but has eventually resulted in a usable technology. There
>> was a time a while back when competing viewer projects would increase the
>> number of simultaneous HTTP downloads in an effort to be the
>> fastest-rezzing viewer; however, SL had problems with their HTTP servers
>> being overloaded and had to implement limits. I believe the HTTP RFC
>> specification also has a fairly low number of simultaneous connections as a
>> recommendation.
>>
>> I'd also second the recommendation to use WinGridProxy for examining the
>> protocols used by OpenSimulator. It is a very effective and useful tool.
>>
>> On Fri, Nov 14, 2014 at 9:38 AM, Mic Bowman <cmickeyb at gmail.com> wrote:
>>
>>> Hi Michael!
>>>
>>> Pulling out what should already be http over tcp based (textures &
>>> inventory & profiles & ...)...
>>>
>>> Just to be clear... talking very specifically about object updates only
>>> (though terrain updates are similar in nature), you DO NOT want TCP if your
>>> error rates are over 1%. you'll waste a huge amount of bw with
>>> network-level retransmission of old, out-dated updates. It's specifically in
>>> situations where you have high update rates that you want to send current
>>> object state on re-transmission rather than re-transmitting an old packet
>>> that contains an update that has already been superseded.
>>>
>>> That being said... the place where UDP is the wrong protocol is the
>>> initial scene load. It looks a lot like a bunch of one-time object update
>>> packets where the state of the object does not change.
>>>
>>> The approach I spent some time looking at was to create a local scene
>>> proxy... run an opensim instance on your local client that you connect to
>>> once... then that instance mirrors remote scenes (kind of like the dynamic
>>> scene demonstration at oscc except that the scenes are coming from existing
>>> scenes rather than an archive). The proxy idea was inspired by the
>>> architecture that Stefan Anderson used years ago in their opensim service
>>> (I can't remember what it was called).
>>>
>>> The proxy design assumes that making major protocol changes to the viewer
>>> is really hard & expensive, and that local accesses are fast and highly
>>> reliable. The proxy would enable custom protocols for connection to the
>>> remote scene which would be a lot easier to optimize (e.g. bulk transfer of
>>> an oar-like file for initial scene load). The proxy idea got far enough to
>>> show that it could work for a "consumption" experience... but it would take
>>> a lot of work to hook up enough events to make it completely transparent.
>>>
>>> --mic
>>>
>>>
>>>
>>> On Fri, Nov 14, 2014 at 9:11 AM, Michael Heilmann <mheilman at ist.ucf.edu> wrote:
>>>
>>>> I agree completely with the fact that many packets are time-sensitive
>>>> and expire quickly, which is why I was looking for a TCP/UDP pair, not
>>>> dropping UDP in favour of TCP.  I am concerned, however, that UDP is the
>>>> only mechanism for asynchronously sending information to the client that
>>>> may be important, and not expire quickly.
>>>>
>>>> We regularly deal with users who have less than reliable networks, such
>>>> as cellular wifi access points at remote locations, and cannot assume that
>>>> our packet loss will be below 1%.
>>>>
>>>> --
>>>> Michael Heilmann
>>>> Research Associate
>>>> Institute for Simulation and Training
>>>> University of Central Florida
>>>>
>>>>
>>>>
>>>> Date: Fri, 14 Nov 2014 08:30:07 -0800
>>>> From: Mic Bowman <cmickeyb at gmail.com>
>>>> To: opensim-dev at opensimulator.org
>>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>>> Message-ID: <CAJaF1_HXXSTFj2KhxDXH4-f_8aSYnWon+QON-NWUn1DommLh4A at mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> having done a lot of the token bucket & retransmission work in opensim... i
>>>> started with a strong belief that TCP & reliable streams were the right way
>>>> to do object updates (for all bulk content like asset transfers, tcp, or
>>>> http layered on tcp, is definitely the right way). however... what we found
>>>> is that object updates should not be retransmitted at the packet level
>>>> which is what you get from tcp (that is... tcp handles retransmission of
>>>> dropped packets not opensim). the reason is pretty simple... dynamic
>>>> objects change position repeatedly (even if only in short bursts).
>>>> retransmission at the packet level would almost always retransmit **OLD**
>>>> state. for any kind of connection (this was often a problem for
>>>> international connections) where packet drops were even as high as 1%, we
>>>> would see multiple updates for the same object queued to send. only the
>>>> last one has any meaning.
>>>>
>>>> so... while the situations that call for application level retransmission
>>>> are rare (**very** rare)... object updates happen to be one of them. moving
>>>> reliability into a lower level (either layering some protocol on tcp or os
>>>> implemented reliable udp) will result in lower efficiency for faulty
>>>> connections (and if your connection isn't faulty then udp works just fine).
>>>>
>>>> --mic
>>>>
>>>>
>>>> On Fri, Nov 14, 2014 at 7:17 AM, Amit Goel <Amit.Goel at ucf.edu> wrote:
>>>>
>>>>
>>>> Agree with Michael here that having TCP is better than using UDP with your
>>>> own homegrown TCP implementation on top of it in application software.
>>>>
>>>> How about Reliable UDP:
>>>> http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/Reliable-UDP-(RUDP)-The-Next-Big-Streaming-Protocol-85316.aspx
>>>> http://www.javvin.com/protocolRUDP.html
>>>>
>>>> I have not studied in detail how it fits in between lossy UDP and
>>>> time-consuming TCP.
>>>>
>>>> Regards
>>>>
>>>> -- amit
>>>> ________________________________________
>>>> From: opensim-dev-bounces at opensimulator.org on behalf of Michael Heilmann
>>>> [mheilman at ist.ucf.edu]
>>>> Sent: Friday, November 14, 2014 9:23 AM
>>>> To: opensim-dev at opensimulator.org
>>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>>>
>>>> Thanks for the responses.  I'll go into a little more detail:
>>>>
>>>> We have been running several profilers against OpenSimulator on the
>>>> MOSES grid, and on my development machine.  The tests were to examine
>>>> the loading on the server under several different loads, specifically
>>>> mesh and physics loads.  What we found appears to be that no matter what
>>>> kind of load we placed on the region, even to the point of it becoming
>>>> unresponsive due to physics and mesh, the time spent on scripting and
>>>> physics was nowhere near the amount of time spent in
>>>> OpenSim.Region.ClientStack.LindenUDP once we had more than one or two
>>>> avatars logged in.  We know from previous investigations at our firewall
>>>> that network traffic for OpenSim is not that heavy, especially with low
>>>> numbers of users.
>>>>
>>>> I ran several Wireshark captures against a Firestorm viewer logging into
>>>> the MOSES public grid ABWIS region, where we hold our office hours.  I
>>>> saw that with our current configuration, all traffic between the server
>>>> and my client, with the exception of http CAPS and fsapi calls, was UDP
>>>> traffic.  This is not immediately concerning, as we have simian serve
>>>> our mesh and textures directly. The messages are mostly binary
>>>> information, so I could not examine closely, but I did see a lot of
>>>> messages containing identical ASCII strings, such as the name of my avatar.
>>>>
>>>> My primary concern is the amount of time spent handling networking, not
>>>> necessarily the networking itself.  But there is at least a portion of
>>>> messages on the UDP pipeline that are either reliable, or perhaps should
>>>> be; and re-implementing a reliable transport over udp introduces load at
>>>> the application layer, instead of letting a low-level reliable transport
>>>> such as tcp handle it.  I went to university with a guy who implemented
>>>> a java networking library completely over UDP, believing that it was
>>>> faster than a normal TCP socket; but he was neglecting that the
>>>> networking hardware handles the ACK and retransmission transparently,
>>>> and without needing for the messages to be handled manually by the
>>>> application.
>>>>
>>>> This may just be my opinion, but since I was going to be examining the
>>>> network stack anyway, and since a client-server scenario typically
>>>> benefits from a persistent reliable connection over which the server
>>>> can push important events to the client, I thought it would be a good idea.
>>>> The points about network throttling and QoS are taken, but wouldn't they
>>>> also typically affect the UDP  stream? Working on MOSES I have plenty of
>>>> problems dealing with external users who operate on restricted networks,
>>>> and they cannot see traffic aside from 80 and 443 without dealing with
>>>> their own IT personnel.  The fact that it is HTTP over TCP instead of
>>>> raw TCP makes no difference once it is on a non-standard HTTP port.
>>>>
>>>> I agree that it would be more prudent to look at improving the websocket
>>>> code and the http server, rather than replace it with a raw TCP socket,
>>>> especially given that there are multiple plugins, such as jsonsimstats,
>>>> that use the http functionality directly.
>>>>
>>>> I hope that explains my position a little better.  I would love to hear
>>>> if there are other plans/ideas in the community to address time-sinks
>>>> like this one; networking simply appears to us as a good starting point
>>>> to increase performance and scalability of the system.
>>>>
>>>> --
>>>> Michael Heilmann
>>>> Research Associate
>>>> Institute for Simulation and Training
>>>> University of Central Florida
>>>>
>>>>
>>>>
>>>>
>>>> Date: Thu, 13 Nov 2014 13:50:32 -0800
>>>> From: Diva Canto <diva at metaverseink.com>
>>>> To: opensim-dev at opensimulator.org
>>>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>>> Message-ID: <546527A8.3040909 at metaverseink.com>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>> What problem are you trying to solve? It's hard to comment without
>>>> knowing what you want to achieve, but here are some random observations
>>>> that you may want to take into account.
>>>>
>>>> As far as I remember, the reliable packets are a very small percentage
>>>> of the UDP traffic at this point, so I'm not sure it's worth creating a
>>>> dedicated TCP channel for them. Moving the HTTP traffic to the same TCP
>>>> connection seems like a bad idea, as the HTTP traffic tends to be
>>>> dominated by big data (textures, assets, etc) which would then get in
>>>> the way of the small packets like AgentOnline, etc. I suspect it would
>>>> make the client less responsive than what it is now.
>>>>
>>>> There is already support for WebSockets in OpenSim. It may not be
>>>> complete, so I would encourage you to build on that. I am aware of WebGL
>>>> clients that use WebSockets with OpenSim, and they have the same problem
>>>> as described above: the big data gets in the way of the small packets,
>>>> making the clients less responsive at points. But since WebGL is
>>>> inevitable, your effort is probably best invested in this than in a TCP
>>>> channel.
>>>>
>>>> Best,
>>>> Diva
>>>>
>>>> On 11/13/2014 1:18 PM, Michael Heilmann wrote:
>>>>
>>>> Greetings everyone
>>>>
>>>> I and another MOSES developer are going to be looking at the
>>>> client/server network stack, as well as the processing queues used
>>>> for incoming and outgoing packets.  I am going to see if I can
>>>> implement a client stack on opensim and firestorm that uses the
>>>> traditional TCP/UDP pairing for this type of client<->server
>>>> relationship.  I have two thoughts, but I am interested in hearing if
>>>> you have ideas or insight into this particular space.
>>>>
>>>> Idea 1:
>>>>     Add a dedicated tcp port next to the UDP port, and move reliable
>>>> transport transmissions to the tcp port.  I am uncomfortable
>>>> increasing the required ports for each region, but the http server is
>>>> in the way.  I can look to move all communications from http to a tcp
>>>> socket-server type of deployment, at the expense of simple POST/GET
>>>> operations.
>>>>
>>>> Idea 2:
>>>>     Look into increasing the performance of the http server of the
>>>> regions, as well as testing/implementing a full websockets
>>>> implementation, and using the websockets upgrade for consistent client
>>>> connections.  This could eventually lead to javascript-based clients,
>>>> and does not remove http functionality.
>>>>
>>>> Either idea would see any traffic requiring reliable transport shifted
>>>> off of the current UDP stack, and onto the tcp reliable transport.
>>>> Either idea also will require modifications to a client to match.  I,
>>>> and another developer here, would be developing the client code, the
>>>> region code, and testing against a MOSES deployment.  As we are MOSES
>>>> developers, we would be working against simian instead of Robust; so
>>>> there would be a gap for regular Robust-based grids.
>>>>
>>>> If you could lend me your opinions about these ideas, the management
>>>> queues, associated problems in opensim, etc, I would really appreciate
>>>> it.  We would be working completely in the open on github, and obeying
>>>> all licensing.  We would welcome any and all cooperation, and we will
>>>> cooperate ourselves wherever we are welcome, but we are not interested
>>>> in avoiding positive changes to maintain SecondLife compatibility.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> _______________________________________________
>>>> Opensim-dev mailing list
>>>> Opensim-dev at opensimulator.org
>>>> http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>>>
>

-- 
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc