[Opensim-dev] Modifying the networking stack (UNCLASSIFIED)

Tue Nov 18 22:17:02 UTC 2014

justin,

does pCampbot allow us to drop random incoming messages? one thing about
ec2 instances is that they are (generally) well connected. adding random or
bursty packet drops might help test networking problems more effectively.
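
A rough sketch of the difference between random and bursty drops, using a
two-state (Gilbert-Elliott style) loss model. This is not pCampbot code and
does not use any pCampbot API; the names BurstyDropper and filter_packets
and all of the probabilities are made up for illustration. Something like
this would have to sit between the bot and its socket.

```python
import random


class BurstyDropper:
    """Two-state loss model: a 'good' state with light loss and a 'bad'
    state with heavy loss. All default probabilities are illustrative."""

    def __init__(self, p_good_to_bad=0.02, p_bad_to_good=0.25,
                 loss_good=0.01, loss_bad=0.8, seed=None):
        self.p_good_to_bad = p_good_to_bad
        self.p_bad_to_good = p_bad_to_good
        self.loss_good = loss_good
        self.loss_bad = loss_bad
        self.bad = False
        self.rng = random.Random(seed)

    def should_drop(self):
        # First decide whether the simulated link changes state...
        if self.bad:
            if self.rng.random() < self.p_bad_to_good:
                self.bad = False
        elif self.rng.random() < self.p_good_to_bad:
            self.bad = True
        # ...then drop with the loss probability of the current state.
        loss = self.loss_bad if self.bad else self.loss_good
        return self.rng.random() < loss


def filter_packets(packets, dropper):
    """Return only the packets that survive the simulated lossy link."""
    return [p for p in packets if not dropper.should_drop()]


if __name__ == "__main__":
    dropper = BurstyDropper(seed=1)
    survivors = filter_packets(list(range(1000)), dropper)
    print(f"{len(survivors)} of 1000 packets survived")
```

Setting p_good_to_bad to 0 reduces this to plain uniform random loss at
loss_good, so the same hook could cover both cases.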

--mic

On Tue, Nov 18, 2014 at 1:51 PM, Justin Clark-Casey <
jjustincc at googlemail.com> wrote:

> These are all fixable issues, either with pCampbot improvements or
> distributing pCampbot instances amongst more machines.  I expect pCampbot
> will be built upon to address these points as required.  And this year I
> successfully used 4 Amazon c2 large instances for bot running so a more
> realistic network load means spinning up more cloud instances.
>
> I agree that unless you can reproduce an issue you are shooting in the
> dark with any changes.  And organizing enough real people to reproduce
> issues on a regular basis, without a huge amount of other confusing
> behaviour, is impossible in practice.
>
> On 14/11/14 16:46, Maxwell, Douglas CIV USARMY ARL (US) wrote:
>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>> Dr. Lopes, thank you for sharing your paper.  Can you tell me where it was
>> peer reviewed and published?  I would like to reference it in my
>> dissertation.
>>
>> On the topic of bots, the MOSES team has not been able to compose an NPC
>> agent or bot that accurately replicates the footprint of a human agent on
>> the simulator.  We believe this is for many reasons:
>>
>>
>> 1)  Bots are usually composed on a server on the same network, not
>> dispersed across the internet.  The bots should be throttled in software,
>> with noise introduced into their connections, to approximate random access.
>>
>> 2)  Bots aren't using full clients, so they are not filling caches and
>> making the same scene requests as humans in graphical clients.
>>
>> 3)  Bots are usually homogeneous.  They need to be randomly dressed, have
>> random attachments, and have random inventories.
>>
>> 4)  Bots need to move randomly and collide with objects in the scene and
>> with each other.
>>
>> 5)  Bots need to randomly chat with each other and broadcast locally.
>>
>> We think we can create an NPC solution that addresses these issues.  It
>> will take some thought and development.  Has anyone come close to this?
>>
>> Goal:  Compose bots/NPCs that can approximate the loads of humans within
>> 90% certainty.  Meaning if we load 100 of these artificial agents into
>> MOSES, we are certain that it will accurately behave as if at least 90
>> humans are logged in.
>>
>> IMHO, if you can't assign a reliability to a test, then you are just
>> wasting your time.  These are basic V&V tenets.
>>
>> v/r -douglas
>>
>> Douglas Maxwell, MSME
>> Science and Technology Manager
>> Virtual World Strategic Applications
>> U.S. Army Research Lab
>> Simulation & Training Technology Center (STTC)
>> (c) (407) 242-0209
>>
>>
>>
>> -----Original Message-----
>> From: opensim-dev-bounces at opensimulator.org
>> [mailto:opensim-dev-bounces at opensimulator.org] On Behalf Of Diva Canto
>> Sent: Friday, November 14, 2014 11:05 AM
>> To: opensim-dev at opensimulator.org
>> Subject: Re: [Opensim-dev] Modifying the networking stack
>>
>> On 11/14/2014 6:23 AM, Michael Heilmann wrote:
>>
>>> Thanks for the responses.  I'll go into a little more detail:
>>>
>>> We have been running several profilers against OpenSimulator on the
>>> MOSES grid, and on my development machine.  The tests were to examine
>>> the loading on the server under several different loads, specifically
>>> mesh and physics loads.  What we found is that no matter what kind of
>>> load we placed on the region, even to the point of it becoming
>>> unresponsive due to physics and mesh, the time spent on scripting and
>>> physics was nowhere near the amount of time spent in
>>> OpenSim.Region.ClientStack.LindenUDP once we had more than one or two
>>> avatars logged in.  We know from previous investigations at our
>>> firewall that network traffic for OpenSim is not that heavy,
>>> especially with low numbers of users.
>>>
>>
>> If this is a problem, and you are running a recent-ish version of core
>> OpenSim, it sounds like some misconfiguration somewhere. Back in the
>> summer of 2013 we had a problem with the server running OSCC'13; the
>> kernel was configured to run in some sort of special mode that was making
>> everything run badly and unpredictably. We fixed the kernel configuration,
>> and suddenly things started running much more smoothly. I don't remember
>> the details, but Nebadon may clarify things.
>>
>> OpenSim these days can handle 50 people on a single simulator without much
>> trouble. If you look at figure 7 of my paper
>> (http://www.ics.uci.edu/~lopes/documents/summersim14/gabrielova_lopes_preprint.pdf)
>> you will see the quantification of "without much trouble." I suggest that
>> you reproduce my experimental conditions with pCampBot and check whether
>> your numbers are very different from ours. If they are very different,
>> then there's definitely something odd in your setup, as we were able to
>> reproduce these numbers on several machines. Feel free to contact me
>> directly for details about pCampBot configuration.
>>
>> Bots aren't real viewers, but they are much better for measuring things
>> systematically and detecting problems and bottlenecks than relying on real
>> users driven by real people. The performance you get with pCampBot will be
>> correlated with the performance you get with real users.
>>
>>
>>> I ran several Wireshark captures against a Firestorm viewer logging
>>> into the MOSES public grid ABWIS region, where we hold our office
>>> hours.  I saw that with our current configuration, all traffic between
>>> the server and my client, with the exception of http CAPS and fsapi
>>> calls, was UDP traffic.  This is not immediately concerning, as we
>>> have simian serve our mesh and textures directly. The messages are
>>> mostly binary information, so I could not examine closely, but I did
>>> see a lot of messages containing identical ASCII strings, such as the
>>> name of my avatar.
>>>
>>
>> Hard to say what you saw, but I bet those are the AgentUpdate messages
>> that I mentioned before. The viewer sends at least 10/sec. At points, the
>> viewer sends much more than 10/sec, up to 60/sec. Again, take a look at
>> my paper to understand what those are, and how OpenSim has dealt with
>> them since OSCC'13.
>>
>> As I said before, it would be nice to understand why the viewer is so
>> eager to blabber its status to the server when nothing is going on.
>>
>>
>>> My primary concern is the amount of time spent handling networking,
>>> not necessarily the networking itself.  But there is at least a
>>> portion of messages on the UDP pipeline that are either reliable, or
>>> perhaps should be; and re-implementing a reliable transport over udp
>>> introduces load at the application layer, instead of letting a
>>> low-level reliable transport such as tcp handle it.  I went to
>>> university with a guy who implemented a java networking library
>>> completely over UDP, believing that it was faster than a normal TCP
>>> socket; but he was neglecting that the operating system's TCP stack
>>> (with help from the networking hardware) handles the ACK and
>>> retransmission transparently, without the messages needing to be
>>> handled manually by the application.
>>>
>>> This may just be my opinion, but since I was going to be examining the
>>> network stack anyway, and since a client-server scenario typically
>>> benefits from a persistent reliable connection over which the server
>>> can push important events to the client, I thought it would be a good
>>> idea.  The points about network throttling and QoS are taken, but
>>> wouldn't they also typically affect the UDP stream? Working on MOSES I
>>> have plenty of problems dealing with external users who operate on
>>> restricted networks, and they cannot see traffic aside from 80 and 443
>>> without dealing with their own IT personnel.  The fact that it is HTTP
>>> over TCP instead of raw TCP makes no difference once it is on a
>>> non-standard HTTP port.
>>>
>>> I agree that it would be more prudent to look at improving the
>>> websocket code and the http server, rather than replace it with a raw
>>> TCP socket, especially given that there are multiple plugins, such as
>>> jsonsimstats, that use the http functionality directly.
>>>
>>> I hope that explains my position a little better.  I would love to
>>> hear if there are other plans/ideas in the community to address
>>> time-sinks like this one; networking simply appears to us as a good
>>> starting point to increase performance and scalability of the system.
>>>
>>>
>> _______________________________________________
>> Opensim-dev mailing list
>> Opensim-dev at opensimulator.org
>> http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>>
>
> --
> Justin Clark-Casey (justincc)
> OSVW Consulting
> http://justincc.org
> http://twitter.com/justincc
>