[Opensim-dev] Modifying the networking stack (UNCLASSIFIED)

Tue Nov 18 21:51:54 UTC 2014

These are all fixable issues, either with pCampbot improvements or by distributing pCampbot instances amongst more 
machines.  I expect pCampbot will be built upon to address these points as required.  This year I successfully ran bots 
on 4 Amazon c2 large instances, so a more realistic network load is a matter of spinning up more cloud instances.

I agree that unless you can reproduce an issue you are shooting in the dark with any changes.  And organizing enough 
real people to reproduce issues on a regular basis, without a huge amount of other confusing behaviour, is impossible 
in practice.

On 14/11/14 16:46, Maxwell, Douglas CIV USARMY ARL (US) wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> Dr. Lopez, thank you for sharing your paper.  Can you tell me where it was
> peer reviewed and published?  I would like to reference it in my
> dissertation.
>
> On the topic of bots, the MOSES team has not been able to compose an NPC
> agent or bot that accurately replicates the footprint of a human agent on the
> simulator.  We believe this is for many reasons:
>
> 1)  Bots are usually composed on a server on the same network, not dispersed
> across the internet.  The bots should be software throttled and noise
> introduced into their connections to approximate random access.
>
> 2)  Bots aren't using full clients, so they are not filling caches and
> making the same scene requests as humans in graphical clients.
>
> 3)  Bots are usually homogeneous.  They need to be randomly dressed, have
> random attachments, and have random inventories.
>
> 4)  Bots need to move randomly and collide with objects in the scene and
> with each other.
>
> 5)  Bots need to randomly chat with each other and broadcast locally.
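
Point 1 above (software throttling plus connection noise) could be sketched roughly as follows. This is only an illustration in Python, not pCampbot code: the TokenBucket class, the jittered_delay helper and all the numbers are hypothetical names and values chosen for the example.

```python
import random


class TokenBucket:
    """Hypothetical software throttle: at most `rate` bytes/second,
    with bursts of up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0                # time of the previous refill

    def try_send(self, nbytes, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should queue or drop the packet


def jittered_delay(base_latency, jitter, rng):
    """Simulated one-way delay: base latency plus uniform noise, never negative."""
    return max(0.0, base_latency + rng.uniform(-jitter, jitter))
```

Each bot would hold its own bucket and random generator, so that 100 bots on one LAN do not all present the same clean, unthrottled link to the simulator.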
>
> We think we can create an NPC solution that addresses these issues.  It will
> take some thought and development.  Has anyone come close to this?
>
> Goal:  Compose bots/NPCs that can approximate the loads of humans within 90%
> certainty.  Meaning if we load 100 of these artificial agents into the
> MOSES, we are certain that it will accurately behave as if at least 90
> humans are logged in.
>
> IMHO, if you can't assign a reliability to a test, then you are just wasting
> your time.  This is a basic V&V tenet.
>
> v/r -douglas
>
> Douglas Maxwell, MSME
> Science and Technology Manager
> Virtual World Strategic Applications
> U.S. Army Research Lab
> Simulation & Training Technology Center (STTC)
> (c) (407) 242-0209
>
>
>
> -----Original Message-----
> From: opensim-dev-bounces at opensimulator.org
> [mailto:opensim-dev-bounces at opensimulator.org] On Behalf Of Diva Canto
> Sent: Friday, November 14, 2014 11:05 AM
> To: opensim-dev at opensimulator.org
> Subject: Re: [Opensim-dev] Modifying the networking stack
>
> On 11/14/2014 6:23 AM, Michael Heilmann wrote:
>> Thanks for the responses.  I'll go into a little more detail:
>>
>> We have been running several profilers against OpenSimulator on the
>> MOSES grid, and on my development machine.  The tests were to examine
>> the loading on the server under several different loads, specifically
>> mesh and physics loads.  What we appear to have found is that no matter
>> what kind of load we placed on the region, even to the point of it
>> becoming unresponsive due to physics and mesh, the time spent on
>> scripting and physics was nowhere near the amount of time spent in
>> OpenSim.Region.ClientStack.LindenUDP once we had more than one or two
>> avatars logged in.  We know from previous investigations at our
>> firewall that network traffic for OpenSim is not that heavy,
>> especially with low numbers of users.
>
> If this is a problem, and you are running a recent-ish version of core
> OpenSim, it sounds like some misconfiguration somewhere. Back in the summer
> of 2013 we had a problem with the server running OSCC'13; the kernel was
> configured to run in some sort of special mode that was making everything
> run badly and unpredictably. We fixed the kernel configuration, and suddenly
> things started running much more smoothly-- I don't remember the details,
> but Nebadon may clarify things.
>
> OpenSim these days can handle 50 people on a single simulator without much
> trouble. If you look at figure 7 of my paper
> (http://www.ics.uci.edu/~lopes/documents/summersim14/gabrielova_lopes_preprint.pdf)
> you will see the quantification of "without much trouble." I suggest that
> you reproduce my experimental conditions with pCampBot and check whether your
> numbers are very different from ours. If they are very different, then
> there's definitely something odd in your setup, as we were able to reproduce
> these numbers in several machines. Feel free to contact me directly for
> details about pCampBot configuration.
>
> Bots aren't real viewers, but they are much better for measuring things
> systematically and detecting problems and bottlenecks than relying on real
> users driven by real people. The performance you get with pCampBot will be
> correlated with the performance you get with real users.
>
>
>> I ran several Wireshark captures against a Firestorm viewer logging
>> into the MOSES public grid ABWIS region, where we hold our office
>> hours.  I saw that with our current configuration, all traffic between
>> the server and my client, with the exception of HTTP CAPS and fsapi
>> calls, was UDP traffic.  This is not immediately concerning, as we
>> have simian serve our mesh and textures directly. The messages are
>> mostly binary information, so I could not examine closely, but I did
>> see a lot of messages containing identical ASCII strings, such as the
>> name of my avatar.
>
> Hard to say what you saw, but I bet those are the AgentUpdate messages that
> I mentioned before. The viewer sends at least 10/sec. At points, the viewer
> sends much more than 10/sec, up to 60/sec. Again, take a look at my paper
> for understanding what those are, and how OpenSim deals with them since
> OSCC'13.
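
To put the rates quoted above in perspective: 50 viewers at 10 AgentUpdates/sec is 500 messages/sec arriving at the simulator, and 3000/sec if every viewer bursts to 60/sec. The sketch below does that arithmetic and shows one way a server might discard updates that carry no meaningful change. It is a hypothetical illustration only, not the code OpenSim actually uses; UpdateFilter, is_significant and the tolerance value are made-up names for the example.

```python
def agent_update_rate(avatars, per_viewer_hz):
    """Total AgentUpdate messages per second reaching the simulator."""
    return avatars * per_viewer_hz


class UpdateFilter:
    """Hypothetical filter that drops updates whose numeric state is
    within `tolerance` of the last update accepted for that agent."""

    def __init__(self, tolerance=0.001):
        self.tolerance = tolerance
        self.last = {}  # agent_id -> last accepted state tuple

    def is_significant(self, agent_id, state):
        prev = self.last.get(agent_id)
        if (prev is not None and len(prev) == len(state)
                and all(abs(a - b) <= self.tolerance
                        for a, b in zip(prev, state))):
            return False  # nothing meaningful changed; skip processing
        # Remember only accepted states, so slow drift eventually registers.
        self.last[agent_id] = tuple(state)
        return True
```

An idle avatar then costs one cheap comparison per incoming message instead of a full update pass through the scene.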
>
> As I said before, it would be nice to understand why the viewer is so eager
> to blabber its status to the server when nothing is going on.
>
>
>> My primary concern is the amount of time spent handling networking,
>> not necessarily the networking itself.  But there is at least a
>> portion of messages on the UDP pipeline that are either reliable, or
>> perhaps should be; and re-implementing a reliable transport over UDP
>> introduces load at the application layer, instead of letting a
>> low-level reliable transport such as TCP handle it.  I went to
>> university with a guy who implemented a Java networking library
>> completely over UDP, believing that it was faster than a normal TCP
>> socket; but he was neglecting that the networking hardware handles the
>> ACK and retransmission transparently, without the messages needing
>> to be handled manually by the application.
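
For a sense of the application-layer work being described, here is a minimal sketch of the bookkeeping a reliable-over-UDP sender has to do itself: sequence numbers, a table of unacknowledged packets, and timed retransmission. It is a generic Python illustration, not the LindenUDP implementation; ReliableSender, send_fn and the timeout values are hypothetical.

```python
class ReliableSender:
    """Hypothetical reliable-delivery bookkeeping over an unreliable send_fn."""

    def __init__(self, send_fn, timeout=0.5, max_retries=3):
        self.send_fn = send_fn      # callable(seq, payload), e.g. a UDP send
        self.timeout = timeout      # seconds to wait for an ACK
        self.max_retries = max_retries
        self.next_seq = 0
        self.pending = {}           # seq -> [payload, last_sent_time, retries]

    def send(self, payload, now):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = [payload, now, 0]
        self.send_fn(seq, payload)
        return seq

    def ack(self, seq):
        # The peer confirmed receipt; stop tracking the packet.
        self.pending.pop(seq, None)

    def tick(self, now):
        """Retransmit overdue packets; return sequence numbers given up on."""
        dropped = []
        for seq, entry in list(self.pending.items()):
            payload, last_sent, retries = entry
            if now - last_sent < self.timeout:
                continue
            if retries >= self.max_retries:
                del self.pending[seq]
                dropped.append(seq)
            else:
                entry[1] = now
                entry[2] = retries + 1
                self.send_fn(seq, payload)
        return dropped
```

Every call to tick() walks the pending table in application code, on every connection, which is the kind of per-client overhead that TCP would otherwise absorb below the application.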
>>
>> This may just be my opinion, but since I was going to be examining the
>> network stack anyway, and since a client-server scenario typically
>> benefits from a persistent reliable connection over which the server
>> can push important events to the client, I thought it would be a good
>> idea.  The points about network throttling and QoS are taken, but
>> wouldn't they also typically affect the UDP stream? Working on MOSES I
>> have plenty of problems dealing with external users who operate on
>> restricted networks, and they cannot see traffic aside from 80 and 443
>> without dealing with their own IT personnel.  The fact that it is HTTP
>> over TCP instead of raw TCP makes no difference once it is on a
>> non-standard HTTP port.
>>
>> I agree that it would be more prudent to look at improving the
>> websocket code and the http server, rather than replace it with a raw
>> TCP socket, especially given that there are multiple plugins, such as
>> jsonsimstats, that use the http functionality directly.
>>
>> I hope that explains my position a little better.  I would love to
>> hear if there are other plans/ideas in the community to address
>> time-sinks like this one; networking simply appears to us as a good
>> starting point to increase performance and scalability of the system.
>>
>
> _______________________________________________
> Opensim-dev mailing list
> Opensim-dev at opensimulator.org
> http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>

-- 
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc