[Opensim-dev] Question about the udp receiver algorithm

Dahlia Trimble dahliatrimble at gmail.com
Tue Apr 29 01:19:13 UTC 2014


I agree the present use of threads is far from optimal, but I believe it
is more a matter of convenience that was overused, partly because the
codebase originated with many diverse contributors with diverse goals and
a naive, but evolving, initial design. That is probably unavoidable in
open source projects of this size.

However, threadpool threads are not necessarily as inefficient as some of
the messages in this thread might imply. They end up using very few
operating system threads, share those threads among many tasks, and do
context switching in user space rather than requiring kernel traps. This
reduces the normal costs associated with using threads and can provide an
efficient means of multitasking within a process if used correctly.

A discussion of recommended ways of performing asynchronous IO with a
threadpool can be found here:
http://msdn.microsoft.com/en-us/library/ms973903.aspx#threadpool_topic6
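
For illustration only (this is not code from the OpenSim tree), here is a
minimal sketch of queueing many small work items to the CLR threadpool
rather than creating a dedicated thread per task; the pool multiplexes
them onto a handful of operating system threads:

    using System;
    using System.Threading;

    class ThreadPoolSketch
    {
        static void Main()
        {
            using (var done = new CountdownEvent(100))
            {
                for (int i = 0; i < 100; i++)
                {
                    int item = i;   // stable copy for the closure
                    ThreadPool.QueueUserWorkItem(_ =>
                    {
                        // Simulated small packet-handling task.
                        Console.WriteLine("item " + item + " on pool thread "
                            + Thread.CurrentThread.ManagedThreadId);
                        done.Signal();
                    });
                }

                // All 100 items complete on a handful of pool threads.
                done.Wait();
            }
        }
    }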


On Mon, Apr 28, 2014 at 4:49 PM, Matt Lehmann <mattlehma at gmail.com> wrote:

> Thanks so much for the explanation Justin.
>
> I agree that networking issues like this are hard to measure.
>
> I guess my concern is that, after reading a lot of the OpenSim code,
> there seems to be a prevalent notion that concurrency will solve issues
> with responsiveness.  It's treated like a magic box where you can
> 'fire and forget.'
>
> If a computation is going to lag your system, then it is going to lag
> your system whether you do it now or later.  What's more, all the
> spawning of threads will definitely cause its own problems.
>
> I do agree though that a database call might be made in its own thread,
> so that the CPU can do other things while the disk is spinning. In this
> case it might be prudent to implement a more complicated scheme involving
> cached database management.
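>
> As a rough sketch of that idea (hypothetical names, nothing taken from
> the actual OpenSim code), the blocking database call could be pushed
> onto a pool thread while the caller keeps working:
>
>     using System;
>     using System.Threading.Tasks;
>
>     class DbOffloadSketch
>     {
>         // Stand-in for a blocking database read (hypothetical).
>         static string LoadFromDatabase(string key)
>         {
>             Task.Delay(50).Wait();        // pretend the disk is spinning
>             return "row for " + key;
>         }
>
>         static void Main()
>         {
>             // Start the blocking call on a pool thread...
>             Task<string> fetch = Task.Run(() => LoadFromDatabase("asset-123"));
>
>             // ...so this thread can do other work in the meantime.
>             Console.WriteLine("doing other work while the disk spins");
>
>             // Rendezvous only when the result is actually needed.
>             Console.WriteLine(fetch.Result);
>         }
>     }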
>
> Anyway, thanks so much for the work you all do on OpenSim.  It's a
> great system.
> On Apr 28, 2014 4:01 PM, "Justin Clark-Casey" <jjustincc at googlemail.com>
> wrote:
>
>> That's correct.  I added a high-level summary to [1].
>>
>> My expectation is that in many situations the async packets are indeed
>> handled pretty quickly, very often with only a single thread from the pool
>> in use.  I've seen this when analyzing threadpool data from the crude stats
>> recording mechanism [2].
>>
>> However, blocking IO can indeed be a problem, often due to heavy load on
>> services.  For instance, it used to be the case that some asset fetches
>> were performed asynchronously, so if the asset service was slow, processing
>> triggered by inbound packet handling would be held up.
>>
>> If one were running the two-tier system that you described, then this
>> would hold up processing of all second-tier packets more than the current
>> system does.  Perhaps one could forensically identify which handlers could
>> be subject to this problem and handle those differently from others.
>>
>> It's a complex topic as OpenSimulator is very much an evolved (and
>> evolving) codebase, where delays can occur in unexpected places and there's
>> a huge variance in network and hardware conditions.  In this case, I can
>> imagine that our async handling is more resilient in cases where only a few
>> requests are slow.
>>
>> I believe the key is trying to measure the performance change of a
>> second-tier loop to see if the potential gains are worth the potential
>> problems.
>>
>> [1] http://opensimulator.org/wiki/LLUDP_ClientStack#Inbound_UDP
>> [2] http://opensimulator.org/wiki/Show_stats#stats_record
>>
>> On 26/04/14 13:53, Matt Lehmann wrote:
>>
>>> I looked at this a bit more this morning.
>>>
>>> So, as I understand it, the handling looks like this:
>>>
>>> -- an async read cycle which drops packets into a blocking queue
>>> -- a smart thread which services the blocking queue and calls the
>>> LLClientView method ProcessInPacket
>>>
>>> LLClientView sorts the packets according to whether the handler should
>>> be called asynchronously or not.
>>>
>>> If async is needed, LLClientView will create a smart thread for the
>>> handler, and start the thread.
>>> ...the handlers basically signal the events defined in LLClientView
>>> which are listened to by one or more other callbacks.
>>> If async is not needed/desired, then LLClientView will process the
>>> packet directly.
>>>
>>> So there is one additional thread being created for each async handler,
>>> with the original smart thread running all the
>>> non-async packet handlers.
>>>
>>> The question is: can these async threads be replaced by a second
>>> smart thread which services a queue of async
>>> handlers?  Do the handlers require some sort of blocking I/O?  Can we
>>> rearrange the handlers to operate under these
>>> conditions?
>>>
>>> If the answer is yes, then a great many compute cycles could be saved
>>> by consolidating all the spawned threads into a
>>> single consumer thread loop.
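>>>
>>> As a rough sketch of that consolidation (illustrative only; these are
>>> not the real OpenSim classes), the spawned handlers become delegates on
>>> a blocking queue drained by one long-lived thread:
>>>
>>>     using System;
>>>     using System.Collections.Concurrent;
>>>     using System.Threading;
>>>
>>>     class AsyncHandlerLoopSketch
>>>     {
>>>         // Queue of deferred packet handlers.
>>>         static readonly BlockingCollection<Action> _asyncHandlers =
>>>             new BlockingCollection<Action>();
>>>
>>>         // Producer side: instead of spawning a thread per async packet,
>>>         // enqueue the handler delegate.
>>>         static void QueueAsyncHandler(Action handler)
>>>         {
>>>             _asyncHandlers.Add(handler);
>>>         }
>>>
>>>         // Consumer side: one long-lived thread drains the queue.
>>>         static void HandlerLoop()
>>>         {
>>>             foreach (Action handler in _asyncHandlers.GetConsumingEnumerable())
>>>                 handler();
>>>         }
>>>
>>>         static void Main()
>>>         {
>>>             new Thread(HandlerLoop) { IsBackground = true }.Start();
>>>
>>>             for (int i = 0; i < 5; i++)
>>>             {
>>>                 int n = i;   // stable copy for the closure
>>>                 QueueAsyncHandler(() => Console.WriteLine("handled packet " + n));
>>>             }
>>>
>>>             _asyncHandlers.CompleteAdding();   // let the loop drain and exit
>>>             Thread.Sleep(200);                 // crude wait for the demo output
>>>         }
>>>     }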
>>>
>>> Matt
>>>
>>>
>>>
>>>
>>> On Fri, Apr 25, 2014 at 10:03 PM, Matt Lehmann <mattlehma at gmail.com<mailto:
>>> mattlehma at gmail.com>> wrote:
>>>
>>>     Yes, I agree that the UDP service is critical and would need
>>> extensive testing.
>>>
>>>     I wouldn't expect you all to make any changes.
>>>
>>>     Still, it's an interesting topic.  The networking world seems to be
>>> moving towards smaller virtualized servers with
>>>     fewer resources, so I think it's an important discussion.  At my work
>>> we are deploying an OpenSim cluster, which is
>>>     why I have become so interested.
>>>
>>>
>>>     Thanks
>>>
>>>     Matt
>>>
>>>
>>>     On Friday, April 25, 2014, Diva Canto <diva at metaverseink.com<mailto:
>>> diva at metaverseink.com>> wrote:
>>>
>>>         That is one very specific and unique case, something that
>>> happens in the beginning, and that is necessary,
>>>         otherwise clients crash. It's an "exception" wrt the bulk of
>>> processing UDP packets. The bulk of them are
>>>         processed as you described in your first message: placed in a
>>> queue, consumed by a consumer thread which either
>>>         processes them directly or spawns threads for processing them.
>>>
>>>         In general, my experience is also that limiting the amount of
>>> concurrency is a Good Thing. A couple of years ago
>>>         we had way too much concurrency; we've been taming that down.
>>>
>>>         As Dahlia said, the packet handling layer of OpenSim is really
>>> critical, and the viewers are sensitive to it, so
>>>         any drastic changes to it need to go through extensive testing.
>>> The current async reading is not bad, as it
>>>         empties the socket queue almost immediately. The threads that
>>>         are spawned from the consumer thread, though, could
>>>         use some rethinking.
>>>
>>>         On 4/25/2014 9:29 PM, Matt Lehmann wrote:
>>>
>>>>         One example of what I'm trying to say.
>>>>
>>>>         In part of the packet handling there is a condition where the
>>>> server needs to respond to the client, but does
>>>>         not yet know the identity of the client. So the server responds
>>>> to the client and then spawns a thread which
>>>>         loops and sleeps until it can identify the client. (I don't
>>>> really understand what's going on here.)
>>>>
>>>>         Nevertheless, in this case you could do without the new thread
>>>> if you queued a lambda function which would
>>>>         check to see if the client can be identified.  A second event
>>>> loop could periodically poll this function until
>>>>         it completes.
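>>>>
>>>>             A minimal sketch of that lambda-queue idea (made-up names;
>>>> this is not the real handler code): a queue of poll functions, each
>>>> returning true when its work is done, drained by a single polling loop
>>>> instead of one sleeping thread per pending client.
>>>>
>>>>     using System;
>>>>     using System.Collections.Concurrent;
>>>>     using System.Threading;
>>>>
>>>>     class PollLoopSketch
>>>>     {
>>>>         static readonly ConcurrentQueue<Func<bool>> _pending =
>>>>             new ConcurrentQueue<Func<bool>>();
>>>>
>>>>         static void Main()
>>>>         {
>>>>             int tries = 0;
>>>>             // Hypothetical stand-in for "can the client be identified yet?"
>>>>             _pending.Enqueue(() => ++tries >= 3);
>>>>
>>>>             // The second event loop: poll each queued lambda and
>>>>             // re-queue it if it is not finished yet.
>>>>             while (!_pending.IsEmpty)
>>>>             {
>>>>                 Func<bool> check;
>>>>                 if (_pending.TryDequeue(out check) && !check())
>>>>                     _pending.Enqueue(check);
>>>>                 Thread.Sleep(100);   // polling interval
>>>>             }
>>>>
>>>>             Console.WriteLine("client identified after " + tries + " checks");
>>>>         }
>>>>     }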
>>>>
>>>>         You could also queue other contexts which would complete the
>>>> handling of other types of packets.
>>>>
>>>>         Matt
>>>>
>>>>         On Friday, April 25, 2014, Dahlia Trimble <
>>>> dahliatrimble at gmail.com> wrote:
>>>>
>>>>             From my experience there are some things that need to
>>>> happen as soon as possible and others which can be
>>>>             delayed. What needs to happen ASAP:
>>>>             1) reading the socket and keeping it emptied.
>>>>             2) acknowledge any received packets which may require such
>>>>             3) process any acknowledgements sent by the viewer
>>>>             4) handle AgentUpdate packets (these can probably be
>>>> filtered for uniqueness and mostly discarded if not
>>>>             unique; see the sketch below).
>>>>
>>>>             This list is off the top of my head and may not be
>>>> complete. Most, if not all, other packets could be put
>>>>             into queues and processed as resources permit without
>>>> negatively affecting the quality of the shared state
>>>>             of the simulation.
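>>>>
>>>>             As a rough sketch of the uniqueness filtering mentioned in
>>>> 4) (the field names here are made up, not the real AgentUpdate layout),
>>>> an update could be dropped when its significant fields match the last
>>>> one seen from the same agent:
>>>>
>>>>     using System;
>>>>     using System.Collections.Generic;
>>>>
>>>>     struct AgentUpdateKey
>>>>     {
>>>>         public uint ControlFlags;       // hypothetical significant fields
>>>>         public float BodyRotW, BodyRotX, BodyRotY, BodyRotZ;
>>>>     }
>>>>
>>>>     class AgentUpdateFilterSketch
>>>>     {
>>>>         static readonly Dictionary<Guid, AgentUpdateKey> _last =
>>>>             new Dictionary<Guid, AgentUpdateKey>();
>>>>
>>>>         // Returns true if the update adds nothing new and can be discarded.
>>>>         static bool IsDuplicate(Guid agentId, AgentUpdateKey update)
>>>>         {
>>>>             AgentUpdateKey previous;
>>>>             if (_last.TryGetValue(agentId, out previous) && previous.Equals(update))
>>>>                 return true;
>>>>
>>>>             _last[agentId] = update;
>>>>             return false;
>>>>         }
>>>>
>>>>         static void Main()
>>>>         {
>>>>             var id = Guid.NewGuid();
>>>>             var update = new AgentUpdateKey { ControlFlags = 1, BodyRotW = 1f };
>>>>             Console.WriteLine(IsDuplicate(id, update));   // False: first seen
>>>>             Console.WriteLine(IsDuplicate(id, update));   // True: discard
>>>>         }
>>>>     }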
>>>>
>>>>             Please be aware that viewers running on high-end machines
>>>> can constantly send several hundred packets per
>>>>             second, and that under extreme conditions there can be
>>>> several hundred viewers connected to a single
>>>>             simulator.  Any improvements in the UDP processing portions
>>>> of the code base should probably take these
>>>>             constraints into consideration.
>>>>
>>>>
>>>>             On Fri, Apr 25, 2014 at 8:17 PM, Matt Lehmann <
>>>> mattlehma at gmail.com> wrote:
>>>>
>>>>                 That makes sense to me.
>>>>
>>>>                 If I recall, the packet handlers will create more
>>>> threads if they expect delays, such as when waiting
>>>>                 for a client to finish movement into the sim.
>>>>
>>>>                 Considering that I have 65 threads running on my
>>>> standalone instance, with 4 cores that leaves roughly
>>>>                 16 threads competing for each core.  You have to do the
>>>> work at some point.
>>>>
>>>>                 Matt
>>>>
>>>>                 On Friday, April 25, 2014, Dahlia Trimble <
>>>> dahliatrimble at gmail.com> wrote:
>>>>
>>>>                     Depends on what you mean by "services the packets".
>>>> Decoding and ACKing could probably work well
>>>>                     in a socket read loop, but dispatching the packet to
>>>> the proper part of the simulation could incur
>>>>                     many delays, which can cause a lot of packet loss in
>>>> the lower-level operating system routines, as
>>>>                     the buffers are only so large and any excess
>>>> data is discarded. Putting them in a queue
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Justin Clark-Casey (justincc)
>> OSVW Consulting
>> http://justincc.org
>> http://twitter.com/justincc
>>
>
>
>