[Opensim-dev] networking issues

Mon Mar 28 19:03:13 UTC 2011

For avatars yes. But prim updates can never be discarded, no matter
how trivial, because they establish new persistent state.

Melanie

Dahlia Trimble wrote:
> the viewer discards small changes anyway if avatar imposters are enabled
> 
> On Mon, Mar 28, 2011 at 11:54 AM, Melanie <melanie at t-data.com> wrote:
> 
>> No, we can't discard small changes. As the avatar comes closer, the
>> objects would be seen out of place. E.g. someone building in the distance
>> would move prims, and then you come closer to look and all the prims
>> would be out of place.
>>
>> Melanie
>>
>> Dahlia Trimble wrote:
>> > a couple thoughts..
>> >
>> > Perhaps the resend timeout period could be a function of the throttle
>> > setting and/or the measured per-client packet acknowledgement time
>> > (provided we measure it)? That may prevent excessive resend processing
>> > that may not be necessary.
>> >
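A rough sketch of what a resend timeout derived from measured per-client acknowledgement time might look like. It borrows the smoothed round-trip estimator that TCP uses (RFC 6298), which is a swapped-in technique and not anything OpenSim is known to implement. The class name, the method names and the timeout bounds are all hypothetical.

```python
class AckTimeEstimator:
    """Per-client resend timeout from measured ack times (RFC 6298 style)."""

    ALPHA = 1 / 8  # gain for the smoothed ack time
    BETA = 1 / 4   # gain for the ack time variation

    def __init__(self, initial_timeout=1.0, min_timeout=0.1, max_timeout=6.0):
        # All three bounds are illustrative values, not OpenSim settings.
        self.initial_timeout = initial_timeout
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        self.smoothed = None   # smoothed ack time in seconds
        self.variation = None  # smoothed deviation in seconds

    def record_ack(self, seconds):
        """Feed in the time between sending a packet and receiving its ack."""
        if self.smoothed is None:
            # First sample seeds both estimates.
            self.smoothed = seconds
            self.variation = seconds / 2
        else:
            # Variation is updated first, using the previous smoothed value.
            deviation = abs(self.smoothed - seconds)
            self.variation = (1 - self.BETA) * self.variation + self.BETA * deviation
            self.smoothed = (1 - self.ALPHA) * self.smoothed + self.ALPHA * seconds

    def resend_timeout(self):
        """Timeout to wait before resending, clamped to the configured bounds."""
        if self.smoothed is None:
            return self.initial_timeout
        timeout = self.smoothed + 4 * self.variation
        return min(self.max_timeout, max(self.min_timeout, timeout))
```

With an estimator per client, a client on a slow link would get a longer timeout than one on a fast link, so fewer packets would be resent while their acks are still in flight.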
>> > On the distance prioritization, could small changes in object
>> > translations be discarded from the prioritization queues/resend buffers
>> > for distant objects when new updates occur for those objects? Small
>> > changes may not be noticeable from the viewer perspective anyway.
>> >
>> >
>> > On Mon, Mar 28, 2011 at 10:48 AM, Teravus Ovares <teravus at gmail.com> wrote:
>> >
>> >> Here are a few facts that I've personally discovered while working
>> >> with LLClientView.
>> >>
>> >> 1. It has been noted that people with poor connections to the
>> >> simulator consume more bandwidth and CPU, and have a generally worse
>> >> experience. This has been tested and profiled extensively. This
>> >> may seem like a small issue because what it's doing is so basic;
>> >> however, the frequency with which this occurs is a real cause of
>> >> performance issues.
>> >>
>> >> 2. It's also noted that the CPU used in these cases reduces the CPU
>> >> available to the rest of the simulator, resulting in a lower quality of
>> >> service for the rest of the people on the simulator.
>> >> This has been seen in the profiling and has been observed
>> >> qualitatively: a large number of users are connected and everything is
>> >> OK, and then a 'problem connection' user connects and causes a wide
>> >> range of issues.
>> >>
>> >> 3. It's also noted that lowering the outgoing UDP packet throttles
>> >> beyond a certain point results in perpetual queuing and resends.
>> >> This was tested last year using a throttle multiplier that was
>> >> implemented by justincc. I'm not sure if the multiplier is still
>> >> there. It's most easily seen with image packets. Again, I note
>> >> that the packets are not rebuilt going from the regular outbound queue
>> >> to the resend queue. The resend queue is /supposed/ to be used to
>> >> quickly get data that is essential to the client after attempting to
>> >> send once already. The UDP spec declares the maximum resend to be 2
>> >> times; however, there has been considerable debate on whether or
>> >> not OpenSimulator should follow that specific specification item,
>> >> leading to a configuration option to enable perpetual resends
>> >> (implemented by Melanie). The configuration item was named something
>> >> like 'reliable is important'. I'm not sure if the configuration item
>> >> survived the many revisions; however, I suspect that it did.
>> >>
>> >> 4. It's also noted that raising the packet throttles beyond what the
>> >> connection can support results in resending almost every packet the
>> >> maximum number of times before the limit is reached.
>> >> This is easily reproducible by setting the connection (in the client)
>> >> to the maximum and connecting to a region that you've never been to
>> >> before on a subpar connection. Before the client adjusts and
>> >> requests a lower throttle setting, there's massive data loss and
>> >> massive re-queuing.
>> >>
>> >> 5. The client tries to adjust the throttle settings based on network
>> >> conditions. This can be observed by monitoring the packet that sets
>> >> the throttles and dragging the bar to maximum. After a certain
>> >> number of resends, the client will send the set throttle packet with
>> >> reduced settings (some argue that it doesn't do that fast enough).
>> >>
>> >> 6. A user who has connected previously to the simulator will use fewer
>> >> resources than a user who has never connected to the simulator (this
>> >> is mostly because of the image cache on the client). Any client
>> >> that uses CAPS images will use fewer resources than one that uses
>> >> LLUDP.
>> >>
>> >> When working with the packet queues, it's essential to understand
>> >> those 6 observations. Even though the place where you tend to see
>> >> the issues with queuing is the image queue over LLUDP, the principles
>> >> apply to all of the UDP queues.
>> >>
>> >> Regards
>> >>
>> >> Teravus
>> >>
>> >>
>> >> On Mon, Mar 28, 2011 at 1:00 PM, Mic Bowman <cmickeyb at gmail.com> wrote:
>> >> > Over the last several weeks, Dan Lake & I have been looking at some of
>> >> > the networking performance issues in opensim. As always, our concerns
>> >> > are with the problems caused by very complex scenes with very large
>> >> > numbers of avatars. However, I think fixes for some of the issues we
>> >> > have found will generally improve networking with OpenSim. Since this
>> >> > represents a fairly significant change in behavior (though the number
>> >> > of lines of code is not great), I'm going to put this into a separate
>> >> > branch for testing (called queuetest) in the opensim git repository.
>> >> > We've found several problems with the current
>> >> > networking/prioritization code.
>> >> > * Reprioritization is completely broken for SceneObjectParts. On
>> >> > reprioritization, the current code uses the localid stored in the
>> >> > scene Entities list, but since the scene does not store the localid
>> >> > for SOPs, that attempt always fails. So the original priority of the
>> >> > SOP continues to be used. This could be the cause of some problems
>> >> > since the initial prioritization assumes position 128,128. I don't
>> >> > understand all the possible ramifications, but suffice it to say,
>> >> > using the localid is causing problems.
>> >> > Fix: The sceneentity is already stored in the update, so just use that
>> >> > instead of the localid.
>> >> > * We currently pull (by default) 100 entity updates from the
>> >> > entityupdate queue and convert them into packets. Once converted into
>> >> > packets, they are then queued again for transmission. This is a bad
>> >> > thing. Under any kind of load, we've measured the time in the packet
>> >> > queue to be up to many hundreds/thousands of milliseconds (and to be
>> >> > highly variable). When an object changes one property and then doesn't
>> >> > change it again, the time in the packet queue is largely irrelevant.
>> >> > However, if the object is continuously changing (an avatar changing
>> >> > position, a physical object moving, etc.) then the conversion from an
>> >> > entity update to a packet "freezes" the properties to be sent. If the
>> >> > object is continuously changing, then with fairly high probability,
>> >> > the packet contains old data (the properties of the entity from the
>> >> > point at which it was converted into a packet).
>> >> > The real problem is that, in theory, to improve the efficiency of the
>> >> > packets (fill up each message) we are grabbing big chunks of updates.
>> >> > Under load, that causes queuing at the packet layer which makes
>> >> > updates stale. That is... queuing at the packet layer is BAD.
>> >> > Fix: We implemented an adaptive algorithm for the number of updates to
>> >> > grab with each pass. We set a target time of 200ms for each iteration.
>> >> > That means we are trying to bound the maximum age of any update in the
>> >> > packet queue to 200ms. The adaptive algorithm looks a lot like a TCP
>> >> > slow start: every time we complete an iteration (flush the packet
>> >> > queue) in less than 200ms we increase linearly the number of updates
>> >> > we take in the next iteration (add 5 to the count) and when we don't
>> >> > make it back in 200ms, we drop the number we take multiplicatively
>> >> > (cut the number in half). In our experiments with large numbers of
>> >> > moving avatars, this algorithm works *very* well. The number of
>> >> > updates taken per iteration stabilizes very quickly and the response
>> >> > time is dramatically improved (no "snap back" on avatars, for
>> >> > example). One difference from the traditional slow start... since the
>> >> > number of "static" items in the queue is very high when a client first
>> >> > enters a region, we start with the number of updates taken at 500.
>> >> > That gets the static items out of the queue quickly (and delay doesn't
>> >> > matter as much) and the number taken is generally stable before the
>> >> > login/teleport screen even goes away.
>> >> > * The current prioritization queue can lead to update starvation. The
>> >> > prioritization algorithm dumps all entity updates into a single
>> >> > ordered queue. Let's say you have several hundred avatars moving
>> >> > around in a scene. Since we take a limited number of updates from the
>> >> > queue in each iteration, we will take only the updates for the
>> >> > "closest" (highest priority) avatars. However, since those avatars
>> >> > continue to move, they are re-inserted into the priority queue *ahead*
>> >> > of the updates that were already there. So... unless the queue can be
>> >> > completely emptied each iteration or the priority of the "distant"
>> >> > (low priority) avatars changes, those avatars will never be updated.
>> >> > Fix: We converted the single priority queue into multiple priority
>> >> > queues and use fair queuing to retrieve updates from each. Here's how
>> >> > it works (more or less)... the current metrics (all of the current
>> >> > prioritization algorithms use distance at some point for
>> >> > prioritization) compute a distance from the avatar/camera to an
>> >> > object. We take the log of that distance and use that as the index for
>> >> > the queue where we place the update. So close things go into the
>> >> > highest priority queue and distant things go into the lowest priority
>> >> > queue. Since the area covered by a priority queue grows as the square
>> >> > of the radius, the distant (lowest priority) queues will have the most
>> >> > objects while the highest priority queues will have a small number of
>> >> > objects. Inside each priority queue, we order the updates by the time
>> >> > at which they entered the queue. Then we pull a fixed number of
>> >> > updates from each priority queue each iteration. The result is that
>> >> > local updates get a high fraction of the outgoing bandwidth but
>> >> > distant updates are guaranteed to get at least "some" of the
>> >> > bandwidth. No starvation. The current prioritization algorithm we
>> >> > implemented is a modification of the "best avatar responsiveness" and
>> >> > "front back" in that we use root prim location for child prims and the
>> >> > priority of updates "in back" of the avatar is lower than updates "in
>> >> > front". Our experiments show that the fair queuing does drain the
>> >> > update queue AND continues to provide a disproportionately high
>> >> > percentage of the bw to "close" updates.
>> >> > One other note on this... we should be able to improve the performance
>> >> > of reprioritization with this approach. If we know the distance an
>> >> > avatar has moved, we only have to reprioritize objects that might have
>> >> > changed priority queues. Haven't implemented this yet but have some
>> >> > ideas for how to do it.
>> >> > * The resend queue is evil. When an update packet is sent (they are
>> >> > marked reliable) it is moved to a queue to await acknowledgement. If
>> >> > no acknowledgement is received (in time), the packet is retransmitted
>> >> > and the wait time is doubled and so on... What that means is that a
>> >> > resent packet in a scene that is rapidly changing will often contain
>> >> > updates that are outdated. That is, when we resend the packet, we are
>> >> > just resending old data (and if you're having a lot of resends that
>> >> > means you already have a bad connection & now you're filling it up
>> >> > with useless data).
>> >> > Fix: this isn't implemented yet (help would be appreciated)... we
>> >> > think that instead of saving packets for resend... a better solution
>> >> > would be to keep the entity updates that went into the packet. If we
>> >> > don't receive an ack in time, then put the entity updates back into
>> >> > the entity update queue (with entry time from their original
>> >> > enqueuing). That would ensure that we send an update for the object &
>> >> > that the data sent is the most recent.
>> >> > * One final note... per client bandwidth throttles seem to work very
>> >> > well. However, our experiments with per-simulator throttles were not
>> >> > positive. It appeared that a small number of clients was consuming all
>> >> > of the bw available to the simulator and the rest were starved.
>> >> > Haven't looked into this any more.
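The adaptive update count described in the second fix above could be sketched roughly as follows. The starting count of 500, the step of 5, the halving and the 200ms target come from the description. The class name, the method name and the floor of 1 are assumptions made for illustration, and this is not the code in the queuetest branch.

```python
class AdaptiveBatchSize:
    """Number of entity updates to take per iteration.

    Grows by a fixed step when an iteration meets the target time and is
    cut in half when it does not.
    """

    def __init__(self, initial=500, step=5, target_seconds=0.2, minimum=1):
        self.count = initial                  # start high to drain static items
        self.step = step                      # linear increase per good iteration
        self.target_seconds = target_seconds  # 200ms bound on update age
        self.minimum = minimum                # assumed floor, not from the post

    def on_iteration_complete(self, elapsed_seconds):
        """Adjust the count after the packet queue has been flushed."""
        if elapsed_seconds < self.target_seconds:
            self.count += self.step
        else:
            self.count = max(self.minimum, self.count // 2)
        return self.count
```

The caller would time each pass from taking the updates to flushing the packet queue and feed that duration back in before the next pass.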
>> >> >
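A minimal sketch of the multiple priority queues with fair retrieval described in the third fix above. The post does not state the log base, the number of queues or the per-queue quota, so base 2, 8 queues and a quota of 4 are assumptions, as are all the names. Reprioritization and replacing a pending update for the same entity are left out.

```python
import math
from collections import deque


class FairUpdateQueues:
    """Distance-bucketed update queues drained with a fixed per-queue quota."""

    def __init__(self, num_queues=8, per_queue_quota=4):
        # Queue 0 is the highest priority (closest) bucket.
        self.queues = [deque() for _ in range(num_queues)]
        self.per_queue_quota = per_queue_quota

    def queue_index(self, distance):
        """Map a distance to a queue index using the log of the distance."""
        if distance < 1.0:
            return 0
        return min(len(self.queues) - 1, int(math.log2(distance)))

    def enqueue(self, entity_id, distance):
        # Appending keeps each queue ordered by entry time.
        self.queues[self.queue_index(distance)].append(entity_id)

    def dequeue_batch(self):
        """Take up to the quota from every queue, closest bucket first."""
        batch = []
        for queue in self.queues:
            for _ in range(min(self.per_queue_quota, len(queue))):
                batch.append(queue.popleft())
        return batch
```

Because every queue contributes to each batch, one distant update still goes out even when the closest queue holds far more than its quota.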
>> >> > So...
>> >> > Feedback appreciated... there is some logging code (disabled) in the
>> >> > branch; real data would be great. And help testing. There are a number
>> >> > of attachments, deletes and so on that I'm not sure work correctly.
>> >> > --mic
>> >> >
>> >> > _______________________________________________
>> >> > Opensim-dev mailing list
>> >> > Opensim-dev at lists.berlios.de
>> >> > https://lists.berlios.de/mailman/listinfo/opensim-dev