[Opensim-dev] networking issues

Mon Mar 28 19:01:41 UTC 2011

yes, which is why I said discard them when new updates occur.

On Mon, Mar 28, 2011 at 12:03 PM, Melanie <melanie at t-data.com> wrote:

> For avatars yes. But prim updates can never be discarded, no matter
> how trivial, because they establish new persistent state.
>
> Melanie
>
> Dahlia Trimble wrote:
> > the viewer discards small changes anyway if avatar imposters are enabled
> >
> > On Mon, Mar 28, 2011 at 11:54 AM, Melanie <melanie at t-data.com> wrote:
> >
> >> No, we can't discard small changes. As the avatar comes closer, they
> >> would be seen out of place, e.g. someone building in the distance
> >> would move prims and then you come closer to look and all prims
> >> would be out of place.
> >>
> >> Melanie
> >>
> >> Dahlia Trimble wrote:
> >> > a couple thoughts..
> >> >
> >> > Perhaps the resend timeout period could be a function of the
> >> > throttle setting and/or the measured packet acknowledgement time
> >> > per client (provided we measure it)? That may prevent excessive
> >> > resend processing that may not be necessary.
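A minimal sketch of a per-client resend timeout driven by measured acknowledgement time. It borrows the smoothed round-trip estimator from RFC 6298 (TCP) as a stand-in. The class name, the clamp values and the pre-measurement default are assumptions for illustration, not OpenSim code or settings.

```python
class AckTimeEstimator:
    """Per-client resend timeout derived from measured ack times.

    Hypothetical helper: smoothing constants follow RFC 6298, the
    min/max clamps are illustrative assumptions.
    """

    ALPHA = 1 / 8  # gain for the smoothed ack time
    BETA = 1 / 4   # gain for the ack time variation

    def __init__(self, min_rto=0.1, max_rto=6.0):
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None    # smoothed ack time, seconds
        self.rttvar = None  # ack time variation, seconds

    def on_ack(self, rtt):
        """Feed one measured send-to-ack interval, in seconds."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def resend_timeout(self):
        """Timeout to wait before resending, clamped to [min_rto, max_rto]."""
        if self.srtt is None:
            return self.max_rto  # nothing measured yet: be conservative
        rto = self.srtt + 4 * self.rttvar
        return min(self.max_rto, max(self.min_rto, rto))
```

A client on a slow link would then get a longer timeout automatically, so its packets are not resent while the first copy is still in flight.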
> >> >
> >> > On the distance prioritization, could small changes in object
> >> > translations be discarded from the prioritization queues/resend
> >> > buffers for distant objects when new updates occur for those
> >> > objects? Small changes may not be noticeable from the viewer's
> >> > perspective anyway.
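One way to read "discard when new updates occur" is to coalesce: a queued update is dropped only because a newer update for the same object replaces it, so the latest state always gets sent. A minimal sketch, assuming updates are plain dictionaries of changed properties keyed by a hypothetical entity id (none of these names come from OpenSim):

```python
from collections import OrderedDict


class CoalescingUpdateQueue:
    """Keeps at most one pending update per entity (hypothetical sketch)."""

    def __init__(self):
        self._pending = OrderedDict()  # entity_id -> merged changed properties

    def enqueue(self, entity_id, properties):
        # Merge into any pending update: newer values overwrite older ones
        # for the same property, other pending properties are kept. The
        # entity keeps its original place in line.
        self._pending.setdefault(entity_id, {}).update(properties)

    def dequeue(self):
        """Return (entity_id, properties) for the oldest entry, or None."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)
```

Because the superseding update carries the final values, nothing that establishes persistent state is lost; only intermediate values are skipped.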
> >> >
> >> >
> >> > On Mon, Mar 28, 2011 at 10:48 AM, Teravus Ovares <teravus at gmail.com> wrote:
> >> >
> >> >> Here are a few facts that I've personally discovered while working
> >> >> with LLClientView.
> >> >>
> >> >> 1. It has been noted that people with poor connections to the
> >> >> simulator consume more bandwidth and CPU, and have a generally worse
> >> >> experience. This has been tested and profiled extensively. This
> >> >> may seem like a small issue because what it's doing is so basic...
> >> >> however the frequency with which this occurs is a real cause of
> >> >> performance issues.
> >> >>
> >> >> 2. It's also noted that the CPU used in these cases reduces the CPU
> >> >> available to the rest of the simulator, resulting in a lower quality
> >> >> of service for the rest of the people on the simulator.
> >> >> This has been seen in the profiling and has been qualitatively
> >> >> observed: a large number of users are connected and everything is
> >> >> OK, and then a 'problem connection' user connects, causing a wide
> >> >> range of issues.
> >> >>
> >> >> 3. It's also noted that lowering the outgoing UDP packet throttles
> >> >> beyond a certain point results in perpetual queuing and resends.
> >> >> This was tested by using a throttle multiplier last year that was
> >> >> implemented by justincc. I'm not sure if the multiplier is still
> >> >> there. It's most easily seen with image packets. Again, I note
> >> >> that the packets are not rebuilt going from the regular outbound
> >> >> queue to the resend queue. The resend queue is /supposed/ to be
> >> >> used to quickly get data that is essential to the client after
> >> >> attempting to send once already. The LLUDP spec declares the
> >> >> maximum resend to be 2 times, however there has been some
> >> >> considerable debate on whether or not OpenSimulator should follow
> >> >> that specific specification item, leading to a configuration option
> >> >> to enable perpetual resends (implemented by Melanie). The
> >> >> configuration item was named something like 'reliable is
> >> >> important'. I'm not sure if the configuration item survived the
> >> >> many revisions; however, I suspect that it did.
> >> >>
> >> >> 4. It's also noted that raising the packet throttles beyond what the
> >> >> connection can support results in resending almost every packet the
> >> >> maximum number of times before the limit is reached.
> >> >> This is easily reproducible by setting the connection (in the client)
> >> >> to the maximum and connecting to a region that you've never been to
> >> >> before on a subpar connection. Before the client adjusts and
> >> >> requests a lower throttle setting there's massive data loss and
> >> >> massive re-queuing.
> >> >>
> >> >> 5. The client tries to adjust the throttle settings based on network
> >> >> conditions.   This can be observed by monitoring the packet that sets
> >> >> the throttles and dragging the bar to maximum.   After a certain
> >> >> amount of resends, the client will call the set throttle packet with
> >> >> reduced settings (some argue that it doesn't do that fast enough).
> >> >>
> >> >> 6. A user who has connected previously to the simulator will use
> >> >> fewer resources than a user who has never connected to the simulator
> >> >> (this is mostly because of the image cache on the client). Any
> >> >> client that uses CAPS images will use fewer resources than one that
> >> >> uses LLUDP.
> >> >>
> >> >> When working with the packet queues, it's essential to understand
> >> >> those 6 observations. Even though the place where you tend to see
> >> >> the issues with queuing is the image queue over LLUDP, the principles
> >> >> apply to all of the UDP queues.
> >> >>
> >> >> Regards
> >> >>
> >> >> Teravus
> >> >>
> >> >>
> >> >> On Mon, Mar 28, 2011 at 1:00 PM, Mic Bowman <cmickeyb at gmail.com> wrote:
> >> >> > Over the last several weeks, Dan Lake & I have been looking at
> >> >> > some of the networking performance issues in opensim. As always,
> >> >> > our concerns are with the problems caused by very complex scenes
> >> >> > with very large numbers of avatars. However, I think fixing some
> >> >> > of the issues we have found will generally improve networking
> >> >> > with OpenSim. Since the fixes represent a fairly significant
> >> >> > change in behavior (though the number of lines of code is not
> >> >> > great), I'm going to put this into a separate branch for testing
> >> >> > (called queuetest) in the opensim git repository.
> >> >> > We've found several problems with the current
> >> >> > networking/prioritization code.
> >> >> > * Reprioritization is completely broken for SceneObjectParts. On
> >> >> > reprioritization, the current code uses the localid stored in the
> >> >> > scene Entities list but since the scene does not store the
> >> >> > localid for SOPs, that attempt always fails. So the original
> >> >> > priority of the SOP continues to be used. This could be the cause
> >> >> > of some problems since the initial prioritization assumes
> >> >> > position 128,128. I don't understand all the possible
> >> >> > ramifications, but suffice it to say, using the localid is
> >> >> > causing problems.
> >> >> > Fix: The sceneentity is already stored in the update, just use
> >> >> > that instead of the localid.
> >> >> > * We currently pull (by default) 100 entity updates from the
> >> >> > entityupdate queue and convert them into packets. Once converted
> >> >> > into packets, they are then queued again for transmission. This
> >> >> > is a bad thing. Under any kind of load, we've measured the time
> >> >> > in the packet queue to be up to many hundreds or even thousands
> >> >> > of milliseconds (and to be highly variable). When an object
> >> >> > changes one property and then doesn't change it again, the time
> >> >> > in the packet queue is largely irrelevant. However, if the object
> >> >> > is continuously changing (an avatar changing position, a physical
> >> >> > object moving, etc) then the conversion from an entity update to
> >> >> > a packet "freezes" the properties to be sent. If the object is
> >> >> > continuously changing, then with fairly high probability, the
> >> >> > packet contains old data (the properties of the entity from the
> >> >> > point at which it was converted into a packet).
> >> >> > The real problem is that, in theory, to improve the efficiency of
> >> >> > the packets (fill up each message) we are grabbing big chunks of
> >> >> > updates. Under load, that causes queuing at the packet layer
> >> >> > which makes updates stale. That is... queuing at the packet layer
> >> >> > is BAD.
> >> >> > Fix: We implemented an adaptive algorithm for the number of
> >> >> > updates to grab with each pass. We set a target time of 200ms for
> >> >> > each iteration. That means we are trying to bound the maximum age
> >> >> > of any update in the packet queue to 200ms. The adaptive
> >> >> > algorithm looks a lot like TCP congestion control (additive
> >> >> > increase, multiplicative decrease): every time we complete an
> >> >> > iteration (flush the packet queue) in less than 200ms we increase
> >> >> > linearly the number of updates we take in the next iteration (add
> >> >> > 5 to the count) and when we don't make it back in 200ms, we drop
> >> >> > the number we take multiplicatively (cut the number in half). In
> >> >> > our experiments with large numbers of moving avatars, this
> >> >> > algorithm works *very* well. The number of updates taken per
> >> >> > iteration stabilizes very quickly and the response time is
> >> >> > dramatically improved (no "snap back" on avatars, for example).
> >> >> > One difference from the traditional TCP slow start... since the
> >> >> > number of "static" items in the queue is very high when a client
> >> >> > first enters a region, we start with the number of updates taken
> >> >> > at 500. That gets the static items out of the queue quickly (and
> >> >> > delay doesn't matter as much) and the number taken is generally
> >> >> > stable before the login/teleport screen even goes away.
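The adaptive batch sizing described in this fix can be sketched as follows. The 200ms target, the +5 step, the halving and the starting value of 500 come from the message; the class name and the lower bound of 5 are assumptions added for illustration, and this is not the code in the queuetest branch.

```python
class AdaptiveBatchSize:
    """Additive-increase / halve-on-miss sizing of the update batch."""

    def __init__(self, target_seconds=0.2, initial=500, step=5, minimum=5):
        self.target_seconds = target_seconds
        self.step = step
        self.minimum = minimum  # assumed floor so the count never reaches 0
        self.count = initial

    def record_iteration(self, elapsed_seconds):
        """Update the batch size after one flush of the packet queue.

        elapsed_seconds is how long the iteration took to drain.
        Returns the number of updates to take in the next iteration.
        """
        if elapsed_seconds < self.target_seconds:
            self.count += self.step  # made the deadline: grow linearly
        else:
            self.count = max(self.minimum, self.count // 2)  # missed: halve
        return self.count
```

Starting high and halving on each missed deadline lets the count fall quickly to what the connection can drain in 200ms, after which it creeps back up 5 at a time.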
> >> >> > * The current prioritization queue can lead to update starvation.
> >> >> > The prioritization algorithm dumps all entity updates into a
> >> >> > single ordered queue. Let's say you have several hundred avatars
> >> >> > moving around in a scene. Since we take a limited number of
> >> >> > updates from the queue in each iteration, we will take only the
> >> >> > updates for the "closest" (highest priority) avatars. However,
> >> >> > since those avatars continue to move, they are re-inserted into
> >> >> > the priority queue *ahead* of the updates that were already
> >> >> > there. So... unless the queue can be completely emptied each
> >> >> > iteration or the priority of the "distant" (low priority) avatars
> >> >> > changes, those avatars will never be updated.
> >> >> > Fix: We converted the single priority queue into multiple
> >> >> > priority queues and use fair queuing to retrieve updates from
> >> >> > each. Here's how it works (more or less)... the current metrics
> >> >> > (all of the current prioritization algorithms use distance at
> >> >> > some point for prioritization) compute a distance from the
> >> >> > avatar/camera to an object. We take the log of that distance and
> >> >> > use that as the index for the queue where we place the update. So
> >> >> > close things go into the highest priority queue and distant
> >> >> > things go into the lowest priority queue. Since the area covered
> >> >> > by a priority queue grows as the square of the radius, the
> >> >> > distant (lowest priority) queues will have the most objects while
> >> >> > the highest priority queues will have a small number of objects.
> >> >> > Inside each priority queue, we order the updates by the time at
> >> >> > which they entered the queue. Then we pull a fixed number of
> >> >> > updates from each priority queue each iteration. The result is
> >> >> > that local updates get a high fraction of the outgoing bandwidth
> >> >> > but distant updates are guaranteed to get at least "some" of the
> >> >> > bandwidth. No starvation. The current prioritization algorithm we
> >> >> > implemented is a modification of the "best avatar responsiveness"
> >> >> > and "front back" in that we use root prim location for child
> >> >> > prims and the priority of updates "in back" of the avatar is
> >> >> > lower than updates "in front". Our experiments show that the fair
> >> >> > queuing does drain the update queue AND continues to provide a
> >> >> > disproportionately high percentage of the bw to "close" updates.
> >> >> > One other note on this... we should be able to improve the
> >> >> > performance of reprioritization with this approach. If we know
> >> >> > the distance an avatar has moved, we only have to reprioritize
> >> >> > objects that might have changed priority queues. Haven't
> >> >> > implemented this yet but have some ideas for how to do it.
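A rough sketch of the log-distance fair queuing described in this fix. The message only says "the log of that distance", so the base-2 logarithm, the number of queues, the per-queue quota and all names here are assumptions for illustration rather than the branch's actual code.

```python
import math
from collections import deque


class FairUpdateQueues:
    """Several FIFO queues indexed by log2(distance); index 0 is nearest."""

    def __init__(self, num_queues=8, per_queue_quota=4):
        self.queues = [deque() for _ in range(num_queues)]
        self.per_queue_quota = per_queue_quota

    def queue_index(self, distance):
        """Map a distance to a queue index, clamped to the available queues."""
        if distance < 1.0:
            return 0
        return min(len(self.queues) - 1, int(math.log2(distance)))

    def enqueue(self, update, distance):
        # Appending keeps each queue ordered by the time of entry.
        self.queues[self.queue_index(distance)].append(update)

    def dequeue_round(self):
        """Take up to a fixed number of updates from every queue."""
        batch = []
        for queue in self.queues:
            for _ in range(min(self.per_queue_quota, len(queue))):
                batch.append(queue.popleft())
        return batch
```

Because every queue is visited on each round, a nearby avatar that keeps re-entering queue 0 cannot push a distant update back indefinitely. The near queues simply hold fewer objects, so each of them is served more often.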
> >> >> > * The resend queue is evil. When an update packet is sent (they
> >> >> > are marked reliable) it is moved to a queue to await
> >> >> > acknowledgement. If no acknowledgement is received (in time), the
> >> >> > packet is retransmitted and the wait time is doubled and so on...
> >> >> > What that means is that resent packets in a scene that is rapidly
> >> >> > changing will often contain updates that are outdated. That is,
> >> >> > when we resend the packet, we are just resending old data (and if
> >> >> > you're having a lot of resends that means you already have a bad
> >> >> > connection & now you're filling it up with useless data).
> >> >> > Fix: this isn't implemented yet (help would be appreciated)... we
> >> >> > think that instead of saving packets for resend... a better
> >> >> > solution would be to keep the entity updates that went into the
> >> >> > packet. If we don't receive an ack in time, then put the entity
> >> >> > updates back into the entity update queue (with the entry time
> >> >> > from their original enqueuing). That would ensure that we send an
> >> >> > update for the object & that the data sent is the most recent.
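Since this fix is described as not yet implemented, the following is only a sketch of the bookkeeping it implies: remember which entity updates went into each unacked packet, and on timeout put them back in the update queue with their original entry time. All names are hypothetical, and the packet payload is assumed to be built from the entity's current properties at send time (not shown).

```python
import heapq
import itertools


class UpdateResender:
    """Requeues entity updates, not packets, when an ack is missed."""

    def __init__(self):
        self._queue = []    # heap of (enqueue_time, tiebreak, entity_id)
        self._unacked = {}  # packet sequence -> [(enqueue_time, entity_id)]
        self._tiebreak = itertools.count()

    def enqueue(self, entity_id, enqueue_time):
        heapq.heappush(self._queue, (enqueue_time, next(self._tiebreak), entity_id))

    def take(self, n):
        """Pop up to n updates, oldest entry time first."""
        taken = []
        while self._queue and len(taken) < n:
            enqueue_time, _, entity_id = heapq.heappop(self._queue)
            taken.append((enqueue_time, entity_id))
        return taken

    def on_sent(self, sequence, updates):
        """Remember which updates a reliable packet carried."""
        self._unacked[sequence] = list(updates)

    def on_ack(self, sequence):
        self._unacked.pop(sequence, None)

    def on_timeout(self, sequence):
        """No ack in time: requeue the updates with their original times."""
        for enqueue_time, entity_id in self._unacked.pop(sequence, []):
            self.enqueue(entity_id, enqueue_time)
```

Keeping the original entry time means a requeued update goes back near the front of the queue, and the next packet that carries it is built from whatever the entity looks like at that moment.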
> >> >> > * One final note... per client bandwidth throttles seem to work
> >> >> > very well. However, our experiments with per-simulator throttles
> >> >> > were not positive. It appeared that a small number of clients
> >> >> > was consuming all of the bw available to the simulator and the
> >> >> > rest were starved. Haven't looked into this any more.
> >> >> >
> >> >> > So...
> >> >> > Feedback appreciated... there is some logging code (disabled) in
> >> >> > the branch; real data would be great. And help testing. There are
> >> >> > a number of attachments, deletes and so on that I'm not sure work
> >> >> > correctly.
> >> >> > --mic
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > Opensim-dev mailing list
> >> >> > Opensim-dev at lists.berlios.de
> >> >> > https://lists.berlios.de/mailman/listinfo/opensim-dev