[Opensim-dev] Behaviour of adaptive throttles under high load

Justin Clark-Casey jjustincc at googlemail.com
Mon Dec 1 19:51:11 UTC 2014


In the initial period where this problem was seen, only adaptive was active, AFAIK.

I am quite familiar with the hierarchical structure now, as I had to investigate it to work out what was going on. 
Indeed, to address various bugs and make it possible to change parameters in real time for experimentation (e.g. to 
adjust client throttles on the fly from the console) I rewrote parts of it (without changing the algorithms) and added
some regression tests where there were none before.  These changes were made after the original problem with adaptive 
throttles was encountered.

So I don't believe (though I could be wrong) that it's a timing issue with the buckets themselves.  That said, it's not
impossible that there's an issue somewhere deep within the UDP processing code or even in Mono.  My difficulty is that this
kind of problem can take a very large amount of time to investigate and may turn up nothing, whereas making the throttle
reduction less aggressive in the face of a stream of ack timeouts that occur within the same second is comparatively quick.

In the future, if a problem elsewhere is identified so that this behaviour no longer occurs then one can, of course, 
retighten the algorithm.  However, I would posit that with UDP streams, it's always possible for a stream to drop 
momentarily because of network issues and that it is better behaviour not to severely penalise what may be a momentary glitch.

On 01/12/14 16:46, Mic Bowman wrote:
> and, just to be clear...
>
> did you have *both* adaptive and total bw throttles turned on?
>
> the interaction between the two through the hierarchical token bucket is another place where i was more than a little
> worried. i tested that with network emulators under high load & it seemed to do what it was supposed to do, but i
> wouldn't be surprised to find a timing issue.
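A minimal Python sketch of how a per-client bucket and a simulator-wide parent bucket can interact, assuming a simple two-level token bucket in which a send must find enough tokens at every level. `TokenBucket`, `client_rate` and `FakeClock` are hypothetical names for illustration, not OpenSimulator's C# classes; taking the client rate as the minimum of the adaptive rate and the manual cap is an assumption based on the thread's description of the manual throttle as an absolute maximum.

```python
import time


class TokenBucket:
    """Simplified token bucket that can also draw from a parent bucket.

    Hypothetical illustration only; not OpenSimulator's own bucket code.
    """

    def __init__(self, rate_bps, burst_bytes, parent=None, clock=time.monotonic):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.burst = burst_bytes          # maximum number of tokens held
        self.tokens = float(burst_bytes)  # start full
        self.parent = parent
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, nbytes):
        """Spend nbytes in this bucket and every ancestor, or spend nothing."""
        chain = []
        node = self
        while node is not None:
            node._refill()
            if node.tokens < nbytes:
                return False              # some level of the hierarchy is short
            chain.append(node)
            node = node.parent
        for node in chain:
            node.tokens -= nbytes
        return True


def client_rate(adaptive_bps, manual_max_bps):
    """Adaptive control moves the rate; the manual throttle caps it."""
    return min(adaptive_bps, manual_max_bps)


class FakeClock:
    """Deterministic clock for the usage example."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
total = TokenBucket(16_000, 1_500, clock=clock)          # simulator-wide bucket
a = TokenBucket(client_rate(8_000, 64_000), 1_000, total, clock)
b = TokenBucket(client_rate(8_000, 64_000), 1_000, total, clock)
print(a.try_send(1_000))  # True: a and the parent both have tokens
print(b.try_send(1_000))  # False: b has tokens, but the parent has only 500 left
```

The point of the example is that a client can be refused even when its own bucket is full, so the per-client and total throttles interact in ways that are easy to misread in logs.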
>
> --mic
>
>
> On Mon, Dec 1, 2014 at 8:42 AM, Mic Bowman <cmickeyb at gmail.com> wrote:
>
>     one thing that i was concerned about when i put the throttles in place is the relationship between congestion
>     control and packet sizes. if you're generating a large number of small, reliable packets that are being dropped,
>     that could cause the congestion control to kick in more quickly. that would suggest an adjustment based on bytes
>     sent rather than time (though both are probably appropriate).
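One way to read the bytes-based suggestion, sketched in Python under stated assumptions: expired bytes are accumulated and the rate is halved once per `loss_bytes_per_cut` bytes lost, so a burst of small reliable packets counts for less than the same number of large ones. `ByteWeightedThrottle` and the 1200-byte default are hypothetical, not existing OpenSimulator settings.

```python
class ByteWeightedThrottle:
    """Hypothetical: halve the rate once per `loss_bytes_per_cut` bytes of
    expired reliable data, instead of once per expired packet."""

    def __init__(self, rate_bps, min_bps, loss_bytes_per_cut=1200):
        self.rate = rate_bps
        self.min_bps = min_bps
        self.loss_bytes_per_cut = loss_bytes_per_cut  # hypothetical tuning knob
        self.lost_bytes = 0

    def on_expired(self, nbytes):
        self.lost_bytes += nbytes
        # Each full threshold's worth of lost bytes costs one halving.
        while self.lost_bytes >= self.loss_bytes_per_cut:
            self.lost_bytes -= self.loss_bytes_per_cut
            self.rate = max(self.min_bps, self.rate // 2)


t = ByteWeightedThrottle(1_000_000, 10_000)
for _ in range(10):
    t.on_expired(100)   # ten small packets: 1000 bytes, below the threshold
print(t.rate)           # 1000000
t.on_expired(200)       # cumulative loss reaches 1200 bytes: one cut
print(t.rate)           # 500000
```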
>
>     my biggest concern is that we start fixing by "stabbing in the dark". congestion control is particularly nasty in
>     how it interacts, which is why i started with a well-known & long battle-tested algorithm. making random changes
>     might fix one problem and introduce a half dozen others.
>
>     i'm not in a position to help on the diagnosis until next week if you can wait until then.
>
>     --mic
>
>
>     On Wed, Nov 26, 2014 at 4:04 PM, Justin Clark-Casey <jjustincc at googlemail.com> wrote:
>
>         This was actually happening at quite low loads (< 40 connections over all 4 keynotes).  Once adaptive throttles
>         were disabled and other unrelated issues fixed, the system had no obvious issues coping with higher loads in both
>         testing and the conference itself (e.g. the peak of 159 keynote avatars during the conference).  So I don't think
>         it was a server bandwidth issue.
>
>         That said, it was somewhat strange behaviour, as it affected only maybe 10-20% of connections.  Once it did affect
>         a connection (I saw this happening by logging downward adjustments, which one can still do with the console
>         command "debug lludp throttles log 1"), the connection would not recover: at some point a burst of expiries would
>         reduce the throttle again.  Connections seemed to be affected randomly; I experienced the issue myself at one point,
>         and I have pretty solid fibre.
>
>         You're right in that I don't know why this happened or why problematic connections stayed problematic instead of
>         slowly recovering.  Because of time constraints we had to disable adaptive instead of investigating further.
>         But I don't advocate doing this by default at all because, as you say, it's an important mechanism for
>         congestion control.
>
>         I do plan to investigate further at some point, but it's time-consuming work and I'd really love to get a
>         release out soon-ish.  So for the moment I would like to tune the adaptation mechanism as you've mentioned,
>         which I believe should probably be done anyway.  Because of the nature of the problem, my plan would be not to
>         change the adaptation divisor but rather to adapt downwards only every 2 seconds or so while packets are
>         expiring, rather than on every packet expiry.  I believe this should still achieve the adaptation effect without
>         massively penalising the connection if there has been a momentary connection issue or similar.
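A minimal Python sketch of the "adapt downwards only every 2 seconds or so" plan, assuming expiries are merely counted and a periodic evaluation halves the rate at most once per interval if any packet expired. `PeriodicDownwardThrottle` and `tick` are hypothetical names, and the upward (recovery) path is deliberately left out.

```python
class PeriodicDownwardThrottle:
    """Hypothetical sketch: expiries are only counted, and the rate is halved
    at most once per evaluation interval if any packet expired during it."""

    def __init__(self, rate_bps, min_bps, interval_s=2.0):
        self.rate = rate_bps
        self.min_bps = min_bps
        self.interval_s = interval_s
        self.expired_in_interval = 0
        self.next_eval = None

    def on_expired(self):
        self.expired_in_interval += 1

    def tick(self, now):
        """Call regularly (e.g. from the send loop) with a monotonic time in seconds."""
        if self.next_eval is None:
            self.next_eval = now + self.interval_s
            return
        if now < self.next_eval:
            return
        if self.expired_in_interval > 0:
            self.rate = max(self.min_bps, self.rate // 2)  # one cut per interval
        self.expired_in_interval = 0
        self.next_eval = now + self.interval_s


thr = PeriodicDownwardThrottle(1_000_000, 10_000)
thr.tick(0.0)
for _ in range(20):
    thr.on_expired()   # a burst of 20 expiries within one interval
thr.tick(1.0)          # interval not over yet: no change
thr.tick(2.0)          # one halving for the whole burst
print(thr.rate)        # 500000
```

A sustained loss stream still halves the rate every interval, so the mechanism keeps adapting; only bursts within one interval are collapsed into a single cut.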
>
>         On 26/11/14 02:39, Mic Bowman wrote:
>
>             As you mention... cutting the throttle by 50% was modeled on the TCP congestion control approach. It is very
>             aggressive as a congestion control mechanism and certainly could be tuned.
>
>             That being said... do you know why the packets were considered un-acked? If it's because the simulator is
>             having problems (which, given your description that it happens under load, seems to be the case) then we can
>             probably do something more intelligent about throttling overall simulator BW. That is... maybe the top end of
>             the overall simulator bw is the problem, not the per-connection throttles.
>
>             Manual throttles & adaptive throttles are not exclusive. You can use both. Adaptive manages the top end, but
>             the manual throttles set an absolute max.
>
>             --mic
>
>
>             On Tue, Nov 25, 2014 at 5:15 PM, Justin Clark-Casey <jjustincc at googlemail.com> wrote:
>
>                  Hi Mic (primarily),
>
>                  Three years ago [1] we had a discussion about the enable_adaptive_throttles setting.  Just for background,
>                  this is a setting that adapts the amount of data sent to the viewer depending on whether reliable packets
>                  sent from the simulator are acked or not.  As such, it aims to make sure that a viewer which sets a
>                  downstream bandwidth higher than its network connection can cope with is not permanently hosed with too
>                  much data.  We enabled it on an experimental basis [2].
>
>                  As you said at the time, this is modelled on the congestion control approach used in TCP.  I see that for
>                  TCP, the rate is halved on every unacked segment.  In OpenSimulator, it's halved on every unacked reliable
>                  packet.
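A minimal Python sketch of the per-packet rule described above, assuming each expired reliable packet halves the rate down to a floor. The additive increase on ack and its 1000 bit/s step are hypothetical, included only to show the general AIMD shape and how slowly a collapsed rate climbs back; `PerPacketHalvingThrottle` is not an OpenSimulator class.

```python
class PerPacketHalvingThrottle:
    """Sketch of per-packet multiplicative decrease with a floor. The additive
    increase step is hypothetical and only illustrates the AIMD shape."""

    def __init__(self, rate_bps, min_bps, max_bps, increase_bps=1_000):
        self.rate = rate_bps
        self.min_bps = min_bps
        self.max_bps = max_bps
        self.increase_bps = increase_bps

    def on_expired(self):
        # Multiplicative decrease: every expired reliable packet halves the rate.
        self.rate = max(self.min_bps, self.rate // 2)

    def on_acked(self):
        # Additive increase (hypothetical step size).
        self.rate = min(self.max_bps, self.rate + self.increase_bps)


t = PerPacketHalvingThrottle(1_000_000, 10_000, 1_000_000)
for _ in range(10):
    t.on_expired()   # ten expiries in a burst
print(t.rate)        # 10000: the floor is reached after seven halvings
```

With a fixed additive step, climbing back from the floor takes hundreds of acks, which matches the reported slow recovery in shape, if not necessarily in the actual constants.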
>
>                  However, under fairly modest load conditions in the conference grid, I saw a behaviour where, for some
>                  connections, a sequence of packets would expire within a very short time period (< 1 sec).  This would
>                  halve the throttle many times, in my observations right down to the absolute minimum.  This caused the
>                  behaviour from the user's point of view to degrade considerably for an extended period of time, since the
>                  throttles take quite a long time to grow again.
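As a rough worked example of the collapse described above, assume each expiry halves the rate once, with $r_0$ the starting rate and $r_{\min}$ the configured floor (its actual value is not given here). After $k$ expiries in quick succession, taking $r_0 = 1$ Mbit/s purely for illustration:

```latex
r_k = \max\!\left(r_{\min},\ \frac{r_0}{2^{k}}\right),
\qquad
\frac{10^{6}\ \text{bit/s}}{2^{10}} \approx 977\ \text{bit/s}
```

So ten expiries within a second are enough to drive a 1 Mbit/s connection to the floor unless $r_{\min}$ is set above roughly 1 kbit/s.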
>
>                  I didn't get much further with the diagnostics, since a lack of time forced us to switch back to manual
>                  throttling instead (with 1 Mbit/s per viewer and 400 Mbit/s total on the keynotes).  This seemed to work
>                  okay in testing and in the event itself.  However, this leaves one vulnerable to the problem that adaptive
>                  throttles aim to tackle in the first place.
>
>                  I'm still reading up about this stuff, but it strikes me that halving the throttle on every missed packet
>                  is much harsher than the TCP approach, as with UDP a whole sequence can expire at once rather than a single
>                  segment that is subsequently retried before another segment can be missed.
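TCP implementations generally reduce the congestion window once per loss event rather than once per lost segment, using a recovery point so that further losses from data already in flight do not trigger another cut. A minimal Python sketch of that idea applied to expiring reliable packets follows; `LossEventThrottle` and its sequence numbering are hypothetical and are not how LLUDP sequence numbers are actually handled.

```python
class LossEventThrottle:
    """Hypothetical sketch of 'one cut per loss event' using a recovery point.
    An expiry only triggers a cut if the packet was sent after the last cut."""

    def __init__(self, rate_bps, min_bps):
        self.rate = rate_bps
        self.min_bps = min_bps
        self.next_seq = 0
        self.recovery_seq = -1   # packets up to here belong to the last loss event

    def on_send(self):
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_expired(self, seq):
        if seq <= self.recovery_seq:
            return False         # sent before the last cut: same loss event
        self.rate = max(self.min_bps, self.rate // 2)
        self.recovery_seq = self.next_seq - 1   # everything already sent is covered
        return True


t = LossEventThrottle(1_000_000, 10_000)
burst = [t.on_send() for _ in range(10)]
cuts = [t.on_expired(seq) for seq in burst]  # the whole burst expires at once
print(cuts.count(True), t.rate)              # 1 500000
```

Unlike a fixed time window, this ties the hold-off to the data that was in flight when the cut happened, so it needs no tuning for round-trip time.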
>
>                  One idea is to ignore all expiries in a certain period (e.g. the next 2 seconds) if an expired packet has
>                  already caused the throttle to be halved.  Of course, this is a bit more complicated to do, but hopefully
>                  not too much so.  What do you think?  Any other ideas?
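A minimal Python sketch of the hold-off idea, assuming the first expiry halves the rate immediately and any further expiry within `holdoff_s` seconds is ignored. `HoldoffThrottle` and its parameters are hypothetical names for illustration; timestamps are passed in explicitly so the behaviour is deterministic.

```python
class HoldoffThrottle:
    """Hypothetical sketch: the first expiry halves the rate, and further
    expiries within `holdoff_s` seconds are ignored."""

    def __init__(self, rate_bps, min_bps, holdoff_s=2.0):
        self.rate = rate_bps
        self.min_bps = min_bps
        self.holdoff_s = holdoff_s
        self.ignore_until = float("-inf")

    def on_expired(self, now):
        """`now` is a monotonic timestamp in seconds; returns True if the rate was cut."""
        if now < self.ignore_until:
            return False
        self.rate = max(self.min_bps, self.rate // 2)
        self.ignore_until = now + self.holdoff_s
        return True


t = HoldoffThrottle(1_000_000, 10_000)
results = [t.on_expired(0.1 * i) for i in range(10)]  # ten expiries within 1 s
print(results.count(True), t.rate)                    # 1 500000
```

Compared with evaluating on a fixed 2-second tick, this reacts to the first loss without delay and only suppresses the follow-on expiries from the same burst.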
>
>                  [1] http://opensimulator.org/pipermail/opensim-dev/2011-October/023017.html
>                  [2] http://opensimulator.org/pipermail/opensim-dev/2011-October/023063.html
>
>                  Best Regards,
>
>                  --
>                  Justin Clark-Casey (justincc)
>                  OSVW Consulting
>                  http://justincc.org
>                  http://twitter.com/justincc
>                  ___________________________________________________
>                  Opensim-dev mailing list
>                  Opensim-dev at opensimulator.org
>                  http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev
>
>
>
>
>


-- 
Justin Clark-Casey (justincc)
OSVW Consulting
http://justincc.org
http://twitter.com/justincc

