the only thing we've touched so far is the entity update queue. that's all avatar updates & prim updates. we haven't touched any of the other packets. the resend focus would be for prims & avatar updates only.<div>

--mic</div><div><br><br><div class="gmail_quote">On Mon, Mar 28, 2011 at 11:19 AM, Melanie <span dir="ltr">&lt;<a href="mailto:melanie@t-data.com">melanie@t-data.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi,<br>

<br>

sounds great.<br>

<br>

Some things to consider:<br>

<br>

- Some actions require explicit sending of a packet which is an<br>

update packet, but is used for special cases. Sit, stand, changing<br>

group tags, creating/joining groups are all such cases where special<br>

care needs to be taken.<br>

<br>

- Resend is evil for static objects and avatars, but may be needed<br>

to sync up dead reckoning with the real data on physical objects.<br>

Just a feeling.<br>

<font color="#888888"><br>

Melanie<br>

</font><div><div></div><div class="h5"><br>

Mic Bowman wrote:<br>

> Over the last several weeks, Dan Lake & I have been looking at some of the<br>

> networking performance issues in OpenSim. As always, our concerns are with<br>

> the problems caused by very complex scenes with very large numbers of<br>

> avatars. However, I think some of the issues we have found will generally<br>

> improve networking with OpenSim. Since this work represents a fairly<br>

> significant change in behavior (though the number of lines of code is not<br>

> great), I'm going to put this into a separate branch for testing (called<br>

> queuetest) in the opensim git repository.<br>

><br>

> We've found several problems with the current<br>

> networking/prioritization code.<br>

><br>

> * Reprioritization is completely broken for SceneObjectParts. On<br>

> reprioritization, the current code uses the localid stored in the scene<br>

> Entities list but since the scene does not store the localid for SOPs, that<br>

> attempt always fails. So the original priority of the SOP continues to be<br>

> used. This could be the cause of some problems since the initial<br>

> prioritization assumes position 128,128. I don't understand all the possible<br>

> ramifications, but suffice it to say, using the localid is causing<br>

> problems.<br>

><br>

> Fix: The sceneentity is already stored in the update, just use that instead<br>

> of the localid.<br>

><br>

> * We currently pull (by default) 100 entity updates from the entityupdate<br>

> queue and convert them into packets. Once converted into packets, they are<br>

> then queued again for transmissions. This is a bad thing. Under any kind of<br>

> load, we've measured the time in the packet queue to be up to many<br>

> hundreds/thousands of milliseconds (and to be highly variable). When an<br>

> object changes one property and then doesn't change it again, the time in<br>

> the packet queue is largely irrelevant. However, if the object is<br>

> continuously changing (an avatar changing position, a physical object<br>

> moving, etc) then the conversion from an entity update to a packet "freezes"<br>

> the properties to be sent. If the object is continuously changing, then with<br>

> fairly high probability, the packet contains old data (the properties of the<br>

> entity from the point at which it was converted into a packet).<br>

><br>

> The real problem is that, in theory, to improve the efficiency of the<br>

> packets (fill up each message) we are grabbing big chunks of updates. Under<br>

> load, that causes queuing at the packet layer which makes updates stale.<br>

> That is... queuing at the packet layer is BAD.<br>

><br>

> Fix: We implemented an adaptive algorithm for the number of updates to grab<br>

> with each pass. We set a target time of 200ms for each iteration. That<br>

> means, we are trying to bound the maximum age of any update in the packet<br>

> queue to 200ms. The adaptive algorithm looks a lot like a TCP slow start:<br>

> every time we complete an iteration (flush the packet queue) in less than<br>

> 200ms we increase linearly the number of updates we take in the next<br>

> iteration (add 5 to the count) and when we don't make it back in 200ms, we<br>

> drop the number we take multiplicatively (cut the number in half). In our<br>

> experiments with large numbers of moving avatars, this algorithm works<br>

> *very* well. The number of updates taken per iteration stabilizes very<br>

> quickly and the response time is dramatically improved (no "snap back" on<br>

> avatars, for example). One difference from the traditional slow start...<br>

> since the number of "static" items in the queue is very high when a client<br>

> first enters a region, we start with the number of updates taken at 500.<br>

> that gets the static items out of the queue quickly (and delay doesn't<br>

> matter as much) and the number taken is generally stable before the<br>

> login/teleport screen even goes away.<br>
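
[A minimal sketch of the adaptive sizing described above, written in Python for illustration rather than taken from the queuetest branch. The 200ms target, the +5 step, the halving and the starting value of 500 come from the text; the class name, the method name and the floor of 1 are assumptions. Adding a constant on success and halving on a miss is the additive-increase, multiplicative-decrease shape.]<br>

```python
class AdaptiveBatchSizer:
    """Additive-increase, halve-on-miss sizing of the per-iteration update batch."""

    def __init__(self, target_ms=200.0, initial=500, step=5, minimum=1):
        self.target_ms = target_ms  # target time for one iteration
        self.batch_size = initial   # start high to drain static items quickly
        self.step = step            # linear increase after an on-time iteration
        self.minimum = minimum      # assumed floor so the batch never reaches zero

    def record_iteration(self, elapsed_ms):
        """Adjust the batch size after one flush of the packet queue."""
        if elapsed_ms < self.target_ms:
            self.batch_size += self.step
        else:
            self.batch_size = max(self.minimum, self.batch_size // 2)
        return self.batch_size


sizer = AdaptiveBatchSizer()
# Two slow iterations followed by two fast ones (hypothetical timings).
for elapsed_ms in (350.0, 260.0, 150.0, 120.0):
    print(sizer.record_iteration(elapsed_ms))
```

[The caller would time each pass from taking the updates to flushing the packet queue and feed that time back in before the next pass.]<br>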

><br>

> * The current prioritization queue can lead to update starvation. The<br>

> prioritization algorithm dumps all entity updates into a single ordered<br>

> queue. Let's say you have several hundred avatars moving around in a scene.<br>

> Since we take a limited number of updates from the queue in each iteration,<br>

> we will take only the updates for the "closest" (highest priority) avatars.<br>

> However, since those avatars continue to move, they are re-inserted into the<br>

> priority queue *ahead* of the updates that were already there. So... unless<br>

> the queue can be completely emptied each iteration or the priority of the<br>

> "distant" (low priority) avatars changes, those avatars will never be<br>

> updated.<br>

><br>

> Fix: We converted the single priority queue into multiple priority queues<br>

> and use fair queuing to retrieve updates from each. Here's how it works<br>

> (more or less)... the current metrics (all of the current prioritization<br>

> algorithms use distance at some point for prioritization) compute a distance<br>

> from the avatar/camera to an object. We take the log of that distance and<br>

> use that as the index for the queue where we place the update. So close<br>

> things go into the highest priority queue and distant things go into the<br>

> lowest priority queue. Since the area covered by a priority queue grows as<br>

> the square of the radius, the distant (lowest priority) queues will have the<br>

> most objects while the highest priority queues will have a small number of<br>

> objects. Inside each priority queue, we order the updates by the time in<br>

> which they entered the queue. Then we pull a fixed number of updates from<br>

> each priority queue each iteration. The result is that local updates get a<br>

> high fraction of the outgoing bandwidth but distant updates are guaranteed<br>

> to get at least "some" of the bandwidth. No starvation. The current<br>

> prioritization algorithm we implemented is a modification of the "best<br>

> avatar responsiveness" and "front back" in that we use root prim location<br>

> for child prims and the priority of updates "in back" of the avatar is lower<br>

> than updates "in front". Our experiments show that the fair queuing does<br>

> drain the update queue AND continues to provide a disproportionately high<br>

> percentage of the bw to "close" updates.<br>
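
[A rough Python sketch of the bucketed fair queuing described above, not the code in the branch. The text only says "the log of that distance" and "a fixed number of updates"; the base-2 log, the count of 8 queues, the quota of 4 per queue and all names here are assumptions. Updates are appended in arrival order, so each queue is ordered by entry time.]<br>

```python
import math
from collections import deque


class FairUpdateQueues:
    """Distance-bucketed update queues drained with a fixed per-queue quota."""

    def __init__(self, num_queues=8, per_queue_quota=4):
        self.queues = [deque() for _ in range(num_queues)]
        self.per_queue_quota = per_queue_quota

    def queue_index(self, distance):
        """Map a distance to a queue: log2 of the distance, clamped to the range."""
        if distance < 1.0:
            return 0
        return min(len(self.queues) - 1, int(math.log2(distance)))

    def enqueue(self, update, distance):
        # Appending keeps each queue in arrival (entry time) order.
        self.queues[self.queue_index(distance)].append(update)

    def dequeue_batch(self):
        """Take up to the quota from every queue, nearest queue first."""
        batch = []
        for queue in self.queues:
            for _ in range(min(self.per_queue_quota, len(queue))):
                batch.append(queue.popleft())
        return batch


fq = FairUpdateQueues(num_queues=4, per_queue_quota=1)
fq.enqueue("near-a", 1.5)
fq.enqueue("near-b", 1.2)
fq.enqueue("far", 100.0)
print(fq.dequeue_batch())  # one update from the near queue and one from the far queue
```

[Because every queue contributes to every batch, a distant update is sent even while near updates keep arriving, and the sparsely populated near queues still drain fastest.]<br>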

><br>

> One other note on this... we should be able to improve the performance of<br>

> reprioritization with this approach. If we know the distance an avatar has<br>

> moved, we only have to reprioritize objects that might have changed priority<br>

> queues. Haven't implemented this yet but have some ideas for how to do it.<br>
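
[One possible way to do this, sketched in Python and not necessarily what the authors have in mind. If the viewer moved by at most `moved`, an object that was at `distance` is now somewhere between `distance - moved` and `distance + moved` (triangle inequality), so it only needs reprioritizing when those two bounds map to different queues. The base-2 log, the 8 queues and the function name are assumptions.]<br>

```python
import math


def may_change_queue(distance, moved, num_queues=8):
    """Return True if an object could land in a different queue after the move."""

    def index(d):
        # Same assumed mapping as a log2 bucketed queue, clamped to the range.
        return 0 if d < 1.0 else min(num_queues - 1, int(math.log2(d)))

    nearest = max(0.0, distance - moved)
    farthest = distance + moved
    # The mapping never decreases with distance, so equal indexes at both
    # bounds mean every distance in between maps to that same queue.
    return index(nearest) != index(farthest)


print(may_change_queue(48.0, 5.0))  # stays within the 32..64 band
print(may_change_queue(62.0, 5.0))  # may cross the boundary at 64
```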

><br>

> * The resend queue is evil. When an update packet is sent (they are marked<br>

> reliable) it is moved to a queue to await acknowledgement. If no<br>

> acknowledgement is received (in time), the packet is retransmitted and the<br>

> wait time is doubled and so on... What that means is that resent packets<br>

> in a scene that is rapidly changing will often contain updates that are<br>

> outdated. That is, when we resend the packet, we are just resending old data<br>

> (and if you're having a lot of resends that means you already have a bad<br>

> connection & now you're filling it up with useless data).<br>

><br>

> Fix: this isn't implemented yet (help would be appreciated)... we think that<br>

> instead of saving packets for resend... a better solution would be to keep<br>

> the entity updates that went into the packet. if we don't receive an ack in<br>

> time, then put the entity updates back into the entity update queue (with<br>

> entry time from their original enqueuing). That would ensure that we send an<br>

> update for the object & that the data sent is the most recent.<br>
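
[Since this part is not implemented, here is only a hypothetical Python sketch of the bookkeeping it would need. It remembers which entity updates went into each reliable packet and, on an ack timeout, puts them back into the update queue with their original entry time, so that current state is read when they are packetized again. The class, its methods and the fixed 1 second timeout are all assumptions.]<br>

```python
import heapq


class EntityResendTracker:
    """Re-enqueue entity updates, not stale packets, when an ack times out."""

    def __init__(self, ack_timeout=1.0):
        self.ack_timeout = ack_timeout
        self.pending = {}       # sequence -> (sent_at, [(entered_at, entity_id), ...])
        self.update_queue = []  # heap ordered by original entry time

    def enqueue(self, entity_id, entered_at):
        heapq.heappush(self.update_queue, (entered_at, entity_id))

    def packet_sent(self, sequence, entries, now):
        """Remember the updates carried by a reliable packet."""
        self.pending[sequence] = (now, list(entries))

    def ack_received(self, sequence):
        self.pending.pop(sequence, None)

    def expire(self, now):
        """Put the updates of every unacknowledged, timed-out packet back in the queue."""
        expired = [seq for seq, (sent_at, _) in self.pending.items()
                   if now - sent_at >= self.ack_timeout]
        for seq in expired:
            _, entries = self.pending.pop(seq)
            for entered_at, entity_id in entries:
                self.enqueue(entity_id, entered_at)
        return expired


tracker = EntityResendTracker()
tracker.packet_sent(1, [(0.0, "avatar-7")], now=10.0)
tracker.packet_sent(2, [(0.5, "prim-3")], now=10.0)
tracker.ack_received(2)
print(tracker.expire(now=11.0), tracker.update_queue)
```

[Only the entity identity and entry time are kept, never the serialized properties, which is what keeps the retransmitted data fresh.]<br>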

><br>

> * One final note... per client bandwidth throttles seem to work very well.<br>

> however, our experiments with per-simulator throttles were not positive. it<br>

> appeared that a small number of clients was consuming all of the bw<br>

> available to the simulator and the rest were starved. Haven't looked into<br>

> this any more.<br>

><br>

><br>

> So...<br>

><br>

> Feedback appreciated... there is some logging code (disabled) in the branch;<br>

> real data would be great. And help testing. there are a number of cases<br>

> (attachments, deletes and so on) that i'm not sure work correctly.<br>

><br>

> --mic<br>

><br>

><br>

><br>

</div></div>> ------------------------------------------------------------------------<br>

<div><div></div><div class="h5">><br>

> _______________________________________________<br>

> Opensim-dev mailing list<br>

> <a href="mailto:Opensim-dev@lists.berlios.de">Opensim-dev@lists.berlios.de</a><br>

> <a href="https://lists.berlios.de/mailman/listinfo/opensim-dev" target="_blank">https://lists.berlios.de/mailman/listinfo/opensim-dev</a><br>


</div></div></blockquote></div><br></div>