[Opensim-dev] Designing with Instrumentation in mind.

Justin Clark-Casey jjustincc at googlemail.com
Mon Nov 30 20:43:58 UTC 2009


Teravus Ovares wrote:
> Hey there,
> 
> A while back, we had somewhat reasonable statistics being generated
> and presented to the client.  They were not always accurate, but
> based on what I saw, I could pretty much pin certain parts of the
> simulator as the limiting factor during load tests.  I'd say the
> number 1 reason that they were only semi-accurate in the past is
> that nobody ever thought about instrumentation during the
> functionality design.  It was always 'tacked on later'.
> One example of this is the current AssetCache implementation.
> There's currently no way to know at a glance how many external
> requests it has open.  Additionally, it will be extremely difficult
> to add such a counter because of the way the objects are designed
> and accessed.  To add one, an event needs to be added to the
> IAssetService interface, and each AssetCache implementation will need
> an interlocked int to count how many gets and puts it currently has
> open to the external data source, as well as its own event calling
> schedule.  Then the IAssetService property in Scene (AssetService)
> will need an event handler which updates the values in
> SimStatsReporter in Scene (StatsReporter).  This idea of external
> access resource instrumentation should really have been built in to
> the design of the AssetService.
> 
> This last recent load test, there were no real statistics that I could
> use to determine what the limiting factor was.
> Time Dilation was pegged at 1.0 even when the simulator was
> obviously struggling.  Total Frame time (MS) was -50ms even when the
> simulation MS was 850ms and the Physics ms was 250ms, so the
> inconsistencies made it impossible to know what part of the simulator
> was struggling.  Agent Updates were erratic: sometimes high and
> sometimes low, both when the simulator was fine and when it was
> struggling.
> Pending Uploads and Downloads were always 0, so there was no way to
> know how well the simulator was downloading and uploading assets to
> and from the grid.  Packet stats were non-existent, so there was no
> way to know how well the UDP handlers were faring under the load.
> When it crashed, it crashed with a Mono-based stack trace which
> pointed to out of memory errors, so the only way that you could
> scientifically find out what the issue is would be to run a load test
> under a memory profiler.  We know that running a public load test
> under a memory profiler is quite impractical.
> 
> To make something better, I need to know two things: where it is and
> where I want it to be.  How can we make OpenSimulator better if we
> don't have statistics that point to where we are currently?
> 
> On that note, I propose that, when designing objects for functionality
> in OpenSimulator, we also consider whether the objects should be
> instrumented and what would be the best way to go about instrumenting
> them.  We should incorporate instrumentation into the design of the
> objects.  Some of that instrumentation is appropriate for a client to
> see, some of it might not be.  Consider that many of these statistics
> should be client facing and be included in the SimStats that get sent
> to the client, so that we can have a reasonable idea of what's going
> on with a simulator at a glance.  Also, in the design of the
> instrumentation, we should make sure that it is accurate and
> lightweight.
> 
> The load test went reasonably, but we didn't get half of the
> information on the simulator that we needed to be able to improve it.
> 
> 
> Please comment :)  I look forward to hearing your responses.

Instrumentation of internal components was what I was originally doing with the 
"show stats" command on the region console.  You'll see that it has fields for 
asset cache information, abnormal client thread terminations, etc.  Some of this 
data collection may have decayed because of a lack of unit tests... ;)

The framework itself is in OpenSim.Framework.Statistics.  Collection is done 
either by explicit push (i.e. the login service knows to call 
UserStatsCollector.AddSuccessfulLogin()) or by pull.
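
To make the push/pull distinction concrete, here is a minimal sketch in Python 
rather than the real C# code.  Every name in it (StatsCollector, 
register_pull_source, the "cached_assets" key) is made up for illustration and 
is not the actual OpenSim.Framework.Statistics API; only the idea of a login 
service calling an "add successful login" method comes from the framework.

```python
from typing import Callable, Dict


class StatsCollector:
    """Hypothetical collector, not the OpenSim.Framework.Statistics API."""

    def __init__(self) -> None:
        self._pushed: Dict[str, int] = {}
        self._pull_sources: Dict[str, Callable[[], int]] = {}

    def add_successful_login(self) -> None:
        # Push: the component that sees the event reports it explicitly.
        count = self._pushed.get("successful_logins", 0)
        self._pushed["successful_logins"] = count + 1

    def register_pull_source(self, name: str, source: Callable[[], int]) -> None:
        # Pull: the collector asks the component for a value at report time.
        self._pull_sources[name] = source

    def report(self) -> Dict[str, int]:
        # Combine pushed counters with freshly pulled values.
        snapshot = dict(self._pushed)
        for name, source in self._pull_sources.items():
            snapshot[name] = source()
        return snapshot


collector = StatsCollector()

# Push side: a login service would call this after each successful login.
collector.add_successful_login()
collector.add_successful_login()

# Pull side: a stand-in asset cache is only queried when a report is built.
asset_cache = {"tree": b"...", "rock": b"..."}
collector.register_pull_source("cached_assets", lambda: len(asset_cache))

stats = collector.report()
print(stats)
```

In this sketch push costs a call at every event but keeps the component in 
control, while pull costs nothing until someone asks for a report.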

This was a pretty clumsy way of implementing it, though; events might be a lot 
better (in my defense, I was fresh from Java at the time).
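
As a rough illustration of the event-based alternative, combined with the 
open-request counter Teravus describes above, something like the following 
could work.  It is a Python sketch under assumptions, not OpenSim code: 
InstrumentedAssetCache, StatsReporter, subscribe and the fetch callable are all 
hypothetical names, and a lock stands in for the interlocked int.

```python
import threading
from typing import Callable, List


class InstrumentedAssetCache:
    """Hypothetical cache that counts requests open to an external source."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # stand-in for the external call
        self._lock = threading.Lock()      # plays the role of an interlocked int
        self._open_requests = 0
        self._listeners: List[Callable[[int], None]] = []

    def subscribe(self, listener: Callable[[int], None]) -> None:
        # Event registration: listeners receive the new open-request count.
        self._listeners.append(listener)

    def _change_open_requests(self, delta: int) -> None:
        with self._lock:
            self._open_requests += delta
            current = self._open_requests
        # Notify outside the lock so a slow listener cannot block requests.
        for listener in self._listeners:
            listener(current)

    def get(self, asset_id: str) -> str:
        self._change_open_requests(+1)
        try:
            return self._fetch(asset_id)
        finally:
            # Decrement even if the external fetch raises.
            self._change_open_requests(-1)


class StatsReporter:
    """Hypothetical reporter that records what the cache announces."""

    def __init__(self) -> None:
        self.open_asset_requests = 0
        self.peak_open_asset_requests = 0

    def on_open_requests_changed(self, count: int) -> None:
        self.open_asset_requests = count
        self.peak_open_asset_requests = max(self.peak_open_asset_requests, count)


reporter = StatsReporter()
cache = InstrumentedAssetCache(lambda asset_id: "asset:" + asset_id)
cache.subscribe(reporter.on_open_requests_changed)
result = cache.get("tree")
```

The cache knows nothing about the reporter beyond the callback, which is the 
main attraction of events here.  With several threads the notifications can 
arrive out of order, so a real version would need to decide how much accuracy 
that costs.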

I'd really like to see a single instrumentation framework rather than the 3 (?) 
we now have.  I don't mind at all if the stuff in OpenSim.Framework.Statistics 
goes away as long as the functionality remains.

Needless to say, I'm +1 on the idea of instrumentation in general and on trying 
to do a little bit of thinking about providing it up front when writing a 
component.

-- 
Justin Clark-Casey (justincc)
http://justincc.org
http://twitter.com/justincc


