[Opensim-dev] Designing with Instrumentation in mind.

Kyle create at reactiongrid.com
Sat Nov 28 02:21:08 UTC 2009


The implementation is over my head, but as a former hardware technician,
old enough to have troubleshot mainframes by tape reels and blinking red
lights, I lived by front-panel indicators. They helped me isolate issues
quickly and train operators on which trouble indicators to watch for, so
that issues were caught before they caused downtime.

So to me this is a must-have for diagnosing problems and improving
stability and uptime. Brilliant! +1

Kyle G

-----Original Message-----
From: opensim-dev-bounces at lists.berlios.de
[mailto:opensim-dev-bounces at lists.berlios.de] On Behalf Of Teravus Ovares
Sent: Friday, November 27, 2009 9:10 PM
To: opensim-dev at lists.berlios.de
Subject: [Opensim-dev] Designing with Instrumentation in mind.

Hey there,

A while back, we had somewhat reasonable statistics being generated
and presented to the client. They were not always accurate, but based
on what I saw, I could pretty much pin certain parts of the simulator
as the limiting factor during load tests. I'd say the number one
reason they were only semi-accurate is that nobody ever thought about
instrumentation during the functionality design. It was always
'tacked on later'.
One example of this is the current AssetCache implementation. There
is currently no way to know at a glance how many external requests it
has open. Additionally, it will be extremely difficult to add that,
because of the way the objects are designed and accessed. To add it,
an event needs to be added to the IAssetService interface, and each
AssetCache implementation will need an interlocked int to count how
many gets and puts it currently has open to the external data source,
as well as its own event-calling schedule. Then the IAssetService
property in Scene (AssetService) will need an event handler that
updates the values in SimStatsReporter in Scene (StatsReporter). This
idea of instrumenting access to external resources should really have
been built into the design of the AssetService.
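
A minimal sketch of the counting design described above, written in
Python rather than the simulator's C#. Every name in it
(InstrumentedAssetCache, StatsReporter, set_pending_downloads, and so
on) is hypothetical and is not the OpenSimulator API. A lock-guarded
integer stands in for the interlocked int, and a list of callbacks
stands in for the event.

```python
import threading


class InstrumentedAssetCache:
    """Hypothetical asset cache that counts its open external requests."""

    def __init__(self, fetch_external):
        self._fetch_external = fetch_external  # callable(asset_id) -> data
        self._cache = {}
        self._lock = threading.Lock()  # stands in for an interlocked int
        self._open_requests = 0
        self._listeners = []           # stands in for an event

    def on_open_requests_changed(self, handler):
        """Register handler(count), called whenever the count changes."""
        self._listeners.append(handler)

    def _adjust(self, delta):
        with self._lock:
            self._open_requests += delta
            current = self._open_requests
        # Handlers run outside the lock so a slow listener cannot block
        # other requests from updating the counter.
        for handler in self._listeners:
            handler(current)

    def get(self, asset_id):
        with self._lock:
            if asset_id in self._cache:
                return self._cache[asset_id]
        self._adjust(+1)
        try:
            data = self._fetch_external(asset_id)
        finally:
            self._adjust(-1)  # decrement even if the fetch raises
        with self._lock:
            self._cache[asset_id] = data
        return data


class StatsReporter:
    """Hypothetical stand-in for the scene's stats reporter."""

    def __init__(self):
        self.pending_downloads = 0
        self.peak_pending_downloads = 0

    def set_pending_downloads(self, count):
        self.pending_downloads = count
        self.peak_pending_downloads = max(self.peak_pending_downloads, count)


reporter = StatsReporter()
cache = InstrumentedAssetCache(lambda asset_id: b"data for " + asset_id.encode())
cache.on_open_requests_changed(reporter.set_pending_downloads)
cache.get("texture-1")
```

This sketch notifies listeners on every change. The "event-calling
schedule" mentioned above would instead batch notifications, for
example once per stats interval, to keep the overhead down.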

In this most recent load test, there were no real statistics that I
could use to determine what the limiting factor was. Time Dilation was
pegged at 1.0 even when the simulator was obviously struggling. Total
Frame time (ms) was -50ms even when the simulation ms was 850ms and
the physics ms was 250ms, so the inconsistencies made it impossible to
know which part of the simulator was struggling. Agent Updates were
erratic: sometimes high and sometimes low, both when the simulator was
fine and when it was struggling. Pending Uploads and Downloads were
always 0, so there was no way to know how well the simulator was
downloading and uploading assets to and from the grid. Packet stats
were nonexistent, so there was no way to know how well the UDP
handlers were faring under the load. When it crashed, it crashed with
a Mono-based stack trace that pointed to out-of-memory errors, so the
only way to find out scientifically what the issue is would be to run
a load test under a memory profiler. We know that running a public
load test under a memory profiler is quite impractical.
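
One way to keep frame-time figures mutually consistent, sketched here
in Python under assumptions: every phase is timed with the same
monotonic clock, and the total is derived from those same
measurements, so it can never be negative or smaller than one of its
parts. FrameTimer and the phase names are hypothetical; this is not
how the simulator currently computes its statistics.

```python
import time


class FrameTimer:
    """Hypothetical per-frame timer; phase names are illustrative only."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # monotonic clock returning seconds
        self.phases_ms = {}

    def measure(self, name, work):
        """Run work() and add its elapsed time to the named phase."""
        start = self._clock()
        try:
            return work()
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.phases_ms[name] = self.phases_ms.get(name, 0.0) + elapsed_ms

    @property
    def total_ms(self):
        # Derived from the phase measurements rather than timed
        # separately, so the total always equals the sum of its parts.
        return sum(self.phases_ms.values())


timer = FrameTimer()
timer.measure("physics", lambda: sum(range(1000)))
timer.measure("other", lambda: sorted(range(1000)))
```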

To make something better, I need to know two things: where it is, and
where I want it to be. How can we make OpenSimulator better if we
don't have statistics that show where we currently are?

On that note, I propose that, when designing objects for functionality
in OpenSimulator, we also consider whether the objects should be
instrumented and what the best way to instrument them would be. We
should incorporate instrumentation into the design of the objects.
Some of that instrumentation is appropriate for a client to see; some
of it might not be. Consider that many of these statistics should be
client-facing and included in the SimStats that get sent to the
client, so that we can have a reasonable idea of what's going on with
a simulator at a glance. Also, in designing the instrumentation, we
should make sure that it is accurate and lightweight.
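
A rough sketch, in Python, of how the split between client-facing and
internal statistics could look if it were decided when an object is
designed. Stat, StatsRegistry, and the statistic names are all
hypothetical and only illustrate the idea.

```python
import threading


class Stat:
    """One named counter; client_facing marks it for the client report."""

    def __init__(self, name, client_facing):
        self.name = name
        self.client_facing = client_facing
        self._lock = threading.Lock()
        self._value = 0

    def add(self, delta=1):
        # A single guarded increment keeps the hot path lightweight.
        with self._lock:
            self._value += delta

    @property
    def value(self):
        with self._lock:
            return self._value


class StatsRegistry:
    """Holds every registered statistic, client-facing or not."""

    def __init__(self):
        self._stats = {}

    def register(self, name, client_facing=False):
        stat = Stat(name, client_facing)
        self._stats[name] = stat
        return stat

    def snapshot(self, client_only=False):
        """Return current values, optionally only the client-facing ones."""
        return {
            name: stat.value
            for name, stat in self._stats.items()
            if stat.client_facing or not client_only
        }


registry = StatsRegistry()
packets_in = registry.register("udp_packets_in", client_facing=True)
cache_evictions = registry.register("asset_cache_evictions")
packets_in.add(3)
cache_evictions.add()
```

Here snapshot(client_only=True) would feed whatever is sent to the
client, while the full snapshot stays available for operators and
logs.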

The load test went reasonably well, but we didn't get half of the
information on the simulator that we needed to be able to improve it.


Please comment :) I look forward to hearing your responses.

Regards

Teravus
_______________________________________________
Opensim-dev mailing list
Opensim-dev at lists.berlios.de
https://lists.berlios.de/mailman/listinfo/opensim-dev
