[Opensim-dev] A way to track access times for the Asset store
Alex Reatov
info at adeptsql.com
Wed May 26 13:14:22 UTC 2010
Hi all,
The asset store is supposed to keep the time of last access for each
asset fetched, so that long unused assets can be later removed from
the database, preventing it from growing infinitely.
However, this feature has not been implemented in any of the supported
databases because the straightforward implementation that uses a separate
UPDATE after each asset read slows things down by an order of
magnitude.
There is a way to track the asset access times relatively efficiently
(with 10-15% performance loss), which I've implemented and pushed to my
github repo. See branches "asset-access-time" and
"asset-access-time-test" at
git (at) github.com:AlexRa/opensim-mods-Alex.git
There is also a Mantis with more details at:
http://opensimulator.org/mantis/view.php?id=4736
(which I'm quoting below)
If this is of interest to the core team, I'd appreciate it if you
used this work in the main repo.
The changes in "asset-access-time" branch are practically stand-alone,
but the performance tests in the "asset-access-time-test" branch require
the "unitests" stuff to run, which you can read about here:
http://opensimulator.org/mantis/view.php?id=4717
Best regards,
AlexRa
----------------------
The proposed solution (currently for MySql, although a similar
implementation should be possible for MS SQL) involves a stored
"tracker" function that records (UUID, access_time) into a separate table, and a
stored procedure that can be called infrequently to move the updated
access times back into the 'assets' table. The tracking function first
looks up the asset in the temp table and applies UPDATE or INSERT as
needed.
Assuming that recording the exact access time is not important (i.e. at
cleanup we only need to know that a specific asset has not been
accessed for a year or more, rather than the exact time when it was
last accessed), the code is further optimized not to update the access
log too often, with a configurable delay threshold.
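Since the branch itself isn't quoted here, a minimal sketch of what such a
tracker might look like (all names below are illustrative, not the actual
ones from the branch):

```sql
-- Illustrative sketch only. The log table uses the MEMORY engine so that
-- LOW_PRIORITY writes stay cheap (LOW_PRIORITY only applies to engines
-- with table-level locking, i.e. MEMORY/MyISAM).
CREATE TABLE asset_access_log (
    id          CHAR(36) NOT NULL PRIMARY KEY,
    access_time INT      NOT NULL
) ENGINE=MEMORY;

DELIMITER //
CREATE FUNCTION track_asset_access(asset_id CHAR(36), delay_threshold INT)
RETURNS INT
MODIFIES SQL DATA
BEGIN
    DECLARE last_seen INT DEFAULT NULL;
    -- One index lookup per tracked read:
    SELECT access_time INTO last_seen
      FROM asset_access_log WHERE id = asset_id;
    IF last_seen IS NULL THEN
        INSERT INTO asset_access_log (id, access_time)
        VALUES (asset_id, UNIX_TIMESTAMP());
    ELSEIF UNIX_TIMESTAMP() - last_seen > delay_threshold THEN
        -- Skip the write entirely when the asset was seen recently enough.
        UPDATE LOW_PRIORITY asset_access_log
           SET access_time = UNIX_TIMESTAMP()
         WHERE id = asset_id;
    END IF;
    RETURN 0;
END //
DELIMITER ;
```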
I've also tried a "fast log" variant of the same code, where the
server-side tracking function just logs each access to a non-indexed
(id, time) table, without doing any index lookups on it first. That
was supposed to be faster, at the expense of the resulting log growing
quickly and the stored proc that processes it taking longer to run.
However, the tests haven't shown much (if any) speed advantage for this
variant. It may still have some advantage when the asset table is very
large (and the lookups take longer), but I like the "smarter" version
better.
For testing purposes, I've also implemented the "naive" way of
tracking access times by issuing a separate UPDATE statement.
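The "naive" way is essentially one extra per-read statement issued
straight against the indexed `assets` table (sketch; this is the slow
path):

```sql
-- One round-trip and one indexed write per asset fetch:
UPDATE assets SET access_time = UNIX_TIMESTAMP() WHERE id = ?;
```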
A test set has been prepared both to ensure that the proposed solution
works and to see what the performance impact is. The tests involve
filling the database with 1000 random assets, taking a random subset
of 500 of their UUIDs, then randomly accessing those 500 assets 10000
times. The results were something like this:
No access tracking:      8.33 sec
"New" access tracking:   9.60 sec
Separate UPDATE:         35 - 103 sec
Separate UPDATE:         15 sec (when using 50-sec resolution)
As you can see, this implementation of server-side access tracking is
about 15% less efficient than no tracking at all, but is nowhere near
as bad as the UPDATE tracking.
This acceleration has only become possible due to MySQL's "INSERT
DELAYED" (and "UPDATE LOW_PRIORITY") features. It is therefore
critically important that the log table be placed on a MySql storage
engine supporting this feature, which is either MEMORY or MyISAM (or
perhaps ARCHIVE, but that doesn't seem to be commonly used or
available by default). In this short test I haven't noticed any
significant difference between the two (most likely all the logged data
still lived in the buffers), but with "ENGINE=InnoDB" the results
become the same as or worse than with the direct UPDATE (~40 sec).
Increasing the delay threshold dramatically increased efficiency of
the "naive" UPDATE approach, by reducing the actual number of updates
from 10K to 500. But even in this best-case scenario it is still twice
as slow as the "no tracking" mode.
Effect of the delay threshold on the server-side tracking is less
direct, because the `assets`.`access_time` column doesn't reflect the
actual access times until the log table is merged into `assets`. As
implemented now, there is at least one extra SELECT hidden inside the
tracking function, possibly followed by an INSERT/UPDATE depending on
the delay threshold. To have a significant effect, the delay threshold
should be set longer than the average interval between the merges
(e.g. if there is a script running the merge every hour, we might want
to set the delay so that an asset access is not tracked if it comes
within one hour from the previous access to the same asset).
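The merge step itself can be sketched like this (illustrative names; a
real version would also have to guard against entries logged between
the two statements):

```sql
-- Fold the logged access times back into the assets table, then
-- clear the log so it doesn't grow without bound.
DELIMITER //
CREATE PROCEDURE merge_asset_access()
BEGIN
    UPDATE assets a
      JOIN asset_access_log l ON l.id = a.id
       SET a.access_time = l.access_time;
    TRUNCATE TABLE asset_access_log;
END //
DELIMITER ;
```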
The modified code currently has no provisions for automatically calling
the merging stored proc, except on exit. A full implementation should
possibly include a way to measure the load level (e.g. as a moving
average of the intervals between asset requests) and initiate the
required housekeeping whenever it's not too busy.
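One possible scheduling mechanism (an assumption on my part, not
something the branch implements) would be MySQL's own event scheduler,
assuming the merge lives in a stored proc called, say,
merge_asset_access:

```sql
-- Requires the event scheduler to be enabled on the server:
SET GLOBAL event_scheduler = ON;

-- Run the merge on a fixed interval, independently of server load:
CREATE EVENT merge_asset_access_hourly
ON SCHEDULE EVERY 1 HOUR
DO CALL merge_asset_access();
```

This fixed-interval approach is simpler than load-aware scheduling, but
it would not avoid running the merge during busy periods.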