[Opensim-dev] Proposal: Implement a de-duplicating core ROBUST asset service

Wade Schuette wade.schuette at gmail.com
Fri Mar 9 04:19:42 UTC 2012


Also, munching Fritos and looking at this, we could assume that any asset
that is new to one avatar but was created by a different avatar is a
high-probability duplicate candidate and should be checked.

That would capture a good chunk (over 50%?) of duplicates without having
to touch the renaming or copy-making processes.

Again, this could be event-driven, db-trigger-driven on INSERT, etc.
(Or does MySQL lack transactions and on-insert triggers?  I'm used to
Oracle.)
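
Assuming MySQL 5.0 or later (which introduced triggers; InnoDB supplies the
transactions), the on-insert hook might look like the sketch below.  The
assets(id, creator_id, owner_id) and dedup_queue(asset_id) tables are
hypothetical stand-ins, not the real ROBUST schema.

    import mysql.connector  # MySQL Connector/Python

    # Hypothetical schema, not the real ROBUST tables:
    #   assets(id, creator_id, owner_id)
    #   dedup_queue(asset_id)
    TRIGGER_DDL = """
    CREATE TRIGGER assets_after_insert
    AFTER INSERT ON assets
    FOR EACH ROW
    BEGIN
      -- An asset filed by someone other than its creator is a
      -- high-probability duplicate: queue it for a background check.
      IF NEW.creator_id <> NEW.owner_id THEN
        INSERT INTO dedup_queue (asset_id) VALUES (NEW.id);
      END IF;
    END
    """

    conn = mysql.connector.connect(user="opensim", password="secret",
                                   database="opensim")
    conn.cursor().execute(TRIGGER_DDL)
    conn.close()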

Wade

On Thu, Mar 8, 2012 at 8:06 PM, Wade Schuette <wade.schuette at gmail.com> wrote:

> Justin,
>
> I have to respectfully agree with Cory.
>
> Wouldn't something like the following address your valid concerns about
> complexity, while also reducing total load and the perceived system
> response time for both filing and retrieving assets?
>
> First, if you use event-driven processes, there's no reason to rescan the
> entire database.  Separating the work into distinct streams decouples it,
> which simplifies both sides.  There's no reason I can see that the streams
> need to be coupled, and separating them allows each to be optimized and
> tested independently, which is a good thing.
>
> In fact, the entire deduplication process could run overnight at a
> low-load time, which is even better, or have multiple "worker" processes
> assigned to it if it's taking too long.  Seems very flexible.
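>
> To make that concrete, here is a sketch of the fan-out (Python throughout;
> threads stand in for the worker processes, which in practice would poll a
> shared database queue; verify_and_merge is sketched further down):
>
>     import queue, threading
>
>     store = {}       # asset_id -> bytes (stand-in for the asset table)
>     hash_index = {}  # digest -> [asset_id, ...]
>     handles = {}     # asset_id -> the canonical asset_id it points at
>     refcount = {}    # canonical asset_id -> number of pointers to it
>     pipe = queue.Queue()  # candidate duplicates awaiting a full check
>
>     def dedup_worker():
>         # Checks are independent, so one worker can run overnight or
>         # several can share the pipe when the backlog grows.
>         while True:
>             asset_id = pipe.get()
>             if asset_id is None:        # sentinel: shut down
>                 break
>             verify_and_merge(asset_id)  # sketched further down
>
>     workers = [threading.Thread(target=dedup_worker) for _ in range(4)]
>     for w in workers:
>         w.start()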
>
> I'm assuming that a hash-code isn't unique, but just specifies the bucket
> into which this item can be categorized.
>
> When a new asset arrives, if the hash-code already exists, put the
> unique ID in a pipe, finish filing it, and move on.  If the hash-code
> doesn't already exist, just file it and move on.
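>
> As a sketch, with hash_index again naming buckets rather than unique
> items (reusing the hypothetical structures above):
>
>     import hashlib
>
>     def file_asset(asset_id, data):
>         # Uses the store / hash_index / pipe from the sketch above.
>         digest = hashlib.sha256(data).hexdigest()
>         if digest in hash_index:
>             pipe.put(asset_id)  # possible duplicate; check in background
>         hash_index.setdefault(digest, []).append(asset_id)
>         store[asset_id] = data  # file it and move on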
>
> At the other end of the pipe, this wakes up a process that can, as time
> allows, check in the background whether not just the hash-code but the
> entire item is the same, and if so, change the handle to point at the
> existing copy.  (For all I know, this can be done in one step if CRC
> codes are sufficiently unique, but computing such a code is CPU-intensive
> unless you can do it in hardware.)
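>
> A sketch of that background check, where a full byte comparison stands in
> for the "entire item is the same" test:
>
>     def verify_and_merge(asset_id):
>         data = store[asset_id]
>         digest = hashlib.sha256(data).hexdigest()
>         for other_id in hash_index[digest]:
>             if other_id == asset_id or store.get(other_id) != data:
>                 continue
>             # The whole item matches, not just the hash: repoint the
>             # handle at the existing copy and drop the redundant bytes.
>             handles[asset_id] = other_id
>             refcount[other_id] = refcount.get(other_id, 1) + 1
>             del store[asset_id]
>             hash_index[digest].remove(asset_id)
>             break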
>
> Of course, now the question arises of what happens when the original
> person DELETES the shared item.  If you have solid database integrity, you
> only need to know how many pointers to it exist; when someone deletes
> "their copy", you decrease the count by one, and once the count gets down
> to one, the next delete can actually delete the entry.
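>
> In the same hypothetical terms, where refcount counts every pointer to a
> canonical copy (including its own):
>
>     def delete_asset(asset_id):
>         target = handles.pop(asset_id, asset_id)  # follow "their copy"
>         refcount[target] = refcount.get(target, 1) - 1
>         if refcount[target] == 0:
>             del store[target]    # last pointer gone: really delete it
>             del refcount[target]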
>
>
>
> Wade
>
>
>
>
>
> On 3/8/12 7:41 PM, Justin Clark-Casey wrote:
>
>> On 08/03/12 22:00, Rory Slegtenhorst wrote:
>>
>>> @Justin
>>> Can't we do the data de-duplication at the database level?  E.g. find
>>> the duplicates and just get rid of them at a regular interval (cron)?
>>>
>>
>> This would be enormously intricate.  Not only would you have to keep
>> rescanning the entire asset db, but it would also add another moving part
>> to an already complex system.
>>
>>
>


-- 
R. Wade Schuette, CDP, MBA, MPH
698 Monterey Ave
Morro Bay CA 93442
cell: 1 (734) 635-0508
fax:  1 (734) 864-0318
wade.schuette at gmail.com