[Opensim-dev] Proposal: Implement a de-duplicating core ROBUST asset service
Wade Schuette
wade.schuette at gmail.com
Fri Mar 9 04:19:42 UTC 2012
Also, munching Fritos and looking at this: we could assume that any asset
that is new to one avatar but was created by a different avatar is a
high-probability duplicate candidate and should be checked. That would
capture a good chunk (over 50%?) of duplicates without having to touch
the renaming or copy-making processes.
Again, this could be event-driven, or db-trigger-driven on INSERT, etc.
(Or does MySQL not have transactions and on-insert triggers? I'm used to
Oracle.)
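That "created by someone else" heuristic is simple enough to sketch in a few lines; the function and field names below are hypothetical illustrations, not anything from the actual OpenSim codebase:

```python
def is_dedup_candidate(owner_id, creator_id):
    """Heuristic above: an asset newly filed under one avatar but created
    by a different avatar is a high-probability duplicate candidate."""
    return owner_id != creator_id
```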
Wade
On Thu, Mar 8, 2012 at 8:06 PM, Wade Schuette <wade.schuette at gmail.com>wrote:
> Justin,
>
> I have to respectfully agree with Cory.
>
> Wouldn't something like the following address your valid concerns about
> complexity, and reduce total load as well as perceived system response
> time for both filing and retrieving assets?
>
> First, if you use event-driven processes, there's no reason to rescan the
> entire database. Separating the work into distinct streams decouples the
> two sides, which is actually a good thing: I see no reason they need to
> be coupled, and keeping them separate allows each to be optimized and
> tested on its own.
>
> In fact, the entire deduplication process could run overnight at a
> low-load time, which is even better, or have multiple "worker" processes
> assigned to it if it's taking too long. Seems very flexible.
>
> I'm assuming that a hash-code isn't unique, but just specifies the bucket
> into which this item can be categorized.
>
> When a new asset arrives, if the hash-code already exists, put the
> unique ID in a pipe, finish filing it, and move on. If the hash-code
> doesn't already exist, just file it and move on.
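The filing path described above might look roughly like this, using in-memory stand-ins for the asset table and the "pipe" (all names are assumptions for illustration, not the actual ROBUST code, which would use its database and a persistent queue):

```python
import hashlib
import queue

asset_store = {}                  # asset id -> raw bytes
hash_index = {}                   # sha256 digest -> list of asset ids
dedup_candidates = queue.Queue()  # the "pipe" feeding the background worker

def file_asset(asset_id, data):
    """File a new asset immediately; if its hash bucket is already
    occupied, queue the id for background duplicate checking."""
    digest = hashlib.sha256(data).hexdigest()
    asset_store[asset_id] = data
    if digest in hash_index:
        dedup_candidates.put(asset_id)  # possible duplicate; verify later
    hash_index.setdefault(digest, []).append(asset_id)
```

Note that the upload path never blocks on the duplicate check; it only enqueues a candidate and moves on.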
>
> At the other end of the pipe, this wakes up a process that can, as time
> allows, check in the background whether not only the hash-code but the
> entire item is the same, and if so, change the handle to point to the
> existing copy. (For all I know, this could be done in one step if CRC
> codes were sufficiently unique, but computing such a code is
> CPU-intensive unless you can do it in hardware.)
>
> Of course, now the question arises of what happens when the original
> person DELETES the shared item. If you have solid database integrity,
> you only need to know how many pointers to it exist: when someone
> deletes "their copy", you decrease the count by one, and once the count
> reaches one, the next delete can actually remove the entry.
>
>
>
> Wade
>
>
>
>
>
> On 3/8/12 7:41 PM, Justin Clark-Casey wrote:
>
>> On 08/03/12 22:00, Rory Slegtenhorst wrote:
>>
>>> @Justin
>>> Can't we do the data de-duplication on a database level? Eg find the
>>> duplicates and just get rid of them on a regular
>>> interval (cron)?
>>>
>>
>> This would be enormously intricate. Not only would you have to keep
>> rescanning the entire asset db but it adds another moving part to an
>> already complex system.
>>
>>
>
--
R. Wade Schuette, CDP, MBA, MPH
698 Monterey Ave
Morro Bay CA 93442
cell: 1 (734) 635-0508
fax: 1 (734) 864-0318
wade.schuette at gmail.com