Feature Proposals/Deduplicating Asset Service
From OpenSimulator
Revision as of 11:03, 11 November 2011
Date
November 2011.
Status
Draft
Proposers
Justin Clark-Casey (justincc)
Introduction
The asset service stores practically all sim data (textures, scripts, etc.). Many assets are exact duplicates of existing assets, stored under a different asset ID (and occasionally with different metadata).
This feature would create an asset service that stores a hash of every asset, so that duplicate assets can be detected and only one copy of the data stored.
Proposal
A high proportion of uploaded assets, whether uploaded via the viewer or through mechanisms such as OARs and IARs, are duplicates of assets already stored by the asset service. On OSGrid, for example, DATA.
Here, I propose to establish an asset service that hashes all assets and so detects and eliminates duplicates. This would function along the same lines as coyled's existing Simple Ruby Asset Server (SRAS, https://github.com/coyled/sras).
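As a minimal sketch of the detection step, the snippet below hashes asset payloads and compares the digests. SHA-256 is assumed because Design 2 below names a sha256 column; the asset IDs, payloads and the `asset_hash` helper are hypothetical and exist only for illustration.

```python
import hashlib


def asset_hash(data: bytes) -> str:
    """Return the SHA-256 digest of an asset payload as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical uploads: the first two carry identical bytes under different IDs.
upload_a = {"id": "11111111-1111-1111-1111-111111111111", "data": b"same texture bytes"}
upload_b = {"id": "22222222-2222-2222-2222-222222222222", "data": b"same texture bytes"}
upload_c = {"id": "33333333-3333-3333-3333-333333333333", "data": b"other texture bytes"}

# Equal digests mark the payloads as duplicates regardless of asset ID.
is_duplicate = asset_hash(upload_a["data"]) == asset_hash(upload_b["data"])
is_distinct = asset_hash(upload_a["data"]) != asset_hash(upload_c["data"])
```

The hash covers only the payload, so two assets with the same data but different names or descriptions would still share one stored copy.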
Design
There are two major alternatives.
Design 1: Add hash and pointer columns to existing asset table
The first design would see a hash column and a pointer column added to the existing asset table. The hash column would be a primary key.
Pseudo-code for adding a new asset
    On asset add
        Hash new asset
        Compare to existing hashes
        If match
            create new asset table entry storing metadata and pointer to existing asset hash
        else
            create new asset table entry storing data, hash and metadata
Pseudo-code for retrieving an asset
    On asset get
        select existing asset based on input id
        If match
            If asset contains data directly
                return existing asset
            else
                look up asset pointed to by reference
                return asset using initial metadata and pointer-referenced data
        else
            return no such asset
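The two Design 1 routines above could be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: SQLite stands in for the real database, only a `name` column represents the metadata, and the hash is modelled as a unique, nullable column rather than a literal primary key, since rows that only point at another asset store no hash of their own. The function names are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the extended asset table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE assets (
               id      TEXT PRIMARY KEY,
               name    TEXT,
               hash    TEXT UNIQUE,  -- set only on rows that hold data
               pointer TEXT,         -- hash of the row holding the data, else NULL
               data    BLOB
           )"""
    )
    return conn


def add_asset(conn: sqlite3.Connection, asset_id: str, name: str, data: bytes) -> None:
    digest = hashlib.sha256(data).hexdigest()
    match = conn.execute("SELECT 1 FROM assets WHERE hash = ?", (digest,)).fetchone()
    if match:
        # Duplicate: store metadata plus a pointer to the existing hash.
        conn.execute(
            "INSERT INTO assets (id, name, pointer) VALUES (?, ?, ?)",
            (asset_id, name, digest),
        )
    else:
        # First copy: store data, hash and metadata together.
        conn.execute(
            "INSERT INTO assets (id, name, hash, data) VALUES (?, ?, ?, ?)",
            (asset_id, name, digest, data),
        )


def get_asset(conn: sqlite3.Connection, asset_id: str) -> Optional[Tuple[str, bytes]]:
    row = conn.execute(
        "SELECT name, data, pointer FROM assets WHERE id = ?", (asset_id,)
    ).fetchone()
    if row is None:
        return None  # no such asset
    name, data, pointer = row
    if pointer is not None:
        # Follow the pointer to the row that actually holds the data.
        (data,) = conn.execute(
            "SELECT data FROM assets WHERE hash = ?", (pointer,)
        ).fetchone()
    return name, bytes(data)
```

A duplicate fetch costs a second lookup in the same table, which is the main structural difference from Design 2.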
Design 2: Create two separate tables assetsmeta and assetsdata
The second design would see the creation of two separate tables. The assetsmeta table would be
column | type | notes |
---|---|---|
id | char(36) | Primary key |
sha256 | char(64) | |
name | varchar(64) | |
description | varchar(64) | |
assetType | tinyint(4) | |
local | tinyint(1) | |
temporary | tinyint(1) | |
create_time | int(11) | |
access_time | int(11) | |
asset_flags | int(11) | |
CreatorID | varchar(128) | |
This matches the existing assets table except that the data column is no longer present and a sha256 column has been added instead.
The assetsdata table would be
column | type | notes |
---|---|---|
sha256 | char(64) | Primary key |
data | longblob | |
This table could be replaced by another storage mechanism (e.g. the filesystem) in the future.
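To illustrate what a filesystem alternative to the assetsdata table might look like, here is a small content-addressed store keyed by SHA-256. The proposal does not specify any of this: the `FileBlobStore` class, its method names and the two-character directory fan-out are all assumptions made for the sketch.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class FileBlobStore:
    """Hypothetical filesystem replacement for the assetsdata table."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, digest: str) -> Path:
        # Fan out by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        """Write the payload once and return its sha256 key."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        path = self._path(digest)
        return path.read_bytes() if path.exists() else None


# Usage: storing the same bytes twice leaves a single file on disk.
with tempfile.TemporaryDirectory() as tmp:
    store = FileBlobStore(Path(tmp))
    first = store.put(b"texture bytes")
    second = store.put(b"texture bytes")
    stored_files = [p for p in Path(tmp).rglob("*") if p.is_file()]
    round_trip = store.get(first)
    missing = store.get("0" * 64)
```

Because the assetsmeta rows only hold the sha256 key, swapping the table for a store like this would not change the metadata side.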
Pseudo-code for adding a new asset
    On asset add
        Hash new asset
        Compare to existing hashes
        If match
            create new assetsmeta entry pointing to existing assetsdata entry
        If no match
            create new assetsdata entry
            create new assetsmeta entry pointing to new assetsdata entry
Pseudo-code for retrieving an asset
    On asset get
        select existing assetsmeta entry based on input id
        If match
            Fetch asset data from assetsdata using the entry's sha256
            Return asset metadata + data
        else
            return no such asset
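The Design 2 routines above could be sketched like this. It is an illustration only: SQLite stands in for the real database (so `BLOB` replaces `longblob`), only the `id`, `sha256` and `name` columns of assetsmeta are included, and the function names are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE assetsdata (
    sha256 CHAR(64) PRIMARY KEY,
    data   BLOB
);
CREATE TABLE assetsmeta (
    id     CHAR(36) PRIMARY KEY,
    sha256 CHAR(64) REFERENCES assetsdata(sha256),
    name   VARCHAR(64)
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_asset(conn: sqlite3.Connection, asset_id: str, name: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    match = conn.execute(
        "SELECT 1 FROM assetsdata WHERE sha256 = ?", (digest,)
    ).fetchone()
    if match is None:
        # No match: store the payload once under its hash.
        conn.execute(
            "INSERT INTO assetsdata (sha256, data) VALUES (?, ?)", (digest, data)
        )
    # In both cases the metadata row points at the data row via the hash.
    conn.execute(
        "INSERT INTO assetsmeta (id, sha256, name) VALUES (?, ?, ?)",
        (asset_id, digest, name),
    )
    return digest


def get_asset(conn: sqlite3.Connection, asset_id: str) -> Optional[Tuple[str, bytes]]:
    row = conn.execute(
        "SELECT m.name, d.data FROM assetsmeta m "
        "JOIN assetsdata d ON d.sha256 = m.sha256 WHERE m.id = ?",
        (asset_id,),
    ).fetchone()
    if row is None:
        return None  # no such asset
    return row[0], bytes(row[1])
```

A real service would also need to guard the check-then-insert step against two concurrent uploads of the same payload, for example with a transaction or an insert that tolerates an existing key.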