Feature Proposals/Deduplicating Asset Service
Date
November 2011.
Status
Draft
Proposers
Justin Clark-Casey (justincc)
Introduction
The asset service stores practically all sim data (textures, scripts, etc.). Many assets are exact duplicates of existing assets, differing only in their asset ID (and occasionally their metadata).
This feature would create an asset service that stores a hash of every asset, so that duplicate assets can be detected and only one copy of the data stored.
Proposal
A high proportion of uploaded assets, whether uploaded via the viewer or through mechanisms such as OARs and IARs, are duplicates of existing assets stored by the asset service. On OSGrid, for example, DATA.
Here, I propose to establish an asset service that hashes all assets and so detects and eliminates duplicates. This would function along the same lines as coyled's existing Simple Ruby Asset Server (SRAS).
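The detection step can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash (following the sha256 column in Design 2 below). The asset IDs, the sample bytes and the `asset_hash` helper are hypothetical and not part of OpenSimulator.

```python
import hashlib


def asset_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw asset bytes (assumed hash)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical uploads: different asset IDs, two with identical content.
uploads = {
    "11111111-1111-1111-1111-111111111111": b"texture-bytes",
    "22222222-2222-2222-2222-222222222222": b"texture-bytes",
    "33333333-3333-3333-3333-333333333333": b"script-bytes",
}

first_seen = {}  # hash -> first asset ID stored with that content
duplicates = {}  # duplicate asset ID -> asset ID already holding the data
for asset_id, data in uploads.items():
    digest = asset_hash(data)
    if digest in first_seen:
        # Same content already stored: record a reference, not a second copy.
        duplicates[asset_id] = first_seen[digest]
    else:
        first_seen[digest] = asset_id
```

Only the hash is compared here, so metadata differences between the two uploads do not prevent the data from being shared.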
Design
There are two major alternatives.
Design 1: Add hash and pointer column to existing asset table
The first design would see a hash column and a pointer column added to the existing asset table. The hash column would be a primary key.
Pseudo-code for adding a new asset
On asset add
  Hash new asset
  Compare to existing hashes
  If match
    create new asset table entry storing metadata and pointer to existing asset hash
  else
    create new asset table entry storing data, hash and metadata
Pseudo-code for retrieving an asset
On asset get
  select existing asset based on input id
  If match
    If asset contains data directly
      return existing asset
    else
      look up asset pointed to by reference
      return asset using initial metadata and pointer-referenced data
  else
    return no such asset
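As a rough illustration of Design 1, the two pseudo-code routines above might be sketched in Python as follows. This is an in-memory stand-in for the asset table, not the real service: `AssetRow`, `Design1Store` and the `by_hash` index are hypothetical names, metadata is reduced to a single `name` field, and SHA-256 is an assumed hash function.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetRow:
    asset_id: str
    name: str                      # stands in for all metadata columns
    sha256: Optional[str] = None   # set only on rows that hold data
    data: Optional[bytes] = None   # None on rows that only point elsewhere
    pointer: Optional[str] = None  # hash of the row that holds the data


class Design1Store:
    def __init__(self):
        self.rows = {}     # asset_id -> AssetRow (the single asset table)
        self.by_hash = {}  # sha256 -> AssetRow holding the data

    def add(self, asset_id: str, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.by_hash:
            # Match: store metadata plus a pointer to the existing hash.
            row = AssetRow(asset_id, name, pointer=digest)
        else:
            # No match: store data, hash and metadata in the new row.
            row = AssetRow(asset_id, name, sha256=digest, data=data)
            self.by_hash[digest] = row
        self.rows[asset_id] = row

    def get(self, asset_id: str):
        row = self.rows.get(asset_id)
        if row is None:
            return None  # no such asset
        if row.data is not None:
            return row.name, row.data
        # Follow the pointer, but keep this row's own metadata.
        return row.name, self.by_hash[row.pointer].data
```

Note that a retrieval of a duplicate needs two lookups in this design: one by asset ID and one by hash.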
Design 2: Create two separate tables assetsmeta and assetsdata
The second design would see the creation of two separate tables. The assetsmeta table would be
| column | type | notes | 
|---|---|---|
| id | char(36) | Primary key | 
| sha256 | char(64) | |
| name | varchar(64) | |
| description | varchar(64) | |
| assetType | tinyint(4) | |
| local | tinyint(1) | |
| temporary | tinyint(1) | |
| create_time | int(11) | |
| access_time | int(11) | |
| asset_flags | int(11) | |
| CreatorID | varchar(128) | |
This matches the existing assets table except that the data column is no longer present and a sha256 column has been added instead.
The assetsdata table would be
| column | type | notes | 
|---|---|---|
| sha256 | char(64) | Primary key | 
| data | longblob | |
The assetsdata table could be replaced by another storage mechanism (e.g. the filesystem) in the future.
Pseudo-code for adding a new asset
On asset add
  Hash new asset
  Compare to existing hashes
  If match
    create new assetsmeta entry pointing to existing assetsdata entry
  If no match
    create new assetsdata entry
    create new assetsmeta entry pointing to new assetsdata entry
Pseudo-code for retrieving an asset
On asset get
  select existing asset based on input id
  If match
    Fetch asset data from assetsdata using the sha256 in the metadata
    Return asset metadata + data
  else
    return no such asset
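A minimal sketch of Design 2, using Python's built-in sqlite3 module with an in-memory database, could look like the following. Several things here are assumptions made for illustration: SQLite stands in for the production database, only the id, sha256 and name columns of assetsmeta are created, and `open_store`, `add_asset` and `get_asset` are hypothetical helper names.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create the two tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE assetsdata (
            sha256 CHAR(64) PRIMARY KEY,
            data   BLOB
        );
        CREATE TABLE assetsmeta (
            id     CHAR(36) PRIMARY KEY,
            sha256 CHAR(64),
            name   VARCHAR(64)
        );
    """)
    return conn


def add_asset(conn, asset_id: str, name: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    with conn:  # commit both inserts together, or roll both back
        # INSERT OR IGNORE (SQLite syntax) skips the insert when the
        # hash already exists, so the data is stored only once.
        conn.execute(
            "INSERT OR IGNORE INTO assetsdata (sha256, data) VALUES (?, ?)",
            (digest, data),
        )
        conn.execute(
            "INSERT INTO assetsmeta (id, sha256, name) VALUES (?, ?, ?)",
            (asset_id, digest, name),
        )
    return digest


def get_asset(conn, asset_id: str):
    """Return (name, data) for the asset, or None if there is no such asset."""
    return conn.execute(
        "SELECT m.name, d.data FROM assetsmeta m "
        "JOIN assetsdata d ON d.sha256 = m.sha256 WHERE m.id = ?",
        (asset_id,),
    ).fetchone()
```

Because every assetsmeta row refers to assetsdata by hash, retrieval is a single join whether or not the asset is a duplicate, and the "compare to existing hashes" step reduces to the primary-key check on assetsdata.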