Feature Proposals/Deduplicating Asset Service

From OpenSimulator

Revision as of 11:03, 11 November 2011

Date

November 2011.

Status

Draft

Proposers

Justin Clark-Casey (justincc)

Introduction

The asset service stores practically all sim data (textures, scripts, etc.). Many assets are exact duplicates of existing assets, but stored under a different asset ID (and occasionally with different metadata).

This feature would create an asset service that stores a hash of every asset, so that duplicates can be detected and only one copy of their data stored.

Proposal

A high proportion of uploaded assets, whether uploaded via the viewer or through mechanisms such as OARs (region archives) and IARs (inventory archives), are duplicates of assets already stored by the asset service. On OSGrid, for example, DATA.

Here, I propose an asset service that hashes every asset and uses those hashes to detect and eliminate duplicates. It would work along the same lines as coyled's existing Simple Ruby Asset Server (SRAS, https://github.com/coyled/sras).
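Both designs below rest on content hashing: identical bytes always produce the same digest, whatever asset ID they were uploaded under. The following minimal Python sketch assumes SHA-256 as the hash (the algorithm named by the sha256 columns in Design 2) and uses made-up asset IDs; it groups uploads by digest to show how many distinct blobs would actually need storing.

```python
import hashlib
from collections import defaultdict


def asset_hash(data: bytes) -> str:
    """Return the SHA-256 of an asset's data as 64 lowercase hex characters."""
    return hashlib.sha256(data).hexdigest()


def group_by_content(assets: dict) -> dict:
    """Map each content hash to the IDs of all assets whose data has that hash."""
    groups = defaultdict(list)
    for asset_id, data in assets.items():
        groups[asset_hash(data)].append(asset_id)
    return dict(groups)


# Hypothetical uploads: id-a and id-b carry identical bytes under different IDs.
assets = {"id-a": b"texture bytes", "id-b": b"texture bytes", "id-c": b"script"}
groups = group_by_content(assets)
print(len(assets), "assets,", len(groups), "distinct blobs")  # 3 assets, 2 distinct blobs
```

The hex digest is always 64 characters long, which is why a sha256 column can be a fixed-width char(64).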

Design

There are two major alternatives.

Design 1: Add hash and pointer columns to the existing asset table

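The first design would add a hash column and a pointer column to the existing asset table. Rows that hold asset data would store the hash of that data; rows for duplicates would store only metadata and a pointer to that hash. Since id stays the table's primary key and pointer rows leave the hash empty, the hash column would be a unique key rather than a primary key.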

Pseudo-code for adding a new asset

On asset add
  Hash new asset
  Compare to existing hashes
  If match
    create new asset table entry storing metadata and pointer to existing asset hash
  else
    create new asset table entry storing data, hash and metadata
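The add steps above can be sketched in Python against an in-memory SQLite database standing in for MySQL. This is an illustration under assumptions, not the proposed implementation: the column names `hash` and `pointer` are hypothetical, `name` stands in for all the metadata columns, and `hash` is a nullable UNIQUE column so that pointer rows can leave it empty.

```python
import hashlib
import sqlite3

# Cut-down Design 1 table. "name" stands in for all the metadata columns;
# the "hash" and "pointer" column names are hypothetical.
SCHEMA = """
CREATE TABLE assets (
    id      TEXT PRIMARY KEY,  -- asset UUID, as in the existing table
    name    TEXT,
    data    BLOB,              -- NULL on rows that point at another row's data
    hash    TEXT UNIQUE,       -- set only on rows that hold data
    pointer TEXT               -- hash of the data-holding row, for duplicates
);
"""


def add_asset(conn: sqlite3.Connection, asset_id: str, name: str, data: bytes) -> None:
    digest = hashlib.sha256(data).hexdigest()  # hash new asset
    with conn:  # commits the insert, or rolls back on error
        # Compare to existing hashes.
        match = conn.execute(
            "SELECT 1 FROM assets WHERE hash = ?", (digest,)
        ).fetchone()
        if match:
            # Duplicate: store metadata plus a pointer to the existing hash.
            conn.execute(
                "INSERT INTO assets (id, name, pointer) VALUES (?, ?, ?)",
                (asset_id, name, digest),
            )
        else:
            # New content: store data, hash and metadata.
            conn.execute(
                "INSERT INTO assets (id, name, data, hash) VALUES (?, ?, ?, ?)",
                (asset_id, name, data, digest),
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_asset(conn, "id-a", "brick", b"texture bytes")
add_asset(conn, "id-b", "brick copy", b"texture bytes")
rows_with_data = conn.execute(
    "SELECT COUNT(*) FROM assets WHERE data IS NOT NULL"
).fetchone()[0]
print(rows_with_data)  # 1
```

The separate compare and insert steps are not atomic, so two concurrent uploads of the same bytes could both take the "no match" branch; in this sketch the UNIQUE constraint on `hash` makes the second insert fail rather than silently store a second copy.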


Pseudo-code for retrieving an asset

On asset get
  select existing asset based on input id
  If match
    If asset contains data directly
      return existing asset
    else
      look up the asset whose hash matches the pointer
      return asset using initial metadata and pointer-referenced data
  else
    return no such asset
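A matching Python sketch of the retrieval steps, again using in-memory SQLite in place of MySQL. It assumes the same hypothetical cut-down table as the add sketch (columns `hash` and `pointer`, with `name` standing in for the metadata) and assumes every pointer refers to an existing data-holding row.

```python
import hashlib
import sqlite3

# Hypothetical cut-down Design 1 table; "hash" and "pointer" are assumed names.
SCHEMA = """
CREATE TABLE assets (
    id      TEXT PRIMARY KEY,
    name    TEXT,
    data    BLOB,         -- NULL on pointer rows
    hash    TEXT UNIQUE,  -- set only on rows that hold data
    pointer TEXT          -- hash of the data-holding row, for duplicates
);
"""


def get_asset(conn: sqlite3.Connection, asset_id: str):
    """Return {"id", "name", "data"} for asset_id, or None if there is no such asset."""
    # Select existing asset based on input id.
    row = conn.execute(
        "SELECT id, name, data, pointer FROM assets WHERE id = ?", (asset_id,)
    ).fetchone()
    if row is None:
        return None  # no such asset
    found_id, name, data, pointer = row
    if data is None:
        # No data on this row: look up the row whose hash the pointer names
        # and combine this row's metadata with that row's data.
        data = conn.execute(
            "SELECT data FROM assets WHERE hash = ?", (pointer,)
        ).fetchone()[0]
    return {"id": found_id, "name": name, "data": data}


# Seed one data-holding row and one duplicate that points at it.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
digest = hashlib.sha256(b"texture bytes").hexdigest()
with conn:
    conn.execute(
        "INSERT INTO assets (id, name, data, hash) VALUES (?, ?, ?, ?)",
        ("id-a", "brick", b"texture bytes", digest),
    )
    conn.execute(
        "INSERT INTO assets (id, name, pointer) VALUES (?, ?, ?)",
        ("id-b", "brick copy", digest),
    )
print(get_asset(conn, "id-b"))  # {'id': 'id-b', 'name': 'brick copy', 'data': b'texture bytes'}
```

A duplicate costs a second query here; that extra lookup is the price of keeping data and pointer rows in a single table.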

Design 2: Create two separate tables, assetsmeta and assetsdata

The second design would create two separate tables. The assetsmeta table would be:

column         type          notes
id             char(36)      Primary key
sha256         char(64)      Hex-encoded SHA-256 of the asset data; refers to assetsdata.sha256
name           varchar(64)
description    varchar(64)
assetType      tinyint(4)
local          tinyint(1)
temporary      tinyint(1)
create_time    int(11)
access_time    int(11)
asset_flags    int(11)
CreatorID      varchar(128)

This matches the existing assets table except that the data column is no longer present and a sha256 column has been added instead.

The assetsdata table would be

column         type          notes
sha256         char(64)      Primary key
data           longblob

In the future, this table could be replaced by another storage mechanism (e.g. the filesystem).
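The two tables can be expressed as DDL. The sketch below runs it against in-memory SQLite rather than MySQL; SQLite accepts the MySQL-style type names from the tables and maps them to its own type affinities. The REFERENCES clause is an assumption: the proposal implies, but does not state, that every assetsmeta.sha256 must name an existing assetsdata row. `local` and `temporary` are quoted because they can collide with SQL keywords.

```python
import sqlite3

# Design 2 tables as DDL, using the column names and types listed above.
# The REFERENCES clause is an assumed foreign key, not part of the proposal.
DDL = """
CREATE TABLE assetsdata (
    sha256 char(64) PRIMARY KEY,
    data   longblob
);
CREATE TABLE assetsmeta (
    id          char(36) PRIMARY KEY,
    sha256      char(64) REFERENCES assetsdata (sha256),
    name        varchar(64),
    description varchar(64),
    assetType   tinyint(4),
    "local"     tinyint(1),
    "temporary" tinyint(1),
    create_time int(11),
    access_time int(11),
    asset_flags int(11),
    CreatorID   varchar(128)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(DDL)
meta_columns = [row[1] for row in conn.execute("PRAGMA table_info(assetsmeta)")]
print("data" in meta_columns, "sha256" in meta_columns)  # False True
```

Keeping the data in its own table keyed by hash is what makes the later storage swap possible: only queries against assetsdata would need to change.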

Pseudo-code for adding a new asset

On asset add
  Hash new asset
  Compare to existing hashes
  If match
    create new assetsmeta entry pointing to existing assetsdata entry
  else
    create new assetsdata entry
    create new assetsmeta entry pointing to new assetsdata entry
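A Python sketch of the Design 2 add steps, using in-memory SQLite in place of MySQL and cut-down tables in which `name` stands in for the other metadata columns. One deliberate substitution: instead of a separate compare step, it uses SQLite's `INSERT OR IGNORE`, which skips the assetsdata insert when a row with the same sha256 primary key already exists.

```python
import hashlib
import sqlite3

# Cut-down Design 2 tables: "name" stands in for the other metadata columns.
SCHEMA = """
CREATE TABLE assetsdata (sha256 TEXT PRIMARY KEY, data BLOB);
CREATE TABLE assetsmeta (
    id     TEXT PRIMARY KEY,
    sha256 TEXT REFERENCES assetsdata (sha256),
    name   TEXT
);
"""


def add_asset(conn: sqlite3.Connection, asset_id: str, name: str, data: bytes) -> bool:
    """Store an asset; return True if its data was new, False if it was a duplicate."""
    digest = hashlib.sha256(data).hexdigest()  # hash new asset
    with conn:  # both inserts commit together, or roll back together on error
        # Compare and insert in one statement: OR IGNORE skips the row when an
        # assetsdata entry with this sha256 already exists.
        cur = conn.execute(
            "INSERT OR IGNORE INTO assetsdata (sha256, data) VALUES (?, ?)",
            (digest, data),
        )
        stored_new_data = cur.rowcount == 1
        # In both cases, create an assetsmeta entry pointing at the assetsdata entry.
        conn.execute(
            "INSERT INTO assetsmeta (id, sha256, name) VALUES (?, ?, ?)",
            (asset_id, digest, name),
        )
    return stored_new_data


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(add_asset(conn, "id-a", "brick", b"texture bytes"))       # True
print(add_asset(conn, "id-b", "brick copy", b"texture bytes"))  # False
```

Letting the primary key decide means two concurrent uploads of the same bytes cannot both store the data. `OR IGNORE` is SQLite syntax; MySQL's closest equivalent is `INSERT IGNORE`.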

Pseudo-code for retrieving an asset

On asset get
  select existing asset based on input id
  If match
    Fetch data from the assetsdata entry with the matching sha256
    Return asset metadata + data
  else
    return no such asset
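The retrieval steps can be sketched the same way, assuming the same cut-down SQLite tables (with `name` standing in for the metadata). Here the two steps, selecting the metadata by id and fetching the data by sha256, are folded into a single JOIN; because it is an inner join, an assetsmeta row whose data is missing is also reported as no such asset.

```python
import hashlib
import sqlite3

# Cut-down Design 2 tables: "name" stands in for the other metadata columns.
SCHEMA = """
CREATE TABLE assetsdata (sha256 TEXT PRIMARY KEY, data BLOB);
CREATE TABLE assetsmeta (
    id     TEXT PRIMARY KEY,
    sha256 TEXT REFERENCES assetsdata (sha256),
    name   TEXT
);
"""


def get_asset(conn: sqlite3.Connection, asset_id: str):
    """Return {"id", "name", "data"} for asset_id, or None if there is no such asset."""
    # Select the metadata by id and fetch the data by sha256 in one query.
    row = conn.execute(
        "SELECT m.id, m.name, d.data "
        "FROM assetsmeta AS m JOIN assetsdata AS d ON d.sha256 = m.sha256 "
        "WHERE m.id = ?",
        (asset_id,),
    ).fetchone()
    if row is None:
        return None  # no such asset
    return {"id": row[0], "name": row[1], "data": row[2]}


# Seed one blob shared by two assets with different IDs and names.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
digest = hashlib.sha256(b"texture bytes").hexdigest()
with conn:
    conn.execute("INSERT INTO assetsdata VALUES (?, ?)", (digest, b"texture bytes"))
    conn.executemany(
        "INSERT INTO assetsmeta VALUES (?, ?, ?)",
        [("id-a", digest, "brick"), ("id-b", digest, "brick copy")],
    )
print(get_asset(conn, "id-b"))  # {'id': 'id-b', 'name': 'brick copy', 'data': b'texture bytes'}
```

Unlike Design 1, every lookup follows the same path whether or not the asset was a duplicate.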