[Opensim-dev] UUID [Was: Proposal to eliminate the name, description and invType fields from the assets db]

Tue Jun 24 08:25:45 UTC 2008

> > Obviously, you wouldn't trust an assetId generated by an
> > untrusted source to represent the asset binary anyway, so quite a lot of
> > re-hashing and checking will be done along the way anyway.
>
> I think that whether there's a trust issue or not depends on where you
> calculate the hash. If it's calculated at the asset service then
> there's no issue.

Provided it's a _trusted_ asset service. If you're fetching assets to your local asset cache from untrusted sources (asset services) then you have to re-calculate the hash (which makes sending it rather meaningless) - otherwise, I can send you whatever binary data I like and race another source for that key. The scenario with distributed asset sources has already been implemented by us, hence our interest in this matter.

> Even if it's calculated externally, then you could
> always force a recalculation if there appears to be a hash collision
> when an upload is attempted.

Unintentional hash collisions aren't my main concern, as their probability is astronomically small. I'm worried about intentional hash collisions and, especially in this scenario, just plain id tampering and spoofing across trust boundaries.

Now, ok, generating sha-1 collisions is still in the 2^60 operations range (I believe?) so I don't think anybody would throw that much computing power at replacing an asset binary. It's more a matter of principle - we need to start thinking in terms of trust boundaries, where we place them, and what that actually entails.

Also, I feel I must say that our 'happy days' of alpha coding are over now. It's obvious that we aren't going to be able to make many more big changes to the architecture now that we have god knows how many installations.
So if we're going to do this, we might just as well do it right, or be stuck with a broken solution.

> I do think that the dupe issue is going to loom larger if we were to get
> to an intergrid architecture of federated asset databases with lots of
> people loading in material at different points. The number of
> duplicates for a popular texture, say, would steadily increase over time
> without any realistic way of co-ordinating reaping within the federation.

I totally agree. I'm _for_ hashing and de-duping all the way. Hell, I even want us to do id scavenging through asset parsing. The SL model of not keeping track of assets is fundamentally flawed. If we can circumvent that by taking some wise decisions, we should.

One way to go about it is to have asset references coupled to users and objects. I don't know if you all have thought about that, but we actually control both the vertical AND the horizontal. When somebody creates an asset, we can couple that asset to that user (not just as 'creator', but as a reference). When somebody drags it into/onto an object, or changes a texture, we can make a note of that (couple all referenced assets to that object - or remove the reference if the asset is removed).

We can then declare 'there are no global assets', only what the objects explicitly reference, or what someone explicitly has in his inventory. Hell, in that scenario we could even say 'you can't reference an asset you, or an object of yours, do not have a reference to' - which will probably make SL-molded content producers wet their pants.
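
The reference coupling described above can be sketched roughly as follows. This is only an illustration in Python, not OpenSim code: the class `ReferenceRegistry`, its method names, and the plain string ids are all hypothetical, and a real implementation would live in the database layer rather than in memory.

```python
from collections import defaultdict


class ReferenceRegistry:
    """Hypothetical in-memory record of which user or object references which asset."""

    def __init__(self):
        self._refs = defaultdict(set)   # holder id (user or object) -> asset ids
        self._owner_of = {}             # object id -> owning user id

    def register_object(self, object_id, owner_id):
        # Record which user owns an object, so object references count for that user.
        self._owner_of[object_id] = owner_id

    def add_reference(self, holder_id, asset_id):
        # Called when a user creates or receives an asset, or when an asset
        # is dragged into/onto an object.
        self._refs[holder_id].add(asset_id)

    def remove_reference(self, holder_id, asset_id):
        # Called when the asset is removed from the inventory or object.
        self._refs[holder_id].discard(asset_id)

    def can_reference(self, user_id, asset_id):
        # 'You can't reference an asset you, or an object of yours,
        # do not have a reference to.'
        if asset_id in self._refs.get(user_id, ()):
            return True
        return any(
            owner == user_id and asset_id in self._refs.get(obj, ())
            for obj, owner in self._owner_of.items()
        )

    def is_referenced(self, asset_id):
        # 'There are no global assets': an asset nobody references can be reaped.
        return any(asset_id in assets for assets in self._refs.values())
```

With something like this, reaping becomes a matter of asking `is_referenced` for each stored asset instead of guessing which assets are still in use.
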
Of course, this will be a problem when importing assets from other non-OpenSim sources, but hey, that's a problem in itself anyway (as discussed below).

> I know flexibility is our watchword, but I wonder if this isn't one
> design point where making efforts to be as bendy as possible isn't too
> costly in terms of adding extra layers of indirection or passing more
> information around a network than is really needed.

As Sean was saying - separating the use cases is key. And yes, premature generalization is the enemy. But just having that extra indirection Asset -> Binary would probably open the way to solving several cases. And, as I said, it could even be solved in the database layer, with views.

> Another good thing about asset uuid hashes is that they can potentially
> make the uuids disappear into the background when integrating with
> external programs (and the rest of the web). For instance, suppose that
> a user wants to design a complex object or entire scene in some external
> program, and periodically upload different versions to their region
> server (grid connected or not). If our hashes are random, either the
> user or their program has to worry about generating new random ones for
> each changed prim in their building/vehicle/elaborate clothing design.
> For OpenSim to be reliable we have to assume that they will forget and
> attempt to upload changed assets with existing uuids. OpenSim might
> then need to drag down large amounts of existing data from the asset
> service to run internal hashing to know whether it needs to generate new
> uuids or not, or force every single imported prim to have a new uuid
> (which will really push up the dupe levels).

Which would then give us the problem that the external source has taken the OpenSim id, changed the binary and uploaded it again, OpenSim forces an id change, and your interlinked assets are then corrupted?
Or, the id is actually changed, which means that all assets containing a reference to that id (think shirt asset containing a reference to a texture asset) now have a new sha-1 hash id (since they contain the id as text), which has to be propagated to all entities referencing them, which in turn get new sha-1 hashes? And all this has to happen automagically based on the asset being updated or not?

> If asset uuids are hashes, then they can disappear from the user and
> program's view. Outside of OpenSim, there is no need to manipulate
> uuids in this scenario - collections of objects or scenes can be
> structured in any way that the external program pleases. Instead of
> getting asset uuids on import, the importing process calculates the
> necessary uuids from data, and can ask the asset service whether these
> already exist without needing to actually obtain the blobs.

I definitely think we need to get away from supplying ids in transfers across trust boundaries, that's for sure. (Think of sending a save-xml to another region and what that implies in terms of (child) agent->prim referential integrity.) Whether that means tightening up by keeping better track of things, or loosening by hashing, I don't know.

> I think that it's possible to experiment with the hashing idea without
> requiring the whole system to change. Indeed, because of the very
> import scenario outlined above, I may well try using it for importing
> object archives. This shouldn't impact the rest of the system since, of
> course, the chances of collision are small.

+1 on more experimenting, less toolshedding. Again, I would say splitting out the binary parts of assets and introducing a sha-1 binaryId is a non-intrusive, no-brainer first step.

/Stefan
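
A minimal sketch of the Asset -> Binary split with a sha-1 binaryId, assuming an in-memory store. The names `binary_id`, `BinaryStore` and their methods are made up for illustration and are not part of OpenSim; the only real API used is Python's standard `hashlib`.

```python
import hashlib


def binary_id(data: bytes) -> str:
    """Derive the binaryId from the content itself (hex sha-1 digest)."""
    return hashlib.sha1(data).hexdigest()


class BinaryStore:
    """Hypothetical store: asset ids stay as they are, binaries are keyed by hash."""

    def __init__(self):
        self._blobs = {}    # binaryId -> bytes
        self._assets = {}   # asset id -> binaryId (the extra indirection)

    def has_binary(self, bid: str) -> bool:
        # Lets an importer ask 'do you already have this?' without sending the blob.
        return bid in self._blobs

    def put_asset(self, asset_id: str, data: bytes) -> str:
        # The hash is computed locally, so identical binaries are stored once.
        bid = binary_id(data)
        self._blobs.setdefault(bid, data)
        self._assets[asset_id] = bid
        return bid

    def accept_remote(self, claimed_bid: str, data: bytes) -> str:
        # Data from an untrusted source: never trust the id that came with it,
        # always re-calculate and compare.
        actual = binary_id(data)
        if actual != claimed_bid:
            raise ValueError("binaryId does not match the supplied data")
        self._blobs.setdefault(actual, data)
        return actual

    def get_asset(self, asset_id: str) -> bytes:
        return self._blobs[self._assets[asset_id]]

    def blob_count(self) -> int:
        return len(self._blobs)
```

Two assets uploaded with the same bytes end up sharing one stored binary, and a remote source that supplies a binaryId which does not match its data is rejected instead of winning the race for that key.
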