[Opensim-dev] Error detection when storing an asset

Fri Apr 18 09:39:42 UTC 2014

Hi Oren,

I disagree about propagating exceptions. To the caller, asset calls are opaque and cannot fail. There are too many places where exceptions would cause issues and the focus has always been on keeping the sim up and running at almost any cost.

That said, no old asset servers (UGAIM) exist anymore, so the comment and the bad handling of null returns should be fixed.

I would like to see assets being cached on asset server failure, but then retried using a disk based queue with exponential back-off. Allowing an asset to fail storing is better than having the sim crash, but of course storing it as soon as the asset server is back is preferable. We have something similar, but memory based, for Avination and it has served us well. I believe even our code may still have the bad return calue checking in it.

I will see about putting our asset retry code into core as a starting point, then the rough edges can be smoothed and disk based queueing introduced so that assets can be retried even across a simulator restart.

- Melanie

On 18 Apr 2014, at 10:44, Oren Hurvitz <orenh at kitely.com> wrote:

I have found that when OpenSim tries to store an asset using AssetServicesConnector, it doesn't handle failures well. There are several problems in AssetServicesConnector.Store(), and some of them seem to be based on historical considerations that may no longer be relevant, so I'd like to see if anyone knows about these issues before pushing a change.

AssetServicesConnector.Store() uses SynchronousRestObjectRequester to send the request. The first problem is that SynchronousRestObjectRequester.MakeRequest() hides exceptions and just returns null. This is a mistake: MakeRequest() should propagate exceptions, so that callers will know that an error occurred. Store() already catches these exceptions (as it should), so this won't make a big difference in our case. But of course, this change in behavior to MakeRequest() may affect other places in the code as well. It's better to fail-fast when errors occur, and not hide them, because doing so makes the errors much harder to diagnose once they become apparent. This brings me to the next problem...

The second problem is that AssetServicesConnector.Store() compares the return value from MakeRequest to string.Empty, but in fact the return value that is returned in case of error is null. So it mistakenly treats 'null' as a valid Asset ID, which causes it to cache the asset. This can cause the operation to appear to have succeeded for a while, since OpenSim will have the asset available, but the moment the asset is loaded from the asset server "the jig is up" and the asset will be missing. This can explain many problems people have had with disappearing assets.

The third problem is a comment found in this method, which says that a return value of 'null' is considered to be success because of "old asset servers that don't send any reply back". That's a really old comment (from 2009). Can I assume that we no longer need to support such servers and we can treat 'null' as the error value that it is?

Oren

_______________________________________________
Opensim-dev mailing list
Opensim-dev at lists.berlios.de
https://lists.berlios.de/mailman/listinfo/opensim-dev