[Opensim-dev] Error detection when storing an asset

Melanie melanie at t-data.com
Sat Apr 19 02:12:09 UTC 2014


The kinds of permanent failure scenarios outlined below are a death
knell for the affected grid; if the asset server is full, any
reasonably sized grid will be inconsistent and irrecoverably damaged
in less than a day.
Therefore, I postulate that this condition does not need to be
covered by extra code that would impair legibility, since the
affected grid would have been neglectful in not monitoring disk usage.
I'm concerned about having to add exception catching and handling
in every place Get() is called, because that would certainly make the
code a lot bulkier and less readable for no real-life gain.
In particular, since in most cases no intelligent response is
possible, it just means ignoring the exception in multiple places
rather than just one.
Reporting that kind of error to the user is pointless. There is
nothing the user can do to recover.

- Melanie
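
Melanie's argument is that, when no intelligent response to a storage error is possible, it is better to absorb the error in one place than at every Get() call site. The Python sketch below shows that pattern under stated assumptions: `get_asset`, `AssetStoreError` and the backend functions are hypothetical names for illustration, not OpenSim's actual (C#) API.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("asset_service")


class AssetStoreError(Exception):
    """Hypothetical error raised by a storage backend that cannot be read."""


def get_asset(load: Callable[[str], Optional[bytes]], asset_id: str) -> Optional[bytes]:
    """Single point of error handling for asset reads.

    `load` is a hypothetical backend function returning the asset bytes, or
    None if the asset does not exist. Any AssetStoreError it raises is logged
    here, once, and turned into None, so callers need no try/except of their own.
    """
    try:
        return load(asset_id)
    except AssetStoreError as exc:
        log.error("Get(%s) failed: %s", asset_id, exc)
        return None


def broken_backend(asset_id: str) -> Optional[bytes]:
    """Stand-in for a backend whose disk is full or unreachable."""
    raise AssetStoreError("disk full")


# Two call sites, neither of which handles exceptions itself.
texture = get_asset(lambda _id: b"texture-bytes", "tex-1")
shape = get_asset(broken_backend, "shape-1")
```

The cost of this pattern is that a caller cannot tell a failed read from a missing asset, since both come back as None.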


On 19/04/2014 03:56, Justin Clark-Casey wrote:
> There are always times when failure is permanent. For instance, that is currently the case if the asset service is full. Even if
> store and forward were added, unrecoverable failure could still occur if both sides were full.
> 
> I have had to spend my scarce time dealing with many bugs where the investigation has led to a piece of code that simply 
> swallows its exception, where signalling or logging the error condition would have made the cause of complex problems 
> obvious. So I think that lower-level components should always appropriately signal failure or other relevant
> conditions.  It's a different question as to whether the caller thinks it appropriate to notify the user, log the 
> problem or do nothing (though it should always be possible to tell somehow when there is an issue).  But the caller 
> should be given the capability to deal with errors or other conditions as appropriate.
> 
> As for Oren's second issue (null being mistaken as indicating a valid asset), this sounds like a straightforward bug 
> that should be fixed.
> 
> As for the 3rd issue, where very old asset servers sent no reply, I agree that it sounds like something we can risk fixing,
> as nobody is using (or could use) the old pre-0.7 UGAIM service processes any more.
> 
> Regarding message queueing, I agree that this would be a great approach. However, OpenSimulator would still need some
> kind of sqlite equivalent so that it works out of the box, whilst still being pluggable if someone wants to use a
> more heavyweight system. That sqlite equivalent must be fully supported by OpenSimulator, just as the sqlite database
> plugin should be today.
> 
> On 18/04/14 22:43, Melanie wrote:
>> The point is to NOT let it fail. Asset storing never fails
>> permanently if it's retried until successful. The upper layers (all
>> the way to the viewer) are not equipped to handle an asset storing
>> failure. Propagating the exception would just annoy the user with
>> needless messages. Since asset servers can be "gone" for a while,
>> for instance when there is a net failure, there is no way to give it
>> a timeout, either. A sim in OSGrid, if it gets disconnected, could
>> run on locally cached assets and manage to reconnect after 20
>> minutes and simply upload all new assets since then. Screaming
>> "failure" at the user is pointless in such a scenario.
>>
>> - Melanie
>>
>> On 18/04/2014 22:56, Oren Hurvitz wrote:
>>> There seems to be a misunderstanding here. We're talking about a case where
>>> the operation has FAILED. The only question is whether to pretend that it
>>> succeeded, so that the user will find out that it failed later, to their
>>> surprise, or to report failure immediately. Obviously it's better to report
>>> failure immediately.
>>>
>>>
>>>
>>> On Fri, Apr 18, 2014 at 10:40 PM, Melanie <melanie at t-data.com> wrote:
>>>
>>>> Name one valid use case where current OpenSim is able to handle such
>>>> an exception gracefully, i.e. without a user-visible error.
>>>>
>>>> - Melanie
>>>>
>>>> On 18/04/2014 13:28, Mike Chase wrote:
>>>>> I'm inclined to agree with Oren. Asset writes could fail for a variety of
>>>>> reasons, and there are lots of use cases where you need to know the asset
>>>>> is on disk. I think propagating exceptions is the sounder approach.
>>>>>
>>>>> I also agree re: custom comms vs. a persistent queue mechanism, but I
>>>>> don't want to derail this topic. That can wait for another day.
>>>>>
>>>>> Mike
>>>>>
>>>>> -----Original Message-----
>>>>> From: opensim-dev-bounces at lists.berlios.de
>>>>> [mailto:opensim-dev-bounces at lists.berlios.de] On Behalf Of Oren Hurvitz
>>>>> Sent: Friday, April 18, 2014 7:06 AM
>>>>> To: opensim-dev at lists.berlios.de
>>>>> Subject: Re: [Opensim-dev] Error detection when storing an asset
>>>>>
>>>>> Regarding the hiding of exceptions: to be clear, I was already bitten by
>>>>> this behavior; that's why I started to investigate how assets are stored.
>>>>> I have therefore already changed Kitely's version of OpenSim to propagate
>>>>> exceptions, and the question is whether other people would like me to
>>>>> contribute this change. If anyone has an opinion then please reply.
>>>>>
>>>>> Regarding your suggestion to save assets to local disk and retry them
>>>>> later: this is basically what a persistent message queue does. If you're
>>>>> going to go that route then it would be best to add a real message queue
>>>>> rather than a home-grown one. I would LOVE it if OpenSim used a message
>>>>> queue for communications, as it would allow ripping out thousands of
>>>>> lines of homemade communications code, and would be faster and more
>>>>> reliable to boot. But that's a bigger issue and I'll put it aside for now.
>>>>>
>>>>> In this particular case, using a persistent message queue isn't the right
>>>>> solution: the right solution is to report failures immediately. Otherwise
>>>>> you'd get weird behavior, such as a user who thinks they've successfully
>>>>> worn a piece of clothing, but when they teleport to another region it
>>>>> disappears because the other region can't load the asset (because it was
>>>>> never saved). To prevent these problems you need to fail fast and tell
>>>>> the user immediately when a problem happens. This doesn't mean crashing
>>>>> the sim; I strongly doubt any asset failure would cause that. It would
>>>>> just fail the specific packet or message that is currently being handled,
>>>>> as it should.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://opensim-dev.2196679.n2.nabble.com/Error-detection-when-storing-an-asset-tp7579223p7579225.html
>>>>> Sent from the opensim-dev mailing list archive at Nabble.com.
>>>>> _______________________________________________
>>>>> Opensim-dev mailing list
>>>>> Opensim-dev at lists.berlios.de
>>>>> https://lists.berlios.de/mailman/listinfo/opensim-dev
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
> 
> 
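
The fail-fast position taken by Oren and Justin (the lower layer always signals failure, the caller decides whether to notify the user, log, or do nothing, and only the message currently being handled fails) can be sketched in Python as follows. `AssetService`, `AssetStoreError` and `handle_messages` are hypothetical names, and the in-memory capacity limit merely stands in for a full asset server.

```python
import logging
from typing import Dict, List, Tuple

log = logging.getLogger("region")


class AssetStoreError(Exception):
    """Hypothetical error: the asset service could not persist an asset."""


class AssetService:
    """Toy in-memory asset service that signals failure instead of hiding it."""

    def __init__(self, capacity: int) -> None:
        self._assets: Dict[str, bytes] = {}
        self._capacity = capacity

    def store(self, asset_id: str, data: bytes) -> None:
        if len(self._assets) >= self._capacity:
            # Simulates "asset service is full": raise, never return silently.
            raise AssetStoreError(f"cannot store {asset_id}: service full")
        self._assets[asset_id] = data


def handle_messages(service: AssetService, messages: List[Tuple[str, bytes]]) -> List[str]:
    """Process each message independently; a failed store fails only that message."""
    replies = []
    for asset_id, data in messages:
        try:
            service.store(asset_id, data)
            replies.append(f"ok {asset_id}")
        except AssetStoreError as exc:
            # The caller chooses the response: here, log it and report failure.
            log.warning("%s", exc)
            replies.append(f"failed {asset_id}")
    return replies


replies = handle_messages(AssetService(capacity=1), [("shirt", b"..."), ("hat", b"...")])
print(replies)  # ['ok shirt', 'failed hat']
```

The `hat` upload fails visibly and immediately, while the `shirt` upload and the rest of the loop are unaffected, which matches the "fail the specific packet or message" behaviour described in the thread.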

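Melanie's alternative (never fail permanently; hold new assets locally and upload them once the asset server is reachable again) is essentially the store-and-forward behaviour Oren compares to a persistent message queue. The Python sketch below shows only the retry logic. `StoreAndForward` and `upload` are hypothetical names, and the queue lives in memory for brevity, whereas the scheme described in the thread (or a real message queue) would keep it on local disk so pending assets survive a restart.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class StoreAndForward:
    """Hypothetical store-and-forward uploader for assets.

    Assets are always accepted locally; flush() tries to upload them in order
    and keeps any that fail for a later attempt. There is deliberately no
    timeout: an asset stays pending until an upload succeeds.
    """

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        self._upload = upload  # assumed to raise ConnectionError while the server is unreachable
        self.pending: Deque[Tuple[str, bytes]] = deque()

    def store(self, asset_id: str, data: bytes) -> None:
        self.pending.append((asset_id, data))

    def flush(self) -> int:
        """Upload pending assets in FIFO order; stop at the first failure."""
        sent = 0
        while self.pending:
            asset_id, data = self.pending[0]
            try:
                self._upload(asset_id, data)
            except ConnectionError:
                break  # server still "gone"; retry on the next flush()
            self.pending.popleft()
            sent += 1
        return sent


# Usage: the asset server is unreachable for the first flush, then comes back.
online = False
server: Dict[str, bytes] = {}


def upload(asset_id: str, data: bytes) -> None:
    if not online:
        raise ConnectionError("asset server unreachable")
    server[asset_id] = data


q = StoreAndForward(upload)
q.store("a1", b"x")
q.store("a2", b"y")
first = q.flush()   # 0 uploaded, both still pending
online = True
second = q.flush()  # 2 uploaded
```

Because there is no timeout, nothing is ever reported as failed; the cost, as Oren points out, is that another region may look for an asset that has not been uploaded yet.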

