|Anonymous | Login | Signup for a new account||2019-01-21 15:19 PST|
|Main | My View | View Issues | Change Log | Roadmap | Summary | My Account|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006374||opensim||[REGION] OpenSim Core||public||2012-10-24 06:26||2014-07-29 13:42|
|Product Version||master (dev code)|
|Target Version||master (dev code)||Fixed in Version|
|Summary||0006374: [PATCH] Improved teleport error handling|
|Description||Improved error-handling in case teleport fails:|
1. Instead of resetting the agent's "In-Transit" flag separately in each place where the teleport can fail, do so in a "finally" clause. This is more robust because it reduces the need to remember to reset the flag wherever it's needed.
2. In addition, reset the "In-Transit" flag when the client disconnects. This ensures that if a bug did happen and leave the agent In-Transit, then simply disconnecting and reconnecting will fix the problem. Without this change only a region restart would fix the agent's status.
|Tags||No tags attached.|
|Git Revision or version number||9a8a5afc59c11719595096ff3b9926c8c6cc0d6a|
|Run Mode||Grid (Multiple Regions per Sim)|
|Environment||Mono / Linux32|
|Attached Files|| 0001-Improved-teleport-error-handling.patch [^] (17,367 bytes) 2012-12-16 06:26 [Show Content]
jc.reset-state-on-client-close.patch [^] (4,389 bytes) 2013-02-22 14:48 [Show Content]
I am certainly happy with changes to improve the readability of ResetFromTransit().
However, I believe this patch suffers from a problem if DoTeleport() exits early because the agent is already teleporting. In this case, from my reading ResetFromTransit() will be invoked when it should not be.
In addition, have you encountered any cases where the client was closed in the middle of teleport? I am extremely wary of belt and braces solutions here because they tend to hide underlying problems - I would far rather a bug start occurring so that it can be fixed properly rather than get hidden and possibly manifest itself in other ways that become much harder to debug.
However, it strikes me that perhaps a client close by the user is possible in the middle of teleport. Have you tested this?
For instance, hooking OnClientClosed() also has the effect of resetting the transit flag at sp.Scene.IncomingCloseAgent() in ETM.DoTeleport() rather than right at the very end. This would open up a race condition with sp.Scene.NeedSceneCacheClear(). It just so happens that I just looked through this old method and it appears completely pointless so I intend to comment it out/remove it. However, this is the kind of unexpected stuff that could be introduced without very careful analysis.
Also, why make the EntityTransferStateMachine (and OnClientClosed()) protected rather than private? ETSM should be managed purely by the ETM module and not touched by subclasses unless there's a very good reason.
I updated the patch with the following changes:
1. Handle correctly the case where an agent that is already in transit attempts to start another teleport/cross. There's a simple rule that ensures that this works correctly: calls to SetInTransit() and ResetFromTransit() must come in pairs. If SetInTransit() succeeds then start a try/finally block that calls ResetFromTransit() when it's done.
2. Changed m_entityTransferStateMachine to be private. I actually think it's ok to have it protected: that was done to allow HGEntityTransferModule to access it. Since HGEntityTransferModule is similar to EntityTransferModule, it should be allowed to call its internal members. But after this latest refactoring I no longer need to call m_entityTransferStateMachine from HGEntityTransferModule, so I made it private.
To answer your questions:
Yes, I did encounter a case where the client was closed in the middle of teleport. This is easy to do. A bigger problem happened when the teleport failed: in some cases the user's transit state wasn't reset, with the serious consequence that the user couldn't teleport any longer. Previous to this patch, this was a region-level error that required a region restart to fix. Now, if this happens again, it's just a user-level error that can be fixed simply by logging out and back in.
Hi Oren. I certainly agree with the try/finally changes so I've put these in as git commit ccb7cce. I separated out the other changes into jc.reset-state-on-client-close.patch above.
I actually do like the idea of resetting the transit state on the client close but not for the 'belt and braces' reason. Rather, it's because it has the potential to stop other transfer-related things from happening when the client is closed during teleport.
However, on quickly testing this, I do notice that if a client is closed whilst the first region is waiting for the 'everything is fine' reply from the destination region, the following kind of thing occurs.
22:33:34 - [ENTITY TRANSFER MODULE]: Teleporting Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test three to http://192.168.1.2:9000/ [^] (http://192.168.1.2:9000/ [^]) test two/<120.161, 127.3456, 34.84365>
22:33:34 - [ENTITY TRANSFER MODULE]: Destination is running version SIMULATION/0.1
22:33:34 - [SCENE]: Region test two told of incoming child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 (circuit code 995135401, IP 192.168.1.2, viewer Second Life Release 126.96.36.1992321, teleportflags (ViaLocation), position <120.161, 127.3456, 34.84365>)
22:33:34 - [SCENE]: Region test two authenticated and authorized incoming child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 (circuit code 995135401)
22:33:34 - [SCENE PRESENCE]: Closing child agents. Checking 2 regions in test three
22:33:34 - [SCENE PRESENCE]: Closing 1 child agents
22:33:34 - [SCENE COMMUNICATION SERVICE]: Sending close agent ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 to test one
22:33:34 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test one
22:33:34 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test one
22:33:34 - [ENTITY TRANSFER MODULE]: Set release callback URL to http://192.168.1.2:9000/agent/e4f3924a-5a7c-4e1a-bee7-aa96580f2515/dd5b77f8-bf88-45ac-aace-35bd76426c83/release/ [^] in test three
22:33:34 - [SCENE]: Incoming child agent update for e4f3924a-5a7c-4e1a-bee7-aa96580f2515 in test two
22:33:34 - [LLUDPSERVER]: Handling UseCircuitCode request for circuit 995135401 to test two from IP 192.168.1.2:55514
22:33:34 - [SCENE]: Adding new child scene presence Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 to scene test two at pos <120.161, 127.3456, 34.84365>
22:33:34 - [CAPS]: Received SEED caps request in test two for agent e4f3924a-5a7c-4e1a-bee7-aa96580f2515
22:33:35 - [CLIENT]: Got a logout request for Justin Clark-Casey in test three
22:33:35 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test three
22:33:35 - [ENTITY TRANSFER MODULE]: Sending new CAPS seed url http://192.168.1.2:9000/CAPS/cbc805db-deb8-4f51-b998-9fb8015003fb0000/ [^] from test three to Justin Clark-Casey
22:33:35 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test three
22:33:36 - [ENTITY TRANSFER STATE MACHINE]: Agent with ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 should not exit directly from state Transferring, should go to CleaningUp state first in test three
22:33:36 - [ENTITY TRANSFER STATE MACHINE]: Agent e4f3924a-5a7c-4e1a-bee7-aa96580f2515 cleared from transit in test three
22:33:36 - [ENTITY TRANSFER MODULE]: Teleport of Justin Clark-Casey to test two from test three failed due to no callback from destination region. Returning avatar to source region.
22:33:36 - [ENTITY TRANSFER MODULE]: Exception on teleport of Justin Clark-Casey from <129.0615, 124.856, 34.84365>@test three to <120.161, 127.3456, 34.84365>@test two: Agent with ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 is not registered as in transit in test three at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferStateMachine.UpdateInTransit (UUID id, AgentTransferState newState) [0x00022] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferStateMachine.cs:124
at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.Fail (OpenSim.Region.Framework.Scenes.ScenePresence sp, OpenSim.Services.Interfaces.GridRegion finalDestination, Boolean logout) [0x00000] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:733
at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.DoTeleportInternal (OpenSim.Region.Framework.Scenes.ScenePresence sp, OpenSim.Services.Interfaces.GridRegion reg, OpenSim.Services.Interfaces.GridRegion finalDestination, Vector3 position, Vector3 lookAt, UInt32 teleportFlags) [0x005d5] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:676
at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.TeleportAgentToDifferentRegion (OpenSim.Region.Framework.Scenes.ScenePresence sp, UInt64 regionHandle, Vector3 position, Vector3 lookAt, UInt32 teleportFlags, OpenSim.Services.Interfaces.GridRegion& finalDestination) [0x000f0] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:360
at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.Teleport (OpenSim.Region.Framework.Scenes.ScenePresence sp, UInt64 regionHandle, Vector3 position, Vector3 lookAt, UInt32 teleportFlags) [0x000e1] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:225
22:34:35 - [LLUDPSERVER]: Ack timeout, disconnecting child agent for Justin Clark-Casey in test two
22:34:35 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test two
22:34:35 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test two
This works but is extremely ugly with the exception spew. Now, in this case one could prevent this by checking for the InTransit state after receiving false from WaitForAgentArrivedAtDestination(). However, I think it emphasizes the fact that making this change swaps a known set of behaviours when a client is closed during teleport for a new set of behaviours where we need to have a good idea of the knock-on effects.
This may also be a crude way of implementing teleport cancel. At the moment, this is not hooked up at all and hitting cancel can strand an avatar and force a relog.
But of course, these changes narrow the window for races rather than close it. I think that what is ultimately needed is to extend the state model to encompass client close and login as well as inter-region transfers. It may well make sense for a presence to always have some kind of transfer state, where one of those states is 'not transferring'. Thoughts?
On a minor point, I did not carry through the logging change from debug to warn after a QueryAccess fails. This is because this call could fail for all kinds of legitimate reasons (region full, viewer banned, etc.) and so a log warning would be misleading.
I effectively implemented the client closing check in git master 7471bc7, after previously implementing teleport cancellation as best as can be done at the moment. This is done in a more controlled way with a new Aborting transfer state which the teleport process checks at strategic points.
This still leaves windows where behaviour is unpredictable. However, extending the existing AgentCircuitData logging that controls login/logout into entity transfer quickly became extremely hairy.
|The teleporting code has changed significantly in 0.7.6, so this patch isn't relevant anymore.|
|2012-10-24 06:26||orenh||New Issue|
|2012-10-24 06:26||orenh||File Added: 0001-Improved-teleport-error-handling.patch|
|2012-10-24 06:26||orenh||Assigned To||=> orenh|
|2012-10-24 06:26||orenh||Status||new => patch included|
|2012-10-24 06:26||orenh||Assigned To||orenh => justincc|
|2012-10-24 06:26||orenh||Status||patch included => assigned|
|2012-10-24 06:26||orenh||Status||assigned => patch included|
|2012-10-25 17:05||justincc||Note Added: 0022929|
|2012-10-25 17:05||justincc||Status||patch included => patch feedback|
|2012-12-16 06:24||orenh||File Deleted: 0001-Improved-teleport-error-handling.patch|
|2012-12-16 06:26||orenh||File Added: 0001-Improved-teleport-error-handling.patch|
|2012-12-16 06:33||orenh||Note Added: 0023268|
|2012-12-16 06:34||orenh||Status||patch feedback => patch included|
|2013-02-22 14:48||justincc||File Added: jc.reset-state-on-client-close.patch|
|2013-02-22 15:14||justincc||Note Added: 0023591|
|2013-02-22 15:14||justincc||Status||patch included => patch feedback|
|2013-03-26 18:24||justincc||Note Added: 0023703|
|2014-01-11 23:45||orenh||Note Added: 0024984|
|2014-01-11 23:45||orenh||Status||patch feedback => resolved|
|2014-01-11 23:45||orenh||Resolution||open => won't fix|
|2014-07-29 13:42||chi11ken||Status||resolved => closed|
|Copyright © 2000 - 2012 MantisBT Group|