Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006374opensim[REGION] OpenSim Corepublic2012-10-24 06:262014-07-29 13:42
Reporterorenh 
Assigned Tojustincc 
PrioritynormalSeverityminorReproducibilitysometimes
StatusclosedResolutionwon't fix 
PlatformOSOS Version
Product Versionmaster (dev code) 
Target Versionmaster (dev code)Fixed in Version 
Summary0006374: [PATCH] Improved teleport error handling
DescriptionImproved error-handling in case teleport fails:

1. Instead of resetting the agent's "In-Transit" flag separately in each place where the teleport can fail, do so in a "finally" clause. This is more robust because it reduces the need to remember to reset the flag wherever it's needed.

2. In addition, reset the "In-Transit" flag when the client disconnects. This ensures that if a bug did happen and leave the agent In-Transit, then simply disconnecting and reconnecting will fix the problem. Without this change only a region restart would fix the agent's status.
TagsNo tags attached.
Git Revision or version number9a8a5afc59c11719595096ff3b9926c8c6cc0d6a
Run Mode Grid (Multiple Regions per Sim)
Physics EngineODE
EnvironmentMono / Linux32
Mono Version2.10
Viewer
Attached Filespatch file icon 0001-Improved-teleport-error-handling.patch [^] (17,367 bytes) 2012-12-16 06:26 [Show Content]
patch file icon jc.reset-state-on-client-close.patch [^] (4,389 bytes) 2013-02-22 14:48 [Show Content]

- Relationships

-  Notes
(0022929)
justincc (administrator)
2012-10-25 17:05

I am certainly happy with changes to improve the readability of ResetFromTransit().

However, I believe this patch suffers from a problem if DoTeleport() exits early because the agent is already teleporting. In this case, from my reading ResetFromTransit() will be invoked when it should not be.

In addition, have you encountered any cases where the client was closed in the middle of teleport? I am extremely wary of belt and braces solutions here because they tend to hide underlying problems - I would far rather a bug start occurring so that it can be fixed properly rather than get hidden and possibly manifest itself in other ways that become much harder to debug.

However, it strikes me that perhaps a client close by the user is possible in the middle of teleport. Have you tested this?

For instance, hooking OnClientClosed() also has the effect of resetting the transit flag at sp.Scene.IncomingCloseAgent() in ETM.DoTeleport() rather than right at the very end. This would open up a race condition with sp.Scene.NeedSceneCacheClear(). It just so happens that I just looked through this old method and it appears completely pointless so I intend to comment it out/remove it. However, this is the kind of unexpected stuff that could be introduced without very careful analysis.

Also, why make the EntityTransferStateMachine (and OnClientClosed()) protected rather than private? ETSM should be managed purely by the ETM module and not touched by subclasses unless there's a very good reason.
(0023268)
orenh (administrator)
2012-12-16 06:33

I updated the patch with the following changes:

1. Handle correctly the case where an agent that is already in transit attempts to start another teleport/cross. There's a simple rule that ensures that this works correctly: calls to SetInTransit() and ResetFromTransit() must come in pairs. If SetInTransit() succeeds then start a try/finally block that calls ResetFromTransit() when it's done.

2. Changed m_entityTransferStateMachine to be private. I actually think it's ok to have it protected: that was done to allow HGEntityTransferModule to access it. Since HGEntityTransferModule is similar to EntityTransferModule, it should be allowed to call its internal members. But after this latest refactoring I no longer need to call m_entityTransferStateMachine from HGEntityTransferModule, so I made it private.

To answer your questions:

Yes, I did encounter a case where the client was closed in the middle of teleport. This is easy to do. A bigger problem happened when the teleport failed: in some cases the user's transit state wasn't reset, with the serious consequence that the user couldn't teleport any longer. Previous to this patch, this was a region-level error that required a region restart to fix. Now, if this happens again, it's just a user-level error that can be fixed simply by logging out and back in.
(0023591)
justincc (administrator)
2013-02-22 15:14

Hi Oren. I certainly agree with the try/finally changes so I've put these in as git commit ccb7cce. I separated out the other changes into jc.reset-state-on-client-close.patch above.

I actually do like the idea of resetting the transit state on the client close but not for the 'belt and braces' reason. Rather, it's because it has the potential to stop other transfer-related things from happening when the client is closed during teleport.

However, on quickly testing this, I do notice that if a client is closed whilst the first region is waiting for the 'everything is fine' reply from the destination region, the following kind of thing occurs.

22:33:34 - [ENTITY TRANSFER MODULE]: Teleporting Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test three to http://192.168.1.2:9000/ [^] (http://192.168.1.2:9000/ [^]) test two/<120.161, 127.3456, 34.84365>
22:33:34 - [ENTITY TRANSFER MODULE]: Destination is running version SIMULATION/0.1
22:33:34 - [SCENE]: Region test two told of incoming child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 (circuit code 995135401, IP 192.168.1.2, viewer Second Life Release 3.3.4.262321, teleportflags (ViaLocation), position <120.161, 127.3456, 34.84365>)
22:33:34 - [SCENE]: Region test two authenticated and authorized incoming child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 (circuit code 995135401)
22:33:34 - [SCENE PRESENCE]: Closing child agents. Checking 2 regions in test three
22:33:34 - [SCENE PRESENCE]: Closing 1 child agents
22:33:34 - [SCENE COMMUNICATION SERVICE]: Sending close agent ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 to test one
22:33:34 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test one
22:33:34 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test one
22:33:34 - [ENTITY TRANSFER MODULE]: Set release callback URL to http://192.168.1.2:9000/agent/e4f3924a-5a7c-4e1a-bee7-aa96580f2515/dd5b77f8-bf88-45ac-aace-35bd76426c83/release/ [^] in test three
22:33:34 - [SCENE]: Incoming child agent update for e4f3924a-5a7c-4e1a-bee7-aa96580f2515 in test two
22:33:34 - [LLUDPSERVER]: Handling UseCircuitCode request for circuit 995135401 to test two from IP 192.168.1.2:55514
22:33:34 - [SCENE]: Adding new child scene presence Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 to scene test two at pos <120.161, 127.3456, 34.84365>
22:33:34 - [CAPS]: Received SEED caps request in test two for agent e4f3924a-5a7c-4e1a-bee7-aa96580f2515
22:33:35 - [CLIENT]: Got a logout request for Justin Clark-Casey in test three
22:33:35 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test three
22:33:35 - [ENTITY TRANSFER MODULE]: Sending new CAPS seed url http://192.168.1.2:9000/CAPS/cbc805db-deb8-4f51-b998-9fb8015003fb0000/ [^] from test three to Justin Clark-Casey
22:33:35 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test three
22:33:36 - [ENTITY TRANSFER STATE MACHINE]: Agent with ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 should not exit directly from state Transferring, should go to CleaningUp state first in test three
22:33:36 - [ENTITY TRANSFER STATE MACHINE]: Agent e4f3924a-5a7c-4e1a-bee7-aa96580f2515 cleared from transit in test three
22:33:36 - [ENTITY TRANSFER MODULE]: Teleport of Justin Clark-Casey to test two from test three failed due to no callback from destination region. Returning avatar to source region.
22:33:36 - [ENTITY TRANSFER MODULE]: Exception on teleport of Justin Clark-Casey from <129.0615, 124.856, 34.84365>@test three to <120.161, 127.3456, 34.84365>@test two: Agent with ID e4f3924a-5a7c-4e1a-bee7-aa96580f2515 is not registered as in transit in test three at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferStateMachine.UpdateInTransit (UUID id, AgentTransferState newState) [0x00022] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferStateMachine.cs:124
  at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.Fail (OpenSim.Region.Framework.Scenes.ScenePresence sp, OpenSim.Services.Interfaces.GridRegion finalDestination, Boolean logout) [0x00000] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:733
  at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.DoTeleportInternal (OpenSim.Region.Framework.Scenes.ScenePresence sp, OpenSim.Services.Interfaces.GridRegion reg, OpenSim.Services.Interfaces.GridRegion finalDestination, Vector3 position, Vector3 lookAt, UInt32 teleportFlags) [0x005d5] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:676
  at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.TeleportAgentToDifferentRegion (OpenSim.Region.Framework.Scenes.ScenePresence sp, UInt64 regionHandle, Vector3 position, Vector3 lookAt, UInt32 teleportFlags, OpenSim.Services.Interfaces.GridRegion& finalDestination) [0x000f0] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:360
  at OpenSim.Region.CoreModules.Framework.EntityTransfer.EntityTransferModule.Teleport (OpenSim.Region.Framework.Scenes.ScenePresence sp, UInt64 regionHandle, Vector3 position, Vector3 lookAt, UInt32 teleportFlags) [0x000e1] in /home/justincc/jc/it/v/virtual-environments/specific/second-life/servers/opensim/src/opensim-git/OpenSim/Region/CoreModules/Framework/EntityTransfer/EntityTransferModule.cs:225
22:34:35 - [LLUDPSERVER]: Ack timeout, disconnecting child agent for Justin Clark-Casey in test two
22:34:35 - [CLIENT]: Close has been called for Justin Clark-Casey attached to scene test two
22:34:35 - [SCENE]: Removing child agent Justin Clark-Casey e4f3924a-5a7c-4e1a-bee7-aa96580f2515 from test two

This works but is extremely ugly with the exception spew. Now, in this case one could prevent this by checking for the InTransit state after receiving false from WaitForAgentArrivedAtDestination(). However, I think it emphasizes the fact that making this change swaps a known set of behaviours when a client is closed during teleport for a new set of behaviours where we need to have a good idea of the knock-on effects.

This may also be a crude way of implementing teleport cancel. At the moment, this is not hooked up at all and hitting cancel can strand an avatar and force a relog.

But of course, these changes narrow the window for races rather than close it. I think that what is ultimately needed is to extend the state model to encompass client close and login as well as inter-region transfers. It may well make sense for a presence to always have some kind of transfer state, where one of those states is 'not transferring'. Thoughts?

On a minor point, I did not carry through the logging change from debug to warn after a QueryAccess fails. This is because this call could fail for all kinds of legitimate reasons (region full, viewer banned, etc.) and so a log warning would be misleading.
(0023703)
justincc (administrator)
2013-03-26 18:24

I effectively implemented the client closing check in git master 7471bc7, after previously implementing teleport cancellation as best as can be done at the moment. This is done in a more controlled way with a new Aborting transfer state which the teleport process checks at strategic points.

This still leaves windows where behaviour is unpredictable. However, extending the existing AgentCircuitData logging that controls login/logout into entity transfer quickly became extremely hairy.
(0024984)
orenh (administrator)
2014-01-11 23:45

The teleporting code has changed significantly in 0.7.6, so this patch isn't relevant anymore.

- Issue History
Date Modified Username Field Change
2012-10-24 06:26 orenh New Issue
2012-10-24 06:26 orenh File Added: 0001-Improved-teleport-error-handling.patch
2012-10-24 06:26 orenh Assigned To => orenh
2012-10-24 06:26 orenh Status new => patch included
2012-10-24 06:26 orenh Assigned To orenh => justincc
2012-10-24 06:26 orenh Status patch included => assigned
2012-10-24 06:26 orenh Status assigned => patch included
2012-10-25 17:05 justincc Note Added: 0022929
2012-10-25 17:05 justincc Status patch included => patch feedback
2012-12-16 06:24 orenh File Deleted: 0001-Improved-teleport-error-handling.patch
2012-12-16 06:26 orenh File Added: 0001-Improved-teleport-error-handling.patch
2012-12-16 06:33 orenh Note Added: 0023268
2012-12-16 06:34 orenh Status patch feedback => patch included
2013-02-22 14:48 justincc File Added: jc.reset-state-on-client-close.patch
2013-02-22 15:14 justincc Note Added: 0023591
2013-02-22 15:14 justincc Status patch included => patch feedback
2013-03-26 18:24 justincc Note Added: 0023703
2014-01-11 23:45 orenh Note Added: 0024984
2014-01-11 23:45 orenh Status patch feedback => resolved
2014-01-11 23:45 orenh Resolution open => won't fix
2014-07-29 13:42 chi11ken Status resolved => closed


Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker