0008580 | opensim | [GRID] Grid Service | public | 2019-08-21 05:36 | 2021-10-15 10:33
Mono 6.0.1 | Ubuntu Linux | 18.04
Grid (1 Region per Sim)
Mono / Linux64
Firestorm 6.0.1
0008580: Teleport to a region in the same grid fails with "Error Contacting Grid".
Teleporting from a region within a grid to a region within the same grid sometimes fails with the error "Error Contacting Grid". In the log when this occurs:

2019-08-20 03:06:13,542 DEBUG [HG ENTITY TRANSFER MODULE]: region Utopia Skye Welcome flags: 524
2019-08-20 03:06:13,544 DEBUG [HG ENTITY TRANSFER MODULE]: Destination region is hyperlink
2019-08-20 03:06:13,569 DEBUG [GATEKEEPER SERVICE CONNECTOR]: contacting [^]
2019-08-20 03:06:13,576 ERROR [GATEKEEPER SERVICE CONNECTOR]: remote call returned an error: Requested method [get_region] not found
2019-08-20 03:06:13,577 WARN [HG ENTITY TRANSFER MODULE]: GetHyperlinkRegion of region 06edeafa-a90d-402a-b312-8bf8eee34016 from Gatekeeper [^] failed: Error contacting grid.
2019-08-20 03:06:13,579 WARN [ENTITY TRANSFER MODULE] Final destination is having problems. Unable to teleport Mike Chase 6bee7b54-xxxx-4b5f-8b6f-ae8b5ecfdcf1: Error contacting grid.

Note that it seems to think this is a hypergrid transfer, and the port number it uses for the gatekeeper is incorrect; it is what you might expect for a standalone. The port is correct in the config files, and eventually, without a restart, things start working again.
It seems to occur randomly. The source and destination regions can be any region, though of course some destinations are more popular, so we see failures on them more frequently.
I had one case where I tried to teleport to a neighboring region (where I should have had a child agent). The teleport failed as above. I was able to do a region crossing to the region successfully.

This is on a fairly recent MASTER + a few local changes build. Current as of mid July. Last commit was making DNS timing configurable (I don't think that's the cause though).

Haven't had a lot of time to debug this but I'll add notes as/if I do.
2019-09-01 10:23   
Some additional notes. Also seeing lots of cases where the teleport fails in progress and Firestorm (FS) will often disconnect you.
2019-09-01 10:26   
Do you happen to have more than one region per side? For example a VAR with more than one single region touching its borders on the same side.
2019-09-01 10:42   
No, and almost no vars at all. Teleporting has been super problematic with current master. Some kind of timing issue with Firestorm causes frequent disconnects. I also get the grid not found issue, which I think is different but has a similar end result.
2019-09-01 15:43   
2019-08-20 03:06:13,542 DEBUG [HG ENTITY TRANSFER MODULE]: region Utopia Skye Welcome flags: 524
2019-08-20 03:06:13,544 DEBUG [HG ENTITY TRANSFER MODULE]: Destination region is hyperlink

no idea why that region has RegionFlags.Hyperlink (512) set on your dbs
2019-09-01 18:39   
LibOMV has RegionFlags = 512 as NullLayer. Not sure that makes sense either. What should it be? I did run an older release (0.9 from back in October 2018) for a short while. I suppose it's possible the flag had a different purpose then. Is it safe to clear them, and if so, what should the value be for regions in a grid?
2019-09-01 18:43   
I think he meant 524, which is the flag for a hyperlinked region ...

Normal regions I think are just 4, which is "Region Online".
2019-09-01 19:05   
I did say 512, (OpenSim.Framework).RegionFlags.Hyperlink

not libomv Region flags
and not 524 == 512 | 8 | 4
 == RegionFlags.Hyperlink | RegionFlags.NoDirectLogin | RegionFlags.RegionOnline;

where only the first actually means hyperlink region.
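
The decomposition above can be checked mechanically. A minimal Python sketch follows, using only the three flag values stated in this thread; the class below is an illustrative stand-in, not the real OpenSim.Framework.RegionFlags enum (which has more members), and `decode_flags` is a hypothetical helper name.

```python
from enum import IntFlag


class RegionFlags(IntFlag):
    # Only the three values quoted in this thread. This is a stand-in,
    # not the full OpenSim.Framework.RegionFlags enum.
    RegionOnline = 4
    NoDirectLogin = 8
    Hyperlink = 512


def decode_flags(value):
    """Split an integer into known flag names plus any leftover unknown bits."""
    names = [flag.name for flag in RegionFlags if value & flag]
    known_mask = sum(flag.value for flag in RegionFlags)
    leftover = value & ~known_mask
    return names, leftover


names, leftover = decode_flags(524)
print(names, leftover)
print(bool(524 & RegionFlags.Hyperlink))
```

A non-zero leftover means the value contains bits that this reduced table does not cover, so it should be checked against the real enum before drawing conclusions.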
2019-09-01 19:08   
something, somewhere misidentified a url to that region as external and created a HG link for it
2019-09-01 19:10   
Well yes, I was referring to the actual field, in the DB not the combined flags ..
2019-09-01 19:11   
Ok I found it. Wrong flags. it SHOULD be the default HG region. 524 would be HyperLink + NoDirectLogin + RegionOnline.

The online makes sense. Not sure what NoDirectLogin is set for. But this looks like a productive area to explore. Something is rotten in Denmark as they say.
2019-09-01 19:28   
(edited on: 2019-09-01 19:30)
that is the set of flags attributed to a hglink virtual region. The NoDirectLogin is an extra measure to prevent the login service from logging a local user in to it.
a hglink virtual region is a pointer to a grid gatekeeper that is queried to get actual region information as seen on:

2019-08-20 03:06:13,569 DEBUG [GATEKEEPER SERVICE CONNECTOR]: contacting [^] [^]
2019-08-20 03:06:13,576 ERROR [GATEKEEPER SERVICE CONNECTOR]: remote call returned an error: Requested method [get_region] not found

of course a real region does not have gatekeeper methods, so the call fails

2019-09-01 19:38   
So should a region restart reset this or do I need to fix it in the database?
2019-09-01 19:41   
(edited on: 2019-09-01 19:59)
Ok, More mystery. The actual value in the database is 1031 which is:

DefaultHGRegion + RegionOnline + FallbackRegion + DefaultRegion.

That's correct. How did you see 524?

2019-09-02 00:06   
It is right there in the log snippet you posted ..

2019-08-20 03:06:13,542 DEBUG [HG ENTITY TRANSFER MODULE]: region Utopia Skye Welcome flags: 524
2019-08-20 03:06:13,544 DEBUG [HG ENTITY TRANSFER MODULE]: Destination region is hyperlink
2019-09-02 02:42   
check your hyperlinks in the console, 'show hyperlink' (there should be no local hyperlinks, which do happen occasionally for unknown reasons)
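
Besides the console command, the same check can be expressed as a bitmask filter over region rows. The sketch below is only an illustration: it uses an in-memory SQLite table with made-up rows, and the table name `regions` and the columns `regionName` and `flags` are assumptions, not a confirmed schema. A real grid database may use a different engine and layout.

```python
import sqlite3

HYPERLINK = 512  # RegionFlags.Hyperlink, as quoted earlier in this thread


def find_hyperlink_rows(conn):
    """Return (name, flags) for every row whose flags include the Hyperlink bit."""
    cur = conn.execute(
        "SELECT regionName, flags FROM regions "
        "WHERE (flags & ?) != 0 ORDER BY regionName",
        (HYPERLINK,),
    )
    return cur.fetchall()


# Demo data only: hypothetical table and rows, not taken from a real grid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (regionName TEXT, flags INTEGER)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [("Local Welcome", 1031), ("Plain Region", 4), ("Stray Link", 524)],
)
print(find_hyperlink_rows(conn))
```

Only the row with flags 524 matches, since 1031 and 4 do not contain the 512 bit.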
2019-09-02 07:49   
Ok so sorting this out. The original issue I reported showed the "from" region contacting the gatekeeper using the wrong address. Again the entry in the database is correct but for some reason the region I'm teleporting from isn't resolving the gatekeeper correctly. The entry in the config file is correct and it came up correctly but at some point the correct information was lost. The from region obviously thinks the destination is via hypergrid. So something has caused the from region to lose the correct information to the gatekeeper and to presumably start acting like a standalone. I appreciate the patience working through this. Most of my past experience has been on a closed grid without HG support. Good thing to learn more about. Now I just need to work out why this happens occasionally.
2019-09-02 10:20   
No, not exactly that.
Somewhere, sometime in the past, the sender region created that wrong hglink, for what should have been a direct teleport, without any gatekeeper.
2019-09-02 10:37   
plz check robust logs for "[HYPERGRID LINKER]: Link to" messages that may relate to that region, and see if messages around that can give a clue to help pinpoint what code paths created that.
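
A log search like this can be scripted. The following Python sketch assumes only the marker text quoted above; the helper name `find_linker_messages`, the context size, and the optional region-name filter are illustrative choices, and nothing is assumed about the rest of the message format.

```python
MARKER = "[HYPERGRID LINKER]: Link to"


def find_linker_messages(lines, region_name=None, context=2):
    """Return a window of lines around each line containing MARKER.

    Each window holds up to `context` lines before and after the hit.
    If region_name is given, only windows mentioning it are kept.
    """
    lines = [line.rstrip("\n") for line in lines]
    windows = []
    for i, line in enumerate(lines):
        if MARKER not in line:
            continue
        window = lines[max(0, i - context): i + context + 1]
        if region_name and not any(region_name in w for w in window):
            continue
        windows.append(window)
    return windows
```

To use it, open the ROBUST log file (its path depends on the installation) and pass the file object as `lines`, with the region name from the failing teleport as `region_name`.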
2019-09-03 01:49   
made a few changes on regions search code on master
2019-10-06 07:59   
I've tracked this down and fixed it. I was using a second private network for the inter-server comms on private ports (regions to private grid services, inventory, assets). Apparently something in the code doesn't like the two interface approach and you get intermittent failures resolving grid services from the regions. I resolved this by moving off the private network to a single public interface secured with firewall rules. Things are now happy once again. So something with the private network creates this problem.
2021-10-15 10:33   
I've encountered this issue also and found a way to reproduce it.

* User is on GRID A
* User attempts to teleport to GRID B

If GRID B's HG gateway region has the same name as a region on GRID A, this bug will be encountered.

The only way to clear it is to restart both ROBUST and also the source region you were teleporting from