Mantis Bug Tracker

ID: 0007392
Project: opensim
Category: [GRID] Grid Service
View Status: public
Date Submitted: 2014-12-09 09:06
Last Update: 2017-01-05 12:27
Reporter: aiaustin
Assigned To: melanie
Priority: normal
Severity: feature
Reproducibility: N/A
Status: assigned
Resolution: reopened
Platform: PC
OS: Windows
OS Version: 8.1
Product Version: master (dev code)
Target Version:
Fixed in Version:
Summary: 0007392: Suggestion to serve robots.txt from grid/standalone base URL
Description: Search engines and other probes frequently try to fetch URLs on OpenSim grids that they find on web pages, in online bug reports, and so on. Many of these URLs contain avatar and asset UUIDs; they are hooks used by the OpenSim system, not endpoints meant to serve external content. The requests fail with red errors in the Robust.exe and OpenSim.exe consoles, and have been known to hang the system in some cases that I have not fully diagnosed.

If a robots.txt could be served from the base URL, e.g. http://grid.com:8002/robots.txt, search engines would take it as a guide that the site is not meant to be indexed.

By default, the robots.txt file could serve the following content:

# go away
User-agent: *
Disallow: /
Tags: No tags attached.
Git Revision or version number: 0.9.1.0 dev master
Run Mode: Grid (Multiple Regions per Sim)
Physics Engine: ODE
Script Engine:
Environment: .NET / Windows64
Mono Version: None
Viewer: N/A
Attached Files:
robots.txt (39 bytes) 2016-12-29 12:01
0001-Serving-robots.txt-from-bin.patch (2,257 bytes) 2016-12-29 14:23
0001-GridRobotsHandler.patch (2,518 bytes) 2017-01-04 15:35

-  Notes
(0027087)
aiaustin (developer)
2014-12-11 06:43

Typical messages from firewall probing software that show as red errors are:

Robust.exe Console

14:33:18 - [BASE HTTP SERVER]: Handler not found for http request POST /av-centerd
14:33:20 - [BASE HTTP SERVER]: Handler not found for http request POST /
14:34:44 - [BASE HTTP SERVER]: Handler not found for http request SEARCH /
14:34:59 - [BASE HTTP SERVER]: Handler not found for http request GET /flex2gateway/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /blazeds/messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /lcds/messagebroker/http
14:35:55 - [BASE HTTP SERVER]: Handler not found for http request POST /

-------------------------------------
OpenSim.exe Console

14:29:54 - [AGENT HANDLER]: Invalid parameters for agent message /Agent/
14:33:18 - [BASE HTTP SERVER]: Handler not found for http request POST /av-centerd
14:33:18 - [BASE HTTP SERVER]: Handler not found for http request POST /av-centerd
14:33:20 - [BASE HTTP SERVER]: Handler not found for http request POST /
14:33:20 - [BASE HTTP SERVER]: Handler not found for http request POST /
14:34:34 - [BASE HTTP SERVER]: HttpServer.HttpListener had an exception: An existing connection was forcibly closed by the remote host System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.EndAccept(Byte[]& buffer, Int32& bytesTransferred, IAsyncResult asyncResult)
   at System.Net.Sockets.Socket.EndAccept(IAsyncResult asyncResult)
   at System.Net.Sockets.TcpListener.EndAcceptSocket(IAsyncResult asyncResult)
   at HttpServer.HttpListenerBase.OnAccept(IAsyncResult ar)

14:34:44 - [BASE HTTP SERVER]: Handler not found for http request SEARCH /
14:34:45 - [BASE HTTP SERVER]: Handler not found for http request SEARCH /
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /flex2gateway/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /blazeds/messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /lcds/messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /flex2gateway/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /blazeds/messagebroker/http
14:35:00 - [BASE HTTP SERVER]: Handler not found for http request GET /lcds/messagebroker/http
(0027092)
aiaustin (developer)
2014-12-12 08:52
edited on: 2014-12-12 08:53

Here is a typical red error message when a probe comes in from a search engine, I think. In the past, on checking the IP address, it turned out to be a Google search bot. This one reports as a non-existent domain, though.

10:44:22 - [AGENT HANDLER]: method GET not supported in agent message /agent/e24a9015-f5ca-452b-8c95-d32e34cb9d64/, (caller is 220.181.51.102)

The /agent/UUID URL is one found, for example, in OpenSim discussion forum message lists.

(0031501)
aiaustin (developer)
2016-12-29 12:01
edited on: 2016-12-29 12:05

I just want to note again that I think it would be useful to be able to serve http://<domainname>:<port>/robots.txt from the core OpenSim setup for standalones and grids, and to allow a robots.txt to be provided, perhaps in the same way as the default http_404.html and http_500.html files.

A sample robots.txt that could be included in the bin directory by default is attached.

(0031502)
Mandarinka Tasty (reporter)
2016-12-29 13:26
edited on: 2016-12-29 13:27

Hello )

I have prepared a solution for you. It works for me, though I rather doubt that core needs it. Please treat it as my personal solution for you.

I wrote it as a kind of exercise. It can probably be done in a more elegant and proper way.

(0031503)
melanie (administrator)
2016-12-29 13:57

Can we please not have a file for this but use

string robots = "# go away\nUser-agent: *\nDisallow: /\n";

in code?
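
As a rough illustration of the in-code approach suggested here, the sketch below serves the same string from memory at /robots.txt, with no file on disk. It is written in Python with the standard http.server module purely for illustration; it is not the OpenSim patch, and the names ROBOTS_TXT, respond, RobotsHandler and serve, as well as the default host and port, are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Same content as the string proposed above; held in memory, no file on disk.
ROBOTS_TXT = "# go away\nUser-agent: *\nDisallow: /\n"


def respond(method, path):
    """Return (status, content_type, body) for a request (hypothetical helper)."""
    if method == "GET" and path == "/robots.txt":
        return 200, "text/plain", ROBOTS_TXT.encode("ascii")
    # Everything else is treated as unknown in this sketch.
    return 404, "text/plain", b"not found\n"


class RobotsHandler(BaseHTTPRequestHandler):
    """Minimal GET handler that delegates to respond()."""

    def do_GET(self):
        status, content_type, body = respond("GET", self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8002):
    """Run the sketch server until interrupted; host and port are assumptions."""
    HTTPServer((host, port), RobotsHandler).serve_forever()
```

Calling serve() and requesting /robots.txt should return the three lines of the string; any other path gets a 404 in this sketch.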
(0031504)
Mandarinka Tasty (reporter)
2016-12-29 13:59

Sure, we can ) One moment, I'll correct the patch.
(0031506)
Mandarinka Tasty (reporter)
2016-12-29 14:24

I've replaced the patch with an appropriate version.
(0031508)
aiaustin (developer)
2016-12-30 02:29
edited on: 2017-01-04 14:09

Thanks for incorporating the capability, which will be useful.

I note that the changes made allow the REGION server (on the base port used by OpenSim.exe instance) to serve robots.txt, and that does catch some of the incoming probes and web crawlers. But the Grid/Robust service for the main advertised GridURL does not yet serve robots.txt.

So, for example with this change in place on my test AiLand (ROBUST) grid...

http://ai.vue.ed.ac.uk:8002/robots.txt does not work yet
http://ai.vue.ed.ac.uk:9000/robots.txt does work (where 9000 is the OpenSim.ini http_listener_port)

I assume a similar change to that made already for OpenSim.exe needs to be incorporated in the relevant file(s) for Robust.exe?

(0031509)
djphil (reporter)
2016-12-30 03:18

Although it does not have any protective power, we might take this opportunity to add the humans.txt file too ...

Doc available here: http://humanstxt.org/
(0031512)
aiaustin (developer)
2016-12-30 04:36
edited on: 2016-12-30 04:39

@djphil, not with the mechanism used, as that has the contents of the robots.txt file built into the core code as a fixed string, and it's just there to let web crawlers etc. know not to try to index below the root. The humans.txt mechanism you identified would definitely need an actual file that the grid provider can change.

The main aim of providing robots.txt is to indicate that this is not normal web content, and to head off the red errors that show on the console for probes and web crawlers using links they find through things like blogs and Mantis entries as a root for their searches.
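
To show how a compliant crawler would read the fixed robots.txt string, here is a small sketch using Python's standard urllib.robotparser. The helper name crawler_may_fetch and the example host grid.example are assumptions for illustration only. Compliance is voluntary, so this only affects crawlers that choose to honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The fixed content discussed in this issue.
ROBOTS_TXT = "# go away\nUser-agent: *\nDisallow: /\n"


def crawler_may_fetch(url, user_agent="*"):
    """Apply the robots.txt rules above to a URL (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; comment lines are ignored.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With "Disallow: /" under "User-agent: *", every path on the host is off limits to a crawler that follows the rules, including /agent/UUID style URLs like the one in the log message quoted in an earlier note.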

(0031513)
melanie (administrator)
2016-12-30 05:13

Hard -1 on humans.txt

Most grids don't want to publish names and there are also anonymous and pseudonymous developers. OpenSim grid URLs are not websites.
(0031527)
aiaustin (developer)
2017-01-04 14:07
edited on: 2017-01-04 14:15

@Mandarinka... I wonder if a similar change to the one you prepared and @Melanie incorporated for the OpenSim.exe base http server can also be incorporated in a suitable place for Robust.exe? See my earlier comment on 2016-12-30 10:29.

(0031528)
Mandarinka Tasty (reporter)
2017-01-04 15:36

Hello Ai :)

I have written a first idea for the patch. Please let me know whether it works for you.

The patch has been attached: 0001-GridRobotsHandler.patch
(0031529)
Mandarinka Tasty (reporter)
2017-01-04 15:38

It uses GridInfoServer, which runs on 8002 by default.

I did it intuitively, so please describe what you find, and if the need arises I can work on it more.
(0031530)
aiaustin (developer)
2017-01-05 01:37
edited on: 2017-01-05 01:42

That looks good, Mandarinka. I have installed the patch on the AiLand grid into the latest 0.9.1 dev master, and both the Robust/grid level and the OpenSim.exe/region level now serve robots.txt correctly...

http://ai.vue.ed.ac.uk:8002/robots.txt
http://ai.vue.ed.ac.uk:9000/robots.txt

This grid also uses @Diva Wifi, which I wanted to check as that serves the root Grid URL in recent versions as well as its original WiFi home page...

http://ai.vue.ed.ac.uk:8002/
http://ai.vue.ed.ac.uk:8002/wifi

It would be useful if Melanie could check the method used to see that it looks okay and then have it added into dev master so we can completely close this issue. Thanks again.

(0031531)
melanie (administrator)
2017-01-05 05:38

The patch isn't suitable for core. It doesn't take into account that grids usually run more than one ROBUST and also may choose to provide GridInfo without using the GridInfoService. So that service may not be running at all, or may be running on only one of a number of ROBUSTs. The serving of robots.txt needs to be done in the ServicesServerBase or a similarly low-level place.
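
The design point here can be sketched in a few lines: if the handler is registered in a shared base class, every server process serves robots.txt whether or not it hosts any particular service. The Python below is only an illustration of that idea; ServerBase, GridInfoServer and AssetServer are hypothetical stand-ins and do not reflect the actual OpenSim classes or their APIs.

```python
ROBOTS_TXT = "# go away\nUser-agent: *\nDisallow: /\n"


class ServerBase:
    """Hypothetical low-level base shared by every server process."""

    def __init__(self, port):
        self.port = port
        self.handlers = {}
        # Registered in the base class, so no service has to opt in.
        self.add_handler("GET", "/robots.txt", lambda: ROBOTS_TXT)

    def add_handler(self, method, path, fn):
        self.handlers[(method, path)] = fn

    def handle(self, method, path):
        """Return (status, body) for a request."""
        fn = self.handlers.get((method, path))
        if fn is None:
            return 404, ""
        return 200, fn()


class GridInfoServer(ServerBase):
    """Hypothetical instance that also serves grid info."""

    def __init__(self, port):
        super().__init__(port)
        self.add_handler("GET", "/get_grid_info", lambda: "gridname=example")


class AssetServer(ServerBase):
    """Hypothetical instance with no grid info service at all."""
```

In this sketch an AssetServer on a different port still answers /robots.txt, while /get_grid_info exists only where the grid info service is loaded, which is the situation described for grids that split services across several processes.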
(0031532)
Mandarinka Tasty (reporter)
2017-01-05 09:57

Hello :)
@Ai I'm happy that it works for you. Feel free to use it.

@Melanie I agree that my method is not universal, especially if grid info is served in some way other than the typical OpenSim grid configuration.

I will try to prepare it another way. Thank you for the suggestion about using a low-level service.
(0031535)
aiaustin (developer)
2017-01-05 12:27
edited on: 2017-01-06 00:46

Great.. thanks to Melanie for the advice. @Mandarinka's earlier patch worked for me as I run a single Robust.exe for all services in my setups for Openvue and AiLand grids.


- Issue History
Date Modified Username Field Change
2014-12-09 09:06 aiaustin New Issue
2014-12-09 09:06 aiaustin Summary Suggestion to serve robots.txt from grid/standalonw base URL => Suggestion to serve robots.txt from grid/standalone base URL
2014-12-11 06:43 aiaustin Note Added: 0027087
2014-12-12 08:52 aiaustin Note Added: 0027092
2014-12-12 08:53 aiaustin Note Edited: 0027092
2016-12-29 12:01 aiaustin Note Added: 0031501
2016-12-29 12:01 aiaustin File Added: robots.txt
2016-12-29 12:04 aiaustin Git Revision or version number r/25604 => 0.9.1.0 dev master
2016-12-29 12:05 aiaustin Note Edited: 0031501
2016-12-29 13:24 Mandarinka Tasty File Added: 0001-Serving-robots.txt-from-bin.patch
2016-12-29 13:26 Mandarinka Tasty Note Added: 0031502
2016-12-29 13:26 Mandarinka Tasty Status new => patch included
2016-12-29 13:27 Mandarinka Tasty Note Edited: 0031502
2016-12-29 13:57 melanie Note Added: 0031503
2016-12-29 13:59 Mandarinka Tasty Note Added: 0031504
2016-12-29 14:23 Mandarinka Tasty File Deleted: 0001-Serving-robots.txt-from-bin.patch
2016-12-29 14:23 Mandarinka Tasty File Added: 0001-Serving-robots.txt-from-bin.patch
2016-12-29 14:24 Mandarinka Tasty Note Added: 0031506
2016-12-29 15:18 melanie Status patch included => resolved
2016-12-29 15:18 melanie Resolution open => fixed
2016-12-29 15:18 melanie Assigned To => melanie
2016-12-30 02:29 aiaustin Note Added: 0031508
2016-12-30 02:29 aiaustin Status resolved => feedback
2016-12-30 02:29 aiaustin Resolution fixed => reopened
2016-12-30 02:31 aiaustin Note Edited: 0031508
2016-12-30 03:18 djphil Note Added: 0031509
2016-12-30 04:36 aiaustin Note Added: 0031512
2016-12-30 04:36 aiaustin Status feedback => assigned
2016-12-30 04:37 aiaustin Note Edited: 0031512
2016-12-30 04:37 aiaustin Note Edited: 0031512
2016-12-30 04:39 aiaustin Note Edited: 0031512
2016-12-30 05:13 melanie Note Added: 0031513
2016-12-30 08:12 aiaustin Note Edited: 0031508
2017-01-04 14:07 aiaustin Note Added: 0031527
2017-01-04 14:08 aiaustin Note Edited: 0031527
2017-01-04 14:09 aiaustin Note Edited: 0031508
2017-01-04 14:15 aiaustin Note Edited: 0031527
2017-01-04 14:15 aiaustin Note Edited: 0031527
2017-01-04 15:35 Mandarinka Tasty File Added: 0001-GridRobotsHandler.patch
2017-01-04 15:36 Mandarinka Tasty Note Added: 0031528
2017-01-04 15:38 Mandarinka Tasty Note Added: 0031529
2017-01-05 01:37 aiaustin Note Added: 0031530
2017-01-05 01:39 aiaustin Note Edited: 0031530
2017-01-05 01:39 aiaustin Note Edited: 0031530
2017-01-05 01:41 aiaustin Note Edited: 0031530
2017-01-05 01:42 aiaustin Note Edited: 0031530
2017-01-05 01:42 aiaustin Note Edited: 0031530
2017-01-05 05:38 melanie Note Added: 0031531
2017-01-05 09:57 Mandarinka Tasty Note Added: 0031532
2017-01-05 12:27 aiaustin Note Added: 0031535
2017-01-05 12:28 aiaustin Note Edited: 0031535
2017-01-06 00:46 aiaustin Note Edited: 0031535

