Mantis Bug Tracker

View Issue Details
ID: 0007395
Project: opensim
Category: [REGION] OpenSim Core
View Status: public
Date Submitted: 2014-12-16 16:47
Last Update: 2015-01-01 17:26
Reporter: AliciaRaven
Assigned To: cmickeyb
Priority: normal
Severity: minor
Reproducibility: unable to reproduce
Status: assigned
Resolution: open
Platform, OS, OS Version: not specified
Product Version: master (dev code)
Target Version, Fixed in Version: not specified
Summary: 0007395: HG User with bad connection overloaded and froze networking
Description: The sim had been running fine for a few days when an HG user logged in and the console filled with "slow post request" and "Could not get contents of folder" warnings. After about 5 minutes, the HG user and another avatar present were logged out with "No Packets Received" warnings. After this, all networking appeared to fail for the next hour, at which point the problem was discovered and the region was restarted.

During the hour before the restart, the console filled with "Request Timeouts": some to the Robust presence service (on the same server) and many to the HG user's XInventory service. Restarting also took a long time because the requests to deregister from the grid service were timing out.

I would speculate that the requests to the HG user's grid were failing and that this caused a backlog in network data.
Tags: No tags attached.
Git Revision or version number: not specified
Run Mode: Grid (Multiple Regions per Sim)
Physics Engine: BulletSim
Script Engine: not specified
Environment: Mono / Linux64
Mono Version: 3.6
Viewer: not specified
Attached Files: NetworkOverload1.log (88,229 bytes) 2014-12-16 16:47

- Relationships

-  Notes
(0027118)
Gavin Hird (reporter)
2014-12-18 08:47

Not saying this is a permanent fix, but I saw similar issues in the dev (not release) code and setting

enable_adaptive_throttles = false

in the [ClientStack.LindenUDP] section of OpenSim.ini largely removed the issue.

I also adjusted the client_socket_rcvbuf_size parameter in the same section to match the operating system's limits. If I set it much higher than 3145728, Mono would crash on startup, but this was on OS X, so your limits may differ on Linux.
(0027119)
cmickeyb (administrator)
2014-12-18 08:53

What does "fixed" mean? Did you have another HG user come in with the same problem? The enable_adaptive_throttles setting only affects the connection directly to the client. Some of the messages you described above are messages to core services and are not affected by the throttles.

Next time you see this happening, could you run 'show throttles', 'show queues' and 'show pqueues' on the console and post the results?
(0027120)
Gavin Hird (reporter)
2014-12-18 09:17

Yes, I had the same problem with HG users coming in, with exactly the same types of messages and behavior.

I do get the same "Could not get contents of folder" warnings with incoming HG users.

With the throttle enabled, the minute an HG user came in and I started getting "slow post request" warnings, all connected clients, local or not, slowed down, and it could take hours before things returned to normal speed.

After disabling the throttle, I see the occasional message, but nothing serious.
(0027121)
cmickeyb (administrator)
2014-12-18 09:20

Odd. The throttles have nothing to do with any POST requests, though it could be that something in the throttle backlog is degrading overall simulator performance.

Like I said... please post the output of 'show queues', 'show throttles' and 'show pqueues' the next time you see this.

And I should have a new build with additional debugging information coming sometime this week.
(0027122)
Gavin Hird (reporter)
2014-12-18 09:30

OK, I'll do that and watch this space.

To me, it looked like it throttled the entire simulator, but that was just a casual observation.
(0027123)
cmickeyb (administrator)
2014-12-18 09:33

It only throttles a single connection, based on the reliability of that connection. However, if processing the ever-increasing queues for that connection slows down the simulator, then there will be problems. I know of at least one case where there is a linear pass through the resend queue. If some of the recent changes moved that inside a lock, for example, it would have the effect of slowing down everyone.
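To make the resend-queue concern concrete, here is a toy Python model; it is not OpenSim's LLUDP code, which is C#. It assumes, hypothetically, that one outgoing loop services every client connection in turn (as it effectively would if the linear resend-queue pass ran under a shared lock) and that a bad link confirms only `acks_per_tick` packets per tick. `Connection`, `scan_resends` and `run_ticks` are invented names for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Connection:
    name: str
    acks_per_tick: int  # hypothetical: how many packets this link confirms per tick
    resend_queue: list = field(default_factory=list)

    def scan_resends(self) -> int:
        """Linear pass over every unacknowledged packet; one unit of work per entry."""
        work = 0
        for _seq in self.resend_queue:
            work += 1  # stand-in for checking this packet's resend timer
        return work


def run_ticks(connections, ticks, packets_per_tick=20):
    """Service every connection in a single loop, as if all scans shared one lock.

    Returns the work done in each tick: every client waits that long for its next turn.
    """
    seq = itertools.count()
    costs = []
    for _tick in range(ticks):
        tick_cost = 0
        for conn in connections:
            # New reliable packets join the resend queue until they are acknowledged.
            conn.resend_queue.extend(next(seq) for _ in range(packets_per_tick))
            tick_cost += conn.scan_resends()
            # The oldest packets get acknowledged; a bad link confirms fewer than it is sent.
            del conn.resend_queue[:conn.acks_per_tick]
        costs.append(tick_cost)
    return costs
```

Each healthy connection costs the same amount of work every tick, while the bad link's queue grows by 15 entries per tick. The cost that every client waits on therefore keeps rising, even though the backlog belongs to a single connection. For example, `run_ticks([Connection("local-a", 20), Connection("hg-bad-link", 5)], ticks=5)` returns `[40, 55, 70, 85, 100]`, whereas two healthy connections give `[40, 40, 40, 40, 40]`.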
(0027144)
Gavin Hird (reporter)
2014-12-31 01:20

I have taken all of Mic Bowman's changes up to commit r/25641 and they are now live on grid.xmir.org:8002

The changes are now running with enable_adaptive_throttles = false, but I will change that later if everything else is stable.
(0027146)
Gavin Hird (reporter)
2014-12-31 04:49

I have set enable_adaptive_throttles = true on both the public facing and test sims.

The public-facing simulator has 29 regions. It seems to run OK on my own hypergrid transfers too, but then again I have a pretty fast line. I guess the real test is when someone drops in on a slow connection.
(0027147)
Mata Hari (reporter)
2014-12-31 09:42

I think this may very well be related to 0007308 and 0007393.
(0027148)
AliciaRaven (manager)
2014-12-31 10:33

Mata Hari: I believe that the problems Gavin Hird is talking about above are related to those issues. However, this report is similar but has a different cause.

I had talked to Mic about this problem in relation to those reports, and he asked me to create this as a separate report. The root cause of this issue is not throttles. It seems that if a region queues enough requests to a remote HG region that may be misconfigured and is causing them to time out, it will block all networking. Mic suggested this was probably thread starvation.

My initial report states that even requests to the local grid's Robust service were failing, which indicates that this cannot be a throttle problem.
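The thread-starvation hypothesis can be sketched in Python with a fixed-size `concurrent.futures.ThreadPoolExecutor` standing in for the simulator's worker pool. This is an assumption for illustration: OpenSim itself is C# and uses SmartThreadPool or its own fire-and-forget pools. The request functions are hypothetical stand-ins: a call to an unresponsive remote HG service holds its worker until it times out, while a call to the local ROBUST service returns at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_hg_request(timeout_s):
    """Hypothetical call to a misconfigured remote HG service that never answers:
    the worker thread is held until the request times out."""
    time.sleep(timeout_s)
    return "timed out"


def local_robust_request():
    """Hypothetical quick call to the local ROBUST presence service."""
    return "ok"


def local_request_latency(pool_size, stuck_remote_calls, timeout_s=0.5):
    """Queue the stuck remote calls first, then one local call.

    Returns (local result, seconds the local call took from submission to completion).
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(stuck_remote_calls):
            pool.submit(remote_hg_request, timeout_s)
        start = time.monotonic()
        result = pool.submit(local_robust_request).result()
        return result, time.monotonic() - start
```

With `pool_size=4` and four stuck remote calls, the local call cannot start until one of the remote calls times out, so its latency is roughly `timeout_s` even though the local service is healthy. With `pool_size=8`, the same local call completes almost immediately. This is consistent with the original report, where requests to the Robust presence service on the same server also timed out.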
(0027149)
Gavin Hird (reporter)
2015-01-01 05:12

In my case, I run Robust on the same server as some of the regions, while the rest run on a different machine. So the question is: does throttling somehow spill over to Robust so that every Mono instance running on the same machine is affected, or is there some interaction at the network adapter level that slows down network access?

Since I took the latest batch of code, I have not seen any slowdowns with or without throttling enabled, but then again external HG traffic has been low, so it needs more observation.
(0027150)
Gavin Hird (reporter)
2015-01-01 05:23

On the subject of thread starvation, I don't think that is the problem in my case, because with

async_call_method = SmartThreadPool, peak use of system threads for mono-sgen is around 1300 during instance startup, and it fluctuates between 335 and 370 under normal conditions on the sims.

With async_call_method = Thread, it peaks closer to 2000 threads during startup and then settles at the same level as above.

MaxPoolThreads is set to 350, but more threads are simply allocated if needed.
For the fire-and-forget pools, MaxPoolThreads = 150, with the minimum at 10.
(0027151)
cmickeyb (administrator)
2015-01-01 17:26

Again... adaptive throttles have nothing to do with connections between the simulator and any ROBUST service. They apply exclusively to the connection between the viewer and the simulator. The problems AliciaRaven reports with the connection between the simulator and ROBUST services (whether the primary service associated with the simulator or a service associated with an HG user) have nothing to do with throttles, but they are very real problems that others are also reporting, and they warrant investigation.

- Issue History
Date Modified Username Field Change
2014-12-16 16:47 AliciaRaven New Issue
2014-12-16 16:47 AliciaRaven File Added: NetworkOverload1.log
2014-12-16 16:52 AliciaRaven Assigned To => cmickeyb
2014-12-16 16:52 AliciaRaven Status new => assigned
2014-12-18 08:47 Gavin Hird Note Added: 0027118
2014-12-18 08:53 cmickeyb Note Added: 0027119
2014-12-18 09:17 Gavin Hird Note Added: 0027120
2014-12-18 09:20 cmickeyb Note Added: 0027121
2014-12-18 09:30 Gavin Hird Note Added: 0027122
2014-12-18 09:33 cmickeyb Note Added: 0027123
2014-12-31 01:20 Gavin Hird Note Added: 0027144
2014-12-31 04:49 Gavin Hird Note Added: 0027146
2014-12-31 09:42 Mata Hari Note Added: 0027147
2014-12-31 10:33 AliciaRaven Note Added: 0027148
2015-01-01 05:12 Gavin Hird Note Added: 0027149
2015-01-01 05:23 Gavin Hird Note Added: 0027150
2015-01-01 17:26 cmickeyb Note Added: 0027151

