Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007002opensim[REGION] Script Functionspublic2014-02-07 14:532014-02-11 16:14
Assigned Tomelanie 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0007002: Use signaling instead of CPU-blocking sleep for LSL events (listen, timer, sensor and dataserver).
DescriptionThe current OpenSim code contains a lot of instances where Thread.Sleep is used instead of proper semantics for concurrency and asynchronous events. Thread.Sleep is only justified in cases where programs interact with hardware for low-level IO where a steady pulses (Hz) are needed. That is not OpenSim's case, and the usage of Thread.Sleep effectively eats a lot of the potential of the machine by keeping the CPU busy while doing absolutely nothing.

The suggested (attached) patch changes the behavior of CmdHandlerThreadLoop and DoOneCmdHandlerPass in order to use AutoResetEvent instead of Thread.Sleep. The patch addresses: listeners, timers, sensors and dataserver events and generally improves the performance of OpenSim scripts.

One of the most illustrative gains can be observed by running the script at: [^]

on OpenSim and comparing the speed with Second Life. The result for running the script in Second Life can be seen in the following animation: [^]

and the current result for running the script in OpenSim can be observed in the following animation: [^]

This patch allows 0.01s timer resolutions which is what Second Life can currently do.

Note that there are other uses of Thread.Sleep in the OpenSim code which should be addressed unless the hard-coded values carry meaning (in most cases, they do not).
TagsNo tags attached.
Git Revision or version number
Run Mode Standalone (Multiple Regions)
Physics EngineBasicPhysics
Script Engine
EnvironmentMono / Linux64
Mono Version2.10
Attached Filespatch file icon 0001-Use-AutoResetEvent-instead-of-Thread.Sleep-in-order.patch [^] (3,689 bytes) 2014-02-07 14:53 [Show Content]
gif file icon secondlife.gif [^] (56,221 bytes) 2014-02-07 14:54

gif file icon opensim.gif [^] (574,764 bytes) 2014-02-07 14:54

- Relationships
related to 0006031closedkenvc 'System.OutOfMemoryException' - Simulators are consuming much more memory than in the past. 
related to 0003353closedJamenai 100% CPU usage from the mono process 
related to 0006030new Mono resource allocation grows rapidly while running opensim until CPU cannot function 
related to 0007005closedjustincc Use timers for LSL event handlers (listen, timer, sensor and dataserver). 

-  Notes
2014-02-07 15:27

I have added a few references, perhaps the most relevant is Jamenai's report which is similar to the sluggishness of the OpenSim scripting engine that we experience on our servers. It practically resolves any outstanding CPU issues where the scripting engine is involved.
dahlia (administrator)
2014-02-07 15:32

I'm not all that familiar with the script engine internals so I'll leave it to Justin or Melanie to apply this. That said, I don't believe Thread.Sleep wastes CPU as you state because OpenSimulator makes heavy use of threadpools and if a task sleeps, it can switch to another threadpool task and can even do so without a kernel trap.

Also, I don't think dropping timer resolution to 0.01 seconds is a good idea. It may allow faster updating in scripts but those updates are throttled before being sent to viewers and may even be ignored in other parts of the code, specifically the lazy update queues in LLClientView. The simulation frame rate os around 0.1 seconds and generating updates faster than that will unnecessarily fill queues with data which will be effectively ignored.

I don't mean to be overly critical about your patch and it may very well be a useful contribution. I just don't agree with your rationale.
2014-02-07 15:51
edited on: 2014-02-07 15:55

Here is a more comprehensive explanation on the misuse of Thread.Sleep:

Thread.Sleep does not yield to the CPU, instead it gladly accepts the CPU and blocks it. What happens is that threads will block even if they have nothing to do. As the code is currently written, that incurs a penalty of 100ms per script using listen, timer, sensor or dataserver.

The following code-block is extracted from AsyncCommandManager.cs:

cmdHandlerThreadCycleSleepms = 100;
        private static void CmdHandlerThreadLoop()
            while (true)



As a side-comment, CmdHandlerThreadLoop runs in an unstoppable loop which is again an instance of bad programming since threads should gracefully terminate instead of aborting them (for example, by replacing true with a flag to cancel the thread). In any case, on every iteration of the while loop, the thread will block for 100ms, regardless how powerful your machine is, regardless how fast it is, regardless how many cores it has it will pause at Thread.Sleep and effectively "pretend to look busy".

The Thread.Sleep blocking happens per script using the listen, timer, sensor and dataserver events. So you have a very rapid consumption of your CPU if you use Thread.Sleep. You will effectively have a large number of scripts that eat-up CPU cycles in order to perform nothing. While the CPU is blocked, it cannot receive another task and the scheduler will consider that it is busy. This makes the lag rocket because other threads will not be granted any CPU time.

On the other hand, Auto (or Manual) ResetEvent will only block the thread until DoOneCmdHandlerPass has completed and effectively yields the CPU such that concurrent threads can be branched in. That is what the patch addresses, and it should improve performance for regions with large number of scripts.

2014-02-07 16:11
edited on: 2014-02-07 16:11

An example of a completely broken judgment is the following comment found in OpenSim/Region/Framework/Scenes/Scene.cs:


            // Wait here, or the kick messages won't actually get to the agents before the scene terminates.
            // We also need to wait to avoid a race condition with the scene update loop which might not yet
            // have checked ShuttingDown.


The fact that you wait for 500ms, does not imply that the message has gotten to the agent because the agent may be on a slow connection - or a fast one, in which case the sleep is unnecessary. It also does not imply that the scene will have already checked ShuttingDown and does not fix any sort of race condition.

In fact, 500ms means nothing in the context of OpenSim, unless you are doing some sort of weird temporary logic where all threads are tracked in terms of time (and you have a solid specification for what these 500, 100, 7000, etc... values represent).

Otherwise, that Thread.Sleep guarantees nothing. You can just as well remove it, because on a loopback connection, "most likely" the delay is unnecessary. Or you can increase it to, shall we say, 30 seconds for modem users?

dahlia (administrator)
2014-02-07 16:17

I believe Thread.Sleep blocks the calling thread; whatever scheduling system is in use (kernel or user-space threadpool) will apply those CPU cycles elsewhere or account for them as idle time.
2014-02-07 16:27
edited on: 2014-02-07 16:29

No. It shan't. Firstly, it blocks the current thread, secondly it does not yield the CPU.

Here is a reference, MSDN Thread.Sleep: [^]

melanie (administrator)
2014-02-07 16:29

That is correct. Also, gracefully stopping threads is not an option because the threads execute LSL-compiled-as-c#, so a simple endless for loop in the script would take control of that thread away from the engine without chance of recovery. No abort flag can and will be checked in that case.

Thread.Sleep does block the thread but NOT the CPU. Threads are scheduled to CPUs only when in a runnable state, when they sleep the CPU performs other tasks. Thread.Sleep is NOT a "busy wait".
melanie (administrator)
2014-02-07 16:32

Also, besides solving a non-issue, your patch attempts to sneak in faster timers. That option is configurable already and the default can NOT be 0.01s. Although SL has 0.01s, OpenSim chokes on it except under carefully controlled conditions.

-1 on this one.
dahlia (administrator)
2014-02-07 16:44

Where in that MSDN document does it say it does not yield the cpu?
2014-02-07 16:46

It will wake up every 100000/second if you use Thread.Sleep, the result will be very bad, especially on Windows or Linux without the RT patch. With a 100ms wait you will get to have a bunch of context switches and you will have threads competing for CPU time.

Just like the previous argument, 100ms is an unjustified delay because nothing about that value represents anything meaningful - why should there be a 100ms delay? Is there some reason?

As for the flag, if you do mention that scripts in OpenSim cannot be gracefully stopped, then perhaps you can offer a grace period and then set the flag to terminate. Thread.Abort should be avoided and allow the garbage collector to free those resources.
dahlia (administrator)
2014-02-07 16:51

Kernel mode context switching should be avoided via the treadpool.
100 ms is a very meaningful value as it corresponds with the simulation frame rate.
melanie (administrator)
2014-02-07 16:51

Where do you get the idea that the thread would wake up? Please quote and/or cite, with authoritative source. Your claim flies in the face of everything Linux especially stands for.

The MSDN document you linked also supports Dahlia's and my understanding. A thread that is _suspended_ does NOT wake up periodically, nor is it spinning busy burning cycles.

Threads in Linux run on top of what is known as LWP, light weight processes. They are handled by the system scheduler. There is no more efficient way to handle sleeping than this.

Nowhere in that document is any information that says different.

As for the delay value, 100ms is a compromise between system load and responsiveness.
dahlia (administrator)
2014-02-07 16:54

While I disagree with your claim about Thread.Sleep() not yielding and that 100ms is not meaningful, I do agree with you that aborting threads is a bad practice.
melanie (administrator)
2014-02-07 16:59

There is really o way to avoid aborting threads in this architecture. When a use deletes a script, especially one that is out of control, they expect an immediate stop, not a stop several seconds later.
The classes are designed in a way that aborting the thread is safe and GC will happen. I have done extensive tests when I wrote it and I can guarantee that the code as I wrote it will not leave uncollected items.
Also, Justincc has done some work on cooperative script termination and you can select the option if you like. Again, as the option already exists, there is no call for code change.
Just because you don't like one of the possibilities does not make it right to remove it for everyone. Simply configure your installation for cooperative termination, but don't force your views on everyone.
2014-02-07 17:02

dahlia, the first part of your comment is answered by MSDN, as for the second part, here's an explanation:

Thread.Sleep is limited by the scheduler's interrupt period on both Linux and Windows - on Windows, I think this is 10 milliseconds, on Linux it should be 1ms. The wakeup occurs when a thread with a higher priority is ready to run at that moment. As a consequence, it is also not a guarantee that the thread will regain control, that's up to the scheduler.

The API has changed, but 3.5 still has:

"Specify zero (0) to indicate that this thread should be suspended to allow other waiting threads to execute."

on the MSDN: [^]

Mind, this applies to all threads, even if you have different AppDomains.
2014-02-07 17:07

melanie, the while(true) is unchanged in the patch. What you are talking about aborting threads is completely out of the context of the discussions.

On a personal note, your comment about forcing something upon you is also out of place given that this is a suggested patch (not sent yet to mainstream). Nor do I see its relevance and it is rather presumptuous and insulting.
melanie (administrator)
2014-02-07 17:07

The document says the following:

Specify 0 to make it behave like yield(), suspending it only once and rescheduling it immediately.

Specify Infinite to suspend infinitely, which means it takes a signal to wake it.

Specify anything else to suspend for a specified time.

The scheduler _checks_ the threads to see which are runnable, but it does NOT _wake_ them. No user code in suspended threads is executed during a scheduler pass. Event waiting and sleeping are identical in that respect, as sleeps are merely waits for a time event.

Your claims fly in the face of knowledge people here have, some from working on kernel and scheduler code (I'm one).

Please check your facts and read the source.
melanie (administrator)
2014-02-07 17:08

Closing this to end the drama.
justincc (administrator)
2014-02-11 15:35

Reopening. "To end the drama" is NOT a valid reason for closing a patch mantis unless things have degenerated into a pure slanging match provoked by the submitter, which is CLEARLY not the case here.

I will make the following points.

1. I agree with Dahlia and Melanie that Thread.Sleep() does not busy wait and by both documentation and experience does not eat up CPU time - the most that it does is tie up the structures associated with a thread. The URL you quote does not support your contention, Kira, as far as I can see.

2. Aborting threads is a bad idea and in my real world experience has caused unpredictable mono crashes with certain patterns of script execution. That's why I introduced co-operative script stop in the first place. Unfortunately, this is undocumented since it's still considered experimental but if you want to learn more please let me know and I will be happy to write some doc.

Thread abort should also not be done in this loop or in any other place in OpenSimulator.

3. I agree that the Thread.Sleep(500) you reference is not ideal - really, there should be a mechanism to make the simulator wait on shutdown for the kick messages to be dispatched to clients and for the scene update loop to stop. However, this may be fairly complex to implement. The sleep is a pragmatic solution - since the scene loop executes every 1/11 of a second it almost always will have terminated by the time that the sleep ends. Of course, this is unrelated to this issue.

4. There is an issue with not being able to generate timer events with < 100 ms latency. Remember that OpenSimulator is used for more than running a public social grid and in some contexts it is perfectly acceptable to allow short duration timers.

That said, your patch simply appears to introduce a constant loop - the cmdHandlerResetEvent is always set at the end of DoOneCmdHandlerPass so it's redundant. A constantly spinning loop is probably not a good idea since that is much more likely to tie up a thread. A solution with a timer that genuinely allows for a pause in execution (as we had to talk about externally because this thread was closed) is likely a better approach.
2014-02-11 16:13

Please close this patch, the patch at: [^]

provides timers for under 100ms, notably 10ms and supersedes this one. It was an early discussion that went downhill.

- Issue History
Date Modified Username Field Change
2014-02-07 14:53 user2213 New Issue
2014-02-07 14:53 user2213 File Added: 0001-Use-AutoResetEvent-instead-of-Thread.Sleep-in-order.patch
2014-02-07 14:53 user2213 Status new => patch included
2014-02-07 14:54 user2213 File Added: secondlife.gif
2014-02-07 14:54 user2213 File Added: opensim.gif
2014-02-07 15:21 user2213 Relationship added related to 0006031
2014-02-07 15:25 user2213 Relationship added related to 0003353
2014-02-07 15:25 user2213 Relationship added related to 0006030
2014-02-07 15:27 user2213 Note Added: 0025141
2014-02-07 15:32 dahlia Note Added: 0025142
2014-02-07 15:51 user2213 Note Added: 0025143
2014-02-07 15:54 user2213 Note Edited: 0025143 View Revisions
2014-02-07 15:55 user2213 Note Edited: 0025143 View Revisions
2014-02-07 15:55 user2213 Note Edited: 0025143 View Revisions
2014-02-07 16:11 user2213 Note Added: 0025144
2014-02-07 16:11 user2213 Note Edited: 0025144 View Revisions
2014-02-07 16:17 dahlia Note Added: 0025145
2014-02-07 16:27 user2213 Note Added: 0025146
2014-02-07 16:27 user2213 Note Edited: 0025146 View Revisions
2014-02-07 16:29 melanie Note Added: 0025147
2014-02-07 16:29 user2213 Note Edited: 0025146 View Revisions
2014-02-07 16:32 melanie Note Added: 0025148
2014-02-07 16:44 dahlia Note Added: 0025149
2014-02-07 16:46 user2213 Note Added: 0025150
2014-02-07 16:51 dahlia Note Added: 0025151
2014-02-07 16:51 melanie Note Added: 0025152
2014-02-07 16:54 dahlia Note Added: 0025153
2014-02-07 16:59 melanie Note Added: 0025154
2014-02-07 17:02 user2213 Note Added: 0025155
2014-02-07 17:07 user2213 Note Added: 0025156
2014-02-07 17:07 melanie Note Added: 0025157
2014-02-07 17:08 melanie Note Added: 0025158
2014-02-07 17:08 melanie Status patch included => closed
2014-02-07 17:08 melanie Assigned To => melanie
2014-02-07 17:08 melanie Resolution open => no change required
2014-02-11 15:30 user2213 Relationship added related to 0007005
2014-02-11 15:35 justincc Assigned To melanie =>
2014-02-11 15:35 justincc Note Added: 0025164
2014-02-11 15:35 justincc Status closed => feedback
2014-02-11 15:35 justincc Resolution no change required => reopened
2014-02-11 16:13 user2213 Note Added: 0025170
2014-02-11 16:13 user2213 Status feedback => new
2014-02-11 16:14 melanie Status new => closed
2014-02-11 16:14 melanie Assigned To => melanie
2014-02-11 16:14 melanie Resolution reopened => fixed

Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker