[Opensim-dev] Issues with the Simulator under high load

Justin Clark-Casey jjustincc at googlemail.com
Tue Apr 24 00:45:34 UTC 2012


Those kinds of traces (where lots of threads are waiting on WaitHandle.WaitOne() or similar) are usually indicative of a
deadlock somewhere in the system.

When this happens, I find that the best thing to do is take a VM thread dump and inspect the code to find out where the
deadlock is happening.  I don't know how to do this on Windows but I'm sure it's possible.  Windows may well even
provide some nice tools to pinpoint the deadlock rather than eyeballing the stacks.
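
As a rough illustration of what a thread dump gives you, here is a minimal sketch.  It is in Python rather than
OpenSimulator's C# purely to keep it self-contained.  The helper names (dump_thread_stacks, threads_blocked_in) and the
worker threads are made up for the example and are not part of OpenSimulator or any of its tools.  It lists the call
stack of every live thread and picks out those parked in a wait call, which is the pattern to look for when hunting a
deadlock.

```python
import sys
import threading
import time
import traceback


def dump_thread_stacks():
    """Map each live thread's name to the function names on its call stack.

    The list is ordered outermost call first, innermost call last.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    for ident, frame in sys._current_frames().items():
        stack = traceback.extract_stack(frame)
        dump[names.get(ident, "thread-%d" % ident)] = [f.name for f in stack]
    return dump


def threads_blocked_in(dump, function_name):
    """Names of threads that have `function_name` somewhere on their stack."""
    return sorted(name for name, funcs in dump.items() if function_name in funcs)


if __name__ == "__main__":
    # Hypothetical stand-in for a handle that is never signalled.
    never_set = threading.Event()
    for i in range(3):
        threading.Thread(
            target=never_set.wait, name="worker-%d" % i, daemon=True
        ).start()

    time.sleep(0.2)  # give the workers time to reach the wait call

    dump = dump_thread_stacks()
    for name in threads_blocked_in(dump, "wait"):
        print(name, "->", " > ".join(dump[name]))
```

Knowing which threads are parked is only the first step.  You still have to read the code to work out which lock or
handle each one is waiting on and which thread is holding it.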

But to be honest, if that's OpenSim 0.7.1.1 then the code has changed a lot since then.  There's a very good chance any
problems you find will not apply to OpenSim 0.7.3 or even 0.7.2.  I suggest you upgrade before doing any other analysis.
Even OpenSim 0.7.3.1 will quickly become out of date - if you really want to winkle out bugs then mainline is often
the place to be (though you could also try the opensim-0.7.3-extended git branch, which is 0.7.3.1 + selected things
from git master - it should be stable but that's not absolutely guaranteed).

Just to be clear, I'm happy to eyeball problems on recent releases or master but I lack the time to do thorough 
analysis.  However, I'm happy to add more stats/diagnostic commands to OpenSimulator if useful stuff can be identified 
and it's not super-difficult to do.

Best,

Justin


On 22/04/12 23:43, Akira Sonoda wrote:
> Hi Justin,
>
> I am still investigating the slow GetTexture problem and the resulting instability under high load.
>
> What I did so far:
>
>  1. I'm still using opensim-0007711 (I didn't have the time to upgrade; the first upgrade attempt did not go well due
>     to an error on my side and since then I have not looked into it much).
>  2. I've created a Windows instance in the Amazon cloud in order to be able to connect some profiling tools.
>  3. I've run the last two Friday parties from there. The first party was quite okay (MaxPoolThreads=90 in the
>     SmartPoolThreads settings, but I saw more threads at peaks, which is strange).
>  4. The second party, on 20 April, went crazy after 3 hours. Here's a picture:
>
> http://farm9.staticflickr.com/8017/6957771466_4412ee83c4_b_d.jpg
>
> Most of the threads had a stack trace like this:
>
> http://farm9.staticflickr.com/8155/6957875232_0203631ed0_b_d.jpg
>
> I'm wondering why this increase started after approx. 3 hours. We had at most about 18 avatars on the region/sim, with
> various different viewers. Because I did not attach this Amazon cloud instance to my Splunk server, I have no
> statistics about the viewers during the party ... I should probably do that in future.
>
> I will upgrade to a more recent version next week ...
>
> Thanks a lot!
>
> Akira
>
>
> On 19 March 2012 at 01:57, Justin Clark-Casey <jjustincc at googlemail.com> wrote:
>
>     It's quite possible the 3rd party HTTP server doesn't use the threadpool though I've never looked in detail.
>
>     You could supply any other http address in the GetTexture cap (e.g. the asset service directly with a suitable
>     handler).  However, I'm not sure that asset serving is such a bottleneck at the moment compared with scripting and
>     physics issues.
>
>
>     On 17/03/12 21:10, Dahlia Trimble wrote:
>
>         I've done a bit of tracing through the code and I can't seem to find where the HTTP server in OpenSimulator uses
>         threadpool threads. I did find them used in the LLUDP server and in asynchronous requests from the asset
>         service, but I have yet to find any other uses. Is it possible that the HTTP server is still using system
>         threads, bypassing the threadpool? I'm rather curious as I use the built-in HTTP server in a few personal
>         applications and I'm concerned about performance.
>
>         On another note, I believe part of the impetus behind LL designing the texture fetch capability was to allow a
>         service separate from the simulator to supply assets to viewers, thereby reducing load on the individual
>         simulator processes. Perhaps this is something OpenSimulator can take advantage of? Some kind of asset proxy
>         cache could probably do a much better job of serving textures and other assets to viewers than the existing
>         monolithic process. I believe it could even be moved to a separate server with a different IP address.
>
>
>         On Fri, Mar 16, 2012 at 8:36 PM, Justin Clark-Casey <jjustincc at googlemail.com> wrote:
>
>             Hi Akira.  I have now updated the "show threads" command to show threadpool statistics for the main
>             threadpool.  Please note that each XEngine script engine will also have its own threadpool (which can be
>             seen using the "xengine status" command).  Might need to improve this further.
>
>             "show threads" will also show all in-use threads.  However, at least on Mono 2.6.7 this isn't reported by
>             the VM so won't be shown.
>
>             I'm not sure whether this will help you or not in tracking down performance issues.  In some situations it
>             could help (e.g. if threads are encountering deadlock the number of 'in use' threads will leap up, though
>             you've probably already noticed deadlock by the long-running threads reporting monitoring failures and the
>             sim locking up).
>
>             So I'd be happy to hear suggestions for additional data and I'll implement them if I can, since I think this
>             is going to be a growing area of concern.  Unfortunately pinning down performance issues with a system as
>             complex as OpenSimulator (with massive numbers of threads and user generated scripts) is likely to remain a
>             significant challenge for the foreseeable future.
>
>
>             On 11/03/12 19:15, Akira Sonoda wrote:
>
>                 On 10 March 2012 at 03:25, Justin Clark-Casey <jjustincc at googlemail.com> wrote:
>
>
>
>                     I'm sorry to say that you'll have to take the ThreadPool numbers with a very very very large pinch
>                     of salt.  I believe they only refer to the built-in mono thread pool and not the SmartThreadPool
>                     which is the one actually used (and beyond that the core simulator and xengine use separate pools).
>                     I will try and improve this situation soon.
>
>
>                 Thank you Justin!
>
>                 It would be nice to have some meaningful statistics for all those ThreadPools! Maybe there is a
>                 possibility to write those statistics to the log from time to time (e.g. every 30 seconds). Together
>                 with some documented "best practices" from those who operate sims with lots of avatars on them (I'm
>                 thinking mainly of the OSgrid Plazas as good references), this could be highly valuable information for
>                 those who operate sims for similar purposes (meeting points, parties, concerts etc.).
>
>
>
>
>
>
>                 _______________________________________________
>                 Opensim-dev mailing list
>                 Opensim-dev at lists.berlios.de
>                 https://lists.berlios.de/mailman/listinfo/opensim-dev
>
>
>
>
>             --
>             Justin Clark-Casey (justincc)
>         http://justincc.org/blog
>         http://twitter.com/justincc
>
>
>
>
>
>
>
>
>
>
>
>


-- 
Justin Clark-Casey (justincc)
http://justincc.org/blog
http://twitter.com/justincc


