[Opensim-dev] Call to brainstorm on OpenSim core threading
Brian Wolfe
brianw at terrabox.com
Tue Feb 19 03:59:21 UTC 2008
As we are all aware there are some rather serious issues with locking,
thread loss, and memory consumption in OpenSim as of today's HEAD (and
for a while now in hiding). Mantis bug #622 is yet another indication of
these issues.
We're burning 1/3rd of the CPU cycles of an athlon64X2-5800 on a 60-user
region doing nothing but context switching. This is bad. It means we've
overloaded on threadcount.
We have MANY ops that block the entire region when used. It means that
we aren't locking in the right ways, or aren't delegating tasks to the
worker threadpool when we should be.
We eventually run out of RAM. Mantis #622 shows this rather blatantly.
8-( Based on past experience with C# and the symptoms that I've been
seeing over the last 2 weeks, a large part of this is possibly lost
threads.
As a result of the number of issues I'm making a general call for
everyone to think hard about these things in each person's area of
expertise and to come up with ideas so that we can use our collective
brainpower to solve this stability issue.
I have my own ideas on how to fix it. However I'm also very new to the
system. So mine may be really really bad. :) Anyways, here's what I'm
thinking.....
I have made the start of an effort at cleaning up locking and threading
with several patches posted to mantis through Sakai at openlife. What I
have found so far is not good. There is rampant lack of locking and
free-use of permanent thread creation in areas that I personally would
have done otherwise.
This is not an indictment of anyone or the process used, nor of the
quality of opensim. It's merely a symptom of a collaborative effort that
has reached the point where we must take a step back and re-evaluate
what we have before us. We need to look at what has been created and
ask the question "does this thread NEED to exist as an independent?".
Second, ANY variable that is not atomic by nature should be locked if
it's accessed at all by other threads, whether or not the sequence of
execution says two threads will never touch it at once. Yes this creates
overhead, but it's necessary in such a massively parallel application.
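To make the locking rule concrete, here is a minimal C# sketch. It is not
code from the OpenSim tree: ClientRegistry, m_sync, and LockDemo are
hypothetical names made up for illustration. The point is only that every
read and write of the shared Dictionary goes through the same lock object,
including the innocent-looking Count getter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical example class; nothing here is an actual OpenSim type.
public class ClientRegistry
{
    // Private lock object, so no outside code can take the same lock.
    private readonly object m_sync = new object();
    private readonly Dictionary<uint, string> m_clients =
        new Dictionary<uint, string>();

    public void Add(uint circuitCode, string name)
    {
        lock (m_sync)
        {
            m_clients[circuitCode] = name;
        }
    }

    public bool Remove(uint circuitCode)
    {
        lock (m_sync)
        {
            return m_clients.Remove(circuitCode);
        }
    }

    public int Count
    {
        // Reads are locked too: Dictionary is not safe to read
        // while another thread is writing to it.
        get { lock (m_sync) { return m_clients.Count; } }
    }
}

public static class LockDemo
{
    // Starts several writer threads against one registry and
    // returns the final entry count.
    public static int Run(int threads, int perThread)
    {
        ClientRegistry registry = new ClientRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            int offset = t * perThread; // per-iteration copy for the closure
            workers[t] = new Thread(delegate()
            {
                for (int i = 0; i < perThread; i++)
                    registry.Add((uint)(offset + i), "client" + (offset + i));
            });
            workers[t].Start();
        }
        foreach (Thread w in workers)
            w.Join();
        return registry.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 1000));
    }
}
```

Each thread writes its own distinct range of keys, so the final count
should equal threads * perThread. Without the lock, concurrent writes to
a plain Dictionary can corrupt it or throw.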
I'm exploring using a worker threadpool thread to do the
UDPServer.CreateUser() call and its descendants. I also intend to
explore using this same threadpool for processing UserView packets in
order to eliminate the potential for lost threads as well as thread
concurrency overload from the UserView area. Once I have the creation
processing in a worker thread, and all the remaining areas of openlife
have been reexamined and tweaked to play nice in the threads arena, I
think we won't have any real issues with inbound (or outbound) packet
loss anymore. :)
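As a rough illustration of the worker threadpool idea, here is a minimal
C# sketch that queues each request to the .NET ThreadPool instead of
spawning a permanent thread per request. This is an assumption about the
general shape only, not the actual patch: PooledWorkDemo and
HandleCreateUser are hypothetical stand-ins, and HandleCreateUser does not
do what UDPServer.CreateUser() really does.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not part of OpenSim.
public static class PooledWorkDemo
{
    // Placeholder for the real per-client setup work.
    private static void HandleCreateUser(uint circuitCode)
    {
        // Real code would build the client session here.
    }

    // Queues one pool work item per request, waits for all of them,
    // and returns how many completed.
    public static int Run(int requests)
    {
        int created = 0;
        int pending = requests;
        // Starts signaled when there is nothing to wait for.
        using (ManualResetEvent done = new ManualResetEvent(requests == 0))
        {
            for (int i = 0; i < requests; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate(object state)
                {
                    try
                    {
                        HandleCreateUser((uint)(int)state);
                        Interlocked.Increment(ref created);
                    }
                    finally
                    {
                        // The last work item to finish wakes the waiter.
                        if (Interlocked.Decrement(ref pending) == 0)
                            done.Set();
                    }
                }, i);
            }
            done.WaitOne();
        }
        return created;
    }

    public static void Main()
    {
        Console.WriteLine(Run(100));
    }
}
```

The pool caps how many threads run at once and reuses them, so a burst of
requests turns into a queue rather than a burst of new threads. A pool
thread goes back to the pool when its work item returns, which is what
should help with the lost thread problem.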
I have not looked at anything beyond the core networking code so far. So
I don't know what other areas are creating threads, or how they are
using locks.
Thanks for listening!
As a great chunk of rock said once.... "It's clobberin' time!"