[Opensim-dev] Lockless Lists?

Mon Nov 24 19:43:09 UTC 2008

On Mon, Nov 24, 2008 at 11:23 AM, Frisby, Adam <adam at deepthink.com.au> wrote:
> ...
> Does anyone have any opposition if we try to replace some of the core
> structures with lock-free versions, and then removing the locks on them? I'm
> thinking Scene.Entities, Scene.Presences would be two good targets for the
> first lot, then maybe we can attack some deeper bits later.
Weeeellll...

I find lock-free programming a great idea, if it is done right and you
have enough spare processors to benefit from it. It isn't a topic that
is easy to grasp, though.

You won't be able to just replace the list/dictionary/queue. Lock-free
programming requires a completely different programming style (a small
hint is shown on the web page). Basically, you replace code that is
easy to understand, relatively stable, and non-scalable with code that
is much harder to understand (and debug), not stable at all (i.e. its
behaviour depends very much on timing, which again makes it hard to
debug), but scales very well. The exception is if you cheat and first
copy the structure you process (list/dictionary/queue/whatever): the
copy will lock in object creation/heap allocation, so you lose the
main property, lock-freeness. As you have to code fallbacks/retries
in, the lock-free code will probably be considerably bigger and
slower. In the main usage scenario of lock-free code, i.e.
multi-processor systems with a higher processor count, that doesn't
matter, because it scales very well with the number of processors: on
a 16-processor machine, the code can run on all 16 processors in
parallel instead of on 1 of the 16 at a time. But it might lead to
performance issues if you run it on 1 or 2 processors.
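
To give an idea of what "coding retries in" means, here is a minimal
sketch of a Treiber-style lock-free stack. It is not OpenSim code: the
class name LockFreeStack and its methods are made up for illustration,
and it is written in Java only because the pattern is compact there
(C# has Interlocked.CompareExchange for the same compare-and-swap
step). Note that every push still allocates a node, so whether the
whole thing is truly lock-free depends on the allocator underneath.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: a Treiber-style lock-free stack.
final class LockFreeStack<T> {
    // Immutable node; 'next' never changes after construction.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>(null);

    void push(T value) {
        while (true) {
            Node<T> current = head.get();
            Node<T> updated = new Node<>(value, current);
            // Succeeds only if no other thread changed head in the meantime.
            if (head.compareAndSet(current, updated)) {
                return;
            }
            // Lost the race: loop and retry against the new head.
        }
    }

    // Returns the most recently pushed value, or null if the stack is empty.
    T pop() {
        while (true) {
            Node<T> current = head.get();
            if (current == null) {
                return null;
            }
            if (head.compareAndSet(current, current.next)) {
                return current.value;
            }
            // Lost the race: loop and retry.
        }
    }
}
```

Even this tiny structure needs a retry loop in every operation, and
how often the loop spins depends entirely on timing between threads.
That is the part that gets hard to reason about once the structure is
something like a dictionary rather than a stack.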

So, I'm not sure if we really should make that move. If at all, I'm for
a very slow move to lock-free versions from a rather stable software
base (which we currently don't have in trunk), so errors that are
introduced during that move are more easily identifiable, with much
testing in-between. Even then, I'm absolutely sure we will get a lot
of Heisenbugs in the process, which will take us weeks to find.

In the (very) long run: +1
At the moment (given the current code quality in trunk and the number
of failures during "normal" operations): -5

Just my 0.02€

Cheers,
  Homer