[Opensim-dev] Large page kernel tweaks for reduced memory prefetch penalties in high performance computing applications on linux

Mon Jun 16 03:45:25 UTC 2008

Thanks, Mariusz!

Your response to my post was a very insightful and informative exposition. I
eagerly await your forthcoming, more detailed post on this subject.

To be honest, I had not so much set out to address the garbage collection
problem with large pages; rather, it was mentioned to me in discussion with
others that someone had obtained increases in region server performance on
Linux by 'compiling mono with large page support'.

Based on this shred of information, I set out to determine what we could do
to address certain stability problems on Linux. I was unaware that similar
issues had been seen on *nix flavors other than Linux. Unfortunately, we have
very little information on the issue.

Perhaps this is a sign of growing stability in other areas of the
application space.

While researching large pages on the web this morning, I saw where we might
gain some performance on Linux by using large pages; however, parts of the
puzzle were still missing (the research showed that certain memory
allocation methods took advantage of large pages and others did not). This
suggests that the application (which I assume would be the Mono VM) would
need to do something special to take advantage of large page support. That
is the point where the research became unproductive, as I was unable to
determine precisely how this might be accomplished.
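
The kernel half of the puzzle can at least be checked from user space. On
Linux 2.6, huge pages come from a pool the administrator reserves (for
example through the vm.nr_hugepages sysctl), and an application only gets
them if it asks for them explicitly, e.g. by mapping a file on a hugetlbfs
mount or by calling shmget() with SHM_HUGETLB. The following is a minimal
Python sketch, not something Mono or OpenSim provides, that parses
/proc/meminfo to see whether such a pool exists; the helper names
parse_hugepage_info and huge_pages_usable are hypothetical.

```python
# Minimal sketch: check whether a Linux kernel reports a usable huge page pool
# by parsing /proc/meminfo-style text. The field names are the ones Linux 2.6
# kernels report; the helper names are hypothetical, for illustration only.
import re
from pathlib import Path

HUGE_FIELDS = ("HugePages_Total", "HugePages_Free", "Hugepagesize")


def parse_hugepage_info(meminfo_text: str) -> dict:
    """Return the huge page fields found in /proc/meminfo-style text.

    Counts are plain integers; Hugepagesize is converted from kB to bytes.
    Fields that are absent (e.g. a kernel built without hugetlbfs) are omitted.
    """
    info = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if not m or m.group(1) not in HUGE_FIELDS:
            continue
        name, value = m.group(1), int(m.group(2))
        info[name] = value * 1024 if name == "Hugepagesize" else value
    return info


def huge_pages_usable(info: dict) -> bool:
    """True if the kernel reports a huge page size and free pages in its pool."""
    return info.get("Hugepagesize", 0) > 0 and info.get("HugePages_Free", 0) > 0


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_hugepage_info(meminfo.read_text())
        print(info, "usable:", huge_pages_usable(info))
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

A zero HugePages_Free count only means the pool is empty or not reserved;
it says nothing about whether the application would actually use the pages.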

In light of what you've said in your post, I suspect that a very busy region
with a lot of asset overhead would benefit both from large page usage and
from the optimizations you mention; object reuse springs immediately to mind.
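
Object reuse of the kind Mariusz describes usually takes the form of a pool:
released objects go onto a free list and are handed out again instead of
being allocated fresh, which cuts allocation and garbage collection work.
OpenSim is written in C#, so the Python sketch below only illustrates the
pattern; ObjectPool, Packet and reset_packet are hypothetical names, not
OpenSim classes.

```python
# Minimal sketch of object reuse ("recycling") via a free-list pool.
# Packet is a hypothetical stand-in for any frequently allocated object.
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Hands out recycled instances instead of allocating new ones."""

    def __init__(self, factory: Callable[[], T], reset: Callable[[T], None],
                 max_size: int = 64) -> None:
        self._factory = factory    # creates a fresh instance when the pool is empty
        self._reset = reset        # clears per-use state before an instance is reused
        self._max_size = max_size  # caps how many idle instances are retained
        self._free: List[T] = []
        self.created = 0           # number of instances ever allocated

    def acquire(self) -> T:
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, obj: T) -> None:
        if len(self._free) < self._max_size:
            self._reset(obj)
            self._free.append(obj)
        # otherwise drop the reference and let the garbage collector reclaim it


class Packet:
    def __init__(self) -> None:
        self.payload = bytearray(1024)  # buffer is reused as-is between uses
        self.length = 0


def reset_packet(p: Packet) -> None:
    p.length = 0  # only the logical length is reset; the buffer is not zeroed


pool = ObjectPool(Packet, reset_packet)
for _ in range(1000):
    pkt = pool.acquire()
    pkt.length = 42
    pool.release(pkt)
print(pool.created)  # prints 1: one instance, reused on every iteration
```

The max_size cap is a deliberate trade-off: an unbounded pool never gives
memory back, which works against the footprint concerns raised in the quoted
message.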

I am growing confident that we are drawing down on this issue, and will
bring solutions to light in the near term.

Cheers!
James

On Sun, Jun 15, 2008 at 6:52 PM, Mariusz Nowostawski <
mariusz at nowostawski.org> wrote:

> Hi all,
>
> I am happy that issues related to performance, efficiency and stability
> of OpenSim on various platforms are getting more attention recently and
> that many folks are looking into it.
>
> Together with 3Di (Japan) we are looking into those issues too, and will
> be more than happy to share our results and solutions in due time. We
> are working to put a summary of our findings online, and to add
> monitoring of certain aspects of OpenSim performance and memory
> management to our nightly build system, so it will be easier for the
> community to see what is working where, on which platforms and with
> what results. I'll keep you updated on that.
>
> As for large page support: this is a very architecture-specific issue.
> Different Intel, AMD and SPARC MMUs support different page sizes, and
> various OSes can in turn be compiled with support for a specific page
> size, so there is no single solution for every platform/OS combination.
> The most common denominator seems to be 4k and 8k pages, and most
> architecture/OS configurations use those page sizes by default. The
> Solaris kernel can support multiple page sizes, although it is a bit
> trickier than it sounds; on SPARC, for example, not everything can be
> easily negotiated with the kernel. It all depends on which architecture
> a given OS runs on.
>
> Poor memory management and a large memory footprint are not going to be
> solved by recompiling something with large page support. On the
> contrary, the footprint and memory usage would most likely get much
> worse. Large page support is usually meant for apps that do their own
> memory management, and it usually increases the overall memory
> footprint while improving performance. To really benefit from large
> pages, the software must take exclusive care of memory management
> itself, and in the case of OpenSim that is a bit tricky. Let me explain.
> Memory management (say, when programming in C) is normally difficult:
> one needs to make lots of decisions and trade-offs between three
> general concerns: a) memory footprint, b) performance, c)
> maintainability. To make things really fast and small, one needs to
> re-implement memory management herself and take care of everything;
> this puts a strain on maintainability but keeps both memory footprint
> and performance at their best. What usually happens is that one uses
> the standard libc to keep maintainability high and makes a trade-off:
> fast and big, or slow and small.
>
> With software running on VMs, there are many more levels to be
> considered when managing memory:
> a) application level
> b) VM level
> c) OS level
> In the case of OpenSim, one has to consider the following:
>
> 1. Application level: object instance allocation and de-allocation,
> array and collection management, hashes, large memory management,
> database queries, etc. At the application level, "normal" C#
> programmers can do a lot of good things, which can dramatically boost
> performance and reduce the memory footprint. For example,
> re-implementing some of the collection classes usually gives good
> results. Reducing the number of new objects created, and "recycling"
> objects inside the application instead of creating new instances and
> letting the system garbage-collect unused ones, can also dramatically
> improve both performance and memory footprint. And so on: good
> programming practices and care over memory usage and memory management
> can make things much better, especially on systems running on VMs. Even
> little things like avoiding unnecessary boxing and using native data
> types efficiently all contribute.
>
> 2. VM level (be it Mono or any other VM): performance here can be tuned
> through many parameters, but the biggest contributor is the garbage
> collector itself. Different garbage collectors manage memory in
> different ways, and this can substantially change how applications
> behave. From our limited experience to date, Mono behaves completely
> differently with different GCs: stability, performance and footprint
> are all highly sensitive to the GC used. We are getting quite good
> results with the latest Boehm GC, but things can be tuned even further.
>
> 3. OS level, by which I mean:
> - memory management not handled by the app itself or by the VM,
> including large page support,
> - I/O,
> - thread management,
> - IPC (especially shared memory),
> - and networking.
> All of these can be tweaked. Normally they are designed to be generic
> and to handle a wide range of cases and apps; in the case of OpenSim
> alone, they can be improved and tuned for that particular purpose.
>
> This is all pretty complex. There is no silver bullet that will
> magically make OpenSim run faster with a small memory footprint. But
> there are many areas where improvements can be made, and it would be
> desirable to have more targeted efforts towards that. Over here we are
> trying to draw up a roadmap of all these various aspects, and I am
> grateful for the good discussion and contributions from many people
> that have put things in perspective.
>
> For any of you doing any testing: please take note of the exact
> kernel, Mono version and GC used, and post these together with your
> results/observations. This will help in replicating scenarios and
> digging into the causes of various behaviours. For one thing, we were
> unable to replicate most of the problems with Mono on our own Linux
> setups. Solaris on SPARC, on the other hand, is highly sensitive to the
> exact version and GC used: depending on how the compile parameters are
> tweaked, we have seen everything from complete Mono crashes to the
> system running well. As said earlier, we want to put a report together
> to gather all of this in a single place, so others can compare it with
> what they observe. I'll keep you posted on that.
>
> --
> cheers
> Mariusz
>
> James Stallings II wrote:
> > Greetings,
> >
> > Included below is a transcript of a recent Sunday morning discussion
> > re: the Mono/large pages stuff that's recently appeared on the radar.
> >
> > As you will see, it is really more of a kernel-tweaking issue, although
> > the application does come into play in the way it requests memory. For
> > our purposes, 'application' in that last sentence is Mono, not OpenSim.
> >
> > Hope this provides some insights :)
> >
> > Cheers
> > daTwitch
> >
> > Oh, still researching how to take advantage of this end-to-end wrt our
> > application. Will update as I uncover more information.
> >
> >
> >
> > <daTwitch> this is somewhat relevant:
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6664521
> > <daTwitch> although I find the placating sycophantic tone of the bug
> > submitter makes me want to find him an emotional support group
> > <nebadon> lol
> > <daTwitch> the universe has surely reversed its polarity; computer science
> > (which is where I learned the term "egoless programming") is now saturated
> > with sensitivity; and Fine Arts, once considered the most subjective subject
> > under quantitative and qualitative analysis, is consumed with issues
> > relating to process, review, and open, formal critique
> > <daTwitch> at least, it was where I went to school lols
> > <daTwitch> this is also relevant, if somewhat more out of date.
> > Unfortunately, this looks almost identical to the things we're seeing, and
> > given the age of the issue, and that we're still seeing it now, it doesn't
> > give me a lot of hope for getting the mono folk to take the problem on.
> > <daTwitch>
> > http://lists.ximian.com/pipermail/mono-list/2006-April/031312.html
> > <daTwitch> although it is encouraging that the OpenSolaris folk claim to
> > have fixed the problem with a patch to their O/S
> > <daTwitch> maybe someone should investigate how this performs under
> > opensolaris
> > <daTwitch> The discussion of TLBs (translation lookaside buffers, which are
> > crucial to page addressing in these memory models) in this article:
> > http://lwn.net/Articles/173882/ suggests that some kernel optimizations on
> > the server hardware in question can significantly improve the performance
> > of memory accesses in general for a given program - if I read it right, it
> > would indicate that we would need to build the correct optimizations into
> > the kernel, then compile mono locally and link it as described
> > <daTwitch> however, it may be that these effects would only be significant
> > on 64bit O/S
> > <daTwitch> that's about all I'm turning up of any significane
> > <daTwitch> *significance
> > <nebadon> hmm
> > <nebadon> do you recall anything about compiling mono with --large page
> > <nebadon> or large pages
> > <nebadon> something like that
> > <nebadon> someone was talking about it on -dev a while back
> > <nebadon> they said it helped with memory stuff with mono
> > <nebadon> i looked yesterday but couldnt find anything
> > <nebadon> it wasnt  one of the regulars  on -dev channel though
> > <daTwitch> that's what all the foregoing stuff is about
> > <nebadon> they claimed it really helped alot
> > <ckrinke> I dont see --large, but
> > http://www.mono-project.com/Compiling_Mono has mention of a special Xen
> > switch.
> > <nebadon> at the time i was less interested in the topic though
> > <nebadon> hmm
> > <daTwitch> we were discussing it with JustinCC at the office hours y/d too
> > <nebadon> yea
> > <nebadon> i brought it up then
> > <nebadon> i looked into it after the meeting
> > <nebadon> and couldnt find anything
> > <daTwitch> basically it comes down to this: the windows kernel allocates
> > memory far differently than a unix kernel
> > <daTwitch> and c#, as a result of being native to the platform, can take
> > advantage of that to compress data as it does garbage collection
> > <daTwitch> mono doesn't even try
> > <nebadon>
> > http://developer.amd.com/documentation/Articles/Pages/322006145.aspx
> > <daTwitch> compress is the term used, but is not technically correct
> > <nebadon> heres talk about its use in Java
> > <daTwitch> imagine your large page as a hard disk sector in need of
> > defragging
> > <daTwitch> in fact, that is an incredibly accurate metaphor
> > <daTwitch> windows defragments the data in memory
> > <daTwitch> mono doesnt
> > <nebadon> yea
> > <nebadon> i recall them saying that mono
> > <daTwitch> for the same reasons as a hard disk defrag and with similar
> > benefits
> > <nebadon> wastes the space because it requires more blocks than needed
> > or something
> > <nebadon> and lots of memory is wasted
> > <daTwitch> yes
> > <nebadon> unless large  pages is specified
> > <daTwitch> precisely
> > <daTwitch> ok, so we are long overdue making a mono with large pages then -
> > would that be a valid assertion?
> > <nebadon> yea
> > <nebadon> id like to see it tested
> > <nebadon> if we can figure out how
> > <daTwitch> I'm sooooo on it
> > <nebadon> sweet
> > <daTwitch> I can build anything
> > <nebadon> great
> > <daTwitch> as long as I have enough ram
> > <nebadon> i think it will be a big help to see where it takes us
> > <daTwitch> ok, I'll be busy for a bit
> > <nebadon> k
> > <nebadon> thanks man
> > <daTwitch> I'll keep y'all posted
> > <nebadon> great
> > <ckrinke> maybe its a ./configure option and is something like
> > --memory=large
> > <daTwitch> quite possibly
> > <nebadon> yea its something like that
> > <nebadon> i wish i took notes
> > <nebadon> but like i said at the time
> > <nebadon> i was less interested
> > <ckrinke> do the mono guys have an irc channel on FreeNode?
> > <daTwitch> no idea
> > <daTwitch> pulling source now
> > <daTwitch> will see if I can locate their IRC channel
> > <nebadon> cool
> > <daTwitch> gimpnet servers only at irc.gnome.org and irc.gimp.net
> > <daTwitch> #mono
> > <daTwitch> #monodev
> > <daTwitch> #mono-winforms
> > <daTwitch> #monodevelop
> > <daTwitch> #cocoa
> > <daTwitch> #mono-hispano
> > <daTwitch> #monouml
> > <daTwitch> #gendarme
> > <daTwitch> #mono-ally
> > <daTwitch> #moonlight
> > <daTwitch> moonlight == silverlight for mono
> > <nebadon> nice
> > <daTwitch> ok source is down, back to work
> > <Ter_Afk> moonlight == loonmight?
> > <daTwitch> heh
> > <daTwitch> I dont even know what silverlight is, but I've heard discussion
> > of it, so it was a point of interest
> > <Ter_Afk> Microsoft's answer to Adobe Flash
> > <daTwitch> ok, no mention whatsoever of a --large-pages option to the
> > configuration
> > <daTwitch> we have --large-heap
> > <daTwitch> large_code
> > <Ter_Afk> large_fire?
> > <Ter_Afk> k, nuf with the word jokes.
> > <daTwitch> does anyone know if it was large-pages, or large_pages?
> > <nebadon> i dont recall
> > <nebadon> i just remember the term large  pages being used some how
> > <daTwitch> lol googling large pages turns up everything from beano to kirk
> > douglas
> > <nebadon> lol
> > <nebadon> yea
> > <nebadon> i had no luck on google
> > <nebadon> nor the mono website
> > <daTwitch> actually, I'm starting to think large_pages refers to a kernel
> > setting
> > <nebadon> well they said Compile Mono from source
> > <nebadon> with the large pages switch
> > <nebadon> i do remember that
> > <nebadon> its probably related more to the compiler
> > <nebadon> than mono
> > <nebadon> so maybe we are looking in the wrong  places
> > <daTwitch> hmmm
> > <daTwitch> that's a clue
> > <daTwitch> ok, I got configure to execute to completion very cleanly
> > <daTwitch> gotta take 5 tho
> > <daTwitch> bbiaf
> > <nebadon> ok
> > <daTwitch> ah needs mah gcc 4.2 doc
> > <daTwitch> The Virtual Memory (VM) Subsystem
> > <daTwitch> Most modern computer architectures support more than one memory
> > page size. To illustrate, the IA-32 architecture supports either 4KB or 4MB
> > pages. The 2.4 Linux kernel used to only utilize large pages for mapping the
> > kernel image. In general, large page usage is primarily intended to provide
> > performance improvements for high performance computing applications, as
> > well as database applications that have large working sets. Any memory
> > access intensive application that utilizes large amounts of virtual memory
> > may obtain performance improvements by using large pages. Linux 2.6 can
> > utilize 2MB or 4MB large pages, AIX uses 16MB large pages, whereas Solaris
> > large pages are 4MB in size. The large page performance improvements are
> > attributable to reduced translation lookaside buffer (TLB) misses. Large
> > pages further improve the process of memory prefetching, by eliminating the
> > necessity to restart prefetch operations on 4KB boundaries.
> > <daTwitch> from: http://aplawrence.com/Linux/linux26_features.html
> > <daTwitch> it's a feature that must have support in the kernel, at the very
> > least
> > <daTwitch> though I can find neither build-time nor runtime configuration
> > points that take advantage of it in either gcc or mono at this point
> > <nebadon> hmm
> > <daTwitch> still looking though ;)
> > <nebadon> sounds like the problem though
> > <daTwitch> yes, think we are in the process of pinning it down
> > <nebadon> nice
> > <daTwitch> even if we arent doing things to precisely duplicate how things
> > go under c#, this should yield a performance gain that compensates
> > <daTwitch> I keep seeing the figure 10%
> > <nebadon> yea thats a good start
> > <daTwitch> that is significant when we consider how much we pay in memory
> > per-av
> > <daTwitch> here is some additional good background info, but still does not
> > complete the picture:
> > http://findarticles.com/p/articles/mi_m0ISJ/is_2_44/ai_n14793331/pg_10
> > <daTwitch> mysql can also benefit heavily from the use of large pages
> > <daTwitch> combining the benefits of mysql on large pages with our various
> > servers on large pages (actually, the UGAIM could possibly take a
> > performance *hit* from large pages) might yield even greater than 10%
> > performance increase
> > <daTwitch> probably the large pages switch to start with is a kernel
> > boot-time config point
> > <nebadon> nice
> > <nebadon> i would think though a program like mysql would already be
> > compiled to such a thing
> > <daTwitch> well, no, not necessarily
> > <nebadon> so the goal i assume
> > <nebadon> is 4mb page size?
> > <nebadon> vs 4k
> > <daTwitch> the underlying kernel has to be configured to support it, and if
> > the application isn't sufficiently demanding, it actually will take a
> > performance hit
> > <daTwitch> yes
> > <daTwitch> 4m. I think 16mb is also supported in 2.6+ kernels, but I doubt
> > we need it yet
> > <nebadon> yea it sounds to me like any kernel thats 2.6 its already enabled?
> > <nebadon> but the app needs to be told to use it?
> > <nebadon> its amazing how useless google is for this topic
> > <nebadon> hehe
> > <daTwitch> well, it's a bit obscure, unless you know what you're looking for
> > <daTwitch> this is really about kernel tweaking, not so much mono
> > <daTwitch> the kernel needs to be told to support it at boot time - perhaps
> > even needs to be compiled for it
> > <daTwitch> but the support is in the source
> > <daTwitch> plus, not too many folks need to do this
> > <daTwitch> only high perf types with really demanding software
> > <daTwitch> (that would be us lols)
> > <daTwitch> the app does have to be told to utilise it somehow though
> > <nebadon> yea
> > <nebadon> opensim is definitely more demanding than say apache
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Opensim-dev mailing list
> > Opensim-dev at lists.berlios.de
> > https://lists.berlios.de/mailman/listinfo/opensim-dev
>
>
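
The quoted excerpt attributes the large page gains to fewer TLB misses. The
arithmetic behind that claim is simple: a TLB with N entries can map N times
the page size without a miss, so moving from 4KB pages to 2MB or 4MB pages
extends that reach by a factor of 512 or 1024. The Python sketch below works
through it; the 64-entry TLB and the 1 GiB working set are hypothetical
figures, not measurements from any OpenSim server, and the sketch says
nothing about the 10% figure mentioned in the log.

```python
# Back-of-the-envelope TLB arithmetic (illustrative only).
# ENTRIES and WORKING_SET are hypothetical values, not measured figures;
# real TLB sizes vary by CPU and often differ for large-page entries.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map at once, i.e. without a TLB miss."""
    return entries * page_size


def pages_needed(working_set: int, page_size: int) -> int:
    """Pages (one translation each) needed to map a working set, rounded up."""
    return -(-working_set // page_size)  # ceiling division on integers


ENTRIES = 64           # hypothetical data-TLB size
WORKING_SET = 1 * GIB  # hypothetical working set of a busy region server

for label, size in (("4KB", 4 * KIB), ("2MB", 2 * MIB), ("4MB", 4 * MIB)):
    reach_kib = tlb_reach(ENTRIES, size) // KIB
    pages = pages_needed(WORKING_SET, size)
    print(f"{label} pages: TLB reach {reach_kib} KiB, {pages} pages to map 1 GiB")
```

Whether a larger reach actually speeds a program up depends on its access
pattern; as Mariusz points out, large pages can also inflate the memory
footprint.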

-- 
===================================
The wind
scours the earth for prayers
The night obscures them