[Opensim-dev] Large page kernel tweaks for reduced memory prefetch penalties in high performance computing applications on linux

Sun Jun 15 16:32:53 UTC 2008

Greetings,

Included below is a transcript of a recent Sunday morning discussion in re:
the mono/large pages stuff that's recently appeared on the radar.

As you will see, it is really more of a kernel-tweaking issue, although the
application does come into play in the way it requests memory. For our
purposes, 'application' in that last sentence is mono, not opensim.

Hope this provides some insights :)

Cheers
daTwitch

Oh, still researching how to take advantage of this end-to-end wrt our
application. Will update as I uncover more information.

<daTwitch> this is somewhat relevant:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6664521
<daTwitch> although I find the placating sycophantic tone of the bug
submitter makes me want to find him an emotional support group
<nebadon> lol
<daTwitch> the universe has surely reversed its polarity; computer science
(which is where I learned the term "egoless programming") is now saturated
with sensitivity; and Fine Arts, once considered the most subjective subject
under quantitative and qualitative analysis, is consumed with issues
relating to process, review, and open, formal critique
<daTwitch> at least, it was where I went to school lols
<daTwitch> this is also relevant, if somewhat more out of date.
Unfortunately, this looks almost identical to the things we're seeing, and
given the age of the issue, and that we're still seeing it now, doesnt give
me a lot of hope for getting the mono folk to take the problem on.
<daTwitch>
http://lists.ximian.com/pipermail/mono-list/2006-April/031312.html
<daTwitch> although it is encouraging that the OpenSolaris folk claim to
have fixed the problem with a patch to their O/S
<daTwitch> maybe someone should investigate how this performs under
opensolaris
<daTwitch> The discussion of TLBs (translation lookaside buffers, which are
crucial to page addressing in these memory models) in this article:
http://lwn.net/Articles/173882/ suggests that some kernel optimizations on
the server hardware in question can significantly improve the performance of
memory accesses in general for a given program - if I read it right, it
would indicate that we would need to build the correct optimizations into
the kernel, then compile mono locally and link it as described
<daTwitch> however, it may be that these effects would only be significant
on 64bit O/S
<daTwitch> that's about all I'm turning up of any significance
<nebadon> hmm
<nebadon> do you recall anything about compiling mono with --large page
<nebadon> or large pages
<nebadon> something like that
<nebadon> someone was talking about it on -dev a while back
<nebadon> they said it helped with memory stuff with mono
<nebadon> i looked yesterday but couldnt find anything
<nebadon> it wasnt  one of the regulars  on -dev channel though
<daTwitch> that's what all the foregoing stuff is about
<nebadon> they claimed it really helped a lot
<ckrinke> I dont see --large, but
http://www.mono-project.com/Compiling_Mono has mention of a special Xen
switch.
<nebadon> at the time i was less interested in the topic though
<nebadon> hmm
<daTwitch> we were discussing it with JustinCC at the office hours y/d too
<nebadon> yea
<nebadon> i brought it up then
<nebadon> i looked into it after the meeting
<nebadon> and couldnt find anything
<daTwitch> basically it comes down to this: the windows kernel allocates
memory far differently than a unix kernel
<daTwitch> and c#, as a result of being native to the platform, can take
advantage of that to compress data as it does garbage collection
<daTwitch> mono doesn't even try
<nebadon>
http://developer.amd.com/documentation/Articles/Pages/322006145.aspx
<daTwitch> compress is the term used, but is not technically correct
<nebadon> heres talk about its use in Java
<daTwitch> imagine your large page as a hard disk sector in need of
defragging
<daTwitch> in fact, that is an incredibly accurate metaphor
<daTwitch> windows defragments the data in memory
<daTwitch> mono doesnt
<nebadon> yea
<nebadon> i recall them saying that mono
<daTwitch> for the same reasons as a hard disk defrag and with similar
benefits
<nebadon> wastes the space because it requires more blocks than needed
or something
<nebadon> and lots of memory is wasted
<daTwitch> yes
<nebadon> unless large  pages is specified
<daTwitch> precisely
<daTwitch> ok, so we are long overdue making a mono with large pages then -
would that be a valid assertion?
<nebadon> yea
<nebadon> id like to see it tested
<nebadon> if we can figure out how
<daTwitch> I'm sooooo on it
<nebadon> sweet
<daTwitch> I can build anything
<nebadon> great
<daTwitch> as long as I have enough ram
<nebadon> i think it will be a big help to see where it takes us
<daTwitch> ok, I'll be busy for a bit
<nebadon> k
<nebadon> thanks man
<daTwitch> I'll keep y'all posted
<nebadon> great
<ckrinke> maybe its a ./configure option and is something like
--memory=large
<daTwitch> quite possibly
<nebadon> yea its something like that
<nebadon> i wish i took notes
<nebadon> but like i said at the time
<nebadon> i was less interested
<ckrinke> do the mono guys have an irc channel on FreeNode?
<daTwitch> no idea
<daTwitch> pulling source now
<daTwitch> will see if I can locate their IRC channel
<nebadon> cool
<daTwitch> gimpnet servers only at irc.gnome.org and irc.gimp.net
<daTwitch> #mono
<daTwitch> #monodev
<daTwitch> #mono-winforms
<daTwitch> #monodevelop
<daTwitch> #cocoa
<daTwitch> #mono-hispano
<daTwitch> #monouml
<daTwitch> #gendarme
<daTwitch> #mono-ally
<daTwitch> #moonlight
<daTwitch> moonlight == silverlight for mono
<nebadon> nice
<daTwitch> ok source is down, back to work
<Ter_Afk> moonlight == loonmight?
<daTwitch> heh
<daTwitch> I dont even know what silverlight is, but I've heard discussion
of it, so it was a point of interest
<Ter_Afk> Microsoft's answer to Adobe Flash
<daTwitch> ok, no mention whatsoever of a --large-pages option to the
configuration
<daTwitch> we have --large-heap
<daTwitch> large_code
<Ter_Afk> large_fire?
<Ter_Afk> k, nuf with the word jokes.
<daTwitch> does anyone know if it was large-pages, or large_pages?
<nebadon> i dont recall
<nebadon> i just remember the term large  pages being used some how
<daTwitch> lol googling large pages turns up everything from beano to kirk
douglas
<nebadon> lol
<nebadon> yea
<nebadon> i had no luck on google
<nebadon> nor the mono website
<daTwitch> actually, I'm starting to think large_pages refers to a kernel
setting
<nebadon> well they said Compile Mono from source
<nebadon> with the large pages switch
<nebadon> i do remember that
<nebadon> its probably related more to the compiler
<nebadon> than mono
<nebadon> so maybe we are looking in the wrong  places
<daTwitch> hmmm
<daTwitch> that's a clue
<daTwitch> ok, I got configure to execute to completion very cleanly
<daTwitch> gotta take 5 tho
<daTwitch> bbiaf
<nebadon> ok
<daTwitch> ah needs mah gcc 4.2 doc
<daTwitch> The Virtual Memory (VM) Subsystem
<daTwitch> Most modern computer architectures support more than one memory
page size. To illustrate, the IA-32 architecture supports either 4KB or 4MB
pages. The 2.4 Linux kernel used to only utilize large pages for mapping the
kernel image. In general, large page usage is primarily intended to provide
performance improvements for high performance computing applications, as
well as database applications that have large working sets. Any memory
access intensive application that utilizes large amounts of virtual memory
may obtain performance improvements by using large pages. Linux 2.6 can
utilize 2MB or 4MB large pages, AIX uses 16MB large pages, whereas Solaris
large pages are 4MB in size. The large page performance improvements are
attributable to reduced translation lookaside buffer (TLB) misses. Large
pages further improve the process of memory prefetching, by eliminating the
necessity to restart prefetch operations on 4KB boundaries.
<daTwitch> from: http://aplawrence.com/Linux/linux26_features.html
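The page-size figures quoted above can be turned into rough arithmetic. The following is a minimal Python sketch, not taken from the discussion: the 512MB working set and the 64-entry TLB are made-up illustrative values, and the helper names pages_needed and tlb_reach are hypothetical. It also assumes the same number of TLB entries for both page sizes, which real processors often do not provide.

```python
KB = 1024
MB = 1024 * KB


def pages_needed(working_set_bytes, page_size_bytes):
    """Number of pages required to map a working set (rounded up)."""
    return -(-working_set_bytes // page_size_bytes)


def tlb_reach(entries, page_size_bytes):
    """Bytes of memory the TLB can translate without a miss."""
    return entries * page_size_bytes


# Illustrative assumptions only, not measurements of any real server.
WORKING_SET = 512 * MB
TLB_ENTRIES = 64

if __name__ == "__main__":
    for label, page_size in (("4KB", 4 * KB), ("4MB", 4 * MB)):
        pages = pages_needed(WORKING_SET, page_size)
        reach_kb = tlb_reach(TLB_ENTRIES, page_size) // KB
        print(f"{label} pages: {pages} pages to map, TLB reach {reach_kb} KB")
```

Under these assumptions, the larger page size cuts the number of mappings by a factor of 1024 and raises the TLB reach by the same factor, which is the mechanism behind the reduced TLB misses described in the quoted article.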
<daTwitch> it's a feature that must have support in the kernel, at the very
least
<daTwitch> though I can find neither build-time nor runtime configuration
points that take advantage of it in either gcc or mono at this point
<nebadon> hmm
<daTwitch> still looking though ;)
<nebadon> sounds like the problem though
<daTwitch> yes, think we are in the process of pinning it down
<nebadon> nice
<daTwitch> even if we arent doing things to precisely duplicate how things
go under c#, this should yield a performance gain that compensates
<daTwitch> I keep seeing the figure 10%
<nebadon> yea thats a good start
<daTwitch> that is significant when we consider how much we pay in memory
per-av
<daTwitch> here is some additional good background info, but still does not
complete the picture:
http://findarticles.com/p/articles/mi_m0ISJ/is_2_44/ai_n14793331/pg_10
<daTwitch> mysql can also benefit heavily from the use of large pages
<daTwitch> combining the benefits of mysql on large pages with our various
servers on large pages (actually, the UGAIM could possibly take a
performance *hit* from large pages) might yield even greater than 10%
performance increase
<daTwitch> probably the large pages switch to start with is a kernel
boot-time config point
<nebadon> nice
<nebadon> i would think though a program like mysql would already be
compiled to such a thing
<daTwitch> well, no, not necessarily
<nebadon> so the goal i assume
<nebadon> is 4mb page size?
<nebadon> vs 4k
<daTwitch> the underlying kernel has to be configured to support it, and if
the application isn't sufficiently demanding, it actually will take a
performance hit
<daTwitch> yes
<daTwitch> 4m. I think 16mb is also supported in 2.6+ kernels, but I doubt we
need it yet
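One cost of large pages that is easy to put numbers on is rounding: memory is handed out in whole pages, so a request that does not fill its last page leaves the remainder unused. The following is a minimal Python sketch, assuming a hypothetical 5MB request. The helper name rounded_allocation is illustrative, and rounding may be only one of several reasons a lightly loaded application loses out.

```python
KB = 1024
MB = 1024 * KB


def rounded_allocation(request_bytes, page_size_bytes):
    """Return (bytes actually reserved, bytes left unused) for one request."""
    pages = -(-request_bytes // page_size_bytes)  # ceiling division
    reserved = pages * page_size_bytes
    return reserved, reserved - request_bytes


if __name__ == "__main__":
    request = 5 * MB  # hypothetical request size
    for label, page_size in (("4KB", 4 * KB), ("4MB", 4 * MB)):
        reserved, unused = rounded_allocation(request, page_size)
        print(f"{label} pages: reserved {reserved // KB} KB, unused {unused // KB} KB")
```

With 4KB pages the 5MB request fits exactly, while with 4MB pages it occupies two pages and leaves 3MB unused. An application making many small or oddly sized requests pays that cost repeatedly.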
<nebadon> yea it sounds to me like any kernel thats 2.6 its already enabled?
<nebadon> but the app needs to be told to use it?
<nebadon> its amazing how useless google is for this topic
<nebadon> hehe
<daTwitch> well, it's a bit obscure, unless you know what you're looking for
<daTwitch> this is really about kernel tweaking, not so much mono
<daTwitch> the kernel needs to be told to support it at boot time - perhaps
even needs to be compiled for it
<daTwitch> but the support is in the source
<daTwitch> plus, not too many folks need to do this
<daTwitch> only high perf types with really demanding software
<daTwitch> (that would be us lols)
<daTwitch> the app does have to be told to utilise it somehow though
<nebadon> yea
<nebadon> opensim is definitely more demanding than say apache
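Whether a running Linux kernel has a large page pool can be checked from /proc/meminfo, which on kernels built with hugetlb support reports fields such as HugePages_Total, HugePages_Free and Hugepagesize. A minimal Python sketch of reading them follows. The sample text and the function names are illustrative assumptions, not output from any server discussed here.

```python
def parse_hugepage_info(meminfo_text):
    """Extract the huge page fields from /proc/meminfo-style text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        fields = rest.split()
        if not sep or not fields:
            continue
        if key.startswith("HugePages_") or key == "Hugepagesize":
            info[key] = int(fields[0])  # Hugepagesize is reported in kB
    return info


def read_system_hugepage_info(path="/proc/meminfo"):
    """Read the live values; returns {} if the file is unavailable."""
    try:
        with open(path) as handle:
            return parse_hugepage_info(handle.read())
    except OSError:
        return {}


# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE = """MemTotal:      4046380 kB
HugePages_Total:    20
HugePages_Free:     20
Hugepagesize:     2048 kB
"""

if __name__ == "__main__":
    print("sample:", parse_hugepage_info(SAMPLE))
    print("this machine:", read_system_hugepage_info())
```

An empty result, or a HugePages_Total of 0, would suggest that the kernel either lacks large page support or has no pages reserved. The pool is typically sized with the hugepages= boot parameter or through /proc/sys/vm/nr_hugepages. How mono would then be made to request memory from that pool is the part the discussion leaves open.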
-- 
===================================
The wind
scours the earth for prayers
The night obscures them