<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: more linux memory debugging</title>
	<atom:link href="http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/</link>
	<description></description>
	<lastBuildDate>Mon, 26 Jul 2010 12:19:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: netgear support fail at btm.geek</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-327</link>
		<dc:creator>netgear support fail at btm.geek</dc:creator>
		<pubDate>Mon, 19 May 2008 21:54:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-327</guid>
		<description>[...] been trying to deal with a linux appliance&#8217;s memory problems for a while, here, and here. Because Netgear/Infrant&#8217;s build system removes binaries post-dpkg, it&#8217;s not [...]</description>
		<content:encoded><![CDATA[<p>[...] been trying to deal with a linux appliance&#8217;s memory problems for a while, here, and here. Because Netgear/Infrant&#8217;s build system removes binaries post-dpkg, it&#8217;s not [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: btm</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-310</link>
		<dc:creator>btm</dc:creator>
		<pubDate>Fri, 16 May 2008 16:28:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-310</guid>
		<description>@Tom

Raidiator is the linux distribution itself. Best guess at what&#039;s leaking are these kernel modules I can&#039;t identify:

padre_nand_flash        4164  0 
padre_i2c_hwmon        14000  0 
padre_p0_led_button    17496  0 
padre_des               4328  0 
padre_gmac             74584  0 
padre_io              543984  0 
padre_i2c_rtc           8948  0 
padre_i2c              15960  3 padre_i2c_hwmon,padre_p0_led_button,padre_i2c_rtc

Probably for the custom hardware. Sure I could remove them and see what happens, but I&#039;m really not into debugging kernel modules unless I have to. Although I&#039;m not getting much feedback from netgear so I may have to.

If it was a user level daemon, killing it would free up the leaked memory. I&#039;ve taken the secondary nas down to as few daemons as possible and the memory usage hasn&#039;t dropped significantly.

It&#039;s really not hardware that I would want to try bootstrapping another distribution on to, and losing the current configs would be a PITA anyways.</description>
		<content:encoded><![CDATA[<p>@Tom</p>
<p>Raidiator is the linux distribution itself. Best guess at what&#8217;s leaking are these kernel modules I can&#8217;t identify:</p>
<p>padre_nand_flash        4164  0<br />
padre_i2c_hwmon        14000  0<br />
padre_p0_led_button    17496  0<br />
padre_des               4328  0<br />
padre_gmac             74584  0<br />
padre_io              543984  0<br />
padre_i2c_rtc           8948  0<br />
padre_i2c              15960  3 padre_i2c_hwmon,padre_p0_led_button,padre_i2c_rtc</p>
<p>Probably for the custom hardware. Sure I could remove them and see what happens, but I&#8217;m really not into debugging kernel modules unless I have to. Although I&#8217;m not getting much feedback from netgear so I may have to.</p>
<p>If it was a user level daemon, killing it would free up the leaked memory. I&#8217;ve taken the secondary nas down to as few daemons as possible and the memory usage hasn&#8217;t dropped significantly.</p>
<p>It&#8217;s really not hardware that I would want to try bootstrapping another distribution on to, and losing the current configs would be a PITA anyways.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom H</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-306</link>
		<dc:creator>Tom H</dc:creator>
		<pubDate>Fri, 16 May 2008 00:22:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-306</guid>
		<description>Hi Bryan... wow, sounds like you&#039;re having fun.  One approach I use when troubleshooting memory leaks is to start shutting down everything that isn&#039;t critical for the system to operate.  Kernel modules, drivers, anything.  I assume you have already done this.

If you turn off Raidiator does it still leak memory?  How about turning everything off for a bit and just letting the kernel modules load until you have a basic system.  Then start your memory monitoring tools and take a snapshot.  Piece by piece, start up each memory consuming process or library manually.  Keep taking snapshots of memory... eventually this should lead you to a culprit somewhere... but it might take quite a while.

It seems this system is designed to run Netgear&#039;s tweaked OS.  If you loaded Netgear&#039;s OS would it still leak memory?

You get what you pay for. ;)

Good Luck.</description>
		<content:encoded><![CDATA[<p>Hi Bryan&#8230; wow, sounds like you&#8217;re having fun.  One approach I use when troubleshooting memory leaks is to start shutting down everything that isn&#8217;t critical for the system to operate.  Kernel modules, drivers, anything.  I assume you have already done this.</p>
<p>If you turn off Raidiator does it still leak memory?  How about turning everything off for a bit and just letting the kernel modules load until you have a basic system.  Then start your memory monitoring tools and take a snapshot.  Piece by piece, start up each memory consuming process or library manually.  Keep taking snapshots of memory&#8230; eventually this should lead you to a culprit somewhere&#8230; but it might take quite a while.</p>
<p>It seems this system is designed to run Netgear&#8217;s tweaked OS.  If you loaded Netgear&#8217;s OS would it still leak memory?</p>
<p>You get what you pay for. <img src='http://blog.loftninjas.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Good Luck.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: btm</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-296</link>
		<dc:creator>btm</dc:creator>
		<pubDate>Tue, 13 May 2008 17:15:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-296</guid>
		<description>@Mike

Thanks for all the information!

The proftpd binary change was due to a &#039;firmware&#039; update. The annoying thing about infrant/netgear raidiator is that while it started out based on sarge, they do a bunch of un-Debian-like things. Rather than rebuilding core packages, their build system seems to remove files and trees that they don&#039;t want after the build. I suppose this is a lot less work, but while the proftpd package is custom built (1.3.0-9.netgear6), they didn&#039;t update the package when they updated the binary.

Granted, their whole market seems to be SOHO, so they don&#039;t care much about the types that would care about these things. There&#039;s a &lt;a href=&quot;http://www.readynas.com/forum/viewtopic.php?f=10&amp;t=17105&quot; rel=&quot;nofollow&quot;&gt;thread&lt;/a&gt; I started about proftpd breaking and a &lt;a href=&quot;http://www.readynas.com/download/addons/4.01/Fix_ProFTPD_ADSAuth_0.1.bin&quot; rel=&quot;nofollow&quot;&gt;patch&lt;/a&gt;, and it should be fixed in &lt;a href=&quot;http://kbserver.netgear.com/release_notes/d103170.asp&quot; rel=&quot;nofollow&quot;&gt;4.01c1-p2&lt;/a&gt;, which I don&#039;t think has been pushed out yet, as none of my gear has wanted to automatically upgrade to it, so I&#039;ve had to use the patch.

I&#039;ve been running a few diagnostic commands periodically via shell scripts and saving their output.

apps.value via the munin script went from 51478528 to 91701248 between 10:19 and 15:40 on Friday.

I restarted the box and shut down munin via the init.d script yesterday and apps.value went from 50413568 to 53477376 between 15:33 and 09:43 today.

I would think that if perl was leaking memory, it would be reclaimed when the process died, whereas something like a kernel module leaking would be more likely as you suggested because it&#039;s always loaded until you reboot.

There are a number of modules loaded that appear custom. I have to track down where they are, because the module names don&#039;t match anything in /lib/modules/*</description>
		<content:encoded><![CDATA[<p>@Mike</p>
<p>Thanks for all the information!</p>
<p>The proftpd binary change was due to a &#8216;firmware&#8217; update. The annoying thing about infrant/netgear raidiator is that while it started out based on sarge, they do a bunch of un-Debian-like things. Rather than rebuilding core packages, their build system seems to remove files and trees that they don&#8217;t want after the build. I suppose this is a lot less work, but while the proftpd package is custom built (1.3.0-9.netgear6), they didn&#8217;t update the package when they updated the binary.</p>
<p>Granted, their whole market seems to be SOHO, so they don&#8217;t care much about the types that would care about these things. There&#8217;s a <a href="http://www.readynas.com/forum/viewtopic.php?f=10&#038;t=17105">thread</a> I started about proftpd breaking and a <a href="http://www.readynas.com/download/addons/4.01/Fix_ProFTPD_ADSAuth_0.1.bin">patch</a>, and it should be fixed in <a href="http://kbserver.netgear.com/release_notes/d103170.asp">4.01c1-p2</a>, which I don&#8217;t think has been pushed out yet, as none of my gear has wanted to automatically upgrade to it, so I&#8217;ve had to use the patch.</p>
<p>I&#8217;ve been running a few diagnostic commands periodically via shell scripts and saving their output.</p>
<p>apps.value via the munin script went from 51478528 to 91701248 between 10:19 and 15:40 on Friday.</p>
<p>I restarted the box and shut down munin via the init.d script yesterday and apps.value went from 50413568 to 53477376 between 15:33 and 09:43 today.</p>
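<p>To put rough numbers on those two runs, here is a minimal Python sketch of the growth-rate arithmetic. The function name and the crosses_midnight flag are made up for illustration; the byte counts and clock times are the ones quoted above, and it assumes the second run spans exactly one midnight.</p>

```python
from datetime import datetime, timedelta


def growth_rate_mib_per_hour(start_bytes, end_bytes, start_hhmm, end_hhmm,
                             crosses_midnight=False):
    """Average growth in MiB per hour between two apps.value samples.

    Times are wall-clock "HH:MM" strings. Set crosses_midnight=True when
    the end time falls on the following day (assumed: exactly one day later).
    """
    fmt = "%H:%M"
    start = datetime.strptime(start_hhmm, fmt)
    end = datetime.strptime(end_hhmm, fmt)
    if crosses_midnight:
        end += timedelta(days=1)
    hours = (end - start).total_seconds() / 3600
    mib = (end_bytes - start_bytes) / (1024 * 1024)
    return mib / hours


if __name__ == "__main__":
    # Friday, munin running.
    with_munin = growth_rate_mib_per_hour(51478528, 91701248, "10:19", "15:40")
    # After the reboot, munin stopped, measured overnight.
    without_munin = growth_rate_mib_per_hour(50413568, 53477376, "15:33", "09:43",
                                             crosses_midnight=True)
    print(f"with munin:    {with_munin:.2f} MiB/hour")
    print(f"without munin: {without_munin:.2f} MiB/hour")
```

<p>By that estimate the box grew about 38 MiB in a bit over five hours with munin running, versus about 3 MiB in roughly eighteen hours without it.</p>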
<p>I would think that if perl was leaking memory, it would be reclaimed when the process died, whereas something like a kernel module leaking would be more likely as you suggested because it&#8217;s always loaded until you reboot.</p>
<p>There are a number of modules loaded that appear custom. I have to track down where they are, because the module names don&#8217;t match anything in /lib/modules/*</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Fedyk</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-295</link>
		<dc:creator>Mike Fedyk</dc:creator>
		<pubDate>Tue, 13 May 2008 07:51:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-295</guid>
		<description>I saw your post on the proftpd binary changing.  You may have been hacked.  Check to see if you can reproduce the problem on your other NAS.

Mike</description>
		<content:encoded><![CDATA[<p>I saw your post on the proftpd binary changing.  You may have been hacked.  Check to see if you can reproduce the problem on your other NAS.</p>
<p>Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Fedyk</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-294</link>
		<dc:creator>Mike Fedyk</dc:creator>
		<pubDate>Tue, 13 May 2008 07:46:16 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-294</guid>
		<description>Oh, if you don&#039;t like the oomkiller, there&#039;s a simple way to avoid having it activate.

Turn off overcommit.

echo 2 &gt; /proc/sys/vm/overcommit_memory

That sets overcommit into &quot;strict&quot; mode.  By default, all allocations have to fit into swap + (physical memory * .5).

echo 100 &gt; /proc/sys/vm/overcommit_ratio
This sets how much physical memory counts towards the overcommit total.  The default is 50, meaning 50% of the system&#039;s physical memory counts toward your CommitLimit (check /proc/meminfo).

This means you&#039;ll need a *lot* more swap, and most of it won&#039;t ever be used since only a small part of the allocated address space (that&#039;s what AS means in Committed_AS) is ever actually touched, but the oomkiller should almost never activate.

Mike</description>
		<content:encoded><![CDATA[<p>Oh, if you don&#8217;t like the oomkiller, there&#8217;s a simple way to avoid having it activate.</p>
<p>Turn off overcommit.</p>
<p>echo 2 &gt; /proc/sys/vm/overcommit_memory</p>
<p>That sets overcommit into &#8220;strict&#8221; mode.  By default, all allocations have to fit into swap + (physical memory * .5).</p>
<p>echo 100 &gt; /proc/sys/vm/overcommit_ratio<br />
This sets how much physical memory counts towards the overcommit total.  The default is 50, meaning 50% of the system&#8217;s physical memory counts toward your CommitLimit (check /proc/meminfo).</p>
<p>This means you&#8217;ll need a *lot* more swap, and most of it won&#8217;t ever be used since only a small part of the allocated address space (that&#8217;s what AS means in Committed_AS) is ever actually touched, but the oomkiller should almost never activate.</p>
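<p>A rough sketch of the CommitLimit arithmetic in Python, assuming strict accounting (overcommit_memory set to 2 in the kernel&#8217;s numbering) and ignoring huge pages. The function name and the RAM and swap sizes are hypothetical, picked only to show how overcommit_ratio moves the limit.</p>

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50):
    """Approximate CommitLimit under strict overcommit accounting.

    CommitLimit = swap + physical RAM * overcommit_ratio / 100.
    Huge pages are ignored here for simplicity.
    """
    return swap_total_kib + mem_total_kib * overcommit_ratio // 100


if __name__ == "__main__":
    # Hypothetical box: 256 MiB of RAM and 512 MiB of swap, both in KiB.
    mem_kib, swap_kib = 262144, 524288
    for ratio in (50, 100):
        limit = commit_limit_kib(mem_kib, swap_kib, ratio)
        print(f"overcommit_ratio={ratio}: CommitLimit is about {limit} kB")
```

<p>Comparing the result with the CommitLimit and Committed_AS lines in /proc/meminfo is a quick way to check how close a box is to refusing allocations.</p>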
<p>Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Fedyk</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-291</link>
		<dc:creator>Mike Fedyk</dc:creator>
		<pubDate>Tue, 13 May 2008 04:17:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-291</guid>
		<description>Also, the active/inactive lists overlap with all other lists (with a few exceptions).  Swap is performed on the inactive list in reverse LRU order (to swap out the Least Recently Used pages first).  A high inactive and high cached count usually means you have a lot of memory used only once or twice (I forget if Linus&#039; use-once algorithm is still in the kernel).

With these numbers you can infer what is happening on the insides once you see how they react to various loads and the munin graph allows you to really &quot;see&quot; it.  And it allows you to show others easily without having to figure out a way to get the picture that is in your (my) head in a visual format.

Mike</description>
		<content:encoded><![CDATA[<p>Also, the active/inactive lists overlap with all other lists (with a few exceptions).  Swap is performed on the inactive list in reverse LRU order (to swap out the Least Recently Used pages first).  A high inactive and high cached count usually means you have a lot of memory used only once or twice (I forget if Linus&#8217; use-once algorithm is still in the kernel).</p>
<p>With these numbers you can infer what is happening on the insides once you see how they react to various loads and the munin graph allows you to really &#8220;see&#8221; it.  And it allows you to show others easily without having to figure out a way to get the picture that is in your (my) head in a visual format.</p>
<p>Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Fedyk</title>
		<link>http://blog.loftninjas.org/2008/05/12/more-linux-memory-debugging/comment-page-1/#comment-290</link>
		<dc:creator>Mike Fedyk</dc:creator>
		<pubDate>Tue, 13 May 2008 03:52:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.loftninjas.org/?p=191#comment-290</guid>
		<description>Hi,

Charles forwarded your message to me and I got it today.  I&#039;d look for a memory leak in a kernel module (probably a bad hardware driver) or some hidden userspace process.  

The reason why my calculations turned out to be total minus cached minus bunch_of_other_stuff is that apps cover several memory lists in most operating systems.  The files are mmap()ed, so that memory counts as mapped; it also counts as cached, which includes dirty memory (modified pages in memory).  Dirty pages that don&#039;t map back to files on disk (think executables and libs) are put in swap.  There&#039;s a quick synopsis for you.  Contact me if you&#039;d like to get a bit more in depth.

Mike</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Charles forwarded your message to me and I got it today.  I&#8217;d look for a memory leak in a kernel module (probably a bad hardware driver) or some hidden userspace process.  </p>
<p>The reason why my calculations turned out to be total minus cached minus bunch_of_other_stuff is that apps cover several memory lists in most operating systems.  The files are mmap()ed, so that memory counts as mapped; it also counts as cached, which includes dirty memory (modified pages in memory).  Dirty pages that don&#8217;t map back to files on disk (think executables and libs) are put in swap.  There&#8217;s a quick synopsis for you.  Contact me if you&#8217;d like to get a bit more in depth.</p>
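<p>As a sketch of that &#8220;total minus cached minus other stuff&#8221; calculation, the Python below subtracts a handful of /proc/meminfo pools from MemTotal. The exact set of fields subtracted is an assumption, loosely modeled on what a munin-style memory graph calls &#8220;apps&#8221;, and the sample numbers are invented.</p>

```python
# Hypothetical /proc/meminfo excerpt; the numbers are invented for illustration.
SAMPLE_MEMINFO = """\
MemTotal:         262144 kB
MemFree:           20000 kB
Buffers:           10000 kB
Cached:           100000 kB
SwapCached:          500 kB
Slab:               8000 kB
PageTables:         1000 kB
"""

# Assumed list of pools that are not counted as application memory.
NOT_APPS = ("MemFree", "Buffers", "Cached", "SwapCached", "Slab", "PageTables")


def parse_meminfo(text):
    """Return {field name: value in kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def apps_kib(fields):
    """MemTotal minus every pool in NOT_APPS (fields missing count as 0)."""
    return fields["MemTotal"] - sum(fields.get(name, 0) for name in NOT_APPS)


if __name__ == "__main__":
    print(apps_kib(parse_meminfo(SAMPLE_MEMINFO)), "kB attributed to apps")
```

<p>On a live system the same functions could be fed the contents of /proc/meminfo instead of the sample text.</p>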
<p>Mike</p>
]]></content:encoded>
	</item>
</channel>
</rss>
