munin-cgi-graph with fcgid on ubuntu lucid

Two and a half years have passed since I wrote about running Munin with fastcgi triggered graphs on Debian etch. Unfortunately, not a lot has changed since then. A revolution in trending would have been nice. When I started here munin was triggering graph generation using CGI and was painfully slow to use. I switched over to cron triggered graph generation and was happy. After a data center migration, drawing the munin graphs for that cluster from cron was taking about 130 seconds. As a precaution I wanted to get this down a bit.

Someone asked me why munin-graph would have caused data loss, since munin-update collects the data, and I couldn’t remember. I had problems with both munin-graph and munin-update taking over five minutes in certain circumstances back then. The latter was primarily from the slow response time of the SNMP queries I was doing against MSSQL servers. That was back during Munin 1.2 as well, and a few things have changed since then; most relevant is that you no longer have to patch Munin for fastcgi support.

This time around I used fcgid instead of fastcgi. There are fewer licensing hurdles for fcgid, which was written to be compatible with fastcgi. Provided you already have Munin running, install the prerequisites first.

sudo apt-get install libcgi-fast-perl libdate-manip-perl libapache2-mod-fcgid

The packaging should restart Apache as required to load the new module we just installed, but we need to configure our Munin site a bit to link our CGI script to fcgid. Add this to or update the VirtualHost block for your Apache configuration and reload Apache.

  ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

  <Directory /usr/lib/cgi-bin/>
    AllowOverride None
    Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
  </Directory>

  <Location /cgi-bin/munin-fastcgi-graph>
    SetHandler  fcgid-script
  </Location>
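
To confirm that Apache actually loaded the fcgid module after the reload, you can check the module list. This is a minimal sketch, assuming apache2ctl is on your PATH and that the module shows up as fcgid_module; the helper name has_fcgid_module is made up for illustration.

```shell
# Hypothetical helper: reads a module list on stdin and succeeds
# if the fcgid module appears in it.
has_fcgid_module() {
    grep -q 'fcgid_module'
}

# Only run the check where apache2ctl exists (assumption: it is on PATH).
if command -v apache2ctl >/dev/null 2>&1; then
    if apache2ctl -M 2>/dev/null | has_fcgid_module; then
        echo "fcgid is loaded"
    else
        echo "fcgid is NOT loaded; try: sudo a2enmod fcgid"
    fi
fi
```

If the module is missing, enabling it with a2enmod and reloading Apache should be enough on a stock Ubuntu install.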

Add the following lines to your munin.conf. This causes the munin-graph that is run from cron to not generate any graphs (noops) and munin-html will update the img src links to use the CGI script to generate the graphs rather than linking directly to files. You’ll need to wait for the cron job to run once or run munin-html yourself to trigger this.

graph_strategy cgi
cgiurl_graph /cgi-bin/munin-fastcgi-graph

Triggering munin-html manually:

sudo -s
sudo -u munin /usr/share/munin/munin-html --debug
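
Once munin-html has run, a quick way to see whether the pages were rewritten is to count the HTML files that mention the CGI URL. This is a rough sketch, assuming the generated pages live under /var/www/munin and that the img src links contain the cgiurl_graph value verbatim; count_cgi_pages is a hypothetical helper name.

```shell
# Hypothetical helper: count the HTML files under a directory ($1)
# that mention the CGI graph URL ($2) as a fixed string.
count_cgi_pages() {
    { grep -rlF --include='*.html' -- "$2" "$1" 2>/dev/null || true; } | wc -l
}

# Assumption: munin-html writes its pages under /var/www/munin.
if [ -d /var/www/munin ]; then
    count_cgi_pages /var/www/munin /cgi-bin/munin-fastcgi-graph
fi
```

A count of zero suggests munin-html has not run since the munin.conf change, or that it is reading a different config file.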

Remember that Apache needs to be able to write the graphs out. You will get no graphs and HTTP 500 errors in your Apache logs if the munin-cgi-graph script cannot write the graphs out. My Munin data directory, /var/www/munin/, is owned by ‘munin’ while Apache runs as ‘www-data’. The following commands fix this, but any file Apache saves will be owned by ‘www-data’ by default, so if you switch back to munin-graph via cron, you’ll need to fix permissions again.

sudo chgrp -R www-data /var/www/munin
sudo chmod -R g+w /var/www/munin
sudo chgrp www-data /var/log/munin /var/log/munin/munin-graph.log
sudo chmod g+w /var/log/munin /var/log/munin/munin-graph.log
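
To double check that the recursive chmod reached everything, you can list whatever is still missing the group write bit. This is a small sketch; not_group_writable is a hypothetical helper name, and the directory is the one used above.

```shell
# Hypothetical helper: print every file or directory under $1 that
# does not have the group write bit (octal 020) set.
not_group_writable() {
    find "$1" ! -perm -020 -print
}

# Empty output means everything under the data directory is group-writable.
if [ -d /var/www/munin ]; then
    not_group_writable /var/www/munin
fi
```

This only checks the mode bits, not the group itself, so it complements the chgrp rather than replacing a look at ls -l.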

After the switch to fcgid-generated Munin graphs, generating all the graphs for a single node took minutes and was quite painful. I gave the node more CPU resources, but it still took two minutes to draw a page of graphs. I ended up switching back to cron-based graph generation. The additional CPU resources cut about forty seconds off the munin-graph time from cron, which is progress. Having the graphs immediately available when you need them is worth more than the CPU resources that demand-based graph generation via CGI would free up for sharing. For the time being I intend to keep giving Munin more CPU until I settle on a better way to do trending.
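
For comparing munin-graph run times before and after a change like this, wall-clock seconds are enough. The following is a rough sketch; elapsed_seconds is a hypothetical helper, and the munin-graph path is an assumption based on where munin-html lives above.

```shell
# Hypothetical helper: run a command, discard its output, and print
# how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Assumption: munin-graph is installed alongside munin-html.
if [ -x /usr/share/munin/munin-graph ]; then
    elapsed_seconds sudo -u munin /usr/share/munin/munin-graph
fi
```

Keep in mind that with graph_strategy cgi set, the cron-side munin-graph does no drawing, so the number is only meaningful under cron-based generation.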

7 thoughts on “munin-cgi-graph with fcgid on ubuntu lucid”

  2. Aaron Peterson

    Boy, 2 minutes for a single host? That’s awful. Maybe there’s something amiss here. If it took 2 minutes per host with the cron-based strategy that would not scale past a handful of nodes. Is the method of calling via cgi doing something much slower? Is fork width/max_cgi_graph_jobs tuned to the cores and write characteristics of the underlying system?

    I’d want to know if it takes 2 minutes for the same page reloaded a minute or a day later. rrdtool graphs of different timeframes are generally produced at different rates, not on every request, cron or cgi. A daily graph every 5 minutes and weekly graphs every 30 minutes, etc.

    It would be interesting to find out if a pre-/re-warmed cache allows speedy enough event-driven graph generation.

  3. btm Post author

    Two minutes is drawing over eighty graphs. But these graphs don’t have to be drawn every time. A node summary page usually shows the daily and weekly graph, and the latter isn’t redrawn every time for either fastcgi or cron. A page for a particular monitor also includes monthly and yearly graphs, which don’t need to be redrawn every time either. So it doesn’t take two minutes a minute later. I don’t know about a day later, because I got enough win with the additional CPU to be able to table this and move on for now.

    The issue is that you don’t view every page every day, but when you do you don’t want to have to wait a few minutes for the pages to load.

  4. btm Post author

    Robert,

    It is CPU. A significant amount of CPU is needed for munin-graph to draw all of the graphs. Lower-hanging fruit is usually mitigated quickly, like storing the graphs and RRDs in a ramdisk.

  5. Robert Sharp

    Thank you so much for the information. I have been having some trouble with this as well. It seems like it is a problem with my CPU, but I still haven’t been able to fix it.
