Munin monitoring Graph Performance

Goal

Speed up Munin's performance by moving the generated HTML files and graphs to a temporary file system (tmpfs).

Intro

Munin [1] (1.x) updates its data and graphs every 5 minutes. On a system with a lot to monitor, the time needed to collect the data (munin-update), generate the graphs (munin-graph), create the HTML files (munin-html) and check the limits (munin-limits) can exceed the 5 minutes available before the next update starts. The result is gaps in the graphs.

Munin's log files record the time needed for each of the four phases. This can also be visualized with the Munin processing time plugin. It shows that graphing is the phase that takes the longest to complete.
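As a rough sketch of pulling those timings out of a log, the following helper averages the durations it finds. It assumes that each run writes a line ending in something like "finished (12.34 sec)" to a per-phase log such as /var/log/munin/munin-graph.log. Both the path and the line format are assumptions and may differ between Munin versions, and avg_phase_time is a made-up name.

```shell
# avg_phase_time: read log lines on stdin and print the average duration
# of all lines that contain "finished (N sec)".
# The line format is an assumption; adjust the pattern to your logs.
avg_phase_time() {
    awk '
        match($0, /finished \([0-9.]+ sec\)/) {
            # "finished (" is 10 characters and " sec)" is 5, so the
            # number sits between them inside the matched text.
            s = substr($0, RSTART + 10, RLENGTH - 15)
            sum += s
            n++
        }
        END {
            if (n > 0) printf "%.1f\n", sum / n
            else       print "no timing lines found"
        }
    '
}
```

It could be used like this: avg_phase_time < /var/log/munin/munin-graph.log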

Improvement ideas

Munin has a feature [2] that lets you skip the graphing performed during the periodic update and instead generate the graphs only when a browser requests the images. I have not tested this myself, but at first glance I found it rather complicated to set up, and it also has the drawback that you have to wait until graphing is complete before you can view the graphs. So this was not an option for me.

I then found a blog post [3] that explained how to move the database files to a temporary file system. According to the author, this speeds up fetching and graphing by reducing I/O time. It is an interesting approach, but in my opinion it has a huge drawback: if the system is powered off, the Munin databases are lost. The author recommends copying the databases periodically to a directory on the hard disk. This leads to a trade-off, because you have to weigh how much of the time saved you give back to the backups against the period of data loss you can accept.

But this blog post finally gave me a new idea. The database files are not rewritten from scratch each period; only a little data is deleted and a little is added. So the time gained by avoiding that I/O is small, especially because reads from disk may already be served from the system's cache, as the data was written 5 minutes earlier. The idea is to leave the databases on the hard drive and put only the graphs and HTML output on a temporary file system. This has several advantages. First, no database is lost if the system shuts down or crashes. Second, this means nothing has to be backed up. And third, the most I/O-intensive task, writing the HTML files and graphs to disk, is avoided completely. To be honest, there is a (very) small drawback: after a restart it takes up to 5 minutes before you can view the graphs again, as they have to be generated first. But they will contain all the history previously written to the databases.

On Debian 6.0 (Squeeze) the generated graphs and HTML files reside in /var/cache/munin/www. To put this directory on a tmpfs, add the following line to /etc/fstab:

tmpfs /var/cache/munin/www tmpfs rw,size=128M 0 0

The size of the tmpfs depends on your environment. On my system Munin needs about 70 MB for the HTML files and graphs, so a tmpfs size of 128 MB should be sufficient.
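To get a starting point for your own system, you can measure what the directory currently occupies on disk and add some headroom. The helper below is only a sketch: the name suggest_tmpfs_size_mb and the default headroom factor of 1.5 are my own assumptions, not anything Munin prescribes.

```shell
# suggest_tmpfs_size_mb: print a tmpfs size in MB for the given directory,
# based on its current disk usage (du -sk) times a headroom factor.
# The default factor of 1.5 is an arbitrary assumption.
suggest_tmpfs_size_mb() {
    dir=$1
    factor=${2:-1.5}
    used_kb=$(du -sk "$dir" | awk '{ print $1 }')
    awk -v kb="$used_kb" -v f="$factor" 'BEGIN {
        mb = kb * f / 1024
        # Round up to a whole megabyte, with a floor of 1 MB.
        r = int(mb)
        if (r < mb) r++
        if (r < 1) r = 1
        print r
    }'
}
```

Run it before the tmpfs is mounted, for example: suggest_tmpfs_size_mb /var/cache/munin/www. With roughly 70 MB in use, a factor of 1.5 gives about 105 MB, so 128M leaves a little extra room.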

Now mount the tmpfs by running:

mount /var/cache/munin/www
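To confirm that the directory really is on a tmpfs afterwards, you can look it up in the kernel's mount table. The following is a small sketch that assumes a Linux system with /proc/mounts; is_tmpfs is a hypothetical helper name, and the optional second argument exists only so the check can be tried against a sample file.

```shell
# is_tmpfs: succeed if the given mount point is listed as a tmpfs.
# The second argument (mount table to read) defaults to /proc/mounts.
is_tmpfs() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    # Fields in the table: device, mount point, file system type, options, ...
    awk -v mp="$mountpoint" '
        $2 == mp { type = $3 }    # the last matching entry wins
        END { exit (type == "tmpfs") ? 0 : 1 }
    ' "$table"
}
```

For example: is_tmpfs /var/cache/munin/www && echo "graphs are on tmpfs". Running df -h /var/cache/munin/www additionally shows how much of the tmpfs is in use.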

Because the old data in the directory is hidden once the tmpfs is mounted, you can remove it before mounting to free up the disk space. Run this before the mount command above:

rm -r /var/cache/munin/www/*

Conclusion

On my system the graphing phase took around 180 seconds to complete. After switching to tmpfs the average graphing time dropped to 80 seconds. The overall Munin processing time dropped by approximately 50%.
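For comparing your own before and after numbers, the relative reduction is (before - after) / before. A tiny sketch, with pct_drop being a made-up helper name:

```shell
# pct_drop: print the percentage reduction from a "before" value
# to an "after" value, rounded to a whole percent.
pct_drop() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (b - a) / b * 100 }'
}

pct_drop 180 80    # graphing phase values from above, prints 56
```

So the graphing phase alone became about 56% faster, which fits the roughly 50% drop in overall processing time.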

Resources
