Having switched over to a Galera database clustering setup for some of my applications recently, I needed to be able to monitor the health of the database cluster and its nodes. My environment uses Opsview for monitoring, which is basically a nice wrapper around Nagios for configuration and reporting, so you can use regular Nagios scripts. While looking for an existing Nagios script for Galera, I only found scripts that did not monitor the vital variables of Galera and didn’t fit my needs. So I decided to create a new one and share it by putting it on GitHub!
Running it is pretty straightforward: run it against every node in the cluster. It needs a hostname or IP address, a username, a password, an optional port number, and the node counts you want to alert on. There’s a warning level and a critical level, so you can send out different types of alerts.
For instance, say you have a cluster of 4 nodes and you want a warning when 1 fails (because 3 can handle the load), but a critical alert when 2 have failed (as the 2 remaining would probably be overloaded). You can do this by setting --nodes-warn=3 --nodes-crit=2: an alert fires when the number of nodes in the cluster is equal to or lower than the given value.
<small>./check_galera_status.pl --user=myuser --password=mypassword --host=db01.local --port=3306 --nodes-warn=2 --nodes-crit=1</small>
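The threshold rule is easy to get backwards, so here is a minimal Python sketch of how I think about it. This is not the code of the script itself: the function name `galera_node_status` is made up for illustration, and I'm assuming the node count comes from Galera's `wsrep_cluster_size` status variable. The exit codes are the standard Nagios plugin ones.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def galera_node_status(cluster_size, nodes_warn, nodes_crit):
    """Map a reported cluster size to a Nagios exit code.

    Hypothetical helper: cluster_size is assumed to be the value of the
    wsrep_cluster_size status variable, or None if it could not be read.
    """
    if cluster_size is None:
        return UNKNOWN
    # Check the critical threshold first, since it is the lower of the two.
    if cluster_size <= nodes_crit:
        return CRITICAL
    if cluster_size <= nodes_warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Four-node cluster checked with --nodes-warn=3 --nodes-crit=2.
    for size in (4, 3, 2):
        print(size, galera_node_status(size, nodes_warn=3, nodes_crit=2))
```

With all 4 nodes up the result is OK, with 3 nodes left it is a WARNING, and with 2 or fewer it is CRITICAL.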
By the way, if you have a MySQL or MariaDB replication setup and are not familiar with Galera, I definitely recommend having a look at it. Galera is a synchronous multi-master platform for InnoDB. Basically, it is the fix for MySQL replication and the pain of managing multi-master replication, or even master-slave, at a large data scale. There’s no need to configure auto increment with different increments, or to freeze databases (or transfer the entire database) when you’re resynchronising a slave that broke somewhere along the line. Galera incrementally brings nodes that left and re-entered the cluster back up to date. Also, adding a node is way easier than with replication: just add it to the cluster and one of the existing nodes will donate its time to synchronise the new node.
If you want to know more, have a look at this introduction tutorial.