At Ragnarson we believe that monitoring every aspect of the infrastructure is crucial for the long-term maintenance of infrastructure projects. It is so important that we are unlikely to take on a project with no budget for monitoring. But even if you set up alerts for every service, there is one aspect of monitoring that many beginner engineers overlook.
I have seen, and heard about, many cases of infrastructure failing together with its monitoring. The app goes down, but the same issue that causes the app to fail also takes down the monitoring service. No one notices until a customer calls and asks why the website is down. It happened to me in the early days of my career. Now, additional guards for the monitoring itself are a mandatory part of any monitoring system we set up.
This important topic is rarely discussed, and resources on it are difficult to find with Google or any other search method. In this blog post, I will discuss how to prevent situations in which your infrastructure is down and you do not notice.