Hi, I am a top manager at Uptimestat, and I monitor failures of various services. On our website https://uptimestat.ru/ you can see the services we work with. With a third-party monitoring service it is almost impossible to miss a problem. The exception is when the site is cached somewhere along the way, but in that case its customers will see the cached copy too, right? Still, if you dig into the additional settings a little, you can find ways to make the check reliable and unambiguous. An important parameter here is the monitoring interval. If you check the site every half hour, be prepared to learn about a problem up to half an hour after it starts.
The check algorithm verifies the site from several servers. If short-term failures do occur, ones that are not really failures in terms of their importance, you can delay the alert until the situation becomes clear, for example by 3 minutes.
With that setting the site is checked again after 3 minutes, and the alert is sent only if the problem has not resolved itself. Why do such short failures happen? Network lag, a reboot of network or server hardware, maintenance work on the server, a load peak on the server, or just a sudden ping spike. You never know what. No hosting provider guarantees a 100% SLA. This way, short-term failures are filtered out.
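The recheck logic described above can be sketched roughly as follows. This is only an illustration, not how any particular monitoring service implements it: the names should_alert, check, wait and confirm_delay_s are hypothetical, and the 180-second default simply mirrors the 3-minute delay mentioned above.

```python
from typing import Callable


def should_alert(
    check: Callable[[], bool],
    wait: Callable[[float], None],
    confirm_delay_s: float = 180.0,
) -> bool:
    """Return True only if the site fails now and still fails after the delay.

    check() is a hypothetical probe that returns True when the site is up.
    wait() is passed in (e.g. time.sleep) so the logic can be tested instantly.
    """
    if check():
        return False  # the site is up, nothing to report
    wait(confirm_delay_s)  # give a short-term failure time to clear
    return not check()  # alert only if the recheck fails as well
```

In real use, wait would be time.sleep and check would be an actual HTTP or ping probe. Passing both in as parameters keeps the decision logic separate from the network code.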
Another important and interesting point: this delay can be set individually for each contact. For example, here is a perfectly workable scheme:
The site administrator / developer receives an alert immediately.
Head of department: after 30 minutes, when it is time to offer help if the problem is serious.
Project manager: after 1 or 3 hours, when it is already time to prepare explanations for clients if the problem is still not resolved.
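A per-contact delay like the scheme above could be modelled along these lines. This is a minimal sketch under assumptions: the Contact class, the role names and the contacts_to_notify function are made up for illustration, and the project manager delay is set to 60 minutes (the text allows 1 or 3 hours).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    role: str
    delay_min: int  # minutes of continuous downtime before this contact is alerted


# Hypothetical escalation chain following the scheme in the text.
CONTACTS = [
    Contact("administrator", 0),
    Contact("head_of_department", 30),
    Contact("project_manager", 60),  # assumption: 1 hour rather than 3
]


def contacts_to_notify(downtime_min, already_notified, contacts=CONTACTS):
    """Return contacts whose delay has elapsed and who were not alerted yet."""
    return [
        c
        for c in contacts
        if c.delay_min <= downtime_min and c.role not in already_notified
    ]
```

Calling contacts_to_notify on every check while the site is down, and recording who has been alerted, gives each person exactly one message at their own threshold.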
That is, you can prudently set everything up so that the motivating kicks and Valuable Directions start arriving exactly at the moment when you really cannot sort things out without them. There are companies and people who value the personal time of employees, and this is very commendable. For such cases, you can configure a work schedule for alerts. This is very convenient if there is a "night admin" position (it does not even have to be an administrator, a non-IT specialist can also reboot the server) or if, for example, there are offices in different time zones and you can split the areas of responsibility by time.
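A work schedule of this kind can be sketched as a simple "who is on duty now" check. Everything here is an assumption made for illustration: the Shift class, the contact names, the shift hours, and the use of fixed UTC offsets instead of real time zone rules (which ignores daylight saving time).

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class Shift:
    contact: str
    utc_offset_h: int  # contact's local offset from UTC (fixed, for simplicity)
    start: time        # local start of the shift, inclusive
    end: time          # local end of the shift, exclusive; may be past midnight


def on_duty(shift: Shift, now_utc: datetime) -> bool:
    """Return True if now_utc falls inside the contact's local working hours."""
    local_tz = timezone(timedelta(hours=shift.utc_offset_h))
    local = now_utc.astimezone(local_tz).time()
    if shift.start <= shift.end:
        return shift.start <= local < shift.end
    # Overnight shift, e.g. 21:00 to 09:00: wraps around midnight.
    return local >= shift.start or local < shift.end


# Hypothetical day/night split for an office at UTC+3.
SHIFTS = [
    Shift("day_admin", 3, time(9, 0), time(21, 0)),
    Shift("night_admin", 3, time(21, 0), time(9, 0)),
]
```

Filtering the contact list with on_duty before sending an alert means only the person whose shift covers the current moment gets woken up.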