Fixed

Anturis central system alerting on intermittent loss of server agent metrics

gumbuya 6 years ago updated by Konstantin (Here to help) 6 years ago 6
In recent days, we have been receiving a flood of alerts about the loss of server agent metric reports, only to have them restored shortly after.

Is the central system that collects the agents' data going through some sort of update? There does not appear to be anything wrong with our servers/agents (and for all of them to fail at the same time is one massive coincidence).

We tried configuring the monitors to alert only after a higher number of successive failures, but that rule does not seem to apply in this situation; we still get a massive burst of alerts whenever the central system fails to pick up metrics from our servers.
Hi,
At that time we had a problem with one of our servers that affected some of our customers. The problem was caused by our current hosting provider and has since been resolved. We are already in the process of migrating to another provider's dedicated hosting.
Sorry for any inconvenience.
Thank you for the update on the infrastructure situation. But how do we stop the alert system from raising an alarm the moment it fails to receive metric reports from agents?

We would like the rule to follow the configuration we set (currently 5 successive failures).
We're getting these alerts periodically. There should be some way to adjust the tolerance for this.
Hi,
The system should now respect what you configured in the Status Rule. If this is not the case, please forward such false incident notifications to support@anturis.com and we will check what's going on. For example, one such glitch happened yesterday at 09:54 GMT and affected a portion of customers.
Thank you again for bringing this up.
We still received an alert late yesterday afternoon. From my observation, I believe it was because the memory monitor (which triggered the incident) was still set to 2 failures (2 minutes), even though memory was not set to raise any error or warning conditions.

We have adjusted the memory monitors to 5 failures and will see how that goes. Thanks.
Thanks. Could you also please forward all such incidents to support@anturis.com? Early next week we will migrate to another hosting provider, which should improve this.
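
For readers following the thread, here is a minimal Python sketch of the behavior discussed above. It is not Anturis code: the class `ConsecutiveFailureRule` and the helper `first_alert` are hypothetical names, and the model assumes each monitor counts missed reports independently and alerts once its own threshold of successive failures is reached. Under that assumption, a single monitor left at 2 failures fires during a short outage even when the others are set to 5.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureRule:
    """Hypothetical status rule: alert after `threshold` missed reports in a row."""
    threshold: int
    streak: int = 0

    def observe(self, report_received: bool) -> bool:
        """Record one check; return True when the alert should fire."""
        if report_received:
            self.streak = 0  # any successful report resets the count
            return False
        self.streak += 1
        # Fire once, at the moment the streak reaches the threshold.
        return self.streak == self.threshold


def first_alert(rules, outage_checks):
    """Simulate `outage_checks` successive missed reports for every monitor.

    Returns (monitor_name, check_number) for the first alert, or None.
    """
    for check in range(1, outage_checks + 1):
        for name, rule in rules.items():
            if rule.observe(False):
                return name, check
    return None


# A 3-check outage: the memory monitor, still at 2 failures, fires first.
mixed = {"cpu": ConsecutiveFailureRule(5), "memory": ConsecutiveFailureRule(2)}
print(first_alert(mixed, 3))    # ('memory', 2)

# With every monitor at 5 failures, the same outage raises nothing.
uniform = {"cpu": ConsecutiveFailureRule(5), "memory": ConsecutiveFailureRule(5)}
print(first_alert(uniform, 3))  # None
```

In this model the effective tolerance is set by the lowest threshold among the monitors attached to a component, which matches the observation about the memory monitor above.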