Not a bug

Too many false positives

InteractMarketing 9 years ago updated by Simon 6 years ago 12
I have noticed that since July there have been too many false positives, mostly timeouts on ping or HTTP checks while the sites are up and running (Anturis reports a critical timeout, but the site is reachable from a PC or a mobile device).

Any idea what is going on with Anturis? It has become less reliable.

Under review

Could you please forward several of the false-positive emails to support@anturis.com?
Did you know that your agent at server02.interacthosting.net is not connected? Some monitors use this agent, and they could be raising alerts because of that.
I disconnected that one because it was creating stability issues under cPanel. None of those false-positive messages were related to that agent; they came from all Anturis agents, private and public.

Also, I have just updated the agent on the logmon machine and now it won't start, throwing this error:

The service Anturis Monitoring Agent could not start

Any clue what is wrong with it?
As far as I can see, the logmon agent is connected now. It has the latest version and is sending data to the server.

As to the "false positives": your site uscib.org was unavailable from four different locations, including your own agent logmon:

Connectivity error on US-Michigan. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on US-Dallas. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on CA-Vancouver. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on logmon.interacthosting.net. Error 2: Timeout
[The target host or service is down or unreachable from this Anturis Agent.]

How could this be a false positive?
I was able to access that site. It is also monitored by another tool, WebSitePulse, which didn't trigger any alert when Anturis did.

Right now your site has consistent packet loss, even from our development office:

--- uscib.org ping statistics ---
36 packets transmitted, 34 received, 5% packet loss, time 35034ms
rtt min/avg/max/mdev = 119.461/171.909/302.044/59.376 ms

--- uscib.org ping statistics ---
42 packets transmitted, 41 received, 2% packet loss, time 41049ms
rtt min/avg/max/mdev = 119.280/128.972/260.028/27.852 ms

And from Brasilia:

--- uscib.org ping statistics ---
72 packets transmitted, 70 received, 2% packet loss, time 71071ms
rtt min/avg/max/mdev = 151.137/161.398/165.338/2.702 ms

That's what I collected in the last 5 minutes. You do have a problem with your provider.
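For readers who want to check the percentages quoted above, here is a minimal Python sketch that recomputes packet loss from the transmitted and received counts in a Linux ping summary line. The function name `packet_loss_percent` and the regular expression are illustrative assumptions, not part of Anturis or of ping itself.

```python
import re

# Matches the counts in a Linux (iputils) ping summary line.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss_percent(summary_line: str) -> float:
    """Return packet loss as a percentage for one ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


# Summary lines copied from the three runs quoted above.
lines = [
    "36 packets transmitted, 34 received, 5% packet loss, time 35034ms",
    "42 packets transmitted, 41 received, 2% packet loss, time 41049ms",
    "72 packets transmitted, 70 received, 2% packet loss, time 71071ms",
]
for line in lines:
    print(f"{packet_loss_percent(line):.2f}%")
```

The unrounded values come out at about 5.56%, 2.38% and 2.78%, so the 5%, 2% and 2% figures in the ping output appear to be these values rounded down.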
Well, I have done my own tests and saw no packet loss:

From same state:
PING uscib.org ( 56(84) bytes of data.
--- uscib.org ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 7527ms
rtt min/avg/max/mdev = 17.592/21.528/23.491/1.882 ms

From the west coast:
--- uscib.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 1501ms
rtt min/avg/max/mdev = 118.066/118.310/118.715/0.487 ms

From the EU (Paris):
--- uscib.org ping statistics ---
packets transmitted 4
received 4
packet loss 0 %
time 3004 ms
And from Poland: no packet loss.
WebSitePulse uses ping as well, and it reported no errors.
Please try to send at least 100 ping packets.

Right now from Amsterdam, Netherlands, from the very reliable datacenter https://www.exonet.nl/, from a server that has no network load at all:

--- uscib.org ping statistics ---
50 packets transmitted, 47 received, 6% packet loss, time 49005ms
rtt min/avg/max/mdev = 81.038/81.185/81.441/0.121 ms
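One way to see why a run of 4 or 7 pings says little about loss rates of a few percent is to compute how often such a short run shows zero loss anyway. The sketch below assumes, purely for illustration, that each packet is lost independently with a fixed 2% probability; real loss is often bursty, so treat the numbers as rough. The function name `prob_no_loss` is hypothetical.

```python
def prob_no_loss(loss_rate: float, packets: int) -> float:
    """Probability that all `packets` pings succeed, assuming each one is
    lost independently with probability `loss_rate` (a simplification)."""
    return (1.0 - loss_rate) ** packets


# 4 and 7 are the packet counts used in the earlier tests; 100 is the
# count requested above. The 2% loss rate is an assumed example value.
for packets in (4, 7, 100):
    print(packets, round(prob_no_loss(0.02, packets), 3))
```

Under that assumption, a 4-packet run shows no loss about 92% of the time and a 7-packet run about 87% of the time, while a 100-packet run does so only about 13% of the time.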
Well, so I have done pings in rounds of 100:

Same state:
--- uscib.org ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 100476ms
rtt min/avg/max/mdev = 15.761/35.858/669.807/66.991 ms

West Coast:
100 packets transmitted, 98 received, 2% packet loss, time 100173ms
rtt min/avg/max/mdev = 18.549/32.863/515.623/58.488 ms

Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),
Approximate round trip times in milliseconds:
Minimum = 119 ms, Maximum = 137 ms, Average = 124 ms

Packet loss in the range of 1-2% counts as "acceptable" and shouldn't trigger any critical errors, especially not ones reported as a timeout reaching the destination.
Sorry, I cannot agree with you.

Persistent 1% packet loss is an indication of a problem.

2% packet loss is an indication of a serious problem.

Packet loss of 6%, as I showed you, means that your network simply does not work.

And the Anturis alerts were absolutely correct. But of course it's up to you whether to resolve the problem or leave it as is.

CTO Anturis Inc.
Well, that is your point of view.

1% packet loss can be caused by a 100% CPU spike from some process, a disk operation on an array, or a NIC shortage; it is not necessarily an indication of a problem. That is even more true of calling 2% an indication of a serious problem, when most internet providers will tell you that value is acceptable. Anyway, that concerns monitors using ping. It is not acceptable when a monitor uses an HTTP connection and I get a "timeout": I should see that error only when the site is not available. In that case packet loss will only increase the load time.

Regarding alerts and problems: I can say the same about the Anturis infrastructure, since a packet may not make it back (full round trip: request sent, destination reached, but the response never reaches the source), or packet loss may be high (like that 6% compared to the 1-2% I'm getting). An excellent example is the Vancouver monitor, from which I see continuous issues (high ping notices or timeouts), and a few months back Dallas, where Anturis even agreed that there was a problem and fixed it.

I'm just saying that everything was OK until those false-positive messages grew to the point of not being acceptable, compared to a competitor like WebSitePulse. What is the sense of renewing a monitoring service that tells the admins there is an issue (site down, appliance not available) when it is not true?
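The expectation described above, that an HTTP monitor should report a site as down only when it is really unavailable, could be approximated by confirming a failure over several consecutive attempts before alerting. The following is a minimal Python sketch under that assumption. It is not how Anturis or WebSitePulse actually work, and the names `http_ok` and `confirmed_down`, the timeout, and the retry count are all hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def confirmed_down(check: Callable[[], bool], attempts: int = 3,
                   delay_s: float = 0.0) -> bool:
    """Return True only if `check` fails on every one of `attempts` tries."""
    for attempt in range(attempts):
        if check():
            return False  # one success is enough to call the site up
        if attempt < attempts - 1:
            time.sleep(delay_s)  # pause before retrying
    return True


# Example use (performs real network requests, so it is left commented out):
# if confirmed_down(lambda: http_ok("http://uscib.org/"), attempts=3, delay_s=5):
#     print("site appears to be down")
```

The trade-off is slower detection of real outages in exchange for fewer alerts caused by a single lost or delayed request.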