Not a bug

Too many false positives

11 years ago • updated by 8 years ago • 12

I have noticed that since July there have been too many false positives, mostly timeouts on ping or HTTP checks while the sites are up and running (Anturis reports "critical, timeout", but the site is reachable from a PC or a mobile device).

Any idea what is going on with Anturis? It has become less reliable.

Thanks


Under review

11 years ago

Hello

Could you please forward several of the false-positive alert emails to support@anturis.com?

11 years ago

Did you know that your agent at server02.interacthosting.net is not connected? Some monitors use this agent, and they could be raising alerts because of this.

11 years ago

I disconnected that one because it was causing stability issues under cPanel. None of those false-positive messages were related to that agent; they came from all Anturis agents, private and public.

Also, I have just updated the agent on the logmon machine and now it won't start. It throws this error:

The service Anturis Monitoring Agent could not start

Any clue what is wrong with it?

11 years ago

Hello
As far as I can see, the logmon agent is connected now. It has the latest version and is sending data to the server.

As for the "false positives": your site uscib.org was unavailable from four different locations, including your own agent logmon:

Connectivity error on US-Michigan. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on US-Dallas. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on CA-Vancouver. Error 2: Timeout
[The target host or service is down or unreachable from this location.]
Connectivity error on logmon.interacthosting.net. Error 2: Timeout
[The target host or service is down or unreachable from this Anturis Agent.]

How could this be a false positive?
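The reasoning in this reply (the same timeout seen from several independent locations at once) can be sketched as a simple quorum rule. This is only an illustration: how Anturis actually aggregates per-location results is not stated in this thread, and the function name `should_alert` and the `min_failures` parameter are hypothetical.

```python
def should_alert(results: dict[str, bool], min_failures: int = 2) -> bool:
    """Return True when at least `min_failures` locations report a failure.

    `results` maps a location name to True (check passed) or False (check
    failed). The default of 2 is an assumption made for this sketch.
    """
    failed = [location for location, ok in results.items() if not ok]
    return len(failed) >= min_failures


# The four locations quoted in the report above, all timing out.
report = {
    "US-Michigan": False,
    "US-Dallas": False,
    "CA-Vancouver": False,
    "logmon.interacthosting.net": False,
}
print(should_alert(report))
```

A single failing location would not pass this rule, which is one way to separate a problem near one monitoring point from a problem at the target.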

11 years ago

I was able to access the site. It is also monitored by another tool, WebSitePulse, which didn't trigger any alert when Anturis did.

11 years ago

Hello

Right now your site has consistent packet loss, even from our development office:

--- uscib.org ping statistics ---

36 packets transmitted, 34 received, 5% packet loss, time 35034ms

rtt min/avg/max/mdev = 119.461/171.909/302.044/59.376 ms

--- uscib.org ping statistics ---

42 packets transmitted, 41 received, 2% packet loss, time 41049ms

rtt min/avg/max/mdev = 119.280/128.972/260.028/27.852 ms

And from Brasilia:

--- uscib.org ping statistics ---

72 packets transmitted, 70 received, 2% packet loss, time 71071ms

rtt min/avg/max/mdev = 151.137/161.398/165.338/2.702 ms

That's what I collected in the last 5 minutes. You do have a problem with your provider.

11 years ago

Well, I have run my own tests and saw no packet loss:

From same state:
PING uscib.org (69.176.101.68) 56(84) bytes of data.
--- uscib.org ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 7527ms
rtt min/avg/max/mdev = 17.592/21.528/23.491/1.882 ms

From the west coast:
--- uscib.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 1501ms
rtt min/avg/max/mdev = 118.066/118.310/118.715/0.487 ms

From the EU (Paris):
--- uscib.org ping statistics ---

4 packets transmitted, 4 received, 0% packet loss, time 3004ms

And from Poland: no packet loss.
WebSitePulse uses ping as well, and it reported no errors.

11 years ago

Please try to send at least 100 ping packets.

Right now from Amsterdam, Netherlands, from a very reliable datacenter (https://www.exonet.nl/), from a server that has no network load:

--- uscib.org ping statistics ---

50 packets transmitted, 47 received, 6% packet loss, time 49005ms

rtt min/avg/max/mdev = 81.038/81.185/81.441/0.121 ms
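The request for at least 100 packets can be motivated with a little arithmetic: a short run often shows 0% loss even when some loss is present. Below is a minimal sketch, assuming each packet is lost independently with a fixed probability (real loss tends to come in bursts, so this is only a rough model). The function name `prob_zero_loss` is made up for this illustration.

```python
def prob_zero_loss(loss_rate: float, packets: int) -> float:
    """Probability that a run of `packets` pings shows no loss at all.

    Assumes every packet is lost independently with probability `loss_rate`.
    """
    return (1.0 - loss_rate) ** packets


# Run lengths that appear in this thread, at an assumed 2% loss rate.
for n in (4, 7, 50, 100):
    p = prob_zero_loss(0.02, n)
    print(f"{n:>3} packets at 2% loss: P(no loss observed) = {p:.2f}")
```

Under this model a 7-packet run at 2% loss reports "0% packet loss" about 87% of the time, while a 100-packet run does so only about 13% of the time.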

11 years ago

Well, so I have run pings in rounds of 100:

Same state:
--- uscib.org ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 100476ms
rtt min/avg/max/mdev = 15.761/35.858/669.807/66.991 ms

West Coast:
100 packets transmitted, 98 received, 2% packet loss, time 100173ms
rtt min/avg/max/mdev = 18.549/32.863/515.623/58.488 ms

Poland:
Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),
Approximate round trip times in milli-seconds:
Minimum = 119ms, Maximum = 137ms, Average = 124ms

Packet loss in the range of 1-2% is considered "acceptable" and shouldn't trigger any critical errors, especially not ones reported as a timeout reaching the destination.
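To compare runs like the ones above against a chosen limit, the summary line can be parsed and the loss recomputed from the raw counts. This is a sketch only: the regular expression targets the Linux-style summary lines quoted in this thread, and the threshold is a parameter precisely because the acceptable value is what is being disputed here. The names `loss_percent` and `exceeds_threshold` are hypothetical.

```python
import re

# Matches summary lines such as:
# "100 packets transmitted, 99 received, 1% packet loss, time 100476ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary_line: str) -> float:
    """Recompute packet loss (in percent) from the transmitted/received counts."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a recognised ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


def exceeds_threshold(summary_line: str, threshold_percent: float) -> bool:
    """True when the recomputed loss is strictly above `threshold_percent`."""
    return loss_percent(summary_line) > threshold_percent


line = "100 packets transmitted, 98 received, 2% packet loss, time 100173ms"
print(loss_percent(line), exceeds_threshold(line, 2.0))
```

Recomputing from the counts avoids depending on the rounded percentage that ping prints.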

11 years ago

Sorry, I can't agree with you.

Persistent 1% packet loss is an indication of a problem.

2% packet loss is an indication of a serious problem.

Packet loss of 6%, as I showed you, means that your network simply does not work.

And the Anturis alerts were absolutely correct. But of course it's up to you whether to resolve the problem or leave it as is.

BR,
Konstantin,
CTO Anturis Inc.

11 years ago

Well, that is your point of view.

1% packet loss can be caused by a 100% CPU spike from some process, a disk operation on an array, or a NIC shortage. It is not necessarily an indication of a problem, never mind 2% being an indication of a serious problem, when most internet providers will tell you that value is acceptable. Anyway, that concerns monitors that use ping. It is not acceptable when a monitor uses an HTTP connection and I get a "timeout"; I should see that error only when the site is not available. In that case packet loss will only increase the load time.

Regarding alerts and problems: I could say the same about the Anturis infrastructure, as a packet may not make it back (full round trip: request sent, destination reached, but the response never returned to the source), or there may be high packet loss (like that 6% compared to the 1-2% I am getting). An excellent example is the Vancouver monitor, from which I see continuous issues (high ping notices or timeouts), and a few months back Dallas, when Anturis even agreed that there was a problem and fixed it.

I'm just saying that everything was OK up until those false-positive messages grew to the point of not being acceptable, compared to a competitor like WebSitePulse. What is the point of renewing a monitoring service that tells the admins there is an issue (site down, appliance not available) when it is not true?
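The expectation described above for HTTP monitors (report a failure only when the site really cannot be reached, not when one attempt is slow or dropped) could look roughly like the following. This is not how Anturis or WebSitePulse implement their checks; it is a minimal sketch using Python's standard library, and `check_http`, `fetch_status`, the attempt count and the timeout value are all assumptions made for the illustration.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float) -> int:
    """Fetch `url` and return the HTTP status code of the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_http(
    url: str,
    attempts: int = 3,
    timeout: float = 10.0,
    fetch: Callable[[str, float], int] = fetch_status,
) -> str:
    """Return "reachable" if any attempt gets an HTTP response, else "unreachable"."""
    for _ in range(attempts):
        try:
            fetch(url, timeout)
            return "reachable"
        except urllib.error.HTTPError:
            # The server answered, even if with a 4xx/5xx status.
            return "reachable"
        except OSError:
            # Covers timeouts, refused connections and DNS failures
            # (URLError and TimeoutError are OSError subclasses); try again.
            continue
    return "unreachable"
```

Because `fetch` is injectable, the retry logic can be exercised without touching the network, for example `check_http("http://example.invalid/", fetch=my_fake)` with a hypothetical `my_fake` that raises `TimeoutError`. Whether an HTTP error status should count as "reachable" is a design choice; here it does, since the server responded.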

