Discussion:
[rrd-users] Using Maximum values to avoid spikes?
Florio, Christopher N
2015-08-20 18:19:48 UTC
Hey all,

I've got a home-brew RRD file that I made to keep track of ISC BIND statistics.

I've set the max to 4,000,000,000. I actually see one of my hosts do 1.5G every night for a couple of hours.

/usr/bin/rrdtool create \
/usr/share/cacti/rra/crush_net_unc_edu_query_116761.rrd \
--step 300 \
DS:query:COUNTER:600:0:4000000000 \
DS:notify:COUNTER:600:0:4000000000 \
RRA:AVERAGE:0.5:1:500 \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MAX:0.5:1:500 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797

When the BIND process gets restarted, its counters go back to zero and I get a 4G spike on my graph.

Any ideas on fixing that? Should I use DERIVE instead of COUNTER to fix it up?
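
A rough sketch of why a max of 4,000,000,000 does not catch the restart. RRDtool applies min and max to the computed per-second rate, not to the raw reading, and COUNTER assumes that any decrease is a wrap. The model below is a simplification under stated assumptions: integer readings below 2^32, only the 32-bit wrap, and made-up function names and sample readings.

```shell
#!/bin/sh
# Simplified model of how two successive readings become a rate.
# Assumptions: integer readings below 2^32; only the 32-bit wrap is modelled.

WRAP32=4294967296   # 2^32

# in_range RATE MIN MAX: prints RATE, or U (unknown) if it is outside [MIN, MAX]
in_range() {
    if [ "$1" -lt "$2" ] || [ "$1" -gt "$3" ]; then
        printf '%s\n' U
    else
        printf '%s\n' "$1"
    fi
}

# counter_rate PREV CUR SECONDS MIN MAX
counter_rate() {
    diff=$(( $2 - $1 ))
    # COUNTER assumes a decrease is a wrap, so 2^32 is added back.
    if [ "$diff" -lt 0 ]; then
        diff=$(( diff + WRAP32 ))
    fi
    in_range $(( diff / $3 )) "$4" "$5"
}

# derive_rate PREV CUR SECONDS MIN MAX
derive_rate() {
    # DERIVE does no wrap handling, so a decrease gives a negative rate.
    in_range $(( ($2 - $1) / $3 )) "$4" "$5"
}

# Reading falls from 3,000,000,000 to 1,000 across a restart, 300 s apart.
counter_rate 3000000000 1000 300 0 4000000000   # prints 4316560 (a spike)
derive_rate  3000000000 1000 300 0 4000000000   # prints U
counter_rate 3000000000 1000 300 0 100000       # prints U (max near the real peak rate)
```

In this model a 32-bit wrap over 300 seconds can never imply more than about 14.3 million per second, so a max of 4,000,000,000 rejects nothing. A max close to the real peak rate, or DERIVE with min 0, turns the same reading into an unknown.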
Robert C. Seiwert
2015-08-20 18:38:55 UTC
Of course, looking at the docs, I could be wrong:
"Internally, derive works exactly like COUNTER but without overflow checks. So if your counter does not reset at 32 or 64 bit you might want to use DERIVE and combine it with a MIN value of 0."
"NOTE on COUNTER vs DERIVE

by Don Baarda <***@baesystems.com>

If you cannot tolerate ever mistaking the occasional counter reset for a legitimate counter wrap, and would prefer "Unknowns" for all legitimate counter wraps and resets, always use DERIVE with min=0. Otherwise, using COUNTER with a suitable max will return correct values for all legitimate counter wraps, mark some counter resets as "Unknown", but can mistake some counter resets for a legitimate counter wrap.

For a 5 minute step and 32-bit counter, the probability of mistaking a counter reset for a legitimate wrap is arguably about 0.8% per 1Mbps of maximum bandwidth. Note that this equates to 80% for 100Mbps interfaces, so for high bandwidth interfaces and a 32bit counter, DERIVE with min=0 is probably preferable. If you are using a 64bit counter, just about any max setting will eliminate the possibility of mistaking a reset for a counter wrap."
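
One way to arrive at a figure of that size. This is a reconstruction, not the note's own derivation. Assume the reading just before the reset is equally likely to be anywhere in the 32-bit range and the reading just after it is close to zero. The reset then passes as a wrap only when the implied rate stays at or below max, which is the case for roughly max * step / 2^32 of the possible pre-reset values. The function name misread_pct is made up for this sketch.

```shell
#!/bin/sh
# misread_pct MAX_RATE STEP_SECONDS
# Rough chance, in percent, that a reset of a 32-bit counter is read as a wrap.
misread_pct() {
    awk -v max="$1" -v step="$2" 'BEGIN {
        p = 100 * max * step / 4294967296   # 4294967296 = 2^32
        if (p > 100) p = 100
        printf "%.2f\n", p
    }'
}

misread_pct 125000 300        # 1 Mbps as bytes per second: prints 0.87
misread_pct 12500000 300      # 100 Mbps as bytes per second: prints 87.31
misread_pct 4000000000 300    # the max from the create command: prints 100.00
```

The 1 Mbps result is close to the 0.8% quoted in the note. With a max of 4,000,000,000 and a 32-bit wrap, every reset would be accepted as a wrap under these assumptions.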


-----Original Message-----
From: Robert C. Seiwert
Sent: Thursday, August 20, 2015 2:35 PM
Subject: Re: [rrd-users] Using Maximum values to avoid spikes?

The problem, I think, is that COUNTER only detects a reset at the 32-bit or 64-bit border.

I think that DERIVE would give you a negative spike. You might try DCOUNTER. It is floating point, which I know is not ideal for this application. The only substantial difference to COUNTER is that DCOUNTER can either be upward counting or downward counting, but not both at the same time. The current direction is detected automatically on the second non-undefined counter update, and any further change in the direction is considered a reset. The new direction is determined and locked in by the second update after the reset and its difference to the value at reset.

BTW, Nice garden!

Florio, Christopher N
2015-08-20 18:41:13 UTC
OK, I'll try DERIVE then. I've already got the min set to zero. We shall see!
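
A minimal sketch of that change, assuming the same layout as the original create command with only the data source type switched from COUNTER to DERIVE. The function name create_bind_rrd and the example path are placeholders, not taken from the thread.

```shell
#!/bin/sh
# create_bind_rrd FILE
# Same step, data sources and RRAs as the original command; the two data
# sources use DERIVE instead of COUNTER and keep min at 0, so a drop in the
# reading is stored as unknown instead of being treated as a wrap.
create_bind_rrd() {
    rrdtool create "$1" \
        --step 300 \
        DS:query:DERIVE:600:0:4000000000 \
        DS:notify:DERIVE:600:0:4000000000 \
        RRA:AVERAGE:0.5:1:500 \
        RRA:AVERAGE:0.5:1:600 \
        RRA:AVERAGE:0.5:6:700 \
        RRA:AVERAGE:0.5:24:775 \
        RRA:AVERAGE:0.5:288:797 \
        RRA:MAX:0.5:1:500 \
        RRA:MAX:0.5:1:600 \
        RRA:MAX:0.5:6:700 \
        RRA:MAX:0.5:24:775 \
        RRA:MAX:0.5:288:797
}

# Example call; the path is a placeholder.
# create_bind_rrd /tmp/bind_stats.rrd
```

By default rrdtool create replaces an existing file of the same name, so the stored history would be lost. To keep it, the rrdtune documentation describes a --data-source-type ds-name:DST option that changes the type of a data source in place.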
_______________________________________________
rrd-users mailing list
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
Yannick Marquet
2015-08-20 23:18:19 UTC
Hello,

When you restart ISC BIND, the counters you read from it start again at
zero, so on the next data poll RRDtool assumes the counter has wrapped.
To keep RRDtool from computing aberrant values, the documentation says
you should use DERIVE with MIN = 0. But be careful: this creates unknown
(UNK) values, and because of your RRA definition,
"RRA:AVERAGE:0.5:1:500", in some cases you may get a hole in your data
instead of a spike in your displayed graph.

Yma
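
The 0.5 in each RRA line is the xfiles factor. It is documented as the share of a consolidation interval that may be unknown while the consolidated value still counts as known. A simplified sketch of that rule, with a made-up function name and without claiming to match RRDtool's handling of the exact boundary case:

```shell
#!/bin/sh
# cdp_state UNKNOWN_PDPS STEPS XFF
# Prints "known" or "unknown" for one consolidated row, given how many of
# its primary data points (PDPs) are unknown.
cdp_state() {
    awk -v u="$1" -v n="$2" -v xff="$3" 'BEGIN {
        if (u / n > xff) print "unknown"; else print "known"
    }'
}

cdp_state 1 1 0.5     # RRA:AVERAGE:0.5:1:500, one unknown PDP: prints unknown
cdp_state 1 6 0.5     # RRA:AVERAGE:0.5:6:700, one unknown PDP out of six: prints known
```

So a single unknown poll after a restart leaves a gap in the one-step archives, while the coarser archives can still produce a value for that period.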
Donovan Baarda
2015-08-20 23:46:28 UTC
Wow... I must have written that about 13 years ago.

I'm pretty sure it's all still 100% correct, but I think things like
DCOUNTER didn't exist then, so there might be other things worth
investigating now.
--
Donovan Baarda <***@minkirri.apana.org.au>
Florio, Christopher N
2015-08-20 23:48:32 UTC
I've tested, and DERIVE with the min set to zero seems to work. I am much less concerned about a gap than about these massive spikes. Thanks for the help!
