[rrd-users] Graphing inconsistencies graphing different time slices
Munroe Sollog
2015-06-08 13:27:07 UTC
I have an inconsistency when I graph the same event over 10days versus 48hrs. I would expect
because I am using MAX that the peak on the 48hr graph and the 10day graph for the *same* event
would still read the same number. However, the peak for Sunday noon on the 48hr graph is at 600,
and the peak for that same period on the 10day graph is *higher* at 1.1k. I believe I have
included all the necessary files below. I didn't want to paste the raw info here as it would be
very large. Any insight would be appreciated. Thanks.

RRDtool 1.4.7 Copyright 1997-2012 by Tobias Oetiker <***@oetiker.ch>
Compiled Apr 5 2012 17:36:08

rrd info: https://pastee.org/qqxg8
48hr graph command: https://pastee.org/6yahu
10d graph command: https://pastee.org/gpk24


48hr graph: (image not available)
10d graph: (image not available)

--
Munroe Sollog
LTS - Network Analyst
x85002
Simon Hobson
2015-06-08 13:47:49 UTC
Post by Munroe Sollog
I have an inconsistency when I graph the same event over 10days versus 48hrs. I would expect
because I am using MAX that the peak on the 48hr graph and the 10day graph for the *same* event
would still read the same number. However, the peak for Sunday noon on the 48hr graph is at 600,
and the peak for that same period on the 10day graph is *higher* at 1.1k. I believe I have
included all the necessary files below. I didn't want to paste the raw info here as it would be
very large. Any insight would be appreciated.
You are adding up different max values. Consider this set of data :
a1 = 1 5 3 7 4
b1 = 2 6 9 1 4

If you graph (max a + max b) without any consolidation, each point is just a + b, so you get the values:
m1 = 3 11 12 8 8 (peak = 12)

If you consolidate 5 values into one, you then get :
a2 ave = 4
a2 max = 7
b2 ave = 4.4
b2 max = 9

m2 (= max (max a2 + max b2)) = 16

So when you draw a graph over a longer time period, depending on the consolidation, you may see your maximum value go up when you add max values together (with these made-up figures, from 12 to 16, i.e. by 1/3).
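
The made-up figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of what rrdgraph does internally; the variable names are chosen for illustration.

```python
# Made-up samples from the example above (five primary data points each).
a1 = [1, 5, 3, 7, 4]
b1 = [2, 6, 9, 1, 4]

# Unconsolidated: add point by point, then take the peak of the total.
m1 = [a + b for a, b in zip(a1, b1)]
peak_unconsolidated = max(m1)

# Consolidated 5:1 with MAX: each series collapses to its own peak first,
# and those peaks (which occurred in different time slots) are then added.
peak_consolidated = max(a1) + max(b1)

print(m1)                   # [3, 11, 12, 8, 8]
print(peak_unconsolidated)  # 12
print(peak_consolidated)    # 16
```

The two peaks differ because 7 and 9 never occurred in the same time slot, yet the consolidated graph adds them as if they had.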
Simon Hobson
2015-06-08 14:01:16 UTC
Post by Simon Hobson
So when you draw a graph over a longer time period, depending on the consolidation, you may see your maximum value go up (with these made up figures, by 1/3) when you add max values together.
To add ...

I had something similar for some of my servers - I wanted a maximum mail queue size across several servers. Simply adding the max queue sizes doesn't work for the reasons I gave before, e.g. if 3 servers each peak at a queue size of 1 but at different times (and are otherwise empty), the maximum cumulative queue size is 1, not 3.
I wrote a script* that would periodically query the stats for each server, get the latest figures, add them, and put them into another RRD file for the totals. So as well as RRD files for each server, I have one for the combined values. As long as the script keeps the combined file up to date enough, it can use unconsolidated values from the individual servers.

This combined RRD can then be consolidated and return the correct max values.

* Which turns out to be more complicated than you might think!
It needs to find the last complete "bucket" of data in each of the source files and check whether the oldest of these is newer than the last bucket in the combined file; if so, it updates the combined file. You need to allow for delayed data from one or more servers (I'm using rrdcached, which can delay updates unless you keep flushing it, which defeats the object of the caching).
But then you also need to allow for one (or more) servers being offline (as far as stats collection goes), so you need to decide how long to wait before treating an individual value as NaN.
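
The bookkeeping described above might look roughly like the following Python sketch. It is not the actual script: the per-server dicts stand in for unconsolidated rows fetched from each RRD, and `combine`, `step` and `max_wait` are hypothetical names and defaults chosen for illustration.

```python
import math

def combine(sources, combined_last, now, step=60, max_wait=600):
    """Return [(timestamp, total)] rows that are ready for the combined file.

    sources:       one dict per server, mapping bucket timestamp -> value
                   (hypothetical stand-in for unconsolidated RRD rows).
    combined_last: timestamp of the last bucket already in the combined file.
    """
    rows = []
    ts = combined_last + step
    while ts <= now:
        values = [src.get(ts) for src in sources]
        if any(v is None for v in values) and now - ts < max_wait:
            # A server may just be late (e.g. cached updates); stop here
            # and try again on the next run, so rows stay in time order.
            break
        # After max_wait a silent server counts as NaN, which makes the
        # total for that bucket NaN (unknown) as well.
        values = [math.nan if v is None else v for v in values]
        rows.append((ts, sum(values)))
        ts += step
    return rows

servers = [{60: 1, 120: 2, 180: 3}, {60: 10, 120: 20}]
print(combine(servers, combined_last=0, now=200))  # [(60, 11), (120, 22)]
```

In this sketch a late server holds back every later bucket, which is one possible policy; a real script would also have to decide whether an unknown total is preferable to a partial sum.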
Munroe Sollog
2015-06-08 14:10:27 UTC
Very interesting. Thank you for clarifying it.

My problem is, as you can see from the graphing function, I am arbitrarily taking numerous rrds
and adding them together to get a cumulative graph. While some of these summation graphs are
predictable, many are not; thus I can't store a cumulative rrd in all situations.

My goal is to minimize this disparity that is caused by taking the MAX of a consolidated function.
I am going to see if using the AVG instead of the MAX of the consolidated function will help, but
if I really want to fix this, is my only recourse not to consolidate at all? I'd be storing one
year's worth of 1min slices for ~5000 devices.
--
Munroe Sollog
LTS - Network Analyst
x85002
Simon Hobson
2015-06-08 14:58:32 UTC
Post by Munroe Sollog
Very interesting. Thank you for clarifying it.
My problem is, as you can see from the graphing function, I am arbitrarily taking numerous rrds
and adding them together to get a cumulative graph. While some of these summation graphs are
predictable, many are not; thus I can't store a cumulative rrd in all situations.
My goal is to minimize this disparity that is caused by taking the MAX of a consolidated function.
I am going to see if using the AVG instead of the MAX of the consolidated function will help, but
if I really want to fix this, my only recourse is to not consolidate? I'd be storing a 1years
worth of 1min slices for ~5000 devices.
That would be a lot of data to store, and to process each time you draw a graph. A quick "finger in the air" guesstimate suggests a few gigs at least. It also needs a lot of memory to make graphs.
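
A rough back-of-envelope version of that estimate, as a Python sketch. It assumes one data source per device, a single RRA holding a full year of 1-minute rows, and 8 bytes per stored value (RRDtool stores values as doubles). File headers are ignored, and the names are made up for illustration.

```python
DEVICES = 5000
ROWS_PER_YEAR = 365 * 24 * 60   # one row per minute for a year
BYTES_PER_VALUE = 8             # each stored value is a double

def rra_bytes(data_sources=1, rras=1):
    """Approximate bytes needed for the stored values alone."""
    return DEVICES * ROWS_PER_YEAR * BYTES_PER_VALUE * data_sources * rras

print(rra_bytes() / 1e9, "GB")
```

Under those assumptions the values alone come to about 21 GB per data source, before any additional consolidated RRAs are added.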

AFAIK there is no way to add together series that have already been consolidated and still get an accurate MAX for the total, so you either have to store and work with unconsolidated data, or aggregate the data before consolidation.

It sounds like you may need to sit down and decide what's important and whether there are any simplifications you can make. E.g., if the devices naturally fall into groups, then consider whether aggregating data by group would do what you want - even if the groups are dynamic in membership it's still possible to aggregate the data, at the cost of more complexity in the scripts that do it.
Steve Shipway
2015-06-08 22:02:07 UTC
(I just noticed that Simon Hobson has already said pretty much the same
thing (including an example!) in a separate post. However, I'll leave this
response on the list just in case it helps give any additional
clarification.)
Post by Munroe Sollog
I have an inconsistency when I graph the same event over 10days versus
48hrs. I would expect because I am using MAX that the peak on the 48hr
graph and the 10day graph for the *same* event would still read the same
number. However, the peak for Sunday noon on the 48hr graph is at 600,
and the peak for that same period on the 10day graph is *higher* at 1.1k.
I believe I have included all the necessary files below. I didn't want to
paste the raw info here as it would be very large. Any insight would be
appreciated.

This is because the max of the sum is not the same as the sum of the max for
two series.

max( a + b ) != max( a ) + max( b )

Your problem is that, in the two graphs, you are likely using different RRAs
which have been consolidated BEFORE you do your addition-and-max
calculations, resulting in different outcomes.

Here is an example.

Series 1 (highest-granularity RRA):
1 2 3 4 5 6
Series 2 (highest-granularity RRA):
12 11 10 9 8 7

Now, working on the highest granularity, we add the series and get:
13 13 13 13 13 13

Take the max of this == 13.

However, if we have a lower-granularity RRA that consolidates 2 data points
using MAX, the series will become:
Series 1 (low-granularity RRA):
2 4 6
Series 2 (low-granularity RRA):
12 10 8

Then, adding the value together gives:
14 14 14

Take the max of this series and you get 14 -- the higher-granularity RRA
results in a lower value.

How to avoid this --

The only way is to do the calculation before any consolidation takes place,
in every case. Try one of these options:
1. Create a separate RRD file that stores the sum of the values, and write
to this every interval as you update the other RRD files. Then report on
this for the total line. This is the easiest for rrdgraph, but it may be
difficult if your updates for the components come in at different times.
2. Extend the highest-granularity RRA to cover all 10 days (or more). Then,
in your rrdgraph command, force the use of this higher-granularity RRA, even
though it would default to the lower-granularity one. This means a lot more
consolidation is done on the fly, but it gets done after the calculation.
This is probably OK for a 10-day graph, but I wouldn't want to do it on a
10-year graph.
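
Both options come down to the same ordering: add first, consolidate afterwards. A small Python sketch using the two series from the example above shows why that keeps the peak stable. The helper names are hypothetical, and this only models MAX consolidation, not rrdgraph itself.

```python
def consolidate_max(series, n):
    """Collapse every n consecutive points into one using MAX."""
    return [max(series[i:i + n]) for i in range(0, len(series), n)]

s1 = [1, 2, 3, 4, 5, 6]
s2 = [12, 11, 10, 9, 8, 7]

def peaks(n):
    """Return (peak when consolidating first, peak when adding first)."""
    # Consolidate each series separately, then add: peak depends on n.
    c1, c2 = consolidate_max(s1, n), consolidate_max(s2, n)
    consolidate_then_add = max(x + y for x, y in zip(c1, c2))
    # Add at full resolution, then consolidate: peak is the same for any n.
    total = [x + y for x, y in zip(s1, s2)]
    add_then_consolidate = max(consolidate_max(total, n))
    return consolidate_then_add, add_then_consolidate

for n in (1, 2, 3, 6):
    print(n, peaks(n))
# 1 (13, 13)
# 2 (14, 13)
# 3 (15, 13)
# 6 (18, 13)
```

The first number grows as more points are merged per bucket, which matches the higher peak seen on the longer graph; the second stays at 13 regardless of the bucket size.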

Steve

Steve Shipway
***@auckland.ac.nz
