Discussion:
[rrd-users] Getting an overview of many statistics...
Peter Valdemar Mørch
2015-05-29 09:38:24 UTC
Hi,

I'm looking for a little inspiration and experience here.

We have a customer that has about 400 interfaces and he'd like to get an
overview of "How these interfaces are doing". When there are more than
about 15-20, looking at each individual graph simply breaks down.

My user wants an idea of what the "normal" situation is, and information
about the worst outliers / extreme cases.

Looking at average and standard deviation is a possibility, but most of my
users (and I) really have no good intuitive feeling for what standard
deviation really "means". Plus "outlier/extreme" information is lost.

I've seen that smokeping does something interesting, see e.g.

http://oss.oetiker.ch/smokeping-demo/img/Customers/OP/james~octopus_last_10800.png

The "historgram" approach where darker grey implies more datapoints in this
"region" could be cool. This gives the overview. Have no idea how this is
accomplished, though.

I was thinking of using a "histogram" approach like the above, overlaid
with the actual graphs of the N worst outliers/extremes. But that implies
lots of scripting and analysis (I'm guessing) to create the histogram and
to identify the outliers.

So: What have you guys done when creating an overview of many statistics?
I'll leave you with this picture from the gallery:

http://oss.oetiker.ch/rrdtool/gallery/576_nodes.png

This is exactly the situation I want to avoid....

Sincerely,

Peter
--
Peter Valdemar Mørch
http://www.morch.com
Alex van den Bogaerdt
2015-05-29 10:38:08 UTC
I would generate a number of squares, showing either green, amber or red.
Clicking on a square of interest would bring up detailed information for
that interface (together with an RRDtool graph).

The layout of the squares depends on what I'm looking at. It could be a
geographic map, a network map, or even just a matrix of 32 columns by as
many rows as needed.

All green is all good.

I would NOT include logic which compares one interface to the others. Bad is
still bad, even if other interfaces are also bad. If all interfaces are
doing equally badly, you would want to show all red, not all green.

Squares could show information from the last hour and the last hour only.
Or, if so desired, a gradient from red to green, as a relative percentage
of good vs. bad over the last 24 hours or so.
Squares could be divided into 2 or 4 triangles, showing independent
variables.
There will be some point where adding more information to the overview
results in less readability.

Defining 'normal', 'not so good' and 'really bad' is a challenge which needs
to be determined together with the customer. After all, these are his
interfaces and his expectations. Bandwidth utilisation near 100% and packet
loss would probably be important factors to make decisions. Maybe each
interface could have its own set of limits.

Input data for the script can come from 'rrdtool graph'. Do not use graphing
elements; use PRINT instead of GPRINT, and you can get averages, maxima, et
cetera to use in your decision tree. The more information you need to
extract, the more computing power will be needed.
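
For example, something like this (untested, and the file name and DS name
are just placeholders for your own layout) prints an average and a maximum
without drawing anything:

rrdtool graph /dev/null --start -86400 \
  DEF:in=eth0.rrd:traffic_in:AVERAGE \
  VDEF:avg=in,AVERAGE \
  VDEF:peak=in,MAXIMUM \
  PRINT:avg:"avg %.0lf" \
  PRINT:peak:"max %.0lf"

The PRINT lines appear on stdout (together with a line giving the image
size), ready for the script to parse.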

Can RRDtool do the rest of what I suggested? No. RRDtool is not a graphing
program, and although it is sometimes 'abused' as such, in many cases this
involves unnecessary complexity. Creating, filling and reading a database
every time just to display 24 columns (1 for each hour) is IMHO a waste of
resources. Just script it, or write a program in the language of choice.
Depending on how complex you want to make it, you could create the overview
page using a script generating html and css only, or create a complex
program which uses a graphics library and generates a clickable map.
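
As a rough, untested illustration of the html-and-css route, in Python -
the thresholds, interface names and numbers below are all invented
placeholders, and the real limits would come from the per-interface
agreements mentioned above:

#!/usr/bin/env python
# One coloured square per interface, 32 columns wide; each square
# links to a detail page carrying the full RRDtool graph.

def colour(utilisation, loss):
    # Placeholder decision tree; agree the real limits with the customer.
    if utilisation > 0.90 or loss > 0.05:
        return "#c00"   # red: really bad
    if utilisation > 0.70 or loss > 0.01:
        return "#fa0"   # amber: not so good
    return "#0a0"       # green: normal

def overview(interfaces, columns=32):
    # interfaces: iterable of (name, utilisation, loss) tuples
    cells = []
    for i, (name, util, loss) in enumerate(interfaces, 1):
        cells.append(
            '<a href="detail/%s.html" title="%s" style="display:'
            'inline-block;width:14px;height:14px;margin:1px;'
            'background:%s"></a>' % (name, name, colour(util, loss)))
        if i % columns == 0:
            cells.append("<br>")
    return "<html><body>%s</body></html>" % "".join(cells)

print(overview([("eth0", 0.95, 0.00), ("eth1", 0.20, 0.00)]))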

I'm sure others will have more suggestions, or can even provide suggestions
for existing software to use instead of reinventing the wheel.

HTH
cheers,
Alex


Simon Hobson
2015-05-29 11:07:53 UTC
Post by Peter Valdemar Mørch
Looking at average and standard deviation is a possibility, but most of my users (and I) really have no good intuitive feeling for what standard deviation really "means".
+1, I don't either
Post by Peter Valdemar Mørch
I've seen that smokeping does something interesting, see e.g.
http://oss.oetiker.ch/smokeping-demo/img/Customers/OP/james~octopus_last_10800.png
The "histogram" approach where darker grey implies more datapoints in this "region" could be cool. This gives the overview. Have no idea how this is accomplished, though.
I'm not sure "dark = more" in the way you are expecting. I suspect it's more a case of shading ranges - so the central range (say the range that contains from 40% to 60% of the results when sorted by value) is drawn dark, the ranges either side of that are drawn lighter, and so on until the outermost ranges (e.g. from the smallest up to the 10% point, and from the 90% point to the largest) are drawn in light grey. Finally the median is drawn as a line - whose colour indicates packet loss.

The areas can be drawn three ways. Let's assume we have 11 values, representing the ping times for the fastest (t0), the 10th percentile (t1), through to the slowest (t10).

We can draw t0 to t10 in very light grey, then overlay t1 to t9 in less light grey, t2 - t8, t3 - t7, and finally overlay t4 to t6 in dark grey/black.
Or we can draw t0 to t1, stack t1 to t2, stack t2 to t3, t3 - t4, t4 - t5, t5 - t6, t6 - t7, t7 - t8, t8 - t9, and finally t9 - t10.
Or we can draw 0-t10 in light grey, then draw 0-t9 in darker grey, and so on until you've drawn 0-t1 in the darkest grey. Then draw 0-t0 in white to "erase" the bit between the axis and the lowest value.

None of these is right or wrong - personally I'd do it the first way, which would be (from memory) something like:
CDEF:t010=t10,t0,-
CDEF:t19=t9,t1,-
...
AREA:t0#FFFFFF00 <- *
AREA:t010#E0E0E0::STACK
AREA:t1#FFFFFF00 <- *
AREA:t19#C0C0C0::STACK
...
* Note that I've used full transparency to draw "nothing" from the axis up to the bottom of each range.


Then you need to draw the line, and again you need to generate bands and then draw several overlays. Again there is more than one way:
You can draw the line, in each colour, only where that colour is needed; or you can draw the line in each colour, overlaying each colour on top of the previous one.
E.g., you could draw a red line all the way, then draw the light blue line only where packet loss < 19/20, and so on until you draw the green line only where loss = 0. Or you can draw the red line only where loss >= 19/20, the light blue line only where loss >= 10/20 and < 19/20, and so on. The line itself is drawn at the median value (t5 - bet you were wondering where that had gone!)
Something like this:
LINE:t5#FF0000
CDEF:l10=loss,19,LT,t5,UNKN,IF
LINE:l10#0000FF
...
Which means: draw t5 in red, then calculate l10, which equals t5 where loss < 19 and is otherwise set to unknown, then draw that in blue. Where l10 is unknown the line is not drawn, and the red line shows through.
Repeat for the other steps.

So it's not actually all that hard to draw.
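
For reference, here is the whole trick stitched together as one untested command, with just two bands and two line colours - the RRD file name, DS names and the loss threshold are all invented placeholders:

rrdtool graph overview.png --start -10800 \
  DEF:t0=ping.rrd:t0:AVERAGE \
  DEF:t1=ping.rrd:t1:AVERAGE \
  DEF:t5=ping.rrd:t5:AVERAGE \
  DEF:t9=ping.rrd:t9:AVERAGE \
  DEF:t10=ping.rrd:t10:AVERAGE \
  DEF:loss=ping.rrd:loss:AVERAGE \
  CDEF:t010=t10,t0,- \
  CDEF:t19=t9,t1,- \
  AREA:t0#FFFFFF00 \
  AREA:t010#E0E0E0::STACK \
  AREA:t1#FFFFFF00 \
  AREA:t19#C0C0C0::STACK \
  LINE:t5#FF0000 \
  CDEF:ok=loss,1,LT,t5,UNKN,IF \
  LINE:ok#00C000

The transparent areas reset the stack base each time, the greys form the bands, and the green line overwrites the red median wherever loss is below the made-up threshold.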
Post by Alex van den Bogaerdt
I would generate a number of squares, showing either green, amber or red.
Just be aware that those colours are the ones affected by the most common form of colour blindness (red-green deficiency), which affects about 1 in 12 males! Many "red/amber/green" graphs are almost invisible to me (it depends on the area, the specific colours used, the display, ambient light, etc.) and with some of them I really cannot see a change between the colour sections without blowing the screen up to make the areas larger.
A good example is the UniFi software from Ubiquiti, where on one page it shows a "health" bar for each access point: red where there is a lot of competing traffic and packet loss, and green for "good". The bars are thin, and it was a while before I even realised that there was a red section on some of them!
Alex Aminoff
2015-05-29 13:14:21 UTC
Post by Simon Hobson
Looking at average and standard deviation is a possibility, but most of my users (and I) really have no good intuitive feeling for what standard deviation really "means".
+1, I don't either
I recommend "Full House", by Stephen Jay Gould, or other essays of his.

Summary of one of his most well-known explanations: Why are there no more
.400 hitters in baseball? Has the average quality of batters gone down, or
the average quality of pitchers gone up, or some change to the rules that
makes batting harder in general? No, none of those. What has happened is
that the variability of batting has shrunk. So there is less distance
between the very top batters and the rest of the (major league, already a
select group) batters.

Standard deviation is a measure of variability; I think of it as the range
within which an observed value is about 68% likely to fall by random chance
alone (as opposed to differing from the expected value because of some real
cause).

If Babe Ruth bats .300 in 1915 and .320 in 1916 (I am making up these
numbers), you would not think it was a big deal, because a .020 difference
in batting average is pretty small compared to the standard deviation of
player batting averages at the time. Whereas if David Ortiz bats .300 in
2015 and .320 in 2016, you might be justified in thinking this is the
result of something he is doing differently, because the .020 difference is
big compared to the standard deviation of player batting averages in 2015.
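
To make that concrete in a few lines of Python (the batting averages below
are invented, like the ones above):

from statistics import stdev

# Invented spreads: wide in 1915, narrow in 2015. The same .020
# change looks very different against each era's spread.
era_1915 = [0.240, 0.260, 0.280, 0.300, 0.320, 0.350, 0.380]
era_2015 = [0.255, 0.260, 0.265, 0.270, 0.275, 0.280, 0.285]

for label, averages in (("1915", era_1915), ("2015", era_2015)):
    sd = stdev(averages)
    print("%s: stdev %.3f, a .020 change is %.1f standard deviations"
          % (label, sd, 0.020 / sd))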

Anyway, I wanted to respond to the OP with a script I wrote, attached. The
documentation is very scanty, but you never know when something will be
useful to someone.

- Alex Aminoff
BaseSpace.net
National Bureau of Economic Research (nber.org)

Loading...