[rrd-users] find top 10 in > 4000 rrd files
Rob Hassing
2014-10-02 08:29:35 UTC
I am measuring the bandwidth usage of over 4000 ports in a network using
sFlow.

The sFlow daemon I use generates an RRD file for each port.

So I have over 4000 RRD files named x.x.x.x-Y.rrd,
where x.x.x.x is the IP address of the host and Y is the port index number.

Now I would like to find the top 10 ports by bandwidth usage across these 4000 files.

Has anyone done something like this before, or does anybody have an idea of how
to do this?

Thank you very much in advance

Best regards,
Rob Hassing



--
View this message in context: http://rrd-mailinglists.937164.n2.nabble.com/find-top-10-in-4000-rrd-files-tp7582509.html
Sent from the RRDtool Users Mailinglist mailing list archive at Nabble.com.
Martin Sperl
2014-10-02 08:49:42 UTC
Here is a simple way to do it:

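# For each RRD, PRINT "filename=value"; grep = drops rrdtool's other output
# (such as the image size line), then sort by value and keep the 10 largest.
for x in *.rrd; do
rrdtool graph /dev/null \
--start ... --end ... --step ... \
"DEF:x=$x:column:AVERAGE" \
"VDEF:y=x,MAXIMUM" \
"PRINT:y:$x=%lf"
done \
| grep = \
| sort -t= -nrk 2 \
| head -n 10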

and you get the top 10.

Set --start, --end and --step, and the data source ("column") you are interested in.
You may also need to modify the VDEF to match what you want to measure.
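The ranking stage can also be written as a separate helper. This is a minimal sketch that assumes the loop above emits "file=value" lines. top_n is a hypothetical helper name.

It also drops NaN results, which are assumed to print as "nan" or "-nan" for files with no data in the window. It forces the C locale so that sort -n expects "." as the decimal point.

```shell
#!/bin/sh
# Hypothetical helper: read "file=value" lines on stdin, print the N largest.
top_n() {
  n=${1:-10}
  # keep only PRINT results, drop unknown (NaN) values,
  # then sort numerically on the part after "=" in descending order
  grep = |
    grep -v -i -E '=-?nan$' |
    LC_ALL=C sort -t= -k2,2nr |
    head -n "$n"
}
```

For example, the `| grep = | sort ... | head -n 10` tail of the loop could be replaced by `| top_n 10`.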

Note: this assumes there is no ":" or "=" in the filename. Otherwise you would
need to escape the ":" in the filename and use a separator other than "="
for the output and sorting!
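If a file name does contain ":", a helper along these lines could escape it before it goes into the DEF. This assumes rrdtool's convention of writing a literal colon in a DEF file name as "\:". The function name is hypothetical, and it does nothing about "=": for that, pick another output separator as noted above.

```shell
#!/bin/sh
# Hypothetical helper: escape every ":" as "\:" (assumed rrdtool DEF convention).
escape_rrd_path() {
  printf '%s\n' "$1" | sed 's/:/\\:/g'
}
```

In the loop, the DEF argument would then become `"DEF:x=$(escape_rrd_path "$x"):column:AVERAGE"`.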

Martin
_______________________________________________
rrd-users mailing list
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
Simon Hobson
2014-10-02 08:52:14 UTC
Post by Rob Hassing
Now I would like to find the top 10 of bandwidth usage in these 4000 files.
There's nothing built into RRDtool to do this. I think you'll need to use rrdtool fetch or graph* to get the data you want from each file, stuff it into a "normal" database, and then run your query on the database you've created.

* rrdtool fetch will only give you the actual values stored in the RRD file (multiple values if you ask for a period other than a single CDP). If you use rrdtool graph, you can use all the RPN functions to process the data and then use "PRINT" (not "GPRINT") to output the single value you are (I assume) after.
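If you go the rrdtool fetch route, its text output still has to be reduced to one number per file. This is a minimal sketch that assumes fetch's usual layout: a header line of DS names, a blank line, then `timestamp: value ...` rows, with unknown values printed as nan. mean_first_ds is a hypothetical helper, and it only looks at the first data source.

```shell
#!/bin/sh
# Hypothetical helper: mean of the first DS column in `rrdtool fetch` output,
# skipping unknown (NaN) rows. Prints "nan" if no row had a value.
mean_first_ds() {
  LC_ALL=C awk '
    /^[0-9]+:/ {                      # data rows start with "timestamp:"
      v = $2
      if (tolower(v) ~ /nan/) next    # unknown value in this row
      sum += v; n++
    }
    END { if (n) printf "%.6f\n", sum / n; else print "nan" }'
}
```

Usage would be along the lines of `rrdtool fetch file.rrd AVERAGE --start ... --end ... | mean_first_ds`, one call per RRD.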

Alternatively, you might run a periodic task to fetch certain data from your RRDs and insert/update a "normal" database. You could then run queries against that database.
For example, suppose you have data consolidated to 24-hour resolution in your RRDs, and shortly after midnight you pull this and update your (e.g.) SQL database. If you want the "top 10 for the previous 7 days", you run a query that groups by x.x.x.x-Y with the total of d (e.g. SUM(d)) in the select clause and sorts by that total in descending order. The first 10 results are your top 10.
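The same group-by-and-sum idea can be sketched without a database. This assumes a hypothetical export format of one whitespace-separated "port day avg_rate" line per port per day, for instance appended by a nightly job. Restricting the input to the last 7 days is left to whatever produces it.

Summing daily average rates ranks ports the same way as summing daily traffic volumes, since every day contributes its average multiplied by the same 86400 seconds. top_by_total is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: sum avg_rate per port (like GROUP BY port with SUM(d)),
# then sort by the total in descending order and print the top N.
top_by_total() {
  n=${1:-10}
  LC_ALL=C awk '{ total[$1] += $3 }
    END { for (p in total) printf "%s %.6f\n", p, total[p] }' |
    LC_ALL=C sort -k2,2nr |
    head -n "$n"
}
```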


Neither way is right or wrong, they just have different tradeoffs. The first method involves a lot of work (and modest storage) when you want to run the query, but nothing at other times. The second method involves periodic work and more storage even if you don't make any queries - but when you do run queries the results will come out quicker. So a lot depends on how often you will want to be running this.
David Thornton
2014-10-02 14:18:47 UTC
I'd like to suggest that you could speed this up considerably on a multi-core
machine by using GNU "parallel":

http://savannah.gnu.org/projects/parallel/

Do the extraction in parallel, one rrdtool call per file ($x):

rrdtool graph /dev/null \
--start ... --end ... --step ... \
"DEF:x=$x:column:AVERAGE" \
"VDEF:y=x,MAXIMUM" \
"PRINT:y:$x=%lf" \
| grep = > "/ramdisk/$x.data"

Then concatenate all the small files, sort them, and print the top 10.

(Presumably the rrdtool invocations are what take most of the computing
power.)
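Here is a sketch of the fan-out/merge pattern described above. It swaps in xargs -P (supported by GNU and BSD xargs, though not required by POSIX) for GNU parallel.

It assumes EXTRACT is an exported, hypothetical command that prints "name=value" for the file given as its argument, for example a small script wrapping the rrdtool graph call. parallel_top and its arguments are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: parallel_top OUTDIR JOBS N FILE...
# Runs "$EXTRACT FILE" for every file, at most JOBS at a time, writing one
# small OUTDIR/<basename>.data file per input, then merges and ranks them.
parallel_top() {
  outdir=$1 njobs=$2 n=$3
  shift 3
  printf '%s\n' "$@" |
    xargs -P "$njobs" -I{} sh -c \
      '"$EXTRACT" "$1" | grep = > "$2/$(basename "$1").data"' _ {} "$outdir"
  # xargs waits for all workers, so every .data file is complete here.
  cat "$outdir"/*.data | LC_ALL=C sort -t= -k2,2nr | head -n "$n"
}
```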

David
Steve Shipway
2014-10-02 20:40:15 UTC
Post by Rob Hassing
I am measuring the bandwidth usage of over 4000 ports in a network using
sFlow.
...
Post by Rob Hassing
Now I would like to find the top 10 of bandwidth usage in these 4000
files.
Post by Rob Hassing
Anyone done something like this before or does somebody have an idea on
how to do this?
The Routers2 frontend for MRTG/RRD has a feature that does something similar:
it orders the component interface graphs on the Compact display or summary
page by current or average in or out traffic. It also lets you suppress
interfaces with zero usage, though it doesn't order by total usage.

However, with 4000+ RRD files there's a fair amount of processing required,
particularly if you want to order by total usage (i.e. the sum over the time
period of average rate x interval). Ordering by daily total is the same as
ordering by daily average, because daily total = daily average rate x 1 day.
So make sure you have an RRA that pre-calculates the daily average, so that you
can retrieve it with a single rrdtool fetch of a single value per RRD.
Then you just need to sort the resulting 4000+ element array to get the
top 10.
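The equivalence can be written out as a short derivation. It assumes the day is covered by K consolidation steps of length τ (so Δt = Kτ = 86400 s) and that no step is unknown. Here r_{i,k} is port i's average rate in step k and T_i its total traffic for the day.

```latex
T_i = \sum_{k=1}^{K} r_{i,k}\,\tau
    = K\tau \cdot \frac{1}{K}\sum_{k=1}^{K} r_{i,k}
    = \Delta t\,\bar{r}_i,
\qquad \Delta t = K\tau = 86400\ \mathrm{s}
\\
T_i > T_j \iff \bar{r}_i > \bar{r}_j \quad \text{because } \Delta t > 0
```

If some steps are unknown, the stored average covers fewer steps and the equality only holds approximately.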

Steve

Steve Shipway
***@auckland.ac.nz
Tobi Oetiker
2014-10-03 05:23:29 UTC
Note that rrdgraph CAN handle large numbers of RRDs; just make sure to use 1.4.9. 50000 DEFs are well within the realm of possibility.
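Following that point, a single rrdtool graph call can cover every file if it gets one DEF/VDEF/PRINT triple per RRD. This is a minimal sketch that only builds the argument list. graph_args and the vnames v0, m0, ... are hypothetical, and "column" is the same placeholder data source name used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: graph_args DS FILE...
# Prints one rrdtool graph argument per line: DEF, VDEF and PRINT for each file.
graph_args() {
  ds=$1
  shift
  i=0
  for f in "$@"; do
    printf 'DEF:v%d=%s:%s:AVERAGE\n' "$i" "$f" "$ds"
    printf 'VDEF:m%d=v%d,MAXIMUM\n' "$i" "$i"
    printf 'PRINT:m%d:%s=%%lf\n' "$i" "$f"
    i=$((i + 1))
  done
}
```

Pass the printed lines, one per argument, to a single `rrdtool graph /dev/null` call with the same --start/--end/--step as before. Then pipe its output through the same grep, sort and head stages. Very large file sets may run into the operating system's command-line length limit.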

Tobias Oetiker
***@oetiker.ch
062 775 9902