[rrd-users] Can I stuff an RRD with data after-the-fact?
Alan McKay
2014-09-16 13:49:01 UTC
Hi folks,

I know this may not seem to make sense to most of you but it does to me :-)

For various reasons I want to collect data in my own logfile formatted thus :

<EPOCHTIME>:data1:data2

And then when I want to see a graph, copy that over to another host and feed it
into an RRD file, then graph it from there. But of course in doing so I get the
error :

"illegal attempt to update using time" "when last update time is"

I did some googling and found a similar question in 2008 where Mr Oetiker said
this was not possible. Just wondering whether anything has changed since then.

Is there a way to do this? Basically do the following but tell it to
force timestamps

#!/bin/bash

while read line
do
rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil $line
done < diskperf-1yr-5sec.log

thanks,
-Alan
--
"Don't eat anything you've ever seen advertised on TV"
- Michael Pollan, author of "In Defense of Food"
Simon Hobson
2014-09-16 15:00:07 UTC
Post by Alan McKay
<EPOCHTIME>:data1:data2
And then when I want to see a graph, copy that over to another host and feed it
into an RRD file, then graph it from there.
That's OK.
Post by Alan McKay
But of course in doing so I get the
"illegal attempt to update using time" "when last update time is"
There's no "of course" about it.
Post by Alan McKay
Is there a way to do this? Basically do the following but tell it to
force timestamps
#!/bin/bash
while read line
do
rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil $line
done < diskperf-1yr-5sec.log
That will work just fine *IF* none of the data in your file is older than data in the RRD file AND all the timestamps in the file are in increasing order. With each update, the RRD will update just the same as if you'd fed them in in real time as the data was collected.
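The ordering condition above can be checked before importing anything. A minimal sketch, assuming every log line has the `<EPOCHTIME>:data1:data2` shape from the original post; the function name `check_increasing` is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the first field (epoch time) of every
# line is strictly greater than that of the line before it.
check_increasing() {
    awk -F: '
        NR > 1 && $1 + 0 <= prev + 0 {
            # Report the first offending line and stop reading.
            printf "line %d: timestamp %s is not after %s\n", NR, $1, prev
            bad = 1
            exit
        }
        { prev = $1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (file name taken from the thread):
#   check_increasing diskperf-1yr-5sec.log && echo "timestamps OK"
```

This only validates the log itself; it says nothing about what is already stored in the RRD file.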

Now, if you have (say) a big data file that you keep adding to, and you are trying to update an existing RRD file that's already had some of the data inserted then you'll get the error. In that case, you'd need to modify the script a bit. I see two ways of dealing with it :

1) Just throw away STDERR so you don't see the errors. RRD will barf on each update it's already seen, but then add the new stuff.

2) Have your script use rrdtool last to get the timestamp of (IIRC) the last complete bucket, and "discard" all the entries older than this - then insert the new values. You might still get one or two errors - IIRC rrdtool last gives the timestamp of the last complete bucket (aka Primary Data Point) which is likely to be earlier than the timestamp of the last value inserted.
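Option 2 could be sketched roughly as follows. This assumes the `<EPOCHTIME>:data1:data2` log format from the original post; `newer_than` is a hypothetical helper name. As far as I can tell from the rrdtool documentation, `rrdtool last` prints the timestamp of the most recent update, so the filter may well be exact, but treat that as something to verify on your own version.

```shell
#!/bin/bash
# Hypothetical helper: print only the log lines whose timestamp (first
# colon-separated field) is strictly newer than the cutoff given as $1.
newer_than() {
    awk -F: -v cutoff="$1" '$1 + 0 > cutoff + 0' "$2"
}

# Usage (needs rrdtool and the files named in the thread):
#   last=$(rrdtool last foobar-1yr-5sec.rrd)
#   newer_than "$last" diskperf-1yr-5sec.log |
#   while IFS= read -r line
#   do
#       rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil "$line"
#   done
```

Filtering first means the update loop never sees the stale entries, so there is no need to throw away STDERR and risk hiding unrelated errors.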

Lastly, have you considered using rrdcached ? Collect the data on one machine, and do rrdtool updates from there specifying the cached address - the data is then transferred to the other machine and the RRD updated in real time*. It works really well for distributed data collection like this - I use it on many of my systems.

* Subject to flushing the cache.
Alan McKay
2014-09-16 18:17:06 UTC
Post by Simon Hobson
That will work just fine *IF* none of the data in your file is older than data in the RRD file AND all the timestamps in the file are in increasing order. With each update, the RRD will update just the same as if you'd fed them in in real time as the data was collected.
Well I only wrote this because it did not seem to work at all. It was
a fresh RRD - newly created. And timestamps were increasing.
Post by Simon Hobson
2) Have your script use rrdtool last to get the timestamp of (IIRC) the last complete bucket, and "discard" all the entries older than this - then insert the new values. You might still get one or two errors - IIRC rrdtool last gives the timestamp of the last complete bucket (aka Primary Data Point) which is likely to be earlier than the timestamp of the last value inserted.
That's not a bad idea
Post by Simon Hobson
Lastly, have you considered using rrdcached ? Collect the data on one machine, and do rrdtool updates from there specifying the cached address - the data is then transferred to the other machine and the RRD updated in real time*. It works really well for distributed data collection like this - I use it on many of my systems.
I'll have a look at it, but in our environment any kind of extraneous
agent or daemon can only be introduced after considerable scrutiny.
Which is why I did not want to run rrdtool locally. Though I'm going
through the process to get it introduced.
It is technically not a daemon like that, so should be good.
Also, I can't really allow data to be pushed from a host to a collector.
I could only pull from the collector.

Thanks for your reply.
--
"Don't eat anything you've ever seen advertised on TV"
- Michael Pollan, author of "In Defense of Food"
Alan McKay
2014-09-16 20:09:30 UTC
I think my basic problem is that when I create the RRD it must put in
a "last update time" of the time of creation.
Then all my datapoints are basically before that time.

ERROR: foobar-1yr-5sec.rrd: illegal attempt to update using time
1410898046 when last update time is 1410898050 (minimum one second
step)
Johan Elmerfjord
2014-09-16 20:19:31 UTC
Hi Alan,

That sounds right to me.

You can specify the start time of the RRD file when you create it,
like:

rrdtool create $rrdFileName --start 1325372400 --step 300 ..and all the rest...

Hopefully that solves your problem.
RRD needs to fill in all the values from start until your first value - so creating a file long before your first value will probably be slow on the first update.

But that can easily be detected and adjusted I assume.
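One way to do that detection is to derive the start time from the log itself, so the RRD begins just before the first sample and there is no long gap to fill. A minimal sketch, assuming the `<EPOCHTIME>:data1:data2` log format from the original post; `start_before_first` is a hypothetical helper name. It subtracts one second because, per the rrdtool documentation, data timed at or before `--start` is rejected.

```shell
#!/bin/bash
# Hypothetical helper: print a start time one second before the first
# timestamp found in the log file given as $1.
start_before_first() {
    local first
    first=$(head -n 1 "$1" | cut -d: -f1)
    echo $(( first - 1 ))
}

# Usage (the DS and RRA definitions are elided here, as in the thread):
#   start=$(start_before_first diskperf-1yr-5sec.log)
#   rrdtool create foobar-1yr-5sec.rrd --start "$start" ..and all the rest...
```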

Good luck!

/Johan

Simon Hobson
2014-09-17 06:59:06 UTC
Post by Alan McKay
Post by Simon Hobson
That will work just fine *IF* none of the data in your file is older than data in the RRD file AND all the timestamps in the file are in increasing order. With each update, the RRD will update just the same as if you'd fed them in in real time as the data was collected.
Well I only wrote this because it did not seem to work at all. It was
a fresh RRD - newly created. And timestamps were increasing.
As already mentioned, you can specify the start time for an RRD when it's created - the default is "now".
Post by Alan McKay
Post by Simon Hobson
Lastly, have you considered using rrdcached ? Collect the data on one machine, and do rrdtool updates from there specifying the cached address - the data is then transferred to the other machine and the RRD updated in real time*. It works really well for distributed data collection like this - I use it on many of my systems.
I'll have a look at it, but in our environment any kind of extraneous
agent or daemon can only be introduced after considerable scrutiny.
Which is why I did not want to run rrdtool locally. Though I'm going
through the process to get it introduced.
It is technically not a daemon like that, so should be good.
Also, I can't really allow data to be pushed from a host to a collector.
I could only pull from the collector.
Do you run Nagios (or anything similar) ? For some of my stats, I twigged that Nagios already collected some performance stats (but not all I wanted) - it just needs a bit of scripting to drop that data into a file and process it.
Simon Hobson
2014-09-17 09:32:34 UTC
Post by Simon Hobson
Do you run Nagios (or anything similar) ? For some of my stats, I twigged that Nagios already collected some performance stats (but not all I wanted) - it just needs a bit of scripting to drop that data into a file and process it.
http://nagios.sourceforge.net/docs/3_0/perfdata.html
Alan McKay
2014-09-17 11:28:12 UTC
Post by Johan Elmerfjord
You can specify the starttime of the rrd-file when you create it,
rrdtool create $rrdFileName --start 1325372400 --step 300 ..and all the rest...
Oh I missed that first time around - I'll double check that at work
today and report back.
--
"Don't eat anything you've ever seen advertised on TV"
- Michael Pollan, author of "In Defense of Food"
Alan McKay
2014-09-17 13:21:18 UTC
Bingo - it was the create time.
When I create the RRD with a time of "one year ago" I can do the
import just fine with

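# Read the log line by line; -r keeps backslashes literal and the quotes
# pass each line to rrdtool as a single argument.
while IFS= read -r line
do
    rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil "$line"
done < diskperf-1yr-5sec.log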
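A year of 5-second samples is several million lines, and the loop above starts one rrdtool process per line. Since `rrdtool update` accepts several `timestamp:value:value` arguments in a single call, the import might be sped up by batching. A rough sketch, not something tested in this thread; `batch_feed` is a hypothetical helper name, the batch size of 500 is arbitrary, and it assumes the log lines contain no whitespace, quotes, or backslashes (which `xargs` would interpret).

```shell
#!/bin/bash
# Hypothetical helper: run a command once per batch of log lines.
#   $1 = log file, $2 = lines per batch, remaining args = command prefix.
# Each batch of lines is appended to the command as extra arguments.
batch_feed() {
    local log=$1 size=$2
    shift 2
    xargs -n "$size" "$@" < "$log"
}

# Usage (file names taken from the thread):
#   batch_feed diskperf-1yr-5sec.log 500 \
#       rrdtool update foobar-1yr-5sec.rrd -t diskiops:diskutil
```

One trade-off is that a single bad timestamp may cause the rest of its batch to be rejected, so a smaller batch size limits the damage.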
