[rrd-users] How to calculate desired value?
spock
2014-08-22 17:20:53 UTC
Hello,

I am currently working on a swimming pool control.
I store values every 15 min (precise timestamp; 16:15:00; 16:30:00; etc.)
and for every 15 min I store the "heat_flag", which is either "0" (heating
off) or "1" (heating on).

# rrdtool create temp_pool.rrd --step 900 --start "20140101 00:00" \
# DS:pump_flag:GAUGE:1200:0:1 \
# DS:heat_flag:GAUGE:1200:0:1 \
# DS:device:GAUGE:1200:0:90 \
# RRA:AVERAGE:0.5:1:103680 \
# RRA:MIN:0.5:96:3650 \
# RRA:MAX:0.5:96:3650 \
# RRA:AVERAGE:0.5:96:3650

I would like to calculate the total heating time per day.
For that, I would like to find the number of "1" values per day.
In the end, multiply by 15min and I know the total heating time.

I tried:
CDEF:heattime2=heat_flag,900,*,60,/ \
VDEF:totalheattime=heattime2,AVERAGE \
GPRINT:totalheattime:"Total\: %2.2lf h \n"
but I get wrong results. I probably have to do the calculation completely
differently.

Can anyone help me out here?

I should get perfectly aligned values as a result: 0.25 h; 0.50h; etc.
How can I achieve this?

Thanks a million,
spo



--
View this message in context: http://rrd-mailinglists.937164.n2.nabble.com/How-to-calculate-desired-value-tp7582375.html
Sent from the RRDtool Users Mailinglist mailing list archive at Nabble.com.
Simon Hobson
2014-08-23 14:06:46 UTC
Post by spock
CDEF:heattime2=heat_flag,900,*,60,/ \
VDEF:totalheattime=heattime2,AVERAGE \
GPRINT:totalheattime:"Total\: %2.2lf h \n"
but I get wrong results. I probably have to do the calculation completely
differently.
Can anyone help me out here?
I should get perfectly aligned values as a result: 0.25 h; 0.50h; etc.
How can I achieve this?
Give us a clue to go on: what values do you get?
Is it that you expected (say) 15 but got 14.99, or did you get something completely different?
Your approach is right; I suspect rounding errors come into it.
spock
2014-08-23 19:57:03 UTC
Hi Simon, thanks for helping me out.
I will include a more detailed example here:
<http://rrd-mailinglists.937164.n2.nabble.com/file/n7582382/tempday1.png>

The graph shows 24h; the database has a step of 900.
Let's take the grey area:
we have "1" values from 09:30 to 19:45 = 10.25h.

relevant graph parameters:

DEF:heat_flag=temp_pool.rrd:heat_flag:AVERAGE \
DEF:pump_flag=temp_pool.rrd:pump_flag:AVERAGE \
CDEF:pumptime=pump_flag,1,EQ,INF,UNKN,IF \
CDEF:heattime=heat_flag,1,EQ,INF,UNKN,IF \
CDEF:heattime2=heat_flag,900,*,60,/ \
CDEF:pumptime2=pump_flag,900,*,60,/ \
VDEF:totalpumptime=pumptime2,AVERAGE \
VDEF:totalheattime=heattime2,AVERAGE \
AREA:pumptime#10101010 \
AREA:heattime#FF000015 \
LINE2:pump_flag#000000 \
GPRINT:totalpumptime:"Filterzeit\: %2.2lf h " \
GPRINT:totalheattime:"Heizzeit\: %2.2lf h \n"

As you can see, the calculated totalpumptime is 6.49h, although it
should be 10.25h.
The LINE2:pump_flag line clearly shows that the data is there: all "1" values
during that period, and "0" values for the rest of the day.
Thanks again,
spock



Alex van den Bogaerdt
2014-08-24 01:17:04 UTC
----- Original Message -----
From: "spock" <***@sappers.de>
To: <rrd-***@lists.oetiker.ch>
Sent: Saturday, August 23, 2014 9:57 PM
Subject: Re: [rrd-users] How to calculate desired value?
Post by spock
Hi Simon, thanks for helping me out.
<http://rrd-mailinglists.937164.n2.nabble.com/file/n7582382/tempday1.png>
The graph shows 24h; database has step 900
we have "1" values from 09:30 to 19:45 = 10.25h.
I think it is 09:15, and 10.5 hours.

Relevant parameters include start and end time, number of pixels.
You should have "--end {some timestamp equal to midnight} --start
end-24h --width {any number being a whole multiple of 96}"
If not, something has to give and question #1 (see below) is answered.

You have a rate of 1 from 09:15 to 19:45: 10.5 hours.
You have a rate of 0 from 00:00 to 09:15 and from 19:45 to 24:00: 9.25
hours and 4.25 hours, 13.5 hours in total.
You compute the average rate, which is 10.5/24 = 0.4375.
You multiply that by 900 and divide by 60, so the answer should be 6.5625.
The two questions are:
1: Why do you get a different outcome?
2: Why do you think you should multiply by 15 (*900/60 is the same as *15)
to get hours?

The answer to question #1 could be as Simon suggested: rounding errors. I
expect the graphed period may not be exactly 24 hours.
The solution to question #2 is to think it over again. Assuming your start
and end times are exactly midnight, you are averaging over a 24 hour period.
That gives you an average "pump on" ratio, which you should multiply by 24 to get
the number of hours. The answer you should get would be 10.5 (or 10.25 if I
see things wrong), except for that (relatively small) error from question #1.
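
The arithmetic above can be checked with a minimal Python sketch. It assumes a full day of 96 fifteen-minute slots and hypothetical test data in which the pump is on for every slot stamped 09:30 through 19:45 (42 slots, i.e. 10.5 hours). The names slot_flag, flags and average_rate are made up for illustration and are not part of RRDtool.

```python
STEP_SECONDS = 900                            # database step from the thread
SLOTS_PER_DAY = 24 * 3600 // STEP_SECONDS     # 96 slots of 15 minutes

def slot_flag(slot_index):
    """Hypothetical test data: 1.0 for slots ending between 09:30 and 19:45."""
    end_minute = (slot_index + 1) * 15        # each slot is stamped with its end time
    return 1.0 if 9 * 60 + 30 <= end_minute <= 19 * 60 + 45 else 0.0

flags = [slot_flag(i) for i in range(SLOTS_PER_DAY)]
average_rate = sum(flags) / len(flags)        # the daily average of the 0/1 flag

print(average_rate)           # 0.4375
print(average_rate * 15)      # 6.5625  (the *900/60 formula, not hours)
print(average_rate * 24)      # 10.5    (hours switched on over a 24 hour window)
```

Multiplying the average by 24 only gives hours when the averaging window is exactly one day; for any other window the factor would be the window length in hours.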

Further testing:

Fake some data: have the (fake) pump on for exactly one hour, from 23:00 the day
before until midnight, and at no other time. Then print the average pump time.
It should be zero. In a different testing round, do the same from midnight
at the end of that day until 01:00. Again the average pump time should be
zero. If it is not zero in either test, then you know there is a problem with
the start time or the end time.
In a third test, have the pump on from 12:00 to 15:00. The average rate
should be exactly 0.125

Just print the average of pump_flag to debug, no CDEFs involved.
Look at the output of rrdtool dump and verify that it matches what you
expect.

Oh and by the way: I got confused because you started talking about
heat_flag and then switched to pump_flag. I know how it is when debugging;
it is easy to confuse yourself in similar ways. Whenever a value is not what
you expect, take a step back and look at the problem again.



HTH
Alex
spock
2014-08-25 19:21:28 UTC
Hi Alex,

ok, I did some more tests with artificial data. See script.
testdata.sh
<http://rrd-mailinglists.937164.n2.nabble.com/file/n7582385/testdata.sh>

Sorry for the confusion between heat_flag and pump_flag, both values
follow the same concept (value 0 = off; value 1 = on; no other values
allowed)
and I want to calculate the total hours.

I am not sure, if I should be more confused than before.

First of all, thanks for the correct formula. If I use the
CDEF:heattime2=heat_flag,24,*
VDEF:totalheattime=heattime2,AVERAGE
I get the desired value - most of the time.

Then I played around with some scenarios:
My timeframe is:
-s 1408917600 -e 1409004000
Mon Aug 25 00:00:00 CEST 2014
Tue Aug 26 00:00:00 CEST 2014

I found out that the very first value (1408917600)
is NOT taken into account for the calculation.
This is probably by design.

The very last value (1409004000) is taken into account;
this is also by design.

My mistake was that I did not always use a complete 24h timeframe for the
graph creation.
The other mistake was that the calculation does not work if there are
missing values within the timeframe.

For that reason, I substituted missing values with "0" using:
CDEF:heattimeclean=heat_flag,UN,0,heat_flag,IF \
and then used the cleaned time series for the calculation.
CDEF:heattime2=heattimeclean,24,* \
VDEF:totalheattimeclean=heattime2,AVERAGE \

Question (1):

Unfortunately this introduces (rounding?) errors.
In my artificial testdata I have 17 x "1" values for heat_flag; which
corresponds to 4.25h of heattime.
The totalheattime is calculated as "4.25" - perfect!
But the totalheattimeclean is calculated as 4.21 - is this a rounding error?
I do not understand this, because the heat_flag,UN,0,heat_flag,IF should not
touch the original values, only unknowns.

How do I avoid the rounding?

Question (2)
My other problem is:
On my original "production" database, I have always the above described
"rounding" error - but for both values!
If I re-create the very same day in my artificial database, I get the
rounding errors only for the cleaned values.

We have 8.25h of heating time
"Production" database:
For this day (Aug. 21) with 8.25h heattime I get:
totalheattimeclean = totalheattime = 8.16h

"Artificial" database:
For this day (Aug. 21) with 8.25h heattime I get:
totalheattimeclean = 8.16; totalheattime = 8.25h


I thought, well, I am comparing apples with oranges.
I verified:
- parameters are identical (copied & pasted each and every parameter into the
testdata script)
- used same environment (e.g. export LANG=de_DE.UTF8)
- used the same data

I took the data from my "production" rrd database and wrote it with enclosed
script into a new rrd database.
I then did a fetch with
rrdtool fetch /testtemp/temp_pool_debug.rrd -s 1408572000 -e 1408658400 AVERAGE
on both databases and compared the resulting file - no difference except for
the timestamp
1408659300, which is outside of the timeframe. Or is this the key?

The resulting graphs from the production and artificial databases look completely
identical - except for the calculated heattime.
Do you have an explanation for that?


Regards,
Spock




Alex van den Bogaerdt
2014-08-25 22:41:49 UTC
Post by spock
I am not sure, if I should be more confused than before.
First of all, thanks for the correct formula. If I use the
CDEF:heattime2=heat_flag,24,*
VDEF:totalheattime=heattime2,AVERAGE
I get the desired value - most of the time.
So focus on those other times and see what's different there.
Post by spock
-s 1408917600 -e 1409004000
Mon Aug 25 00:00:00 CEST 2014
Tue Aug 26 00:00:00 CEST 2014
I found out, that the very first value (1408917600)
will NOT be taken into account for the calculation.
This is probably by design.
It is. Each interval carries a timestamp that denotes the end of that
interval.
Consider one interval of 15 minutes. It starts at 00:00 and it ends at
00:15.
If RRDtool also included the interval stamped "00:00", you would get
two intervals, for a total of 30 minutes, from 23:45 to 00:15.
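
As an illustration of the end-stamp convention, here is a small Python sketch using the start and end timestamps quoted in this thread. The helper end_stamps is hypothetical; it only enumerates timestamps and does not call RRDtool.

```python
STEP = 900
START = 1408917600   # Mon Aug 25 00:00:00 CEST 2014 (from the thread)
END = 1409004000     # Tue Aug 26 00:00:00 CEST 2014 (from the thread)

def end_stamps(start, end, step, include_start=False):
    """List interval end timestamps in the window; optionally count the one stamped `start`."""
    first = start if include_start else start + step
    return list(range(first, end + 1, step))

by_design = end_stamps(START, END, STEP)
with_start = end_stamps(START, END, STEP, include_start=True)

# Each stamp marks the END of a 15 minute interval, so the time covered
# runs from (first stamp - STEP) to the last stamp.
print(len(by_design), (by_design[-1] - (by_design[0] - STEP)) / 3600)     # 96 24.0
print(len(with_start), (with_start[-1] - (with_start[0] - STEP)) / 3600)  # 97 24.25
```

Counting the sample stamped with the start time would add the 23:45 to 00:00 interval of the previous day, which is why it is left out.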
Post by spock
The very last value is taken into account (1409004000),
this is also by design.
Yup. I asked because sometimes bugs fly into the memory and then there are
off-by-one errors.
Post by spock
My mistake was, that I did not always use a complete 24h timeframe for the
graph creation.
The other mistake was, that the calculation does not work, if you have
missing values within
the timeframe.
And if, for some unknown reason, you get values which are not exactly 0 or
1, you will also see problems.
Post by spock
CDEF:heattimeclean=heat_flag,UN,0,heat_flag,IF \
and then used the cleaned time series for the calculation.
CDEF:heattime2=heattimeclean,24,* \
VDEF:totalheattimeclean=heattime2,AVERAGE \
Unfortunately this introduces (rounding?) errors.
In my artificial testdata I have 17 x "1" values for heat_flag; which
corresponds to 4.25h of heattime.
The totalheattime is calculated as "4.25" - perfect!
But the totalheattimeclean is calculated as 4.21 - is this a rounding
error?
Right now I have no clue what causes this. Your script is a bit big, and my
time is a little limited. Sorry.
Post by spock
I do not understand this, because the heat_flag,UN,0,heat_flag,IF should
not
touch the original values, only unknowns.
I agree.
Post by spock
How do I avoid the rounding?
Make sure you are working with exactly 96 intervals: 24 hours times four
15-minute intervals per hour. Don't assume everything is working as designed;
test and verify. The problem may be in your script (I don't think so at first
glance) or in RRDtool (it does sometimes happen; try another version), or
there is still an error in your logic which you and I haven't spotted yet.
Post by spock
Question (2)
On my original "production" database, I have always the above described
"rounding" error - but for both values!
If I re-create the very same day in my artificial database, I get the
rounding errors only for the cleaned values.
Which sounds very unlikely. My guess is that "very same" is not true.
Post by spock
We have 8.25h of heating time
totalheattimeclean = totalheattime = 8.16h
33 periods "1", 63 others are 0.
Suppose you are looking at 97, not 96, intervals:
33/97 * 24 = 8.1649..., which "%2.2lf" prints as 8.16

A bug may have crept in somewhere.
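
The 96 versus 97 interval hypothesis can be tried out on paper with a short Python sketch. It rests on two assumptions that this thread does not confirm: that VDEF AVERAGE skips unknown samples instead of counting them, and that the artificial database returned an unknown 97th sample where the production database returned a 0. The helpers vdef_average, clean, day and hours are hypothetical stand-ins, not RRDtool functions.

```python
import math

def vdef_average(samples):
    """Mean of the known samples; unknowns (NaN) are skipped (ASSUMPTION about VDEF AVERAGE)."""
    known = [v for v in samples if not math.isnan(v)]
    return sum(known) / len(known)

def clean(samples):
    """Mimic heat_flag,UN,0,heat_flag,IF: replace unknowns with 0."""
    return [0.0 if math.isnan(v) else v for v in samples]

def day(ones, extra):
    """96 slots with `ones` of them set to 1, plus one extra 97th sample (ASSUMPTION)."""
    return [1.0] * ones + [0.0] * (96 - ones) + [extra]

def hours(samples):
    """Average times 24, formatted with two decimals like the GPRINT output."""
    return f"{24 * vdef_average(samples):.2f}"

artificial_17 = day(17, math.nan)   # extra sample unknown
artificial_33 = day(33, math.nan)
production_33 = day(33, 0.0)        # extra sample is a real 0

print(hours(artificial_17), hours(clean(artificial_17)))   # 4.25 4.21
print(hours(artificial_33), hours(clean(artificial_33)))   # 8.25 8.16
print(hours(production_33), hours(clean(production_33)))   # 8.16 8.16
```

Under these assumptions the sketch gives the same pairs of numbers reported for both databases. That makes one extra sample at the edge of the window a plausible cause, but it proves nothing about the real files; comparing the row counts of rrdtool fetch or rrdtool dump would be the way to check.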
Post by spock
totalheattimeclean = 8.16; totalheattime = 8.25h
I thought, well, I am comparing apples with oranges.
- parameters are the identical (copy & paste any and all parameter into
testdata script)
- used same environment (e.g. export LANG=de_DE.UTF8)
- used the same data
I took the data from my "production" rrd database and wrote it with
enclosed
script into a new rrd database.
I then did a fetch with
rrdtool fetch /testtemp/temp_pool_debug.rrd -s 1408572000 -e 1408658400
AVERAGE
on both databases and compared the resulting file - no difference except
for
the timestamp
1408659300, which is outside of the timeframe. Or is this the key?
You ask for "--end 1408658400", yet RRDtool gives you the interval beyond
that.

If memory serves me right, this came up a couple of years ago. For some
unknown reason it was decided not to fix this bug, and the workaround was
to use "--end 1408658399".
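
A minimal Python sketch of how such a window could be computed for a given day. It assumes a fixed UTC+2 offset (CEST, as in the timestamps quoted above), so it does not handle daylight saving changes, and the helper graph_window is hypothetical. Whether ending one second early removes the extra interval on a particular RRDtool version is not verified here.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))   # ASSUMPTION: fixed UTC+2, no DST handling

def graph_window(year, month, day, tz=CEST):
    """Return (start, end): local midnight, and one second before the next midnight."""
    start = int(datetime(year, month, day, tzinfo=tz).timestamp())
    end = start + 24 * 3600 - 1       # the "--end minus one second" workaround
    return start, end

start, end = graph_window(2014, 8, 21)
print(f"--start {start} --end {end}")   # --start 1408572000 --end 1408658399
```

The printed values match the window used for Aug. 21 earlier in this thread, with the end moved back by one second.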

HTH
Alex
