From the answers by Steve and Donovan, it seems like there's something
fundamental I'm not understanding about RRDtool.
The following is the explanation from Wikipedia:
RRDtool assumes time-variable data in intervals of a certain length. This
interval, usually named step, is specified upon creation
of an RRD file and cannot be changed afterwards. Because data may not
always be available at just the right time, RRDtool will
automatically interpolate any submitted data to fit its internal
time-steps.
The value for a specific step, that has been interpolated, is named a
primary data point (PDP). Multiple PDPs may be consolidated
according to a consolidation function (CF) to form a consolidated data
point (CDP). Typical consolidation functions are average,
minimum, maximum.
After the data have been consolidated, the resulting CDP is stored in a
round-robin archive (RRA). A round-robin archive stores a
fixed number of CDPs and specifies how many PDPs should be consolidated
into one CDP and which CF to use.
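To make the terms in the quoted explanation concrete, here is a minimal toy model in Python of PDPs being consolidated into CDPs and stored in a fixed-size round-robin archive. The class name, its parameters and the pending-PDP buffer are assumptions made for illustration; the buffer is a readability shortcut and says nothing about how RRDtool keeps its intermediate state.

```python
from collections import deque
from statistics import mean

# Keys mirror RRDtool's CF names; the Python callables are stand-ins.
CONSOLIDATION_FUNCTIONS = {"AVERAGE": mean, "MIN": min, "MAX": max}


class RoundRobinArchive:
    """Toy RRA: consolidates every `pdp_per_cdp` PDPs into one CDP and
    keeps only the newest `rows` CDPs (hypothetical helper)."""

    def __init__(self, cf, pdp_per_cdp, rows):
        self.cf = CONSOLIDATION_FUNCTIONS[cf]
        self.pdp_per_cdp = pdp_per_cdp
        self.cdps = deque(maxlen=rows)  # oldest CDP is dropped when full
        self._pending = []              # PDPs not yet consolidated

    def add_pdp(self, value):
        self._pending.append(value)
        if len(self._pending) == self.pdp_per_cdp:
            self.cdps.append(self.cf(self._pending))
            self._pending.clear()


# Nine PDPs, three PDPs per CDP, room for only two CDPs in the archive.
rra = RoundRobinArchive("AVERAGE", pdp_per_cdp=3, rows=2)
for pdp in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    rra.add_pdp(pdp)
print(list(rra.cdps))
```

The first CDP (the average of 1, 2, 3) has already been overwritten by the time the loop ends, which is the "round-robin" part of the name.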
My understanding was that when making the consolidation, all the primary
data points were available to the consolidation function. Therefore, it
would be possible to calculate the percentile.
However, from what you explain, it looks as if RRDtool recalculates the
aggregate point with each arriving primary point. Is that correct?
Regards
---------------------------
Pablo Chacin
CTO
SenseFields SL
Tlf (+34) 93 250 45 98
Gran Via 674, principal 1º
08010 Barcelona, Spain
http://www.sensefields.com
In compliance with the Data Protection Act 15/1999 and its regulations,
Sensefields, S.L. informs you that your personal data will be processed
and stored in the computer and document systems owned by this company. If
you wish, you may exercise your rights of access, rectification,
cancellation and opposition under the Act by sending a written request
with a photocopy of your ID to Sensefields, S.L., Gran Via Corts Catalanes
674, Principal 1ª, 08010 Barcelona (Barcelona), or by email to
***@sensefields.com
This message is intended exclusively for its recipient and contains
privileged and confidential information. If you receive this message in
error, please inform us immediately by reply email or by phone at 0034 93
250 45 98, and delete it.
Note that variance, and hence stddev, can be calculated incrementally by
keeping a second time series of the average of the squared rate: variance
= (average of rate^2) - (average rate)^2, and stddev = sqrt(variance).
Assuming a normal distribution, the 95th percentile is approximately
average + 1.645*stddev (average + 2*stddev is closer to the 97.7th
percentile). The accuracy of this depends on how closely your samples
match a normal distribution, and it is not as resilient to outliers as
calculating a true 95th percentile from all the samples, but it's a pretty
good approximation. If you know your distribution is closer to log-normal
(which it often is for things like latency), you can calculate a more
accurate 95th percentile from the average and variance like this:
mu = ln(avg) - ln(var/avg**2 + 1)/2
sigma = sqrt(ln(var/avg**2 + 1))
p95 = lognorminv(0.95, mu, sigma)
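The formulas above can be sketched in runnable Python. This is only an illustration: `lognorminv` is not a Python built-in, so the sketch assumes it denotes the inverse CDF of a log-normal distribution, which equals exp(mu + sigma * z) where z is the standard normal quantile. The function names are made up for the example, and the normal case uses the usual one-sided quantile, average + z * stddev with z about 1.645.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Standard normal quantile for the 95th percentile (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def p95_normal(avg, var):
    """95th percentile assuming the samples are normally distributed."""
    return avg + Z95 * sqrt(var)


def p95_lognormal(avg, var):
    """95th percentile assuming the samples are log-normally distributed.

    mu and sigma are the parameters of the underlying normal
    distribution, recovered from the sample average and variance.
    """
    s2 = log(var / avg**2 + 1)
    mu = log(avg) - s2 / 2
    sigma = sqrt(s2)
    # Assumed meaning of lognorminv(0.95, mu, sigma):
    return exp(mu + sigma * Z95)


# Example with made-up latency statistics: average 20 ms, variance 400 ms^2.
print(p95_normal(20.0, 400.0))
print(p95_lognormal(20.0, 400.0))
```

Both functions need only the average and the variance, which is why they can work from consolidated data rather than from the raw samples.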
Unfortunately, right now rrd doesn't support RRAs of type variance
(CF=VAR?) or mean value squared (CF=AVERAGE2?). However, if you were going
to request a feature, this is something that is definitely possible. A true
95th percentile RRA is definitely not. Another ugly approximation uses
bucketed distributions, but I wouldn't request that.
Note that having an RRA of type CF=AVERAGE2 is also useful for calculating
the "root mean square", which is needed for e.g. AC power calculations.
Also, stddev is actually the "root mean square" of the distance from the
mean.
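As a rough illustration of why an average-of-squares archive would be enough, the Python sketch below consolidates made-up samples into two series, one like CF=AVERAGE and one like the hypothetical CF=AVERAGE2 (which, as noted, rrd does not support), and then recovers variance, stddev and RMS from the consolidated values alone. The sample values and bucket size are assumptions, and equal-sized buckets are assumed so that the average of the averages equals the overall average.

```python
from math import sqrt
from statistics import fmean, pvariance

samples = [3.0, 5.0, 4.0, 8.0, 6.0, 10.0]  # made-up PDP values
pdp_per_cdp = 3                            # hypothetical consolidation size

buckets = [samples[i:i + pdp_per_cdp]
           for i in range(0, len(samples), pdp_per_cdp)]

# One CDP per bucket for each series.
avg_cdps = [fmean(b) for b in buckets]                     # like CF=AVERAGE
avg2_cdps = [fmean([x * x for x in b]) for b in buckets]   # hypothetical CF=AVERAGE2

# Everything below uses only the consolidated values.
avg = fmean(avg_cdps)
avg2 = fmean(avg2_cdps)
variance = avg2 - avg * avg
stddev = sqrt(variance)
rms = sqrt(avg2)

# Cross-check against the variance computed from the raw samples.
print(variance, pvariance(samples))
print(stddev, rms)
```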
Post by Steve Shipway
Percentiles cannot be calculated incrementally; you need the entire
dataset to deduce them, whereas mean, max and min only require the last
calculation result and possibly the number of samples so far. Hence you
cannot have the percentile as a CF.
However, the RRDtool RPN functions include a percentile calculator, so you
can still deduce this on the fly as you graph, using the available samples.
You would need to be careful to ensure that the data series over which you
are aggregating is of maximum granularity, though, if you want to ensure
maximum accuracy.
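The difference can be sketched in a few lines of Python. This is an illustration under assumed names, not RRDtool code: `RunningStats` keeps a constant amount of state per update, while the nearest-rank `percentile` helper has to sort every sample. The last lines show why granularity matters: taking a percentile of already-consolidated percentiles does not give the percentile of the raw data.

```python
class RunningStats:
    """Mean, min and max updated one sample at a time with O(1) state."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.min = None
        self.max = None

    def update(self, x):
        self.count += 1
        # New mean from the previous mean and the sample count only.
        self.mean += (x - self.mean) / self.count
        self.min = x if self.min is None else min(self.min, x)
        self.max = x if self.max is None else max(self.max, x)


def percentile(samples, p):
    """Nearest-rank percentile for integer p in 1..100; needs all samples."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[rank - 1]


samples = list(range(1, 21))  # made-up data: 1, 2, ..., 20

stats = RunningStats()
for x in samples:
    stats.update(x)

true_p95 = percentile(samples, 95)

# Consolidate into two buckets first, then take the percentile again.
bucket_p95 = [percentile(samples[:10], 95), percentile(samples[10:], 95)]
p95_of_p95 = percentile(bucket_p95, 95)

print(stats.mean, stats.min, stats.max)
print(true_p95, p95_of_p95)
```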
Steve
*Steve Shipway*
University of Auckland ITS
*UNIX Systems Design Lead*
Ph: +64 9 373 7599 ext 86487
------------------------------
*From:* rrd-users [rrd-users-bounces+s.shipway=
*Sent:* Saturday, 24 October 2015 11:43 p.m.
*Subject:* [rrd-users] Percentile consolidation
Greetings
Being able to pre-calculate and store certain data percentiles, like the
median and the 95th percentile, is a common requirement for any metrics
database, as these aggregation functions are much more stable and
representative of the data than the average or maximum values.
I saw that the mean was recently included as a consolidation function in
rrdtool, but it is still not possible to calculate other arbitrary
percentiles. Interestingly, percentiles have been available when retrieving
data for graphs or reporting.
Is there any compelling reason not to include percentiles as consolidation
functions? Is there any plan to do so in the future?
Regards
---------------------------
Pablo Chacin
CTO
SenseFields SL
Tlf (+34) 93 250 45 98
Gran Via 674, principal 1º
08010 Barcelona, Spain
http://www.sensefields.com
_______________________________________________
rrd-users mailing list
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users