Statistics
Module: pycsou.util.stats
Statistic routines.
class P2Algorithm(pvalue: float)
Bases: object
P-Square Algorithm.
The P-Square Algorithm is a heuristic algorithm for the online estimation of empirical quantiles. The estimates are updated as the observations are generated. The observations are not stored; the algorithm therefore has a very small, fixed storage requirement regardless of the number of observations. See [P2] for more details on the algorithm.
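To make the fixed-storage idea concrete, the following is a minimal scalar sketch of the classic five-marker P-Square scheme. It is an illustration under stated assumptions, not the pycsou implementation: the class name ScalarP2Sketch and its estimate method are hypothetical, and the update rules follow the usual textbook description of the algorithm (five marker heights, their integer positions, and a parabolic adjustment with a linear fallback).

```python
import random


class ScalarP2Sketch:
    """Illustrative scalar P-Square estimator (hypothetical, not the pycsou class)."""

    def __init__(self, p):
        self.p = p
        self._first = []  # buffer holding only the first five observations
        self.q = None     # marker heights, set once five observations are seen

    def add_sample(self, x):
        if self.q is None:
            self._first.append(x)
            if len(self._first) == 5:
                p = self.p
                self.q = sorted(self._first)
                self.n = [1, 2, 3, 4, 5]  # actual marker positions
                self.n_des = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
                self.dn = [0, p / 2, p, (1 + p) / 2, 1]  # desired-position increments
            return
        q, n = self.q, self.n
        # Find the cell containing x; extend the extreme markers if needed.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        # Markers to the right of the cell shift by one position.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.n_des[i] += self.dn[i]
        # Move each interior marker by one position if it drifted too far.
        for i in (1, 2, 3):
            d = self.n_des[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                cand = self._parabolic(i, d)
                if not q[i - 1] < cand < q[i + 1]:
                    # Parabolic prediction left the bracket: fall back to linear.
                    cand = q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])
                q[i] = cand
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def estimate(self):
        """Current quantile estimate (middle marker); needs at least five samples."""
        return self.q[2]


# Usage: stream samples one at a time without storing them in the estimator.
rng = random.Random(0)
sketch = ScalarP2Sketch(0.95)
for _ in range(10_000):
    sketch.add_sample(rng.gauss(0.0, 1.0))
print(sketch.estimate())
```

Whatever the number of samples, the sketch keeps only five heights and ten position counters, which is the storage property described above.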
Examples
>>> import numpy as np
>>> import scipy.stats
>>> from pycsou.util.stats import P2Algorithm
>>> rng = np.random.default_rng(0)
>>> population_quantile = scipy.stats.norm.ppf(0.95)
>>> def generate_sample(n):
...     for i in range(n):
...         yield rng.standard_normal()
>>> p2 = P2Algorithm(pvalue=0.95)
>>> samples = []
>>> for sample in generate_sample(1000):
...     p2.add_sample(sample)
...     samples.append(sample)
>>> print(f'P2 Quantile: {p2.q}, Empirical Quantile: {np.quantile(samples, 0.95)}, Population Quantile: {population_quantile}.')
P2 Quantile: [1.51436338], Empirical Quantile: 1.514048975492714, Population Quantile: 1.6448536269514722.
Notes
The estimated quantile is stored in the attribute self.q. Adding a new sample with the method add_sample triggers an update of the estimated empirical quantile. For multidimensional distributions, the quantiles of the marginal empirical distributions are estimated. The P-Square Algorithm has good accuracy: above 10,000 samples, the relative error between the estimated quantiles and the actual population quantiles is typically well below 1%.
Warning
The P-Square Algorithm cannot be vectorised and involves a for loop of size equal to the dimension of the samples. For computational efficiency in high-dimensional settings, the for loop is therefore jitted (just-in-time compiled) using Numba's @njit decorator.
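As a reference point for the multidimensional case mentioned in the Notes, the snippet below shows what "quantiles of the marginal empirical distributions" means, using plain NumPy on stored samples rather than the streaming class. The array shapes and per-coordinate scales are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
# 5000 observations of a 3-dimensional variable, one scale per coordinate
# (the scales 1, 2, 5 are arbitrary choices for illustration).
samples = rng.standard_normal((5000, 3)) * np.array([1.0, 2.0, 5.0])

# Marginal empirical 0.95-quantiles: one value per coordinate.
marginal_q = np.quantile(samples, 0.95, axis=0)

# Equivalent computation, treating each coordinate independently.
per_coord = np.array(
    [np.quantile(samples[:, j], 0.95) for j in range(samples.shape[1])]
)
print(marginal_q, per_coord)
```

Each coordinate is handled on its own, so a streaming estimator for this setting has to loop over the sample dimension, which is the loop the warning above refers to.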