
Module: pycsou.util.stats

Statistical routines.


P-Square Algorithm.

class P2Algorithm(pvalue: float)

Bases: object

P-Square Algorithm.

The P-Square Algorithm is a heuristic algorithm for the dynamic calculation of empirical quantiles. The estimates are updated as the observations are generated. The observations are not stored; the algorithm therefore has a very small, fixed storage requirement regardless of the number of observations. See [P2] for more details on the algorithm.
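To make the fixed-storage idea concrete, here is a minimal scalar sketch of the P-Square update. It is an illustration only and not pycsou's implementation: the class name P2Sketch and its attributes (heights, pos, desired, step) are hypothetical, it handles a single scalar stream, and it uses only the standard library. It keeps five markers (heights and positions) and nudges the three interior markers towards their desired positions with a parabolic formula, falling back to linear interpolation when the parabolic candidate would break the ordering of the heights.

```python
import random


class P2Sketch:
    """Scalar P-Square estimator (hypothetical sketch, not pycsou's class)."""

    def __init__(self, p):
        self.heights = []                # marker heights q_0..q_4
        self.pos = [1, 2, 3, 4, 5]       # actual marker positions
        self.desired = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]
        self.step = [0, p / 2, p, (1 + p) / 2, 1]  # desired-position increments

    def _parabolic(self, i, s):
        q, n = self.heights, self.pos
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def add_sample(self, x):
        q, n = self.heights, self.pos
        if len(q) < 5:                   # the first five samples initialise the markers
            q.append(x)
            q.sort()
            return
        # Find the cell k with q[k] <= x < q[k+1], extending the extremes if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):        # markers above the cell shift right
            n[i] += 1
        for i in range(5):
            self.desired[i] += self.step[i]
        # Move interior markers that are off by at least one position.
        for i in (1, 2, 3):
            d = self.desired[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                cand = self._parabolic(i, s)
                if not q[i - 1] < cand < q[i + 1]:
                    # Linear fallback keeps the heights ordered.
                    cand = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = cand
                n[i] += s

    @property
    def q(self):
        if len(self.heights) < 5:
            raise ValueError("at least 5 samples are required")
        return self.heights[2]           # the middle marker tracks the p-quantile


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
est = P2Sketch(0.95)
for x in samples:
    est.add_sample(x)
exact = sorted(samples)[int(0.95 * len(samples))]  # batch reference value
print(est.q, exact)
```

The list of samples is kept here only to compute the batch reference value; the estimator itself stores nothing beyond its five markers.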


>>> import numpy as np
>>> import scipy.stats
>>> from pycsou.util.stats import P2Algorithm
>>> rng = np.random.default_rng(0)
>>> population_quantile = scipy.stats.norm.ppf(0.95)
>>> def generate_sample(n):
...     for _ in range(n):
...         yield rng.standard_normal()
>>> p2 = P2Algorithm(pvalue=0.95)
>>> samples = []
>>> for sample in generate_sample(1000):
...     p2.add_sample(sample)
...     samples.append(sample)
>>> print(f'P2 Quantile: {p2.q}, Empirical Quantile: {np.quantile(samples, 0.95)}, Population Quantile: {population_quantile}.')
P2 Quantile: [1.51436338], Empirical Quantile: 1.514048975492714, Population Quantile: 1.6448536269514722.


The estimated quantile is stored in the attribute q. Adding a new sample with the method add_sample triggers an update of the estimate. For multidimensional distributions, the quantiles of the marginal empirical distributions are estimated. The P-Square Algorithm has good accuracy: above 10,000 samples, the relative error between the estimated quantiles and the actual population quantiles is typically well below 1%.
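As a point of reference for the multidimensional case, the following sketch computes marginal empirical quantiles in batch with NumPy, that is, one quantile per coordinate, each taken independently of the others. It does not use P2Algorithm; the array names (scales, samples, marginal, population, rel_err) and the choice of three independently scaled normal coordinates are assumptions made purely for illustration.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(0)

# Three independent normal coordinates with different standard deviations.
scales = np.array([1.0, 2.0, 5.0])
samples = rng.standard_normal((10_000, 3)) * scales

# Marginal empirical quantiles: one 0.95-quantile per column.
marginal = np.quantile(samples, 0.95, axis=0)

# Population quantiles of the same marginals, for comparison.
population = NormalDist().inv_cdf(0.95) * scales

# Relative error of each marginal estimate.
rel_err = np.abs(marginal - population) / population
print(marginal, rel_err)
```

A streaming estimator that reports marginal quantiles would be compared against values like marginal, coordinate by coordinate.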


The P-Square Algorithm cannot be vectorised and involves a for loop whose length equals the dimension of the samples. For computational efficiency in high-dimensional settings, this loop is therefore just-in-time compiled ("jitted") using Numba's @njit decorator.

add_sample(sample: Union[float, numpy.ndarray])

Update the estimate of the empirical quantile based on the new sample.


sample (Union[float, np.ndarray]) – New empirical sample.