biweight_midvariance

astropy.stats.biweight_midvariance(data, c=9.0, M=None, axis=None, modify_sample_size=False)

Compute the biweight midvariance.

The biweight midvariance is a robust statistic for determining the variance of a distribution. Its square root is a robust estimator of scale (i.e. standard deviation). It is given by:

\[\zeta_{bivar} = n \ \frac{\Sigma_{|u_i| < 1} \ (x_i - M)^2 (1 - u_i^2)^4} {(\Sigma_{|u_i| < 1} \ (1 - u_i^2) (1 - 5u_i^2))^2}\]

where \(x\) is the input data, \(M\) is the sample median (or the input location) and \(u_i\) is given by:

\[u_{i} = \frac{(x_i - M)}{c \cdot MAD}\]

where \(c\) is the tuning constant and \(MAD\) is the median absolute deviation. The biweight midvariance tuning constant c is typically 9.0 (the default).
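To make the scaled deviations concrete, here is a minimal, dependency-free sketch that computes \(u_i\) for a 1-D sample and shows which points satisfy \(|u_i| < 1\). The helper name scaled_deviations is hypothetical and not part of astropy. The sketch assumes a nonzero MAD, taken about the sample median with no normalization factor.

```python
from statistics import median


def scaled_deviations(data, c=9.0, M=None):
    """Return u_i = (x_i - M) / (c * MAD) for a 1-D sequence.

    Hypothetical helper for illustration only; not part of astropy.
    Assumes the MAD is nonzero.
    """
    data = list(data)
    med = median(data)
    if M is None:
        # Default location estimate: the sample median
        M = med
    # Median absolute deviation about the sample median
    mad = median([abs(x - med) for x in data])
    return [(x - M) / (c * mad) for x in data]


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
u = scaled_deviations(sample)
# Points with |u_i| >= 1 are rejected, i.e. excluded from the sums
kept = [x for x, ui in zip(sample, u) if abs(ui) < 1]
```

For this sample the median is 3 and the MAD is 1, so with c = 9.0 only the value 100 has \(|u_i| \geq 1\) and is rejected.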

For the standard definition of biweight midvariance, \(n\) is the total number of points in the array (or along the input axis, if specified). That definition is used if modify_sample_size is False, which is the default.

However, if modify_sample_size is True, then \(n\) is the number of points for which \(|u_i| < 1\) (the total number of non-rejected values), that is:

\[n = \Sigma_{|u_i| < 1} \ 1\]

which results in a value closer to the true variance for small sample sizes or for a large number of rejected values.
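The formula and the two choices of \(n\) can be sketched in plain Python as follows. This is an illustration of the equations above for a 1-D sample, not astropy's implementation: the function name biweight_midvariance_sketch is hypothetical, there is no axis handling, and returning 0.0 when the MAD is zero is an assumption made for this sketch.

```python
from statistics import median


def biweight_midvariance_sketch(data, c=9.0, M=None, modify_sample_size=False):
    """Evaluate the biweight midvariance formula for a 1-D sequence.

    Hypothetical illustration only; not astropy's implementation.
    """
    data = list(data)
    med = median(data)
    if M is None:
        M = med
    # Median absolute deviation about the sample median
    mad = median([abs(x - med) for x in data])
    if mad == 0:
        # Assumption for this sketch: avoid dividing by zero
        return 0.0

    numerator = 0.0
    denominator = 0.0
    n_kept = 0
    for x in data:
        u = (x - M) / (c * mad)
        if abs(u) < 1:  # only non-rejected points enter the sums
            n_kept += 1
            numerator += (x - M) ** 2 * (1 - u * u) ** 4
            denominator += (1 - u * u) * (1 - 5 * u * u)

    # Standard definition: n is the full sample size.
    # Modified definition: n counts only the non-rejected points.
    n = n_kept if modify_sample_size else len(data)
    return n * numerator / denominator ** 2


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
standard = biweight_midvariance_sketch(sample)
modified = biweight_midvariance_sketch(sample, modify_sample_size=True)
```

For this sample the value 100 is rejected, so \(n\) is 5 under the standard definition and 4 with modify_sample_size=True. The sums are identical in both cases, so the two results differ only by that factor of 4/5.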

Parameters:

data : array-like

Input array or object that can be converted to an array.

c : float, optional

Tuning constant for the biweight estimator (default = 9.0).

M : float or array-like, optional

The location estimate. If M is a scalar value, then its value will be used for the entire array (or along each axis, if specified). If M is an array, then it must be an array containing the location estimate along each axis of the input array. If None (default), then the median of the input array will be used (or along each axis, if specified).

axis : int, optional

The axis along which the biweight midvariances are computed. If None (default), then the biweight midvariance of the flattened input array will be computed.

modify_sample_size : bool, optional

If False (default), then the sample size used is the total number of elements in the array (or along the input axis, if specified), which follows the standard definition of biweight midvariance. If True, then the sample size is reduced to correct for any rejected values (i.e. the sample size used includes only the non-rejected values), which results in a value closer to the true variance for small sample sizes or for a large number of rejected values.

Returns:

biweight_midvariance : float or ndarray

The biweight midvariance of the input data. If axis is None, then a scalar will be returned; otherwise, an ndarray will be returned.

References

[R61] https://en.wikipedia.org/wiki/Robust_measures_of_scale#The_biweight_midvariance
[R62] Beers, Flynn, and Gebhardt (1990; AJ 100, 32) (http://adsabs.harvard.edu/abs/1990AJ....100...32B)

Examples

Generate random variates from a Gaussian distribution and return the biweight midvariance of the distribution:

>>> import numpy as np
>>> from astropy.stats import biweight_midvariance
>>> rand = np.random.RandomState(12345)
>>> bivar = biweight_midvariance(rand.randn(1000))
>>> print(bivar)    
0.97362869104