Skip to content
Snippets Groups Projects
Unverified Commit f3212b1e authored by Miss Islington (bot)'s avatar Miss Islington (bot) Committed by GitHub
Browse files

GH-77265: Document NaN handling in statistics functions that sort or count (GH-94676) (#94726)

parent e5c8ad3e
Branches
Tags
No related merge requests found
......@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
you may be able to use :func:`map` to ensure a consistent result, for
example: ``map(float, input_data)``.
Some datasets use ``NaN`` (not a number) values to represent missing data.
Since NaNs have unusual comparison semantics, they cause surprising or
undefined behaviors in the statistics functions that sort data or that count
occurrences. The functions affected are ``median()``, ``median_low()``,
``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
``quantiles()``. The ``NaN`` values should be stripped before calling these
functions::
>>> from statistics import median
>>> from math import isnan
>>> from itertools import filterfalse
>>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
>>> sorted(data) # This has surprising behavior
[20.7, nan, 14.4, 18.3, 19.2, nan]
>>> median(data) # This result is unexpected
16.35
>>> sum(map(isnan, data)) # Number of missing values
2
>>> clean = list(filterfalse(isnan, data)) # Strip NaN values
>>> clean
[20.7, 19.2, 18.3, 14.4]
>>> sorted(clean) # Sorting now works as expected
[14.4, 18.3, 19.2, 20.7]
>>> median(clean) # This result is now well defined
18.75
Averages and measures of central location
-----------------------------------------
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment