spike_detection_vickers97

Functions

linear_interpolate_spikes

Replace detected spikes with linear interpolation.

spike_detection_vickers97

Detect and remove spikes in high-frequency eddy covariance data.

spike_detection_vickers97.linear_interpolate_spikes(data: ndarray, is_spike: ndarray, error_value: float) → ndarray

Replace detected spikes with linear interpolation.

This function replaces spike values in a time series with linearly interpolated values using valid neighboring points. It handles both isolated spikes and consecutive sequences of spikes, as well as error values in the data.

Parameters:
  • data (numpy.ndarray) – Input data array containing the original time series with spikes

  • is_spike (numpy.ndarray) – Boolean array of same length as data, True where spikes were detected

  • error_value (float) – Special value indicating invalid or missing data points. These points are skipped when finding valid neighbors for interpolation.

Returns:

Copy of input data with spikes replaced by linear interpolation

Return type:

numpy.ndarray

Notes

The interpolation strategy is as follows. For each sequence of spikes, find the valid (non-spike, non-error) points before and after the sequence, then:

  1. If both points exist: perform linear interpolation

  2. If only one exists: use that value (nearest neighbor)

  3. If neither exists: spikes remain unchanged
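The three cases above can be illustrated with a minimal sketch built on numpy.interp. This is not the module's implementation: the name interpolate_spikes_sketch is hypothetical, and the sketch assumes that np.interp's behavior (linear between valid neighbors, constant beyond the first or last valid point) is an acceptable stand-in for the strategy described here.

```python
import numpy as np


def interpolate_spikes_sketch(data, is_spike, error_value):
    """Illustrative only: replace flagged points using valid neighbors."""
    data = np.asarray(data, dtype=float)
    is_spike = np.asarray(is_spike, dtype=bool)
    out = data.copy()

    # A point is valid if it is neither a spike nor an error value.
    # NaN needs its own test because NaN != NaN.
    if np.isnan(error_value):
        is_error = np.isnan(data)
    else:
        is_error = data == error_value
    valid = ~is_spike & ~is_error

    if not valid.any():
        return out  # no valid neighbor at all: spikes remain unchanged

    idx = np.arange(data.size)
    # np.interp interpolates linearly between valid neighbors and holds the
    # nearest valid value beyond the first or last valid point.
    out[is_spike] = np.interp(idx[is_spike], idx[valid], data[valid])
    return out


# Index 1 has valid neighbors on both sides; index 5 only has one before it.
data = np.array([1.0, 10.0, 1.1, -9999.0, 1.2, 50.0])
spikes = np.array([False, True, False, False, False, True])
cleaned = interpolate_spikes_sketch(data, spikes, -9999.0)
```

In this sketch the error value at index 3 is left in place and is simply skipped when looking for neighbors, which matches the description of error_value above.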

This implementation follows the EddyPro software’s approach but is vectorized for better performance in Python.

See also

spike_detection_vickers97

Main spike detection algorithm

Examples

>>> import numpy as np
>>> # Create sample data with spikes
>>> data = np.array([1.0, 10.0, 1.1, np.nan, 1.2])
>>> spikes = np.array([False, True, False, False, False])
>>> cleaned = linear_interpolate_spikes(data, spikes, np.nan)
>>> print(cleaned)  # [1.0, 1.05, 1.1, nan, 1.2]
spike_detection_vickers97.spike_detection_vickers97(data: ndarray, spike_mode: int = 1, max_pass: int = 10, avrg_len: int = 30, ac_freq: int = 10, spike_limit: float = 3.5, max_consec_spikes: int = 3, ctrplot: bool = False) → Tuple[ndarray, ndarray]

Detect and remove spikes in high-frequency eddy covariance data.

This function implements the Vickers and Mahrt (1997) despiking algorithm for eddy covariance data. It uses an iterative moving window approach to identify outliers based on local statistics. The algorithm can either flag spikes or both flag and remove them through linear interpolation.

Parameters:
  • data (numpy.ndarray) – 1D input data array containing high-frequency measurements (e.g., wind components, scalar concentrations)

  • spike_mode ({1, 2}, optional) – Operation mode. Default is 1.

      - 1: Only detect spikes
      - 2: Detect and remove spikes via linear interpolation

  • max_pass (int, optional) – Maximum number of iterations for spike detection. Each pass may use progressively larger thresholds. Default is 10

  • avrg_len (int, optional) – Averaging period length in minutes. Used to determine the window size for local statistics. Default is 30

  • ac_freq (int, optional) – Data acquisition frequency in Hz. Used to calculate the number of samples in each window. Default is 10

  • spike_limit (float, optional) – Initial threshold for spike detection in standard deviations. Points exceeding mean ± (spike_limit × std) are flagged. Default is 3.5

  • max_consec_spikes (int, optional) – Maximum number of consecutive points that can be flagged as spikes. Longer sequences are not considered spikes. Default is 3

  • ctrplot (bool, optional) – If True, generates diagnostic plots. Default is False. The plots show:

      - Original data with detected spikes
      - Cleaned data with interpolated values

Returns:

  • data_out (numpy.ndarray) – Output data, depending on spike_mode:

      - spike_mode=1: Copy of input with spikes still present
      - spike_mode=2: Data with spikes replaced by linear interpolation

  • is_spike (numpy.ndarray) – Boolean array same length as input, True where spikes were detected

Notes

The algorithm follows these steps:

  1. Divides data into overlapping windows

  2. Calculates local mean and standard deviation

  3. Flags points exceeding threshold as potential spikes

  4. Checks for consecutive outliers

  5. Optionally interpolates across spike locations

  6. Repeats with adjusted threshold if spikes found
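Steps 1 to 4 can be sketched as a single detection pass. This is a simplified illustration under stated assumptions, not the function's actual code: detect_spikes_sketch and its window and step arguments are hypothetical names, runs of outliers are evaluated per window (so a run cut by a window edge may be judged shorter than it is), and the iteration with an adjusted threshold (step 6) is left out.

```python
import numpy as np


def detect_spikes_sketch(data, window, step, spike_limit=3.5, max_consec_spikes=3):
    """Illustrative single pass: flag short runs of outliers in each window."""
    data = np.asarray(data, dtype=float)
    n = data.size
    is_spike = np.zeros(n, dtype=bool)

    # Step 1: overlapping windows, advanced by `step` samples.
    for start in range(0, max(n - window, 0) + 1, step):
        segment = data[start:start + window]
        # Step 2: local statistics.
        mean, std = segment.mean(), segment.std()
        if std == 0:
            continue
        # Step 3: points outside mean +/- spike_limit * std.
        outlier = np.abs(segment - mean) > spike_limit * std

        # Step 4: keep only runs of at most max_consec_spikes outliers.
        i = 0
        while i < outlier.size:
            if not outlier[i]:
                i += 1
                continue
            j = i
            while j < outlier.size and outlier[j]:
                j += 1
            if j - i <= max_consec_spikes:
                is_spike[start + i:start + j] = True
            i = j
    return is_spike


# Deterministic test signal: a sine wave with one short and one long excursion.
signal = np.sin(np.linspace(0.0, 20.0 * np.pi, 6000))
signal[1000:1002] = 10.0   # 2 consecutive outliers: treated as a spike
signal[2000:2010] = 10.0   # 10 consecutive outliers: too long to be a spike
flags = detect_spikes_sketch(signal, window=3000, step=1500)
```

With the default max_consec_spikes of 3, only the two-sample excursion is flagged; the ten-sample excursion exceeds the threshold but is not counted as a spike.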

The window advancement step is currently set to 100 samples, which differs from both the original VM97 paper (1 sample) and the EddyPro manual recommendation (half window size).
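To give a feel for what the advancement step changes, the sketch below counts how many full windows are evaluated for each of the three step choices mentioned above. The window length of 3000 samples is a hypothetical value chosen for illustration only, since the window size actually derived from avrg_len and ac_freq is not stated here; window_starts is likewise a hypothetical helper.

```python
import numpy as np


def window_starts(n_samples, window, step):
    """Start indices of all full windows of length `window`."""
    return np.arange(0, n_samples - window + 1, step)


n_samples = 30 * 60 * 10   # 30 min at 10 Hz = 18000 samples
window = 3000              # hypothetical window length, in samples

# Number of windows for a step of 1 sample, 100 samples, and half a window.
counts = {
    step: window_starts(n_samples, window, step).size
    for step in (1, 100, window // 2)
}
```

Under these assumptions, a 1-sample step evaluates 15001 windows, a 100-sample step evaluates 151, and a half-window step evaluates 11, which illustrates the trade-off between computational cost and how densely the local statistics are resampled.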

See also

linear_interpolate_spikes

Function used to replace detected spikes

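References

Vickers, D., and Mahrt, L. (1997). Quality control and flux sampling problems for tower and aircraft data. Journal of Atmospheric and Oceanic Technology, 14(3), 512–526.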

Examples

>>> # Generate sample data with artificial spikes
>>> import numpy as np
>>> data = np.random.normal(0, 1, 18000)  # 30 min at 10 Hz
>>> data[1000:1002] = 10  # Add artificial spikes
>>> cleaned, spikes = spike_detection_vickers97(
...     data, spike_mode=2, ctrplot=True
... )
>>> print(f'Found {np.sum(spikes)} spikes')

Author

Written by Bernard Heinesch, University of Liege, Gembloux Agro-Bio Tech