dataeval.core.factor_deviation

dataeval.core.factor_deviation(reference_factors, test_factors, indices)

Compute, for each selected test sample, how far each metadata factor deviates from the reference data, ordered from greatest to least deviation.

Parameters:

reference_factors : dict[str, NDArray]
    A dictionary mapping factor names to arrays of reference values.
    - Keys: factor names (str)
    - Values: 1D arrays of shape (n_reference,) containing reference data
    All arrays must have the same length.
test_factors : dict[str, NDArray]
    A dictionary mapping factor names to arrays of test values.
    - Keys: factor names (str); must match the keys in reference_factors
    - Values: 1D arrays of shape (n_test,) containing the test data to be evaluated
    All arrays must have the same length.
indices : SequenceLike[int]
    Indices of the test samples to evaluate. Indices are zero-based (in the example below, 1 and 2 select the second and third test samples) and must be less than the number of test samples, n_test.
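
The constraints on the three parameters can be checked before calling the function. The helper below is a minimal sketch, not part of dataeval: the name `check_factor_inputs` is hypothetical, and it assumes indices are zero-based and non-negative, which the documentation implies but does not state outright. The library may perform its own validation and report errors differently.

```python
import numpy as np


def check_factor_inputs(reference_factors, test_factors, indices):
    """Hypothetical pre-check for factor_deviation inputs (not part of dataeval)."""
    # Both dictionaries must describe the same set of factors.
    if set(reference_factors) != set(test_factors):
        raise ValueError("test_factors keys must match reference_factors keys")

    n_test = 0
    for label, factors in (("reference", reference_factors), ("test", test_factors)):
        # Collect the distinct shapes; a valid input has exactly one 1D shape.
        shapes = {np.asarray(values).shape for values in factors.values()}
        if any(len(shape) != 1 for shape in shapes):
            raise ValueError(f"{label} arrays must be 1D")
        if len(shapes) > 1:
            raise ValueError(f"{label} arrays must all have the same length")
        if label == "test" and shapes:
            n_test = next(iter(shapes))[0]

    # Assumption: indices are zero-based and non-negative.
    out_of_range = [i for i in indices if not 0 <= i < n_test]
    if out_of_range:
        raise ValueError(
            f"indices out of range for {n_test} test samples: {out_of_range}"
        )
```

The function returns None when the inputs satisfy the documented constraints and raises ValueError otherwise.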

Returns:

A sequence of dictionaries, one per specified test sample index (in the order provided). Each dictionary maps every factor name to its deviation value for that sample, with factors sorted by deviation value in descending order. Returns an empty list if no indices are provided.

Return type:

Sequence[Mapping[str, float]]

Notes

- At least 3 reference samples are needed for a meaningful deviation calculation.
- Deviations are calculated as the scaled distance from the reference median.
- Each dictionary contains all factors for a single test sample.
- The order of dictionaries in the result matches the order of indices in the input.
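
The notes do not spell out the scale factor used for the "scaled distance". The sketch below shows one possible interpretation, not the library's confirmed formula: it assumes the scale is the median size of the reference deviations on the same side of the median as the test value. That assumption was chosen only because it reproduces the numbers in the example on this page. The function name `scaled_deviation_sketch` is hypothetical.

```python
import numpy as np


def scaled_deviation_sketch(reference, test):
    """Hypothetical illustration of a median-based scaled deviation.

    Not part of dataeval; the scaling rule is an assumption.
    """
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)

    median = np.median(reference)
    ref_dev = reference - median

    # Assumed scales: typical size of reference deviations above and below
    # the median. With no reference values on one side, the scale is NaN.
    pos_scale = np.median(ref_dev[ref_dev > 0])
    neg_scale = np.median(-ref_dev[ref_dev < 0])

    # Scale each test deviation by the scale for its side of the median.
    test_dev = test - median
    scale = np.where(test_dev >= 0, pos_scale, neg_scale)
    return np.abs(test_dev) / scale


time_dev = scaled_deviation_sketch([1.0, 2.0, 3.0], [5.0, 12.0, 4.0])
print(time_dev.tolist())  # [3.0, 10.0, 2.0]
```

Under this assumption, the second and third entries (10.0 and 2.0) match the "time" values that the documented example reports for indices 1 and 2. The sketch also suggests why a very small reference set is a problem: with fewer than three reference values there may be no deviations on one side of the median to define a scale.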

Examples

>>> import numpy as np
>>> from dataeval.core import factor_deviation
>>> reference_factors = {
...     "time": np.array([1.0, 2.0, 3.0]),
...     "altitude": np.array([100, 110, 105]),
... }
>>> test_factors = {
...     "time": np.array([5.0, 12.0, 4.0]),
...     "altitude": np.array([108, 112, 500]),
... }
>>> indices = [1, 2]  # Second and third test sample
>>> factor_deviation(reference_factors, test_factors, indices)
[{'time': 10.0, 'altitude': 1.4}, {'altitude': 79.0, 'time': 2.0}]
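
Because each returned dictionary is sorted by deviation in descending order, the first item is the most deviated factor for that sample. The snippet below is a small sketch of reading the result. It does not call the library: `result` is copied by hand from the output shown above, and the name `top_factor` is illustrative only.

```python
# Result copied from the documented example output; no library call is made.
result = [{"time": 10.0, "altitude": 1.4}, {"altitude": 79.0, "time": 2.0}]
indices = [1, 2]

# Pair each requested index with the first (largest-deviation) item
# of its dictionary, relying on the documented descending sort order.
top_factor = {
    idx: next(iter(deviations.items()))
    for idx, deviations in zip(indices, result)
}

for idx, (name, value) in top_factor.items():
    print(f"test sample {idx}: {name} deviates most ({value})")
```

This prints one line per requested index, naming "time" for test sample 1 and "altitude" for test sample 2.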