dataeval.metadata.most_deviated_factors

dataeval.metadata.most_deviated_factors(metadata_1, metadata_2, ood)

Determines the metadata factor with the greatest deviation for each out-of-distribution sample in metadata_2.

Parameters:

metadata_1 : Metadata: A reference set of Metadata containing factor names and samples with discrete and/or continuous values per factor.
metadata_2 : Metadata: The set of Metadata that is tested against the reference metadata. This set must have the same number of factors as metadata_1 but need not have the same number of samples.
ood : OODOutput: The output of one of DataEval’s OOD functions, indicating which examples are OOD.

Returns:

A list of (factor name, deviation) tuples giving the most deviated metadata factor for each OOD example in metadata_2.

Return type:

list[tuple[str, float]]

Notes

Both Metadata inputs must have discrete and continuous data in the shape (samples, factors), with equivalent factor names and lengths.
The flag at index i in OODOutput.is_ood must correspond directly to sample i of metadata_2, indicating whether that sample is out-of-distribution relative to metadata_1.
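
The shape and alignment requirements above can be illustrated with a small NumPy sketch. This is not DataEval's implementation: the helper name `most_deviated_sketch`, its plain-array arguments, and the deviation metric (absolute distance from the reference median, scaled by the reference median absolute deviation) are all assumptions chosen for illustration. It only shows how one flag per row of the test data selects the OOD samples and how one (factor, deviation) pair is produced per selected sample.

```python
import numpy as np


def most_deviated_sketch(reference, test, is_ood, factor_names):
    """Hypothetical stand-in for illustration; not DataEval's implementation."""
    reference = np.asarray(reference, dtype=float)  # shape (samples, factors)
    test = np.asarray(test, dtype=float)            # shape (samples, factors)
    is_ood = np.asarray(is_ood, dtype=bool)         # one flag per test sample

    # Requirements from the notes: matching factors, one flag per test sample.
    if reference.shape[1] != test.shape[1] or test.shape[1] != len(factor_names):
        raise ValueError("reference and test must share the same factors")
    if is_ood.shape[0] != test.shape[0]:
        raise ValueError("is_ood needs exactly one flag per test sample")

    # No OOD samples: nothing to report.
    if not is_ood.any():
        return []

    # ASSUMED metric: distance from the reference median in units of the
    # reference median absolute deviation (MAD), computed per factor.
    median = np.median(reference, axis=0)
    mad = np.median(np.abs(reference - median), axis=0)
    mad = np.where(mad == 0, 1.0, mad)  # guard against division by zero
    deviations = np.abs(test[is_ood] - median) / mad

    # Keep the factor with the largest deviation for each OOD sample.
    top = np.argmax(deviations, axis=1)
    return [(factor_names[j], float(deviations[i, j])) for i, j in enumerate(top)]


reference = [[0, 10], [1, 11], [2, 12], [3, 13], [4, 14]]
test = [[2, 12], [8, 13], [3, 20]]
print(most_deviated_sketch(reference, test, [False, True, True], ["time", "altitude"]))
# [('time', 6.0), ('altitude', 8.0)]
```

The first test sample is not flagged, so the result has two entries rather than three; entry k of the result belongs to the k-th flagged sample, not to row k of the test data.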

Examples

>>> import numpy as np
>>> from dataeval.detectors.ood import OODOutput
>>> from dataeval.metadata import most_deviated_factors

All samples are out-of-distribution

>>> is_ood = OODOutput(np.array([True, True, True]), np.array([]), np.array([]))
>>> most_deviated_factors(metadata1, metadata2, is_ood)
[('time', 2.0), ('time', 2.592), ('time', 3.51)]

If there are no out-of-distribution samples, an empty list is returned

>>> is_ood = OODOutput(np.array([False, False, False]), np.array([]), np.array([]))
>>> most_deviated_factors(metadata1, metadata2, is_ood)
[]
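
Because the returned list has one entry per OOD sample only, it can help to map each entry back to its row in metadata_2 and to count which factor dominates. The sketch below uses a hypothetical helper, `summarize_deviations`, and assumes the entries are ordered like the flagged samples in metadata_2; that ordering is an assumption, not something stated above.

```python
from collections import Counter

import numpy as np


def summarize_deviations(is_ood, results):
    """Hypothetical helper: index results by sample and count top factors."""
    # Positions in metadata_2 that were flagged as OOD.
    ood_indices = np.flatnonzero(np.asarray(is_ood, dtype=bool))
    if len(ood_indices) != len(results):
        raise ValueError("expected one result per OOD sample")

    # ASSUMPTION: results appear in the same order as the flagged samples.
    by_sample = {int(i): r for i, r in zip(ood_indices, results)}
    # How often each factor was the most deviated one.
    counts = Counter(name for name, _ in results)
    return by_sample, counts


# Values copied from the first example above, where every sample is OOD.
by_sample, counts = summarize_deviations(
    [True, True, True],
    [("time", 2.0), ("time", 2.592), ("time", 3.51)],
)
print(by_sample[1])           # ('time', 2.592)
print(counts.most_common(1))  # [('time', 3)]
```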