dataeval.metadata.find_most_deviated_factors

dataeval.metadata.find_most_deviated_factors(metadata_ref, metadata_tst, ood)

Determine the metadata factor with the greatest deviation for each out-of-distribution sample in the test metadata.

Parameters:

metadata_ref : Metadata: A reference set of Metadata containing factor names and samples with discrete and/or continuous values per factor.
metadata_tst : Metadata: The set of Metadata that is tested against the reference metadata. This set must have the same number of factors but does not need the same number of samples.
ood : OODOutput: A class output by DataEval’s OOD functions that indicates which examples are OOD.

Returns:

An output class containing, for each OOD example in the test metadata, the name of the most deviated factor and its deviation.

Return type:

MostDeviatedFactorsOutput

Notes

Both Metadata inputs must have discrete and continuous data in the shape (samples, factors) and have equivalent factor names and lengths.
The flag at index i in OODOutput.is_ood must indicate whether sample i of metadata_tst is out-of-distribution relative to metadata_ref.
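
The alignment requirements in these notes can be checked before calling the function. The following is a minimal sketch in plain NumPy; `check_alignment` is a hypothetical helper, not part of DataEval, and it assumes that "equivalent factor names" means the same names in the same order.

```python
import numpy as np


def check_alignment(ref_factor_names, tst_factor_names, is_ood, n_tst_samples):
    """Hypothetical pre-flight check (not a DataEval function).

    Returns the indices of the test samples flagged as OOD.
    """
    # Assumption: "equivalent" factor names means identical names in identical order.
    if list(ref_factor_names) != list(tst_factor_names):
        raise ValueError("Reference and test metadata must share the same factor names.")

    flags = np.asarray(is_ood, dtype=bool)
    # Flag i must describe test sample i, so there must be one flag per test sample.
    if flags.shape != (n_tst_samples,):
        raise ValueError("is_ood must contain exactly one flag per test sample.")

    return np.flatnonzero(flags)


ood_indices = check_alignment(
    ["time", "altitude"], ["time", "altitude"], [True, False, True], 3
)
print(ood_indices)  # [0 2]
```

Only the samples at the returned indices would be expected to appear in the output.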

Examples

>>> import numpy as np
>>> from dataeval.detectors.ood import OODOutput
>>> from dataeval.metadata import find_most_deviated_factors

All samples are out-of-distribution

>>> is_ood = OODOutput(np.array([True, True, True]), np.array([]), np.array([]))
>>> find_most_deviated_factors(metadata1, metadata2, is_ood)
MostDeviatedFactorsOutput([('time', 2.0), ('time', 2.592), ('time', 3.51)])

No samples are out-of-distribution

>>> is_ood = OODOutput(np.array([False, False, False]), np.array([]), np.array([]))
>>> find_most_deviated_factors(metadata1, metadata2, is_ood)
MostDeviatedFactorsOutput([])
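
To illustrate the shape of the result, the sketch below picks the most deviated factor for each OOD sample using plain NumPy. It is not DataEval's implementation: the deviation measure used here (absolute distance from the reference median, scaled by the reference median absolute deviation) is an assumption chosen for illustration, and `most_deviated_sketch` and the sample data are hypothetical.

```python
import numpy as np


def most_deviated_sketch(ref, tst, factor_names, is_ood):
    """Illustrative only: return (factor name, deviation) per OOD test sample.

    Assumption: deviation = |x - median(ref)| / MAD(ref), computed per factor.
    DataEval may compute deviations differently.
    """
    ref = np.asarray(ref, dtype=float)   # shape (ref_samples, factors)
    tst = np.asarray(tst, dtype=float)   # shape (tst_samples, factors)
    flags = np.asarray(is_ood, dtype=bool)
    if not flags.any():
        return []  # no OOD samples, so nothing to report

    median = np.median(ref, axis=0)
    mad = np.median(np.abs(ref - median), axis=0)
    mad = np.where(mad == 0, 1.0, mad)   # guard against division by zero

    # Deviation of every OOD test sample from the reference, per factor.
    deviations = np.abs(tst[flags] - median) / mad
    top = np.argmax(deviations, axis=1)  # index of the largest deviation per sample
    return [(factor_names[j], float(deviations[i, j])) for i, j in enumerate(top)]


names = ["time", "altitude"]
reference = [[1, 10], [2, 20], [3, 30]]
test = [[7, 25], [2, 60]]

result = most_deviated_sketch(reference, test, names, [True, True])
print(result)  # [('time', 5.0), ('altitude', 4.0)]

empty = most_deviated_sketch(reference, test, names, [False, False])
print(empty)  # []
```

As in the examples above, the output has one entry per OOD sample and is empty when no sample is flagged.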