dataeval.metadata.find_most_deviated_factors
- dataeval.metadata.find_most_deviated_factors(metadata_ref, metadata_tst, ood)
Determine the metadata factor with the greatest deviation for each out-of-distribution (OOD) sample in the test metadata.
- Parameters:
- metadata_ref : Metadata
A reference set of Metadata containing factor names and samples with discrete and/or continuous values per factor.
- metadata_tst : Metadata
The set of Metadata that is tested against the reference metadata. It must have the same number of factors as metadata_ref but may contain a different number of samples.
- ood : OODOutput
The output of one of DataEval’s OOD detectors, which flags the examples that are OOD.
- Returns:
An output object containing, for each OOD example in the test metadata, the name of the factor with the greatest deviation and the value of that deviation.
- Return type:
MostDeviatedFactorsOutput
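The per-sample search described above can be pictured with a small NumPy sketch. It is not DataEval's implementation: the deviation measure is an assumed robust z-score, |x - median| / MAD, computed against the reference data, and `most_deviated_factors_sketch` is a hypothetical helper that works on plain arrays instead of `Metadata` and `OODOutput` objects. Because the measure is assumed, its numbers will not match the outputs shown in the Examples.

```python
import numpy as np


def most_deviated_factors_sketch(
    ref: np.ndarray,
    tst: np.ndarray,
    factor_names: list[str],
    is_ood: np.ndarray,
) -> list[tuple[str, float]]:
    """Hypothetical sketch, not DataEval's implementation.

    ref: (n_ref_samples, n_factors) reference values
    tst: (n_tst_samples, n_factors) test values
    is_ood: (n_tst_samples,) flags; is_ood[i] refers to tst[i]
    Deviation is an assumed robust z-score: |x - median| / MAD.
    """
    is_ood = np.asarray(is_ood, dtype=bool)
    if not is_ood.any():
        return []  # no OOD samples, nothing to report

    ref_median = np.median(ref, axis=0)                # per-factor median of the reference
    mad = np.median(np.abs(ref - ref_median), axis=0)  # per-factor median absolute deviation
    mad = np.where(mad > 0, mad, 1.0)                  # guard against zero spread

    deviations = np.abs(tst[is_ood] - ref_median) / mad  # (n_ood, n_factors)
    top = np.argmax(deviations, axis=1)                  # most deviated factor per OOD row
    return [
        (factor_names[j], round(float(deviations[row, j]), 3))
        for row, j in enumerate(top)
    ]


# Usage with made-up data: two factors, three test samples, the first two flagged as OOD
ref = np.array([[0.0, 10.0], [1.0, 12.0], [2.0, 14.0], [3.0, 16.0], [4.0, 18.0]])
tst = np.array([[2.0, 30.0], [11.0, 14.0], [2.5, 15.0]])
print(most_deviated_factors_sketch(ref, tst, ["time", "altitude"], [True, True, False]))
# prints [('altitude', 8.0), ('time', 9.0)]
```

Scaling by a per-factor spread keeps factors with large raw ranges from always winning the comparison; any other scale estimate could be swapped in.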
Notes
- Both Metadata inputs must have discrete and continuous data in the shape (samples, factors) and have equivalent factor names and lengths.
- The flag at index i in OODOutput.is_ood must correspond directly to sample i of metadata_tst being out-of-distribution from metadata_ref.
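The two requirements in the Notes can be expressed as input checks. The hypothetical `check_inputs_sketch` below uses plain NumPy arrays and lists of factor names as assumed stand-ins for `Metadata` and `OODOutput.is_ood`; DataEval's own validation may differ. It returns the test rows selected by the OOD mask, which is only meaningful when flag i refers to test sample i.

```python
import numpy as np


def check_inputs_sketch(
    ref_names: list[str],
    tst_names: list[str],
    ref: np.ndarray,
    tst: np.ndarray,
    is_ood: np.ndarray,
) -> np.ndarray:
    """Hypothetical checks mirroring the Notes; not DataEval's own validation."""
    # Both inputs are laid out as (samples, factors)
    if ref.ndim != 2 or tst.ndim != 2:
        raise ValueError("metadata must have shape (samples, factors)")
    # Factor names and factor counts must match; sample counts may differ
    if list(ref_names) != list(tst_names) or ref.shape[1] != tst.shape[1]:
        raise ValueError("reference and test metadata must have the same factors")
    # Flag i describes test sample i, so there is exactly one flag per test row
    is_ood = np.asarray(is_ood, dtype=bool)
    if is_ood.shape != (tst.shape[0],):
        raise ValueError("is_ood needs one flag per test sample")
    return tst[is_ood]  # the OOD rows whose factors would be examined


names = ["time", "altitude"]
ref = np.zeros((5, 2))                                # 5 reference samples
tst = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 test samples
print(check_inputs_sketch(names, names, ref, tst, [False, True, True]))
```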
Examples
>>> import numpy as np
>>> from dataeval.detectors.ood import OODOutput
>>> from dataeval.metadata import find_most_deviated_factors

All samples are out-of-distribution:

>>> is_ood = OODOutput(np.array([True, True, True]), np.array([]), np.array([]))
>>> find_most_deviated_factors(metadata1, metadata2, is_ood)
MostDeviatedFactorsOutput([('time', 2.0), ('time', 2.592), ('time', 3.51)])

No samples are out-of-distribution:

>>> is_ood = OODOutput(np.array([False, False, False]), np.array([]), np.array([]))
>>> find_most_deviated_factors(metadata1, metadata2, is_ood)
MostDeviatedFactorsOutput([])