dataeval.metadata.most_deviated_factors
- dataeval.metadata.most_deviated_factors(metadata_1, metadata_2, ood)
Determines, for each out-of-distribution sample in metadata_2, the metadata factor that deviates most from the reference metadata_1.
- Parameters:
- metadata_1 : Metadata
A reference set of Metadata containing factor names and samples with discrete and/or continuous values per factor.
- metadata_2 : Metadata
The set of Metadata that is tested against the reference metadata. This set must have the same number of features but does not require the same number of samples.
- ood : OODOutput
An output of DataEval's OOD functions that indicates which examples are OOD.
- Returns:
A list containing, for each OOD example in metadata_2, the name of the factor with the highest deviation and the value of that deviation.
- Return type:
list[tuple[str, float]]
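To make the return value concrete, here is a minimal sketch of one way a "most deviated factor" could be found for a single sample. It is an illustration only and is not DataEval's implementation: it assumes that deviation means the absolute distance from the reference median, scaled by the reference median absolute deviation. The helper name most_deviated_factor_sketch and the dictionary-based inputs are hypothetical.

```python
from statistics import median


def most_deviated_factor_sketch(reference, sample):
    """Return (factor_name, deviation) for the factor where sample is furthest from reference.

    reference: dict mapping factor name -> list of reference values (hypothetical layout)
    sample: dict mapping factor name -> single value for one test sample
    """
    best_name, best_dev = None, -1.0
    for name, values in reference.items():
        center = median(values)
        # Median absolute deviation of the reference values for this factor.
        mad = median(abs(v - center) for v in values)
        # Assumption: fall back to a scale of 1.0 when the reference has no spread.
        scale = mad if mad > 0 else 1.0
        dev = abs(sample[name] - center) / scale
        if dev > best_dev:
            best_name, best_dev = name, dev
    return best_name, best_dev


reference = {
    "time": [1.0, 2.0, 3.0, 4.0, 5.0],
    "altitude": [100.0, 110.0, 120.0, 130.0, 140.0],
}
sample = {"time": 9.0, "altitude": 125.0}
result = most_deviated_factor_sketch(reference, sample)
```

Scaling by the spread of each reference factor keeps factors with different units comparable, which is why a raw difference is not used in this sketch.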
Notes
- Both Metadata inputs must have discrete and continuous data in the shape (samples, factors) and have equivalent factor names and lengths.
- The flag at index i in OODOutput.is_ood must correspond directly to sample i of metadata_2 being out-of-distribution from metadata_1.
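These requirements can be checked before calling the function. The sketch below is a hedged illustration that uses plain Python lists in place of Metadata and OODOutput; the helper check_inputs and its arguments are hypothetical and not part of DataEval. It also assumes that "equivalent factor names" means the same names regardless of order.

```python
def check_inputs(factor_names_1, factor_names_2, n_samples_2, is_ood):
    """Raise ValueError if the inputs do not satisfy the documented requirements."""
    # Both metadata sets must describe the same factors.
    # Assumption: order is ignored, so the names are compared after sorting.
    if sorted(factor_names_1) != sorted(factor_names_2):
        raise ValueError("metadata_1 and metadata_2 must have the same factor names")
    # There must be exactly one OOD flag per sample in metadata_2.
    if len(is_ood) != n_samples_2:
        raise ValueError("is_ood must contain one flag per sample in metadata_2")


# Passes silently: same factors, three samples, three flags.
check_inputs(["time", "altitude"], ["altitude", "time"], 3, [True, True, True])
```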
Examples
>>> import numpy as np
>>> from dataeval.detectors.ood import OODOutput
>>> from dataeval.metadata import most_deviated_factors

All samples are out-of-distribution:

>>> is_ood = OODOutput(np.array([True, True, True]), np.array([]), np.array([]))
>>> most_deviated_factors(metadata1, metadata2, is_ood)
[('time', 2.0), ('time', 2.592), ('time', 3.51)]

If there are no out-of-distribution samples, an empty list is returned:

>>> is_ood = OODOutput(np.array([False, False, False]), np.array([]), np.array([]))
>>> most_deviated_factors(metadata1, metadata2, is_ood)
[]
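Because the result is a plain list[tuple[str, float]], it can be summarized with the standard library alone. The sketch below copies the output of the first example as a literal, counts how often each factor is the most deviated one, and picks the largest deviation overall. The variable names are illustrative.

```python
from collections import Counter

# Output copied from the first example (all samples out-of-distribution).
deviations = [("time", 2.0), ("time", 2.592), ("time", 3.51)]

# Count how many OOD samples each factor is most deviated for.
factor_counts = Counter(name for name, _ in deviations)

# Find the single largest deviation across all OOD samples.
worst_factor, worst_value = max(deviations, key=lambda item: item[1])
```

When there are no out-of-distribution samples the list is empty and max raises ValueError, so that case may need a guard such as checking the list before calling max.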