dataeval.scope.Representation
- class dataeval.scope.Representation(ontology, *, expected=None, config=None)
Evaluate a dataset’s coverage of an ontology and prioritize what to collect.
Resolves the dataset’s class labels against the ontology, compares the observed distribution to an expected one, and returns a RepresentationOutput worklist of the leaf species to acquire or augment. The default expectation is a uniform distribution over leaf species. Pass expected to assert a minimum share (a fraction of the whole dataset) for specific classes you know to be rare; that floor both right-sizes the class's target and is validated as an assertion.
- Parameters:
- ontology : Ontology
Ontology whose leaf species define the label space to cover.
- expected : Mapping[str, float] or None, default None
Maps each class name to its minimum expected share of the dataset (a fraction in [0, 1]). Named classes use this floor as their target in place of the uniform share, and are validated in RepresentationOutput.violations; unnamed classes keep the uniform target. None means a uniform expectation for every leaf.
- config : Representation.Config or None, default None
Optional configuration object; parameters passed directly to __init__ override its values.
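The interaction between the uniform default and the expected floors can be illustrated without the library. The following is a minimal sketch, not dataeval's implementation: compute_targets and its arguments are hypothetical names, and it assumes that a named class's target is its floor multiplied by the total label count while every other leaf gets an equal 1/n share.

```python
from typing import Mapping, Optional


def compute_targets(
    leaves: list[str],
    total_labels: int,
    expected: Optional[Mapping[str, float]] = None,
) -> dict[str, int]:
    """Per-leaf label targets (hypothetical helper, not part of dataeval)."""
    expected = expected or {}
    uniform_share = 1.0 / len(leaves)  # default expectation: equal share per leaf
    targets = {}
    for leaf in leaves:
        # Assumption: a named class uses its floor in place of the uniform share.
        share = expected.get(leaf, uniform_share)
        # Round to a whole label. Python's round() sends exact .5 ties to the
        # even integer; whether the library does the same is not documented here.
        targets[leaf] = round(share * total_labels)
    return targets


leaves = ["cat", "dog", "owl"]
uniform = compute_targets(leaves, total_labels=300)
with_floor = compute_targets(leaves, total_labels=300, expected={"owl": 0.05})
print(uniform)     # every leaf targets 100 labels
print(with_floor)  # owl targets 15 labels; cat and dog keep 100
```

Under these assumptions, naming owl with a 5% floor lowers its target from 100 to 15 labels, which is what "right-sizing" a known-rare class amounts to.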
See also
- dataeval.core.label_coverage
The observation-only coverage facts this builds on.
- dataeval.core.label_reconciliation
Resolve labels against an ontology.
Notes
Targets are rounded to the nearest whole label. A class named in expected that does not resolve to exactly one concept is ignored (resolve it upstream).
Examples
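The floor assertion behind expected can be pictured with plain arithmetic. This sketch rests on assumptions the documentation does not spell out: that actual is a class's observed share of all labels and that shortfall is floor minus actual. find_violations is a hypothetical helper, not a dataeval function, and skipping names absent from the counts only stands in for the library's "resolves to exactly one concept" rule.

```python
def find_violations(counts: dict[str, int], expected: dict[str, float]) -> list[dict]:
    """List named classes whose observed share falls below their floor (illustrative)."""
    total = sum(counts.values())
    rows = []
    for name, floor in expected.items():
        if name not in counts or total == 0:
            continue  # stand-in for "unresolved names are ignored"
        actual = counts[name] / total  # assumed: share of the whole dataset
        if actual < floor:
            rows.append(
                {"label": name, "floor": floor, "actual": actual, "shortfall": floor - actual}
            )
    return rows


# Hypothetical label counts: owl holds 2 of 100 labels, below its 5% floor.
counts = {"cat": 60, "dog": 38, "owl": 2}
violations = find_violations(counts, {"owl": 0.05})
for row in violations:
    print(row["label"], row["floor"], row["actual"])
```

A class that already meets its floor produces no row, so an empty result means every assertion held.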
>>> from dataeval import Ontology
>>> from dataeval.scope import Representation
>>> ontology = Ontology.from_hierarchy({"animal": {"mammal": ["cat", "dog"], "bird": ["owl"]}})
>>> result = Representation(ontology).evaluate(dataset)
>>> result.columns
['concept', 'label', 'parent', 'action', 'count', 'target', 'deficit']

Assert that a known-rare class need only make up 5% of the dataset:

>>> result = Representation(ontology, expected={"owl": 0.05}).evaluate(dataset)
>>> result.violations.columns
['concept', 'label', 'floor', 'actual', 'shortfall']

- evaluate(data)
Evaluate a dataset’s coverage of the ontology.
- Parameters:
- data : AnnotatedDataset or Metadata
The dataset (or its Metadata) to evaluate. Class labels and the index2label mapping are read from it; raw label counts are derived via label_stats().
- Returns:
The collection worklist (acquire/augment rows) with leaf_coverage, total_deficit, violations, and dark_branches.
- Return type:
RepresentationOutput
Examples
>>> ontology = Ontology.from_hierarchy({
...     "vehicle": {"land": ["car", "bike"], "water": ["boat"], "air": ["plane"]}
... })
>>> evaluator = Representation(ontology)
>>> result = evaluator.evaluate(dataset)
>>> result.data()
shape: (2, 7)
┌─────────┬───────┬────────┬─────────┬───────┬────────┬─────────┐
│ concept ┆ label ┆ parent ┆ action  ┆ count ┆ target ┆ deficit │
│ ---     ┆ ---   ┆ ---    ┆ ---     ┆ ---   ┆ ---    ┆ ---     │
│ str     ┆ str   ┆ str    ┆ str     ┆ i64   ┆ i64    ┆ i64     │
╞═════════╪═══════╪════════╪═════════╪═══════╪════════╪═════════╡
│ bike    ┆ bike  ┆ land   ┆ acquire ┆ 0     ┆ 23     ┆ 23      │
│ boat    ┆ boat  ┆ water  ┆ augment ┆ 22    ┆ 23     ┆ 1       │
└─────────┴───────┴────────┴─────────┴───────┴────────┴─────────┘
>>> result.total_deficit
24
>>> result.leaf_coverage
0.75
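The numbers in the table above are consistent with simple per-leaf arithmetic. The sketch below reproduces them under stated assumptions rather than by calling the library: the counts for car and plane are invented (the example only shows bike and boat), deficit is taken to be target minus count floored at zero, a leaf with no labels is marked acquire and an under-target leaf augment, and leaf_coverage is taken to be the fraction of leaves with at least one label.

```python
# Hypothetical label counts; only the bike and boat values come from the example.
counts = {"car": 35, "bike": 0, "boat": 22, "plane": 35}

total = sum(counts.values())          # 92 labels overall
target = round(total / len(counts))   # uniform share per leaf: 23

worklist = []
for leaf, count in counts.items():
    deficit = max(target - count, 0)  # assumed: never negative
    if deficit > 0:
        # Assumed rule: nothing observed -> acquire; some observed -> augment.
        action = "acquire" if count == 0 else "augment"
        worklist.append((leaf, action, count, target, deficit))

total_deficit = sum(row[4] for row in worklist)
leaf_coverage = sum(1 for c in counts.values() if c > 0) / len(counts)

print(worklist)       # [('bike', 'acquire', 0, 23, 23), ('boat', 'augment', 22, 23, 1)]
print(total_deficit)  # 24
print(leaf_coverage)  # 0.75
```

Leaves that already meet their target (car and plane here) do not appear in the worklist, which matches the two-row result shown in the example.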
Classes
Configuration for the |