dataeval.core.label_coverage
- dataeval.core.label_coverage(label_counts, ontology)
Report how a dataset’s label mass is distributed over an ontology.
Resolves each dataset class name against the ontology (by preferred label, synonym, or exact id), attributes its count to the matched concept, and reports the resulting coverage of the ontology’s structure: direct and subtree mass per concept, leaf and sibling coverage, the depth profile, and the empirical leaf distribution. The result is observation-only: it describes what the dataset does cover and leaves any notion of an expected distribution, sufficiency threshold, or collection recommendation to a downstream evaluator.
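The coverage quantities described above can be illustrated with a small, self-contained sketch. It is not dataeval's implementation. It assumes a plain nested dict stands in for an Ontology (dict values are sub-hierarchies, list items are leaves) and that dataset names match concepts only by exact name (no synonyms or ids). The helper names `_index` and `coverage_sketch` are hypothetical. The result keys mirror the documented ones, but reading covered_leaves as "covered leaf descendants" and covered_children as "covered direct children" of each internal concept is an assumption.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical sketch, not dataeval's implementation. A nested dict stands in
# for an Ontology: dict values are sub-hierarchies, list items are leaves.


def _index(node, depth: int, parent: Optional[str],
           parents: Dict[str, Optional[str]], depths: Dict[str, int],
           children: Dict[str, List[str]]) -> None:
    """Record each concept's parent, is-a depth, and direct children."""
    pairs = node.items() if isinstance(node, dict) else [(leaf, None) for leaf in node]
    for name, sub in pairs:
        parents[name] = parent
        depths[name] = depth
        children.setdefault(name, [])
        if parent is not None:
            children[parent].append(name)
        if sub is not None:
            _index(sub, depth + 1, name, parents, depths, children)


def coverage_sketch(counts: Mapping[str, int], hierarchy: dict) -> dict:
    parents: Dict[str, Optional[str]] = {}
    depths: Dict[str, int] = {}
    children: Dict[str, List[str]] = {}
    _index(hierarchy, 0, None, parents, depths, children)

    # Direct mass: exact concept-name matches only (assumption: no synonyms or ids).
    direct = {c: 0 for c in depths}
    unmatched: Dict[str, int] = {}
    for name, n in counts.items():
        if name in direct:
            direct[name] += n
        else:
            unmatched[name] = n

    # Subtree mass: each concept's direct count also rolls up to every ancestor.
    subtree = dict(direct)
    for concept, n in direct.items():
        p = parents[concept]
        while p is not None:
            subtree[p] += n
            p = parents[p]

    def leaves_under(c: str) -> List[str]:
        stack, out = [c], []
        while stack:
            x = stack.pop()
            if children[x]:
                stack.extend(children[x])
            else:
                out.append(x)
        return out

    covered = {c for c, n in subtree.items() if n > 0}  # any mass in its subtree
    leaves = [c for c in depths if not children[c]]

    # (covered, total) per internal concept: leaf descendants and direct children.
    covered_leaves: Dict[str, Tuple[int, int]] = {}
    covered_children: Dict[str, Tuple[int, int]] = {}
    for c in depths:
        if children[c]:
            under = leaves_under(c)
            covered_leaves[c] = (sum(x in covered for x in under), len(under))
            covered_children[c] = (sum(x in covered for x in children[c]), len(children[c]))

    # (covered, total) concepts at each is-a depth.
    coverage_by_depth: Dict[int, Tuple[int, int]] = {}
    for c, d in depths.items():
        hit, total = coverage_by_depth.get(d, (0, 0))
        coverage_by_depth[d] = (hit + (c in covered), total + 1)

    leaf_mass = sum(direct[x] for x in leaves)
    return {
        "direct_count": direct,
        "subtree_count": subtree,
        "covered_leaves": covered_leaves,
        "covered_children": covered_children,
        "coverage_by_depth": coverage_by_depth,
        "leaf_coverage": sum(x in covered for x in leaves) / len(leaves) if leaves else 0.0,
        "leaf_distribution": {x: direct[x] / leaf_mass for x in leaves} if leaf_mass else {},
        "unmatched": unmatched,
    }
```

Run on the hierarchy and counts from the Examples section, this sketch gives leaf_coverage 0.75, subtree_count["mammal"] 10, and coverage_by_depth {0: (1, 1), 1: (2, 2), 2: (3, 4)}, the same values the documented doctest shows.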
- Parameters:
- label_counts : Mapping[str, int]
Mapping from dataset class name to its label count (e.g. label_stats(...) counts mapped through index2label). Counts are instance counts: detections per class for object detection, images per class for classification.
- ontology : Ontology
Ontology whose concepts define the space against which coverage is measured.
- Returns:
Resolution facts (matched/unmatched/ambiguous), per-concept mass (direct_count/subtree_count), breadth and depth coverage (covered_leaves/covered_children/coverage_by_depth/leaf_coverage), and the empirical leaf_distribution.
- Return type:
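When per-class counts are keyed by class index, they need to be re-keyed by class name before being passed as label_counts. A minimal sketch, assuming the index-keyed counts and the index2label mapping are already available as plain dicts (extracting them from label_stats output is not shown here); the helper name `to_label_counts` is hypothetical. Indices that share a class name are summed, and an index missing from index2label raises KeyError.

```python
from collections import Counter
from typing import Dict, Mapping


def to_label_counts(index_counts: Mapping[int, int],
                    index2label: Mapping[int, str]) -> Dict[str, int]:
    """Hypothetical helper: re-key per-index counts by class name."""
    totals: Counter = Counter()
    for index, n in index_counts.items():
        # Several indices may map to one name; their counts are added together.
        totals[index2label[index]] += n
    return dict(totals)


label_counts = to_label_counts({0: 8, 1: 2, 2: 1}, {0: "cat", 1: "dog", 2: "owl"})
```

Here label_counts becomes {"cat": 8, "dog": 2, "owl": 1}, the shape the Examples section passes to label_coverage.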
See also
dataeval.core.label_reconciliation: Resolve labels against an ontology and recover their hierarchy (the matching this function builds on).
dataeval.core.ontology_validation: Report structural facts about the ontology artifact itself, independent of any dataset.
Notes
Ambiguous names (those resolving to more than one concept) carry mass that cannot be attributed to a single concept, so they are reported but excluded from the coverage tallies. Names resolving to no concept are reported in unmatched with their counts, since that mass signals concepts the ontology is missing.
Examples
>>> from dataeval import Ontology
>>> from dataeval.core import label_coverage
>>> ontology = Ontology.from_hierarchy({"animal": {"mammal": ["cat", "dog"], "bird": ["owl", "hawk"]}})
>>> counts = {"cat": 8, "dog": 2, "owl": 1}  # hawk never collected
>>> result = label_coverage(counts, ontology)
>>> result["leaf_coverage"]  # 3 of 4 leaf species have any examples
0.75
>>> result["covered_leaves"]["bird"]  # one of two bird species populated
(1, 2)
>>> result["coverage_by_depth"]  # (covered, total) per is-a depth
{0: (1, 1), 1: (2, 2), 2: (3, 4)}
>>> result["subtree_count"]["mammal"]  # cat + dog mass rolls up to mammal
10
>>> result["leaf_distribution"]["cat"]  # 8 of 11 leaf-attributed labels
0.7272727272727273

A class name absent from the ontology is reported as unmatched mass:
>>> label_coverage({"cat": 5, "unicorn": 3}, ontology)["unmatched"]
{'unicorn': 3}
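The resolution step and the handling of ambiguous and unmatched names described in the Notes can be sketched as follows. This is a hypothetical illustration, not label_reconciliation or label_coverage itself. It assumes each concept is given as an id mapped to a (preferred label, synonyms) pair, compares names by exact string equality, and gives ids, preferred labels, and synonyms equal priority; `resolve_names` is an illustrative name.

```python
from typing import Dict, List, Mapping, Set, Tuple


def resolve_names(
    counts: Mapping[str, int],
    concepts: Mapping[str, Tuple[str, List[str]]],
) -> Tuple[Dict[str, str], Dict[str, int], Dict[str, List[str]]]:
    """Hypothetical sketch: partition names into matched, unmatched, ambiguous.

    concepts maps a concept id to (preferred_label, synonyms).
    """
    # Index every id, preferred label, and synonym to the concept ids it names.
    lookup: Dict[str, Set[str]] = {}
    for cid, (pref, synonyms) in concepts.items():
        for key in (cid, pref, *synonyms):
            lookup.setdefault(key, set()).add(cid)

    matched: Dict[str, str] = {}
    unmatched: Dict[str, int] = {}
    ambiguous: Dict[str, List[str]] = {}
    for name, n in counts.items():
        hits = lookup.get(name, set())
        if len(hits) == 1:
            matched[name] = next(iter(hits))  # mass attributable to one concept
        elif not hits:
            unmatched[name] = n  # mass for a concept the ontology lacks
        else:
            ambiguous[name] = sorted(hits)  # reported, kept out of coverage tallies
    return matched, unmatched, ambiguous


concepts = {"n1": ("cat", ["kitty"]), "n2": ("dog", ["pup"]), "n3": ("seal", ["pup"])}
matched, unmatched, ambiguous = resolve_names({"kitty": 4, "n2": 2, "pup": 1, "unicorn": 3}, concepts)
```

With these inputs, "kitty" resolves to n1 through its synonym and "n2" through its exact id, "pup" comes back as ambiguous with candidates ["n2", "n3"], and "unicorn" lands in unmatched with its count of 3.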