dataeval.core.label_coverage

dataeval.core.label_coverage(label_counts, ontology)

Report how a dataset’s label mass is distributed over an ontology.

Resolves each dataset class name against the ontology (by preferred label, synonym, or exact id), attributes its count to the matched concept, and reports the resulting coverage of the ontology’s structure — direct and subtree mass per concept, leaf and sibling coverage, the depth profile, and the empirical leaf distribution. The result is observation-only: it describes what the dataset does cover, leaving any notion of an expected distribution, sufficiency threshold, or collection recommendation to a downstream evaluator.

Parameters:
label_counts : Mapping[str, int]

Mapping from dataset class name to its label count (e.g. label_stats(...) counts mapped through index2label). Counts are instance counts: detections per class for object detection, images per class for classification.

ontology : Ontology

Ontology whose concepts define the space coverage is measured against.
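
As a minimal sketch of the shape label_counts is expected to have, the snippet below tallies per-instance integer labels and keys the totals by class name. The labels list and the index2label dict are hypothetical values chosen for illustration, and the tally uses collections.Counter rather than label_stats.

```python
from collections import Counter

# Hypothetical inputs: one integer class index per labeled instance,
# and a mapping from class index to class name.
labels = [0, 0, 1, 2, 0, 1]
index2label = {0: "cat", 1: "dog", 2: "owl"}

# Tally instances per class index, then key the counts by class name.
index_counts = Counter(labels)
label_counts = {index2label[i]: n for i, n in sorted(index_counts.items())}

print(label_counts)  # {'cat': 3, 'dog': 2, 'owl': 1}
```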

Returns:

Resolution facts (matched / unmatched / ambiguous), per-concept mass (direct_count / subtree_count), breadth and depth coverage (covered_leaves / covered_children / coverage_by_depth / leaf_coverage), and the empirical leaf_distribution.

Return type:

LabelCoverageResult

See also

dataeval.core.label_reconciliation

Resolve labels against an ontology and recover their hierarchy (the matching this builds on).

dataeval.core.ontology_validation

Report structural facts about the ontology artifact itself, independent of any dataset.

Notes

Ambiguous names (resolving to more than one concept) carry mass that cannot be attributed to a single concept, so they are reported but excluded from the coverage tallies. Names resolving to no concept are reported in unmatched with their counts, since that mass signals concepts the ontology is missing.
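
The three-way split described above can be illustrated with a short sketch. It is not the library's implementation: name_index is a hypothetical lookup from a class name to the concept ids it resolves to, standing in for the ontology's label, synonym, and id matching, and the layout of the ambiguous entries is likewise an assumption.

```python
def partition_counts(label_counts, name_index):
    """Split label counts into matched, unmatched, and ambiguous mass.

    name_index is a hypothetical lookup: class name -> list of concept ids.
    """
    matched, unmatched, ambiguous = {}, {}, {}
    for name, count in label_counts.items():
        concepts = name_index.get(name, [])
        if len(concepts) == 1:
            # Exactly one concept: the count is attributed to it.
            concept = concepts[0]
            matched[concept] = matched.get(concept, 0) + count
        elif not concepts:
            # No concept: mass the ontology has no place for.
            unmatched[name] = count
        else:
            # Several concepts: reported, but kept out of coverage tallies.
            ambiguous[name] = (count, sorted(concepts))
    return matched, unmatched, ambiguous


# Hypothetical lookup: "kitty" is a synonym of "cat", "bat" hits two concepts.
name_index = {
    "cat": ["cat"],
    "kitty": ["cat"],
    "bat": ["bat_animal", "bat_equipment"],
}
matched, unmatched, ambiguous = partition_counts(
    {"cat": 5, "kitty": 2, "bat": 4, "unicorn": 3}, name_index
)
print(matched)    # {'cat': 7}
print(unmatched)  # {'unicorn': 3}
print(ambiguous)  # {'bat': (4, ['bat_animal', 'bat_equipment'])}
```

Only the matched dict would feed the coverage tallies in this sketch; the other two are kept for reporting.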

Examples

>>> from dataeval import Ontology
>>> from dataeval.core import label_coverage
>>> ontology = Ontology.from_hierarchy({"animal": {"mammal": ["cat", "dog"], "bird": ["owl", "hawk"]}})
>>> counts = {"cat": 8, "dog": 2, "owl": 1}  # hawk never collected
>>> result = label_coverage(counts, ontology)
>>> result["leaf_coverage"]  # 3 of 4 leaf species have any examples
0.75
>>> result["covered_leaves"]["bird"]  # one of two bird species populated
(1, 2)
>>> result["coverage_by_depth"]  # (covered, total) per is-a depth
{0: (1, 1), 1: (2, 2), 2: (3, 4)}
>>> result["subtree_count"]["mammal"]  # cat + dog mass rolls up to mammal
10
>>> result["leaf_distribution"]["cat"]  # 8 of 11 leaf-attributed labels
0.7272727272727273

A class name absent from the ontology is reported as unmatched mass:

>>> label_coverage({"cat": 5, "unicorn": 3}, ontology)["unmatched"]
{'unicorn': 3}
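
The figures in the first example can be checked by hand. The sketch below is an illustration under stated assumptions rather than the library's code: it encodes the hierarchy as a plain child-to-parent dict (parent, a hypothetical name) and recomputes subtree mass, leaf coverage, and the leaf distribution from the direct counts.

```python
# Hypothetical child -> parent encoding of the example hierarchy.
parent = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "cat": "mammal",
    "dog": "mammal",
    "owl": "bird",
    "hawk": "bird",
}
direct_count = {"cat": 8, "dog": 2, "owl": 1}

# Subtree mass: add each direct count to the concept and all its ancestors.
subtree_count = {concept: 0 for concept in parent}
for concept, count in direct_count.items():
    node = concept
    while node is not None:
        subtree_count[node] += count
        node = parent[node]

# Leaves are the concepts that are nobody's parent.
parents = set(parent.values())
leaves = [c for c in parent if c not in parents]

# A leaf counts as covered when it has at least one label.
covered = [leaf for leaf in leaves if direct_count.get(leaf, 0) > 0]
leaf_coverage = len(covered) / len(leaves)

# Empirical distribution of label mass over the leaves.
leaf_total = sum(direct_count.get(leaf, 0) for leaf in leaves)
leaf_distribution = {leaf: direct_count.get(leaf, 0) / leaf_total for leaf in leaves}

print(subtree_count["mammal"])   # 10
print(leaf_coverage)             # 0.75
print(leaf_distribution["cat"])  # 0.7272727272727273
```

Whether the real leaf_distribution includes zero-mass leaves such as hawk is not stated here; the sketch keeps them with probability 0.0.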