dataeval.core.LabelCoverageResult

class dataeval.core.LabelCoverageResult

Observed distribution of a dataset’s label mass over an Ontology.

Every per-concept mapping is keyed over all defined concepts, so an unlabeled concept appears with a zero or empty entry rather than being absent; making the unpopulated parts of the ontology visible is the whole point. All fields are observations; none assumes an expected distribution.

matched

Dataset class name to the single concept id it resolved to. Distinct names may resolve to the same id (synonyms); their counts are summed downstream.

Type:

Mapping[str, str]

unmatched

Class name to its count, for names that resolved to no concept — label mass the ontology does not cover (a missing concept, or a junk label).

Type:

Mapping[str, int]

ambiguous

Class name to the candidate concept ids it resolved to, for names that resolved to more than one. Their counts are not attributed to any concept; resolve them upstream (e.g. by passing concept ids) to fold them into the coverage tallies.

Type:

Mapping[str, Sequence[str]]
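
The way matched, unmatched and ambiguous partition a dataset's class names can be sketched as follows. This is an illustration only: the `name_to_ids` lookup, the concept ids and the labels are hypothetical and are not part of the documented API; the library's actual name resolution may differ.

```python
from collections import Counter

# Hypothetical lookup: class name -> candidate concept ids (illustration only).
name_to_ids = {
    "cat": ["felis_catus"],
    "kitty": ["felis_catus"],                   # synonym of "cat"
    "jaguar": ["panthera_onca", "jaguar_car"],  # resolves to two concepts
}

labels = ["cat", "kitty", "cat", "jaguar", "blob"]
counts = Counter(labels)

matched, unmatched, ambiguous = {}, {}, {}
for name, n in counts.items():
    ids = name_to_ids.get(name, [])
    if len(ids) == 1:
        matched[name] = ids[0]        # name -> single concept id
    elif not ids:
        unmatched[name] = n           # name -> count, no concept covers it
    else:
        ambiguous[name] = list(ids)   # name -> all candidate ids, no mass attributed

# Synonyms that share a concept id have their counts summed onto that id.
direct = Counter()
for name, cid in matched.items():
    direct[cid] += counts[name]
```

Here "cat" and "kitty" both land on the same concept, "blob" is uncovered label mass, and the single "jaguar" label is held out of every tally until it is disambiguated.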

direct_count

Concept id to the label mass landing exactly on it (0 when unlabeled). Labels may land on internal concepts, not only leaves.

Type:

Mapping[str, int]

subtree_count

Concept id to the mass on it plus all its descendants (its subtree). On a DAG, a multi-parent concept contributes to every ancestor’s subtree, but is counted only once per ancestor.

Type:

Mapping[str, int]
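
The once-per-ancestor rule on a DAG can be illustrated with a small sketch. The toy ontology, the `parents` mapping and the `ancestors` helper below are hypothetical and only show one way to obtain the described counts; they are not the library's implementation.

```python
# Hypothetical toy ontology as child -> parents (a DAG: "bat" has two parents).
parents = {
    "animal": [],
    "mammal": ["animal"],
    "flyer": ["animal"],
    "bat": ["mammal", "flyer"],
    "dog": ["mammal"],
}
direct_count = {"animal": 0, "mammal": 2, "flyer": 0, "bat": 5, "dog": 3}


def ancestors(cid):
    """Return the set of all strict ancestors of cid (each appears once)."""
    seen = set()
    stack = list(parents[cid])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


subtree_count = dict(direct_count)  # every concept starts with its own mass
for cid, mass in direct_count.items():
    for anc in ancestors(cid):      # a set, so each ancestor is credited once
        subtree_count[anc] += mass
```

In this example "bat" reaches "animal" through both "mammal" and "flyer", yet its 5 labels are added to "animal" once, giving 10 rather than the 15 that summing the children's subtree counts would produce.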

covered_leaves

Concept id to (covered, total) leaf species in its subtree, where a leaf is covered if it has any direct mass. The breadth-of-coverage signal at a glance: (0, n) is a wholly dark branch.

Type:

Mapping[str, tuple[int, int]]
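
As a rough sketch of the (covered, total) leaf tally, assuming a hypothetical tree-shaped ontology given as a `children` mapping, and assuming a leaf counts as the only leaf of its own subtree (the text above does not state this explicitly):

```python
# Hypothetical toy ontology as parent -> children; leaves have no children.
children = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["owl"],
    "dog": [], "cat": [], "owl": [],
}
# Mass on the internal concept "mammal" does not make any leaf covered.
direct_count = {"animal": 0, "mammal": 4, "bird": 0, "dog": 3, "cat": 0, "owl": 0}


def leaves_under(cid):
    """Return the set of leaf ids in cid's subtree (assumed to include cid itself)."""
    if not children[cid]:
        return {cid}
    out = set()
    for c in children[cid]:
        out |= leaves_under(c)
    return out


covered_leaves = {}
for cid in children:
    leaves = leaves_under(cid)
    covered = sum(1 for leaf in leaves if direct_count[leaf] > 0)
    covered_leaves[cid] = (covered, len(leaves))
```

The "bird" branch comes out as (0, 1), a wholly dark branch, while "mammal" is (1, 2) even though it carries direct mass of its own.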

covered_children

Concept id to (covered, total) direct children whose subtree holds any mass — sibling fill under each parent. Leaves report (0, 0).

Type:

Mapping[str, tuple[int, int]]
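
Sibling fill can be derived from subtree counts, as in the following sketch. The `children` mapping and the counts are hypothetical toy values chosen for illustration.

```python
# Hypothetical toy ontology as parent -> direct children.
children = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["owl"],
    "dog": [], "cat": [], "owl": [],
}
# Subtree mass per concept (own mass plus descendants), assumed precomputed.
subtree_count = {"animal": 7, "mammal": 7, "bird": 0, "dog": 3, "cat": 0, "owl": 0}

# A child is covered when its subtree holds any mass at all.
covered_children = {
    cid: (sum(1 for c in kids if subtree_count[c] > 0), len(kids))
    for cid, kids in children.items()
}
```

Leaves have no children, so they report (0, 0); "animal" reports (1, 2) because only the "mammal" branch holds any mass.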

coverage_by_depth

Is-a depth to (covered, total) concepts at that depth, where covered means the concept’s subtree holds any mass. The depth profile of coverage.

Type:

Mapping[int, tuple[int, int]]
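
The depth profile can be sketched as below. The `depth` mapping is a hypothetical input, and placing the root at depth 0 is an assumption; how depth is defined for multi-parent concepts is not specified here, so the example uses a tree.

```python
from collections import defaultdict

# Hypothetical is-a depth per concept, with the root assumed to sit at depth 0.
depth = {"animal": 0, "mammal": 1, "bird": 1, "dog": 2, "cat": 2, "owl": 2}
# Subtree mass per concept, assumed precomputed.
subtree_count = {"animal": 7, "mammal": 7, "bird": 0, "dog": 3, "cat": 0, "owl": 0}

tally = defaultdict(lambda: [0, 0])  # depth -> [covered, total]
for cid, d in depth.items():
    tally[d][1] += 1
    if subtree_count[cid] > 0:       # covered: the subtree holds any mass
        tally[d][0] += 1

coverage_by_depth = {d: tuple(v) for d, v in sorted(tally.items())}
```

A profile that stays full near the root and thins out at greater depth, as here, indicates that the labels reach only part of the fine-grained concepts.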

leaf_coverage

Fraction of the ontology’s leaf species with any direct mass — a single observed coverage scalar (no prior). 0.0 when the ontology has no leaves.

Type:

float

leaf_distribution

Leaf concept id to its share of total leaf-attributed mass (the entries sum to 1 over leaves, or are all 0.0 when no leaf is labeled). The empirical class distribution at the finest granularity, for an evaluator to compare against an expected one.

Type:

Mapping[str, float]
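
Both leaf-level summaries follow from the direct counts on leaves. A minimal sketch, assuming a hypothetical `leaf_direct` mapping that holds the direct count of every leaf in the ontology:

```python
# Hypothetical direct counts restricted to the ontology's leaves.
leaf_direct = {"dog": 3, "cat": 0, "owl": 1}

# leaf_coverage: fraction of leaves with any direct mass (0.0 if there are no leaves).
n_leaves = len(leaf_direct)
covered = sum(1 for n in leaf_direct.values() if n > 0)
leaf_coverage = covered / n_leaves if n_leaves else 0.0

# leaf_distribution: each leaf's share of the total leaf-attributed mass
# (all 0.0 when no leaf is labeled).
total = sum(leaf_direct.values())
leaf_distribution = {
    leaf: (n / total if total else 0.0) for leaf, n in leaf_direct.items()
}
```

With these counts two of three leaves are covered, and the distribution is 0.75 for "dog", 0.0 for "cat" and 0.25 for "owl". Mass on internal concepts is excluded from the denominator, so the shares sum to 1 over leaves.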