dataeval.core.ontology_validation

dataeval.core.ontology_validation(ontology, *, label_pattern=None)

Validate an ontology artifact and report its structural and naming facts.

Inspects the ontology’s own graph and labels, independent of any dataset, and reports the connectivity, redundancy/contradiction, naming, and shape facts from which ontology quality can be judged. The result is verdict-free: it supplies the ingredients (e.g. dangling ancestors, redundant edges, label collisions, per-concept depth and fan-out) that a downstream evaluator turns into a pass/fail determination under its own policy and thresholds.
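To make the connectivity and shape facts concrete, here is a minimal sketch that computes them over a plain dict mapping each concept id to its declared is-a parents. The dict representation, the function name connectivity_and_shape, and the exact definitions (isolated means no declared parents and no children; depth is the longest is-a path up to a root) are illustrative assumptions, not the dataeval Ontology API or its exact semantics.

```python
from functools import cache

# Hypothetical stand-in for an ontology: each concept id maps to the ids of its
# declared is-a parents. This is NOT dataeval's Ontology class.
ParentMap = dict[str, list[str]]


def connectivity_and_shape(parents: ParentMap) -> dict:
    ids = set(parents)
    # Parents referenced but not defined here (dangling / external ancestors).
    external = sorted({p for ps in parents.values() for p in ps} - ids)
    # Keep only edges whose parent is defined in this ontology.
    internal = {c: [p for p in ps if p in ids] for c, ps in parents.items()}
    children: dict[str, list[str]] = {c: [] for c in ids}
    for c, ps in internal.items():
        for p in ps:
            children[p].append(c)

    # Assumed definitions: isolated = no declared parents and no children;
    # roots and leaves exclude isolated concepts.
    isolated = sorted(c for c in ids if not parents[c] and not children[c])
    roots = sorted(c for c in ids if not internal[c] and c not in isolated)
    leaves = sorted(c for c in ids if not children[c] and c not in isolated)

    @cache
    def depth(c: str) -> int:
        # Longest is-a path from c up to a root (roots have depth 0); the
        # recursion terminates because the is-a graph is acyclic.
        return 1 + max((depth(p) for p in internal[c]), default=-1)

    return {
        "roots": roots,
        "leaves": leaves,
        "isolated": isolated,
        "external_ancestors": external,
        "depth": {c: depth(c) for c in sorted(ids)},
        "fan_out": {c: len(children[c]) for c in sorted(ids)},
        "parent_count": {c: len(parents[c]) for c in sorted(ids)},
    }


example = {"animal": [], "dog": ["animal"], "cat": ["animal"], "puppy": ["dog"], "rock": []}
facts = connectivity_and_shape(example)
print(facts["roots"], facts["leaves"], facts["isolated"])
# ['animal'] ['cat', 'puppy'] ['rock']
```

Every value is reported by concept id and nothing is judged: whether "rock" being isolated is a problem is left to the caller.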

Construction-time invariants (no duplicate ids, acyclic is-a graph) are already guaranteed by Ontology, so a built ontology cannot violate them; this function reports the legal-but-smelly structure they do not preclude.
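As an example of legal-but-smelly structure that acyclicity does not rule out, the sketch below flags redundant edges (a direct is-a edge already implied by a longer path) and unary parents, here assumed to mean defined concepts with exactly one child. The dict representation and both function names are hypothetical; the library’s own definitions of redundant_edges and unary_parents may differ in detail.

```python
from collections import Counter
from functools import cache


def redundant_edges(parents: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Direct is-a edges that are already implied by a longer path."""

    @cache
    def ancestors(c: str) -> frozenset[str]:
        # Transitive ancestors of c; ids missing from the map have none.
        found: set[str] = set()
        for p in parents.get(c, []):
            found.add(p)
            found |= ancestors(p)
        return frozenset(found)

    edges = []
    for child, ps in parents.items():
        for p in ps:
            # child -> p is redundant if p is also reachable through another parent.
            if any(p in ancestors(q) for q in ps if q != p):
                edges.append((child, p))
    return sorted(edges)


def unary_parents(parents: dict[str, list[str]]) -> list[str]:
    """Defined concepts with exactly one child (assumed meaning)."""
    child_counts = Counter(p for ps in parents.values() for p in set(ps))
    return sorted(p for p, n in child_counts.items() if n == 1 and p in parents)


example = {
    "animal": [],
    "mammal": ["animal"],
    "dog": ["mammal", "animal"],  # the "animal" edge is implied via "mammal"
    "cat": ["mammal"],
    "puppy": ["dog"],
}
print(redundant_edges(example))  # [('dog', 'animal')]
print(unary_parents(example))    # ['dog']
```

Both findings are structurally valid, which is why they are reported as facts rather than rejected at construction time.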

Parameters:
ontology : Ontology

The ontology artifact to validate.

label_pattern : str or None, optional

A regular expression a concept label must fully match (via re.fullmatch()) to be considered well-formed; labels that do not are reported in nonconforming_labels. The naming convention is policy, so the check is opt-in: when None (default) it is skipped and nonconforming_labels is empty. Pass e.g. r"[a-z0-9]+(_[a-z0-9]+)*" to lint for lowercase_snake_case.
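The opt-in naming check can be sketched with the standard re module alone. The function name nonconforming_labels mirrors the result field but is a hypothetical helper; the pattern is the lowercase_snake_case example given above. The last line shows why a full match matters: re.match would accept a label like "dog_" because its prefix conforms.

```python
import re
from typing import Optional

# The lowercase_snake_case pattern suggested in the label_pattern description.
SNAKE_CASE = r"[a-z0-9]+(_[a-z0-9]+)*"


def nonconforming_labels(labels: dict[str, str], label_pattern: Optional[str] = None) -> list[str]:
    # Opt-in: with no pattern there is no naming policy to check.
    if label_pattern is None:
        return []
    rx = re.compile(label_pattern)
    # fullmatch, not match: the whole label must conform, not just a prefix.
    return sorted(cid for cid, label in labels.items() if rx.fullmatch(label) is None)


labels = {"c1": "golden_retriever", "c2": "Golden Retriever", "c3": "dog_", "c4": "k9"}
print(nonconforming_labels(labels))              # []
print(nonconforming_labels(labels, SNAKE_CASE))  # ['c2', 'c3']
print(bool(re.match(SNAKE_CASE, "dog_")))        # True: a prefix match would miss "dog_"
```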

Returns:

The validation facts, grouped as connectivity (roots/leaves/isolated/external_ancestors), redundancy/contradiction (redundant_edges/ancestor_siblings/unary_parents), naming (label_collisions/nonconforming_labels), and the per-concept shape profile (depth/fan_out/parent_count).

Return type:

OntologyValidationResult
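The page does not say how label_collisions decides that two labels collide. The sketch below assumes, purely for illustration, that labels collide when they are equal after case-folding and treating runs of whitespace, hyphens, and underscores as one separator; the helper names are hypothetical and the library’s actual normalization may be stricter or looser.

```python
import re
from collections import defaultdict


def _normalize(label: str) -> str:
    # Hypothetical rule: ignore case, and treat runs of whitespace, hyphens and
    # underscores as one separator.
    return re.sub(r"[\s_-]+", " ", label.casefold()).strip()


def label_collisions(labels: dict[str, str]) -> dict[str, list[str]]:
    """Group concept ids whose labels normalize to the same string."""
    groups: dict[str, list[str]] = defaultdict(list)
    for cid, label in labels.items():
        groups[_normalize(label)].append(cid)
    # Only groups with two or more ids are collisions.
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


labels = {"c1": "Golden Retriever", "c2": "golden_retriever", "c3": "golden-retriever", "c4": "poodle"}
print(label_collisions(labels))  # {'golden retriever': ['c1', 'c2', 'c3']}
```

Grouping by id keeps the finding in line with the rest of the result: the caller sees which concepts collide, not just that some do.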

See also

dataeval.core.label_reconciliation

Validate a dataset’s labels against an ontology (rather than the ontology artifact itself).

Notes

Every finding identifies concepts by id. The findings are facts, not failures: a non-empty external_ancestors is expected for a deliberately distributed subset, for instance, and is only a defect for an ontology meant to be complete.
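Because the result is verdict-free, pass/fail lives in caller code. The sketch below shows one possible downstream policy; the Facts dataclass is a hypothetical stand-in that reuses a few OntologyValidationResult field names (the real attribute types may differ), and the expect_complete and max_redundant_edges knobs are illustrative. It encodes the point above: external ancestors fail the check only when the ontology is meant to be complete.

```python
from dataclasses import dataclass, field


@dataclass
class Facts:
    # Hypothetical container reusing a few OntologyValidationResult field names;
    # the real result's attribute types may differ.
    external_ancestors: list[str] = field(default_factory=list)
    isolated: list[str] = field(default_factory=list)
    redundant_edges: list[tuple[str, str]] = field(default_factory=list)
    label_collisions: dict[str, list[str]] = field(default_factory=dict)


def evaluate(facts: Facts, *, expect_complete: bool, max_redundant_edges: int = 0) -> tuple[bool, list[str]]:
    """One possible downstream policy: turn verdict-free facts into pass/fail."""
    reasons: list[str] = []
    # Dangling ancestors are a defect only when the ontology claims to be complete.
    if expect_complete and facts.external_ancestors:
        reasons.append(f"external ancestors: {facts.external_ancestors}")
    if facts.isolated:
        reasons.append(f"isolated concepts: {facts.isolated}")
    if len(facts.redundant_edges) > max_redundant_edges:
        reasons.append(f"{len(facts.redundant_edges)} redundant edges")
    if facts.label_collisions:
        reasons.append(f"label collisions: {sorted(facts.label_collisions)}")
    return not reasons, reasons


subset = Facts(external_ancestors=["animal"])
print(evaluate(subset, expect_complete=False))  # (True, [])
print(evaluate(subset, expect_complete=True))   # (False, ["external ancestors: ['animal']"])
```

Keeping thresholds such as max_redundant_edges in the evaluator rather than in the validator lets different consumers apply different policies to the same facts.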