dataeval.core.ontology_validation
- dataeval.core.ontology_validation(ontology, *, label_pattern=None)
Validate an ontology artifact and report its structural and naming facts.
Inspects the ontology’s own graph and labels — independent of any dataset — and reports the connectivity, redundancy/contradiction, naming, and shape facts from which ontology quality can be judged. The result is verdict-free: it provides the ingredients (e.g. dangling ancestors, redundant edges, label collisions, per-concept depth and fan-out) for a downstream evaluator to turn into a pass/fail determination under its own policy and thresholds.
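To make the kinds of facts named above concrete, here is a minimal sketch that computes a few of them over a toy graph. It is not dataeval's implementation: the `parents` dict, the `ancestors` helper, and the working definitions used here (a root has no listed parents, an external ancestor is a referenced parent id that is not defined, an edge is redundant when its parent is still reachable without it) are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical toy ontology: concept id -> list of parent ids (is-a edges).
# A plain dict stands in for the real Ontology type, which is not shown here.
parents = {
    "animal": [],
    "mammal": ["animal"],
    "dog": ["mammal", "animal"],  # "animal" is already implied via "mammal"
    "cat": ["mammal"],
    "sedan": ["vehicle"],         # "vehicle" is referenced but never defined
    "misc": [],
}


def ancestors(node, skip_edge=None):
    """Collect every id reachable upward from node, optionally ignoring one edge."""
    seen, queue = set(), deque([node])
    while queue:
        cur = queue.popleft()
        for p in parents.get(cur, []):
            if (cur, p) == skip_edge or p in seen:
                continue
            seen.add(p)
            queue.append(p)
    return seen


defined = set(parents)

# Fan-out: number of direct children per defined concept.
fan_out = {c: 0 for c in parents}
for child, ps in parents.items():
    for p in ps:
        if p in fan_out:
            fan_out[p] += 1

roots = [c for c, ps in parents.items() if not ps]
leaves = [c for c in parents if fan_out[c] == 0]
isolated = [c for c in roots if fan_out[c] == 0]
external_ancestors = sorted(
    {p for ps in parents.values() for p in ps if p not in defined}
)
# An edge is treated as redundant if its parent is still an ancestor without it.
redundant_edges = [
    (c, p)
    for c, ps in parents.items()
    for p in ps
    if p in ancestors(c, skip_edge=(c, p))
]

print(roots, leaves, isolated, external_ancestors, redundant_edges)
```

None of these lists is a verdict by itself; whether the redundant `("dog", "animal")` edge or the undefined `"vehicle"` parent matters depends on the policy applied afterwards.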
Construction-time invariants (no duplicate ids, acyclic is-a graph) are already guaranteed by Ontology, so a built ontology cannot violate them; this function reports the legal-but-smelly structure they do not preclude.
- Parameters:
- ontology : Ontology
The ontology artifact to validate.
- label_pattern : str or None, optional
A regular expression a concept label must fully match (via re.fullmatch()) to be considered well-formed; labels that do not are reported in nonconforming_labels. The naming convention is policy, so the check is opt-in: when None (default), it is skipped and nonconforming_labels is empty. Pass e.g. r"[a-z0-9]+(_[a-z0-9]+)*" to lint for lowercase_snake_case.
- Returns:
Connectivity (roots/leaves/isolated/external_ancestors), redundancy/contradiction (redundant_edges/ancestor_siblings/unary_parents), naming (label_collisions/nonconforming_labels), and the per-concept shape profile (depth/fan_out/parent_count).
- Return type:
See also
dataeval.core.label_reconciliation
Validate a dataset’s labels against an ontology (rather than the ontology artifact itself).
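The full-match semantics of label_pattern described under Parameters can be checked with the standard library alone. The sketch below applies the documented snake_case pattern to a hypothetical list of labels using re.fullmatch(); it only illustrates the regular-expression behaviour and does not call dataeval. The `labels` list and the `nonconforming` variable are made up for the example.

```python
import re

# The snake_case pattern suggested in the parameter description.
pattern = r"[a-z0-9]+(_[a-z0-9]+)*"

# Hypothetical concept labels, chosen to exercise the edge cases.
labels = ["traffic_light", "TrafficLight", "traffic__light", "car2", "_bus", "stop sign"]

# A label conforms only if the whole string matches, not just a prefix.
nonconforming = [label for label in labels if re.fullmatch(pattern, label) is None]
print(nonconforming)

# re.match would accept "stop sign" because the prefix "stop" matches,
# which is why a full match is the right test for a naming lint.
prefix_only = re.match(pattern, "stop sign")
print(prefix_only.group(0))
```

Uppercase letters, doubled or leading underscores, and whitespace all cause a label to fail the full match, so only `traffic_light` and `car2` conform here.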
Notes
Every finding identifies concepts by id. The findings are facts, not failures: a non-empty external_ancestors is expected for a deliberately distributed subset, for instance, and is only a defect for an ontology meant to be complete.
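Since the findings carry no verdict, a downstream evaluator has to supply the policy. The sketch below shows one way that could look. It assumes the findings are available as a plain dict keyed by the field names listed under Returns, with external_ancestors as a list of ids and fan_out as a mapping from id to count; the actual return type and value shapes are not given here, and `evaluate`, `expect_complete`, and `max_fan_out` are hypothetical names.

```python
def evaluate(findings, *, expect_complete, max_fan_out=None):
    """Return a list of failure messages; an empty list means pass under this policy."""
    failures = []

    # External ancestors are a defect only when the ontology is meant to be complete.
    external = findings.get("external_ancestors", [])
    if expect_complete and external:
        failures.append(f"undefined ancestors: {sorted(external)}")

    # The fan-out threshold is pure policy, so it is opt-in.
    if max_fan_out is not None:
        wide = sorted(
            cid for cid, n in findings.get("fan_out", {}).items() if n > max_fan_out
        )
        if wide:
            failures.append(f"fan-out above {max_fan_out}: {wide}")

    return failures


# Hypothetical findings, shaped as assumed above.
findings = {
    "external_ancestors": ["vehicle"],
    "fan_out": {"animal": 2, "mammal": 5},
}

# The same facts pass for a distributed subset and fail for a complete ontology.
print(evaluate(findings, expect_complete=False))
print(evaluate(findings, expect_complete=True, max_fan_out=3))
```

Keeping the thresholds in the evaluator rather than in the validation step lets the same findings be judged differently per project.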