dataeval.Ontology

class dataeval.Ontology(concepts)

An immutable, in-memory directed acyclic graph (DAG) of OntologyConcept objects.

The graph is built from a collection of concepts linked to their parents by is-a edges. A concept may have more than one parent, so the graph is a DAG rather than a tree; cycles are rejected. Parent ids that reference concepts not present in the collection are kept as external references: they participate in ancestor/LCA queries but are not themselves concepts.

Once built, the graph is queryable for ancestors, descendants, siblings, lowest common ancestors, depth, and rooted subtrees, and resolves class names to concepts via find().
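As a rough illustration of the structure described above, the following standalone sketch models is-a edges as a plain dict and shows how external references and cycles can be detected. The `parents` dict and both helper functions are hypothetical stand-ins, not part of dataeval, and the exception dataeval raises on a cycle is not stated here; this sketch simply reuses the standard library's `graphlib.CycleError`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical stand-in data: concept id -> parent ids (is-a edges).
# "animal" is never defined as a key, so it acts as an external reference.
parents = {
    "dog": ["mammal", "pet"],  # two parents: a DAG, not a tree
    "mammal": ["animal"],
    "pet": [],
}


def external_ids(parents):
    """Parent ids that are referenced but not defined as concepts."""
    referenced = {p for ps in parents.values() for p in ps}
    return tuple(sorted(referenced - set(parents)))


def check_acyclic(parents):
    """Raise graphlib.CycleError if the is-a edges contain a cycle."""
    # TopologicalSorter expects node -> predecessors, which matches child -> parents.
    tuple(TopologicalSorter(parents).static_order())


check_acyclic(parents)  # passes: no cycle in the sample data
externals = external_ids(parents)
```

With the sample data, `externals` holds only `"animal"`, while a mapping such as `{"a": ["b"], "b": ["a"]}` would make `check_acyclic` raise.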

Parameters:
concepts : Iterable[OntologyConcept]

Concepts comprising the ontology. Ids must be unique.

Raises:

See also

Ontology.from_rdf

Build from in-memory RDF/OWL/JSON-LD content.

Ontology.from_hierarchy

Build from a plain nested dict / list (no rdflib).

ancestors(concept_id)

Return all ancestor ids of a concept, nearest-first (breadth-first).

Ancestors are the concept’s transitive superclasses (broader concepts). May include external reference ids. Raises KeyError if concept_id is not a defined concept.
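The nearest-first ordering can be pictured as a breadth-first walk over parent links. The sketch below is a minimal standalone model, assuming a hypothetical `parents` dict in place of the real graph; it is not dataeval's implementation, and the order of ancestors at the same distance is an assumption of the sketch.

```python
from collections import deque

# Hypothetical stand-in data: concept id -> parent ids.
# "thing" is not a key, so it is an external reference.
parents = {
    "sedan": ["car"],
    "car": ["vehicle", "product"],
    "vehicle": ["thing"],
    "product": ["thing"],
}


def ancestors(parents, concept_id):
    """Breadth-first walk up the is-a edges, nearest ancestors first."""
    if concept_id not in parents:
        raise KeyError(concept_id)
    order, seen = [], set()
    queue = deque(parents[concept_id])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # External ids have no entry, so the walk stops at them.
        queue.extend(parents.get(node, ()))
    return tuple(order)


result = ancestors(parents, "sedan")
```

Here `result` lists `"car"` first, then its two parents, and finally the external id `"thing"`, which appears once even though two paths reach it.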

children(concept_id)

Return the ids of the direct subclasses (children) of concept_id.

Children are the defined concepts that declare concept_id among their parents; order follows concept insertion order. Unlike descendants() this is the immediate, non-transitive layer. Raises KeyError if concept_id is not a defined concept.

concept(concept_id)

Return the concept for concept_id (raises KeyError if absent).

depth_of(concept_id)

Return the length of the longest is-a path from a root to concept_id.

A concept with no parents has depth 0; a concept whose only parent is an external reference has depth 1. Raises KeyError if concept_id is not a defined concept.
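The depth rule above can be sketched as a longest-path recursion in which an external reference counts as a parentless node. This is an illustrative standalone model with a hypothetical `parents` dict, not dataeval's code.

```python
# Hypothetical stand-in data: concept id -> parent ids.
parents = {
    "entity": [],
    "vehicle": ["entity"],
    "car": ["vehicle", "entity"],  # reachable by paths of length 1 and 2
    "widget": ["ext:part"],        # only parent is an external reference
}


def depth_of(parents, concept_id):
    """Length of the longest is-a path from a root to concept_id."""
    if concept_id not in parents:
        raise KeyError(concept_id)

    def depth(node):
        ps = parents.get(node, ())  # external ids contribute no parents
        return 0 if not ps else 1 + max(depth(p) for p in ps)

    return depth(concept_id)


depths = {cid: depth_of(parents, cid) for cid in parents}
```

In this model `"car"` gets depth 2 because the longest path wins over the direct edge to `"entity"`, and `"widget"` gets depth 1 from its external parent.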

descendants(concept_id)

Return all descendant concept ids of concept_id, nearest-first.

Descendants are the concept’s transitive subclasses (narrower concepts). Raises KeyError if concept_id is not a defined concept.

find(name)

Resolve a human-readable name (or exact id) to matching concept ids.

Matching is case-insensitive over each concept’s preferred label and synonyms. An exact id match is also returned.

Parameters:
name : str

Class name or concept id to resolve.

Returns:

Matching concept ids. Empty if unmatched; length > 1 if ambiguous.

Return type:

tuple[str, ...]
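To make the matching rule concrete, here is a small standalone sketch of case-insensitive lookup over labels and synonyms, plus exact id matching. The `concepts` records and the `find` helper are hypothetical, and the order of the returned ids is an assumption of the sketch rather than documented behavior.

```python
# Hypothetical records: concept id -> (preferred label, synonyms).
concepts = {
    "ex:Car": ("Car", ("Automobile",)),
    "ex:Auto": ("Auto", ("automobile",)),
    "ex:Dog": ("Dog", ()),
}


def find(concepts, name):
    """Ids whose label or synonym matches name ignoring case, or whose id equals name."""
    key = name.casefold()
    hits = []
    for cid, (label, synonyms) in concepts.items():
        names = {label.casefold(), *(s.casefold() for s in synonyms)}
        if cid == name or key in names:
            hits.append(cid)
    return tuple(hits)


ambiguous = find(concepts, "AUTOMOBILE")  # two concepts share this synonym
by_id = find(concepts, "ex:Dog")          # exact id match
missing = find(concepts, "cat")           # no match
```

A caller would typically treat an empty result as unmatched and a result longer than one as ambiguous, as the Returns description states.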

classmethod from_hierarchy(data)

Build an Ontology from a plain, hand-authored hierarchy.

A dependency-free constructor for the common case where you don’t have an RDF/OWL file. Labels double as concept ids (no IRIs, synonyms, or definitions). Accepts:

  • a flat list of labels: ["car", "dog"]

  • a one-level mapping: {"car": ["sedan", "SUV"], "dog": None}

  • an arbitrarily nested mapping: {"vehicle": {"car": {"sedan": None}}}

Mapping values may be None (leaf), a list of labels (children), or a nested mapping. A label appearing under more than one parent yields a DAG.
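One way to picture how the three accepted shapes reduce to the same structure is to flatten them into a label-to-parents mapping. The `flatten` helper below is a hypothetical illustration of that idea, not the logic from_hierarchy actually uses.

```python
from collections.abc import Mapping


def flatten(data, parent=None, out=None):
    """Collect label -> list of parent labels from a nested hierarchy spec."""
    out = {} if out is None else out
    if data is None:  # leaf marker
        return out
    if isinstance(data, Mapping):
        items = data.items()
    else:  # a list of labels: each one is a leaf at this level
        items = ((label, None) for label in data)
    for label, children in items:
        label_parents = out.setdefault(label, [])
        if parent is not None and parent not in label_parents:
            label_parents.append(parent)
        flatten(children, label, out)
    return out


flat = flatten(["car", "dog"])
one_level = flatten({"car": ["sedan", "SUV"], "dog": None})
nested = flatten({"vehicle": {"car": {"sedan": None}}})
dag = flatten({"pet": ["dog"], "mammal": ["dog"]})  # "dog" has two parents
```

The last call shows the DAG case: a label listed under two parents ends up with both recorded.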

Parameters:
data : Mapping or Sequence

The hierarchy specification.

Return type:

Ontology

Raises:

classmethod from_rdf(source, *, format=None)

Build an Ontology from in-memory RDF content.

Parses already-in-memory serialized RDF (OWL/RDF-XML, Turtle, N-Triples, JSON-LD, …) via rdflib. This does not read files; callers should load file contents themselves and pass the text/bytes.

Parameters:
source : str or bytes

Serialized RDF content.

format : str or None, optional

rdflib format hint, e.g. "xml", "turtle", "json-ld", "nt". If None, rdflib attempts to guess.

Return type:

Ontology

Raises:

ImportError – If rdflib is not installed. Install via dataeval[ontology].

classmethod from_rdflib(graph)

Build an Ontology from an in-memory rdflib.Graph.

Concepts are collected from subjects typed owl:Class / rdfs:Class / skos:Concept and from any subject of rdfs:subClassOf / skos:broader. For each: label is skos:prefLabel (falling back to rdfs:label), synonyms are skos:altLabel (plus a differing rdfs:label), parents are the IRI objects of rdfs:subClassOf / skos:broader, and definition is skos:definition. Blank-node superclasses (e.g. owl:Restriction) are ignored.

Parameters:
graph : rdflib.Graph

Parsed RDF graph.

Return type:

Ontology

is_a(a, b)

Return whether concept a is a (transitive) subclass of b.

Equivalently, whether b is an ancestor (superclass) of a. Raises KeyError if a is not a defined concept; b may be any id, including an external reference.

lowest_common_ancestor(a, b)

Return a single lowest common ancestor of a and b, or None.

A deterministic projection of lowest_common_ancestors(): on a tree the LCA is unique; on a DAG with several incomparable lowest common ancestors this returns the deepest (the id with the most ancestors), ties broken by id. Use lowest_common_ancestors() to get the full set. Returns None when the two share no ancestor; may return an external reference id.

Raises KeyError if a or b is not a defined concept.

lowest_common_ancestors(a, b)

Return all lowest common ancestors of a and b, id-sorted.

A common ancestor is an id in both concepts’ ancestor sets; a concept counts as an ancestor of itself, so the LCA of a concept and its descendant is the concept itself. A common ancestor is lowest when none of its own descendants is also a common ancestor. On a tree this is always a single id, but on a DAG two concepts may meet at several mutually incomparable points, so the result may hold more than one. May include an external reference id (the meeting point can lie outside the defined concepts). Returns an empty tuple when the two share no ancestor.

Raises KeyError if a or b is not a defined concept.
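The definitions of lowest_common_ancestors() and its single-result projection lowest_common_ancestor() can be sketched with plain set operations. The following standalone model uses a hypothetical `parents` dict and helper names; it is a sketch of the stated rules, not dataeval's implementation.

```python
# Hypothetical stand-in data: "a" and "b" meet at two incomparable points, "x" and "y".
parents = {
    "a": ["x", "y"],
    "b": ["x", "y"],
    "x": ["root"],
    "y": ["root"],
    "root": [],
    "z": [],  # unrelated to everything else
}


def ancestors_or_self(parents, cid):
    """The concept plus all its transitive ancestors (a concept is its own ancestor)."""
    seen, stack = {cid}, [cid]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def lowest_common_ancestors(parents, a, b):
    for cid in (a, b):
        if cid not in parents:
            raise KeyError(cid)
    common = ancestors_or_self(parents, a) & ancestors_or_self(parents, b)
    # A common ancestor is not lowest if it is a proper ancestor of another one.
    not_lowest = set()
    for c in common:
        not_lowest |= ancestors_or_self(parents, c) - {c}
    return tuple(sorted(common - not_lowest))


def lowest_common_ancestor(parents, a, b):
    lcas = lowest_common_ancestors(parents, a, b)
    if not lcas:
        return None
    # Deepest first (most ancestors), ties broken by id.
    return min(lcas, key=lambda c: (-len(ancestors_or_self(parents, c)), c))


both = lowest_common_ancestors(parents, "a", "b")
single = lowest_common_ancestor(parents, "a", "b")
with_own_ancestor = lowest_common_ancestors(parents, "a", "x")
unrelated = lowest_common_ancestor(parents, "a", "z")
```

In this data `"root"` is a common ancestor of `"a"` and `"b"` but is excluded because `"x"` and `"y"` sit below it; the single-result form then picks `"x"` on the id tie-break.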

siblings(concept_id)

Return defined concepts sharing at least one parent with concept_id.

Excludes the concept itself. Siblings under an external (undefined) parent are included, so this works on subset ontologies. Raises KeyError if concept_id is not a defined concept.
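The sibling rule, including the case of an external parent, can be sketched as a shared-parent test. The `parents` dict and `siblings` helper are hypothetical, and the result order is an assumption of the sketch.

```python
# Hypothetical stand-in data: "vehicle" is external (referenced, never defined).
parents = {
    "sedan": ["car"],
    "suv": ["car"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
    "boat": ["watercraft"],
}


def siblings(parents, concept_id):
    """Defined concepts that share at least one parent with concept_id."""
    mine = set(parents[concept_id])  # raises KeyError for undefined ids
    return tuple(
        cid for cid, ps in parents.items()
        if cid != concept_id and mine & set(ps)
    )


under_defined_parent = siblings(parents, "sedan")
under_external_parent = siblings(parents, "car")
alone = siblings(parents, "boat")
```

`"car"` and `"truck"` are siblings even though their shared parent `"vehicle"` is undefined, which is what allows the query to work on subset ontologies.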

subtree(concept_id)

Return a new Ontology rooted at concept_id.

Contains the concept and all its descendants; parent links pointing outside the subtree are pruned so concept_id becomes a root. Raises KeyError if concept_id is not a defined concept.
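The pruning step can be illustrated with a standalone sketch that keeps a concept and its descendants and drops every parent link leaving that set. The `parents` dict and `subtree` helper here are hypothetical and return a plain dict rather than an Ontology.

```python
# Hypothetical stand-in data: concept id -> parent ids.
parents = {
    "vehicle": ["thing"],
    "car": ["vehicle", "product"],
    "sedan": ["car"],
    "taxi": ["car", "service"],  # second parent lies outside the "car" subtree
    "product": [],
    "service": [],
}


def subtree(parents, root):
    """Keep root and its descendants; prune parent links that leave that set."""
    if root not in parents:
        raise KeyError(root)
    keep, frontier = {root}, [root]
    while frontier:
        node = frontier.pop()
        for cid, ps in parents.items():
            if node in ps and cid not in keep:
                keep.add(cid)
                frontier.append(cid)
    return {
        cid: [p for p in ps if p in keep]
        for cid, ps in parents.items()
        if cid in keep
    }


car_tree = subtree(parents, "car")
```

After pruning, `"car"` has no parents and so acts as a root, and `"taxi"` keeps only its link to `"car"`.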

subtree_ids(concept_id)

Return concept_id together with all its descendant ids (its subtree).

A lightweight id-set form of subtree(), for membership and disjointness tests that do not need a full sub-ontology. Raises KeyError if concept_id is not a defined concept.

property external_ids : tuple[str, ...]

Ids referenced as parents but not present as defined concepts.

These are external references: the ontology references them (e.g. it was distributed as a subset) but does not define them, so they have no label, definition, or further ancestors. Their presence means the is-a hierarchy is truncated at those points.

property ids : tuple[str, ...]

Ids of all defined concepts.

property label_collisions : dict[str, tuple[str, ...]]

Case-folded names that resolve to more than one concept.

Each entry maps a normalized name (a preferred label or synonym shared across concepts) to the distinct concept ids find() would return for it — the artifact-side source of reconciliation ambiguity. Empty when every name resolves uniquely. Unlike find(), exact-id matches are not considered, since an id is unique by construction.
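A collision map of this kind can be sketched by grouping concept ids under each case-folded label or synonym and keeping only the groups with more than one id. The `concepts` records and helper below are hypothetical; sorting the ids within each entry is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical records: concept id -> (preferred label, synonyms).
concepts = {
    "ex:Car": ("Car", ("Automobile",)),
    "ex:Auto": ("Auto", ("automobile",)),
    "ex:Dog": ("Dog", ()),
}


def label_collisions(concepts):
    """Case-folded names shared by more than one concept, mapped to those ids."""
    by_name = defaultdict(set)
    for cid, (label, synonyms) in concepts.items():
        for name in (label, *synonyms):
            by_name[name.casefold()].add(cid)
    return {
        name: tuple(sorted(ids))
        for name, ids in by_name.items()
        if len(ids) > 1
    }


collisions = label_collisions(concepts)
```

Only `"automobile"` collides in this data; ids themselves are never keys because the grouping looks at labels and synonyms only.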

property leaves : tuple[str, ...]

Ids of defined concepts that have no children (most specific concepts).

property roots : tuple[str, ...]

Ids of defined concepts that declare no parents.