dataeval.core.label_alignment¶
-
dataeval.core.label_alignment(source, target, *, matchers=
(), threshold=0.0)¶ Align a source label vocabulary against a target ontology.
Establishes typed
Correspondenceobjects from the source’s concepts to the target’s, in three passes: exact terminological anchoring (unique label/synonym/id matches becomeequivalent), any additionalmatchersfor the concepts left unanchored, then structural propagation that coarsens each still-unanchored source up to its nearest equivalent-anchored ancestor (narrower).broadercorrespondences are added as diagnostics where an equivalent source spans several finer target concepts. The result reports the safe label rewrite (class_remap), the open-world unaligned concepts on each side, and amergeabilitysummary.- Parameters:¶
- source : Ontology or Iterable[str]¶
The vocabulary to map from. A bare sequence of class names is treated as a structureless ontology (via
Ontology.from_hierarchy()); in that caselabel_alignmentreduces to label reconciliation plus structural inference against the target.- target : Ontology¶
The reference vocabulary to map to.
- matchers : Iterable[Matcher], optional¶
Additional element-level matchers (e.g. a fuzzy string-similarity matcher) consulted for source concepts the exact pass did not anchor. Each must implement the
Matcherprotocol. Exact anchoring is always performed first.- threshold : float, optional¶
Minimum confidence for accepting a matcher’s proposal, in
[0, 1]. Defaults to0.0(accept any proposal a matcher emits).
- Returns:¶
Correspondences, unaligned concepts on each side, the carry-over
class_remap, and themergeabilityof the source into the target.- Return type:¶
See also
dataeval.core.label_reconciliationThe restriction of alignment to a single structureless source and exact equivalence.
Notes
Acceptance favors precision over recall: a source concept with no unique exact match and no above-threshold, unambiguous matcher proposal is left unanchored (and only coarsened if it has an anchored ancestor), rather than committing a likely-wrong correspondence.