dataeval.types.DatasetInfo

class dataeval.types.DatasetInfo

Descriptive metadata for a dataset artifact.

All fields except name are optional, so the schema accommodates partially known datasets (e.g. when class names are unavailable).
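The shape of the schema can be illustrated with a stand-in dataclass. This is only a sketch: DatasetInfoSketch is a hypothetical name, it is not the dataeval class, and how the real DatasetInfo is constructed is not stated here. The field names and types are taken from the attribute list in this entry, and Any stands in for SelectionInfo.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DatasetInfoSketch:
    """Hypothetical stand-in mirroring the documented fields; not the dataeval class."""

    name: str  # the only required field
    version: Optional[str] = None
    description: Optional[str] = None
    source: Optional[str] = None
    format: Optional[str] = None
    n_samples: Optional[int] = None
    n_classes: Optional[int] = None
    class_names: Optional[list[str]] = None
    splits: Optional[dict[str, int]] = None
    checksum: Optional[str] = None
    license: Optional[str] = None
    selections: Optional[list[list[Any]]] = None  # Any stands in for SelectionInfo


# A partially known dataset: class names and splits are simply left unset.
partial = DatasetInfoSketch(name="example-dataset", version="1.0.0", n_samples=1000)
```

Every field other than name defaults to None here, which matches the description of the optional fields above.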

name

Human-readable identifier for the dataset.

Type:

str

version

Dataset version string (semantic versioning recommended).

Type:

str or None

description

Free-text description.

Type:

str or None

source

Origin URL or provenance string.

Type:

str or None

format

Storage/exchange format (e.g. "COCO", "HuggingFace", "MAITE").

Type:

str or None

n_samples

Total number of samples across all splits.

Type:

int or None

n_classes

Number of distinct classes/categories.

Type:

int or None

class_names

Ordered list of class names (length should equal n_classes when both are set).

Type:

list of str or None

splits

Per-split sample counts (e.g. {"train": 800, "val": 100, "test": 100}).

Type:

dict of str to int or None
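The count fields can be cross-checked when more than one of them is set. The helper below is a minimal sketch that works on plain values. find_inconsistencies is a hypothetical name and not part of dataeval. The class-name check follows the class_names description; the splits check is an inference from n_samples being described as the total across all splits, not a documented rule.

```python
from typing import Optional


def find_inconsistencies(
    n_samples: Optional[int] = None,
    n_classes: Optional[int] = None,
    class_names: Optional[list[str]] = None,
    splits: Optional[dict[str, int]] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means the set fields agree."""
    problems = []
    # Only compare when both values are set, since every field may be None.
    if n_classes is not None and class_names is not None:
        if len(class_names) != n_classes:
            problems.append(
                f"class_names has {len(class_names)} entries but n_classes is {n_classes}"
            )
    if n_samples is not None and splits is not None:
        total = sum(splits.values())
        if total != n_samples:
            problems.append(f"splits sum to {total} but n_samples is {n_samples}")
    return problems


ok = find_inconsistencies(
    n_samples=1000,
    n_classes=2,
    class_names=["cat", "dog"],
    splits={"train": 800, "val": 100, "test": 100},
)
bad = find_inconsistencies(n_classes=3, class_names=["cat", "dog"])
```

Fields left as None are skipped rather than treated as errors, in keeping with the optional nature of the schema.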

checksum

Content checksum (recommended: sha256:<hex>).

Type:

str or None
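A string in the recommended sha256:<hex> form can be produced with Python's standard hashlib module. The sketch below assumes the dataset is a single file on disk. sha256_checksum and its chunk_size parameter are illustrative names, and which bytes count as the dataset "content" is left to the caller.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return it as 'sha256:<hex>'."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```

The returned value is always the prefix sha256: followed by 64 lowercase hexadecimal characters.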

license

SPDX license identifier or license name.

Type:

str or None

selections

Lineage of applied selections, grouped by construction call to mirror dataeval.selection.Select.selection_groups. The outer list is ordered innermost (oldest) first; each inner list holds the selectors from a single Select(...) call.

Type:

list of list of SelectionInfo or None
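The nesting and ordering of selections can be pictured with plain data. In this sketch, strings stand in for SelectionInfo entries and the selector names are placeholders; it illustrates only the list layout described above and makes no claim about how dataeval populates the field.

```python
# Plain strings stand in for SelectionInfo entries; the names are placeholders.
# Scenario: one Select(...) call wrapped by a second, later Select(...) call.
selections = [
    ["selector_a", "selector_b"],  # innermost (oldest) Select(...) call
    ["selector_c"],  # outermost (newest) Select(...) call
]

oldest_group = selections[0]
newest_group = selections[-1]

# Flatten while keeping group order, oldest call first.
flattened = [entry for group in selections for entry in group]
```

Indexing from the front of the outer list walks from the oldest call toward the newest one.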