dataeval.quality.OutliersOutput

class dataeval.quality.OutliersOutput

Output class for the Outliers lint detector.

issues

DataFrame of outlier issues with columns:

- item_id: int - Index of the outlier image
- target_id: int | None - Index of the target/detection within the image (None for image-level outliers). This column is omitted when all outliers are image-level (all target_id values would be None).
- metric_name: str - Name of the metric that flagged this image/target
- metric_value: float - Value of the metric for this image/target

Type:

pl.DataFrame | Sequence[pl.DataFrame]

- For a single dataset, a single DataFrame
- For multiple stats outputs, a sequence of DataFrames
aggregate_by_class(metadata)

Returns a Polars DataFrame summarizing outliers per class and metric.

Creates a pivot table showing the count of outlier images for each combination of class and metric. Includes a Total row showing the total number of outliers per metric across all classes, and a Total column showing the total number of outliers per class across all metrics.

Parameters:
metadata : Metadata

Metadata object containing class labels and image-to-class mappings for the dataset.

Returns:

DataFrame with columns:

- class_name: cat - Name of the class
- <metric_name>: int - Count of outliers for each metric (one column per metric)
- Total: int - Total outlier count for the class across all metrics

The last row is “Total”, showing the sum across all classes for each metric. Rows are sorted by Total in descending order (excluding the Total row).

Return type:

pl.DataFrame

Raises:

ValueError – If the issues contain multiple DataFrames (from multiple datasets).

Examples

>>> outliers = Outliers(flags=ImageStats.VISUAL)
>>> results = outliers.evaluate(dataset)
>>> metadata = Metadata(dataset)
>>> summary = results.aggregate_by_class(metadata)
>>> summary
shape: (6, 6)
┌────────────┬────────────┬──────────┬──────────┬───────────┬───────┐
│ class_name ┆ brightness ┆ contrast ┆ darkness ┆ sharpness ┆ Total │
│ ---        ┆ ---        ┆ ---      ┆ ---      ┆ ---       ┆ ---   │
│ cat        ┆ u32        ┆ u32      ┆ u32      ┆ u32       ┆ u32   │
╞════════════╪════════════╪══════════╪══════════╪═══════════╪═══════╡
│ cow        ┆ 7          ┆ 1        ┆ 5        ┆ 2         ┆ 15    │
│ chicken    ┆ 3          ┆ 1        ┆ 3        ┆ 3         ┆ 10    │
│ pig        ┆ 4          ┆ 0        ┆ 3        ┆ 2         ┆ 9     │
│ sheep      ┆ 2          ┆ 0        ┆ 2        ┆ 2         ┆ 6     │
│ horse      ┆ 1          ┆ 0        ┆ 1        ┆ 1         ┆ 3     │
│ Total      ┆ 17         ┆ 2        ┆ 14       ┆ 10        ┆ 43    │
└────────────┴────────────┴──────────┴──────────┴───────────┴───────┘
aggregate_by_item()

Returns a Polars DataFrame summarizing outliers per item (item_id, target_id pair) and metric.

Creates a pivot table showing whether each item is flagged by each metric (1 if flagged, 0 if not). Includes a Total column showing the total number of metrics that flagged each item.

Returns:

DataFrame with columns:

- item_id: int - Image identifier
- target_id: int or None - Target identifier (present only with per_target outliers)
- <metric_name>: int - Binary indicator (1 or 0) for each metric
- Total: int - Total number of metrics that flagged this item

Return type:

pl.DataFrame

Raises:

ValueError – If the issues contain multiple DataFrames (from multiple datasets).

Examples

>>> outliers = Outliers()
>>> results = outliers.evaluate(dataset)
>>> summary = results.aggregate_by_item()
>>> summary
shape: (10, 17)
┌─────────┬───────────┬────────────┬──────────┬───┬─────┬───────┬───────┬───────┐
│ item_id ┆ target_id ┆ brightness ┆ contrast ┆ … ┆ var ┆ width ┆ zeros ┆ Total │
│ ---     ┆ ---       ┆ ---        ┆ ---      ┆   ┆ --- ┆ ---   ┆ ---   ┆ ---   │
│ i64     ┆ i64       ┆ u32        ┆ u32      ┆   ┆ u32 ┆ u32   ┆ u32   ┆ u32   │
╞═════════╪═══════════╪════════════╪══════════╪═══╪═════╪═══════╪═══════╪═══════╡
│ 0       ┆ null      ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 1     ┆ 2     │
│ 0       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 1       ┆ 0         ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 1     │
│ 2       ┆ null      ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 5     │
│ 2       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 1   ┆ 0     ┆ 0     ┆ 4     │
│ 2       ┆ 1         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 4       ┆ null      ┆ 1          ┆ 1        ┆ … ┆ 1   ┆ 1     ┆ 0     ┆ 8     │
│ 4       ┆ 0         ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 1     │
│ 5       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 7       ┆ 2         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
└─────────┴───────────┴────────────┴──────────┴───┴─────┴───────┴───────┴───────┘
aggregate_by_metric()

Returns a Polars DataFrame summarizing outlier counts per metric.

Returns:

DataFrame with columns:

- metric_name: str - Name of the metric
- Total: int - Number of images flagged by this metric

Return type:

pl.DataFrame

Examples

>>> outliers = Outliers(flags=ImageStats.PIXEL)
>>> results = outliers.evaluate(dataset)
>>> summary = results.aggregate_by_metric()
>>> summary
shape: (7, 2)
┌─────────────┬───────┐
│ metric_name ┆ Total │
│ ---         ┆ ---   │
│ cat         ┆ u32   │
╞═════════════╪═══════╡
│ mean        ┆ 4     │
│ entropy     ┆ 2     │
│ var         ┆ 2     │
│ kurtosis    ┆ 1     │
│ skew        ┆ 1     │
│ std         ┆ 1     │
│ zeros       ┆ 1     │
└─────────────┴───────┘
data()

Returns the underlying DataFrame(s).

Return type:

TDataFrame

meta()

Metadata about the execution of the function or method for the Output class.

Return type:

ExecutionMetadata