dataeval.quality.OutliersOutput¶
- class dataeval.quality.OutliersOutput¶
Output class for Outliers lint detector.
- issues¶
DataFrame of outlier issues with columns:
- item_id: int - Index of the outlier image
- target_id: int | None - Index of the target/detection within the image (None for image-level outliers). This column is omitted when all outliers are image-level (all target_id values would be None).
- metric_name: str - Name of the metric that flagged this image/target
- metric_value: float - Value of the metric for this image/target
- Type:¶
pl.DataFrame | Sequence[pl.DataFrame]
- For a single dataset, a single DataFrame
- For multiple stats outputs, a sequence of DataFrames
- aggregate_by_class(metadata)¶
Returns a Polars DataFrame summarizing outliers per class and metric.
Creates a pivot table showing the count of outlier images for each combination of class and metric. Includes a Total row showing the total number of outliers per metric across all classes, and a Total column showing the total number of outliers per class across all metrics.
- Parameters:¶
- Returns:¶
DataFrame with columns:
- class_name: cat - Name of the class
- <metric_name>: int - Count of outliers for each metric (one column per metric)
- Total: int - Total outlier count for the class across all metrics
The last row is “Total”, showing the sum across all classes for each metric. Rows are sorted by Total in descending order (excluding the Total row).
- Return type:¶
pl.DataFrame
- Raises:¶
ValueError – If the issues contain multiple DataFrames (from multiple datasets).
Examples
>>> outliers = Outliers(flags=ImageStats.VISUAL)
>>> results = outliers.evaluate(dataset)
>>> metadata = Metadata(dataset)
>>> summary = results.aggregate_by_class(metadata)
>>> summary
shape: (6, 6)
┌────────────┬────────────┬──────────┬──────────┬───────────┬───────┐
│ class_name ┆ brightness ┆ contrast ┆ darkness ┆ sharpness ┆ Total │
│ ---        ┆ ---        ┆ ---      ┆ ---      ┆ ---       ┆ ---   │
│ cat        ┆ u32        ┆ u32      ┆ u32      ┆ u32       ┆ u32   │
╞════════════╪════════════╪══════════╪══════════╪═══════════╪═══════╡
│ cow        ┆ 7          ┆ 1        ┆ 5        ┆ 2         ┆ 15    │
│ chicken    ┆ 3          ┆ 1        ┆ 3        ┆ 3         ┆ 10    │
│ pig        ┆ 4          ┆ 0        ┆ 3        ┆ 2         ┆ 9     │
│ sheep      ┆ 2          ┆ 0        ┆ 2        ┆ 2         ┆ 6     │
│ horse      ┆ 1          ┆ 0        ┆ 1        ┆ 1         ┆ 3     │
│ Total      ┆ 17         ┆ 2        ┆ 14       ┆ 10        ┆ 43    │
└────────────┴────────────┴──────────┴──────────┴───────────┴───────┘
- aggregate_by_item()¶
Returns a Polars DataFrame summarizing outliers per item (item_id, target_id pair) and metric.
Creates a pivot table showing whether each item is flagged by each metric (1 if flagged, 0 if not). Includes a Total column showing the total number of metrics that flagged each item.
- Returns:¶
DataFrame with columns:
- item_id: int - Image identifier
- target_id: int or None - Target identifier (only with per_target outliers)
- <metric_name>: int - Binary indicator (1 or 0) for each metric
- Total: int - Total number of metrics that flagged this item
- Return type:¶
pl.DataFrame
- Raises:¶
ValueError – If the issues contain multiple DataFrames (from multiple datasets).
Examples
>>> outliers = Outliers()
>>> results = outliers.evaluate(dataset)
>>> summary = results.aggregate_by_item()
>>> summary
shape: (10, 17)
┌─────────┬───────────┬────────────┬──────────┬───┬─────┬───────┬───────┬───────┐
│ item_id ┆ target_id ┆ brightness ┆ contrast ┆ … ┆ var ┆ width ┆ zeros ┆ Total │
│ ---     ┆ ---       ┆ ---        ┆ ---      ┆   ┆ --- ┆ ---   ┆ ---   ┆ ---   │
│ i64     ┆ i64       ┆ u32        ┆ u32      ┆   ┆ u32 ┆ u32   ┆ u32   ┆ u32   │
╞═════════╪═══════════╪════════════╪══════════╪═══╪═════╪═══════╪═══════╪═══════╡
│ 0       ┆ null      ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 1     ┆ 2     │
│ 0       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 1       ┆ 0         ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 1     │
│ 2       ┆ null      ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 5     │
│ 2       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 1   ┆ 0     ┆ 0     ┆ 4     │
│ 2       ┆ 1         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 4       ┆ null      ┆ 1          ┆ 1        ┆ … ┆ 1   ┆ 1     ┆ 0     ┆ 8     │
│ 4       ┆ 0         ┆ 0          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 1     │
│ 5       ┆ 0         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
│ 7       ┆ 2         ┆ 1          ┆ 0        ┆ … ┆ 0   ┆ 0     ┆ 0     ┆ 4     │
└─────────┴───────────┴────────────┴──────────┴───┴─────┴───────┴───────┴───────┘
- aggregate_by_metric()¶
Returns a Polars DataFrame summarizing outlier counts per metric.
- Returns:¶
DataFrame with columns:
- metric_name: str - Name of the metric
- Total: int - Number of images flagged by this metric
- Return type:¶
pl.DataFrame
Examples
>>> outliers = Outliers(flags=ImageStats.PIXEL)
>>> results = outliers.evaluate(dataset)
>>> summary = results.aggregate_by_metric()
>>> summary
shape: (7, 2)
┌─────────────┬───────┐
│ metric_name ┆ Total │
│ ---         ┆ ---   │
│ cat         ┆ u32   │
╞═════════════╪═══════╡
│ mean        ┆ 4     │
│ entropy     ┆ 2     │
│ var         ┆ 2     │
│ kurtosis    ┆ 1     │
│ skew        ┆ 1     │
│ std         ┆ 1     │
│ zeros       ┆ 1     │
└─────────────┴───────┘