dataeval.outputs.HashStatsOutput

class dataeval.outputs.HashStatsOutput

Output class for the hashstats() stats metric.

xxhash

xxHash hash of each image, as a hex string.

Type:

List[str]

pchash

Perception-based hash of each image, as a hex string.

Type:

List[str]
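Because both attributes are plain lists of hex strings, samples that share a hash can be grouped with ordinary Python. The following is a minimal sketch, not part of the dataeval API: the helper name group_by_hash is hypothetical, and the hash values are copied from the to_dataframe() example output on this page. Samples that share an xxhash are byte-identical candidates, while samples that share a pchash are only candidates for visually similar images.

```python
from collections import defaultdict

# Hash values copied from the to_dataframe() example output on this page.
xxhash = [
    "66a93f556577c086", "d8b686fb405c4105", "7ffdb4990ad44ac6", "42cd4c34c80f6006",
    "c5519e36ac1f8839", "39b4af4ffd1cba71", "d2f4564b9d21dcf5", "c7616bc627a12ddc",
]
pchash = [
    "e666999999266666", "e666999999266666", "e666999966666299", "e666999999266666",
    "96e91656e91616e9", "e666999999266666", "e666999999266666", "e666999999266666",
]


def group_by_hash(hashes):
    """Hypothetical helper: map each repeated hash to the sample indices sharing it."""
    groups = defaultdict(list)
    for index, value in enumerate(hashes):
        groups[value].append(index)
    # Keep only hashes shared by two or more samples.
    return {value: indices for value, indices in groups.items() if len(indices) > 1}


exact_groups = group_by_hash(xxhash)  # no xxhash value repeats in this data
near_groups = group_by_hash(pchash)   # one pchash value is shared by six samples
print(exact_groups)
print(near_groups)
```

In this data every xxhash is unique, so no exact duplicates are reported, while six of the eight samples share one pchash value.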

data()

The output data as a dictionary.

Return type:

dict[str, Any]

factors(filter=None, exclude_constant=False)

Returns all 1-dimensional data as a dictionary of numpy arrays.

Parameters:
filter : str, Sequence[str] or None, default None

If provided, only returns keys that match the filter.

exclude_constant : bool, default False

If True, exclude arrays that contain only a single unique value.

Return type:

Mapping[str, NDArray[Any]]
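The two parameters can be read as successive filters over the returned mapping. The sketch below illustrates that reading with plain Python lists and is not the library's implementation: select_factors, name_filter and the sample keys are hypothetical, and it assumes that a filter matches keys by exact name, which this page does not specify.

```python
def select_factors(data, name_filter=None, exclude_constant=False):
    """Hypothetical stand-in illustrating the documented filter semantics."""
    # Assumption: a single string is treated like a one-element sequence,
    # and keys match by exact name.
    if isinstance(name_filter, str):
        name_filter = [name_filter]
    selected = {}
    for key, values in data.items():
        if name_filter is not None and key not in name_filter:
            continue
        # Drop arrays that contain only a single unique value.
        if exclude_constant and len(set(values)) <= 1:
            continue
        selected[key] = values
    return selected


# Hypothetical 1-dimensional data keyed by name.
data = {"width": [32, 32, 32], "mean": [0.1, 0.5, 0.3], "channels": [3, 3, 3]}

print(select_factors(data, name_filter="mean"))
print(select_factors(data, exclude_constant=True))
```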

get_channel_mask(channel_index, channel_count=None)

Returns a boolean mask that selects the results for the specified channel index or indices, optionally restricted to images with the specified channel count(s).

Parameters:
channel_index : int | Iterable[int] | None

Index or indices of the channel(s) to filter for.

channel_count : int | Iterable[int] | None

Optional count(s) of channels per image to filter for.

Return type:

collections.abc.Sequence[bool]
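A returned mask is a sequence of booleans, so it can be applied to any per-result sequence of the same length. The sketch below uses only the standard library. The values and mask are hypothetical stand-ins, not output from get_channel_mask().

```python
from itertools import compress

# Hypothetical per-result values and a hypothetical mask of the same length,
# standing in for the Sequence[bool] that get_channel_mask() returns.
values = [0.12, 0.48, 0.51, 0.09, 0.33, 0.27]
mask = [True, False, False, True, False, False]

# Keep only the entries whose mask position is True.
selected = list(compress(values, mask))

# Recover the positions that the mask selects.
selected_rows = [index for index, keep in enumerate(mask) if keep]

print(selected)
print(selected_rows)
```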

meta()

Metadata about the execution of the function or method for the Output class.

Return type:

ExecutionMetadata

plot(log, channel_limit=None, channel_index=None)

Plots the statistics as a set of histograms.

Parameters:
log : bool

If True, plots the histograms on a logarithmic scale.

channel_limit : int or None

The maximum number of channels to plot. If None, all channels are plotted.

channel_index : int, Iterable[int] or None

The index or indices of the channels to plot. If None, all channels are plotted.

Return type:

matplotlib.Figure

to_dataframe()

Returns a polars DataFrame containing the xxhash and pchash attributes of each sample.

Note

xxhash and pchash do not follow the normal definition of factors, but they are helpful attributes of the data.

Examples

Display the hashes for a dataset of images, each of shape (C, H, W), as a polars DataFrame:

>>> from dataeval.metrics.stats import hashstats
>>> results = hashstats(dataset)
>>> print(results.to_dataframe())
shape: (8, 2)
┌──────────────────┬──────────────────┐
│ xxhash           ┆ pchash           │
│ ---              ┆ ---              │
│ str              ┆ str              │
╞══════════════════╪══════════════════╡
│ 66a93f556577c086 ┆ e666999999266666 │
│ d8b686fb405c4105 ┆ e666999999266666 │
│ 7ffdb4990ad44ac6 ┆ e666999966666299 │
│ 42cd4c34c80f6006 ┆ e666999999266666 │
│ c5519e36ac1f8839 ┆ 96e91656e91616e9 │
│ 39b4af4ffd1cba71 ┆ e666999999266666 │
│ d2f4564b9d21dcf5 ┆ e666999999266666 │
│ c7616bc627a12ddc ┆ e666999999266666 │
└──────────────────┴──────────────────┘
Return type:

polars.DataFrame