dataeval.metrics.stats.HashStatsOutput
- class dataeval.metrics.stats.HashStatsOutput
Output class for the hashstats() stats metric.
- pchash
Perception-based hash of the images as a hex string
- Type:
List[str]
- factors(filter=None, exclude_constant=False)
Returns all 1-dimensional data as a dictionary of numpy arrays.
- get_channel_mask(channel_index, channel_count=None)
Boolean mask for results filtered to the specified channel index and, optionally, the count of channels per image.
- plot(log, channel_limit=None, channel_index=None)
Plots the statistics as a set of histograms.
- to_dataframe()
Returns a polars DataFrame containing the xxhash and pchash attributes of each sample.
Note
xxhash and pchash do not follow the normal definition of factors, but they are helpful attributes of the data.
Examples
Display the hashes of a dataset of images, each of shape (C, H, W), as a polars DataFrame:
>>> from dataeval.metrics.stats import hashstats
>>> results = hashstats(dataset)
>>> print(results.to_dataframe())
shape: (8, 2)
┌──────────────────┬──────────────────┐
│ xxhash           ┆ pchash           │
│ ---              ┆ ---              │
│ str              ┆ str              │
╞══════════════════╪══════════════════╡
│ 66a93f556577c086 ┆ e666999999266666 │
│ d8b686fb405c4105 ┆ e666999999266666 │
│ 7ffdb4990ad44ac6 ┆ e666999966666299 │
│ 42cd4c34c80f6006 ┆ e666999999266666 │
│ c5519e36ac1f8839 ┆ 96e91656e91616e9 │
│ 39b4af4ffd1cba71 ┆ e666999999266666 │
│ d2f4564b9d21dcf5 ┆ e666999999266666 │
│ c7616bc627a12ddc ┆ e666999999266666 │
└──────────────────┴──────────────────┘
- Return type:
polars.DataFrame
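One way to read the two hash columns is to group samples that share a value. The sketch below is illustrative only and is not part of the dataeval API: the helper name group_by_hash is hypothetical, the hash lists are copied by hand from the example output above, and it assumes that identical xxhash values indicate byte-identical images while identical pchash values indicate visually similar candidates.

```python
from collections import defaultdict

# Hash values copied from the example output above (sample order preserved).
xxhash = [
    "66a93f556577c086", "d8b686fb405c4105", "7ffdb4990ad44ac6", "42cd4c34c80f6006",
    "c5519e36ac1f8839", "39b4af4ffd1cba71", "d2f4564b9d21dcf5", "c7616bc627a12ddc",
]
pchash = [
    "e666999999266666", "e666999999266666", "e666999966666299", "e666999999266666",
    "96e91656e91616e9", "e666999999266666", "e666999999266666", "e666999999266666",
]


def group_by_hash(hashes):
    """Hypothetical helper: map each repeated hash value to the sample indices sharing it."""
    groups = defaultdict(list)
    for index, value in enumerate(hashes):
        groups[value].append(index)
    # Keep only hash values shared by more than one sample.
    return {value: indices for value, indices in groups.items() if len(indices) > 1}


exact_groups = group_by_hash(xxhash)  # samples sharing an xxhash value
near_groups = group_by_hash(pchash)   # samples sharing a pchash value

print(exact_groups)
print(near_groups)
```

For these eight samples every xxhash value is unique, while six samples share a single pchash value, so under the stated assumption they would be worth inspecting as possible near duplicates.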