v0.17.12

@edwinpav edwinpav released this 02 Mar 20:09
878ca05

Add dataset-scoped image deduplication support.

Deduplication is supported on image and video datasets. It can run over an entire dataset, a selected set of reference IDs, or a selected set of dataset item IDs.

Example usage:

# `client` is assumed to be an already-initialized API client
dataset = client.get_dataset("ds_...")

# Deduplicate entire dataset
result = dataset.deduplicate(threshold=10)

# Deduplicate specific items by reference IDs
result = dataset.deduplicate(threshold=10, reference_ids=["ref_1", "ref_2", "ref_3"])

# Deduplicate by internal item IDs (more efficient if you have them)
result = dataset.deduplicate_by_ids(threshold=10, dataset_item_ids=["item_1", "item_2"])

# Access results
print(f"Threshold: {result.stats.threshold}")
print(f"Original: {result.stats.original_count}, Unique: {result.stats.deduplicated_count}")
print(result.unique_reference_ids)
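
The result fields shown above can be combined into a short summary of how many items were dropped. The following is a minimal sketch, not part of the library: `summarize_deduplication` is a hypothetical helper, and it assumes `deduplicated_count` is the number of unique items remaining (as the "Unique" label in the example suggests). A stand-in object replaces the real result so the snippet runs without API access.

```python
from types import SimpleNamespace


def summarize_deduplication(result):
    """Hypothetical helper: derive removal counts from a deduplication result.

    Uses only the attributes shown in the release notes:
    result.stats.threshold, result.stats.original_count,
    result.stats.deduplicated_count, and result.unique_reference_ids.
    """
    stats = result.stats
    # Assumption: deduplicated_count is the number of items kept as unique.
    removed = stats.original_count - stats.deduplicated_count
    # Guard against an empty input so the ratio never divides by zero.
    removed_fraction = removed / stats.original_count if stats.original_count else 0.0
    return {
        "threshold": stats.threshold,
        "original": stats.original_count,
        "unique": stats.deduplicated_count,
        "removed": removed,
        "removed_fraction": removed_fraction,
        "unique_reference_ids": list(result.unique_reference_ids),
    }


# Stand-in for illustration only; a real result would come from dataset.deduplicate(...).
fake_result = SimpleNamespace(
    stats=SimpleNamespace(threshold=10, original_count=4, deduplicated_count=3),
    unique_reference_ids=["ref_1", "ref_2", "ref_3"],
)

summary = summarize_deduplication(fake_result)
print(
    f"Removed {summary['removed']} of {summary['original']} items "
    f"({summary['removed_fraction']:.0%})"
)
```

With a real dataset, the value returned by `dataset.deduplicate(...)` or `dataset.deduplicate_by_ids(...)` would be passed in place of `fake_result`.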