When using the Upload API, Cloudinary provides several ways to detect duplicates, some based on exact identity and others on similarity:
- MD5 signature: The "etag" attribute included in the upload response can be used for recognizing completely identical images. The returned "etag" value will match if the upload is identical.
- pHash: If requested ("phash" => true), a perceptual hash is included in the upload response, allowing you to run a similarity check and determine how close two images are.
More information on pHash: http://www.phash.org/demo/
- ETag as public_id: A cloud-level flag that tells Cloudinary to use the ETag value (MD5 checksum) of the resource as its public_id. The flag is enabled through the "Default Public ID" setting, found under the Settings -> Upload tab in your cloud. Note that this configuration applies only to uploads made after the flag is enabled.
- Cloudinary Duplicate Image Detection (Beta)
The Cloudinary Duplicate Image Detection add-on can be invoked either on image upload, or on images already stored in your Cloudinary product environment, to determine if duplicate images exist in your media library. The add-on uses hashing algorithms to provide 'fingerprints' for selected images. A configurable threshold determines how close a fingerprint has to be to produce a match. Therefore, images do not need to be identical - for example, they can differ subtly in compression, resolution, contrast or brightness and still be close enough to be termed a duplicate. The add-on uses the moderation flow, so you can manually override any decisions made about the image.
Note: Images must be added to the moderation queue in order to be included in the duplicate comparison.
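
The first two checks in the list above can be sketched locally in a few lines of Ruby. This is a minimal illustration, not Cloudinary's own implementation: it assumes the "etag" value is the hex MD5 digest of the uploaded file's bytes (as described above) and that "phash" is returned as a 64-bit value encoded as a hex string. The helper names (identical_to_upload?, phash_distance, phash_similarity) are hypothetical and are not part of any Cloudinary SDK.

```ruby
require "digest/md5"

# True when the local file bytes hash to the "etag" value from an earlier
# upload response. Assumes the etag is a hex-encoded MD5 digest.
def identical_to_upload?(bytes, etag)
  Digest::MD5.hexdigest(bytes) == etag.to_s.downcase
end

# Number of differing bits (Hamming distance) between two hex-encoded pHash values.
def phash_distance(phash_a, phash_b)
  (phash_a.to_i(16) ^ phash_b.to_i(16)).to_s(2).count("1")
end

# Similarity score between 0.0 and 1.0; 1.0 means the hashes are equal.
# bits: 64 is an assumption about the hash length.
def phash_similarity(phash_a, phash_b, bits: 64)
  1.0 - phash_distance(phash_a, phash_b) / bits.to_f
end
```

A score close to 1.0 suggests the two images are visually very close. Where to draw the line between "similar" and "different" depends on your content, so treat any threshold as something to tune.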
Checking image dimensions is always a good idea too. Combining several checks (for example ETag, pHash and dimensions) rather than relying on a single hash reduces the chance of a false match caused by a hash collision.
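
One possible way to combine these signals is sketched below. It assumes each argument is a hash shaped like an upload response with "etag", "phash", "width" and "height" keys. The function name compare_uploads, the returned labels and the 0.95 threshold are all illustrative assumptions, not Cloudinary behavior.

```ruby
# Classify two upload-response-like hashes using ETag, pHash and dimensions.
# Returns :identical, :similar_same_size, :similar_resized or :different.
def compare_uploads(a, b, threshold: 0.95)
  # An exact MD5 (etag) match means the files are byte-for-byte identical.
  return :identical if a["etag"] && a["etag"] == b["etag"]

  # Without a pHash on both sides there is no similarity signal to use.
  return :different unless a["phash"] && b["phash"]

  # Fraction of matching bits, assuming 64-bit hex-encoded pHash values.
  differing_bits = (a["phash"].to_i(16) ^ b["phash"].to_i(16)).to_s(2).count("1")
  similarity = 1.0 - differing_bits / 64.0
  return :different if similarity < threshold

  # Dimensions help tell a re-encoded copy from a resized variant.
  same_size = a["width"] == b["width"] && a["height"] == b["height"]
  same_size ? :similar_same_size : :similar_resized
end
```

Checking the ETag first keeps the common exact-duplicate case cheap, and the dimension comparison only refines a match that pHash has already flagged.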