When using the Upload API, Cloudinary provides you with useful criteria for detecting duplicates, some based on identity comparison and others on similarity:
- MD5 signature: The "etag" attribute included in the upload response can be used to recognize identical images. If two uploaded files are identical, their returned "etag" values will match.
- pHash: If requested ("phash" => true), a perceptual hash is included in the upload response, allowing you to perform a similarity check and measure how close two images are.
See: http://cloudinary.com/blog/how_to_automatically_identify_similar_images_using_phash
More information on pHash: http://www.phash.org/demo/
- ETag as public_id: A flag that can be enabled on your product environment to tell Cloudinary to use the ETag value (MD5 checksum) of the asset as its public_id, so identical files are assigned the same public_id. The "Default Public ID" setting, where the ETag as Public ID flag is enabled, can be found under the Settings -> Upload tab in your product environment. Note that this configuration applies to all new uploads made after the flag is enabled.
- Cloudinary Duplicate Image Detection (Beta)
https://cloudinary.com/documentation/cloudinary_duplicate_image_detection_addon
The Cloudinary Duplicate Image Detection add-on can be invoked either on image upload, or on images already stored in your Cloudinary product environment, to determine if duplicate images exist in your media library. The add-on uses hashing algorithms to provide 'fingerprints' for selected images. A configurable threshold determines how close a fingerprint has to be to produce a match. Therefore, images do not need to be identical - for example, they can differ subtly in compression, resolution, contrast or brightness and still be close enough to be termed a duplicate. The add-on uses the moderation flow, so you can manually override any decisions made about the image.
Note: Images must be added to the moderation queue in order to be included in the duplicate comparison.
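The "etag" and "phash" checks described above can be done locally once you have the upload responses. The following Python sketch is illustrative only and rests on assumptions not stated in this article: that "etag" is the hexadecimal MD5 digest of the uploaded file, and that "phash" is a 64-bit perceptual hash returned as a 16-character hexadecimal string. The helper names and the default threshold of 10 differing bits are hypothetical and should be tuned for your own images.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def is_identical(etag_a: str, etag_b: str) -> bool:
    """Identity check: two uploads are the same file if their ETags match.

    Assumes both values are hex MD5 digests; surrounding double quotes
    and case differences are ignored.
    """
    return etag_a.strip('"').lower() == etag_b.strip('"').lower()


def phash_distance(phash_a: str, phash_b: str) -> int:
    """Hamming distance between two hex-encoded perceptual hashes.

    Counts the bits that differ: 0 means the hashes are equal, and
    larger values mean less similar images.
    """
    if len(phash_a) != len(phash_b):
        raise ValueError("pHash values must have the same length")
    return bin(int(phash_a, 16) ^ int(phash_b, 16)).count("1")


def is_similar(phash_a: str, phash_b: str, max_distance: int = 10) -> bool:
    """Similarity check; max_distance is a hypothetical, tunable threshold."""
    return phash_distance(phash_a, phash_b) <= max_distance


if __name__ == "__main__":
    # Compare a locally computed MD5 with an "etag" from an upload response.
    local_etag = md5_hex(b"example image bytes")
    print(is_identical(local_etag, local_etag))  # True
    print(phash_distance("ba19c8ab5fa05a59", "ba19c8ab5fa05a5b"))  # 1
```

Under the same MD5 assumption, if ETag as public_id is enabled, md5_hex of a local file would also predict the public_id the file receives, which lets you look for an existing asset before uploading.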
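Comparing image dimensions (the "width" and "height" values in the upload response) is always a good idea too. Combining several checks instead of relying on a single hash reduces the chance of a false match caused by a hash collision.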
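Combining dimensions with more than one hash might look like the following Python sketch. It assumes each image is described by a dict shaped like an upload response with "etag", "width" and "height" keys, plus "phash" only when it was requested; the function names, the returned labels and the default threshold are hypothetical. Matching dimensions are required only for the "identical" verdict, because similar images may legitimately differ in resolution.

```python
from typing import Any, Mapping


def hamming(hex_a: str, hex_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def classify_pair(
    a: Mapping[str, Any], b: Mapping[str, Any], max_distance: int = 10
) -> str:
    """Classify two images as "identical", "similar" or "different".

    Several signals are combined so that a single hash collision is
    less likely to produce a false match.
    """
    same_size = (a["width"], a["height"]) == (b["width"], b["height"])
    # Identical files share an MD5 ETag; also requiring matching
    # dimensions guards against an (unlikely) MD5 collision.
    if a["etag"] == b["etag"] and same_size:
        return "identical"
    # Perceptual similarity can only be checked when both pHash values exist.
    if "phash" in a and "phash" in b:
        if hamming(a["phash"], b["phash"]) <= max_distance:
            return "similar"
    return "different"
```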