Trying to solve a problem of preventing duplicate images to be uploaded.

I have two JPGs. Looking at them I can see that they are in fact identical. But for some reason they have different file size (one is pulled from a backup, the other is another upload) and so they have a different md5 checksum.

How can I efficiently and confidently compare two images in the same sense as a human would be able to see that they are clearly identical?

Example: http://static.peterbe.com/a.jpg and http://static.peterbe.com/b.jpg

**Update**

I wrote this script:

```
import math, operator
from PIL import Image
def compare(file1, file2):
image1 = Image.open(file1)
image2 = Image.open(file2)
h1 = image1.histogram()
h2 = image2.histogram()
rms = math.sqrt(reduce(operator.add,
map(lambda a,b: (a-b)**2, h1, h2))/len(h1))
return rms
if __name__=='__main__':
import sys
file1, file2 = sys.argv[1:]
print compare(file1, file2)
```

Then I downloaded the two visually identical images and ran the script. Output:

```
58.9830484122
```

Can anybody tell me what a suitable cutoff should be?

**Update II**

The difference between a.jpg and b.jpg is that the second one has been saved with PIL:

```
b=Image.open('a.jpg')
b.save(open('b.jpg','wb'))
```

This apparently applies some very very light quality modifications. I've now solved my problem by applying the same PIL save to the file being uploaded without doing anything with it and it now works!

## Solution 1

There is a OSS project that uses WebDriver to take screen shots and then compares the images to see if there are any issues (http://code.google.com/p/fighting-layout-bugs/)). It does it by openning the file into a stream and then comparing every bit.

You may be able to do something similar with PIL.

EDIT:

After more research I found

```
h1 = Image.open("image1").histogram()
h2 = Image.open("image2").histogram()
rms = math.sqrt(reduce(operator.add,
map(lambda a,b: (a-b)**2, h1, h2))/len(h1))
```

on http://snipplr.com/view/757/compare-two-pil-images-in-python/ and http://effbot.org/zone/pil-comparing-images.htm

## Solution 2

From here

The quickest way to determine if two images have exactly the same contents is to get the difference between the two images, and then calculate the bounding box of the non-zero regions in this image.

If the images are identical, all pixels in the difference image are zero, and the bounding box function returns None.

```
from PIL import ImageChops
def equal(im1, im2):
return ImageChops.difference(im1, im2).getbbox() is None
```

## Solution 3

I guess you should decode the images and do a pixel by pixel comparison to see if they're reasonably similar.

With PIL and Numpy you can do it quite easily:

```
import Image
import numpy
import sys
def main():
img1 = Image.open(sys.argv[1])
img2 = Image.open(sys.argv[2])
if img1.size != img2.size or img1.getbands() != img2.getbands():
return -1
s = 0
for band_index, band in enumerate(img1.getbands()):
m1 = numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
m2 = numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
s += numpy.sum(numpy.abs(m1-m2))
print s
if __name__ == "__main__":
sys.exit(main())
```

This will give you a numeric value that should be very close to 0 if the images are quite the same.

Note that images that are shifted/rotated will be reported as very different, as the pixels won't match one by one.

## Solution 4

Using ImageMagick, you can simply use in your shell [or call via the OS library from within a program]

```
compare image1 image2 output
```

This will create an output image with the differences marked

```
compare -metric AE -fuzz 5% image1 image2 output
```

Will give you a fuzziness factor of 5% to ignore minor pixel differences. More information can be procured from here

## Solution 5

the problem of knowing what makes some features of the image more important than other is a whole scientific program. I would suggest some alternatives depending on the solution you want:

if your problem is to see if there is a flipping of bits in your JPEGs, then try to image the difference image (there was perhaps a minor edit locally?),

to see if images are globally the same, use the Kullback Leibler distance to compare your histograms,

to see if you have some qualittative change, before applying other answers, filter your image using the functions below to raise the importance of high-level frequencies:

code:

```
def FTfilter(image,FTfilter):
from scipy.fftpack import fft2, fftshift, ifft2, ifftshift
from scipy import real
FTimage = fftshift(fft2(image)) * FTfilter
return real(ifft2(ifftshift(FTimage)))
#return real(ifft2(fft2(image)* FTfilter))
#### whitening
def olshausen_whitening_filt(size, f_0 = .78, alpha = 4., N = 0.01):
"""
Returns the whitening filter used by (Olshausen, 98)
f_0 = 200 / 512
/!\ you will have some problems at dewhitening without a low-pass
"""
from scipy import mgrid, absolute
fx, fy = mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
rho = numpy.sqrt(fx**2+fy**2)
K_ols = (N**2 + rho**2)**.5 * low_pass(size, f_0 = f_0, alpha = alpha)
K_ols /= numpy.max(K_ols)
return K_ols
def low_pass(size, f_0, alpha):
"""
Returns the low_pass filter used by (Olshausen, 98)
parameters from Atick (p.240)
f_0 = 22 c/deg in primates: the full image is approx 45 deg
alpha makes the aspect change (1=diamond on the vert and hor, 2 = anisotropic)
"""
from scipy import mgrid, absolute
fx, fy = mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
rho = numpy.sqrt(fx**2+fy**2)
low_pass = numpy.exp(-(rho/f_0)**alpha)
return low_pass
```

(shameless copy from http://www.incm.cnrs-mrs.fr/LaurentPerrinet/Publications/Perrinet08spie )

## Solution 6

Using only PIL and some of the Python math libraries, it is possible to see if two images are identical to each other in a simple, concise manner. This method has only been tested on image files with the same dimensions and extension, but has avoided several errors made in other answers to this question.

```
import math, operator
from PIL import Image
from PIL import ImageChops
def images_are_similar(img1, img2, error=90):
diff = ImageChops.difference(img1, img2).histogram()
sq = (value * (i % 256) ** 2 for i, value in enumerate(diff))
sum_squares = sum(sq)
rms = math.sqrt(sum_squares / float(img1.size[0] * img1.size[1]))
# Error is an arbitrary value, based on values when
# comparing 2 rotated images & 2 different images.
return rms < error
```

Advantages:

Adding `% 256`

to the computation of squares weights each color equally. Many previous answers' RMS formulas give Blue pixel values 3x the weight of Red values, and Green pixel values 2x the weight of Red values.

Easier to grok. While the RMS calculation could be written as a one-liner, with lambdas and the reduce method, expanding it out to 3 lines greatly improves at-a-glance readability.

This code properly detects that rotated images are different from a differently oriented base image. This avoids a pitfall when using histograms to compare images, as pointed out by @musicinmybrain. If histograms of 2 images are created then compared to each other, if one image is a rotation of the other, the comparison will report that there are no differences in the images because the images' histograms are identical. On the other hand, if the images are compared first, then a histogram of the comparison results is created, the images will compare accurately, even if one is a rotation of the other.

The code used in this answer is a copy/paste from this code.activestate.com post, taking into consideration the 3rd comment, which corrects the heavier weighting of Green and Blue pixel values.

## Solution 7

First, I should note theyre **not** identical; b has been recompressed and lost quality. You can see this if you look carefully on a good monitor.

To determine that they are subjectively the same, you would have to do something like what fortran suggested, although you will have to arbitrarily establish a threshold for sameness. To make s independent of image size, and to handle channels a little more sensibly, I would consider doing the RMS (root mean square) Euclidean distance in colorspace between the pixels of the two images. I dont have time to write out the code right now, but basically for each pixel, you compute

```
(R_2 - R_1) ** 2 + (G_2 - G_1) ** 2 + (B_2 - B_1) ** 2
```

, adding in an

(A_2 - A_1) ** 2

term if the image has an alpha channel, etc. The result is the square of the colorspace distance between the two images. Find the mean (average) across all pixels, then take the square root of the resulting scalar. Then decide a reasonable threshold for this value.

Or, you might just decide that copies of the same original image with different lossy compression are not truly the same and stick with the file hash.

## Solution 8

I tested this one and it works the best of all methods and extremly fast!

```
def rmsdiff_1997(im1, im2):
"Calculate the root-mean-square difference between two images"
h = ImageChops.difference(im1, im2).histogram()
# calculate rms
return math.sqrt(reduce(operator.add,
map(lambda h, i: h*(i**2), h, range(256))
) / (float(im1.size[0]) * im1.size[1]))
```

here link for reference

## Solution 9

You can either compare it using PIL (iterate through pixels / segments of the picture and compare) or if you're looking for a complete identical copy comparison, try comparing the MD5 hash of both files.

## Solution 10

I have tried 3 methods mentioned above and elsewhere. There seems to be two main type of image comparison, Pixel-By-Pixel, and Histogram.

I have tried both, and Pixel one does fail 100%, as it actually should, as if we shift second image by 1 pixel, all pixel will not match and we will have 100% no match.

But Histogram comparison should work really good in theory, but it does not.

Here are two images with slightly shifted view port and histogram looks 99% similar, yet algorithm produces result that says "Very Different"

4 different Algorithm result:

- Perfect match: False
- Pixel difference: 115816402
- Histogram Comparison: 83.69564286668303
- HistComparison: 1744.8160719686186

And same comparison of the first image (centred QR) with a 100% different image:

Totally different image and histogram

Algorithm results:

- Perfect match: False
- Pixel difference: 207893096
- HistogramComparison: 104.30194643642095
- HistComparison: 6875.766716148522

Any suggestions on how to measure two image difference in a more precise and usable way would be much appreciated. At this stage none of these algorithms seem to produce usable results as slightly different image has very similar/close results to a 100% different image.

```
from PIL import Image
from PIL import ImageChops
from functools import reduce
import numpy
import sys
import math
import operator
# Just checking if images are 100% the same
def equal(im1, im2):
img1 = Image.open(im1)
img2 = Image.open(im2)
return ImageChops.difference(img1, img2).getbbox() is None
def histCompare(im1, im2):
h1 = Image.open(im1).histogram()
h2 = Image.open(im2).histogram()
rms = math.sqrt(reduce(operator.add, map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
return rms
# To get a measure of how similar two images are, we calculate the root-mean-square (RMS)
# value of the difference between the images. If the images are exactly identical,
# this value is zero. The following function uses the difference function,
# and then calculates the RMS value from the histogram of the resulting image.
def rmsdiff_1997(im1, im2):
#"Calculate the root-mean-square difference between two images"
img1 = Image.open(im1)
img2 = Image.open(im2)
h = ImageChops.difference(img1, img2).histogram()
# calculate rms
return math.sqrt(reduce(operator.add,
map(lambda h, i: h * (i**2), h, range(256))
) / (float(img1.size[0]) * img1.size[1]))
# Pixel by pixel comparison to see if images are reasonably similar.
def countDiff(im1, im2):
s = 0
img1 = Image.open(im1)
img2 = Image.open(im2)
if img1.size != img2.size or img1.getbands() != img2.getbands():
return -1
for band_index, band in enumerate(img1.getbands()):
m1 = numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
m2 = numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
s += numpy.sum(numpy.abs(m1 - m2))
return s
print("[Same Image]")
print("Perfect match:", equal("data/start.jpg", "data/start.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/start.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/start.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/start.jpg"))
print("\n[Same Position]")
print("Perfect match:", equal("data/start.jpg", "data/end.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end.jpg"))
print("\n[~5º off]")
print("Perfect match:", equal("data/start.jpg", "data/end2.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end2.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end2.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end2.jpg"))
print("\n[~15º off]")
print("Perfect match:", equal("data/start.jpg", "data/end3.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end3.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end3.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end3.jpg"))
print("\n[100% different]")
print("Perfect match:", equal("data/start.jpg", "data/end4.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end4.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end4.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end4.jpg"))
```

## Solution 11

A simple solution with `numpy`

```
img1 = Image.open(img1_path)
img2 = Image.open(img2_path)
```

Then use array_equal

`np.array_equal(img1, img2)`

If the return is `True`

then their exactly the same across all channels

## Solution 12

```
import cv2
import numpy as np
original = cv2.imread("Test.jpg")
duplicate = cv2.imread("Test1.jpg")
# 1) Check if 2 images are equals
if original.shape == duplicate.shape:
print("The images have same size and channels")
difference = cv2.subtract(original, duplicate)
b, g, r = cv2.split(difference)
if cv2.countNonZero(b) == 0 and cv2.countNonZero(g) == 0 and cv2.countNonZero(r) == 0:
print("The images are completely Equal")
cv2.imshow("Original", original)
cv2.imshow("Duplicate", duplicate)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

## Solution 13

if you want to check two images same or not,this code may help you.

```
import binascii
with open('pic1.png', 'rb') as f:
content1 = f.read()
with open('pic2.png', 'rb') as f:
content2 = f.read()
if content1 == content2:
print("same")
else:
print("not same")
```

`