Continuing to Code

Well, my Python’s not exactly getting prettier, but I’ve been able to make it more functional. May I present “find_image_dups.py”!

Although I’m learning this fourth-generation language (is it truly?), I still tend towards a shell scripting approach, so I write small bits of code rather than trying to write one script that will both subdivide my pictures and find duplicates.

You can see I’ve experimented with different ways to determine if files are “equal”. If they’re the exact same file then the MD5s would match, and there are plenty of tools for that, but I anticipate some instances where a few pixels may be different, so checking a thumbnail may be a more repeatable test of identity.
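To make that distinction concrete, here is a minimal sketch (not what find_image_dups.py actually does) contrasting an exact MD5 fingerprint with a coarse thumbnail-style signature. The function names, the tiny 4x4 grayscale grids and the block-average-then-quantize scheme are all assumptions made for illustration, and it sticks to the standard library so it runs without PIL.

```python
import hashlib

def md5_of_bytes(data):
    """Exact-match fingerprint: any changed byte changes the digest."""
    return hashlib.md5(data).hexdigest()

def thumbnail_signature(pixels, size=2, levels=8):
    """Shrink a grayscale image (list of equal-length rows, values 0-255)
    to size x size block averages, then quantize each average into
    `levels` buckets so small pixel differences tend to disappear."""
    height, width = len(pixels), len(pixels[0])
    signature = []
    for by in range(size):
        for bx in range(size):
            rows = range(by * height // size, (by + 1) * height // size)
            cols = range(bx * width // size, (bx + 1) * width // size)
            block = [pixels[y][x] for y in rows for x in cols]
            average = sum(block) / float(len(block))
            signature.append(int(average * levels / 256))
    return tuple(signature)

# Two hypothetical 4x4 grayscale images that differ in a single pixel.
original = [[100, 100, 200, 200],
            [100, 100, 200, 200],
            [50, 50, 150, 150],
            [50, 50, 150, 150]]
retouched = [row[:] for row in original]
retouched[0][0] = 103  # one slightly different pixel

exact_a = md5_of_bytes(bytearray(sum(original, [])))
exact_b = md5_of_bytes(bytearray(sum(retouched, [])))

print(exact_a == exact_b)  # the exact fingerprints differ
print(thumbnail_signature(original) == thumbnail_signature(retouched))
```

The trade-off is that a coarse signature can also match for pictures that really are different, and an average sitting right on a bucket boundary can still flip, so a match is better treated as a candidate duplicate than as proof.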

Although my first script allows me to circumvent the constraints imposed by working with too many files, I’m starting to feel frustrated that the limits aren’t easier to “ignore”. I don’t want to waste my time tuning Linux limits or tweaking Python; I want it to just work in the simplest (even if it’s brute force) manner possible!

One last comment on a language deficiency I find constraining. Mike and I were discussing the matter, and his view is that it’s not a problem. However, his solution is more code, which is sometimes a silly metric. Both pieces of code would operate equally “efficiently”, but I believe my “style” to be more readable (although it doesn’t work, so Mike is clearly the victor in the argument).

Let’s assume for the purposes of the illustration that you were an old C programmer who didn’t use IDEs all that much and didn’t want to learn the Python debugger yet. So during your loop you may be tempted to do;

print "counter: %s p: %s i:%s" % (counter, p.filename, i.filename)

Unfortunately, the first time through p is None (which has no filename attribute) and you’ll have problems for this single corner case. In C, when printing pointer references, I would use the ?: ternary operator, which led me to try this in Python;

print "counter: %s p: %s i:%s" % (counter, p ? p.filename : "None", i.filename)

I think it’s pretty straightforward to understand what’s going on there and what you’d like done, but unfortunately that isn’t valid Python. Python’s ternary is written

op1 if condition else op2

Thus, in python I feel like I should be able to do;

print "counter: %s p: %s i:%s" % (counter, p.filename if p else "None", i.filename)

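But that doesn’t work for me! (The “if ... else” expression form was only added in Python 2.5, so an older interpreter would reject it as a syntax error.)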
Mike’s solution is;

if p:
    print "counter: %s p: %s i:%s" % (counter, p.filename, i.filename)
else:
    print "counter: %s p: %s i:%s" % (counter, "None", i.filename)

While this is clearly functional, I dislike the added conditionals since, to me, they don’t feel like the main intent of the code. However, Mike made a good point: if the code were permanent, the ternary operator might confuse someone who examines the printouts and sees two possible outputs from a single statement.
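For what it’s worth, here is a small sketch of two single-statement alternatives, assuming an interpreter recent enough (Python 2.5 or later) to have the conditional expression. FakeImage and the two describe functions are hypothetical names invented for this example; FakeImage only stands in for a PIL image so the snippet runs without PIL.

```python
class FakeImage(object):
    """Hypothetical stand-in for a PIL image; only .filename matters here."""
    def __init__(self, filename):
        self.filename = filename

def describe(counter, p, i):
    # Conditional expression (Python 2.5+). Testing "is not None" avoids
    # depending on whether the image object itself counts as true or false.
    previous = p.filename if p is not None else "None"
    return "counter: %s p: %s i:%s" % (counter, previous, i.filename)

def describe_getattr(counter, p, i):
    # getattr's third argument is returned when the attribute is missing,
    # which is the case when p is None.
    return "counter: %s p: %s i:%s" % (
        counter, getattr(p, "filename", "None"), i.filename)

first = describe(1, None, FakeImage("a.jpg"))
later = describe(2, FakeImage("a.jpg"), FakeImage("b.jpg"))
print(first)
print(later)
```

Building the string in a helper and printing the result keeps the debugging line to one statement, though Mike’s point still applies: one statement can produce two differently shaped outputs.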

Nevertheless it’s great learning and discussing some of the finer points of style in a language I’m only newly familiar with!


#!/usr/bin/python

import Image
import ImageStat

from glob import glob
from os import mkdir
from shutil import move

def RMS_cmp(x, y):
    # Earlier experiments with other definitions of "equal":
    # if ImageStat.Stat(x).rms == ImageStat.Stat(y).rms:
    # if x.copy().resize((300,300)).histogram() == y.copy().resize((300,300)).histogram():
    if x.histogram() == y.histogram():
        return 0
    elif ImageStat.Stat(x).rms > ImageStat.Stat(y).rms:
        return 1
    else: # x<y
        return -1

# Open every JPEG in the current directory, then sort so matching images should end up adjacent
images = map(Image.open, glob("*.[Jj][Pp][Gg]"))
images.sort(RMS_cmp)

counter = 1
p = None # previous image; None on the first pass
for i in images:
    if p and ( RMS_cmp(p, i) == 0 ):
        move(i.filename, str(counter))
    else: #no match start a new directory
        if p:
            counter += 1
        try:
            mkdir(str(counter))
        except OSError: # e.g. the directory already exists
            pass
        move(i.filename, str(counter))
    p = i
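The script leans on Python 2’s comparison-function form of list.sort, which Python 3 no longer accepts. As an alternative sketch (a swapped-in technique, not the method used above), matching items can be grouped by a hashable key in a dictionary instead of being sorted and compared with their neighbours. The function name group_duplicates and the fake (filename, histogram) pairs are assumptions for illustration.

```python
from collections import defaultdict

def group_duplicates(items, key):
    """Group items whose key(item) values are equal, keeping groups in
    the order their first member was seen."""
    groups = defaultdict(list)
    order = []
    for item in items:
        k = key(item)
        if k not in groups:  # membership test does not create an entry
            order.append(k)
        groups[k].append(item)
    return [groups[k] for k in order]

# Hypothetical data: (filename, histogram) pairs standing in for opened images.
fake_images = [("a.jpg", [1, 2, 3]),
               ("b.jpg", [9, 9, 9]),
               ("c.jpg", [1, 2, 3])]

# Lists are not hashable, so the histogram is turned into a tuple for the key.
groups = group_duplicates(fake_images, key=lambda img: tuple(img[1]))
names = [[name for name, _ in group] for group in groups]
print(names)
```

With real images the key could plausibly be something like tuple(im.histogram()), and each resulting group would then be moved into its own numbered directory, as the loop above does.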

About jay

I'm trying to build something interactive where I can learn from others and hopefully share useful knowledge too. thecapacity@gmail.com
This entry was posted in code, opensource, python.

2 Responses to Continuing to Code

  1. jay says:

    Hey Kevin!

    I thought about going for Ruby, but although Rails seems to be popular with all the Web 2.0 stuff, Python came across as more “practical”. That, and Ruby just came across as a little too “out there” in syntax for me. For example, Object.each seems strange because the iterator should be a keyword, not a call to the object itself… but that’s just my view on it.

    It’s by no means a disregard for Ruby, more a sense or feeling that led me to go with Python. Though looking at MapReduce and Hadoop, I’ve been impressed with what I’ve read about the Ruby implementation “SkyNet”.

    I really like the “MapReduce” model, I could think of a few ways to better search photos if I had a cluster setup but I’m not there yet.

    As you said though it’s interesting to work with languages “outside our day job” so I’m glad you’ve been getting to play too!

  2. Kevin Tambascio says:

    Hey Jay..

    I’ve never really done much with Python, but I do enjoy Ruby. You eventually get used to writing loops like

    Object.each do |obj|
    ….
    end

    which is really strange when I spend all my day-job time writing C++. Ruby has a ternary operator much like C/C++’s; it’s the one element of familiarity I have when working with it.

    Just saw the blog in your Facebook profile…

    -Kevin
