I’m a big fan of O’Reilly books, as I’m sure most of you are. They’re great technical resources for me and have cute animals my wife can really enjoy!
A friend of mine got Programming Collective Intelligence and recommended it to me, so my mother-in-law gave it to me for my birthday (yay, I’m old!). I’m stoked to see O’Reilly focused on moving “up the stack” of technology in such an approachable way.
I finally got a chance to start last night, and reading the preface made it immediately apparent that this was going to challenge my newly developed Python skills.
e.g.
{xvii} //That’s the page #
string_list = ['a', 'b', 'c', 'd']
string_list[2] # the book says this returns ‘b’; wrong, it should be ‘c’
You know when they’re teaching you incorrect Python that it’s going to be a fun way to learn. I worked my way up to page 11 last night and found at least 8 errata. This is the first time I’ve felt completely comfortable marking up a book (oh the sacrilege!) but I do focus better when I can’t simply skim…
I expressed my recent activities on Twitter, and another friend asked if I was keeping a list. So, FJ, this post’s for you and for everyone else who doesn’t want to scratch the same groove in their head that I did.
O’Reilly’s great about leveraging the collective intelligence [pun intended] and you can Submit and Find errata (perhaps I should order by frequency and say “Find and Submit”) at O’Reilly’s website for the book.
- Unfortunately, the official list only has two and hasn’t been updated since the 18th of Feb!!!
I submitted mine there and there’s a ton more (but the user format is a little hard to scroll through).
So here’s my quick list so far (through p11). [I'll try to add new ones as comments so you can track this post.] And if anyone from O’Reilly is reading: I think I’d make a great editor, if only to actually update the official list with the good community feedback and help others out!
{xvii} string_list[2] returns ‘c’, not ‘b’
{xviii} /* first list comprehension should change v1>4 to v>4 */
{xix} // Chapter 2, 2nd to last line “move” should be “movie”
{9} critics['Toby'] #output is missing ‘Superman Returns’: 4.0
{10} //The results of both math functions are wrong as they use the wrong datapoints (5,4) & (4,1) which should be (1,4.5) and (2,4)
{11} //sim_distance() - the return statement should be: return 1/(1+sqrt(sum_of_squares))
{11} from recommendations import critics, sim_distance #reload(recommendations) didn’t work for me. You’ll have to change the subsequent function call as well and because of the previous errata the returned # should be 0.2942 (approximately) and not 0.1481
{11} This wasn’t my find, I learned it from the user submitted errata, but someone mentioned using “si = set()” and then “si.add(item)” instead of “si[item]=1” … Both make sense, but the set seems cleaner and was a new semantic for me.
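To sanity-check the p10 and p11 items, here is a minimal sketch of sim_distance() with the sqrt fix and the set-based bookkeeping. It is my own reconstruction, not the book's listing: the helper shared_items() and the use_sqrt flag are made up for illustration, and the two critics' ratings are copied from the book's dataset as best I can tell, so treat them as an assumption.

```python
from math import sqrt

# Assumption: ratings for two critics as they appear in the book's `critics` dict.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

def shared_items(prefs, person1, person2):
    """Hypothetical helper: collect the items both people rated, using a set."""
    si = set()
    for item in prefs[person1]:
        if item in prefs[person2]:
            si.add(item)
    return si

def sim_distance(prefs, person1, person2, use_sqrt=True):
    """Distance-based similarity; use_sqrt=False reproduces the printed version."""
    si = shared_items(prefs, person1, person2)
    if not si:
        return 0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in si)
    if use_sqrt:
        return 1 / (1 + sqrt(sum_of_squares))
    return 1 / (1 + sum_of_squares)

corrected = sim_distance(critics, 'Lisa Rose', 'Gene Seymour')
as_printed = sim_distance(critics, 'Lisa Rose', 'Gene Seymour', use_sqrt=False)
print(round(corrected, 4), round(as_printed, 4))

# p10 check with the points (1, 4.5) and (2, 4) instead of (5, 4) and (4, 1)
d = sqrt(pow(1 - 2, 2) + pow(4.5 - 4, 2))
print(round(d, 4), round(1 / (1 + d), 4))
```

With these ratings the corrected version comes out at roughly 0.294 and the version without the sqrt at roughly 0.148, which lines up with the two numbers in the p11 erratum.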

{ 4 } Comments
p14 (3rd paragraph) - “sim_vecror” should be “sim_distance”.
More interesting: I finished the two movie recommendation exercises (I haven’t done “Top Matches” yet) and I was really surprised at how much the results of the two methods (Euclidean vs. Pearson) vary:
Jack Matthews and Mick LaSalle => D(0.286) P(0.211)
Jack Matthews and Claudia Puig => D(0.320) P(0.029)
Jack Matthews and Lisa Rose => D(0.341) P(0.747)
Jack Matthews and Toby => D(0.267) P(0.663)
Jack Matthews and Gene Seymour => D(0.667) P(0.964)
Jack Matthews and Michael Phillips => D(0.320) P(0.135)
Mick LaSalle and Jack Matthews => D(0.286) P(0.211)
Mick LaSalle and Claudia Puig => D(0.315) P(0.567)
Mick LaSalle and Lisa Rose => D(0.414) P(0.594)
Mick LaSalle and Toby => D(0.400) P(0.924)
Mick LaSalle and Gene Seymour => D(0.278) P(0.412)
Mick LaSalle and Michael Phillips => D(0.387) P(-0.258)
Claudia Puig and Jack Matthews => D(0.320) P(0.029)
Claudia Puig and Mick LaSalle => D(0.315) P(0.567)
Claudia Puig and Lisa Rose => D(0.387) P(0.567)
Claudia Puig and Toby => D(0.357) P(0.893)
Claudia Puig and Gene Seymour => D(0.282) P(0.315)
Claudia Puig and Michael Phillips => D(0.536) P(1.000)
Lisa Rose and Jack Matthews => D(0.341) P(0.747)
Lisa Rose and Mick LaSalle => D(0.414) P(0.594)
Lisa Rose and Claudia Puig => D(0.387) P(0.567)
Lisa Rose and Toby => D(0.348) P(0.991)
Lisa Rose and Gene Seymour => D(0.294) P(0.396)
Lisa Rose and Michael Phillips => D(0.472) P(0.405)
Toby and Jack Matthews => D(0.267) P(0.663)
Toby and Mick LaSalle => D(0.400) P(0.924)
Toby and Claudia Puig => D(0.357) P(0.893)
Toby and Lisa Rose => D(0.348) P(0.991)
Toby and Gene Seymour => D(0.258) P(0.381)
Toby and Michael Phillips => D(0.387) P(-1.000)
Gene Seymour and Jack Matthews => D(0.667) P(0.964)
Gene Seymour and Mick LaSalle => D(0.278) P(0.412)
Gene Seymour and Claudia Puig => D(0.282) P(0.315)
Gene Seymour and Lisa Rose => D(0.294) P(0.396)
Gene Seymour and Toby => D(0.258) P(0.381)
Gene Seymour and Michael Phillips => D(0.341) P(0.205)
Michael Phillips and Jack Matthews => D(0.320) P(0.135)
Michael Phillips and Mick LaSalle => D(0.387) P(-0.258)
Michael Phillips and Claudia Puig => D(0.536) P(1.000)
Michael Phillips and Lisa Rose => D(0.472) P(0.405)
Michael Phillips and Toby => D(0.387) P(-1.000)
Michael Phillips and Gene Seymour => D(0.341) P(0.205)
I assume some of it could be related to the relatively few data points (sometimes critics only share 2 of 5 movies).
Any other ideas?
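One thing that may be worth checking: any two points lie on a straight line, so with only two shared movies Pearson can only come out as +1 or -1 (or be undefined when one critic gave both movies the same rating). That would be consistent with the P(-1.000) for Toby and Michael Phillips if they share only two movies. A minimal sketch, using made-up ratings rather than the book's data, and a plain mean-centered formula rather than the book's exact code:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists.

    Returning 0.0 when either list has no variance is a choice made for this
    sketch; the correlation is mathematically undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings for two shared movies: the critics disagree on direction.
opposite = pearson([4.0, 2.0], [3.0, 3.5])
# Hypothetical ratings where both critics prefer the second movie.
aligned = pearson([1.0, 5.0], [2.0, 2.5])
# One critic rates both movies the same, so there is nothing to correlate.
flat = pearson([3.0, 3.0], [2.0, 4.0])

print(opposite, aligned, flat)
```

Note that the size of the disagreement does not matter here: a half-point difference is enough to flip the score from +1 to -1, whereas the distance score barely moves. That alone could account for a lot of the spread between D and P for pairs with few movies in common.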
As I mentioned I’ve found it instructive to hand type in the python code for this book. Also I didn’t really feel like signing up for Safari (O’Reilly’s online book library) but today I discovered that you can get a zip file of all the code!
Thanks to this post
http://blog.kiwitobes.com/?p=44
Here’s the direct link
http://kiwitobes.com/PCI_Code.zip
p14 -
Not a big deal, but in “Ranking the Critics…” in the code there’s no need to reverse the full list;
Here’s the code ‘as is’;
scores.sort()
scores.reverse()
return scores[0:n]
This could simply be;
scores.sort()
return scores[-1:-(n+1):-1]
The syntax’s a little strange but it would save you reversing a big list.
I think you could also probably sort the list then just slice the parts you need off and then just reverse that new list.
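Here is a quick sketch comparing the variants, so you can see they agree. The function names are made up for this comment, and the scores are Toby's Pearson values from my list above. I also threw in heapq.nlargest from the standard library, which skips the full sort entirely.

```python
import heapq

# Toby's Pearson scores against the other critics, from the list above.
scores = [
    (0.663, 'Jack Matthews'),
    (0.924, 'Mick LaSalle'),
    (0.893, 'Claudia Puig'),
    (0.991, 'Lisa Rose'),
    (0.381, 'Gene Seymour'),
    (-1.000, 'Michael Phillips'),
]

def top_book(scores, n):
    """The 'as is' approach: sort, reverse the whole list, take the front."""
    ranked = sorted(scores)
    ranked.reverse()
    return ranked[0:n]

def top_negative_slice(scores, n):
    """Sort, then walk backwards from the end with a negative-step slice."""
    ranked = sorted(scores)
    return ranked[-1:-(n + 1):-1]

def top_tail_then_reverse(scores, n):
    """Sort, slice off the last n items, then reverse only that small list."""
    ranked = sorted(scores)
    # ranked[-0:] would return the whole list, so handle n == 0 explicitly.
    tail = ranked[-n:] if n > 0 else []
    tail.reverse()
    return tail

def top_heap(scores, n):
    """Let the standard library pick the n largest, highest first."""
    return heapq.nlargest(n, scores)

for fn in (top_book, top_negative_slice, top_tail_then_reverse, top_heap):
    print(fn.__name__, fn(scores, 3))
```

All four should give Lisa Rose, Mick LaSalle, Claudia Puig in that order. To be fair to the book, the sort is the expensive step and the reverse is a single cheap pass, so this is more about tidiness than speed.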
P17 when doing the “getRecommendations()” call with “similarity=sim_distance” you will get slightly different values for the 3 movies than what’s listed (because of the previous function errors) but it’s a minimal error.