O’Reilly Make me an Offer!
I’m a big fan of O’Reilly books, as I’m sure most of you are. They’re great technical resources for me and have cute animals my wife can really enjoy!
A friend of mine got Programming Collective Intelligence and recommended it to me, so my mother-in-law gave it to me for my birthday (yay, I’m old!). I’m stoked to see O’Reilly focused on moving “up the stack” of technology in such an approachable way.
I finally got a chance to start last night, and from reading the preface it was immediately apparent that this was going to challenge my newly developed Python skills.
e.g.
{xvii} //That’s the page #
string_list = ['a', 'b', 'c', 'd']
string_list[2] # returns ‘b’ #wrong it should be ‘c’
You know when they’re teaching you incorrect Python that it’s going to be a fun way to learn. I worked my way up to page 11 last night and found about 8 errata. This is the first time I’ve felt completely comfortable marking up a book (oh the sacrilege!) but I do focus better when I can’t simply skim…
I expressed my recent activities on twitter, and another friend asked if I was keeping a list. So, FJ, this post’s for you and for everyone else who doesn’t want to scratch the same groove in their head that I did.
O’Reilly’s great about leveraging the collective intelligence [pun intended] and you can Submit and Find errata (perhaps I should order by frequency and say “Find and Submit”) at O’Reilly’s website for the book.
- Unfortunately, the official list only has two and hasn’t been updated since the 18th of Feb!!!
I submitted mine there and there’s a ton more (but the user format is a little hard to scroll through).
So here’s my quick list till now (p11) [I'll try to add new ones as comments so you can track this post] and if anyone from O’Reilly’s reading I think I’d make a great editor, if only to actually update the official list with the good community feedback and help others out!
{xvii} string_list[2] = ‘c’
{xviii} /* first list comprehension should change v1>4 to v>4 */
{xix} // Chapter 2, 2nd to last line “move” should be “movie”
{9} critics['Toby'] #output is missing ‘Superman Returns’: 4.0
{10} //The results of both math functions are wrong as they use the wrong datapoints (5,4) & (4,1) which should be (1,4.5) and (2,4)
{11} //sim_distance() – the return statement should be: return 1/(1+sqrt(sum_of_squares))
{11} from recommendations import critics, sim_distance #reload(recommendations) didn’t work for me. You’ll have to change the subsequent function call as well and because of the previous errata the returned # should be 0.2942 (approximately) and not 0.1481
{11} This wasn’t my find, I learned it from the user submitted errata, but someone mentioned using “si = set()” and then “si.add(item)” instead of “si[item]=1” … Both make sense, but the set seems cleaner and was a new semantic for me.
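To sanity-check the {11} items, here is a minimal sketch of sim_distance with the corrected return line and the set-based lookup of shared items. The two ratings dictionaries are my transcription of the Lisa Rose and Gene Seymour entries from the book's critics data, so treat them as an assumption; the function body is my own rewrite, not the book's listing.

```python
from math import sqrt

# Assumed data: two entries transcribed from the book's `critics` dictionary.
critics = {
    'Lisa Rose': {
        'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0, 'Superman Returns': 3.5,
        'You, Me and Dupree': 2.5, 'The Night Listener': 3.0,
    },
    'Gene Seymour': {
        'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5,
    },
}


def sim_distance(prefs, person1, person2):
    """Distance-based similarity in (0, 1]; 1.0 means identical ratings."""
    # Collect the items both people rated in a set instead of a dict of 1s.
    si = set()
    for item in prefs[person1]:
        if item in prefs[person2]:
            si.add(item)

    # Nothing in common: no basis for a similarity score.
    if not si:
        return 0.0

    sum_of_squares = sum(
        (prefs[person1][item] - prefs[person2][item]) ** 2 for item in si
    )
    # Corrected return: take the square root before inverting.
    return 1 / (1 + sqrt(sum_of_squares))


score = sim_distance(critics, 'Lisa Rose', 'Gene Seymour')
print(round(score, 4))
```

With these ratings the sum of squared differences is 5.75, so the printed formula 1/(1+5.75) gives about 0.148, while the square-root version gives about 0.294, which lines up with the two numbers in the errata above.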


April 19th, 2008 at 6:06 pm
p14 (3rd paragraph) – “sim_vecror” should be “sim_distance”.
More interesting, I finished the two movie recommendation exercises (but haven’t done “Top Matches” yet) and I was really surprised at the variability of the results given by the two methods (Euclidean vs. Pearson):
Jack Matthews and Mick LaSalle => D(0.286) P(0.211)
Jack Matthews and Claudia Puig => D(0.320) P(0.029)
Jack Matthews and Lisa Rose => D(0.341) P(0.747)
Jack Matthews and Toby => D(0.267) P(0.663)
Jack Matthews and Gene Seymour => D(0.667) P(0.964)
Jack Matthews and Michael Phillips => D(0.320) P(0.135)
Mick LaSalle and Jack Matthews => D(0.286) P(0.211)
Mick LaSalle and Claudia Puig => D(0.315) P(0.567)
Mick LaSalle and Lisa Rose => D(0.414) P(0.594)
Mick LaSalle and Toby => D(0.400) P(0.924)
Mick LaSalle and Gene Seymour => D(0.278) P(0.412)
Mick LaSalle and Michael Phillips => D(0.387) P(-0.258)
Claudia Puig and Jack Matthews => D(0.320) P(0.029)
Claudia Puig and Mick LaSalle => D(0.315) P(0.567)
Claudia Puig and Lisa Rose => D(0.387) P(0.567)
Claudia Puig and Toby => D(0.357) P(0.893)
Claudia Puig and Gene Seymour => D(0.282) P(0.315)
Claudia Puig and Michael Phillips => D(0.536) P(1.000)
Lisa Rose and Jack Matthews => D(0.341) P(0.747)
Lisa Rose and Mick LaSalle => D(0.414) P(0.594)
Lisa Rose and Claudia Puig => D(0.387) P(0.567)
Lisa Rose and Toby => D(0.348) P(0.991)
Lisa Rose and Gene Seymour => D(0.294) P(0.396)
Lisa Rose and Michael Phillips => D(0.472) P(0.405)
Toby and Jack Matthews => D(0.267) P(0.663)
Toby and Mick LaSalle => D(0.400) P(0.924)
Toby and Claudia Puig => D(0.357) P(0.893)
Toby and Lisa Rose => D(0.348) P(0.991)
Toby and Gene Seymour => D(0.258) P(0.381)
Toby and Michael Phillips => D(0.387) P(-1.000)
Gene Seymour and Jack Matthews => D(0.667) P(0.964)
Gene Seymour and Mick LaSalle => D(0.278) P(0.412)
Gene Seymour and Claudia Puig => D(0.282) P(0.315)
Gene Seymour and Lisa Rose => D(0.294) P(0.396)
Gene Seymour and Toby => D(0.258) P(0.381)
Gene Seymour and Michael Phillips => D(0.341) P(0.205)
Michael Phillips and Jack Matthews => D(0.320) P(0.135)
Michael Phillips and Mick LaSalle => D(0.387) P(-0.258)
Michael Phillips and Claudia Puig => D(0.536) P(1.000)
Michael Phillips and Lisa Rose => D(0.472) P(0.405)
Michael Phillips and Toby => D(0.387) P(-1.000)
Michael Phillips and Gene Seymour => D(0.341) P(0.205)
I assume some of it could be related to the relatively few data points (sometimes critics only share 2 of 5 movies).
Any other ideas?
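One likely contributor, sketched here under my own assumptions: when two critics share only two movies, Pearson has just two points to work with, and a straight line through two points always fits perfectly. The score then comes out as exactly 1 or -1 (or is undefined if one critic gave both movies the same rating), which is probably what is behind the P(1.000) and P(-1.000) rows. The `pearson` helper and the ratings below are illustrative, not the book's sim_pearson code, and returning 0.0 for the undefined case is just a convention I picked.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of ratings."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    # Assumed convention: report 0.0 when the correlation is undefined.
    if den == 0:
        return 0.0
    return cov / den


# Hypothetical ratings for two critics who share only two movies.
falling = pearson([4.5, 4.0], [3.0, 3.5])  # one goes up while the other goes down
rising = pearson([1.0, 5.0], [2.0, 2.5])   # both go up, by very different amounts
flat = pearson([3.0, 3.0], [2.0, 4.0])     # first critic rated both the same

print(falling, rising, flat)
```

Note that the second pair scores a perfect 1 even though the two critics' ratings differ a lot in size, so a distance score over the same two movies can look nothing like the Pearson score.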
April 20th, 2008 at 12:22 pm
As I mentioned I’ve found it instructive to hand type in the python code for this book. Also I didn’t really feel like signing up for Safari (O’Reilly’s online book library) but today I discovered that you can get a zip file of all the code!
Thanks to this post
http://blog.kiwitobes.com/?p=44
Here’s the direct link
http://kiwitobes.com/PCI_Code.zip
April 21st, 2008 at 9:02 pm
p14 -
Not a big deal, but in “Ranking the Critics…” in the code there’s no need to reverse the full list;
Here’s the code ‘as is’;
scores.sort()
scores.reverse()
return scores[0:n]
This could simply be;
scores.sort()
return scores[-1:-(n+1):-1]
The syntax’s a little strange but it would save you reversing a big list.
I think you could also probably sort the list then just slice the parts you need off and then just reverse that new list.
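Here is a rough sketch comparing the variants, plus one standard-library alternative the book's code doesn't use, `heapq.nlargest`. The sample `scores` list reuses Toby's Pearson numbers from my earlier comment; the function names are my own.

```python
import heapq

# (similarity, name) tuples, using Toby's Pearson scores from the earlier comment.
scores = [
    (0.663, 'Jack Matthews'), (0.924, 'Mick LaSalle'),
    (0.893, 'Claudia Puig'), (0.991, 'Lisa Rose'),
    (0.381, 'Gene Seymour'), (-1.000, 'Michael Phillips'),
]


def top_sort_reverse(scores, n):
    """The code 'as is': sort, reverse everything, take the first n."""
    ranked = sorted(scores)
    ranked.reverse()
    return ranked[0:n]


def top_negative_slice(scores, n):
    """Sort, then walk backwards from the end with a negative-step slice."""
    ranked = sorted(scores)
    return ranked[-1:-(n + 1):-1]


def top_slice_then_reverse(scores, n):
    """Sort, slice off the last n, and reverse only that short list."""
    ranked = sorted(scores)
    tail = ranked[max(len(ranked) - n, 0):]  # guard so n == 0 yields []
    tail.reverse()
    return tail


def top_heap(scores, n):
    """Skip the full sort and let heapq pick out the n largest."""
    return heapq.nlargest(n, scores)


top3 = top_sort_reverse(scores, 3)
print(top3)
```

All four should return the same top n here. In fairness, reversing a list is a single linear pass and the sort dominates the cost, so the saving from skipping the reverse is small; `heapq.nlargest` is the one that avoids sorting the whole list.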
April 21st, 2008 at 9:26 pm
P17: when doing the “getRecommendations()” call with “similarity=sim_distance” you will get slightly different values for the 3 movies than what’s listed (because of the previous function errors) but it’s a minimal error.