python & couchdb sample

Lacking any good examples of how to use Python’s couchdb module, I’ve managed to make pretty impressive progress (for me) on a 4th of July holiday.

I’ll try to recreate it here for others, although I know it’ll be incomplete.

Consider it a syntactical example;

import couchdb

s = couchdb.Server('http://localhost:5984/')  # why can't it default to this?
db = s['stock_values']

ids = []
stock_values = {}
for doc in db:  # iterating a database yields document IDs
    ids.append(doc)
    d = db[doc]
    stock_values[d['symbol']] = d['historical_data']
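
For the above to return anything, the database needs to exist and hold documents with ‘symbol’ and ‘historical_data’ fields. Here’s a minimal sketch of populating it (the document ID and field values are made up for illustration):

import couchdb

s = couchdb.Server('http://localhost:5984/')
if 'stock_values' not in s:
    s.create('stock_values')   # create the database if it doesn't exist
db = s['stock_values']

# assigning to a key stores a new document under that ID
db['GOOG-2008'] = {'symbol': 'GOOG', 'historical_data': [534.10, 528.02]}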

More good examples are in the code;

http://code.google.com/p/couchdb-python/source/browse/trunk/couchdb/client.py?r=61

Posted in code, couchdb, python | 4 Comments

Quick mean python bug…

So I’ve been learning some Python and love it! It’s highly functional, and most of the libraries are “good enough”.

This morning I discovered a bug I introduced while testing… not a bug in the class, but a bug in the way Python works against expectations (even if those expectations are just my own).

Let’s say I have a collection of stock symbols;

stocks = ('GOOG', 'IBM', 'JDSU', 'HPQ', 'GE', 'MSFT', 'BUD', 'DIS')

It’s quite a collection I know and ideally I want to do some complicated operations on each;

for sym in stocks:

If we call things the way they are now;

>>> stocks = ('GOOG', 'IBM', 'JDSU', 'HPQ', 'GE', 'MSFT', 'BUD', 'DIS')
>>> for sym in stocks:
...     print sym

GOOG
IBM
JDSU
HPQ
GE
MSFT
BUD
DIS
>>>

That code’s working so far, so let’s go on to debug our function; do_stock_stuff()

It’s wasteful to worry about looping through all the stocks so we’ll simply make a single pass;

>>> stocks = ("GOOG")
>>> for sym in stocks:
...     do_stock_stuff(sym)

So do you see the bug?

I didn’t, and my “do_stock_stuff” procedure actually worked. However, it got called 4 times… with “G”, “O”, “O”, “G”.

The counter intuitive fix?

>>> stocks = ("GOOG",)
>>> for sym in stocks:
...     do_stock_stuff(sym)

Can you tell the difference? It’s simply adding a ‘,’ at the end!

This occurred because I was using a “tuple” type when I should just use a list;

>>> stocks = ["GOOG"]
>>> for s in stocks:
...     print s

GOOG
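
The root cause: parentheses alone don’t make a tuple, the trailing comma does, so ("GOOG") is just the string "GOOG". A quick interactive sanity check (Python 2, same as the sessions above):

>>> type(("GOOG"))
<type 'str'>
>>> type(("GOOG",))
<type 'tuple'>
>>> type(["GOOG"])
<type 'list'>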

This ability to test expectations interactively is a great aspect of programming in Python. But anyone who says it matches exactly how they think is thinking about the wrong things, i.e. syntax over function (don’t get me started on the ‘:’).

Posted in python | Comments Off on Quick mean python bug…

Can your datacenter handle this?

Google recently hosted their I/O conference and, during it, Google Fellow Jeff Dean shared some of their operational measurements;

  • A single search query touches 700 to 1,000 machines in less than 0.25 seconds.
  • They currently have 36 data centers containing over 800,000 servers with 40 servers/rack.

That’s about 20,000 racks in total, or roughly 555 racks per datacenter, and if a standard 19″ rack takes ~61 sqft, each datacenter averages (be careful of averages) about 33,900 sqft of raised floor compute space. Which is probably much smaller than the actual sizes.
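
For the skeptical, here’s that back-of-envelope math sketched out (the ~61 sqft per rack is an assumed footprint including clearance):

servers = 800000              # total servers, per the talk
servers_per_rack = 40
datacenters = 36
sqft_per_rack = 61.0          # assumed footprint, including clearance

racks = servers / servers_per_rack             # 20,000 racks total
racks_per_dc = racks / float(datacenters)      # ~555 racks per datacenter
floor_per_dc = racks_per_dc * sqft_per_rack    # ~33,889 sqft of raised floor

print(racks_per_dc)    # ~555.6
print(floor_per_dc)    # ~33,888.9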

We know from experience that they use BigTable (their distributed storage service) and MapReduce (cluster computing) a lot.

  • The largest BigTable instance manages about 6 petabytes of data spread across thousands of machines.

I think 6 petabytes actually seems kind of low. Although I realize that’s about one hundred times the amount of data in the Library of Congress, it seems to me that they likely have a very large number of BigTable clusters.

  • They ran 29,000 MapReduce jobs in August 2004 and 2.2 million in September 2007, and the average time to complete a job has dropped from about 10 minutes to 6 minutes.

That seems like an astounding increase and makes a clear statement that it’s a valid programming paradigm for data processing. One can only imagine how much their infrastructure has compounded (both in size and computing capacity) to accommodate such an increase in volume and still cut the time almost in half.

  • On a typical day they’ll run about 100,000 MapReduce jobs, each of which occupies about 400 servers.

If you take 400 servers per job times 100,000 jobs, that would imply about 40M machines. I know they’re not all being run at the same time (and we know from earlier that they have about 800,000 servers), but combined that suggests a contribution factor of ~50 ( 40M / 800K ) jobs per machine per day.
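
The same arithmetic as a quick sanity check:

jobs_per_day = 100000
servers_per_job = 400
total_servers = 800000

# machine-slots consumed per day, divided by the fleet size
print(jobs_per_day * servers_per_job / total_servers)   # 50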

  • The data output by these MapReduce tasks has risen from 193 terabytes to 14,018 terabytes.

I’m not sure it’s valid to try to compare the data output with the data being stored, since we don’t know how many BigTable instances they have running, but they’ll often recompute data instead of storing a cached copy. Their other big challenge in computing is getting the data shuttled around the network. It also seems typical (especially in the web world) that the data you compute from a source can be much larger than that original data. So it seems likely that Google’s found a well-balanced compute cost vs. data storage tradeoff that works for them.

They also have some interesting insight into the frequency and costs of various failures over a 1 year period.
On average, for a typical cluster configuration of 1,000 machines, you’ll have;

  • 1000+ hard drive failures, ~20 mini-switch failures, and ~5 full switch failures
  • ~0.5 overheating incidents, forcing a power-down of most machines in <5 minutes and taking ~1-2 days to recover
  • ~1 PDU failure, ~500-1000 machines suddenly disappear and take ~6 hours to come back
  • ~1 rack move: advance notice given, but ~500-1000 machines powered down, taking ~6 hours to bring back up [Note: this seems to contradict the 40 machines per rack statement, but it may have to do with intra-cluster communication links]
  • ~1 network rewiring, rolling ~5% of machines down over 2-day span
  • ~20 rack failures, 40-80 machines instantly disappear, 1-6 hours to get back
  • ~5 racks go wonky, 40-80 machines see 50% packetloss
  • ~8 network maintenances, 4 might cause ~30-minute random connectivity losses
  • ~12 router reloads, takes out DNS and external VIPs for a couple minutes
  • ~3 router failures, have to immediately pull traffic for an hour
  • ~dozens of minor 30-second blips for DNS
Posted in datacenter, enterprise, Google | Comments Off on Can your datacenter handle this?

You must be 38 or younger to view this post

Working at a large technology company, I’m familiar with the “graying” of IT. While the public perspective on “technology” is often skewed by the Kevin Roses of the world, in enterprise situations it’s often much different.

It’s not uncommon to start a job as the only “new hire” around, surrounded by people who’ve been working in their respective fields for 20-30 years. It’s an intimidating position to be in, necessitating a certain type of individual, and I’ve seen many people make that transition (or transition out).

I’ve heard that you can live a thousand lifetimes through books, but I’ve lived at least that many years through the stories of my colleagues. My first officemate could disassemble HEX in his head faster than I could look up mnemonics, and I’ve learned about life, as well as IT, from him and many since.

The phrase “There’s a lot of history here” has a particular place in my field and those who don’t learn from the history of others are doomed to repeat it.

However, I have felt at times that the “oldsters” could afford to let some of us “young’ens” have a chance. I don’t mean to imply they should “step aside”, simply that they should provide better opportunities for “us” to learn and try. Learning involves making mistakes, but often there’s not enough of a “penalty free” environment in day-to-day office politics. Slate has a business perspective on this situation, though their view of ageism is the inverse of mine.

I sometimes worry we’re creating a void, where those “too young” won’t be qualified (i.e. won’t have had the same opportunities and experience as their predecessors) to take over from those who will retire in 5-10 years. I think the rise of the “still going” businessperson is probably one of the factors driving the shifts in innovation and entrepreneurship we’re seeing today.

A few weeks ago, during dinner, I expressed this feeling to a colleague who’s been in the business a long time, predominantly on the sales side. What I got was one of those tidbits of history and insight that makes me appreciate the wisdom of the years. He looked at me and in effect said “you’ll be fine” but what convinced me the most was what he said next;

We’ve had some rough years and back when it got really rough and all the talent had left, they threw us green guys out in the field. And you know what? You learn, you learn real fast.

Sink or swim, trial by fire… sometimes I wish life didn’t have to be so binary, but the reminder that no true opportunity can ever really be cushioned is priceless.

Posted in business, career, enterprise, inspiration, management, social | 1 Comment

How speed pitching ends up as slow pitch softball

Every so often, an article on “the elevator pitch” comes along where you’re supposed to present your product in 30 seconds or less… five sentences or less… two eye blinks or less….

I won’t cite specific examples, because I don’t want to single anyone out, but you’ve certainly seen this advice multiple times for multiple scenarios… resumes, product pitches, confessional booths, marriage proposals…

I’m sure when I first heard the suggestion my brain did something like;

Hmm… that’s a neat idea…

Well, yeah, I can really see the benefit to being succinct…. and I know I hate it when people “broadcast only”

I really learn by getting a chance to ask questions…

So, I guess once they’ve gotten a good summary they’re smart enough to see the value on their own..

Then they’re hooked and will want to learn more!!

Although I’m a planner by profession and have a consumer nature, I’m naturally a “less talk more do” kinda person. So this aligns with the “optimization” and slight east coast mentality that I have.

So the faith in this advice has permeated my career for many years now. I’ve practiced giving succinct status statements (I also know rambling can get you in trouble) and I’m usually able to answer technical questions with something succinct and satisfactory like “yes, I’ll make that happen”.

Thus today’s revelation came as a bit of a shock, when I realized the advice I’ve been following all these years is very misleading and often downright wrong!

In such an information-dense dialog, what should be a rapid give and take has, in retrospect, degenerated into “take and move on”, with you losing out. Often I’ve delivered such a fantastically succinct statement that it’s apparently left the exec, customer, or finance person speechless.

You could, as a clear alternative, consider my possible ineptitude, and I admit to not being a natural salesman, often expecting the facts and my passion to speak for themselves. But think back to a time when you’ve epitomized this approach and how often the person on the other end was left speechless, uncertain where to continue.

I’m sure there was one of those pauses where you were expecting them to ask a question or say something. You had the next reply ready, only the chance never came and you were forced to continue as though you’d only meant to take that awkward break.

My experience has shown me that too often the person on the other end is scared of looking stupid, and that without something to “grasp” in the verbal discourse they resort to Mark Twain’s old advice that keeping silent is the wisest thing to do. As another counterexample, consider meetings, where it’s often the person who talks the most who’s given credit for being the expert.

I saw this illustrated clearly while working at a tradeshow last week. My instinct was to give a quick pitch and then answer questions to help explore their understanding. However, the recipient of my “wisdom” wasn’t certain how to continue the conversation, and had I not continued with “trivialities” they would have left with no continued interest beyond a “thanks”.

I’m certainly not advocating a dialog of dysentery, however I believe there’s a bit of human psychology at play here which “elevator presenters” overlook;

  • People are far more likely to forgive you for telling them something they already know than for making them feel like idiots.
  • There’s also a level of repetition required for people to intuit and internalize information. Repeating information through variation is a powerful tactic shunned in the 60 second pitch.
  • Even if the person on the other end has already been told “this is fast”, saying “the obvious” is a chance for them to judge your passion and authenticity.

That’s just a start, but I think there are many reasons why you should take as long as you’ve got to say as much as you can. Just as Web 2.0 focuses on being “feature stingy” rather than “feature rich”, the new 2.0 way to pitch your plan is through a conversation, not a soundbite.

You clearly need to think out how to say what you want to say; this is no excuse not to prepare. However, thin-slicing aside, no matter what power tie you’re wearing, there’s too much competition for you to expect people to get hooked by 60 seconds of information.

Posted in business, career, marketing | 2 Comments