Archive for the ‘enterprise’ Category

The Reciprocal World of IT and Business

Monday, September 21st, 2009

Working as an Enterprise Architect, you will frequently hear how technology must support a business need. It’s a cliché, yet accurate, reminder that technologists often deploy something that doesn’t best satisfy the problem.

Although play and creativity have a place, even in business, no IT environment can long survive without supplying the business with the means to meet its objectives. There is surely no better route to bankruptcy than wasting time and money, which is what happens when IT divorces itself from the business.

However, often overlooked is the reciprocal need for the business to support IT.

This doesn’t happen when a CIO or CTO is relegated to the back office and denied a seat at the executive table, and the effects are more insidious though no less disastrous in the end. Imagine being asked to run a massive IT department with the wrong skills, or responding to a mandate for change without the ability to make the proper investments.

For any organization to succeed it’s important to realize that all offices must be imbued with the same driving passion and resources for success.

Links as Code

Tuesday, July 14th, 2009

John Willis’ “Infrastructure as Code” should be a startling epiphany for anyone who has long neglected process and people in favor of technological solutions. Yet I hope no one here needs convincing about the validity of institutionalizing collective knowledge.

However, I wonder about a critical level of infrastructure maintenance that seems to be missed: document maintenance.

I’m sure everyone has experienced the frustration of reading a document with an invalid URL, but should this be accepted?

Should documents not be kept in repositories as well? Why not take the same proactive approach to maintaining links as we do “Not breaking the Build” when programming?

So why doesn’t your team have a utility to scan internal documents and links when they propose changing page structures, before they’re made live?
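As a rough illustration of what such a utility might look like, here is a minimal sketch. It assumes the documents are available as plain text and that the proposed page structure can be expressed as a set of URLs that will exist after the change. The names `extract_links` and `find_broken_links`, and the example URLs, are hypothetical.

```python
import re

# Simple pattern for http(s) URLs; an assumption that is good enough for a sketch,
# not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def extract_links(text):
    """Return every http(s) URL found in the text, minus trailing punctuation."""
    return [url.rstrip(".,;:!?") for url in URL_PATTERN.findall(text)]


def find_broken_links(documents, live_urls):
    """Map each document name to the links missing from live_urls.

    documents: dict of document name -> document text
    live_urls: set of URLs that will exist once the proposed change goes live
    """
    broken = {}
    for name, text in documents.items():
        missing = [url for url in extract_links(text) if url not in live_urls]
        if missing:
            broken[name] = missing
    return broken


if __name__ == "__main__":
    docs = {
        "runbook.txt": "See http://wiki.example.com/ops/restart for details.",
        "faq.txt": "Start at http://wiki.example.com/home.",
    }
    proposed_site = {"http://wiki.example.com/home"}
    # Reports the documents that would break if the change went live.
    print(find_broken_links(docs, proposed_site))
```

Run as a pre-publish check, a non-empty result would "break the build" in the same way a failing test does.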

Can your datacenter handle this?

Wednesday, June 25th, 2008

Google recently hosted their I/O conference and, during that, a Google Fellow named Jeff Dean illuminated some of their operational measurements:

  • A single search query touches 700 to 1,000 machines in less than 0.25 seconds.
  • They currently have 36 data centers containing over 800,000 servers with 40 servers/rack.

That’s 20,000 racks in total, or about 555 racks per datacenter. If a standard 19″ rack takes up ~61 sqft, that means about 33,855 sqft of raised floor space per datacenter on average (be careful of averages), which is probably much smaller than the actual sizes.
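The back-of-the-envelope arithmetic can be sketched as follows. The inputs are the figures quoted above, and the ~61 sqft per rack is taken as given rather than verified. The function name is hypothetical. Because the rack count is already divided across the 36 datacenters, the floor-space result is a per-datacenter average.

```python
def floor_space_estimate(servers, servers_per_rack, datacenters, sqft_per_rack):
    """Return (total racks, racks per datacenter, sqft per datacenter)."""
    total_racks = servers // servers_per_rack
    racks_per_datacenter = total_racks // datacenters
    sqft_per_datacenter = racks_per_datacenter * sqft_per_rack
    return total_racks, racks_per_datacenter, sqft_per_datacenter


if __name__ == "__main__":
    total, per_dc, sqft = floor_space_estimate(
        servers=800_000,      # quoted server count
        servers_per_rack=40,  # quoted rack density
        datacenters=36,       # quoted datacenter count
        sqft_per_rack=61,     # the post's assumption, not a measured value
    )
    print(f"{total} racks, {per_dc} per datacenter, {sqft} sqft per datacenter")
```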

We know from experience that they use BigTable (their distributed storage service) and MapReduce (cluster computing) a lot.

  • The largest BigTable instance manages about 6 petabytes of data spread across thousands of machines.

I think 6 petabytes actually seems kind of low. Although I realize that’s about one hundred times the amount of data in the Library of Congress, it seems to me that they likely have a very large number of BigTable clusters.

  • They ran 29,000 MapReduce jobs in August 2004 and 2.2 million in September 2007, and the average time to complete a job has dropped from about 10 minutes to 6 minutes.

That seems like an astounding increase and makes a clear statement that it’s a valid programming paradigm for data processing. One can only imagine how much their infrastructure has compounded (both in size and computing capacity) to accommodate such an increase in volume and still cut the time almost in half.

  • In a typical day they’ll run about 100,000 MapReduce jobs, each of which occupies about 400 servers.

If you take 400 servers per job times 100,000 jobs, that would imply about 40M machines. I know they’re not all being run at the same time (and we know from earlier that they have about 800,000 servers), but combined that suggests they’re seeing a contribution factor of ~50 (40M / 800K) from each machine.
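The same estimate as a small sketch, using only the figures quoted above. The function name is hypothetical, and the result is a daily average that says nothing about how evenly the work is spread across machines.

```python
def contribution_factor(jobs_per_day, servers_per_job, fleet_size):
    """Average number of job slots each server fills per day."""
    server_slots = jobs_per_day * servers_per_job  # total server-slots demanded per day
    return server_slots / fleet_size


if __name__ == "__main__":
    factor = contribution_factor(
        jobs_per_day=100_000,  # quoted daily MapReduce jobs
        servers_per_job=400,   # quoted servers per job
        fleet_size=800_000,    # quoted total servers
    )
    print(f"Each server takes part in about {factor:.0f} jobs per day")
```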

  • The data output by these MapReduce tasks has risen from 193 terabytes to 14,018 terabytes.

I’m not sure it’s valid to try to compare the data out with the data being stored since we don’t know how many BigTable instances they have running, but they’ll often recompute data instead of storing a cached copy. Their other big challenge in computing is getting the data shuttled around the network. It also seems typical (especially in the web world) that the data you compute from a source can be much larger than that original data. So it seems likely that Google’s found a well-balanced compute cost vs. data storage tradeoff that works for them.

They also have some interesting insight into the frequency and costs of various failures over a 1-year period.
On average, for a typical cluster configuration of 1000 machines, you’ll have:

  • 1000+ HD failures, 20 mini switch failures and 5 full switch failures and 1 PDU failure
  • ~1/2 an overheating event, forcing a power down of most machines in <5 mins and taking ~1-2 days to recover.
  • ~1 PDU failure, ~500-1000 machines suddenly disappear and take ~6 hours to come back
  • ~1 rack move, with advance notice, but ~500-1000 machines powered down and ~6 hours to bring back up [Note this seems to contradict the 40 machines per rack statement but it may have to do with intra-cluster communication links]
  • ~1 network rewiring, rolling ~5% of machines down over 2-day span
  • ~20 rack failures, 40-80 machines instantly disappear, 1-6 hours to get back
  • ~5 racks go wonky, 40-80 machines see 50% packetloss
  • ~8 network maintenances, 4 might cause ~30-minute random connectivity losses
  • ~12 router reloads, takes out DNS and external VIPs for a couple minutes
  • ~3 router failures, have to immediately pull traffic for an hour
  • ~dozens of minor 30-second blips for DNS

You must be 38 or younger to view this post

Tuesday, June 24th, 2008

Working at a large technology company, I’m familiar with the “graying” of IT. While public perspective on “technology” is often skewed by the Kevin Roses of the world, in enterprise situations it’s often much different.

It’s not uncommon to start a job as the only “new hire” around, surrounded by people who’ve been working in their respective fields for 20-30 years. It’s an intimidating position to be in, necessitating a certain type of individual, and I’ve seen many people make that transition (or transition out).

I’ve heard that you can live a thousand lifetimes through books, but I’ve lived at least that many years through the stories of my colleagues. My first officemate could disassemble HEX in his head faster than I could look up mnemonics, and I’ve learned about life, as well as IT, from him and many since.

The phrase “There’s a lot of history here” has a particular place in my field and those who don’t learn from the history of others are doomed to repeat it.

However, I have felt at times that the “oldsters” could afford to let some of us “young’ens” have a chance. I don’t mean to imply they should “step aside”, simply provide better opportunities for “us” to learn and try. Learning involves making mistakes, but often there’s not enough of a “penalty free” environment in day-to-day office politics. Slate has a business perspective on this situation, though their view of ageism is the inverse of mine.

I sometimes worry we’re creating a void, where those “too young” won’t be qualified (i.e. won’t have had the same opportunities and experience as their predecessors) to take over from those who will be retired in 5-10 years. I think the rise of the “still going” businessperson is probably one of the factors driving the shifts in innovation and entrepreneurship we’re seeing today.

A few weeks ago, during dinner, I expressed this feeling to a colleague who’s been in the business a long time, predominantly on the sales side. What I got was one of those tidbits of history and insight that make me appreciate the wisdom of the years. He looked at me and in effect said “you’ll be fine”, but what convinced me the most was what he said next:

We’ve had some rough years and back when it got really rough and all the talent had left, they threw us green guys out in the field. And you know what? You learn, you learn real fast.

Sink or swim, trial by fire… sometimes I wish life didn’t have to be so binary, but the reminder that no true opportunity can ever really be cushioned is priceless.

How traditional IT skills are becoming irrelevant

Wednesday, June 4th, 2008

I hope those who know me wouldn’t peg me as an alarmist. So take my title with a grain of salt but also, because of that same optimism, with a sense of sobriety.

I’ve followed “cloud computing” for a while (before it was called that), most often in the context of Amazon. From my position, it’s been really interesting to see the growth and dead-ends of this shift. And although in some ways it represents an outside disruptive force for my job, in others it’s a technology and mindset I’m trying to drive internally and externally.

My analogy for my job is that I help design, edit and publish “books” but never write one of my own, so some of my perspectives are gleaned second hand without the heat and intensity of battle. Yet, I’m also keen to learn from others’ failures (and successes) so I do my best to leverage the examples others provide.

SmugMug is a photo sharing site that’s been a big champion (and occasional critic) of Amazon’s services and despite seeing their use of them as a competitive advantage they’ve been very open about their practices. Recently they described how they’ve built a very successful workflow around these concepts and I think you should give it a read.

There’s a tangible shift in computing that I don’t think has been felt in more traditional environments. Certainly enterprise IT is used to seeing fads fall to the floor; anyone remember “The Mainframe is dead”? But it’s also very easy to point to successful companies like SmugMug and claim they’re not enterprise players.

However, consider Amazon (or Google) and remember they don’t just provide this stuff for fun. It’s what they themselves use internally for their “day jobs”, and it’s because of these same services, not in spite of them, that they’ve reached their current heights.

Only time will tell if they can hold these lofty positions, but my belief is that the future’s in the clouds.

Amazon Overview

Tuesday, March 18th, 2008

If you’ve read my earlier posts you know I spend a bit of time following Amazon, both from a business perspective as well as my interest in the energy they’ve invested in webservice (SaaS) technologies.

I recently gave a presentation to discuss their offerings and wanted to make that available to anyone interested.

I build presentations that can also act as “guidebooks” once the discussion is over, i.e. the presentation interests someone in the topic, but the charts should also be useful as a starting point for their own experience. Thus I’ve included links and citations for the various sections. It may seem a little overwhelming when you’re just paging through but it seems to work well for my presentation style and my typical audience.

I always find it interesting to compare and contrast my experience with a presentation given verbally vs. paging through the deck later. In an engaging conversation, some of the more interesting and thought provoking dialogs revolve around a single bullet point. However, when paging through a deck you’re often drawn to “examples” which are really for a reference or to substantiate a divergent discussion.

I’m most interested in the “implications and extrapolation” phase of a presentation as opposed to ones that review the “what and why” of an activity.

I hope you’ll find this interesting and helpful, and if there are any parts I can help elaborate on please let me know.

The Titanic is not a SaaS model

Thursday, February 7th, 2008

I used to have many great discussions with a coworker, Dan. I think (hope) we both learned a lot from each other, because even though we have similar “technical philosophies” we approached problems differently, akin to the “Generalist vs Specialist” debates.

Though Dan and I don’t get to have face to face discussions anymore, we still frequently trade links and thanks to technology it doesn’t seem too hard to continue the development of our insights.

He recently sent me an article about architecting defensive SaaS deployments, which is something we’ve talked a lot about in the past. The article proposes a good analogy but I think makes a mistake in equating SaaS architectures with the “extreme” end of that analogy, i.e. large cruise ships.

In my experience, large heavily-defensive deployments are more analogous to mainframe environments. Sinclair seems to overlook this larger extreme and only contrasts SaaS with a localized service model.

In my experience although SaaS lies somewhere in between a truly distributed model and a truly centralized model it’s not really fulfilling this role in the same ways, i.e. it’s not a differently sized ship.

SaaS architectures are more like getting around Europe. You’ve got planes, trains, automobiles and yes, even boats. Each transport has its own qualities of service, its own pros and cons.

It’s not a defensive posture that makes SaaS successful but rather the flexibility in choice.

What I do for a living…

Wednesday, November 28th, 2007

As the year draws to a close, I’ve been busy trying to complete multiple projects concurrently. However, a few months ago I purchased an iPhone and it’s proven to be very effective at productively filling the “in between” time.

I only owned a nano previously and never used it for much beyond the gym. However, with the iPhone I have been able to watch many interesting presentations such as the Google Tech Talks or listen to Stanford lectures on iTunes.

Two of the more interesting sessions have been Dan Pritchett on the Architecture at eBay and Cuong Do on the amazing scalability efforts of YouTube.

In the mornings, I wake up early and read web articles at home before work. Yesterday, I stumbled across the blog of Nati Shalom of GigaSpaces, in particular his summary of the Qcon conference. I had never heard of Qcon before but a quick look reveals a track list like an enterprise CV.

  • Architecture Quality – Modifiability, Product Lines, Latency, Performance and Scalability, Architecture Patterns
  • Banking Architectures – Real time, STP, Messaging, AMQP, SEPA, MiFID, Front office
  • Connecting SOA and the Web: How much REST do we need? – REST & SOA, Internet Scale Integration, REST & WS Myths

This reminded me of another convention I recently became aware of, O’Reilly’s first ever Money:Tech conference.

I can honestly summarize the work I’m a part of by saying we make IT infrastructures like these, or bigger, happen.

The good news is that I don’t manage these architectures long term or on a day to day basis. Primarily, I’m part of a team that designs (pen and paper exercise), builds (install and configure a proof of concept) or tests (benchmark) environments that other businesses will use, and maintain, in their production environments.

Although there are feedback loops, the bad news is I sometimes feel out of sync with reality, and a touch ADD since I’m moving between architectures and technologies so rapidly. However, the more difficult problem is that I often work twice as hard to make sure a “thought exercise”, which someone’s going to use for their business, will be what is needed even, or perhaps especially, when they’re not sure what those needs are.

Although my company has plenty of internal conferences, it’s pretty safe to assume that what I do doesn’t make the tracks at most of these external conferences. It’s kind of like the difference between gyms and country clubs: you don’t hear much about the latter but they’re out there.

It’s probably also certain that what I do will never serve you a funny spoof or some video of a cute boy/girl booty dancing, but I work for institutions that house your money or your medical records and it’s a safe bet that what it lacks in glamor it makes up for in success.

While we’re not as vocal about what we do as eBay or YouTube, I feel confident that our clients haven’t endured the endless cycle of sleepless nights of the YouTube team (watch the video), or suffered the numerous downtime issues for which eBay became known.

Maybe we should make a bigger effort to be present at these conferences; I’d like to believe I could effect that change. I’ve learned a lot just by watching the aftereffects, and I believe sharing with the same community would make us all more successful.