Archive for the ‘Google’ Category

Where are the filters for Google Reader?

Thursday, October 29th, 2009

If I can create filters for GMail, to push notes to certain folders or automatically star things, then why can’t I create similar rules for my RSS feeds?

RSS has quickly become at least as important to me as email, so I think it deserves at least as many tools.

Who’s Really Testing Chrome?

Monday, July 6th, 2009

Just a quick gripe to share with anyone using Chrome.

For all of Chrome’s new high-performance design, there’s a very simple way to bring your tabbed experience to its knees: print something…

In my case it was a 100+ page PDF printed 2 pages per sheet, but I’m sure most anything of a decent size would work.

Disappointing, to say the least, since printing is supposed to be a background task (that’s why we have spooling), not something that takes center stage!

Mashing up the Dashboard

Wednesday, July 1st, 2009

This post is for anyone interested in any of the Government Transparency initiatives. If you’ve been following this topic then you’re probably aware that Vivek Kundra sees a dashboard as a way of accelerating the transparency and transformation of the Government.

After watching groups like the Sunlight Foundation and Change Congress work their magic, I’ve now begun seeing much of this transformation from the inside due to my new job.

However, I still endeavor to participate externally as well and so I wanted to do some analysis on the public data.

In order to start, I wanted to import the USASpending.gov information into Google spreadsheets, since I can kill two buzzwords at once by leveraging Cloud Computing services for Transparency!

For a while I struggled to find the easiest way to import the data, so I wanted to share what eventually worked for me.

First, create a new spreadsheet and name your tab appropriately. Then go to the USASpending Feeds page to select the specific data you want. I suggest starting with the Exhibit 300 information, since it’s typically a smaller dataset; my Exhibit 53 tab, with more than 1,200 rows, has proven to be very slow.

Next, pick the data fields you want (and I highly suggest reordering them), then pick which Agency or Agencies you’d like info on. Again, keep in mind that lots of data will be pretty slow.

Finally, select the CSV icon which should open the download prompt for your browser. It’s unfortunate that the implementers didn’t use a dynamic tag here because you can’t simply copy the URL. Instead I had to first download the file itself and then copy the originating URL into my clipboard (I was using Chrome so how to do this will depend on your browser).

The URL should look like a much longer version of this:

http://it.usaspending.gov/customcode/build_feed.php?extype=300&select1=agencyName&columns%5B%5D=bureauName…

Now that we’ve got the URL we can import everything into our spreadsheet by selecting the A1 cell and entering:

=ImportData("<url>")

Where “<url>” is of course the long URL you copied earlier.

After a quick few seconds your data should be automajically imported!

For the Exhibit 300, things worked just great, but for the Exhibit 53 data I ended up with each cell of Column A holding the full data for each entry. So in B1 I simply entered =SPLIT(A1, ",") (note that Google requires the quotes to be double quotes, not single) and then things auto-populated left to right.

Unfortunately, SPLIT() didn’t auto-populate downwards as well, and dragging the function down the full B column is very, very painful.
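If a spreadsheet formula isn't a hard requirement, the same splitting can be done outside the spreadsheet in one pass over every row. The minimal Python sketch below also shows one pitfall of a plain comma split, which is roughly what SPLIT(A1, ",") does: a quoted field that contains a comma gets broken in two, while the standard csv module keeps it whole. The sample rows are made up for illustration, not real USASpending data, and decoding the download as UTF-8 is an assumption.

```python
import csv
import urllib.request


def naive_split(line):
    # Roughly what =SPLIT(A1, ",") does: every comma is a separator, even inside quotes
    return line.split(",")


def csv_split(lines):
    # csv.reader respects quoted fields and handles every row in one pass
    return list(csv.reader(lines))


def rows_from_url(url):
    # Fetch the exported CSV directly (e.g. the long build_feed.php URL);
    # UTF-8 decoding is an assumption about the feed
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return csv_split(text.splitlines())


# Hypothetical sample rows shaped like a single-column dump (not real data)
column_a = [
    "agencyName,bureauName,investmentTitle",
    'Example Agency,"Office of Examples, Research",Demo System',
]
```

Here naive_split(column_a[1]) yields four fields because the comma inside the quoted bureau name is treated as a separator, while csv_split(column_a) yields three fields per row.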

Happy Data Hacking!

Google’s Unspoken Security Vulnerability

Thursday, January 29th, 2009

Let’s be honest, I really like Google. Without them I couldn’t be as productive or nearly as smart as I am today. I often tell people that the most important quotient out there isn’t intelligence or emotional, it’s your Google IQ. It’s OK if you don’t know something… but if you can’t find it then you’re in trouble.

I mentioned previously that Google’s browser, Chrome, fails what I consider to be an important security test, but I’ve been largely silent on another issue Google seems to have ignored.

However, I can only conclude that it’s a larger threat than the one we faced with GMail and should be rectified quickly.

Initially, when GMail was released it had no comprehensive security, i.e. most of the communication between you and GMail was unencrypted. Immediately there was an outcry from the computing public (at least those savvy enough to understand the implications), Greasemonkey scripts were written to force an encrypted connection for all the transactions, and now it’s an easily configured feature in GMail’s settings.
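Those Greasemonkey workarounds amounted, roughly, to rewriting plain http:// addresses as https:// before the browser fetched them. A minimal Python sketch of that rewrite follows. The function name force_https is hypothetical and not taken from any of those scripts, and the trick only helps when the server actually answers over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Rewrite an http:// URL to https://, leaving other schemes untouched.

    Only the scheme changes; host, path and query are preserved. An explicit
    ":80" port would be kept as-is, so this assumes default ports.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

For example, force_https("http://www.google.com/reader/view/?hl=en") gives the same address with an https scheme.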

However, the same flaw has systematically been overlooked in Google Reader. As any ATOM / RSS convert knows, feeds have become a critical component of our computing existence and, as any social network participant knows… they’re not just for websites anymore.

Gone are the days when RSS was used simply for notifications that a public post or comment had been written. Now it’s used for some of my most intimate (at least of the digital sort) conversations. I get everything from status messages (which on Facebook aren’t as public as on Twitter) to direct private messages, all sent to my reader. Not only are they sent unencrypted, but even worse, I’m forced to use an unencrypted connection to read them.

Historically, email was rarely encrypted on the wire when it was sent from the sender to the receiver’s email system, although that has recently changed. However, the main security concern with GMail was that anyone on the same network could view the contents of a user’s inbox as they were reading their messages!

I really don’t use email all that much anymore and instead rely on social networks and RSS notifications for the bulk of my personal communications, which, thanks to Google Reader’s lack of an encrypted option, are sent free and in the clear!

I think it’s time Google acknowledges the role and responsibility that Google Reader has in people’s private lives and works to properly secure that information.

Taking (and keeping) your temperature!

Friday, January 2nd, 2009

I swear I don’t have a penchant for medical terminology but this ioBridge stuff is making me feel like that time I stayed at a Holiday Inn… so refreshing, I think I could perform surgery!

After my heart hacks (see my previous posts) I had questions from some friends about how we could graph the data from an ioWidget (my term). Initially, I wanted to push the data into a Google Spreadsheet, but unfortunately there doesn’t seem to be a higher-level JavaScript library supporting Google Docs, and sorting through the feed URLs was just too complicated.

At the same time I was thinking through this request, I received a resounding response from the ioBridge team! Although I’d worked around the need for a simple API, they quickly responded to the desire (apparently I wasn’t the only one with interesting ideas) and now there’s a full JSON API!

I won’t bore you with my initial graphing solution as the sample made it into their official API demo (along with some much needed code enhancements). However, there were a few pitfalls with that approach that I still didn’t like, most specifically that the data is “lost” every time you reload the page.

It took me until after the holiday break (along with a welcome return to Python) but I’ve solved my initial frustration with great results.

Here’s a script which will poll my ioBridge module and then store the results of my temperature sensor in a Google Spreadsheet that I created! Once the data’s there you can use Google’s visualization widgets to make some fun graphs!

Aside from some setup and “ease of use” code, the real work is done by two very brief classes. I deliberately left out some error checking and didn’t make the widget class generic (it’s actually proxying the full ioBridge module), so I think it should be straightforward enough to modify for your own uses!

All you need to do is create a spreadsheet and open it in your browser. Copy the key from that URL and paste it, along with your ioBridge feed URL, into the appropriate places in the script (the locations are commented).

I simply run this from a cron job every few minutes (once I get more data I’ll reduce the time), and although there’s not a lot of variation in the data (I deliberately introduced some to make the graph more interesting), it’s a spectacular way to record, visualize and act on the sensor’s findings!
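For readers who want the general shape of such a poller, here is a minimal Python sketch of the same two-class idea. It is not the original script and should be read with loud assumptions: the class names, the feed URL placeholder and the JSON layout ({"channels": [{"value": ...}]}) are hypothetical rather than ioBridge's documented format, and a local CSV file stands in for the Google Spreadsheet, because the Google Docs client library the post used is not shown here.

```python
import csv
import json
import time
import urllib.request

# HYPOTHETICAL placeholder: substitute your own ioBridge JSON feed URL
FEED_URL = "http://example.invalid/your-iobridge-json-feed"


class ModuleProxy:
    """Proxies the whole (hypothetical) module feed, like the post's widget class."""

    def __init__(self, feed_url, fetch=None):
        self.feed_url = feed_url
        # fetch is injectable so parsing can be exercised without a network call
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8")

    def reading(self, channel):
        # ASSUMED JSON shape {"channels": [{"value": ...}, ...]}; adapt to the real feed
        data = json.loads(self._fetch(self.feed_url))
        return data["channels"][channel]["value"]


class RowRecorder:
    """Appends timestamped rows; a local CSV stands in for the Google Spreadsheet."""

    def __init__(self, path):
        self.path = path

    def append(self, value, when=None):
        when = time.time() if when is None else when
        row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when)), value]
        with open(self.path, "a", newline="") as fh:
            csv.writer(fh).writerow(row)
        return row


# One poll per run, scheduled from cron every few minutes:
#   RowRecorder("temperature_log.csv").append(ModuleProxy(FEED_URL).reading(0))
```

Keeping each run to a single poll, and letting cron handle the scheduling, means a failed fetch simply skips one data point instead of killing a long-running loop.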

Good luck with your own modifications and let me know if I can help!

Suggestions for Google Friend Connect

Monday, December 15th, 2008

I’ve been building my personal site and one of the things I’m excited about is the chance to interconnect my work with the larger social networks out there.

I believe every site should do what it’s best at and my intent isn’t to manage comments, wall posts or user signups and security. So once I got a basic life aggregator put together my next step was to integrate Google Friend Connect and see what it was all about.

The idea of Friend Connect is to let Google proxy most of the “social interactions” for you so you can simply deal with what you do best (which is likely developing useful content and interacting with other humans… not swearing over spam and comment plugins).

It’s incredibly easy to get up and running: you simply copy two HTML files to your site and let Google give it a once-over. After that things are “configured”, but there’s still no interface for interacting “socially” with the site.

In order to do that you add a members gadget (or a smaller sign-on module) and voilà, people are now able to “join” your site! To get one of these plugins installed you use Google’s site to generate the necessary JavaScript and HTML. I simply copied these snippets into a “friend.html” file, which is loaded via a corresponding menu item.

It’s not very difficult, but you need to have a fairly straightforward “color scheme” defined, and I don’t understand why “Links” and “Secondary Links” are two separate categories. I also don’t like that the CSS style information is coded inline, but I understand this makes it a single step to set up.

Nonetheless it’s pretty straightforward, but it annoys me that Google makes you re-enter your color settings each and every time: not only when you regenerate one of the plugins, but a new plugin is similarly “blind” to your preferences. You also have to pick a general “size” for your widget, which isn’t difficult thanks to the Web Developer’s Display Ruler (under Miscellaneous).

So once it’s up and running, what are my impressions?

Well, it’s certainly a neat idea, but I was a bit underwhelmed. Currently, there are only “Wall” and “Rate and Review” widgets available, and neither of my two test posts “left” the walled garden of my site. What I’d like is the ability to control the publishing of these “events” so that my Twitter friends know if I’m active and engaged on a site!

It also wasn’t clear with the widgets how I’d solve typical “use cases”. For example, currently my site’s “Wall” is only on a single page, but if you wanted to use this plugin for post comments, how would that be done? How could I connect it to Akismet so I didn’t have to worry about spam filtering? How about other common features like emailing people when someone posts a follow-up, or what about an RSS feed for this widget?

There doesn’t seem to be the typical wealth of developer documentation either. I haven’t yet investigated the “Lame Game Demonstration” or how to build a custom gadget. But given that it’s Google, I think APIs and sample code are likely forthcoming.

Bottom line, it’s a start, but given all the noise this has been making for so long I was expecting things to be much farther along. Still, get started and you’ll find yourself on the forefront of yet another Beta product from Google!

Google Chrome fails the Google incognito test

Tuesday, September 9th, 2008

There’s been a lot of talk about Google’s new Chrome browser. If you haven’t checked it out I’d recommend it from a “neat” factor, but it’s less practical than upgrading to Firefox 3.

Chrome is fast and has some great features, and one I was excited about was the ability to go “incognito“. Going incognito will prevent the browser from storing cookies or your browsing history, and is supposed to isolate the window as a completely separate “island” of web presence which is then “thrown away” when you close the window.

Google’s example was that when shopping you don’t want your significant other to stumble across your surprise. Although I saw suggestions of *cough* other places you could browse where fewer repercussions might be welcome. You can recognize this mode by the little white spy icon from the “Spy vs. Spy” series.

However, the site I most wanted to visit with completely private windows was Gmail! I don’t think I’m rare in having multiple email accounts and the challenge with Google is that they only let you be logged in to one account across all your sessions. While there are techniques which can mitigate this, I end up letting email languish because I don’t want to go through the – log out, log in, log out, and log back in as my primary ID – dance.

So having multiple concurrently active Gmail tabs seemed like an obvious use of incognito mode!

Alas, it’s of course not to be:

First, I created an incognito window and then logged into Gmail. So far so good; however, when you open a second tab and log in with a different ID, it logs you out of the first tab! That doesn’t seem too “isolated”, does it?

My second thought was to create a second incognito window (since Google hasn’t been clear about the level of isolation). I noticed that this option is grayed out in the incognito window. If you go to your original “public” window and select “New incognito window”, the option exists but simply opens another tab in the original incognito window (which still fails the “multiple login” test).

Obviously, this lack of true isolation surprises me. Cookies appear to be shared across tabs, and it appears you’re forced into having only one private window at a time! This would be awful if you were browsing multiple sites looking for a great shopping deal but didn’t want them to know about the other sites, or if you were a web tester trying to isolate cookies between test runs.
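What the post expected from incognito mode is essentially one private cookie store per session. For the web-tester case, a minimal Python sketch of that isolation using only the standard library: each session gets its own http.cookiejar.CookieJar, so a login cookie set in one never shows up in another. The helper names, the cookie name SID and the domain example.com are hypothetical placeholders, not Gmail's real cookies.

```python
import http.cookiejar
import urllib.request


def make_isolated_session():
    """Return an (opener, jar) pair whose cookies are private to this session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def add_cookie(jar, name, value, domain="example.com"):
    """Store a cookie directly in a jar (stands in for a server's Set-Cookie)."""
    cookie = http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )
    jar.set_cookie(cookie)
```

Two sessions created this way can each hold a different SID for the same domain, which is exactly the "two logins at once" behavior the post was hoping for.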

Chrome’s a work in progress and Google’s open-sourced the project, so I can only hope someone will address these concerns. However, in the meantime it pays to test your expectations, and if Google really wants to make webapps more like desktop apps I think this needs to be addressed.

Can your datacenter handle this?

Wednesday, June 25th, 2008

Google recently hosted their I/O conference and, during it, a Google Fellow named Jeff Dean shared some of their operational measurements:

  • A single search query touches 700 to 1,000 machines in less than 0.25 seconds.
  • They currently have 36 data centers containing over 800,000 servers with 40 servers/rack.

That’s about 555 racks per datacenter (800,000 servers / 40 servers per rack / 36 datacenters), and if a standard 19″ rack takes up ~61 sqft, that works out to roughly 33,855 sqft of raised floor per datacenter, or about 1.2 million sqft across all 36. Be careful of averages, though; that’s probably much smaller than the actual sizes.
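Working through the rack arithmetic step by step keeps track of which figures are totals and which are per-datacenter. The Python sketch below uses only the numbers quoted above plus the post's own assumption of ~61 sqft per rack (including clearance). Rounding to 555 racks first gives the 33,855 sqft figure, while keeping the fraction gives about 33,889.

```python
servers = 800_000
servers_per_rack = 40
datacenters = 36
sqft_per_rack = 61  # the post's assumed floor space per 19-inch rack

total_racks = servers / servers_per_rack        # racks across all datacenters
racks_per_dc = total_racks / datacenters        # ~555.6 racks per datacenter
sqft_per_dc = racks_per_dc * sqft_per_rack      # ~33,889 sqft per datacenter
total_sqft = total_racks * sqft_per_rack        # raised floor across all 36
```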

We know from experience that they use BigTable (their distributed storage service) and MapReduce (cluster computing) a lot.

  • The largest BigTable instance manages about 6 petabytes of data spread across thousands of machines.

I think 6 petabytes actually seems kind of low. Although I realize that’s about one hundred times the amount of data in the Library of Congress, it seems to me that they likely have a very large number of BigTable clusters.

  • They had 29,000 MapReduce jobs in August 2004 and 2.2 million in September 2007, and the average time to complete a job dropped from about 10 minutes to 6 minutes.

That seems like an astounding increase and makes a clear statement that it’s a valid programming paradigm for data processing. One can only imagine how much their infrastructure has grown (both in size and computing capacity) to accommodate such an increase in volume and still cut the average time by about 40%.
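To put numbers on "astounding", here is a short Python calculation using only the figures quoted above (monthly job counts and average completion times).

```python
jobs_aug_2004 = 29_000
jobs_sep_2007 = 2_200_000
avg_minutes_2004 = 10
avg_minutes_2007 = 6

# Roughly 76 times as many jobs per month three years later
job_growth = jobs_sep_2007 / jobs_aug_2004
# Fraction by which the average job time fell (0.4, i.e. 40%)
time_reduction = 1 - avg_minutes_2007 / avg_minutes_2004
```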

  • On a typical day they’ll run about 100,000 MapReduce jobs, each of which occupies about 400 servers.

If you take 400 servers per job times 100,000 jobs, that implies about 40M machine-jobs per day. I know they’re not all run at the same time (and we know from earlier that they have about 800,000 servers), but combined that suggests a contribution factor of ~50 jobs per machine per day (40M / 800K).
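Here is the same estimate worked in Python. The last step assumes, as a rough guess, that the ~6-minute average job time quoted earlier also applies to these daily jobs. That assumption gives a sense of how busy each machine is.

```python
servers_per_job = 400
jobs_per_day = 100_000
fleet_size = 800_000
avg_job_minutes = 6  # ASSUMPTION: reuse the 2007 average job time quoted earlier

machine_jobs_per_day = servers_per_job * jobs_per_day          # 40 million
jobs_per_machine_per_day = machine_jobs_per_day / fleet_size   # ~50
# Hours per day each machine would spend on MapReduce under the assumption above
mapreduce_hours_per_machine = jobs_per_machine_per_day * avg_job_minutes / 60
```

Under that assumption each server would spend about 5 hours a day on MapReduce work, which makes the ~50x figure seem plausible rather than alarming.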

  • The data output by these MapReduce tasks has risen from 193 terabytes to 14,018 terabytes.

I’m not sure it’s valid to compare the data output with the data being stored, since we don’t know how many BigTable instances they have running, but they’ll often recompute data instead of storing a cached copy. Their other big challenge in computing is getting the data shuttled around the network. It also seems typical (especially in the web world) that the data you compute from a source can be much larger than the original data. So it seems likely that Google’s found a well-balanced compute cost vs. data storage tradeoff that works for them.

They also have some interesting insight into the frequency and costs of various failures over a one-year period. On average, for a typical cluster configuration of 1,000 machines, you’ll have:

  • 1000+ HD failures, 20 mini switch failures and 5 full switch failures and 1 PDU failure
  • ~0.5 overheating events, forcing a power-down of most machines in <5 mins and taking ~1-2 days to recover
  • ~1 PDU failure, ~500-1000 machines suddenly disappear and take ~6 hours to come back
  • ~1 rack move with advance notice, but ~500-1000 machines powered down and taking ~6 hours to bring back up [Note: this seems to contradict the 40 machines per rack statement, but it may have to do with intra-cluster communication links]
  • ~1 network rewiring, rolling ~5% of machines down over 2-day span
  • ~20 rack failures, 40-80 machines instantly disappear, 1-6 hours to get back
  • ~5 racks go wonky, 40-80 machines see 50% packet loss
  • ~8 network maintenances, 4 might cause ~30-minute random connectivity losses
  • ~12 router reloads, taking out DNS and external VIPs for a couple of minutes
  • ~3 router failures, have to immediately pull traffic for an hour
  • ~dozens of minor 30-second blips for DNS
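To get a rough feel for the scale of these failures, here is a Python sketch that tallies the machine-hours lost in a year. It uses only the three events above that state both a machine count and a duration range: the PDU failure, the rack move and the rack failures. Overheating, network rewiring and the other items are left out, so this is a partial lower-bound-style estimate, not Google's own accounting.

```python
# (events per year, machines affected (low, high), hours down (low, high)),
# taken from the list above
EVENTS = {
    "PDU failure": (1, (500, 1000), (6, 6)),
    "rack move": (1, (500, 1000), (6, 6)),
    "rack failure": (20, (40, 80), (1, 6)),
}
CLUSTER_MACHINES = 1000
HOURS_PER_YEAR = 24 * 365


def machine_hours_lost(events):
    """Return (low, high) machine-hours lost per year for the listed events."""
    low = sum(n * m_lo * h_lo for n, (m_lo, _), (h_lo, _) in events.values())
    high = sum(n * m_hi * h_hi for n, (_, m_hi), (_, h_hi) in events.values())
    return low, high


def share_of_capacity(hours, machines=CLUSTER_MACHINES):
    """Fraction of a cluster's yearly machine-hours represented by `hours`."""
    return hours / (machines * HOURS_PER_YEAR)
```

On these numbers the total lands between 6,800 and 21,600 machine-hours, roughly 0.08% to 0.25% of the cluster's 8.76 million machine-hours per year, before adding the overheating and network events.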