Archive for the ‘frustration’ Category

Where are the filters for Google Reader?

Thursday, October 29th, 2009

If I can create filters for GMail, to push notes to certain folders or automatically star things, then why can’t I create similar rules for my RSS feeds?

RSS has quickly become at least as important to me as email, so I think it deserves at least as many tools.

Skinned Programming Paradigms

Monday, October 19th, 2009

Here’s a free thought for you.

How much of people’s choice in programming languages is really syntax-dependent?

For example, I dislike Java (I hate it for other reasons) simply because of the verbosity of ‘System.out.println’ and don’t really understand why Scala would choose ‘println’ instead of Python’s terse use of ‘print’.

And I’m pretty sure, despite overt rationalizations like ‘saving myself keystrokes’, that’s just a petty reason.

However, what I learned in compiler construction is that the parser or tokenizer is really separate from the language itself.

So, for example, there’s no reason there couldn’t be a plugin for Java that allowed me to write with Python’s syntax, or vice versa. Such a technique might require a little bit of library support, but I suspect adding Python’s ‘map()’ even to C/C++ would be fairly trivial.

We should be able to ‘skin’ our languages with our syntax of choice regardless of the underlying compiler, JVM or bytecode.

If this were possible, then ‘language wars’ could be less about syntax and interface (a la emacs vs. vi) and more about the underlying value of the language itself.

If we can theme operating systems and user interfaces, then why not programming languages?

Welcome to the White House State of Confusion

Monday, September 28th, 2009

I know it’s easy to sit on the sidelines and poke fun at people actually trying to do something. And we’ve been given many reasons to respect the technical proficiency of the recent administration’s IT personnel.

However, here’s an example of a drop-down box, from a section of the White House site, which seems frustratingly naive:

Does anyone see a problem?

For starters it’s not alphabetical, which makes finding anything atrocious!

However, beyond that I see three entries for “Departmental Administration”!

That’s what happens when you don’t sanitize your data: you can’t sanitize what you do with it!

Links as Code

Tuesday, July 14th, 2009

John Willis’ “Infrastructure as Code” should be a startling epiphany for anyone who has long neglected process and people in favor of technological solutions. Yet I hope no one here needs convincing about the validity of institutionalizing collective knowledge.

However, I wonder about a critical level of infrastructure maintenance that seems to be missed: document maintenance.

I’m sure everyone has experienced the frustration of reading a document with an invalid URL, but should this be accepted?

Should documents not be kept in repositories as well? Why not take the same proactive approach to maintaining links as we do “Not breaking the Build” when programming?

So why doesn’t your team have a utility to scan internal documents and links when they propose changing page structures, before they’re made live?
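To make the idea concrete, here’s a minimal sketch of such a utility in Python. Everything in it is an assumption for illustration: the helper names, the sample documents, the example.com URLs, and the deliberately simple regex (which will, for instance, swallow a trailing period). It only compares the links found in each document against the set of pages that would exist after a proposed change.

```python
import re

# Deliberately simple URL matcher (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def extract_links(text):
    """Return every http(s) URL found in a document's text."""
    return URL_PATTERN.findall(text)


def find_broken_links(documents, valid_urls):
    """Map each document name to the links that are missing from valid_urls."""
    broken = {}
    for name, text in documents.items():
        missing = [url for url in extract_links(text) if url not in valid_urls]
        if missing:
            broken[name] = missing
    return broken


# Hypothetical internal documents and the page set after a proposed restructuring.
documents = {
    "runbook.txt": "Deploy steps: http://wiki.example.com/deploy and http://wiki.example.com/rollback",
    "faq.txt": "See http://wiki.example.com/deploy for details",
}
proposed_pages = {"http://wiki.example.com/deploy"}

print(find_broken_links(documents, proposed_pages))
```

Run as part of the build, a non-empty result could fail the change the same way a broken test would.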

Who’s Really Testing Chrome?

Monday, July 6th, 2009

Just a quick gripe to share with anyone using Chrome.

For all of Chrome’s new high performance design, there’s a very simple way to bring your tabbed experience to its knees: print something…

In my case it was a 100+ page PDF printed 2 pages per sheet, but I’m sure most anything of a decent size would work.

Disappointing to say the least, since printing is supposed to be a background task (that’s why we have spooling), not take front and center stage!

Mashing up the Dashboard

Wednesday, July 1st, 2009

This post is for anyone interested in any of the Government Transparency initiatives. If you’ve been following this topic then you’re probably aware that Vivek Kundra sees a dashboard as a way of accelerating the transparency and transformation of the Government.

After watching groups like the Sunlight Foundation and Change Congress work their magic, I’ve now begun seeing much of this transformation from the inside due to my new job.

However, I still endeavor to participate externally as well and so I wanted to do some analysis on the public data.

In order to start, I wanted to import the USASpending.gov information into Google spreadsheets, since I can kill two buzzwords at once by leveraging Cloud Computing services for Transparency!

For a while, I fought with the easiest way to import the data and wanted to share what eventually worked for me.

First, create a new spreadsheet and name your tab appropriately. Then go to the USASpending Feeds page to select the specific data you want. I suggest starting with the Exhibit 300 information since it’s typically a smaller dataset and my Exhibit 53 tab with more than 1200 rows has proven to be very slow.

Next pick (and I highly suggest reordering) the data fields you want. You must then pick which Agency or Agencies you’d like info on. Again, consider that lots of data will be pretty slow.

Finally, select the CSV icon which should open the download prompt for your browser. It’s unfortunate that the implementers didn’t use a dynamic tag here because you can’t simply copy the URL. Instead I had to first download the file itself and then copy the originating URL into my clipboard (I was using Chrome so how to do this will depend on your browser).

The URL should look like a much longer version of this:

http://it.usaspending.gov/customcode/build_feed.php?extype=300&select1=agencyName&columns%5B%5D=bureauName…

Now that we’ve got the URL we can import everything into our spreadsheet by selecting the A1 cell and entering:

=ImportData("<url>")

Where “<url>” is of course the long URL you copied earlier.

After a quick few seconds your data should be automajically imported!

For the Exhibit 300, things worked just great, but for the Exhibit 53 data I ended up with each cell of Column A holding the full data for each entry. So in B1 I simply entered: =SPLIT(A1, ",") (note there’s a bug with Google where the quotes have to be double quotes, not single) and then things auto-populated left to right.

Unfortunately, the “SPLIT()” didn’t auto-populate downwards as well and dragging the function down the full B column is very very painful.
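If dragging the formula gets too painful, the same per-row split can be done outside the spreadsheet on the CSV file you downloaded earlier. Here’s a minimal sketch using Python’s standard csv module; the sample rows and the split_rows name are made up for illustration and are not real USASpending data.

```python
import csv


def split_rows(lines):
    """Split each comma-separated line into fields, honoring quoted commas."""
    return list(csv.reader(lines))


# Made-up stand-ins for the cells that ended up in Column A.
column_a = [
    "Agency,Bureau,Investment",
    '"Department of Example, Office A",Bureau 1,Project X',
]

for row in split_rows(column_a):
    print(row)
```

One design note: csv.reader keeps a quoted field containing a comma together as a single value, which matters for agency names that contain commas.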

Happy Data Hacking!

Could a couchdb guru explain this, please?

Friday, March 13th, 2009

I’m in the process of trying to build (and benchmark) a couchdb project and I decided to use some word count & frequency samples as data. Since “word count” and “grep” are the quintessential map/reduce examples I thought this would be fairly simple.

However, couchdb doesn’t seem to be following the expected semantics.

Let’s say I’ve got some data, here’s how it looks in python;

>>> import couchdb
>>> s = couchdb.Server()
>>> db = s['kw2']
>>> for d in db: print db[d]
...
<Document '133da883092e206d7191f81661beb813'@'3188228489' {'word': 'ho'}>
<Document '2287406943e627278d98a3a2f3d3483b'@'634745217' {'word': 'do'}>
<Document '2717deb4df8ba09601166021fb758126'@'2083376980' {'word': 'mo'}>
<Document '38d48e8e069538a55902dd2d2b7e1771'@'2475366164' {'word': 'ho'}>
<Document '39ef4a9e3eb0eeb02d483ce658d08356'@'2904312995' {'word': 'hi'}>
<Document '4237064ad7a89fa11e9bbbc8ca4ed302'@'722283984' {'word': 'do'}>
<Document '4d0e61dedaf2af93a9d4d261cab696de'@'996995145' {'word': 'we'}>
<Document '55ba96501ed1e9573b2cb6e647c35b47'@'3153984663' {'word': 'my'}>
<Document '5be13ca69c76d202b131d50f5b9c1ecb'@'1584030189' {'word': 'do'}>
<Document '612e4a0d32f4c91f7fb2414e4de47845'@'3488016124' {'word': 'be'}>
<Document '61426c868dc388e6edb2b4ce2078ce06'@'2761346180' {'word': 'me'}>
<Document '908acaf4ad704951dbb08d27ddfbe9a9'@'941727127' {'word': 'mo'}>
<Document '9136e093fda2dda7d5585983299fcbc7'@'4166962206' {'word': 'mo'}>
<Document '9decb25944110c04d040feb31e532c78'@'1016718857' {'word': 'do'}>
<Document 'ad7f4aab329d55c3a2fb97390df5ae0a'@'1660663052' {'word': 'my'}>
<Document 'c4d976a789e37e1c3eb4d57bd50d47aa'@'923287257' {'word': 'my'}>
<Document 'cccf15515077d100498573fe40244130'@'3846996388' {'word': 'hi'}>
<Document 'd747a88eb2cb18776237852aceff96fc'@'3596694550' {'word': 'we'}>
<Document 'dc115f5d42d442f0b5e7d3680aeb62c2'@'3446491946' {'word': 'to'}>

Feel free to add your own but that’s what I’ve got. Each doc has a simple structure, an “_id” (supplied by couchdb when the document is created) and an element called “word” which obviously contains some fabricated two letter structures (which I hesitate to actually call words).

What’s important to note is that the same word may appear in multiple documents.

Now we want to build a view to show each word as well as the sum of how many times it appears in our database.

Again, following the classic paradigm we build our map function (in javascript) as such;

function(doc) {
  emit(doc["word"], 1);
}

So far so good, now reduce;

function(key, value, rereduce) {
   if (rereduce) {
      return sum(value);
   }
   else {
      return value.length;
   }
}

You can pretty much ignore the “rereduce” clause as our dataset’s not big enough right now, nor are we updating it. However, I will explain the function’s trick: while sum(value) is actually the “mathematically correct” action to take regardless of whether this is our first time through, we’re relying on the fact that, since we’re emitting a “1” for each key (i.e. each word instance), the sum of those values is simply the length of the array we’re passed. [I learned this from one of the masters]
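To convince yourself of the trick, here’s a small Python simulation of the reduce function above. It is only a sketch run outside couchdb: reduce_fn mirrors the javascript, the word list is copied from the documents shown earlier, and the split into two chunks is an assumed stand-in for however couchdb might batch the values.

```python
# The 19 words from the documents listed above.
WORDS = ["ho", "do", "mo", "ho", "hi", "do", "we", "my", "do", "be",
         "me", "mo", "mo", "do", "my", "my", "hi", "we", "to"]


def reduce_fn(keys, values, rereduce):
    """Python mirror of the javascript reduce function."""
    if rereduce:
        return sum(values)   # values are partial counts from earlier reduces
    return len(values)       # values are the 1s emitted by map


# Map phase: one emitted 1 per word instance.
values = [1 for _ in WORDS]

# First pass: the length and the sum agree because every value is 1.
first_pass = reduce_fn(None, values, False)
print(first_pass, sum(values))

# Rereduce: reduce two chunks separately, then combine the partial counts.
half = len(values) // 2
partials = [reduce_fn(None, values[:half], False),
            reduce_fn(None, values[half:], False)]
total = reduce_fn(None, partials, True)
print(partials, total)
```

The rereduce branch has to use the sum: the length of the partials list would only tell us how many chunks there were, not how many words.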

Ok, despite the attempt at “premature optimization” this actually seems to work out, or at least it looks to when shown in the couchdb key/value view. Here’s my screenshot for proof;

picture-21

However, what I see from a direct URL query to this view is markedly different than the data that’s represented. To test this either use Firefox or a command line client like curl and go to the following url;

http://localhost:5984/kw2/_view/finding/word_count

What I see (and I suspect you will as well) is

{"rows":[{"key":null,"value":19}]}

Which seems to break our expected key/value pairing!!!

Suspecting my understanding of couchdb’s map/reduce representation has been occluded by all the Google videos I’ve watched, it seems like an intuitive modification might be to change our reduce function to return the key and the value, like this;

return [key, value];

However, that yields an even more shocking outcome;

{"rows":[{"key":null,"value":[[["we","d747a88eb2cb18776237852aceff96fc"],["we","4d0e61dedaf2af93a9d4d261cab696de"],["to","dc115f5d42d442f0b5e7d3680aeb62c2"],["my","c4d976a789e37e1c3eb4d57bd50d47aa"],["my","ad7f4aab329d55c3a2fb97390df5ae0a"],["my","55ba96501ed1e9573b2cb6e647c35b47"],["mo","9136e093fda2dda7d5585983299fcbc7"],["mo","908acaf4ad704951dbb08d27ddfbe9a9"],["mo","2717deb4df8ba09601166021fb758126"],["me","61426c868dc388e6edb2b4ce2078ce06"],["ho","38d48e8e069538a55902dd2d2b7e1771"],["ho","133da883092e206d7191f81661beb813"],["hi","cccf15515077d100498573fe40244130"],["hi","39ef4a9e3eb0eeb02d483ce658d08356"],["do","9decb25944110c04d040feb31e532c78"],["do","5be13ca69c76d202b131d50f5b9c1ecb"],["do","4237064ad7a89fa11e9bbbc8ca4ed302"],["do","2287406943e627278d98a3a2f3d3483b"],["be","612e4a0d32f4c91f7fb2414e4de47845"]],19]}]}

Of course I’m still baffled as to why we seem to have no entry set for key and all our rows as values.

However, my larger concern is beyond even that perplexing situation;

What’s most surprising here is that the key we’re being passed includes the doc id even though it was not emitted as part of our map phase!

Let’s give it one last go here, thinking perhaps we need to be more explicit;

function(key, value, rereduce) {
   if (rereduce) {
      return sum(value);
   }
   else {
      return {"key": key[0],"value": value.length};
   }
}

Unfortunately, this seems to still not yield the organized rows we expected and returns;

{"rows":[{"key":null,"value":{"key":["we","d747a88eb2cb18776237852aceff96fc"],"value":19}}]}

Which stands in high contrast to what couchdb continues to show us;

picture-11

So whatever we emit from reduce ends up as the value part of the reply (as indexed by “value”). That matches our original expectation (that couchdb handles setting the key itself) but doesn’t explain why the key is “null”.

In short I’m left with three questions;

1) Why does couchdb pass our reduce function the doc ID, when it’s not emitted in the map phase?

2) Why is “key” null in our output?

3) How do we get our JSON output to match the same pretty key/value representation that couchdb shows?
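For anyone who wants to experiment while waiting on the experts, here’s a hedged Python simulation of one possible explanation. It assumes (and this is an assumption, not something verified against this database) that a reduce query without couchdb’s group=true option collapses every key into a single row, while a grouped query reduces each distinct key separately. The doc ids are made up, and the [word, doc id] key pairs simply mirror what the output above shows being passed to reduce.

```python
from itertools import groupby

WORDS = ["ho", "do", "mo", "ho", "hi", "do", "we", "my", "do", "be",
         "me", "mo", "mo", "do", "my", "my", "hi", "we", "to"]

# Simulated map output, sorted by key: ((word, made-up doc id), 1).
rows = sorted(((word, "doc%02d" % i), 1) for i, word in enumerate(WORDS))


def reduce_fn(keys, values, rereduce):
    """Python mirror of the javascript reduce function."""
    return sum(values) if rereduce else len(values)


def query(rows, group):
    """Simulate the view query with or without grouping by key."""
    if not group:
        # Everything is reduced together, so no single key applies.
        keys = [key for key, _ in rows]
        values = [value for _, value in rows]
        return [{"key": None, "value": reduce_fn(keys, values, False)}]
    result = []
    for word, chunk in groupby(rows, key=lambda row: row[0][0]):
        chunk = list(chunk)
        keys = [key for key, _ in chunk]
        values = [value for _, value in chunk]
        result.append({"key": word, "value": reduce_fn(keys, values, False)})
    return result


print(query(rows, group=False))
print(query(rows, group=True))
```

If that assumption holds, appending ?group=true to the view URL would be the first thing to try.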

I wish I could promise that if you tune in next time I’ll have the answers but we’ll have to rely on the good nature of our experts out there to help us out.

Google’s Unspoken Security Vulnerability

Thursday, January 29th, 2009

Let’s be honest, I really like Google. Without them I couldn’t be as productive or nearly as smart as I am today. I often tell people that the most important Quotient out there isn’t Intelligence or Emotional, it’s their Google IQ. It’s OK if you don’t know it… but if you can’t find it then you’re in trouble.

I mentioned previously that Google’s browser, Chrome, fails what I consider to be an important security test, but I’ve been largely silent on another issue Google seems to have ignored.

However, I can only conclude that it’s a threat larger than we faced with GMail and should be rectified quickly.

Initially, when GMail was released it had no comprehensive security, i.e. most of the communication between you and GMail was unencrypted. Immediately, there was an outcry from the computing public (at least those savvy enough to understand the implications), and Greasemonkey scripts were written to force an encrypted connection for all the transactions. Now it’s an easily configured feature in GMail’s settings.

However, the same flaw has systematically been overlooked in Google Reader. As any ATOM / RSS convert knows, feeds have become a critical component of our computing existence and as any social network participant knows… they’re not just for websites anymore.

Gone are the days when RSS was used simply for notifications that a public post or comment had been written. Now it’s used for some of my most intimate (at least of the digital sort) conversations. I get everything from status messages (which on Facebook aren’t as public as on twitter) to direct private messages all sent to my reader. Not only are they sent unencrypted, but even worse I’m forced to use an unencrypted connection to read them.

Historically, email was rarely encrypted on the wire when it was sent from the sender to the receiver’s email system, although that has recently changed. However, the main security concern with GMail was that anyone on the same network could view the contents of a user’s inbox as they were reading their messages!

I really don’t use email all that much anymore and instead rely on social networks and RSS notifications for the bulk of my personal communications, which, thanks to Google Reader’s lack of an encrypted configuration, are sent free and in the clear!

I think it’s time Google acknowledges the role and responsibility that Google Reader has in people’s private lives and works to properly secure that information.

My least favorite part of JSON…

Thursday, December 4th, 2008

I love how simple the JSON spec is. I never enjoyed reading through all the XML closures, etc. JSON just feels more like programming so you don’t have to shift your brain as much as you do with XML.

However, I hate that " is the only quoting character you can use. I’ve come to love Python’s equal tolerance for 'c' and "c".
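The difference is easy to see from Python itself. Here’s a quick sketch using the standard json module; the is_valid_json helper is just a name made up for the example.

```python
import json


def is_valid_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Python is happy with either quote style for its own string literals...
assert 'c' == "c"

# ...but JSON only accepts double quotes around strings and keys.
print(is_valid_json('{"c": 1}'))
print(is_valid_json("{'c': 1}"))
```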

I like that JSON’s simple but wish it was simply more accommodating.

Not to be inflammatory…

Tuesday, September 16th, 2008

I don’t want to join in with another one of those “me too” complaints…

However, it seems to me like we might have more money for Social Security if the Government wasn’t spending what they had bailing out all these banks.