Processing… Processing

Like most people, I’ve been bitten by the “social networks take up all my blogging time” bug.

But in reality, I think that’s OK. I once read someone arguing that SMS is becoming the polite way of conversing, as opposed to a phone call, because it permits the recipient to respond when it’s convenient for them rather than when it’s convenient for the initiator.

So blogging remains a ‘slow and steady’ system to support the hivemind: convenient for the author, and for the searcher.

With that in mind, here’s a bit of code I wrote a while back to create a Processing visualization.

I’ve loved ‘clock’ charts, where data is presented around a circle from 1 o’clock to 12 o’clock, but I’ve never had a solid reason to use one, unlike the Sunlight Foundation’s excellent example visualizing transparency.

However, ‘recently’ (in blogger time) I had a unique opportunity at work to apply the technique to visualizing how participants interacted with a site over time.

Here’s what I came up with:

[Figure: ‘clock’ chart of user website activity]

This represents how often users interacted with our site during its availability period (which spanned a number of days).

I first created the data with Python, taking the time portion of each action and creating a count, which I recorded in both East Coast (blue) and West Coast (orange) time. You’ll notice that the data is exactly the same in intensity, simply shifted by the appropriate timezone. This is because we had no way of knowing where users were connecting from and simply wanted to look at the overall data to discern any patterns (and my brain doesn’t have a timezone converter built in).
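For the curious, here’s roughly what that preprocessing looked like; the log format, the file name, and the three-hour shift are illustrative assumptions rather than the actual script:

from collections import Counter
from datetime import datetime

def hourly_counts(timestamps, shift=0):
    # Count actions per hour of day, optionally shifting each hour by `shift`
    counts = Counter((datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour + shift) % 24
                     for t in timestamps)
    return [counts.get(h, 0) for h in range(24)]

# One timestamp per action, one per line (format assumed for illustration)
with open("actions.log") as f:
    stamps = [line.strip() for line in f if line.strip()]

est = hourly_counts(stamps)             # "East Coast" view of the data
pdt = hourly_counts(stamps, shift=-3)   # same data shifted three hours for the "West Coast"

print(est)
print(pdt)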

Looking at the data, it should be fairly obvious that:

  • An “East Coast” interpretation means that users interacted with the site either first thing in their day (i.e. 8 AM) or during the latter part of lunch (noon and 1 PM).
  • Meanwhile, “West Coast” users probably responded before lunch (11 AM) or towards the end of the day.

I was able to build upon a great example from Jer Thorp, and I apologize greatly for butchering his solution. Thankfully I didn’t need anything anywhere near as complex (and didn’t have to fight Java for JSON data, since I embedded my arrays directly in the code).

So in case it helps anyone else, here’s my solution with all its warts (e.g. I know the data keys are drawn twice and there’s some manual tweaking of drawHeight for sizing):

import processing.opengl.*;

int maxVal = 0;           //Keeps track of the maximum value returned across all terms
int localMax = 0;         //Keeps track of the maximum value returned within each term
float drawHeight = 0.45;  //Portion of the screen height that the largest bar takes up
                          //(.45 for hours, .65 for dates)
float drawWidth = .25;    //Portion of each slot's width that a bar takes up

int center_size = 10;
int border = 1;           //Border between bars
int lastTotal = 0;        

color[] colours = { #400101, #E46D0A, #009BF1 }; //Graph Colours
color textColor1 = #D98D30;
color textColor = #333333;
color backColor = #F2F2F2;
color curColor;

int MAX_HEIGHT = 1200;
int MAX_WIDTH = 800;

// Time Shifted Data built with a separate Python script 
String[] Hours = {"00:00", "01:00", "02:00", "03:00", "04:00", "05:00", "06:00", "07:00", "08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"};

int[] mod_est_Activity_Hours_Values = {3, 0, 0, 0, 0, 0, 0, 3, 7, 9, 10, 26, 4, 7, 6, 20, 38, 10, 1, 5, 3, 2, 1, 1};
int[] mod_pdt_Activity_Hours_Values = {0, 0, 0, 0, 3, 7, 9, 10, 26, 4, 7, 6, 20, 38, 10, 1, 5, 3, 2, 1, 1, 3, 0, 0};
// end generated Data

//Everything is drawn once in setup(), so draw() can stay empty
void draw() {

};

void setup() {
  //Set the size of the stage & set the background
  //(size() comes first; note MAX_HEIGHT is used as the width and MAX_WIDTH as the height)
  size(MAX_HEIGHT, MAX_WIDTH);
  frameRate(60);
  background(backColor);
  smooth();

  PFont font = loadFont("Meta-Normal-48.vlw");
  textFont(font);

  //Draw the East Coast (EST) series, then the West Coast (PDT) series on top
  curColor = colours[1];
  drawData(Hours, mod_est_Activity_Hours_Values, Hours.length);

  curColor = colours[2];
  drawData(Hours, mod_pdt_Activity_Hours_Values, Hours.length);

  save("visual.png");
};

void drawKey(String key, float xinc, float theta) {
      float y = 0;
      float x = .5;
      String s = key;

      pushMatrix();
      translate(x + xinc/2 * drawWidth, y - (xinc * 2) * drawHeight);

      //Draw key
      rotate(-PI/2);
      fill(textColor);

      textSize(max(xinc/2, 13));
      text(s, 0, 0);

      translate(cos(theta), sin(theta));

      popMatrix();
};

void drawData(String[] keys, int[] data, int len) {
  parseData(keys, data, len);

  fill(curColor);
  float xinc = float(width)/len;

  //Move to the center of the screen
  pushMatrix();
  translate(width/2, height/4);
  noStroke();

  //Draw each value as a bar
  for (int i = 0; i < len; i++) {
    color c = color(red(curColor), green(curColor), blue(curColor), random(100,255));
    fill(c);

    float h = float(data[i])/float(maxVal);
    float theta = i * (PI / (len/2));

    translate(cos(theta) * center_size, sin(theta) * center_size);

    //Rotate
    pushMatrix();
    rotate(theta);

    //Draw the bar
    rect(0, 0, xinc * drawWidth, -h * height * drawHeight);

    drawKey(keys[i], xinc, theta);

    popMatrix();
  };

  popMatrix(); //back to original center
};

// Helps normalize the chart: tracks the largest value seen so bars can be
// scaled against it (individual values are capped at 3000)
void parseData(String[] keys, int[] data, int len) {
  for (int i = 0; i < len; i++) {
    if (data[i] > localMax) {
      localMax = min(data[i], 3000);
      if (localMax > maxVal) maxVal = localMax;
    };
  };
};

To make it work you’ll need to make sure the font’s installed (see Jer’s original tutorial) but it should be relatively straightforward after that.

Cheers!

Posted in code, visualization | 2 Comments

I love Chrome Cookies

OK,

I know it’s been way too long since I’ve posted and this isn’t intended as a heartfelt explanation, merely a reference for those in need.

I’ve been hacking up a storm and shoehorning my way around problems and this is no different.

If you need to use wget with a site that requires authentication, then you need to dump some cookies from Chrome (or Firefox), because both have moved from the plain-text cookies.txt format to sqlite3.

Here’s how (when typing this at the command line, hit ^V then <Tab> to enter the literal tab separator, i.e. right after the first ‘):

sqlite3 -separator '       ' Cookies 'select host_key, httponly, path, secure, expires_utc, name, value from cookies' > ~/Sites/uservice/chrome_cookies.txt

Unfortunately that didn’t work for me (not sure why wget wasn’t correctly reading the file) so I ended up using Firebug to look at the HTTP headers, and then using wget with:

wget --no-cookies --header="Cookie: name1=v1; name2=v2; name3=v3"

It’s actually way easier than it looks to get your header right: just copy and paste what you see from Firebug (look at the “Net” panel, have it show all communications; what you want is the Cookie header on the first request sent).
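That said, if you’d rather make wget’s --load-cookies work with the sqlite dump, one likely snag is that Chrome stores expires_utc as microseconds since 1601 rather than a Unix timestamp, and wget expects the Netscape cookies.txt column layout. Here’s a rough, untested Python sketch of the conversion (paths and file names are just examples):

import sqlite3

EPOCH_DELTA = 11644473600  # seconds between 1601-01-01 and the Unix epoch

def chrome_to_netscape(db_path, out_path):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "select host_key, path, secure, expires_utc, name, value from cookies")
    with open(out_path, "w") as out:
        out.write("# Netscape HTTP Cookie File\n")
        for host, path, secure, expires, name, value in rows:
            expires_unix = max(0, expires // 1000000 - EPOCH_DELTA)
            flag = "TRUE" if host.startswith(".") else "FALSE"   # include subdomains?
            out.write("\t".join([host, flag, path,
                                 "TRUE" if secure else "FALSE",
                                 str(expires_unix), name, value]) + "\n")
    conn.close()

chrome_to_netscape("Cookies", "chrome_cookies.txt")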

I figured this out with some help but mostly by my lonesome given the differences for Chrome.

Posted in frustration, hacks | 1 Comment

Turing Test for Clouds

One of the ‘trends’ in programming is monkey patching which, together with duck typing, lets more dynamic languages bypass fixed static types. I internalize the technique as: “if it looks like a duck, walks like a duck and quacks like a duck… then who cares what it really is”.
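(For the uninitiated, here’s a toy Python illustration of the idea, with made-up names: anything that quacks gets treated as a duck, and a monkey patch can bolt a quack() onto something that never declared one.)

class Robot:
    pass

Robot.quack = lambda self: "quack (synthesized)"   # monkey patch: add behavior at runtime

def listen(maybe_duck):
    # No type check anywhere; only the behavior matters
    print(maybe_duck.quack())

listen(Robot())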

Yea, as a philosophy I know it lacks nuance, but it’s worked well historically, so let that dog hunt.

Another important bit of geek trivia is the famous Turing Test. If you’re here and don’t know what that is (or how to figure it out), then you should move along now; this isn’t the droid you’re looking for.

Simplified, Turing’s Test and monkey patching both suggest that explicit identification isn’t practical; rather, implicit behaviors should define how something is used. It’s a very expedient supposition, one that anyone who’s dealt with contracts would envy.

What’s in this for Cloud, given that NIST has done a nice job of defining cloud in practical terms?

As a buzzword, cloud’s seen more than its fair share of hype:

[Google Trends chart for “cloud computing”]

So everyone’s been trying to claim the moniker, and today I was reading about a ‘cloud-based product’ that was really just a web portal, much like Walmart’s. Though I’m sure it can accurately claim to be cloud under a number of definitions, my instinct was “No, definitely not”.

However, a colleague replied to my skepticism, saying: “it underlines that there are already commonplace applications in use that are legitimately ‘cloud’.”

Where do you stand on such a claim? That online shopping or market makers such as eBay are SaaS cloud services?

Underlying it all, are deep philosophical questions as integral to humanity’s future as determining where the soul resides!

  • What if I have an amazingly dynamic and responsive application, run by monkeys behind the curtain?
  • Would I be cloud computing if I used twitter via snail mail?
  • Does my subdivision’s swimming pool classify as IaaS, with its broad network-wide (i.e. roads) access, and rapid elasticity (easy capacity management) and measured service (towel charge) if there’s no lifeguard (On-demand self-service)? Surely you don’t need me to explain “resource pooling”.

Strictly speaking I’m not sure where I stand, but I think Turing would tell me to go with the duck and even a million monkeys patching the pool shouldn’t change my mind.

Posted in cloud_computing, technology | 1 Comment

couchdb coming back for more

Not that long ago, JChris pointed out that not only was there a new version of couchdb out but that Janl had released a new version of his OSX package, CouchDBX!!

So I knew I needed to find a time to try both new versions out.

‘Thankfully’, I can’t really get to sleep right now so I thought I’d try to be productive and give them both a go again with my small performance test.

And here are the latest results.

Here’s a baseline, which if you recall loads the file from disk.

$ time ./finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real    0m0.259s
user    0m0.216s
sys    0m0.041s

Now for couchdb’s results. Here’s the portion of time required for the database load:

$ time ./couchdb_finding_keywords.py
real	16m53.912s
user	2m57.409s
sys	1m35.209s

This is down quite substantially from the 28 minutes the last version tested took to load.

Rather than run the timing for the loading stage again (since it’s clearly way beyond the time required to analyze the text file), I thought I’d jump to an actual query.

Unfortunately in the process of running the real test I realized I hadn’t created the necessary views for the new database.

Then, in doing so, I made a typo in my map() function and had to wait through many, many error messages like:

OS Process :: function raised exception (ReferenceError: worse is not defined) with doc._id ############

This was certainly my fault, but it would be nice if couch could take a break from spitting out error messages and not bake my processor any further while running a bad map()!

I finally was able to click off the temporary view page and found the “Stop” button.

I managed to get most of my view function squared away but then missed the quotes around the dictionary key “word”, so while it should have read:

"map": function(doc) {
    emit(doc["word"], 1);
}

"reduce": function(key, value, rereduce) {
    if (rereduce) {
        return sum(value);
    }
    else {
      return value.length;
    }
}

It didn’t and the bad line came out as:

emit(doc[word], 1);

So as you can imagine, I had to do the dance all over again. This time, after I was able to stop it I went directly to the document for the design itself and edited the code there.

I know I hit the green arrow to save, but when I went back to the design view to see the results it still had the same mistake. So I corrected it there, and quickly hit ‘Save’ and then couchdbx promptly crashed on me.

After I told OSX to restart it I got:

"The application beam.smp quit unexpectedly after it was relaunched"

So yes… sometimes software and I don’t get along. What can I say, but that it makes me a great tester!

I was able to restart couchdbx though, and it seemed to load fine, and eventually got data from a browser after the view was built.

But I got an interesting tidbit from the DBX console too:

1> [info] [<0.66.0>] 127.0.0.1 - - 'GET' /_config/native_query_servers/ 200
1> [info] [<0.86.0>] checkpointing view update at seq 92542 for keywords _design/finding
1> [error] [<0.69.0>] Uncaught error in HTTP request: {exit,normal}
1> [info] [<0.69.0>] Stacktrace: [{mochiweb_request,send,2},
             {couch_httpd,send_chunk,2},
             {couch_httpd_view,send_json_reduce_row,3},
             {couch_httpd_view,'-make_reduce_fold_funs/5-fun-1-',8},
             {couch_btree,reduce_stream_kv_node2,8},
             {couch_btree,reduce_stream_kp_node2,11},
             {couch_btree,fold_reduce,7},
             {couch_httpd_view,'-output_reduce_view/6-fun-0-',12}]
1> [info] [<0.81.0>] 127.0.0.1 - - 'GET' /keywords/_design/finding/_view/word_count?group=true 200
1>

Yep, I think I broke it yet again…

A subsequent query to:

http://localhost:5984/keywords/_design/finding/_view/word_count?group=true

Seems to show all’s well, so I thought I’d get fancy:

wget -O - http://localhost:5984/keywords/_design/finding/_view/word_count?group=true

But when I hit Control-C to cancel the get (because I realized I hadn’t redirected output to /dev/null) I got yet another stack trace:

1> [info] [<0.126.0>] 127.0.0.1 - - 'GET' /keywords/_design/finding/_view/word_count?group=true 304
1> [error] [<0.387.0>] Uncaught error in HTTP request: {exit,normal}
1> [info] [<0.387.0>] Stacktrace: [{mochiweb_request,send,2},
             {couch_httpd,send_chunk,2},
             {couch_httpd_view,send_json_reduce_row,3},
             {couch_httpd_view,'-make_reduce_fold_funs/5-fun-1-',8},
             {couch_btree,reduce_stream_kv_node2,8},
             {couch_btree,reduce_stream_kp_node2,11},
             {couch_btree,fold_reduce,7},
             {couch_httpd_view,'-output_reduce_view/6-fun-0-',12}]

So let’s just get on with the performance test I guess…

After another DBX restart (more to be sure than anything, since couchdb seems to almost enjoy dumping stack traces while still merrily marching along), I changed my URLs from:

old_url = "http://localhost:5984/%s/_view/finding/word_count" % (db_name)

to:

new_url = "http://localhost:5984/%s/_design/finding/_view/word_count" % (db_name)
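For anyone following along, the query end of the script is basically just a GET against that view; here’s a rough sketch with illustrative names (not my actual script):

import json
import urllib.request

db_name = "keywords"
url = "http://localhost:5984/%s/_design/finding/_view/word_count?group=true" % db_name

# Fetch the grouped reduce view; each row is {"key": <word>, "value": <count>}
with urllib.request.urlopen(url) as resp:
    rows = json.loads(resp.read())["rows"]

counts = dict((row["key"], row["value"]) for row in rows)

# Report the two biggest keyword counts, roughly mirroring the script's output
print(sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2])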

And I can now officially tell you (after one more stack trace) that:

$ time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m31.559s
user	0m0.730s
sys	0m0.429s

It’s still an impressive bit of performance for the functionality, and I think I’ve clearly shown its fault resistance. I just wish it didn’t come at more than 100 times the cost of the flat file.

Posted in couchdb | 3 Comments

Where are the filters for Google Reader?

If I can create filters for GMail to push messages to certain folders or automatically star things, then why can’t I create similar rules for my RSS feeds?

RSS has quickly become at least as important to me as email, so I think it deserves at least as many tools.

Posted in frustration, Google | 2 Comments