PostgreSQL Mac Installation

Just a quick note for Thanksgiving.

I’m doing some experimentation and thought, given my frustrations with Sun and Oracle, that I’d check out PostgreSQL instead of MySQL (my typical DB of choice).

Unfortunately, the Mac installer ran into trouble, reporting:

“The database cluster initialization failed.”

If you’re interested, you can check the installer’s log:

sudo less /tmp/bitrock_installer.log

And at the end you’ll probably see a failure such as:

“su: no directory”

I tried running the command manually but got the same error, and realized that’s because I had told the installer to use the user “postgres”, and that user doesn’t exist.

The installer says it will create the user, but apparently doesn’t. So if you run the initcluster.sh script as your normal user (or set the installer to use a user that already exists), you should be OK.
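Before pointing the installer (or initcluster.sh) at an account, it’s worth confirming that the account actually exists. Here is a minimal Python sketch using the standard pwd module (Unix and OS X only). The helper name user_exists is hypothetical, not part of the installer:

```python
import pwd


def user_exists(name):
    """Return True if a local Unix account called `name` exists."""
    try:
        pwd.getpwnam(name)  # raises KeyError when the account is missing
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    for account in ("postgres", "root"):
        print(account, "exists" if user_exists(account) else "is missing")
```

If “postgres” comes back missing, either create that account first or give the installer a user that does exist.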

Happy Turkey Day and hope that helps someone!!

Posted in frustration | 1 Comment

Processing… Processing

Like most people I’ve been bitten by the “social networks take up all my blogging time” bug.

But in reality, I think that’s OK. I once read someone arguing that SMS is becoming the polite way of conversing, as opposed to a phone call, because it lets the recipient respond when it’s convenient for them rather than when it’s convenient for the initiator.

So blogging remains a ‘slow and steady’ system to support the hivemind: convenient for the author, and for the searcher.

With that in mind, here’s a bit of code I wrote a while back to create a Processing visualization.

I’ve always loved ‘clock’ charts, where data is presented around a circle from 1 o’clock to 12 o’clock, but never had a solid reason to use one, unlike the Sunlight Foundation with its excellent example visualizing transparency.

However, ‘recently’ (in blogger time) I had a unique opportunity at work to apply the technique to visualizing how participants interacted with a site over time.

Here’s what I came up with:

[Figure: User Website Activity, a ‘clock’ chart of activity]

This represents how often users interacted with our site during its availability period (which spanned a number of days).

I first created the data with Python, taking the time portion of each action and counting the actions per hour, which I recorded in both East Coast (blue) and West Coast (orange) time. You’ll notice that the data is exactly the same in intensity, simply shifted by the three-hour time difference. This is because we had no way of knowing where users were connecting from and wanted to look at the overall data to discern any patterns (and my brain doesn’t have a timezone converter built in).
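The original Python script isn’t shown, so here is a rough sketch of the same idea, assuming the timestamps are already datetime objects in Eastern time: count actions per hour of day, shift the counts three hours earlier for the Pacific view, and print Processing-style array literals to paste into the sketch. The function names and sample timestamps are hypothetical. Applied to the mod_est array in the code, shift_hours(..., -3) reproduces the mod_pdt array exactly.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count actions per hour of day (0-23), ignoring the date."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def shift_hours(counts, offset):
    """Move every count by `offset` hours, wrapping around midnight.

    An action at Eastern hour h happened at Pacific hour h - 3, so
    shift_hours(eastern_counts, -3) gives the Pacific view.
    """
    return [counts[(hour - offset) % 24] for hour in range(24)]


def to_processing_array(name, values):
    """Format a list of ints as a Processing int[] declaration."""
    return "int[] %s = {%s};" % (name, ", ".join(str(v) for v in values))


if __name__ == "__main__":
    # Hypothetical sample actions, assumed to be recorded in Eastern time
    actions = [datetime(2009, 11, 2, 8, 15), datetime(2009, 11, 2, 8, 40),
               datetime(2009, 11, 3, 12, 5)]
    eastern = hourly_counts(actions)
    pacific = shift_hours(eastern, -3)
    print(to_processing_array("eastern_values", eastern))
    print(to_processing_array("pacific_values", pacific))
```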

Looking at the data, it should be fairly obvious that:

  • An “East Coast” interpretation means that users interacted with the site either as the first thing in their day (around 8 AM) or during the latter part of lunch (noon and 1:00 PM).
  • Meanwhile, “West Coast” users probably responded before lunch (around 11 AM) or towards the end of the day.

I was able to build upon a great example from Jer Thorp, and I apologize greatly for butchering his solution. Thankfully I didn’t need anything nearly as complex (and didn’t have to fight Java for JSON data, since I embedded my arrays directly in the code).

So in case it helps anyone else, here’s my solution with all its warts (e.g. I know the data keys are drawn twice and there’s some manual tweaking to the drawHeight for sizing):

import processing.opengl.*;

int maxVal = 0;           //Largest value seen so far across all data series (used to normalize bar lengths)
int localMax = 0;         //Largest value seen so far while scanning a series (never reset between series)
float drawHeight = 0.45;  //Portion of the screen height that the largest bar takes up
//.45 for hours .65 for dates
float drawWidth = .25;    //Portion of each slot's width (xinc) that a bar takes up

int center_size = 10;
int border = 1;           //Border between bars
int lastTotal = 0;        

color[] colours = { #400101, #E46D0A, #009BF1 }; //Graph Colours
color textColor1 = #D98D30;
color textColor = #333333;
color backColor = #F2F2F2;
color curColor;

// Time Shifted Data built with a separate Python script 
String[] Hours = {"00:00", "01:00", "02:00", "03:00", "04:00", "05:00", "06:00", "07:00", "08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"};

// Drawn in orange (colours[1])
int[] mod_est_Activity_Hours_Values = {3, 0,0,0,0,0,0, 3, 7, 9, 10, 26, 4, 7, 6, 20, 38, 10, 1, 5, 3, 2, 1, 1};
// The same counts moved three hours earlier; drawn in blue (colours[2])
int[] mod_pdt_Activity_Hours_Values = {0,0,0,0, 3, 7, 9, 10, 26, 4, 7, 6, 20, 38, 10, 1, 5, 3, 2, 1, 1,3, 0,0};
// end generated Data

void draw() {
  // Nothing to animate: the chart is drawn once in setup() and saved
};

void setup() {
  //Set the size of the stage first (1200 wide by 800 tall) & set the background
  size(1200, 800);
  frameRate(60);
  background(backColor);
  smooth();

  //The .vlw font must be in the sketch's data folder
  PFont font = loadFont("Meta-Normal-48.vlw");
  textFont(font);

  curColor = colours[1];
  drawData(Hours, mod_est_Activity_Hours_Values, Hours.length);

  curColor = colours[2];
  drawData(Hours, mod_pdt_Activity_Hours_Values, Hours.length);

  save("visual.png");  //Written to the sketch folder
};

void drawKey(String label, float xinc, float theta) {
      float y = 0;
      float x = .5;

      pushMatrix();
      //Offset the label from the bar, reusing the drawWidth/drawHeight tweaks
      translate(x + xinc/2 * drawWidth, y - (xinc * 2) * drawHeight);

      //Draw the label rotated a quarter turn so it runs along the bar
      rotate(-PI/2);
      fill(textColor);

      textSize(max(xinc/2, 13));
      text(label, 0, 0);

      popMatrix();  //theta is unused here: the caller has already rotated the matrix
};

void drawData(String[] keys, int[] data, int len) {
  parseData(keys, data, len);

  fill(curColor);
  float xinc = float(width)/len;

  //Move to the center of the screen
  pushMatrix();
  translate(width/2, height/4);
  noStroke();

  //Draw each value as a bar
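  for (int i = 0; i < len; i++) {
    //Same colour for every bar, with a random alpha for some texture
    color c = color(red(curColor), green(curColor), blue(curColor), random(100,255));
    fill(c);

    float h = float(data[i])/float(maxVal);  //Bar length relative to the largest value seen so far
    float theta = i * TWO_PI / len;          //Angle of this slot around the clock face

    //This translate sits outside the inner push/pop, so the offsets accumulate from bar to bar
    translate(cos(theta) * center_size, sin(theta) * center_size);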

    //Rotate
    pushMatrix();
    rotate(theta);

    //Draw the bar
    rect(0, 0, xinc * drawWidth, -h * height * drawHeight);

    drawKey(keys[i], xinc, theta);

    popMatrix();
  };

  popMatrix(); //back to original center
};

// Tracks the largest value (capped at 3000) so bar lengths can be normalized
void parseData(String[] keys, int[] data, int len) {
  for (int i = 0; i < len; i++) {
    if (data[i] > localMax) {
      localMax = min(data[i], 3000);
      if (localMax > maxVal) maxVal = localMax;
    };
  };
};

To make it work, you’ll need the Meta-Normal-48.vlw font in the sketch’s data folder (Processing’s Tools > Create Font will generate one; see Jer’s original tutorial), but it should be relatively straightforward after that.

Cheers!

Posted in code, visualization | 2 Comments

I love Chrome Cookies

OK,

I know it’s been way too long since I’ve posted and this isn’t intended as a heartfelt explanation, merely a reference for those in need.

I’ve been hacking up a storm and shoehorning my way around problems and this is no different.

If you need to use wget with a site that requires authentication, you need to dump some cookies from Chrome (or Firefox), because both browsers have moved from a cookies.txt file to an SQLite (sqlite3) database.

Here’s how (the separator must be a literal tab: when typing this at the command line, hit ^V then <Tab> after the first ‘, or in bash use $'\t' as below):

sqlite3 -separator $'\t' Cookies 'select host_key, httponly, path, secure, expires_utc, name, value from cookies' > ~/Sites/uservice/chrome_cookies.txt

Unfortunately that didn’t work for me (I’m not sure why wget wasn’t reading the file correctly), so I ended up using Firebug to look at the HTTP headers and then running wget with:

wget --no-cookies --header="Cookie: name1=v1; name2=v2; name3=v3" <url>

It’s actually way easier than it looks to get your header right: just copy and paste what you see in Firebug (open the “Net” panel and have it show all requests; what you want comes from the first request sent).
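The same copy-the-header trick works from Python. Here is a minimal sketch using the standard urllib.request module; the cookie names, values and URL are placeholders (hypothetical) to be replaced with what Firebug shows:

```python
import urllib.request


def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def authenticated_request(url, cookies):
    """Build a GET request that carries the browser's session cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


if __name__ == "__main__":
    req = authenticated_request(
        "http://example.com/protected/page",  # placeholder URL
        {"name1": "v1", "name2": "v2", "name3": "v3"},
    )
    print(req.get_header("Cookie"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
```

Building the request separately from fetching it makes it easy to inspect the header before sending anything.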

I figured this out with some help but mostly by my lonesome given the differences for Chrome.
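I never pinned down why wget ignored the dump, but one plausible culprit (an assumption, not verified) is that the query’s columns don’t line up with the Netscape cookies.txt layout that wget’s --load-cookies option reads: domain, include-subdomains flag, path, secure flag, expiry in Unix seconds, name and value, separated by tabs, with TRUE/FALSE for the flags. The query puts httponly where the subdomain flag belongs, and Chrome stores expires_utc as microseconds since 1601-01-01. Here is a rough Python converter, assuming the cookies table has the columns used above (newer Chrome releases may keep the value encrypted in a different column) and treating a leading dot on host_key as the include-subdomains flag; the function names are hypothetical.

```python
import sqlite3

# Seconds between Chrome's epoch (1601-01-01) and the Unix epoch (1970-01-01)
EPOCH_DELTA_SECONDS = 11644473600


def chrome_to_unix(expires_utc):
    """Convert Chrome's microseconds-since-1601 to Unix seconds (0 = session cookie)."""
    if expires_utc <= 0:
        return 0
    return max(0, expires_utc // 1_000_000 - EPOCH_DELTA_SECONDS)


def netscape_lines(conn):
    """Yield tab-separated cookies.txt lines from a Chrome-style cookies table."""
    rows = conn.execute(
        "SELECT host_key, path, secure, expires_utc, name, value FROM cookies")
    for host, path, secure, expires_utc, name, value in rows:
        subdomains = "TRUE" if host.startswith(".") else "FALSE"
        yield "\t".join([host, subdomains, path, "TRUE" if secure else "FALSE",
                         str(chrome_to_unix(expires_utc)), name, value])


def export_cookies(db_path, out_path):
    """Write a Netscape-format cookies file from a Chrome Cookies database."""
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w") as out:
            out.write("# Netscape HTTP Cookie File\n")
            for line in netscape_lines(conn):
                out.write(line + "\n")
    finally:
        conn.close()
```

It’s safer to run this against a copy of the Cookies file, ideally with Chrome closed, and then hand the result to wget with --load-cookies.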

Posted in frustration, hacks | 1 Comment

Turing Test for Clouds

One of the ‘trends’ in programming is duck typing (often mentioned in the same breath as monkey patching), which bypasses fixed static types and is used in more dynamic languages. I internalize the technique as: “if it looks like a duck, walks like a duck and quacks like a duck… then who cares what it really is”.

Yeah, as a philosophy I know it lacks nuance, but it’s worked well historically, so let that dog hunt.

Another important bit of geek trivia is the famous Turing Test. If you’re here and don’t know what that is (or how to find out), then you should move along now; this isn’t the droid you’re looking for.

Simplified, the Turing Test and duck typing both suggest that explicit identification isn’t practical; rather, implicit behavior should define how something is used. It’s a very expedient supposition that anyone who’s dealt with contracts would envy.
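In Python, the ‘if it walks like a duck’ approach is usually called duck typing: code calls the behavior it needs and never asks what type it was handed. A toy sketch (the class names are made up):

```python
class Duck:
    def quack(self):
        return "Quack!"


class RobotDuck:
    """Not a Duck subclass, but it behaves like one where it matters."""

    def quack(self):
        return "QUACK.EXE"


def make_it_quack(thing):
    # No isinstance() check: anything with a quack() method is welcome
    return thing.quack()


if __name__ == "__main__":
    for thing in (Duck(), RobotDuck()):
        print(make_it_quack(thing))
```

Only behavior is judged, which is exactly the Turing-style bargain: pass the interrogation and nobody looks behind the curtain.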

What does this mean for cloud, given that NIST has done a nice job of defining cloud in practical terms?

As a buzzword, cloud’s seen more than its fair share of hype:

[Figure: Google Trends chart for “cloud computing”]

So everyone’s been trying to claim the moniker, and today I was reading about a ‘cloud-based product’ that was really just a web portal, much like Walmart’s. Though I’m sure it can accurately claim to be cloud under a number of definitions, my instinct was “No, definitely not.”

However, a colleague replied to my skepticism, saying: “It underlines that there are already commonplace applications in use that are legitimately ‘cloud’.”

Where do you stand on such a claim? That online shopping or market makers such as eBay are SaaS cloud services?

Underlying it all are deep philosophical questions as integral to humanity’s future as determining where the soul resides!

  • What if I have an amazingly dynamic and responsive application, run by monkeys behind the curtain?
  • Would I be cloud computing if I used Twitter via snail mail?
  • Does my subdivision’s swimming pool qualify as IaaS, with its broad network access (i.e. roads), rapid elasticity (easy capacity management) and measured service (towel charge), if there’s no lifeguard (on-demand self-service)? Surely you don’t need me to explain “resource pooling”.

Strictly speaking, I’m not sure where I stand, but I think Turing would tell me to go with the duck, and even a million monkeys patching the pool shouldn’t change my mind.

Posted in cloud_computing, technology | 1 Comment

couchdb coming back for more

Not that long ago, JChris pointed out that not only was there a new version of CouchDB out, but Janl had also released a new version of his OS X package, CouchDBX!!

So I knew I needed to find a time to try both new versions out.

‘Thankfully’, I can’t really get to sleep right now so I thought I’d try to be productive and give them both a go again with my small performance test.

And here are the latest results.

First, here’s a baseline, which, if you recall, loads the file from disk:

$ time ./finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m0.259s
user	0m0.216s
sys	0m0.041s

Now for couchdb’s results. Here’s the portion of time required for the database load:

$ time ./couchdb_finding_keywords.py
real	16m53.912s
user	2m57.409s
sys	1m35.209s

This is down quite substantially from the 28 minutes the previous version took to load.

Rather than run the timing for the loading stage again (since it’s clearly way beyond the time required to analyze the text file), I thought I’d jump to an actual query.

Unfortunately in the process of running the real test I realized I hadn’t created the necessary views for the new database.

Then, in doing so, I made a typo in my map() function and had to wait through many, many error messages like:

OS Process :: function raised exception (ReferenceError: worse is not defined) with doc._id ############

This was certainly my fault, but it would be nice if CouchDB could take a break from spitting out error messages instead of baking my processor by continuing to run a bad map()!

I finally was able to click off the temporary view page and found the “Stop” button.

I managed to get most of my view function squared away but then missed the quotes around the dictionary key “word”, so while it should have read:

"map": function(doc) {
    emit(doc[\"word\"], 1);
}

"reduce": function(key, value, rereduce) {
    if (rereduce) {
        return sum(value);
    }
    else {
      return value.length;
    }
}

It didn’t and the bad line came out as:

emit(doc[word], 1);

So as you can imagine, I had to do the dance all over again. This time, after I was able to stop it I went directly to the document for the design itself and edited the code there.

I know I hit the green arrow to save, but when I went back to the design view to see the results it still had the same mistake. So I corrected it there, and quickly hit ‘Save’ and then couchdbx promptly crashed on me.

After I told OSX to restart it I got:

"The application beam.smp quit unexpectedly after it was relaunched"

So yes… sometimes software and I don’t get along. What can I say, but that it makes me a great tester!

I was able to restart CouchDBX, though; it seemed to load fine and, once the view was built, eventually returned data to a browser.

But I also got an interesting tidbit from the DBX console too:

1> [info] [<0.66.0>] 127.0.0.1 - - 'GET' /_config/native_query_servers/ 200
1> [info] [<0.86.0>] checkpointing view update at seq 92542 for keywords _design/finding
1> [error] [<0.69.0>] Uncaught error in HTTP request: {exit,normal}
1> [info] [<0.69.0>] Stacktrace: [{mochiweb_request,send,2},
             {couch_httpd,send_chunk,2},
             {couch_httpd_view,send_json_reduce_row,3},
             {couch_httpd_view,'-make_reduce_fold_funs/5-fun-1-',8},
             {couch_btree,reduce_stream_kv_node2,8},
             {couch_btree,reduce_stream_kp_node2,11},
             {couch_btree,fold_reduce,7},
             {couch_httpd_view,'-output_reduce_view/6-fun-0-',12}]
1> [info] [<0.81.0>] 127.0.0.1 - - 'GET' /keywords/_design/finding/_view/word_count?group=true 200
1>

Yep, I think I broke it yet again…

A subsequent query to:

http://localhost:5984/keywords/_design/finding/_view/word_count?group=true

seems to show all’s well, so I thought I’d get fancy:

wget -O - 'http://localhost:5984/keywords/_design/finding/_view/word_count?group=true'

But when I hit Control-C to cancel the get (because I realized I hadn’t redirected output to /dev/null) I got yet another stack trace:

1> [info] [<0.126.0>] 127.0.0.1 - - 'GET' /keywords/_design/finding/_view/word_count?group=true 304
1> [error] [<0.387.0>] Uncaught error in HTTP request: {exit,normal}
1> [info] [<0.387.0>] Stacktrace: [{mochiweb_request,send,2},
             {couch_httpd,send_chunk,2},
             {couch_httpd_view,send_json_reduce_row,3},
             {couch_httpd_view,'-make_reduce_fold_funs/5-fun-1-',8},
             {couch_btree,reduce_stream_kv_node2,8},
             {couch_btree,reduce_stream_kp_node2,11},
             {couch_btree,fold_reduce,7},
             {couch_httpd_view,'-output_reduce_view/6-fun-0-',12}]

So let’s just get on with the performance test I guess…

After another DBX restart (more to be sure than anything, since CouchDB seems almost to enjoy dumping stack traces while still merrily marching along), I changed my URLs from:

old_url = "http://localhost:5984/%s/_view/finding/word_count" % (db_name)

to:

new_url = "http://localhost:5984/%s/_design/finding/_view/word_count" % (db_name)
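For reference, here is a minimal sketch of what the query side of a script like couchdb_finding_keywords.py might look like (the real script isn’t shown, and the function names are hypothetical). It requests the grouped view and turns the JSON rows, which CouchDB returns as {"rows": [{"key": ..., "value": ...}]}, into a dictionary:

```python
import json
import urllib.request

VIEW_URL = "http://localhost:5984/%s/_design/finding/_view/word_count?group=true"


def parse_view_rows(body):
    """Turn a grouped reduce response into a {key: value} dict."""
    return {row["key"]: row["value"] for row in json.loads(body)["rows"]}


def word_counts(db_name):
    """Fetch per-word counts from the word_count view (needs CouchDB running)."""
    with urllib.request.urlopen(VIEW_URL % db_name) as response:
        return parse_view_rows(response.read().decode("utf-8"))
```

Calling word_counts("keywords") against a running CouchDB returns one reduced count per word; keeping the parsing separate from the HTTP call makes it testable without a server.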

And I can now officially tell you (after one more stack trace) that:

$ time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m31.559s
user	0m0.730s
sys	0m0.429s

It’s still an impressive bit of performance for the functionality, and I think I’ve clearly shown its fault resistance. I just wish it didn’t come at more than 100 times the cost of the flat file (about 31.6 seconds versus 0.26).
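To make the view itself less of a black box, here is a rough pure-Python simulation of what the map() and reduce() functions above compute when queried with ?group=true. It only sketches the semantics: CouchDB really stores intermediate reductions in a B-tree and decides for itself when to call reduce() with rereduce set, and its key collation differs from Python’s sort. The sample documents are made up.

```python
from itertools import groupby


def map_doc(doc):
    """Mirror of the view's map(): emit (word, 1) for every document."""
    yield doc["word"], 1


def reduce_values(values, rereduce):
    """Mirror of the view's reduce(): count on the first pass, sum on re-reduce."""
    return sum(values) if rereduce else len(values)


def group_reduce(docs):
    """Emulate ?group=true: one reduced value per distinct key."""
    emitted = sorted(pair for doc in docs for pair in map_doc(doc))
    result = {}
    for key, pairs in groupby(emitted, key=lambda pair: pair[0]):
        values = [value for _, value in pairs]
        # Split each group in two so the rereduce path gets exercised too
        middle = len(values) // 2 or 1
        partials = [reduce_values(values[:middle], False)]
        if values[middle:]:
            partials.append(reduce_values(values[middle:], False))
        result[key] = reduce_values(partials, True)
    return result


if __name__ == "__main__":
    docs = [{"word": "Hacker"}, {"word": "Techcrunch"}, {"word": "Hacker"}]
    print(group_reduce(docs))
```

Splitting the groups shows why the reduce needs both branches: the first pass counts raw emits, while the rereduce pass must add up counts rather than count them again.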

Posted in couchdb | 3 Comments

Where are the filters for Google Reader?

If I can create filters in Gmail to push messages into certain folders or automatically star them, then why can’t I create similar rules for my RSS feeds?

RSS has quickly become at least as important to me as email, so I think it deserves at least as many tools.

Posted in frustration, Google | 2 Comments

Can Android Equal Apple?

Here’s an idea for any aspiring hacker out there.

Find a way to make an Android phone mimic an iPhone when it’s connected to a computer.

Users will gain the ability to use iTunes to sync music and podcasts (and possibly Apple Apps too if the emulation went that far).

More important than just leveraging a known user interface, it provides an obvious migration path off Apple’s proprietary, lock-in platform.

Posted in Apple, hacks | 2 Comments

Skinned Programming Paradigms

Here’s a free thought for you.

How much of people’s choice of programming language really depends on syntax?

For example, I dislike Java (I also hate it for other reasons) simply because of the verbosity of ‘System.out.println’, and I don’t really understand why Scala would choose ‘println’ instead of Python’s terse ‘print’.

And I’m pretty sure that, despite overt rationalizations like ‘saving myself keystrokes’, that’s just a petty reason.

However, what I learned in compiler construction is that the front end (the tokenizer and parser) is really separate from the rest of the language.

So, for example, there’s no reason there couldn’t be a plugin for Java that let me write in Python’s syntax, or vice versa. Such a technique might require a little library support, but I suspect adding Python’s map() even to C/C++ would be fairly trivial.

We should be able to ‘skin’ our languages with our syntax of choice regardless of the underlying compiler, JVM or bytecode.
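As a toy illustration of that front end/back end split (nothing like a real language plugin), here two different surface syntaxes, a Lisp-style prefix form and ordinary infix, are parsed into the same tuple-based tree and run by one shared evaluator. Only integer +, - and * are supported, and the infix ‘skin’ simply borrows Python’s own parser via the ast module:

```python
import ast
import operator

# Shared "back end": one evaluator for a tiny tree of ints and (op, left, right) tuples
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


def parse_prefix(text):
    """Lisp-style skin, e.g. "(+ 1 (* 2 3))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            op = tokens[pos + 1]
            left, pos = read(pos + 2)
            right, pos = read(pos)
            return (op, left, right), pos + 1  # step past the closing ")"
        return int(tokens[pos]), pos + 1

    tree, _ = read(0)
    return tree


def parse_infix(text):
    """Conventional skin, e.g. "1 + 2 * 3"."""
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def convert(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in symbols:
            return (symbols[type(node.op)], convert(node.left), convert(node.right))
        raise ValueError("unsupported syntax: " + ast.dump(node))

    return convert(ast.parse(text, mode="eval").body)


if __name__ == "__main__":
    for source, parser in (("(+ 1 (* 2 3))", parse_prefix), ("1 + 2 * 3", parse_infix)):
        tree = parser(source)
        print(source, "->", tree, "=", evaluate(tree))
```

Both lines print the same tree and the same result, 7; the evaluator never knows which syntax the program was written in.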

If this were possible, then ‘language wars’ could be less about syntax and interface (à la Emacs vs. vi) and more about the underlying value of the language itself.

If we can theme operating systems and user interfaces, then why not programming languages?

Posted in code, frustration, inspiration | 5 Comments

Welcome to the White House State of Confusion

I know it’s easy to sit on the sidelines and poke fun at people trying to actually do something. And we’ve been given many reasons to respect the technical proficiency of the recent administration’s IT personnel.

However, here’s an example of a drop-down box, from a section of the White House site, that seems frustratingly naive:

Does anyone see a problem?

For starters it’s not alphabetical, which makes finding anything atrocious!

However, beyond that I see three entries for “Departmental Administration”!

That’s what happens when you don’t sanitize your data: you can’t sanitize what you do with it!
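Here is a minimal Python sketch of the kind of clean-up that would have avoided this: trim and collapse whitespace, drop case-insensitive duplicates, and sort alphabetically before building the drop-down. Apart from “Departmental Administration”, the sample entries are hypothetical.

```python
def sanitize_options(options):
    """Return unique, alphabetized drop-down labels."""
    unique = {}
    for raw in options:
        label = " ".join(raw.split())  # trim and collapse internal whitespace
        if not label:
            continue                   # skip blank entries
        key = label.casefold()         # compare case-insensitively
        unique.setdefault(key, label)  # keep the first spelling seen
    return sorted(unique.values(), key=str.casefold)


if __name__ == "__main__":
    raw = ["Departmental Administration", "Forest Service",
           "Departmental Administration ", "Agricultural Research Service",
           "departmental  administration"]
    for label in sanitize_options(raw):
        print(label)
```

Normalizing before comparing matters: three near-identical “Departmental Administration” entries only collapse into one once stray spaces and capitalization are ignored.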

Posted in frustration | 4 Comments

The Reciprocal World of IT and Business

As an Enterprise Architect, you frequently hear that technology must support a business need. It’s a cliché, yet an accurate reminder that technologists often deploy something that doesn’t best solve the problem.

Although play and creativity have a place, even in business, no IT environment can survive long without supplying the business with the means to meet its objectives. There is surely no better route to bankruptcy than wasting time and money, which is what happens when IT divorces itself from the business.

However, often overlooked is the reciprocal need for the business to support IT.

This doesn’t happen when a CIO or CTO is relegated to the back office and denied a seat at the executive table; the effects are more insidious, though no less disastrous in the end. Or imagine being asked to run a massive IT department with the wrong skills, or responding to a mandate for change without the ability to make the proper investments.

For any organization to succeed it’s important to realize that all offices must be imbued with the same driving passion and resources for success.

Posted in business, enterprise, management | Comments Off