Archive for the ‘code’ Category

Skinned Programming Paradigms

Monday, October 19th, 2009

Here’s a free thought for you.

How much of people’s choice of programming language is really syntax-dependent?

For example, I dislike Java (I also hate it for other reasons) because of the verbosity of ‘System.out.println’, and I don’t really understand why Scala would choose ‘println’ over Python’s terse ‘print’.

And I’m pretty sure that, despite overt rationalizations like ‘saving myself keystrokes’, that’s just a petty reason.

However, what I learned in compiler construction is that the parser and tokenizer are really separate from the language’s semantics.

So, for example, there’s no reason there couldn’t be a plugin for Java that let me write in Python’s syntax, or vice versa. Such a technique might require a little bit of library support, but I suspect adding Python’s ‘map()’ even to C/C++ would be fairly trivial.
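As a rough illustration of the idea, here is a minimal sketch that uses Python’s own `tokenize` module to swap a hypothetical `println` “skin” for Python’s `print` at the token level, before the compiler ever sees the code. The `SKIN` table and the `unskin()` helper are made up for this example; a real syntax plugin would need far more than a name-for-name swap.

```python
import ast
import io
import tokenize

# Hypothetical skin: surface-syntax names mapped to the underlying language's names.
SKIN = {"println": "print"}


def unskin(source):
    """Rewrite skinned NAME tokens to their Python equivalents.

    Only NAME tokens are touched, so a string literal such as 'println'
    passes through unchanged.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in SKIN:
            tok = tok._replace(string=SKIN[tok.string])
        out.append(tok)
    return tokenize.untokenize(out)


skinned = "println('hello, skin')\n"
plain = "print('hello, skin')\n"

# Both spellings end up as the same abstract syntax tree.
assert ast.dump(ast.parse(unskin(skinned))) == ast.dump(ast.parse(plain))
```

Working on tokens rather than raw text is what keeps the swap from touching strings and comments; it is the same separation between surface syntax and the rest of the compiler described above.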

We should be able to ‘skin’ our languages with our syntax of choice, regardless of the underlying compiler, JVM or bytecode.

If this were possible, then ‘language wars’ could be less about syntax and interface (à la emacs vs. vi) and more about the underlying value of the language itself.

If we can theme operating systems and user interfaces, then why not programming languages?

Cloudera’s Hadoop Education

Sunday, June 14th, 2009

A while back, after Cloudera released their lectures and VMware image for Hadoop, I watched the training sessions and worked through some of the initial exercises.

I must say I was a little disappointed by the videos, but I believe that’s because I’d already seen Christophe Bisciglia’s lectures from when he was still at Google.

However, the exercises are definitely something to get you thinking and are worth a shot. It’s sort of like ‘programming golf’, and I thought I’d share my version of the first map function alongside the packaged solution.

Here’s my map function:

import sys, re

WORDS = re.compile(r'(\w+)')
PARSER = re.compile('(.+?)\t(.+?)\n')  # key <TAB> text <NEWLINE>

for line in sys.stdin.readlines():
    m = PARSER.match(line)
    if m:
        key = m.groups()[0]
        # emit one "word<TAB>key" pair per word in the text
        for word in WORDS.findall(m.groups()[1]):
            print("%s\t%s" % (word, key))

Cloudera’s version is:

import re
import sys
NONALPHA = re.compile(r"\W")

for input in sys.stdin.readlines():
    keyline = input.split("\t", 1)
    if (len(keyline) == 2):
        (key, line) = keyline
        for w in NONALPHA.split(line):
            if w:
                print(w + "\t" + key)

In principle they should produce the same output, i.e. identical mappings, and barring buggy corner cases mine certainly passed the test.

What I found interesting was my instinctive desire to let regexps do the work, whereas their version relies on a simple “split()” to break up the input. Theirs is likely the faster solution, and given the massive amounts of data in large passes, it’s worth benchmarking.
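If you want to try that benchmark, here’s a minimal sketch, assuming both mappers are wrapped as functions that take an iterable of lines and return (word, key) pairs instead of printing. `regex_mapper`, `split_mapper` and the synthetic `SAMPLE` input are made up for this comparison, and the timings you get will depend on your data and Python version.

```python
import re
import timeit

WORDS = re.compile(r"(\w+)")
PARSER = re.compile(r"(.+?)\t(.+?)\n")
NONALPHA = re.compile(r"\W")


def regex_mapper(lines):
    """Regex-driven version: one pattern parses the line, another finds words."""
    pairs = []
    for line in lines:
        m = PARSER.match(line)
        if m:
            key, text = m.groups()
            pairs.extend((word, key) for word in WORDS.findall(text))
    return pairs


def split_mapper(lines):
    """split()-driven version modelled on the Cloudera solution."""
    pairs = []
    for line in lines:
        keyline = line.split("\t", 1)
        if len(keyline) == 2:
            key, text = keyline
            pairs.extend((w, key) for w in NONALPHA.split(text) if w)
    return pairs


# Synthetic input: "key<TAB>text" lines, each ending in a newline.
SAMPLE = ["doc%d\tthe quick brown fox jumps over the lazy dog\n" % i
          for i in range(1000)]

assert regex_mapper(SAMPLE) == split_mapper(SAMPLE)

for mapper in (regex_mapper, split_mapper):
    seconds = timeit.timeit(lambda: mapper(SAMPLE), number=20)
    print("%s: %.3fs" % (mapper.__name__, seconds))
```

A harness like this also exposes one of those corner cases: a final line with no trailing newline is silently skipped by the regex version (its pattern requires the `\n`), while the split version still maps it.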

However, although I’m clearly biased, I must admit I found mine easier to grok, and it should be more flexible, e.g. the input pattern could become a parameter rather than being hard-coded into the flow.
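Here’s a sketch of that parameterization, assuming the mapper is refactored into a function factory: the line pattern and the word pattern become arguments, with the tab-separated layout as the default. The `make_mapper` name and the pipe-separated example are hypothetical.

```python
import re


def make_mapper(line_pattern=r"(.+?)\t(.+)", word_pattern=r"\w+"):
    """Build a mapper whose input layout is a parameter, not hard-coded.

    line_pattern must capture two groups: the key and the text to split.
    """
    parser = re.compile(line_pattern)
    words = re.compile(word_pattern)

    def mapper(lines):
        for line in lines:
            # Strip the newline here so the pattern doesn't have to require it.
            m = parser.match(line.rstrip("\n"))
            if m:
                key, text = m.groups()
                for word in words.findall(text):
                    yield "%s\t%s" % (word, key)

    return mapper


tab_mapper = make_mapper()
pipe_mapper = make_mapper(line_pattern=r"(.+?)\|(.+)")

print(list(tab_mapper(["doc1\tHello world\n"])))
print(list(pipe_mapper(["doc2|Hello again\n"])))
```

Stripping the newline before matching, rather than putting `\n` in the pattern, also means a last line without a trailing newline is still mapped.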

There’s certainly not a “right” way to do it, other than one that works. The advantage of the MapReduce model is that the necessary code is often really short and easy to modify, but I thought others might find it interesting to realize that Perl doesn’t have an exclusive license on ‘TMTOWTDI’ (There’s More Than One Way To Do It).

CouchDB Performance – Too much TCP

Sunday, May 31st, 2009

It’s been a while since I ran my CouchDB performance test, but many of the comments I received suggested that updating my codebase should yield some significant performance improvements. Unfortunately, at the time I didn’t have spare cycles to invest in building the latest branches of erlang, couchdb and everything else, so I hadn’t previously been able to rerun my tests.

However, I started a new project today and, like most developers, I took some time to sharpen my tools before I felt sufficiently prepared to proceed. Of course, since one of my favorite tools is CouchDB itself, I checked in to see how it had been progressing, and I was thrilled to see that Janl (along with, it looks like, other contributors) had released a new version of the excellent DBX bundle!

So after a round of updating DBX and CouchDB python library components, I decided to suffer a small distraction and give the new code a test drive.
I wanted to check my baseline, so here’s a rough time sample for the original, file-based keywords code:

time ./finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real    0m0.329s
user    0m0.225s
sys    0m0.046s

I ran the initial load and it looks much the same as the previous test:

time ./couchdb_finding_keywords.py

real    28m16.430s
user    2m55.550s
sys    1m30.335s

So that’s roughly on par with the previous load (about 27 minutes), and on a second test run this actually took more than 39 minutes!

Well, now that the load is out of the way, let’s see how our queries are looking.

After making a view, the results with wget aren’t any more promising than last time (note that the view location has changed):

wget -O - http://localhost:5984/keywords/_design/finding/_view/word_count?group=true > /dev/null
--20:35:08--  http://localhost:5984/keywords/_design/finding/_view/word_count?group=true
           => `-'
Resolving localhost... 127.0.0.1, ::1, fe80::1
Connecting to localhost|127.0.0.1|:5984... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]

    [                     <=>                        ] 422,776       12.54K/s             

20:35:40 (12.94 KB/s) - `-' saved [422776]

Alas, it doesn’t look like the performance improvements have paid off for this test case; in fact, every run I tried was slower than with the last version.
Here’s a sample run which is fairly indicative of the rest:

time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m52.659s
user	0m0.702s
sys	0m0.441s

And again, with the full profiling output:

time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]
>>>---- Begin profiling print
         788 function calls (709 primitive calls) in 51.297 CPU seconds

   Ordered by: internal time, call count
   List reduced from 118 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2   51.092   25.546   51.092   25.546 socket.py:278(read)
        2    0.049    0.024    0.049    0.024 decoder.py:320(raw_decode)
        1    0.040    0.040    0.040    0.040 ic.py:182(__contains__)
       14    0.036    0.003    0.036    0.003 socket.py:321(readline)
        1    0.020    0.020    0.020    0.020 couchdb_finding_keywords.py:61(build_prob_dict)
        1    0.011    0.011    0.019    0.019 ic.py:1()
        1    0.011    0.011   51.297   51.297 couchdb_finding_keywords.py:68(find_keyword)
        8    0.008    0.001    0.008    0.001 :1(connect)
        1    0.007    0.007    0.066    0.066 urllib.py:1296(getproxies_internetconfig)
        1    0.005    0.005   51.228   51.228 couchdb_finding_keywords.py:41(all_word_count)
        1    0.003    0.003    0.069    0.069 urllib.py:1329(getproxies)
        1    0.003    0.003    0.003    0.003 Res.py:1()
        1    0.002    0.002    0.002    0.002 File.py:1()
        1    0.001    0.001    0.002    0.002 macostools.py:5()
        2    0.001    0.001    0.001    0.001 socket.py:229(close)
        2    0.001    0.000    0.002    0.001 httplib.py:659(connect)
        2    0.001    0.000    0.005    0.002 httplib.py:224(readheaders)
     11/6    0.001    0.000    0.001    0.000 sre_parse.py:385(_parse)
        1    0.000    0.000    0.000    0.000 ic.py:161(__init__)
        2    0.000    0.000    0.000    0.000 httplib.py:323(__init__)

>>>---- End profiling print

real	0m52.023s
user	0m0.732s
sys	0m0.437s
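The listing above has the shape of a standard `cProfile`/`pstats` report, sorted by internal time and call count and trimmed to 20 entries. For anyone who wants the same kind of breakdown, here’s a minimal sketch assuming you wrap the script’s entry point in a helper; `profile_call` is made up for illustration and is not the script’s actual profiling code. (For a whole script, `python -m cProfile -s time script.py` gives a similar report.)

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=20):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by internal time, then call count, and keep the top `limit` rows.
    stats.sort_stats("time", "calls").print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    # Stand-in workload; in the keyword script you would pass find_keyword.
    result, report = profile_call(sorted, list(range(100000, 0, -1)))
    print(report)
```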

I’m no Erlang expert, but seeing nearly all of the 51 seconds spent inside just two socket read() calls still makes me suspect that some TCP-level tuning (window size & buffering) might be helpful.
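One way to probe that suspicion from the client side is to split a request into time-to-headers (how long CouchDB takes before it starts answering) and total time (including streaming the body). Here’s a rough sketch using the standard `http.client` module; `time_request` is a hypothetical helper, and the host, port and path in the example comment are assumptions matching the view URL above. If CouchDB starts streaming rows before it has finished computing, the split will blur, so treat it as a hint rather than a diagnosis.

```python
import http.client
import time


def time_request(host, port, path, timeout=300):
    """Time one GET, split into (seconds_to_headers, seconds_total, body_bytes).

    A large seconds_to_headers points at the server (e.g. computing the view);
    a large gap between the two numbers points at the transfer itself.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns after the status line and headers
        headers_at = time.perf_counter()
        body = response.read()  # drain the whole body
        done = time.perf_counter()
    finally:
        conn.close()
    return headers_at - start, done - start, len(body)


# Example (assumes CouchDB on localhost:5984 and the view URL used above):
# print(time_request("localhost", 5984,
#                    "/keywords/_design/finding/_view/word_count?group=true"))
```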

As a final note, I compacted the database and reran the query, which helped significantly compared to the worst-case 0.9 time but at best only matched 0.8:

time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m34.605s
user	0m0.687s
sys	0m0.394s

You can find the test script (updated to work with the slightly different view URLs). If you’d like to recreate the test, all you need to do (beyond setting up CouchDB) is:

  1. Swap the comments on the last two lines to run “load_db()”
  2. Create a map / reduce wordcount view
  3. Change the “view_url” parameter on line 25
  4. Invert the comments again to just run “find_keyword()”
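For step 2, here’s a hedged sketch of creating the word-count view as a design document over CouchDB’s HTTP API. It reuses the map function from the “Could a couchdb guru explain this” post with the reduce simplified to `sum(values)`, and assumes the newer `/_design/finding/_view/word_count` layout and a database called `keywords`. `make_view_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request

MAP_FN = "function(doc) { emit(doc.word, 1); }"
REDUCE_FN = "function(keys, values, rereduce) { return sum(values); }"


def make_view_request(db_url):
    """Build (but don't send) a PUT that stores the word_count view."""
    design_doc = {
        "_id": "_design/finding",
        "language": "javascript",
        "views": {"word_count": {"map": MAP_FN, "reduce": REDUCE_FN}},
    }
    return urllib.request.Request(
        db_url.rstrip("/") + "/_design/finding",
        data=json.dumps(design_doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = make_view_request("http://localhost:5984/keywords")
# urllib.request.urlopen(req) would send it to a running CouchDB.
```

Using `sum(values)` for both passes avoids the `value.length` shortcut entirely, since summing ones gives the same count on the first pass and sums partial counts on rereduce.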

A simple twitter library in python

Wednesday, April 29th, 2009

I’ve been working on a project built on Google App Engine and I’m relying on twitter to mediate some of the interaction with my end users.

What I find great about the growing prevalence of social interfaces is that I don’t have to focus predominantly on coding an interface, and with so many clients available my users can interact in whatever way is most appropriate, e.g. from a mobile phone or a desktop client.

Unfortunately, the standard python-twitter library doesn’t readily run under GAE because of some library issues. Originally I was looking at contributing some code changes to it, but it’s a spiderweb of more abstractions than I think the problem deserves.

In the process of building my own library I found that Avinash had figured out how to set up the authentication properly, so I built upon his work and added a few other functions I needed.

We all know about twitter’s growing popularity so I thought I’d share my version as well in case it proved helpful to anyone. Twitter provides a great mechanism to decouple your interface from your backend code and I hope to see many more smart systems to come!

CouchDB Performance or Use a File

Thursday, March 19th, 2009

If this is the first post you’ve read on my blog, you should probably go check out some others and see for yourself that I’m a big fan of couchDB.

Even if it weren’t easy to be impressed by Damien Katz’s work, it would be hard to overlook the interest his code has created. If even those miracles weren’t enough for you, then just look at what other amazing minds have done. Finally, for the truly skeptical, there’s now a business you can contact.

There are quite a few things that make it an amazing piece of engineering, including its simplicity of purpose, something you don’t often get a chance to appreciate these days. I’m a big fan of pipes (lots of individual pieces, each doing its dedicated task), and to me the MapReduce model epitomizes that behavior.

Still, there’s a lot I’ve taken on faith, since I haven’t been able to dedicate weeks to its internals, instead trying to leverage it for projects. One of those assumptions has been performance.

Truly, it’s more than just an assumption. Reports are that couchDB’s performance is already quite decent and it hasn’t even been tuned yet, so I’d never attempted to benchmark its behavior.

Instead I’ve been working on some language processing code for my wife. Learning about NLP has aligned with my A.I. background, although it’s reminded me about all the math I’ve forgotten!

After reading some samples and feeling like I’d gotten my legs underneath me, I decided to “port” a nice little example over to couchDB. If you want to play along at home, you’ll want to check out the article and grab his code (and the keywords2.txt file).

Although the code is geared more toward education than performance, it still runs fairly snappily on my laptop, with some pretty consistent times:

time ./finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]

real	0m0.286s
user	0m0.228s
sys	0m0.047s

I’ve also run it with some performance sampling but let’s stick with simple timing for now.

There are 124,580 “words” in the text file:

>>> key_file = open("keywords2.txt")
>>> data = key_file.read()
>>> words = data.split()
>>> len(words)
124580

This data is then used to build a word frequency count, which is regenerated each time the program runs; not bad for 0.2 seconds’ worth of work!
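For comparison with the CouchDB version, the file-based frequency step boils down to something like the following minimal sketch. It isn’t the article’s exact code; `word_probabilities` is a hypothetical name, and it produces the same word-to-relative-frequency mapping that `build_prob_dict` later reconstructs from the view.

```python
from collections import Counter


def word_probabilities(text):
    """Map each whitespace-separated word to count / total words."""
    words = text.split()
    total = float(len(words))
    return {word: count / total for word, count in Counter(words).items()}


# Usage with the keywords file:
# with open("keywords2.txt") as f:
#     probs = word_probabilities(f.read())
```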

Naturally, having static text data and supplementing this original data with some derived “data structures” (like total count, or a view showing each word and the number of times it appears), is a perfect case for couchDB.

So I decided to simply load this data right into couchDB. Here’s how I did this, skipping details like creating the database itself;

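def load_db():
    # db is a couchdb-python Database object opened elsewhere in the script
    with open('keywords2.txt') as key_file:
        words = key_file.read().split()
    for word in words:
        # one document, and therefore one HTTP request, per word
        db.create( { "word": word } )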

You’d expect this to take some time: databases provide valuable services, but of course only at the expense of some cycles. However, I was surprised to find that this took almost 30 minutes!

time ./couchdb_finding_keywords.py 

real	27m2.356s
user	2m35.921s
sys	1m14.478s

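Based on the low user and sys times, you can guess that most of the delay is due to transport overhead, i.e. network communication: user and sys together come to only 3m50.4s (2m35.9s + 1m14.5s) of the 27m2.4s total, so roughly 86% of the wall-clock time was spent waiting rather than computing in the client. This is all going to a couchdb running on localhost, a MacBook with 4 GB of RAM and an Intel 2.0 GHz Core 2 Duo, so it’s a bit surprising but not really critical.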

I didn’t bother running this three times and taking an average. The keywords2.txt file should already be in memory, having been read by the file-backed example. Nor is upfront cost a big consideration for me; I’m willing to spend the time once, especially if it can save me work on the backend!
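If per-request overhead is the culprit, one thing worth trying (which the load above does not do) is batching. CouchDB’s HTTP API accepts many documents in a single POST to `/<db>/_bulk_docs` with a body of the form `{"docs": [...]}`. Here’s a minimal sketch of building those batches; the 1,000-document batch size and the `post` callable (anything that sends a JSON body to a URL) are assumptions, so treat this as a starting point rather than a measured fix.

```python
import json


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_load(words, db_url, post, batch_size=1000):
    """Send one _bulk_docs request per batch instead of one request per word.

    `post(url, body)` is supplied by the caller (e.g. a small urllib wrapper).
    Returns the number of requests made.
    """
    url = db_url.rstrip("/") + "/_bulk_docs"
    requests = 0
    for batch in batches(words, batch_size):
        body = json.dumps({"docs": [{"word": w} for w in batch]})
        post(url, body)
        requests += 1
    return requests
```

With 124,580 words and batches of 1,000, that works out to 125 requests instead of 124,580.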

So naturally I was pretty excited to port things over to a more couchDB / pythonic example, and here’s what I came up with. After you load your data you then need a view, which you can get from my previous post, along with jChris’ helpful comment. Note: if this is your first time with this stuff (or even if it isn’t), you may want to practice on a smaller database first!

Next we’ll need some code to get this data. While I highly recommend the fantastic couchdb-python library, for the rest of my examples I’ll use JSON & urllib to remove a layer of indirection.

Here’s how we can get the overall word count (used to calculate relative frequencies);

import json
import urllib

def total_word_count():
    u = "http://localhost:5984/%s/_view/finding/word_count" % (db_name)
    try:
        j = json.loads(urllib.urlopen(u).read())
        # Sample Output: {"rows":[{"key":null,"value":19}]}
        return j['rows'][0]['value']
    except (IOError, ValueError, KeyError, IndexError):
        # unreachable server, bad JSON, or an empty view
        return 0

We can do the same thing with “?group=true” in our URL to get the individual words each with their respective count. Here’s some code and a contrived bit of output to serve as our sample;

def all_word_count():
    u = "http://localhost:5984/%s/_view/finding/word_count?group=true" % (db_name)
    try:
        ### Example Output: {"rows":[{"key":"be","value":1},{"key":"do","value":4},{"key":"to","value":1},{"key":"we","value":2}]}
        j = json.loads(urllib.urlopen(u).read())
        return j['rows']
    except (IOError, ValueError, KeyError):
        return []

Now what is a bit problematic here (vs. the original example) is that we’re actually getting a long list of dictionaries instead of one dictionary, but we can convert this into a full word frequency dictionary and end up on equal footing again.

def build_prob_dict(word_list, total_words):
    num = float(total_words)
    try:
        return dict((r['key'], r['value'] / num) for r in word_list)
    except (KeyError, ZeroDivisionError):
        return {}
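Since each HTTP round trip is expensive here, it’s worth noting that the total word count can be derived from the grouped rows themselves: the per-word values sum to the number of emitted words. A minimal sketch of that shortcut, using the contrived `?group=true` output shown in `all_word_count()` as a JSON string; `probabilities_from_view` is a hypothetical helper, not part of the script.

```python
import json


def probabilities_from_view(view_json):
    """Turn a ?group=true word_count response into word -> relative frequency."""
    rows = json.loads(view_json)["rows"]
    total = float(sum(row["value"] for row in rows))
    if not total:
        return {}
    return dict((row["key"], row["value"] / total) for row in rows)


SAMPLE = ('{"rows":[{"key":"be","value":1},{"key":"do","value":4},'
          '{"key":"to","value":1},{"key":"we","value":2}]}')
print(probabilities_from_view(SAMPLE))
```

This trades the second request (the ungrouped total) for a sum over rows that are already in memory.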

So that should get us the rest of the way. Here’s the relevant excerpt from the new script vs the original:

def find_keyword(test_string = None):
    if not test_string:
        test_string = 'Hacker news is a good site while Techcrunch not so much'
    word_prob_dict = build_prob_dict(all_word_count(), total_word_count())
    non_exist_prob = min(word_prob_dict.values()) / 2.0
    #... everything below should function unchanged

OK, so how does this fare? Well, let’s give it a try:

time ./couchdb_finding_keywords.py
[('Hacker', 249160.0), ('Techcrunch', 249160.0)]
real    0m33.878s
user    0m0.692s
sys    0m0.408s

Ouch… and this is after the view had been generated (and thus cached) by couchDB through multiple calls. If you look at some more detailed numbers, you can see that the bulk of the delay is again spent in socket calls. Even downloading the view results via wget is painful at ~11.8 KB/s vs. ~163 MB/s when serving a static file with the results via Apache.

Here’s an interesting tidbit from a more detailed profiling;

   ncalls  tottime  percall  cumtime  percall      filename:lineno(function)
      2     32.437   16.218   32.437  16.218        socket.py:278(read)

I know the team has not focused on tuning couchDB, and I’ve read lots of anecdotal evidence that erlang is fast for computation especially on multicore systems, but my hope is they can get the transport layer working quickly as well!

As a final curiosity, I’d love it if couchDB supported queries from STDIN! Think about the piping fun you could have if you could insert couchDB into your bash pipelines! I also wouldn’t have to worry about adding another network server to my hosted service!

Did I mess up here? Can someone try this and tell me if they get similar results?

Could a couchdb guru explain this, please?

Friday, March 13th, 2009

I’m in the process of trying to build (and benchmark) a couchdb project, and I decided to use some word count & frequency samples as data. Since “word count” and “grep” are the quintessential map/reduce examples, I thought this would be fairly simple.

However, couchdb doesn’t seem to be following the expected semantics.

Let’s say I’ve got some data, here’s how it looks in python;

>>> import couchdb
>>> s = couchdb.Server()
>>> db = s['kw2']
>>> for d in db: print db[d]
...
<Document '133da883092e206d7191f81661beb813'@'3188228489' {'word': 'ho'}>
<Document '2287406943e627278d98a3a2f3d3483b'@'634745217' {'word': 'do'}>
<Document '2717deb4df8ba09601166021fb758126'@'2083376980' {'word': 'mo'}>
<Document '38d48e8e069538a55902dd2d2b7e1771'@'2475366164' {'word': 'ho'}>
<Document '39ef4a9e3eb0eeb02d483ce658d08356'@'2904312995' {'word': 'hi'}>
<Document '4237064ad7a89fa11e9bbbc8ca4ed302'@'722283984' {'word': 'do'}>
<Document '4d0e61dedaf2af93a9d4d261cab696de'@'996995145' {'word': 'we'}>
<Document '55ba96501ed1e9573b2cb6e647c35b47'@'3153984663' {'word': 'my'}>
<Document '5be13ca69c76d202b131d50f5b9c1ecb'@'1584030189' {'word': 'do'}>
<Document '612e4a0d32f4c91f7fb2414e4de47845'@'3488016124' {'word': 'be'}>
<Document '61426c868dc388e6edb2b4ce2078ce06'@'2761346180' {'word': 'me'}>
<Document '908acaf4ad704951dbb08d27ddfbe9a9'@'941727127' {'word': 'mo'}>
<Document '9136e093fda2dda7d5585983299fcbc7'@'4166962206' {'word': 'mo'}>
<Document '9decb25944110c04d040feb31e532c78'@'1016718857' {'word': 'do'}>
<Document 'ad7f4aab329d55c3a2fb97390df5ae0a'@'1660663052' {'word': 'my'}>
<Document 'c4d976a789e37e1c3eb4d57bd50d47aa'@'923287257' {'word': 'my'}>
<Document 'cccf15515077d100498573fe40244130'@'3846996388' {'word': 'hi'}>
<Document 'd747a88eb2cb18776237852aceff96fc'@'3596694550' {'word': 'we'}>
<Document 'dc115f5d42d442f0b5e7d3680aeb62c2'@'3446491946' {'word': 'to'}>

Feel free to add your own, but that’s what I’ve got. Each doc has a simple structure: an “_id” (supplied by couchdb when the document is created) and an element called “word”, which contains some fabricated two-letter strings (which I hesitate to actually call words).

What’s important to note is that the same word may appear in multiple documents.

Now we want to build a view to show each word as well as the sum of how many times it appears in our database.

Again, following the classic paradigm, we build our map function (in JavaScript) as follows:

function(doc) {
  emit(doc["word"], 1);
}

So far so good, now reduce;

function(key, value, rereduce) {
   if (rereduce) {
      return sum(value);
   }
   else {
      return value.length;
   }
}

You can pretty much ignore the “rereduce” clause, as our dataset isn’t big enough right now, nor are we updating it. However, I will explain the function’s trick: while sum(value) is actually the “mathematically correct” action regardless of whether this is our first time through, we’re relying on the fact that since we emit a “1” for each key (i.e. each word instance), the sum of those values is simply the length of the array we’re passed. [I learned this from one of the masters]
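To see why the `value.length` shortcut is safe on the first pass but rereduce still has to `sum`, here’s a small Python simulation of the two phases. It only models the idea (CouchDB’s view server decides how values are actually chunked); `reduce_fn` mirrors the JavaScript above, `simulate` is a made-up driver, and the chunk sizes are arbitrary.

```python
def reduce_fn(keys, values, rereduce):
    """Python mirror of the JavaScript reduce above.

    In CouchDB itself, `keys` holds [key, docid] pairs on the first pass;
    this model passes bare keys, and the function ignores them anyway.
    """
    if rereduce:
        return sum(values)  # values are partial counts from earlier reduces
    return len(values)  # every mapped value is a 1, so just count them


def simulate(mapped, chunk_size):
    """Reduce `mapped` (a list of (key, value) pairs) in chunks, then rereduce."""
    partials = []
    for i in range(0, len(mapped), chunk_size):
        chunk = mapped[i:i + chunk_size]
        keys = [k for k, _ in chunk]
        values = [v for _, v in chunk]
        partials.append(reduce_fn(keys, values, False))
    return reduce_fn(None, partials, True)


words = ["ho", "do", "mo", "ho", "hi", "do", "we", "my", "do", "be",
         "me", "mo", "mo", "do", "my", "my", "hi", "we", "to"]
mapped = [(w, 1) for w in words]
print(simulate(mapped, chunk_size=4))
```

With chunks of 4, the 19 mapped values produce five partial counts; using `len` in the rereduce branch would report 5 (the number of chunks) instead of 19, which is why that branch sums.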

Ok, despite the attempt at “premature optimization” this actually seems to work out, or at least it looks to when shown in the couchdb key/value view. Here’s my screenshot for proof;

[Screenshot (picture-21): the word_count view as shown in couchdb’s key/value view]

However, what I see from a direct URL query to this view is markedly different from the data that’s represented there. To test this, either use Firefox or a command-line client like curl and go to the following URL:

http://localhost:5984/kw2/_view/finding/word_count

What I see (and I suspect you will as well) is

{"rows":[{"key":null,"value":19}]}

Which seems to break our expected key/value pairing!!!

Suspecting that my understanding of couchdb’s map/reduce representation had been clouded by all the Google videos I’ve watched, I thought an intuitive modification might be to change our reduce function to return the key and the value, like this:

return [key, value];

However, that yields an even more shocking outcome;

{"rows":[{"key":null,"value":[[["we","d747a88eb2cb18776237852aceff96fc"],["we","4d0e61dedaf2af93a9d4d261cab696de"],["to","dc115f5d42d442f0b5e7d3680aeb62c2"],["my","c4d976a789e37e1c3eb4d57bd50d47aa"],["my","ad7f4aab329d55c3a2fb97390df5ae0a"],["my","55ba96501ed1e9573b2cb6e647c35b47"],["mo","9136e093fda2dda7d5585983299fcbc7"],["mo","908acaf4ad704951dbb08d27ddfbe9a9"],["mo","2717deb4df8ba09601166021fb758126"],["me","61426c868dc388e6edb2b4ce2078ce06"],["ho","38d48e8e069538a55902dd2d2b7e1771"],["ho","133da883092e206d7191f81661beb813"],["hi","cccf15515077d100498573fe40244130"],["hi","39ef4a9e3eb0eeb02d483ce658d08356"],["do","9decb25944110c04d040feb31e532c78"],["do","5be13ca69c76d202b131d50f5b9c1ecb"],["do","4237064ad7a89fa11e9bbbc8ca4ed302"],["do","2287406943e627278d98a3a2f3d3483b"],["be","612e4a0d32f4c91f7fb2414e4de47845"]],19]}]}

Of course I’m still baffled as to why we seem to have no entry set for key and all our rows as values.

However, my larger concern is beyond even that perplexing situation;

What’s most surprising here is that the key we’re being passed includes the doc id even though it was not emitted as part of our map phase!

Let’s give it one last go here, thinking perhaps we need to be more explicit;

function(key, value, rereduce) {
   if (rereduce) {
      return sum(value);
   }
   else {
      return {"key": key[0],"value": value.length};
   }
}

Unfortunately, this still doesn’t yield the organized rows we expected, and instead returns:

{"rows":[{"key":null,"value":{"key":["we","d747a88eb2cb18776237852aceff96fc"],"value":19}}]}

Which stands in high contrast to what couchdb continues to show us;

[Screenshot (picture-11): the same view in couchdb’s key/value view]

So whatever we return from reduce ends up as the value part of the reply (indexed by “value”). That matches our original expectation (that couchdb handles setting the key for us), but it doesn’t explain why the key is “null”.

In short I’m left with three questions;

1) Why does couchdb pass our reduce function the doc ID when it’s not emitted in the map phase?

2) Why is “key” null in our output?

3) How do we get our JSON output to match the same pretty key/value representation that couchdb shows?
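For what it’s worth, the follow-up post (March 19th) queries this view with `?group=true` to get one row per word. That fits a model in which reducing without grouping collapses the whole key range into a single row with a `null` key, while grouping runs the reduce once per distinct key. Here’s a small Python model of that behavior, offered as a hypothesis about questions 2 and 3 rather than a description of CouchDB’s internals; `query_view` and `count_reduce` are hypothetical.

```python
from itertools import groupby


def count_reduce(values):
    return sum(values)


def query_view(mapped, group=False):
    """Model a reduce view query over (key, value) pairs emitted by map."""
    rows = sorted(mapped, key=lambda kv: kv[0])  # views are sorted by key
    if not group:
        # One reduction over every row, so there is no single key to report.
        return {"rows": [{"key": None,
                          "value": count_reduce([v for _, v in rows])}]}
    return {"rows": [
        {"key": key, "value": count_reduce([v for _, v in pairs])}
        for key, pairs in groupby(rows, key=lambda kv: kv[0])
    ]}


mapped = [(w, 1) for w in ["we", "do", "we", "be", "do", "do"]]
print(query_view(mapped))              # a single row whose key is None
print(query_view(mapped, group=True))  # one row per distinct word
```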

I wish I could promise that if you tune in next time I’ll have the answers but we’ll have to rely on the good nature of our experts out there to help us out.

How to build Couchdb on Dreamhost

Wednesday, March 4th, 2009

As you know from many of my entries I’m a big fan of couchdb, and if you’re interested you should really be following janl, jchris and lethain as they push this technology forward.

As you might also guess from my earlier post I’m working to build and install it on Dreamhost, another thing I support enthusiastically.

Unfortunately, being on the outer fringe of technology meant I wasn’t able to get them to install it for me, but that’s completely understandable. Given that the current package release has no auth support (I believe the repository builds do, but that would have required more software installs), if I were supporting a multi-user production environment it might make me a little nervous too.

However, it’s a major component of what I want to pursue, so I wanted to give it a shot. I don’t have it up and running 100% right now (it appears to start, though I can’t connect), but I wanted to document the build side of things before I forgot :D

So here’s the rundown;

I was fortunate to follow some excellent advice about getting Django up and running on Dreamhost. It advised that you set up a “~/run” directory to install all your add-on software into, and the steps below build on that existing environment, with $RUN pointing at that directory (e.g. export RUN=$HOME/run).

First you need to download some software; I needed Erlang, SpiderMonkey, ICU & CouchDB.

I downloaded all my files into “~/repo”, but wherever you like to store them will be fine (“~/software”, for example). Now create a temp directory and unpack everything, replacing the paths (and potentially filenames, if you picked different versions) as appropriate.

mkdir ~/tmp && cd ~/tmp
tar zxf ~/repo/otp_src*
tar zxf ~/repo/js-1.7.0*
tar zxf ~/repo/icu4c*
tar zxf ~/repo/apache*

I will show the build commands in the order I ran them, but as long as you save CouchDB for the final step (naturally) you should be fine. It’s important to realize, though, that I didn’t get this right the first time through, so I did have some partial installs at times.

cd ~/tmp/js/src
make -f Makefile.ref BUILD_OPT=1 JS_DIST=$RUN
mkdir -p $RUN/include/js $RUN/lib
cp *.h $RUN/include/js
cd Linux_All_OPT.OBJ
cp jsproto.tbl jsautocfg.h $RUN/include/js
cp libjs.so $RUN/lib

Now for Erlang:

cd ~/tmp/otp_src*
./configure --prefix=$RUN --enable-smp-support --enable-threads --enable-hipe
make && make install

Next, Unicode Support (ICU):

cd ~/tmp/icu/source
./configure --prefix=$RUN
make && make install

And finally, CouchDB!

cd ~/tmp/apache-couchdb*
./configure --prefix=$RUN --with-js-lib=$RUN/lib --with-js-include=$RUN/include/js --with-erlang=$RUN/lib/erlang/usr/include
make && make install

Once that’s completed I was able to run “couchdb” and see the famous “Apache CouchDB has started. Time to relax.” !!!

Unfortunately, running “couchdb -s” in another window tells me that “Apache CouchDB is not running.” :(

However, I suspect that’s an easier issue for Dreamhost to help me with than building everything from source!

Building CouchDB

Wednesday, March 4th, 2009

This is a quick technical post for posterity, if you’re not interested I won’t be offended if you leave now.

I’m working to get couchdb onto my hosting provider, Dreamhost. They’ve been a great service for me so far, but understandably this isn’t something they’re ready to support yet. So rather than a few apt-gets, I’m building it from source, which means erlang and spidermonkey as well.

I’ve got those two dependencies built (I hope), but I kept hitting an error with couchdb’s configure script complaining that it couldn’t find the libraries, despite my setting:

./configure --prefix=$RUN --with-js-lib=/home/wjhuie/run/lib/ --with-js-include=/home/wjhuie/run/include/js

Note that since I don’t have root or sudo access, I use a ~/run directory:

export RUN=/home/wjhuie/run

This lets me isolate my installed software from the system’s. When configuring couchdb, it was able to find libjs.so but complained that it couldn’t find jsapi.h. In the end, the problem wasn’t that it couldn’t find the header but that the header failed to compile.

I searched far and wide without much luck (and saw others with similar problems), but in the end I realized I was missing some required files. Since the instructions on the couchdb wiki weren’t very useful:

http://wiki.apache.org/couchdb/Installing_SpiderMonkey

I had instead followed this more helpful advice:

http://avidemux.org/admWiki/index.php?title=Compile_SpiderMonkey

However, the commands listed need to be modified slightly to include two files generated during the build, jsproto.tbl and jsautocfg.h.

So for the record (in case it helps someone else someday) I built my spidermonkey like this;

make -f Makefile.ref BUILD_OPT=1 JS_DIST=$RUN

Then installed it with;

mkdir -p $RUN/include/js $RUN/lib
cp *.h $RUN/include/js
cd Linux_All_OPT.OBJ
cp jsproto.tbl jsautocfg.h $RUN/include/js
cp libjs.so $RUN/lib

A good way to get a hint on the problem is to create a sample “program” and try to compile it. To do this, create a file called “test.c” with the contents;

#include </home/wjhuie/run/include/js/jsapi.h>
int main(void)
{ return 0; }

Then try to compile it with;

make test 2>&1 | less

That should let you see what’s going on, and once that works, couchdb’s configure test should too! Now I have to sort out the unicode library requirements!

twitterline

Friday, February 20th, 2009

I’ve been using a lot of python, jQuery and web services recently and thought it was time to pull together those skills into a public app.

Like most developers, I often “scratch my own itch” and write code to solve a problem or learn something. I try to post what I can but some solutions are hosted internally and I know there are numerous code fragments scattered across my hard drives which haven’t made it into posts.

Many of these projects are “evolutionary dead ends”, but I think it’s important to engage in “purposeful play” without anticipating success or failure. You really have to take time to nurture your childlike creativity, and it’s often in these limitless exercises that we lay the foundation for real breakthroughs in more “respected” work.

I was reminded of this recently when I watched a presentation by Aaron Koblin on some of his creative works. His compositions are stunning and while I recall noticing many of those projects independently over time, it was seeing the evolution of his portfolio that really inspired me.

If you watch the video you can see how his work went from a type of deliberate play to having a full “application”. It’s a lesson I try to perpetually embody with a “just do it” attitude, and it’s rewarding to see someone having applied it with such success.

So in that vein, I decided to clean up one of my sites and pull together a lot of these components into something “useful”. I call it “twitterline”, because “twitterbar” may be more descriptive but doesn’t roll off the tongue as well. You can see an example and get a pretty good idea of what it’s used for.

The API is “RESTful” and is simply “http://twitterline.shelv.us/twitterline” followed by your twitter ID, e.g. “/wjhuie”, and the number of days you’d like to graph, e.g. “/4”. I’ve limited the number to the range [1, 14], and if you don’t supply one the default is 7.
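The URL scheme is simple enough to sketch. This is a hypothetical parser, not the site’s actual code: it assumes out-of-range day counts are clamped into [1, 14] rather than rejected, and that a missing or non-numeric count falls back to 7.

```python
MIN_DAYS, MAX_DAYS, DEFAULT_DAYS = 1, 14, 7


def parse_twitterline_path(path):
    """Split '/twitterline/<user>[/<days>]' into (user, days)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "twitterline":
        raise ValueError("expected /twitterline/<user>[/<days>]: %r" % path)
    user = parts[1]
    days = DEFAULT_DAYS
    if len(parts) > 2 and parts[2].isdigit():
        days = int(parts[2])
    # Clamp into the supported range instead of rejecting the request.
    return user, max(MIN_DAYS, min(MAX_DAYS, days))
```

Clamping keeps an embedded graph rendering something sensible even when someone mistypes the day count; rejecting with an error would be the stricter alternative.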

However, beyond just looking at the bar graph on my site, you should be able to embed it wherever you wish! You can check the source of my example; mostly you’ll need to make sure jQuery and jQuery.Flot are embedded first, and you’ll likely want to tweak the CSS. Just let me know if you need help getting it up and running or if you’d like some different defaults.

It’s intended to be a simple culmination of a more complex process (which I’ll blog more on later) but I hope it inspires you to dust off a project of your own or start a new one!

Build your own ioGun!

Friday, January 16th, 2009

I apologize for what will effectively be a brain dump post, but a new friend of mine from the hackaday forums is getting started on his own accelerometer controlled system and I wanted to see if I could save him some time and frustration.

I think standing on the shoulders of giants is a fantastic aspect of human nature (now if only someone could show me to some friendly Cyclopi; is that really the right pluralization?), and frankly I could not exist, let alone thrive, in the technological world if it weren’t for the good graces of a great many people.

Perhaps I will write a _very long_ thank-you post to them (if I can remember them all). So consider this my little chance at saving some of you a few hours of your precious time (and brain strain). Because of the very nature of hacking (i.e. everything’s just a little different and thrown together), this can’t quite match the quality of an Instructable, but it should get some of you started!

I’ve put up my code here so that’s quite obviously the place to start. You’ll find a python script and an HTML page. The script connects to the MoteDaemon port I referenced earlier and will write these data points out to a JSON file which can be served via your webserver, and is then picked up by the HTML page.

If you look through the code you’ll see there’s clearly some tuning that can be done. As people on many forums pointed out there’s a “lag” which is really just because of my polling rates (and less to do with the network traffic).

Two important things here. First, you’ll see that I created one thread for the monitoring and a separate one for the output, because the wiimote data is fast and furious and you don’t want to block and miss any!

Second, I know writing to a JSON file isn’t ideal, so the output thread actually buffers 10 events and writes them together; luckily jQuery is smart enough to only pull the file if it’s changed, so it’s not as bad as it sounds.
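The two-thread layout can be sketched with the standard library’s `queue` and `threading` modules. This is not the ioGun script itself: the reader here is fed from a plain iterable standing in for the MoteDaemon connection (whose protocol isn’t covered here), and the `accel.json` filename, the sample dictionaries and the `run()` helper are assumptions. The writer buffers 10 events and replaces the JSON file in one step so the page never reads a half-written file; flushing a final partial batch on shutdown is an addition of this sketch.

```python
import json
import os
import queue
import tempfile
import threading

BATCH_SIZE = 10
_DONE = object()  # sentinel telling the writer to flush and stop


def reader(source, events):
    """Monitoring thread: push samples onto the queue as fast as they arrive."""
    for sample in source:
        events.put(sample)
    events.put(_DONE)


def write_json(path, batch):
    """Write the batch to a temp file, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(batch, f)
    os.replace(tmp, path)


def writer(events, path):
    """Output thread: collect BATCH_SIZE samples, then write them as one file."""
    batch = []
    while True:
        sample = events.get()
        if sample is _DONE:
            if batch:
                write_json(path, batch)
            return
        batch.append(sample)
        if len(batch) == BATCH_SIZE:
            write_json(path, batch)
            batch = []


def run(source, path="accel.json"):
    events = queue.Queue()
    threads = [threading.Thread(target=reader, args=(source, events)),
               threading.Thread(target=writer, args=(events, path))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the queue is unbounded, a slow disk only delays the file; the reader never has to block and drop wiimote samples (at the cost of memory if the writer falls far behind).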

Once you’ve got the accelerometer data into python (and then into JSON) it’s ‘just’ a matter of writing the webpage you want! jQuery makes some stuff really easy so I suggest giving it a look if you haven’t yet!

Of course, what made it really easy for me was the ioBridge module. I just plugged a servo into it, defined a widget on their page, and voilà, I had a web service I could send commands to!

I hope that helps give some of you (or at least one of you) a boost, and if I can help out at all please let me know!