Archive for the ‘visualization’ Category

twitterline

Friday, February 20th, 2009

I’ve been using a lot of python, jQuery and web services recently and thought it was time to pull together those skills into a public app.

Like most developers, I often “scratch my own itch” and write code to solve a problem or learn something. I try to post what I can but some solutions are hosted internally and I know there are numerous code fragments scattered across my hard drives which haven’t made it into posts.

Many of these projects are “evolutionary dead ends” but I think it’s important to engage in “purposeful play” without anticipating success or failure. You really have to take time to nurture your childlike creativity, and it’s often in these limitless exercises that we develop the foundation for real breakthroughs in more “respected” works.

I was reminded of this recently when I watched a presentation by Aaron Koblin on some of his creative works. His compositions are stunning and while I recall noticing many of those projects independently over time, it was seeing the evolution of his portfolio that really inspired me.

If you watch the video you can see how his work went from a type of deliberate play to having a full “application”. It’s a lesson I try to perpetually embody with a “just do it” attitude, and it’s rewarding to see someone having applied it with such success.

So in that vein, I decided to clean up one of my sites and pull together a lot of these components into something “useful”. I call it “twitterline” because “twitterbar”, while perhaps more descriptive, doesn’t roll off the tongue as well. You can see an example and get a pretty good idea of what it’s used for.

The API is “RESTful”: the URL is simply “http://twitterline.shelv.us/twitterline” followed by your twitter ID, e.g. “/wjhuie”, and the number of days that you’d like to graph, e.g. “/4”. I’ve limited the number of days to the range [1, 14], and if you don’t supply a number the default is 7; both seem like reasonable choices.
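As a rough sketch of how a caller might assemble that URL in Python: the base URL, the example ID and the [1, 14] range come from the description above, while the function name and the client-side clamping are my own assumptions (the post doesn’t say how the service treats out-of-range values).

```python
BASE_URL = "http://twitterline.shelv.us/twitterline"
MIN_DAYS, MAX_DAYS = 1, 14  # range described in the post


def twitterline_url(twitter_id, days=None):
    """Build a twitterline URL (hypothetical helper, not part of the service).

    When days is omitted the segment is left off, so the service's own
    default (7, per the post) applies.
    """
    if days is None:
        return "%s/%s" % (BASE_URL, twitter_id)
    # Assumption: clamp on the client so the request stays in the documented range.
    days = max(MIN_DAYS, min(MAX_DAYS, int(days)))
    return "%s/%s/%d" % (BASE_URL, twitter_id, days)


print(twitterline_url("wjhuie", 4))
print(twitterline_url("wjhuie"))
```

Leaving the day count off rather than sending 7 explicitly keeps the default in one place, on the server.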

However, beyond just looking at the bar graph on my site, you should be able to embed it wherever you wish! You can check the source on my example; mostly you’ll need to make sure jQuery and jQuery.Flot are loaded first, and you’ll likely want to tweak the CSS. Just let me know if you need help getting it up and running or if you’d like some different defaults.

It’s intended to be a simple culmination of a more complex process (which I’ll blog more on later) but I hope it inspires you to dust off a project of your own or start a new one!

Taking (and keeping) your temperature!

Friday, January 2nd, 2009

I swear I don’t have a penchant for medical terminology but this ioBridge stuff is making me feel like that time I stayed at a Holiday Inn… so refreshing, I think I could perform surgery!

After my heart hacks (see my previous posts) I had questions from some friends about how we could graph the data from an ioWidget (my term). Initially, I wanted to push the data into a Google Spreadsheet but unfortunately there doesn’t seem to be a higher-level JavaScript library supporting Google Docs, and sorting through the feed URLs was just too complicated.

At the same time I was thinking through this request, I received a resounding response from the ioBridge team! Although I’d worked around the need for a simple API they quickly responded to the desire (apparently I wasn’t the only one with interesting ideas) and now there’s a full JSON API!

I won’t bore you with my initial graphing solution as the sample made it into their official API demo (along with some much needed code enhancements). However, there were a few pitfalls with that approach that I still didn’t like, most specifically that the data is “lost” every time you reload the page.

It took me until after the holiday break (along with a welcome return to python) but I’ve solved my initial frustration with great results.

Here’s a script which will poll my ioBridge module and then store the results of my temperature sensor in a Google Spreadsheet that I created! Once the data’s there you can use Google’s visualization widgets to make some fun graphs!

Aside from some setup and “ease of use” code, the real work is done by two very brief classes. I deliberately left out error checking and didn’t make the widget class generic (it’s actually proxying the full ioBridge module), so I think it should be straightforward enough to modify for your own uses!

All you need to do is create a spreadsheet and open it in your browser. Copy the key from that URL and paste it, along with your ioBridge feed URL, into the appropriate places in the script (the locations are commented).
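If you’d rather not pick the key out of the URL by eye, something like the following sketch could do it. It assumes the key appears either as a `key=` query parameter or as the path segment after `/d/`; those two URL shapes, the function name and the example URLs are all assumptions on my part, not something the script itself requires.

```python
from urllib.parse import urlparse, parse_qs


def spreadsheet_key(url):
    """Return the key embedded in a spreadsheet URL, or None if not found.

    Hypothetical helper; the two URL layouts handled here are assumptions.
    """
    parsed = urlparse(url)
    # Layout 1 (assumed): the key is a query parameter, ...?key=<KEY>&hl=en
    query = parse_qs(parsed.query)
    if "key" in query:
        return query["key"][0]
    # Layout 2 (assumed): the key is the path segment after "d", .../d/<KEY>/edit
    parts = parsed.path.split("/")
    if "d" in parts:
        position = parts.index("d") + 1
        if position < len(parts) and parts[position]:
            return parts[position]
    return None


# Made-up URLs purely for illustration.
print(spreadsheet_key("http://spreadsheets.google.com/ccc?key=EXAMPLEKEY123&hl=en"))
print(spreadsheet_key("https://docs.google.com/spreadsheets/d/EXAMPLEKEY123/edit"))
```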

I simply run this from a cron job every few minutes (once I get more data I’ll reduce the time) and although there’s not a lot of variation in the data (I deliberately introduced some to make the graph more interesting) it’s a spectacular way to record, visualize and act on the sensor’s findings!

Good luck with your own modifications and let me know if I can help!

What if stocks were movies?

Monday, July 7th, 2008

As you’ve seen from my previous posts, I’ve been playing with python and couchdb and working my way through the “Programming Collective Intelligence” book. However, what I haven’t been talking about much is what I’ve actually been doing with it!

When learning something, I’m at a disadvantage unless I can relate the technique toward an application, even if it’s just hypothetical. Since my day job is mostly spent at an IT / Architecture level it can often be difficult to get a chance to move beyond theoretical “what ifs”.

It’s a constant balance to make progress through a book or tutorial vs. branching off to investigate and actually apply something. As a rampant “consumer” I naturally err toward the “more data” side of things rather than taking the time to explore in depth (I still have Google App Engine in my queue to revisit).

With the potential of PCI, I’ve been really focused on working through the examples and trying to explore what a good tangential application might be. Outside of technology I have a lot of varied interests (e.g. some of my friends call me ‘Longbow’) not the least of which is the idea that business and money is always an interesting area to investigate.

So I decided to mess around with applying PCI’s recommendation techniques to the field of stock analytics. There’s a theory called the “efficient market hypothesis” which holds that the crowd is wise and that if someone does have some corner on knowledge it (a) won’t be you and (b) won’t be legal.

Ok, the last two implications are my own but the theory basically suggests that you shouldn’t try to (and in fact can’t) beat the market because it’s the one making all the rules. Also, consider that if a stock is priced at $90, even if the stock is “worth” $100, it’s actually not worth $100 because the market will only pay $90!

Let’s leave off arguing the truth (or un-truth) of this hypothesis (we all know fortunes are made by being ahead of the crowd, whether by skill or luck) and see what sort of insights we can gain from the data itself!

Here’s what I did:

  1. I took a list of 125 stock symbols and built a python class to help me out (can someone please share a good WordPress plugin for code?). The class is used to parse Google Finance data and extract stock information (currently just the closing price information).
  2. Once I had this data and as an educational aside (and since I was certain Google didn’t want me hitting up their servers all the time) I built a couchdb database for this information. Now that Google Finance has an API it might be unnecessary but I’ve yet to investigate it.
  3. Now I can query the database and build a python dictionary of stocks and their historical information and apply the PCI techniques for “recommendations” on this data structure.
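To make step 3 concrete, here is a minimal sketch of the dictionary shape I mean: symbol on the outside, date on the inside, closing price as the value. The function name and the row format are hypothetical stand-ins for whatever the database query returns; the three sample rows reuse closing prices quoted elsewhere in this post.

```python
from collections import defaultdict


def build_stock_values(rows):
    """Group (symbol, date, close) rows into {symbol: {date: close}}.

    Hypothetical helper: `rows` stands in for the results of a database query.
    """
    stock_values = defaultdict(dict)
    for symbol, date, close in rows:
        stock_values[symbol][date] = close
    return dict(stock_values)


rows = [
    ("GOOG", "26-Mar-08", "458.19"),
    ("GOOG", "4-Dec-07", "684.16"),
    ("YHOO", "26-Mar-08", "28.49"),
]
stock_values = build_stock_values(rows)
print(stock_values["GOOG"]["26-Mar-08"])
```

Prices are kept as strings here, as they would be when read straight from the parsed data, which is why they need casting before any arithmetic.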

I don’t know about you but I think that’s pretty cool!

Let me corral my excitement for a second and explain my thought process. The initial examples in PCI are based on movie recommendations; a critic will watch a movie and issue a numerical rating. In my stock analogy a critic => stock_symbol, a movie => date, and a rating => closing_price.

That might be a stretch for the relationship mapping but remember the PCI techniques can rate clusters for nearly everything which has a numerical value (or can be turned into a numerical representation).

So since they’ve been nice enough to share the financial data, let’s take GOOG as an example:

>>> stock_values["GOOG"]
{u'26-Mar-08': u'458.19', u'4-Dec-07': u'684.16' … }

In my example GOOG is not actually a critic but is being critiqued by the crowd, and on March 26th they rated it a 458.19 on the scale (from 0 to infinity). Here’s the rating for a different stock on that same day:

>>> stock_values["YHOO"]['26-Mar-08']
u'28.49'

Obviously sim_distance() (PCI’s Euclidean measure) won’t do because of the massive discrepancy in the stock prices. However, the sim_pearson() rating will take these “rating tendencies” into account, just like it will for critics who always judge movies optimistically, in effect normalizing your data for you. Note that you’ll have to tweak these functions from the original PCI to cast the values to floats before summing.

Now, for a given stock I can find the one other stock (among my sample of 125 stocks) whose price fluctuations match it most (or least) closely, based on the closing price for each day traded.

>>> topMatches(stock_values, "GOOG", n=1)
[(0.92134859322750906, u'LOGI')]
>>> topMatches(stock_values, "YHOO", n=1)
[(0.45166743704701778, u'BIIB')]
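For anyone without the book handy, the Pearson score and the ranking step can be sketched roughly like this in Python 3. This follows the standard Pearson correlation formula and is not PCI’s exact code; the snake_case function names, the `similarity` parameter and the three made-up symbols and prices are assumptions for illustration only.

```python
from math import sqrt


def sim_pearson(prefs, a, b):
    """Pearson correlation of two symbols over the dates they share."""
    shared = [date for date in prefs[a] if date in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    # Prices are stored as strings, so cast to float before any summing.
    xs = [float(prefs[a][date]) for date in shared]
    ys = [float(prefs[b][date]) for date in shared]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    spread = (sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n)
    if spread <= 0:
        return 0.0  # a flat series has no defined correlation
    return numerator / sqrt(spread)


def top_matches(prefs, symbol, n=5, similarity=sim_pearson):
    """Return the n (score, other_symbol) pairs most similar to symbol."""
    scores = [(similarity(prefs, symbol, other), other)
              for other in prefs if other != symbol]
    scores.sort(reverse=True)
    return scores[:n]


# Made-up symbols and prices, chosen so the relationships are obvious.
prices = {
    "AAA": {"d1": "10", "d2": "20", "d3": "30"},
    "BBB": {"d1": "100", "d2": "200", "d3": "300"},  # moves with AAA
    "CCC": {"d1": "30", "d2": "20", "d3": "10"},     # moves against AAA
}
print(top_matches(prices, "AAA", n=1))
```

Even though BBB trades at ten times AAA’s price, it comes out as the best match because only the shape of the movement matters, which is the whole reason for preferring Pearson over a distance measure here.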

You might also consider finding relationships based on trading volumes but I haven’t yet learned how to do correlations based on multiple relationships.

So let’s look at all the stocks:

>>> for s in stock_values.keys():
...     rank, stock = topMatches(stock_values, s, n=1)[0]
...     print("%s is best matched (%s) by %s" % (s, rank, stock))

AAPL is best matched (0.903380774431) by BIDU

That’s the “best matches” for 125 of the NYSE’s top stocks, and a quick bit of shell scripting:

cat matches | cut -d " " -f7 | sort | uniq | wc -l
68

Shows us that the prices (up and down) for the 125 stocks I’d selected are in fact represented by the movement of just 68 overall stocks!
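The same count can be done without leaving python. This sketch mirrors the shell pipeline by taking the seventh space-separated field of each output line and counting the distinct values; the function name is hypothetical, and apart from the AAPL line the sample lines are made up to show duplicates collapsing.

```python
def count_distinct_matches(lines):
    """Count the distinct values in the 7th space-separated field of each line."""
    matches = set()
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) >= 7:
            matches.add(fields[6])  # same field that `cut -f7` selects
    return len(matches)


lines = [
    "AAPL is best matched (0.903380774431) by BIDU",
    "XXXX is best matched (0.5) by BIDU",  # made-up line sharing a match
    "YYYY is best matched (0.5) by ZZZZ",  # made-up line
]
print(count_distinct_matches(lines))
```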

That’s like being able to capture the excitement from a full summer of movies by only going to see about 54% of the movies!

Fun with twitter

Thursday, April 17th, 2008

Lauren shares two fun twitter tools. I don’t want to re-describe them here, but I thought I’d pass them along since visualizations are so much fun!