Visualising the men’s 100m Olympic champions with D3.js

As the 2012 Olympics draw near, the BBC has begun to dial up the airtime their Olympics-related features get. This Monday, they broadcast a documentary covering the men’s 100m sprint event over the years. Towards the end, they mentioned an interesting factoid: if Thomas Burke, the 1896 Athens Olympics 100m gold medallist, were to run against Usain Bolt, the 2008 Beijing Olympics 100m gold medallist, roughly 20 meters would separate the two at the end of the race.

This made me wonder: how would some of the other sprinters I grew up watching in the 80s and 90s do against Usain Bolt?

Mathematically, this is simple to calculate. Since this is the 100m sprint, everyone ran the same distance, and everyone’s Olympic race time is well documented. From the two you can compute each sprinter’s average speed and multiply it by Usain Bolt’s 2008 time (9.69s) to get the distance each of them would have covered by the time Bolt finishes.
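The arithmetic can be sketched in a few lines of Python. Using Thomas Burke’s widely recorded 12.0s winning time from 1896, it reproduces the roughly 20-meter gap mentioned in the documentary:

```python
BOLT_2008 = 9.69    # Usain Bolt's winning time in Beijing, in seconds
BURKE_1896 = 12.0   # Thomas Burke's winning time in Athens, in seconds

# average speed over the full 100m, times the duration of Bolt's race
distance_covered = (100 / BURKE_1896) * BOLT_2008
gap = 100 - distance_covered
print(f"{gap:.2f} m behind")  # 19.25 m behind, i.e. roughly 20 meters
```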

Technically, the exercise is deeply flawed. Even if you discount technical advances, like the changes in turf, footwear and other gear, you cannot possibly account for the different speeds of different sprinters at each stage of the race (everyone accelerates and decelerates at a different rate), the effect of wind speed on their performance, and so on.

Despite the futility of it, it’s an interesting exercise and conveys some sense of how far sprinting has come since 1896. So I went about collecting the data in a spreadsheet that very night. I collected data all the way back to the 1972 Olympics, and added Thomas Burke to it. I also added Jesse Owens, the 1936 winner and a personal favourite, to my set.

I’ve been wanting to play with D3.js for some time now. This was the perfect excuse to learn it and put it to use.

Here is what my visualisation of the performance of the different sprinters over the last 20 meters of the race looks like (click on the image to open the interactive version in a new window):

Visualisation of men's 100m Olympic champions (click to open the interactive version)

The (rather verbosely commented) source code is available on GitHub.

I hope you enjoy playing with it. My day job involves writing Perl, so I am not a designer by any stretch of the imagination. I’d love to see where real designers would take this.

Emacs + org-mode = todo list nirvana

Here is how I use Emacs and org-mode to keep myself organized.

1. I keep all my todo lists in a single file. My work items, my shopping list, my hobby projects all reside in the same file.

2. I use one top-level heading for each category.

top level headings - one per category

3. Under each top-level heading I have ‘checkboxes’ for each todo item. If you begin an item with ‘- [ ]’ it is treated as a checkbox. You can use C-c C-c to toggle between the checked [X] and unchecked [ ] states. There is also an intermediate state [-] that you can use to denote work in progress (set using C-u C-u C-c C-c).

top-level headings with checkboxes

checkboxes can be toggled

4. You can place your cursor anywhere on a top-level heading and press TAB to hide or show the sub-items.

the sub-items can be hidden

5. If you type [/] against a top-level heading that has checkboxes under it, each time you check or uncheck an item, it’ll automatically update itself with a count of checked/total items. You can also force an update of the checkbox count by placing your cursor over [/] and pressing C-c #.

showing counts of items under each top-level heading

6. You can press C-c . to insert a date against a todo item. By default the calendar loads with the current date selected (you can use S-right/S-left to select a later/earlier date); press enter to insert the selected date.

dates can be assigned to each item easily

7. org-mode has support for Emacs’ narrow-mode and you can use it to narrow down your todo list to just one top-level item. C-x n s will leave you with just the top-level heading your cursor was on. You can press C-x n w to see all your headings again.

narrowing down to a single top-level heading

8. org-mode also has support for hyperlinks. You can type/paste something that looks like a link and org-mode will allow you to launch it using C-c C-o:

links can also be added to your todo list

Links these days can look a bit unwieldy. You can give your links a title and hide the url like this: [[href][text]]. e.g. the url in the example above can be written as [[][oreilly store]] and you’ll get a compact link:

compact links

9. A lot of our items at work these days originate as emails. Having the email corresponding to your todo item readily accessible next to it can be a big timesaver. If you use Emacs on MacOS, you can link to individual emails in Visiting the mail link (C-c C-o) will open the email in Copying the currently selected email’s link can be a little tricky. I made a minor tweak to an AppleScript snippet that I found online:

tell application "Mail"
    set _sel to get selection
    set _links to {}
    repeat with _msg in _sel
        set _messageURL to "[[message://%3c" & _msg's message id & "%3e][email]]"
        set end of _links to _messageURL
    end repeat
    set AppleScript's text item delimiters to return
    set the clipboard to (_links as string)
end tell

I now use it with Alfred [I've assigned it a keyboard shortcut - cpm] to quickly copy a link to my currently selected email for use in my todo list.

email links in the todo list

Extracting audio from 3gp files using ffmpeg

Here is how to extract audio from a 3gp video file recorded on an Android phone (this was tested on a video file recorded on a Google Nexus One running Gingerbread).

First find out what audio format is present in the file:

ffmpeg -i VID_20110518_184415.3gp
Stream #0.0(eng): Audio: aac, 16000 Hz, mono, s16, 96 kb/s

Turns out, the audio is encoded as AAC. Here’s what can be done next:

1. Don’t transcode the audio, just extract the audio track as it is:

ffmpeg -i VID_20110518_184415.3gp -vn -acodec copy clarinet.aac

2. Extract audio and transcode it to mp3 at 64kbps:

ffmpeg -i VID_20110518_184415.3gp -vn -acodec libmp3lame -ab 64k clarinet.mp3

3. Extract audio and transcode it to ogg at medium quality:

ffmpeg -i VID_20110518_184415.3gp -vn -acodec libvorbis -aq 50 clarinet.ogg

This can be handy if you want to embed the file using the new HTML5 audio tag. e.g.

<audio controls>
  <source src="clarinet.ogg"/>
  <source src="clarinet.mp3"/>
</audio>

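If you have many clips to process, the invocations above are easy to script. Here is a small Python sketch that builds the same ffmpeg command lines (the filenames are just the examples from this post; actually running the command assumes ffmpeg is on your PATH):

```python
import subprocess

def ffmpeg_audio_cmd(src, dest, codec_args):
    """Build an ffmpeg command that drops the video (-vn) and writes the audio track."""
    return ["ffmpeg", "-i", src, "-vn", *codec_args, dest]

# 1. copy the AAC track as-is
cmd = ffmpeg_audio_cmd("VID_20110518_184415.3gp", "clarinet.aac", ["-acodec", "copy"])
print(" ".join(cmd))

# to actually run it:
# subprocess.run(cmd, check=True)
```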
With a lot of help from:

Deploying a Perl Dancer application on Amazon EC2

Dancer is a new Perl web framework that I’ve been playing with since April. I finally got some time to build a small application and take it live. I chose Amazon’s EC2 for deployment because, in addition to Dancer, that’s another area I had been wanting to explore, and with their (then) recently introduced free usage tier, there wasn’t much to lose. Here are some details on how the app was built and deployed:

Dancer: the app pulls out some interesting stats about the people you follow on Twitter. I use Net::Twitter::Lite to talk to Twitter. I wrote a small class to analyze the data I get from Twitter, which keeps my routes clean. I use Template Toolkit for, well, templating. There is a ‘lite’ version of Template Toolkit that ships with Dancer by default, but since I’ve been a TT user (dare I add, a power user) for a while now, I went with the real thing.

I initially ran my app using Perl as:

perl bin/

I tried enabling “auto reloading” of my module so that any changes to it are immediately available without restarting the app, but for some reason I couldn’t get it to work consistently on MacOS. A quick note to the Dancer mailing list revealed an alternative solution – using Plack with the ‘Shotgun’ loader. The latter reloads your entire app for each request – a bit like CGI. If you are using modules that tend to have a long start-up time (like Moose), you can also tell Plack not to reload them every time:

plackup -L Shotgun -p 3000 bin/

To prevent certain modules from being reloaded:

plackup -MMoose -MDBIx::Class -L Shotgun bin/

This post from the 2009 Plack advent calendar has more details.

Deploying on EC2:

The biggest problem I had starting with EC2 was the documentation. Amazon overwhelms you with a lot of TLAs, and their circuitous documentation makes going in circles seem like walking in a straight line (as the diameter of the circle tends to infinity this is indeed how it will feel, but I digress). That said, I came across their getting started guide which, along with the new web-based management console, made things a breeze.

The next big hurdle was picking the right distro of Linux to deploy on. I have more experience running Debian/Ubuntu in production than any other distro. Canonical’s 10.04 LTS Server was my first choice. While setting it up was a breeze, logging into it for the first time informed me that I had some pending updates. Installing those updates led me to a point in Grub configuration where the machine just froze. I didn’t take things any further.

To keep things simple I decided to go with the default Amazon 64-bit Linux instance. Now I would’ve loved to get hold of their custom Linux build just to replicate the production environment on my machine, but Amazon doesn’t give it out. It looks like it is a Fedora derivative (there are fingerprints all over the place – e.g. in the welcome page you get when you install nginx) so one could run Fedora and get quite close to the Amazon-provided Linux instance.

The Amazon Linux image comes with Perl 5.10.1. My first step was to install the ‘Development Tools’ bundle so that I could build things from source if needed.

sudo yum groupinstall 'Development Tools'

I then installed cpanminus (App::cpanminus), which is my preferred tool for installing things from CPAN.

sudo cpan App::cpanminus

I then used it to get Dancer:

sudo cpanm Dancer

followed by other CPAN dependencies my app had.

At this stage I opened port 80 through the AWS EC2 console and tested my app to make sure it was running fine and was accessible over the internet using the temporary Amazon supplied domain name. I then got an Elastic IP and tied it to my running machine instance. I also went to my domain registrar (Dreamhost) and pointed my domain’s A record to this IP.

My next step was to install the Starman web-server under which my application is deployed.

sudo cpanm starman

I ran my app again – this time under Starman and checked it over the internet to make sure that everything was fine so far:

sudo /usr/local/bin/plackup -s Starman -p 80 -E deployment --workers=10 \
 -a /home/apps/TwitterToys/bin/

Satisfied, I moved on to the next big step – installing and configuring nginx. While ideally I would’ve loved to install the latest 0.8.x branch of nginx, it wasn’t available out of the box on Amazon’s Linux image. Indeed, even the most recent Linux distros (Ubuntu 10.04 Server or Debian Lenny/Squeeze) seem to give it a miss. While nginx compiles from source on most distros without problems, keeping it updated, patched and running can be daunting. So I settled for the default 0.7.67 install via yum:

sudo yum install nginx

I use nginx to do all the HTTP related stuff like gzipping content, serving static files, adding the right expires header and so on. It also acts as a caching proxy in my setup. Dancer running under Starman/Plack does everything else.

The following lines set up gzip response compression and a caching zone:

http {
    gzip on;
    gzip_min_length 1024;

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=twitter:8m
                     max_size=64m inactive=60m;
    proxy_temp_path /tmp;
    proxy_cache_key "$scheme://$host$request_uri";
    proxy_cache_valid 200 60m;

I then set up the server to proxy requests to my running Dancer instance:

  server {
        listen       80;

        location / {
            proxy_cache twitter;

            # bypass the cache for this path so that I can check my API usage
            set $do_not_cache 0;
            if ( $request_uri ~ "^/xyz$" ) {
                set $do_not_cache 1;
            }
            proxy_no_cache $do_not_cache;

            proxy_redirect http://$host/;
            expires 1h;
        }
  }

I also set up another location block within the same server block to let nginx serve all the images directly:

        location /images/ {
            alias /home/apps/TwitterToys/public/images/;
            expires 30d;
            access_log off;
        }

I fired up Starman again – this time on port 5001 and bound to the loopback interface – and tested nginx from the internet to make sure that everything was working fine. I did run into a problem with serving static content. A look at the error log (/var/log/nginx/error.log) showed that the nginx worker process was running into a permission issue reading the files:

2011/01/02 07:01:39 [error] 3781#0: *1 open()
"/home/apps/TwitterToys/public/images/logo.png" failed
(13: Permission denied), client:, server: _,
request: "GET /images/logo.png?x=1 HTTP/1.1", host:

I gave ‘others’ read and execute permissions on the /home/apps/ folder so that the nginx worker process could get in and read the files, and the ‘13: Permission denied’ errors went away.

sudo chmod -R o+rx /home/apps

This brought me to the last big task: configuring my Dancer application to run as a daemon so that it runs in the background and comes up when the OS boots. I chose Daemontools for this. Unfortunately Daemontools was not available on Amazon’s Linux image via yum (it is available on Ubuntu 10.04 via the default repositories using apt-get), so I decided to pull the source and build it myself. Here, I ran into another wall – the compilation would stop with a vague reference to errno.h. After some tense moments of frantically searching the internet, I found that I had to modify error.h in the src/ directory of the daemontools source distribution to something like this:

/* extern int errno; */
#include <errno.h>

The compilation and subsequent installation went fine. I restarted my EC2 instance to make sure that ‘svscan’ came up after the reboot. Things were much simpler from this point on. All I had to do was create a folder for my daemon (I called it TwitterToys) under /service and place a shell script called ‘run’ in it with the execute bit set:

#!/bin/sh
export PERL5LIB='/home/apps/TwitterToys/lib'
exec 2>&1 \
/usr/local/bin/plackup -s Starman -l -E deployment \
--workers=10 -a /home/apps/TwitterToys/bin/

Within moments my new and shiny daemon came up. I did another reboot to make sure that it does indeed come up as expected. By this time my DNS changes had propagated, and pointing my browser at the domain brought up my site.

My last steps before announcing it to the world were:

1. to add a CNAME record for ‘www’ pointing to the domain, so that people who prefix URLs with www also make it to my site.

2. to add the following server block to my nginx configuration so that the www subdomain redirects to the bare domain:

    server {
	rewrite ^(.*)$1 permanent;
    }

There you have it! A site running on Perl and Dancer from start to finish.

Unicode characters on MacOS and Linux filesystems

The MacOS filesystem stores Unicode characters in their decomposed form. For example, it stores é as two code points: e (plain vanilla ASCII e) + ´ (combining acute accent). Kiran had written a blog post about this so I won’t go into details, but having recently discovered the wonderful charnames Perl pragma, I was curious to find out how it could help me ‘see’ which form (precomposed or decomposed) of a character an OS uses. Given two folders – café and müller – in /tmp/test/, here is the Perl script I used to print the Unicode name of every code point in their names:

use strict;
use charnames ':full';

opendir my ($dir), "/tmp/test" or die "cannot open /tmp/test: $!";
my @stuff = readdir($dir);
closedir $dir;

foreach my $stuff (@stuff) {
	next if $stuff =~ /^\./;
	my @parts = unpack("U*", $stuff);
	foreach my $part (@parts) {
		print charnames::viacode($part);
		print "\n";
	}
	print "\n\n";
}




On a Mac, the script prints é as two code points (latin small letter e followed by combining acute accent). The same script when run on a Linux box prints é as one code point (latin small letter e with acute). Ditto for ü (latin small letter u with diaeresis on Linux, as opposed to latin small letter u + combining diaeresis on Mac).
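The same experiment translates readily to Python’s unicodedata module (a rough equivalent of the Perl script above, not a literal port):

```python
import unicodedata

# precomposed form (one code point for é), as Linux stores it
precomposed = "caf\u00e9"
# decomposed form (e + combining accent), as the Mac filesystem stores it
decomposed = unicodedata.normalize("NFD", precomposed)

for s in (precomposed, decomposed):
    print(", ".join(unicodedata.name(ch) for ch in s))
```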

charnames can be a very useful tool in your toolbox. There is more on what it can do in perldoc charnames.

p.s. Eric Sink mentions this (the difference between the way different filesystems store certain Unicode characters) in his wonderful cross-platform version control post too. See point #9.

Hacking Chrome Developer Tools Protocol for fun and profit

I recently came across a tip on Hacker News which, along with Firefox, the MozRepl plugin and this snippet:

autocmd BufWriteCmd *.html,*.css,*.gtpl :call Refresh_firefox()
function! Refresh_firefox()
  if &modified
    silent !echo  'vimYo = content.window.pageYOffset;
                 \ vimXo = content.window.pageXOffset;
                 \ BrowserReload();
                 \ content.window.scrollTo(vimXo,vimYo);
                 \ repl.quit();'  |
                 \ nc localhost 4242 2>&1 > /dev/null
  endif
endfunction
allows you to refresh a tab in Firefox the moment you save your edits in Vim. No longer do you need to switch to Firefox, hit refresh to see the edits and then come back to your work. This is especially useful when you are on a multi-monitor setup.

My primary browser these days is Google Chrome. I was wondering if such a thing would be possible with Chrome too. It so happens it is. If you start Chrome with the --remote-shell-port=9222 switch (on my Mac I pass it to the Google Chrome binary inside the app bundle), you can connect to it using a TCP socket over port 9222 and then issue it commands using the Chrome Dev Tools Protocol.

I wrote a small Perl wrapper around the protocol and then wrote another simple script that simply refreshes the last open tab in your browser:

use strict;
use ChromeTool;

my $chrome = ChromeTool->new;
if($chrome) {
    my $tabs = $chrome->tabs;
    if(scalar(@$tabs) > 0) {
        # ask Chrome to refresh the last open tab here
    }
}
I modified the original Vim snippet mentioned above so that each time I save my code, both the browsers get auto-refreshed:

autocmd BufWriteCmd *.html,*.htm,*.css,*.gtpl,*.tt2 :call Refresh_firefox()
function! Refresh_firefox()
  if &modified
    silent !echo  'vimYo = content.window.pageYOffset;
                 \ vimXo = content.window.pageXOffset;
                 \ BrowserReload();
                 \ content.window.scrollTo(vimXo,vimYo);
                 \ repl.quit();'  |
                 \ nc localhost 4242 2>&1 > /dev/null

    silent !perl -I/Users/deepakg/Projects /Users/deepakg/Projects/
  endif
endfunction

Of course, you’ll need to change /Users/deepakg/Projects to wherever you saved the wrapper and the script. Here is a quick screencast showing this in action:

Credits: I learned about the existence of the Chrome Developer Tools Protocol from this post. You can also find a link to a Ruby REPL client for talking to Chrome on the same page.

Nginx and embedded Perl

Nginx ships with support for embedded Perl. At the moment execution of Perl code blocks the nginx worker process and therefore anything that might take an indeterminate amount of time to finish (say a DB query) is discouraged.

That said, there are scenarios where it could come in very handy – like redirecting users to browser-specific static content or generating CAPTCHAs – and given Perl’s versatility I am sure there are several others.

Unlike Apache, where you can load mod_perl as a module, the embedded Perl support in nginx has to be “baked in” at compile time.

Assuming you downloaded nginx-0.7.65, here is how you’ll build it with Perl support:

cd nginx-0.7.65
./configure --with-http_perl_module

Things should go smoothly from here, but you might get the following error:

	objs/ngx_modules.o \
	-lcrypt -lpcre -lcrypto -lz \
	-Wl,-E -L/usr/local/lib -L/usr/lib/perl/5.10/CORE -lperl -ldl -lm \
        -lpthread -lc -lcrypt
/usr/bin/ld: cannot find -lperl
collect2: ld returned 1 exit status
make[1]: *** [objs/nginx] Error 1
make[1]: Leaving directory `/home/deepakg/nginx-0.7.65'
make: *** [build] Error 2

To fix it, create a symbolic link,, that points to the version of libperl installed on your system:

cd /usr/lib
sudo ln -s

You might need to adjust the path to match the version of Perl installed on your system. The compilation should now go smoothly. From here you can follow the usage examples on the nginx Wiki.

Ruby 1.9 vs MacRuby – string handling

Ruby 1.9, among other things, brings much-needed improvements to the way unicode strings are handled. The string class now includes a property called encoding which tells us the – well – encoding of a given string. By default a string’s encoding is the same as the encoding of the source file, which in turn can be set using the coding comment. For example, to use UTF-8 as the source’s encoding (and to be able to use UTF-8 characters in string literals) you’d use: # -*- coding: utf-8 -*-

Let’s look at some sample code and its output under Ruby 1.9


# -*- coding: utf-8 -*-
str = "café"
puts "Encoding    : #{str.encoding}"
puts "Length      : #{str.length}"
puts "Byte Size   : #{str.bytesize}"
puts "#{str} in upper case is: #{str.upcase}"


Encoding    : UTF-8
Length      : 4
Byte Size   : 5
café in upper case is: CAFé

As is evident from the output above, Ruby 1.9 still doesn’t handle casing beyond the ASCII range. Upper casing café, gave us CAFé as opposed to CAFÉ (which is the correct response).

Also, the byte size of the string is 5 because under the utf-8 encoding, é takes up two bytes – 0xC3, 0xA9.
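For comparison, Python 3 (which also treats strings as sequences of code points) reports the same length and byte size but handles the casing correctly:

```python
s = "café"
print(len(s))                   # 4 code points
print(len(s.encode("utf-8")))   # 5 bytes: é encodes as 0xC3 0xA9
print(s.upper())                # CAFÉ - casing works beyond ASCII here
```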

MacRuby – to quote the project site – is a version of Ruby 1.9, ported to run directly on top of Mac OS X core technologies such as the Objective-C common runtime and garbage collector, and the CoreFoundation framework.

This means that the Ruby datatypes have been implemented on top of Mac “native” (Cocoa) datatypes – e.g. Ruby strings are implemented on top of NSString.

This introduces some differences in the way strings are handled by MacRuby. To start with, non-unicode strings use the ‘MACINTOSH’ encoding (the Ruby 1.9 default is US-ASCII) while unicode strings use utf-16 (even if you’ve set the coding comment to use utf-8). MacRuby also handles casing correctly.

So the same code snippet as above gives different output:


Encoding    : UTF-16
Length      : 4
Byte Size   : 4
café in upper case is: CAFÉ

Note that casing is handled correctly by MacRuby.

The byte size for the string is 4 because é is reported as the single byte 0xE9 under the utf-16 encoding. Technically most characters, when using the utf-16 encoding, should take up 2 bytes (c should be 0x0063, é should be 0x00E9 and so on) but I guess the most significant byte is not counted if it is 0x00.
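Python’s codecs can be used to check the ‘2 bytes per character’ expectation. A BOM-less little-endian utf-16 encoding of the same string does come out at 8 bytes:

```python
s = "café"
utf16 = s.encode("utf-16-le")  # little-endian, no byte order mark
print(len(utf16))              # 8 bytes, two per character
print(utf16.hex())             # 630061006600e900 - every other byte is 0x00
```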

p.s. the versions of the products used in the examples above are:

1. Ruby – 1.9.1p376 (2009-12-07 revision 26041) [i386-darwin10] (installed via macports 1.8.2)

2. MacRuby version 0.5 (ruby 1.9.0) [universal-darwin10.0, x86_64] (binary distribution from the official MacRuby project site)

Number-only Captcha

I was recently looking for a Perl Captcha implementation that’d generate number-only Captchas. I looked inside Authen::Captcha and figured I could subclass it to do just numeric captchas. It turned out to be a lot simpler than I thought it would be. Here is the code:

package Authen::NumCaptcha;

use strict;
use base qw(Authen::Captcha);

sub new {
    my $class = shift;
    my $captcha = $class->SUPER::new(@_);
    return bless $captcha, $class;
}

sub generate_random_string {
    my $self = shift;
    my $length = shift;

    my $code = "";
    my $char;

    for (my $i = 0; $i < $length; $i++) {
        $char = int(rand 8) + 50;   # ASCII 50 to 57, i.e. digits 2 to 9
        $char = chr($char);
        $code .= $char;
    }

    return $code;
}

1;

We basically override Authen::Captcha's generate_random_string and only use ASCII characters 50 to 57 (digits 2 to 9). Authen::Captcha doesn't include the digits 0 and 1 because they could be confused with oh (O) and lower case elle (l) or upper case ai (I).

To use Authen::NumCaptcha, place it under /usr/share/perl5/Authen/ (or wherever you have Authen::Captcha installed) alongside Then use it as you'd use Authen::Captcha. For example:

use Authen::NumCaptcha;

my $num_chars = 6;
my $num_captcha = Authen::NumCaptcha->new(
                                          data_folder   => '/tmp',
                                          output_folder => '/tmp'
                                         );
my $num_md5sum = $num_captcha->generate_code($num_chars);
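The logic of the overridden generate_random_string is simple enough to sketch in Python (this is an illustration of the idea, not part of the module):

```python
import random

def generate_random_digits(length):
    # ASCII codes 50-57 are the digits 2 to 9; 0 and 1 are skipped
    # because they can be confused with O, l and I
    return "".join(chr(random.randrange(50, 58)) for _ in range(length))

code = generate_random_digits(6)
print(code)  # e.g. "837429"
```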

Installing VMWare tools on Debian Lenny/Etch

If you are building an X-less command-line-only Debian VM, here is what you’ll need to do in order to install VMWare tools so that you can use features like shared folders:

1. Log in as root or use su.
2. Make sure you have the kernel headers and build tools installed.

apt-get install build-essential

Then run:

apt-get install linux-headers

This will probably give you a message saying: “Package linux-headers is a virtual package provided by:”, followed by a list of available kernel versions. Choose your version by looking it up:

cat /proc/version

or
uname -a

e.g. in my case I chose:

apt-get install linux-headers-2.6-2-amd64

3. Next, choose the Virtual Machine -> Install VMWare Tools option from the menu (this assumes VMWare Fusion running on a Mac, but there should be a similar option in VMWare Workstation on Linux and Windows as well).

Now mount the VMWare tools virtual CD:

mount /cdrom
cd /cdrom

Copy VMWare tool source code to /tmp and extract the files (the actual filename will depend on your VMWare version [which the VMWare installer would have reported in case of a mismatch]):

cp VMWareTools-7.9.6-173382.tar.gz /tmp
cd /tmp
tar -xzvf VMWareTools-7.9.6-173382.tar.gz

At this stage you should have a folder called vmware-tools-distrib inside your /tmp folder. Go into this folder, run, and then follow the on-screen instructions.

cd /tmp/vmware-tools-distrib
./

At one stage the VMWare tools installer complained that my installed gcc version (4.3) was different from the gcc version used to compile my kernel (4.1). Press Ctrl+C to abort the installer if this happens.

Turns out that gcc is merely a symbolic link in the /usr/bin folder. Chances are that you’ll have two versions of gcc on your system (e.g. Lenny comes with /usr/bin/gcc-4.3 and /usr/bin/gcc-4.1). Just make the symlink point to the version of gcc that was used to compile your kernel:

cd /usr/bin
rm gcc
ln -s /usr/bin/gcc-4.1 gcc

Run again, and this time things should go through. Restart your VM to finish the installation. You should now be able to see folders shared from the host machine under /mnt/hgfs.

Most of these steps should also apply to Ubuntu.