Deploying a Perl Dancer application on Amazon EC2

Dancer is a new Perl web framework that I’ve been playing with since April. I finally got some time to build a small application and take it live. I chose Amazon’s EC2 for deployment because it was another area I had been wanting to explore, and with their (then) recently introduced free usage tier there wasn’t much to lose. Here are some details on how the app was built and deployed:

Dancer:

stupidtwitterstats.com pulls out some interesting stats about the people you follow on Twitter. I use Net::Twitter::Lite to talk to Twitter. I wrote a small class to analyze the data I get from Twitter, which keeps my route clean. I use Template Toolkit for, well, templating. There is a ‘lite’ version of Template Toolkit which comes as default with Dancer, but since I’ve been a TT user (dare I add a power user) for a while now, I went with the real thing.

I initially ran my app using Perl as:

perl bin/app.pl

I tried enabling “auto reloading” of my module so that any changes to it are immediately available without restarting the app, but for some reason I couldn’t get it to work consistently on MacOS. A quick note to the Dancer mailing list revealed an alternate solution: Plack with the ‘Shotgun’ loader. It reloads your entire app for each request, a bit like CGI:

plackup -L Shotgun -p 3000 bin/app.pl

If you are using modules that tend to have a long start-up time (like Moose), you can preload them with -M so that they are not reloaded on every request:

plackup -MMoose -MDBIx::Class -L Shotgun bin/app.pl

This post from the 2009 plack advent calendars has more details.

Deploying on EC2:

The biggest problem I had starting with EC2 was documentation. Amazon overwhelms you with a lot of TLAs and their circuitous documentation makes going in circles seem like walking in a straight line (as the diameter of the circle tends to infinity this is indeed how it will feel, but I digress). That said, I came across their getting started guide which, along with the new web-based management console, made things a breeze.

The next big hurdle was picking the right distro of Linux to deploy on. I have more experience running Debian/Ubuntu in production than any other distro. Canonical’s 10.04 LTS Server was my first choice. While setting it up was a breeze, logging into it for the first time informed me that I had some pending updates. Installing those updates led me to a point in Grub configuration where the machine just froze. I didn’t take things any further.

To keep things simple I decided to go with the default Amazon 64-bit Linux instance. Now I would’ve loved to get hold of their custom Linux build just to replicate the production environment on my machine, but Amazon doesn’t give it out. It looks like it is a Fedora derivative (there are fingerprints all over the place, e.g. in the welcome page you get when you install nginx), so one could run Fedora and get quite close to the Amazon-provided Linux instance.

The Amazon Linux image comes with Perl 5.10.1. My first step was to install the ‘Development Tools’ bundle so that I could build things from source if needed.

sudo yum groupinstall 'Development Tools'

I then installed CPAN Minus (App::cpanminus), which is my preferred tool for installing things off CPAN.

sudo cpan App::cpanminus

I then used it to get Dancer:

sudo cpanm Dancer

followed by other CPAN dependencies my app had.

At this stage I opened port 80 through the AWS EC2 console and tested my app to make sure it was running fine and was accessible over the internet using the temporary Amazon supplied domain name. I then got an Elastic IP and tied it to my running machine instance. I also went to my domain registrar (Dreamhost) and pointed my domain’s A record to this IP.

My next step was to install the Starman web-server under which my application is deployed.

sudo cpanm Starman

I ran my app again – this time under Starman and checked it over the internet to make sure that everything was fine so far:

sudo /usr/local/bin/plackup -s Starman -p 80 -E deployment --workers=10 \
 -a /home/apps/TwitterToys/bin/app.pl

Satisfied, I moved on to the next big step – installing and configuring nginx. While ideally I would’ve loved to install the latest 0.8.x branch of nginx, it wasn’t available out of the box on Amazon’s Linux image. Indeed, even the most recent Linux distros (Ubuntu 10.04 Server or Debian Lenny/Squeeze) seem to give it a miss. While nginx compiles from source on most distros without problems, keeping it updated, patched and running can be daunting. So I settled for the default 0.7.67 install via yum:

sudo yum install nginx

I use nginx to do all the HTTP related stuff like gzipping content, serving static files, adding the right expires header and so on. It also acts as a caching proxy in my setup. Dancer running under Starman/Plack does everything else.

The following lines set up gzip response compression and a caching zone:

http {
    ..
    ..
    ..
    gzip on;
    gzip_min_length 1024;

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=twitter:8m
                     max_size=64m inactive=60m;
    proxy_temp_path /tmp;
    proxy_cache_key "$scheme://$host$request_uri";
    proxy_cache_valid 200 60m;
	..
	..
	..
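
To see where a cached response ends up on disk, here is a minimal sketch in Perl. It relies on nginx naming each cache file after the MD5 hash of the cache key and, with levels=1:2, filing it under a directory named after the last hex character and then one named after the two characters before it. The function name cache_file_for is made up for this illustration, and the key is just an example of what "$scheme://$host$request_uri" would expand to.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compute where nginx would store a cached response for a given cache key,
# assuming levels=1:2 under the given cache root (hypothetical helper).
sub cache_file_for {
    my ($cache_root, $key) = @_;
    my $hash   = md5_hex($key);          # 32 hex characters
    my $level1 = substr $hash, -1;       # last character
    my $level2 = substr $hash, -3, 2;    # the two characters before it
    return "$cache_root/$level1/$level2/$hash";
}

# The key mirrors proxy_cache_key "$scheme://$host$request_uri".
print cache_file_for("/var/cache/nginx", "http://stupidtwitterstats.com/"), "\n";
```

This can be handy when you want to find or delete the cached copy of a single URL by hand.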

I then set up the server to proxy the requests to my running Dancer instance:

  server {
        server_name stupidtwitterstats.com;
        listen       80;

        location / {
            proxy_cache twitter;

            #bypass cache for this path so that I can check my API usage
            set $do_not_cache 0;

            if ( $request_uri ~ "^/xyz$" ) {
                 set $do_not_cache 1;
            }
            proxy_no_cache $do_not_cache;
            proxy_pass http://127.0.0.1:5001;
            proxy_redirect http://127.0.0.1:5001/ http://$host/;
            expires 1h;
        }

I also set up another location block within the same server block to let nginx serve all the images directly:

        location /images/ {
            alias /home/apps/TwitterToys/public/images/;
            expires 30d;
            access_log off;
        }

I fired up Starman again, this time on port 5001 and bound to 127.0.0.1, and tested nginx from the internet to make sure that everything was working fine. I did run into a problem with serving static content. A look at the error log (/var/log/nginx/error.log) showed that the nginx worker process was running into a permission issue reading the files:

2011/01/02 07:01:39 [error] 3781#0: *1 open()
"/home/apps/TwitterToys/public/images/logo.png" failed
(13: Permission denied), client: 122.167.81.253, server: _,
request: "GET /images/logo.png?x=1 HTTP/1.1", host:

I gave ‘others’ read and execute permissions on the /home/apps/ folder so that the nginx worker process could get in and read the files, and the ‘13: Permission denied’ errors went away:

sudo chmod -R o+rx /home/apps
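
If you would rather find out which part of the path is actually blocking nginx before changing permissions, something along these lines might help. It is only a sketch: it assumes the nginx worker is neither the owner nor in the group of the files (so only the ‘others’ bits matter), and blocked_for_others is a name I made up for it.

```perl
use strict;
use warnings;

# List every component of an absolute path that "others" cannot use:
# directories need the o+x bit to be traversed, the final file needs o+r.
sub blocked_for_others {
    my ($path) = @_;
    my @blocked;
    my $current = "";
    for my $part (grep { length } split m{/}, $path) {
        $current .= "/$part";
        my @stat = stat $current or next;   # skip components we cannot stat
        my $needed = -d _ ? 0001 : 0004;    # o+x for dirs, o+r for files
        push @blocked, $current unless $stat[2] & $needed;
    }
    return @blocked;
}

my $file = "/home/apps/TwitterToys/public/images/logo.png";
print "not accessible to others: $_\n" for blocked_for_others($file);
```

Anything it prints is a candidate for a narrower chmod than the recursive one above.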

This brought me to the last big task: configuring my Dancer application to run as a daemon so that it runs in the background and comes up when the OS next boots. I chose Daemontools for this. Unfortunately Daemontools was not available on Amazon’s Linux image via yum (it is available on Ubuntu 10.04 via the default repositories using apt-get), so I decided to pull the source and build it. Here I ran into another wall: the compilation would stop with a vague reference to errno.h. After some tense moments of frantically searching the internet, I found that I had to modify error.h in the src/ directory of the daemontools source distribution, commenting out the extern declaration and including errno.h instead:

/* extern int errno; */
#include <errno.h>

The compilation and subsequent installation then went fine. I restarted my EC2 instance to make sure that ‘svscan’ came up after the reboot. Things were much simpler from this point on. All I had to do was create a folder for my daemon (I called it TwitterToys) under /service and place a shell script called ‘run’ in it, with the execute bit set:

#!/bin/sh

export PERL5LIB='/home/apps/TwitterToys/lib'
exec 2>&1 \
/usr/local/bin/plackup -s Starman -l 127.0.0.1:5001 -E deployment \
--workers=10 -a /home/apps/TwitterToys/bin/app.pl

Within moments my new and shiny daemon came up. I did another reboot to make sure that it does indeed come up as expected. By this time my DNS changes had propagated to my ISP, and pointing my browser at http://stupidtwitterstats.com brought up my site.

My last steps before announcing it to the world were:

1. to add a CNAME record for ‘www’ pointing to stupidtwitterstats.com so that people who prefix URLs with www still make it to my site.

2. to add the following server block to my nginx configuration so that www.stupidtwitterstats.com redirects to stupidtwitterstats.com:

    server {
        server_name www.stupidtwitterstats.com;
        rewrite ^(.*) http://stupidtwitterstats.com$1 permanent;
    }

There you have it! A site running on Perl and Dancer from start to finish.

Unicode characters on MacOS and Linux filesystems

The MacOS filesystem stores Unicode characters in their decomposed form. For example, it stores é as two code points: e (plain vanilla ASCII e) + ´ (combining acute accent). Kiran had written a blog post about this so I won’t go into details, but having recently discovered the wonderful charnames Perl pragma, I was curious to find out how it could help me ‘see’ which form (precomposed or decomposed) of a character an OS uses. Given two folders, café and müller, in /tmp/test/, the following Perl script:

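use strict;
use warnings;
use charnames ':full';

# Read the directory entries; bail out if the folder cannot be opened
opendir my ($dir), "/tmp/test" or die "Cannot open /tmp/test: $!";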
my @stuff = readdir($dir);
closedir($dir);

foreach my $stuff (@stuff) {
	next if $stuff =~ /^\./;
	utf8::decode($stuff);
	my @parts = unpack("U*", $stuff);
	foreach my $part (@parts) {
		print charnames::viacode($part);
		print "\n";
	}
	print "\n\n";
}

gives:

LATIN SMALL LETTER C
LATIN SMALL LETTER A
LATIN SMALL LETTER F
LATIN SMALL LETTER E
COMBINING ACUTE ACCENT

LATIN SMALL LETTER M
LATIN SMALL LETTER U
COMBINING DIAERESIS
LATIN SMALL LETTER L
LATIN SMALL LETTER L
LATIN SMALL LETTER E
LATIN SMALL LETTER R

The same script when run on a Linux box results in:

LATIN SMALL LETTER C
LATIN SMALL LETTER A
LATIN SMALL LETTER F
LATIN SMALL LETTER E WITH ACUTE

LATIN SMALL LETTER M
LATIN SMALL LETTER U WITH DIAERESIS
LATIN SMALL LETTER L
LATIN SMALL LETTER L
LATIN SMALL LETTER E
LATIN SMALL LETTER R

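Notice that é is stored as one code point (latin small letter e with acute). Ditto for ü (latin small letter u with diaeresis, as opposed to latin small letter u + combining diaeresis on Mac). Linux filesystems do not normalize names at all: they keep whatever bytes the creating program supplied, which here was the precomposed form.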
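
If you need names from both systems to compare equal, one option is to normalize them to a common form first. The sketch below uses the core Unicode::Normalize module to convert between the precomposed (NFC) and decomposed (NFD) forms and prints the code point names of each; the helper code_point_names is my own name, not part of charnames.

```perl
use strict;
use warnings;
use charnames ':full';
use Unicode::Normalize qw(NFC NFD);

# Return the Unicode character name of every code point in a string.
sub code_point_names {
    my ($string) = @_;
    return map { charnames::viacode(ord $_) } split //, $string;
}

# Build "café" with an escape so the example does not depend on source encoding.
my $precomposed = "caf\x{e9}";          # é as a single code point (NFC)
my $decomposed  = NFD($precomposed);    # e + combining acute accent (NFD)

print join(", ", code_point_names($precomposed)), "\n";
print join(", ", code_point_names($decomposed)), "\n";

# Normalizing both to the same form makes them compare equal again.
print "equal after NFC\n" if NFC($decomposed) eq $precomposed;
```

Applying NFC (or NFD) to file names read on either OS before comparing them avoids treating the two spellings of café as different folders.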

charnames can be a very useful tool in your toolbox. More on what it can do on perldoc: http://perldoc.perl.org/charnames.html

p.s. Eric Sink mentions this (the difference between the way different filesystems store certain Unicode characters) in his wonderful cross-platform version control post too. See point #9.

Hacking Chrome Developer Tools Protocol for fun and profit

I recently came across a tip on Hacker News which, along with Firefox, the MozRepl plugin and this snippet:

autocmd BufWriteCmd *.html,*.css,*.gtpl :call Refresh_firefox()
function! Refresh_firefox()
  if &modified
    write
    silent !echo  'vimYo = content.window.pageYOffset;
                 \ vimXo = content.window.pageXOffset;
                 \ BrowserReload();
                 \ content.window.scrollTo(vimXo,vimYo);
                 \ repl.quit();'  |
                 \ nc localhost 4242 > /dev/null 2>&1
  endif
endfunction

allows you to refresh a tab in Firefox the moment you save your edits in Vim. No longer do you need to switch to Firefox, hit refresh to see the edits and then come back to your work. This is especially useful when you are on a multi-monitor setup.

My primary browser these days is Google Chrome. I was wondering if such a thing would be possible with Chrome too. It so happens it is. If you start Chrome with the --remote-shell-port=9222 switch (on my Mac I do it like this: ~/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-shell-port=9222), you can connect to it using a TCP socket on port 9222 and then issue it commands using the Chrome Developer Tools Protocol.

I wrote a small Perl wrapper around the protocol and then another script that simply refreshes the last open tab in your browser:

#refresh.pl

use strict;
use ChromeTool;

my $chrome = ChromeTool->new;
if($chrome) {
    my $tabs = $chrome->tabs;
    if(scalar(@$tabs) > 0) {
        $chrome->eval($tabs->[-1]->[0],
                                 "javascript:window.location.reload()");
    }
}

I modified the original Vim snippet mentioned above so that each time I save my code, both browsers get auto-refreshed:

autocmd BufWriteCmd *.html,*.htm,*.css,*.gtpl,*.tt2 :call Refresh_firefox()
function! Refresh_firefox()
  if &modified
    write
    silent !echo  'vimYo = content.window.pageYOffset;
                 \ vimXo = content.window.pageXOffset;
                 \ BrowserReload();
                 \ content.window.scrollTo(vimXo,vimYo);
                 \ repl.quit();'  |
                 \ nc localhost 4242 > /dev/null 2>&1

    silent !perl -I/Users/deepakg/Projects /Users/deepakg/Projects/refresh.pl
  endif
endfunction

Of course, you’ll need to change /Users/deepakg/Projects to where you save the ChromeTool.pm file and refresh.pl. Here is a quick screencast showing this in action:

Credits: I learned about the existence of the Chrome Developer Tools Protocol from this post. You can also find a link to a Ruby REPL client for talking to Chrome on the same page.

Number-only Captcha

I was recently looking for a Perl Captcha implementation that’d generate number-only Captchas. I looked inside Authen::Captcha and figured I could subclass it to do just numeric captchas. It turned out to be a lot simpler than I thought it would be. Here is the code:

package Authen::NumCaptcha;

use strict;
use base qw(Authen::Captcha);

sub new {
    my $class = shift;
    my $captcha = $class->SUPER::new( @_ );
    return bless $captcha, $class;
}

sub generate_random_string {
    my $self = shift;
    my $length = shift;

    my $code = "";
    my $char;

    for (my $i=0; $i < $length; $i++) {
        $char = int(rand 7)+50;
        $char = chr($char);
        $code .= $char;
    }

    return $code;
}

1;

We basically override Authen::Captcha's generate_random_string and only use digits. Note that int(rand 7) yields 0 to 6, so the code above picks ASCII characters 50 to 56 (digits 2 to 8); use int(rand 8) if you also want 9. Authen::Captcha doesn't include the digits 0 and 1 because they could be confused with oh (O) and lower case elle (l) or upper case ai (I).
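
To check which digits a given range can produce without relying on randomness, here is a small sketch. int(rand $n) returns an integer from 0 to $n - 1, so adding 50 can only give the character codes 50 to 50 + $n - 1. The helper possible_digits is made up for this illustration and is not part of Authen::Captcha.

```perl
use strict;
use warnings;

# Enumerate every character that chr(int(rand $n) + 50) could return.
# 50 is the ASCII code of "2", so the list starts there.
sub possible_digits {
    my ($n) = @_;
    return join "", map { chr(50 + $_) } 0 .. $n - 1;
}

print "rand 7: ", possible_digits(7), "\n";   # the range used in generate_random_string
print "rand 8: ", possible_digits(8), "\n";   # one wider, which adds "9"
```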

To use Authen::NumCaptcha, place it under /usr/share/perl5/Authen/ (or wherever you have Authen::Captcha installed) alongside Captcha.pm. Then use it as you'd use Authen::Captcha. For example:

use Authen::NumCaptcha;

my $num_chars = 6;
my $num_captcha = Authen::NumCaptcha->new (
                                          data_folder => '/tmp',
                                          output_folder => '/tmp'
                                          );

my $num_md5sum = $num_captcha->generate_code($num_chars);

PerlAuthenHandler

Among other things, mod_perl allows you to write authentication handlers that fit nicely into Apache’s authentication scheme. A Perl authentication handler is a Perl module that decides whether a given user gets access to a resource or not. How you determine whether the user is allowed or denied access is up to you.

The mod_perl documentation comes with a sample that joins the username and password supplied by a user with a space and allows the user access only if the resulting string is exactly 14 characters long. Here is what the code looks like:

  package MyApache2::SecretLengthAuth;

  use strict;
  use warnings;

  use Apache2::Access ();
  use Apache2::RequestUtil ();

  use Apache2::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

  use constant SECRET_LENGTH => 14;

  sub handler {
      my $r = shift;

      my ($status, $password) = $r->get_basic_auth_pw;
      return $status unless $status == Apache2::Const::OK;

      return Apache2::Const::OK
          if SECRET_LENGTH == length join " ", $r->user, $password;

      $r->note_basic_auth_failure;
      return Apache2::Const::HTTP_UNAUTHORIZED;
  }

  1;

And you hook this handler up in your apache2.conf as:

  <Location / >
      PerlAuthenHandler MyApache2::SecretLengthAuth
      AuthType Basic
      AuthName "The Gate"
      Require valid-user
  </Location>

As the documentation points out, the authentication handler can be configured for any subsection of the site; it doesn’t matter whether that section is served by a mod_perl response handler or not.
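
To make the length check in the sample handler concrete, here is the same comparison pulled out into a standalone sketch that runs without Apache. The function name passes_length_check and the two credential pairs are made up for illustration.

```perl
use strict;
use warnings;

use constant SECRET_LENGTH => 14;

# Same test as the handler: join user and password with one space and
# compare the total length against SECRET_LENGTH.
sub passes_length_check {
    my ($user, $password) = @_;
    return SECRET_LENGTH == length join " ", $user, $password;
}

# "mod_perl rules" is 8 + 1 + 5 = 14 characters; "guest secret" is 12.
for my $pair (["mod_perl", "rules"], ["guest", "secret"]) {
    my ($user, $password) = @$pair;
    my $verdict = passes_length_check($user, $password) ? "allowed" : "denied";
    print "$user: $verdict\n";
}
```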

Now obviously in real life, you’d want an authentication handler that does a little more, like verifying the password supplied by the user against a database or LDAP. Here is a version that I wrote that uses Perl’s Net::POP3 package to verify a user’s credentials against a POP3 server.

package Apache::PopAuth;

use strict;
use warnings;
use Net::POP3;

use Apache2::Access ();
use Apache2::RequestUtil ();
use Apache2::RequestRec ();
use Apache2::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

sub handler {
    my $r = shift;
    my ($status, $password) = $r->get_basic_auth_pw;
    my $user = $r->user;

    return $status unless $status == Apache2::Const::OK;

    return Apache2::Const::OK
	   if valid_user($user, $password);

    $r->note_basic_auth_failure;
    return Apache2::Const::HTTP_UNAUTHORIZED;
}

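sub valid_user {
    my($user, $password) = @_;

    # new() returns undef if the POP3 server cannot be reached
    my $pop = Net::POP3->new('pop.yourdomain.com')
        or return 0;

    # login() returns undef on failure and a true value ("0E0" for an
    # empty mailbox) on success
    my $status = $pop->login($user, $password);
    $pop->quit();
    return defined($status);
}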
1;

This saves you from having to maintain a separate set of users on your web server and passes on the baton of user management to the POP server. One word of caution: this is just proof-of-concept code and I am not sure how well it will scale outside a small intranet.

Install and enable Apache2::Request on Ubuntu Server 8.10

A mod_perl handler can parse the incoming client request (query string, form post data etc.) using Apache2::Request. It is *not* installed when you install mod_perl. Getting it working is a three-step process.

First issue the following command:

sudo apt-get install libapreq2

This installs 2 things – the libapreq2 shared library, and an Apache module – mod_apreq2.

Next we install the Perl bindings – Apache2::Request – which we use in our handler code.

sudo apt-get install libapache2-request-perl

At this stage if you restart Apache, it will load your Perl handler without any complaints. However if you visit a handler that uses Apache2::Request, it’ll error out with the following entry in error.log:


/usr/sbin/apache2: symbol lookup error: /usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so: undefined symbol: apreq_handle_apache2

This is because unlike our mod_perl installation, apt-get doesn’t enable mod_apreq2 after installing it. We enable it manually by creating a symbolic link to /etc/apache2/mods-available/apreq.load under /etc/apache2/mods-enabled/:


sudo bash
cd /etc/apache2/mods-enabled
ln -s ../mods-available/apreq.load
apache2ctl restart

Updated: Instead of creating the link by hand, you can also run sudo a2enmod apreq to enable the module, and a2dismod apreq to disable it.

Visit your handler URL again and this time it should work without errors. Here is the modified handler from the mod_perl installation post that now uses Apache2::Request, so that you can test whether your installation is working correctly:

package Hello;
use strict;

use Apache2::RequestRec ();
use Apache2::RequestIO ();

use Apache2::Const -compile => qw(OK);

use Apache2::Request;

sub handler {
    my $r = shift;

    $r->content_type('text/plain');

    my $req = Apache2::Request->new($r);
    my $name = $req->param("name");
    $name = $name ? $name : "World";

    print "Hello $name, the time here is " . localtime() . "\n";

    return Apache2::Const::OK;
}

1;

CPAN modules on Ubuntu: apt-get vs. perl -MCPAN

Ubuntu allows you to add and remove components from your system using apt-get. Perl allows similar functionality for maintaining Perl modules through the CPAN module’s shell (invoked by perl -MCPAN -eshell). Now if you are just going to add a Perl module to your system, should you be using apt-get or CPAN?

I prefer to use apt-get because this allows me to keep track of everything I’ve added to my system in one place. It also keeps things neat in case a given Perl module has binary dependencies. In 99% of cases, I’ve found an apt package equivalent of a given CPAN module. The naming convention of the modules varies between CPAN and apt. For example, the package Algorithm::Diff at CPAN is known as libalgorithm-diff-perl in the apt world.

So sometimes the trick is knowing the correct apt name for a CPAN module. In most cases, if you know the Perl name of a module, say XML::Simple, you can arrive at the apt name by converting the package name to lowercase, replacing each “::” with a “-”, prefixing “lib” and suffixing “-perl”, i.e.


echo "XML::Simple" | perl -e '$x=<>; chomp($x); $x=~s/::/-/g; $x=lc($x); print "lib$x-perl"'

And if you want to automate things further:

sudo bash
echo "XML::Simple" | perl -e '$x=<>; chomp($x); $x=~s/::/-/g; $x=lc($x); print "lib$x-perl"' | xargs apt-get install

Just bear in mind that there are exceptions to this naming convention. The one I ran into recently involved Sablotron, which is known as XML::Sablotron in the CPAN world but as libxml-sablot-perl in the apt world.
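
The rule and its exception can be wrapped into a small function. This is only a sketch: cpan_to_apt and the %exceptions table are names I made up, the table holds just the one exception mentioned above, and the result is a guess at the package name rather than a lookup against apt.

```perl
use strict;
use warnings;

# Known exceptions to the naming rule; only the one mentioned above is listed.
my %exceptions = (
    'XML::Sablotron' => 'libxml-sablot-perl',
);

# Guess the apt package name for a CPAN module name.
sub cpan_to_apt {
    my ($module) = @_;
    return $exceptions{$module} if exists $exceptions{$module};

    (my $name = lc $module) =~ s/::/-/g;   # /g so that every "::" is replaced
    return "lib${name}-perl";
}

print cpan_to_apt($_), "\n" for qw(XML::Simple Algorithm::Diff XML::Sablotron);
```

The /g modifier matters for module names with more than two parts, such as Net::Twitter::Lite.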

Also, apt might not always have the most recent version of a given module. So you have to depend on CPAN if you need the latest, greatest version of a module.

Lastly, some modules might not have an apt version – for example at the moment Devel::PerlySense exists on CPAN alone.

Installing mod_perl on Ubuntu Server 8.10

I am at a stage in life where I am going to be writing a lot of Perl code again. My preferred OS is Mac OS since it already comes with Perl 5.8.8 and Apache 2.2.9 (as of Mac OS 10.5.6). Unfortunately, mod_perl that ships with Mac OS, is broken (segfaults!). You can use fink or macports to pull Apache/Perl/mod_perl that work but I figured that if I use Ubuntu, I also get to be close to my Debian production environment. Here is how I got a fresh Ubuntu 8.10 Server VM ready with mod_perl:

Getting started

At one stage during the installation of Ubuntu server, you’ll be asked what components you want installed. Pick LAMP at the very least. After booting up for the first time (and logging in), fire up the following commands:

sudo bash #fire up a root shell so that we don't have to sudo every command
apt-get update
apt-get dist-upgrade
reboot

#install things that could come in handy later
sudo bash
apt-get install emacs #skip this if you prefer vi - it's already there
apt-get install linux-headers-server build-essential

At this stage you’ll have the latest kernel running. I find the default 80×24 display a little too restrictive. We’ll fix that by editing /boot/grub/menu.lst. Open the file in emacs or whatever editor you like, and scroll down to the end to a bunch of options that look like title, uuid, kernel, initrd. Append vga=0x31A to the end of the first kernel statement. e.g. in my case

kernel /boot/vmlinuz-2.6.27-9-server root=UUID=d9f9cc35-d880-494d-8cd3-92da418a438b ro quiet splash

became

kernel /boot/vmlinuz-2.6.27-9-server root=UUID=d9f9cc35-d880-494d-8cd3-92da418a438b ro quiet splash vga=0x31A

Reboot.

vga=0x31A gives me a resolution of 1280×1024 and 64k colors. Here are other options that you can play with:

#  FRAMEBUFFER RESOLUTION SETTINGS
#     +-------------------------------------------------+
#          | 640x480    800x600    1024x768   1280x1024
#      ----+--------------------------------------------
#      256 | 0x301=769  0x303=771  0x305=773   0x307=775
#      32K | 0x310=784  0x313=787  0x316=790   0x319=793
#      64K | 0x311=785  0x314=788  0x317=791   0x31A=794
#      16M | 0x312=786  0x315=789  0x318=792   0x31B=795
#     +-------------------------------------------------+

Installing mod_perl

At this stage we already have Apache and Perl installed. If you run tail /var/log/apache2/error.log, you’ll see that out of the box you only get support for PHP:

[Sun Dec 14 12:04:05 2008] [notice] Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4 with Suhosin-Patch configured -- resuming normal operations

Here is how you add mod_perl support:


sudo bash
apt-get install libapache2-mod-perl2

#restart apache so that it loads mod_perl
apache2ctl restart

#make sure that it did indeed load
tail /var/log/apache2/error.log

#if all went well, you'll see something like this (note the added mod_perl/2.0.4 Perl/v5.10.0):
[Sun Dec 14 12:19:17 2008] [notice] Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.0 configured -- resuming normal operations

Testing our mod_perl installation

Let’s write a simple mod_perl response handler to make sure our installation was successful. Create Hello.pm in your home directory – which is /home/deepakg/ on my machine:

package Hello;
use strict;

use Apache2::RequestRec ();
use Apache2::RequestIO ();

use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;

    $r->content_type('text/plain');
    print "Hello World, the time here is " . localtime() . "\n";

    return Apache2::Const::OK;
}

1;

Then to make sure that we didn’t make any typos:

perl -c Hello.pm
Hello.pm syntax OK

Next, open /etc/apache2/apache2.conf and type the following right at the end:

PerlRequire /home/deepakg/Hello.pm
<Location /time>
   SetHandler perl-script
   PerlResponseHandler Hello
</Location>

Restart apache and check the Apache error log to make sure that it started without any issues:

sudo apache2ctl restart
tail /var/log/apache2/error.log

Install lynx, so that you can check your handiwork:

sudo apt-get install lynx

# and once it is installed
lynx http://localhost/time

If everything is working then you’ll be greeted with something like this:

Hello World, the time here is Sat Jan 10 15:25:51 2009

Of course, the actual date and time will vary on your system :) .

Miscellaneous

If the time shown by the script above looks awkward, your time zone might not have been configured correctly. Configure the time zone to where you are:

sudo dpkg-reconfigure tzdata

And then maybe tweak the clock by hand if needed:

sudo date MMDDhhmm #MM - month, DD - date, hh - hour (24 format), mm - minute