Ruby 1.9 vs MacRuby – string handling

January 10th, 2010

Ruby 1.9, among other things, brings much-needed improvements to the way Unicode strings are handled. The String class now has an encoding method which tells us the – well – encoding of a given string. By default a string literal’s encoding is the same as the encoding of the source file, which in turn can be set using the coding comment. For example, to use UTF-8 as the source’s encoding (and to be able to use UTF-8 characters in string literals) you’d use: # -*- coding: utf-8 -*-

Let’s look at some sample code and its output under Ruby 1.9.

Code:

# -*- coding: utf-8 -*-
str = "café"
puts "Encoding    : #{str.encoding}"
puts "Length      : #{str.length}"
puts "Byte Size   : #{str.bytesize}"
puts "#{str} in upper case is: #{str.upcase}"

Output:

Encoding    : UTF-8
Length      : 4
Byte Size   : 5
café in upper case is: CAFé

As is evident from the output above, Ruby 1.9 still doesn’t handle casing beyond the ASCII range. Upper-casing café gave us CAFé as opposed to CAFÉ (which is the correct response).

Also the byte size of the string is 5 because under the utf-8 encoding, é takes up two bytes – 0xC3, 0xA9.
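
The byte counts are easy to double-check outside Ruby. Here is a minimal sketch in JavaScript (chosen only for illustration; it assumes a runtime that provides TextEncoder, such as Node.js or a modern browser) that lists the UTF-8 bytes of the same string:

```javascript
// "caf\u00e9" is café written with a precomposed é (U+00E9)
var str = "caf\u00e9";

// TextEncoder always encodes to UTF-8
var utf8 = new TextEncoder().encode(str);

// Render each byte as two hex digits
var hex = Array.from(utf8, function (b) {
  return ("0" + b.toString(16)).slice(-2);
});

console.log(str.length);     // characters: 4
console.log(utf8.length);    // UTF-8 bytes: 5
console.log(hex.join(" "));  // 63 61 66 c3 a9
```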

MacRuby – to quote the project site – is a version of Ruby 1.9, ported to run directly on top of Mac OS X core technologies such as the Objective-C common runtime and garbage collector, and the CoreFoundation framework.

This means that the Ruby datatypes have been implemented on top of Mac “native” (Cocoa) datatypes – e.g. Ruby strings are implemented on top of NSString.

This introduces some differences in the way strings are handled by MacRuby. To start with, non-Unicode strings use the ‘MACINTOSH’ encoding (the Ruby 1.9 default is US-ASCII) while Unicode strings use UTF-16 (even if you’ve set the coding comment to utf-8). MacRuby also handles casing correctly.

So the same code snippet as above gives different output:

Output:

Encoding    : UTF-16
Length      : 4
Byte Size   : 4
café in upper case is: CAFÉ

Note that casing is handled correctly by MacRuby.

The byte size for the string is 4 because é is 0xE9 under the utf-16 encoding. Technically most characters, when using the utf-16 encoding, should take up 2 bytes (c should be 0x0043, é should be 0x00E9 and so on), which would make 8 bytes for this string, but I guess the most significant byte is not counted if it is 0x00.
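
For reference, a full UTF-16 serialization of this string would take 8 bytes, not 4. A small JavaScript sketch (JavaScript strings expose UTF-16 code units directly; the helper name utf16beBytes is made up for this illustration) shows the two bytes behind each character:

```javascript
// Return the big-endian UTF-16 code units of a string as 4-digit hex strings.
// utf16beBytes is a hypothetical helper written for this illustration.
function utf16beBytes(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);           // one 16-bit code unit
    var hi = (unit >> 8).toString(16);    // most significant byte
    var lo = (unit & 0xff).toString(16);  // least significant byte
    out.push(("0" + hi).slice(-2) + ("0" + lo).slice(-2));
  }
  return out;
}

var units = utf16beBytes("caf\u00e9");
console.log(units.join(" "));   // 0063 0061 0066 00e9
console.log(units.length * 2);  // 8 bytes in total
```

Every high byte here is 0x00, which is at least consistent with the guess above; it does not show how MacRuby actually arrives at 4.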

p.s. the versions of the products used in the examples above are:

1. Ruby – 1.9.1p376 (2009-12-07 revision 26041) [i386-darwin10] (installed via macports 1.8.2)

2. MacRuby version 0.5 (ruby 1.9.0) [universal-darwin10.0, x86_64] (binary distribution from the official MacRuby project site)

deepakg Uncategorized

Number-only Captcha

November 22nd, 2009

I was recently looking for a Perl Captcha implementation that’d generate number-only Captchas. I looked inside Authen::Captcha and figured I could subclass it to do just numeric captchas. It turned out to be a lot simpler than I thought it would be. Here is the code:

package Authen::NumCaptcha;

use strict;
use base qw(Authen::Captcha);

sub new {
    my $class = shift;
    my $captcha = $class->SUPER::new( @_ );
    return bless $captcha, $class;
}

sub generate_random_string {
    my $self = shift;
    my $length = shift;

    my $code = "";
    my $char;

    for (my $i=0; $i < $length; $i++) {
        $char = int(rand 7)+50;
        $char = chr($char);
        $code .= $char;
    }

    return $code;
}

1;

We basically override Authen::Captcha's generate_random_string and only use ASCII characters 50 to 56 (digits 2 to 8), since int(rand 7) returns a whole number from 0 to 6. Authen::Captcha doesn't include the digits 0 and 1 because they could be confused with oh (O) and lower case elle (l) or upper case ai (I).
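
To see exactly which digits the expression int(rand 7)+50 in generate_random_string can produce, here is a small JavaScript sketch that walks through every value int(rand 7) can return (0 to 6) and applies the same +50 and chr steps. It only mirrors the arithmetic; it is not part of Authen::Captcha:

```javascript
// int(rand 7) in Perl returns an integer from 0 to 6
var digits = [];
for (var n = 0; n < 7; n++) {
  // same as chr(n + 50) in the Perl code
  digits.push(String.fromCharCode(n + 50));
}

console.log(digits.join(""));  // 2345678
```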

To use Authen::NumCaptcha, place it under /usr/share/perl5/Authen/ (or wherever you have Authen::Captcha installed) alongside Captcha.pm. Then use it as you'd use Authen::Captcha. For example:

use Authen::NumCaptcha;

my $num_chars = 6;
my $num_captcha = Authen::NumCaptcha->new (
                                          data_folder => '/tmp',
                                          output_folder => '/tmp'
                                          );

my $num_md5sum = $num_captcha->generate_code($num_chars);

deepakg perl

Installing VMWare tools on Debian Lenny/Etch

September 19th, 2009

If you are building an X-less command-line-only Debian VM, here is what you’ll need to do in order to install VMWare tools so that you can use features like shared folders:

1. Log in as root or use su.
2. Make sure you have installed the kernel sources and build tools.

apt-get install build-essential

Then run:

apt-get install linux-headers

This will probably give you a message saying: “Package linux-headers is a virtual package provided by:”, followed by a list of available kernel versions. Choose your version by looking it up:

cat /proc/version

or

uname -a

e.g. in my case I chose:

apt-get install linux-headers-2.6-2-amd64

3. Next choose the Virtual Machine -> Install VMWare Tools option from the menu (this assumes VMWare Fusion running on Mac but there should be a similar option for VMWare Workstation on Linux and Windows as well).

Now mount the VMWare tools virtual CD:


mount /cdrom
cd /cdrom

Copy VMWare tool source code to /tmp and extract the files (the actual filename will depend on your VMWare version [which the VMWare installer would have reported in case of a mismatch]):

cp VMWareTools-7.9.6-173382.tar.gz /tmp
cd /tmp
tar -xzvf VMWareTools-7.9.6-173382.tar.gz

At this stage you should have a folder called vmware-tools-distrib inside your /tmp folder. Visit this folder and run vmware-install.pl, then follow the on-screen instructions.

cd /tmp/vmware-tools-distrib
./vmware-install.pl

Now at one stage the VMWare tool installer complained about my installed gcc (4.3) version being different from the gcc version that was used to compile my kernel (4.1). Press Ctrl+C to abort the installer at this stage.

It turns out that gcc is merely a symbolic link in the /usr/bin folder. Chances are that you’ll have 2 versions of gcc on your system (e.g. Lenny comes with /usr/bin/gcc-4.3 and /usr/bin/gcc-4.1). Just make the symlink point to the version of gcc that was used to compile your kernel:

cd /usr/bin
rm gcc
ln -s /usr/bin/gcc-4.1 gcc

Run vmware-install.pl again and this time things should go through. Restart your VM to finish the installation. You should now be able to see folders shared from the host machine under /mnt/hgfs.

Most of these steps should also apply to Ubuntu.

deepakg configuration, debian, installation, ubuntu

TechEd India 2009 – jQuery Presentation

May 17th, 2009

I was at Microsoft TechEd India 2009 yesterday and presented a session titled – jQuery – the ‘write less do more’ javascript library. You can download the slides and demos here (Zip file, 1.07 MB).

The demos on Ajax will require you to place the files in the Demos/3-Ajax folder on a web server (http://localhost will do). You’ll also need to get the php files up and running. Finally, you’ll need to edit default.htm in the Demos/3-Ajax folder to change the paths to point to where you have hosted the php files.

deepakg Uncategorized

PerlAuthenHandler

February 22nd, 2009

Among other things, mod_perl allows you to write authentication handlers that fit nicely into Apache’s authentication scheme. A Perl authentication handler is a Perl module that decides whether a given user gets access to a resource or not. How you determine whether the user is allowed or denied access is up to you.

The mod_perl documentation comes with a sample that checks the length of the username+space+password supplied by a user and allows the user access only if it is exactly 14 characters long. Here is what the code looks like:

  package MyApache2::SecretLengthAuth;

  use strict;
  use warnings;

  use Apache2::Access ();
  use Apache2::RequestUtil ();

  use Apache2::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

  use constant SECRET_LENGTH => 14;

  sub handler {
      my $r = shift;

      my ($status, $password) = $r->get_basic_auth_pw;
      return $status unless $status == Apache2::Const::OK;

      return Apache2::Const::OK
          if SECRET_LENGTH == length join " ", $r->user, $password;

      $r->note_basic_auth_failure;
      return Apache2::Const::HTTP_UNAUTHORIZED;
  }

  1;

And you hook this handler up in your apache2.conf as:

  <Location / >
      PerlAuthenHandler MyApache2::SecretLengthAuth
      AuthType Basic
      AuthName "The Gate"
      Require valid-user
  </Location>

As the documentation points out, the authentication handler can be configured for any subsection of the site; it doesn’t matter whether it is served by a mod_perl response handler or not.

Now obviously in real life, you’d want an authentication handler that does a little more – like verify the password supplied by the user against a database or LDAP. Here is a version that I wrote that uses Perl’s Net::POP3 package to verify the user’s credentials against a POP3 server.

package Apache::PopAuth;

use strict;
use warnings;
use Net::POP3;

use Apache2::Access ();
use Apache2::RequestUtil ();
use Apache2::RequestRec ();
use Apache2::Const -compile => qw(OK DECLINED HTTP_UNAUTHORIZED);

sub handler {
    my $r = shift;
    my ($status, $password) = $r->get_basic_auth_pw;
    my $user = $r->user;

    return $status unless $status == Apache2::Const::OK;

    return Apache2::Const::OK
        if valid_user($user, $password);

    $r->note_basic_auth_failure;
    return Apache2::Const::HTTP_UNAUTHORIZED;
}

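sub valid_user {
    my($user, $password) = @_;
    # new() returns undef if the server cannot be reached; treat that as a failed login
    my $pop = Net::POP3->new('pop.yourdomain.com') or return 0;
    # login() returns undef on failure and the message count ("0E0" for none) on success
    my $status = $pop->login($user, $password);
    $pop->quit();
    return defined($status);
}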
1;

This saves you from having to maintain a separate set of users on your web server and passes on the baton of user management to the POP server. One word of caution – this is just proof-of-concept code and I am not sure how well it will scale outside a small intranet.

deepakg perl

Ruby-like difference between two arrays in JavaScript

January 15th, 2009

Ruby has a nifty feature that allows you to “subtract” two arrays. e.g.

Fruit = ["Apple", "Kinnow", "Mango", "Orange"]
Citrus = ["Lemon", "Kinnow", "Orange", "Tangerine"]

Then Fruit - Citrus gives:
["Apple", "Mango"]

Notice that elements in Citrus not in Fruit (Lemon, Tangerine) are not in the difference.

Now I needed something similar in JavaScript. So I started by pushing my luck:

var Fruit = ["Apple", "Kinnow", "Mango", "Orange"];
var Citrus = ["Lemon", "Kinnow", "Orange", "Tangerine"];
var Diff = Fruit - Citrus;

Depending on where you are running this code, Diff will be 0 or NaN. This meant that I would have to come up with something of my own. I figured I’d put JavaScript’s regular expressions to some use and came up with this:

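function diffArrays (A, B) {

  // With nothing to subtract, the regex below would be "(::)" and eat the separators
  if (B.length === 0) { return A.slice(); }

  var strA = ":" + A.join("::") + ":";

  // Escape regex metacharacters so elements such as "C++" are matched literally
  var escB = B.map(function (item) {
    return String(item).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  var strB = ":" + escB.join(":|:") + ":";

  // "g" removes every match; without "i" the comparison is case-sensitive, as in Ruby
  var reg = new RegExp("(" + strB + ")","g");

  var strDiff = strA.replace(reg,"").replace(/^:/,"").replace(/:$/,"");

  // An empty string means nothing is left; return [] rather than [""]
  var arrDiff = strDiff === "" ? [] : strDiff.split("::");

  return arrDiff;
}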

diffArrays(Fruit, Citrus) gives:
["Apple", "Mango"]

The thing with dynamically typed languages is that you can take even numeric arrays through this regular expression rigmarole without anyone complaining, although the elements come back as strings:

var Natural = [1,2,3,4,5,6];
var Prime = [1,2,3,5,7];

diffArrays(Natural, Prime) gives:
["4", "6"]

A brief explanation of how this works:

I start by converting the first array to a string delimited with two colons, :: – the choice was arbitrary. The fact that regular expressions don’t treat a colon specially helps. I then prefix and suffix the resulting string with another colon. The idea here is to have each element “surrounded” by its own pair of colons. For instance, [1,2,3,4,5,6] becomes :1::2::3::4::5::6:

The array to be subtracted from it – say [1,2,3,5] is converted to the form – :1:|:2:|:3:|:5: which I then use to create a regular expression. The pipe | has a special meaning in the regular expression world and will cause :1: or :2: or :3: or :5: to be matched.

Calling strA.replace with our just-crafted regular expression replaces :1:, :2:, :3: and :5: with “” giving us :4::6:, which I strip clean of the leading and trailing colons through another couple of replace calls. Finally, splitting the string on :: gives the delta array!

Now this implementation is certainly not going to win me any prizes in speed/elegance pageants, but I think this somewhat awkward application of regular expressions was something worth sharing!
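
For comparison, here is a sketch of a more conventional version. It assumes an ES5-capable engine (Array.prototype.filter and indexOf), and the name diffArraysFilter is made up for this example. Unlike the regex version, it keeps the original element types and is not thrown off by colons or regex metacharacters inside the elements:

```javascript
// Keep every element of A that does not occur in B.
// diffArraysFilter is a hypothetical name used for this sketch.
function diffArraysFilter(A, B) {
  return A.filter(function (item) {
    // indexOf compares with ===, so 1 and "1" are different elements
    return B.indexOf(item) === -1;
  });
}

var Fruit = ["Apple", "Kinnow", "Mango", "Orange"];
var Citrus = ["Lemon", "Kinnow", "Orange", "Tangerine"];

console.log(diffArraysFilter(Fruit, Citrus));                        // ["Apple", "Mango"]
console.log(diffArraysFilter([1, 2, 3, 4, 5, 6], [1, 2, 3, 5, 7]));  // [4, 6]
```

It does a linear scan of B for every element of A, which is fine for short arrays like these.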

On a somewhat related note, a special thanks goes out to Steve Yegge for his wonderful js2-mode and ejacs. The latter especially comes in handy when you are just “doodling” around on JavaScript problems like these.

deepakg javascript

Install and enable Apache2::Request on Ubuntu Server 8.10

January 13th, 2009

A mod_perl handler can parse the incoming client request (query string, form post data etc.) using Apache2::Request. It is *not* installed when you install mod_perl. Getting it working is a three-step process.

First issue the following command:

sudo apt-get install libapreq2

This installs 2 things – the libapreq2 shared library, and an Apache module – mod_apreq2.

Next we install the Perl bindings – Apache2::Request – which we use in our handler code.

sudo apt-get install libapache2-request-perl

At this stage if you restart Apache, it will load your Perl handler without any complaints. However if you visit a handler that uses Apache2::Request, it’ll error out with the following entry in error.log:


/usr/sbin/apache2: symbol lookup error: /usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so: undefined symbol: apreq_handle_apache2

This is because unlike our mod_perl installation, apt-get doesn’t enable mod_apreq2 after installing it. We enable it manually by creating a symbolic link to /etc/apache2/mods-available/apreq.load under /etc/apache2/mods-enabled/:


sudo bash
cd /etc/apache2/mods-enabled
ln -s ../mods-available/apreq.load
apache2ctl restart

Updated: You can also run a2enmod to enable an Apache module (here, a2enmod apreq) and use a2dismod to disable it.

Visit your handler url again and this time it should work without errors. Here is the modified handler from the mod_perl installation post that now uses Apache2::Request to enable you to test if your installation is working correctly:

package Hello;
use strict;

use Apache2::RequestRec ();
use Apache2::RequestIO ();

use Apache2::Const -compile => qw(OK);

use Apache2::Request;

sub handler {
    my $r = shift;

    $r->content_type('text/plain');

    my $req = Apache2::Request->new($r);
    my $name = $req->param("name");
    $name = $name ? $name : "World";

    print "Hello $name, the time here is " . localtime() . "\n";

    return Apache2::Const::OK;
}

1;

deepakg configuration, installation, perl, ubuntu

CPAN modules on Ubuntu: apt-get vs. perl -MCPAN

January 12th, 2009

Ubuntu allows you to add and remove components from your system using apt-get. Perl allows similar functionality for maintaining Perl modules through the CPAN module’s shell (invoked by perl -MCPAN -eshell). Now if you are just going to add a Perl module to your system, should you be using apt-get or CPAN?

I prefer to use apt-get because this allows me to keep track of everything I’ve added to my system in one place. It also keeps things neat in case a given Perl module has binary dependencies. For 99% of the cases, I’ve found an apt package equivalent of a given CPAN module. The naming convention of the modules varies between CPAN and apt. For example, the package Algorithm::Diff at CPAN is known as libalgorithm-diff-perl in the apt world.

So sometimes the trick is knowing the correct apt name for a CPAN module. Now in most cases, if you know the Perl name of a module – say XML::Simple then you can arrive at the apt name by converting the package name to lowercase, replacing the “::” with a “-”, prefixing “lib” and suffixing -perl to it. i.e.


echo "XML::Simple" | perl -e '$x=<>; chomp($x); $x=~s/::/-/g; $x=lc($x); print "lib$x-perl"'

And if you want to automate things further:

sudo bash
echo "XML::Simple" | perl -e '$x=<>; chomp($x); $x=~s/::/-/g; $x=lc($x); print "lib$x-perl"' | xargs apt-get install

Just bear in mind that there are exceptions to this naming convention. The one I ran into recently involved Sablotron, which is known as XML::Sablotron in the CPAN world, but as libxml-sablot-perl in the apt world.
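
The same rule, with room for exceptions, can be sketched as a small JavaScript function. The name cpanToApt and the exceptions table are made up for this example; the only exception listed is the Sablotron one mentioned above, and the result is still a guess that should be checked against the package index (for example with apt-cache search):

```javascript
// Known cases where the apt name does not follow the convention.
var exceptions = {
  "XML::Sablotron": "libxml-sablot-perl"
};

// Guess the apt package name for a CPAN module name.
// cpanToApt is a hypothetical helper, not part of any tool.
function cpanToApt(moduleName) {
  if (Object.prototype.hasOwnProperty.call(exceptions, moduleName)) {
    return exceptions[moduleName];
  }
  // Replace every "::" (note the g flag), lowercase, then add lib...-perl
  return "lib" + moduleName.replace(/::/g, "-").toLowerCase() + "-perl";
}

console.log(cpanToApt("XML::Simple"));       // libxml-simple-perl
console.log(cpanToApt("Algorithm::Diff"));   // libalgorithm-diff-perl
console.log(cpanToApt("Apache2::Request"));  // libapache2-request-perl
console.log(cpanToApt("XML::Sablotron"));    // libxml-sablot-perl
```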

Also, apt might not always have the most recent version of a given module. So you have to depend on CPAN if you need the latest, greatest version of a module.

Lastly, some modules might not have an apt version – for example at the moment Devel::PerlySense exists on CPAN alone.

deepakg installation, perl, ubuntu

Installing mod_perl on Ubuntu Server 8.10

January 10th, 2009

I am at a stage in life where I am going to be writing a lot of Perl code again. My preferred OS is Mac OS since it already comes with Perl 5.8.8 and Apache 2.2.9 (as of Mac OS 10.5.6). Unfortunately, the mod_perl that ships with Mac OS is broken (segfaults!). You can use fink or macports to pull Apache/Perl/mod_perl that work but I figured that if I use Ubuntu, I also get to be close to my Debian production environment. Here is how I got a fresh Ubuntu 8.10 Server VM ready with mod_perl:

Getting started

At one stage during the installation of Ubuntu server, you’ll be asked what components you want installed. Pick LAMP at the very least. After booting up for the first time (and logging in), fire up the following commands:

sudo bash #fire up a root shell so that we don't have to sudo every command
apt-get update
apt-get dist-upgrade
reboot

#install things that could come in handy later
sudo bash
apt-get install emacs #skip this if you prefer vi - it's already there
apt-get install linux-headers-server build-essential

At this stage you’ll have the latest kernel running. I find the default 80×24 display a little too restrictive. We’ll fix that by editing /boot/grub/menu.lst. Open the file in emacs or whatever editor you like, and scroll down to the end to a bunch of options that look like title, uuid, kernel, initrd. Append vga=0x31A to the end of the first kernel statement. e.g. in my case

kernel /boot/vmlinuz-2.6.27-9-server root=UUID=d9f9cc35-d880-494d-8cd3-92da418a438b ro quiet splash

became

kernel /boot/vmlinuz-2.6.27-9-server root=UUID=d9f9cc35-d880-494d-8cd3-92da418a438b ro quiet splash vga=0x31A

Reboot.

vga=0x31A gives me a resolution of 1280×1024 and 64k colors. Here are other options that you can play with:

#  FRAMEBUFFER RESOLUTION SETTINGS
#     +-------------------------------------------------+
#          | 640x480    800x600    1024x768   1280x1024
#      ----+--------------------------------------------
#      256 | 0x301=769  0x303=771  0x305=773   0x307=775
#      32K | 0x310=784  0x313=787  0x316=790   0x319=793
#      64K | 0x311=785  0x314=788  0x317=791   0x31A=794
#      16M | 0x312=786  0x315=789  0x318=792   0x31B=795
#     +-------------------------------------------------+
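
Each cell in the table is just the same mode number written in hexadecimal and in decimal. A quick JavaScript sketch to confirm the conversion for the 64K-color row:

```javascript
// Mode numbers for the 64K-color row of the table above
var modes = ["0x311", "0x314", "0x317", "0x31A"];

// parseInt with radix 16 accepts the 0x prefix
var decimals = modes.map(function (m) {
  return parseInt(m, 16);
});

console.log(decimals.join(" "));  // 785 788 791 794
```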

Installing mod_perl

At this stage we already have Apache and Perl installed. If you do:
tail /var/log/apache2/error.log, you’ll see that out of the box, you only get support for PHP.

[Sun Dec 14 12:04:05 2008] [notice] Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4 with Suhosin-Patch configured -- resuming normal operations

Here is how you add mod_perl support:


sudo bash
apt-get install libapache2-mod-perl2

#restart apache so that it loads mod_perl
apache2ctl restart

#make sure that it did indeed load
tail /var/log/apache2/error.log

#if all went well, you'll see something to the effect of (note the mod_perl/2.0.4 Perl/v5.10.0 part):
[Sun Dec 14 12:19:17 2008] [notice] Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.0 configured -- resuming normal operations

Testing our mod_perl installation

Let’s write a simple mod_perl response handler to make sure our installation was successful. Create Hello.pm in your home directory – which is /home/deepakg/ on my machine:

package Hello;
use strict;

use Apache2::RequestRec ();
use Apache2::RequestIO ();

use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;

    $r->content_type('text/plain');
    print "Hello World, the time here is " . localtime() . "\n";

    return Apache2::Const::OK;
}

1;

Then to make sure that we didn’t make any typos:

perl -c Hello.pm
Hello.pm syntax OK

Next, open /etc/apache2/apache2.conf and type the following right at the end:

PerlRequire /home/deepakg/Hello.pm
<Location /time>
   SetHandler perl-script
   PerlResponseHandler Hello
</Location>

Restart apache and check the Apache error log to make sure that it started without any issues:

sudo apache2ctl restart
tail /var/log/apache2/error.log

Install lynx, so that you can check your handiwork:

sudo apt-get install lynx

# and once it is installed
lynx http://localhost/time

If everything is working then you’ll be greeted with something like this:

Hello World, the time here is Sat Jan 10 15:25:51 2009

Of course, the actual date and time will vary on your system :) .

Miscellaneous

If the time shown by the script above looks awkward, your time zone might not have been configured correctly. Configure the time zone to where you are:

sudo dpkg-reconfigure tzdata

And then maybe tweak the clock by hand if needed:

sudo date MMDDhhmm #MM - month, DD - date, hh - hour (24 format), mm - minute

deepakg configuration, installation, perl, ubuntu

Hello World

January 9th, 2009

Ah, I didn’t have to think too hard about the title of this blog post. Since this is a blog about programming, a Hello World should suffice.

deepakg Uncategorized