Tuesday, February 2, 2010

WURFL to Perl Data Script

WURFL to Perl Data is an alternative way of using the WURFL XML mobile device data file in smaller web projects, and something I have been working on for a while. Instead of parsing the complete WURFL XML file or loading the data into a database platform like MySQL, WURFL to Perl is an adaptation of the WURFL data written as Perl code. That is, we use the WURFL device data as converted to Perl hashes, together with a configuration file.

This discussion assumes you have some familiarity with WURFL XML device detection data. If not, please see http://wurfl.sourceforge.net/ for references and downloads of the actual WURFL XML data. Usage and adaptation are of course heavily dependent on your ability to do Perl programming too.

Let's Get Small


The first step is to condense the WURFL data down to just what we need for our particular project. WURFL data is arguably the best and most complete reference for mobile device capabilities and, as such, is a very large XML file. Large projects may need to have all device information readily available, but many times a project just needs certain device capabilities exposed. Below is a snippet from our configuration file which shows several configuration settings.

# Our configuration variables...
$config_vars = {

  wurfl_path => './lib/WURFL/',
  wurfl_file => './lib/WURFL/wurfl.xml',
  wurfl_status => "WURFL Status: OK!\n",
  wurfl_select => [
    'ajax', 'bearer', 'cache', 'css', 'display', 'image_format', 'markup', 'rss', 'transcoding', 'xhtml_ui'
  ],                          # WURFL groups to include
  wurfl_min_width => 169,  # Minimum resolution width to accept (>=)
  wurfl_min_xhtml => 1,    # Minimum xhtml markup level to accept (>=)
  wurfl_dump_indent => 0   # Sets 'Data::Dumper' indentation level, (0 is none)

};

The 'wurfl_select' hash key is probably the most important, as it specifies which WURFL 'group' data types will be included in the final WURFL to Perl code. I added 'wurfl_min_width' and 'wurfl_min_xhtml' as they were settings which were of interest in my projects. Others can be added here as applicable. The advantage is that data not required will be filtered away, making the Perl code smaller.
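As a rough illustration of how these settings might be applied, here is a minimal sketch. It is not the code from 'perl_wurfl.cgi'; the subs 'filter_groups' and 'device_ok' and the sample device record are hypothetical, and I am assuming the width and markup checks look at the WURFL capabilities 'resolution_width' and 'xhtml_support_level'.

```perl
use strict;
use warnings;

# Hypothetical, already-parsed device record: group name => capabilities.
my %device = (
    display => { resolution_width => 240, resolution_height => 320 },
    markup  => { xhtml_support_level => 3 },
    sms     => { sms_enabled => 'true' },    # group not listed in wurfl_select
);

# Cut-down copy of the configuration shown above.
my $config_vars = {
    wurfl_select    => [ 'display', 'markup' ],
    wurfl_min_width => 169,
    wurfl_min_xhtml => 1,
};

# Keep only the groups named in wurfl_select.
sub filter_groups {
    my ($device, $config) = @_;
    my %wanted = map { ($_ => 1) } @{ $config->{wurfl_select} };
    return { map { ($_ => $device->{$_}) } grep { $wanted{$_} } keys %$device };
}

# Accept a device only if it meets both minimums (>=).
sub device_ok {
    my ($device, $config) = @_;
    my $width = $device->{display}{resolution_width}   // 0;
    my $xhtml = $device->{markup}{xhtml_support_level} // 0;
    return $width >= $config->{wurfl_min_width}
        && $xhtml >= $config->{wurfl_min_xhtml};
}

my $filtered = filter_groups(\%device, $config_vars);
print join(', ', sort keys %$filtered), "\n";
```

Devices that fail 'device_ok' would simply be left out of the generated code, which is where much of the size saving comes from.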

The configuration file also includes code for 'encoding' common string segments of 'user agent' text into abbreviated ones. Here is a sample:

# Sub to encode common long string segments in user agents
sub user_agent {

    # Input a user agent string
    my $ua = shift;
    
    # Remove non word characters
    $ua =~ s/_//g;
    $ua =~ s/\W+//g;

    # Encode many common long strings
    $ua =~ s/^3GSonyEricsson/3gse_/;
    $ua =~ s/^SonyEricsson/snye_/;
    $ua =~ s/^ACSNF30NE/ac31_/;
    $ua =~ s/^AUDIOVOX/audx_/;
    $ua =~ s/^Alcatel/alcl_/;
    $ua =~ s/^BlackBerry/blck_/;
    $ua =~ s/^Compal/cpl_/;
    $ua =~ s/^DoCoMo/dcm_/;
    ....
    ...
    ..
    .
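To make the effect of the encoding concrete, here is a cut-down, self-contained version that uses only four of the prefix rules shown above. The sub name 'user_agent_sketch' and the sample user agent string are mine, for illustration only; the real sub has many more rules.

```perl
use strict;
use warnings;

# Reduced version of the sub above: only four of the prefix rules are kept.
sub user_agent_sketch {
    my $ua = shift;

    $ua =~ s/_//g;      # underscores count as word characters, so remove them first
    $ua =~ s/\W+//g;    # then drop everything that is not a letter or digit

    # The longer prefix is tried before the shorter one it contains
    $ua =~ s/^3GSonyEricsson/3gse_/;
    $ua =~ s/^SonyEricsson/snye_/;
    $ua =~ s/^BlackBerry/blck_/;
    $ua =~ s/^DoCoMo/dcm_/;

    return $ua;
}

print user_agent_sketch('SonyEricssonK800i/R1KG Browser/NetFront/3.3'), "\n";
# prints snye_K800iR1KGBrowserNetFront33
```

Because underscores are stripped before the prefix is encoded, the only underscore left in the result is the one that separates the abbreviation from the rest of the string.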

We also encode device capabilities for brevity:

# Hash of abbreviations for WURFL standard capabilities
$abbrev_names = {

    'ajax' => {
        'ajax_manipulate_css'         => 'css',
        'ajax_manipulate_dom'         => 'dom',
        'ajax_support_getelementbyid' => 'byid',
        'ajax_support_inner_html'     => 'inner',
        'ajax_support_javascript'     => 'js',
        'ajax_support_events'         => 'events',
        'ajax_support_event_listener' => 'listen',
        'ajax_xhr_type'               => 'xhr'
    },

    'bearer' => {
        'max_data_rate' => 'rate'
    },
    ....
    ...
    ..
    .
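A minimal sketch of how such an abbreviation table might be applied to one device record follows. The sub 'abbreviate' and the sample device are hypothetical, and leaving unknown capability names unchanged is my assumption rather than something the script is known to do.

```perl
use strict;
use warnings;

# A few entries from the abbreviation hash shown above.
my $abbrev_names = {
    ajax   => { ajax_support_javascript => 'js', ajax_xhr_type => 'xhr' },
    bearer => { max_data_rate => 'rate' },
};

# Rename capability keys within each group; names without an
# abbreviation are kept as they are (an assumption of this sketch).
sub abbreviate {
    my ($device, $abbrev) = @_;
    my %short;
    for my $group (keys %$device) {
        for my $cap (keys %{ $device->{$group} }) {
            my $key = $abbrev->{$group}{$cap} // $cap;
            $short{$group}{$key} = $device->{$group}{$cap};
        }
    }
    return \%short;
}

# Hypothetical device record, grouped the same way as the WURFL data.
my $device = {
    ajax   => { ajax_support_javascript => 'true', ajax_xhr_type => 'standard' },
    bearer => { max_data_rate => 384 },
};

my $short = abbreviate($device, $abbrev_names);
print "js=$short->{ajax}{js} rate=$short->{bearer}{rate}\n";
```

An application reading the generated data needs the same table to know that 'js' stands for 'ajax_support_javascript', which is one more reason both sides share the configuration file.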


The configuration file is important because we use it both to define and to access our WURFL to Perl data code. My 'perl_wurfl.cgi' script is used to create your version of the WURFL to Perl code, based on your configuration settings. The 'perl_wurfl.cgi' script will massage and manipulate the latest WURFL XML data file, encoding common strings, filtering and condensing it down. The finished WURFL to Perl example files included in our download show a size reduction of approximately 10:1.
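The 'wurfl_dump_indent' setting refers to Data::Dumper, so the final step of writing a hash out as Perl code might look roughly like the sketch below. The '$devices' hash and its contents are made up for illustration, and this is not taken from 'perl_wurfl.cgi' itself.

```perl
use strict;
use warnings;
use Data::Dumper;

my $config_vars = { wurfl_dump_indent => 0 };

# Hypothetical condensed device table keyed by encoded user agent.
my $devices = {
    'blck_8310' => { display => { width => 320 } },
};

# Indent 0 writes the structure without newlines or indentation,
# which keeps the generated file small.
$Data::Dumper::Indent   = $config_vars->{wurfl_dump_indent};
$Data::Dumper::Sortkeys = 1;    # stable key order between runs

# Produces an assignment to a variable named $devices as a string.
my $perl_code = Data::Dumper->Dump([$devices], ['devices']);
print $perl_code, "\n";
```

The resulting string can be written to a file and later loaded with 'do' or 'require', so no XML parsing is needed at request time.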

Seeing is Believing



There is a lot going on here and the best way to see it is to install and run the 'perl_wurfl.cgi' script. Watch as it steps through each part of the process and you will get a feel for how it works and how much it is doing. Please note: the 'perl_wurfl.cgi' script will be very slow at first because of the 14+ MB size of the WURFL XML data file. So be patient while it is running. It also likes lots of memory ;)



The 'perl_wurfl.cgi' script does some fancy Perl comparisons and produces shortened 'unique' encoded user agent based hash keys. This is the reason your application will need to use the same configuration file as the 'perl_wurfl.cgi' script. From these Perl hash keys, we seek a 'best match' to the incoming user agent of the website visitor. This all works reasonably well because we use a combination of brute force searches (Regex) and textual data comparison.
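The exact matching logic is best read from the scripts themselves. As a simplified stand-in for the regex part of the search, the sketch below picks the longest stored key that is a prefix of the encoded incoming user agent. The sub 'best_match', the keys and the device ids are all hypothetical, and the textual comparison stage is not shown.

```perl
use strict;
use warnings;

# Hypothetical encoded keys mapped to made-up device ids.
my %devices = (
    'snye_K800i' => 'sonyericsson_k800i_ver1',
    'snye_K800'  => 'sonyericsson_k800_ver1',
    'blck_8310'  => 'blackberry8310_ver1',
);

# Return the longest key that is a prefix of the encoded user agent,
# or undef when no key matches at all.
sub best_match {
    my ($encoded_ua, $devices) = @_;
    my $best;
    for my $key (keys %$devices) {
        next unless $encoded_ua =~ /^\Q$key\E/;    # \Q..\E matches the key literally
        $best = $key if !defined($best) || length($key) > length($best);
    }
    return $best;
}

my $key = best_match('snye_K800iR1KGBrowserNetFront33', \%devices);
print defined $key ? "$key => $devices{$key}\n" : "no match\n";
# prints snye_K800i => sonyericsson_k800i_ver1
```

Scanning every key like this is the brute force part; it stays affordable only because the filtering step has already cut the key list down.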




From this point on, our process works much the same as other WURFL applications. We roll up the capabilities by combining each successive 'fall back' user agent (device id in WURFL terms) and provide a best guess of its current browser capabilities in Perl hash form. The difference is that we use textual data comparison in the final device detection stage instead of simple user agent string matching. This provides better detection for new user agents that haven't been incorporated into the WURFL data yet.
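The roll up can be pictured with the following minimal sketch. The device table, its 'caps' layout and the sub 'roll_up' are assumptions made for illustration; only the 'fall_back' chain idea comes from WURFL. Values set by the device itself win, and anything it does not define is inherited from further down the chain.

```perl
use strict;
use warnings;

# Hypothetical device table: each entry names its fall back device
# and lists only the capabilities it defines or overrides.
my %devices = (
    generic       => { fall_back => 'root',          caps => { xhtml => 0, width => 90 } },
    generic_xhtml => { fall_back => 'generic',       caps => { xhtml => 1, width => 128 } },
    some_phone    => { fall_back => 'generic_xhtml', caps => { width => 240 } },
);

# Walk the fall back chain, keeping the first value seen for each capability.
sub roll_up {
    my ($id, $devices) = @_;
    my (%caps, %seen);
    # Stop at an unknown id (such as 'root') or if the chain loops back.
    while (defined $id && exists $devices->{$id} && !$seen{$id}++) {
        my $own = $devices->{$id}{caps};
        for my $name (keys %$own) {
            $caps{$name} = $own->{$name} unless exists $caps{$name};
        }
        $id = $devices->{$id}{fall_back};
    }
    return \%caps;
}

my $caps = roll_up('some_phone', \%devices);
print "width=$caps->{width} xhtml=$caps->{xhtml}\n";
# prints width=240 xhtml=1
```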

Give it a Try



Also included in my download is the 'site_wurfl.cgi' script. This is actually the more important of the two. My 'site_wurfl.cgi' script demos our WURFL to Perl data capabilities, while giving a realistic example of how it might be integrated into other Perl scripts. The 'perl_wurfl.cgi' script is only run when you download and update to the latest WURFL XML data file. Thus 'perl_wurfl.cgi' is just a 'helper' script and should not be given any website user access of any kind. I update locally (yes, my 4+ year old Toshiba laptop runs it just fine), which is the safest way to go.


Use the included 'site_wurfl.cgi' script as a guide to how we access our WURFL to Perl code. It is also a quick and convenient way to test configuration changes or new devices against current WURFL to Perl code. Or just use a 'User Agent Switcher' in your favorite browser and watch the process at work.


For more info, please see:


Demo the 'site_wurfl.cgi' script: Site-WURFL
Download Site-WURFL w/ WURFL-Perl code: http://code.google.com/p/mobilesiteos/
See WURFL to Perl in action on our website: OpenSiteMobile