
Readable and compositional regexes in Perl

September 29, 2010

Regexes don’t (always!) have to be an unreadable mess. For example, see this HN post about a little Clojure DSL for readable, compositional regexes. Here is the simple Clojure example that was given:

(def datestamp-re
  (let [d {\0 \9}]
    (regex [d d d d :as :year] \- [d d :as :month] \- [d d :as :day])))

And the equivalent Perl regex “DSL” can be equally lucid:

sub datestamp_re {
    qr/ (?<year> \d \d \d \d) - (?<month> \d \d) - (?<day> \d \d ) /x;
}

The two things that provide a little extra help to grok what's going on here are:

  1. The x modifier on the end of qr//, which allows whitespace, newlines (and # comments) to be sprinkled into your regex pattern without any effect on the pattern matching. See the Modifiers section of perlre.
  2. “Named Capture Buffers”, which were added in Perl 5.10:
    (?<year> \d{4}) # stores pattern matched in "year" buffer

    This not only gives a name to that capture buffer but also provides an excellent visual placeholder to help describe what you are trying to do with the regex.
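As a small illustration of the x modifier, the same pattern can be spread over several lines with a comment against each part. This is only a sketch: the sub name datestamp_re_commented is made up for this example and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical name: the datestamp pattern laid out one component per
# line, using the comments that the x modifier permits.
sub datestamp_re_commented {
    qr/
        (?<year>  \d{4} )   # four-digit year
        -
        (?<month> \d{2} )   # two-digit month
        -
        (?<day>   \d{2} )   # two-digit day
    /x;
}

if ('2007-10-23' =~ datestamp_re_commented()) {
    printf "year=%s month=%s day=%s\n", @+{qw/year month day/};
}
```

One thing to watch for: a comment inside qr/.../x must not contain the delimiter (here /), otherwise the pattern ends early.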

After a successful match against a regex with named captures, the matched text is available in the %+ hash variable, keyed by capture name:

for my $date (qw/2007-10-23 20X7-10-23/) {
    printf "year:%d, month:%d, day:%d\n", @+{qw/year month day/}
        if $date =~ datestamp_re;
}

# => year:2007, month:10, day:23
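Because %+ always refers to the last successful match, it can be handy to copy the captures out straight away. The following is a minimal sketch of that idea; parse_date is a hypothetical helper name, not something from the post.

```perl
use strict;
use warnings;

sub datestamp_re {
    qr/ (?<year> \d{4}) - (?<month> \d{2}) - (?<day> \d{2}) /x;
}

# Hypothetical helper: returns a hash reference holding the named
# captures, or undef (in scalar context) when the string does not match.
sub parse_date {
    my ($date) = @_;
    return unless $date =~ datestamp_re();

    my %captures = %+;    # copy now; a later successful match replaces %+
    return \%captures;
}

for my $date (qw/2007-10-23 20X7-10-23/) {
    my $parts = parse_date($date);
    if ($parts) {
        printf "%s: year=%s, month=%s, day=%s\n",
            $date, @{$parts}{qw/year month day/};
    }
    else {
        printf "%s: no match\n", $date;
    }
}

# 2007-10-23: year=2007, month=10, day=23
# 20X7-10-23: no match
```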

This is much more flexible for dealing with regex captures than positional $1, $2, $3, etc. The result is not just more readable but also more compositional:

# nice readable regex
sub datestamp_re  {
     my $year  = qr/ (?<year>  \d{4}) /x;  
     my $month = qr/ (?<month> \d{2}) /x;
     my $day   = qr/ (?<day>   \d{2}) /x;
 
     qr/ $year - $month - $day /x;
}

or:

# DRY regex
sub datestamp_re {
    my %re = map { 
        my ($name, $digits) = @$_;
        $name => qr/ (?<$name>  \d{$digits}) /x;
    } [ year  => 4 ], [ month => 2 ], [ day   => 2 ];
    
    qr/ $re{year} - $re{month} - $re{day} /x;
}

and even:

# regex generator
sub re { qr/ (?<$_[0]> $_[1] )/x }

sub regex {
    my $pattern = join q{}, @_;
    qr/ $pattern /x;
}

sub datestamp_re {
    regex re( year => '\d{4}' ), '-', re( month => '\d{2}' ), '-', re( day => '\d{2}' );
}

Now that is a regex DSL :)
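To show how far the composition goes, the same two helpers can build a larger pattern from the same kind of pieces. This is a sketch only: timestamp_re and the hour, minute and second capture names are assumptions added for illustration, while re and regex are repeated from above so the snippet runs on its own.

```perl
use strict;
use warnings;

# re() and regex() exactly as in the generator example above.
sub re { qr/ (?<$_[0]> $_[1] )/x }

sub regex {
    my $pattern = join q{}, @_;
    qr/ $pattern /x;
}

# Hypothetical extension: a date, a literal T, then a time of day.
sub timestamp_re {
    regex(
        re( year => '\d{4}' ), '-', re( month  => '\d{2}' ), '-', re( day    => '\d{2}' ),
        'T',
        re( hour => '\d{2}' ), ':', re( minute => '\d{2}' ), ':', re( second => '\d{2}' ),
    );
}

if ('2010-09-29T14:05:09' =~ timestamp_re()) {
    printf "%s-%s-%s %s:%s:%s\n", @+{qw/year month day hour minute second/};
}

# 2010-09-29 14:05:09
```

Each re() call returns a compiled qr// object that carries its own x flag, so the pieces keep behaving the same way after being joined into the bigger pattern.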

Note that when the same name is used for more than one capture buffer, the %+ hash variable only holds the first (leftmost) one that matched:

sub numbers_re {
    my $four  = qr/ (?<four> \d{4}) /x;
    my $two   = qr/ (?<two>  \d{2}) /x;
    qr/ $four - $two - $two /x;
}

use feature 'say';    # say needs Perl 5.10+

if ('2007-10-23' =~ numbers_re) {
    say 'four => ', $+{four};
    say 'two  => ', $+{two};
}

# four => 2007
# two  => 10

To get at the second $two (i.e. 23), use the %- hash variable, which stores all the captures for each named buffer in an array reference:

if ('2007-10-23' =~ numbers_re) {
    say 'two(s) => ', join ',' => @{ $-{two} };
}

# two(s) => 10,23
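For a fuller picture of what %- holds, every buffer name can be listed alongside all of its captures. This is a minimal sketch that reuses numbers_re from above; the loop and its output format are just for illustration.

```perl
use strict;
use warnings;

sub numbers_re {
    my $four = qr/ (?<four> \d{4}) /x;
    my $two  = qr/ (?<two>  \d{2}) /x;
    qr/ $four - $two - $two /x;
}

if ('2007-10-23' =~ numbers_re()) {
    # Each value in %- is an array reference, one element per buffer
    # that shares the name, in left-to-right order.
    for my $name (sort keys %-) {
        my @captures = @{ $-{$name} };
        printf "%s has %d capture(s): %s\n",
            $name, scalar @captures, join ',', @captures;
    }
}

# four has 1 capture(s): 2007
# two has 2 capture(s): 10,23
```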

/I3az/

PS. Please note that the WordPress syntax highlighter used is unfortunately upper-casing all code comments :(

Comments
  1. DATA
    September 30, 2010 7:05 am

    Thanks for pointing me to that, that’s cool!

  2. Martin
    September 30, 2010 9:07 am

    Nice posting draegtun. I’d seen named pattern captures in 5.10 but without some worked examples showing how much more readable they can be I’ve so far not used them. I will now. Thanks.

    BTW, the form to leave a comment here is almost unreadable in my firefox – the line around the entry boxes is so faint I did not even see it at first.

  3. October 8, 2010 10:02 am

    Many thanks DATA & Martin.

    Martin,

    i) Yes, it's surprising that a lot of the new Perl features don’t seem to get expanded on much beyond the Perl Delta docs (http://perldoc.perl.org/index-history.html). Still, it gives us the opportunity for something to blog about :)

    ii) re “blog form issues in Firefox”: there must be an issue with this WordPress theme. It's my favourite from a bad bunch that WordPress provide for free :(

    I’ll have a look to see if WP have provided something better now, but my gut feeling is to move the blog to a self-hosted (probably Perl-based) solution. That would be a more configurable and safer long-term solution (and probably lots more work!), especially as I nearly chose Vox.com before settling on WordPress, and look what's happened to them!

    regards Barry

