
Readable and compositional regexes in Perl

September 29, 2010

Regexes don’t (always!) have to be an unreadable mess. For example, see this HN post about a little Clojure DSL for readable, compositional regexes. Here is the simple Clojure example that was given:

(def datestamp-re
  (let [d {\0 \9}]
    (regex [d d d d :as :year] \- [d d :as :month] \- [d d :as :day])))

And the equivalent Perl regex “DSL” can be equally lucid:

sub datestamp_re {
    qr/ (?<year> \d \d \d \d) - (?<month> \d \d) - (?<day> \d \d ) /x;
}

The two things that provide a little extra help to grok what’s going on here are:

  1. The x modifier on the end of qr//, which allows whitespace and newlines to be sprinkled into your regex pattern without any effect on the pattern matching. See the Modifiers section of perlre.
  2. “Named Capture Buffers”, which were added in perl 5.10:
    (?<year> \d{4}) # stores pattern matched in "year" buffer

    The above not only gives a name to that capture buffer but also provides an excellent visual placeholder to help describe what you are trying to do with the regex.
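
As a small sketch of what these two features allow together, the same datestamp pattern can be laid out one part per line with a comment beside each. The variable name $datestamp_re is only for illustration, and the example assumes perl 5.10 or later:

```perl
use 5.010;
use strict;
use warnings;

# Same datestamp pattern, one part per line.
# Under the x modifier, whitespace and "#" comments inside the pattern are ignored.
my $datestamp_re = qr/
    (?<year>  \d{4} )   # four-digit year
    -
    (?<month> \d{2} )   # two-digit month
    -
    (?<day>   \d{2} )   # two-digit day
/x;

if ( '2007-10-23' =~ $datestamp_re ) {
    say join ', ', map { "$_ => $+{$_}" } qw/year month day/;
}
```

One caveat with this layout: a comment inside the pattern must not contain the regex delimiter (here /), since that would end the pattern early.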

When processing named capture regexes, the matched text is recorded in the %+ hash variable:

for my $date (qw/2007-10-23 20X7-10-23/) {
    printf "year:%d, month:%d, day:%d\n", @+{qw/year month day/}
        if $date =~ datestamp_re;
}

# => year:2007, month:10, day:23

This is much more flexible for dealing with regex captures than positional $1, $2, $3, etc. So it is not just more readable but also more compositional:

# nice readable regex
sub datestamp_re  {
     my $year  = qr/ (?<year>  \d{4}) /x;  
     my $month = qr/ (?<month> \d{2}) /x;
     my $day   = qr/ (?<day>   \d{2}) /x;
 
     qr/ $year - $month - $day /x;
}

or:

# DRY regex
sub datestamp_re {
    my %re = map { 
        my ($name, $digits) = @$_;
        $name => qr/ (?<$name>  \d{$digits}) /x;
    } [ year  => 4 ], [ month => 2 ], [ day   => 2 ];
    
    qr/ $re{year} - $re{month} - $re{day} /x;
}

and even:

# regex generator
sub re { qr/ (?<$_[0]> $_[1] )/x }

sub regex {
    my $pattern = join q{}, @_;
    qr/ $pattern /x;
}

sub datestamp_re {
    regex re( year => '\d{4}' ), '-', re( month => '\d{2}' ), '-', re( day => '\d{2}' );
}

Now that is a regex DSL :)
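
One practical payoff of composing by name, sketched below: when a generated regex is embedded in a larger pattern, the positional variables shift but the names stay put. The helpers are restated from above with named arguments so the example runs on its own; the log-line pattern and the level capture are hypothetical, made up purely for illustration:

```perl
use 5.010;
use strict;
use warnings;

# Helpers restated from the post (with named arguments).
sub re {
    my ($name, $pattern) = @_;
    qr/ (?<$name> $pattern ) /x;
}

sub regex {
    my $pattern = join q{}, @_;
    qr/ $pattern /x;
}

sub datestamp_re {
    regex( re( year => '\d{4}' ), '-', re( month => '\d{2}' ), '-', re( day => '\d{2}' ) );
}

# Hypothetical larger pattern: an extra capture in front of the datestamp.
my $date   = datestamp_re();
my $log_re = qr/ (?<level> [A-Z]+ ) \s+ $date /x;

if ( 'ERROR 2007-10-23' =~ $log_re ) {
    say 'by position, $1 => ', $1;          # now the level, no longer the year
    say 'by name,   year => ', $+{year};    # unaffected by the extra group
}
```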

Note that the %+ hash variable only holds the leftmost defined capture for a given name, so a name that is used more than once in a pattern yields just its first match:

use 5.010;    # enables say; named captures need perl 5.10 or later

sub numbers_re {
    my $four  = qr/ (?<four> \d{4}) /x;
    my $two   = qr/ (?<two>  \d{2}) /x;
    qr/ $four - $two - $two /x;
}

if ('2007-10-23' =~ numbers_re) {
    say 'four => ', $+{four};
    say 'two  => ', $+{two};
}

# four => 2007
# two  => 10

To get at the second $two (i.e. 23), use the %- hash variable, which stores all the captures for the relevant named buffer in an array reference:

if ('2007-10-23' =~ numbers_re) {
    say 'two(s) => ', join ',' => @{ $-{two} };
}

# two(s) => 10,23
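
The difference between the two hashes also shows up when a name is reused across alternatives. In the sketch below (the pattern and the name id are invented for illustration), only one of the two id groups can take part in any given match: %+ returns whichever one is defined, while %- keeps a slot for every group, including the ones that did not match:

```perl
use 5.010;
use strict;
use warnings;

# The same name in two alternatives: digits or lowercase letters.
my $id_re = qr/ \A (?: (?<id> \d+ ) | (?<id> [a-z]+ ) ) \z /x;

for my $input (qw/ 42 abc /) {
    next unless $input =~ $id_re;

    # Groups that did not take part in the match are undef in %-.
    my @all = map { defined $_ ? $_ : 'undef' } @{ $-{id} };

    say 'input  => ', $input;
    say '  %+   => ', $+{id};
    say '  %-   => ', join ',' => @all;
}
```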

/I3az/

PS. Please note that the WordPress syntax highlighter used is unfortunately upper-casing all code comments :(

25 Comments
  1. DATA
    September 30, 2010 7:05 am

    Thanks for pointing me to that, that’s cool!

  2. Martin
    September 30, 2010 9:07 am

    Nice posting draegtun. I’d seen named pattern captures in 5.10 but without some worked examples showing how much more readable they can be I’ve so far not used them. I will now. Thanks.

    BTW, the form to leave a comment here is almost unreadable in my firefox – the line around the entry boxes is so faint I did not even see it at first.

  3. October 8, 2010 10:02 am

    Many thanks DATA & Martin.

    Martin,

    i) Yes, it’s surprising that a lot of the new Perl features don’t seem to get expanded on much beyond the Perl Delta docs (http://perldoc.perl.org/index-history.html). Still, it gives us the opportunity for something to blog about :)

    ii) re “blog form issues in Firefox” – There must be an issue with this WordPress theme. It’s my favourite from a bad bunch that WordPress provide for free :(

    I’ll have a look to see if WP have provided something better now, but my gut feeling is to move the blog to a self-hosted (probably Perl-based) solution. That would be a more configurable and safer long-term solution (and probably lots more work!), especially as I nearly chose Vox.com before settling on WordPress, and look what’s happened to them!

    regards Barry

