Readable and compositional regexes in Perl
Regexes don’t (always!) have to be unreadable mess. For example see this HN post a little Clojure DSL for readable, compositional regexes. Here is the simple Clojure example that was given:
(def datestamp-re (let [d {\0 \9}] (regex [d d d d :as :year] \- [d d :as :month] \- [d d :as :day])))
And the equivalent Perl regex “DSL” can be equally lucid:
sub datestamp_re { qr/ (?<year> \d \d \d \d) - (?<month> \d \d) - (?<day> \d \d ) /x; }
The two things that provide a little extra help to grok whats going on here are:
- The
x
modifier on the end ofqr//
which allows whitespace and newlines to be sprinkled into your regex pattern without any effect on the pattern matching. See perlre Modifers - And “Named Capture Buffers” which were added at perl 5.10.
(?<year> \d{4}) # stores pattern matched in "year" buffer
Above not only gives a name to that capture buffer but provides an excellent visual placeholder to help describe what you are trying to do with the regex.
When processing named capture regexes the matches to patterns are recorded in the %+
hash variable:
for my $date (qw/2007-10-23 20X7-10-23/) { printf "year:%d, month:%d, day:%d\n", @+{qw/year month day/} if $date =~ datestamp_re; } # => year:2007, month:10, day:23
This is much more flexible for dealing with regex captures compared to positional $1, $2, $3, etc
. So not just more readable but more compositional:
# nice readable regex sub datestamp_re { my $year = qr/ (?<year> \d{4}) /x; my $month = qr/ (?<month> \d{2}) /x; my $day = qr/ (?<day> \d{2}) /x; qr/ $year - $month - $day /x; }
or:
# DRY regex sub datestamp_re { my %re = map { my ($name, $digits) = @$_; $name => qr/ (?<$name> \d{$digits}) /x; } [ year => 4 ], [ month => 2 ], [ day => 2 ]; qr/ $re{year} - $re{month} - $re{day} /x; }
and even:
# regex generator sub re { qr/ (?<$_[0]> $_[1] )/x } sub regex { my $pattern = join q{}, @_; qr/ $pattern /x; } sub datestamp_re { regex re( year => '\d{4}' ), '-', re( month => '\d{2}' ), '-', re( day => '\d{2}' ); }
Now that is a regex DSL π
Note that the %+
hash variable only captures the first occurrence in the relevant named buffer:
sub numbers_re { my $four = qr/ (?<four> \d{4}) /x; my $two = qr/ (?<two> \d{2}) /x; qr/ $four - $two - $two /x; } if ('2007-10-23' =~ numbers_re) { say 'four => ', $+{four}; say 'two => ', $+{two}; } # four => 2007 # two => 10
To get to the second $two (ie. 23) then use the %-
hash variable which stores all the captures in an array reference for relevant named buffer:
if ('2007-10-23' =~ numbers_re) { say 'two(s) => ', join ',' => @{ $-{two} }; } # two(s) => 10,23
/I3az/
PS. Please note that the WordPress syntax highlighter used is unfortunately upper-casing all code comments π¦
given/when – the Perl switch statement
Its been a while since my last post; summer holidays have inflicted upon me some big distractions! So to ease myself back in gently I’ll I’ve pulled out a (nearly finished but unposted) article from last year (in true Blue Peter tradition!)
This post was inspired by this nice & straightforward blog article: How A Ruby Case Statement Works And What You Can Do With It. I’ve simply converted the Ruby code to Perl and added some so called insightful notes π
Now what Ruby calls case/when
statements is more commonly known in computer parlance has the switch statement.
Perl itself has had such a switch statement for quite a while now:
use Switch; switch ($grade) { case "A" { say "Well done!" } case "B" { say "Try harder!" } case "C" { say "You need help!!!" } else { print "You just making it up" } }
However DON’T USE IT!!! Its a source filter and is being deprecated.
Instead use given/when which is the Perl 6 switch statement that was introduced at perl 5.10.
use strict; use warnings; use feature qw(switch say); print 'Enter your grade: '; chomp( my $grade = <> ); given ($grade) { when ('A') { say 'Well done!' } when ('B') { say 'Try harder!' } when ('C') { say 'You need help!!!' } default { say 'You are just making it up!' } }
And at perl 5.12 you could also use when
has a statement modifiers:
use 5.012; use warnings; print 'Enter your grade: '; chomp( my $grade = <> ); given ($grade) { say 'Well done!' when 'A'; say 'Try harder!' when 'B'; say 'You need help!!!' when 'C'; default { say 'You are just making it up!' } }
And even more enhancements are coming with perl 5.14 because given
now returns the last evaluated expression:
use 5.013; # 5.014 use warnings; print 'Enter your grade: '; chomp( my $grade = <> ); say do { given ($grade) { 'Well done!' when 'A'; 'Try harder!' when 'B'; 'You need help!!!' when 'C'; default { 'You are just making it up!' } } };
Which can lead to some very pleasant looking dry code.
But there is more to given/when
than just single value conditionals. You can also check multi-values and ranges:
use 5.012; use warnings; print 'Enter your grade: '; chomp( my $grade = <> ); given ($grade) { say 'You pretty smart' when ['A', 'B']; say 'You pretty dumb!!' when ['C', 'D']; default { say "You can't even use a computer" } }
And also pattern match using a regex:
use 5.012; use warnings; print 'Enter some text: '; chomp( my $some_string = <> ); given ($some_string) { say 'String has numbers' when /\d/; say 'String has letters' when /[a-zA-Z]/; default { say "String has no numbers or letters" } }
In fact you can use anything that Smart Matching can resolve.
And because smart matching is used under the hood by given/when
then you can customise it by overloading the ~~
smart match operator:
use 5.012; use warnings; { package Vehicle; use Moose; use overload '~~' => '_same_wheels', fallback => 1; has number_of_wheels => (is => 'rw', isa => 'Int'); sub _same_wheels { my ($self, $another_vehicle) = @_; $self->number_of_wheels == $another_vehicle->number_of_wheels; } } my $four_wheeler = Vehicle->new( number_of_wheels => 4 ); my $two_wheeler = Vehicle->new( number_of_wheels => 2 ); my $robin_reliant = Vehicle->new( number_of_wheels => 3 ); print 'Enter number of wheel for vehicles: '; chomp( my $wheels = <> ); my $vehicle = Vehicle->new( number_of_wheels => $wheels ); given ($vehicle) { say 'Vehicle has the same number of wheels as a two-wheeler!' when $two_wheeler; say 'Vehicle has the same number of wheels as a four-wheeler!' when $four_wheeler; say 'Gosh... is that Del Trotter!!' when $robin_reliant; default { say "Don't know of a vehicle with that wheel arrangement!" } }
Hopefully that was a nice lightweight introduction to given/when
. And I haven’t even touched upon break or continue π
/I3az/
Beautiful Perl
Trying to kill two birds with one stone here. There is a new site called UseTheSource where beautiful or interesting code can be posted.
So I thought this might be a good opportunity to get more Modern Perl code across the eyes of the unbelievers π
So to start the ball rolling I thought I’d post a link to the code used in the Microsoft Winter Scripting Games 2008, as mentioned by chromatic in his The Opposite of Modern post.
However typically Microsoft they’ve now broken the links. And even worse they seemed to have removed all content related to the Scripting Games from their websites π¦
Tut tut. Luckily the content can be retrieved via WayBackMachine. Here is the article linked in chromatic’s post.
So while the first bird was to bring UseTheSource to everyones attention, the second bird was to make sure the lost code was resurrected from the dead by saving it here.
This was the challenge:
The first event requires the script to request a 7-digit phone number from the user and then select a corresponding phoneword from the βwordlist.txtβ file. The letters in the phone number need to be mapped the same way they are represented on a North American phone dial
And below is the code:
use strict; use warnings; print "Enter 7 digit phone number: "; chomp(my $number = <>); die "Invalid number: '$number'\n" unless $number =~ /^\d{7}$/; my @groups = qw(ABC DEF GHI JKL MNO PRS TUV WXY); my($letters, $digits); foreach my $digit (2..9) { my $group = $groups[$digit-2]; $letters .= $group; $digits .= $digit x length($group); } open(my $fh, "<", "C:/Scripts/wordlist.txt") or die; while (<$fh>) { local $_ = $_; eval "tr/$letters\L$letters\E/$digits$digits/"; last if /^$number$/o; } print uc if defined;
This is Jan Dubois wonderful idiomatic Perl riposte to (what I believe is) Microsoft’s original stab at a Perl solution:
%dictionary = (); $dictionary{2} = "ABC"; $dictionary{3} = "DEF"; $dictionary{4} = "GHI"; $dictionary{5} = "JKL"; $dictionary{6} = "MNO"; $dictionary{7} = "PRS"; $dictionary{8} = "TUV"; $dictionary{9} = "WXY"; open (WordList, "C:\\Scripts\\WordList.txt"); @arrWordList = <WordList>; close (WordList); $strWordList = join(" ", @arrWordList); print "Please enter the phone number: "; $phonenumber = <>; $low = 0; $high = 3; do { $strCharacters = ""; for ($i = 0; $i < 7; $i++) { $intValue = substr($phonenumber, $i, 1); $letters = $dictionary{$intValue}; $b = int(rand($high) + $low); $substring = substr($letters, $b, 1); $strCharacters = $strCharacters . $substring } if ($strWordList =~ m/$strCharacters/i) { print $strCharacters; $x = 1; } } until $x == 1
Proof that you can really write FORTRAN in any language π
This article is now self referentially posted to UseTheSource.
BTW, If you post a link or code to UseTheSource by Monday, July 19 at 0800 BST then you are in with a chance of winning the “Beautiful Code” e-book.
Hopefully that will give you an incentive to waste a bit of time online this weekend π
/I3az/
PS. Additionally to this there was another wonderful solution posted by Gav.
PPS. Gav also provided a more idiomatic solution to Game Event 3. The original Microsoft solution can be found here on the WayBackMachine (if you dare look!). I’ve also posted Gav’s solution to this on UseSourceCode.
Sequence, Selection and Iteration
My bit of procrastination this afternoon was watching this entertaining and interesting keynote from RailsConf 2010 given by Robert Martin.
Here is the direct YouTube link
Don’t worry its not Rails or even Ruby loaded. The keynote goes through an interesting history of programming languages and on route Robert mentions Fortran, Lisp, Algol, BASIC, Simula, BCPL, B, C, Scheme, Smalltalk, C++, Perl, Java, Ruby, Erlang, C#, F#, Haskell and Clojure.
Hopefully I didn’t miss one? Strangely can’t recall Python being mentioned!?
/I3az/
Very API
In my last post about extending autobox::Core, I touched on API choices. So let me expand on this a little bit here and in some more in posts to come.
In all bar one of my autobox examples I used $_[0] to pass the iterator object. For eg:
['a'..'z']->iterate( sub{ print $_[0]->currentIndex, ' => ', $_[0]->current; });
But in the last code example I went for $_[0] having the value and $_[1] the iterator object:
['a'..'z']->iterate( sub{ print $_[1]->currentIndex, ' => ', $_[0]; });
And to help avoid getting cross-eyed with $_[] then this is nicer on the eye if a tad longer:
['a'..'z']->iterate( sub{ my ($v, $iter) = @_; print $iter->currentIndex, ' => ', $v; });
But hold on, is there a preference to which way round these should be assigned to @_? Would it be better other way round because value can be got at from the iterator object anyway? (one shift and away!)
So perhaps using $_ is best?
['a'..'z']->iterate( sub{ print $_->currentIndex, ' => ', $_->current; });
I like this a lot. But it would be amiss not to take advantage of using $_ for the value and $_[0] for iterator so that we can write code like this:
['a'..'z']->iterate( sub{ print $_[0]->getNext if m/[aeiou]/; });
However note the print $_[0]->getNext
and not just plain print
π¦
I don’t think there is a canonical answer to this but my preference is to use $_ has the iterator object… but with probably value & the iterator (again!) passed through @_ (ie. the kitchen sink approach!).
sub autobox::Core::ARRAY::iterate { my ($array, $code) = @_; my $iter = Array::Iterator->new( $array ); while ($iter->hasNext) { local $_ = $iter; $code->( $iter->getNext, $iter ); } return wantarray ? @$array : $array; }
Note the return wantarray ? @$array : $array;
addition. This keeps the autobox chaining going and is good adherence to autobox API (though in this iterator context it probably isn’t of much benefit).
/I3az/
PS. Let me just add “Perl” here because despite lots of perl code snippets I haven’t mentioned Perl, CPAN or Ironman at all and I’m still wary whether ironman will pick up the perl tag.
each to their own (autobox)
If you are a bit wary of using each on an @array then perhaps using autobox maybe more to your taste? Because autobox::Core comes with three looping methods:
- each
- foreach
- for
each iterates over a hash and works exactly the same as the perl function each (because internally it uses each so same caveats apply that I mentioned in my last post).
%hash->each( sub{ my ($k, $v) = @_; say "$k = $v"; });
The other two methods work on arrays:
my @array = ('a'..'z'); @array->foreach(sub { say $_[0] }); @array->for( sub{ my ($index, $value) = @_; say "$index : $value"; });
foreach just provides the current element. Whereas for works in similar way to how the each function on an array. However it also provides the array ref which allows us to repeat my “vowel” snippet from last post in autobox like so:
use 5.012; use warnings; use autobox::Core; my @tokens = 'a' .. 'z'; @tokens->for( sub{ my ($i, $v, $array) = @_; if ($v =~ m/[aeiou]/) { print $array->[ $i + 1 ]; # peeks ahead into @tokens } }); # => bfjpv
OK thats not quite the same because my function each snippet in last post was more optimal because it moved to next iteration in the array so saving a pass in the loop. Stayed tuned because i’ll come back to this later.
So what about my last caveat from previous post? Well foreach & for do not use the each function so its not a problem!
@tokens->for( sub{ my ($i, $v, $array) = @_; print $v; last if $i == 12; }); # => abcdefghijklm
However (potential caveat warning!) you will get a Exiting subroutine via last at… warning when using above π¦
This can be fixed:
@tokens->for( sub{ my ($i, $v, $array) = @_; print $v; if ($i == 12) { no warnings 'exiting'; last; } });
But I get the feeling I’ve brushed something under the carpet here!
So perhaps time to add a method to autobox::Core then:
sub autobox::Core::ARRAY::whileTrue { my ($array, $code) = @_; for (my $i = 0; $i <= $#$array; $i++) { my $return = $code->( $i, $array->[$i], $array ); last unless $return; } } # now go thru @tokens unless the block returns false value @tokens->whileTrue( sub{ my ($i, $v, $array) = @_; print $v; return 0 if $i == 12; 1; # <= always remember this! });
Hmmm… OK it works but is a bit hacky! (even wacky!!).
But something is needed because otherwise it will just iterate through whole array. So a much better way is to integrate something like the Array::Iterator CPAN module that I mentioned in the comments of my last post:
sub autobox::Core::ARRAY::iterate { my ($array, $code) = @_; my $iter = Array::Iterator->new( $array ); while ($iter->hasNext) { $iter->getNext; $code->( $iter ); } }
Now lets first jump back a bit and see how this works with my original snippets:
@tokens->iterate( sub{ # index => value print $_[0]->currentIndex, ' => ', $_[0]->current; }); @tokens->iterate( sub{ # print next token after a vowel print $_[0]->getNext if $_[0]->current =~ m/[aeiou]/; });
Wonderbar! And because we are using a full blown iterator it now pulls the next element off the array (see getNext method) just like function each but without any of the side effects!
So for last, its just a matter of subclassing of Array::Iterator with a last method:
use 5.012; use warnings; use autobox::Core; { package Array::Iterator::Last; use parent 'Array::Iterator'; sub last { my ($self) = @_; $self->_current_index = $self->getLength; } } sub autobox::Core::ARRAY::iterate { my ($array, $code) = @_; my $iter = Array::Iterator::Last->new( $array ); while ($iter->hasNext) { $code->($iter->getNext, $iter); } } ['a'..'z']->iterate( sub{ my ($val, $iter) = @_; print $val; $iter->last if $iter->currentIndex == 12; });
And bob’s your uncle π
NB. Now the keen eyed among you may have noticed that this time I passed both the element and the iterator object to the callback. Even with just methods, API choices are never easy! More on this in my next post.
/I3az/
each @array in Perl 5.12
Don’t recall seeing anyone blog about it and in fact the documentation is pretty sparse but from Perl 5.12 each can now iterate over an array like so:
use 5.012; use warnings; my @array = 'a' .. 'h'; while (my ($i, $val) = each @array) { say "$i => $val"; }
each on an array returns two values… the index and the value of the current iteration. Thus we get this output from above code:
0 => a 1 => b 2 => c 3 => d 4 => e 5 => f 6 => g 7 => h
Anyway to save me thinking up a blog post please excuse me while I just regurgitate my Stackoverflow answer I gave earlier today about using each with an array:
==start==
Starting with Perl 5.12, each is now more flexible by also working on arrays:
use 5.012; use warnings; my @tokens = 'a' .. 'z'; while (my ($i, $val) = each @tokens) { if ($val =~ m/[aeiou]/) { ($i, $val) = each @tokens; # get next token after a vowel print $val; } } # => bfjpv
One caveat with each, remember the iterator is global and is not reset if you break out of a loop.
For eg:
while (my ($i, $val) = each @tokens) { print $val; last if $i == 12; } # => abcdefghijklm my ($i, $val) = each @tokens; say "Now at => $val ($i)"; # Now at => n (13)
So use keys or values to manually reset the iterator:
keys @tokens; # resets iterator ($i, $val) = each @tokens; say "Now at => $val ($i)"; # Now at => a (0)
==end==
I’ve just come back from a nice weeks break… so I’m still a lazy git in holiday mode here π
/I3az/
Where did it go?
My last post a couple of days ago, What I Would Like To See, seemed to have hit the Ironman bitbucket π¦
There is no mention of Perl in the article (a first for me for probably over a year!). But it is tagged with “perl” and “ironman”. Is there an issue with WordPress tags? Is ironman only seeing WordPress categories (which were “blogging” and “programming” for this post).
Thought I better check what my WordPress RSS feed is spewing out using XML::Feed:
use 5.012; use warnings; use XML::Feed; my $feed = XML::Feed->parse(URI->new('https://transfixedbutnotdead.com/feed')) or die XML::Feed->errstr; for my $entry ($feed->entries) { say $entry->title; for my $category ($entry->category) { say "\t", $category; } }
What came back was this:
What I Would Like To See Blogging Programming perl ironman Perl block with lexically-scoped method delegation Programming perl Devel::Declare dsl PDL advert get its 15 minutes of fame! Programming PDL perl stackoverflow Perlcast website is back Blogging Programming perl perlcast Podcasts & Perl Programming perl podcast Famous Perl programmers Programming perl python django Caralyst 5.8: The Perl MVC Framework Programming catalyst perl Perl ads on Stackoverflow Programming perl stackoverflow Two questions⦠similar answers⦠same module Programming moose perl The unlikely intersection of Boy George and Damian Conway Programming london.pm perl
Both the WordPress tags & categories were coming through has RSS categories. I believe this is correct so why hasn’t Ironman picked up the “What I Would Like To See” post?
So is having Perl, CPAN or ironman in the content mandatory?
/I3az/
What I Would Like To See
I think I would love to see mst present a YAPC talk on Catalyst, Rails & Django – the shootout!
No web framework is perfect so I think it maybe insightful to highlight the best & worst features of all these frameworks and see how they stack up against each other.
And to provide complete impartiality for the talk then I think the hair colour should be a mixture of green for django, red for rails and eh, not sure what colour for catalyst?… so lets stick to mst’s natural hair colour for that π
/I3az/
PS. BTW… if you think my multi-colour choice was a bit OTT, then its fortunate I didn’t go with my original thought of suggesting using the colours of my favourite Rugby & Footie teams! That would have been eye catching mix of black & gold for London Wasps and the blue & white Superhoops π