<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>transfixed but not dead! &#187; regex</title>
	<atom:link href="http://transfixedbutnotdead.com/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://transfixedbutnotdead.com</link>
	<description>my ramblings on life, work &#38; anything left in-between</description>
	<lastBuildDate>Tue, 08 May 2012 09:57:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='transfixedbutnotdead.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/0a317653027efb1ab2bf8adde3dcb067?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>transfixed but not dead! &#187; regex</title>
		<link>http://transfixedbutnotdead.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://transfixedbutnotdead.com/osd.xml" title="transfixed but not dead!" />
	<atom:link rel='hub' href='http://transfixedbutnotdead.com/?pushpress=hub'/>
		<item>
		<title>Readable and compositional regexes in Perl</title>
		<link>http://transfixedbutnotdead.com/2010/09/29/readable-and-compositional-regexes-in-perl/</link>
		<comments>http://transfixedbutnotdead.com/2010/09/29/readable-and-compositional-regexes-in-perl/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 15:39:44 +0000</pubDate>
		<dc:creator>draegtun</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[clojure]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[perl 5.10]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://transfixedbutnotdead.com/?p=1091</guid>
		<description><![CDATA[Regexes don&#8217;t (always!) have to be an unreadable mess. For example, see this HN post: a little Clojure DSL for readable, compositional regexes. Here is the simple Clojure example that was given: And the equivalent Perl regex &#8220;DSL&#8221; can be equally lucid: The two things that provide a little extra help to grok what&#8217;s going on [...]]]></description>
			<content:encoded><![CDATA[<p>Regexes don&#8217;t (always!) have to be an unreadable mess. For example, see this HN post: <a href="http://news.ycombinator.com/item?id=1719171">a little Clojure DSL for readable, compositional regexes</a>. Here is the simple Clojure example that was given:</p>
<p><pre class="brush: clojure;">
(def datestamp-re
  (let [d {\0 \9}]
    (regex [d d d d :as :year] \- [d d :as :month] \- [d d :as :day])))
</pre></p>
<p>And the equivalent Perl regex &#8220;DSL&#8221; can be equally lucid:</p>
<p><pre class="brush: perl;">
sub datestamp_re {
    qr/ (?&lt;year&gt; \d \d \d \d) - (?&lt;month&gt; \d \d) - (?&lt;day&gt; \d \d ) /x;
}
</pre></p>
<p>The two things that provide a little extra help to grok what&#8217;s going on here are:</p>
<ol>
<li>The <code>x</code> modifier on the end of <code>qr//</code>, which allows whitespace and newlines to be sprinkled into your regex pattern without any effect on the pattern matching.  See <a href="http://perldoc.perl.org/perlre.html#Modifiers">perlre Modifiers</a>.</li>
<li>&#8220;Named Capture Buffers&#8221;, which were added in <a href="http://perldoc.perl.org/perl5100delta.html#Regular-expressions">perl 5.10</a>.<br />
<pre class="brush: perl;">(?&lt;year&gt; \d{4}) # stores pattern matched in &quot;year&quot; buffer</pre><br />
The above not only gives a name to that capture buffer but also provides an excellent visual placeholder to help describe what you are trying to do with the regex.</li>
</ol>
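<p>As a minimal sketch of the first point (an illustration, not part of the original example): the <code>x</code> modifier also permits <code>#</code> comments inside the pattern, so the same regex can be laid out one annotated part per line.  The sub name <code>datestamp_re_commented</code> is made up here:</p>
<p><pre class="brush: perl;">
use strict;
use warnings;

# hypothetical variant of datestamp_re, one part per line
sub datestamp_re_commented {
    qr/
        (?&lt;year&gt;  \d{4} )   # four digit year
        -                   # literal separator
        (?&lt;month&gt; \d{2} )   # two digit month
        -
        (?&lt;day&gt;   \d{2} )   # two digit day
    /x;
}

for my $date (qw/2007-10-23 20X7-10-23/) {
    my $verdict = $date =~ datestamp_re_commented() ? 'matches' : 'does not match';
    print &quot;$date $verdict\n&quot;;
}

# =&gt; 2007-10-23 matches
# =&gt; 20X7-10-23 does not match
</pre></p>
<p>One thing to watch for with this layout: under <code>x</code> a literal space or <code>#</code> in the pattern has to be escaped (or put in a character class), otherwise it is ignored or starts a comment.</p>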
<p>When a regex with named captures matches, the text matched by each named buffer is recorded in the <code>%+</code> hash variable:<br />
<pre class="brush: perl;">
for my $date (qw/2007-10-23 20X7-10-23/) {
    printf &quot;year:%d, month:%d, day:%d\n&quot;, @+{qw/year month day/}
        if $date =~ datestamp_re;
}

# =&gt; year:2007, month:10, day:23
</pre></p>
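<p>One practical consequence of reading captures by name rather than by position can be sketched as follows.  This is only an illustration: the two patterns and the helper <code>normalise_date</code> are hypothetical.  The groups appear in a different order in each pattern, yet the code that consumes them stays the same:</p>
<p><pre class="brush: perl;">
use strict;
use warnings;

# two hypothetical date layouts that share the same capture names
my $iso = qr/ (?&lt;year&gt; \d{4}) - (?&lt;month&gt; \d{2}) - (?&lt;day&gt; \d{2}) /x;
my $uk  = qr/ (?&lt;day&gt; \d{2}) \/ (?&lt;month&gt; \d{2}) \/ (?&lt;year&gt; \d{4}) /x;

# returns 'year-month-day' whichever layout matched, or nothing
sub normalise_date {
    my ($text) = @_;
    for my $re ($iso, $uk) {
        return join '-', @+{qw(year month day)} if $text =~ $re;
    }
    return;
}

print normalise_date('23/10/2007'), &quot;\n&quot;;

# =&gt; 2007-10-23
</pre></p>
<p>With positional captures the second layout would need <code>$3, $2, $1</code> instead of <code>$1, $2, $3</code>.</p>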
<p>This is much more flexible than dealing with positional captures (<code>$1, $2, $3</code>, etc).  So named captures are not just more readable but also more compositional:</p>
<p><pre class="brush: perl;">
# nice readable regex
sub datestamp_re  {
     my $year  = qr/ (?&lt;year&gt;  \d{4}) /x;  
     my $month = qr/ (?&lt;month&gt; \d{2}) /x;
     my $day   = qr/ (?&lt;day&gt;   \d{2}) /x;
 
     qr/ $year - $month - $day /x;
}
</pre></p>
<p>or:<br />
<pre class="brush: perl;">
# DRY regex
sub datestamp_re {
    my %re = map { 
        my ($name, $digits) = @$_;
        $name =&gt; qr/ (?&lt;$name&gt;  \d{$digits}) /x;
    } [ year  =&gt; 4 ], [ month =&gt; 2 ], [ day   =&gt; 2 ];
    
    qr/ $re{year} - $re{month} - $re{day} /x;
}
</pre></p>
<p>and even:<br />
<pre class="brush: perl;">
# regex generator
sub re { qr/ (?&lt;$_[0]&gt; $_[1] )/x }

sub regex {
    my $pattern = join q{}, @_;
    qr/ $pattern /x;
}

sub datestamp_re {
    regex re( year =&gt; '\d{4}' ), '-', re( month =&gt; '\d{2}' ), '-', re( day =&gt; '\d{2}' );
}
</pre></p>
<p>Now that is a regex DSL  <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Note that the <code>%+</code> hash variable only holds the first (leftmost) capture when the same buffer name is used more than once:<br />
<pre class="brush: perl;">
use 5.010;   # enables say (named captures need perl 5.10+ anyway)

sub numbers_re {
    my $four  = qr/ (?&lt;four&gt; \d{4}) /x;
    my $two   = qr/ (?&lt;two&gt;  \d{2}) /x;
    qr/ $four - $two - $two /x;
}

if ('2007-10-23' =~ numbers_re) {
    say 'four =&gt; ', $+{four};
    say 'two  =&gt; ', $+{two};
}

# four =&gt; 2007
# two  =&gt; 10
</pre></p>
<p>To get at the second <code>$two</code> match (i.e. 23), use the <code>%-</code> hash variable, which stores all the captures for the relevant named buffer in an array reference:<br />
<pre class="brush: perl;">
if ('2007-10-23' =~ numbers_re) {
    say 'two(s) =&gt; ', join ',' =&gt; @{ $-{two} };
}

# two(s) =&gt; 10,23
</pre></p>
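<p>Since <code>%-</code> has an entry for every buffer name in the pattern, it can also be walked generically.  Here is a rough sketch; the helper <code>all_captures</code> is hypothetical and the pattern is written out in full to keep the snippet self-contained:</p>
<p><pre class="brush: perl;">
use strict;
use warnings;

# one 'four' buffer and two 'two' buffers, as in numbers_re
my $numbers = qr/ (?&lt;four&gt; \d{4}) - (?&lt;two&gt; \d{2}) - (?&lt;two&gt; \d{2}) /x;

# hypothetical helper: copies %- into a plain hash of array refs
sub all_captures {
    my ($text) = @_;
    return unless $text =~ $numbers;
    my %all = map { $_ =&gt; [ @{ $-{$_} } ] } keys %-;
    return \%all;
}

my $caps = all_captures('2007-10-23');
print &quot;$_ =&gt; @{ $caps-&gt;{$_} }\n&quot; for sort keys %$caps;

# =&gt; four =&gt; 2007
# =&gt; two =&gt; 10 23
</pre></p>
<p>The copy is deliberate: <code>%-</code> always reflects the last successful match, so holding on to its values directly would be fragile.</p>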
<p>/I3az/</p>
<p>PS. Please note that the WordPress syntax highlighter used is unfortunately upper-casing all code comments <img src='http://s0.wp.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://transfixedbutnotdead.com/2010/09/29/readable-and-compositional-regexes-in-perl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/29cb106071d163d703484e63839d89cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">draegtun</media:title>
		</media:content>
	</item>
		<item>
		<title>Regex brain fart!</title>
		<link>http://transfixedbutnotdead.com/2010/03/11/regex-brain-fart/</link>
		<comments>http://transfixedbutnotdead.com/2010/03/11/regex-brain-fart/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 20:03:02 +0000</pubDate>
		<dc:creator>draegtun</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://transfixedbutnotdead.com/?p=888</guid>
		<description><![CDATA[Yesterday I got stuck for far too long on a simple regex not doing what I expected it to do. Here is an example: I was &#8220;hoping&#8221; to see radio and comment but got back an incorrect radio_some_very_long_var on first item (and unfortunately I hadn&#8217;t noticed the _name had been dropped. That may have saved [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I got stuck for far too long on a <em>simple</em> regex not doing what I expected it to do.  Here is an example:</p>
<p><pre class="brush: perl;">
use 5.010;   # enables say

sub prefix {
    return $1 if $_[0] =~ m/^(\w+)_/;
    return 'unknown';
}

say prefix( 'radio_some_very_long_var_name' );
say prefix( 'comment_blahblah'              );
</pre></p>
<p>I was &#8220;hoping&#8221; to see <em>radio</em> and <em>comment</em> but got back an incorrect <em>radio_some_very_long_var</em> for the first item (and unfortunately I hadn&#8217;t noticed that the <em>_name</em> had been dropped.  Spotting that might have saved a lot of the heartache!).</p>
<p>I looked long and hard at the regex thinking&#8230;</p>
<blockquote><p>what is wrong with \w+ ?</p></blockquote>
<p>Of course what was wrong was me  <img src='http://s0.wp.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />      </p>
<p><code>\w</code> also matches the _ (underscore)  <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>From the <a href="http://perldoc.perl.org/perlre.html">perlre pod</a>:</p>
<blockquote><p>\w  &#8211;  Match a &#8220;word&#8221; character (alphanumeric plus &#8220;_&#8221;)</p></blockquote>
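<p>I don&#8217;t think I&#8217;ve ever needed the pleasure of knowing that&#8230; but I do now!  Because <code>\w+</code> is greedy, it swallowed the underscores too and only gave back enough to stop at the last one.  Here&#8217;s the correct regex I needed:<br />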
<pre class="brush: perl;">
m/^([A-Za-z]+)_/
</pre></p>
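<p>For comparison, here is a small sketch that runs a few candidate patterns against the same input.  The helper <code>prefix_with</code> is hypothetical, and the non-greedy and negated-class variants are alternatives I am adding for illustration, not what the original code used:</p>
<p><pre class="brush: perl;">
use strict;
use warnings;

# hypothetical helper: apply a prefix regex, or fall back to 'unknown'
sub prefix_with {
    my ($re, $name) = @_;
    return $name =~ $re ? $1 : 'unknown';
}

my $name = 'radio_some_very_long_var_name';

print prefix_with(qr/^(\w+)_/,       $name), &quot;\n&quot;;   # greedy: radio_some_very_long_var
print prefix_with(qr/^(\w+?)_/,      $name), &quot;\n&quot;;   # non-greedy: radio
print prefix_with(qr/^([A-Za-z]+)_/, $name), &quot;\n&quot;;   # letters only: radio
print prefix_with(qr/^([^_]+)_/,     $name), &quot;\n&quot;;   # anything but underscore: radio
</pre></p>
<p>These are not interchangeable in general: <code>\w+?</code> still accepts digits in the prefix, and <code>[^_]+</code> accepts almost anything, so which one fits depends on what a valid prefix is allowed to contain.</p>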
<p>Doh!</p>
<p>/I3az/</p>
]]></content:encoded>
			<wfw:commentRss>http://transfixedbutnotdead.com/2010/03/11/regex-brain-fart/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/29cb106071d163d703484e63839d89cb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">draegtun</media:title>
		</media:content>
	</item>
	</channel>
</rss>
