12.2. Line Boundaries
In addition to always using the /x flag, always use the /m flag. In every regular expression you ever write.

The normal behaviour of the ^ and $ metacharacters is unintuitive to most programmers, especially if they're coming from a Unix background. Almost all of the Unix utilities that feature regular expressions (e.g., sed, grep, awk) are intrinsically line-oriented. So in those utilities, ^ and $ naturally mean "match at the start of any line" and "match at the end of any line", respectively.

But they don't mean that in Perl. In Perl, ^ and $ mean "match at the start of the entire string" and "match at the end of the entire string". That's a crucial difference, and one that leads to a very common type of mistake:

    # Find the end of a Perl program...
    $text =~ m{ [^\0]*?      # match the minimal number of non-null chars
                ^__END__$    # until a line containing only an end-marker
              }x;

In fact, what that code really does is:

    $text =~ m{ [^\0]*?      # match the minimal number of non-null chars
                ^            # until the start of the string
                __END__      # then match the end-marker
                $            # then match the end of the string
              }x;

The minimal number of characters until the start of the string is, of course, zero[*]. Then the regex has to match '__END__'. And then it has to be at the end of the string. So the only strings that this pattern matches are those that consist of '__END__' (optionally followed by a single trailing newline, which $ also tolerates). That is clearly not what was intended.
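The failure described above can be reproduced with a small script. This is only an illustrative sketch: the sample string in $program and the variable names are assumptions made for the demonstration, not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical sample input: two lines of code, an end-marker line, then data.
my $program = "print 'hello';\nexit;\n__END__\nignored data\n";

# The pattern from the text, compiled WITHOUT /m,
# so ^ and $ anchor to the whole string rather than to individual lines.
my $end_marker = qr{ [^\0]*?      # match the minimal number of non-null chars
                     ^__END__$    # intended: a line containing only an end-marker
                   }x;

# Record whether each candidate string matches (1) or not (0).
my $program_matched = ( $program  =~ $end_marker ) ? 1 : 0;
my $marker_matched  = ( '__END__' =~ $end_marker ) ? 1 : 0;

print "multi-line program matched: $program_matched\n";   # prints 0
print "bare end-marker matched:    $marker_matched\n";    # prints 1
```

The multi-line program is rejected even though it contains an end-marker line, while a string consisting of nothing but the marker is accepted.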
The /m mode makes ^ and $ work "naturally": under /m, ^ matches at the start of any line and $ matches at the end of any line.
The previous example could be fixed by making those two metacharacters actually mean what the original developer thought they meant, simply by adding a /m:
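A minimal runnable sketch of that fix follows. The sample contents of $text, the added capturing parentheses, and the variable names $found and $code_before_marker are assumptions introduced for illustration; the pattern is otherwise the earlier one with /m added.

```perl
use strict;
use warnings;

# Hypothetical sample input: two lines of code, an end-marker line, then data.
my $text = "print 'hello';\nexit;\n__END__\nignored data\n";

# Find the end of a Perl program...
my $found = $text =~ m{ ( [^\0]*? )   # capture the minimal number of non-null chars
                        ^__END__$     # until a line containing only an end-marker
                      }xm;

# Copy the capture out only if the match succeeded.
my $code_before_marker = $found ? $1 : undef;

print "found end-marker: ", ( $found ? 'yes' : 'no' ), "\n";   # prints yes
print "code before it:\n$code_before_marker" if $found;
```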
Which now really means:
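    $text =~ m{ [^\0]*?      # match the minimal number of non-null chars
                ^            # until the start of any line
                __END__      # then match the end-marker
                $            # then match the end of any line
              }xm;

Consistently using the /m on every regex makes Perl's behaviour consistently conform to your unreasonable expectations. So you don't have to unreasonably change your expectations to conform to Perl's behaviour[*].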