Importing and using the Perl grammar regex
The PPR module exports no subroutines or variables, and provides no methods. Instead, it defines a single
package variable, $PPR::GRAMMAR, which can be interpolated into regexes to add rules that permit Perl
constructs to be parsed:
$source_code =~ m{ (?&PerlEntireDocument) $PPR::GRAMMAR }x;
Note that all the examples shown so far have interpolated this "grammar variable" at the end of the
regular expression. This placement is desirable, but not necessary. Both of the following work
identically:
$source_code =~ m{ (?&PerlEntireDocument) $PPR::GRAMMAR }x;
$source_code =~ m{ $PPR::GRAMMAR (?&PerlEntireDocument) }x;
However, if the grammar is to be extended, then the extensions must be specified before the base grammar
(i.e. before the interpolation of $PPR::GRAMMAR). Placing the grammar variable at the end of a regex
ensures that will be the case, and has the added advantage of "front-loading" the regex with the most
important information: what is actually going to be matched.
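This ordering requirement follows from the way Perl resolves a "(?&Name)" call when several groups share a name: it uses the leftmost one. A minimal sketch of that effect, using a tiny made-up grammar (the $BASE_GRAMMAR variable and its "Greeting" rule are hypothetical stand-ins for $PPR::GRAMMAR and its rules, not part of PPR):

```perl
use strict;
use warnings;

# Hypothetical base grammar standing in for $PPR::GRAMMAR.
my $BASE_GRAMMAR = qr{
    (?(DEFINE)
        (?<Greeting> hello )
    )
}x;

# Extension defined BEFORE the base grammar: the extension is leftmost.
my $extension_first = qr{
    \A (?&Greeting) \z
    (?(DEFINE) (?<Greeting> howdy ) )
    $BASE_GRAMMAR
}x;

# Extension defined AFTER the base grammar: the base rule is leftmost.
my $extension_last = qr{
    \A (?&Greeting) \z
    $BASE_GRAMMAR
    (?(DEFINE) (?<Greeting> howdy ) )
}x;

# Try the extended greeting against both regexes.
my @results = map { 'howdy' =~ $_ ? 'matched' : 'no match' }
                  ($extension_first, $extension_last);

print "@results\n";
```

Only the first regex accepts the extended syntax, because only there is the overriding rule found before the base definition.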
Note too that, because the PPR grammar internally uses capture groups, placing $PPR::GRAMMAR anywhere
other than the very end of your regex may change the numbering of any explicit capture groups in your
regex. For complete safety, regexes that use the PPR grammar should probably use named captures, instead
of numbered captures.
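The renumbering can be seen with a small stand-in grammar. In this sketch, $TOY_GRAMMAR and its rule names are hypothetical; like the PPR grammar, it implements its rules as named capture groups inside a "(?(DEFINE)...)" block:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $PPR::GRAMMAR: two rules, hence two capture groups.
my $TOY_GRAMMAR = qr{
    (?(DEFINE)
        (?<ToyWord>   [A-Za-z]++ )
        (?<ToyNumber> [0-9]++    )
    )
}x;

my $text = 'answer 42';

# Grammar last: the explicit group is still capture group 1.
my ($at_end) = $text =~ m{ ((?&ToyNumber)) $TOY_GRAMMAR }x;

# Grammar first: its internal groups take numbers 1 and 2,
# so the explicit group is now capture group 3.
my ($g1, $g2, $g3) = $text =~ m{ $TOY_GRAMMAR ((?&ToyNumber)) }x;

# A named capture is unaffected by where the grammar is placed.
my $named = $text =~ m{ $TOY_GRAMMAR (?<num> (?&ToyNumber) ) }x ? $+{num} : undef;

printf "at end: %s, shifted to group 3: %s, named: %s\n", $at_end, $g3, $named;
```

Groups 1 and 2 stay undefined in the second match, because captures made inside a called rule are not visible once the call returns.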
Error reporting
Regex-based parsing is all-or-nothing: either your regex matches (and returns any captures you
requested), or it fails to match (and returns nothing).
This can make it difficult to detect why a PPR-based match failed; that is, to work out which piece of
"bad source code" prevented your regex from matching.
So the module provides a special variable that attempts to detect the source code that prevented any call
to the "(?&PerlStatement)" subpattern from matching. That variable is: $PPR::ERROR
$PPR::ERROR is only set if it is undefined at the point where an error is detected, and will only be set
to the first such error that is encountered during parsing.
Note that errors are only detected when matching context-sensitive components (for example in the middle
of a "(?&PerlStatement)", as part of a "(?&PerlContextualRegex)", or at the end of a
"(?&PerlEntireDocument)"). Errors, especially errors at the end of otherwise valid code, will often not
be detected in context-free components (for example, at the end of a "(?&PerlStatementSequence)", as part
of a "(?&PerlRegex)", or at the end of a "(?&PerlDocument)").
A common mistake in this area is to attempt to match an entire Perl document using:
m{ \A (?&PerlDocument) \Z $PPR::GRAMMAR }x
instead of:
m{ (?&PerlEntireDocument) $PPR::GRAMMAR }x
Only the second approach will be able to successfully detect an unclosed curly bracket at the end of the
document.
"PPR::ERROR" interface
If it is set, $PPR::ERROR will contain an object of type PPR::ERROR, with the following methods:
"$PPR::ERROR->origin($line, $file)"
Returns a clone of the PPR::ERROR object that now believes that the source code parsing failure it is
reporting occurred in a code fragment starting at the specified line and file. If the second argument
is omitted, the file name is not reported in any diagnostic.
"$PPR::ERROR->source()"
Returns a string containing the specific source code that could not be parsed as a Perl statement.
"$PPR::ERROR->prefix()"
Returns a string containing all the source code preceding the code that could not be parsed. That is:
the valid code that is the preceding context of the unparsable code.
"$PPR::ERROR->line( $opt_offset )"
Returns an integer which is the line number at which the unparsable code was encountered. If the
optional "offset" argument is provided, it will be added to the line number returned. Note that the
offset is ignored if the PPR::ERROR object originates from a prior call to "$PPR::ERROR->origin"
(because in that case you will have already specified the correct offset).
"$PPR::ERROR->diagnostic()"
Returns a string containing the diagnostic that would be returned by "perl -c" if the source code
were compiled.
Warning: The diagnostic is obtained by partially eval'ing the source code. This means that run-time
code will not be executed, but "BEGIN" and "CHECK" blocks will run. Do not call this method if the
source code that created this error might also have non-trivial compile-time side-effects.
A typical use might therefore be:
# Make sure it's undefined, and will only be locally modified...
local $PPR::ERROR;
# Process the matched block...
if ($source_code =~ m{ (?<Block> (?&PerlBlock) ) $PPR::GRAMMAR }x) {
    process( $+{Block} );
}
# Or report the offending code that stopped it being a valid block...
else {
    die "Invalid Perl block: " . $PPR::ERROR->source . "\n",
        $PPR::ERROR->origin($linenum, $filename)->diagnostic . "\n";
}
Decommenting code with PPR::decomment()
The module provides (but does not export) a decomment() subroutine that can remove any comments and/or
POD from source code.
It takes a single argument: a string containing the source code. It returns a single value: a string
containing the decommented source code.
For example:
$decommented_code = PPR::decomment( $commented_code );
The subroutine will fail if the argument wasn't valid Perl code, in which case it returns "undef" and
sets $PPR::ERROR to indicate where the invalid source code was encountered.
Note that, due to separate bugs in the regex engine in Perl 5.14 and 5.20, the decomment() subroutine is
not available when running under these releases.
Examples
Note: In each of the following examples, the subroutine slurp() is used to acquire the source code from a
file whose name is passed as its argument. The slurp() subroutine is just:
sub slurp { local (*ARGV, $/); @ARGV = shift; readline; }
or, for the less twisty-minded:
sub slurp {
    my ($filename) = @_;
    open my $filehandle, '<', $filename or die $!;
    local $/;
    return readline($filehandle);
}
Validating source code
# "Valid" if source code matches a Perl document under the Perl grammar
printf(
    "$filename %s a valid Perl file\n",
    slurp($filename) =~ m{ (?&PerlEntireDocument) $PPR::GRAMMAR }x
        ? "is"
        : "is not"
);
Counting statements
printf(                                    # Output
    "$filename contains %d statements\n",  # a report of
    scalar                                 # the count of
        grep {defined}                     # defined matches
            slurp($filename)               # from the source code,
                =~ m{
                    \G (?&PerlOWS)         # skipping whitespace
                    ((?&PerlStatement))    # and keeping statements,
                    $PPR::GRAMMAR          # using the Perl grammar
                }gcx                       # incrementally
);
Stripping comments and POD from source code
my $source = slurp($filename); # Get the source
$source =~ s{ (?&PerlNWS) $PPR::GRAMMAR }{ }gx; # Compact whitespace
print $source; # Print the result
Stripping comments and POD from source code (in Perl v5.14 or later)
# Print the source code, having compacted whitespace...
print slurp($filename) =~ s{ (?&PerlNWS) $PPR::GRAMMAR }{ }gxr;
Stripping everything "except" comments and POD from source code
say                                        # Output
    grep {defined}                         # defined matches
        slurp($filename)                   # from the source code,
            =~ m{ \G ((?&PerlOWS))         # keeping whitespace,
                  (?&PerlStatement)?       # skipping statements,
                  $PPR::GRAMMAR            # using the Perl grammar
                }gcx;                      # incrementally
Available rules
Interpolating $PPR::GRAMMAR in a regex makes all of the following rules available within that regex.
Note that other rules not listed here may also be added, but these are all considered strictly internal
to the PPR module and are not guaranteed to continue to exist in future releases. All such "internal-use-
only" rules have names that start with "PPR_"...
"(?&PerlDocument)"
Matches a valid Perl document, including leading or trailing whitespace, comments, and any final
"__DATA__" or "__END__" section.
This rule is context-free, so it can be embedded in a larger regex. For example, to match an embedded
chunk of Perl code, delimited by "<<<"...">>>":
$src =~ m{ <<< (?&PerlDocument) >>> $PPR::GRAMMAR }x;
"(?&PerlEntireDocument)"
Matches an entire valid Perl document, including leading or trailing whitespace, comments, and any final
"__DATA__" or "__END__" section.
This rule is not context-free. It has an internal "\A" at the beginning and "\Z" at the end, so a regex
containing "(?&PerlEntireDocument)" will only match if:
(a) the "(?&PerlEntireDocument)" is the sole top-level element of the regex (or, at least the sole
element of a single top-level "|"-branch of the regex),
and
(b) the entire string being matched contains only a single valid Perl document.
In general, if you want to check that a string consists entirely of a single valid sequence of Perl code,
use:
$str =~ m{ (?&PerlEntireDocument) $PPR::GRAMMAR }x
If you want to check that a string contains at least one valid sequence of Perl code at some point,
possibly embedded in other text, use:
$str =~ m{ (?&PerlDocument) $PPR::GRAMMAR }x
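The practical difference between an internally anchored rule and a context-free one can be sketched with a toy grammar. Here $TOY_GRAMMAR, "ToyNumber", and "ToyEntireNumber" are hypothetical names chosen for illustration; they only mimic the relationship between "(?&PerlDocument)" and "(?&PerlEntireDocument)":

```perl
use strict;
use warnings;

# Hypothetical stand-in grammar: one context-free rule and one
# rule that carries its own start and end anchors.
my $TOY_GRAMMAR = qr{
    (?(DEFINE)
        (?<ToyNumber>       [0-9]++ )
        (?<ToyEntireNumber> \A (?&ToyNumber) \Z )
    )
}x;

my %result = (
    # The context-free rule can be embedded between delimiters.
    embedded_free     => ('<<<42>>>' =~ m{ <<< (?&ToyNumber)       >>> $TOY_GRAMMAR }x ? 1 : 0),
    # The anchored rule cannot: its internal \A fails after the "<<<".
    embedded_anchored => ('<<<42>>>' =~ m{ <<< (?&ToyEntireNumber) >>> $TOY_GRAMMAR }x ? 1 : 0),
    # As the sole top-level element, the anchored rule matches a whole string.
    whole_anchored    => ('42'       =~ m{     (?&ToyEntireNumber)     $TOY_GRAMMAR }x ? 1 : 0),
);

print "$_: $result{$_}\n" for sort keys %result;
```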
"(?&PerlStatementSequence)"
Matches zero-or-more valid Perl statements, separated by optional POD sequences.
"(?&PerlStatement)"
Matches a single valid Perl statement, including: control structures; "BEGIN", "CHECK", "UNITCHECK",
"INIT", "END", "DESTROY", or "AUTOLOAD" blocks; variable declarations, "use" statements, etc.
"(?&PerlExpression)"
Matches a single valid Perl expression involving operators of any precedence, but not any kind of block
(i.e. not control structures, "BEGIN" blocks, etc.) nor any trailing statement modifier (e.g. not a
postfix "if", "while", or "for").
"(?&PerlLowPrecedenceNotExpression)"
Matches an expression at the precedence of the "not" operator. That is, a single valid Perl expression
that involves operators above the precedence of "and".
"(?&PerlAssignment)"
Matches an assignment expression. That is, a single valid Perl expression involving operators above the
precedence of comma ("," or "=>").
"(?&PerlConditionalExpression)" or "(?&PerlScalarExpression)"
Matches a conditional expression that uses the "?"...":" ternary operator. That is, a single valid Perl
expression involving operators above the precedence of assignment.
The alternative name comes from the fact that anything matching this rule is what most people think of as
a single element of a comma-separated list.
"(?&PerlBinaryExpression)"
Matches an expression that uses any high-precedence binary operators. That is, a single valid Perl
expression involving operators above the precedence of the ternary operator.
"(?&PerlPrefixPostfixTerm)"
Matches a term with optional prefix and/or postfix unary operators and/or a trailing sequence of "->"
dereferences. That is, a single valid Perl expression involving operators above the precedence of
exponentiation ("**").
"(?&PerlTerm)"
Matches a simple high-precedence term within a Perl expression. That is: a subroutine or builtin
function call; a variable declaration; a variable or typeglob lookup; an anonymous array, hash, or
subroutine constructor; a quotelike or numeric literal; a regex match; a substitution; a transliteration;
a "do" or "eval" block; or any other expression in surrounding parentheses.
"(?&PerlTermPostfixDereference)"
Matches a sequence of array- or hash-lookup brackets, or subroutine call parentheses, or a postfix
dereferencer (e.g. "->$*"), with explicit or implicit intervening "->", such as might appear after a
term.
"(?&PerlLvalue)"
Matches any variable or parenthesized list of variables that could be assigned to.
"(?&PerlPackageDeclaration)"
Matches the declaration of any package (with or without a defining block).
"(?&PerlSubroutineDeclaration)"
Matches the declaration of any named subroutine (with or without a defining block).
"(?&PerlUseStatement)"
Matches a "use <module name> ...;" or "use <version number>;" statement.
"(?&PerlReturnStatement)"
Matches a "return <expression>;" or "return;" statement.
"(?&PerlReturnExpression)"
Matches a "return <expression>" as an expression without trailing end-of-statement markers.
"(?&PerlControlBlock)"
Matches an "if", "unless", "while", "until", "for", or "foreach" statement, including its block.
"(?&PerlDoBlock)"
Matches a "do"-block expression.
"(?&PerlEvalBlock)"
Matches an "eval"-block expression.
"(?&PerlTryCatchFinallyBlock)"
Matches a "try" block, followed by an optional "catch" block, followed by an optional "finally" block,
using the built-in syntax introduced in Perl v5.34 and v5.36.
Note that if your code uses one of the many CPAN modules (such as "Try::Tiny" or "TryCatch") that
provided try/catch behaviours prior to Perl v5.34, then you will most likely need to override this
subrule to match the alternate "try"/"catch" syntax provided by your preferred module.
For example, if your code uses the "TryCatch" module, you would need to alter the PPR parser by
explicitly redefining the subrule for "try" blocks, with something like:
my $MATCH_A_PERL_DOCUMENT = qr{
    \A (?&PerlEntireDocument) \Z
    (?(DEFINE)
        # Redefine this subrule to match TryCatch syntax...
        (?<PerlTryCatchFinallyBlock>
            try (?>(?&PerlOWS))
            (?>(?&PerlBlock))
            (?: (?>(?&PerlOWS))
                catch (?>(?&PerlOWS))
                (?: \( (?>(?&PPR_balanced_parens)) \) (?>(?&PerlOWS)) )?+
                (?>(?&PerlBlock))
            )*+
        )
    )
    $PPR::GRAMMAR
}xms;
Note that the popular "Try::Tiny" module actually implements "try"/"catch" as a normally parsed Perl
subroutine call expression, rather than a statement. This means that the unmodified PPR grammar can
successfully parse all the module's constructs.
However, the unmodified PPR grammar may misclassify some "Try::Tiny" usages as being built-in Perl v5.36
"try" blocks followed by an unrelated call to the "catch" subroutine, rather than identifying the "try"
and "catch" as a single expression containing two subroutine calls.
If that difference in interpretation matters to you, you can deactivate the built-in Perl v5.36
"try"/"catch" syntax entirely, like so:
my $MATCH_A_PERL_DOCUMENT = qr{
    \A (?&PerlEntireDocument) \Z
    (?(DEFINE)
        # Turn off built-in try/catch syntax...
        (?<PerlTryCatchFinallyBlock> (?!) )
        # Decanonize 'try' and 'catch' as reserved words ineligible for sub names...
        (?<PPR_X_non_reserved_identifier>
            (?! (?> for(?:each)?+ | while | if | unless | until | given | when | default
                  | sub | format | use | no | my | our | state | defer | finally
                  # Note: Removed 'try' and 'catch' which appear here in the original subrule
                  | (?&PPR_X_named_op)
                  | [msy] | q[wrxq]?+ | tr
                  | __ (?> END | DATA ) __
                )
                \b
            )
            (?>(?&PerlQualifiedIdentifier))
            (?! :: )
        )
    )
    $PPR::GRAMMAR
}xms;
For more details and options for modifying PPR grammars in this way, see also the documentation of the
"PPR::X" module.
"(?&PerlStatementModifier)"
Matches an "if", "unless", "while", "until", "for", or "foreach" modifier that could appear after a
statement. Only matches the modifier, not the preceding statement.
"(?&PerlFormat)"
Matches a "format" declaration, including its terminating "dot".
"(?&PerlBlock)"
Matches a "{"..."}"-delimited block containing zero-or-more statements.
"(?&PerlCall)"
Matches a call to a subroutine or built-in function. Accepts all valid call syntaxes, either via a
literal name or a reference, with or without a leading "&", with or without arguments, with or without
parentheses on any argument list.
"(?&PerlAttributes)"
Matches a list of colon-preceded attributes, such as might be specified on the declaration of a
subroutine or a variable.
"(?&PerlCommaList)"
Matches a list of zero-or-more comma-separated subexpressions. That is, a single valid Perl expression
that involves operators above the precedence of "not".
"(?&PerlParenthesesList)"
Matches a list of zero-or-more comma-separated subexpressions inside a set of parentheses.
"(?&PerlList)"
Matches either a parenthesized or unparenthesized list of comma-separated subexpressions. That is,
matches anything that either of the two preceding rules would match.
"(?&PerlAnonymousArray)"
Matches an anonymous array constructor. That is: a list of zero-or-more subexpressions inside square
brackets.
"(?&PerlAnonymousHash)"
Matches an anonymous hash constructor. That is: a list of zero-or-more subexpressions inside curly
brackets.
"(?&PerlArrayIndexer)"
Matches a valid indexer that could be applied to look up elements of an array. That is: a list of
one-or-more subexpressions inside square brackets.
"(?&PerlHashIndexer)"
Matches a valid indexer that could be applied to look up entries of a hash. That is: a list of
one-or-more subexpressions inside curly brackets, or a simple bareword identifier inside curly brackets.
"(?&PerlDiamondOperator)"
Matches anything in angle brackets. That is: any "diamond" readline (e.g. "<$filehandle>") or file-glob
operation (e.g. "<*.pl>").
"(?&PerlComma)"
Matches a short (",") or long ("=>") comma.
"(?&PerlPrefixUnaryOperator)"
Matches any high-precedence prefix unary operator.
"(?&PerlPostfixUnaryOperator)"
Matches any high-precedence postfix unary operator.
"(?&PerlInfixBinaryOperator)"
Matches any infix binary operator whose precedence is between ".." and "**".
"(?&PerlAssignmentOperator)"
Matches any assignment operator, including all op"=" variants.
"(?&PerlLowPrecedenceInfixOperator)"
Matches "and", "or", or "xor".
"(?&PerlAnonymousSubroutine)"
Matches an anonymous subroutine.
"(?&PerlVariable)"
Matches any type of access on any scalar, array, or hash variable.
"(?&PerlVariableScalar)"
Matches any scalar variable, including fully qualified package variables, punctuation variables, scalar
dereferences, and the $#array syntax.
"(?&PerlVariableArray)"
Matches any array variable, including fully qualified package variables, punctuation variables, and array
dereferences.
"(?&PerlVariableHash)"
Matches any hash variable, including fully qualified package variables, punctuation variables, and hash
dereferences.
"(?&PerlTypeglob)"
Matches a typeglob.
"(?&PerlScalarAccess)"
Matches any kind of variable access beginning with a "$", including fully qualified package variables,
punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.
"(?&PerlScalarAccessNoSpace)"
Matches any kind of variable access beginning with a "$", including fully qualified package variables,
punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.
But does not allow spaces between the components of the variable access (i.e. imposes the same constraint
as within an interpolating quotelike).
"(?&PerlScalarAccessNoSpaceNoArrow)"
Matches any kind of variable access beginning with a "$", including fully qualified package variables,
punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.
But does not allow spaces or arrows between the components of the variable access (i.e. imposes the same
constraint as within a "<...>"-delimited interpolating quotelike).
"(?&PerlArrayAccess)"
Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and list
slices of arrays or hashes.
"(?&PerlArrayAccessNoSpace)"
Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and list
slices of arrays or hashes. But does not allow spaces between the components of the variable access
(i.e. imposes the same constraint as within an interpolating quotelike).
"(?&PerlArrayAccessNoSpaceNoArrow)"
Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and list
slices of arrays or hashes. But does not allow spaces or arrows between the components of the variable
access (i.e. imposes the same constraint as within a "<...>"-delimited interpolating quotelike).
"(?&PerlHashAccess)"
Matches any kind of variable access beginning with a "%", including hashes, hash dereferences, and kv-
slices of hashes or arrays.
"(?&PerlLabel)"
Matches a colon-terminated label.
"(?&PerlLiteral)"
Matches a literal value. That is: a number, a "qr" or "qw" quotelike, a string, or a bareword.
"(?&PerlString)"
Matches a string literal. That is: a single- or double-quoted string, a "q" or "qq" string, a heredoc,
or a version string.
"(?&PerlQuotelike)"
Matches any form of quotelike operator. That is: a single- or double-quoted string, a "q" or "qq"
string, a heredoc, a version string, a "qr", a "qw", a "qx", a "/.../" or "m/.../" regex, a substitution,
or a transliteration.
"(?&PerlHeredoc)"
Matches a heredoc specifier. That is: just the initial "<<TERMINATOR" component, not the actual
contents of the heredoc on the subsequent lines.
This rule only matches a heredoc specifier if that specifier is correctly followed on the next line by
any heredoc contents and then the correct terminator.
However, if the heredoc specifier is correctly matched, subsequent calls to either of the whitespace-
matching rules ("(?&PerlOWS)" or "(?&PerlNWS)") will also consume the trailing heredoc contents and the
terminator.
So, for example, to correctly match a heredoc plus its contents you could use something like:
m/ (?&PerlHeredoc) (?&PerlOWS) $PPR::GRAMMAR /x
or, if there may be trailing items on the same line as the heredoc specifier:
m/ (?&PerlHeredoc)
   (?<trailing_items> [^\n]* )
   (?&PerlOWS)
   $PPR::GRAMMAR
/x
Note that the same limitations apply to other constructs that match heredocs, such as "(?&PerlQuotelike)"
or "(?&PerlString)".
"(?&PerlQuotelikeQ)"
Matches a single-quoted string, either a '...' or a "q/.../" (with any valid delimiters).
"(?&PerlQuotelikeQQ)"
Matches a double-quoted string, either a "..." or a "qq/.../" (with any valid delimiters).
"(?&PerlQuotelikeQW)"
Matches a "quotewords" list. That is a "qw/ list of words /" (with any valid delimiters).
"(?&PerlQuotelikeQX)"
Matches a "qx" system call, either a `...` or a "qx/.../" (with any valid delimiters).
"(?&PerlQuotelikeS)" or "(?&PerlSubstitution)"
Matches a substitution operation. That is: "s/.../.../" (with any valid delimiters and any valid
trailing modifiers).
"(?&PerlQuotelikeTR)" or "(?&PerlTransliteration)"
Matches a transliteration operation. That is: "tr/.../.../" or "y/.../.../" (with any valid delimiters
and any valid trailing modifiers).
"(?&PerlContextualQuotelikeM)" or "(?&PerlContextualMatch)"
Matches a regex-match operation in any context where it would be allowed in valid Perl. That is: "/.../"
or "m/.../" (with any valid delimiters and any valid trailing modifiers).
"(?&PerlQuotelikeM)" or "(?&PerlMatch)"
Matches a regex-match operation. That is: "/.../" or "m/.../" (with any valid delimiters and any valid
trailing modifiers) in any context (i.e. even in places where it would not normally be allowed within a
valid piece of Perl code).
"(?&PerlQuotelikeQR)"
Matches a "qr" regex constructor (with any valid delimiters and any valid trailing modifiers).
"(?&PerlContextualRegex)"
Matches a "qr" regex constructor or a "/.../" or "m/.../" regex-match operation (with any valid
delimiters and any valid trailing modifiers) anywhere where either would be allowed in valid Perl.
In other words: anything capable of matching within valid Perl code.
"(?&PerlRegex)"
Matches a "qr" regex constructor or a "/.../" or "m/.../" regex-match operation in any context (i.e. even
in places where it would not normally be allowed within a valid piece of Perl code).
In other words: anything capable of matching.
"(?&PerlBuiltinFunction)"
Matches the name of any builtin function.
To match an actual call to a built-in function, use:
m/
(?= (?&PerlBuiltinFunction) )
(?&PerlCall)
/x
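The idiom above (a lookahead on a narrow rule, followed by a broader rule that consumes the text) can be tried out in isolation with a toy grammar. In this sketch, $TOY_GRAMMAR, "ToyBuiltin", and "ToyCall" are hypothetical and far simpler than the corresponding PPR rules; only the shape of the regex is the point:

```perl
use strict;
use warnings;

# Hypothetical stand-in grammar: a short list of "builtin" names,
# and a very simplified notion of a call with a parenthesized argument list.
my $TOY_GRAMMAR = qr{
    (?(DEFINE)
        (?<ToyBuiltin> (?: print | length | uc ) \b )
        (?<ToyCall>    [A-Za-z_]\w*+ \( [^()]*+ \) )
    )
}x;

my $code = 'frobnicate(1); length($str); uc($name);';

# Keep only those calls whose name passes the lookahead filter.
my @builtin_calls;
while ($code =~ m{ ( \b (?= (?&ToyBuiltin) ) (?&ToyCall) ) $TOY_GRAMMAR }gx) {
    push @builtin_calls, $1;
}

print "$_\n" for @builtin_calls;
```

The lookahead consumes nothing, so the broader rule still matches the complete call, starting from the function name.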
"(?&PerlNullaryBuiltinFunction)"
Matches the name of any builtin function that never takes arguments.
To match an actual call to a built-in function that never takes arguments, use:
m/
(?= (?&PerlNullaryBuiltinFunction) )
(?&PerlCall)
/x
"(?&PerlVersionNumber)"
Matches any number or version-string that can be used as a version number within a "use", "no", or
"package" statement.
"(?&PerlVString)"
Matches a version-string (a.k.a v-string).
"(?&PerlNumber)"
Matches a valid number, including binary, octal, decimal and hexadecimal integers, and floating-point
numbers with or without an exponent.
"(?&PerlIdentifier)"
Matches a simple, unqualified identifier.
"(?&PerlQualifiedIdentifier)"
Matches a qualified or unqualified identifier, which may use either "::" or "'" as internal separators,
but only "::" as initial or terminal separators.
"(?&PerlOldQualifiedIdentifier)"
Matches a qualified or unqualified identifier, which may use either "::" or "'" as both internal and
external separators.
"(?&PerlBareword)"
Matches a valid bareword.
Note that this is not the same as a simple identifier, nor the same as a qualified identifier.
"(?&PerlPod)"
Matches a single POD section containing any contiguous set of POD directives, up to the first "=cut" or
end-of-file.
"(?&PerlPodSequence)"
Matches any sequence of POD sections, separated and/or surrounded by optional whitespace.
"(?&PerlNWS)"
Matches one-or-more characters of necessary whitespace, including spaces, tabs, newlines, comments, and
POD.
"(?&PerlOWS)"
Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, and
POD.
"(?&PerlOWSOrEND)"
Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, POD,
and any trailing "__END__" or "__DATA__" section.
"(?&PerlEndOfLine)"
Matches a single newline ("\n") character.
This is provided mainly to allow newlines to be "hooked" by redefining "(?<PerlEndOfLine>)" (for example,
to count lines during a parse).
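A rough sketch of such a hook, using a toy grammar rather than PPR itself: the names $TOY_GRAMMAR, "ToyLines", and "ToyEndOfLine" are hypothetical, and the example assumes Perl 5.18 or later (where a literal "(?{...})" block may appear in a regex that also interpolates a precompiled grammar):

```perl
use strict;
use warnings;

# Hypothetical stand-in for $PPR::GRAMMAR, with a hookable end-of-line rule.
my $TOY_GRAMMAR = qr{
    (?(DEFINE)
        (?<ToyLines>     (?: [^\n]*+ (?&ToyEndOfLine) )*+ )
        (?<ToyEndOfLine> \n )
    )
}x;

my $text       = "one\ntwo\nthree\n";
my $line_count = 0;

# The redefinition comes before the base grammar, so it is the one used.
my $matched = $text =~ m{
    \A (?&ToyLines) \z
    (?(DEFINE)
        (?<ToyEndOfLine> \n (?{ $line_count++ }) )
    )
    $TOY_GRAMMAR
}x;

print "matched $line_count lines\n" if $matched;
```

In this toy grammar every newline is matched exactly once. In a grammar that backtracks, a hook like this may run more than once for the same newline, so a real line counter would need to allow for that.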
"(?&PerlKeyword)"
Matches a pluggable keyword.
Note that there are no pluggable keywords in the default PPR regex; they must be added by the end-user.
See the following section for details.
Extending the Perl syntax with keywords
In Perl 5.12 and later, it's possible to add new types of statements to the language using a mechanism
called "pluggable keywords".
This mechanism (best accessed via CPAN modules such as "Keyword::Simple" or "Keyword::Declare") acts like
a limited macro facility. It detects when a statement begins with a particular, pre-specified keyword,
passes the trailing text to an associated keyword handler, and replaces the trailing source code with
whatever the keyword handler produces.
For example, the Dios module uses this mechanism to add keywords such as "class", "method", and "has" to
Perl 5, providing a declarative OO syntax. And the Object::Result module uses pluggable keywords to add a
"result" statement that simplifies returning an ad hoc object from a subroutine.
Unfortunately, because such modules effectively extend the standard Perl syntax, by default PPR has no
way of successfully parsing them.
However, when setting up a regex using $PPR::GRAMMAR it is possible to extend that grammar to deal with
new keywords...by defining a rule named "(?<PerlKeyword>...)".
This rule is always tested as the first option within the standard "(?&PerlStatement)" rule, so any
syntax declared within effectively becomes a new kind of statement. Note that each alternative within the
rule must begin with a valid "keyword" (that is: a simple identifier of some kind).
For example, to support the three keywords from Dios:
$Dios::GRAMMAR = qr{
    # Add a keyword rule to support Dios...
    (?(DEFINE)
        (?<PerlKeyword>
            class (?&PerlOWS)
            (?&PerlQualifiedIdentifier) (?&PerlOWS)
            (?: is (?&PerlNWS) (?&PerlIdentifier) (?&PerlOWS) )*+
            (?&PerlBlock)
        |
            method (?&PerlOWS)
            (?&PerlIdentifier) (?&PerlOWS)
            (?: (?&kw_balanced_parens) (?&PerlOWS) )?+
            (?: (?&PerlAttributes) (?&PerlOWS) )?+
            (?&PerlBlock)
        |
            has (?&PerlOWS)
            (?: (?&PerlQualifiedIdentifier) (?&PerlOWS) )?+
            [\@\$%][.!]?(?&PerlIdentifier) (?&PerlOWS)
            (?: (?&PerlAttributes) (?&PerlOWS) )?+
            (?: (?: // )?+ = (?&PerlOWS)
                (?&PerlExpression) (?&PerlOWS) )?+
            (?> ; | (?= \} ) | \z )
        )
        (?<kw_balanced_parens>
            \( (?: [^()]++ | (?&kw_balanced_parens) )*+ \)
        )
    )
    # Add all the standard PPR rules...
    $PPR::GRAMMAR
}x;
# Then parse with it...
$source_code =~ m{ \A (?&PerlDocument) \Z $Dios::GRAMMAR }x;
Or, to support the "result" statement from "Object::Result":
my $ORK_GRAMMAR = qr{
    # Add a keyword rule to support Object::Result...
    (?(DEFINE)
        (?<PerlKeyword>
            result (?&PerlOWS)
            \{ (?&PerlOWS)
            (?: (?> (?&PerlIdentifier)
                  | < [[:upper:]]++ >
                ) (?&PerlOWS)
                (?&PerlParenthesesList)?+ (?&PerlOWS)
                (?&PerlBlock) (?&PerlOWS)
            )*+
            \}
        )
    )
    # Add all the standard PPR rules...
    $PPR::GRAMMAR
}x;
# Then parse with it...
$source_code =~ m{ \A (?&PerlDocument) \Z $ORK_GRAMMAR }x;
Note that, although pluggable keywords are only available from Perl 5.12 onwards, PPR will still accept
"(?&PerlKeyword)" extensions under Perl 5.10.
Extending the Perl syntax in other ways
Other modules (such as "Devel::Declare" and "Filter::Simple") make it possible to extend Perl syntax in
even more flexible ways. The PPR::X module provides support for syntactic extensions more general than
pluggable keywords.
Comparison with PPI
The PPI and PPR modules can both identify valid Perl code, but they do so in very different ways, and are
optimal for different purposes.
PPI scans an entire Perl document and builds a hierarchical representation of the various components. It
is therefore suitable for recognition, validation, partial extraction, and in-place transformation of
Perl code.
PPR matches only as much of a Perl document as specified by the regex you create, and does not build any
hierarchical representation of the various components it matches. It is therefore suitable for
recognition and validation of Perl code. However, unless great care is taken, PPR is not as reliable as
PPI for extractions or transformations of components smaller than a single statement.
On the other hand, PPI always has to parse its entire input, and build a complete non-trivial nested data
structure for it, before it can be used to recognize or validate any component. So it is almost always
significantly slower and more complicated than PPR for those kinds of tasks.
For example, to determine whether an input string begins with a valid Perl block, PPI requires something
like:
if (my $document = PPI::Document->new(\$input_string) ) {
    my $block = $document->schild(0)->schild(0);
    if ($block->isa('PPI::Structure::Block')) {
        $block->remove;
        process_block($block);
        process_extra($document);
    }
}
whereas PPR needs just:
if ($input_string =~ m{ \A (?&PerlOWS) ((?&PerlBlock)) (.*) $PPR::GRAMMAR }xs) {
    process_block($1);
    process_extra($2);
}
Moreover, the PPR version will be at least twice as fast at recognizing that leading block (and usually
four to seven times faster)...mainly because it doesn't have to parse the trailing code at all, nor build
any representation of its hierarchical structure.
As a simple rule of thumb, when you only need to quickly detect, identify, or confirm valid Perl (or just
a single valid Perl component), use PPR. When you need to examine, traverse, or manipulate the internal
structure or component relationships within an entire Perl document, use PPI.