Regex operations
Regex operations in Pip use the Pattern data type. Patterns are delimited by ` (backticks). Backticks within the Pattern can be escaped using backslash, as can literal backslashes. Regexes are basically Python flavor with a few add-ons. Any legal Python regex is a legal Pip regex (as long as backticks and & are escaped) and will behave the same way. Global flags can be set on a Pip regex using unary prefix operators.
Differences between Python and Pip
- Pip Patterns are used both as regexes and as regex replacement strings.
- In addition to back-references (e.g. \1), Pip replacement Patterns can contain &, which corresponds to the entire match (as in sed et al.).
- Many Pip regex operations set special variables similar to the ones in Perl, rather than the Python strategy of returning a match object encapsulating that information.
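To illustrate the role of & in a replacement, here is a small Python sketch. Python's re spells "the entire match" as \g<0> in a replacement string; the helper name wrap_matches is hypothetical, and this is only an approximate equivalent, not Pip's implementation.

```python
import re

def wrap_matches(text):
    # \g<0> is Python's spelling of "the entire match",
    # the role that & plays in a Pip replacement Pattern.
    return re.sub(r"\d+", r"<\g<0>>", text)

print(wrap_matches("a1b22"))
```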
Predefined Pattern variables
Some common regexes are available as predefined variables:
Variable | Value | Mnemonic |
---|---|---|
w | `\s+` | Whitespace |
XA | -`[a-z]` (case-insensitive) | regeX Alpha |
XC | `[bcdfghjklmnpqrstvwxyz]` | regeX Consonant |
XD | `\d` | regeX Digit |
XI | `-?\d+` | regeX Integer |
XL | `[a-z]` | regeX Lowercase |
XN | `-?\d+(?:\.\d+)?` | regeX Number |
XU | `[A-Z]` | regeX Uppercase |
XV | `[aeiou]` | regeX Vowel |
XW | `\w` | regeX Word |
XX | `.` | regeX anything |
XY | `[aeiouy]` | regeX vowel-or-Y |
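Since the predefined Patterns are ordinary Python-flavor regexes, their behavior can be checked with Python's re module. The sketch below copies the regex text from the XN and XI rows; the Python variable names are only for illustration.

```python
import re

# Regex text copied from the XN and XI rows of the table.
XN = r"-?\d+(?:\.\d+)?"
XI = r"-?\d+"

text = "x=-3.5, y=42"
numbers = re.findall(XN, text)   # decimal part is kept with the number
integers = re.findall(XI, text)  # the decimal point splits -3.5 in two
print(numbers, integers)
```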
Regex-building operations
The following operators can be used to build regexes:
Toggle flags: A - , .
Usage: -x
Four unary operators toggle regex flags on a Pattern:
- ASCII-only flag: A
- Ignore case flag: -
- Multiline flag: , (^ and $ match at the beginning/end of each line, not just the beginning/end of the string)
- Dotall flag: . (. matches any character, including newlines)
When a Pattern has a flag turned on, the Pattern's repr shows the corresponding operator: A-`[a-z]`, for example, is the regex [a-z] with the ASCII-only and case-insensitive flags.
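The four flags have direct counterparts in Python's re module. The sketch below assumes that correspondence (re.ASCII, re.IGNORECASE, re.MULTILINE, re.DOTALL); the FLAGS table and the compile_with helper are hypothetical names, not part of Pip.

```python
import re

# Assumed correspondence between Pip's flag operators and Python's re flags.
FLAGS = {
    "A": re.ASCII,       # ASCII-only
    "-": re.IGNORECASE,  # ignore case
    ",": re.MULTILINE,   # ^ and $ match at each line
    ".": re.DOTALL,      # . also matches newlines
}

def compile_with(ops, regex):
    """Compile regex with the flags named by the operator characters in ops."""
    flags = 0
    for op in ops:
        flags ^= FLAGS[op]  # XOR, since each operator toggles its flag
    return re.compile(regex, flags)

print(bool(compile_with("A-", "[a-z]").fullmatch("Q")))  # case-insensitive match
print(compile_with(",", "^b").findall("a\nb"))           # ^ matches after the newline
print(bool(compile_with(".", "a.b").fullmatch("a\nb")))  # . matches the newline
```

Applying the same operator twice cancels out in this sketch, which mirrors the "toggle" wording above.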
Concatenate/repeat (low level): . X
Usage: x.s; xXn
Both binary operators work the same as with Scalars. They consider only the text of the Pattern, whether it is a regex, a fragment of a regex, or a replacement. Concatenating a Scalar and a Pattern coerces the result to Pattern.
Concatenate/alternate/repeat (high level): + , *
Usage: x+y; x,y; x*n
+ and , assume both operands are valid regexes, wrap each in a non-capturing group, and concatenate them, with , placing a | in between. * assumes the first operand is a valid regex, wraps it in a non-capturing group, and appends a repetition construct like {n}.
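The difference between the low-level and high-level operators matters when an operand contains alternation. The following Python sketch builds regex text the way the descriptions above suggest; the function names are hypothetical and the exact text Pip generates may differ.

```python
import re

def low_concat(x, y):
    # Like the low-level . operator: plain text concatenation.
    return x + y

def high_concat(x, y):
    # Like +: wrap each operand in a non-capturing group, then concatenate.
    return f"(?:{x})(?:{y})"

def high_alternate(x, y):
    # Like ,: same wrapping, with | between the groups.
    return f"(?:{x})|(?:{y})"

def high_repeat(x, n):
    # Like *: wrap the operand and append a {n} repetition.
    return f"(?:{x}){{{n}}}"

print(low_concat("a|b", "c"))   # a|bc, which means "a" or "bc"
print(high_concat("a|b", "c"))  # (?:a|b)(?:c), which means "ac" or "bc"
```

With plain concatenation the | leaks across the operand boundary; the non-capturing groups keep each operand self-contained.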
Convert to regex: X
Usage: Xs
Converts a Scalar to a Pattern, escaping special characters. Given a List or Range, converts to a Pattern that will match any of the items.
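A rough Python analogue of this conversion uses re.escape. The to_pattern helper is hypothetical, and joining List or Range items with | is an assumption about how "match any of the items" is built.

```python
import re

def to_pattern(value):
    # A single string: escape regex metacharacters so it matches literally.
    if isinstance(value, str):
        return re.escape(value)
    # A list or range: alternation of the escaped items (assumed construction).
    return "|".join(re.escape(str(item)) for item in value)

print(re.findall(to_pattern("a.b"), "a.b axb"))     # only the literal a.b
print(re.findall(to_pattern(["+", "*"]), "1+2*3"))  # either operator character
```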
Repetition/grouping: K + C
Usage: Kx
K and + modify a Pattern with * or +, respectively. C wraps a pattern in a capturing group.
NOTE: K also works on Scalars and Ranges, converting them to Patterns first. + and C only work on Patterns.
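As a sketch of what these operators produce, the Python helpers below append the quantifier or add the capturing group as text. The helper names are hypothetical, and wrapping the operand in a non-capturing group before adding the quantifier is an assumption, made so that multi-character operands repeat as a unit.

```python
import re

def kleene_star(x):
    # Like K: zero or more repetitions (non-capturing wrapper is assumed).
    return f"(?:{x})*"

def one_or_more(x):
    # Like unary +: one or more repetitions (same assumption).
    return f"(?:{x})+"

def capture(x):
    # Like C: wrap the regex in a capturing group.
    return f"({x})"

m = re.fullmatch(capture(one_or_more("ab")), "ababab")
print(m.group(1))
```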
Pip regex operations
Pip currently supports the following regex operations (with more in the works):
First match: ~
Usage: s~x
Returns the first match of Pattern x in Scalar s, or nil if no match was found. Can also be used as x~s. If s and x are both Scalars, x is converted to a Pattern first.
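In Python terms this is close to re.search followed by taking the whole match. The first_match helper is a hypothetical name, with None standing in for nil.

```python
import re

def first_match(s, x):
    # Search anywhere in s; return the matched text or None.
    m = re.search(x, s)
    return m.group(0) if m else None

print(first_match("abc123def45", r"\d+"))
print(first_match("abcdef", r"\d+"))
```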
All matches: @
Usage: s@x
Returns a List of all non-overlapping matches of Pattern x in Scalar s.
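A comparable Python sketch uses re.finditer rather than re.findall, because findall returns group contents instead of whole matches when the regex has capture groups. The all_matches helper is a hypothetical name.

```python
import re

def all_matches(s, x):
    # Collect the full text of every non-overlapping match, left to right.
    return [m.group(0) for m in re.finditer(x, s)]

print(all_matches("a1b22c333", r"\d+"))
print(all_matches("aaaa", "aa"))  # non-overlapping: two matches, not three
```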
Find index: @?
Usage: s@?x
Returns the start index of the first match of Pattern x in Scalar s, or nil if no match was found.
Find all indices: @*
Usage: s@*x
Same as @?, but returns a List of all match indices.
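Both index operations map onto the start() method of Python match objects. The helper names below are hypothetical, and indices are 0-based as in Python.

```python
import re

def find_index(s, x):
    # Start index of the first match, or None if there is none.
    m = re.search(x, s)
    return m.start() if m else None

def find_all_indices(s, x):
    # Start index of every non-overlapping match.
    return [m.start() for m in re.finditer(x, s)]

print(find_index("a1b22", r"\d+"))
print(find_all_indices("a1b22", r"\d+"))
```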
Not in/in/count: NI, N
Usage: xNs
N returns the number of non-overlapping matches of Pattern x in Scalar s. NI returns 1 if the Pattern was not found, 0 if it was.
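Counting and the negated membership test can be sketched in Python as follows. The helper names are hypothetical; the 1/0 results follow the description above.

```python
import re

def count_matches(x, s):
    # Number of non-overlapping matches of x in s.
    return sum(1 for _ in re.finditer(x, s))

def not_in(x, s):
    # 1 if x matches nowhere in s, 0 otherwise.
    return 0 if re.search(x, s) else 1

print(count_matches("an", "banana"))
print(not_in(r"\d", "banana"))
```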
Fullmatch: ~=
Usage: s~=x
Returns 1 if Pattern x fully matches Scalar s, 0 otherwise. Can be chained with other comparison operators. Can also be used as x~=s. If s and x are both Scalars, x is converted to a Pattern first.
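The Python counterpart is re.fullmatch, which succeeds only when the regex consumes the whole string. The full_match helper is a hypothetical name that returns 1 or 0 as described above.

```python
import re

def full_match(s, x):
    # 1 only if x matches all of s, from the first character to the last.
    return 1 if re.fullmatch(x, s) else 0

print(full_match("123", r"\d+"))
print(full_match("123a", r"\d+"))
```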
Replace: R
Usage: sRxp
Replace each non-overlapping match of Pattern x in Scalar s with replacement (Pattern, Scalar or callback function) p. The arguments passed to a callback function are the entire match (parameter a) followed by capture groups (parameters b through e).
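The callback form can be approximated in Python with re.sub and a function argument. In this sketch the hypothetical replace helper unpacks the match object so that the callback receives the whole match first and the capture groups after it; string replacements are passed through using Python's replacement syntax, which is not identical to Pip's.

```python
import re

def replace(s, x, p):
    if callable(p):
        # Call p(whole_match, group1, group2, ...) and stringify the result.
        return re.sub(x, lambda m: str(p(m.group(0), *m.groups())), s)
    # Plain string replacement (Python syntax, e.g. \1 for group 1).
    return re.sub(x, p, s)

# a is the entire match; b and c are the two capture groups.
print(replace("3x4 and 2x5", r"(\d)x(\d)", lambda a, b, c: int(b) * int(c)))
print(replace("a1b2", r"\d", "#"))
```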
Remove: RM
Usage: sRMx
Remove each non-overlapping match of Pattern x from Scalar s.
Strip/lstrip/rstrip: || |> <|
Usage: s||x
Strip matches of Pattern x from the left, right, or both sides of Scalar s.
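One way to picture regex stripping is to anchor the regex at the start or end of the string. The Python sketch below assumes that consecutive matches at an edge are all removed, which the description does not state outright; the helper names are hypothetical.

```python
import re

def lstrip(s, x):
    # Remove repeated matches of x anchored at the start (assumed semantics).
    return re.sub(rf"\A(?:{x})+", "", s)

def rstrip(s, x):
    # Remove repeated matches of x anchored at the end (assumed semantics).
    return re.sub(rf"(?:{x})+\Z", "", s)

def strip(s, x):
    # Both sides: strip the left edge, then the right edge.
    return rstrip(lstrip(s, x), x)

print(strip("xxhixx", "x"))
print(lstrip("  hi ", r"\s"))
```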
Split: ^
Usage: s^x
Split Scalar s on occurrences of Pattern x. If x contains capture groups, they are included in the resulting List.
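Python's re.split has the same capture-group behavior, so it makes a convenient illustration:

```python
import re

# Without a capture group, the separators are discarded.
plain = re.split(",", "a,b,c")

# With a capture group, each separator appears in the result.
with_groups = re.split("(,)", "a,b")

print(plain, with_groups)
```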
Map: MR
Usage: fMRxs
Find all matches of Pattern x in Scalar s and map function f to them. The arguments passed to the function are the entire match (parameter a) followed by capture groups (parameters b through e). Operands can be given in any order. If x and s are both Scalars, x is converted to a Pattern first.
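A Python sketch of the same idea iterates over re.finditer and calls the function with the whole match followed by its capture groups. The map_regex helper is a hypothetical name.

```python
import re

def map_regex(f, x, s):
    # Call f(whole_match, group1, group2, ...) for each match; collect results.
    return [f(m.group(0), *m.groups()) for m in re.finditer(x, s)]

# a is the entire match ("x=1"); b and c are the capture groups.
pairs = map_regex(lambda a, b, c: b + c, r"(\w)=(\d)", "x=1 y=2")
print(pairs)
```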
Loop: LR
Usage: LRxs{...}
The command version of MR: loops over all matches of Pattern x in Scalar s. Use regex special variables to access match information inside the loop. Can also be used as LRsx{...}. If x and s are both Scalars, x is converted to a Pattern first.
Match variables
The following regex match variables are set every time a match is made by most regex operations (most usefully MR, LR, ~, and R):
- $0: entire match
- $1: capture group 1 (and similarly for 2-9)
- $$: list of all capture group contents
- $(: start index of match
- $): end index of match
- $[: list of start indices of capture groups
- $]: list of end indices of capture groups
- $`: the part of the string before the match
- $': the part of the string after the match
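For readers coming from Python, each variable corresponds to information held by a match object. The sketch below collects those values into a dict keyed by the Pip variable names; the match_variables helper is hypothetical, and end indices follow Python's exclusive convention, which is an assumption about Pip.

```python
import re

def match_variables(s, x):
    m = re.search(x, s)
    if m is None:
        return None
    n = m.re.groups  # number of capture groups in the regex
    return {
        "$0": m.group(0),
        "$1": m.group(1) if n >= 1 else None,
        "$$": list(m.groups()),
        "$(": m.start(),
        "$)": m.end(),  # exclusive end, as in Python (assumed)
        "$[": [m.start(i) for i in range(1, n + 1)],
        "$]": [m.end(i) for i in range(1, n + 1)],
        "$`": s[:m.start()],
        "$'": s[m.end():],
    }

info = match_variables("say ab12 now", r"([a-z]+)(\d+)")
print(info)
```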