pcre2
NOTE: This release is graded alpha and is likely to experience API changes up until the 1.0 release.
Overview
A V library module for processing Perl Compatible Regular Expressions (PCRE) using the PCRE2 library.
- The pcre2 module is a wrapper for the PCRE2 8-bit runtime library.
- Regex find_* methods search a subject string for regular expression matches.
- Regex replace_* methods return a string in which matches in the subject string are replaced by a replacement string or the result of a replacement function.
- Regex *_all_* methods process all matches; *_one_* methods process the first match.
- The Regex replace_*_extended methods support the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
- Currently there are no extraction methods for named subpatterns.
- The pcre module (which uses the older PCRE library) was the inspiration and starting point for this project; the Go regexp package also influenced the project.
Documentation
Examples
import srackham.pcre2

fn main() {
	// Match words starting with `d` or `n`.
	r := pcre2.must_compile(r'\b([dn].*?)\b')
	subject := 'Lorem nisi dis diam a cras placerat natoque'
	// Extract array of all matched strings.
	a := r.find_all(subject)
	println(a) // ['nisi', 'dis', 'diam', 'natoque']
	// Quote matched words.
	s1 := r.replace_all(subject, '"$1"')
	println(s1) // 'Lorem "nisi" "dis" "diam" a cras placerat "natoque"'
	// Replace all matched strings with upper case.
	s2 := r.replace_all_fn(subject, fn (m string) string {
		return m.to_upper()
	})
	println(s2) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'
	// Replace all matched strings with upper case (PCRE2 extended replacement syntax).
	s3 := r.replace_all_extended(subject, r'\U$1')
	println(s3) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'
}
For more examples, see the examples directory and the module tests.
Dependencies
Install the PCRE2 library:
Arch Linux and Manjaro: pacman -S pcre2
Debian and Ubuntu: apt install libpcre2-dev
Fedora: yum install pcre2-devel
macOS: brew install pcre2
Windows †: pacman.exe -S mingw-w64-x86_64-pcre2
† Uses the MSYS2 package management tools.
Installation
v install srackham.pcre2
Test the installation by running:
v test $HOME/.vmodules/srackham/pcre2
Example installation and test workflows for Ubuntu, macOS and Windows can be found in the GitHub Actions workflow file.
Performance
Complex patterns can cause PCRE2 resource exhaustion. The find_* library functions respond to such errors by raising a panic. The solution is to simplify the offending pattern. Unlike, for example, the Go regexp package, PCRE2 does not guarantee linear-time performance: pathological patterns can run slowly even when they do not trigger a panic. See the PCRE2 pcre2perform man page.
fn compile
fn compile(pattern string) !Regex
compile parses a regular expression pattern and returns the corresponding Regex struct. If the pattern fails to parse, an error is returned.
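A minimal sketch of handling the result of compile, assuming the module is imported as srackham.pcre2 as in the Examples section. The patterns and messages are illustrative only.

```v
import srackham.pcre2

fn main() {
	// A well-formed pattern yields a Regex.
	r := pcre2.compile(r'\d+') or { panic(err) }
	println(r.is_match('abc 123')) // true

	// The unbalanced parenthesis makes this pattern invalid, so the else branch runs.
	if bad := pcre2.compile(r'(\d+') {
		println('unexpectedly compiled: ${bad.pattern}')
	} else {
		println('compile failed: ${err}')
	}
}
```

Use compile when the pattern comes from user input or configuration; must_compile is more convenient for patterns fixed in the source.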
fn escape_meta
fn escape_meta(s string) string
escape_meta returns a string that escapes all regular expression metacharacters inside the argument text. The returned string is a regular expression matching the literal text.
Example
assert escape_meta(r'\.+*?()|[]{}^$') == r'\\\.\+\*\?\(\)\|\[\]\{\}\^\$'
fn must_compile
fn must_compile(pattern string) Regex
must_compile is like compile but panics if the regex pattern cannot be parsed.
struct Regex
struct Regex {
pub:
	pattern          string
	subpattern_count int
mut:
	re &C.pcre2_code = unsafe { nil }
}
Regex contains the compiled regular expression.
* pattern is the regular expression pattern.
* subpattern_count is the number of capturing subpatterns.
* re is a pointer to the compiled PCRE2 regular expression.
fn (Regex) ==
fn (r1 Regex) == (r2 Regex) bool
fn (Regex) str
fn (r Regex) str() string
str returns a human-readable representation of a Regex.
fn (Regex) is_nil
fn (r Regex) is_nil() bool
is_nil returns true if r has not been initialized with a compiled PCRE2 regular expression.
fn (Regex) free
fn (r &Regex) free()
free releases the memory allocated to the PCRE2 compiled regex. If V's -autofree option is enabled, V's autofree engine calls free automatically when it disposes of the Regex struct.
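The following sketch shows is_nil and free together. It assumes the program is built without -autofree (otherwise the explicit free call would be redundant) and that a zero-valued Regex can be created with pcre2.Regex{}; both are assumptions, not statements from the module documentation.

```v
import srackham.pcre2

fn main() {
	// A zero-valued Regex holds no compiled pattern.
	empty := pcre2.Regex{}
	println(empty.is_nil()) // true

	r := pcre2.must_compile(r'\d+')
	println(r.is_nil()) // false
	println(r.find_all('a1 b22')) // ['1', '22']

	// Release the compiled pattern explicitly (assumes -autofree is not in use).
	r.free()
}
```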
fn (Regex) is_match
fn (r &Regex) is_match(subject string) bool
is_match returns true if the subject string contains a match for the regular expression; otherwise it returns false.
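As an illustration, is_match can serve as a simple format check. The date pattern below is a hypothetical example and only validates the shape of the string, not the calendar value.

```v
import srackham.pcre2

fn main() {
	// Anchored pattern for dates written as YYYY-MM-DD (format check only).
	date := pcre2.must_compile(r'^\d{4}-\d{2}-\d{2}$')
	for s in ['2024-01-31', '31/01/2024'] {
		println('${s}: ${date.is_match(s)}')
	}
	// 2024-01-31: true
	// 31/01/2024: false
}
```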
fn (Regex) find_all
fn (r &Regex) find_all(subject string) []string
find_all returns an array containing all matched strings from the subject string.
Example
assert must_compile(r'\d').find_all('1 abc 9 de 5 g') == ['1', '9', '5']
fn (Regex) find_one
fn (r &Regex) find_one(subject string) ?string
find_one returns the first matched string from the subject string. If no match is found, none is returned.
Example
assert (must_compile(r'\d').find_one('1 abc 9 de 5 g') or { '' }) == '1'
fn (Regex) find_all_index
fn (r &Regex) find_all_index(subject string) [][]int
find_all_index searches subject for all matches and returns an array; each element of the array is an array of byte indexes identifying the match and submatches within the subject (see find_one_index for details).
fn (Regex) find_one_index
fn (r &Regex) find_one_index(subject string) ?[]int
find_one_index searches subject for the first match and returns an array of subject byte indexes identifying the match and submatches.
* result[0]..result[1] is the entire match.
* result[2*N..2*N+2] is the Nth submatch (N = 1...).
* If a subpattern did not participate in the match its indexes will be -1.
* If no match is found none is returned.
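The index layout described above can be sketched by slicing the subject with the returned offsets. This assumes the end index of each pair is exclusive, as in the PCRE2 output vector; the subject string is illustrative.

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\d+)-(\d+)')
	subject := 'range 10-25, then 3-4'

	idx := r.find_one_index(subject) or { panic('no match') }
	println(subject[idx[0]..idx[1]]) // 10-25 (entire match)
	println(subject[idx[2]..idx[3]]) // 10 (first submatch)
	println(subject[idx[4]..idx[5]]) // 25 (second submatch)

	// find_all_index returns one such index array per match.
	for m in r.find_all_index(subject) {
		println(subject[m[0]..m[1]])
	}
	// 10-25
	// 3-4
}
```

Check for -1 before slicing when a subpattern is optional, since a non-participating subpattern has no valid range.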
fn (Regex) find_all_submatch
fn (r &Regex) find_all_submatch(subject string) [][]string
find_all_submatch searches the subject string for all regular expression matches and returns an array containing match and submatches text.
* Each match contributes an element to the result array.
* Each result array element is an array containing the matched text (at index 0) plus any submatches (at indexes 1..).
* If a subpattern did not participate in the match the corresponding element is set to ''.
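A short sketch of iterating over the result of find_all_submatch, using a hypothetical key=value pattern:

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\w+)=(\d+)')
	// Each m is [entire match, first submatch, second submatch].
	for m in r.find_all_submatch('a=1, b=22') {
		println('${m[1]} -> ${m[2]}')
	}
	// a -> 1
	// b -> 22
}
```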
fn (Regex) find_one_submatch
fn (r &Regex) find_one_submatch(subject string) ?[]string
find_one_submatch searches the subject string for the first regular expression match and returns an array containing match and submatches text.
* The first element (at index 0) contains the entire matched text.
* Subsequent elements (indexes 1..) contain the corresponding matched subpatterns.
* If a subpattern did not participate in the match the corresponding array element is set to ''.
* If a match is not found none is returned.
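The sketch below illustrates the non-participating subpattern rule and the none result. The pattern, with an optional leading minus sign, is an illustrative assumption.

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(-)?(\d+)')

	// The optional sign subpattern does not participate, so its slot is ''.
	m := r.find_one_submatch('n = 42') or { panic('no match') }
	println(m) // ['42', '', '42']

	// No match: fall back to an empty array.
	missing := r.find_one_submatch('no digits here') or { []string{} }
	println(missing.len) // 0
}
```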
fn (Regex) replace_all
fn (r &Regex) replace_all(subject string, repl string) string
replace_all returns a copy of the subject string with all matches of the regular expression replaced by the repl string.
* $0...$99 in the repl string are replaced by matching text; the number zero refers to the entire matched substring; higher numbers refer to substrings captured by parenthesized subpatterns, e.g. $1 refers to the first submatch.
* References to undefined subpatterns are not replaced.
* Subpatterns that did not participate in the match are replaced with ''.
* To insert a literal $ in the output, use $$.
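The replacement rules above can be sketched as follows. Raw strings (r'...') are used for the replacement text so that V does not treat $ specially; the patterns and subjects are illustrative.

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\w+)@(\w+)')
	subject := 'alice@example bob@test'

	// $1 and $2 refer to the first and second submatch.
	println(r.replace_all(subject, r'$2:$1')) // example:alice test:bob

	// $0 refers to the entire match.
	println(r.replace_all(subject, r'[$0]')) // [alice@example] [bob@test]

	// $$ inserts a literal dollar sign.
	digits := pcre2.must_compile(r'\d+')
	println(digits.replace_all('total 5', r'$0 $$')) // total 5 $
}
```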
fn (Regex) replace_one
fn (r &Regex) replace_one(subject string, repl string) string
replace_one returns a copy of the subject string with the first match of the regular expression replaced by the repl string. In all other respects it behaves like the replace_all method.
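A brief comparison of replace_one and replace_all on the same input, using an illustrative whitespace pattern:

```v
import srackham.pcre2

fn main() {
	ws := pcre2.must_compile(r'\s+')
	// Only the first run of whitespace is replaced.
	println(ws.replace_one('a  b   c', '_')) // a_b   c
	// Every run of whitespace is replaced.
	println(ws.replace_all('a  b   c', '_')) // a_b_c
}
```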
fn (Regex) replace_all_fn
fn (r &Regex) replace_all_fn(subject string, repl fn (string) string) string
replace_all_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.
* The repl function is passed a string containing the matched text.
fn (Regex) replace_one_fn
fn (r &Regex) replace_one_fn(subject string, repl fn (string) string) string
replace_one_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.
* The repl function is passed a string containing the matched text.
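A sketch of the callback variants replace_one_fn and replace_all_fn. The bracket callback is a hypothetical helper defined only for this example.

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'\d+')
	// The callback receives the matched text and returns its replacement.
	bracket := fn (m string) string {
		return '<' + m + '>'
	}
	println(r.replace_one_fn('1 22 333', bracket)) // <1> 22 333
	println(r.replace_all_fn('1 22 333', bracket)) // <1> <22> <333>
}
```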
fn (Regex) replace_all_submatch_fn
fn (r &Regex) replace_all_submatch_fn(subject string, repl fn (matches []string) string) string
replace_all_submatch_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.
* The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
* If a subpattern did not participate in the match the corresponding matches element is set to ''.
fn (Regex) replace_one_submatch_fn
fn (r &Regex) replace_one_submatch_fn(subject string, repl fn (matches []string) string) string
replace_one_submatch_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.
* The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
* If a subpattern did not participate in the match the corresponding matches element is set to ''.
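The submatch callbacks are useful when the replacement must be computed from the captured text rather than substituted verbatim. In this sketch the add callback, a hypothetical helper, sums the two captured numbers.

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\d+)\+(\d+)')
	// matches[1] and matches[2] hold the two captured operands.
	add := fn (matches []string) string {
		return (matches[1].int() + matches[2].int()).str()
	}
	println(r.replace_all_submatch_fn('1+2 and 10+20', add)) // 3 and 30
	println(r.replace_one_submatch_fn('1+2 and 10+20', add)) // 3 and 10+20
}
```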
fn (Regex) replace_all_extended
fn (r &Regex) replace_all_extended(subject string, repl string) string
replace_all_extended returns a copy of the subject string with all matches of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
fn (Regex) replace_one_extended
fn (r &Regex) replace_one_extended(subject string, repl string) string
replace_one_extended returns a copy of the subject string with the first match of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
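Two features of the PCRE2 extended syntax are sketched here: case forcing with \u (upper-case the next character) and the conditional form ${n:+set:unset}. The expected results follow the behaviour documented in the pcre2api man page and assume the wrapper passes the replacement string through unchanged.

```v
import srackham.pcre2

fn main() {
	word := pcre2.must_compile(r'\w+')
	// \u forces the next inserted character to upper case.
	println(word.replace_one_extended('hello world', r'\u$0')) // Hello world
	println(word.replace_all_extended('hello world', r'\u$0')) // Hello World

	// ${1:+neg:pos} inserts 'neg' if subpattern 1 is set, otherwise 'pos'.
	num := pcre2.must_compile(r'(-)?(\d+)')
	println(num.replace_all_extended('5 -3', r'${1:+neg:pos}$2')) // pos5 neg3
}
```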
fn (Regex) split_all
fn (r &Regex) split_all(subject string) []string
split_all splits the subject string at regular expression match boundaries and returns an array of the split strings. If no matches are found, a single-element array containing the subject string is returned.
fn (Regex) split_one
fn (r &Regex) split_one(subject string) ?[]string
split_one splits the subject string at the first regular expression match boundary and returns an array of the two split strings. If no match is found none is returned.
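A sketch contrasting split_all and split_one, including the no-match cases. The comma separator pattern is an illustrative assumption.

```v
import srackham.pcre2

fn main() {
	// A comma with optional surrounding whitespace.
	sep := pcre2.must_compile(r'\s*,\s*')

	println(sep.split_all('a, b ,c')) // ['a', 'b', 'c']
	println(sep.split_all('abc')) // ['abc']

	// split_one splits at the first separator only.
	pair := sep.split_one('a, b ,c') or { []string{} }
	println(pair) // ['a', 'b ,c']

	// With no separator, split_one returns none and the else branch runs.
	if parts := sep.split_one('abc') {
		println(parts)
	} else {
		println('no separator found')
	}
}
```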