pcre2 #

NOTE: This release is graded alpha and is likely to experience API changes up until the 1.0 release.

Overview

A V library module for processing Perl Compatible Regular Expressions (PCRE) using the PCRE2 library.

- The pcre2 module is a wrapper for the PCRE2 8-bit runtime library.
- Regex find_* methods search a subject string for regular expression matches.
- Regex replace_* methods return a string in which matches in the subject string are replaced by a replacement string or the result of a replacement function.
- Regex *_all_* methods process all matches; *_one_* methods process the first match.
- The Regex replace_*_extended methods support the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
- Currently there are no extraction methods for named subpatterns.
- The pcre module (which uses the older PCRE library) was the inspiration and starting point for this project; the Go regexp package also influenced the project.

Documentation

Examples

import srackham.pcre2

fn main() {
	// Match words starting with `d` or `n`.
	r := pcre2.must_compile(r'\b([dn].*?)\b')

	subject := 'Lorem nisi dis diam a cras placerat natoque'

	// Extract array of all matched strings.
	a := r.find_all(subject)
	println(a) // ['nisi', 'dis', 'diam', 'natoque']

	// Quote matched words.
	s1 := r.replace_all(subject, '"$1"')
	println(s1) // 'Lorem "nisi" "dis" "diam" a cras placerat "natoque"'

	// Replace all matched strings with upper case.
	s2 := r.replace_all_fn(subject, fn (m string) string {
		return m.to_upper()
	})
	println(s2) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'

	// Replace all matched strings with upper case (PCRE2 extended replacement syntax).
	s3 := r.replace_all_extended(subject, r'\U$1')
	println(s3) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'
}

For more examples, see the examples directory and the module tests.

Dependencies

Install the PCRE2 library:

Arch Linux and Manjaro: pacman -S pcre2

Debian and Ubuntu: apt install libpcre2-dev

Fedora: yum install pcre2-devel

macOS: brew install pcre2

Windows †: pacman.exe -S mingw-w64-x86_64-pcre2

† Uses the MSYS2 package management tools.

Installation

v install srackham.pcre2

Test the installation by running:

v test $HOME/.vmodules/srackham/pcre2

Example installation and test workflows for Ubuntu, macOS and Windows can be found in the GitHub Actions workflow file.

Performance

Complex patterns can cause PCRE2 resource exhaustion. The find_* library functions respond to such errors by raising a panic; the solution is to simplify the offending pattern. Unlike, for example, the Go regexp package, PCRE2 does not guarantee linear-time performance, so pathological patterns can be slow even when they do not trigger a panic. See the PCRE2 pcre2perform man page.

fn compile #

fn compile(pattern string) !Regex

compile parses a regular expression pattern and returns the corresponding Regex struct. If the pattern fails to parse, an error is returned.
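As a minimal sketch of handling the result of compile with an or block (the patterns and the printed messages are illustrative assumptions, not taken from the module's tests):

```v
import srackham.pcre2

fn main() {
	// A well-formed pattern: the or block is not expected to run.
	ok := pcre2.compile(r'(\d+)-(\d+)') or {
		println('unexpected compile error: ${err}')
		return
	}
	println(ok.pattern)

	// An unclosed parenthesis: compile is expected to return an error.
	bad := pcre2.compile(r'(\d+') or {
		println('compile failed: ${err}')
		return
	}
	// Reached only if the second pattern compiled after all.
	println(bad.pattern)
}
```

Use must_compile instead when the pattern is a fixed literal and a parse failure should abort the program.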

fn escape_meta #

fn escape_meta(s string) string

escape_meta returns a string that escapes all regular expression metacharacters inside the argument text. The returned string is a regular expression matching the literal text.

Example

assert escape_meta(r'\.+*?()|[]{}^$') == r'\\\.\+\*\?\(\)\|\[\]\{\}\^\$'

fn must_compile #

fn must_compile(pattern string) Regex

must_compile is like compile but panics if the regex pattern cannot be parsed.

struct Regex #

struct Regex {
pub:
	pattern          string
	subpattern_count int
mut:
	re &C.pcre2_code = unsafe { nil }
}

Regex contains the compiled regular expression.

* pattern is the regular expression pattern.
* subpattern_count is the number of capturing subpatterns.
* re is a pointer to the compiled PCRE2 regular expression.

fn (Regex) == #

fn (r1 Regex) == (r2 Regex) bool

fn (Regex) str #

fn (r Regex) str() string

str returns a human-readable representation of a Regex.

fn (Regex) is_nil #

fn (r Regex) is_nil() bool

is_nil returns true if r has not been initialized with a compiled PCRE2 regular expression.

fn (Regex) free #

fn (r &Regex) free()

free releases the memory allocated to the PCRE2 compiled regex. If V's -autofree option is enabled, V's autofree engine calls free automatically when it disposes of the Regex struct.

fn (Regex) is_match #

fn (r &Regex) is_match(subject string) bool

is_match returns true if the subject string contains a match for the regular expression; otherwise it returns false.
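A short sketch of is_match used as a validity check; the digits-only pattern and the subjects are illustrative assumptions:

```v
import srackham.pcre2

fn main() {
	// Matches subjects that consist only of decimal digits.
	r := pcre2.must_compile(r'^\d+$')
	println(r.is_match('12345')) // true
	println(r.is_match('12a45')) // false
}
```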

fn (Regex) find_all #

fn (r &Regex) find_all(subject string) []string

find_all returns an array containing all matched strings from the subject string.

Example

assert must_compile(r'\d').find_all('1 abc 9 de 5 g') == ['1', '9', '5']

fn (Regex) find_one #

fn (r &Regex) find_one(subject string) ?string

find_one returns the first matched string from the subject string. If a match is not found, none is returned.

Example

assert must_compile(r'\d').find_one('1 abc 9 de 5 g') == '1'

fn (Regex) find_all_index #

fn (r &Regex) find_all_index(subject string) [][]int

find_all_index searches subject for all matches and returns an array; each element of the array is an array of byte indexes identifying the match and submatches within the subject (see find_one_index for details).

fn (Regex) find_one_index #

fn (r &Regex) find_one_index(subject string) ?[]int

find_one_index searches subject for the first match and returns an array of subject byte indexes identifying the match and submatches.

* result[0]..result[1] is the entire match.
* result[2*N..2*N+2] is the Nth submatch (N = 1...).
* If a subpattern did not participate in the match its indexes will be -1.
* If no match is found none is returned.
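The index layout can be sketched by slicing the subject with the returned pairs. The pattern and subject are illustrative, and the sketch assumes each end index is exclusive, as the result[0]..result[1] notation suggests:

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\d+)-(\d+)')
	subject := 'tel 555-1234'
	idx := r.find_one_index(subject) or {
		println('no match')
		return
	}
	// idx[0]..idx[1] spans the whole match; each following pair spans one submatch.
	println(subject[idx[0]..idx[1]]) // 555-1234
	println(subject[idx[2]..idx[3]]) // 555
	println(subject[idx[4]..idx[5]]) // 1234
}
```

If the pattern contains optional subpatterns, check for -1 before slicing.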

fn (Regex) find_all_submatch #

fn (r &Regex) find_all_submatch(subject string) [][]string

find_all_submatch searches the subject string for all regular expression matches and returns an array containing match and submatch text.

* Each match contributes an element to the result array.
* Each result array element is an array containing the matched text (at index 0) plus any submatches (at indexes 1..).
* If a subpattern did not participate in the match the corresponding element is set to ''.
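As a sketch of walking the nested result, the following prints each match with its two submatches; the key=value pattern and subject are illustrative assumptions:

```v
import srackham.pcre2

fn main() {
	// Two capturing subpatterns: a key and a value.
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	for m in r.find_all_submatch('a=1 b=2') {
		// m[0] is the whole match; m[1] and m[2] are the submatches.
		println('${m[0]}: key=${m[1]} value=${m[2]}')
	}
}
```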

fn (Regex) find_one_submatch #

fn (r &Regex) find_one_submatch(subject string) ?[]string

find_one_submatch searches the subject string for the first regular expression match and returns an array containing match and submatch text.

* The first element (at index 0) contains the entire matched text.
* Subsequent elements (indexes 1..) contain the corresponding matched subpatterns.
* If a subpattern did not participate in the match the corresponding array element is set to ''.
* If a match is not found none is returned.

fn (Regex) replace_all #

fn (r &Regex) replace_all(subject string, repl string) string

replace_all returns a copy of the subject string with all matches of the regular expression replaced by the repl string.

* $0...$99 in the repl string are replaced by matching text; the number zero refers to the entire matched substring; higher numbers refer to substrings captured by parenthesized subpatterns e.g. $1 refers to the first submatch.
* References to undefined subpatterns are not replaced.
* Subpatterns that did not participate in the match are replaced with ''.
* To insert a literal $ in the output, use $$.

fn (Regex) replace_one #

fn (r &Regex) replace_one(subject string, repl string) string

replace_one returns a copy of the subject string with the first match of the regular expression replaced by the repl string. In all other respects it behaves like the replace_all method.
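The difference between replace_all and replace_one can be sketched with a replacement that swaps two submatches; the pattern and subject are illustrative assumptions, and raw strings are used so that $ is passed through to the module unchanged:

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	subject := 'a=1 b=2'
	// Swap the two submatches in every match.
	println(r.replace_all(subject, r'$2=$1')) // 1=a 2=b
	// Same replacement, applied to the first match only.
	println(r.replace_one(subject, r'$2=$1')) // 1=a b=2
}
```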

fn (Regex) replace_all_fn #

fn (r &Regex) replace_all_fn(subject string, repl fn (string) string) string

replace_all_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function. The repl function is passed a string containing the matched text.

fn (Regex) replace_one_fn #

fn (r &Regex) replace_one_fn(subject string, repl fn (string) string) string

replace_one_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function. The repl function is passed a string containing the matched text.

fn (Regex) replace_all_submatch_fn #

fn (r &Regex) replace_all_submatch_fn(subject string, repl fn (matches []string) string) string

replace_all_submatch_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.

* The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
* If a subpattern did not participate in the match the corresponding matches element is set to ''.

fn (Regex) replace_one_submatch_fn #

fn (r &Regex) replace_one_submatch_fn(subject string, repl fn (matches []string) string) string

replace_one_submatch_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.

* The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
* If a subpattern did not participate in the match the corresponding matches element is set to ''.
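A minimal sketch of the submatch callbacks, assuming an illustrative pattern with two numeric subpatterns; the callback rebuilds each match from matches[1] and matches[2]:

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(r'(\d+)-(\d+)')
	subject := '10-20 30-40'
	// Swap the two numbers of every match and join them with a slash.
	all := r.replace_all_submatch_fn(subject, fn (m []string) string {
		return '${m[2]}/${m[1]}'
	})
	println(all) // 20/10 40/30
	// The same callback, applied to the first match only.
	one := r.replace_one_submatch_fn(subject, fn (m []string) string {
		return '${m[2]}/${m[1]}'
	})
	println(one) // 20/10 30-40
}
```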

fn (Regex) replace_all_extended #

fn (r &Regex) replace_all_extended(subject string, repl string) string

replace_all_extended returns a copy of the subject string with all matches of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).

fn (Regex) replace_one_extended #

fn (r &Regex) replace_one_extended(subject string, repl string) string

replace_one_extended returns a copy of the subject string with the first match of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).

fn (Regex) split_all #

fn (r &Regex) split_all(subject string) []string

split_all splits the subject string at regular expression match boundaries and returns an array of the split strings. If no matches are found, a single-element array containing the subject string is returned.

fn (Regex) split_one #

fn (r &Regex) split_one(subject string) ?[]string

split_one splits the subject string at the first regular expression match boundary and returns an array of the two split strings. If no match is found none is returned.
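The documented no-match behavior of both split methods, plus a matching case, can be sketched as follows. The comma pattern and subjects are illustrative assumptions, and the exact contents of the pieces are left unstated because they depend on the module's implementation:

```v
import srackham.pcre2

fn main() {
	r := pcre2.must_compile(',')
	// No match: split_all returns the subject as the only element.
	println(r.split_all('abc')) // ['abc']
	// With matches, the subject is split at each match.
	println(r.split_all('a,b,c'))
	// split_one yields an optional array of two strings.
	if parts := r.split_one('key,value,rest') {
		println(parts.len) // 2
	} else {
		println('no match')
	}
}
```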