pcre2 #

NOTE: This release is graded alpha and is likely to experience API changes up until the 1.0 release.

Overview

A V library module for processing Perl Compatible Regular Expressions (PCRE) using the PCRE2 library.

  • The pcre2 module is a wrapper for the PCRE2 8-bit runtime library.
  • Regex find_* methods search a subject string for regular expression matches.
  • Regex replace_* methods return a string in which matches in the subject string are replaced by a replacement string or the result of a replacement function.
  • Regex *_all_* methods process all matches; *_one_* methods process the first match.
  • The Regex replace_*_extended methods support the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).
  • Currently there are no extraction methods for named subpatterns.
  • The pcre module (which uses the older PCRE library) was the inspiration and starting point for this project; the Go regexp package also influenced the project.

Examples

import srackham.pcre2

fn main() {
	// Match words starting with `d` or `n`.
	r := pcre2.must_compile(r'\b([dn].*?)\b')

	subject := 'Lorem nisi dis diam a cras placerat natoque'

	// Extract array of all matched strings.
	a := r.find_all(subject)
	println(a) // ['nisi', 'dis', 'diam', 'natoque']

	// Quote matched words.
	s1 := r.replace_all(subject, '"$1"')
	println(s1) // 'Lorem "nisi" "dis" "diam" a cras placerat "natoque"'

	// Replace all matched strings with upper case.
	s2 := r.replace_all_fn(subject, fn (m string) string {
		return m.to_upper()
	})
	println(s2) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'

	// Replace all matched strings with upper case (PCRE2 extended replacement syntax).
	s3 := r.replace_all_extended(subject, r'\U$1')
	println(s3) // 'Lorem NISI DIS DIAM a cras placerat NATOQUE'
}

For more examples, see the examples directory and the module tests.

Dependencies

Install the PCRE2 library:

Arch Linux and Manjaro: pacman -S pcre2

Debian and Ubuntu: apt install libpcre2-dev

Fedora: yum install pcre2-devel

macOS: brew install pcre2

Windows †: pacman.exe -S mingw-w64-x86_64-pcre2

† Uses the MSYS2 package management tools.

Installation

v install srackham.pcre2

Test the installation by running:

v test $HOME/.vmodules/srackham/pcre2

Example installation and test workflows for Ubuntu, macOS and Windows can be found in the GitHub Actions workflow file.

Performance

Complex patterns can exhaust PCRE2 resources. The find_* library functions respond to such errors by raising a panic; the solution is to simplify the offending pattern. Unlike, for example, the Go regexp package, PCRE2 does not guarantee linear-time matching, so pathological patterns can be slow even when they do not trigger a panic. See the PCRE2 pcre2perform man page.

fn compile #

fn compile(pattern string) !Regex

compile parses a regular expression pattern and returns the corresponding Regex struct. If the pattern fails to parse, an error is returned.
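Because compile returns a result type (!Regex), the caller has to handle the error case. The sketch below shows one way to do that with an or block. The helper name is_valid_pattern is hypothetical and not part of the module.

```v
import srackham.pcre2

// is_valid_pattern is a hypothetical helper (not part of the module) that
// reports whether pcre2.compile accepts the pattern.
fn is_valid_pattern(pattern string) bool {
	r := pcre2.compile(pattern) or { return false }
	return !r.is_nil()
}

fn main() {
	println(is_valid_pattern(r'\d+')) // true
	println(is_valid_pattern(r'(\d+')) // false: the parenthesis is never closed

	// Report the compile error instead of panicking as must_compile would.
	r := pcre2.compile(r'(\d+') or {
		eprintln('invalid pattern: ${err}')
		return
	}
	println(r.pattern)
}
```

Use must_compile instead when the pattern is a fixed literal and a parse failure would be a programming error.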

fn escape_meta #

fn escape_meta(s string) string

escape_meta returns a string that escapes all regular expression metacharacters inside the argument text. The returned string is a regular expression matching the literal text.

Example

assert escape_meta(r'\.+*?()|[]{}^$') == r'\\\.\+\*\?\(\)\|\[\]\{\}\^\$'
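A typical use is searching for text that arrives at run time and may contain metacharacters. The following sketch counts literal occurrences; count_literal is a hypothetical helper, not part of the module.

```v
import srackham.pcre2

// count_literal is a hypothetical helper (not part of the module) that counts
// occurrences of the literal text in subject, treating no character as special.
fn count_literal(subject string, literal string) int {
	r := pcre2.must_compile(pcre2.escape_meta(literal))
	return r.find_all(subject).len
}

fn main() {
	subject := 'a.b a+b a.b'
	// Escaped: the dot only matches a literal dot.
	println(count_literal(subject, 'a.b')) // 2
	// Unescaped: the dot matches any character, so 'a+b' is counted too.
	println(pcre2.must_compile('a.b').find_all(subject).len) // 3
}
```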

fn must_compile #

fn must_compile(pattern string) Regex

must_compile is like compile but panics if the regex pattern cannot be parsed.

struct Regex #

struct Regex {
pub:
	pattern          string
	subpattern_count int
mut:
	re &C.pcre2_code = unsafe { nil }
}

Regex contains the compiled regular expression.

  • pattern is the regular expression pattern.
  • subpattern_count is the number of capturing subpatterns.
  • re is a pointer to the compiled PCRE2 regular expression.

fn (Regex) == #

fn (r1 Regex) == (r2 Regex) bool

fn (Regex) str #

fn (r Regex) str() string

str returns a human-readable representation of a Regex.

fn (Regex) is_nil #

fn (r Regex) is_nil() bool

is_nil returns true if r has not been initialized with a compiled PCRE2 regular expression.

fn (Regex) free #

fn (r &Regex) free()

free releases the memory allocated to the PCRE2 compiled regex. If V's -autofree option is enabled, V's autofree engine calls free automatically when it disposes of the Regex struct.

fn (Regex) is_match #

fn (r &Regex) is_match(subject string) bool

is_match returns true if the subject string contains a match for the regular expression; otherwise it returns false.

fn (Regex) find_all #

fn (r &Regex) find_all(subject string) []string

find_all returns an array containing all matched strings from the subject string.

Example

assert must_compile(r'\d').find_all('1 abc 9 de 5 g') == ['1', '9', '5']

fn (Regex) find_one #

fn (r &Regex) find_one(subject string) ?string

find_one returns the first matched string from the subject string. If a match is not found, none is returned.

Example

assert must_compile(r'\d').find_one('1 abc 9 de 5 g') or { '' } == '1'
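Since find_one returns an option (?string), the result has to be unwrapped before use. The sketch below shows two common ways to do it: an or block that supplies a fallback, and an if guard. The helper first_number is hypothetical and not part of the module.

```v
import srackham.pcre2

// first_number is a hypothetical helper (not part of the module) that returns
// the first run of digits in subject, or fallback when there is none.
fn first_number(subject string, fallback string) string {
	r := pcre2.must_compile(r'\d+')
	return r.find_one(subject) or { fallback }
}

fn main() {
	println(first_number('order 66 of 99', 'n/a')) // 66
	println(first_number('no digits here', 'n/a')) // n/a

	// An if guard runs the first branch only when a match exists.
	r := pcre2.must_compile(r'\d+')
	if m := r.find_one('room 101') {
		println('found ${m}')
	} else {
		println('no match')
	}
}
```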

fn (Regex) find_all_index #

fn (r &Regex) find_all_index(subject string) [][]int

find_all_index searches subject for all matches and returns an array; each element of the array is an array of byte indexes identifying the match and submatches within the subject (see find_one_index for details).

fn (Regex) find_one_index #

fn (r &Regex) find_one_index(subject string) ?[]int

find_one_index searches subject for the first match and returns an array of subject byte indexes identifying the match and submatches.

  • result[0] and result[1] are the start and end indexes of the entire match.
  • result[2*N] and result[2*N+1] are the start and end indexes of the Nth submatch (N = 1, 2, ...).
  • If a subpattern did not participate in the match its indexes will be -1.
  • If no match is found none is returned.
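The index array can be turned back into text by slicing the subject. The sketch below assumes the end index is exclusive (the usual PCRE2 offset convention), so that subject[start..end] yields the matched text. The helper span_text is hypothetical and not part of the module.

```v
import srackham.pcre2

// span_text is a hypothetical helper (not part of the module). It returns the
// text of group n (0 = entire match) described by an index array, or '' when
// the group did not participate in the match (indexes of -1).
// Assumption: the end index is exclusive.
fn span_text(subject string, idx []int, n int) string {
	start := idx[2 * n]
	stop := idx[2 * n + 1]
	if start < 0 || stop < 0 {
		return ''
	}
	return subject[start..stop]
}

fn main() {
	r := pcre2.must_compile(r'(\w+)@(\w+)')
	subject := 'mail bob@example now'
	idx := r.find_one_index(subject) or {
		println('no match')
		return
	}
	println(span_text(subject, idx, 0)) // bob@example
	println(span_text(subject, idx, 1)) // bob
	println(span_text(subject, idx, 2)) // example
}
```

When only the text is needed, find_one_submatch returns it directly; the index form is useful when the positions themselves matter, for example for highlighting.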

fn (Regex) find_all_submatch #

fn (r &Regex) find_all_submatch(subject string) [][]string

find_all_submatch searches the subject string for all regular expression matches and returns an array containing the text of each match and its submatches.

  • Each match contributes an element to the result array.
  • Each result array element is an array containing the matched text (at index 0) plus any submatches (at indexes 1..).
  • If a subpattern did not participate in the match the corresponding element is set to ''.

fn (Regex) find_one_submatch #

fn (r &Regex) find_one_submatch(subject string) ?[]string

find_one_submatch searches the subject string for the first regular expression match and returns an array containing the text of the match and its submatches.

  • The first element (at index 0) contains the entire matched text.
  • Subsequent elements (indexes 1..) contain the corresponding matched subpatterns.
  • If a subpattern did not participate in the match the corresponding array element is set to ''.
  • If a match is not found none is returned.
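As an illustration of the submatch methods, the sketch below collects key=value pairs into a map using find_all_submatch and reads the first pair with find_one_submatch. The helper parse_pairs is hypothetical and not part of the module.

```v
import srackham.pcre2

// parse_pairs is a hypothetical helper (not part of the module) that collects
// key=value pairs from subject into a map.
fn parse_pairs(subject string) map[string]string {
	r := pcre2.must_compile(r'(\w+)=(\w+)')
	mut pairs := map[string]string{}
	for m in r.find_all_submatch(subject) {
		// m[0] is the whole match, m[1] the key, m[2] the value.
		pairs[m[1]] = m[2]
	}
	return pairs
}

fn main() {
	pairs := parse_pairs('host=localhost port=8080')
	println(pairs['host']) // localhost
	println(pairs['port']) // 8080

	r := pcre2.must_compile(r'(\w+)=(\w+)')
	first := r.find_one_submatch('host=localhost port=8080') or { []string{} }
	println(first) // ['host=localhost', 'host', 'localhost']
}
```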

fn (Regex) replace_all #

fn (r &Regex) replace_all(subject string, repl string) string

replace_all returns a copy of the subject string with all matches of the regular expression replaced by the repl string.

  • $0...$99 in the repl string are replaced by matching text; the number zero refers to the entire matched substring; higher numbers refer to substrings captured by parenthesized subpatterns e.g. $1 refers to the first submatch.
  • References to undefined subpatterns are not replaced.
  • Subpatterns that did not participate in the match are replaced with ''.
  • To insert a literal $ in the output, use $$.
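The replacement rules can be seen in a short sketch that reorders two captured groups and inserts a literal dollar sign. Raw strings (r'...') are used so that V does not try to interpolate the $ references. The helper to_month_first is hypothetical and not part of the module.

```v
import srackham.pcre2

// to_month_first is a hypothetical helper (not part of the module) that
// rewrites YYYY-MM dates as MM/YYYY by swapping the two captured groups.
fn to_month_first(subject string) string {
	r := pcre2.must_compile(r'(\d{4})-(\d{2})')
	return r.replace_all(subject, r'$2/$1')
}

fn main() {
	println(to_month_first('2024-05 and 2023-11')) // 05/2024 and 11/2023

	r := pcre2.must_compile(r'\d+')
	// $0 is the entire match; $$ inserts a literal dollar sign.
	println(r.replace_all('5 and 10', r'$0 $$')) // 5 $ and 10 $
	// replace_one follows the same rules but stops after the first match.
	println(r.replace_one('5 and 10', r'<$0>')) // <5> and 10
}
```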

fn (Regex) replace_one #

fn (r &Regex) replace_one(subject string, repl string) string

replace_one returns a copy of the subject string with the first match of the regular expression replaced by the repl string. In all other respects it behaves like the replace_all method.

fn (Regex) replace_all_fn #

fn (r &Regex) replace_all_fn(subject string, repl fn (string) string) string

replace_all_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function. The repl function is passed a string containing the matched text.

fn (Regex) replace_one_fn #

fn (r &Regex) replace_one_fn(subject string, repl fn (string) string) string

replace_one_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function. The repl function is passed a string containing the matched text.

fn (Regex) replace_all_submatch_fn #

fn (r &Regex) replace_all_submatch_fn(subject string, repl fn (matches []string) string) string

replace_all_submatch_fn returns a copy of the subject string with all regular expression matches replaced by the return value of the repl callback function.

  • The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
  • If a subpattern did not participate in the match the corresponding matches element is set to ''.
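A callback that receives the submatches is useful when the replacement depends on more than the whole match. The sketch below rewrites user@host as host:user; swap_user_host is a hypothetical helper, not part of the module.

```v
import srackham.pcre2

// swap_user_host is a hypothetical helper (not part of the module) that
// rewrites every user@host occurrence as host:user.
fn swap_user_host(subject string) string {
	r := pcre2.must_compile(r'(\w+)@(\w+)')
	return r.replace_all_submatch_fn(subject, fn (m []string) string {
		// m[0] is the whole match, m[1] the user, m[2] the host.
		return '${m[2]}:${m[1]}'
	})
}

fn main() {
	println(swap_user_host('bob@home amy@work')) // home:bob work:amy
}
```

The same callback can be passed to replace_one_submatch_fn to rewrite only the first occurrence.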

fn (Regex) replace_one_submatch_fn #

fn (r &Regex) replace_one_submatch_fn(subject string, repl fn (matches []string) string) string

replace_one_submatch_fn returns a copy of the subject string with the first regular expression match replaced by the return value of the repl callback function.

  • The repl function is passed a matches array containing the matched text (matches[0]) and any submatches (matches[1..]).
  • If a subpattern did not participate in the match the corresponding matches element is set to ''.

fn (Regex) replace_all_extended #

fn (r &Regex) replace_all_extended(subject string, repl string) string

replace_all_extended returns a copy of the subject string with all matches of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).

fn (Regex) replace_one_extended #

fn (r &Regex) replace_one_extended(subject string, repl string) string

replace_one_extended returns a copy of the subject string with the first match of the regular expression replaced by the repl string. The repl string supports the PCRE2 extended replacement string syntax (see PCRE2_SUBSTITUTE_EXTENDED in the pcre2api man page).

fn (Regex) split_all #

fn (r &Regex) split_all(subject string) []string

split_all splits the subject string at regular expression match boundaries and returns an array of the split strings. If no matches are found, a single-element array containing the subject string is returned.

fn (Regex) split_one #

fn (r &Regex) split_one(subject string) ?[]string

split_one splits the subject string at the first regular expression match boundary and returns an array of the two split strings. If no match is found none is returned.
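The two split methods can be compared in a short sketch: split_all breaks a comma-separated line into fields, while split_one separates a key from a value that may itself contain the separator. The helper split_csv is hypothetical and not part of the module.

```v
import srackham.pcre2

// split_csv is a hypothetical helper (not part of the module) that splits a
// line on commas, discarding whitespace around each comma.
fn split_csv(line string) []string {
	r := pcre2.must_compile(r'\s*,\s*')
	return r.split_all(line)
}

fn main() {
	println(split_csv('a, b ,c')) // ['a', 'b', 'c']
	println(split_csv('single')) // ['single']

	// split_one splits at the first match only and returns none when
	// the subject contains no match.
	r := pcre2.must_compile(r'\s*=\s*')
	kv := r.split_one('key = a=b') or { []string{} }
	println(kv) // ['key', 'a=b']
}
```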