Regular Expressions

Some text fields accept a regular expression (regex) to match strings against. Hund supports a somewhat stripped-down flavor of regex. The regex engine does not support backtracking, nor the features that depend on it (lookarounds, backreferences, possessive repetitions, etc.). As a result, every regex Hund accepts is safe and guaranteed to be fast when matching strings. The supported syntax and flags of Hund regexes are described below.

If our platform rejects a regex that is otherwise syntactically correct, the regex most likely relies on an unsupported feature. Only the features listed below are guaranteed to be supported.
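To get a feel for which patterns are likely to be rejected, it can help to try them against another non-backtracking engine. The sketch below uses Go's standard regexp package, which implements a similar RE2-style syntax. That Hund's validator behaves the same way is an assumption, and isAccepted is a hypothetical helper, not part of Hund.

```go
package main

import (
	"fmt"
	"regexp"
)

// isAccepted reports whether Go's non-backtracking engine can compile the
// pattern. It is a stand-in for Hund's own validation (an assumption).
func isAccepted(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`foo(?=bar)`,  // lookahead
		`(?<!x)foo`,   // lookbehind
		`(a)\1`,       // backreference
		`a*+`,         // possessive repetition
		`foo(?:bar)?`, // plain grouping and repetition
	}
	for _, p := range patterns {
		fmt.Printf("%-14q accepted=%v\n", p, isAccepted(p))
	}
}
```

In Go, the first four patterns fail to compile and only the last one is accepted.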

Usage

Flags

Hund supports a few common regex flags. To set flags for the entire regex, prepend it with (?xyz), where each of x, y, and z is one of the flags listed below. All flags are off by default. The syntax for setting and clearing flags within part of a regex is described further below.

Flags
i case-insensitive
m multi-line mode (^ and $ match begin/end line in addition to begin/end text)
s dotall (let . match \n)
U ungreedy (swap meaning of x* and x*?, x+ and x+?, etc.)

A flag declaration can either set or clear flags. Listing flags on their own sets them (e.g. xyz sets flags x, y, and z). To clear a flag or group of flags, prefix them with a hyphen (e.g. -xyz clears x, y, and z; x-yz sets x and then clears y and z).
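The effect of each flag can be illustrated with a short program. This is a minimal sketch that assumes Go's regexp package as a stand-in for Hund's engine, since it accepts the same (?flags) prefix. The helpers matches and first, and the sample strings, are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// i: case-insensitive matching for the whole regex.
	fmt.Println(matches(`(?i)outage`, "Partial OUTAGE")) // true

	// s: without it, . does not match \n.
	fmt.Println(matches(`up.down`, "up\ndown"))     // false
	fmt.Println(matches(`(?s)up.down`, "up\ndown")) // true

	// m: ^ and $ also match at line boundaries.
	fmt.Println(matches(`(?m)^down$`, "up\ndown\nup")) // true

	// U: x+ now prefers fewer repetitions instead of more.
	fmt.Println(first(`(?U)a+`, "aaa")) // a

	// i-s: set i and clear s in a single declaration.
	fmt.Println(matches(`(?i-s)up.down`, "UP-DOWN"))  // true
	fmt.Println(matches(`(?i-s)up.down`, "UP\nDOWN")) // false
}
```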

General Syntax

The following tables list the various kinds of operators and expressions supported by Hund regexes, grouped by their function.

Single-character Expressions
. any character, possibly including newline (when flag s set)
[xyz] character class
[^xyz] negated character class
\d Perl character class (see Perl Character Classes below)
\D negated Perl character class
[[:alpha:]] ASCII character class (see ASCII Character Classes below)
[[:^alpha:]] negated ASCII character class
\pN Unicode character class (one-letter name)
\p{Greek} Unicode character class
\PN negated Unicode character class (one-letter name)
\P{Greek} negated Unicode character class

Composites
xy x followed by y
x|y x or y (prefer x)

Repetitions
x* zero or more x, prefer more
x+ one or more x, prefer more
x? zero or one x, prefer one
x{n,m} n or n+1 or ... or m x, prefer more
x{n,} n or more x, prefer more
x{n} exactly n x
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
x{n,m}? n or n+1 or ... or m x, prefer fewer
x{n,}? n or more x, prefer fewer
x{n}? exactly n x

Note: The counting expressions x{n,m}, x{n,}, and x{n} do not accept minimum or maximum counts above 1000. Unbounded repetitions such as x* and x+ are not subject to this limit and match as normal.
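The difference between "prefer more" and "prefer fewer", and the count limit, can be seen in a small experiment. The sketch assumes Go's regexp package as a stand-in for Hund's engine; it has the same repetition operators and also caps counts at 1000. The helper names first and compiles are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// compiles reports whether the engine accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	s := "aaaa"
	fmt.Printf("%q\n", first(`a+`, s))      // "aaaa": prefer more
	fmt.Printf("%q\n", first(`a+?`, s))     // "a": prefer fewer
	fmt.Printf("%q\n", first(`a{2,3}`, s))  // "aaa"
	fmt.Printf("%q\n", first(`a{2,3}?`, s)) // "aa"
	fmt.Printf("%q\n", first(`a??`, s))     // "": prefer zero

	fmt.Println(compiles(`a{1000}`)) // true
	fmt.Println(compiles(`a{1001}`)) // false: count above 1000
}
```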

Grouping
(re) numbered capturing group (submatch)
(?:re) non-capturing group
(?flags) set flags within current group; non-capturing
(?flags:re) set flags during re; non-capturing
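As an illustration of the grouping forms above, the following sketch shows capturing versus non-capturing groups and how far a flag set inside a group reaches. It assumes Go's regexp package behaves like Hund's engine here; groups and matches are hypothetical helpers, and the request strings are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// groups returns the whole match followed by the text of each capturing group.
func groups(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindStringSubmatch(s)
}

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// (?:re) groups the alternation without capturing; (re) captures the path.
	fmt.Printf("%q\n", groups(`(?:GET|POST) (/\w+)`, "POST /status"))
	// ["POST /status" "/status"]

	// (?i:re) applies the i flag only inside the group.
	fmt.Println(matches(`(?i:api)-v1`, "API-v1")) // true
	fmt.Println(matches(`(?i:api)-v1`, "API-V1")) // false

	// (?i) applies from that point to the end of the enclosing group.
	fmt.Println(matches(`(a(?i)b)c`, "aBc")) // true
	fmt.Println(matches(`(a(?i)b)c`, "aBC")) // false
}
```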

Anchors
^ at beginning of text or line (when flag m set)
$ at end of text (like \z not \Z) or line (when flag m set)
\A at beginning of text
\b at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B not at ASCII word boundary
\z at end of text
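The anchors above are easiest to compare side by side. This is a minimal sketch assuming Go's regexp package as a stand-in for Hund's engine; matches is a hypothetical helper and the sample strings are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	text := "up\ndown\nup"

	// Without m, ^ and $ only match at the beginning and end of the text.
	fmt.Println(matches(`^down$`, text))     // false
	fmt.Println(matches(`(?m)^down$`, text)) // true

	// \A always means beginning of text, even with m set.
	fmt.Println(matches(`(?m)\Adown`, text)) // false

	// $ behaves like \z: it does not match before a trailing newline.
	fmt.Println(matches(`up$`, "up\n"))     // false
	fmt.Println(matches(`(?m)up$`, "up\n")) // true

	// \b needs a word character on exactly one side; \B is the opposite.
	fmt.Println(matches(`\bcat\b`, "a cat sat"))   // true
	fmt.Println(matches(`\bcat\b`, "concatenate")) // false
	fmt.Println(matches(`\Bcat\B`, "concatenate")) // true
}
```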

Escape Sequences
\a bell
\f form feed
\t horizontal tab
\n newline
\r carriage return
\v vertical tab character
\* literal *, for any punctuation character *
\123 octal character code (up to three digits)
\x7F hex character code (exactly two digits)
\x{10FFFF} hex character code
\C match a single byte
\Q...\E literal text ... even if ... has punctuation
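Escaping matters most when the text to match contains punctuation that doubles as an operator. The sketch below compares an unescaped pattern with \+ and \Q...\E, and shows both hex forms. It assumes Go's regexp package as a stand-in for Hund's engine; \C is left out because Go's package does not accept it. The helper matches is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	s := "1+1=2"

	// Unescaped, + is a repetition operator, so the literal text is not matched.
	fmt.Println(matches(`^1+1=2$`, s)) // false

	// \+ escapes a single punctuation character.
	fmt.Println(matches(`^1\+1=2$`, s)) // true

	// \Q...\E treats everything in between as literal text.
	fmt.Println(matches(`^\Q1+1=2\E$`, s)) // true

	// \x41 is "A" (two hex digits); \x{20AC} is the euro sign; \t is a tab.
	fmt.Println(matches(`^\x41\x{20AC}\t$`, "A\u20ac\t")) // true
}
```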

Perl Character Classes (all ASCII-only)
\d digits (i.e. [0-9])
\D not digits (i.e. [^0-9])
\s whitespace (i.e. [\t\n\f\r ])
\S not whitespace (i.e. [^\t\n\f\r ])
\w word characters (i.e. [0-9A-Za-z_])
\W not word characters (i.e. [^0-9A-Za-z_])
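Because the Perl classes are ASCII-only, they skip non-ASCII digits and letters; the Unicode classes (\pN, \p{Greek}) cover those cases. The sketch below contrasts the two, assuming Go's regexp package as a stand-in for Hund's engine. The helpers matches and first are hypothetical, and non-ASCII characters are written as \u escapes for clarity.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	arabicThree := "\u0663" // ARABIC-INDIC DIGIT THREE

	// \d is [0-9] only; \pN matches any Unicode number.
	fmt.Println(matches(`^\d$`, "7"))          // true
	fmt.Println(matches(`^\d$`, arabicThree))  // false
	fmt.Println(matches(`^\pN$`, arabicThree)) // true

	// \w stops at the first non-ASCII letter (U+00EF, i with diaeresis).
	fmt.Println(first(`\w+`, "na\u00efve")) // na

	// \p{Greek} matches characters in the Greek script (alpha, beta, gamma).
	fmt.Println(matches(`^\p{Greek}+$`, "\u03b1\u03b2\u03b3")) // true

	// \s does not include vertical tab, but [[:space:]] does.
	fmt.Println(matches(`\s`, "\v"))          // false
	fmt.Println(matches(`[[:space:]]`, "\v")) // true
}
```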

ASCII Character Classes
[[:alnum:]] alphanumeric (i.e. [0-9A-Za-z])
[[:alpha:]] alphabetic (i.e. [A-Za-z])
[[:ascii:]] ASCII (i.e. [\x00-\x7F])
[[:blank:]] blank (i.e. [\t ])
[[:cntrl:]] control (i.e. [\x00-\x1F\x7F])
[[:digit:]] digits (i.e. [0-9])
[[:graph:]] graphical (i.e. [!-~] or [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]] lower case (i.e. [a-z])
[[:print:]] printable (i.e. [ -~] or [ [:graph:]])
[[:punct:]] punctuation (i.e. [!-/:-@[-`{-~])
[[:space:]] whitespace (i.e. [\t\n\v\f\r ])
[[:upper:]] upper case (i.e. [A-Z])
[[:word:]] word characters (i.e. [0-9A-Za-z_])
[[:xdigit:]] hex digit (i.e. [0-9A-Fa-f])
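The ASCII classes are written inside a bracket expression, so they can be used alone, negated, or mixed with other characters. A minimal sketch follows, assuming Go's regexp package as a stand-in for Hund's engine; matches and first are hypothetical helpers and the sample strings are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// A class used on its own.
	fmt.Println(matches(`^[[:xdigit:]]+$`, "DEADbeef")) // true
	fmt.Println(matches(`^[[:xdigit:]]+$`, "xyz"))      // false

	// A class mixed with a literal character in the same brackets.
	fmt.Println(matches(`^[[:alpha:]_][[:word:]]*$`, "_id42")) // true

	// A negated class.
	fmt.Println(first(`[[:^alpha:]]+`, "abc123def")) // 123

	// Punctuation.
	fmt.Println(first(`[[:punct:]]+`, "wait...what")) // ...
}
```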