shithub: purgatorio

ref: 75c92428225428c8fde2d015f010e608a0b12f1d
dir: purgatorio/man/6/regexp

View raw version
.TH REGEXP 6
.SH NAME
regexp, regex \- regular expression notation
.SH DESCRIPTION
A 
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression.  In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions
the word `character' means any character (rune) but newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'

e2:  e3
  |  e2 REP

REP: '*' | '+' | '?'

e1:  e2
  |  e1 e2

e0:  e1
  |  e0 '|' e1
.EE
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by 
.LR \e .
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring 
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In 
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters 
have no special meaning and
may appear unescaped.
.PP
A 
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of the line.
.PP
The 
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
zero or one
.RB ( ? ),
instances respectively of the preceding regular expression 
.BR e2 .
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to 
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
.SH "SEE ALSO"
.IR acme (1),
.IR sh-regex (1),
.IR regex (2)