ABOUT REGEX

BASIC GRAMMAR

A simple example

^[0-9]+abc$

This means the following:
(1) ^ marks the start of the input string.
(2) [0-9]+ matches one or more digits; without the +, it would match exactly one digit.
(3) abc$ matches the literal characters ‘abc’, and $ requires them to sit at the end of the string.

So ^ and $ anchor the pattern to the start and end of the string: the whole string must match, not just part of it.
Square brackets [] list the characters allowed at a single position, such as a-z, 0-9, _, - and so on.
Characters outside [] are matched literally, one for one.
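To check the example above, here is a minimal sketch using Python's re module. Python is an assumption (the post does not name a language), but ^, [0-9], + and $ behave the same way here as in the JavaScript-style /.../ notation used later. The names pattern and single are only for illustration.

```python
import re

# One or more digits, then the literal "abc", anchored to the
# start (^) and end ($) of the string.
pattern = re.compile(r"^[0-9]+abc$")
# Without +, [0-9] matches exactly one digit.
single = re.compile(r"^[0-9]abc$")

for s in ["123abc", "7abc", "abc", "123abcd", "x123abc"]:
    # re.search honours the ^ and $ anchors written in the pattern.
    print(f"{s!r:12} {bool(pattern.search(s))}")

print(bool(single.search("7abc")), bool(single.search("12abc")))  # True False
```

Only "123abc" and "7abc" pass: "abc" has no digit, "123abcd" does not end with abc, and "x123abc" does not start with a digit. One Python detail: without flags, $ also matches just before a final \n, so re.fullmatch is the stricter choice when the input may end with a newline.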

The next step

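char + : the character before + must appear one or more times. For example, Zion+b matches ‘Zionb’, ‘Zionnb’, ‘Zionnnb’ and so on, but not ‘Ziob’.
char * : similar to +, but the character before * may appear any number of times, including zero, so Zion*b also matches ‘Ziob’.
char ? : the character before ? may appear at most once (zero or one time); for example, colou?r matches both ‘color’ and ‘colour’.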
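A small sketch, again assuming Python's re, that runs the three quantifiers over the same words. re.fullmatch succeeds only when the whole string matches, so it acts as if the pattern were wrapped in ^...$.

```python
import re

words = ["Ziob", "Zionb", "Zionnnb"]

# + : n one or more times
plus = [w for w in words if re.fullmatch(r"Zion+b", w)]
# * : n zero or more times
star = [w for w in words if re.fullmatch(r"Zion*b", w)]
# ? : u zero or one time
opt = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

print(plus)  # ['Zionb', 'Zionnnb']
print(star)  # ['Ziob', 'Zionb', 'Zionnnb']
print(opt)   # ['color', 'colour']
```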

Character classes

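Character sets and classes:
- [ABC]: matches any one of the characters listed; they do not need to be consecutive.
- [^ABC]: matches any one character that is not listed.
- [A-Z]: matches any one character in the range A to Z.
- . : matches any single character except a line break such as \n.
- [\s\S]: matches any character, including line breaks; \s matches a whitespace character and \S matches a non-whitespace character.
- \w : matches a word character, equivalent to [A-Za-z0-9_] (Unicode-aware engines may also accept letters and digits from other scripts).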
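A sketch of each class from the list, assuming Python's re. re.findall returns every non-overlapping match, and the dictionary keys are just labels. Note that Python 3 treats \w as Unicode-aware by default; re.ASCII restricts it to [A-Za-z0-9_].

```python
import re

examples = {
    "[ABC]":     re.findall(r"[ABC]", "CAB abc"),             # ['C', 'A', 'B']
    "[^ABC]":    re.findall(r"[^ABC]", "CAB abc"),            # [' ', 'a', 'b', 'c']
    "[A-Z]":     re.findall(r"[A-Z]", "Hello World"),         # ['H', 'W']
    ".":         re.findall(r".", "a\nb"),                    # ['a', 'b'] (\n skipped)
    r"[\s\S]":   re.findall(r"[\s\S]", "a\nb"),               # ['a', '\n', 'b']
    r"\w":       re.findall(r"\w", "a_1 -!"),                 # ['a', '_', '1']
    r"\w ASCII": re.findall(r"\w", "café", flags=re.ASCII),   # ['c', 'a', 'f']
}

for label, found in examples.items():
    print(f"{label:10} {found}")
```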

Escapes and special characters

Non-printing characters:
- \cx: matches the control character indicated by x. For example, \cM matches Control-M, i.e. a carriage return. x must be a letter in A-Z or a-z.
- \f: matches a form feed; equivalent to \x0c and \cL.
- \n: matches a newline (line feed); equivalent to \x0a and \cJ.
- \r: matches a carriage return; equivalent to \x0d and \cM.
- \t: matches a tab; equivalent to \x09 and \cI.
- \v: matches a vertical tab; equivalent to \x0b and \cK.
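A sketch, assuming Python's re, that locates each non-printing character in a test string. Python's re does not support the \cx form, so it is left out; the hex form \x0c works as shown in the list.

```python
import re

# One of each non-printing character, written with Python string
# escapes (\x0b is the vertical tab).
s = "a\fb\nc\rd\te\x0bf"

escapes = [("form feed", r"\f"), ("newline", r"\n"),
           ("carriage return", r"\r"), ("tab", r"\t"),
           ("vertical tab", r"\v")]

for name, pat in escapes:
    m = re.search(pat, s)
    # Print where the character was found and its code point in hex.
    print(f"{name:16} {pat}  index {m.start()}  code {ord(m.group()):#04x}")
```
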
Special characters (to match one of these literally, escape it with a backslash, e.g. \$):
- $: matches the end of the input string.
- (): marks the start and end of a subexpression (group).
- +: matches the preceding expression one or more times.
- {: marks the start of a quantifier expression such as {n,m}.
- |: indicates a choice between two alternatives.
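A sketch, assuming Python's re, of why special characters need escaping, plus a small alternation. re.escape is a standard helper in Python's re module that adds the backslashes automatically.

```python
import re

price = "Total: $5+3 (approx)"

# Unescaped, $ is an anchor and + a quantifier, so this cannot match "$5+3".
unescaped = re.search(r"$5+3", price)
# Escaped with \, they are matched literally.
escaped = re.search(r"\$5\+3", price)
# re.escape inserts the backslashes for you.
auto = re.escape("$5+3")
# | picks one alternative; the () limits the choice to the vowel.
vowels = re.findall(r"gr(a|e)y", "gray grey groy")

print(unescaped)        # None
print(escaped.group())  # $5+3
print(auto)             # \$5\+3
print(vowels)           # ['a', 'e']
```
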
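Quantifiers:
- {n}: matches the preceding item exactly n times; o{2} matches the two o's in ‘food’ but not the single o in ‘Bob’.
- {n,}: matches the preceding item at least n times.
- {n,m}: matches the preceding item at least n and at most m times.
- ?: matches the preceding item zero or one time. Placed right after another quantifier (*, +, ?, {n,}, {n,m}), it makes that quantifier non-greedy, so it matches as few characters as possible.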
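A sketch, assuming Python's re, of the counted quantifiers and of greedy versus non-greedy matching. The helper full is hypothetical, defined here only to keep the checks short.

```python
import re

def full(pat, s):
    """Hypothetical helper: True if the whole string s matches pat."""
    return re.fullmatch(pat, s) is not None

print(full(r"o{2}", "oo"), full(r"o{2}", "ooo"))        # True False
print(full(r"o{2,}", "oooo"), full(r"o{2,}", "o"))      # True False
print(full(r"o{1,3}", "ooo"), full(r"o{1,3}", "oooo"))  # True False

html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group()  # + grabs as much as it can
lazy = re.search(r"<.+?>", html).group()   # +? stops at the first >
print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```
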
Anchors:
- ^: matches the beginning of the input string. In multiline mode it also matches right after each \n.
- $: matches the end of the input string. In multiline mode it also matches right before each \n.
- \b: matches a word boundary, i.e. the position between a word character and a non-word character.
- \B: matches a position that is not a word boundary.
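A sketch, assuming Python's re, that prints where each anchor matches in a three-line string (Python's re.MULTILINE corresponds to multiline mode). The helper starts is hypothetical and only collects match positions.

```python
import re

text = "cat\ncatalog\nbobcat"

def starts(pat, flags=0):
    """Hypothetical helper: start index of every match of pat in text."""
    return [m.start() for m in re.finditer(pat, text, flags)]

print(starts(r"^cat"))                # [0]      only the start of the string
print(starts(r"^cat", re.MULTILINE))  # [0, 4]   also after each \n
print(starts(r"cat$", re.MULTILINE))  # [0, 15]  also before each \n
print(starts(r"\bcat\b"))             # [0]      "cat" as a whole word
print(starts(r"\Bcat"))               # [15]     "cat" inside "bobcat"
```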

Alternation and groups
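Wrap all the alternatives in () and separate adjacent alternatives with |, as in (cat|dog).
Parentheses also form capturing groups. For example, the regex /([1-9])([a-z]+)/g matches a digit from 1 to 9 followed by one or more lowercase letters, and each pair of () captures its own part of the match.
However, this has a side effect: the text matched by every group is stored (cached) so it can be referred to later, even when you only need the grouping.
To avoid this, put ?: before the first alternative inside the parentheses, as in (?:cat|dog); this groups without capturing.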
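A sketch, assuming Python's re, comparing capturing and non-capturing groups on the /([1-9])([a-z]+)/ example (Python writes the pattern as a raw string and has no /g; re.findall returns all matches instead). When a pattern has groups, re.findall returns the captured parts rather than the whole match, which makes the difference visible.

```python
import re

text = "3apples 7pears"

# Two capturing groups: digit and letters are stored separately.
m = re.search(r"([1-9])([a-z]+)", text)
print(m.group(0), m.group(1), m.group(2))  # 3apples 3 apples

# With several groups, findall returns one tuple per match.
pairs = re.findall(r"([1-9])([a-z]+)", text)
print(pairs)  # [('3', 'apples'), ('7', 'pears')]

# Alternation in a capturing group: only the group's text is returned.
capturing = re.findall(r"[1-9](apples|pears)", text)
# (?:...) groups the alternatives without storing anything.
non_capturing = re.findall(r"[1-9](?:apples|pears)", text)
print(capturing)      # ['apples', 'pears']
print(non_capturing)  # ['3apples', '7pears']
```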

Lookahead and lookbehind

?= (positive lookahead): exp1(?=exp2) matches exp1 only if it is followed by exp2.
?<= (positive lookbehind): (?<=exp2)exp1 matches exp1 only if it is preceded by exp2.
?! (negative lookahead): exp1(?!exp2) matches exp1 only if it is not followed by exp2.
?<! (negative lookbehind): (?<!exp2)exp1 matches exp1 only if it is not preceded by exp2.
In all four cases exp2 is only checked; it does not become part of the match.
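A sketch of all four lookarounds, assuming Python's re (which requires lookbehind patterns to have a fixed width, as USD does here). The amounts are written as exactly three digits, \d{3}, so that when a lookaround rejects "100" the engine cannot settle on a shorter digit run such as "10" instead.

```python
import re

suffixed = "100USD 200EUR 300USD"
prefixed = "USD100 EUR200 USD300"

ahead = re.findall(r"\d{3}(?=USD)", suffixed)        # followed by USD
not_ahead = re.findall(r"\d{3}(?!USD)", suffixed)    # not followed by USD
behind = re.findall(r"(?<=USD)\d{3}", prefixed)      # preceded by USD
not_behind = re.findall(r"(?<!USD)\d{3}", prefixed)  # not preceded by USD

print(ahead, not_ahead)    # ['100', '300'] ['200']
print(behind, not_behind)  # ['100', '300'] ['200']
```

Note that "USD" never appears in the results: the lookaround only checks it.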

Flags (modifiers)

i: case-insensitive matching.
g: global search; find all matches instead of stopping at the first one.
m: multiline mode; ^ and $ match at the start and end of each line.
s: lets the . character also match the newline \n.
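A sketch of the flags, assuming Python's re, where i, m and s become re.I, re.M and re.S (long names re.IGNORECASE, re.MULTILINE, re.DOTALL) and are combined with |. Python has no g flag: re.findall and re.finditer return every match, while re.search stops at the first.

```python
import re

text = "Regex\nregex\nREGEX"

# i -> re.I: ignore case
case_sensitive = re.findall(r"regex", text)          # ['regex']
case_insensitive = re.findall(r"regex", text, re.I)  # ['Regex', 'regex', 'REGEX']

# m -> re.M: ^ and $ work per line
no_m = re.findall(r"^r\w+$", text, re.I)             # []
with_m = re.findall(r"^r\w+$", text, re.I | re.M)    # ['Regex', 'regex', 'REGEX']

# s -> re.S: . also matches \n
no_s = re.findall(r"x.r", text, re.I)                # []
with_s = re.findall(r"x.r", text, re.I | re.S)       # ['x\nr', 'x\nR']

# no g in Python: search returns only the first match
first = re.search(r"regex", text, re.I).group()      # 'Regex'

print(case_sensitive, case_insensitive, no_m, with_m, no_s, with_s, first)
```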

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.
