Regular Expressions
ABOUT REGEX
BASIC GRAMMAR
- A simple example
^[0-9]+abc$
This pattern reads as follows:
(1) ^ matches the start of the input string.
(2) [0-9]+ matches one or more digits; without the +, it matches exactly ONE digit.
(3) abc$ matches the literal text 'abc' at the end of the string; $ marks the end of the input string.
So ^ and $ anchor the pattern to the start and end of the string, which means the whole string has to match.
[] holds the set of characters allowed at that position, such as a-z, 0-9, _ and -.
Characters outside [] are matched literally.
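The example can be tried out with a minimal sketch like the one below. It assumes a JavaScript engine such as Node.js; the variable names and the sample strings are made up for illustration.

```javascript
// Pattern from the example: one or more digits, then the literal "abc".
const digitsThenAbc = /^[0-9]+abc$/;

// Hypothetical inputs, chosen only for illustration.
const simpleSamples = ["123abc", "7abc", "abc", "123abcd", "x123abc"];

// Pair every input with the result of the anchored match.
const simpleResults = simpleSamples.map((s) => [s, digitsThenAbc.test(s)]);
console.log(simpleResults);

// Without ^ and $ the same pattern may match anywhere inside the string.
const unanchored = /[0-9]+abc/;
const unanchoredHit = unanchored.test("x123abcd");
console.log(unanchoredHit);
```

Only "123abc" and "7abc" pass the anchored pattern: "abc" has no digit, "123abcd" has text after 'abc', and "x123abc" has text before the digits. The unanchored variant accepts "x123abcd" because the pattern is found inside it.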
- The next step
- char + : The character before the + must appear one or more times. For example, Zion+b matches 'Zio' followed by one or more 'n' and then a 'b', such as 'Zionb' or 'Zionnnb'.
- char * : Similar to +, but the character before the * can appear any number of times, including zero.
- char ? : The character before the ? can appear at most once (zero or one time).
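The difference between +, * and ? is easiest to see side by side. The following is a small sketch, assuming JavaScript; the names and inputs are hypothetical and only the trailing 'n' is affected by each quantifier.

```javascript
// Anchored so that the whole input has to match.
const nOneOrMore = /^Zion+b$/;  // "Zio", one or more "n", then "b"
const nZeroOrMore = /^Zion*b$/; // "Zio", zero or more "n", then "b"
const nOptional = /^Zion?b$/;   // "Zio", zero or one "n", then "b"

// Hypothetical inputs for illustration.
const quantifierInputs = ["Ziob", "Zionb", "Zionnnb"];

// One row per input, one column per quantifier.
const quantifierRows = quantifierInputs.map((s) => ({
  input: s,
  plus: nOneOrMore.test(s),
  star: nZeroOrMore.test(s),
  question: nOptional.test(s),
}));
console.log(quantifierRows);
```

"Ziob" is rejected only by the + version, "Zionnnb" is rejected only by the ? version, and "Zionb" is accepted by all three.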
- The normal characters
- range in the []:
- [ABC]: matches any one character in the set; the characters do not need to be consecutive
- [^ABC]: matches any one character not in the set
- [A-Z]: matches any one character in the given range
- . : matches any character except a line break such as \n or \r; it does match spaces and tabs
- [\s\S]: matches any character, line breaks included; \s is a whitespace character, \S is a non-whitespace character
- \w : equivalent to [A-Za-z0-9_]
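Each class above can be checked by listing what it picks out of a short string. This is an illustrative sketch in JavaScript; findAll is a hypothetical helper built on the standard String.prototype.matchAll, and the inputs are made up.

```javascript
// Collect every match of a global pattern as an array of strings.
// matchAll requires the g flag on the pattern.
function findAll(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

const inSet = findAll(/[ABC]/g, "CAB-abc");        // only C, A, B
const notInSet = findAll(/[^ABC]/g, "CAB-ab");     // everything else
const inRange = findAll(/[A-Z]/g, "Hello World");  // uppercase letters
const dot = findAll(/./g, "a b\nc");               // skips the line break
const anything = findAll(/[\s\S]/g, "a b\nc");     // keeps the line break
const wordChars = findAll(/\w/g, "a_1-Z!");        // letters, digits, _

console.log({ inSet, notInSet, inRange, dot, anything, wordChars });
```

Note that the dot returns the space but not the "\n", while [\s\S] returns both.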
- Escape characters
non-printing characters:
- \cx: matches the control character named by x. For example, \cM matches Control-M, which is a carriage return. x must be in the range A-Z or a-z
- \f : matches a form feed; equivalent to \x0c and \cL
- \n : matches a newline (line feed)
- \r : matches a carriage return
- \t : matches a tab
- \v : matches a vertical tab; equivalent to \x0b and \cK
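The equivalences listed above can be confirmed with a short sketch. It assumes JavaScript, where \cx is supported inside regex literals; the object and variable names are hypothetical.

```javascript
// Each entry tests a regex escape against the raw character code.
const escapeChecks = {
  controlM: /^\cM$/.test("\x0d"), // \cM is a carriage return
  formFeed: /^\f$/.test("\x0c") && /^\cL$/.test("\x0c"),
  newline: /^\n$/.test("\x0a"),
  carriageReturn: /^\r$/.test("\x0d"),
  tab: /^\t$/.test("\x09"),
  verticalTab: /^\v$/.test("\x0b") && /^\cK$/.test("\x0b"),
};
console.log(escapeChecks);

// A typical use: split text on both Windows (\r\n) and Unix (\n) line ends.
const splitLines = "one\r\ntwo\nthree".split(/\r?\n/);
console.log(splitLines);
```

Every check evaluates to true, and the split yields the three lines without any leftover "\r".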
special characters:
- $: matches the end of the input string.
- (): mark the start and end of a sub-expression.
- +: matches the preceding expression one or more times.
- {: marks the start of a quantifier expression.
- |: separates two alternatives.
To match any of these characters literally, escape it with a backslash, for example \$ or \+.
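Because these characters carry a special meaning, a backslash is needed to match them as plain text. The sketch below contrasts the escaped and unescaped forms; it assumes JavaScript, and all names and sample strings are made up for illustration.

```javascript
// "+" as literal text versus "+" as a quantifier.
const literalPlus = /^1\+1$/;   // the text "1+1"
const quantifiedPlus = /^1+1$/; // one or more "1", then another "1"

// "(" "|" ")" as literal text versus grouping and alternation.
const literalGroup = /^\(a\|b\)$/; // the text "(a|b)"
const alternation = /^(a|b)$/;     // either "a" or "b"

// "$" as literal text versus the end-of-string anchor.
const literalDollar = /\$5/;

const specialChecks = {
  plusAsText: literalPlus.test("1+1"),
  plusAsQuantifier: quantifiedPlus.test("1+1"),
  onesOnly: quantifiedPlus.test("111"),
  groupAsText: literalGroup.test("(a|b)"),
  eitherOption: alternation.test("a") && alternation.test("b"),
  optionIsNotText: alternation.test("(a|b)"),
  dollarAsText: literalDollar.test("costs $5"),
};
console.log(specialChecks);
```

Here plusAsQuantifier and optionIsNotText come out false, since the unescaped forms are interpreted as syntax rather than as text; the remaining checks are true.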
quantifiers:
- {n}: matches the preceding character exactly n times; o{2} matches 'oo'
- {n,}: matches at least n times
- {n,m}: matches at least n and at most m times
- ?: matches the preceding character zero or one time; placed right after another quantifier such as * or +, it makes that quantifier non-greedy
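A short sketch of the counted quantifiers and of greedy versus non-greedy matching, assuming JavaScript. The tag-like sample string and all names are hypothetical.

```javascript
// Counted repetition of the letter "o", anchored to the whole string.
const countChecks = {
  exactlyTwo: /^o{2}$/.test("oo"),
  threeIsNotTwo: /^o{2}$/.test("ooo"),
  atLeastTwo: /^o{2,}$/.test("ooooo"),
  oneIsTooFew: /^o{2,}$/.test("o"),
  twoToThree: /^o{2,3}$/.test("ooo"),
  fourIsTooMany: /^o{2,3}$/.test("oooo"),
};
console.log(countChecks);

// Greedy: .+ takes as much as it can, so the match runs to the last ">".
const tagSample = "<b>bold</b>";
const greedyMatch = tagSample.match(/<.+>/)[0];

// Non-greedy: .+? takes as little as it can, so the match stops at the first ">".
const lazyMatch = tagSample.match(/<.+?>/)[0];

console.log(greedyMatch, lazyMatch);
```

The greedy pattern returns the whole string "<b>bold</b>", while the non-greedy one returns only "<b>".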
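anchors:
- ^: matches the beginning of the input string. If the multiline flag is set, ^ also matches right after each \n
- $: matches the end of the input string. If the multiline flag is set, $ also matches right before each \n
- \b: matches a word boundary, the position between a word character and a non-word character
- \B: matches any position that is not a word boundary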
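The anchors match positions rather than characters, which the following sketch tries to make visible. It assumes JavaScript; the sample strings and names are invented for illustration.

```javascript
const animalLines = "cat\ncatalog\nbobcat"; // hypothetical multi-line input

// Without the m flag, ^ and $ only see the ends of the whole string.
const startNoFlag = animalLines.match(/^cat/g).length;
const endNoFlag = animalLines.match(/cat$/g).length;

// With the m flag, they also match at every line start and line end.
const startMultiline = animalLines.match(/^cat/gm).length;
const endMultiline = animalLines.match(/cat$/gm).length;

console.log({ startNoFlag, endNoFlag, startMultiline, endMultiline });

// \b requires a word boundary, \B requires the absence of one.
const wordSample = "cat concatenate";
const wholeWord = wordSample.replace(/\bcat\b/g, "CAT");
const insideWord = wordSample.replace(/\Bcat\B/g, "CAT");

console.log(wholeWord);  // CAT concatenate
console.log(insideWord); // cat conCATenate
```

Each anchor finds one match without the m flag and two with it, because "cat" and "catalog" both start a line with 'cat', and "cat" and "bobcat" both end one.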
alternation and grouping:
Use () to enclose the options and separate adjacent options with |. Parentheses also group a sub-expression.
For example, a regex like "/([1-9])([a-z]+)/g" matches a digit from 1 to 9 followed by one or more lowercase letters.
However, each () also captures the text it matched and stores it in a buffer.
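What gets stored can be inspected directly. The sketch below is only an illustration, assuming JavaScript: groupsOf is a hypothetical helper, the input string is made up, and the second pattern uses the (?:...) form, which groups without storing anything, for contrast.

```javascript
// Return, for each match, the full match followed by its captured groups.
function groupsOf(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => Array.from(m));
}

const groupSample = "1abc 7xyz"; // hypothetical input

// Both groups capture: [full match, digit, letters].
const captured = groupsOf(/([1-9])([a-z]+)/g, groupSample);

// (?:...) still groups, but nothing is stored for the digit.
const partlyCaptured = groupsOf(/(?:[1-9])([a-z]+)/g, groupSample);

console.log(captured);
console.log(partlyCaptured);
```

The first call yields three entries per match (full match, digit, letters); the second yields two, because the digit group is no longer stored.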
One solution is to put ?: right after the opening parenthesis, as in (?:exp), which groups without storing the match.
lookaround assertions:
- ?= is a positive lookahead. exp1(?=exp2) finds the exp1 that is immediately followed by exp2
- ?<= is a positive lookbehind. (?<=exp2)exp1 finds the exp1 that is immediately preceded by exp2
- ?! is a negative lookahead. exp1(?!exp2) finds the exp1 that is not followed by exp2
- ?<! is a negative lookbehind. (?<!exp2)exp1 finds the exp1 that is not immediately preceded by exp2
In all four cases exp2 is only checked; it is not part of the match.
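The four forms can be compared by looking at where each one matches. This is a rough sketch that assumes a JavaScript engine with lookbehind support (ES2018 or later, for example a recent Node.js); matchIndexes is a hypothetical helper and 'foo', 'bar', 'baz' stand in for exp1 and exp2.

```javascript
// Return the start index of every match of a global pattern.
function matchIndexes(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m.index);
}

// Lookahead: "foo" followed, or not followed, by "bar".
const followedByBar = matchIndexes(/foo(?=bar)/g, "foobar foobaz");
const notFollowedByBar = matchIndexes(/foo(?!bar)/g, "foobar foobaz");

// Lookbehind: "bar" preceded, or not preceded, by "foo".
const precededByFoo = matchIndexes(/(?<=foo)bar/g, "foobar xbar");
const notPrecededByFoo = matchIndexes(/(?<!foo)bar/g, "foobar xbar");

console.log({ followedByBar, notFollowedByBar, precededByFoo, notPrecededByFoo });

// The lookaround part is checked but not consumed.
const matchedText = "foobar".match(/foo(?=bar)/)[0];
console.log(matchedText); // foo
```

The positive forms hit the first word in each sample (indexes 0 and 3) and the negative forms hit the second (indexes 7 and 8). The last line shows that the matched text is just "foo", without the "bar" that was looked at.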
- Flags (modifiers)
- i: ignores case when matching
- g: finds all matches instead of stopping at the first one
- m: multiline mode; ^ and $ match at the start and end of every line
- s: lets the . character match \n as well
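The effect of each flag shows up when the same pattern is run with and without it. A minimal sketch, assuming JavaScript (the s flag needs ES2018 or later); the sample strings and names are invented.

```javascript
const flagSample = "Regex REGEX regex"; // hypothetical input

// No flags: case-sensitive, first match only.
const plainFirst = flagSample.match(/regex/)[0];

// i: the first match no longer depends on case.
const ignoreCaseFirst = flagSample.match(/regex/i)[0];

// g combined with i: every match is returned.
const allMatches = flagSample.match(/regex/gi);

// m: ^ and $ work per line instead of per string.
const twoLines = "first\nsecond";
const lineWithoutM = /^second$/.test(twoLines);
const lineWithM = /^second$/m.test(twoLines);

// s: the dot also matches a newline.
const dotWithoutS = /a.b/.test("a\nb");
const dotWithS = /a.b/s.test("a\nb");

console.log({ plainFirst, ignoreCaseFirst, allMatches });
console.log({ lineWithoutM, lineWithM, dotWithoutS, dotWithS });
```

Flags can be combined freely, as in /regex/gi above; their order after the closing slash does not matter.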
Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!