Regular Expressions

ABOUT REGEX

BASIC GRAMMAR

  1. A simple example

    ^[0-9]+abc$

This pattern means the following:
(1) ^ matches the start of the input string.
(2) [0-9]+ matches one or more digits; without the +, it matches exactly ONE digit.
(3) abc$ matches the literal string ‘abc’ at the end of the input; $ marks the end of the input string.

So the ^ and $ characters anchor the regex to the start and the end of the input.
Square brackets [] list the characters allowed at a single position, such as a-z, 0-9, _, - and so on.
Anything outside [] is matched literally, character by character.
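
As a minimal sketch of the example above, the following JavaScript tries the pattern against a few inputs. The variable names and the sample strings are illustrative assumptions, not part of the original example.

```javascript
// The anchored pattern from the example: one or more digits, then "abc", nothing else.
const anchored = /^[0-9]+abc$/;

const results = {
  full: anchored.test("123abc"),      // true: digits followed by "abc"
  noDigits: anchored.test("abc"),     // false: + needs at least one digit
  leading: anchored.test("x123abc"),  // false: ^ rejects the leading "x"
  trailing: anchored.test("123abcd"), // false: $ rejects the trailing "d"
};

console.log(results);
```

Removing ^ and $ would let the pattern match anywhere inside a longer string, so "x123abcd" would then pass.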

  2. The next step
  • char + : The character before the + must appear one or more times. For example, Zion+b matches ‘Zio’, then one or more ‘n’, and finally a ‘b’ (‘Zionb’ or ‘Zionnnb’, but not ‘Ziob’).
  • char * : It is similar to +, but the character in front of the * can appear any number of times, including zero.
  • char ? : The character in front of the ‘?’ can appear at most once (zero or one time).
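
The three quantifiers can be compared side by side. This is a small sketch in JavaScript; the ^ and $ anchors are added here only so that each test covers the whole sample string, and the names are illustrative.

```javascript
// Each pattern differs only in the quantifier applied to the final "n".
const plus = /^Zion+b$/;     // "Zio", one or more "n", then "b"
const star = /^Zion*b$/;     // "Zio", zero or more "n", then "b"
const optional = /^Zion?b$/; // "Zio", zero or one "n", then "b"

const inputs = ["Ziob", "Zionb", "Zionnnb"];

// One row per input: [input, matches +, matches *, matches ?]
const table = inputs.map((s) => [s, plus.test(s), star.test(s), optional.test(s)]);

console.log(table);
// [ [ 'Ziob', false, true, true ],
//   [ 'Zionb', true, true, true ],
//   [ 'Zionnnb', true, true, false ] ]
```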
  3. The normal characters
  • ranges in []:
    • [ABC]: matches any single character in the set; the characters do not need to be consecutive.
    • [^ABC]: matches any single character that is not in the set.
    • [A-Z]: matches any single character in the given range.
    • . : matches any single character except a line terminator such as \n or \r.
    • [\s\S]: matches any character, including newlines; \s means a whitespace character, \S means a non-whitespace character.
    • \w : equivalent to [A-Za-z0-9_]
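
A short JavaScript sketch of these character classes follows. The sample strings are made up for illustration; the g flag is used only to collect every match.

```javascript
// A set matches one character at a time, in whatever order the input has them.
const setMatch = "CAB".match(/[ABC]/g);     // ["C", "A", "B"]

// A range and a negated set.
const upper = "Cat-42".match(/[A-Z]/g);     // ["C"]
const notAN = "banana".match(/[^an]/g);     // ["b"]

// \w covers letters, digits and the underscore, but not "-".
const wordChars = "a_1-b".match(/\w/g);     // ["a", "_", "1", "b"]

// The dot stops at a newline, while [\s\S] accepts it.
const dotOnNewline = /^.$/.test("\n");      // false
const anyOnNewline = /^[\s\S]$/.test("\n"); // true

console.log(setMatch, upper, notAN, wordChars, dotOnNewline, anyOnNewline);
```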
  4. Escape sequences and special characters
  • non-printing characters:

    • \cx: matches the control character indicated by x. For example, \cM matches Control-M, which is a carriage return. x must be a letter in the range A-Z or a-z.
    • \f : matches a form feed character; equivalent to \x0c and \cL
    • \n : matches a newline (line feed) character
    • \r : matches a carriage return character
    • \t : matches a tab character
    • \v : matches a vertical tab character; equivalent to \x0b and \cK
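
The equivalences listed above can be checked directly. The following is a minimal JavaScript sketch with illustrative names; it relies on JavaScript string escapes such as "\x0c" producing the same characters the regex escapes refer to.

```javascript
const checks = {
  // \cM is Control-M, the carriage return character.
  controlM: /\cM/.test("\r"),
  // \f, \x0c and \cL all name the form feed character.
  formFeed: /\f/.test("\x0c") && /\cL/.test("\f"),
  // \v, \x0b and \cK all name the vertical tab character.
  verticalTab: /\v/.test("\x0b") && /\cK/.test("\v"),
  // \n finds the line break between two lines.
  newline: /\n/.test("line1\nline2"),
};

// \t can also be used as a separator when splitting tab-delimited text.
const fields = "a\tb\tc".split(/\t/); // ["a", "b", "c"]

console.log(checks, fields);
```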
  • special characters:

    • $: matches the end of the input string.
    • (): mark the start and the end of a sub-expression.
    • +: matches the preceding expression one or more times.
    • {: marks the start of a quantifier expression such as {n,m}
    • |: separates two alternatives; the pattern matches either one
  • quantifiers:

    • {n}: matches exactly n times; o{n} matches ‘o’ exactly n times
    • {n,}: matches at least n times
    • {n,m}: matches at least n and at most m times
    • ?: matches the preceding character zero or one time; placed right after another quantifier such as * or +, it makes that quantifier non-greedy
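
To make the counted quantifiers and the greedy versus non-greedy distinction concrete, here is a small JavaScript sketch. The sample strings (including the HTML-like fragment) are assumptions chosen for illustration.

```javascript
// Counted repetition of the letter "o" (anchored so the whole string is checked).
const counts = {
  exactlyTwo: [/^o{2}$/.test("oo"), /^o{2}$/.test("ooo")],      // [true, false]
  atLeastTwo: [/^o{2,}$/.test("ooooo"), /^o{2,}$/.test("o")],   // [true, false]
  twoToThree: [/^o{2,3}$/.test("ooo"), /^o{2,3}$/.test("oooo")], // [true, false]
};

// Greedy: .+ grabs as much as possible, so the match runs to the last ">".
const html = "<b>bold</b>";
const greedy = html.match(/<.+>/)[0]; // "<b>bold</b>"

// Non-greedy: .+? stops at the first ">" it can.
const lazy = html.match(/<.+?>/)[0];  // "<b>"

console.log(counts, greedy, lazy);
```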
  • anchors:

    • ^: matches the beginning of the input string
    • $: matches the end of the input string. If the multiline flag is set, $ also matches the position just before a \n character
    • \b: matches a word boundary
    • \B: matches any position that is not a word boundary
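
The anchors match positions rather than characters, which a few JavaScript checks can illustrate. This is a sketch with made-up sample text; the m flag is the multiline flag mentioned above.

```javascript
const twoLines = "cat\ncatalog";

// Without the m flag, ^ and $ refer to the whole input, so "cat" alone does not fit.
const wholeInput = /^cat$/.test(twoLines); // false

// With the m flag, $ also matches just before the "\n", so the first line fits.
const perLine = /^cat$/m.test(twoLines);   // true

// \b requires a word boundary on both sides: only the standalone "cat" matches.
const wholeWords = "cat catalog".match(/\bcat\b/g); // ["cat"]

// \B requires that "cat" is NOT followed by a boundary: only the one inside "catalog".
const insideWord = "cat catalog".replace(/cat\B/g, "CAT"); // "cat CATalog"

console.log(wholeInput, perLine, wholeWords, insideWord);
```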
  5. Choice (alternation and grouping)
    Use () to group the options, and separate adjacent options with |.
    For example, a regex like “/([1-9])([a-z]+)/g” matches a string that satisfies both groups in order: one digit from 1 to 9 followed by one or more lowercase letters.
    However, the text matched by each () is captured and stored in a buffer.
    To avoid capturing, put ?: immediately after the opening parenthesis, as in (?:[1-9]).
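
The difference between capturing and non-capturing groups shows up in the match result. Below is a minimal JavaScript sketch; the input "7up" and the gr(?:a|e)y pattern are illustrative assumptions, and the g flag is left off so that match() returns the captured groups.

```javascript
// Two capturing groups: the result holds the full match plus one entry per group.
const capturing = "7up".match(/([1-9])([a-z]+)/).slice();
// ["7up", "7", "up"]

// Making the first group non-capturing drops its entry from the result.
const nonCapturing = "7up".match(/(?:[1-9])([a-z]+)/).slice();
// ["7up", "up"]

// | inside a group chooses between options without affecting the rest of the pattern.
const grey = /^gr(?:a|e)y$/;
const choices = ["gray", "grey", "groy"].map((s) => grey.test(s));
// [true, true, false]

console.log(capturing, nonCapturing, choices);
```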

  6. Lookaround (presearch) assertions

  • ?= is a positive lookahead. exp1(?=exp2) finds the exp1 that is immediately followed by exp2.
  • ?<= is a positive lookbehind. (?<=exp2)exp1 finds the exp1 that is immediately preceded by exp2.
  • ?! is a negative lookahead. exp1(?!exp2) finds the exp1 that is not immediately followed by exp2.
  • ?<! is a negative lookbehind. (?<!exp2)exp1 finds the exp1 that is not immediately preceded by exp2.
    In all four cases exp2 is only checked; it is not included in the matched text.
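
The four assertions can be told apart by where each one finds its match. The JavaScript sketch below uses search(), which returns the index of the first match; the sample strings are made up, and the lookbehind forms assume an engine that supports them (ES2018 or later).

```javascript
const ahead = "foobar foobaz"; // "foo" starts at index 0 and at index 7
const behind = "xbar foobar";  // "bar" starts at index 1 and at index 8

const positions = {
  // foo followed by bar: the first "foo".
  lookahead: ahead.search(/foo(?=bar)/),           // 0
  // foo NOT followed by bar: the second "foo" (followed by "baz").
  negativeLookahead: ahead.search(/foo(?!bar)/),   // 7
  // bar preceded by foo: the second "bar".
  lookbehind: behind.search(/(?<=foo)bar/),        // 8
  // bar NOT preceded by foo: the first "bar" (preceded by "x").
  negativeLookbehind: behind.search(/(?<!foo)bar/), // 1
};

// The asserted text is not part of the match itself.
const matched = ahead.match(/foo(?=bar)/)[0]; // "foo"

console.log(positions, matched);
```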
  7. Flags (modifiers)
  • i: ignore case when matching
  • g: find all matches instead of stopping at the first one
  • m: multiline mode; ^ and $ match at the start and end of each line
  • s: lets the . character also match the \n character
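
Here is a short JavaScript sketch of how each flag changes the result on the same input. The sample text is an assumption for illustration, and the s flag assumes an engine with ES2018 support.

```javascript
const text = "Cat\ncat\nDOG";

// No flags: case-sensitive, first match only. "cat" first appears at index 4.
const firstIndex = text.match(/cat/).index;       // 4

// g + i: every match, ignoring case.
const allIgnoreCase = text.match(/cat/gi);        // ["Cat", "cat"]

// m: ^ and $ apply to each line, so whole-line matches become possible.
const wholeLines = text.match(/^cat$/gim);        // ["Cat", "cat"]
const withoutMultiline = /^cat$/i.test(text);     // false

// s: the dot may cross the line break between "Cat" and "cat".
const dotDefault = /Cat.cat/.test(text);          // false
const dotAll = /Cat.cat/s.test(text);             // true

console.log(firstIndex, allIgnoreCase, wholeLines, withoutMultiline, dotDefault, dotAll);
```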

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!