Basic Regex Tutorial & Sample

Long Do
3 min read · Oct 1, 2020


Regular expressions (regex) are very useful in programming. A good developer should know regex well enough to write cleaner, more reliable code.

Here are the core regex rules that every coder should know.

You can test your regex online here: https://regex101.com/

BASIC REGEX RULE

Anchors — ^ and $

^Hello         Hello World, Hello Medium,... (starts with Hello)
World$         end of the World, a better World,... (ends with World)
^Hello World$  Hello World (matches only this string)
Hello World    Hello World, No No Hello World bla bla,... (matches any string that includes Hello World)
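As a quick check of the anchor rules above, here is a minimal sketch using Python's standard `re` module. The `samples` list and the `matching` helper are made up for illustration; other regex engines may differ slightly in how `$` treats a trailing newline.

```python
import re

# Sample strings borrowed from the anchor examples above.
samples = [
    "Hello World",
    "Hello Medium",
    "end of the World",
    "No No Hello World bla bla",
]

def matching(pattern, strings):
    """Return the strings in which `pattern` is found somewhere."""
    return [s for s in strings if re.search(pattern, s)]

starts_with_hello = matching(r"^Hello", samples)   # must begin with Hello
ends_with_world = matching(r"World$", samples)     # must end with World
exact = matching(r"^Hello World$", samples)        # the whole string, nothing else
contains = matching(r"Hello World", samples)       # anywhere in the string

print(starts_with_hello)
print(ends_with_world)
print(exact)
print(contains)
```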

Quantifiers — * + ? and {}

money*         mone, money, moneyy, moneyyy,... (zero or more y)
money+         money, moneyy, moneyyy,... (at least one y)
money?         mone, money (zero or one y)
money{3}       moneyyy (exactly three y)
money{3,}      moneyyy, moneyyyy,... (at least 3 y)
money{3,5}     moneyyy, moneyyyy, moneyyyyy (3 to 5 y)
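The quantifier rules can be tried out with a small sketch like the one below. It uses `re.fullmatch` so that each test word has to match the pattern from start to end; the `words` list and the `full` helper are illustrative assumptions, not part of any library.

```python
import re

def full(pattern, text):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# "mone" followed by 0, 1, 2, 3 and 6 copies of "y".
words = ["mone" + "y" * n for n in (0, 1, 2, 3, 6)]

patterns = [r"money*", r"money+", r"money?", r"money{3}", r"money{3,}", r"money{3,5}"]

# Map each pattern to the words it fully matches.
results = {p: [w for w in words if full(p, w)] for p in patterns}

for pattern, matched in results.items():
    print(pattern, matched)
```

Remember that the quantifier applies only to the character right before it (the final `y`), not to the whole word.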

OR operator — | or []

money-(coin|stock)    money-coin, money-stock (coin or stock)
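A short sketch of the `|` operator in Python follows. The sample `text` is made up. Note that in Python, `findall` returns the group contents when the pattern has exactly one group, which is why `finditer` is used to get the full matches.

```python
import re

pattern = re.compile(r"money-(coin|stock)")
text = "money-coin, money-stock, money-gold"

# Full text of every match: money-gold is skipped because gold is not an option.
found = [m.group(0) for m in pattern.finditer(text)]

# With exactly one group in the pattern, findall returns what the group captured.
kinds = pattern.findall(text)

print(found)
print(kinds)
```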

Grouping and capturing — (), []

a(money)*      a, amoney, amoneymoney,... (group money, repeated zero or more times)
a(money){2}    amoneymoney
a-[mcs]        a-m, a-c, a-s (m, c, or s)
a-[a-d]        a-a, a-b, a-c, a-d (any character from a to d)
[a-zA-Z0-9]    matches a single character: a to z, A to Z, or 0 to 9
[^a-zA-Z]      matches any single character that is not a letter (^ inside [] means negation)
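To see groups and bracket sets in action, here is a minimal Python sketch. All the test strings are invented for illustration.

```python
import re

# A group followed by * repeats the whole group, not just the last letter.
candidates = ["a", "amoney", "amoneymoney", "amon"]
group_repeat = [s for s in candidates if re.fullmatch(r"a(money)*", s)]

# {2} after a group requires exactly two copies of it.
exactly_two = re.fullmatch(r"a(money){2}", "amoneymoney") is not None

# A bracket set matches one character out of the listed ones.
set_matches = re.findall(r"a-[mcs]", "a-m a-c a-s a-x")

# A range inside brackets covers every character between its two ends.
range_matches = re.findall(r"a-[a-d]", "a-a a-d a-e")

# A leading ^ inside brackets negates the set.
non_letters = re.findall(r"[^a-zA-Z]", "ab1 c_2")

print(group_repeat, exactly_two, set_matches, range_matches, non_letters)
```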

Character classes — \d \w \s and .

\d             0, 1, 2,..., 9 (single digit)
\w             a, A, b, B,... z, Z, 0,... 9, _ (single alphanumeric character or underscore)
\s             matches a whitespace character (includes tabs and line breaks)
.              matches any character (except a line break, by default)
\D             matches a single character that is not a digit
\W             matches a single character that is not alphanumeric or underscore
\S             matches a single character that is not whitespace (space, tab, line break)
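The character classes can be explored with a sketch like this one. The input strings are made up. One caveat: for text strings, Python 3 also lets `\d` and `\w` match non-ASCII digits and letters unless the `re.ASCII` flag is set, so the results below only cover the plain ASCII case.

```python
import re

digits = re.findall(r"\d", "Room 42")          # each digit on its own
word_chars = re.findall(r"\w", "a_1-!")        # letters, digits and underscore
spaces = re.findall(r"\s", "a b\tc\nd")        # space, tab and line break
non_digits = re.findall(r"\D", "a1b2")         # everything except digits
non_word = re.findall(r"\W", "a_1-!")          # everything \w does not match
non_space = re.findall(r"\S", "a b\tc")        # everything except whitespace
any_char = re.findall(r".", "a\nb")            # the dot skips the line break by default

print(digits, word_chars, spaces, non_digits, non_word, non_space, any_char)
```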

INTERMEDIATE REGEX RULE

Greedy and Lazy match — * + {}?

Example: <div>simple div</div>

.*             {max characters}    <.*>        matches: <div>simple div</div>
.+             {max characters}    <.+>        matches: <div>simple div</div>
.+?            {min characters}    <.+?>       matches: <div>, </div>
[^<>]+         not < or >          <[^<>]+>    matches: <div>, </div>
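The difference between greedy and lazy matching is easiest to see by running the patterns against the same div example. This is a small sketch in Python; the variable names are arbitrary.

```python
import re

html = "<div>simple div</div>"

# Greedy: .+ grabs as much as it can, so the match runs to the last >.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? grabs as little as it can, so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# Negated set: forbidding < and > inside the tag gives the same result as lazy here.
negated = re.findall(r"<[^<>]+>", html)

print(greedy)
print(lazy)
print(negated)
```

The negated set is often the more predictable choice, because it states exactly which characters may appear between the brackets.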

Flags /g /m /i /x /X /s /u /U /A /J /D

/g     Global search. Don't return after the first match
/m     Multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
/i     Insensitive. Case insensitive match (ignores case of [a-zA-Z])
/x     Extended. Spaces and text after a # in the pattern are ignored
/X     eXtra. A \ followed by a letter with no special meaning is faulted
/s     Single line. Dot matches newline characters
/u     Unicode. Pattern strings are treated as UTF-16. Also causes escape sequences to match unicode characters
/U     Ungreedy. The match becomes lazy by default. Now a ? following a quantifier makes it greedy
/A     Anchored. The pattern is forced to match only at the start, as if it began with ^
/J     Allow duplicate subpattern names
/D     Dollar. Forces a dollar sign, $, to always match the end of the string, instead of the end of the line. This option is ignored if the m-flag is set
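Not every engine supports every flag in this list. As a rough sketch, here is how a few of them look in Python's `re` module, which passes flags as constants instead of letters after the pattern: `re.IGNORECASE` for /i, `re.MULTILINE` for /m, `re.DOTALL` for /s, and `re.VERBOSE` for /x. Python has no /g flag; `re.search` stops at the first match, while `re.findall` and `re.finditer` return all of them. As far as I know, Python's `re` has no direct equivalent of /X, /U, /J, or /D, and its `re.A` flag means ASCII-only matching, not anchoring. The sample text is made up.

```python
import re

text = "Money first\nmoney second"

# No /g in Python: search returns the first match, findall returns all of them.
first_only = re.search(r"money", text, re.IGNORECASE).group(0)
all_matches = re.findall(r"money", text, re.IGNORECASE)

# Without MULTILINE, ^ matches only at the very start of the string.
without_m = re.findall(r"^money", text, re.IGNORECASE)
with_m = re.findall(r"^money", text, re.IGNORECASE | re.MULTILINE)

# Without DOTALL, the dot refuses to cross the line break.
dot_default = re.search(r"first.money", text)
dot_all = re.search(r"first.money", text, re.DOTALL)

# VERBOSE ignores whitespace in the pattern and allows # comments.
phone = re.compile(r"""
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
""", re.VERBOSE)
phone_match = phone.search("call 555-1234").group(0)

print(first_only, all_matches, without_m, with_m, phone_match)
```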

Boundaries — \b and \B

\bmoney\b      I do not like money ('money' stands alone)
\Bmoney\B      I do not like coinmoneystock ('money' is in the middle of a word)
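Word boundaries can be surprising, so here is a small Python sketch to check them. The test strings are invented. Keep in mind that a hyphen is not a word character, so `\b` matches on either side of it and `\B` does not.

```python
import re

# \b needs a word/non-word transition on both sides of money.
alone = re.search(r"\bmoney\b", "I do not like money") is not None
glued = re.search(r"\bmoney\b", "I do not like moneybags") is not None

# \B needs word characters directly on both sides of money.
middle = re.search(r"\Bmoney\B", "coinmoneystock") is not None
hyphenated = re.search(r"\Bmoney\B", "coin-money-stock") is not None

print(alone, glued, middle, hyphenated)
```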

Back-references — \1

a-(money.)\1               a-money.money. (repeat the group once)
(gold.)(coin.)\1\1         gold.coin.gold.gold. (repeat the first group twice)
(gold.)(coin.)\1\2         gold.coin.gold.coin. (repeat the first and second group)
(gold.)(coin.)\2\1         gold.coin.coin.gold. (repeat the second and first)
(gold.)(coin.)\2\2         gold.coin.coin.coin. (repeat the second group twice)
(gold.)(coin.)(stock)\3    gold.coin.stockstock (repeat the third group)
(?<M>(money.))\k<M>        money.money. (name the group M and refer to it by name)
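Here is a minimal sketch of back-references in Python. A back-reference repeats the exact text a group captured, not the group's pattern, which the `mismatch` case below demonstrates. One assumption to flag: Python's `re` module writes named groups as `(?P<M>...)` and refers back to them with `(?P=M)`, instead of the `(?<M>...)` and `\k<M>` spelling used by PCRE-style engines.

```python
import re

# \1 must repeat exactly what group 1 captured ("money.").
double = re.fullmatch(r"a-(money.)\1", "a-money.money.") is not None

# The dot captured "." the first time, so "money!" is not an exact repeat.
mismatch = re.fullmatch(r"a-(money.)\1", "a-money.money!") is not None

# Numbered references can be combined in any order.
first_twice = re.fullmatch(r"(gold.)(coin.)\1\1", "gold.coin.gold.gold.") is not None
both = re.fullmatch(r"(gold.)(coin.)\1\2", "gold.coin.gold.coin.") is not None

# Named group, Python spelling: (?P<M>...) defines it, (?P=M) refers back to it.
named = re.fullmatch(r"(?P<M>money.)(?P=M)", "money.money.") is not None

print(double, mismatch, first_twice, both, named)
```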

Look-ahead and Look-behind — (?=) and (?<=) and (?!) and (?<!)

success(?=(-money))            success-money. Matches success only if followed by -money, but -money is not part of the overall match
(?<=(work hard ->))success     work hard ->success. Matches success only if preceded by work hard ->
success(?!(-lazy))             matches success but not the success in success-lazy
(?<!(lazy-))success            matches success but not the success in lazy-success
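The four look-around forms can be sketched in Python as follows. The test strings are made up, and the inner capturing parentheses are dropped because they are not needed for the assertion itself. Note that Python's `re` only accepts look-behind patterns of fixed width, which is the case here.

```python
import re

# Look-ahead: -money must follow, but it is not part of the match.
ahead = re.search(r"success(?=-money)", "success-money").group(0)

# Look-behind: "work hard ->" must come first, but it is not part of the match.
behind = re.search(r"(?<=work hard ->)success", "work hard ->success").group(0)

# Negative look-ahead: skip any success that is followed by -lazy.
neg_ahead = [m.start() for m in re.finditer(r"success(?!-lazy)", "success-lazy success-money")]

# Negative look-behind: skip any success that is preceded by lazy-.
neg_behind = [m.start() for m in re.finditer(r"(?<!lazy-)success", "lazy-success hard-success")]

print(ahead, behind, neg_ahead, neg_behind)
```

The start positions show that only the second `success` in each test string is accepted.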

SUMMARY

Basic regex:
- Anchors — ^ and $
- Quantifiers — * + ? and {}
- OR operator — | or []
- Grouping and capturing — (), []
- Character classes — \d \w \s and .
Intermediate regex:
- Greedy and Lazy match — * + {}?
- Flags /g /m /i /x /X /s /u /U /A /J /D
- Boundaries — \b and \B
- Back-references — \1
- Look-ahead and Look-behind — (?=) and (?<=) and (?!) and (?<!)

Have fun, and do not forget to clap for this story.

Thanks
