Sed - Regular Expressions
  • Date: 2024-09-08

Stream Editor - Regular Expressions



Regular expressions are what make SED powerful and efficient. Many complex tasks can be solved with them, and any command-line expert knows their power.

Like many other GNU/Linux utilities, SED supports regular expressions, often referred to as regex. This chapter describes regular expressions in detail. It is divided into three sections: standard regular expressions, POSIX classes of regular expressions, and metacharacters.

Standard Regular Expressions

Start of Line (^)

In regular expression terminology, the caret (^) symbol matches the start of a line. The following example prints all the lines that start with the pattern "The".

[jerry]$ sed -n '/^The/ p' books.txt

On executing the above code, you get the following result:

The Two Towers, J. R. R. Tolkien 
The Alchemist, Paulo Coelho 
The Fellowship of the Ring, J. R. R. Tolkien 
The Pilgrimage, Paulo Coelho
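The caret anchors the match to the very start of the line, so a line that merely contains "The" further along is not printed. A minimal sketch, assuming GNU sed and using made-up input supplied with printf instead of books.txt:

```shell
# Only the first line starts with "The"; the second contains it later on.
printf 'The end\nIn The end\n' | sed -n '/^The/ p'
# prints: The end
```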

End of Line ($)

The end of a line is represented by the dollar ($) symbol. The following example prints the lines that end with "Coelho".

[jerry]$ sed -n '/Coelho$/ p' books.txt

On executing the above code, you get the following result:

The Alchemist, Paulo Coelho 
The Pilgrimage, Paulo Coelho
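Combining both anchors with nothing between them, "^$", matches a line that has no characters at all. A minimal sketch, assuming GNU sed, that uses this together with the d (delete) command to drop empty lines from made-up input:

```shell
# "^$" matches an empty line; d deletes it, every other line is printed as usual.
printf 'first\n\nsecond\n' | sed '/^$/ d'
# prints:
# first
# second
```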

Single Character (.)

The dot (.) matches any single character. Because SED strips the trailing newline from each line it reads, that newline is never part of the match. The following example prints all three-letter words that end with the character "t".

[jerry]$ echo -e "cat\nbat\nrat\nmat\nbatting\nrats\nmats" | sed -n '/^..t$/p'

On executing the above code, you get the following result:

cat 
bat 
rat 
mat
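Since the dot matches any character, a literal dot has to be escaped with a backslash. A minimal sketch, assuming GNU sed and made-up input, that contrasts the two forms:

```shell
# Unescaped: "." matches any character, so both lines match.
printf 'a.c\nabc\n' | sed -n '/a.c/ p'
# prints: a.c and abc (one per line)

# Escaped: "\." matches only a literal dot.
printf 'a.c\nabc\n' | sed -n '/a\.c/ p'
# prints: a.c
```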

Match Character Set ([])

In regular expression terminology, a character set is represented by square brackets ([]). It is used to match only one out of several characters. The following example matches the patterns "Call" and "Tall" but not "Ball".

[jerry]$ echo -e "Call\nTall\nBall" | sed -n '/[CT]all/ p'

On executing the above code, you get the following result:

Call 
Tall

Exclusive Set ([^])

In an exclusive set, a caret placed right after the opening bracket negates the set, so it matches any single character that is not listed. The following example prints only "Ball".

[jerry]$ echo -e "Call\nTall\nBall" | sed -n '/[^CT]all/ p'

On executing the above code, you get the following result:

Ball 
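An exclusive set still has to match exactly one character; it does not mean "nothing from this set here". A minimal sketch, assuming GNU sed and made-up input, where "all" on its own is not printed because no character precedes it:

```shell
# "all": nothing before "all", so [^CT] has no character to match.
# "Call": the only candidate character is C, which is excluded.
# "xall": x is not C or T, so the line matches.
printf 'all\nCall\nxall\n' | sed -n '/[^CT]all/ p'
# prints: xall
```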

Character Range ([-])

When a character range is provided, the regular expression matches any character within the range specified in square brackets. The following example matches "Call" and "Tall" but not "Ball".

[jerry]$ echo -e "Call\nTall\nBall" | sed -n '/[C-Z]all/ p'

On executing the above code, you get the following result:

Call 
Tall

Now let us modify the range to "A-P" and observe the result.

[jerry]$ echo -e "Call\nTall\nBall" | sed -n '/[A-P]all/ p'

On executing the above code, you get the following result:

Call 
Ball

Zero or One Occurrence (\?)

In GNU SED, the question mark matches zero or one occurrence of the preceding character. In basic regular expressions it is written with a backslash as "\?", which is a GNU extension. The following example matches both "Behaviour" and "Behavior"; here, "u" is made optional with "\?".

[jerry]$ echo -e "Behaviour\nBehavior" | sed -n '/Behaviou\?r/ p'

On executing the above code, you get the following result:

Behaviour 
Behavior
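GNU SED also accepts extended regular expressions through its -E option, in which "?" is special without a backslash. A minimal sketch of the same match in that syntax, assuming GNU sed:

```shell
# With -E, "?" needs no backslash to mean "zero or one of the preceding character".
printf 'Behaviour\nBehavior\n' | sed -n -E '/Behaviou?r/ p'
# prints: Behaviour and Behavior (one per line)
```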

One or More Occurrences (\+)

In GNU SED, "\+" matches one or more occurrences of the preceding character; as with "\?", the backslash is required in basic regular expressions. The following example prints every line that contains one or more occurrences of "2".

[jerry]$ echo -e "111\n22\n123\n234\n456\n222" | sed -n '/2\+/ p'

On executing the above code, you get the following result:

22 
123 
234 
222 
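Because "\+" is a GNU extension, a portable way to say "one or more 2s" in a basic regular expression is one "2" followed by "2*". A minimal sketch, assuming any POSIX sed, that gives the same result on the same input:

```shell
# "22*" = one mandatory "2" followed by zero or more further "2"s.
printf '111\n22\n123\n234\n456\n222\n' | sed -n '/22*/ p'
# prints: 22, 123, 234 and 222 (one per line)
```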

Zero or More Occurrences (*)

The asterisk (*) matches zero or more occurrences of the preceding character. The following example matches "ca", "cat", "catt", and so on.

[jerry]$ echo -e "ca\ncat" | sed -n '/cat*/ p'

On executing the above code, you get the following result:

ca 
cat 
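Since "*" also accepts zero occurrences, a pattern that consists only of "x*" matches the empty string and therefore every line. A minimal sketch, assuming GNU sed and made-up input:

```shell
# "q*" can match zero q's, i.e. the empty string, which exists in every line.
printf 'abc\nxyz\n' | sed -n '/q*/ p'
# prints: abc and xyz (one per line)
```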

Exactly N Occurrences {n}

{n} matches exactly "n" occurrences of the preceding character. The following example prints only three-digit numbers. First, create a file named numbers.txt that contains only numbers, as shown below.

[jerry]$ cat numbers.txt 

On executing the above code, you get the following result:

1 
10 
100 
1000 
10000 
100000 
1000000 
10000000 
100000000 
1000000000

Let us write the SED expression.

[jerry]$ sed -n '/^[0-9]\{3\}$/ p' numbers.txt

On executing the above code, you get the following result:

100

Note that the pair of curly braces is escaped by the "\" character.
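The anchors "^" and "$" matter here: without them, the pattern only requires three consecutive digits somewhere in the line. A minimal sketch, assuming GNU sed and a few made-up numbers in place of numbers.txt:

```shell
# Unanchored: any line containing three digits in a row matches, so 1000 does too.
printf '%s\n' 1 10 100 1000 | sed -n '/[0-9]\{3\}/ p'
# prints: 100 and 1000 (one per line)
```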

At least n Occurrences {n,}

{n,} matches at least "n" occurrences of the preceding character. The following example prints all the numbers that have five or more digits.

[jerry]$ sed -n '/^[0-9]\{5,\}$/ p' numbers.txt

On executing the above code, you get the following result:

10000 
100000 
1000000
10000000 
100000000 
1000000000 

M to N Occurrences {m,n}

{m,n} matches at least "m" and at most "n" occurrences of the preceding character. The following example prints all the numbers that have at least five but no more than eight digits.

[jerry]$ sed -n '/^[0-9]\{5,8\}$/ p' numbers.txt

On executing the above code, you get the following result:

10000 
100000 
1000000 
10000000 
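With GNU SED's -E option (extended regular expressions), the interval braces are written without backslashes. A minimal sketch, assuming GNU sed and made-up input, that keeps numbers with two to four digits:

```shell
# In ERE syntax, {2,4} is an interval on its own; "\{" would mean a literal brace.
printf '%s\n' 1 10 100 1000 10000 | sed -n -E '/^[0-9]{2,4}$/ p'
# prints: 10, 100 and 1000 (one per line)
```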

Pipe (|)

In GNU SED, the pipe character acts like a logical OR: it matches the expression on either side of it. The following example matches either "str1" or "str3".

[jerry]$ echo -e "str1\nstr2\nstr3\nstr4" | sed -n '/str\(1\|3\)/ p'

On executing the above code, you get the following result:

str1 
str3

Note that the parentheses and the pipe (|) are escaped by the "\" character.
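In extended regular expressions, grouping parentheses and the pipe need no backslashes. A minimal sketch of the same alternation using GNU SED's -E option:

```shell
# ERE: ( ) groups and | separates the alternatives, all unescaped.
printf 'str1\nstr2\nstr3\nstr4\n' | sed -n -E '/str(1|3)/ p'
# prints: str1 and str3 (one per line)
```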

Escaping Characters

Certain special characters are written as escape sequences. For example, newline is represented by "\n", carriage return by "\r", and so on. To match such a sequence literally, escape it with the backslash (\) character. This section illustrates escaping of special characters.

Escaping "\"

The following example matches the pattern "\".

[jerry]$ echo 'str1\str2' | sed -n '/\\/ p'

On executing the above code, you get the following result:

str1\str2

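Escaping "\n"

The following example matches the literal text "\n", that is, a backslash followed by "n". In bash, echo without -e does not interpret escapes, so the input really contains those two characters, and the backslash in the pattern is itself escaped.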

[jerry]$ echo 'str1\nstr2' | sed -n '/\\n/ p'

On executing the above code, you get the following result:

str1\nstr2
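Within a single input line there is no real newline to match, because SED strips it when reading. A real newline only appears in the pattern space after a command such as N, which appends the next input line separated by a newline. A minimal sketch, assuming GNU sed and made-up input:

```shell
# N joins line 1 and line 2 into "str1<newline>str2";
# the regex "\n" then matches that embedded newline, and p prints the pattern space.
printf 'str1\nstr2\n' | sed -n 'N; /\n/ p'
# prints:
# str1
# str2
```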

Escaping "\r"

The following example matches the literal text "\r", that is, a backslash followed by "r".

[jerry]$ echo 'str1\rstr2' | sed -n '/\\r/ p'

On executing the above code, you get the following result:

str1\rstr2

Escaping "\dnnn"

This matches a character whose decimal ASCII value is "nnn". The following example matches only the character "a".

[jerry]$ echo -e "a\nb\nc" | sed -n '/\d97/ p'

On executing the above code, you get the following result:

a

Escaping "\onnn"

This matches a character whose octal ASCII value is "nnn". The following example matches only the character "b".

[jerry]$ echo -e "a\nb\nc" | sed -n '/\o142/ p'

On executing the above code, you get the following result:

b 

Escaping "\xhh"

This matches a character whose hexadecimal ASCII value is "hh". The following example matches only the character "c".

[jerry]$ echo -e "a\nb\nc" | sed -n '/\x63/ p'

On executing the above code, you get the following result:

c

POSIX Classes of Regular Expressions

Certain reserved words, written inside a bracket expression, have a special meaning. These are referred to as POSIX character classes. This section describes the POSIX classes supported by SED.

[:alnum:]

It matches alphabetic and numeric characters. The following example matches only "One" and "123", but not the line that contains only a tab character.

[jerry]$ echo -e "One\n123\n\t" | sed -n '/[[:alnum:]]/ p'

On executing the above code, you get the following result:

One 
123

[:alpha:]

It matches alphabetic characters only. The following example matches only the word "One".

[jerry]$ echo -e "One\n123\n\t" | sed -n '/[[:alpha:]]/ p'

On executing the above code, you get the following result:

One 

[:blank:]

It matches a blank character, which can be either a space or a tab. The following example matches only the line containing the tab character.

[jerry]$ echo -e "One\n123\n\t" | sed -n '/[[:blank:]]/ p' | cat -vte

On executing the above code, you get the following result:

^I$

Note that the command "cat -vte" is used to show tab characters (^I).

[:digit:]

It matches decimal digits only. The following example matches only "123".

[jerry]$ echo -e "abc\n123\n\t" | sed -n '/[[:digit:]]/ p'

On executing the above code, you get the following result:

123 
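A class name is only recognized inside an outer bracket expression, which is why it is written "[[:digit:]]". Combined with anchors and an interval, the class can select lines made up entirely of digits. A minimal sketch, assuming GNU sed and made-up input:

```shell
# ^...$ forces the whole line to consist of one or more digits.
printf 'abc\n123\n12a\n' | sed -n '/^[[:digit:]]\{1,\}$/ p'
# prints: 123
```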

[:lower:]

It matches lowercase letters only. The following example matches only "one".

[jerry]$ echo -e "one\nTWO\n\t" | sed -n '/[[:lower:]]/ p'

On executing the above code, you get the following result:

one 

[:upper:]

It matches uppercase letters only. The following example matches only "TWO".

[jerry]$ echo -e "one\nTWO\n\t" | sed -n '/[[:upper:]]/ p'

On executing the above code, you get the following result:

TWO

[:punct:]

It matches punctuation characters, that is, printable characters that are neither alphanumeric nor space. The following example matches only "One,Two".

[jerry]$ echo -e "One,Two\nThree\nFour" | sed -n '/[[:punct:]]/ p'

On executing the above code, you get the following result:

One,Two

[:space:]

It matches whitespace characters, such as space, tab, and form feed. The following example matches only the line containing a form feed (^L) and a tab (^I).

[jerry]$ echo -e "One\n123\f\t" | sed -n '/[[:space:]]/ p' | cat -vte

On executing the above code, you get the following result:

123^L^I$ 

Metacharacters

In addition to the standard syntax, SED also supports metacharacters, which resemble those of Perl-style regular expressions. Note that metacharacter support is specific to GNU SED and may not work with other variants of SED. Let us discuss metacharacters in detail.

Word Boundary (\b)

In regular expression terminology, "\b" matches a word boundary. For example, "\bthe\b" matches "the" but not "these", "there", "they", "then", and so on. The following example illustrates this.

[jerry]$ echo -e "these\nthe\nthey\nthen" | sed -n '/\bthe\b/ p'

On executing the above code, you get the following result:

the
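Word boundaries are especially handy in substitutions, where they keep a replacement from touching longer words. A minimal sketch, assuming GNU sed and a made-up input line:

```shell
# Only the standalone word "the" is replaced; "these" is left alone.
# The g flag applies the substitution to every match on the line.
echo 'the these' | sed 's/\bthe\b/THE/g'
# prints: THE these
```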

Non-Word Boundary (\B)

In regular expression terminology, "\B" matches a non-word boundary. For example, "the\B" matches "these" and "they" but not "the". The following example illustrates this.

[jerry]$ echo -e "these\nthe\nthey" | sed -n '/the\B/ p'

On executing the above code, you get the following result:

these 
they

Single Whitespace (\s)

In SED, "\s" matches a single whitespace character. The following example matches "Line 1" (which contains a tab) but does not match "Line2".

[jerry]$ echo -e "Line\t1\nLine2" | sed -n '/Line\s/ p'

On executing the above code, you get the following result:

Line 1 
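Combined with "\+", the whitespace metacharacter can normalize spacing. A minimal sketch, assuming GNU sed, that collapses a run of spaces and tabs in a made-up line into a single space:

```shell
# "\s\+" matches one or more consecutive whitespace characters;
# each such run is replaced by one space (g = every run on the line).
printf 'a  \t b\n' | sed 's/\s\+/ /g'
# prints: a b
```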

Single Non-Whitespace (\S)

In SED, "\S" matches a single non-whitespace character. The following example matches "Line2" but does not match "Line 1".

[jerry]$ echo -e "Line\t1\nLine2" | sed -n '/Line\S/ p'

On executing the above code, you get the following result:

Line2

Single Word Character (\w)

In SED, "\w" matches a single word character, i.e., a letter, a digit, or an underscore (_). The following example illustrates this.

[jerry]$ echo -e "One\n123\n1_2\n&;#" | sed -n '/\w/ p'

On executing the above code, you get the following result:

One 
123 
1_2

Single Non-Word Character (\W)

In SED, "\W" matches a single non-word character, which is exactly the opposite of "\w". The following example illustrates this.

[jerry]$ echo -e "One\n123\n1_2\n&;#" | sed -n '/\W/ p'

On executing the above code, you get the following result:

&;#

Beginning of Pattern Space (\`)

In SED, "\`" matches the beginning of the pattern space. The following example matches only the line "One".

[jerry]$ echo -e "One\nTwo One" | sed -n '/\`One/ p'

On executing the above code, you get the following result:

One