Cheat Sheet - String Substitution

sed tricks

# Delete leading whitespace (spaces/tabs) from front of each line
# (this aligns all text flush left). '^t' represents a true tab
# character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
sed 's/^[ ^t]*//' file

# Delete trailing whitespace (spaces/tabs) from end of each line
sed 's/[ ^t]*$//' file               # see note on '^t', above

# Delete BOTH leading and trailing whitespace from each line
sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above
sed 's/^\s*//; s/\s*$//' file        # \s matches whitespace (GNU extension)
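The trims above can also be tried without typing a literal tab. The POSIX class [[:blank:]] matches a space or a tab, and printf expands \t into a real tab for test input. This is a minimal sketch that assumes a POSIX shell and a POSIX sed. `trim_blanks` is a hypothetical helper name, not a standard command:

```sh
# Hypothetical helper: strip leading and trailing spaces/tabs from each line.
# [[:blank:]] stands in for the literal-tab set [ ^t] used above.
trim_blanks() {
    sed 's/^[[:blank:]]*//; s/[[:blank:]]*$//'
}

# printf turns \t into a real tab, so no Ctrl-V Ctrl-I is needed.
printf '  \tleft\n right \t\n' | trim_blanks
# prints "left" and "right", one per line
```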

# Substitute "foo" with "bar" on each line
sed 's/foo/bar/' file        # replaces only 1st instance in a line
sed 's/foo/bar/4' file       # replaces only 4th instance in a line
sed 's/foo/bar/g' file       # replaces ALL instances within a line
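To see how the occurrence number differs from the g flag, you can run the same line through all three forms. This is a small sketch that assumes any POSIX sed:

```sh
# One sample line containing five instances of "foo".
line='foo foo foo foo foo'
echo "$line" | sed 's/foo/bar/'     # bar foo foo foo foo
echo "$line" | sed 's/foo/bar/4'    # foo foo foo bar foo
echo "$line" | sed 's/foo/bar/g'    # bar bar bar bar bar
```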

# Substitute "foo" with "bar" ONLY for lines which contain "baz"
sed '/baz/s/foo/bar/g' file

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'
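The address in front of s, or its negation with !, decides which lines the substitution is even attempted on. This sketch uses a hypothetical two-line sample in which only the first line contains "baz":

```sh
# Hypothetical sample input: line 1 contains "baz", line 2 does not.
sample() { printf 'foo baz foo\nfoo qux\n'; }

sample | sed '/baz/s/foo/bar/g'    # bar baz bar, then foo qux
sample | sed '/baz/!s/foo/bar/g'   # foo baz foo, then bar qux
```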

# change "scarlet" or "ruby" or "puce" to "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'   # most seds
gsed 's/scarlet\|ruby\|puce/red/g'                # GNU sed only
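If your sed accepts -E (GNU sed and the BSD/macOS sed both do), you can write the alternation as an extended regexp with no backslashes. This sketch assumes such a sed and compares that form with the multi-command form that works in most seds:

```sh
# Multi-command form (most seds) and ERE alternation (sed -E).
echo 'scarlet, ruby and puce' | sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'
echo 'scarlet, ruby and puce' | sed -E 's/scarlet|ruby|puce/red/g'
# both print: red, red and red
```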

# print the line immediately before a regexp, but not the line
# containing the regexp
sed -n '/regexp/{g;1!p;};h'

# print the line immediately after a regexp, but not the line
# containing the regexp
sed -n '/regexp/{n;p;}'
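You can check both one-liners against a tiny hypothetical sample in which the regexp X matches line 3.

In the first one-liner, h copies every line into the hold space. When X matches, g therefore pulls back the previous line. The 1!p part skips printing when the match is on line 1, because the hold space is still empty at that point.

In the second one-liner, n replaces the matching line with the next one, and p then prints it. Because of -n, the matching line itself is never printed.

This sketch assumes any POSIX sed:

```sh
# Hypothetical sample input; the regexp "X" matches line 3 only.
sample() { printf 'a\nb\nX\nc\n'; }

sample | sed -n '/X/{g;1!p;};h'   # prints: b  (the line before the match)
sample | sed -n '/X/{n;p;}'       # prints: c  (the line after the match)
```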

# print all of file EXCEPT the section between 2 regular expressions
# (the range is inclusive: the lines matching them are deleted too)
sed '/Iowa/,/Montana/d'
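The range is inclusive, so the Iowa and Montana lines are deleted along with everything between them. To keep the boundary lines, you can use a nested block that limits the deletion to the interior of the range. This sketch uses hypothetical sample data and assumes a sed that supports nested { } blocks (GNU and BSD sed do):

```sh
# Hypothetical sample input, one state per line.
sample() { printf 'Ohio\nIowa\nKansas\nMontana\nTexas\n'; }

sample | sed '/Iowa/,/Montana/d'
# prints: Ohio, Texas

# Keep the boundary lines; delete only what lies strictly between them.
sample | sed '/Iowa/,/Montana/{/Iowa/!{/Montana/!d;};}'
# prints: Ohio, Iowa, Montana, Texas
```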

# delete lines matching pattern
sed '/pattern/d'

# Delete all CONSECUTIVE blank lines from file except the first
# (emulates "cat -s"). Runs of blank lines at the top and end of
# the file are also reduced, as noted for each method.
sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF;
                            # GNU sed keeps 1 at EOF, since its N
                            # prints the pattern space at end of input
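Here is a quick check of the first method on a hypothetical sample: two blank lines at the top, a run of two in the middle, and one at the end. Only the first method is shown, because the second method's output at EOF differs between sed implementations. Piping into `sed -n l` makes blank lines visible: it prints each line with a trailing `$`. This sketch assumes any POSIX sed:

```sh
# Hypothetical sample: 2 blanks at top, 2 in the middle, 1 at the end.
sample() { printf '\n\na\n\n\nb\n\n'; }

# Squeeze blank runs, then show each output line followed by "$".
sample | sed '/./,/^$/!d' | sed -n l
# prints: a$  $  b$  $   (one per line: 0 blanks at top, 1 at EOF)
```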

# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
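On a hypothetical sample, only the blank lines above the first non-blank line are removed. The range /./,$ starts at the first line containing any character and runs to the end, and !d deletes everything outside it:

```sh
# Two leading blank lines, plus one blank line in the middle that must survive.
printf '\n\nfirst\n\nsecond\n' | sed '/./,$!d'
# prints: first, (blank), second
```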

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
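This check uses hypothetical input that ends in two blank lines and has one blank line in the middle, which should survive.

The script works as a loop. While the pattern space holds only newlines, N appends the next line. When the appended line is not blank, the accumulated text is printed. When the pattern space is still all blank on the last line, $d discards it.

`sed -n l` is used only to make the blank lines visible. This sketch assumes a current GNU or BSD sed:

```sh
# Hypothetical input: "a", blank, "b", then two trailing blank lines.
printf 'a\n\nb\n\n\n' | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' | sed -n l
# prints: a$  $  b$   (one per line; the inner blank line is kept)
```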

# If a line ends with a backslash, join the next line to it.
sed -e :a -e '/\\$/N; s/\\\n//; ta' file
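This check uses a hypothetical continued line. In the printf format string, `\\` becomes one backslash, so the first input line is `one \`.

After each join, the t command branches back to :a. As a result, several continued lines in a row are joined too. This sketch assumes any POSIX sed:

```sh
# Line 1 ends with a backslash, so line 2 is joined onto it.
printf 'one \\\ntwo\nthree\n' | sed -e :a -e '/\\$/N; s/\\\n//; ta'
# prints: "one two", then "three"

# Three lines chained by trailing backslashes collapse into one.
printf 'x \\\ny \\\nz\n' | sed -e :a -e '/\\$/N; s/\\\n//; ta'
# prints: "x y z"
```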


# first~step addresses are a GNU sed extension
sed '1~3d' file      # delete every 3rd line, starting with line 1
                     # deletes lines 1, 4, 7, 10, 13, 16, ...

sed '0~3d' file      # deletes lines 3, 6, 9, 12, 15, 18, ...

sed -n '2~5p' file   # print every 5th line, starting with line 2
                     # prints lines 2, 7, 12, 17, 22, 27, ...
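The step forms are easy to check against seq, which is assumed to be available (it ships with GNU coreutils and on the BSDs).

Seds without first~step can delete every 3rd line portably with n, which prints the current line and reads the next one. This is the same pattern as the classic `sed 'n;n;n;n;n;n;n;d;'` for every 8th line.

A sketch:

```sh
# GNU sed step addresses on lines 1..10 (output is one number per line):
seq 10 | sed '1~3d'      # 2 3 5 6 8 9
seq 10 | sed '0~3d'      # 1 2 4 5 7 8 10
seq 10 | sed -n '2~5p'   # 2 7

# Portable equivalent of '0~3d': print two lines, delete the third.
seq 10 | sed 'n;n;d'     # 1 2 4 5 7 8 10
```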

OPTIMIZING FOR SPEED: If execution speed needs to be increased (because of large input files, slow processors or slow hard disks), substitution is executed more quickly if the search regex is also given as an address before the s/.../.../ command. Thus:

sed 's/foo/bar/g' filename         # standard replace command
sed '/foo/ s/foo/bar/g' filename   # executes more quickly
sed '/foo/ s//bar/g' filename      # shorthand: empty regex reuses /foo/
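All three forms should produce identical output; only their speed may differ. In the last form, the empty regex in s//bar/ reuses the most recently used regex, which here is the /foo/ address. This sketch checks the equivalence on hypothetical sample lines and assumes any POSIX sed:

```sh
# Hypothetical sample: a match, a non-match, and adjacent matches.
sample() { printf 'foo x foo\nnothing here\nfoofoo\n'; }

a=$(sample | sed 's/foo/bar/g')
b=$(sample | sed '/foo/ s/foo/bar/g')
c=$(sample | sed '/foo/ s//bar/g')

# Report whether the three outputs agree, then show one of them.
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "identical output"
printf '%s\n' "$c"
# prints: bar x bar, nothing here, barbar
```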

When selecting or deleting lines and you only need output from the first part of the file, a “quit” command (q) in the script stops sed from reading the rest of the input, which drastically reduces processing time for large files. Thus:

sed -n '45,50p' filename           # print line nos. 45-50 of a file
sed -n '51q;45,50p' filename       # same, but executes much faster
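With q in the script, sed stops reading at line 51 instead of scanning to the end of the input. Quitting right after the last wanted line (50q) works as well. Because -n is in effect, q prints nothing extra. This sketch uses seq as a stand-in for a large file:

```sh
# All three print lines 45-50; the last two stop reading early.
seq 100 | sed -n '45,50p'
seq 100 | sed -n '51q;45,50p'
seq 100 | sed -n '45,50p;50q'
```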

Note: extended regexp mode -E (older GNU sed spells it -r) lets you use ( ) { } + ? | as metacharacters without having to escape them with a backslash.
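Here is the same capture-group swap written both ways. With -E, the groups and the interval need no backslashes. A POSIX basic regexp instead needs \( \) and \{ \}, and it has no +, so \{1,\} stands in for it. This sketch assumes a sed that accepts -E:

```sh
# Swap "word-number" into "number-word".
echo 'abc-2024' | sed -E 's/([a-z]+)-([0-9]{2,})/\2-\1/'
echo 'abc-2024' | sed 's/\([a-z]\{1,\}\)-\([0-9]\{2,\}\)/\2-\1/'
# both print: 2024-abc
```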

GNU/POSIX extensions to regular expressions

In addition to ordinary character sets such as [0-9A-F], GNU sed supports POSIX “character classes” such as [[:digit:]]. Like an ordinary character set, a character class matches any single character in its set.

Character classes are a new feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but where the actual characters themselves can vary from country to country and/or from character set to character set. For example, the notion of what is an alphabetic character differs in the USA and in France.

From the docs for GNU awk v3.1.0

Although character classes rarely make a script shorter, they help make scripts portable for international use. The equivalent character sets for U.S. users (ASCII, C locale) follow:

[[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
[[:alpha:]]  - [A-Za-z]        Alphabetic characters
[[:blank:]]  - [ \x09]         Space or tab characters only
[[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
[[:digit:]]  - [0-9]           Numeric characters
[[:graph:]]  - [!-~]           Printable and visible characters
[[:lower:]]  - [a-z]           Lower-case alphabetic characters
[[:print:]]  - [ -~]           Printable (non-Control) characters
[[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
[[:space:]]  - [ \t\n\v\f\r]   All whitespace chars
[[:upper:]]  - [A-Z]           Upper-case alphabetic characters
[[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
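This small sketch runs a few of the classes from the table over one hypothetical line. LC_ALL=C is set so that sed uses the C locale, where the classes match exactly the ASCII sets listed above:

```sh
# Letters/digits -> A, punctuation -> P, space or tab -> _
printf 'Ab1 !\t\n' | LC_ALL=C sed 's/[[:alnum:]]/A/g; s/[[:punct:]]/P/g; s/[[:blank:]]/_/g'
# prints: AAA_P_
```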

Note that [[:graph:]] does not match the space character " ", but [[:print:]] does. Some character classes may (or may not) match high-bit characters (bytes 128-255, 0x80-0xFF), depending on the current locale and on which C library sed was compiled against. For non-English languages, [[:alpha:]] and other classes may also match these characters.

Examples:

Remove leading whitespace chars:

$ sed -e 's/^[[:space:]]*//' filename

Remove trailing whitespace chars:

$ sed -e 's/[[:space:]]*$//' filename
