Linux awk Command

Linux awk Command
what is awk Command ?
The basic function of awk is to search files for lines (or other units of text) that contain a pattern. When a line matches, awk performs a specific action on that line.
The Program statement that tells awk what to do; consists of a series of "rules". Each rule specifies one pattern to search for, and one action to perform when that pattern is found.
Linux awk Command

How to Use Awk to Find and Sort Text in Linux ?

What do you do?
In its simplest usage awk is meant for processing column-oriented text data, such as tables, presented to it on standard input. The variables $1, $2, and so forth are the contents of the first, second, etc. column of the current input line. For example, to print the second column of a file, you might use the following simple awk script:
awk < file '{ print $2 }'
This means "on every line, print the second field".

To print the second and third columns, you might use
awk < file '{ print $2, $3 }'

Input separator
By default awk splits input lines into fields based on whitespace, that is, spaces and tabs. You can change this by using the -F option to awk and supplying another character. For instance, to print the home directories of all users on the system, you might do
awk < /etc/passwd -F: '{ print $6 }'
since the password file has fields delimited by colons and the home directory is the 6th field.

Awk is a weakly typed language; variables can be either strings or numbers, depending on how they're referenced. All numbers are floating-point. So to implement the fahrenheit-to-celsius calculator, you might write
awk '{ print ($1-32)*(5/9) }'

which will convert fahrenheit temperatures provided on standard input to celsius until it gets an end-of-file.

The selection of operators is basically the same as in C, although some of C's wilder constructs do not work. String concatenation is accomplished simply by writing two string expressions next to each other. '+' is always addition. Thus
echo 5 4 | awk '{ print $1 + $2 }'
prints 9, while
echo 5 4 | awk '{ print $1 $2 }'

prints 54. Note that
echo 5 4 | awk '{ print $1, $2 }'
prints "5 4".

awk has some built-in variables that are automatically set; $1 and so on are examples of these. The other builtin variables that are useful for beginners are generally NF, which holds the number of fields in the current input line ($NF gives the last field), and $0, which holds the entire current input line.

You can make your own variables, with whatever names you like (except for reserved words in the awk language) just by using them. You do not have to declare variables. Variables that haven't been explicitly set to anything have the value "" as strings and 0 as numbers.

For example, the following code prints the average of all the numbers on each line:
awk '{ tot=0; for (i=1; i<=NF; i++) tot += $i; print    tot/NF; }'

 Note the use of $i to retrieve the i'th variable, and the for loop, which works like in C. The reason tot is explicitly initialized at the beginning is that this code is run for every input line, and when starting work on the second line, tot will have the total value from the first line.
It might seem silly to do that. Probably, you have only one set of numbers to add up. Why not put each one on its own line? In order to do this you need to be able to print the results when you're done. The way you do this is like this:
awk '{ tot += $1; n += 1; }  END { print tot/n; }'

Note the use of two different block statements. The second one has END in front of it; this means to run the block once after all input has been processed. In fact, in general, you can put all kinds of things in front of a block, and the block will only run if they're satisfied. That is, you can say
awk ' $1==0 { print $2 }'

which will print the second column for lines of input where the first column is 0. You can also supply regular expressions to match the whole line against:
awk ' /^test/ { print $2 }'

If you put no expression, the block is run on every line of input. If multiple blocks have conditions that are true, they are all run. There is no particularly clean way I know of to get it to run exactly one of a bunch of possible blocks of code.

The block conditions BEGIN and END are special and are run before processing any input, and after processing all input, respectively.
Other language constructs
As hinted at above, awk supports loop and conditional statements like in C, that is, for, while, do/while, if, and if/else.
awk includes a printf statement that works essentially like C printf. This can be used when you want to format output neatly or combine things onto one line in more complex ways (print implicitly adds a newline; printf doesn't.)

Here's how you strip the first column off:
awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n"; }'

Note the use of NF to iterate over all the fields and the use of printf to place newlines explicitly.
awk or gawk (gnu awk)
Find and Replace text, database sort/validate/index
awk <options> 'Program' Input-File1 Input-File2 ...

awk -f PROGRAM-FILE <options> Input-File1 Input-File2 ...

 -F FS
 --field-separator FS
     Use FS for the input field separator (the value of the `FS'
     predefined variable).
     Read the awk program source from the file PROGRAM-FILE, instead
     of from the first command line argument.
 -mf NNN
 -mr NNN
     The `f' flag sets the maximum number of fields, and the `r' flag
     sets the maximum record size.  These options are ignored by
     `gawk', since `gawk' has no predefined limits; they are only for
     compatibility with the Bell Labs research version of Unix awk.
 --assign VAR=VAL
     Assign the variable VAR the value VAL before program execution
 -W traditional
 -W compat
     Use compatibility mode, in which `gawk' extensions are turned off.
 -W lint
    Give warnings about dubious or non-portable awk constructs.
 -W lint-old
     Warn about constructs that are not available in the original
     Version 7 Unix version of awk.
 -W posix
     Use POSIX compatibility mode, in which `gawk' extensions are
     turned off and additional restrictions apply.
 -W re-interval
     Allow interval expressions, in regexps.
 --source PROGRAM-TEXT
     Use PROGRAM-TEXT as awk program source code.  This option allows
     mixing command line source code with source code from files, and is
     particularly useful for mixing command line programs with library
     Signal the end of options.  This is useful to allow further
     arguments to the awk program itself to start with a `-'.  This
     is mainly for consistency with POSIX argument parsing conventions.
     A series of patterns and actions: see below
     If no Input-File is specified then awk applies the Program to
     "standard input", (piped output of some other command or the terminal.
     Typed input will continue until end-of-file (typing `Control-d')
Linux Commands
July 14, 2014



Popular Posts

USB WiFi adapters that support monitor mode and wireless injection

usb wifi adapter with monitor mode and wireless injection (100% compatibl…

how to update Kali Linux and Fix update error

Kali Linux one of the best Linux distro for penetration testing it's freq…

Fix Kali Linux sources.list Repositories

Fix default repository First after installing a clean Kali Linux the sou…

Recent Comments

Contact Me