Comparing and Manipulating File Content with Sort and Uniq
Continuing our exploration of command-line file manipulation, this part of the blog focuses on the sort and uniq commands. We'll delve into how these commands, when paired with regular expressions, can efficiently compare and manipulate file content.
Sort Command
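The sort command arranges the lines of a file in order, alphabetically or numerically. sort does not accept regular expressions itself. Custom sorting criteria are defined with a field separator (-t) and one or more sort keys (-k), and each key can carry its own ordering flags, such as n (numeric), r (reverse) or M (month name).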
Example 1: Sort lines in a file alphabetically
sort filename.txt
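"Alphabetical" order depends on the active locale, and a plain sort compares characters rather than numeric values. The small sketch below shows both effects on some made-up sample values. It forces the C locale with LC_ALL=C so the output is reproducible, and uses -n to switch to numeric comparison.

```shell
# Hypothetical sample data: numbers stored as text, one per line.
sample=$(printf '%s\n' 10 9 100 2)

# Default sort compares character by character, so "10" lands before "2" and "9".
lexical=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# -n compares the numeric value of each line instead.
numeric=$(printf '%s\n' "$sample" | LC_ALL=C sort -n)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```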
Example 2: Reverse sort lines based on the second column (numerically)
sort -k2,2nr data.txt
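The key -k2,2 stops at the end of field 2. A bare -k2 would extend the key to the end of the line. In the sketch below, the contents of data.txt are hypothetical: a name and a score separated by a space.

```shell
# Hypothetical data.txt contents: name and score separated by a space.
data=$(printf '%s\n' 'alice 42' 'bob 7' 'carol 105' 'dave 19')

# -k2,2 restricts the key to field 2; n = numeric, r = reverse (largest first).
ranked=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2nr)

printf '%s\n' "$ranked"
```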
Example 3: Sort hyphen-separated dates by month abbreviation (field 2), then by day (field 3)
sort -t"-" -k2,2M -k3,3n dates.txt
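The sketch below assumes dates.txt holds lines in a hypothetical YEAR-Mon-DAY form. Only the month and day are keys, so dates from different years are interleaved. The M flag is supported by GNU and BSD sort but is not part of the POSIX sort options.

```shell
# Hypothetical dates.txt contents in YEAR-Mon-DAY form.
dates=$(printf '%s\n' 2024-Mar-05 2023-Jan-20 2024-Jan-03 2023-Dec-31)

# -t'-' splits fields on hyphens; -k2,2M orders field 2 as a month name,
# then -k3,3n breaks ties on the day number. The year (field 1) is not a key.
by_month=$(printf '%s\n' "$dates" | LC_ALL=C sort -t'-' -k2,2M -k3,3n)

printf '%s\n' "$by_month"
```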
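Example 4: Sort lines based on the last word in each line
sort has no notation for "the last field", so a key like -kNF is not valid. One workaround is to copy the last word to the front with awk, sort on it, and then strip it off again:
awk '{print $NF, $0}' file.txt | sort -k1,1 | cut -d' ' -f2-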
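Because sort cannot address the last field directly, one workaround is the decorate-sort-undecorate pattern, sketched here on some hypothetical lines. awk prepends the last word, sort orders on that word alone, and cut removes the prefix. The cut step assumes the prepended word contains no spaces, which holds because awk splits fields on blanks.

```shell
# Hypothetical file.txt contents.
text=$(printf '%s\n' 'the quick fox' 'a lazy dog' 'one brown cat')

# 1. awk prepends the last field ($NF) and a space to each line.
# 2. sort orders by that prepended word only (-k1,1).
# 3. cut drops everything up to the first space, restoring the original line.
by_last=$(printf '%s\n' "$text" | awk '{print $NF, $0}' | LC_ALL=C sort -k1,1 | cut -d' ' -f2-)

printf '%s\n' "$by_last"
```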
Example 5: Sort lines while ignoring leading blanks (spaces and tabs)
sort -b data.txt
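In the C locale a space sorts before any letter, so indented lines float to the top unless -b is given. The sketch below assumes GNU sort, which applies -b to the whole line when no -k key is specified. POSIX only defines -b for restricted keys, so on other implementations it is safer to combine it with -k. The sample lines are hypothetical.

```shell
# Hypothetical data.txt contents; two lines are indented.
sample=$(printf '%s\n' '  banana' 'apple' ' cherry')

# Without -b, leading spaces take part in the comparison.
plain=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# With -b (GNU behaviour for the whole-line key), leading blanks are skipped.
blanks=$(printf '%s\n' "$sample" | LC_ALL=C sort -b)

printf 'plain:\n%s\nwith -b:\n%s\n' "$plain" "$blanks"
```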
Uniq Command
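The uniq command, as the name suggests, detects and filters out repeated lines. It compares only adjacent lines, which is why its input is normally sorted first. Like sort, it does not take regular expressions. Its options control what it prints (-c, -d, -u) and which part of each line it compares (-f to skip fields, -s to skip characters).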
Example 1: Remove duplicate lines, keeping one copy of each
sort data.txt | uniq
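The sort step matters because uniq collapses only adjacent duplicates. The sketch below runs uniq on some hypothetical unsorted words, then on the sorted words. It also shows sort -u, which performs sorting and deduplication in one command.

```shell
# Hypothetical data.txt contents, deliberately unsorted.
words=$(printf '%s\n' apple banana apple apple cherry banana)

# uniq alone collapses only adjacent repeats, so later duplicates survive.
unsorted=$(printf '%s\n' "$words" | uniq)

# Sorting first makes equal lines adjacent, so each appears exactly once.
deduped=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)

# sort -u gives the same result as sort | uniq.
sort_u=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

printf 'uniq only:\n%s\nsort | uniq:\n%s\n' "$unsorted" "$deduped"
```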
Example 2: Count and display the number of occurrences of each line
sort logfile.txt | uniq -c
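A common follow-up is to rank lines by frequency by piping the counts through sort -rn. The sketch below assumes a hypothetical log with one request method per line. The width of the count padding that uniq -c adds differs between implementations.

```shell
# Hypothetical logfile.txt contents: one request method per line.
log=$(printf '%s\n' GET POST GET DELETE GET POST)

# uniq -c prefixes each distinct line with its count;
# sort -rn then puts the most frequent line first.
ranked=$(printf '%s\n' "$log" | LC_ALL=C sort | uniq -c | sort -rn)

printf '%s\n' "$ranked"
```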
Example 3: Display only repeated lines in a sorted file
sort data.txt | uniq -d
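With -d, uniq prints one copy of each line that occurs more than once and skips lines that appear only once. The sample below is hypothetical.

```shell
# Hypothetical data.txt contents.
colours=$(printf '%s\n' red green red blue blue blue yellow)

# After sorting, -d keeps one copy of each line that has an adjacent duplicate.
repeated=$(printf '%s\n' "$colours" | LC_ALL=C sort | uniq -d)

printf '%s\n' "$repeated"
```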
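Example 4: Remove duplicates while ignoring the first 5 characters of each line
Because uniq only compares adjacent lines, the sort key should also start at the sixth character (-k1.6). Otherwise, lines that match after their first 5 characters may not end up next to each other:
sort -k1.6 file.txt | uniq -s 5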
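The sketch below shows why a whole-line sort is not enough before uniq -s 5. It assumes hypothetical lines that start with a 5-character prefix ("0001 "). The -k1.6 key follows GNU sort's rule that character positions count from the start of the field and may run past its end.

```shell
# Hypothetical file.txt contents: a 5-character ID prefix, then the payload.
items=$(printf '%s\n' '0001 apple' '0002 pear' '0003 apple')

# Whole-line sort keeps the two "apple" lines apart, so uniq -s 5 removes nothing.
whole=$(printf '%s\n' "$items" | LC_ALL=C sort | uniq -s 5)

# Sorting from the 6th character makes them adjacent; uniq keeps the first.
keyed=$(printf '%s\n' "$items" | LC_ALL=C sort -k1.6 | uniq -s 5)

printf 'whole-line sort:\n%s\nkeyed sort:\n%s\n' "$whole" "$keyed"
```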
Example 5: Display only the lines that occur exactly once
sort data.txt | uniq -u
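The flags -u and -d split the sorted input into complementary groups. The first keeps only lines that are never repeated, and the second keeps one copy of each repeated line. The sample below is hypothetical.

```shell
# Hypothetical data.txt contents.
colours=$(printf '%s\n' red green red blue yellow)
sorted=$(printf '%s\n' "$colours" | LC_ALL=C sort)

# -u: lines with no adjacent duplicate (occur exactly once).
singles=$(printf '%s\n' "$sorted" | uniq -u)

# -d: one copy of each line that occurs more than once.
dupes=$(printf '%s\n' "$sorted" | uniq -d)

printf 'uniq -u:\n%s\nuniq -d:\n%s\n' "$singles" "$dupes"
```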
sort and uniq do not match patterns themselves. Combining them in a pipeline with tools that apply regular expressions gives you precise control over which lines are selected and how they are ordered and filtered. Used alongside the previously discussed commands, they provide a comprehensive toolkit for efficient file content manipulation on the command line.
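The sketch below shows one such pipeline on hypothetical log lines. sed -E applies an extended regular expression to keep only ERROR lines and extract their message. sort and uniq -c then count each distinct message, and sort -rn ranks the messages by count. The log format is an assumption for illustration. The -E flag is supported by GNU and BSD sed.

```shell
# Hypothetical log lines: date, level, message.
log=$(printf '%s\n' \
  '2024-01-05 ERROR disk full' \
  '2024-01-05 INFO backup done' \
  '2024-01-06 ERROR disk full' \
  '2024-01-06 ERROR timeout' \
  '2024-01-07 WARN high load')

# sed -n with the p flag prints only lines where the regex matched,
# replaced by the captured message; the rest of the pipeline counts and ranks.
summary=$(printf '%s\n' "$log" |
  sed -nE 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} ERROR (.*)$/\1/p' |
  LC_ALL=C sort | uniq -c | sort -rn)

printf '%s\n' "$summary"
```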
Understanding these examples and experimenting with different patterns will help you handle a wide range of file manipulation tasks. Regular expressions remain the common thread. Pattern-matching commands select and reshape the lines, while sort and uniq order, count and deduplicate them, and together they form a cohesive and powerful set for text processing and analysis.