2024-04-29
The core function of uniq
is to report or omit repeated lines. Critically, uniq
only works on consecutive duplicate lines. If you have duplicate lines that are not adjacent, you’ll need to sort the input first.
The basic syntax is straightforward:
uniq [OPTION]... [INPUT [OUTPUT]]
Without any options, uniq
simply prints the file, omitting repeated consecutive lines. Let’s illustrate with an example:
cat input.txt
apple
banana
banana
orange
apple
apple
grape
uniq input.txt
apple
banana
orange
apple
grape
Notice how the consecutive “banana” and “apple” lines are reduced to single instances.
uniq
Commandsuniq
offers many options to customize its behavior:
-c
(count): Prefix each line with the count of its consecutive occurrences.uniq -c input.txt
1 apple
2 banana
1 orange
2 apple
1 grape
-d
(repeated lines only): Only print the duplicate lines.uniq -d input.txt
banana
apple
-u
(unique lines only): Only print the unique lines (lines that appear only once).uniq -u input.txt
orange
grape
-i
(ignore case): Treat uppercase and lowercase characters as the same. This is useful for handling inconsistencies in capitalization.cat input_case.txt
apple
Apple
banana
Banana
uniq -i input_case.txt
apple
banana
-f NUM
(ignore leading fields): Ignore the first NUM
fields when comparing lines. Fields are separated by whitespace by default.cat input_fields.txt
apple 1
banana 2
banana 3
orange 4
uniq -f 1 input_fields.txt
apple 1
banana 2
orange 4
Here, -f 1
ignores the first field (“apple”, “banana”, etc.) and only compares the second field (the numbers).
The true power of uniq
emerges when you combine these options. For instance, to count the occurrences of unique lines regardless of case:
cat input_case.txt | sort | uniq -ic
2 apple
2 banana
This pipeline first sorts the file to ensure uniq
works correctly, then uses -i
to ignore case, -c
to count, and outputs the unique lines with their counts.
By default, uniq
considers whitespace as the field separator. However, you can use tools like awk
to preprocess your data if you need a different delimiter. For example, to work with comma-separated values (CSV), you might use awk
to reformat the data before piping it to uniq
.
uniq
forms a fundamental building block in many more complex text processing workflows. It is frequently used in conjunction with other commands like grep
, sed
, awk
, and sort
to achieve complex data manipulation and analysis. Its concise syntax and efficient operation make it an essential tool for any Linux user working with text data.