2024-04-29
The core function of uniq is to report or omit repeated lines. Critically, uniq only works on consecutive duplicate lines. If you have duplicate lines that are not adjacent, you’ll need to sort the input first.
The basic syntax is straightforward:
uniq [OPTION]... [INPUT [OUTPUT]]
Without any options, uniq simply prints the file, omitting repeated consecutive lines. Let’s illustrate with an example:
cat input.txt
apple
banana
banana
orange
apple
apple
grape
uniq input.txt
apple
banana
orange
apple
grape
Notice how the consecutive “banana” and “apple” lines are reduced to single instances.
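Because uniq compares only neighboring lines, the two separate runs of “apple” above survive as two output lines. A minimal sketch of the usual remedy, sorting first so that all duplicates become adjacent, is shown here. The sample data is recreated inline with printf so the snippet is self-contained; the variable names are illustrative only.

```shell
# Recreate the sample data from input.txt inline.
fruits=$(printf '%s\n' apple banana banana orange apple apple grape)

# Sort first so identical lines are adjacent, then collapse them.
deduped=$(printf '%s\n' "$fruits" | sort | uniq)

printf '%s\n' "$deduped"
```

sort -u gives the same result in one step, but the two-command form is what you need when you also want uniq options such as -c.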
uniq Commands
uniq offers many options to customize its behavior:
-c (count): Prefix each line with the count of its consecutive occurrences.
uniq -c input.txt
1 apple
2 banana
1 orange
2 apple
1 grape
-d (repeated lines only): Print one copy of each line that is repeated consecutively.
uniq -d input.txt
banana
apple
-u (unique lines only): Print only the lines that are not repeated consecutively. In the sample, this includes the first “apple”, which has no adjacent duplicate.
uniq -u input.txt
apple
orange
grape
-i (ignore case): Treat uppercase and lowercase characters as the same. This is useful for handling inconsistencies in capitalization.
cat input_case.txt
apple
Apple
banana
Banana
uniq -i input_case.txt
apple
banana
-f NUM (ignore leading fields): Skip the first NUM fields when comparing lines. Fields are separated by spaces or tabs.
cat input_fields.txt
1 apple
2 banana
3 banana
4 orange
uniq -f 1 input_fields.txt
1 apple
2 banana
4 orange
Here, -f 1 skips the first field (the numbers) and compares only the rest of each line (“apple”, “banana”, etc.), so the two consecutive “banana” lines collapse into the first one.
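Skipping fields is handy when each line begins with a value that always changes, such as a timestamp. The sketch below uses made-up log lines (the timestamps and messages are hypothetical, not taken from a real system) and drops repeated messages that differ only in their first field.

```shell
# Hypothetical log lines: field 1 is a timestamp, the rest is the message.
log=$(printf '%s\n' \
  '10:00:01 disk full' \
  '10:00:02 disk full' \
  '10:00:03 link down' \
  '10:00:04 disk full')

# -f 1 skips the timestamp, so only the messages are compared.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$collapsed"
```

The last “disk full” line is kept because it is not adjacent to the earlier ones; uniq still compares only neighboring lines.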
The true power of uniq emerges when you combine these options. For instance, to count the occurrences of unique lines regardless of case:
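sort -f input_case.txt | uniq -ic
2 apple
2 banana
This pipeline sorts the file case-insensitively (sort -f) so that lines differing only in case end up adjacent, then uses -i to ignore case when comparing and -c to count. Each distinct line is printed once with its count; which capitalization appears depends on the sort order in your locale.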
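A common extension of this pattern is a frequency table: count each distinct line, then order the result by count. A minimal sketch, again with inline sample data and illustrative variable names:

```shell
# Sample data, recreated inline.
words=$(printf '%s\n' apple banana banana orange apple apple grape)

# Group identical lines, count each group, then sort by count (highest first).
ranked=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)

printf '%s\n' "$ranked"
```

Here apple (3) comes first and banana (2) second. uniq -c pads the counts with leading spaces, which sort -n ignores when comparing numbers.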
uniq always treats whitespace as the field separator and has no option to change it. If your data uses a different delimiter, you can preprocess it with a tool like awk. For example, to work with comma-separated values (CSV), you might use awk to reformat the data before piping it to uniq.
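One possible way to do that preprocessing is sketched below. It assumes a simple CSV with no quoted fields and no spaces inside values; the data is made up for illustration.

```shell
# Hypothetical CSV rows: an id column followed by a name column.
csv=$(printf '%s\n' '1,apple' '2,banana' '3,banana' '4,orange')

# awk splits on commas; assigning $1 to itself rebuilds each line with
# single spaces between fields. uniq -f 1 then skips the id column.
result=$(printf '%s\n' "$csv" | awk -F, '{ $1 = $1; print }' | uniq -f 1)

printf '%s\n' "$result"
```

The output is space-separated rather than comma-separated. If the original format matters, a further awk or tr step can convert it back.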
uniq is a fundamental building block in larger text processing workflows. It is frequently combined with other commands like grep, sed, awk, and sort for data manipulation and analysis. Its concise syntax and efficient operation make it an essential tool for any Linux user working with text data.