uniq

2024-04-29

Understanding the Basics

The core function of uniq is to report or omit repeated lines. Critically, uniq only works on consecutive duplicate lines. If you have duplicate lines that are not adjacent, you’ll need to sort the input first.

The basic syntax is straightforward:

uniq [OPTION]... [INPUT [OUTPUT]]

Without any options, uniq prints the file with each run of repeated consecutive lines collapsed to a single line. Let’s illustrate with an example:

cat input.txt
apple
banana
banana
orange
apple
apple
grape
uniq input.txt
apple
banana
orange
apple
grape

Notice how the consecutive “banana” and “apple” lines are reduced to single instances.
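
Note that the nonconsecutive duplicates survive: “apple” still appears twice in the output. To remove every duplicate regardless of position, sort the input first so that identical lines become adjacent:

sort input.txt | uniq
apple
banana
grape
orange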

Key Options: Fine-Tuning Your uniq Commands

uniq offers several options to customize its behavior.

The -c option prefixes each line with the number of consecutive occurrences:

uniq -c input.txt
      1 apple
      2 banana
      1 orange
      2 apple
      1 grape

The -d option prints only the repeated lines, one copy per group:

uniq -d input.txt
banana
apple

The -u option does the opposite, printing only the lines that are not repeated:

uniq -u input.txt
apple
orange
grape
The -i option makes comparisons case-insensitive:

cat input_case.txt
apple
Apple
banana
Banana
uniq -i input_case.txt
apple
banana
The -f N option skips the first N fields when comparing lines. Consider a file where each line begins with a number:

cat input_fields.txt
1 apple
2 banana
3 banana
4 orange
uniq -f 1 input_fields.txt
1 apple
2 banana
4 orange

Here, -f 1 skips the first field (the numbers) and compares only what follows (the fruit names), so the two “banana” lines count as duplicates even though their numbers differ.
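
A related option, -s N, skips the first N characters instead of fields. Skipping the two-character “number plus space” prefix in the same file gives the same result:

uniq -s 2 input_fields.txt
1 apple
2 banana
4 orange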

Combining Options for Powerful Text Manipulation

The true power of uniq emerges when you combine these options. For instance, to count how many times each line occurs, regardless of case:

sort -f input_case.txt | uniq -ic
      2 Apple
      2 Banana

This pipeline first sorts the file case-insensitively (sort -f) so that all case variants of a line end up adjacent, then uses -i to ignore case and -c to count, printing each unique line with its count. Without -f, a plain sort may leave “apple” and “Apple” separated, depending on your locale, and uniq would then miss the duplicates.
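
Building on this, a classic idiom ranks lines by frequency: count with uniq -c, then sort the counts numerically in reverse. Applied to the original input.txt:

sort input.txt | uniq -c | sort -nr
      3 apple
      2 banana
      1 orange
      1 grape

(Lines with equal counts may appear in either order, depending on your sort implementation.)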

Handling Different Delimiters

By default, uniq considers whitespace as the field separator. However, you can use tools like awk to preprocess your data if you need a different delimiter. For example, to work with comma-separated values (CSV), you might use awk to reformat the data before piping it to uniq.
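
As a quick sketch, suppose a hypothetical sales.csv holds the product name in its second column. awk -F, extracts that column so uniq can count it:

cat sales.csv
2024-01-03,apple,1.20
2024-01-04,banana,0.50
2024-01-05,apple,1.20
awk -F, '{print $2}' sales.csv | sort | uniq -c
      2 apple
      1 banana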

Beyond Simple Line Comparisons: Advanced Applications

uniq is a fundamental building block in more complex text processing workflows. It is frequently used alongside commands like grep, sed, awk, and sort for data manipulation and analysis. Its concise syntax and efficient operation make it an essential tool for any Linux user working with text data.
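
One illustrative pattern, assuming a hypothetical log file app.log, is summarizing which errors occur most often: grep extracts the matching lines, the first sort groups duplicates, uniq -c counts them, and the final sort -rn ranks the counts:

grep 'ERROR' app.log | sort | uniq -c | sort -rn | head -5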