2024-01-12
The simplest usage of split involves specifying the input file and the desired prefix for the output files. By default, split appends alphabetic suffixes (aa, ab, ac, and so on) to that prefix.
split -l 1000 my_large_file.txt my_large_file_
This command splits my_large_file.txt into smaller files, each containing up to 1000 lines. The output files will be named my_large_file_aa, my_large_file_ab, my_large_file_ac, and so on. The -l option specifies the number of lines per output file.
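To see the line-based split in action, here is a minimal sketch that builds a small sample file and splits it. The temporary directory and the 2500-line sample generated with seq are assumptions made for illustration; they stand in for a real my_large_file.txt.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 2500-line sample file (hypothetical stand-in for a real input).
seq 1 2500 > my_large_file.txt

# Split it into pieces of at most 1000 lines each.
split -l 1000 my_large_file.txt my_large_file_

# Count the lines in each piece: expect 1000, 1000 and 500.
wc -l my_large_file_a*
```

Since 2500 is not a multiple of 1000, the third piece holds only the 500 lines left over.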
Let’s try splitting a file into files of a specific size instead of a number of lines:
split -b 100k my_large_file.txt my_large_file_
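This command works the same way, but -b sets the size of each output file in bytes rather than a number of lines. Here 100k means 102,400 bytes (100 × 1024). Larger units such as m (megabytes) and g (gigabytes) follow the same pattern; the GNU documentation lists the units in uppercase (K, M, G), so check man split if a lowercase unit is rejected. Because -b counts bytes, a piece can end in the middle of a line.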
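The size-based split can be checked the same way. This sketch assumes a sample file of 256,000 zero bytes created with head; the temporary directory and the sample size are illustrative choices, not anything special to split.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 256000-byte sample file (hypothetical stand-in for a real input).
head -c 256000 /dev/zero > my_large_file.txt

# Split it into pieces of at most 100k (102400 bytes) each.
split -b 100k my_large_file.txt my_large_file_

# Show the size of each piece in bytes: expect 102400, 102400 and 51200.
wc -c my_large_file_a*
```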
split offers many options to fine-tune the splitting process. Let’s look at some of them:
Specifying the Output File Suffix Length:
The default suffix is two lowercase letters (e.g., aa, ab, ac). The -a option sets the suffix length, and the -d option switches from alphabetic to numeric suffixes (00, 01, 02, and so on). The two can be combined:
split -d -a 3 -l 1000 my_large_file.txt my_large_file_
This command will create files like my_large_file_000, my_large_file_001, my_large_file_002, etc., each with up to 1000 lines.
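A quick way to confirm the resulting names is to list them after the split. The sketch below assumes a split implementation that supports -d (GNU split does) and uses a generated 2500-line sample file in a temporary directory, both chosen only for illustration.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 2500-line sample file (hypothetical stand-in for a real input).
seq 1 2500 > my_large_file.txt

# Numeric suffixes (-d), three characters wide (-a 3), 1000 lines per piece.
split -d -a 3 -l 1000 my_large_file.txt my_large_file_

# List the pieces: my_large_file_000, my_large_file_001, my_large_file_002.
ls my_large_file_0*
```

Numeric suffixes of a fixed width sort correctly in a plain directory listing, which makes it easy to process the pieces in order later.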
Splitting Based on a Specific Number of Files:
If you need a precise number of output files, the -n option is your go-to.
split -n 5 my_large_file.txt my_large_file_
This will split my_large_file.txt into exactly five files. split divides the input by size, so the pieces are roughly equal in bytes and a line may be cut in two at a boundary. With GNU split, -n l/5 produces five files without breaking any line. The argument to -n is a count of output files, not a number of lines; to get files with a given number of lines, use -l instead.
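When the pieces must contain only whole lines, GNU split accepts the form -n l/N. The following is a sketch under that assumption (other implementations may not support it); the 1003-line sample file and the temporary directory are illustrative.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 1003-line sample file (hypothetical stand-in for a real input).
seq 1 1003 > my_large_file.txt

# Split into five pieces of similar size without breaking any line
# (l/5 is a GNU split feature).
split -n l/5 my_large_file.txt my_large_file_

# Show the line count of each piece; the counts need not be equal.
wc -l my_large_file_a?

# Concatenating the pieces in order should reproduce the original exactly.
cat my_large_file_a? | cmp - my_large_file.txt && echo "pieces match the original"
```

The line counts differ between pieces because the split targets similar byte sizes, and the lines in this sample vary in length.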
Handling the Last File and Filtering Output:
If the input does not divide evenly by the value given to -l or -b, split still writes a final, smaller file containing the remaining lines or bytes. Separately, GNU split provides a --filter=command option that pipes each chunk to a shell command instead of writing it directly to disk; the command can refer to the intended output file name as $FILE. This can be useful in more complex scenarios. For example, one could compress each chunk before writing it to disk:
split --filter='gzip > $FILE.gz' -l 1000 my_large_file.txt my_large_file_
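This will compress each 1000-line chunk with gzip as it is being created, producing my_large_file_aa.gz, my_large_file_ab.gz, and so on. The single quotes keep your shell from expanding $FILE before split sees it.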
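To check that compressed chunks still add up to the original, one can decompress them in order and compare. This is a sketch that assumes GNU split (for --filter) and gzip are available; the sample file and temporary directory are illustrative.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 2500-line sample file (hypothetical stand-in for a real input).
seq 1 2500 > my_large_file.txt

# Compress each 1000-line chunk as it is produced (--filter is a GNU extension).
split --filter='gzip > $FILE.gz' -l 1000 my_large_file.txt my_large_file_

# Decompress the chunks in name order and compare with the original.
gzip -dc my_large_file_a*.gz | cmp - my_large_file.txt && echo "round trip OK"
```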
Adding an Extra Suffix:
By default split appends only the generated suffix to the prefix. To add a fixed string after the generated suffix, such as a file extension, GNU split offers the --additional-suffix option:
split -l 1000 --additional-suffix=.txt my_large_file.txt my_large_file_
This will create files like my_large_file_aa.txt, my_large_file_ab.txt and so on. You can use any string that does not contain a directory separator, and it will be appended to each output file name.
These examples demonstrate the versatility of the split command. By combining different options, you can tailor the splitting process to suit your specific needs. Remember to consult the man split page for a detailed list of all available options and their functionalities.