Posted By: Anonymous
I’ve got a large (by number of lines) plain text file that I’d like to split into smaller files, also by number of lines. So if my file has around 2M lines, I’d like to split it up into 10 files that contain 200k lines, or 100 files that contain 20k lines (plus one file with the remainder; being evenly divisible doesn’t matter).
I could do this fairly easily in Python but I’m wondering if there’s any kind of ninja way to do this using bash and unix utils (as opposed to manually looping and counting / partitioning lines).
Have you looked at the split command?
$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help              display this help and exit
      --version           output version information and exit
You could do something like this:
split -l 200000 filename
which will create files of 200000 lines each (the last one holds whatever is left over), named
xaa xab xac …
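If you would rather ask for a number of pieces than a number of lines, you can work out the -l value from the line count first. This is only a rough sketch: sample.txt, the chunk_ prefix and the 2005-line test file are made-up stand-ins, not anything from the question.

```shell
# Sketch: split a file into a chosen number of pieces, by line count.
# Assumptions: sample.txt and the chunk_ prefix are hypothetical names,
# and the input is generated here only so the example is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 2005 > sample.txt      # stand-in for the large input file
pieces=10

total=$(wc -l < sample.txt)
# Round up so the remainder ends up in the last piece
# instead of spilling into an extra file.
per_file=$(( (total + pieces - 1) / pieces ))

split -l "$per_file" sample.txt chunk_

# Show the line count of every piece.
wc -l chunk_*
```

Rounding up means you get at most the requested number of pieces, with the last one shorter than the rest when the line count does not divide evenly.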
Another option, split by size of output file (still splits on line breaks):
split -C 20m --numeric-suffixes input_filename output_prefix
creates files like
output_prefix00 output_prefix01 output_prefix02 ... each of max size 20 megabytes (numeric suffixes start at 00).
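A quick way to convince yourself that -C keeps lines intact is to split a throwaway file and glue the pieces back together. This is a sketch assuming GNU split (the -C option is not available in every implementation); input.txt, the part_ prefix and the 100K limit are arbitrary choices for the demo.

```shell
# Sketch: size-limited split that never breaks a line, using GNU split.
# Assumptions: input.txt and the part_ prefix are hypothetical names;
# 100K (102400 bytes) is just a small limit for demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100000 > input.txt     # generated sample data

# -C: at most 100K of whole lines per piece; -d: numeric suffixes.
split -C 100K -d input.txt part_

# List the pieces with their sizes.
ls -l part_*

# The pieces concatenated in name order should match the original.
cat part_* | cmp - input.txt && echo "pieces reassemble to the original"
```

Because the shell expands part_* in sorted order, a plain cat is enough to rebuild the file as long as every suffix has the same length.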