11 August 2018

Truncate files using unix command tools

by Mallikarjun

Overview

Empty the contents of file
Truncate tail end of the file contents
Truncate top end of the file contents
Truncate arbitrary part of the file contents
Truncate the file contents by number of lines
References

Empty the contents of file

Redirecting nothing into the file with the redirection operator empties it. In my experiments with the time command, this was faster than the other methods shown below.

$ ls -l file.txt
-rw-r--r-- 1 none none 5019968 Mar 31 18:32 file.txt
$ > file.txt
$ ls -l file.txt
-rw-r--r-- 1 none none 0 Mar 31 20:50 file.txt

» redirection operator is a way of redirecting information from standard streams to user defined locations
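
One caveat: a bare > file.txt relies on the shell treating a redirection with no command as a no-op. bash and POSIX sh do, but zsh with default settings runs its NULLCMD (cat) and waits for input instead. A minimal sketch using the null command : should behave the same in all of these shells. The demo_dir and demo.txt names are made up for illustration.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
printf 'some content\n' > "$demo_dir/demo.txt"

# ':' is the shell's built-in no-op; the redirection alone truncates the file.
: > "$demo_dir/demo.txt"

# Print the resulting size in bytes.
wc -c < "$demo_dir/demo.txt"
```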

The truncate command sets the size of the file to 0 bytes. It can also enlarge a file: if the requested size is greater than the current size, the extra space is filled with NUL (hex 00) bytes. For example, > file.txt; truncate --size 1024 file.txt is a perfectly valid command sequence.

$ ls -l file.txt
-rw-r--r-- 1 none none 5019968 Mar 31 18:32 file.txt
$ truncate --size 0 file.txt
$ ls -l file.txt
-rw-r--r-- 1 none none 0 Mar 31 20:50 file.txt

» truncate : command to shrink or enlarge a file to a specified size.

» --size : argument that specifies the size to which the file should be shrunk or enlarged.
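
To see the NUL padding for yourself, here is a small sketch that enlarges a 3-byte file to 8 bytes and dumps it with od. It assumes GNU truncate; the file and directory names are hypothetical.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/demo.txt"      # 3 bytes of real data

# Enlarge to 8 bytes; the 5 added bytes read back as NUL (\0).
truncate --size 8 "$demo_dir/demo.txt"

# Dump the bytes character by character to see the padding.
od -c "$demo_dir/demo.txt"
```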

Writing the (empty) output of the null device into file.txt also empties it. There is another way to do the same, cp /dev/null file.txt, which I think is slower than the above methods.

$ ls -l file.txt
-rw-r--r-- 1 none none 5019968 Mar 31 18:32 file.txt
$ cat /dev/null > file.txt
$ ls -l file.txt
-rw-r--r-- 1 none none 0 Mar 31 20:50 file.txt

» /dev/null : a special file, called the null device, on Unix-like operating systems. It returns no data when you read from it and discards any data you write to it.
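
As a quick sanity check, the following sketch runs the four emptying methods from this section against a scratch file and confirms each one leaves 0 bytes behind. It does not measure speed; prefixing each command with your shell's time keyword is one way to repeat that comparison. The names used (demo_dir, f, all_empty) are made up for illustration, and head -c is assumed to be available, as it is with GNU and BSD tools.

```shell
# Scratch directory, file name and variable names are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
all_empty=yes

for method in redirect truncate cat cp; do
    head -c 4096 /dev/zero > "$f"          # refill the scratch file
    case $method in
        redirect) : > "$f" ;;              # same effect as a bare '> file' in bash
        truncate) truncate --size 0 "$f" ;;
        cat)      cat /dev/null > "$f" ;;
        cp)       cp /dev/null "$f" ;;
    esac
    size=$(wc -c < "$f")
    echo "$method: $size bytes"
    [ "$size" -eq 0 ] || all_empty=no
done
```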

Truncate tail end of the file contents

When you truncate a file to a specific size in order to shrink it, make sure the current size of the file is greater than the specified size; otherwise the file is enlarged instead. The truncate command always removes from, or adds to, the tail end of the file. If the file is enlarged, the added space is filled with NUL bytes (\x00).

$ ls -l file.txt
-rw-r--r-- 1 none none 5019968 Mar 31 18:32 file.txt
$ truncate --size 1024 file.txt
$ ls -l file.txt
-rw-r--r-- 1 none none 1024 Mar 31 20:50 file.txt
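
GNU truncate also accepts a prefix on the size, which helps when you don't want to check the current size first: - shrinks by that many bytes, and < means "at most", so the file is never enlarged. A minimal sketch, assuming the GNU coreutils version of truncate; the file names are hypothetical.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
head -c 5000 /dev/zero > "$f"    # 5000-byte scratch file

truncate --size=-1000 "$f"       # shrink by 1000 bytes: 5000 -> 4000
truncate --size='<1024' "$f"     # cap at 1024 bytes:    4000 -> 1024
truncate --size='<2048' "$f"     # already smaller, so the file is left alone

wc -c < "$f"                     # prints 1024
```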

If you want to trim by an amount that depends on the current size of the file, say down to half its size, you can use wc and awk to calculate the new size.

$ ls -l file.txt
-rw-r--r-- 1 none none 5019968 Mar 31 18:32 file.txt
$ wc file.txt | truncate --size `awk -F " " '{printf "%d\n", ($3/2)}'` file.txt
$ ls -l file.txt
-rw-r--r-- 1 none none 2509984 Mar 31 20:57 file.txt

» wc : command to print the line, word and byte counts of a file or stream.

» awk : a handy yet powerful text processing language.

» -F or --field-separator : to specify input stream field separator, in this case it is spaces.
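
The same calculation can be done without awk, using wc -c and shell arithmetic. This is only an alternative sketch, not a replacement for the command above; the variable and file names are made up for illustration.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
head -c 5000 /dev/zero > "$f"            # 5000-byte scratch file

size=$(wc -c < "$f")                     # reading from stdin gives the byte count only
truncate --size "$(( size / 2 ))" "$f"   # integer division rounds down

wc -c < "$f"                             # prints 2500
```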

Truncate top end of the file contents

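There isn’t an easy way to remove an arbitrary part of a file in place. split does the trick, but it temporarily needs twice the file size on disk. split cuts the file into 2 parts (or any number of parts, using the --number=n parameter), and mv renames the part we want to keep back to the original file name. Note that the example below keeps outputa, the first half, so it is the second half that gets discarded; to drop the top end of the file instead, keep outputb and remove outputa.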

$ ls -l file.txt
-rw-r--r-- 1 none none 133054207 Apr  6 10:15 file.txt
$ split --suffix-length=1 --number=2 file.txt output; mv outputa file.txt; rm outputb
$ ls -l file.txt
-rw-r--r-- 1 none none 66527103 Apr  6 10:16 file.txt

» split : splits a file in various ways; check out the man page for more info. The last argument is the prefix used for the names of the newly created split files, for example output in the command above.

» -a or --suffix-length : the length of the suffix appended to the prefix to distinguish the split files from each other. For example, we have outputa and outputb in the above example.

» -n or --number : is the number of files to generate as split output.

» mv : rename files.

» rm : remove files.

WARNING: This command needs twice the disk space of the file you are truncating.
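
When you know exactly how many bytes to drop from the top, tail -c with a + offset is another option. This is a different technique from split, sketched here on a tiny file; it still writes a temporary copy, so the disk space warning applies to it as well. The skip variable and file names are hypothetical.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
printf '0123456789' > "$f"

skip=4                                        # bytes to drop from the top
tail -c +"$(( skip + 1 ))" "$f" > "$f.tmp"    # 'tail -c +N' starts output at byte N (1-based)
mv "$f.tmp" "$f"

cat "$f"                                      # prints 456789
```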

Truncate arbitrary part of the file contents

This is very similar to the previous section, except that we have changed a few things: --number=3, and the choice of the split file we are interested in (here the middle one).

$ ls -l file.txt
-rw-r--r-- 1 none none 133054207 Apr  6 10:15 file.txt
$ split --suffix-length=1 --number=3 file.txt output; mv outputb file.txt; rm outputa outputc
$ ls -l file.txt
-rw-r--r-- 1 none none 44351402 Apr  6 10:16 file.txt

WARNING: This command needs twice the disk space of the file you are truncating.
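
split can only keep one of n equal chunks. If you need an exact byte range, chaining tail -c and head -c is one way to get it. This is a sketch of a different technique, assuming a head that supports -c (GNU and BSD do); start, length and the file names are made up for illustration.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
printf '0123456789' > "$f"

start=3     # zero-based offset of the first byte to keep
length=4    # number of bytes to keep

# Skip 'start' bytes, then keep the next 'length' bytes.
tail -c +"$(( start + 1 ))" "$f" | head -c "$length" > "$f.tmp"
mv "$f.tmp" "$f"

cat "$f"    # prints 3456
```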

Truncate the file contents by number of lines

This again is very similar to the previous section, except that we use the --lines argument of split.

$ wc file.txt 
994046  3574572 44351402 file.txt
$ split --suffix-length=1 --lines=100000 file.txt output; mv outputa file.txt; rm output*;
$ wc file.txt 
100000  359611 4460876 file.txt

» --lines : a split argument by which we can choose the number of lines each split file contains.

WARNING: This command needs twice the disk space of the file you are truncating.
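
If all you want is the first n lines, head -n gives the same result without creating every other chunk first. A minimal sketch of that alternative on a 10-line file; it still needs room for a temporary copy of the lines you keep. The keep variable and file names are hypothetical, and seq is assumed to be installed.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
seq 1 10 > "$f"                     # 10 lines: 1..10

keep=3                              # number of leading lines to keep
head -n "$keep" "$f" > "$f.tmp" && mv "$f.tmp" "$f"

wc -l < "$f"                        # prints 3
```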

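Retaining the last n lines takes a little more work. With split, the last chunk isn’t guaranteed to hold exactly n lines, and its file name is hard to predict. The workaround is to make the first chunk hold everything except the last n lines, remove that chunk, and concatenate whatever remains. This method is the least efficient of those shown so far because it uses cat, which copies the data, instead of mv as in the earlier cases. Note that the remaining output* files are left behind and need to be removed afterwards.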

$ wc file.txt
994046  3574572 44351402 file.txt
$ wc file.txt | split --suffix-length=1 --lines=`awk -F " " '{printf "%d\n", ($1-100000)}'` file.txt output; rm outputa; cat output* > file.txt
$ wc file.txt 
100000  359611 4460876 file.txt
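
For comparison, tail -n keeps the last n lines directly, with no line arithmetic and no chunk files to clean up. This is a sketch of an alternative, not the method used above; the keep variable and file names are made up for illustration.

```shell
# Scratch directory and file name are illustrative only.
demo_dir=$(mktemp -d)
f="$demo_dir/demo.txt"
seq 1 10 > "$f"                     # 10 lines: 1..10

keep=3                              # number of trailing lines to keep
tail -n "$keep" "$f" > "$f.tmp" && mv "$f.tmp" "$f"

cat "$f"                            # prints 8, 9 and 10 on separate lines
```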

References

» Linux Man pages

tags: file - truncate - command - tool - command-line