Renaming files inside tarballs

Posted by Chris Warburton

I recently needed to alter the contents of a .tar.gz file (a “tarball”) to remove the first component of every path, e.g. turning an entry like foo/bar/baz.txt into just bar/baz.txt. It took me an extraordinary amount of searching to find a solution: most forum and Q&A responses state that it can’t be done without either extracting all the files or resorting to overkill solutions like FUSE filesystems or Perl scripts.

I did come across a couple of tools which claim to do this, but I couldn’t get them to work.

Eventually I discovered this can be done using bsdtar, which is provided by libarchive in Nixpkgs.

To create a new archive from the entries of an existing archive (rather than just including that archive as a single file), prefix the archive’s filename argument with an @.

For example, let’s make a file called foo, and create an original.tar.gz file containing it:

$ echo hello > foo
$ tar czf original.tar.gz foo

Let’s list the contents of that tarball:

$ tar tf original.tar.gz
foo

Great, now let’s delete the file foo to avoid any confusion:

$ rm foo
$ ls
original.tar.gz

Now let’s use bsdtar to create a new .tar.gz file, using its -s argument to transform the paths (for this example, we’ll replace all o letters with the digit 0). If we give it original.tar.gz as an argument, we’ll just get a tarball-in-a-tarball:

$ bsdtar -s '/o/0/g' -c -z -f wrapped.tar.gz original.tar.gz
$ tar tf wrapped.tar.gz
0riginal.tar.gz

If we instead give it @original.tar.gz, it will use the contents of that archive instead (i.e. the entry foo):

$ bsdtar -s '/o/0/g' -c -z -f unwrapped.tar.gz '@original.tar.gz'
$ tar tf unwrapped.tar.gz
f00

Removing prefixes

Whilst the -s argument is very powerful (giving us sed-like substitutions using regular expressions), my use case of removing the leading directory from paths is common enough to have its own option: --strip-components 1