
On predicting predictors: hacking archive formats for fun and prophecy

Talk
lpc2009-0083

Excerpt

We aim to explain the archive formats you use every day. The talk includes an in-depth look at the tar, ar, cpio, gzip, bzip2, and deb formats, as well as the internals of the Git object store. Armed with this information, we will show you a practical application: removing the redundancy between the files kept in version control and the source and binary archives distributed from them.
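As a small illustration of why the Git object store matters here: Git names a blob by the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the file's bytes. A loose object on disk is that same header and content, zlib-compressed. Because the ID depends only on content, a file unpacked from a tarball and the same file already committed to Git share one object. The sketch below assumes a SHA-1 repository (not the newer SHA-256 object format), and the function names are hypothetical helpers, not part of any Git API.

```python
from __future__ import annotations

import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob %d\x00" % len(content)  # "<type> <size>" then a NUL byte
    return hashlib.sha1(header + content).hexdigest()


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decode a loose object file: inflate it, then split header from content."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content
```

For example, `git_blob_id(b"hello\n")` gives `ce013625030ba8dba906f756967f9e9ca394464a`, the same ID `git hash-object` reports for a file containing `hello` and a newline. Note that the compressed bytes of a loose object depend on the zlib level Git was configured with, while the ID does not.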

Description

Existing projects like pristine-tar focus on finding the right options to pass to the compressor so that it reproduces the file from the uncompressed data (for example, “gzip -9 --rsyncable”), treating the file formats as magic black boxes. Our in-depth analysis of archive formats lets us record just enough information to reproduce any archive, regardless of the tool used to produce it.
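To make the "black box" approach concrete, here is a minimal sketch (not pristine-tar's code) that splits a single-member gzip file into header, deflate stream, and trailer following the RFC 1952 layout, then tries each zlib compression level until one reproduces the stream byte for byte. It assumes the file was written by a zlib-based compressor using the default memLevel and strategy; files produced by GNU gzip's own deflate implementation may not match any level, which is exactly the kind of gap that recording format-level information avoids. `split_gzip` and `guess_zlib_level` are hypothetical names.

```python
from __future__ import annotations

import zlib


def split_gzip(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a single-member gzip file into (header, raw deflate stream, trailer).

    RFC 1952 member layout: 10-byte fixed header, optional FEXTRA/FNAME/
    FCOMMENT/FHCRC fields, deflate data, then CRC32 + ISIZE (8 bytes).
    Assumes exactly one member and no trailing garbage.
    """
    if blob[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a gzip file using the deflate method")
    flags = blob[3]
    pos = 10
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then that many bytes
        pos += 2 + int.from_bytes(blob[pos:pos + 2], "little")
    if flags & 0x08:  # FNAME: zero-terminated original file name
        pos = blob.index(b"\x00", pos) + 1
    if flags & 0x10:  # FCOMMENT: zero-terminated comment
        pos = blob.index(b"\x00", pos) + 1
    if flags & 0x02:  # FHCRC: 2-byte CRC16 of the header
        pos += 2
    return blob[:pos], blob[pos:-8], blob[-8:]


def guess_zlib_level(blob: bytes) -> int | None:
    """Return the first zlib level (0-9) that reproduces the deflate stream, else None."""
    _header, stream, _trailer = split_gzip(blob)
    data = zlib.decompress(stream, -15)  # negative wbits: raw deflate, no wrapper
    for level in range(10):
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)
        if comp.compress(data) + comp.flush() == stream:
            return level
    return None
```

A returned level only means zlib at that level reproduces the bytes; on small inputs several levels can emit identical output, so the "first match" is not necessarily the level originally used.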

Tags

archive, file formats, compression, pristine, tar, cpio, gzip, bzip2, ar, deb, git

Speakers
