Linux Data de-duplication

Scheduled: Friday, September 25, 2009 from 10:50 – 11:35am in Salon E


Data de-duplication is an effective way to reduce large storage needs by eliminating redundant data, and it is an in-demand feature for sharing virtualization OS images and for efficient data storage backups. Adding data de-duplication support to a Linux filesystem would be very valuable, but the feature is also quite challenging. How do we get it right? What is the performance impact? Block level or file level? On-the-fly de-duplication in the filesystem, or background de-duplication in userspace?


With data de-duplication, only one unique instance of the data is actually retained on the storage media, whether disk, tape, or a virtual machine image. This not only reduces large storage needs but also lowers the corresponding costs of backup, disaster recovery, and power consumption. It has become a hot topic as the need for large storage grows, and adding data de-duplication support today would make Linux much stronger in enterprise environments. On-the-fly de-duplication within the filesystem and delayed user-space de-duplication present different challenges. This talk will discuss the challenges and trade-offs of both methods and describe several ideas discussed in the btrfs community. It also encourages input and ideas from the audience about future plans for data de-duplication in Linux.
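To illustrate the "one unique instance" idea, here is a minimal sketch of block-level de-duplication using content hashing. It assumes fixed-size 4 KiB blocks and SHA-256 digests purely for illustration; a real filesystem implementation (such as the btrfs proposals discussed in the talk) would handle variable block sizes, hash collisions, and on-disk metadata very differently.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; real designs vary


def dedup_blocks(data, store):
    """Split data into fixed-size blocks, keeping one copy per unique block.

    `store` maps a content hash to the block bytes; duplicate blocks
    (within this data or across previously stored data) share one entry.
    Returns the ordered list of hashes needed to rebuild `data`.
    """
    recipe = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store only the first instance
        recipe.append(digest)
    return recipe


def rebuild(store, recipe):
    """Reconstruct the original data from the block store and recipe."""
    return b"".join(store[h] for h in recipe)


# Two "files" sharing most of their content, as cloned VM images might.
file_a = b"A" * 8192 + b"B" * 4096
file_b = b"A" * 8192 + b"C" * 4096

store = {}
recipe_a = dedup_blocks(file_a, store)
recipe_b = dedup_blocks(file_b, store)

# Six logical blocks are backed by only three unique stored blocks.
print(len(recipe_a) + len(recipe_b), "logical blocks,", len(store), "stored")
```

The same hashing step is what an on-the-fly design would perform in the write path, while a background userspace design would scan existing blocks and merge duplicates after the fact; the trade-off between the two is a central topic of the talk.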


storage, data deduplication, filesystem


  • Biography

    Mingming has been working in the IBM Linux Technology Center for years. She is interested in filesystems, storage, I/O, and kernel scalability.
