Tracing on large-scale infrastructure

This proposal has been rejected.


One Line Summary

How to best use tracing tools on several thousand servers


Using tracing to monitor issues on a large-scale server infrastructure can generate a huge amount of data. What are the best solutions for handling that data and analysing it to gain insight into the health of the kernel across a server fleet?
We have heard of the Dapper and Zipkin solutions; are there other frameworks or simple tools to aggregate the results of tracing?

Also, how can we facilitate the deployment of tracing tools, whether for debugging or for monitoring?


  • Yannick Brosseau



    Yannick Brosseau is a Production Engineer on the Kernel team at Facebook. As such, he works on improving the stability and performance of the kernels deployed on the Facebook infrastructure and develops testing, monitoring and deployment tools to help in this endeavor. Previously, he was a research associate at École Polytechnique de Montréal, where he worked on performance analysis tools for Linux. He worked on several parts of the LTTng project. He was also an open source software consultant for several years and is a Fedora Packager.