Over the last 25 years, the Linux kernel has grown to become one of the world’s largest open source projects. Overall, its authors have written over 22 million lines of code (including documentation) spread over 50,000 files. As these numbers increase, it is becoming increasingly difficult to keep track of who contributed what in the kernel.

This is why we are introducing the “linux contributor treemap”, a tool that lets the user visualize several metrics across the entire kernel source code. These metrics include an aggregated view of the top authors in the whole kernel, the number of lines of code each has contributed, the age of the first and last commits, a link to the mailing list discussions of the patches touching each file, and a link to the tokenized version of each file. The tool automatically extracts the data from the Linux Git repository, stores it in a database, and then formats it for the treemap visualization.
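
As a rough illustration of the extract, store, and format steps described above, here is a minimal Python sketch. It is not the tool's actual implementation: the function names, the `contrib` table schema, and the choice of SQLite are all assumptions made for this example. It assumes the input text comes from `git log --numstat --format=@%an`, and it counts only added lines per author, ignoring renames and line survival.

```python
import sqlite3
from collections import defaultdict


def parse_numstat(log_text):
    """Yield (author, path, lines_added) tuples.

    Assumes the text was produced by `git log --numstat --format=@%an`,
    so each commit starts with an "@Author" line (hypothetical marker
    chosen for this sketch) followed by "added<TAB>deleted<TAB>path" lines.
    """
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:]
        elif line.strip():
            added, _deleted, path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of a count
                yield author, path, int(added)


def store(rows, conn):
    """Store the parsed rows in a (hypothetical) `contrib` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contrib "
        "(author TEXT, path TEXT, added INTEGER)"
    )
    conn.executemany("INSERT INTO contrib VALUES (?, ?, ?)", rows)
    conn.commit()


def build_treemap(conn):
    """Return a nested dict that mirrors the directory tree.

    Each file node carries its total added lines and its top author,
    which is the kind of data a treemap renderer could consume.
    """
    per_file = defaultdict(dict)
    query = "SELECT path, author, SUM(added) FROM contrib GROUP BY path, author"
    for path, author, added in conn.execute(query):
        per_file[path][author] = added

    root = {"name": "/", "children": {}}
    for path, authors in per_file.items():
        node = root
        for part in path.split("/"):
            node = node["children"].setdefault(
                part, {"name": part, "children": {}}
            )
        node["size"] = sum(authors.values())
        node["top_author"] = max(authors, key=authors.get)
    return root


if __name__ == "__main__":
    # Made-up sample input standing in for real `git log` output.
    sample = (
        "@Alice\n\n10\t2\tkernel/fork.c\n3\t0\tmm/slab.c\n"
        "@Bob\n\n4\t1\tkernel/fork.c\n-\t-\tlogo.png\n"
    )
    conn = sqlite3.connect(":memory:")
    store(parse_numstat(sample), conn)
    print(build_treemap(conn))
```

On the real repository, the sample string would be replaced by the output of the `git log` command, and the nested dictionary could be serialized to JSON for whatever treemap renderer is used.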

In this talk, I will introduce the tool, discuss the techniques used to retrieve the data, and outline directions for future work. The audience will also be able to try the tool during the presentation to get hands-on experience and, hopefully, suggest features that would make the tool more useful.


    Alexandre Courouble

    Polytechnique Montreal


    I am currently a master's student in computer science engineering at Polytechnique Montreal. My research is in mining software repositories. I am currently mining the Linux Git repository.

