jueves, 19 de enero de 2017

git contribution spans (emacs case study)

 Let's play a little, just for the fun of it, and to gather some metainformation about a codebase.

I sometimes would like to find out the commit spans of the different authors. I may not care about the number of commits, but only first and last. This info may be useful to know if a repo has most long time commiters or the majority of contributors are one-off, or one-feature-and-forget.

First we'll sharpen a bit our unix chainsaw:

Here's a little helper that has proven to be very useful. It's similar to uniq, but you don't need to sort first, and also it accepts a parameter meaning the column you want to be unique. (Unique by Column)

function uc () {

    awk -F" " "!_[\$$1]++"


And then, you only have to look at man git-log to find out you can reverse the order of the logs, so you can track the first and the last appearence of each commiter.

git log --format='%aE %ai'  | uc 1 | sort  >/tmp/last

git log --reverse --format='%aE %ai -> '  | uc 1 | sort >/tmp/first

paste /tmp/first /tmp/last > /tmp/spans.txt

Here is the output file. it has some spurious parsing errors, but I think it still shows how powerful insights we can get with just a bit of bash, man and pipelines.

Just by chance, the day after I tried this, I discovered git-quick-stats, which is a utility tool to get simple stats out of a git repo. It's great that I could add the same functionality there too via a Pull Request :).

Big thanks to every one of the commiters in emacs, being long or short span and amount of code contributed. Thanks to y'all