It’s all fun and games until a build fails

Being a company centered around workplace engagement, it should come as no surprise that we are constantly looking for ways to gamify our internal teams. Given that sales teams are our bread and butter, it came quite naturally to us to set up internal games around things like calls made, demos set and leads prospected. Similarly, our marketing team quickly adopted metrics around social media posts. This led to some fierce (but friendly) competition between our internal teams.

This still left us with the unanswered question of how to gamify a dev team. I’m sure we’ve all heard the horror stories of technically illiterate management setting ‘lines of code written’ as a metric. We vowed that whatever process we used to gamify development, it should promote good development practices and, most important of all, be fun.

What we looked for in development metrics

1. Objectivity

We wanted the process of tracking these metrics to be intuitive and require little overhead. If entering information about metrics were a chore, the fun would fly right out the window. The solution was to rely on integrations as much as possible. We found this played quite nicely with our natural instinct as developers to automate as much as possible.

The one consideration was that, not having an in-house NLP team, we would need to limit our metrics to things that were purely objective. While a metric around “Good Code Written” would be nice, it would also be far too subjective.

2. Promotes best practices

While the example of “lines of code written” is a bit extreme, we did find that some of our early ideas for metrics ended up promoting poor development practices. As an example of a scrapped metric:

Number of commits

While the initial idea behind this was to promote atomic commits, we found that it actually prompted people to go too far in the opposite direction. One extreme case led to a commit per line changed.

3. Human oversight

A turning point came when we realised that even the best-laid developer metrics would require human oversight. One of the metrics that we ended up settling on, Unit Tests Written, clearly has the potential to be exploited should an eager developer write 30 tests asserting that 1 == 1. However, with proper oversight and developer management in other facets of the process, this won’t happen. Extending this idea, we decided that any potential way to exploit a metric would be considered acceptable if it could be caught as part of the normal level of human oversight (in this case, code review). In the scrapped example of number of commits, we ultimately decided that oversight on a commit-by-commit basis was too extreme.

4. Frequently occurring

This point isn’t limited to development teams but is still highly relevant. Gamification is most effective when it is able to provide frequent and continual feedback loops. While we might be tempted to create metrics around milestones hit, a feedback loop of ~14 days starts to lose a lot of value very quickly. While not a hard and fast rule, we decided that all chosen metrics should be things that a developer could reasonably achieve at least once per day.

Development Team Metrics in Arcade

What metrics did we choose?

1. Unit Tests Written

As the name suggests, this metric is based around how many unit tests have been written as part of a merged branch. We’ve found that it has been phenomenal in prompting people to write more comprehensive unit tests and to write them more frequently. Stringent code review has ensured that few (if any) vanity tests make it through and it’s definitely shifted the mindset from testing being a chore to something that people are itching to do.

2. Cards Closed

This metric ties in to the number of features a developer delivers. We did find that it prompted us to be more concise in our tasks, and it ultimately led us to the point of a “card” being equal to roughly a day’s work. Due to our desire for code quality, it has actually ended up with a few people drastically exceeding their target for this metric (to be clear, we don’t want people doing 40 days’ work in a month), but it has meant that our tasks are more clearly defined, more likely to get done on time, and we have a far better idea of what realistic output is.

3. Builds Passed

This was a metric we added later (roughly two months after #1 and #2), when we noticed a level of haphazardness developing around pushing failing builds up to master. We introduced a metric that rewarded people for keeping their rate of passing builds above 95%, and the failing-builds habit was soon stamped out.
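To make the 95% target concrete, here’s a minimal sketch in Ruby. The helper name and the decision to count “fixed” builds (a pass that follows a failure) as passes are our assumptions for illustration, not Arcade internals:

```ruby
# Hypothetical helper for the "Builds Passed" target: given a developer's
# recent CI build outcomes, check whether their pass rate meets 95%.
def meets_build_target?(statuses, target: 0.95)
  return false if statuses.empty?

  # A "fixed" build is one that passes after a previous failure, so it
  # counts as a pass alongside "passed"
  passed = statuses.count { |s| %w[passed fixed].include?(s) }
  (passed.to_f / statuses.size) >= target
end
```

A rolling window of recent builds (rather than an all-time tally) keeps the feedback loop short, in line with the “frequently occurring” principle above.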

How do we track the metrics?

We currently use CircleCI for our continuous integration. Our CI setup includes a post-deploy script that checks the output file from our test suite (RSpec for Rails or Jest for React) and sends that information to the Arcade backend. For an idea of what the script looks like:


sha, email, date = `git show -s --format="%h,%aE,%ai"`.strip.split(',')
test_count = `bundle exec rspec --require "./spec/dry_run_formatter.rb" --format DryRunFormatter spec`.strip
repo_name = "api"

# ARCADE_ENDPOINT stands in for our backend URL, which is configured as a
# CI environment variable
command = <<-CMD
curl -X POST --data-urlencode \
  'payload={"sha": "#{sha}", "email": "#{email}", "test_count": "#{test_count}", "date": "#{date}", "repo_name": "#{repo_name}"}' \
  "$ARCADE_ENDPOINT"
CMD

exec command

Cards Closed comes courtesy of the fine folks at GitHub. We have a webhook configured for all of our main repos that fires when we merge a PR into master. We then take that payload information in the Arcade backend and add an event against the corresponding metric for the authoring user.
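As a sketch of what handling that webhook might look like: the field names below follow GitHub’s `pull_request` event payload, while the function name, `cards_closed` label, and returned hash shape are illustrative rather than the real Arcade schema.

```ruby
require "json"

# Turn a GitHub "pull_request" webhook payload into a Cards Closed event.
# Returns nil for PRs that were closed without being merged.
def card_closed_event(raw_payload)
  payload = JSON.parse(raw_payload)
  pr = payload["pull_request"]

  # GitHub sends action "closed" for both merged and abandoned PRs,
  # so we also check the "merged" flag
  return nil unless payload["action"] == "closed" && pr["merged"]

  {
    metric: "cards_closed",
    user: pr["user"]["login"],          # the authoring user to credit
    repo: payload["repository"]["name"],
    merged_at: pr["merged_at"]
  }
end
```

Filtering out unmerged closes matters here: crediting abandoned PRs would reward exactly the kind of gaming that human oversight is meant to catch.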

Also through CircleCI, we’ve configured a ‘build complete’ notification to fire, from which we strip the build status (either fixed, passed or failed) and create the corresponding sale.
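A rough sketch of that notification handler. The field names assume the shape of CircleCI’s 1.x webhook payload, and the `builds_passed` label and returned hash are illustrative:

```ruby
require "json"

# Extract the build status from a CircleCI notification and decide whether
# it earns credit towards the "Builds Passed" metric ("fixed" and "passed"
# both count as passing; "failed" does not).
def build_event(raw_notification)
  build = JSON.parse(raw_notification).fetch("payload")
  status = build["status"]

  {
    metric: "builds_passed",
    email: build["committer_email"],
    status: status,
    credit: %w[fixed passed].include?(status)
  }
end
```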

The results we’ve seen

Just by implementing the above, we saw a dramatic culture shift in a positive direction! Writing unit tests is no longer a chore, failing builds are fewer and farther between, and scoping out tasks is now done with a much higher level of rigour.

In a future post I’ll touch on the concrete numeric improvements, but as both I and the wonderful team around me can attest, this has most definitely been a change for the better!

If you’d like to hear more about how we achieved the above, or learn about how you could use Arcade to incentivise your development team, please do get in touch with us.