Performance, Benchmarking and Allocations in Go

When performance matters and you’ve already ruled out the usual suspects (e.g. blocking operations), unnecessary memory allocations are a good metric to look at.

Fortunately, Go makes this easy to measure: call b.ReportAllocs() in your benchmark functions (the call can go anywhere in the function; here it sits at the end), like so:

func BenchmarkSomething(b *testing.B) {
	for n := 0; n < b.N; n++ {
		// ...
	}
	b.ReportAllocs()
}

You can also pass the -benchmem flag to go test (for example, go test -bench=. -benchmem; the longer form -test.benchmem is accepted too) to make all your benchmarks report allocations.

Adding b.ReportAllocs() adds two columns to the benchmark results: the number of bytes allocated per operation (B/op) and the number of allocations per operation (allocs/op).

Let’s try this out with an actual example:

func Task() []bool {
	var elements []bool
	for i := 0; i < 1000; i++ {
		elements = append(elements, i%2 == 0)
	}
	return elements
}

And now the benchmark:

func BenchmarkTask(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Task()
	}
	b.ReportAllocs()
}

Which yields the following result:

BenchmarkTask-8    	 1000000	      1026 ns/op	    2040 B/op	       8 allocs/op

As you can see, our simple Task() performs a whopping 8 allocations, which is quite a lot given what it’s doing. This post is about which indicators to look at rather than how to track down the cause, so I won’t go into an in-depth explanation. To cut to the chase: because we’re not specifying a capacity for the slice, append has to allocate a new, larger underlying array and copy the existing elements over every time the slice’s capacity is exceeded.
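
To make that growth visible, here is a minimal sketch that records every capacity the slice passes through while 1000 values are appended. The helper name growthSteps is made up for illustration, and the exact capacities depend on the Go version and its allocator size classes, so treat the printed numbers as indicative rather than fixed:

```go
package main

import "fmt"

// growthSteps (a hypothetical helper) appends n bools to a nil slice and
// records each new capacity. Every change in capacity means append had to
// allocate a larger backing array and copy the old elements into it.
func growthSteps(n int) []int {
	var elements []bool
	var caps []int
	lastCap := cap(elements) // 0 for a nil slice
	for i := 0; i < n; i++ {
		elements = append(elements, i%2 == 0)
		if c := cap(elements); c != lastCap {
			caps = append(caps, c)
			lastCap = c
		}
	}
	return caps
}

func main() {
	steps := growthSteps(1000)
	// The capacities are chosen by the runtime and vary between Go versions.
	fmt.Printf("%d reallocations, capacities: %v\n", len(steps), steps)
}
```

Each entry in the printed list corresponds to one allocation made on behalf of append, which is what the allocs/op column is counting.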

If we use make to set the slice’s capacity up front (with a length of 0), the repeated allocations disappear: the underlying array is allocated once, and append never has to grow it.

func Task() []bool {
	elements := make([]bool, 0, 1000)
	for i := 0; i < 1000; i++ {
		elements = append(elements, i%2 == 0)
	}
	return elements
}

Running the benchmark again will yield the following results:

BenchmarkTask-8     	 1730910	       690 ns/op	    1024 B/op	       1 allocs/op

Not only did we go from 8 allocations per operation to 1, we also reduced the duration per operation from 1026ns to 690ns, an improvement of nearly 33%.
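
If you only care about the allocation count, the standard library’s testing.AllocsPerRun can report it without a full benchmark run. The sketch below assumes two hypothetical functions, growing and preallocated, that mirror the two versions of Task() above; sink and allocsFor are illustrative names as well:

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Storing results in it forces the slices
// onto the heap, so the compiler cannot optimize the allocations away.
var sink []bool

// growing mirrors the first version of Task(): no capacity is specified.
func growing() []bool {
	var elements []bool
	for i := 0; i < 1000; i++ {
		elements = append(elements, i%2 == 0)
	}
	return elements
}

// preallocated mirrors the second version of Task(): capacity set up front.
func preallocated() []bool {
	elements := make([]bool, 0, 1000)
	for i := 0; i < 1000; i++ {
		elements = append(elements, i%2 == 0)
	}
	return elements
}

// allocsFor reports the average number of heap allocations per call of f.
func allocsFor(f func() []bool) float64 {
	return testing.AllocsPerRun(100, func() { sink = f() })
}

func main() {
	fmt.Println("growing:     ", allocsFor(growing))
	fmt.Println("preallocated:", allocsFor(preallocated))
}
```

The count for growing depends on the Go version, while preallocated should need a single allocation for its backing array.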

Generally, minimizing the number of allocations is one of the most effective ways to optimize the performance of your applications, but that doesn’t mean that allocations are the only thing you should be looking at.

Using elements := make([]bool, 0, 1000) instead of var elements []bool did remove the useless allocations, but this doesn’t mean that you can no longer optimize this function. In fact, calling append is not as efficient as assigning the value directly, because append has to check the slice’s capacity and update its length on every call.

If we replace append with an assignment and pass only a length to make (the capacity then defaults to the same value), we should be able to get slightly better results:

func Task() []bool {
	elements := make([]bool, 1000)
	for i := 0; i < 1000; i++ {
		elements[i] = i%2 == 0
	}
	return elements
}

As expected, the results are slightly better:

BenchmarkTask-8   	 1885698	       635 ns/op	    1024 B/op	       1 allocs/op
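
To compare variants side by side in a single program, testing.Benchmark can run a benchmark function outside of go test. This is only a sketch: withAppend, withIndex, measure and sink are hypothetical names, and the timings it prints will differ from the figures above depending on your machine and Go version:

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the returned slices alive so the work is not optimized away.
var sink []bool

// withAppend preallocates the capacity and fills the slice using append.
func withAppend() []bool {
	elements := make([]bool, 0, 1000)
	for i := 0; i < 1000; i++ {
		elements = append(elements, i%2 == 0)
	}
	return elements
}

// withIndex preallocates the length and fills the slice by index.
func withIndex() []bool {
	elements := make([]bool, 1000)
	for i := 0; i < 1000; i++ {
		elements[i] = i%2 == 0
	}
	return elements
}

// measure benchmarks f and returns the collected result.
func measure(f func() []bool) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for n := 0; n < b.N; n++ {
			sink = f()
		}
	})
}

func main() {
	a, i := measure(withAppend), measure(withIndex)
	fmt.Println("append:", a.String(), a.MemString())
	fmt.Println("index: ", i.String(), i.MemString())
}
```

Both variants should report a single allocation per operation, so any difference between them comes from the per-element work rather than from memory allocation.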

Bottom line: allocations or not, if you don’t benchmark and compare the results before and after your changes, you’ll be blind to the impact those changes have. Creating benchmarks early in the development lifecycle of your projects is a good habit to pick up, and since Go makes benchmarking this simple, it’d be silly not to take advantage of it.