1. 程式人生 > >Go slices are not dynamic arrays · Applied Go

Go slices are not dynamic arrays · Applied Go

Background: Just recently I observed a few discussions–again–about seemingly inconsistent behavior of slice operations. I take this as an opportunity to talk a bit about slice internals and the mechanics around slice operations, especially append() and bytes.Split().

Go’s slices

The concept of slices in Go is really a clever one. A slice represents a flexible-length array-like data type while providing full control over memory allocations.

This concept is not seen in other languages, and so people new to Go often consider the behavior of slice operations as quite confusing. (Believe me, it happened to me as well.) Looking at the inner workings of slices removes much (if not all) of the confusion, so let’s first have a look at the basics: What are slices, and how do they work?

A slice is just a view on an array

In Go, arrays have a fixed size. The size is even part of the definition of an array, so the two arrays [10]int and [20]int are not just two int arrays of different size but are in fact different types.

Slices add a dynamic layer on top of arrays. Creating a slice from an array neither allocates new memory nor copies anything. A slice is nothing but a “window” to some part of the array. Technically, a slice can be seen as a struct with a pointer to the array element where the slice starts, and two ints describing length and capacity.

This means that typical slice manipulations are cheap. Creating a slice, expanding it (as far as the available capacity permits), moving it back and forth on the underlying array–all that requires nothing more than changing the pointer value and/or one or both of the two int values. The data location does not change.

Fig.1: Slices are just “windows” to an array (click the buttons to see the operations)

This also means that two slices created from the same array can overlap, and after assigning a slice to a new slice variable, both variables now share the same memory cells. Changing one item in one of the slices also change the same item in the other slice. If you want to create a true copy of a slice, create a new slice and use the built-in function copy().

All of this is based on simple and consistent mechanisms. The problems arise when not being aware of these mechanisms.

Some slice functions work in place

Since slices are just efficient “dynamic windows” on static arrays, it does make sense that most slice manipulations also happen in place.

As an example, bytes.Split() takes a slice and a separator, splits the slice by the separator, and returns a slice of byte slices.

But: All the byte slices returned by Split() still point to the same underlying array as the original slice. This may come unexpected to many who know similar split functions from other languages that rely on allocate-and-copy semantics (at the expense of efficiency at runtime).

Fig. 2: bytes.Split() is an in-place operation

Code that ignores the fact that the result of Split() still points to the original data may cause data corruption in a way that neither the compiler nor the runtime can detect as being wrong.

Another unexpected behavior can happen when combining bytes.Split() and append() - but first, let’s have a look at append() alone.

append() adds convenience–and some “magic”

append() adds new elements to the end of a slice, thus expanding the slice. append() has two convenience features:

  • First, it can append to a nil slice, making it spring into existence in the moment of appending.
  • Second, if the remaining capacity is not sufficient for appending new values, append() automatically takes care of allocating a new array and copying the old content over.

Especially the second one can cause confusion, because after an append(), sometimes the original array has been changed, and sometimes a new array has been created, and the original one stays the same. If the original array was referenced by different parts of the code, one reference then may point to stale data.

Fig. 3: The two outcomes of append()

This behavior could be easily characterized as “random”, although the behavior is in fact quite deterministic. An observer who always knows the values of slice length, capacity, and the number of items to append can trivially determine whether append() needs to allocate a new array.

Prior to Go 1.10, append() can also create unexpected results in combination with bytes.Split(). The slices that bytes.Split() return in pre-1.10 have their cap() set to the end of the underlying array. Now when append()ing to the first of the returned slices, the slice grows within the same underlying array, overwriting subsequent slices.

Fig. 4: In Go < 1.10, after splitting (see fig. 2), append to the first returned slice

In Go 1.10 and later,bytes.Split() returns all slices with their capacity set to their own length,rather than that of the underlying array. append() thus is not able to overwrite subsequent slices anymore.

A few demos

The code below demonstrates the discussed Split() and append() scenarios. It also shows how to do achieve an “always copy” semantics when appending.