Skip to content

proposal: Go 2: function values as iterators #43557

Closed as not planned
Closed as not planned
@mknyszek

Description

Proposal: Function values as iterators

Motivation

A number of proposals have come up over the years of supporting some first-class notion of iterator such that loops that use the range keyword may be used to iterate over some custom data structure. The Go community in general has also wondered about the "right" way to describe an iterator, evidenced by this blog post from 2013 that describes many of the ways Go programmers abstract away iteration, and much of this is true today as well.

Overall, however, iteration over a non-builtin data structure may be somewhat error-prone and clunky. To refresh on the various ways that iteration may be abstracted away in Go, let's work through some examples.

Firstly, we may have some tree type with some dedicated iterator type:

type IntTree struct {
    ...
}

func (t *IntTree) Iter() IntTreeIterator {
    ...
}

type IntTreeIterator struct {
    ...
}

func (t IntTreeIterator) Next() (int, bool) {
    ...
}

One way such an "iterator" could be used is,

iter := tree.Iter()
for {
    i, ok := iter.Next()
    if !ok {
        break
    }
    // Do stuff with i.
}

or even,

iter := tree.Iter()
for i, ok := iter.Next(); ok; i, ok = iter.Next() {
    // Do stuff with i.
}

The former usage works fine, but suffers from readability issues. The meaning of the code is obscured somewhat as the reader first sees an apparently infinite for-loop and must look for the "break" statement to understand what's going. Luckily this condition is usually present at the top of the loop, but it requires a more careful look. Furthermore, the iteration condition needs to be written explicitly. Writing it once may not be a problem, but writing it 100 times might be.

The latter usage also works fine and the intent is more clear, but it has a similar problem with the iteration condition. There's also an element of repetition which on the surface is fine, but it does harm the readability of the loop. Especially with variable names like "i" it becomes easy to get lost in punctuation.

Another way to abstract iteration away is to pass a function value to a function that iterates on behalf of the caller. For example:

type IntTree struct {
    ...
}

func (t *IntTree) ForEach(func(i int)) {
    ...
}

tree.ForEach(func(i int) {
    // Do stuff with i.
})

This method works well in many scenarios, but is decidedly less flexible as it separates the loop body from the surrounding code. Capturing local variables in that function value helps, but potentially at the cost of some efficiency, depending on the complexity of the iteration. One advantage of this method, though, is that defer may be used to perform clean up on each loop iteration, without allocating a defer (thanks to open-coded defers).

Prior art

A previous proposal (#40605) suggested allowing types with a Next method to have that method repeatedly called when used in a range loop:

iter := tree.Iter()
for i := range iter {
    // Do stuff with i.
}

This works fine, but from my perspective, doesn't feel very Go-like. Having a language construct be aware of a type's methods for the sake of syntactic sugar is not a pattern found anywhere else in the language (yet). In the parlance of the generic design, the existing range keyword matches on the underlying type used, not the type's methods.

Furthermore, it usually requires defining a new type which is a bit more work for the writer of the code as well as the reader. Overall a bit clunky, but not bad. It lines up well with how other languages work. Rust, for instance, uses a trait (analogous to an interface) to determine whether a type is an iterator.

Another previous proposal (#21855) suggested supporting a two-clause for-loop to make iterating less error-prone, such as:

iter := tree.Iter()
for i, ok := iter.Next(); ok {
    // Do stuff with i.
}

Unfortunately, a two-clause loop form is itself error-prone, as the placement of a single semicolon has a significant effect on semantics (specifically, the second semicolon which distinguishes between the 2-clause and 3-clause forms). This proposal was rejected because that semicolon was considered too dangerous.

Other proposals (#24282) have been made to fundamentally change how for loops work, indicating at least some degree of friction.

Proposal

Rolling with the idea of closures, and with the observation that the range form matches on types largely structurally, I would like to propose that range loops repeatedly apply function values of a particular form.

More specifically, I propose allowing the for-range statement to accept values of type func() (T, bool) or func() (T, S, bool) (where T and S are placeholder types; they may be any substituted for any other type) and will repeatedly call these values until their bool result is false.

Iterators may then be written in the following way:

type IntTree struct {
    ...
}

func (t *IntTree) Iter() func() (int, bool) {
    ...
    curr := t.root
    return func() (int, bool) {
        // Use curr to keep track of your position, repeatedly find the successor of curr
        // until there is none.
    }
}

for i := range tree.Iter() {
    // Do stuff with i.
}

More precisely, the last for statement "de-sugars" into the following code, where tmp is a temporary variable not visible to the program:

tmp := tree.Iter()
for i, ok := tmp(); ok; i, ok = tmp()  {
    // Do stuff with i.
}

The limitation to variables of type func() (T, bool) or func() (T, S, bool) (instead of allowing an arbitrary number of return values) is to keep range loops looking familiar and to avoid a misuse of the syntax.

Discussion and observations

Pros:

  • Efficient: with appropriate inlining, the Go compiler can stack-allocate any values captured by the function value returned by Iter and perform a direct function call for each iteration (theoretically that could be inlined further, but it depends).
  • Shifts API complexity to the writer: users get all the benefits of the range form, but writers have a little more work to do to generate the function value.
  • Lightweight: allows the writer to avoid defining a new type in many cases.
  • Sets the standard: by picking this structure, it sanctions a particular way of writing iterators at the language level.
  • Composes well: with generics, iterators defined this way can be easily transformed (map, filter, zip, etc.).

Cons:

  • Hidden behavior: similar to operator overloading, there's now one more thing that range could mean, and each call could be arbitrarily expensive (e.g. a series of HTTP requests).
    • Counter-point: (credit to @mvdan) channels may cause a for-range loop to block for an arbitrarily long time, and copies created during iteration could also cause surprising CPU usage.
  • Outsized impact: a whole language change for something that is arguably only a minor issue.
  • Unfamiliar: users coming from other popular statically-typed languages might find this behavior surprising.
  • Ambiguity: the range keyword doesn't always make sense to the reader (it does for custom data structures and in some other cases, though).
  • Ambiguity: the type func() (T, bool) may not always mean "iterator." A Next method is more explicit.
    • Counter-point: the fact that it doesn't accept any arguments is a pretty good signal, though.

This idea was inspired by the way Python generators are used as iterators. I recently had the realization that Python generators can approximated by repeated application of closures. I thought that this would have been proposed already, but I couldn't find anything like it. Interestingly, this proposal also allows for iterating over infinite generators and such. It can also be used to write Python-like loops (for better or for worse):

// Linspace returns an iterator that iterates from start (inclusive)
// to stop (exclusive) by stride. start must be <= stop.
func Linspace(start, stop, stride int) func() (int, bool) {
    i := start
    return func() (int, bool) {
        i += stride
        return i, i < stop
    }
}

for i := range Linspace(0, 10, 2) {
    // i = 0, 2, 4, 6, 8
}

It might also be worth allowing the iterator's final return value to be an error instead of a bool, though I'm not totally sure how to surface that to the user.

I'm not even totally convinced myself that this is worth doing, since the benefit seems minor at best, but I figured I should at least put the idea out there.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions