proposal: Go 2: function values as iterators

# Proposal: Function values as iterators

## Motivation

A number of proposals have come up over the years of supporting some first-class notion of iterator such that loops that use the `range` keyword may be used to iterate over some custom data structure. The Go community in general has also wondered about the "right" way to describe an iterator, evidenced by [this blog post](https://ewencp.org/blog/golang-iterators/index.html) from 2013 that describes many of the ways Go programmers abstract away iteration, and much of this is true today as well.

Overall, however, iteration over a non-builtin data structure may be somewhat error-prone and clunky. To refresh on the various ways that iteration may be abstracted away in Go, let's work through some examples.

Firstly, we may have some tree type with some dedicated iterator type:

```
type IntTree struct {
    ...
}

func (t *IntTree) Iter() IntTreeIterator {
    ...
}

type IntTreeIterator struct {
    ...
}

func (t IntTreeIterator) Next() (int, bool) {
    ...
}
```

One way such an "iterator" could be used is,

```
iter := tree.Iter()
for {
    i, ok := iter.Next()
    if !ok {
        break
    }
    // Do stuff with i.
}
```

or even,

```
iter := tree.Iter()
for i, ok := iter.Next(); ok; i, ok = iter.Next() {
    // Do stuff with i.
}
```

The former usage works fine, but suffers from readability issues. The meaning of the code is obscured somewhat as the reader first sees an apparently infinite for-loop and must look for the "break" statement to understand what's going. Luckily this condition is usually present at the top of the loop, but it requires a more careful look. Furthermore, the iteration condition needs to be written explicitly. Writing it once may not be a problem, but writing it 100 times might be.

The latter usage also works fine and the intent is more clear, but it has a similar problem with the iteration condition. There's also an element of repetition which on the surface is fine, but it does harm the readability of the loop. Especially with variable names like "i" it becomes easy to get lost in punctuation.

Another way to abstract iteration away is to pass a function value to a function that iterates on behalf of the caller. For example:

```
type IntTree struct {
    ...
}

func (t *IntTree) ForEach(func(i int)) {
    ...
}

tree.ForEach(func(i int) {
    // Do stuff with i.
})
```

This method works well in many scenarios, but is decidedly less flexible as it separates the loop body from the surrounding code. Capturing local variables in that function value helps, but potentially at the cost of some efficiency, depending on the complexity of the iteration. One advantage of this method, though, is that `defer` may be used to perform clean up on each loop iteration, without allocating a defer (thanks to open-coded defers).

## Prior art

A previous proposal (#40605) suggested allowing types with a `Next` method to have that method repeatedly called when used in a range loop:

```
iter := tree.Iter()
for i := range iter {
    // Do stuff with i.
}
```

This works fine, but from my perspective, doesn't feel very Go-like. Having a language construct be aware of a type's methods for the sake of syntactic sugar is not a pattern found anywhere else in the language (yet). In the parlance of the generic design, the existing `range` keyword matches on the _underlying_ type used, not the type's methods.

Furthermore, it usually requires defining a new type which is a bit more work for the writer of the code as well as the reader. Overall a bit clunky, but not bad. It lines up well with how other languages work. Rust, for instance, uses a trait (analogous to an interface) to determine whether a type is an iterator.

Another previous proposal (#21855) suggested supporting a two-clause for-loop to make iterating less error-prone, such as:

```
iter := tree.Iter()
for i, ok := iter.Next(); ok {
    // Do stuff with i.
}
```

Unfortunately, a two-clause loop form is itself error-prone, as the placement of a single semicolon has a significant effect on semantics (specifically, the second semicolon which distinguishes between the 2-clause and 3-clause forms). This proposal was rejected because that semicolon was considered too dangerous.

Other proposals (#24282) have been made to fundamentally change how `for` loops work, indicating at least some degree of friction.

## Proposal

Rolling with the idea of closures, and with the observation that the range form matches on types largely _structurally_, I would like to propose that range loops repeatedly apply function values of a particular form.

More specifically, I propose allowing the for-range statement to accept values of type `func() (T, bool)` or `func() (T, S, bool)` (where `T` and `S` are placeholder types; they may be any substituted for any other type) and will repeatedly call these values until their `bool` result is false.

Iterators may then be written in the following way:

```
type IntTree struct {
    ...
}

func (t *IntTree) Iter() func() (int, bool) {
    ...
    curr := t.root
    return func() (int, bool) {
        // Use curr to keep track of your position, repeatedly find the successor of curr
        // until there is none.
    }
}

for i := range tree.Iter() {
    // Do stuff with i.
}
```

More precisely, the last `for` statement "de-sugars" into the following code, where `tmp` is a temporary variable not visible to the program:

```
tmp := tree.Iter()
for i, ok := tmp(); ok; i, ok = tmp()  {
    // Do stuff with i.
}
```

The limitation to variables of type `func() (T, bool)` or `func() (T, S, bool)` (instead of allowing an arbitrary number of return values) is to keep range loops looking familiar and to avoid a misuse of the syntax.

## Discussion and observations

Pros:
- Efficient: with appropriate inlining, the Go compiler can stack-allocate any values captured by the function value returned by `Iter` and perform a direct function call for each iteration (theoretically that could be inlined further, but it depends).
- Shifts API complexity to the writer: users get all the benefits of the range form, but writers have a little more work to do to generate the function value.
- Lightweight: allows the writer to avoid defining a new type in many cases.
- Sets the standard: by picking this structure, it sanctions a particular way of writing iterators at the language level.
- Composes well: with generics, iterators defined this way can be easily transformed (map, filter, zip, etc.).

Cons:
- Hidden behavior: similar to operator overloading, there's now one more thing that `range` could mean, and each call could be arbitrarily expensive (e.g. a series of HTTP requests).
    - Counter-point: (credit to @mvdan) channels may cause a for-range loop to block for an arbitrarily long time, and copies created during iteration could also cause surprising CPU usage.
- Outsized impact: a whole language change for something that is arguably only a minor issue.
- Unfamiliar: users coming from other popular statically-typed languages might find this behavior surprising.
- Ambiguity: the `range` keyword doesn't always make sense to the reader (it does for custom data structures and in some other cases, though).
- Ambiguity: the type `func() (T, bool)` may not always mean "iterator." A `Next` method is more explicit.
    - Counter-point: the fact that it doesn't accept any arguments is a pretty good signal, though.

This idea was inspired by the way Python generators are used as iterators. I recently had the realization that Python generators can approximated by repeated application of closures. I thought that this would have been proposed already, but I couldn't find anything like it. Interestingly, this proposal also allows for iterating over infinite generators and such. It can also be used to write Python-like loops (for better or for worse):

```
// Linspace returns an iterator that iterates from start (inclusive)
// to stop (exclusive) by stride. start must be <= stop.
func Linspace(start, stop, stride int) func() (int, bool) {
    i := start
    return func() (int, bool) {
        i += stride
        return i, i < stop
    }
}

for i := range Linspace(0, 10, 2) {
    // i = 0, 2, 4, 6, 8
}
```

It might also be worth allowing the iterator's final return value to be an `error` instead of a `bool`, though I'm not totally sure how to surface that to the user.

I'm not even totally convinced myself that this is worth doing, since the benefit seems minor at best, but I figured I should at least put the idea out there.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: Go 2: function values as iterators #43557

Proposal: Function values as iterators

Motivation

Prior art

Proposal

Discussion and observations

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

proposal: Go 2: function values as iterators #43557

Description

Proposal: Function values as iterators

Motivation

Prior art

Proposal

Discussion and observations

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions