Rewrite record updates to record constructors #3773

Merged: 14 commits, merged into gleam-lang:main from monomorphised-withfields on Nov 19, 2024

Conversation

@joshi-monster (Contributor) commented Nov 1, 2024

Vastly improve performance for record updates on the JavaScript target, making them basically as fast as constructing a new record. The trade-off here is that every custom type class definition that can be used with the record update syntax doubles in size.
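
To make the trade-off concrete, here is a hand-written sketch (TypeScript for illustration, not the compiler's actual output) of what a generated class with a monomorphised `withFields` might look like for the `Stats` type used in the benchmarks below:

```typescript
// Minimal stand-in for the CustomType base class in Gleam's JS prelude.
class CustomType {}

// Hypothetical generated class: the monomorphised `withFields` spells out
// every field once, roughly doubling the class definition, but performs
// no runtime reflection when a record update runs.
class Stats extends CustomType {
  constructor(
    public min: number,
    public max: number,
    public avg: number,
    public count: number,
    public sum: number,
  ) {
    super();
  }

  withFields(fields: Partial<Stats>): Stats {
    return new Stats(
      "min" in fields ? fields.min! : this.min,
      "max" in fields ? fields.max! : this.max,
      "avg" in fields ? fields.avg! : this.avg,
      "count" in fields ? fields.count! : this.count,
      "sum" in fields ? fields.sum! : this.sum,
    );
  }
}
```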

Benchmarks:

Benchmark 1: gleam run -m record_updates
  Time (mean ± σ):     33.187 s ±  1.375 s    [User: 32.737 s, System: 0.169 s]
  Range (min … max):   31.209 s … 35.113 s    10 runs

Benchmark 2: ../target/debug/gleam run -m record_updates
  Time (mean ± σ):      5.235 s ±  0.127 s    [User: 5.189 s, System: 0.086 s]
  Range (min … max):    5.113 s …  5.495 s    10 runs
 
Benchmark 3: gleam run -m record_reconstruction
  Time (mean ± σ):      5.579 s ±  0.509 s    [User: 5.498 s, System: 0.092 s]
  Range (min … max):    5.021 s …  6.231 s    10 runs
 
Benchmark 4: gleam run -m recursive_function
  Time (mean ± σ):      4.166 s ±  0.402 s    [User: 4.103 s, System: 0.086 s]
  Range (min … max):    3.867 s …  5.052 s    10 runs

The benchmarks above run the following functions; each function is called 10,000 times on a list of random integers:

pub type Stats {
  Stats(min: Int, max: Int, avg: Int, count: Int, sum: Int)
}

fn record_updates(data: List(Int)) -> Stats {
  use stats, n <- list.fold(data, from: shared.empty_stats)
  Stats(
    ..stats,
    min: int.min(stats.min, n),
    max: int.max(stats.max, n),
    // NOTE: this is probably wrong, I just wanted something else to do
    avg: { stats.avg * stats.count + n } / { stats.count + 1 },
    count: stats.count + 1,
    sum: stats.sum + n,
  )
}

fn record_reconstruction(data: List(Int)) -> Stats {
  use stats, n <- list.fold(data, from: shared.empty_stats)
  Stats(
    min: int.min(stats.min, n),
    max: int.max(stats.max, n),
    avg: { stats.avg * stats.count + n } / { stats.count + 1 },
    count: stats.count + 1,
    sum: stats.sum + n,
  )
}

fn recursive_function(data: List(Int), min: Int, max: Int, avg: Int, count: Int, sum: Int) -> Stats {
  case data {
    [] -> Stats(min:, max:, avg:, count:, sum:)
    [n, ..data] ->
      recursive_function(
        data,
        int.min(min, n),
        int.max(max, n),
        { avg * count + n } / { count + 1 },
        count + 1,
        sum + n,
      )
  }
}

I noticed this in practice while implementing my string-width layout functions; those use a list.fold-like apparatus to loop through strings in a special way that would lead to lots of code duplication if inlined into recursive functions. limit, for example, has an internal state of 9 record fields but only updates 2 or 3 of them each iteration. Removing the record updates improved performance on JavaScript by 70% in practice for this function, which I found really surprising.

@joshi-monster changed the title from "Monomorphised withfields" to "Javascript target: Monomorphised withFields" on Nov 1, 2024
@lpil (Member) commented Nov 2, 2024

Love this! Improving this area would benefit everyone.

Monomorphisation sounds like a good strategy to me. How about we go all the way and monomorphise it fully to a regular constructor call? That way there would be no runtime reflection at all, and the generated code size might even go down.

If we implement it by converting an update into a call during analysis, we could also get this one for free: #745

The main thing we'd need to do is to ensure we don't sacrifice good error messages.

@joshi-monster (Contributor, Author) commented Nov 2, 2024

Hi, thank you!! ☺️

I went for this change instead of emitting constructor calls at the call site for these reasons:

  • You can put arbitrary expressions as the extended record, which would mean we would have to introduce a new local variable/IIFE or make this optimisation only work for "simple" cases, where the expression is just a variable/field.
  • It was already somehow consistently faster than regular constructor calls, so I didn't want to overly complicate things.
  • We would not be able to get rid of withFields entirely, since it's part of the FFI interface. Emitting custom implementations benefits FFI code as well.

I'm not sure fully replacing everything with constructor calls reduces code size; if we assume large enough records where only a few fields are updated each time (e.g. a Lustre component model), withFields should always be smaller, since it only needs to list all of the record's fields once instead of at every record update site. With a 9-field record updated at N call sites, for instance, withFields spells out the 9 fields once, while inlined constructor calls repeat all 9 at each of the N sites.

Type-checking them as if they were record constructor calls would probably be a nice way to implement #745.

@lpil (Member) commented Nov 2, 2024

> You can put arbitrary expressions as the extended record, which would mean we would have to introduce a new local variable/IIFE or make this optimisation only work for "simple" cases, where the expression is just a variable/field.

There's no issue here, we frequently do this.

> It was already somehow consistently faster than regular constructor calls, so I didn't want to overly complicate things.

How can that be, given the new function is a regular constructor call plus some additional runtime reflection? We'd need to benchmark this further to identify the cause of the slowdown.

> We would not be able to get rid of withFields entirely, since it's part of the FFI interface. Emitting custom implementations benefits FFI code as well.

We would not remove it, we would just not use it. It's a 3-line function, so there's no real cost to having it.
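
For reference, the helper being discussed lives in the JavaScript prelude and looks roughly like the following; this is a TypeScript paraphrase, and the shipped code may differ in detail:

```typescript
// Paraphrase of the prelude's generic record-update helper: walk every
// own property of the record, substitute the overridden fields, and call
// the concrete constructor with the resulting positional arguments.
class CustomType {
  withFields(fields: Record<string, unknown>): this {
    const properties = Object.keys(this).map((label) =>
      label in fields ? fields[label] : (this as any)[label],
    );
    return new (this.constructor as new (...args: unknown[]) => this)(
      ...properties,
    );
  }
}
```

Note that this `Object.keys` walk over every field on each update is consistent with the linear scaling in field count reported later in this thread.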

@joshi-monster (Contributor, Author) commented Nov 5, 2024

Hi again!

I now change the AST during type inference to make the following transformation:

let updated = Wibble(..old_record(), a: 5)
// -->
let updated = {
  let _record = old_record()
  Wibble(a: 5, b: _record.b, c: _record.c)
}
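
For illustration, the JavaScript emitted for the rewritten form can then be an ordinary constructor call, along these lines (a sketch with stand-in definitions, written as TypeScript; not the compiler's literal output):

```typescript
// Stand-ins for the `Wibble` type and `old_record` function above.
class Wibble {
  constructor(
    public a: number,
    public b: number,
    public c: number,
  ) {}
}
const old_record = (): Wibble => new Wibble(1, 2, 3);

// `Wibble(..old_record(), a: 5)` becomes: evaluate the base record once,
// then pass every remaining field positionally to the plain constructor.
const _record = old_record();
const updated = new Wibble(5, _record.b, _record.c);
```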

This also means that record updates are now type-checked like the rewritten snippet above, fixing #745.

I still kept most of the old infer_record_update intact, and also kept a separate RecordUpdate AST node, to make sure the compiler errors don't get worse.

This also implicitly fixes the same bug as #3795, since I have to keep track of which fields I have to add from the base record. I haven't added that to the changelog or copied that test, though, because Gears was faster than me 😄

Right now, this doesn't detect use of that feature. This turned out to be quite hard, since I would have to somehow check that the old way of typechecking would have still succeeded; basically I would have to check that my return_type unifies with the record_type from before. Doing so could potentially make the return type more specific than it would have to be though, if a type variable is not used in the variant being constructed.

Oh and also I think this is even faster and works for both targets now; I was just being too lazy in my benchmarks and did not eliminate the extra function call.

@joshi-monster (Contributor, Author) commented:

I also fixed some typos I noticed along the way; I hope that's ok :)

@joshi-monster marked this pull request as ready for review on November 5, 2024 at 22:20
@joshi-monster changed the title from "Javascript target: Monomorphised withFields" to "Rewrite record updates to record constructors" on Nov 5, 2024
@GearsDatapacks (Member) commented:

Ah cool, this is ready sooner than I thought. If this gets merged, I'll close my PR.

@lpil (Member) commented Nov 6, 2024

> Right now, this doesn't detect use of that feature. This turned out to be quite hard, since I would have to somehow check that the old way of typechecking would have still succeeded; basically I would have to check that my return_type unifies with the record_type from before. Doing so could potentially make the return type more specific than it would have to be though, if a type variable is not used in the variant being constructed.

Let's leave it in this case. I don't want to sacrifice compilation performance for this.

> Oh and also I think this is even faster and works for both targets now; I was just being too lazy in my benchmarks and did not eliminate the extra function call.

Do you have benchmarks? It would be great to be able to boast about this!

@lpil (Member) left a review:

Thank you! I'm really excited for this one! I've left a few minor notes inline

Inline review comments were left on compiler-core/src/type_/error.rs and compiler-core/src/type_/expression.rs.
@joshi-monster (Contributor, Author) commented Nov 6, 2024

> Do you have benchmarks? It would be great to be able to boast about this!

Basically record updates are now always as fast as constructing new records (which means they are very fast), no matter what. The extra IIFE/block does not make a measurable difference.

JavaScript previously was absolutely terrible, and the benchmarks from my first post should give you a good idea. The runtime there seems to scale linearly with the number of fields defined in the record constructor: with the benchmark above, 10 fields take ~1 minute and 20 fields take ~2 minutes, even if the fields are just copied over. 20 fields with the new implementation take 6 seconds for me.

Erlang was less bad because of its optimistic mutation optimisation, but all those nested setelement calls were still not ideal, so Erlang is now also roughly 2 times faster. This surprisingly does not scale with the number of fields being updated:

Benchmark 1: gleam run -m record_updates
  Time (mean ± σ):      8.964 s ±  0.201 s    [User: 9.193 s, System: 0.109 s]
  Range (min … max):    8.698 s …  9.307 s    10 runs

Benchmark 2: ../target/debug/gleam run -m record_updates
  Time (mean ± σ):      4.301 s ±  0.253 s    [User: 4.403 s, System: 0.091 s]
  Range (min … max):    3.652 s …  4.569 s    10 runs
 
Benchmark 3: gleam run -m record_reconstruction
  Time (mean ± σ):      4.258 s ±  0.041 s    [User: 4.366 s, System: 0.092 s]
  Range (min … max):    4.195 s …  4.347 s    10 runs
 
Benchmark 4: gleam run -m recursive_function
  Time (mean ± σ):      2.414 s ±  0.103 s    [User: 2.520 s, System: 0.094 s]
  Range (min … max):    2.361 s …  2.705 s    10 runs

@lpil (Member) commented Nov 12, 2024

> This surprisingly does not scale with the number of fields being updated

Wow! That's very surprising. I wonder why that is.

I will review and merge this once the release is done!

@joshi-monster (Contributor, Author) commented:

Ohh ok, merge commits are not allowed; I just clicked through the GitHub UI because I had an ADHD moment at work ^^

I'll rebase again once 1.6 is released!

@lpil (Member) left a review:

Amazing! Thank you!!

@lpil merged commit 9e2977f into gleam-lang:main on Nov 19, 2024
12 checks passed
@joshi-monster deleted the monomorphised-withfields branch on November 19, 2024