-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve match ergonomics #1944
Improve match ergonomics #1944
Conversation
Prevent some of the boilerplate required for match expressions by auto- dereferencing the target expression, and 'auto-referencing' pattern variables. The rules for the latter are based closely on those for closures. For example, current: ``` match *x { Foo(ref x) => { ... } Bar(ref mut y, z) => { ... } } ``` proposed: ``` match x { Foo(x) => { ... } Bar(y, z) => { ... } } ```
Target auto-deref seems fine to me. I can imagine some potential surprises from it, but nothing worse than we currently get from method calls. Arm auto-ref, and especially arm auto-ref-mut, makes me very nervous. I think we need to consider the case of programs that didn't compile before, do compile now, and shouldn't. In particular, with the help of type inference, I've seen people analyzing programs on the Rust IRC channels (and I've written a few myself) where the types accidentally ended up with double-references and similar (e.g. I do agree that |
One other thought: it seems a bit inconsistent to apply this only to match, and not to if let Some(ref y) = x {
...
} |
is this bad? I've hit this myself, and it feels like a type error where I've written the wrong code and the compiler has caught it. That doesn't seem any worse than any other kind of type error.
Can you give an example of how?
This is a good point, but I don't think we are - like closures, we're introducing something that mostly 'just works'. Where we have to explain the mechanism we can insert the
The RFC proposes applying this to |
Consider this example:
This gives a type error. You have to do I don't think the RFC as written covers this case, which has not been uncommon in my experience. Could it possibly? Is there a reason we can't autoref as well as autoderef the target, in the same style as we handle methods today? Philosophically, it seems appropriate that the transformation on the target of a match should be the same as the transformation on the receiver of a method. |
I've helped debug people's programs on #rust and #rust-beginners, where in the natural course of type inference, people have ended up with double-references and vectors/slices of double-references, in programs that type-checked and compiled just fine. Then, when trying to make a change, they ended up with a type error, and suddenly wondered "how did I end up with This seems to happen especially often when dealing with iterators, and functions over iterators such as maps/filters/etc, which tend to introduce another level of reference. I worry that auto-ref and auto-deref will make it even easier to dig even further before realizing you have a weird type. (And then wonder "why didn't Rust (with its strong typing) catch this sooner?".) I wonder if we could introduce a lint of some kind for variables or expressions that unnecessarily have a type like |
I assume Anyway, I don't think there is a technical reason not to do auto-ref on match targets, however, I have not given it as much thought as auto-deref. My reason for avoiding it is to avoid implicitly borrowing owned data (i.e., we only convert one kind of borrowing to another). This is the philosophy behind which conversions we allow for 'deref' coercions, and it just 'felt' right to me here. I can totally see the argument for allowing auto-ref - it makes things more flexible, should work fine, and is analogous to the conversions we do with the dot operator - and I don't have a strong argument against it. It is however less conservative - it would be easier to add it later than to take it away. |
Thanks for the expanded explanation. Personally, I think such a lint and this proposal are fairly orthogonal, however, I see where you are coming from here and I agree it could be helpful (especially if it runs even if compilation fails). In fact, independent of this proposal, I think such a lint would be helpful in general - afaik there is never a reason to have |
duration of the match. It also does not work with nested types, `match (*e,) | ||
...` for example is not allowed. | ||
|
||
In either case if we further bind variables, we must ensure that we do not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This of course is still a problem if you don't use derefs - you'll just get "use of move data" errors instead of "move out of borrow" errors.
|
||
Auto-deref allows the programmer to access borrowed data within a limited scope | ||
without having to explicitly dereference the data. This is analogous to the | ||
implicit dereferencing performed by the dot operator, however, the scope of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"scope of the dereference"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "the scope of dereference is inline, rather than defined by a function" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read it as "the scope of the dereference is the match expression".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the dereference exists for the duration of the match expression c.f. the duration of a function call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for the dereference to exist? There is no temporaries that can "stop existing" in both function calls and match expressions. The resulting value lives as long as borrowck lets it live.
patterns. In order to be backwards compatible, we must be able to fall back to | ||
the type known from the patterns if type inference is stuck without that | ||
information. I think this will only require choosing `m = 0` if type inference | ||
can't make progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should act like deref coercions - if we have a match with k
derefs, proceed with it even if it "forces" some inference variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case I'm covering here is if we get stuck (if we can proceed by forcing inference, we should) and can't find m
/k
. Then we arbitrarily choose 0, and try to continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get stuck = encounter a type error for every k
? Then continuing is a matter of error recovery, right?
There are 4 situations:
- There is some
k
such that we can proceed usingk
autoderefs, possibly "forcing inference". In such case, choose the minimal suchk
and proceed. - For all
j < k
, we can autoderefj
times but can't proceed with the match, and thek
th autoderef is ambiguous - in that case, we generally report a "the type of this value must be known in this context" error and do something random for error recovery, through coercion forces 0 autoderefs. - For all
j < k
, we can autoderefj
times but can't proceed with the match, and thek
th autoderef can't be done. This is an error. To display the error to the user, we can e.g. show the type error you get from 0 autoderefs. - For all
j < RECURSION_LIMIT
, we can autoderefj
times but can't proceed with the match. Report a recursion limit error.
Because the current situation corresponds to 0 autoderefs, and the 0th autoderef always succeeds, cases (2), (3) and (4) would have caused an error pre-RFC, and so aren't backwards-compatibility issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of an example like
match from_str("foo").unwrap() {
Foo { ... } => { ... }
}
where the type of the target is completely unknown. We currently use the type of the pattern to infer the type of the target. Under the rules in the RFC we can't do that - we need the type of the target and pattern to decide on the number of derefs. In cases like this, we must pick zero derefs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what "forcing inference" means.
``` | ||
let mut i: i32 = 0; | ||
match i { | ||
j => j += 1, // `j` is treated as `ref mut j` since `+=` takes `j` as `&mut i32`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
silent binding mutability? This looks scary. I would prefer mut j
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a valid alternative, IMO. However, the back-compat question is more complex because if you write mut x
that would force x
to be by value in the current proposal. If we allow that to also be something like ref mut x
, then the user can't force a mutable value in any way - we'd need move
in this case, and we still might not be 100% back compat, we'd need to look into that.
More conservatively, we could just not to any auto-ref'ing for mutable variables - roughly, we would infer ref
, but never ref mut
. That is obviously less flexible, but avoids implicit mutable binding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a valid alternative, IMO. However, the back-compat question is more complex because if you write mut x that would force x to be by value in the current proposal.
But the current proposal doesn't allow you to force an immutable move, no? Is it your intent that users would have to write mut x
when they want to move x, whether or not they need mutability? That would raise a warning and seems like the wrong thing to teach people to do.
If we want users to be able to force a by-value binding, we should add the move
keyword in patterns to do that. It seems separate from inferring mut x
to a ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it your intent that users would have to write mut x when they want to move x
No, my belief is that 'inference' of move/borrow should be perfect (c.f., closures) and so move
or forcing moving a single variable is not necessary. I agree that would be the wrong thing to do.
If we want users to be able to force a by-value binding, ... It seems separate from inferring mut x to a ref.
You might be right (I admit, I haven't thought this through as thoroughly as the actual proposal. My worry is that with mutability, there is another category of use that doesn't exist without it. Consider
match x {
mut y => { y.f = 0 }
}
Should this mutate x
or not (or, IOW should x
be moved)? There is no guide for the compiler whether y
should be borrowed or not (both would compile) and the semantics are clearly different. If we do adopt the mut
requirement and infer this as ref mut
we break back compat, as far as I can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, x
is moved, so while the semantics are different, they're not observable in any code that compiles today, are they? I guess this is what you mean about needing to look into the backcompat issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do adopt the
mut
requirement and infer this asref mut
we break back compat, as far as I can see.
We break back compat even if the code compiles without the mut
. If we want to keep back-compat, we need to make mut
force by-move - though that would leave Cell
as a nasty edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comex If the data is Copy, it is never referenced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Closure inference does not do anything smart wrt. by-ref vs. by-value" I'm not sure what you mean by "smart" here. Closure inference clearly does some fairly smart stuff to know whether to borrow or move data. AIUI, you only need to mark a closure move
if it is returned by a function or otherwise escapes its stack frame, in that case the inference cannot be perfect and the programmer has to handle this stuff manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If we want to keep back-compat, we need to make mut force by-move" - right, this is what the RFC currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nrc Oops, I missed that. Doesn't that turn implementing Copy
for a given type into a breaking change? Especially in the alternative where mut y
can be a mutable reference, but even as written in edge cases, e.g. a huge array that won't fit on the stack.
} | ||
``` | ||
|
||
To opt out of all implicit dereferencing in a pattern, we could allow the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move-mode would also be the default for let-statements (which are basically a one-armed match).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, though not if let
Implicit "mut" is a dangerous thing in my opinion. I think "mut" should be always explicit only. |
Allowing auto-ref without a matching auto-deref will change drop order (possibly introducing deadlock in programs that rely on it). Consider: #[derive(Debug)]
struct Dropper;
impl Drop for Dropper {
fn drop(&mut self) {
println!("dropping");
}
}
fn main() {
let thing = Some(Dropper);
match thing {
Some(thing) => { // Auto ref will turn this into a reference instead of a move.
println!("{}", thing);
// Currently drops here.
},
None => {},
}
println!("after");
} This currently prints "Dropper", "dropping", then "after". After this proposed change, it would print "Dropper", "after", "dropping". Also, please see my objections on the internals thread. |
Consider a pattern I'd suggest starting with a minimal RFC for auto-deref of target only, which is much less controversial. |
Example of a deadlock after @Stebalien's example: use std::sync::Mutex;
fn main() {
let x = Mutex::new(());
let lock = x.lock();
match lock {
Ok(_lock) => { } // deadlocks if we infer this to `ref _lock`
_ => ()
};
let _ = x.lock();
} |
Given the way drop order affects this, it seems to me like there are two possible avenues:
|
The fact that adding this sugar opens up so many thorny issues makes me wonder if it will really make anybody's life easier. Perhaps this makes code easier to write, but does it make it easier to read? Per the evaluation procedure in https://blog.rust-lang.org/2017/03/02/lang-ergonomics.html I'd think this worrisomely context-dependent. |
Could |
clippy does this for existing code already ( |
If we don't auto-ref the discriminant, this would be like deref coercions, which I feel are more similar because they are fundamentally both lvalue operations (matching an autoref-ed pattern by ref would create a ref to an "extraneous" temporary). |
Option 1 did not seem possible when I investigated it when working on the RFC - it breaks back compat and means we have different rules to closures (possibly some other issues too). Option 2 sounds good to me. I agree this feels like a real issue that can't be worked around, but also one that is unlikely to come up in practice. |
The biggest sticking point here seems to be the inference of
ISTM that unless we decide to schedule Rust 2.0, that option 1 is better here. |
How does it break backwards compatibility? |
Aren't @Stebalien's examples a problem when inferring even just |
I get your point, and agree that it's a learnability hazard. I still think it's better for there to be a consistent logic that needs a bit of effort to learn, than for there not to be a consistent logic you can learn with any amount of effort. And while right now the voices of people who find the current meaning confusing are being heard, I suspect if we were to flip it, we'd instead hear the voices of the other half (especially considering we taught them). And if it's going to defy the expectations of half the audience either way we go (which seems quite likely), then as with |
I think I agree with this, and that's why I think I prefer using keywords here when possible (and elide thise keywords as much as possible!) |
For the record, so far I still find all the active proposals fairly confusing, in that they introduce magic type translation rules that make it hard for me to form a clear mental model of the types involved. When I see I think the biggest problem is that automatic dereferencing goes from "this couldn't possibly make sense as written" to "here's a plausible interpretation", while automatic referencing goes from one reasonable interpretation (ownership) to another reasonable interpretation (referencing). (That's true even if the reasonable interpretation of ownership leads to a compile error; better to have a clear mental model that leads to an error than to have a fuzzier model that allows the code to compile but perhaps not mean what the user expected it to mean.) @glaebhoerl's point about wanting a consistent underlying logic echoes exactly the concerns I have.
All that said, I do think there might be a way to simplify match; I just don't think the current proposals are worth the cost in consistency and reasoning. |
Is this actually quantifiable? I'm much more comfortable with the current situation. And although I wouldn't be too surprised if I'm in the minority, I'm not sure it can be seen as certain. It seems hard to measure the number of people that have an easier time with the way it works now. |
Responding to an earlier comment from @nrc:
I agree, but I've also run into cases where it happens and it's not obvious how to avoid it. As an example:
That situation ("iterator over a collection of references") seems to happen fairly often, and I don't see an easy way to avoid momentarily ending up with such types (though you can then eliminate the extra reference). Where that then becomes a much more easily avoided error is if you then use When working with generic functions, type-inferred closures, and auto-deref, it's surprisingly easy for all of the above to happen without ever encountering a type error. And where that comes back to the topic of this RFC is that it's surprisingly common for the first point where you realize you've ended up with a |
@glaebhoerl I definitely agree that we should strive to make "deref patterns" as unnecessary as possible. We can put aside the question of how to write them for the time being, I suspect. =) |
Returning to @nagisa's point about not focusing only on match, I really think these rules would be valuable in every kind of pattern. Its not uncommon that I iter over a Vec<(_, _)> and find or filter by one of the tuple members: vec.iter().find(|(name, _)| name == some_name) Even today it usually takes me more than one attempt to get this right, because it is behind two references: vec.iter().find(|&&(name, _)| name == some_name) |
What about assignment |
@burdges There's no destructuring going on, so it doesn't impact those. However, if you were to do: struct Foo { arg: usize }
let Foo { arg } = &Foo { arg: 0 };
|
So, I think I'd like to see an RFC with my take on @Stebalien's proposal, and then we can weigh them side-by-side? @Stebalien, would you be interested in working with me on it? |
What about match &Some(foo()) {
x @ Some(y) => { ... },
_ => { ... }
} Is the type of EDIT: Context is I was looking at this code linked from @nikomatsakis's recent blog post. It seems like that choice ( |
This sounds good too. If I understand correctly, this would end up obviating things like Thoughts related to this:
|
Well, that "auto-deref" I was referring to was intended to be applied only at the top-level. So if you were matching a |
This is one of the reasons that I want to avoid general deref patterns for the moment. =) |
Thanks for articulating this perspective and concern, which I imagine many people share. Here are a couple of things that, in my mind, help mitigate the worry:
@nikomatsakis, I quite like your proposal as it stands, and the longer I sit with it, the more I prefer this general direction over the original RFC (as clever as it is). |
Oh, there was one downside I wanted to mention, which is the mild "inconsistency" with field projection. Compare: struct Foo { x: Bar, y: Bar }
let my_foo = Foo { x: ..., y: ... };
let Foo { x, y } = &my_foo; // here, x and y have type &Bar
&my_foo.x // this also has type &Bar, and there's a decent
// intuition that we're applying `&` to the whole path
// to say "please borrow this path"
let my_foo_ref = &my_foo;
let Foo { x, y } = my_foo_ref; // here, x and y have type &Bar
my_foo_ref.x // error
&my_foo_ref.x // here, we get &Bar, but we had to *ask* for
// the borrow, even though we started with a borrow;
// the match doesn't, though |
Your proposal is similar mine but a bit more DWIM. match & &mut Some(3) {
Some(x) if false => {
// `x` would be an `&i32` here, because we dereference the `&`, getting into
// ref mode, then the `&mut`, but we are already in `ref` mode, so we stay that way
}
_ => { }
} In my proposal, One thing I'm a bit fuzzy on is the reference table:
To me, "default" generally means implicit. However, unless I'm mistaken, patterns always bind by value at the moment so that can't be what you're talking about. Are you talking about the case where the programmer explicitly binds by
By example, match &Some(3i32) {
Some(ref x) => {
// Is `x` a `&i32` or an `&&i32`? In my proposal, it would have been the
// latter. If I understood your proposal correctly, it would be the
// former.
},
_ => {},
} Now, if I am reading this table correctly, I disagree with the third entry (marked by "??"; I think it should be a type error. That is: match &Some(3i32) {
Some(ref mut x) => {
// Type error because we can't mutably bind `x`. Based on my understanding
// of your current proposal, `x` would be an `&i32`.
},
_ => {},
} However, converse (the fifth entry) should be perfectly valid: match &mut Some(3i32) {
Some(ref x) => {
// `x` is an `&i32` (downgrade mutability).
},
_ => {},
} On the topic of not needing a special However, the word let mut y = Box::new(1i32);
let ref mut move x: &mut i32 = y;
*x = 2; However, as As for writing an RFC, I'm unfortunately a bit busy at the moment (getting ready to start a new job) so I really shouldn't take on any more responsibilities (I'm already falling behind on my current ones). However, I'd be happy to give feedback and kibitz (and let you do all the real work :)). |
It also took me a long time to grok what the table was getting at. What "default" means here is basically what implicit mode you're already inside. In particular, if you are taking apart a re: RFCs planning, so far it seems like those in favor of doing something along these lines are in favor of @Stebalien and @nikomatsakis's proposal in particular. @nrc, though, has been on parental leave, and obviously his opinion is very much desired. But assuming that he also agrees that these new proposals are an improvement, perhaps he can just update this RFC accordingly? |
I spoke with @nrc, and he seemed to be generally in favor of the new direction, but I guess he can speak for himself. Pretty sure he's buried under notifications at the moment. =) |
@Stebalien Hopefully @aturon's explanation helped. In any case, the reason I called it the default mode because the idea was that it can be overridden by an explicit mode declaration. (So, e.g., one could write That said, @nrc was pointing out that there isn't really a need to make it possible to declare "move" mode explicitly, since one could still use an explicit |
I think |
It seems quite counter-intuitive that I'd have to use reference-facilities to perform a move. I'd also consider it a lot more vague, given the goal is the ability to be explicit. Additionally I'm not sure how that would work outside of |
Closing in favour of #2005 |
Prevent some of the boilerplate required for match expressions by auto-
dereferencing the target expression, and 'auto-referencing' pattern variables.
The rules for the latter are based closely on those for captured variables in
closures.
For example, current:
proposed: