ICFP 2014 Notes, Day 1
Sep. 21st, 2014 07:05 pm
tim
During this year's ICFP, I took probably more notes than I've taken at any other conference I've gone to. Now some of my notes were silly ideas or random to-do items that would have distracted me if I didn't write them down, but a lot of them were actually about the talks I was listening to, surprisingly enough!
In the interest of posterity, as well as justifying to my employer why it was a good idea for them to pay $3000 for me to go to an academic conference when I'm not a researcher, I'm going to try to summarize those notes here. What follows is my attempt to turn my notes on the first day (September 1) into something half coherent. I'll do a separate post for each day. I will try to link to the video from each talk.
The first talk of the day that I caught was Daniel Schoepe talking about SELinq. Yes, that's LINQ as in .NET's Language-Integrated Query. He (and his coauthors Daniel Hedin and Andrei Sabelfeld) wanted to build on what LINQ already does -- making database queries typed -- by annotating types with "public" or "private". This means pretty much what you'd expect it to mean in an everyday sort of database application: for example, they applied their work to an example from the PostgreSQL tutorial site and showed that they could implement a rule that, in a database of people, demographic info, and addresses, each person's exact address is private -- so queries can get aggregate data about what regions people live in, but a query that tries to look up an individual's street address would be ill-typed. That's really cool!
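To give a flavor of the kind of guarantee this buys you, here's a minimal sketch in Haskell (not the authors' F# embedding) using a phantom type to label data as public or private. Every name in it (Level, Labeled, publish, and so on) is my own invention for illustration, not SELinq's actual API:

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

-- A toy model of security-labeled data using a phantom type parameter.
-- These names are invented for illustration; SELinq's real interface
-- is an F# query embedding, not this.
data Level = Pub | Priv

-- A value tagged with a security label at the type level.
newtype Labeled (l :: Level) a = Labeled a

-- Schema declarations choose a label per column.
region :: Labeled 'Pub String
region = Labeled "Gothenburg"

street :: Labeled 'Priv String
street = Labeled "Example Street 1"

-- A public sink (say, the query's result set) accepts only public data.
publish :: Labeled 'Pub a -> IO ()
publish (Labeled _) = putStrLn "released"

ok :: IO ()
ok = publish region      -- type checks: region is labeled public

-- bad = publish street  -- rejected by the type checker: 'Priv is not 'Pub
```

The real system does this for whole queries rather than single values, but the flavor is the same: leaking a private column into a public result is a type error, not a runtime check.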
Schoepe et al.'s work is based on FlowCaml, which is an information flow analysis for OCaml. Crucially, the work relies on embedding the database query language in the underlying programming language, so you can piggyback on the host language's type system. That's cute! It also relies on baking operations like list append into the language, which is a little sad. (In my never-published -- and never-finished -- dissertation proposal, I wanted to explore using a combination of proofs and types to modularize the primitive operations of a language. I wonder if an approach like that would be useful here.)
They proved a soundness theorem that guarantees non-interference, and implemented a source-to-source compiler that generates F# code. Their type inference is conservative, which doesn't violate any intuitions: that is, it's always safe to treat public data as private. In future work, they want to apply it to front-end JS web apps, and in the question and answer session, Schoepe said it shouldn't be hard to generalize the work to arbitrary lattices (not just the one with "public" and "private").
After lunch, Paul Stansifer talked about binding-safe programming. The take-away line from Paul's talk was "Roses are olfactorily equivalent under α-conversion." (This might make more sense if you keep in mind that the name of the system he implemented is "Romeo".) Everybody knows that naming is one of the hardest problems in computer science; Paul presented a type system for naming. This pleases me, since I think nothing makes sense unless it's formulated as a type system. In his system, types track binding information: specifically, in a way that allows names to be shadowed safely. The type keeps track of which "direction" shadowing should go in. The running example was expanding let* (because this is in a Lisp-ish context) into lambdas.
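For the non-Lispers: that expansion turns each let* binding into an immediately applied one-argument lambda, so each later initializer can see the earlier bindings. Here's the rough shape, written in Haskell syntax rather than Lisp (the identifiers are mine):

```haskell
-- Lisp:   (let* ((x 1)
--                (y (+ x 2)))
--           (* x y))
-- expands one binding at a time into nested, immediately applied
-- one-argument lambdas:
expanded :: Int
expanded = (\x -> (\y -> x * y) (x + 2)) 1   -- evaluates to 3
-- The outer lambda binds x before y's initializer (x + 2) is evaluated,
-- so later bindings see earlier ones. Binding safety is about making
-- sure a mechanical expansion like this never accidentally captures or
-- shadows a name the user wrote.
```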
The next talk I took notes on was by Lars Bergstrom, on higher-order inlining. (Paul and Lars are both former colleagues of mine from Mozilla, by the way, which is why I'm referring to them by their first names.) Lars presented an analysis that determines where it's safe to inline higher-order functions. This was interesting to me since I'd always thought of inlining higher-order functions as a very naughty thing to do; the first research papers I ever read, in undergrad, were about inlining, and I did my undergrad and master's thesis work on deforestation (which depends intimately on it), so this is a belief I've held for a long time! Lars showed that it's wrong, so that was cool. The central contribution of his work is an analysis that doesn't have to compare the entire lexical environment for equality. I would have to read the paper to explain why, but Lars said that his work provides principles that replace the bag of heuristics that was the previous state of the art, and for now, I believe him.
It's safe to inline a higher-order function if you know that its environment doesn't change dynamically: that is, that inlining won't change the meaning of the function's free variables. Lars' analysis answers the question "does any control flow path pass through a sub-graph of the control flow graph that constitutes a re-binding?" An interesting thing that Lars didn't say (but perhaps it's not as deep as I think it is) is that this touches on how closures and value capture are sort of a hidden side effect even in a pure, strict functional program. His conclusion was "You can use closures in benchmarks now!"
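To make that concrete, here's a hedged sketch (my own example, not one from the talk) of what the payoff looks like when the analysis says inlining is safe:

```haskell
-- A higher-order call where the function argument's only free variable is k.
scale :: Int -> [Int] -> [Int]
scale k xs = map f xs
  where
    f x = x * k

-- If an analysis establishes that k's binding cannot differ between f's
-- definition and the call site inside map, the compiler may inline both
-- map and f, leaving a first-order loop with no closure allocation:
scaleInlined :: Int -> [Int] -> [Int]
scaleInlined k = go
  where
    go []     = []
    go (x:xs) = x * k : go xs
```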
In the Q&A, I asked whether it was imaginable to prove that the analysis improves performance, or at least doesn't make it worse -- in other words, that the higher-order inlining optimization has a predictable effect on performance. This question has haunted me ever since I worked on type-based deforestation. Lars thought this would be very ambitious because inlining is often an enabler of other optimizations, rather than improving performance on its own; though it does inherently eliminate stack frame allocation and deallocation. He said that he and colleagues want to do a correctness (i.e. semantic preservation) proof, but haven't yet.
My last set of notes for the day was on Jennifer Hackett's talk "Worker/Wrapper Makes It Faster". As Lars said in answer to my question, Jennifer's work sort of answers the question I asked: it's about proving that an optimization really does improve performance. Her proof strategy is based on defining a "precongruence", a relation that's preserved by substitution; this is needed because the "natural" approach to proving performance improvement, induction on execution steps, doesn't work for lazy languages. If a relation is a precongruence, then intuitively a local improvement always translates into a global one. I would have to read the paper before saying any more, but I thought this talk was cool because it contributes the first performance guarantee for an optimization for a lazy language.
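For context, the worker/wrapper transformation itself is easy to show; the hard part, and the paper's subject, is proving that the transformed program is actually no slower under lazy evaluation. The usual textbook instance (not taken from Jennifer's paper) is reversing a list with an accumulator:

```haskell
-- Naive reverse: quadratic, because ++ retraverses the accumulated result.
rev :: [a] -> [a]
rev []     = []
rev (x:xs) = rev xs ++ [x]

-- Worker/wrapper version: a thin wrapper around an accumulator-passing
-- worker, linear in the length of the list.
revWW :: [a] -> [a]
revWW xs = work xs []            -- wrapper just sets up the worker
  where
    work []     acc = acc        -- worker threads the accumulator
    work (y:ys) acc = work ys (y:acc)
-- The talk's contribution is a proof technique (precongruences) for
-- showing that a rewrite like rev -> revWW really is an improvement
-- in a lazy language, not just in a strict one.
```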
Stay tuned for my notes on day 2, once I get around to editing them!