





The State of Serverless Computing


    Summary

    Chenggang Wu briefly discusses both the benefits and shortcomings of existing serverless offerings. He then projects forward to the future and highlights challenges that must be overcome to realize truly general-purpose serverless computing.

    Bio

    Chenggang Wu is a Ph.D. student at UC Berkeley. His research interest lies in distributed systems, specifically coordination-free architectures, distributed consistency models, and serverless infrastructure.

    About the conference

    Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

    Transcript

Wu: My name is Chenggang [Wu], and I'm a grad student at UC Berkeley. Serverless computing is something a bunch of folks at Berkeley have gotten really excited about these days. One of the reasons why is that now everyone is starting to learn to program. This is our introductory CS class at UC Berkeley last fall. There are about 2,000 students in this classroom. They couldn't really fit into any lecture hall on campus, so we actually had to put them into this large concert hall. You can see from the picture that this concert hall is almost entirely full except for that very top deck over there. In addition to these 2,000 people each semester, there are about another 1,000 students who are learning a new data science curriculum at Berkeley.

They're doing something like simple data processing, maybe running some machine learning models in Python, usually with the help of Jupyter Notebooks. They're running on some sort of hosted infrastructure. Most of these people aren't computer scientists; they are biologists, economists, and even some lawyers these days. They're just using some simple programming to try to make their lives easier. Importantly, they don't have a traditional software engineering or computer science background like we do, so they definitely don't know about stuff like Kubernetes or Docker or anything.

A really interesting question that we've been thinking about is how we are going to make these folks more productive, especially if they're not running on the powerful laptops that we're used to, if they're just running on their Chromebooks or on some hosted infrastructure.

    A key question is where their code is going to run. The answer is, obviously in the cloud. Otherwise, I wouldn't be here giving this talk.

What I'm going to do today is to first talk about what we have: what is serverless? Why is it so cool? What can we do with it? Then I'm going to talk about some of the gaps: where do existing serverless offerings fail to meet the requirements of certain applications that we care about? Then I'll talk about some of the new work that we've been doing at Berkeley to try to fill in these gaps, and finally, talk about our vision for the future of cloud programming.

    Background: Serverless Computing

A brief background on serverless computing. There are a bunch of definitions floating around out there, but to us, we roughly define serverless computing as a programming abstraction that allows a user to upload some programs, run them at any scale, and pay only for the resources being used.

With this definition, I'm basically referring to FaaS, so Functions as a Service. There are a lot of really popular offerings. I'm sure you all know AWS Lambda, for example, and all other major cloud providers have similar offerings. Google has Google Cloud Functions, Microsoft has Azure Functions, and so on. There are some open-source ones as well, like OpenWhisk, OpenFaaS, and stuff like that.

At a high level, all of these services are optimized for simplicity. The point is that you go there, you can register some functions, and you can even enable some triggers. Here in the picture, you can see some example triggers in the AWS ecosystem. For example, you can have some API Gateway calls, some Alexa skills, or even uploading a new key to S3. These can all act as triggers to enable your function execution.
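
To make the trigger model concrete, here is a minimal sketch, in Python, of what an S3-triggered function might look like. The event shape follows the standard S3 notification format; the bucket contents and the "work" done on each object are hypothetical.

```python
# Minimal sketch of an S3-triggered AWS Lambda handler (hypothetical workload).
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each record describes one object whose upload triggered this invocation.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Purely functional "work": the output depends only on the input object.
        print(f"{key}: {len(body)} bytes")
    return {"status": "ok"}
```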

    You scale transparently, which means if you run one request, you pay only for that request, and if you run a million times that, you pay a million times that cost. Then more importantly, if you run zero requests, you pay nothing.

There's been a bunch of academic interest in serverless. We've seen a lot of papers being published, and we've written some of the papers ourselves at Berkeley. Sometimes we get feedback like this on Hacker News saying, "Oh look, a bunch of academics telling developers how to do their jobs, again." I just want to show you that there's not just academic interest, but also some industrial interest as well.

We're in Silicon Valley, and there are a bunch of cool startups here working on making serverless more usable and helping folks deploy serverless applications. Maybe even more excitingly, these are just a subset of the companies that are using AWS Lambda for a variety of applications that they're interested in. These are just a subset, and that's not even including companies like Google or Microsoft. It's quite amazing.

    What Is FaaS Good at Today?

Having seen all this excitement, let's talk about what FaaS is good at today. The first family of work is embarrassingly parallel tasks. Think about stuff like image and video processing, and also ETL, loading data into a storage system. Basically, any task that doesn't require coordination, where the requests are usually idempotent and the operators are purely functional, where the output depends only on the data you put into that function.

The other family of workload, on the right-hand side, is workflow orchestration. This is an example workflow from the Autodesk use case in the AWS case studies. There are something like 24 Lambda function invocations, 12 API Gateway calls, 8 database accesses, and 7 SNS notifications. The role of Lambda here is to coordinate these tasks. You want to access some database, you want to do some integrity constraint checking, and you want to be able to send the right email to the right user, and stuff like that.

According to their website, using Lambda actually reduced their user account signup workflow from taking two weeks down to just 10 minutes. I'm not sure why it took two weeks in the first place, but obviously, FaaS is pretty good at these types of workloads.

The issue is that if you go beyond this embarrassingly parallel workload or workflow orchestration stuff, then you're going to hit some walls pretty quickly, and I'm going to talk about them in more detail soon. In case any of you are from major cloud providers, this is some feedback that we get pretty often when we talk about serverless. We work pretty closely with folks from Google and folks from AWS, and whenever we talk about serverless, they raise their hands saying, "So I've built this serverless service, and it's called something like Google Cloud Dataflow, or AWS Athena, or Snowflake," and so on. I do want to acknowledge that these are all really awesome services. For example, they autoscale pretty well, they take into account data locality, and they do efficient data movement, all of these good properties that we want, but what we care about is more around generality. We want to be able to enable general-purpose computing in a serverless mode with all of these nice guarantees.

Going back to FaaS, what can't you do on FaaS today? I have here a common list of limitations that we hear folks talk about. The first one is the limited execution lifetime. In Lambda, functions can run for at most 15 minutes, and in some other cloud providers, it's something like 9 or 10 minutes, so there's this limitation. The second one is no inbound network connections. We can't just open up a port and start sending and receiving messages like we're used to doing when deploying a web server, for example.

Then, IO is a bottleneck. We all know that S3, for example, gives you pretty great bandwidth, but if you're using Lambda and you want to take advantage of S3's bandwidth, you have to spin up a massive number of Lambda clients, actually tens of thousands of clients, in order to completely saturate that bandwidth. The problem is, even if you're able to get that bandwidth, latency is usually going to be another issue, because if you're doing any sort of data-intensive workload, the latency of going from Lambda to S3 is usually prohibitive for applications that require ultra-low latency.

Then finally, there is no specialized hardware support. If you're doing stuff like computationally intensive training of neural nets, or making predictions using machine learning models, and you want stuff like GPUs or FPGAs, you're not going to get them currently.

In this talk, I'm going to focus on the middle two. The first one, the limited execution lifetime, has been consistently improving over time. When Lambda first launched, it was a one-minute restriction, and then they increased it to five minutes. During last year's re:Invent, they further increased it to 15 minutes. We think it's really a configuration thing.

The last one, no specialized hardware, we also believe the cloud providers are probably soon going to address. That still leaves the middle two limitations: no inbound network connections, and IO being a bottleneck.

Isn't that just fine? Because everything is functional programming. We can all write Haskell-like programs; it's called AWS Lambda, anyway, so you know it must be functional. Functional programming doesn't have any side effects and it's completely stateless, so it's not a big deal. The problem is that's not actually the case. As much as Haskell Curry, after whom the Haskell language is named, and Simon Peyton Jones might want it to be true, this is not how real applications are built today. I'll talk more about that in a second, but what it really means is that if you're trying to build applications on FaaS infrastructure today, you're not getting Functions as a Service, you're actually getting something like dysfunction as a service.

The problem is that real applications do share state, and they share state in a variety of ways. The first looks like function composition. Say I have some function G of X that executes, and the output has to go as an argument to a second function F. Today, this function chaining has to go through a really slow, faraway storage system like S3. Otherwise, you have to play some tricks to get it to pass state transparently, which is kind of tough.
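
As a rough illustration of that pattern, here is a sketch of what composing F(G(x)) through S3 looks like, assuming two separately deployed handlers and a hypothetical intermediate bucket.

```python
# Sketch: composing F(G(x)) across two Lambdas by passing state through S3.
# Bucket/key names are hypothetical; g_handler and f_handler would be
# deployed as two separate functions.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "intermediate-results"   # hypothetical bucket

def g_handler(event, context):
    intermediate = {"value": event["x"] * 2}        # "G(x)"
    s3.put_object(Bucket=BUCKET, Key="g-output",
                  Body=json.dumps(intermediate).encode())

def f_handler(event, context):
    # F has to be triggered separately and fetch the state back out of storage.
    obj = s3.get_object(Bucket=BUCKET, Key="g-output")
    intermediate = json.loads(obj["Body"].read())
    return intermediate["value"] + 1                # "F(G(x))"
```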

The second way in which applications share state is via message passing. Say you work in distributed systems and you want to build some consensus protocol like Paxos, or you want to build some real-time streaming application and apply some distributed aggregation techniques to do analytics. Both of these workloads require low-latency message passing. If you don't have inbound network connections, which means that direct messaging is disabled, then you're going to have a hard time building these applications.

Then finally, the third one is modifying shared mutable state. Say you have a database, and you have a bunch of distributed compute agents that want to simultaneously access the database to read or write data. Then you have to worry about things like what consistency models are provided by that database and how you get performance out of it. All of these things may seem pretty easy, but it turns out that if you try to do them on FaaS infrastructure today, they're all pretty difficult. At the end of the day, FaaS is poorly suited to all of these tasks.

Quantifying the Pain of FaaS

    Now, I'm going to dive a little bit deeper to quantify the pain points of running applications on FaaS today. As we go, it turns out that we're going to really disappoint some famous computer scientists.

Imagine you have a simple sequence of operations, something like looking up user information, looking up their Netflix history, and making some predictions of what they'll want to watch next. It's a really simple sequence of purely functional operators, and we wanted to see how this would perform on Lambda. To isolate the overhead, we pretended that these functions were instantaneous by doing some really simple math and just running a two-function sequence. The problem is, as we deployed these functions, we found that there is no natural way in Lambda to chain them together. As a first-pass solution, what we did was write the result of the first function to a storage system like DynamoDB or S3, and the second function would read in this result, do the computation, and return to the user.

As you can imagine, it turns out to be really slow. From this picture, you see that both Lambda plus S3 and Lambda plus DynamoDB have a latency of up to 200 milliseconds. The reason behind this high latency is apparently that we're going through slow storage when chaining functions. The next hack we tried was to hard-code the function composition into the program itself, which means that at the end of the first function, we just manually add a Lambda invocation call to the second function.

    By the way, this is not how you're supposed to be using Lambda, because while the second function is executing, the first function is still hanging there waiting for the result, and Amazon will still be charging you for that, so that's very cost inefficient. Nevertheless, it removes this latency penalty of going through storage.
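
A sketch of that hack, assuming hypothetical function names: the first function invokes the second synchronously through the Lambda API and blocks until it returns, so both functions are billed for the duration of the call.

```python
# Sketch of hard-coded function composition: G invokes F synchronously and
# waits for the result, so you pay for G's idle time while F runs.
import json
import boto3

lam = boto3.client("lambda")

def g_handler(event, context):
    intermediate = {"value": event["x"] * 2}
    resp = lam.invoke(
        FunctionName="f",                    # hypothetical downstream function
        InvocationType="RequestResponse",    # synchronous: blocks until F returns
        Payload=json.dumps(intermediate),
    )
    return json.loads(resp["Payload"].read())
```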

As expected, the latency got much better, but it was actually still a little bit slow, around 100 milliseconds. Finally, we tried AWS Step Functions, which is an orchestration service specifically designed for constructing DAGs of operators and running them on Lambda, with the hope that this would incur lower latency than the previous two implementations. But as you can see from this figure, it's actually even slower, somewhere around 500 milliseconds. By the way, I just want to let you know that if you execute the same workload locally on your laptop, it's actually four orders of magnitude faster than using any of these services. Of course, I do want to acknowledge that if you have a more data- or compute-intensive workload, then this Lambda overhead will be masked quite a bit, but it introduces some other issues like data locality and data movement overhead. To summarize, even if you're writing pure functional programs, you're still going to pay a pretty significant performance penalty. These two folks here are still going to be pretty disappointed.

To make things a little bit more interesting, as I mentioned a few slides ago, real-world applications are not purely functional. They do have state, and most of them have shared mutable state. I'm sure many of you have built web apps that run on multiple servers and have shared access to a database. In this case, the stuff being stored inside that database is shared mutable state.

    Shared Mutable Storage

As I mentioned earlier, current FaaS offerings are stateless. In order for these applications to work, they have to manage state using a separate storage engine. These days, every cloud provider has like 10 different storage offerings, and each of them offers different trade-offs on performance, consistency, and the ability to autoscale.

We expected that we would be able to find at least one service that fits the serverless ideal of being both autoscaling and low latency. The problem is, as we played around with these services, we found that there just seems to be a tension between autoscaling and low latency. For example, autoscaling storage systems like S3 or DynamoDB impose a latency penalty of up to tens of milliseconds, even for very small reads and writes. On the other hand, low-latency systems like the hosted versions of Redis and Memcached are not very elastic, because they require manual provisioning to scale the system up and down. Of course, one can always provision for the peak load, but that's very anti-serverless and costs you a lot of money. Basically, for our purposes, we ignored any system that wasn't autoscaling, because it would become either a performance bottleneck or incur a lot of cost.

    (In)Consistency Guarantees

In addition, today's autoscaling systems tend to offer pretty poor or restricted consistency guarantees. For example, at first glance, maintaining something like a shared counter should be a pretty trivial task. For those of us who work in distributed systems, we know that getting it right can sometimes be pretty hard. If I store a shared counter in S3, for example, and multiple people are trying to simultaneously read the counter, do some local increment operation, and write it back, then certain updates will be overwritten by others, which introduces inconsistency.
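
Here is a small sketch of that lost-update problem, with a hypothetical bucket holding the counter; nothing stops two clients from reading the same value and clobbering each other's increments.

```python
# Sketch of the lost-update anomaly with a shared counter stored in S3.
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "app-state", "counter"   # hypothetical names

def increment():
    value = int(s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read())
    # No coordination: another client reading right now sees the same old value.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=str(value + 1).encode())

# If clients A and B both call increment() concurrently starting from 10,
# the counter can end up at 11 instead of 12: one update is silently lost.
```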

Jim Gray, who is the father of modern databases and also a Turing Award winner, would be pretty unhappy with this consistency guarantee. By the way, I do want to say that there do exist services today that are both autoscaling and starting to offer stronger guarantees. For example, DynamoDB these days has a transaction mode, which gives you stronger isolation levels, but at a dollar and latency penalty. To summarize, there doesn't exist a cloud storage offering today that simultaneously offers high performance, rich consistency, and autoscaling.

    No Inbound Network Connections

Finally, current FaaS offerings do not allow functions to talk to each other. I was just complaining about not having inbound network connections, but it turns out that this does have some benefits I want to acknowledge. For example, it enables cloud providers to do process migration for load balancing, and it also enables easy automatic fault tolerance, so those are nice properties.

The problem is, if we want to build real distributed applications on top of FaaS, sometimes these functions do need to talk to each other. Since direct communication is disabled, they again have to go through the storage system. One function will write the message to a well-known location in the database, and the other function can periodically poll that location to retrieve the message.
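
In code, that pattern looks roughly like the sketch below: the sender writes to a well-known key and the receiver polls it in a loop. The key-value client, key name, and polling interval are all hypothetical.

```python
# Sketch of "message passing" through shared storage via polling.
import time

MAILBOX = "mailbox/function-b"   # hypothetical well-known key

def send(kv, message):
    kv.put(MAILBOX, message)

def receive(kv, timeout_s=30.0, poll_interval_s=0.1):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        msg = kv.get(MAILBOX)
        if msg is not None:
            return msg
        time.sleep(poll_interval_s)   # every poll costs a storage round trip
    raise TimeoutError("no message arrived in time")
```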

As I mentioned before, this is just extremely slow, and real-world high-performance distributed applications sometimes just cannot tolerate such high latency. With the current FaaS offering plus slow storage, we're basically throwing away the entire distributed computing textbook. Another Turing Award-winning computer scientist, Leslie Lamport, would again be pretty sad about that.

Today, I'm here not just to complain about FaaS, but, as Mr. Chekov might tell you, we have an idea of how to fix it. By now, hopefully, I've convinced you that there is still a long way to go before people can run stateful serverless applications in the cloud. This is actually relevant to both non-CS experts and expert developers, because recall that it even requires some hacks to get a simple composition of two functions running on Lambda. If you are an expert developer building real applications, then you have to worry about stuff like the trade-offs between performance, autoscaling, latency, and consistency.

    A Platform for Stateful Serverless Computing

In the spirit of being constructive rather than just complaining, we've been trying to tackle some of these challenges and make serverless computing more general purpose. The direction that we've been headed in is to make state management easier in a serverless context. As the name suggests, instead of pure functional programming, we also want to embrace state.

Embracing state is a pretty challenging task, because a successful state management or storage system needs to simultaneously address high performance, consistency, and autoscaling. Fortunately, we worked exactly on these topics during our PhDs. In the last couple of years, we built a distributed key-value store called Anna that performs up to 10 times faster than Redis on certain workloads, and it also quickly adapts to workload spikes and troughs in a cost-efficient manner. Due to time constraints, I won't be able to go into the technical details behind Anna, but if you're interested, please take a look at the two papers listed below, or come talk to me afterwards. The overall message here is that we already have a storage system that can act as a backend for supporting stateful serverless applications.

We've been building a system called Fluent, which is a FaaS layer on top of Anna. Obviously, this name Fluent collides with the famous Fluentd project, so we're trying to come up with a better name, but for now, we'll just stick with Fluent. Our design goal is to first keep all the goodies of the current FaaS offerings, especially the disaggregation of compute and storage, which means that these two tiers can scale independently.

FaaS systems like AWS Lambda pioneered this disaggregated architecture, and I want to mention that it's beneficial for a couple of reasons. First of all, from the user's perspective, it reduces costs and enables simple, independent scaling of both tiers, which is great. Also, from the cloud provider's perspective, it enables aggressive bin packing of compute and storage resources, which leads to higher utilization, which is one of the holy grails of cloud computing.

In addition to all of this, we want to solve all the limitations that we listed previously in order to support stateful serverless applications. The key to achieving that is to use Anna for both storage and communication.

    Logical Disaggregation with Physical Colocation

Simply replacing slow storage like S3 with Anna is obviously not sufficient, because although Anna is orders of magnitude faster than S3, in order for the FaaS layer to talk to Anna, it still has to cross the network boundary, and crossing this network boundary incurs overhead that is intolerable for applications that require ultra-low latency.

Our key insight to solve this problem is that logical disaggregation of compute and storage should not preclude physical colocation. What it means is that, logically, people can still treat compute and storage as two completely separate tiers that scale independently. In practice, as we implement these two tiers, we can use some techniques to place data close to the compute nodes, ideally colocating them on the same machine, in order to completely remove this network latency.

As you can imagine, the simplest way to achieve this is via caching. Forty years of computer science have already taught us that caching will definitely make things faster. Communication and function composition can then be achieved via puts and gets to the Anna key-value store from the FaaS layer. In case there are machine failures or network partitions, we still allow communication to go through the Anna key-value store, with the potential cost of paying that network round-trip overhead. The problem is that just adding caching doesn't solve all of our problems, because we now have shared mutable state both in the storage tier and in the compute tier. There's this new challenge of how we maintain consistency across these different storage locations. As it turns out, this is a pretty hard technical problem, and we also know that sidestepping the consistency issue is a bad idea.
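
A sketch of that read/write path, with a hypothetical Anna-like client: hot keys are served from a cache on the same machine as the function, and only misses cross the network to the storage tier.

```python
# Sketch of logical disaggregation with physical colocation: a per-node cache
# in front of a remote key-value store (hypothetical client interface).
class ColocatedCache:
    def __init__(self, kv_client):
        self.kv = kv_client   # remote storage tier
        self.local = {}       # data colocated with the compute node

    def get(self, key):
        if key in self.local:
            return self.local[key]        # no network round trip
        value = self.kv.get(key)          # miss: cross the network once
        self.local[key] = value
        return value

    def put(self, key, value):
        self.local[key] = value
        self.kv.put(key, value)           # write through to the storage tier
```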

Building on years of research, and using some of Anna's consistency mechanisms, we're able to provide a bunch of different consistency models by encapsulating program state into what we call lattice data structures. What is a lattice? You can think of a lattice as a data structure that wraps an element and updates that element in a way that is associative, commutative, and idempotent, which we call the ACI properties. Here is an example of a set lattice, one of the simplest lattices you can imagine, whose merge function is the set union operator. We all know that set union satisfies these ACI properties. Given a set of different updates, regardless of the order in which they arrive at the set lattice, its end state is always going to converge to the same value. Because of this property, lattices can help achieve eventual convergence of replicas, which in turn guarantees eventual consistency.
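
Here is a minimal sketch of a set lattice in Python, illustrating why the ACI properties give order-insensitive convergence; it is an illustration of the idea, not Anna's actual implementation.

```python
# Sketch of a set lattice: merge is set union, which is associative,
# commutative, and idempotent (ACI), so any arrival order converges.
class SetLattice:
    def __init__(self, elements=()):
        self.value = frozenset(elements)

    def merge(self, other):
        return SetLattice(self.value | other.value)

a, b, c = SetLattice({"x"}), SetLattice({"y"}), SetLattice({"x", "z"})

order1 = a.merge(b).merge(c)
order2 = c.merge(a).merge(b).merge(b)   # different order, duplicate update
assert order1.value == order2.value == frozenset({"x", "y", "z"})
```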

The problem is that just having eventual consistency sometimes is not enough, because the simplest form of eventual consistency is [inaudible], but under this guarantee, from the application's perspective, writes can arrive out of order and some writes may still overwrite others. The only guarantee you get is that, eventually, replicas will converge to some state. It's a pretty weak guarantee.

It turns out that supporting even stronger consistency models is actually quite challenging. After years of research on composing these simple lattices together, we found that lattice composition plus some smart protocols allows us to implement a variety of stronger isolation levels, including causal consistency, which is known to be the strongest consistency model that doesn't require synchronous coordination.

Causal Consistency

Causal consistency guarantees that causally related updates will be observed in an order that respects causality. What it means is that, for example, in the Twitter world, if Alice made a tweet and then Bob saw Alice's tweet and replied to that tweet, then under causal consistency, Bob's reply will be considered causally dependent on Alice's tweet. Causal consistency will then guarantee that Bob's reply will not be made visible to others unless Alice's tweet is also made visible.
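
The visibility rule can be sketched as follows: each write carries the versions it causally depends on, and a replica buffers the write until those dependencies are visible. This is only an illustration of the rule, not Fluent's actual protocol.

```python
# Sketch of causal visibility: a write is revealed only after every
# (key, version) it depends on is already visible at this replica.
class CausalReplica:
    def __init__(self):
        self.visible = {}    # key -> (version, value)
        self.pending = []    # writes still waiting on dependencies

    def _satisfied(self, deps):
        return all(self.visible.get(k, (-1, None))[0] >= v for k, v in deps)

    def apply(self, key, version, value, deps):
        self.pending.append((key, version, value, deps))
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                k, ver, val, d = w
                if self._satisfied(d):
                    self.visible[k] = (ver, val)
                    self.pending.remove(w)
                    progress = True

r = CausalReplica()
r.apply("bob_reply", 1, "Nice tweet!", deps=[("alice_tweet", 1)])
assert "bob_reply" not in r.visible          # hidden until the tweet arrives
r.apply("alice_tweet", 1, "Hello world", deps=[])
assert "bob_reply" in r.visible              # now both are visible
```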

In addition to this guarantee, we can augment the causal consistency protocol a little bit such that it also guarantees the following properties: repeatable read and atomic visibility. The former basically means that during a client session, if a client reads an object A and later reads the same object again, then even though other people might be updating that object, the client is guaranteed to read the same version of the object. That's what the repeatable read guarantee provides.

The second one, atomic visibility, means that within a client session, if Alice updates both objects A and B, then other clients either see Alice's update effects on both A and B, or none of the effects are revealed. It reveals updates as a unit. With these guarantees, it's much easier for applications to reason about what they're really doing.

Function Composition, Revisited

    Let's look at some performance numbers. With Fluent, let's revisit the performance of our function composition microbenchmark. As we can see, Fluent's native support for function chaining allows us to easily outperform Lambda by over two orders of magnitude.

Similarly, for all of these other limitations that I listed, including data locality, autoscaling, direct messaging, and consistency, we have relevant experiments showing that Fluent beats the state of the art by orders of magnitude. But rather than going through these figures one by one, let me instead show you a real-world case study that we did for a machine learning prediction serving workload.

    Case Study: Prediction Serving

The goal of prediction serving, as a short primer, is to generate predictions from pre-trained machine learning models. These tasks are typically computationally intensive with a low-latency requirement, so it's a good fit for FaaS. In practice, when we're doing prediction serving, instead of running one model, we typically run multiple replicas of that model.

As shown in this figure, we first need to clean the input, and then we need to join the input with some reference data. Then, as I mentioned, instead of just running one model, we typically run multiple replicas of that model to boost the statistical accuracy of the prediction. Finally, we gather the results, normalize them, and produce the final output. It's a chain. Because it's a function composition chain, it should be a perfect fit for FaaS. But as you can imagine, we run into the same performance issues that I just mentioned. These tasks also often require pretty heavy data movement, which is expensive in current FaaS offerings.
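
To illustrate why the replica stage parallelizes, here is a toy version of the pipeline in Python; the cleaning, reference join, and "model" are stand-ins, but the structure mirrors the figure: the replicas have no mutual dependencies, so they can run concurrently before the gather step.

```python
# Toy prediction-serving pipeline: clean -> join -> parallel replicas -> gather.
from concurrent.futures import ThreadPoolExecutor

def clean(x):
    return x.strip().lower()

def join_reference(x, reference):
    return (x, reference.get(x, 0))          # stand-in for the reference join

def model_replica(features, seed):
    return hash((features, seed)) % 100      # stand-in for one model replica

def serve(raw, reference, n_replicas=3):
    features = join_reference(clean(raw), reference)
    with ThreadPoolExecutor(max_workers=n_replicas) as pool:
        scores = list(pool.map(lambda s: model_replica(features, s),
                               range(n_replicas)))
    return sum(scores) / len(scores)          # gather and normalize

print(serve("  Cat Photo ", {"cat photo": 7}))
```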

How do people actually do this in production today? It turns out that folks at major cloud vendors are building special-purpose systems just for this type of workload. One example is AWS SageMaker, which is an end-to-end machine learning model lifecycle management framework. Part of that is the ability to deploy a prediction serving pipeline just like the one I showed you. We wanted to see how Fluent would perform against this state-of-the-art solution.

We went ahead and implemented the above pipeline using the SqueezeNet model, which is a new image classification model developed at Berkeley. The goal of SqueezeNet is to create a smaller neural net with fewer parameters, with the hope that it can easily fit into memory. It's oriented more towards deployment on edge devices, so think about stuff like autonomous driving.

We deployed this pipeline in Fluent and then compared it to SageMaker. Guess what? We can actually easily outperform SageMaker by up to 3x without even trying to do any special optimizations. The majority of the performance win comes from the fact that SageMaker doesn't support parallelism while executing this pipeline, whereas Fluent does. When executing these multiple replicas, Fluent will detect that there's no internal dependency between these tasks, so we execute them in parallel, whereas SageMaker runs them in a serial fashion. That explains the performance gap.

Also, I want to mention that this specific use case doesn't really exercise much of Fluent plus Anna's rich consistency models. We do have another case study, a Twitter example, where we implemented a retweets app that actually exercises our causal consistency model. If you're interested in that use case, please feel free to come talk to me afterwards.

    The Future of Cloud Programming

Stepping back a little to the high-level picture, one of the important goals of our project is to think about what cloud programming is going to look like in the next 5, 10, or 15 years.

Looking back, we see that FaaS has innovated in terms of the guarantees that it makes. It makes it really easy for an average developer to get started, run some code in the cloud, and get it to scale transparently. But there are some key limitations: function composition doesn't really work very well, so even functional programming is slowed down. Also, communication is forced through slow storage, so you can't do the direct message passing that's essential to building any distributed application.

Finally, there are these poor consistency guarantees. This one is particularly insidious, because you might not think it's very hard, but once you start reading and writing from a database and getting back some random stuff, or not even being able to read what you just wrote, which can actually happen on some of today's cloud offerings, it becomes really hard to reason about what your application is really doing.

What we've been doing is trying to make FaaS functional, not in the functional programming sense, but just making it work. We've been doing things like embracing state, which makes simple things a lot better: the function composition microbenchmark runs significantly faster than before. Harder things, like this machine learning prediction pipeline, become a lot easier. We're definitely not claiming that we have full support for all the production-ready features that a system like SageMaker does. But just by building a prototype and being really nitpicky and thoughtful about the fundamental overheads we're introducing, we're able to provide performance that's significantly better than what exists today. We think this, in general, is a step on the road towards a more programmable cloud.

The vision that we have is basically that serverless will definitely change the way we think about writing software. More excitingly, for folks who do research in infrastructure, it's also going to change the way we think about how to build that infrastructure. In particular, thinking about how to program the cloud is very interesting. There's never been a way to really think about how you program with millions of cores and petabytes of RAM. That's actually the scale of resources that's available to us in the cloud today.

Most applications probably don't need that many resources, but for those applications that do, we really need to think hard about how we transition from the programming model that we have today to a programming model that will allow us to do all of these crazy scaling things.

Moving Forward from FaaS

Moving forward from FaaS, what do we think are the key enabling next steps to make serverless more general-purpose and usable? Obviously, we've talked about state management, and we think this is the crucial first enabling step, but there are a bunch of other directions as well. The first one is building developer tools.

We know building distributed systems is hard, and debugging them is even more difficult. Now you throw FaaS into the picture, and you can no longer SSH into a machine, read logs, and do all the crazy debugging tricks that we're used to. Providing some form of observability that allows application developers to reason about what's going on across these function chains, and especially allows all of these people who are not computer scientists or distributed systems experts to think about how their applications fit together, is key. What are the tracing tools that they need? What are the testing tools that they have? That's going to be a really important enabling next direction.

Another one is autoscaling policy. We think that having one general-purpose autoscaling policy is maybe not the best idea. We're not even sure that, with this wide variety of different applications, you are going to be able to build one. Today, if you make more requests, you can get more cores and more RAM. That doesn't necessarily capture all the ways in which one might choose to autoscale. At Berkeley, there's been some cool research in our lab about designing machine learning-backed proactive prediction, to predict where your load spikes will be and allocate resources accordingly.

This is especially useful if you have a production application that has performance deadlines and so on. In the FaaS infrastructure today, if you run into a resource limit, you basically have to wait for another EC2 VM to spin up, which can take 30 seconds to a few minutes, which doesn't really give you consistent performance. That ties into my last point, about designing SLOs and SLAs. This one is oriented more towards software engineering folks than non-computer scientists. I think a key blocker that's preventing FaaS from being widely adopted is that it's hard to get consistent performance out of this infrastructure. Think about a workload where you and I are both using FaaS, and my code takes like 50 milliseconds to run, but yours takes 50 seconds or even 50 minutes to run. Then it becomes pretty hard to reason about what your SLO versus my SLO should be in this multi-tenanted environment. What is the cost model? Thinking carefully about what the agreements are, what the contracts between the cloud provider and the user should be, and how we can get together and start designing some of these performance guarantees is a really interesting and important next step to make serverless truly useful.

We've been working on this open-source project called Fluent. We actually submitted another new paper a few months ago, but it's a double-blind conference, so we couldn't share it publicly. If you're interested in what we're doing, I'm definitely happy to chat more about it, so please come reach out to us afterwards.

     


    Recorded at:

    Aug 19, 2019
