Google I/O 2009 -..Distributed Transaction Layer: App Engine

Just another WordPress site

Google I/O 2009 -..Distributed Transaction Layer: App Engine

Wilkerson: Okay I’m supposed to start talking now I’m Daniel Shawcross Wilkerson I’m here to speak to you about distributed transactions for Google App Engine Worked with several really amazing, first-rate people on this project, and what I can’t stand on my short list of things I really can’t stand about the media is this myth of the lone scientist, you know, working alone with his laboratory, the lone genius coming up with something amazing Most things worth doing get done by a team I would like the media, anyone watching this, to please start acknowledging that when you write articles Simon Goldsmith is a very good friend of mine from Berkeley He now works at Coverity A very smart guy and very humble Robert Johnson is a professor at Stony Brook Another friend of mine from Berkeley He helped us greatly simplify the locking protocol Erick Armbrust at Google is a really enthusiastic engineer, great guy, who found a bug Found an optimization And also did this little, minor thing He implemented it That’s a joke Ryan Barrett is right here Where is Ryan? Ryan, raise your hand, ’cause I can’t see you There he is Ryan Barrett is at Google on the App Engine team He is, I believe, in charge of the interface between App Engine and BigTable And just a very generous guy with his time when he knows someone’s trying to do something with App Engine A very humble guy A very friendly guy And Erick as well Great guys at Google Let’s see You’ll notice my corporate affiliation It’s unemployed Case anyone cares to help remedy that, let me know Afterward, Tony and I may do a startup, but if that collapses, then I’ll be talking to you guys So this is a preliminary report What does that mean? We thought we had this thing nailed down months ago, and then we just kept finding ways to improve it And, you know, you can’t resist those things Once you find one, you have to do it The only problem is once you prove it, you have to prove it correct again And as we’ll see, much of the challenge to this algorithm– It’s a very simple looking algorithm Don’t let that fool you Distributed algorithms and those involving multi-threading and distributed…together are very, very difficult to debug They’re basically impossible to debug You have to prove them correct So this isn’t so much the algorithm, but the algorithm plus proof of correctness Yesterday I was really checking it to make sure it was really right and didn’t have any extra parts and the proof was tight and the algorithm was tight, and I’m very convinced now But we didn’t want to release it to you till we, you know, all looked at it and made sure that was the case So we’ll be releasing this probably in the next month or so–the paper– and Erick has been following along with an implementation as we’ve been revising the paper, and we hope to get that out somewhere around the next month or so Sorry, Erick, for inventing the vaporware and signing you up for something without asking you Erick is away at a wedding and Simon’s in Europe and Rob’s in New York, but Ryan is here and I am here What is the fundamental concern that hits most engineers when they start writing software? There are many concerns, but two of the fundamental ones that engineers run into is correctness and performance Correctness Is the output what you really wanted? Performance How much did that cost? There’s no getting around this This is a timeless problem What I don’t like about most talks about distributed computing or transactions or databases– anything–systems or almost any talk you go to in computer science, they’ll start talking about, “Wouldn’t it be cool if we could enforce the isolation of these objects,” or, “If we could do these transactions faster…” And it’s like, “Why do I need transactions? “Why would I want them? “Do I want transactions? No, they’re annoying “I don’t want to deal with transactions I just want to write my code.” Right? So when I’m presenting something, you got to start with what you actually want So instead of a script going forward, this is more like a makefile going backward from the ultimate goal The ultimate goal is I have a program that I would like to run And it should be correct to do what I want, and it shouldn’t cost very much And I think basically all talks should start this way But from here we can motivate why you need transactions And Brian said, “You know, people at this talk, “they’re really gonna want to see the details “They’re gonna want to know the details of how this thing works.” I go to a lot of technical talks, and I’m not an idiot And I’ve been to a lot of talks at Berkeley where some incredibly detailed thing that I can’t follow after two minutes, and then I have to sit there, ’cause I can’t leave the room ’cause it would be impolite for another 40 minutes, and, “oh, my god, I’m gonna fall asleep.”

I don’t want to give a talk like that So the details of this algorithm, they’re subtle, the proof is very subtle I’m not gonna pretend to give it to you during this talk I’m gonna tell you why you should care You can read it yourself or you can trust us The implementation will be open source But hopefully what you’ll get out of this is why do you care about transactions, why do you need them for distributed computing, and why this is the future and you absolutely cannot avoid learning about it There’s no more avoiding learning about this By the way, this talk was originally written to be, like, about 25 minutes, ’cause I gave it at CodeCon 2009 If you guys never been to CodeCon, be sure to go next year It’s more like a party It’s a un-conference It’s kind of the opposite of how managed this is A lot of good stuff is there Most people don’t hear about it So if I’m giving a talk and I’ve got it designed for 25 minutes and I have 60 minutes, what I’d really like you guys to do is just raise your hand if you have a question And I’ll probably wait to the end of my sentence or the end of my paragraph, and then I’ll probably take your question If your question is completely off the wall, then I’m gonna maybe say, “Well, let’s prune that, or talk about it offline,” but go ahead and feel free I can’t stand sitting in a talk not knowing what’s going on, so raise your hand When you do it, you all look very small, so raise your hand high like that You know, don’t do this, ’cause I’m wondering, are you fixing your hair? But feel free. Please It’s much more fun to have a conversation Correctness is our primary concern, and it requires invariants A lot of people who didn’t spend a lot of their time in a computer science or math department don’t think about invariants, but if you really want to think about the correctness of your code, especially in domains where it’s getting harder and harder to debug code, distributed parallel stuff, it’s just impossible to debug it ‘Cause you may have a bug, and it’s just unreproducible How are you gonna debug it? You’re not gonna run it in debugger Unless you’ve got some really clever– There are some clever infrastructures people build to try to make them reproducible, but then you got to learn that So the way to think about the correctness of your code is, what doesn’t change while all this stuff is changing? I’ve heard someone say, “The essence of software has changed.” So you want stuff to happen, but you also want some other stuff to not happen So computers make it easy to say what you want and have it happen They also make it real easy to say what you really did not want and have it happen This is a fundamental problem So invariants are sentences that are always true they do not change when all else is changing I should have put one more thing on this slide You decide what should not move You know, my data structures maybe have invariants I’ll give a few examples You initialize them when you construct your data structures or your objects You maintain them during operation as you do things Okay? And the third thing you want to do is, you want to pick invariants whereby you can insure the correctness of your code if you know they’re always true And you’ll see what I mean by that when you see the examples If you aren’t thinking in terms of invariants, start now Software is becoming more and more and more critical to our infrastructure Bugs are just more and more and more devastating People die now ’cause of software bugs Okay? It’s not a joke So you have to start thinking about this If you get nothing else out of my talk, remember this– invariants and code correctness Here’s an example invariant You have a doubly-linked list This example is due to Scott McPeak People think you can’t do automated proofs of correctness, but you can now His PhD thesis at Berkeley was a C compiler that could prove some memory drivers in Linux memory save Some, sorry, driver’s memory save in Linux It’s the sort of thing you should start thinking about This is really a talk about correctness So the invariant that he liked to give a demo of was a doubly-linked list You either want x->next to be null, or you want x->next->prev to equal x If that is ever not true, you’ve got a problem Further, you don’t just want it to be true You want–If it’s a multi-threaded program– Sometimes you have to modify your list, right? You have to add an element in the middle of your list or at the end Temporarily, this invariant may be violated What you really want is no other thread can see that, and you want that when you violate the invariant, you don’t end up in this violated state, that you get to another good state Those are isolation and atomicity, which are two properties we want of databases, right? But it’s not just about databases anymore Transactions and correctness, it’s not just about databases anymore It’s about all of your software Many other data structures have similar invariants You know, “Don’t turn off your computer “without shutting it down, ’cause your file system could get corrupted.” Boy, that was a good idea Let’s design our software that way No. Let’s not do that anymore, okay? It’s because they’re missing the transaction semantics in the file system Here’s another invariant Conservation of money You’re implementing a bank When you transfer money from Alice to Bob, no money is transferred, right? You’re just subtracting an adding There’s an illusion of transferring an object, but why does that illusion persist?

The illusion persists because there are invariants that are maintained For example, the sum of all money does not change If money goes away somewhere, it has to show up somewhere else Local conservation of money is what makes money work If we could just invent money, it wouldn’t be money, right? All right We’ll get back to that invariant, that example Scalability We’d really like our software to scale now And that’s why, you know, lots of people come to Google App Engine conferences It’s the whole thing Google does is, “Let’s do all this cool stuff and make it scalable.” This is a great idea, but you’re not gonna do it with a big computer Gee, Google must just have this really big computer, right? That’s how they do all that They just have this huge, honking computer No, they don’t They have a deconstructed semantics They figure out how to deconstruct what they’re doing so they can spread it out over a whole bunch of little computers There’s no way around that There’s no such thing as this mythological, big, powerful computer Unbounded performance scalability The people who are enthusiastic about App Engine, it’s great If you write your app the right way, it just scales and scales and scales and scales, and it just keeps going That’s awesome, isn’t it? And there’s only a finite amount of stuff you have to deal with to make your app do that, but there is some stuff you have to deal with So the illusion many of us grew up with, you know, I grew up programming an Atari 800 in BASIC, and, you know, you’re programming this computer It does what you want But it was single-threaded It didn’t turn itself off at random times The hardware– I had one piece of hardware and it never failed I suppose it could have But in a big, distributed– If your app is scaling across this huge data center, computers that are running your app will be failing I’m sure there’s Google people here besides Ryan They can confirm that Google’s just losing machines every second, probably They’re just losing them constantly Things become disconnected Things go away So distributed machines have, if we can call them that, or clusters of machines, have the following characteristics, which are very annoying They’re not reliable, as I said They’re not serial Many threads are happening all at once Most people, the way they write data-structure code, if there were two threads manipulating it, it would be a mess You don’t want that And they’re non-synchronized There’s no single ringmaster coordinating everything, making it all coordinated And not only that, there’s not even a global notion of time If you’ve got a distributed algorithm, and it’s got a global notion of wall-clock time in it in order to insure its correctness, you’re gonna have a bug Something’s wrong This is the future You might as well learn it This is what we all have to deal with, but it’s a finite problem to deal with Especially if you have [clears throat] Well, okay, so–shoot I thought the next slide was something else Okay, distributed computing makes maintaining invariants hard We have to maintain invariants in order to have correctness We want performance, unlimited scalability, so we need to distribute this When these two meet, it’s difficult Here’s why Alice sends $10 to Bob This time it’s $10 I don’t know why Okay, step one, add $10 to Bob’s account Step two Oops, we didn’t get to step two Process times out and machine crashes $10 is never subtracted from Alice’s account If you’re the federal government, this is not a problem, ’cause you can just create money, as we’ve noticed recently They can create a lot of money But most of us are not the federal government We don’t work for the federal government, so we’re not allowed to do that How do we solve this problem? Transactions Transactions are where maintaining invariants correctness meets unlimited scalability It’s called a “good” state, a state where all of your invariants are satisfied Your program is in a good state It has all the properties the user wants Good. Leave it that way Don’t change anything If the user says, “I’d like you to compute a service for me,” no Don’t change anything That’ll preserve correctness The only problem is, it won’t give you any performance, so in order to get performance, we have to temporarily violate invariants You know, we have a data structure It’s supposed to point like this Well, you’re gonna change one pointer, then you’re gonna change the next pointer In between, you’ve violated the invariant If that state were visible to someone else, that would be bad So what we want to do is, we want to have a set of operations that takes us from one good state where all our invariants are satisfied to another good state where all our invariants are satisfied I’m used to thinking this way, but maybe some people aren’t People talk about the design space or the state space You can imagine all the variables in your program, all the pointers and everything, it’s this huge, horrible, high-dimensional space There are certain islands in that space which are the good points, and what you really want to do, and everything else is horrible, right? Badness And this space is, the good set, is very non-convex It’s very non-local You have to go through badness to get to other goodness, okay? So there’s islands of good states, and you would like to hop from one island to the other and not fall in the ocean If you think that way, it’s very easy So a transaction is a set of operations to get us from one good state to another

If they satisfy all the transaction properties, that is You first start hearing about transactions in school, and they say, “ACID properties. ACID.” And they say it over and over and over And they say it in every class You get books on transactions, and they’ll say–even Gray’s famous book on transactions, I actually read a lot of that He says the ACID properties, and then the next page he says them again Sometimes he says them twice on the same page Why doesn’t he say it one time, you know? And then I started thinking, you know, why are these four properties enough? Like, are these four properties what we really need? I’ve never seen this ever written down anywhere, so I wrote it down Actually motivate–Why do we need these four properties? If you’re hopping from island to island, first of all, when you hop to an island, you’d like to actually stay there That’s called durability It’s got to have a big, fancy, Latin name, because we couldn’t explain anything simply I don’t know why That’s academia So, atomicity We would like it that you don’t fall in the ocean Okay, that’s atomicity You either hop there or you don’t It’s okay if you try to hop to an island and your plane gets cancelled and you stay at home That’s okay, ’cause you’re still on an island Isolation is that nobody else can observe your in-between state between one island and the next island These are really the same property These are like, there is no in-between time, semantically, between for yourself, atomicity, or others, isolation Consistency Jump, hop from one island to another island Don’t hop to the middle of the ocean That’s really the responsibility of the layer above, and durability is really the responsibility of the layer below So all we really need to concentrate on when we’re using, say, Google App Engine Underneath is a lot of good infrastructure by people like Ryan, and others I’m sure, that provide durability and a lot of other properties So you can count on App Engine to keep your data once you’ve put it, and that put comes come back You know, it’s replicated We heard earlier today in a different talk, it’s in at least three different places, and it’s geographically distributed, good I think that’s good enough It’s up to you as the application developer to hop from good state to good state, to say, “Run this transaction Take me from this state to another good state.” But the in-between layer, the transactional layer I’m gonna talk about, the distributed transaction layer I’m gonna tell you about, we need to be worried about atomicy and isolation And here ACID is again I don’t know why I put that in All right So no questions yet so far? Nobody’s like, “What is he talking about?” No? Okay Local transactions Google App Engine provides local transactions They provide some transactional semantics These guys aren’t dumb They’re at Google They said, “Obviously people need transactions.” But for various underlying implementation reasons that Ryan could tell you a lot about, and I think he actually will in his next talk It really helps them to localize the transactions in space, or what we call space in data So, in other words, when you make an object in App Engine, you can group it with other objects, and make something called entity group Once you’ve done that, that’s it You get to pick the entity group of your object at object construction time You can’t move objects from group to group There’s no such thing Okay, so these entity groups form a partition of your data And I think the suggested size in the documentation is, “Well, that’s enough for one user’s data.” Um, yeah, okay, but maybe my users would like to interact There’s this Web 2.0 thing Okay. Maybe I’d like to build a bank You can’t build a bank on App Engine right now, because you can’t put everybody’s bank account into one entity group that’ll overwhelm I mean, you could, technically, but something would break, because it would overwhelm they’ve assumed the entity groups have a certain size, and it’s not that big It’s about the data for one user, like I said That’s what they say So how do we solve this problem? We would like transactions, but with local transactions, you run a transaction, you’d better only operate on data in one entity group, ’cause you try to operate on two, you get an exception, I believe, right, Ryan? You get an exception if you try to–Yeah And also if you try to run queries, you can’t do that either We’re not gonna solve– So we’re gonna solve the first problem, not the second problem So those of you who come from the relational world, everything’s done with a query You can’t look up an object without running a query In App Engine, if you have the key to an object, you can get it and you can put it without running a query There’s something else called a query We’re not gonna solve the problem called “You can’t run queries in a transaction on App Engine.” That’s future work We are gonna solve the problem called, “You can’t do transactions “across a set of objects that span more than one entity group.” We’re gonna solve that problem You haven’t been able to do that until now Any questions so far? That’s the problem And why you should care Solution Well, this is the algorithm That’s it Fits on one slide

I’ll read it to you The first thing is What we’re gonna do is, we basically want the user, when they run a transaction, they’re gonna read some objects, they’re gonna read something, maybe from the user, and they’re gonna write some objects We want that to look to all other threads as if it happened instantaneously So the database was in some state, and then when the user– they mapped basically reads to writes, and that happened in one instant in time Now, it doesn’t really happen in an instant in time, so we have to provide that illusion Semantically, it happens in an instant in time, but we’re separating that from how it actually runs So what we do is, first of all, when the client says, “Here’s a function and here’s some arguments, run this in a transaction for me.” That’s the interface You say, “Run in transaction.” Hand it a client function When the client asks for reads, when they say, “Read these objects from the database,” we read them then, but we record the version number of the object read All objects get a version number And they can’t roll over I’ll tell you about that later Then the client function says, “Hey, write these objects back to the database.” We don’t do that That could be bad That could break something So we’re not gonna write any of your data So we said instead we store the writes in these shadow objects, and if you try to write a user object in the client function, we store that, what you wrote, in a shadow object in the same entity group as the user object So we spread your data out It’s in many different entity groups It’s cool It’s all gonna work Now the client function is done I should have put that in here So the client function is done The map it computes from reads to writes, that’s now a static object of finite size This is very handy for us So now what we do is, we get write locks on all the objects we’re gonna write Now if we just do that in any order, what can happen? ‘Cause write locks exclude each other, so no two distributor’s transactions can have a write lock on the same user object at the same time What can happen if you just get write locks? Deadlock Somebody said it Deadlock Because, “I need this,” and “You need that,” and we all grab, and we just sort of wait, ’cause none of us can get what we need to finish, but we’re not about to let go of the resources we have, so we just stay there in deadlock All right, so instead, the standard algorithm for getting rid of that, sort your objects, get them in increasing order You can’t get a cycle, because your wait-for graph is only pointing up the order Then we go and check the version numbers on all the objects you read to see if they still have the same versions they had when you read them Now some of you are saying, “But that’s a race condition, Dan They could change after you check.” No, it’ll work Trust me Someone should say, “Race condition?” Come on You guys are just sitting there It’s much more fun if you say something Okay. We also check not only is the version the same, but nobody else is in write lock on that read object And then we’re gonna go do something later It’s not a race condition Then we go and we take all the shadows and we copy them, stomping on the state of the user object We make your write actually for real We copy the shadow object to the user object state We update the version number, which is actually the ID of the distributed transaction object IDs are guaranteed never to repeat Ah-ha So version numbers can’t roll over And then we delete the write locks and the shadow objects leaving no garbage There’s a very subtle, very rare condition under which we can still get garbage in this algorithm I’m working on it If I told you, Ryan would say, “Dan, that’ll never happen.” Well, I don’t know what he would say Sorry, Ryan But I’m obsessive about these things See, that’s what you want You want someone designing this algorithm to be obsessive All right, there’s a lot of things you can do That’s all the correctness depends on There’s some things you can do so that transactions kind of try not to stomp on each other Ah. Yes We have a question Thank you I’ll repeat the question If I got write locks, why do I then check that the objects don’t have write locks? I get write locks on all the objects I write For all the objects I read, which could be a different set of objects I check the version numbers are still good, so no one else has written it, and no one else has it locked No one else has the objects I read write locked ‘Cause you could read some objects and write others You tend to read and then write the same objects, but it could be separate sets of objects Good question. Thank you Yes And if you’re near the mic, please use the mic man: This is similar to phase commit Wilkerson: It’s–yeah There’s only so many good ideas in this world man: How do you prevent a client from crashing? Wilkerson: I’m not gonna patent this. Yes? man: How do you prevent the locks from being held indefinitely when a client crashes?

Wilkerson: How do I prevent the locks from being held indefinitely if someone crashes? Very good question Okay So a lot of the complexity of the algorithm goes into these concerns This is very simple, but there’s lots of things that can go wrong I’ll actually talk about that later, but I’ll answer it now Which is, if you grab some write locks and then you just crash, your thread crashes, first of all, all the state of the distributed transaction is stored in the distributed transaction object, which is also in the database, and we don’t start rolling it forward until it’s in the database So you grab some locks, and then your thread goes away Someone else tries to lock that object, they block, ’cause you have the lock They can check your creation time of the– So distributed transaction one gets a lock on an object Distributed transaction two tries to get it Distributed transaction one’s thread is timed out or something Distributed transaction two– Since all the state of the distributed transactions is in the database, its thread can pause, go and become the thread that’s rolling forward this one that has the lock, roll it forward to completion, then go back and complete itself So the entire state of the distributed transaction is in the database as well So different threads can kind of switch off and roll forward different distributed transactions if they’re blocking That’s the answer Make sense? Transactions and cooperate If you are reading an object and, hey, it has a write lock, you probably ought to wait That’s pretty much it That probably works This is all not as easy as it looks In case you’re wondering, why’d this guy get to talk at Google IO? I could have done this in a weekend Deadlock prevention, yeah, you’ve got to do that But deadlock prevention prevents a lot of other cool stuff If you’ve got to sort all the objects you’re writing, what that means is the client has to be done writing them So you have to do it after a client’s done writing, and these restrictions begin to accumulate ongoing progress There’s a lot of things that can cause There’s a lot of ways to deal with ongoing progress This is–it’s a big topic One is your thread can go away Someone else needs to be able to roll you forward But what if no one does try to get a write lock on one of your objects and no one tries to roll you forward? Well, you need a background thread that’s looking for old distributed transactions and rolling them forward Also, how do you prevent the user from accumulating unsatisfied distributed transactions? We thought a lot about how to, or I thought about how not to One of the things Gray talks about in his book on databases, he has this huge book the size of a calculus book on databases, and then he has this one little aside It’s, like, a page and a half He says, “But this reality can become completely decoupled from what’s in the database.” For example, something that actually happened, in a branch office of this, like, bank, they took all the records, and they hid them in the ladies’ room And they never got entered in the database So keeping your database synchronized with the user interface is actually part of our paper So we actually record the user, the request of the distributed transactions, so that, A, well, when you get another request from a user, first thing you can do is query for all the pending distributed transactions they haven’t finished and roll those forward first, so the user has an experience of transactions committing in the order in which they requested them I think that’s important, you know In algorithms class they’ve never probably talked about that, but this is reality Concurrent roll-forward The second half of the algorithm, where once the client function is finished If other threads can come and start rolling your distributed transaction forward, there can be multiple threads doing that, right? So that means your entire roll-forward has to be thread-safe But how do you do that? Because I have to get locks, and then release them later That’s not item potent, right? Item potent means I do something and then it’s done, I never go back Or like a light switch You switch it on, no matter how many times you switch it on, it’s on It’s not a toggle switch Toggle switches are not item potent They’re also not thread-safe Because if I went and turned the lights on, another thread could come and hit the button also and then turn ’em off And, no, I want ’em on, and they come and turn ’em off again I want it on And two people trying to turn on a toggle switch, they could fight with each other and keep turning the lights off This is not thread safe, so you need your entire process of getting the locks and releasing them to somehow be thread-safe And I’ll leave that as a puzzle for the listener How do you get a lock and then release it in a way that’s item potent? Erick Armbrust came up with a way to do that It’s really clever By accident, actually, in a sense We didn’t realize he solved that problem Proof of isolation We need to guarantee all these properties where no one can see things happening in between So, deadlock prevention, we talked about that Get locks in a certain order Ongoing progress, oh, yeah We want read-storms I think Ryan once said in a talk, found that Google has ten to a hundred times as many reads per writes in web apps So this is optimized so that if you have a write and lots of people are reading that object, those reads can’t keep out your write Some of this I already talked about,

rolling forward when you’re blocked We take care not to create garbage, which is actually kind of hard to do I talked about concurrent rollforward, so when we’re getting locks, we want to proceed– We want the state space of our distributed transaction to proceed monotonically through some big state space where there’s only one path, so multiple threads rolling it forward won’t matter, ’cause they’re all going along the same path So they’ll all get to the same place And this is more about that So this says the answer But can anybody else figure out how, looking at this and thinking about it, how do you release a lock in an item-potent way? When we created the need for the lock, we wrote a shadow object So first you have a shadow object Then you have a shadow object and a lock Then you delete the lock and the shadow object at the same time in a local transaction So those are three distinct states that actually aren’t repeating Even though the lock is getting set and then released, the whole state’s base is still monotonic Locally, it looks non-item potent, but it isn’t Strong consistency Eventual consistency is this new fad in distributed algorithms where you don’t have to actually make sure everybody else finds out when you update objects, but that’s a real pain to program too Thankfully Ryan is nodding Thankfully the good people at Google on the App Engine team have provided us with a strong consistency, which means if I update an object and it returns, then anybody else I talk to subsequently, if they look, they’re gonna see that update There’s something in between called causal consistency, where I can tell certain people and they’re guaranteed to know You could actually do our algorithm on top of causal consistency, but we have strong consistency, so we don’t need it Without strong consistency, it’s very, very funky what does time mean in a distributed algorithm Local and distributed transactions don’t mix Local transactions are obviously cheaper They have less mechanisms So Erick has carefully implemented this algorithm so that if you only need distributed transactions for certain objects, you can have them only for those objects and not use them for other objects Also, you really don’t want to mix them accidentally, because local transactions don’t honor our distributed transaction locks, so there’s no way with the way Erick has it set up, at least the way he told me, is there’s no way to use distributed transactions on an object without entering the distributed transaction infrastructure You can’t use a DT If you do, you’re objects suddenly become DT flavored, and you are using distributed transactions We did a lot of work to try to prevent you guys from falling into holes Like, if you use the algorithm just right, everything’s fine, and then if you do something wrong, too bad for you, ’cause you’re the stupid user No We didn’t do that We really tried to make it so that it’s fool-proof, so that you guys don’t fall into holes Also, there’s a thing if we’re buffering writes from an algorithm I read, object x, I get x is 1 Then I write x is 2 Then I read object x again, I’m gonna get x is 1, ’cause it’s a buffered write So we prevent that from happening We don’t allow you to do read after write We also don’t allow you to do write after write, but again, this allows the illusion of correct data flow in your client code, so that you’re not– we’re not messing you up Even though we’re messing with the data flow, you can’t see that So you can’t mess yourself up by using our infrastructure that way Also failed DTs throw an exception Oops That’s not true any more Anyway, we leave distributed transactions around after they’re completed, so you can query them See, distributed transactions are They’re synchronous issue and asynchronous complete You don’t know when they’ll complete, ’cause your thread might time out So they stay in the database so that other threads of your application can actually report that to the user This action you took, user, I’m imagining an Ajax UI We’re in another window You say, you know, “This action you took succeeded “This one succeeded Sorry, this one failed “You might want to retry that “But since the database has changed, you might want to check it out before you retry it.” So you can really have a very professional conversation with the user “These are exactly what succeeded, This is exactly what failed.” The user doesn’t have to guess And then the client code, the calling client code, can delete the distributed transaction objects when it’s sure it has informed the user, and the user knows what they need to know So this is a way to really do crisp apps with very clean semantics, where there’s nothing funny like, “Oh, I did something, and, well, it didn’t happen I wonder why.” You know, you can find out The user can find out So the whole vision is to write enterprise-ready apps where you can write a bank, and, you know, if you try to transfer money and it doesn’t happen, you can tell the user precisely, “That did not happen For sure it didn’t happen.” Not, “Oh, this seems to– I guess it doesn’t seem to have happened.”

We don’t handle queries That’s because queries are hard The ability to handle queries depends a lot on the semantics of the query predicate Also even local transactions in App Engine do not honor queries Ryan is nodding his head But soon maybe some of them? Oh, sorry Maybe I shouldn’t have said that Oops. Sorry No [laughing] They don’t handle So queries are really tough And one thing you can do, and I’m not quite sure what the semantics is, but you can do a query and then take all the objects you got and mark them as read in a distributive transaction, and that will give you something But I’d have to think exactly what it would give you We need future work on that Maybe Google will hire me to work on that I don’t know They have a hiring freeze He tried to get me hired, but they have a hiring freeze ACID correctness example isolation Ah, yes Here’s an example of why you need ACID This is from Simon So distributed transaction one gets write locks on two objects, and has written object A It has written the new balance for Alice, but not the new balance for Bob The other distributed transaction– Oh, and then DT1 is paused I don’t know Gets swapped out or gets scheduled out DT2 now comes and reads the state of A and B So it sees that Alice has lost $10, but not that Bob has gained $10, or whichever DT2 has read a state of the database that does not satisfy the invariant DT2 must die It must be doomed We cannot let it complete We must abort DT2 And we do guarantee that if this happens, DT2 will abort, and the reason it will abort is because when DT2– when it goes to commit, when DT2 checks the version number for object B, it will fail Well, there’ll be either a lock on object B, or the version number check will fail Oh, yeah, this is– Yeah, there’ll be a lock, so it’ll fail Anyway, there’s various different ways it can fail depending on when things get scheduled, but it will definitely fail Here’s that example written out in painful detail I’m not sure if you really want me to go through this example Anybody want me to go through it? Wow. Nope. Okay Future work We’d like to do queries It’s very dependent on the predicate We need the– Do we really need to rely on underlying strong consistency? Google App Engine provides it, so that’s cool, but maybe you could use this on other infrastructures We don’t actually provide strong consistency to some degree You have to do queries to find out when you issue a transaction, you know, did it complete? So you can kind of do it, but you’ve got to keep asking, “Did this distributed transaction complete?” So you can probably get it if you work We can add that Performance I’m not an expert on the deep, underlying layers under Google App Engine, but Erick Armbrust at least did one really cool optimization where shadow objects actually aren’t manifested as objects They’re just a string which is in the protocol that Google sends over the wire to other machines So there are people at Google– The reason to do this is open source, is because, you know, people at Google can help me get it right, A, and, B, do the cool optimizations underneath, and, C, I get to come talk to you guys So…conclusion Ryan said, “You have to have a conclusion slide.” What does it say? “Distributed Transactions on Google App Engine exist.” Or sometime in the next month or two we’ll get them released And let us know if they help you, or let me know, I guess, or write any of us, I suppose Probably Ryan would want to hear from you as well Yeah, he’s nodding his head That’s pretty much it I’d love to take questions if you have them Thank you [applause] Yes, sir Would you guys mind using the mics? They asked me to ask you to use the mics I can repeat questions if you’re in a wheelchair or something, but man: So is this gonna be a library that’s released that we can use in our application, or is it gonna be built in the Google App Engine itself? Wilkerson: Erick Armbrust is releasing his implementation as a personal, open-source project at Google So it will be released as part of I think, in fact, he’s gonna call it Tapioca Or Tapioca ORM, which has an R in it for “Relational,” even though there’s nothing relational going on, which drives me nuts, but anyway I couldn’t get Erick to change that Yeah, so it’ll be a library You’ll be able to– It’s for the Java version of App Engine right now I asked, “Will this work for Python too?” They said no Sorry

So if you’re writing App Engine apps in Java, you’ll be able to use it as a library, as an open-source library Yeah Please Hey, Ryan Barrett: Hey, Dan So you talked a little bit about roll-forward And so most people, myself included, have more background with rollback than roll-forward Do you want to talk about the difference between the two and why this is roll-forward? Wilkerson: Oh, right Ryan’s the one who suggested that this be an optimistic transaction protocol, and I don’t know if I would have thought of that without his suggestion In fact, he had to suggest it to me repeatedly, ’cause I kept trying to do it another way So that was very helpful So thanks, Ryan And I think it’s because of that that we have roll-forward Basically, when the client function is running, if anything goes wrong, abort Once the client function is done, now the client function has basically created a map where it did some reads and it wrote them to some writes It’s just a finite diff on your database And now all we have to do is apply that diff Preserving atomicity and isolation Well, the entire result of what the client did, that whole diff, the change to the database, that delta to the database, is written in the database as objects in the database It’s in shadow objects It’s in the distributed transaction object There’s no need– Okay, so It still might fail, because there’s checks We have to get locks We have to check read object version numbers, and we have to get locks on the written objects If any of that fails, then your transaction’s gonna abort, and we’re gonna throw the whole thing away But once you have locks and you’ve checked the version numbers, at this point, nothing can go wrong And my way of thinking about what is two-phase commit, in the first phase, do all the work, and get it set up so that nothing can go wrong And phase two, throw the switch That’s basically what we do Once you have all the locks, you’ve checked all the version numbers, nothing can go wrong So if you time out, which happens on App Engine all the time, right? App Engine comes and times you out There’s no reason that we have to abort your transaction In fact, once you’ve started copying shadow objects to written objects and you’ve copied a few of them, you can’t go back It’s in there You must roll forward to completion There’s no going back There’s no way for us to undo that So if you time out, and another thread locks on the same objects or a background thread says, “Hey, what’s this old DT from yesterday doing lying around,” those threads can roll you forward to completion, and there’s nothing that can go wrong other than being delayed by more timeouts Rollback would be, I suppose, if, I don’t know I had to learn about databases in order to– I’m trained as a theory guy and a software guy and maybe PL guy, but I’m not a database guy, or wasn’t I don’t know how rollback works, actually, so whatever it is, we don’t do it, ’cause I don’t understand it Yeah. Please man: Hey I have two questions Wilkerson: Please man: I don’t quite understand what you mean– Wilkerson: Could you speak a little bit closer to the mic? man: Oh, excuse me I don’t quite understand what you mean when you say the DT is synchronized when it’s issued, but not synchronized Wilkerson: When it completes man: Who sees what where? I mean, what is– Yeah Wilkerson: Very good Thank you Great question If I’m the client code, I have two parts I have the caller code, which says, “Gee, I need to update– I need to send $10 from Alice to Bob.” Then that code, your code, calls, “Run indistributed transaction,” my code, but you hand me a function called transfer, with arguments Alice, Bob, and $10 So then I run your function for you, okay? So that’s all synchronous You called me, I called your function back You know, that’s normal code If function F calls function G, function F pauses, function G gets a new stack frame Function G returns Function F then resumes That’s all synchronous That’s what you’re used to However, if you tell me to run your function in a distributed transaction, what can happen is while that’s happening, your code can take so long or there can be so many delays in the database, it can all timeout Google can come and just time you out And then boop, it’s timed out There’s nothing I can do about that That’s part of the constraints of App Engine However, that distributed transaction, if it got far enough, it’s in there in the database It just hasn’t applied yet So some other thread may discover it In fact, the user will probably hit reload, or, “What the heck?” And they’ll say, “What happened?” Somehow your user will come an interact with the database again Your calling code ought to query the database, find any pending distributed transactions by that same user that haven’t completed yet,

and go, “Hey, I didn’t finish this.” And then call us and go, “Hey, finish rolling that forward.” That might timeout again Some background thread might find it and finish rolling it forward In any case, the completion, you never quite know when it’s gonna happen, ’cause your code could take ten hours, and every ten seconds App Engine’s timing you out So there could be a lot of different threads that have to discover that, roll it forward a bit further Finally it completes It’ll complete who knows when, but whenever it completes, your threads may not be around Somebody else’s threads maybe have tried to lock one of the same objects and have rolled it forward for you and completed it for you Now it’s sitting there in the database, completed And it’s sitting there “I’m complete.” Then your user comes back and you find it lying around, and you go, “Ah, this completed asynchronously It completed some other time.” And there’s really no way around this If you write code that takes long enough, there’s no way I can prevent, in the presence of timeouts, there’s no way I can prevent the need for sometimes asynchronous completes to happen So it’ll complete If it completes synchronously, if you say, “Run this,” and it completes synchronously, you get the distributed transaction object back, and it has a return value This is my recommended idiom that they do in the implementation It would have the return value If it failed, it would have why it failed, the exception value It would also have the function you called So you could say, “You were trying to do what? “Oh, yesterday you were trying “to send $10 from Alice to Bob “Oh, let me put that in the UI “That thing you tried to do yesterday, “it completed Let me send that to you, user.” So you could complete synchronously, but if you timeout, you could complete asynchronously, and have to discover that fact later With a query I thought about how– And I’m not just a theory guy I thought about how to write the code to do that And there’s a nice way to do it, so you can discover things to show ’em to the user man: So you’re saying we’ll be able to query the distributed transaction itself to ask it, “Are you complete?” Or–? Wilkerson: You can query the distributed transaction infrastructure Go, “Hey, does this user “have any pending distributed transactions “lying around that aren’t done? “Do those first, “or tell me so that I can tell the user “Or are there some that are done, and I should tell the user?” You can query it for that man: Well, the second one was, if this works as you say it works, what would the point of entity groups be in it? Wilkerson: Because you might have something you can do with only local transactions Those will be faster Our infrastructure costs you Man, we really worked on it for months to try to minimize the cost Erick put stuff in there Rob thought of some optimizations We really tried to minimize the impact But if you can do something with only local transactions, go ahead, because they’ll be faster man: But, I mean, like, you could put entity groups around every single property, what you’re currently putting around, like, user spaces, and it’d be the same with distributed transactions Wilkerson: You could put entity groups– In fact, you can– Erick was like, “No, I want people “to be able to use them both “And if you have an entity group, some objects can be distributed transactions and some local.” I was like, “Erick, nobody’s gonna need that.” He’s like, “No, there’s this optimization Sometimes you got to group things for speed.” And he made it all work So you can have– Evidently, he’s told me you can have an entity group Some objects in there you do transactions to the distributed infrastructure And others you do with local But if you try to mix them, it’ll have to go to distributed for all of them That answer your question? man: Yeah, thank you Wilkerson: Cool We try to think of everything We really wanted it to work in reality for you guys I spent a long time trying to think of– Yes, please man: So are you saying that the application then has to check if there were transactions that completed but weren’t What is it called? Wilkerson: The completed asynchronously You issued a request man: Yeah Wilkerson: “In a distributed transaction, please transfer $10 from Alice to Bob.” But for some reason, that has to go recompile Mozilla Okay, so that takes a long time That doesn’t return in the timeout window You got, like, ten seconds, is it, Ryan? Something? Barrett: 30 Wilkerson: 30 seconds Oh, they got generous Now you got 30 seconds for it to do something, you know? But if it doesn’t come back in 30 seconds, then it might still succeed, but later If it does come back in 30 seconds, you get a synchronous return back man: So if it succeeds later, do I still have to have in my code checks to see if there are any pending ones and then ask for them to be completed? Wilkerson: Unless a background thread– Somebody else’s thread or the background thread may have done it for you, but you still have to check– Even if it got completed, even if someone else rolls them all forward for you and they’re all completed, you still have to check– “Hey, are there any completed tasks “that got done? “Oh, well, thank you Let me tell the user.” ‘Cause how can I otherwise tell you, right? man: No, I’m thinking if I don’t care, right? I’ve got this transaction and it either succeeds or fails, right? Wilkerson: It will eventually succeed or fail Unfortunately it can starve— Man: So I don’t need to do anything else? Wilkerson: If you want to absolutely make sure it doesn’t starve, and you’re kind of wondering the background thread might take a while to get around, you can do background stuff in App Engine now If you want to rely on the background thread to find old distributed transactions and keep pushing on ’em till they’re done, and that thing’s pretty aggressive,

then, yeah, there’s nothing you have to do It will abort or it will succeed man: All right Wilkerson: Yeah One or the other, eventually Yeah. Please man: Probably related to his question was, did you look at stability as a factor with these transactions? I mean, the bane in web applications for the most part is in instability of the web applications by complications, so Wilkerson: What do you mean exactly by instability? Like, a not reproducible– like sometimes takes a long time, sometimes a short time? man: Like applications going offline because they’re clogged up or they’re destabilized because of an overly-complicated component in the application server So did you take a look at that? And then related to that– Wilkerson: Well, hold on I still don’t understand What exactly does instability mean? What’s the exact phenomenon? Like, you mean did we look at if we make things take longer and therefore they’re timing out when they didn’t used to? man: They’re timing out, and then the work required to go remediate those transactions that are still left running in the system In other words, is there a consideration of stability in the application in your design center looking at this? And related to that is did you look at any other technology that’s an alternative to distributed transactions, like technology provided by business process execution languages that sort of compile, like, a business process into a you know, a simpler type of back-end mechanism? Wilkerson: All right Both these questions are kind of a little amorphous, so I’ll do my best In terms of instability, what we did is, we did the algorithm that has the lowest impact– Oh, I forgot There’s something I didn’t put in a slide for It would have been really easy to have read locks If you notice, our system has no read locks So web apps tend to be read-heavy, as Ryan says So one thing that– Several months ago, I had an algorithm, and I was like, “All right It’s done. It’s baked.” We’re way ahead of we’re done But then I realized I was getting read– I had these read locks, and I was getting read locks on objects which I’d read, which means if you read in the distributed transaction layer, that was turning into a write at the underlying local transaction layer, and I realized, “That’s really bad,” because writes are always way harder to handle than reads So we had to turn the whole algorithm around and spend a long time on this just so that reads at the distributed transaction layer turn into reads and no writes on the underlying layer So if we hadn’t done that, this would be a piece of garbage I think, probably And you would have a very good point You’d say, well, you just destabilized my app, and you know what? You’d probably be right But we’ve done everything we can to use the minimum resources to make this happen We have not implemented the latest thing and done lots of performance evaluation on it We haven’t done that So we can find out when you profile it Are there alternatives to transactions? Boy Could you not use transactions? You could try to fold the transactional semantics into the actual logic of your app in such a way that you just happen to manipulate structures in such a way that they’re kind of doing transactions and manipulations at once, and kind of save the cost of our layer Good luck I would never do that It’s just too hard I don’t know Alternatives to transactions, It’s an ongoing area of research I think, actually Can anyone–Ryan, you know a lot about this stuff Do you want to say anything about that? I think they probably want you to talk into the mic, but Ryan says there’s a talk tomorrow at 10:45 on offline processing, and to go see that And, in fact, you should probably stick around for Ryan’s talk, which is next He’s gonna tell you a lot about App Engine infrastructure Yes, ma’am You have a question? And, sir, if you want to get at the mic, I can Go ahead and get to the mic Yes, sir Yes, ma’am woman: In Java, you can specify something to run in a transaction and something to not run in a transaction, right? You provide two interfaces Wilkerson: You can specify this– “Do this in a transaction, and then don’t bother with this other stuff,” yes woman: Can you put a little more highlight about that? [speaks indistinct] In a transaction, and to run in a transaction? Wilkerson: Oh, boy Can you say that again? Did any– I just didn’t really hear what you said, so maybe you can woman: So suppose you specify something to run in a transaction, and some things to not run in a transaction Can you give me what will be the difference in the state of the two of them, and what will be the difference between them? Wilkerson: Oh, how to know what to do in a transaction? Well, you explicitly say,

“Here’s a function “Run that in a transaction “with these arguments And then here’s another one Do that one.” So you are explicitly telling the program what to run in a transaction Now the question is, How do you know what to run in a transaction and what not to run in a transaction? Basically, as I said earlier, you want to figure out how you guarantee the correctness of your application What things always have to be true? The invariants Like your data structure You have a doubly-linked list Pointing like this, okay? Whenever you go to– If you start in a good state, you know– At this point in my code, I’m in a good state All of my invariants are true At this point, I’m at a good state again All my invariants are true But in between, I might have to temporarily inviolate an invariant I have to, like, delete this pointer, point it somewhere else Uh-oh Then I have to point that pointer back to there Ah, now we’re good again That, from a good state to a good state, that should be in a transaction Yeah Very good question And there’s a lot of literature on that You go look at an undergraduate database’s book, they’ll probably say a lot about that Sorry. Sir man: Sure Let’s say I built my application, and I have various types of objects to find and I’m about to, you know, I develop all my code using regular, in-built Google transactions, and then I want to add– But I know it’s not reliable, because I can’t guarantee that things will all commit across these various operations You know, either all or none So then I add your framework and your distributed transactions How much longer will it take for typical-use case where I’m updating a number of records, inserting some, deleting some You know, what’s the overhead of doing these distributed transactions? That’s one thing Wilkerson: What’s the overhead? I’ve got less than one minute, unfortunately Unless they want to let me go over Is there any chance to go over or not? No, she’s shaking her head man: The other is how intrusive is it to my code? You know, I’ve got my objects to find How much more do I need to add? How hard is it from an engineering viewpoint? I don’t want to worry about the details Wilkerson: I tried to make it as easy to use as I could If you need transactions to maintain the correctness of your code, there’s no–I don’t, like, you have to use them So there’s sort of a– Not using them is a bit of a red herring It’s like you’re waiting for a disaster And in terms of the performance, I’ve got 20 seconds or so You know, for every read, we actually do two reads For every write, we actually do three writes And if you have a read and a write of the same object, you get to save one of those reads So if you read and write the same object, they have to do three writes and one read So, yeah There’s a constant multiple on how many reads and writes are going on, but it’s as small as we can get it I’m pretty convinced myself that there is no shorter way to really do it And now I’m over. Okay The nice lady in the back is telling me– No? One more? Okay man: I was just wondering if there was any resource constraints on the distributed transaction table Like, can you have a limited number open at any given time, or is there a rate of insert limit like there are on other tables? Wilkerson: It’s gonna be– I mean, a distributed transaction object is just an object like any other object, and it has any limits that Google App Engine has decided to put on the number of objects you can have man: So like any other entity, basically Wilkerson: It’s one more entity, yes And they do go away So they are increasing the amount of resource usage of your app, but when the transaction completes and the client code reads it and says, “Yep, that’s complete Done. Delete that thing”, it shouldn’t be using– There shouldn’t be any garbage There’s one very rare case, where I’m trying to get rid of the– It’s an extremely rare case where you might get a little bit of garbage All right It looks like I’m out of time, right, ma’am? Yes, I am. Thank you [applause]