Dropping ACID: Architecting with Eventual Consistency in the Cloud



Well, good afternoon, everybody. My name is Jason Bloomberg; I'm president of ZapThink, a Dovel company. ZapThink has been around for about a dozen years. We started as an industry analyst firm and became a training firm focusing on agile approaches to architecture. We were talking about SOA when SOA was the hot topic; increasingly we're talking about enterprise architecture and cloud computing. We were acquired last year by Dovel Technologies, a US government contractor, but I'm the lucky guy who gets to travel the world speaking at conferences while everybody else is stuck in DC, so they're all jealous that I'm here.

In ZapThink's 12 years of existence, we've really been focused on the enterprise: big organizations, whether public or private sector. The US government is now part of our focus, but we talk to enterprises all over the world, we've talked to many, many different enterprises, and we've found one universal truth: they're all completely screwed up. If you work for a larger organization, you might think your company is uniquely pathological, but that's the reality everywhere: bureaucracy, strange ways of spending money and making decisions, and of course incompetent people throughout the organization. That just comes with being an enterprise.

Now, here we are in Silicon Valley at this show, so my sense is that only some of you actually work for a larger organization. How many people here work for or consult for a software vendor? Oh, fewer than I thought; maybe about 10%. How many of you work for or consult for what you might call a web company, a company whose business is really based on the Internet? Okay. And how many of you work for or consult for an enterprise, a large organization, public or private sector? Okay, so that's interesting. Maybe it's the nature of this show, or maybe it just attracts the enterprise people. Either way, here's how we view the enterprise: the dark side. Most of you in the room know what I'm talking about.

So what do you have in enterprise IT? Well, you have legacy: big monolithic apps. Sometimes they're old, sometimes they're not, but they're big, expensive, and difficult to work with. They tend to run in single partitions on big servers, which makes them difficult to scale. If you have transactionality in a traditional database (the old world of SQL databases with ACID transactionality), that can be very expensive. And they have a single point of control, which sort of goes without saying: if an application lives in the IT shop of a large organization, obviously that shop controls it. That's the starting point. But this whole picture is changing, and cloud computing in particular is forcing enterprises to rethink this dark-side model.

So let's look at your applications in the enterprise. Here you go; I know you have one like this: spaghetti code. Nobody wants to admit it, but there's a lot of it out there. Older applications, maybe obsolete code, or code that's been tweaked over so many years by so many people that nobody really knows how it works. Or maybe you lost the source code, or lost the documentation, or there never was any documentation, or the documentation is all in Italian and nobody speaks Italian. That happens too. So what are we going to do? We're going to take this and put it in the cloud, and that's going to fix all the problems. The CIO read about cloud in BusinessWeek or something: "Oh, we've got to move it to the cloud, and that will clean up all our legacy issues. It's going to run faster; it's going to be elastic." So here we are: ZapThink's secret technique for moving spaghetti code into the cloud. There, we've got it in the cloud.

Unfortunately, there's more to it than that, and that's really what this talk is about. You have to think about how that application is architected. Moving to the cloud requires more than just picking the application up and putting it there; the cloud is not just some big server in the sky. You have to get into what's really going on to understand how to do this. So let's say the problem isn't quite that bad: we have a modern distributed application, written in Java, C#, whatever, that follows all the object-oriented best practices, so it might look something like this. Now the CIO comes along and says, "Let's put that in the cloud." How are we going to do that? Well, even with something like this, which isn't spaghetti code and follows modern programming best practices, you still have a problem: the cloud requires you to consider certain architectural issues that are, I wouldn't say unique to the cloud, but characteristic of the cloud.

And of course, since this talk is focused on data, we're going to focus on that part of the story. This is an important point: if you have existing applications in your legacy enterprise environment and you're looking to move to the cloud, for whatever reason (you want better elasticity, you want a pay-as-you-go financial model for infrastructure, you want to leverage platform as a service, whatever your motivations are), that is typically going to require you to rethink the architecture of your application. The good news is that you'll end up with a better-architected application: the principles the cloud requires you to follow are good practices in general, and we'll see this as we go through the next few slides.

So here's why the cloud requires this level of architectural rethink. Here's an app before the cloud: a traditional distributed application with our three tiers, the persistence tier, the middle application tier, and the presentation tier. We know how to deal with this; we know how to scale it, and we know about its various issues. What happens if we move it to the cloud? Well, we can put our persistence tier in the cloud. There are a number of cloud-based database vendors and other vendors at this show who will talk about how they can give you cloud-based persistence services, or how their tooling runs in the cloud, but essentially that tier is now elastic: we can scale it. Same with the application tier: we can put the application in the cloud, make it elastic, and provision it dynamically. So how many database instances does our application have? Well, we don't know. It could change from day to day, hour to hour, minute to minute, and that requires a new way of thinking about how we architect those tiers. Of course, we're going to be focusing more on the persistence tier this time.
So what are some of the challenges? Well, elasticity is perhaps the most important of the essential characteristics of the cloud. When we say we want the cloud to be elastic, it means that whatever resource we're talking about (servers, storage, and so on), we can provision as much of it as we need, dynamically and in an automated way, and when we don't need that resource anymore, we can deprovision it in an equally rapid, automated way. This gives us a level of mystery: we don't know how many instances we're going to have, because the number could change depending on our capacity requirements. We also get the illusion of infinite capacity: if you always have enough capacity, even when you need more, it feels as though capacity is infinite. We know there's no such thing as infinite capacity, but it's a cool illusion, and it's part of why we use the term cloud: a cloud looks infinitely large, even though we know it isn't. That's the illusion elasticity gives us.

Then there's scalability. The traditional enterprise way is vertical scalability: if we want to scale our Oracle database, we add bigger machines, which typically all run in the same cluster, so we're adding more resources to individual boxes. That works well for monolithic applications, and it's the way traditional enterprise apps were architected. The new way, which you've heard about (I've been at this show for a day and a half, and every single session talks about horizontal scalability), is to scale out instead of scaling up: adding commodity servers rather than buying very expensive ones. If you want more capacity, you add more servers. So now your applications have to be able to run on multiple servers.
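The scale-out model described above can be made concrete with a tiny autoscaling rule. This is a minimal sketch, not any cloud provider's API: the function name `desired_instances`, the per-instance capacity figure, and the minimum and maximum bounds are all hypothetical. It captures the "mystery" from the talk: the number of instances is computed from current load rather than fixed in advance.

```python
import math


def desired_instances(load_rps: float, capacity_per_instance_rps: float,
                      min_instances: int = 1, max_instances: int = 100) -> int:
    """Return how many commodity instances to run for the current load.

    Hypothetical scale-out rule: add more small boxes (scale out) instead of
    one bigger box (scale up), clamped to a configured floor and ceiling.
    """
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity per instance must be positive")
    needed = math.ceil(load_rps / capacity_per_instance_rps)
    # Clamp: never drop below the floor or exceed the budget ceiling.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # The same application can need very different instance counts over a day.
    for load in (50, 900, 12_000, 0):
        print(load, "req/s ->", desired_instances(load, capacity_per_instance_rps=200))
```

The design choice worth noticing is that nothing downstream may assume a fixed instance count; any tier fed by this rule has to tolerate instances appearing and disappearing.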
On each individual box, the hardware might be relatively inexpensive; it may not be some very expensive piece of hardware, so you have to architect your applications accordingly.

Fault tolerance is another difference with the cloud, although when you think about it, it really isn't that it's different; instead, the cloud shines new light on fault-tolerance best practices that we sort of liked to keep in the dark before. So the old way: how do we deal with fault tolerance when we want a highly available system? Well, we can set up RAID, a redundant array of inexpensive disks, which is one way of dealing with hard-drive failures. We can do mirroring, where two databases are kept identical at all times, so if one goes down you switch to the other. This can give us a level of high availability, but essentially it's within a single partition: we're doing it within the context of a single application environment. So what's the cloud way?

Well, we're not buying really expensive hardware. We have commodity hardware, it's all hidden from view, and we don't really care about its specifics. And we're not expecting systems never to fail; commodity hardware is expected to fail. If you're Amazon, how many servers do you have? Hundreds of thousands, and they're failing all the time. Google's are failing all the time. So it's not a question of avoiding failure; it's a question of responding to failure in an automated way. Hardware is going to fail, servers are going to fail, things are going to go belly-up all the time, so you want an automated way of reacting to failure. If a box goes down, a new box comes up (a new virtual machine instance, a new storage instance) and says, "What am I? Here's what I have. What data should I have?" It goes and finds it. "What should I be doing?" It figures that out, and it can basically pick up where the failed one left off, all in an automated way. So we want to provide a level of availability in the cloud where essentially any individual thing can go down and the overall application keeps working, and that's what we mean by basic availability.

Now, this is the CAP theorem. It's interesting: I've been talking about the CAP theorem for about a year, and this is the first conference I've spoken at where I've heard other people actually talk about it. I've seen it three or four times in just the last two days, so this crowd may be familiar with it, and I get the sense that it's becoming a more familiar topic for data people. In the broader data world, though, not that many people have heard of it yet, so this is something that's definitely shifting in the marketplace. The gist of what the CAP theorem says is that no distributed computing system can guarantee immediate consistency, basic availability, and partition tolerance all at the same time. You can have any two, but you have to give up the third.
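The automated reaction to failure described earlier in this passage (a replacement instance comes up, asks what it is supposed to be doing, and picks up where the failed one left off) can be sketched with a shared work registry and checkpoints. This is a minimal in-process sketch under stated assumptions: `Registry`, `Assignment`, `replace_failed_node`, and the shard and checkpoint scheme are hypothetical stand-ins for whatever coordination service and provisioning API a real platform would provide.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class Assignment:
    shard: str       # slice of data/work this node owns (hypothetical partitioning)
    checkpoint: int  # last record index known to be fully processed


@dataclass
class Registry:
    """Hypothetical shared registry; in practice a replicated coordination store."""
    assignments: Dict[str, Assignment] = field(default_factory=dict)
    orphans: List[Assignment] = field(default_factory=list)

    def report_failure(self, node_id: str) -> None:
        # The failed box is not repaired; its work is simply marked as orphaned.
        self.orphans.append(self.assignments.pop(node_id))

    def claim(self, node_id: str) -> Optional[Assignment]:
        # A fresh instance asks: "What am I? What should I be doing?"
        if not self.orphans:
            return None
        work = self.orphans.pop(0)
        self.assignments[node_id] = work
        return work


_node_ids = count(1)


def replace_failed_node(registry: Registry, failed_id: str) -> Tuple[str, Assignment]:
    """Automated reaction to failure: bring up a new node that resumes the old one's work."""
    registry.report_failure(failed_id)
    new_id = f"node-{next(_node_ids)}"  # stands in for provisioning a new VM instance
    work = registry.claim(new_id)
    if work is None:
        raise RuntimeError("no orphaned work to claim")
    return new_id, work
```

For example, if `node-a` owned shard `customers-A-M` at checkpoint 41,200 and died, `replace_failed_node` hands that same assignment to the new node, which resumes at record 41,200 instead of starting over. No human is involved, which is the property the talk is after.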
The CAP theorem can be a real challenge for an enterprise environment, where people are comfortable with traditional relational databases that essentially guarantee consistency and availability but run in a single partition; they're not partition tolerant. The challenge of the cloud is that we want to be highly available (or rather, basically available) and partition tolerant, which means we have to give up immediate consistency, and that's a challenge we have to deal with in a cloud environment.

So, to define these terms: basic availability essentially means that individual parts of your infrastructure can fail and the overall application will keep working, so every request receives a response about whether it succeeded or failed. Even if a box crashes, the overall application keeps working. This is an important characteristic of the cloud. In a cloud environment you're not in control of the individual nodes (the cloud provider is), so clearly you don't want your application to crash whenever one of them goes down.

Partition tolerance may be the least familiar of the three. When we say a system is partition tolerant, it means that the individual nodes can stop communicating with each other (there can be a network issue) and the overall application will keep working. If a node simply fails, that's one example of it no longer communicating, so that case is included. But even if all the nodes are up and running and there's some communication issue, so that the servers over here can't talk to the servers over there for a while, the overall application should keep working. This is also a characteristic we want from a cloud environment. We don't want to be in a situation where some network or latency issue between the individual nodes in our cloud infrastructure brings down our application.
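To make the CAP trade-off concrete, here is a toy two-replica key-value store that can be run in either of two modes while the replicas cannot talk to each other. This is an illustrative sketch only: `TwoReplicaStore`, `PartitionError`, and the `partitioned` flag are hypothetical, and for simplicity reads are always served (a strict CP system might refuse those too).

```python
class PartitionError(Exception):
    """Raised when a CP-mode store refuses a write it cannot keep consistent."""


class TwoReplicaStore:
    """Toy key-value store with two replicas, used only to illustrate CAP.

    mode="CP": stay consistent; reject writes while the replicas are partitioned.
    mode="AP": stay available; accept writes locally and let the replicas diverge.
    """

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True simulates a broken network link

    def write(self, replica: int, key: str, value) -> None:
        if not self.partitioned:
            for r in self.replicas:  # normal case: replicate to both copies
                r[key] = value
        elif self.mode == "CP":
            raise PartitionError("cannot reach the other replica; write rejected")
        else:
            self.replicas[replica][key] = value  # AP: accept locally, diverge

    def read(self, replica: int, key: str):
        return self.replicas[replica].get(key)
```

During a partition, the AP store keeps accepting writes, so replica 0 and replica 1 can return different stock counts for the same item; the CP store stays consistent but turns the write away. Partition tolerance is assumed in both cases, so the only real choice is which of the other two properties to give up.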
Remember, if you're using a public cloud, for example, you have no visibility into that, so the last thing you want is to discover that your application isn't working because some switch went down on the other side of a data center somewhere. Obviously, that's an important consideration.

So what about consistency? Consistency is a tricky term, and it's even trickier than I thought it was 48 hours ago, because I've learned a lot in the last two days. There are many different flavors of consistency, and that's one of the tricky parts of the consistency story: there's a lot of theory and a lot of different types of consistency, and it can be very confusing. We'll talk about a few of them here, not all of them. The first one is what is traditionally called high-availability consistency.

This is what traditional relational databases (the old SQL databases) offer: essentially, they can guarantee that all views of the data are always the same. There will never be two users who see different views of the data at the same time; they can guarantee that, and a lot has to go on under the covers to provide that guarantee. The challenge is that it has to work within a single partition. If you have multiple partitions, there's no way this kind of application can guarantee that kind of immediate consistency. So this is the C in ACID, although I learned yesterday there's more to it: in the ACID requirements for transactions (atomic, consistent, isolated, durable), consistency means that all of the operations on the database maintain the integrity of keys, indices, and so on. But you can still say that a traditional relational database provides a level of consistency of the data.

So what about enforced consistency? With enforced consistency, we're willing to give up basic availability because we want to be partition tolerant. Essentially we're saying that if the data are inconsistent at a point in time, we clean up our act: we run some synchronization steps, and then we're done and we're good. That lets you guarantee consistency, because if you're not consistent, the consumer of your data, whether a user or an application, has to wait until you are; that's one way to enforce consistency. This is the approach to use when consensus is important, where all your nodes have to agree with each other or your application can't work properly, but it's okay to wait around for however long it takes (maybe a few milliseconds) until your data are consistent.
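Enforced consistency, as described above, means a reader is never answered from an out-of-date copy: if the replicas disagree, a synchronization step runs first and the reader waits for it. Below is a minimal single-process sketch of that policy. The class name, the integer version numbers, and the "copy the newest version everywhere" synchronization rule are hypothetical simplifications; a real system would do this over a network and would block or fail if a replica were unreachable.

```python
class EnforcedConsistencyStore:
    """Toy replicated value that is never served until all replicas agree."""

    def __init__(self, n_replicas: int = 3):
        # Each replica holds (version, value); version 0 means "never written".
        self.replicas = [(0, None)] * n_replicas
        self.sync_runs = 0

    def write(self, replica: int, value) -> None:
        """A write lands on one replica with a new, higher version number."""
        newest = max(version for version, _ in self.replicas)
        self.replicas[replica] = (newest + 1, value)

    def _consistent(self) -> bool:
        return len({version for version, _ in self.replicas}) == 1

    def _synchronize(self) -> None:
        """The synchronization step: copy the newest version to every replica."""
        newest = max(self.replicas, key=lambda entry: entry[0])
        self.replicas = [newest] * len(self.replicas)
        self.sync_runs += 1

    def read(self, replica: int):
        """Never answer from a stale replica; synchronize first if needed."""
        if not self._consistent():
            self._synchronize()  # the caller pays for this step in latency
        return self.replicas[replica][1]
```

After `write(0, "10 widgets")`, a read from replica 2 triggers one synchronization and then returns `"10 widgets"`; later reads find the replicas already in agreement and skip the step. The cost of the guarantee shows up as waiting, which is the availability the talk says we give up.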
So then we have the notion of immediate versus eventual consistency. Immediate consistency means all nodes agree on the same data at the same time: all nodes are always consistent with each other. That's what we can't guarantee in a distributed environment that is partition tolerant and has basic availability. What we can give you is eventual consistency. With eventual consistency, the data may be stale: two different users might see different copies, different views of the data, at a particular point in time, until we can bring them back into consistency. With enforced consistency, we say, "Hold on until the synchronization step fixes things." With eventual consistency, we say, "You can read our data; it just might be stale until the synchronization step is done," and sometimes that's only a couple of milliseconds later. So it may not be a big deal, but it's something you have to think about in terms of whether your application can support it.

Eventual consistency is the wrench in the works for a traditional enterprise architect who's used to traditional systems. How could we ever run an ERP system or an inventory system, or whatever the enterprise app is, if there are different versions of the truth at a point in time? There's no way the boss goes for that, right? Well, eventual consistency has been with us since Babylonian times. The Babylonians invented accounting (well, maybe not double-entry accounting), but even when you were doing it on clay tablets, you closed the books at the end of each reporting period. So every month you bring all your data into a consistent state; in the meantime, you may have inconsistent data.
This account might not agree with that account, because you have records that haven't made it into the books yet. So this is familiar behavior from before we had computers, when the period of inconsistency might be a whole month; now we're arguing about milliseconds, and that's what computers bring to us. It's familiar for any business process that has an end-of-day settlement step. Even real-time stock trading, perhaps the most time-sensitive process you'll find in an enterprise, has an end-of-day settlement: they keep track of all the trades during the day, and at the end of the day they figure out where all the money goes.
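The end-of-day settlement pattern can be illustrated with a small netting function: during the day each trade is only a record, and the accounts are brought into a consistent state once, at settlement. This is a minimal sketch, assuming a hypothetical `Trade` record with amounts in cents; it is not how any real exchange or clearing house is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Trade:
    buyer: str
    seller: str
    amount: int  # cash owed by buyer to seller, in cents (hypothetical unit)


def settle(trades: Iterable[Trade]) -> Dict[str, int]:
    """End-of-day settlement: net the day's trades into one cash movement per account.

    Balances are not touched while trading is under way, so intraday views of
    an account can disagree with the trade log until this step runs.
    """
    net: Dict[str, int] = defaultdict(int)
    for t in trades:
        net[t.buyer] -= t.amount   # debit the buyer...
        net[t.seller] += t.amount  # ...and credit the seller (double entry)
    if sum(net.values()) != 0:
        raise ValueError("books do not balance")
    # Accounts whose trades cancel out need no cash movement at all.
    return {account: delta for account, delta in net.items() if delta != 0}
```

For instance, if A buys 500 from B, B buys 300 from C, and C buys 200 from A, settlement moves only three net amounts: A pays 300, B receives 200, and C receives 100. Whatever the intraday picture looked like, the books are consistent once the period closes.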

Those accounts may not be consistent during the day. Or take mobile phone roaming: the mobile phone providers have to reconcile all their accounts, and it might take them a few days before one company knows how to bill the other company for your roaming.

So if we can't have ACID, if we can't have that same kind of high-availability, immediate consistency, then what can we have? Well, we can have BASE. We're dropping ACID and raising our pH, moving from acid to base. Now, I didn't make up these words. Some people argue the BASE acronym is contrived, and sort of lame, but the ACID acronym was contrived in the first place, so it wasn't my fault. These terms have actually been around since before the cloud. What does BASE stand for? Basic availability, soft state, and eventual consistency. So basically it means the sort of availability we expect from the cloud, plus eventual consistency for the data. We've covered those two; we haven't talked about soft state.

So what's the deal with soft state? Soft state means that the state information any node has about what's going on somewhere else might be out of date, so each node has to know to expire it after a certain amount of time. If you work with caching mechanisms, this is familiar behavior: cache entries expire after a certain amount of time. But the example I like to use is instant messaging. In your instant-messaging client, you see a little smiley face when your buddy is available. Now let's say your buddy goes through a tunnel and drops off the network. Your buddy's phone wasn't able to send you an "unavailable" message, so your phone doesn't know your buddy is gone, and you still see the little smiley face even though they're not available. But of course your phone is smart enough to know that if it doesn't get that information refreshed after a certain amount of time, the information has expired, and your buddy shows as unavailable.
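The buddy-list example of soft state maps directly onto a time-to-live cache: an "available" status is trusted only for a fixed window after the last heartbeat. Here is a minimal sketch, assuming a hypothetical `PresenceCache` class and an injectable clock so the expiry can be tested without sleeping; real messaging clients differ in their details.

```python
import time
from typing import Callable, Dict


class PresenceCache:
    """Soft-state buddy list: an 'available' status is trusted only for ttl seconds.

    If no refresh arrives (the buddy went into a tunnel and never sent an
    'unavailable' message), the entry silently expires and the buddy shows offline.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, buddy: str) -> None:
        """Record that we just heard from this buddy."""
        self._last_seen[buddy] = self.clock()

    def is_available(self, buddy: str) -> bool:
        seen = self._last_seen.get(buddy)
        if seen is None:
            return False
        if self.clock() - seen > self.ttl:
            del self._last_seen[buddy]  # expire stale soft state
            return False
        return True
```

With a 30-second TTL, a buddy heard from at t=0 still shows as available at t=10, and shows as unavailable at t=45 even though no "goodbye" message ever arrived. The node never needs a reliable network to eventually correct its view; it only needs to stop trusting old information.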
This is familiar behavior for any kind of caching mechanism, or for anything where the communication between nodes isn't a hundred percent reliable, which in the real world is a very common situation. The enterprise context, where we assume the network is always working, is a bit artificial; the real world isn't like that. In the real world, communication between nodes can always break down, and coping with that is essentially what it means to be partition tolerant. So there's always a trade-off among partition tolerance, high availability, and immediate consistency.

One implication of eventual consistency is that your data may be inconsistent: you may have stale data until you can synchronize, so synchronization takes place after the fact. If you require synchronization before your data are available, that's the enforced consistency we talked about. Here we're talking about synchronizing after the fact: there may be a moment, maybe only milliseconds depending on your technology, before the data are the same everywhere. This really isn't that big a change, though. If you've ever synced your phone's calendar with your desktop calendar and your company's calendar server, it's familiar behavior: you make an appointment and put it on your calendar, and it might take a couple of minutes to appear on your phone. Of course, if your phone is turned off, it'll take longer than that.

So what some of the vendors at this show are saying is that because they work in a cloud environment, where they always have to be scalable and they want to guarantee high availability, they have eventual consistency, so they have to perform a synchronization or replication step, and they're spending all their effort making that step as fast and as stable as possible.
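The calendar example shows synchronization after the fact: each replica accepts writes immediately, and a later sync pass makes the replicas converge. The sketch below uses last-writer-wins reconciliation, which is one common policy chosen here purely for illustration (the talk does not name one). The function names, the `(timestamp, value)` entry format, and the assumption of roughly synchronized clocks are all hypothetical; note that last-writer-wins silently discards the losing side of a concurrent update.

```python
from typing import Dict, Tuple

# Each replica maps key -> (timestamp, value). Timestamps are assumed to come
# from roughly synchronized clocks; ties are broken by value so the merge is
# deterministic.
Replica = Dict[str, Tuple[float, str]]


def write(replica: Replica, key: str, value: str, ts: float) -> None:
    """Accept a write locally and immediately, without contacting other replicas."""
    replica[key] = (ts, value)


def anti_entropy(a: Replica, b: Replica) -> None:
    """Synchronize two replicas after the fact using last-writer-wins."""
    for key in set(a) | set(b):
        candidates = [entry for entry in (a.get(key), b.get(key)) if entry is not None]
        winner = max(candidates)  # newest timestamp wins
        a[key] = winner
        b[key] = winner
```

Usage: add an appointment on the desktop replica, then add and edit appointments on the phone replica. Until `anti_entropy(phone, desktop)` runs, each device shows a different calendar; afterwards both hold the same entries, with the most recent edit winning.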
Other vendors are focusing on a different part of the story. So for the enterprise developer or enterprise application architect, looking at your existing applications and saying, "I want to move this to the cloud, and I want it to be elastic, which means it has to be partition tolerant," how do you deal with this requirement for eventual consistency? Are you ready for inconsistent data? That's the key question. For example, take an inventory application. What does it do? It keeps track of how many widgets you have in stock. Now suppose two users query the database at the same time.

They might get different numbers. That could be a problem if somebody places an order for an item they think is available, pays their money, and then the system gives them an error that says, "Sorry, we took your money, but we can't give you the product; please call customer service." Nobody wants that. So what's the desired behavior? It depends; you have to think about it. One example: when stock gets close to a threshold, you give the user a message saying, "We'll attempt to reserve this product, and we won't charge your credit card until we can guarantee it." Five minutes later they get an email saying either "Yes, you got your product" or "Sorry, it's out of stock, but your card wasn't charged." What this means is that if you have a legacy inventory application, you have to adjust the business logic in that application, maybe go back to the drawing board. That can be a challenge, but again, it depends on your situation.

These problems are solvable, but the real point is that you have to think about what it will take to move an existing legacy app to the cloud, and about what consistency issues partition tolerance introduces. What are my priorities? The CAP theorem is a law of the universe; it's mathematically proven, and you can't get around it. What we can do, though, is move the priorities around. How am I going to make the consistency step more rapid, and how much consistency do I actually need? So to leverage the cloud, you essentially have to think elastic. We've talked to a lot of different people, and this is perhaps the hardest part of really understanding what the cloud is all about: you can't think of the cloud as just a virtual server or a virtual database. You have to think about elastic versions of those. How many nodes support that storage engine or that server instance can change over time, and that impacts how you actually architect your application.
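The inventory workflow described above (reserve first, charge only once stock can be guaranteed, notify later) might look something like the following sketch. Everything here is hypothetical: the `Order` and `Status` types, the `threshold` margin, and the idea that `confirm_orders` runs against an authoritative stock count after the replicas have synchronized. It illustrates the business-logic change the talk calls for, not any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # "we'll attempt to reserve this product"
    CONFIRMED = "confirmed"  # card charged, confirmation email sent
    REJECTED = "rejected"    # out of stock, card never charged


@dataclass
class Order:
    order_id: str
    sku: str
    qty: int
    status: Status = Status.PENDING
    charged: bool = False


def place_order(order: Order, local_stock_estimate: int, threshold: int = 5) -> str:
    """Answer the user immediately, based on a possibly stale local stock count."""
    if local_stock_estimate - order.qty >= threshold:
        return "Thanks! Your order is being processed."  # plenty of margin
    return ("We'll attempt to reserve this item; your card won't be charged "
            "until we can guarantee it.")


def confirm_orders(pending, authoritative_stock: dict) -> None:
    """Later step, run after replicas synchronize: confirm or reject in arrival order."""
    for order in pending:
        available = authoritative_stock.get(order.sku, 0)
        if available >= order.qty:
            authoritative_stock[order.sku] = available - order.qty
            order.status, order.charged = Status.CONFIRMED, True
        else:
            order.status = Status.REJECTED  # never charged, so nothing to refund
```

If two customers each order 2 widgets while the synchronized count is 3, both get the "we'll attempt to reserve" message; the confirmation pass then charges the first and rejects the second without ever taking the second customer's money. The stale read is tolerated because no irreversible action depends on it.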
Netflix, for example, runs in a cloud environment, and they put together a small app they call the Chaos Monkey that randomly kills processes and services in their production environment. The whole point is that the entire Netflix application has to stay up even if a random service or process crashes and burns, so they actually run this in production. Can your legacy application environment withstand a Chaos Monkey? If not, it may not be fully ready for the cloud.

So we move from the dark side to the light side. (Of course, purists will quibble with my picture here, but anyway.) The light side is the world of web scale. The challenge is that we have the enterprise context and we have the cloud, and the enterprise wants to do all the cloud stuff, so enterprises are being forced, whether they like it or not, to think about things the way the world of web-based applications has been working for years. Google, Amazon, eBay: all these companies have had to do this, and now the enterprise has to get with the program. Enterprises have to figure out how to build, manage, and architect apps that take advantage of the best practices the eBays and Amazons of the world have figured out. So how can they build web-based applications: not just web interfaces, but applications that have hypermedia at their core? Partition tolerance, basic availability, and resilience (if something goes down, it comes back up) need to be capabilities of any application, and there's no single point of control. Think about a lot of these app-store-based applications: many different pieces work together, the pieces are provided by different people, and there's no single point of control.
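The Netflix practice mentioned above can be imitated, in miniature, as a test harness that repeatedly kills a random instance and checks whether the application is still up before the automated healing step runs. This is a toy in-process simulation, not Netflix's Chaos Monkey (which works against real cloud instances); `Cluster`, `chaos_test`, and the rule "the app is up while every service has at least one instance" are hypothetical assumptions.

```python
import random


class Cluster:
    """Toy cluster in which each service runs some number of redundant instances."""

    def __init__(self, services: dict):
        # services maps service name -> desired number of instances
        self.desired = dict(services)
        self.instances = {name: set(range(n)) for name, n in services.items()}

    def kill_random_instance(self, rng: random.Random) -> tuple:
        """Chaos step: pick any running instance at random and terminate it."""
        candidates = [(s, i) for s, ids in self.instances.items() for i in ids]
        service, instance = rng.choice(candidates)
        self.instances[service].discard(instance)
        return service, instance

    def heal(self) -> None:
        """Automated recovery: top every service back up to its desired count."""
        for name, ids in self.instances.items():
            next_id = max(ids, default=-1) + 1
            while len(ids) < self.desired[name]:
                ids.add(next_id)
                next_id += 1

    def available(self) -> bool:
        """The application is up if every service still has at least one instance."""
        return all(self.instances.values())


def chaos_test(cluster: Cluster, rounds: int, seed: int = 0) -> bool:
    """Return True if the application survived every random kill."""
    rng = random.Random(seed)
    for _ in range(rounds):
        cluster.kill_random_instance(rng)
        if not cluster.available():
            return False  # a single random failure took the application down
        cluster.heal()
    return True
```

A cluster where every service runs two instances survives a thousand rounds; add a single-instance "legacy monolith" and the first time the monkey picks it, the whole application is down. That is the question the talk poses: can your environment withstand this?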
That's quite alien to the enterprise. The last thing an enterprise wants is an application in its environment that it doesn't control, but that's a whole new way of thinking. So do grab the posters; if there are any extras, I'll leave them on the literature table after the talk.

That's what we're talking about in the poster: our vision for enterprise IT in 2020. It's not just "we're going to be service oriented," or "we're going to be in the cloud," or "we'll use mobile computing." All those things are part of many different interrelated trends, a very complex web of changes that are coming, and they're not just in the future; they're in progress now. So this is part of the story. You hear it at this conference: whole new ways of thinking about dealing with information, about scalability, about providing business value, about building applications. It's a whole new world that the enterprise is only now getting an idea of. There's this whole world out there, and enterprises can move to an environment that is elastic, cloud based, and hypermedia driven, but it requires rethinking what it means to build applications and to deal with information, and it's changing so many different factors in the enterprise today. So it's an exciting time. It's interesting giving the same speech to different audiences: if you come from the web world and have been doing this for years, you look at it and say, "Well, why would it be any other way?"

...yesterday, and I'm wondering: are RESTful principles part of what you're talking about here?

Oh, so yeah, those are fighting words. What I'm finding, and I've been doing research on REST for a number of years now, is that most people completely misunderstand it. There are two versions of REST: the version Roy Fielding laid out in his dissertation, and the version everybody seems to think is REST, which it just isn't. I didn't go to that talk, so I don't know which version the speaker was describing, or whether they got it right. The common misconception is that REST is about APIs: that if you want to build RESTful APIs, you use a uniform interface and create RESTful APIs. There's a lot of value there; the uniform interface gives us the loose coupling that web services promised but never quite delivered, and that's an important part of the story. But the way we see it, coming from the architecture perspective, if you think REST is just for building APIs, you're missing the big picture. The big picture is that REST is for building distributed hypermedia applications like the World Wide Web. Fielding looked at the web and said: this is cool. It's immense, nobody's in charge, it's enormously resilient, and it has all these great properties. He distilled those properties into a set of core principles that we can apply to a whole class of distributed applications, and that's what he meant by distributed hypermedia applications.

REST stands for representational state transfer: transferring application state to the client in the form of representations. What gets missed in a lot of discussions of REST is the separation of application state from resource state. Only the resource state gets stored in the persistence tier; the application state is transferred to the client. The point is that we're building distributed hypermedia applications, and the RESTful uniform-interface API supports that. This is all part of the story, and it's also part of the cloud story, because we need to be able to scale the middle tier. Where do we maintain state information? We could put it in the database, but that introduces issues, although at least one of the vendors at this show is addressing exactly that: can we maintain application state in the database, and how do we scale it? That's not a RESTful approach; it's not a hypermedia approach. A RESTful approach is to maintain application state on the client. What does that mean, and how do we support it in the context of a hypermedia application? So I don't know if that answers your question. I could keep going on about that; there are several different points there. It's a whole talk in itself, and I give that one at other conferences, but this is a data-centric conference.
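The idea of keeping application state on the client while the server stores only resource state can be sketched as a stateless request handler whose responses carry hypermedia links to the next possible transitions. This is a minimal sketch and not a real HTTP framework: `handle`, the `PRODUCTS` table, the `/products`, `/checkout`, and `/payments` paths, and the JSON shapes are all hypothetical. Because the handler keeps no session table, any instance behind a load balancer could serve any request, which is what lets the middle tier scale out.

```python
import json

# Resource state: the only thing the server persists (stands in for the database tier).
PRODUCTS = {"p1": {"name": "Widget", "price": 500}}


def handle(method: str, path: str, body: str = "") -> dict:
    """Stateless handler: no session table, so any instance can serve any request.

    Application state (where the client is in the flow, what it has chosen)
    travels in the request and in the returned representation, together with
    hypermedia links telling the client which transitions are available next.
    """
    if method == "GET" and path.startswith("/products/"):
        pid = path.rsplit("/", 1)[1]
        product = PRODUCTS[pid]
        return {
            "product": {"id": pid, **product},
            "links": {"add-to-cart": {"method": "POST", "href": "/checkout"}},
        }
    if method == "POST" and path == "/checkout":
        state = json.loads(body)  # the client sends its own application state
        items = state.get("items", [])
        # Prices come from resource state, never from the client-held state.
        total = sum(PRODUCTS[i]["price"] for i in items)
        return {
            "cart": {"items": items, "total": total},  # handed back for the client to keep
            "links": {
                "pay": {"method": "POST", "href": "/payments"},
                "continue-shopping": {"method": "GET", "href": "/products/p1"},
            },
        }
    return {"error": "not found", "links": {}}
```

A client fetches `/products/p1`, follows the `add-to-cart` link it finds in the response, and posts its own cart (`{"items": ["p1", "p1"]}`); the server returns a total of 1000 plus links to `pay` or `continue-shopping`, and remembers nothing in between. Recomputing prices from `PRODUCTS` hints at the security concern raised next: client-held state must be validated rather than trusted.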

Well, yeah. I think Jason was just pointing out that there are different approaches, and the RESTful approach, if people can understand it and get to it, would seem to be the standard approach.

Well, it's the web; it's the standard approach on the web. Yeah, right. But it's unfamiliar in the context of enterprise apps. We don't want to keep state information in the middle tier, because we want to be able to scale that tier out. So some of the vendors at this show do very well, because we have all these new ways of storing information in the persistence tier, and now that tier can store state information too. But that's not RESTful; the REST approach is moving application state to the client. Then we have to worry about security and other issues, but that's what it means to build RESTful applications. Anybody else? Well, thank you very much.