Wall St. Derivative Risk Solutions Using Geode


I'm Andre Langevin, and I build risk systems for banks. I've worked for four banks, a stock exchange, and a major pension fund, building risk for capital markets, trading, and operations. You name it, I've done it, and it's my favorite topic. I've been using GemFire to do this for about ten years, which probably makes me one of the older customers on the entire roster, and I've done it at several different firms as well, which is kind of neat. You'll pardon me for using my speaking notes here; I had a lot of stuff I wanted to cover.

What I wanted to talk to you about today is building Wall Street live derivative risk systems using Geode. I'm sure you know this, but GemFire has been an absolutely core, blue-chip, go-to part of a lot of the risk landscape on Wall Street for quite a long time now. At dinner the other night we were talking about this, and we realized that the people who build these systems tend to be a very small circle, a tribe if you like, and the knowledge of how to do this is not published in a book or something you can go look up. You really learn it by doing it, so it is tribal knowledge in some sense. What I'm hoping to do today is invite you all into the tribe by walking through some whiteboards that explain how we actually build this stuff. It's a good tribe: we don't have a tattoo, and it's pretty painless to join, so I hope you're willing to take the adventure with me.

So, there it is: we could have that built by lunch tomorrow, right? Nice and easy. This is actually the simple schematic for most of big-time Wall Street's live risk systems, so let me tell you a little bit about what's on the board. On the inbound side we have trades, because obviously we trade things, and we have market data: things like interest rates and price quotes, which represent the market. Then we have a risk calculation engine, and this is where things get interesting in this space. To figure out the value or the risk of a derivative, you typically hire a whole bunch of physics PhDs who go away for a long time and come back with a really slick math library that takes thousands of computers in a grid to run effectively. That little box on the bottom is that thing, and we're going to integrate it, which is pretty cool. Last, the risk comes flying out, and it can go to any number of things: buses, applications, spreadsheets, you name it. So our task, basically, is to integrate these things with Geode to produce the solution. Sounds good? Everybody's in the tribe so far? Rock on.

All right, let's take a second to understand what we're trying to accomplish from a business perspective. The pun in "crash course" is intended; I thought it was kind of funny. If you think about trading on Wall Street these days, and I think some of you have sat through a couple of other presentations involving Wall Street style solutions, there are a few important business trends changing the way people try to manage risk in these businesses. Twenty years ago, when I started in this business, trading was organized by product, so you had one system, one product, one guy. It was really easy, because it was all on one screen. That's been turned on its side: trading is now organized by market, so we have a rates desk, a commodities group, a credit group, and each of these groups trades a whole lot of different products. The products themselves have become very specialized, so the trading systems have become very fragmented and specialized; a good Wall Street shop might have 50 different booking points now. It's enormous. On top of that, risk has become a lot more sophisticated: we try to measure a lot more stuff, we try to measure it in different ways, and we try to aggregate it in all kinds of fine and coarse ways. The sum total of all this is that the good old days, when one guy with one screen could run a good business, are over, and have been over for a long time. What we do instead now is build a system that collates all of this into one view.
That way, your group that trades rates can have some understanding of what they're doing collectively. It has basically ushered in the era of externalized risk systems. And nobody here in the audience is a PhD in physics? Awesome. So I'm going to totally simplify this in a way that would drive those people crazy. Let's just define risk for today as: how much money do I make or lose if the market moves a little bit. Market risk, in other words. The solution I'm talking about, though, works equally well for all sorts of other sophisticated calculated risks: CVA, VaR, liquidity, and so on.

In the abstract for my talk I promised to explain all the business benefits of building this solution. I'm not really going to go through these in a ton of detail, but here are all the things that don't work, so do not go home and try one of these. What's interesting about all of these solutions, and the reason they don't work, is that they tend to rely on duplicating things. If I have ten trading systems, why don't I try to book all the trades from all ten into each other, so that one guy with one screen in one system can suddenly see everything again? You can imagine what a colossal mess that causes when accounting and clearing come into the picture; you have to undo the whole giant tangle. All the bad solutions have that in common: they create chaos downstream when you try to implement them. Again, that's why we're going to externalize and build a new system that does all of this together.

All right, everybody's convinced now that we want to do this? Let's do it. We have the things that appear in our environment: our trading systems, our market data, our calculation engine (our grid, in other words). Our task is to fill in the middle in order to complete our solution, and I would suggest we try it in this order. First, we need to think a little bit about the data objects we're going to model: trades, market data, some other stuff. Then we think about the regions in Geode we're going to put them in, and think a little harder about that in order to make the whole thing efficient. Then we're going to wire up our incoming data; that sounds really easy, just plug it in, right? And then we're going to do the really interesting, hard part: connect the compute grid that sits below it.
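To make the risk definition concrete before we start wiring things up, here is a toy sketch of the bump-and-reprice idea: value the trade, nudge the market a little, value it again. Everything here is invented for illustration; on a real desk the one-line pricer below would be the quants' C++ library running on the grid.

```python
def present_value(notional, rate, years=1.0):
    """Stand-in pricer: PV of a single zero-coupon cash flow (a toy, not real quant math)."""
    return notional / (1.0 + rate) ** years

def bump_and_reprice(notional, rate, bump=0.0001):
    """Risk as defined in the talk: P&L if the market moves a little bit (here, +1 basis point)."""
    return present_value(notional, rate + bump) - present_value(notional, rate)

# A $1m receivable discounted at 5%: a +1bp rate move loses roughly $90.
pnl = bump_and_reprice(1_000_000, 0.05)
```

The same pattern generalizes to the fancier measures mentioned above (CVA, VaR, and so on): they are all some function of trade data plus market data, which is exactly why the rest of the talk is about getting those two things to the compute engine efficiently.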
One of the interesting things about GemFire is that it is amazing at this particular task. I think the reason it has become such a blue-chip solution is that I can't think of another way to do this easily, so that part should be the interesting part. I hope the whole thing is interesting, but that part will be a highlight.

So, whiteboards. The first thing we'll notice when we try to integrate all this is that we have a lot of different languages and operating systems kicking around. GemFire, or Geode itself, is of course written in Java. Our math library and the compute grid are probably C# or C++, and they're probably running on Windows. GemFire itself is probably sitting on Linux. And there are probably clients that are going to consume this that are web pages, Java apps, and C# apps on people's desktops. We have a huge mishmash of things that we need to somehow connect up, and this is where Geode and GemFire really score. PDX serialization, which I know is not the most high-profile part of the product, is the glue that makes all this stuff come together easily. What we're going to do is serialize everything in PDX, and then use the Geode client libraries to swap those data elements back and forth between the C++, C#, and Java components seamlessly. There is a small trade-off: your data model should be buildable using only primitive data elements, so try to avoid the language-specific fancy stuff and just stick with the simple types. If you're thinking that's not a restriction that is commercially viable, I offer you Cloudera Impala, which has the identical restriction; it has done fairly well for Cloudera. So, not as bad as you think.

Now that we know we're going to do some PDX, let's think about how we design and name these objects. I love technology, but I love what you can do with technology even more, and I think some of the very best tricks you learn in this business are actually social engineering solutions to technology problems. So I've brought you a couple of good tips on how to think about your data objects, but I've also brought you what I think is probably the most golden tip of all, which is: name the objects in a way that people can guess.

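As a minimal sketch of that golden tip, here is a hypothetical dot-notation key builder. The desk/book/id scheme is invented for illustration, and a plain Python dict stands in for a Geode region.

```python
def trade_key(desk: str, book: str, trade_id: str) -> str:
    """Build a guessable key like 'rates.swaps.12345' from business terms anyone on the desk knows."""
    return ".".join(part.lower() for part in (desk, book, trade_id))

trades = {}  # stand-in for the trades region

# The key you use for the put is the key a business user can later guess.
trades[trade_key("Rates", "Swaps", "12345")] = {"currency": "USD", "notional": 1_000_000}

# No search engine needed: a trader just types the key and does a get.
usd_trade = trades["rates.swaps.12345"]
```

The design choice here is social, not technical: if the key is composed of terms the business already uses, nobody ever asks you for a search or aggregation layer just to locate an object.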
We had lots of discussion at the CAB yesterday, and through some of our presentations today, about searching and aggregating and how to roll things up. Forget it: just let people guess, and they can find the thing in one shot. This dot notation, which nobody owns the trademark to, is a common way of naming the objects. It is your hash key when you do a put, written in a way that business guys can just guess and retrieve. Nobody has ever asked me to build a search engine for GemFire, because they know where everything is already. Fine and dandy.

A couple of other important builder tips. Your trade object has two key types of data. It has descriptive data: things like the currency or the counterparty. That is data that goes downstream for aggregation, data that human beings need to be able to read. It also has a huge whack of things called model parameters: all sorts of gory details that only the compute engine cares about, so they don't have to be human readable. If you try to make them human readable by giving them nice fancy names, you will burn tons of effort reformatting them constantly for the compute engine, so my best advice is to take them exactly as defined by the compute engine and repeat them. You might even consider creating separate "trade" and "trade for the compute engine" objects that split this data apart. I see nodding in the audience; I should mention this is not a sermon, so if you have a question or a comment, just throw it at me in the middle, please.

Next up, region design. We have some objects, so now we're going to lay out how we put them into GemFire. Why is this important? Because you're balancing a fundamental size-versus-speed equation. If you want things to be really fast to retrieve, you have one region, and that region has objects called "everything on earth" in it; one get gets you everything, and it is huge and bloated and slow. The reverse is that you split objects into all kinds of little pieces and create regions for each of them, which is probably also not so good, because you're constantly fetching pieces back. There is a balance somewhere that you want to strike, and the way I've drawn this is the best balance I've found over time. Have a trades region, but just one, and encapsulate everything in it. Have a curves region for your market data; if you're able to also cache the zero curves (this is a physics thing, don't ask me), do that. Have a little region for foreign exchange rates: there are only about 25 of those that matter, but you access them constantly. And have another one for rate fixings, if that's what you need for your computation: again, a very low number with very high access.

All right, I feel like we're almost halfway done. Now that we've got these regions, how are we going to physically organize them on our cluster? One of the great lessons I have learned over time is that it is more effective to change the physical storage of the data that goes into your risk solution, to bend it into different shapes, reformat things, concatenate or split files, than to spend your time trying to optimize the performance in code. You can do some dramatic things that way. How this applies to GemFire: GemFire will of course partition regions, and it will replicate regions, so you have the opportunity to control the split or to just repeat things all over the place. In our situation, trades are our primary event, and when we try to compute the risk for a trade we need some market data, so it would be great if the market data that applies to each trade could be co-located with the trades. That's what we're trying to achieve here; there is a many-trades-to-one-market-data relationship. The first thing that will probably come to mind is: why don't I let GemFire just spray the trades all over the grid, and then I'll replicate all the market data on every single node? That works sometimes, but in a lot of cases, to be honest, it does not work at all.

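One way to read the co-location goal in code: a toy router, with an invented node count and field names, that sends a trade and the market data it prices with to the same place by hashing a shared business term (currency, in this sketch). `zlib.crc32` is used because it is stable across runs, unlike Python's salted built-in `hash()`.

```python
import zlib

NODES = 4  # invented cluster size for illustration

def node_for(business_term: str) -> int:
    """Route everything that shares a business term (e.g. a currency) to the same node."""
    return zlib.crc32(business_term.encode()) % NODES

trade = {"id": "t1", "currency": "EUR", "notional": 5_000_000}
curve = {"name": "eur.swap.curve", "currency": "EUR"}

# Because both hash on the same term, the trade and its curve land together:
# that is the co-location property a partitioned-region layout is after.
same_node = node_for(trade["currency"]) == node_for(curve["currency"])
```

The hard part in real life, as the talk says, is not the hash but the choice of term: it has to spread trades relatively evenly, which is exactly where single-name-heavy books can defeat a naive choice.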
If you're trading credit, which is an important use case for this, say credit default swaps, that strategy blows up on you, big time. The master version of the strategy is to find a business term to partition with: industry, geography, trading desk, maybe currency or curve. Find a business term that lets you partition the trades relatively evenly, relatively, and then let the market data follow it. That works well. Things like foreign exchange rates are really teeny-tiny, so just replicate them everywhere; put one in every coffee cup, one on every grid node.

All right, next up: how do we get the trade data? I thought it was absolutely fascinating listening to some of the people who spoke earlier, because the Envision piece in particular is a massive engine that gathers all the trades together for settlement. I was going to tell you how to build a really fancy trade broker engine to gather all the data; my new answer is just get a copy of Envision, which seems to do the trick even better than anything we could build. But it does illustrate the complexity of this problem. You've got 50 booking points, a lot of trade sources, and multiple message formats, and you need to somehow ingest all of this. What tends to happen is that you buy an application server and use it as a message broker to mash everything down into your PDX model so that you can push it into GemFire. That's the best practice. I've occasionally thought it would be awesome to have GemFire somehow consume this stuff directly, but it almost never works that way. Envision plugging into this would solve it very well too.

So we have our trade data in GemFire now; we're starting to fill it up. This is all good, and next we need the market data. Market data is a really unusual beast: it is either overwhelmingly fast or remarkably slow. Foreign exchange ticks at 70,000 quotes per second, and I believe the current tick rate for U.S. Treasury bonds is around a thousand ticks a second, meaning a thousand price quotes a second. Amazingly high volume, and in order to achieve that kind of volume it is usually delivered to you in very compressed, quirky, proprietary formats. And just as much as that kind of data is ridiculously fast, the rest is remarkably slow: most of the really important interest rates that you care about for risk are set once a day. LIBOR? Bang, lunchtime in London, that's fixed, and that's all the updates for today. Meanwhile the foreign exchange quotes are pelting you endlessly. So the first thought you would have is: how do I build something that arbitrages the speed of these two market data feeds? This is again where a little bit of business engineering comes in very handy. It turns out that the really, really fast data just jitters around but stays relatively constant once you back up and squint at it a little bit; it's actually the slow data that drives the changes to the risk numbers that you care about. So the solution I've shown you here, and the best practice that I think you'll find works, is to sample the really fast data at the pace dictated by the slow data, and push those changes into your solution periodically. What you'll get in real life is maybe a start-of-day, maybe a midday fixing, then an end-of-day, and perhaps one per geography, but that still only gets you up to around three, four, five updates a day. Very, very manageable.

All right, here's where things get really interesting, in my humble opinion. The solution is full of data, we've got events, so let's crunch some numbers. The grid we're going to crunch these numbers on is typically DataSynapse, Platform Symphony, or maybe Windows HPC, and it's probably sitting on a big compute farm that is shared across the organization. Our task is to take the trade events, gather the relevant market data, and push these things out to the compute grid so that it can run all the math the physicists designed and give us back some interesting numbers that represent risk. The key tip here: make the outbound flow asynchronous. GemFire has both continuous queries and asynchronous event queues, and this is a definite AEQ moment.

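The shape of that flow can be sketched with a toy in-process version: trade events queue up for the "grid" without blocking the feed, and each worker writes its result straight back as soon as it has one. Worker counts, the pretend pricing math, and the dict-as-region are all invented for illustration; threads stand in for grid blades.

```python
import queue
import threading

events = queue.Queue()   # the asynchronous outbound queue (the "AEQ" of the sketch)
results = {}             # stand-in for the risk region the grid writes back to
lock = threading.Lock()

def grid_worker():
    """One pretend grid blade: take an event, compute, write the result back immediately."""
    while True:
        trade = events.get()
        if trade is None:                      # poison pill shuts the worker down
            break
        risk = trade["notional"] * 0.0001      # pretend quant math (toy 1bp sensitivity)
        with lock:                             # synchronous write-back: never lose a number
            results[trade["id"]] = risk
        events.task_done()

workers = [threading.Thread(target=grid_worker) for _ in range(3)]
for w in workers:
    w.start()

# The feed side never blocks, even if one computation were to run for minutes.
for i in range(10):
    events.put({"id": f"t{i}", "notional": 1_000_000})

events.join()                                  # wait for the pretend grid to drain
for _ in workers:
    events.put(None)
for w in workers:
    w.join()
```

Note the asymmetry the talk recommends: one queue going out, but as many concurrent writers coming back as there are workers.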
Why do you want this to be asynchronous? Because the grid is shared, so you don't control the workload a lot of the time. But even if you own the grid, most products that you're going to compute risk for are boom, done, ten milliseconds, while there are a few diabolical ones that survived the credit crisis, called exotic derivatives, that might run for a minute, two, three. You don't want one of those things blocking the flow of events, so queue them asynchronously. On the other hand, you would like all of the grid nodes to put their results back into GemFire synchronously. The grid is stateless, which means that as soon as a node releases the computation, it's gone, so you want to make sure you capture the number. Nothing makes people angrier, in my experience, than missing the risk on a trade. The other thing that's kind of interesting here is that if you have two or three hundred blades in your compute farm, you'll end up with two or three hundred writers pushing data into GemFire. GemFire is really, really good at ingesting from a large number of writers concurrently; it's actually more efficient than having one really high-volume writer trying to push back to it. So: asynchronous out in just one stream, and synchronous back as wide as possible.

Oh, that is a smoking good question, thank you. I'm going to paraphrase it rather than repeat it: the question is whether GemFire sits over top of the compute grid or is separate. Effectively, one of the great use cases for GemFire, quite apart from this solution, is as a cache on a compute grid, so it's quite reasonable that there will be a GemFire distributed fabric floating over top of the grid just being used to cache results by the math library. You can have a separate one for our solution, and if you can use the local caching mechanisms in GemFire to take the data out of our cloud and hold it over top of the compute grid, that is even better still. Getting that local caching to work well is a master class all by itself, but it is the difference between your solution working at JP Morgan and Goldman Sachs, so it's well worth it. Does that sort of answer your question? Thank you.

All right. As new members of the tribe of people who know how to do this, I have a mission for you. For years and years we have been doing this with external compute grids: no problem, works well, we're all very happy. But back in 2010, STAC, which is a sort of Wall Street think-tank and standards group, published a really interesting paper in which they tried to measure how much faster a book repricing would run if all the math was in GemFire, versus GemFire pushing the math out to a compute grid. This is the "bring the compute to the data" idea that you hear about in the Hadoop big data space all the time, except this was in 2010, so it was measured with GemFire 6, perhaps. Regardless, the answer was that it is drastically faster; not a little, but drastically. I think the number is 76 times. So, would you like a seven-thousand-six-hundred-percent improvement in the performance of your solution? You bet you would; you'd like two of those if possible. So why doesn't everybody just do this? It would be so efficient; you'd have less hardware. The problem is that math library the physicists gave us: they wrote it in C++, and GemFire unfortunately requires the library to be in Java to execute, which is actually quite uncommon in the world of Wall Street risk. However, there is hope. As the solution has become democratized, in the sense that it's now affordable at super-regionals, pension funds, and so on, you're going to run into firms that don't own a compute grid and that are using perhaps FINCAD in Canada, or Numerix here. That is a math library you could actually put into GemFire itself, and if you can do this, you get a much more compact, lower-cost, high-utilization solution. It's a really good thing. What would be really different is the event flow.

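A sketch of what that bring-the-compute-to-the-data style might look like, assuming a hypothetical listener-firing region and a one-line stand-in for the math library: each put reprices synchronously, in-process, with no round trip to a compute farm.

```python
class ListeningRegion(dict):
    """A dict standing in for a region that fires a cache-listener callback after each put."""

    def __init__(self, listener):
        super().__init__()
        self._listener = listener

    def put(self, key, value):
        self[key] = value
        self._listener(key, value)   # synchronous: the compute runs right where the data lives

risk = {}   # stand-in for the risk region the results propagate into

def reprice(key, trade):
    """Pretend the math library is in-process: recompute risk the moment the trade arrives."""
    risk[key] = trade["notional"] * 0.0001   # toy 1bp sensitivity

trades = ListeningRegion(reprice)
trades.put("rates.swaps.12345", {"notional": 2_000_000})
# risk now holds the number for that key, with no queue and no external grid
```

Compare this with the queued version earlier in the talk: same inputs and outputs, but the latency of a round trip to a farm disappears, which is the effect the 2010 measurement quantified.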
It would be synchronous: you would just use region cache listeners and functions to propagate the computation between the regions, so the total throughput would also accelerate dramatically, as measured per trade. Anyway, one of you has to do this. In the event you do, my email address was in the deck; tell me how it turns out. Oh, come on, where's your idealism?

All right. Famously, Wernher von Braun, who invented the V-2 rocket, said that his job was to put the rockets in the air; after that, it's somebody else's problem. In that spirit, my job is to crunch the risk numbers; where they go after that is somebody else's problem. But here are some ideas to help you complete your solution, because there are a whole bunch of ways to take advantage of the risk data you now have piling up within Geode. You can have a heavy client application issue a continuous query to take the results out; that's very common and works nicely. You can try pushing them to web pages with Caplin Liberator, and there are some great kits that I've listed here. It's interesting that these kits exist almost entirely to consume this type of data; that tells you how prevalent this solution with GemFire is. So there are lots of options. And of course, as everybody on Wall Street knows, it doesn't matter how much money you spend on a beautiful heavyweight client application that takes the risk: some trader is going to say, can't you just put it in Excel? So yeah: write a plug-in, issue a continuous query, and you're rocking with Excel. And why did I tell you way back when to make really easily guessed, user-friendly keys for the data? Because the trader using Excel is going to use them to pull the data out. Everybody likes happy traders with Excel and data they can find. And that is the fastest twenty minutes in systems building.
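To round out the client side, here is a toy continuous-query-style subscription, with invented keys: a client registers a predicate and receives matching risk updates as they arrive, the way an Excel plug-in or a heavy client would. The guessable dot-notation keys are what make the predicate trivial to write.

```python
subscribers = []   # (predicate, callback) pairs, standing in for registered continuous queries

def register_cq(predicate, callback):
    """A client subscribes to the slice of risk it cares about."""
    subscribers.append((predicate, callback))

def publish(key, value):
    """On each risk update, push the event to every subscriber whose predicate matches."""
    for predicate, callback in subscribers:
        if predicate(key):
            callback(key, value)

received = []
# An "Excel" client guesses keys by prefix, thanks to the dot notation.
register_cq(lambda key: key.startswith("rates."), lambda k, v: received.append((k, v)))

publish("rates.swaps.12345", 90.7)
publish("credit.cds.99", 12.3)   # filtered out: not a rates key
```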