Oracle Midlands #9: Oracle Distributed Transactions – Joel Goodman

Just another WordPress site

Oracle Midlands #9: Oracle Distributed Transactions – Joel Goodman

good evening everyone hope you didn’t have too much hair growth in the traffic took me quite a while to get here from New Street actually earlier half an hour I don’t know what the distances but I’m here to go over something which is kind of a lost art now and I’ve spoke about this at a few conferences recently I wrote this a while ago to presented it and then you know you move on to other presentations and then I realized that sometimes you have to go back because people either forget or new people come along and this is the problem of how oracle manages distributed transactions and the kind of problems you might have and just for people who don’t know me that’s my email and my blog my main rolled in oracle these days is working on certification I’m the head of the development for DBA certification exams from a technical point of view and I also do curriculum work speak at conferences and so on but this is what we’ll be doing tonight we’ll be talking about distributed versus remote transactions and query so therefore concepts altogether remote queries remote transactions distributed queries distributed transactions what the differences are we’ll also be looking at the big problem which is two phase commit and failure during two phase commit which might occur and of course since you have to be very lucky to catch a failure during two phase committing you may want to practice this also how you can go about practicing recovery just like you have a backup and recovery strategy for things generally every DBA should have a backup and recovery strategy for this if you use DB links and you have distributed transactions so just throwing out of curiosity does anyone have distributed transactions where you have updates that occur in transactions in more than one database within the same well in that case that nest must be part of your recovery strategy unfortunately in my experience this is not the case it’s one of these things that people just neglect okay recovering loss of a data file also control file but if you have a failure doing two-phase commit there may be problems so we’ll teach you how to do this we’ll look about manual in doubt resolution and what the implications are for all of that so of course distributed databases is one of the things that permit some of the ideal implementation that can’t date wrote about location independent way back when the ability to put data where you want and make it independent and DB links in Oracle allow us to do that you can have your DB links you can have your public or private synonyms so that no one who is writing the code even necessarily has to know where objects reside of course if you look at plans you might see serial from remote something like this in your execution plans for those people who look at that but argyll provides two ways to do this we’re focusing on Oracle databases not using heterogeneous services of we have what used to be known years and years ago as gateways which then became known in oracle8i as heterogeneous service gateways and we won’t be focusing on those although you can still set up DB links to a gateway and access things remotely from db2 or some other database as well so a distributed database has these attributes it has a degree of sight independence location independence possibly fragmentation independence you can have data joined from one place to another so you’re basically having a vertical parts of data in different places as well you can have replication independence you might have data replicated from one place to another with various technologies as well but underlying whether or not you use oracle streams or whether you use the old replication in the old days of anything that’s not based on redo logs yeah entirely was using database links in some way or another and to some extent if you want real-time transactional distributed updates and distributed queries you need to have or remote updates or mode queries you need to use database links and there are many different types of database links there were private database thinks or a public database think there are shared database links there is fix user database links or connected user database links the assumption is people have familiarity with DB links and you can have a DB link that says connect to so that no matter who uses it but always connects to the remote database is the same user or by default if you don’t have it connect to it will connect us or attempt to connect anyway as the user who logged into the total local system so there’s a bit of administration that needs to be done for DB links which is really just a prerequisite for this talk there are some other things that you get with

distributed databases obviously you can have different platforms you have different operating systems all the things you can see and read on the slide and therefore you can join up things very very nicely it gives you DBMS independence as well via heterogeneous services so lots and lots of things can be independent by setting up a distributed environment so some definitions a remote query X tributed query we’re going to go through some of these so look through this list some of you might recognize or no the definitions of some of these and maybe not others like transaction branches when do we do single phase one do we do to face commits I’m in doubt resolution that so our goal is by the time this talk is over you’ll have a much better comfort level with these as well as some of the things that can come and really get you if you don’t prepare for them so there’s an example of creating a database link that’s a fixed user DB linked by the way somebody is creating a DB link connect as HR HR using a TNS names called angel and then you can do some kind of query like that you should know but I’ll repeat it anyway just in case if you set by default um the DB link needs to match the global DB name of the remote database this is to as a check against spoofing yeah or being spoofed you can set that to false so you don’t need to match it but most people will leave it on so that you have to have global do meanings too true in order to make sure that when you set up a DB link the name of the link in your local database matches the global DB name of the remote database and each database in your environment should have a unique global VV name so that every single database you know which one it is and there’s no possibility of accidentally connecting to a test system instead of a live one or vice versa and then losing your work so we’ll start with the simplest thing what’s a remote query a remote query occurs when somebody connects either via client-server or middle tier there’s a session on a database instance and then over a DB link we query from a remote database and all the data we get returned is returned from the remote environment so we don’t get anything locally and that remote query could be a single table query could be a joint it could be a correlated subquery it can involve multiple tables but they’re all remote yeah they’re not distributed there in one database but it’s remote so you’re not accessing anything locally you’re just going through your local instance and then onwards via a DB link to access something remotely so the local instance will send the entire query it doesn’t have to do any decomposition and part time we recognize that everything is remote in a particular database so we just send the whole query the query will get hard parts down at the remote database as well and be in the library cash there it executes over there it returns resource and if you look at the execution plan locally you’re going to see something like serial from remote and we’re just accessing all the data from the remote database so as you can see can access one more tables or views and you might use synonyms or maybe not that you could have your table at the e-link explicitly defined in the local database in the query that’s being used and the DB link could be any of these DB links that I’ve mentioned public/private single um single user whatever the key thing from an admin point of view Z could also get errors back for example if you have a connected user DB link so Fred logs in the local and you’re using a DB link that doesn’t have a connect using there has to be a friend user on the remote system as well with the same password otherwise not going to work ok and the remote is it just returns whatever data it doesn’t get filtered out or removed from joint if you’re doing a distributed remote query that to join so a remote query can join 10 different tables but they’re all on the same database that’s what quite distinct from a distributed query a distributed query involves at least two databases returning rose it could be the local database to which your instance or your client is connected to its instance and one remote database instance and remote database or it’s conceivable that you connect locally to a local database instance and you query and you get data from two different remote databases via there is another 12 different DB links are used and you’re accessing data that way that becomes a distributed query so it’s a joint if you have a daisy chain meaning database a has a DB link to

finding it’s used by the client connected to a so we access something from be so we have a server to server session between a session on database a instance to database B’s instance but then inside database be it says all that thing’s actually remote in see that is not considered distributed it’s still remote because C is not visible to a C is only visible to be so from the point of view of a it’s performing a remote query to be and from the point of view of B it’s simply performing a remote query to see and then I mean theoretically you can daisy-chain you know till the cows come home but it’s not that good to design something that way what you don’t want to end up is back where you started but to be a distributed query requires that two different databases are returning rose so typically you get a joint like this so you have a DB link from A to B and D be linked from A to C they don’t have to be the same type of link necessarily and we get to join or possibly we just have a link are made we join rose from a two rows from B or all three of them we have a B and C a is local B and C a remote and we have a three-way joint and this is where some performance things come in that you might want to bear in mind does anyone have this by the way have distributed queries the way I’ve just described just curiosity yes you do how many different database is involved in one query three okay here for okay here here’s here’s an interesting thing but provide oracle ada prior to oracle8i the way it used to work is all the remote instances would return their rose to the local instance and the local instance would join them but this isn’t always good for business and let’s imagine i have a b c and d a is local and the returns a million rows and B returns 500,000 rose and a returns 5000 rose locally you know if we and see returns whatever it is and we if we send all this stuff back to a and then join them and throw ninety-eight percent of this away there’s a lot of data being sent over the network and then removed when you join them so there was a hint that was added an Oracle a delightful driving site which you can use some of you may have heard of it and what you can do in theory is say well this is where i’d like to join to be done in the example i gave if you know that you’re gonna have a million rows from d you do a driving site d so all we have a distributed query we have to decompose the query and from the local if that’s part of the query is sent over the DB link to each remote database to execute the part of the query that they can do and the local database instance execute the bit that’s local assuming there is any and then we may have a situation where d gets a million rows but a sense its results over to D and B sends its results to a witch send them to D like that and then we join where most of the rows are and then we send back the results to a of course it’s not always straightforward for example if we had a million rows passing through the filters in D and six hundred thousand rows in be seventy six hundred thousand rows back to a and then back from a over to 2d isn’t very good that’s more than a million rows so there’s some knowledge that you need to have as a developer let’s say about if you’re going to use the driving site why you’d want to do it Oracle is meant to try to work this out but you can’t always rely on the statistics you know with by doing remote describes and remote stats to work out well where is a good place to drive such a query but it’s always better if you know where the data is that you specified and driving site okay so that’s a distributed query in the next definition is a remote transaction so the question yeah what are you doing obviously you can kind of testing 21 and figure out which is the best sort of cyclist but is there any kind of 41 Tibetan rule of thumb we’re above so many orders of magnitude it really starts making a lot of sense is you know I don’t think so because also data volumes can change it could be skill I can skew over time the ones that’s one of the problems like the driving site hint of course is what drives well known in a drive well tomorrow but I that that’s the purpose of the driving side hand is to give you the option of saying this is where I want to join to be done and it’s particular for remote I sorry for distributed queries that’s its purpose but I can’t give you a good rule of thumb there isn’t one and if I did things can change anyway it has magically yes Oracle well you know what

Oracle buzzes in addition to the remote described you need to do when you have these things it can get remotes that find the statistics assuming that they’re reasonably accurate and try to work after itself where to drive it the driving site hint is designed to try to give you the control rather than Oracle but it isn’t determinist it is in thumb if you use the driving site hint then it’s not adaptive in other words until you change that hint it might just carry on using the wrong place to do join remote transactions are like remote queries only involve one remote database so in a remote transaction you connect to your local instance via adb link we do various DML statement against a single remote database and then we commit it or roll it back and of course a rollback ends up applying the undo and then committing that anyway worth mentioning that you don’t actually ever roll back anything in Oracle what you do is you reapply the undo but that’s additional updates and then that gets committed so you’re basically putting back what was there before but the scn is move forward yeah so roll back toward commits are both commits basically and all of that gets done remotely and is therefore in a remote transaction there’s only one database that’s going to be doing redo generation with regard to this which is the remote database and therefore there’s no two-phase commit it’s either the case that you issue a commit or you issue aerobic or you log evan is an automatic commit being done if there’s a failure occur we’ll just work as if the front-end instance wasn’t there and other failure will occur then it will just roll back okay single phase commit so conceptually this is very very easy and a remote transaction to be one single DML statement against one table in the remote database or in the mls of different kinds against one more tables in the remote database the key thing is it’s one database and that you don’t do any updates of anything locally and finally distributed transactions are when you have rows updated in more than one place like and distributed queries it could be the local database and one remote or several remote databases without doing anything locally or both local and remote databases as in the case with distributed queries we still need to we still need to decompose the statement sometimes but it’s often the case of distributed transaction that you’re doing an update by a dealing in this database then another statement up takes something else in another database another update statement update something locally it like that so one instance is used to each remote database the point there is when you’re doing remote transactions all the transaction branches need to go by the same instance if it’s rack that used to be something that meant people needed to look after the locknut since oracle of 11 it works on its own the local instance retrieves the status from the remote ian since each of the ml but this requires two phase commit to be done in other words two-phase commit is a well-known paradigm for distributed transactions it means that either it’s the case that for the individual sub transactions in each database had a part of the student transaction they all commit or they all roll back you don’t want this mixed outcome so these type of dbl and DML can be done in distributed transaction obviously see taps is a ddl but you’re creating a new table and details automatic and commit yeah but then you have all the usual things and you can also lock remote tables as part of this lock table in a particular mode and you can do select or select or update so these can all be done in distributed transactions now this is hard or start of some very strange things that you need to know about with distributed transactions first of all in a normal one instance transaction to prevent deadlocks from occurring the old or a 60 yeah oracle walks around the memory structures of the locks to work out whether or not you know a is locking B&B is locking a or a flocking BB is locking CC is locking a and you can get an or a 60 occurring it rolls back the statement when you get an or 60 not the transaction the people who’ve ever had it happen before you may be familiar with it but with distributed environments we have a problem still that you could end up dead locking but since the locks are in different instances there’s no way to want the chains to check whether or not you’re closing a loop in a deadly

embrace and so we have the distributed lock timeout which this parameter is used for this purpose the specify how much time you’ll wait for lock resources before deciding well there’s a good chance that there’s a deadlock and it defaults to a minute but it’s something to bear in mind when you’re testing as well a minute is look now it’s a long time if there is really a lock but it’s something you have to work out there’s no right or wrong answer this is what I said before about rack to be anyone here running rack yeah all transaction branches involving the same database should be connected to the same instance okay I had a case several years ago where a colleague of mine was doing something somewhere I can’t remember whether it was consultant or not but to be clever they had set of eb links from one instance to another of the same database yeah to try to do something in parallel so they were updating some rows of a table in one database instance of database a and some other updates among some other table in from the separate instance of the same database and the system just couldn’t figure out what was going on we have these tables of DB a 2pc pending and DBA to PC neighbors the underlying base dictionary tables pending trans dollar and it’s not designed to deal with updates to the same bed of s yeah so it was a problem it’s just not supported to do this don’t be tempted the reason why years ago sometimes people had DD links between instances was because prior to Oracle 80 there were no global be dull abuse so anyone here worked in the old days of the Precambrian in Oracle seven anyone here when Oracle 7 graduate like I am horrible so yeah so you may remember that in the old parallel server if you wanted to know all the sessions there was no GV dollar session you had to go and either connect to every instance or would create these DB links from instance of instance but a view on top of them and just doing some kind of Union old query in the view to access all the veedol of this or meet all of that and you could still do that but you’d be wasting your time because we have global T dollar views so the fact that there were DB links may be in somebody’s heritage from instance of instance maybe that led them astray well why don’t we try using that and being clever and it causes all kinds of weird things not supported anyway okay okay single phase commit as a term single phase commit is used in these cases single database commits and roll backs when you’re not using DB links that we all know about that or remote transactions as we described a bit earlier or type of tightly coupled transactions using XA and Oracle support sexay which is a protocol for having heterogeneous distributed environments with different vendors all support XA and the Titan couple the local is intent to commit request to a single remote in that the road infants will either say yes I’ve committed or no a rollback like that and the result gets passed back any DB links that used in the conventional Oracle way required today’s commit so here’s some terminology for two-phase commit this is really the brain the brain surgery of this discussion each database involved perform this one or more roles it depends on the context and also depends on whether you’re daisy chaining or not so for example I can have a distributed transaction will eight connects to be and eight connects to see but see connects onwards the DNA so a has a kind of coordination role to start to commit going but see is also got a local coordination role because DNA and not visible to it so whenever you have a distributed transaction with a daisy chain you get local coordinators as well as global coordinators in the whole business so the client is either a middle tier or a client and it issues transaction requests the database server receives them but the global coordinator is where the oh it’s the originating instance for the transaction so I’m logged into instance a that’s the global coordinator or I should admit it’s the one that starts doing things like prepare forget commit rollback things like that okay the local coordinator only occurs if you have a daisy-chain situation in this kind of configuration so if a is connected to b c d e NF but all of the distributed or remote databases in our locally visible to a then you have a star not a snowflake and therefore you don’t have the need for a local coordinator so if you’re familiar with the term star and snowflake from the data warehousing point of view it’s the same concept it’s whether or not you your daisy chaining

onwards and the commit point site is the node which is the first to commit as part of the two-phase commit process that’s not necessarily the same as the global coordinator well idea of the commit point sigh this is um we normally prepare everyone except for the commit point site and if everyone else is prepared okay we asked the commit point site to commit without a preparer and it did can commit it doesn’t everyone else does too and if it can’t everyone else is asked to roll back the whole idea is that doing a preparer is some work if you have to flush redo for example the resources stay long so the idea of having a commit point site is the site which has done them typically the most work is the one that should be the commit point site that way it doesn’t both have to prepare and to commit it just has to either commit or rollback now all the other thing that’s true about this terminology is that if you’re agreed to prepare in other words if you get a preparer call your time saying yes I am prepared that means you’re guaranteeing that when a commit call curse you’d be able to commit assuming you’re still there at the time commit comes and we don’t have a failure in the middle so in Oracle terms when a preparer comes we flush the redo buffers which may result in database right are also writing things out but we keep the resources locked and only when the distributed transaction finally does get the two-phase commit completed as a forget occur and the forget unlocks resources so the commit point site is what I’ve described as a parameter which you may or may not recognize called commit point strength and on the basis of this parameter the distributed environment determines who the commit point site should be if the commit point strength is one for everyone by default then usually it’s the local those are the global coordinator will also be the commit point site if everyone is one but if you know that a database d does the most updates you might decide well let’s do the commit point strength higher over there and make that the commit point site whether or not it’s materially different in performance depends on how different the amount of work that is done by each of these sites is if there’s only a difference in five or ten percent you know I’ve updated a hundred thousand rows here and 90,000 there it may not make any difference now so another thing is that the commit point negotiation is only a be able to be done between the local node and adjacent nodes the note that are not visible to the local node even if their commit point site will hire for example if a has ten and b c and d have twenty-twenty 42 d is 40 but d then connects onwards to en f and f as a hundred yeah then that only governs how we deal with that local coordination between d e and f yeah and we’ll see that in a moment so let’s look at the picture now that we have all this terminology we have a client to middle tier we have a local database instance which is also the global coordinator let’s imagine commit point strength is one in the in the over here and then we have a B and C which are visible to the local database and the various commit point strengths as you can c and d any remote from be so you can see that commit point strength is highest in E what’s going to happen here is the local instance who is also the global coordinator when the transaction is on the way there’s all this handshake is as well they may see as the highest commit point strange so what would normally be expected to happen is act at the time of commit is done the local database sends a prepare here a prepare here prepares itself yeah and tend to commit here but when a commit comes here sorry oakham it comes here rather okay what’s art when the preparer comes to be my apologies but what prepare comes to be before returning with the prepare it has to do the Local Coordination so once again prepare to hear prepared to hear before returning positive to prepare visas actually I have some downward database instances be then prepares d yeah be

then says oh you can commit okay if if this one rolls back yep we’ve got a problem otherwise not and then we can commit if that one prepares okay this one commits so you sound like you might have something that big with four or five so again it’s hard to say so I that’s our okay because right now next thing is when you have multiple databases their system change numbers are not necessarily coordinated each of them still has their own sen but there is something called a global system change number and it said to all the databases participating in this distributed transaction and each database has its local sen problem is that a failure can occur in two-phase committed at various but for example your kind of a failure where we send everyone to prepare but we have a committed anymore we sent a prepare to everyone and we sent to commit and the commit has occurred in a database d but it hasn’t got its response back to the coordinator so we don’t know whether he’s committed or not yeah so there’s all these kind of scenarios that can happen and the problem is that resources remain locked until forget is done and they don’t only remain locked in the sense of being unable to update them you can’t query them either that totally locked it’s totally unique for distributed transactions that if a table has involved in the two-phase commit until we either know that we’ve rolled back or we know that we’ve committed all of the tables involved in the two-phase commit or watch for both reads and writes so if one of your databases fails because there’s a bum let’s say there’s a power failure or some kind and you don’t have a BB you battery backup or there’s a fire or flood and what you would wipe out a whole data center and they’re part of a distributed transaction it’s gone down in the middle of the two-phase commit we may have tables in the surviving databases which are unclear able now that is a real risk and therefore you need manual in that resolution to unlock them that’s what I was referring to earlier by saying this needs to be part of anyone’s recovery strategy knowing how to deal with this yeah so prepare commit and forget are the three parts of two-phase commit and this is that what happens in a prepare the global toward later sends a request each instance except for the commit point site the commit point I’m asking for this prepare to be done I’ve described over there what it means when we do this repair this is what the site does replies with Reno me if know what done we didn’t update anything we’ve only queried even though we’re in the the distributed environment well we don’t that means we don’t have to get a commit sent to us later after all you could have a distributed transaction and you also query in database f but nothing got updated their rocks or tables in the transaction for both we then right so no one could query that no one can update them blushes renew to this to change SC ends with other instance to determine who has the highest Sen that becomes the global commit sen for the distributed transaction and then of course if there you’ll have a local coordinator because you have a daisy chain or a star whatever you want to call it a snowflake pardon me then it recursively issues prepare to its own adjacent nodes and everyone does that except to the commit point site and finally that’s the one to ask to commit it it commits we then send commits to everyone else if it rolls back we said roll backs to everyone else at the end of all that we send to forget and we can unlock the tables there’s the commit phase the global coordinator said to the commit if it’s also a local coordinator then it must recursively prepare midpoint site commits releases locks and doesn’t forget and then we then have the global coordinate it propagate that onwards to everyone else yeah and that’s the note that read only sites are not asked to do commit because there’s no point there’s nothing for them to commit ok if you’re rolling back because some site said no I can’t commit then the global coordinate attends a commit let’s say if the site is a local one and it must recursively prepare Benny site is unable to commit then it rolls back releases locks propagates the roll back to the coordinator they then propagate the roll back to everyone else and we rolled back and then we forget and unlock the resources everywhere else forget is a simple phase and he slides it to the coordinator updates the data dictionary it will remove references to DBA to PC pending from the dated issue i had a lady in germany a couple of the year and a half ago and dog conference came up afterwards when i did this talk and said no I’m is consoled and at this site and they have hundreds of entries in DBA 2pc

bending because they’ve had obviously things that have never been resolved somehow and nobody knew about this either and she was looking at it she came on the talk so I said well you need to manually resolve them to get rid of them talk about that later ok so in doubt transaction failure if you issue a standard commit or it’s in the code you might see some message like this ever look familiar to anyone pending tables have a look yeah I’ve always got nothing that I think I’m really pleased by that no one’s ever told me I’m going to do the guys me in it what are the good thing is why are you getting things in doubt then yeah anyway look resolving in doubt transaction they can be caused by various things there are some examples tables remain locked until this is resolved so the worst case scenario is that you’ve got a business critical system which you can’t query you can update nothing until you resolve this now that’s why you need to have this as part of a strategy that you know what to do if this occurs so now there is a background called Rico the recovery and its job is to automatically resolve things so if the problem is that an instance went down so the database is unavailable as soon as you reconnect with handshake it says well I have these in doubt I didn’t know what happened or the other participating instances are able to talk to it the coordinator then looks at it says all well actually um I sent you this commit and it said yeah I did commit so now the coordinator says okay they committed i can send the commits to everyone else we can forget about everything unlock all the resources all very happy if the other side said well actually when you asked me to do that i tailed so I rolled it back then the coordinator says well I better roll this back locally and send rollbacks to everyone else but until the other side reestablishes contact we can’t make that decision that should be done automatically by recup and here’s the problem it is to test this so that you know how to handle it you have to anesthetize Rico otherwise if you try to test things and you bring systems back up noriko will just sort this out for you so there is an alter system command to disable distributed recovery which I’ll show you the moment where you can it doesn’t stop Rico it just puts its sleep so Rico doesn’t do its job and then you can practice to face commits failures on your own yeah provided you have de links and then if for example you want to practice what you would do see by putting Rico to sleep you can pretend like this other remote database is never coming back even though you’re not going to destroy it and then you can practice your manual resolution either using commit or using rollback so we have to commit force or rollback force statements for this so in that resolution a failure occurs during any of the phases you might have compromised data integrity in other words it may be that you don’t know whether the other side committed it or not but somebody has to take the decision otherwise the table remains locked in this database and several other connected databases like you might have okay so this is a list of what Rico tries to do as I’ve said paraphrased earlier yeah if the crash occurs before the commit point site commits then roll back is done when we do connectivity being restored that’s quite obvious now if the committee is sent by the coordinator and a crash occurs after the commit point site receives it then commits should be done when the connectivity is restored because if the commit point site was given a committed there’s no reason it shouldn’t do it yeah well if a rollback is sent by the coordinator and the crash occurs before the commit point site receive that that a rollback will be done this or what Rico does but if Rico isn’t there or can’t do it because a remote database is just gone or your local database is just gone and you have resources lot than the remote ones we need to do the manual resolution and therefore it should be done if you have in doubt transactions that have locked critical tables and there’s no way for your business systems to can on and you can’t afford to wait for four hours or a day in order to get back this remote database assume you’re going to get it at all yeah and so we need to do it or if one of the participating databases is lost forever for some reason you need to do this as well now to do this properly requires somebody who knows about the current state at each database and knowledge of the application what the implications might be of forcing a commit or forcing a rollback so you need to have discussed this with the application designer but that make me a third party product for you know this is not easy no one said it

is that is not easy so the views that we have a DBA 2pc pending which is built on pending trans dollar and the base table and the neighbors table based on two other tables and there’s also a V dollar global transaction view which was a later view that came in a king an Oracle 10-9 something like this and if you’re uncertain what to do and you’re worried then since this is not an everyday occurrence perhaps it is you’ve got real problems then you might want to contact support and then well I don’t know it’s probably not an everyday occurrence to them either but never mind DBA to pc pending as you can see will show you from the point of view of your own database by the way this is a this bit of persistent data dictionary tables Thunder dynamic performance you so it persists after shutdown because we need to have a view of what spending no matter how many times we shut down the local tran ID from the transaction branch that was in this database is there and then there’s also the global tran ID yeah and the state yeah and whether or not we’ve had mixed outcomes for this if you’ve had a force committee force roll back in this potential for mixed outcomes this will stay around you have to manually purge yeah there’s a purge mixed procedure and there’s a purge lost database procedure as well there’s also the possibility uncommital various advice to be given by the developer in the commit statement so i’ll show you that in a bit of then you have x terminals and other obvious things ok this is the state collecting only applies if you coordinator you could be a global coordinator or a local one and collecting says the node is collecting information before deciding whether it can prepare so if we have a failure let’s say the network fails in the middle while we’re collecting the state might say well we were in the middle of collecting yeah and we’re stuck there prepared well it means that we have been prepared but we haven’t committed yet self-explanatory committed we committed yeah but we’re still in the view at all because there may be other sites which haven’t committed yet so you might ask well why should I have an entry in this DBA to be seen pending locally in my database when it says committed well somewhere else it hasn’t committed we’re still in doubt about the final resolution and then whether it’s force committal forest rolled back the neighbors view uses the local Tran ID and then it says in if it’s an incoming transaction out that we were the client in other words in a DB link the side that makes the connection to the remote database access the clients it’s a servant to serve a connection but one side acts more like a client and the other side acts more like a server yeah that’s what that means and then the database column has the name of the client database incoming meaning i am the server which is the database that was sending something to me and the outgoing is the name of the DB link which should match the global DB name of the remote database so you have a good idea who the databases that was on the other side of this neighbor row yeah and of course you can have more than one neighbor for the same transaction because you may have a star yep and then you have the the name of the local user the owner the DB link depending on director direction and whether or not there’s a request for a commit or you’re prepared only stuff like this read all global transaction which was a later view shows you all global transactions that are currently visible okay which branch it is how many branches are there what’s the state of the transaction this is more useful i think in XA if using heterogeneous distributed recovery or distributed transactions rather than just oracle database instance the oracle database instance manual in that resolution let’s again go back to the eight commit request that results in this endowed message note the trans ID number from the error that is the local transaction ID then query the local DBA to PC pending view yeah and that will give you all the other information about this particular transaction if the state says commit and the local database is committed the reason the row is in there is because there may be remote databases that is still in doubt okay now you compare this with the same on the other nodes when they’re accessible and of course if rico has done its job you met by the time you bring the other database up it may be faster than you are it may resolve everything and then there’s nothing to look at anymore unless of course you’re playing around you’ve you’ve anesthetized it by disabling distributed recovery if it says prepared than the local database the one you’re looking at now for this transaction has not committed it okay and then if it’s not committed then you look at the global tran ID and the commit number examine the neighbors to find out who

the other databases are that are involved yeah compared with the same on on the nodes and if other nodes have found prepared it’s safe to commit force commit if other nodes prepared of course you could still roll them back and when you do the force commit you use the global scn not the local Sen with a force commit statement you if you try to use the local one your skin an error anyway okay developers and this is not that commonly known developers can help if they know what’s going on in these transactions of course if they don’t then maybe you’re the kind of help you don’t want and this appears in the advice column on DBA to PC pending you so when we we have this ultra session advice commit advisor or Becker advise nothing there may be windows in the code so you enter a piece of the code where at this point the developers saying well my advice is if you in an in doubt situation in this window of code it’s better to roll by whereas product alternatively there might be some other window of code later on in the transaction where they may say well at this point we’ve done enough so if you’re in doubt it might be safer or more advisable to commit or they may say oh I have no advice at all regarding to this and then the via the default is no advice yeah so you can do alter session or you can use DBMS transaction to do it from the middle of peel sequel code whatever works better yeah so that’s one of these little obscure pub quiz type of Oracle syntax for people who like that kind of thing and of course whoever is doing this must know what the implications are that there’s no point in giving this advice if you don’t know what the implications are just a waste of time first commit would look like this commit and optionally with a comment because the comment is visible also in DBA to PC pending as a comment column and then there’s an optional word work you don’t have to use it it’s just syntax and there’s the force option the Tran ID or you can use DBMS transaction if you want to do it Tran ideas the local trend I the integer is the scn XXX is a comment and is optional and the requires of privilege as well so not everyone because you a force transaction commits typical DBA same thing for force rollback you can read one you could read the other now now the problem is that you could have mixed outcome so data dictionary clean up if the force has been done of a rollback or commit you could end up with a mixed result because one side may have done a commit you force the rollback or vice versa and in the mixed column you’ll have a yes in the tea in the DBA to PC penny view on all of the different databases involved Rika won’t clean this up even if UD anesthetize it yeah because it says it’s mixed it means you want to check this so we have a purge mixed procedure that allows you to get rid of that what you’re using to get rid of these no there is empty indep transaction of warnings when I go to start to have a look it’s never anything in that must be because Rico is rio rico whatever is causing the in doubt its network time yeah that’s exactly I was going to say it latency in the network something like this and by the time you look it sorted it out yeah you can prove that by the way by anesthetize enrico but it’s a large system yeah what do you want yeah don’t do otherwise you’ll end up with job security for life sure yeah if one of the databases involved is gone forever we have approached loss DB entry because there’s no way Rico can clean it up or anyone else can clean it up we can’t commit it we can’t force for all the back it’s gone so we just do this instead because we try to recover you just get an error message so instead we do this so this is just right out of the documentation i only put it in here for completeness in case you refer to this later on lot of presentations that i do my presentation to make some of them are right myself some of them are based on oh you content the oh you content once I can’t share online but this I wrote myself so it’s okay that way you have this information about what the state is what the state of the transaction is etc you can read that all yourself I’m not going to bore you by just reading through the matrix but it’s there to give you completeness in the presentation okay now this is something you may or may not have heard of this will it really it’s a good way to impress your friends how do we practice this well we use a special form of the comment on commit it’s a very obscure bit of Oracle technology then in Oracle since Oracle six or seven it’s been a long time in there but nobody knows that a very few people know it what you do is

you disable distributed recovery to put an Rico in to sleep so that even though it’s there it doesn’t wake up and do it stuff then you start doing your updates and whatever and then you practice all kinds of scenarios so let’s say you did a wanted to do a practice of crashing an uncommitted point site before it it’s prepared but before it’s been committed then you do it or up to pc crash-test hyphen seven as a commit and that will just generate that as if that’s happened and then you can go and practice manual resolution on this and there are 10 different crash tests you can do well theoretically how many of them you can do of course depends on how elaborate your distributed environment is if you don’t have any local coordinators then there are certain things you can’t do so this is something that you need to practice people should practice this you should practice this for sure yeah I’m glad you’re here I do it so some very interesting things there’s these crash-test the ability to anesthetize Rico and the fact that your tables remain locked until this is all fixed that’s probably the biggest shock to people when they haven’t heard this before that you can have tables that can’t be read can’t be written nothing until you resolve this problem now do you want to get a call at four in the morning when you don’t know who you are let alone where you are and make this decision this has to be part of your normal kind of recovery situation if it were to happen but it isn’t I’m afraid unless I’m is anyone here got something documented on how to deal with no nobody I’ve done this talk many times in the last few years and people don’t you separate your discouraged yeah yeah prisons are full of guys like right the recovery strategy for distributed databases yeah complete recovery this is another thing to watch out for separate let’s say that I have a point in time database recovery necessitated by something yeah whatever it might be I do a flashback database or I do an old-fashioned one where I where you know try to beat down I restore backup and roll forward whatever the case is but those databases had participated in distributed transactions you’ve got a separate problem or a nightmare scenario on your hands I’ve restored a database be to last night it’s now six o’clock and evening today I had to do this restore for business reasons say corrupt data whatever the reason is but database be with part of a distributed environment unless you also roll back all the other databases involved in the same distributed environment to the same global scn then you have a logically corrupt distributive environment and I was teaching a replication course years ago this was made me back an oracle 9i 9i days and there was a lady from Canada or the u.s. she was working for a Swiss bank in Zurich ji just joined them about four months earlier she was on this course and it came up i was talking about i think it was a backup a recovery course but i was supplementing the normal oracle stuff in there to talk about recovery from distributed environment because our course way it doesn’t deal with it or didn’t at the time and I said does anyone had I mentioned this problem and her mouth dropped open in the back row and she ran out of room and she spent the rest of the day on the phone apparently they had a three-way distributed environment Zurich London Toronto I think and they had a couple of months before she joined she had heard about this once she joined that there was a case where they had to roll back one of the database didn’t roll back the other two because nobody knew this this was major international banking system of some kind she didn’t say who was effective I said I’m not banking with you so that happens it does this needs to be something that you add to your recovery strategy as well if you do point in time recover of a database or the distributed databases have to be going back in time otherwise you live with the consequences yeah unless you can identify which objects are involved and dew point in time tablespace recover for those objects and you want to pick them then fair enough try to do that or Oracle well you can do point in time table recover which you may have heard of and surgically recover those tables but that may take you a bit of time because that doesn’t do transportable table space yeah so some of you may have been fiddling around with that feature but bear that in mind okay you need to recover to get those objects back to the same global scn okay so a complete

database recovery distributed transaction restored to the state prior to failure rico then automatically resolves any in doubt transaction to done automatically it may be done by the recovering instance in rack because you only have one instance doing recovery in rack tablespace or table point in time may be required also for corruption or logical errors okay so data in other databases must be restored and recovered to the same point as I’ve just said data voice point in time recovery same thing and so whichever way you do it you have to get those objects to the same global scn that’s my message so that plus the fact that your tables remain locked and when you’re in doubt that’s a lot to take on board about the dangers of having things go wrong in a distributed environment so I didn’t mean to scare everybody but I hope that you get a flavor for why this is important because backup and recovery as well still along with security I suppose these days of you know Pete Finnegan is busy at work everywhere because security is such on topic but we’re still judged by our ability to deal with failure scenarios this needs to be raised in everyone’s consciousness if you using distributed environments otherwise and they might come when things go wrong and you’re not prepared I hope you enjoyed that does anyone have any questions if you do just think of them later on just send me an email but thank you very much I hope you had a good fight