Rama Ranganathan (U. Texas Southwestern) Part 1: What is Protein Design?

Hello. My name is Rama Ranganathan, and I’m a professor in the Green Center for Systems Biology and also the Departments of Biophysics and Pharmacology at UT Southwestern Medical Center. In the next three lectures, I’m gonna tell you about a problem that I’ve been very interested in for many years now, and that is the problem of so-called evolutionary “design” of proteins. So what we’re gonna do in the next three lectures is to begin with a general overview and a statement of the problem. What do we mean by this problem of the design of proteins? And what’s the nature of the problem? Why is it a difficult and interesting problem? In the second lecture, we will move on to present an approach to this problem that we have started in our laboratory, and discuss a general model for proteins that emerges from these methods. And in lecture three, we’ll talk about experimental approaches for testing the model and, ultimately, producing a general hypothesis that will drive future research. So, let’s begin with a statement of the problem. What do we mean by evolutionary design of proteins?
So, the problem with proteins, in some sense, is very easy to state, though difficult, perhaps, to solve. We know that proteins are synthesized as long polymers of basic building blocks that we call the amino acids, and classical studies teach us that the information in the sequence is such that if you put this polypeptide into a physiological buffer, it is able to spontaneously fold up into a precise and well-ordered three-dimensional structure. That problem we call the “folding problem” of proteins. But since evolution is designing these sequences not just for the property of folding but for biochemical functions, there’s information written in the sequence not just to specify folding, but also the capacity of the folded state to carry out various biochemical activities. So for example, in the case of this little protein, the PDZ domain, that activity is the ability to recognize target ligands, small peptides, at a groove on the surface that we call the peptide binding pocket. So fundamentally, the question is the nature of this information encoding problem. What information in the sequence, what constraints on amino acids and, in principle, on interactions between amino acids, are necessary and sufficient to produce these properties of proteins: the capacity to fold and the capacity to carry out biochemical activities? And I want to point out that natural proteins have to do, in principle, more than just fold and carry out biochemical functions. They have to be possible to reach through the process of evolution, and they have to have the capacity for adaptation, to evolve to new functions when the nature of the fitness function demands it. So, we’re interested in understanding what design of proteins accounts for all of these properties. Now, what do we actually mean by this word “design”?
Well, the first thing to realize is that all the properties of proteins that we’re interested in (folding, biochemical activities, and even this property of adaptation) arise somehow from the pattern of physical interactions between all the atoms that make up the structure. So, by design, we specifically mean this pattern, the architecture of physical interactions between atoms, and, in principle, the generative process, that is to say, evolution, that shapes this pattern. And we can gain some intuition on this problem of the pattern of these physical forces from a knowledge of physical chemistry. For example, we know that all the fundamental forces that bind atoms act at very short range, and they act with great distance and geometry dependence. As a consequence, when we look at atomic-resolution structures of proteins, we observe this property of good packing, and let me explain what this means. What I’m showing in these images is a slice through the core of this protein: I’ve chopped the protein down the middle and opened it up like a book, so you can look at the way that the atoms are arranged in the core. And what we observe is that the density of atoms in the core is so high that it approaches that which we observe in crystals of free amino acids. As a consequence, proteins look in some sense like 3-dimensional jigsaw puzzles. That is to say, they look precise and locally exact.

So, when we ask questions about protein function, such as which amino acids in this protein are responsible for recognizing this substrate ligand in yellow, we all use a principle I will call the principle of spatial proximity. What that means is that if you wanna know which amino acids are recognizing this peptide, the idea is to look in the immediate environment of that peptide in the structure. So, the red amino acids are the ones that are physically contacting the substrate ligand, and a first-order hypothesis would be that those amino acids are the ones that are responsible for the specificity and affinity of target ligands for this PDZ domain. And in fact, many experimental and computational studies are done to interrogate this local environment for the origin of binding and specificity.

But the trouble is that, even in this little protein, there are back-side surfaces, for example this patch of red in the back of the protein, quite far away from where substrate ligands bind at the active site, and yet these surface sites control the affinity and specificity of target ligands for the active site. And what’s interesting about surface sites that have this capacity is that they’re quite special. There are many other surface sites, which you can see in blue, that are equidistant and just as well connected through this pattern of interactions to the active site, but they’re inert; they don’t have this property. Interactions there do not affect the affinity and specificity of ligands at the active site. This already tells us, then, that there must be some heterogeneity, some pattern of interactions that we don’t observe in the structure of the protein, that makes it so that certain surface sites are functionalized and can carry out these properties of long-range communication, whereas many other surfaces do not have that property. And that’s not obvious when we look at the protein structure.

Now, this property of long-range interactions is not just an arcane property of this one protein. In fact, one can argue that biology in general is driven by such long-range interactions. For example, in membrane receptors such as G protein-coupled receptors, in signaling proteins like G proteins, and in other receptor molecules such as the TonB-dependent receptors in the outer membrane of bacteria, there is a critical need for long-range interactions. In G protein-coupled receptors, when ligands bind at an extracellularly accessible site, the effect of that binding is felt at quite a long range in the cytoplasmic regions that mediate downstream signaling, and the fundamental purpose of these proteins is to convey this information from one site to another site that’s quite a long distance away. And it’s not just signaling proteins. It’s now clear that even in enzymes, active site function can depend upon the dynamics of surface loops that are quite far away. So, we would like to say, generally, that long-range interactions are important in biology. They mediate signal transmission and catalytic specificity and efficiency, and these are often defining features of protein families, so these long-range interactions tend to be quite important.

This problem of trying to understand long-range interactions is also not anything new. In fact, it dates back to the origin of structural biology and of our understanding of proteins. The classic example is the problem of understanding function in hemoglobin. The structure of hemoglobin, of course, was solved by Max Perutz, shown here, in the late 50s, and the problem of hemoglobin was well known even at that time. Hemoglobin comes as a heterotetramer of alpha and beta subunits, and the structure of the molecule is shown here. Each of these subunits contains a heme moiety that binds oxygen, and the interesting property of hemoglobin, which has been known since the beginning of the last century, is cooperativity in oxygen binding: an oxygen binding to one of the heme groups influences the affinity for oxygen of a distantly positioned heme on a different subunit. And of course, when the structure of hemoglobin was solved, the first question was, how does this occur? How does cooperativity in oxygen binding emerge from properties of the 3-dimensional structure?
Now, to illustrate the nature of this problem, I’m actually gonna let someone else tell you the story. It turns out that the great physicist Richard Feynman, in the early 60s, gave a lecture called “The Relation of Physics to Other Sciences”, which actually was recorded. In this lecture he addresses this problem of hemoglobin, and I’m gonna play a short audio clip in which Feynman illustrates the nature of this problem. So, let’s listen to Richard Feynman; we’ll come back in just a minute.

(Richard Feynman: “One of the great triumphs of recent times, within the last year, in fact, was at last to discover the exact arrangement in space… and the order of all the amino, some 56 or 60 amino acids in a row. Over a 1000 atoms, well, nearly 2000 if you count the hydrogen atoms, have been all located in a complex pattern for one, well, now two, proteins. The first was hemoglobin. One of the sad parts of this is that, for all we can tell, yet, we can’t see anything from this pattern. We don’t understand why it works the way it does because of that pattern, but of course that is the second problem.”)

Okay, so this is 1961, and Feynman claims that this is the next problem to solve: understanding how the protein works given this pattern. The idea was, we finally have the structure, the precise positions of all the atoms that make up hemoglobin, but we can’t see in this pattern how important functional properties such as the cooperativity in oxygen binding emerge from the structure. I’ll point out that Feynman gets many things wrong here. I mean, hemoglobin doesn’t have 55 or 60 amino acids, but, you know, like every good scientist he really gets the essence of the problem: we don’t understand the function of the protein given the structure. And the question really is, why? What’s the fundamental problem that limits our ability to understand protein function given the structure of proteins? I’ll summarize the whole thing by saying that the essence of it is that we don’t see energy in structure. What I mean by this is that we may observe in this molecule, say, a hydrogen bond that relates two of the atoms, but what is the net value of this hydrogen bond in the complex transaction of free energies that happens between the unfolded state of the protein and the folded state? In fact, everything we know about that problem teaches us why it’s a difficult thing to know the net value of these interactions in protein structures. First of all, natural proteins tend to fold to only marginal stability. So for example, a typical molecule such as this, the PDZ domain, might be stable by just 8 or 10 kilocalories per mole; that is all that separates the unfolded state and the folded state. Now, what does 8 or 10 kilocalories per mole mean to you and me?
Well, that’s on the order of 3 or 4 good hydrogen bonds’ worth of free energy, distributed through all of the interatomic interactions in this protein structure. Coupled with what I told you earlier, which is that the forces that bind atoms act with such distance and geometry dependence, it’s no wonder that by looking at protein structures we don’t know the net value of all of these interactions, and that is really the essence of the problem. Which of these interactions between atoms are important, and which are not so important, for specifying the properties of a protein: folding, function, and, in principle, properties such as adaptability?

Well, how do we solve this problem, then? We know some things about the nature of the solution that can help us, that can guide us in looking for an approach. And I’ll tell you about three of these characteristics of the solution that I think the empirical data collected over many, many years tell us about. The first characteristic, which is interesting, is that the solution is sparse. What I mean by that is that most of the important contributions to folding and function in many proteins are contained within just a small fraction, or a subset, of the amino acids that make up the protein. One way to show this result is to use mutagenesis methods to probe the importance of amino acids for, let’s say, protein function. So for example, what I’m showing you here is the thing that I showed you earlier, the structure of the PDZ domain sliced into the core, showing that it looks like a homogeneous and dense network of amino acid interactions. But now consider a comprehensive mutagenesis study. In this matrix, what’s shown is the effect of mutating every position in this protein, going from N-terminus to C-terminus on the rows of the matrix, to every other amino acid, on the columns of the matrix. Every pixel of this matrix reports the functional effect of that mutation on the ability of the PDZ domain to bind substrate ligand. And the result is quite striking: you see that there are some positions, such as this one, that want to be the native amino acid, which is histidine, and essentially can’t tolerate any mutation. Any mutation at that position is deleterious for the binding of the substrate ligand. Yet there are other places, like the amino acid just next door, which could care less what you put at that position; every amino acid suffices to give you native-like function. And if you examine this pattern of amino acid importance over the whole sequence of the protein, what you conclude is that just about 20% of the amino acids are critical for binding. I’m not showing it here, but if you look on the protein structure, that set includes amino acids that are both near and far from the site at which peptides bind in the PDZ domain. So this is really a dataset that illustrates the sparsity and heterogeneity of proteins in their function, which is not obvious when we look at the protein structure.

The second property is more complex to explain, and that is that the solution is non-additive. That is, amino acids that are involved in function are often not acting independently, but are engaged in pairwise or even higher-order cooperative interactions, such that the effect of the amino acids taken together is not just the sum of the independent actions of the amino acids taken individually.
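
To make the idea of sparsity in the mutagenesis matrix described above concrete, here is a minimal Python sketch. It assumes the matrix is stored as a list of rows, one per position, each holding the functional effect of every substitution at that position. The numbers, the `threshold` value, and the function name `sensitive_positions` are all hypothetical and are not taken from the PDZ study; the sketch simply flags positions whose average mutational effect is large and reports what fraction of the protein they make up.

```python
from statistics import mean


def sensitive_positions(effects, threshold):
    """Return indices of positions whose mean mutational effect exceeds threshold.

    effects: list of rows, one per sequence position; each row lists the
             functional effect of each substitution (larger = more damaging).
    threshold: hypothetical cutoff separating tolerant from intolerant positions.
    """
    return [pos for pos, row in enumerate(effects) if mean(row) > threshold]


# Toy matrix (assumed values): 5 positions x 4 substitutions.
toy_effects = [
    [0.1, 0.0, 0.2, 0.1],  # tolerant: almost any substitution is fine
    [2.5, 3.0, 2.8, 2.9],  # intolerant: every substitution is damaging
    [0.0, 0.1, 0.0, 0.2],
    [0.3, 0.1, 0.2, 0.0],
    [0.2, 0.0, 0.1, 0.1],
]

critical = sensitive_positions(toy_effects, threshold=1.0)
critical_fraction = len(critical) / len(toy_effects)
print(critical, critical_fraction)
```

In this toy case only one of five positions is flagged, which mirrors the roughly 20% figure quoted above; with real data the choice of threshold would need justification.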

That requires a little bit of explanation because it’s not a simple thing to understand, so I’m going to review very quickly the concept of thermodynamic cooperativity and what it means to say that amino acids are acting non-independently, because this is quite a critical problem in our understanding of proteins. What I’m showing you here is just a model protein, a schematic of a protein, and I’ve labeled three of the amino acids. So, consider this protein with these three amino acids. How can we generally describe the effects of mutations at these positions with regard to some equilibrium free energy measurement that we make on the protein? What I mean by that is we might measure the effect of these mutations on, say, the stability of the protein, or perhaps on its ability to bind substrate ligands, or any other equilibrium free energy measurement. So, how can we generally describe the effect of these mutations on such a measurement?

The single mutation effect is simple to describe. You measure the free energy of the wild type state, you measure the free energy of the mutant state, and you take the difference between the two, and that is the energetic effect of a single mutation on whatever property it is you’re measuring. But the effect of a double mutation is more complicated. In the formal description of how to think about a double mutant experiment, what we do is make what’s called a thermodynamic box, or a Carnot cycle, and the idea is that each corner of this box represents one state of the protein. So this is the wild type protein, this is the effect of mutating position i, on this dimension we put the mutation of position j, and this corner now has the double mutation state where both positions are mutated jointly. And the idea is that we’re trying to write an equation that describes, most generally, the energetic effect of the double mutation. That’s the difference between this state and the diagonal state, which is the wild type. That’s the effect of the double mutation.

So it turns out that, in general, the right way to think about it is that the effect of the double mutation is the summed effect of the two single mutations, corrected by a factor called the coupling free energy between the two mutations. The idea is that if the two mutations are totally independent of each other, that is, thermodynamically independent, then the coupling free energy is zero and it’s simple: the effect of the double mutation is just the additive effect of the two single mutations. But if the two mutations see each other, if you will, or are interacting with each other, then there is a coupling free energy that must be taken into account in describing the energetic effect of a double mutation.

Now, what is this coupling free energy? Well, it’s interesting when you look at how it’s defined. The coupling free energy, or the ΔΔG between mutations i and j, is the energetic effect of making a single mutation, mutation i, in the wild type background minus the energetic effect of that same mutation made in the background of another mutation j. So, to be clear, this quantity, the coupling free energy, can be thought about as the degree to which the effect of one mutation depends on the background of another mutation. Of course, if the effect of the first mutation is entirely independent of the other mutation, the two quantities on the right side of this equation are the same, and the coupling free energy comes out to be zero. That is, if two mutations are thermodynamically independent, the coupling free energy is zero. But if it’s non-zero, then that means those two mutations are interacting with each other, and the magnitude of this coupling energy measures the degree of coupling between the two mutations. Okay, that’s the formal description of a double mutation, but we’re not done yet. Why stop with two mutations?
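
The double-mutant cycle just described can be written down in a few lines. This is a minimal Python sketch under assumed conventions: each corner of the thermodynamic box is given as a free energy in kcal/mol, a mutational effect is the mutant value minus the wild type value, and the coupling free energy is the effect of mutation i in the wild type background minus its effect in the background of mutation j, following the definition above. The function names and the example numbers are illustrative only, not measured values.

```python
def mutation_effect(g_background, g_mutant):
    """Energetic effect of one mutation: mutant minus its background state."""
    return g_mutant - g_background


def coupling_free_energy(g_wt, g_i, g_j, g_ij):
    """Coupling free energy between mutations i and j from the four box corners.

    g_wt: wild type; g_i, g_j: single mutants; g_ij: double mutant.
    Returns (effect of i in wild type) - (effect of i in the j background).
    """
    effect_i_in_wt = mutation_effect(g_wt, g_i)
    effect_i_in_j = mutation_effect(g_j, g_ij)
    return effect_i_in_wt - effect_i_in_j


# Hypothetical corner energies in kcal/mol (assumed, for illustration).
g_wt, g_i, g_j, g_ij = -9.0, -7.5, -8.0, -7.0

coupling = coupling_free_energy(g_wt, g_i, g_j, g_ij)

# With this sign convention, the double-mutant effect equals the sum of the
# two single-mutant effects minus the coupling free energy.
double_effect = mutation_effect(g_wt, g_ij)
additive_part = mutation_effect(g_wt, g_i) + mutation_effect(g_wt, g_j)
print(coupling, double_effect, additive_part - coupling)
```

If the two mutations were independent, the double mutant in this toy example would sit at -6.5 kcal/mol and the function would return zero; the non-zero value here is what signals an interaction.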
What if we introduced now a third mutation? How should we think about that? Well, now the problem expands geometrically. Now we have to describe a triple mutation experiment, formally, in thermodynamic terms, as a cube, where the front face of this cube is precisely the same thing that we have here: it’s the cycle that describes the energetic effect of a double mutation. The third dimension, going into the board, is the effect of the third mutation. I won’t bother describing the details of the calculation of how to write down the energetic effect of the triple mutation. The triple mutation effect is the cube diagonal, the effect of this state minus the effect of the wild type state, here. And, without derivation, I’ll just make a simple point, which is that, formally, to describe the energetic effect of a triple mutation, one has to consider the additive effect of the three single mutations, corrected by all the underlying pairwise couplings, that is to say the coupling between mutations i and j, the coupling between mutations i and k, and the coupling between mutations j and k. And then there’s a new quantity that emerges, which is a three-way coupling, a ΔΔΔG, which is the degree to which all three mutations interact beyond the additive effect of their underlying pairwise interactions.

So, you can see where this is going. If you consider four mutations, you can imagine what’s going to happen. We’re going to have to introduce, at least formally, a fourth-order term that describes the coupling between all four mutations, and so on and so forth. This becomes an extremely complex problem when one considers that a typical, even small, protein is made up of maybe 100 amino acids. So, the question we can ask is: in practice, in real proteins, do we observe such higher-order thermodynamic couplings, and what really is the extent of the problem of cooperativity between amino acids in practical cases? We don’t know much about this, actually, because it turns out to be very difficult to measure these higher-order energies, but it has been done beautifully in a study of the potassium ion channel, so let me just describe to you what these investigators saw. Here I’m talking, specifically, about the work of Sadovsky and Yifrach that was reported a few years ago.

Just to introduce the model system they were working on, this is the potassium ion channel. This is a membrane protein that’s found in the plasma membrane of many cells and has evolved for selective permeation of potassium ions across the plasma membrane. The structure of the potassium channel is shown here. It’s actually a tetrameric protein, but only two of the subunits are shown here for clarity, and this is the outside of the cell and the inside of the cell. The channel adopts a roughly inverted tepee-like configuration. On the outside face of the channel there’s a narrow constriction right at the top that’s called the selectivity filter, a narrow region where potassium ions go through the pore in single file. And then on the internal aspect, near the cytoplasmic face, there’s a narrowing of these helices in a region that’s called the activation gate, and that region has to undergo a conformational change to allow the channel to open and conduct ions.

Now, what is really interesting about the work from Sadovsky and Yifrach is that they showed that the amino acids in the channel that are important for gating, that is, the ones that influence the ability of the protein to open and close, actually make up a network of amino acids that connects the activation gate down at the cytoplasmic region with the selectivity filter that’s located nearer the extracellular region of the channel. And the experiments these investigators did, in actually measuring second-order, third-order, and even fourth-order coupling free energies between these mutations, showed that along the pathway (the pathway, by the way, is indicated here in these red amino acids) amino acids tend to show strong higher-order terms. That is to say, there are fourth-order coupling energies; they didn’t go any higher than that, but maybe there are even higher-order terms there. What was interesting is that, though along the pathway you see these higher-order terms accumulate, if you go orthogonal to the pathway, that is, if you mutate residues that lead away from the pathway, this cooperativity degrades rapidly, and now amino acids are only engaged in lower-order cooperative terms or are not even cooperative at all. So what this really teaches us, or suggests to us, is that inside of this protein is embedded some cooperative subsystem of amino acids that makes it possible for information to flow between two distinct points of the protein, and outside of this system amino acids are engaged in much lower-order, simple energetic interactions. This was a very illustrative example of the idea that proteins do contain non-trivial non-additivities, that is to say, cooperative interactions of amino acids, although that study also suggested that those cooperativities were loaded, perhaps, in only small networks of amino acids and were not so involved in the majority of the protein. But that’s a statement that requires much more study.

To review what we’ve said so far: we’re looking for characteristics of the nature of the solution to the pattern of physical interactions in the protein structure, and we find that there are these three characteristics. The first is sparsity, that is to say, most of the contributions of amino acids are contained in a small fraction of the residues. The second is this property of non-additivity. What I’ve shown you just now is that at least the functional residues in the protein are often engaged in pairwise or even higher-order cooperative interactions, although the work that I showed you suggests that these non-additivities may be localized in just a subset of amino acids that are functionally important, with less importance of this cooperativity in surrounding amino acids. Those assertions need to be tested further.

In addition to these two properties, the third property that we think about is the property of modularity. What modularity means is that different groups of amino acids can encode different and independent properties of the protein. An example of this was illustrated in the study of the proteins known as the S1A serine proteases. These are enzymes that catalyze the hydrolysis of peptide bonds and are involved in many things in biology. The analysis of this protein family showed that this single protein contains three distinct functional units (the red cluster, the green cluster, and the blue cluster), and experimental studies showed that these clusters mediate distinct and orthogonal biochemical properties of the enzyme.

So, the important point is that these properties of proteins (sparsity, non-additivity, and modularity) in general are not evident in our examination of even high-resolution atomic structures, for all the reasons that we’ve said so far. And so, what we need to ask is, what approach can provide insight into this pattern of constraints? What’s an approach that can lead us to the analysis of the so-called “design” of natural proteins? In the next lecture, I would like to propose an approach to this problem that we’ve taken in my group and then discuss a general model that emerges from these studies. Thank you very much.