CS50 2017 – Lecture 3 – Algorithms

Just another WordPress site

CS50 2017 – Lecture 3 – Algorithms

[MUSIC PLAYING] DAVID MALAN: All right This is CS50, and this is lecture three And today, the goal is to be a lot more algorithmic than code-oriented, and to really kind of come back to where we started in lecture zero, when we talked about computational thinking, and algorithms Because now you have a few weeks under your belt, and you have a few new tools, and hopefully skills, under your belt, even though odds are, you’re still getting comfortable with some of those skills And so let’s look, then, at this piece of hardware that we talked about a couple times And this is an example of what? AUDIENCE: Memory DAVID MALAN: Yeah, memory, or RAM, random access memory And this happens to be pretty big on the screen, but it’s actually pretty small, if you were to find it in your Mac, or PC, in your laptop, or some other device And this is just memory And so, memory is places you can store information– numbers and characters and strings, and bigger things still And recall that if you want to use this memory, you need to address it You need to put something in particular locations And we’re going to start doing this all the more So in particular, there’s at least, like, five– four black chips on here, which is actually where the zeros and ones are stored, in the green circuit board And little gold connectors you see on the screen, too– that just allows all of those black chips to intercommunicate, ultimately, and for the zeros and ones to move around But if we zoom in on just one of these chips of memory, arbitrarily, and then zoom in here, you can think of this, wonderfully enough, as a rectangle Could be any shape, certainly, but if we think about it as a rectangle, we can divide this chip of memory up into a grid, like this Completely arbitrary how I’ve drawn this, but the fact, now, that I have a whole bunch of small boxes– you can start to think of this as being one byte, and then another byte, and then another byte And I coincidentally wrote eight bytes across, but it could be any number It doesn’t really matter And so, you can start to think about these bytes as having specific locations, or numbers, or addresses, as we’re going to start calling them So if I zoom in on the top of this, you could start to think of the top left as being byte number 0, because computer scientists generally start counting at zero, and then byte 1, and 2, and 3, and 4, and dot dot dot And then, just as a refresher, in a typical Mac or PC these days, how much RAM, or memory, would you typically have? AUDIENCE: Four? DAVID MALAN: Four what? AUDIENCE: Four gigabytes DAVID MALAN: Four gigabytes And giga means billion, so that’s four billion bytes And maybe you have a little less, maybe you have a little more, but on the order of billions of bytes So we’re only scratching the surface of just how much storage there is, and how many numbers and characters you can store But the point is that we can address them, in some way– zero, one, two, and so forth So this is how we stored Stelios’ name in memory Recall, this was an example last time, where we needed to put the characters of his name somewhere And so we actually put the S, and then the T, and then so forth, starting at the leftmost location on forward But we also put something else It’s not sufficient just to put the letters of his name in memory What else did we have to do? AUDIENCE: Put a zero in there? DAVID MALAN: Yeah, the zero command, or more specifically, the zero byte, or the so-called null byte– N-U-L. And it’s represented as backslash 0 Because if you were just to type a normal zero, that would technically be a character from your keyboard It looks like a number, but it’s technically a character, a char And so backslash zero literally means 8-0 bits in that byte’s location OK So now that we have this ability to represent Stelios with a sequence of characters, null-terminated at the end, we don’t really need to worry about the hardware anymore Right? We can abstract away from that just as we did in week zero, and not worry about how those zeros and ones are stored We can just trust that they are, somehow And so we can abstract away, and just start thinking about it as a sequence, or a grid, of letters like this But that backslash zero is important, because it is the only way that this language we’re currently using, C, knows where strings end If you didn’t have that, it would– printf, for instance, might just keep printing all of the contents, all four gigabytes, of memory that your computer has, no matter what those characters actually are And then, of course, you couldn’t distinguish Stelios’ name from Maria, or someone else altogether in memory So, let me go ahead and open up the IDE, just to demonstrate now what you can do with this kind of knowledge Suppose I wanted to write a program that allows me to let a user type in his or her name, and then I just want to print out his or her initials And for simplicity, I’m going to assume that the person’s initials, or whatever the capitalized letters are, in the input they type So, I have to be a little nit-picky, and type in my name properly But with that said, let me go ahead and whip this up So, I’m going to save this as initials.c Just out of habit, I’m going to start by including the CS50 library, because I’m going to want to get a string from the user I’m going to go ahead and include standard io.h, so that I can print something to the screen And we’ll decide later if we need anything more I don’t need any command line arguments for the purpose of this program, so I’m going to go back to our old version of main, where you just specify void– no argc, no argv And then here, let me go ahead and declare a string called s, and get a name from the user, as with, “name,” and get string

And then, what do I want to do after this? I want to go ahead and iterate over the characters in the user’s name, and print them out only if they’re uppercase But you know what, rather than just print them out, you know what I want to do? I actually want to create a new string, myself, containing the user’s initials So, I don’t want to go ahead and just print out, with percent c, one character after the other– rather, I want to go ahead and store the user’s actual initials in a new string, so that I’ve got one string for their name, and one string for their initials And, ladies and gentlemen, Ian SPEAKER 2: Sorry for the video glitches DAVID MALAN: Thanks All right So, to be ever more clear, let me go ahead and rename this to name, literally, and then I want to have a string for their initials But we know what a string is, as of last time It’s really just a sequence of characters And a sequence really has another name in programming What is another synonym we’ve used for a sequence of something? AUDIENCE: [INAUDIBLE] DAVID MALAN: An array An array is that data structure we had when we started using square bracket notation So, if I actually kind of roll this back and break the abstraction, if you will– don’t think about it as a string What is a string? It’s a sequence of characters Technically, I could say char, and then I could say initials, and then I can specify however many letters I want to support in a human’s initials And by– assuming we have a first name, maybe a middle name, and a last name, three characters should do it Three characters So, David J. Malan, DJM, three characters Is that enough chars? AUDIENCE: [INAUDIBLE] DAVID MALAN: I’m hesitating, because it doesn’t– AUDIENCE: [INAUDIBLE] DAVID MALAN: It’s not, but why? AUDIENCE: You need for the null character DAVID MALAN: Yeah So if we want to terminate even my initials– which isn’t technically a word, but it’s certainly a string, it’s a sequence of characters– we need a fourth character, or we need to anticipate a fourth character, so that whatever we put in the computer’s memory is technically, in my case– like, D-J-M backslash 0 You need that fourth byte Otherwise you do not have room to actually terminate the string So, now, even though this doesn’t look like a string, insofar as I’m not saying the word string, it really is It’s a sequence of characters of size four that I can put characters in Now, what are the characters in this array, by default, have we said? When you just declare a variable of some size, what values go in it? AUDIENCE: [INAUDIBLE] DAVID MALAN: Sometimes zeros, but generally, what’s the better rule of thumb? AUDIENCE: You don’t know DAVID MALAN: Yeah, you don’t know It’s so-called garbage values Nothing– you should not trust the value of a variable, generally speaking, unless you yourself have put the value there, as by storing it, with the assignment operator, or by manually typing it in yourself So, just to be clear, if I wanted this program to be kind of useless for anyone except myself, I could actually do this– I could go ahead and do– initials, bracket 0, get “d”, initials, bracket 1, get “j”, and then finally initials, bracket 2, get “m” And then lastly, and this is the thing you might forget sometimes, you actually need to do the backslash zero there But of course, this is not at all dynamic But I have, in this these lines of code now, created a new string called initials It’s of length– it’s of human length three, DJM, but the computer is actually using 4 bytes to store it But this is not the point of the exercise, because we already asked the user for his or her name I need to now figure what that is So just logically, or algorithmically, if you will, what process could we use, given a name like David J. Malan, or Brian Yu, or anyone else’s name– how could we look at that input and figure out what the user’s initials are? What’s the thought process? Let me go a little farther back So, David J. Malan, or any other name What’s the process? What do you think? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, good! So, iterate with a for loop over the letters in the name– and you’ve done that before, or in the process of doing that, for something like caesar or vigenere, most likely And then you can use something like is upper, or you can even do asciimath, like we did a while ago, to actually determine, is it in the range of A to Z capitals on both ends? So we have a couple of options So, let me try to convert that to code Let me get rid of the hard-coded name, which is just kind of nonsensical, but demonstrative of how we could store arbitrary characters And now let me do this For int i get 0, i is less than the string length of name, i plus plus– and then I’m going to do something like this If the i-th character character in name is an uppercase letter– and I’m going to use a function that you might not have used yourself, but recall that it does exist– is upper will return, essentially, true or false, this is an uppercase letter–

so, if this is uppercase, I’m going to go ahead and do what? Well, the story is not quite complete It’s not enough to just iterate over the names and– the letters in the name– we now need to decide where to put the first capitalized letter that we find It’s obviously going to go in the initials array, but what more do I need to add to my code to know where, in this block of four characters, to put the first character, D, in the case of my name? Yeah AUDIENCE: [INAUDIBLE] initials i? DAVID MALAN: Initials i OK, so if I do that– good thought So let’s do, initials, bracket i, gets name bracket i– that would seem to put the i-th character of name at the -th location in initials, which at the moment is perfectly correct, because i is 0 And if I typed in David J. Malan, D is at the zeroth location So, we’re good But there’s going to be a bug here AUDIENCE: [INAUDIBLE] continue to the name, then you’ll have less slots DAVID MALAN: Exactly I is going to continue marching forward, one character at a time, through the entire name of the user, but you only want to index– you don’t want to keep doing that same amount in the initials array, because again, the initials is much smaller So even as i moves through the user’s name, you want to take baby steps, so to speak, through the initials So, how can we solve this? I can’t use i, it would seem AUDIENCE: You could use a variable that’s like a [INAUDIBLE] DAVID MALAN: OK, great Let’s do that We need another variable So, I could put this in a few different places, but I’m going to go ahead and put it here for now So, int counter gets zero– I’m just initializing my counter to zero– and then as soon as I won’t find an uppercase letter, I think I want to do this? Put it at whatever the value of counter is? And then there’s one more step What do I need to do once I, yeah, put it at the counter location? AUDIENCE: Increment counter by one DAVID MALAN: Exactly Increment counter by one So, I can do this in a few ways I can do it very literally– counter gets counter plus one, semi-colon It’s a little annoying to type You could do plus equals one, which is slightly more succinct Or, the ever-prettier, plus plus, which achieves the same result Fewer characters, same exact result, in this case OK, so now, I think we have a for loop that iterates over all the letters in the name If it’s an uppercase letter, it stores that letter, and only that letter, in the initials array, and then increments counter so that the next letter is going to go at the next location in the initials array So, if all that seems to be correct– ultimately, I want to do this– I want to go ahead and print out percent s, backslash n, initials I want to just print the user’s initials But there’s one key step in this blank line that I should not forget What’s the very last thing I need to do? Yeah AUDIENCE: You need to print a null character [INAUDIBLE] put a null character at the end of the [INAUDIBLE] DAVID MALAN: Exactly I need to put a null character at the end of the array So, how do I do that? Well, I have the syntax, and I think– you know, I want to say, end of array– but how can I do that? What values should I put here? Yeah DAVID MALAN: The string length name? DAVID MALAN: Yeah, I could do strlen of name, well, not of name– AUDIENCE: Of the initials DAVID MALAN: The initials, but now, you kind of got me in a circular argument, because I’m trying to– it’s kind of a catch-22 now I am trying to store a backslash n at the end of the string But recall from last time, the way strlen knows where the end of the string is, is by putting the backslash 0 So, it’s not there yet So we can’t use strlen But, we already have, I think, the solution– AUDIENCE: Can’t you just put initials four? [INAUDIBLE] DAVID MALAN: OK So, we could absolutely do that, or almost that It’s not quite four One tweak here, yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: In back, yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK, good So, actually– so, counter– is, yeah Spoiler, counter is going to be the answer But let me just fix this one bug here Four was the right intuition, but remember, if you have four characters possible, it’s 0, 1, 2, 3, is the very last one So, we needed to fix that to 3 But even more general would be to put counter, because the value of counter is literally the length of the string we just read into the initials And so if we want to terminate that string, we already know how many characters we’ve counted up And in fact, it would technically be wrong to just blindly put backslash zero only at the very end of these four characters Why? In what situation would that be– logic be incorrect? Yeah? AUDIENCE: If someone has more than three initials DAVID MALAN: If they have more than three initials, we really have a problem, because we don’t have space for anything beyond three actual letters and a null terminator And there’s another problem, potentially Yeah? AUDIENCE: If they don’t have a middle name? DAVID MALAN: Yeah, if they don’t have a middle name, there’s going to be, only maybe two letters, first name and last name And so, you’re putting the backslash zero at the end of your array, which is good

But what’s going to be that third value, that second to last value, in the array? AUDIENCE: A garbage value It’s just garbage, so to speak And so it could be some funky character, you might get weird symbols on the screen– you don’t know, because it’s just incorrect The backslash zero has to go at the very end of the string So, let me go ahead, and if I’ve made no syntax errors here– let me go ahead now and save this, and go ahead and do make initials– OK, so, hmm Implicitly declaring library function strlen with type unsigned long So, there’s a lot of words, only some of which I kind of recognize immediately I could run this through help50, which should be your first instinct, on your own or in office hours But let’s see if we can’t tease apart what the error actually is What did I forget to do? Yeah AUDIENCE: Uh, [INAUDIBLE] DAVID MALAN: Yeah, I needed the string library, which now I’ll start to get in the habit of more Anytime I’m using strlen, just try to remember to use string, otherwise you’ll see that error message again and again Maybe there’s more errors, but I’m not sure, so I’m going to go ahead and just recompile, because again, you might have more errors on the screen than you actually have in reality But there is indeed one more So, similar error, but different function Implicit declaration of function is upper So that’s, again, same mistake I’ve forgotten something Anyone recall? This one’s a little more– less common, I would say Yeah AUDIENCE: You need the character type library DAVID MALAN: Yeah, the character type, or C type, library So again, you’d only know this by checking the [? man ?] page, so to speak, or reference.cs50.net, or checking your notes, or whatnot, or Google And so that’s fine It happens to be in Ctype.h Now, let me go ahead and recompile Seems to work, so, initials– let me go ahead now and type David J. Malan, enter– seems to work Let me try a corner case, like Rob Bowden, without his middle name RB And so, it seems to work This is not a rigorous testing process, but let’s trust that we did at least get the fundamentals right But the key here is that we broke this abstraction, so to speak, of what a string is Because if you understand that, well, a string is just a sequence of characters, and hey, a string must end with a backslash zero, you can do that yourself And this is what you can do in C You can put characters anywhere you want in memory, you can put numbers anywhere you want in memory And this ultimately gives you a lot of power and flexibility, but also, as you’ll soon see, a lot of opportunities for bugs, with this power All right, any questions on that particular example? Yeah AUDIENCE: Can you explain the initials counter? DAVID MALAN: Sure AUDIENCE: [INAUDIBLE] DAVID MALAN: Sure Let’s explain the initials counter, which is used here, and is declared up here So, we essentially want to keep track of two locations If my name is sort of of this length, I’m going to start at the first character, D, and with i, iterate from D-A-V-I-D, space, and so forth So, that one just kind of marches on, one step at a time The initials array, meanwhile, is another chunk of memory It’s of size four, somewhere else in that green silicon chip, with all those black chips on it And we want to move through it more slowly, because we only want to move to the next location in the initials array once we’ve put a capital letter in it And that’s going to be less frequent, presumably, unless the user typed in all caps So, in order to achieve that, we need a second variable And you proposed that we use a variable called counter But we could have called it j, or something else And so, the counter is initialized to zero, and it is incremented here any time we encounter a capital letter So, it has the effect of literally counting the capital letters in someone’s name: D, J, M. And so it should be 3, at the end of those loops And that’s perfect, because as we realized earlier, 3 happens to be the correct location for where you put the backslash 0, even though it wants to go in the fourth location, the address of it, or the location is technically 3 So, we sort of solve multiple problems at once, in this way Good question Other questions? Other questions? All right So With that said, let’s not worry as much about that low-level kind of implementation, and consider what more we can do with these things called arrays If we start to get rid of the addresses, and just know that we have sequence of characters, or anything else in memory, and really, at the end of the day, we have the ability to lay things out contiguously in memory Back to back to back to back, like this So, here are, then, eight boxes, inside of which we can put anything But we’ve kind of been cheating as humans for some time When we had Stelios’ name on the screen, all of us in this room just kind of glance up, and you kind of absorb his name all in one fell swoop But it turns out that computers are not quite as intuitive, or as all-seeing as we are, where we can sort of take in the whole room, visually A computer can only look at one thing at a time And so, a better metaphor than a box like this for the locations that you have in your computer’s memory would really be like eight lockers in a school, where a computer, in order to look at the value in any of those boxes, actually has to do a bit of work and open the door to see what’s inside And you cannot, therefore, see all eight locations simultaneously So, this is going to have very real world implications, because now,

if we want to start checking the length of a string, or counting the number of things in an array, or moving things around, we’re going to have to do one thing at a time Whereas we humans might just kind of look at it and say, oh, sort these numbers in some intuitive way And that actually is a good segue to a very different problem, which is that of sorting values So, for this, we have, say, the equivalent of seven lockers, up here now And behind these doors, so to speak, is a whole bunch of numbers So we’ll transition away from characters and strings, and just generalize it to numbers, because they’re convenient to work with, and we have so many of them at our disposal But I’d like to find one number in particular So, suppose that a computer were storing an array of numbers, a whole bunch of integers, back to back to back to back Here’s what they might look like The doors are closed, though, so we need an algorithm for finding a number, because a computer can’t just look at it and say, there’s your number there The computer has to be more methodical, probably going from left to right, from right to left, starting in the middle, randomly opening them We need an algorithm So, for that– could we get one brave volunteer? OK, come on down What’s your name? CHRISSY: Chrissy DAVID MALAN: Kristen? CHRISSY: Chrissy DAVID MALAN: Chrissy Come on down, over this way All right, so Chrissy, all you know is that there are seven doors– nice to meet you– here on the screen And using your finger, you should be able to just touch a door to open, or unlock it, and we’ll see what it is And I would like you to find the number 50 Dammit [LAUGHTER] Very good! [APPLAUSE] I feel– we need a better prize than a stress ball for that But very nicely done And let me ask you, if we can– if you don’t mind me putting the mic in your hand– here you go So, what was your amazing algorithm for finding 50? CHRISSY: I just clicked on it DAVID MALAN: OK, that’s great CHRISSY: It looks nice DAVID MALAN: OK So, you– OK So, good So, wonderfully effective Let me go ahead and reveal– actually, let’s do this So, here’s where you could have gone wrong any number of places And let me ask the audience before we try one other example What strikes you about these numbers? Anything in particular? AUDIENCE: They’re unordered DAVID MALAN: They’re all in order? AUDIENCE: Unordered DAVID MALAN: Unordered Kind of random Although, the very astute might notice something Or, someone who watches too much TV Yeah AUDIENCE: They’re the numbers from Lost DAVID MALAN: Yes, thank you The two of us watch a lot of Lost So– plus we had a seventh number, so we added ours So, they are actually in random order I just kind of shuffled them arbitrarily They’re not sorted from smallest to biggest, they’re not sorted from biggest to smallest There’s really no pattern, because I really just did shuffle them up And that’s problematic, because even though Chrissy got lucky, finding 50– was just so good at finding 50– it might be hard to replicate that algorithm and find it every time, unless you know a little something about the numbers behind the doors And so, we do know that in this next example So, in this next example, there are still seven doors, and still the same seven numbers, but now they’re sorted And knowing that, does that change your approach? CHRISSY: Uh, well, I guess if they’re sorted, like, lowest to highest, then it would, because I would know it’s closer to the end But if I don’t know how it’s sorted, then– I guess I wouldn’t really know DAVID MALAN: OK, good So let me stipulate they’re sorted from smallest to biggest, and let me propose that you, again, find us the number 50 Yeah, this is not working out very well That is very well done OK, so, congratulations [APPLAUSE] So, you’ll see– thank you So, you’ll see just how much more efficient her second algorithm was, when she leveraged that information But in all seriousness, you could do better, especially for very large datasets, if you know something about the data If you know that these numbers are sorted, you could, as Chrissy did very intuitively and very correctly, go to the very end, knowing that that’s the biggest number, it’s probably all the way on the right If she didn’t know that, and just knew that the numbers were sorted, and did not know if 50 was a small number, a medium number, the largest number– just it was a number behind doors What would a smart strategy be, given that lesser information? AUDIENCE: [INAUDIBLE] halfway, and then if it’s greater, you move it to the right, if it’s less, move it left DAVID MALAN: Yeah, we can try that same divide and conquer approach we did in the very first class, with the phone book, right? Where we looked roughly in the middle, because we know that the Ms, give or take, would be in the middle And then if the– we’re looking for Mike Smith, whose name starts with an S, we know that he would be to the right, and so we would go to the right, and then to divide the problem– [LAUGHTER] Today is not going very well So, we would divide and conquer the problem again and again And we can do that here Even though it’s not quite as visually engaging as a phone book, you can kind of go to the middle And if Chrissy had opened that middle door, and seen 16, you would know– what? And actually, I can recreate this

I’m just going to refresh the screen So in this case, we have the same doors I know 50’s somewhere, and I don’t know how– where it is, but 16 Now I know it’s to the right, as you propose So, now I can essentially eliminate all these doors, not even worry about them And indeed, if I open them, we can confirm that I don’t need to waste any time actually searching them Now I’ve got three doors What would you propose we do next? AUDIENCE: Go in the middle again? DAVID MALAN: Yeah, go in the middle again So here, 42 is a great answer, but it’s not the one we’re looking for And indeed, we can throw this half of the problem away, and finally search the right half, which now has been whittled down to one, and then we would find 50 And if I can reconstruct what would have been a great history, in the first example, how well might Chrissy have done theoretically, or if we did this exercise again and again and again, with the first example If you don’t know anything about the numbers, you can get lucky, as one might by just choosing a door and, wow, that happens to be the number But that’s not going to happen all the time, most likely And so you might have to just start, you know, maybe at the beginning– no– no– no Maybe you can get clever and skip ahead– no– no– OK, eventually, you will find it, if it’s actually there But if you don’t know anything about the numbers, the best you can do is what’s called brute force Just brute force your way through the possibilities until you find the answer But in the worst case, how many doors might I have to open to find the number 50, if I knew nothing about them? AUDIENCE: All seven DAVID MALAN: Yes, all seven Right? If n is the number of doors, it might take me as many as n steps In this case, it was like n minus one, or six steps But in Chrissy’s case, clearly, there’s a really exciting lower bound, because if you do get lucky, it might only take you one step So, that’s an interesting range to consider The solution to your problem might take one step, or n steps, or any number in between But the binary search that you proposed in the second approach, where you divide and conquer– recall that when we did that in the past, we had a very nice shape to the curve that we drew that described the efficiency of that algorithm And we’ll come back to that before long But let’s just, for clarity, formalize just what these two algorithms are If I start from the left and go right, or start from the right and go left, following a line, we would call that linear search And it can be written in any number of ways I came up with this pseudo code, but again, any reasonable person could come up with an alternative one and say it, too, is linear search These are not official definitions And I wrote it as follows For each element in array, if the element you’re looking for, return true So, this is a very concise way of saying, for each element in the array, just look at it from left to right, or right to left If it’s the one you’re looking for, return true I found the number 50, or whatever it actually is Otherwise, return false at the very end And notice the indentation I un-indented it because only as the very, very last step do I return false, if none of my iterations through the loop actually return true So, that would be linear search Binary search, you have to be a little more verbose to explain it, perhaps And there’s many different ways to write this out, but this is very similar in spirit to something we saw before, with Mike Smith and the phone book So if we go ahead and look at the middle of the sorted array, just as you proposed, and if the element you’re looking for is right there, go ahead and return true I found 50 I got lucky, it was dead center in the middle of my array, in one particular running of this program Else, if the element is to the left, search the left half of the array Else, if it’s to the right, search the right half of the array Otherwise, return false, because it’s presumably not there So, this would be binary search And even though it’s more lines of code, or pseudo code, it arguably should be a little faster, right? Because of that dividing and conquering, and throwing half, half, half, half, half, half of the problem away, the problem gets smaller much, much more quickly All right So with that said, it seems to be a very good thing that having things sorted for you is a very powerful ingredient to a problem Right? It takes more work– should have taken Chrissy more work; would take all of us, in general, more work to find a number using linear search than by using binary search Much like it would have taken me forever to find Mike Smith flipping one phone page at a time, versus using divide and conquer, where I found him much more quickly by dividing the problem in half, and half, and half again So, that invites the question, how do you get something sorted? Well, let me go ahead and pull up these things, which we happen not to use in this class, but elsewhere on campus, you might have these blue books And at exam time, you might write your name, and the class, and all that on them And we’ve kind of simplified, so, this is a person whose last name starts with A, last name– a person’s name starts with B, C, and all the way to Z But suppose that people finish at different times during the exam, and of course, when you’re in the science center or wherever, everyone just starts handing them in, and they kind of come in like this, and then the TFs at the front of the room, or professor, has to actually sort through these values Well, what’s this algorithm going to be that we actually use? If you’ve got a pile of exam books like this, all of them

have a letter associated with them, how would we go about sorting these? What do I do? AUDIENCE: Compare two at a time DAVID MALAN: Compare two at a time? OK, so let me go ahead and pick up a couple here And since I’m the only one that can see them, you should be able to see them on the overhead, thanks to Ian and Scully So, P and H, so, H goes before P, alphabetically All right, now what do I do? Pick up another two? AUDIENCE: Pick up one DAVID MALAN: Yeah, so maybe just pick up one, and I happened to get N, so, that goes in between H and P So I can kind of slide it in Now I picked up G. That goes before H. Now I picked up another one, E That goes before G. So, it’s actually getting kind of easy Uh-oh, U– this goes at the end, after P And so, I can keep grabbing one at a time, F in this case, and then kind of insert it into its appropriate location And we won’t do this the whole way, because it’s going to get tedious, and eventually I’m going to embarrass myself by getting one of the letters wrong But that’s an algorithm, right? For each book in the pile, pick it up, compare it to all of the elements you’re already holding, and insert it into the appropriate location And we can actually call that something, and that’s something called insertion sort, insofar as the emphasis of the algorithm is– thank you– on inserting letters, in this case, into their appropriate location So, with that said, are there other ways than that? Let’s go ahead, here– and we have enough stress balls to do it with just one more human demo here, beyond these books– these numbers here Suppose we wanted to sort these I have eight stress balls, eight pieces of paper– could we get eight other volunteers? So, yes, right and back, two three– let’s go farther back– four Can I get farther back? Five, six, seven, and eight Come on up Ah, next time Next time All right, come on down OK What’s your name? JERRY: Jerry DAVID MALAN: Jerry OK, If you want to go ahead and step in front of a board there ARMAH: [? Armah. ?] DAVID MALAN: [? Armah, ?] David CHRIS: Chris DAVID MALAN: Chris, David Thank you KAYLIND: Kaylind DAVID MALAN: Kaylind NOLAN: Nolan DAVID MALAN: Nolan, David JAY: Jay DAVID MALAN: David MATTHEW: Matthew DAVID MALAN: Matthew, David OK So, this is statistically anomalous, insofar as we’re actually very excited to say, in CS50, this semester, for the first time ever– we watch these numbers annually– and we actually have 44% women in CS50 this year So, as you can see, none of my demonstrations are going correctly today But trust in that data So if each of you could stand behind one of the music stands here– hopefully I have counted out exactly eight people We have, in front of you guys, numbers So go ahead and turn around the pieces of paper, which represent the numbers that we have on the board here, if I got the same order right So, there’s going to be a bunch of different ways we can sort these eight values So, how do we go about doing this? Well, much like you proposed earlier– just pick up a pair of blue books and compare them– why don’t I try that same intuition? So, four and two– these numbers are obviously out of order So, what do I want to go ahead and do? Yeah, so I can go ahead and swap these And now, problem is solved, which is great I’ve taken a bite out of the problem And then we move on Four and seven? Those look OK Seven and five? Not OK So, what do I want to do here? AUDIENCE: Switch them DAVID MALAN: Yeah, so I can swap those, thank you So, seven and six? Also out of order Let’s swap those Seven and eight? That’s good Eight and three? Not correct, so if you want to go ahead and swap those And, eight and one, if you want to go ahead and swap those So, what was the net effect? Have I sorted this list of numbers? So, obviously not But I did improve it And what’s the most glaring example, perhaps, of the improvement, in front of these? AUDIENCE: Eight’s at the end DAVID MALAN: Eight is now at the very end So the biggest number, if you will, bubbled up to the end, as though it was bigger and sort of bubbled up So that’s good, but there’s still work to be done here So, I can, again, just try to fix these problems locally Just pick up a couple of problems and try to solve it So, two and four? We’re good Four and five? Five and six? Six and seven? Ooh, seven and three? If you guys want to swap those? Wonderful And then, seven and one, we want to do it again And now, do I need to bother comparing seven and eight? Technically no, right, because if we know eight made its way– now we can start cutting some corners, but correctly, just to shave some time off of the algorithm, make it a little more efficient Good So, sorted now? No, so we have to keep doing this And let me let you guys now execute this, pair-wise at a time So, here we go Two, four Four, five Five, six Six, three Six, one And we can stop there, because we know seven is in order Now we do it again Two and four Four and five Five and three? Five and one? Improved, good And now, next, two and four, four and three, four and one, and then two and three, three and one– and then two and one– OK, now it’s sorted Yes, very well done Very well done So, it’s kind of tedious, frankly, and I didn’t want to keep walking back and forth, because thankfully we have, like,

all of this– this manpower, these multiple CPUs, I guess, literally today So, we have all of these CPUs, or computers, that are able to help me out But it was still very slow I mean, it’s a long story So, let’s rewind just once and do one more If you guys could rearrange your pieces of paper so that it matches the screen again, just to reset to our original location Let’s go back there And let’s try one other approach I’ve tried the insertion approach, whereby I just take a problem, like the blue book, and insert it into its correct location But honestly, that was getting a little tedious, and that’s why I aborted, because it’s going to take me longer, and longer, and longer to find the right location among all of those blue books So, let me try just a more intuitive one If I want to sort these numbers, let me just go and select the smallest number I see OK, four, at the moment, is the smallest number I see So I’m going to grab it for just now Now, again, these are lockers Even though we humans can see them all, the computer can only see one location at a time And this is, indeed, the smallest number I have seen thus far So I’m going to grab it But very quickly, I can abort that, because now I’ve found an even smaller number So I’m going to hang onto that instead Seven, I’m not going to worry about that; five, six, eight, three, one– I’ve found a smaller number Now, I need to kind of do something with this I want to grab the one, so I could just put the two here And what do I want to do now with the number one? Yeah, I kind of just want to put it here And so, I can do this in a few different ways, but you know what, I’m just going to evict whoever’s here, because it’s a pretty low number, but it’s also random It could have been a big number So let me just make room for it and do that I have selected the smallest element Is the list sorted? I mean, obviously not But is it better? It is, right? Because the one is now, at least, in the right location So I’ve solved one eighth of the problem so far So that’s not bad What could I do next? Let me apply the same algorithm Let me select the smallest, which is currently this one, still this one, still this one– nope, three is smaller– oh, two is even smaller, so let me ultimately grab this And you know what, four, you really don’t need to be there; I’m just going to evict you again, and put two where it belongs, and move this one over here So now, the list is even more sorted And if I proceed to do this again and again, if you want to just keep handing me the smallest number– three? OK, so I’m going to go ahead and just evict seven, because it’s kind of a random number anyway And now, thank you, four– I’m going to go ahead and evict five, even though, kind of feels like I’m making a little bit of work for myself, on average, it’s not going to matter Sometimes it will be good, sometimes it’ll be bad So let me go ahead and put five over here Now I need to select the next smallest element, which happens to be five So we recovered pretty quickly So I’m going to evict six over here Now I’m going to look for the smallest element Now it’s indeed six I’m going to evict eight, put this over here– and now seven is good, eight is good– done But it’s still kind of a long story, right? Like, I’m going back and forth, looking for these numbers But this would be what we’d call selection sort So thank you all very much– if you’d like to keep your pieces of paper, you’re welcome to And let me give you guys a stress ball as well And a round of applause, if we could, for you guys If you want to hand those out So, as before, let’s see if we can apply– let’s see if we can apply some pseudo code to this algorithm, because it’s one thing to talk about it, and it’s one thing to sort of reason through it intuitively, but at the end of the day, if you want to program this, we’ve got to be more precise, and we’ve got to consider the lower level operations that the computer is going to execute So, here’s how we might implement bubble sort Repeat until no swaps For i, from 0 to n minus 2– and n is just the size of the problem, the number of doors, the number of humans, the number of numbers, whatever the input to the problem actually is So, for i, from 0 to n minus 2, if the i-th and the i-th plus 1 elements are out of order, swap them So, this is kind of a mouthful But if you think about it, I’m just kind of applying some of the vocabulary that we have from C, and kind of sort of from scratch, to an otherwise very organic human experience, but using more methodical language than I was just kind of doing off the cuff, when we were doing it with humans Because what does this mean? For i from 0 to n minus 2 That means start at, like, the 0 location, and if there’s n elements here– this is 0– and this, at the very end, is location– it’s going to be n minus 1 Right? If you start counting at 0, you have to readjust your whole life to subtract 1 from the tail end of that range of numbers Right? 0– if that was 1, this would be n But if that were 0, this is now n minus one So I’m saying, though, for 0– for i from 0 to n minus 2, which is technically this So I’m using sort of for loop-like language to start iterating here, and do something like this, up until the second to last element, which we’ve not done before Seems almost buggy

But if you read ahead in the pseudo code, why did I do that? And only iterate until the second to last element with i? What jumps out at you? Yeah AUDIENCE: Because then you’re gonna swap the i in the i-plus-one-th elements? DAVID MALAN: Good AUDIENCE: When you get to n minus 2, you’ll swap it with n minus 1 DAVID MALAN: Exactly Recall that bubble sort was all about swapping, or potentially swapping pair-wise elements, neighbors, if you will So, you have to make sure that if you’re iterating through all of these numbers, you have to stop short of the end of the array, so that i plus 1 actually refers to an element that’s actually in your list, and not, for instance, way over here, which would be some garbage value that you shouldn’t actually touch So, if those elements are out of order, we swap them, and I have a big outer loop there that just says, keep doing this, again and again and again, until you don’t swap anything At which point you can infer that you’re done Because every time I walked back and forth on the list, and the guys helped out by swapping their numbers as appropriate, I kept doing it again if there was still room for improvement And intuitively, why is it absolutely, logically safe to stop that whole process once you have not made any swaps on a pass through the list? Why is that a safe conclusion? So, if I walk through the list– no, these are good, these are good, these are good– OK, you didn’t want your number– these are good, these are good– how do I know that I don’t need to do that again? AUDIENCE: It’s sorted already DAVID MALAN: It’s sorted already, right? And it would be kind of irrational– if you’ve walked through the list, looking at everything pair-wise, found nothing to swap, to even bother doing that again– why would you expect different results, if the numbers themselves are not moving and you didn’t move them yourself? So you can just stop But this still is going to invite the question, well, how expensive was that? How many swaps, or comparisons, did we make? And we’ll come back to that before long Selection sort, though, can be expressed maybe even a little more succinctly And that was the second algorithm we did with our eight volunteers here, for i from zero to n minus 1 So this time, all the way through the end of the list, find the smallest element between i-th and n minus 1-th So between those two ranges, the beginning of your list and the end, and then swap the smallest with the i-th element So what does this mean? So again, for i from 0 to n minus 1 This is just pseudo code for saying, start a variable i at location 0 And do this until i equals n minus 1 So, do this until you’ve gone all the way through the list What is it telling me to do? Find the smallest element between the i-th element and the end of the list N minus one never changes It always refers to the end of the list, so that’s why I walked through the list looking for, ultimately, the number 1 And what did I do with the number 1? Swap the smallest with the i-th element And I might have gotten one of the steps wrong when I did a little switcheroo, but we fixed it thereafter Ultimately, I kept evicting whoever was in the i-th location to make room for the element that I knew belonged there And I could have shuffled them to make room for those elements, but it turns out, mathematically, it’s just as fine to just evict it and move it all the way to the end, as we did And once I’ve gone all the way through the list, there is no more smallest element to find And as we saw, the list is sorted So, maybe this is faster, maybe this is slower– it’s not immediately obvious And insertion sort, which we actually came up with by way of the blue books on the floor, might be described as this For i, from 1 to n minus 1, call the 0th through the i minus i-th element the sorted side– that’s a mouthful– so, consider the left of your list the sorted side of the list And initially, there’s nothing there You have zero elements sorted to your left, and eight unsorted elements to your right So that sort of describes this story, when we had volunteers and numbers here There are no elements sorted; everything to my right was unsorted That’s all that’s saying Remove the i-th element That was like picking up this blue book, if we were using blue books in this example Then what do I want to do? Insert it into the sorted side, in order So, if this is the sorted side, this is the unsorted side, this is the equivalent of saying, take that blue book and just put it in the first location And you can kind of make a visual gap for it Now, this is the sorted side, this is the unsorted side Or, equivalently, when I was down here with the blue books, the books in my hands were the sorted side, and everything still on the stage was the unsorted side Same idea So, what happens next? I then iterate one location next, and I remove the next element, and whatever number that is, I figure out, does it go to the left or does it go to the right? Which was the same thing, again, on stage, me sort of picking up a third blue book and deciding, does it go in between these books? Does it go below, does it go above? I inserted it into its appropriate location So in this insertion sort algorithm, you sort of take each number as you encounter it, and deal with it then and there You take the number and deal with it, so, you know what, this one’s got to go here, if we just kind of pretend what the numbers look like for a moment So that would be inserting it into the right location, like I did with the blue books Maybe this one– oh, maybe this one’s a really small number, and so I insert it over here So I kind of literally deal with each problem as I encounter it,

but it just gets expensive, or very annoying, to have to move all of this stuff out of the way to make room for those elements And that’s why I got bored with the blue book example, because it was getting very tedious looking through all of the blue books for the correct location So in short, all three of these algorithms, while very differently expressed, and while all of them are kind of intuitive until you try to describe what your human intuition has actually been doing for the past some number of years that you’ve been sorting things just in the real world– they can all be described, at least, in terms of these algorithms So these algorithms– and we started this conversation in the first lecture– all have, ultimately, some kind of running time associated with them, like how long does it take to do something And we talked about the process of finding Mike Smith in terms of this pretty generic graph It wasn’t very mathematical, it wasn’t very sophisticated– we just wanted to get a sense of the relationships, or tradeoffs, of space and time, so to speak And so, on the x-axis, or horizontal, we have the size of the problem– so, like, a number of pages in the phone book, or number of people in the phone book– and on the y-axis, or vertical, we had the amount of time to solve the problem How many seconds, how many page turns– you could count using any unit of measure you like And the first algorithm for Mike Smith, when I started with the very first page and turned, and turned, and turned, was a straight line, a linear relationship One more page, one more step So, it’s straight line The next algorithm was searching by twos, recall, in the first lecture Two, four, six, eight And that’s still a straight line, because there’s a predictable relationship between number of pages and number of seconds, or page turns It’s two to one instead of one to one, so it’s still a straight line, but it’s lower on the graph It’s better But the best one, by far, was the divide and conquer approach, we said Right? And it certainly felt faster; it’s great, because it was intuitive It wasn’t quite as easy to express in pseudo code– that was among the longer ones today– but it at least got us to the answer faster And this is logarithmic, in the sense that the logarithm– technically base 2, if you recall some of your math– it’s because you’re dividing the problem in half, and half, and half And it’s fine if you’re uncomfortable with it, don’t even remember what a logarithm is For now, just assume that logarithmic time has this different shape to it It grows much more slowly Any time you can choose log of n over n, when picking between two algorithms, go with the log n, or something like that, because it is going to be smaller, as we can see visually here So, let’s just consider something like bubble sort There’s a couple of ways we can look at this And again, the goal here is not to be very mathematical– we’re not going to start doing proofs, but at least, by taking a glance at some of the steps in this algorithm, you can get a general sense of how slow or how fast an algorithm is And indeed, that’s the goal There’s this fancy notation we’re about to see called asymptotic notation, with special Greek characters But at the end of the day, we’re really just trying to get an intuitive sense of how good or bad an algorithm is, much like we were trying to do with this picture But now we’ll do it a little more conventionally, as a computer scientist might So in bubble sort, recall that we compared every pair of humans and made a swap if they were out of order And then we repeated And we repeated, and repeated And we kept going through the list So, that can be quantized– like, you can kind of break that down into some total number of steps So, if you’ve got n humans in front of the room, and you want to compare them from left to right in pairs, how many possible pairs are there as I walk through the list for that first time? If there’s n elements, and I can put the stands back where they were How many pairs were there, as I walked from left to right? I compared these two, these two, these two Yeah, there’s n minus one Specifically seven And even if you’re not quite sure where we’re going with this, if there’s eight stands– like, this is one, two, three, four, five, six, seven– and there’s eight total So, that’s indeed n minus 1 So, that’s how many comparisons we might have made the first time through bubble sort But the very first time I went through the list in bubble sort did the list get fully sorted? No, we had to do it again We knew that 8 had bubbled up to the very end, so that was good 8 was in the right place But I had to do it again, and fix more problems along the way But the second time, I didn’t need to go all the way through the list To be clear, who did I not need to look at? The last location, or 8, in our very specific case So the second time through the list of humans, I only have to make n minus 2 comparisons Right? Because I can just ignore number 8, the final human, and just deal with the seven other humans that are somehow misordered So, if I wanted to really be nitpicky and write this down, and count up how many steps, or how many comparisons, I made, we could generalize it like this All right? It’s going to be n minus 1, plus n minus 2, plus n minus 3, plus dot-dot-dot Or, more specifically, 7 plus 6 plus 5 plus 4– this is just the fancier formulaic way of saying the same thing Now, I don’t remember my back-of-math-book formulas all that well, but I remember this one You know, in your physics books, your math books, often in the hardcovers, you’ll see little cheat sheets for what series of numbers

actually sum to or multiply out to And so it turns out that that summation can actually be expressed more succinctly as n times n minus 1, all divided by 2 That is the same thing, mathematically, as the long series that I had a dot-dot-dot there for So if I multiply this out, just using some algebra, that’s like n squared minus n, divided by 2 And then if I kind of multiply that out, that’s n squared, divided by 2, minus n over 2 So if I wanted to be really precise, this is the running time of bubble sort It takes this many steps to sort n people Why? Because I literally just counted the number of comparisons I made That’s how many comparisons it takes to do bubble sort But honestly, this is really getting tedious, and my eyes are already starting to glaze over I don’t want to remember these algebraic formulas here So let’s actually try an example, just to get a sense of how slow or how fast this is Suppose that n were a million So not eight, but a million people, or a million numbers How slow, or fast, is bubble sort going to be? Well, if we plug in a million, that’s like saying n is a million So that’s a million squared, divided by 2, minus a million divided by 2 Because that’s what it all summed up to be So if I do this out, it’s a really big number 500 billion minus 500,000 And in any other context, 500,000 is a pretty darn big number But not so much when you subtract it from 500 billion, because you still get 499,999,500,000 after subtracting those off Which is to say, of those two terms, the one on the left versus the one on the right, which is the bigger one? The more dominating factor in the mathematical expression? It’s the one on the left, right? That’s the one that’s massively bigger And so more generally, n squared divided by 2 feels like a bigger result than n divided by 2 alone And we’ve seen it by proof– by example, which is not a proof, but it at least gives us a feel for the size of the program, or the number of comparisons we’re actually making So you know what, ugh– if that is the case, if the dominant factor– the one on the left, in our case; the one with the square, specifically– is just so much more influential on the number of comparisons we’re going to make, then let’s just wave our hands at the lower-ordered terms, and divide it by 2, and anything like that, and just say, ugh, this algorithm feels like n squared I’m going to throw away the denominator, I’m going to throw away the thing I’m subtracting off, I’m going to throw away anything that is not the dominating factor, which is the term that contributes the most to the total number of steps incurred And indeed, this is what a computer scientist would describe, generally, as the running time of this algorithm It is on the order of n squared It’s not n squared, but it’s on the order of n squared, as we’ve seen It’s pretty darn close, and it’s good enough for a conversation with another reasonable person who wants to debate whether his or her algorithm is maybe better or worse than yours So, this would be called big O notation Big O is used to refer to an upper bound on an algorithm’s running time Upper bound meaning, we’ll consider, for our purposes, in the worst case, how much time might this algorithm take? Well, it’s going to take on the order of n squared steps Because if the list of numbers is unsorted initially, we’ve got to do a lot of work to actually sort it There’s other terms that we could put in those parentheses Some algorithms are not on the order of n squared Some algorithms are actually order of n log n; some algorithms are on the order of n itself, or log n, or even 1, where 1 refers to constant time And in fact, the ones I’ve highlighted here, we’ve actually seen examples along the way of all of these so far For instance, what algorithm have we seen that has a running time on the order of n? n steps? AUDIENCE: Linear search DAVID MALAN: Linear search If we were to think back, even to today, to linear search– or from the first lecture, when I was just looking for Mike, really slowly, one phone book page at a time, that’s a linear algorithm If there’s n pages, or n humans, it might take me on the order of n steps, because Mike Smith– S is toward the end of the alphabet, so he might be way over there, or way toward the end of the phone book, or, God forbid, his name starts with a Z, then I’m really going to have to go all the way into the phone book And so that’s on the order of n steps So, n here would be linear We’ve also seen another algorithm, here in yellow– big O of log n Saw it just a moment ago Which of our algorithms was on the order of log n running time? Yeah, so binary search Divide and conquer We didn’t call it– we didn’t describe it this formulaically in the first lecture, but that’s how you would describe the running time Not just with a pretty picture, but just with an expression like this, that all humans– at least computer scientists– can agree upon And constant time The funny thing here is, because we’re throwing away terms that don’t really matter, O of 1 does not mean an algorithm that takes one step only That would be a very limited number of options for your algorithms But it does mean, symbolically, a constant number of steps A constant number of steps So, what’s something you might do that takes a constant number of steps, in an algorithm? Maybe in, like, the first lecture, we had the silly peanut butter

and jelly example Which of the steps that day might have taken big O of 1 steps, a constant number of steps? I remember one, like, insert knife into jar? That’s kind of one step Maybe it’s two, because I might have to, like, pick up the knife, and insert it into the jar then One step, two steps– but it’s a constant number The number of steps is not at all informed by the algorithm itself It just happens You do that in one step So, if there’s any number of other algorithms here– and we’ll leave in white one that we’ll come back to– but let’s just consider the opposite of this, if you will If big O is our upper bound on running time, it turns out, there’s a vocabulary for discussing the lower bound on an algorithm’s running time Which, for our purposes, we’ll generally consider in the context of best case So in the best case scenario, how little time might an algorithm take? Well, this is a capital omega, and it’s used in exactly the same way You just say, omega of n squared, or omega of n, or omega of log n So it’s the same symbology, it just refers to a lower bound So, it takes this few steps, or this many steps Big O, big omega So, what’s an algorithm, therefore, that is in, so to speak, omega of 1? Like, what algorithm, in the best case, might actually take just one step? And who is best to answer this question today in the room, in fact What algorithm could take one step? Yeah AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, linear search could take omega of one steps Because in the best case, it is right there Or in Chrissy’s case, even if our algorithm is to sort of choose randomly, in the best case, it is right there, the number 50 So even her algorithm, and even our linear search algorithm– and for that matter, even our binary search algorithm– are in omega of 1, at least in the best case Because if you get lucky, you’re done One step By contrast, what is an algorithm that takes at least n steps? So, omega of n? AUDIENCE: [INAUDIBLE] DAVID MALAN: That’s a good one So you say bubble sort, I heard AUDIENCE: Yes DAVID MALAN: Why? AUDIENCE: Because if they’re all already in order, you just go through each comparison, and then make no swaps DAVID MALAN: Good So in the case of bubble sort, where we generally had a lot more work to do than just finding something with a searching algorithm, bubble sort is, minimally, an omega event you need at least on the order of n steps– maybe it’s n minus 1, or n minus 2– but it’s on the order of n Why? Because only once you go through the list at least once do you know– what, to be clear? AUDIENCE: That they’re all in order DAVID MALAN: That they’re in order And you know that as a side effect of having not made any swaps So, you can only determine that a list is sorted in the first place by spending at least n steps on that process Excellent So, there’s yet another one, and this is the last, whereby if you happen to have an algorithm, or a scenario where the upper bound and the lower bound are the same– turns out there’s a symbol for that too; you can just describe the algorithm in terms of theta notation That just means theta of n, theta of log n– whatever it is, that just means upper bound and lower bound are one and the same And there’s more formalities to this, and you can actually dive in deeper to this in a theory class But for our purposes, big O and omega will be a generally useful way of describing, generally speaking, just what the running time of an algorithm actually is So, big O of n squared is the fastest we’ve seen thus far Unfortunately, it does actually tend to run pretty slowly We saw it with an example of, like, 500 billion steps just to sort a million elements Turns out we can do way better than that Much like in the first lecture, when I crazily proposed, I think, suppose your phone book had, like, four billion pages– well, you only need 32 steps using binary search, instead of four billion steps using linear search So, it would be nice if, after all of this discussion of algorithms and introduction of these formalities, if we can actually do better And it turns out that we can do better, but this has been a lot to do already, so let’s go ahead in here and take a five-minute. break And when we come back, we’ll blow out of the water the performance of all three of those algorithms we just saw All right So, let’s take a quick look at what these algorithms look like, so we can actually compare them against something that I claim is actually going to end up being better OK So, here we have an array of numbers represented as vertical bars So, small bar is small number; tall bar is big number And so it’s a nice way to visualize what otherwise is pretty low level numbers alone I’m going to go ahead here and make the animation pretty fast, and I’m going to go ahead here and choose, for instance, bubble sort And, actually, let me slow it down a little bit, just for the sake of discussion So, you’ll see, in this algorithm, bubble sort It’s making multiple passes through the list, just as I did, highlighting in pink two neighbors at a time, and deciding whether or not to swap them, just as we were, with our eight volunteers, doing the exact same thing Of course, with this visualization, we can do it more quickly, and we can speed this up to the point where you can kind of now start to feel the bubbling effect, if you will, whereby the bigger

numbers are bubbling up to the top, to the right, just as the number 8 did, when we did it on paper So, this is bubble sort And we could watch this for quite some time, and in some sense, it’s kind of mesmerizing But in another sense, it’s pretty underwhelming, because at the end of the day, all you’re going to get is a bunch of bars, sorted, from short bars to big bars But perhaps the takeaway is that I’d kind of have to stall here for a decent amount of time, even though we’re running this at the fastest speed, because it’s only fixing, at best, one number at a time And maybe some others are improving, but we’re only moving all the way to the end one number at a time And we have to then go back, and go back, and go back, and do more work It’s going to be very ungratifying to abort it, but let’s go back to random And now, if we choose, for instance, selection sort, you’ll see that the algorithm works a little differently Let me slow it down And what it’s doing now, which is a little less obvious, is it’s looking through the list for the next smallest element, and it’s going to put it at the beginning of the list All the way at the left So, it’s looking, and looking, and looking, and it leaves highlighted in red the most recently discovered smallest element And then as soon as it gets to the end of the list, it’s going to move that smallest element all the way to the left So that we now, kind of like the opposite of bubble sort, have all of the smallest elements to the left Though, this is arbitrary We could bubble up the small elements by just reversing the order of our operations; we could sort from biggest to smallest– that is irrelevant It’s just by human convention we tend to sort from smallest to biggest, at least in examples like this And we can speed this up, but it doesn’t quite have quite the same comparison effect, because all you’re doing is a swoop through the list looking for the smallest, looking for the smallest, looking for the smallest And so, this way, it’s going to build up from short to tall Let me go ahead and do it one more time, this time with insertion sort, and slow it down And so, what we’re doing here is the following We identify the next element, and then we go and insert it into the place it belongs in the “sorted half” of the list So, recall that I generally describe stuff on the left as being sorted, stuff on the right as being unsorted, and the implication of that is that even though these numbers here on the left are indeed sorted, when I encounter a new number, out in the unsorted area, I might have to move some things around and shuffle things around And unlike the cheat I was doing here in person– when I grabbed that music stand before and just kind of moved it over here– that’s not really legitimate Right? This is garbage value land Like, I should not have had access to this memory And so what we did with our actual eight humans was more legitimate The fact that our volunteers did the physical labor of moving those numbers around? That was the low-level work that the computer has to do, too And you see it here all the more, either at this slow speed or the faster speed It’s being inserted into the appropriate location So, case in point, this tiny little element? We have to do a huge amount of work to find its location, until finally, we’ve found it, and now we do the same thing So, all of these have some pluses and some minuses But it turns out, with merge sort, we can do even better An algorithm that goes by the name of merge sort But to do better, we need to have a new ingredient, or at least more formally defined, that we’ve kind of sort of leverage before, but not by name And to do this, I’m going actually take out a little bit of code, in CS50 IDE, a program called sigma-0.c And we’ll see the interconnection in just a moment So in this program, notice the following We have a main function, whose purpose in life is to get a positive integer from the user, and to pester him or her, again and again and again, until they cooperate and provide a positive integer That’s what the do-while loop is often useful for And then, what do we do with it? We simply pass that value, n, that the human typed in, via get int, to a function called sigma And sigma is like the Greek character, or the capital E-looking character, that generally means, in math, like, sum a bunch of numbers Add a bunch of numbers together So, this is essentially a function called sigma, whose purpose in life is to sum all of the numbers from 0 to n Or, equivalently, from 1 to n So, 1 plus 2 plus 3 plus 4 plus, dot-dot-dot, n, whatever n happens to be And then, via printf, we’re just printing it out So, let me just run this program to make super clear what’s going on And I can do this by doing, of course, in my source three directory for today, make sigma 0, enter, dot slash sigma 0, positive integer, I will do 2 So by that definition, it should sum 0 plus 1 plus 2, so 1 plus 2– that should be 3 So I should see 3 And indeed, I see 3 Let’s do one more, if I add in three numbers So, this should be 1 plus 2 plus 3– so, that’s 1– that’s 6, in total And so forth And they get pretty big pretty quickly If I do 50, then we start to get into the thousands So, that’s all it’s doing And how is it doing this? Well, we could implement this in a whole bunch of ways, but if we leverage some of our sort of techniques thus far, we might do it using a for loop That’s kind of been one of the most common tools in our toolkit

And how am I using it here? I’m first declaring a variable called sum, initializing it to 0, because I’ve done no work yet Then I have a for loop, for i equals 1 all the way up through m Why m? Well, just because Recall that when you make your own function, whether in Scratch or in C, you get to decide what to call the inputs to that function The arguments, or parameters, as they’re called And just for clarity, I called it m, even though we’ve typically been using n I could have called it anything I want I just wanted to make super clear it’s a different variable But more on that in a week or so And so I’m just counting from one to m, and I’m adding to sum whatever i is Now, just as a quick check, why am I not doing sum plus plus, as I usually do in these kinds of cases? AUDIENCE: Because you’re not incrementing by [INAUDIBLE] DAVID MALAN: Exactly I’m not incrementing by 1, I’m incrementing by 1, and then by 2, and then by 3, and then by 4, and so forth So I need this loop to be counting up, and I need to be adding i to the sum, not just a plus plus, in this case Then I return the sum And so, this is an example of an abstraction Like, I now have a function called sigma– just like in math, you might have the big capital sigma symbol that just says, add all these numbers together, I have a C function that does that And so now, higher up in my code, I can call that function and then print out the answer But it turns out that this simple function lends itself to a nice example of another programming technique, something called recursion And we won’t have terribly many opportunities in CS50 to apply this technique, but we will toward semester’s end If you continue on to a class like CS51, you’ll use it all the time If you use another type of programming language, you’ll very often use this technique And it’s called recursion, and it looks like this Let me go ahead and open up another file that’s available on the course’s website called sigma 1 Notice that main is identical So, main is identical And indeed, it’s still calling a function called sigma, and then using printf to print out the answer So there’s no difference there But what is different, in this next version, is the code for sigma So, what’s going on here? It still takes, as input, an integer called m So that’s good, because I need to know what to sum up to It returns an integer, as before And it amazingly has, like– what, four real lines of code, plus some curly braces? And even those lines of code are super short And there’s no additional variables, and there’s this weird, crazy logic here But let’s see what it’s doing, first and foremost On line 23, I’m saying if m is less than or equal to 0, return 0 Now, why does this make sense? Well, I only want to support positive numbers, or non-negative numbers, from 0 to m And so I just kind of need an error check there, right? If the human somehow passes into this function negative 50 or something else, I don’t want the function to freak out and give unpredictable behavior, I just want it to return 0, in cases of error or when the number gets that small as to hit 0 or even lower So this, I’m going to call base case It’s just, like, this sanity check, like, don’t let the math go beyond this point of 0 or less So, amazingly, if you really zoom in on this code, the entirety of this program really boils down to one line And what’s going on here? So, I am returning, from sigma, an answer But, curiously, my answer is kind of defined in terms of itself, which generally is a bad idea Right? It’s like in English, if you try to define a word by using the word in the definition, usually someone calls you on that, because it’s not all that helpful to use a word in the definition of the word And that’s the same idea, at first glance, of recursion You are using the same function to solve a problem that was supposed to be solved by that function in the first place So, what do I mean by that? Main, of course, is calling sigma, and that means this code down here that we’ve been looking at gets executed So, suppose that we hit this line of code What recursion allows us to do, in this case, is take a bite out of the problem, and then defer to someone else to figure out the rest of the problem So, what do we mean by that? Well, sigma, again, is just this process of adding up all the numbers between 0 and some number, m So, 1 plus 2 plus 3 plus dot-dot-dot So, you know what? I don’t want to do all that work, as I did in version 0 with my for loop Let me just do a piece of that work And how do I do that? Well, you know what, when you ask me, what is the sum from 0 to m? I’m going to be kind of circular about it, and be like, well, it’s the answer of– the answer is m, the biggest number you handed me, plus the sum of everything below it Right? So, if you passed in the number 10, it’s like saying, well, sigma of 10 is 10 plus sigma of nine, and, like, leave me alone I don’t want to do the rest of the math But, because you’re calling the same function again, that’s actually OK A function can call itself, because if you think about where the story is going, now sigma gets called, in this story, with sigma of 9 What does the code do? Well, sigma of nine returns 9 plus whatever sigma of 8 is So we’re not solving the whole problem

We’re handing back a 10, plus a 9– and if we keep going, plus an 8, plus a 7, plus a 6 But we’re not going to do this forever Even though I’m using sigma in my implementation of my answer, under what circumstances am I not calling sigma? AUDIENCE: If m equals 0 DAVID MALAN: If m equals 0, or is even less than 0– which shouldn’t happen, but just to be sure, I made sure it can’t, with the less than or equal to So eventually, you’re going to ask me, what is sigma of 0? And I’m not going to be difficult about it, I’m just going to say 0 And no longer do I keep passing the buck to that same function And so even though it takes a while to get to that point in the story– because we say 10 plus sigma of 9, sigma of 9 is 9 plus sigma of 8, which is sigma of 8 plus sigma of 7– like, it keeps going and going and going But if you kind of mentally buffer, so to speak, much like a video in your browser, all of those numbers that you’re being handed back, one at a time– which are, technically, being added together for you by your program with the plus operator– the last number you’re going to be handed back is zero, and at that point, all of the plus signs can just kind of kick in and give you back whatever number you’re actually looking for So, recursion is the act of a function calling itself Which is very, very, very bad, unless you have a base case that ensures that eventually, as you take bites out of the problem, you will handle, with a special case, so to speak– a base case– a small piece of the puzzle, and just hand back a hard-coded answer, to make sure that this doesn’t happen infinitely many times So, any questions on this principle, of a function being able to call itself? Yeah AUDIENCE: So, the base case here was when m equals 0? DAVID MALAN: When m equals 0 or is even less than zero, just to be sure But yes, when m equals zero Indeed So, let’s see If you’re comfortable, at least, with the fact– oh, and actually, there’s a good little geek humor now– if you go to Google.com, and suppose you wonder, you’re wondering what recursion is, especially a few hours from now Well, you can Google it, and then the computer scientists at Google– there you go OK, so if you’re laughing, you get it, which is great So that, then, is recursion Something giving you back an answer in terms of itself So, why is this useful? Well, it turns out we can leverage this now to solve a problem if we know that we can actually convert it to code We’ll focus less on the actual implementation and more on the idea, but let’s see if we can’t wrap our minds around the problem to be solved with this code This is merge sort, in pseudo code And again, like all the pseudo code we’ve ever written, you could write this in bunches of different ways Here’s one such way Notice, the first thing, on input of n elements– so, n numbers, n blue books, n whatever– go ahead and do the following If n is less than 2, return So it’s a little different from the condition I had a moment ago, but the context here is sorting, it’s not summing So, why is it logically OK to say, if n is less than 2, just return? Yeah, that’s just itself If it’s less than 2, that means there’s only one blue book, or maybe even 0, so in either case, there’s no work to be done Just return The list is sorted, however short it is But if it’s longer than that, you might have to do some work, and actually do some actual sorting So, what happens then? So, else– you know what? Sort the left half of the elements, and then sort the right half of the elements, and then, OK, merge them together So it’s the same kind of, like, blase attitude, where, like, ah– if you ask me to sort something, I’m just going to tell you, well, you go sort the left, then you go sort the right, and then we’ll just merge the results back together And this is cyclical in the sense that, how do you sort the left half of anything? You need a sorting algorithm But this is the sorting algorithm So this is like saying, use merge sort to sort the left half, use merge sort to sort the right half, and then merge them together Merging doesn’t really need to be a fancy algorithm; merging is like, if you’ve got one pile of numbers here that are sorted, one pile of numbers here that’s sorted, you can just kind of eyeball them and grab the appropriate one to kind of interleave them in the right order That’s what we mean by merging So, how in the world is this even correct? Because we haven’t actually done any apparent work, in this way There’s no loops, there’s no comparisons, it seems It’s just completely recursively defined, so to speak Well, let’s see what actually this means And this is a sequence of visualizations that can potentially fall off the story of So I’ll try to go slowly, but not so slowly that the example itself is boring We’ll just go through this once, and then again, the slides are online, if you kind of want to step through the visualization So, here is a list of 8 numbers, the same 8 numbers, that we looked at before I’ve drawn them contiguously, as though they are in an array This list is currently of size 8 So an input of 8 elements is the beginning of this story What was the first step in our algorithm? Well, we were going to check, if n is less than 2, return That is irrelevant, because n is 8, not less than 2 So that’s a moot point

So, the first three things for me to do to sort this list is to sort the left half, then sort the right half, then to merge the sorted halves OK, so let’s see how we get there So here’s the list, here is the left half, and I need to sort the left half, apparently How do I do that? Well, how do you sort a list of four elements? AUDIENCE: Break it up again? DAVID MALAN: Break it up again Sort the left half, then its right half, then merge those two halves together So let me do that I’m going to draw a box around only the elements we’re thinking about at the moment So, let me look at the left half OK, now I need to sort this list How do I sort a list of size 2? It’s actually 2, it’s not less than 2 So I have to do some work So, how do you sort a list of size 2? It’s a little strange to say it, but– sort the left half, then sort the right half, then merge the two And at this point in the story, you may very well be lost, because we literally just keep saying the same thing, and not actually doing any work But think of it like you’re buffering these instructions Like, I’ve said to sort the left half, then the right half, but you focused on just the left half for now But unfortunately, you got a little distracted, because now to sort the left half, you have to sort the left half, so you have to do a little more work So if you just kind of let this mental baggage build up, we have to remember to go back through it But we’ve not actually done the real work yet We’re about to now Because now that you’ve told me, given a list of size 2, sort the left half, here’s where we bottom out with that base case and actually start to make some progress So here’s 4 and 2, a list of size 2 Let’s sort the left half How do you sort a list of size 1? You don’t, right? Because n is 1; 1, of course, is less than 2, and what was the one instruction that we had at the top of this function merge sort? Just return Like, do nothing So, OK, everyone I have now sorted the number 4 for you Like, it’s true, it’s kind of a foolish statement, but the magic must therefore come when we combine the results So, let’s see where the story goes I’ve sorted the left half– done Return Now, what was I supposed to do next? Now I have to sort the right half of that list of size 2 OK, done What’s the third step at this point in the story? Merge them So I’m now looking at a list of size 2 again, each of whose halves is sorted– according to the crazy logic we’re using here– but now, something interesting happens I have on the left the number 4, I have on the right the number 2, and each of these lists is of size 1 And if you vary, in your mind’s eye, or just visually, with my fingers, consider, like, your left hand pointing at the first list, your right hand pointing at the second list, the process of merging numbers is just comparing what your fingers are pointing at and deciding which one comes first Obviously 2 is going to come first, so in a moment, we’ll see that 2 should move over here, and then there’s nothing left for my right hand It’s done So, 4 is obviously going to go here And that process of merging 2 followed by 4 is what we mean by merging It’s pretty much what I was doing with insertion sort, but here we’re just doing it with individual elements at a time, kind of weaving things together, or zipping things together, like with a zipper, if you think of it that Way So, now, let me grab 2 and put it here Let me grab 4 and put it here OK So I sorted left half, I sorted right half, I merged them– how do we unwind the story? Where did we leave off? AUDIENCE: Sort the right half DAVID MALAN: Now we have to sort the right half that was, like, a minute ago in the story– which, just to highlight it now, is the 7 and 5 So now I have to do the same thing I’m sorting a list, of size 2, that happens to be on the right of the left So now, I sort the left half, done Sort the right half, done I now have to merge the two together So now my hands have to do some work, but I’ll just do it from over here 5 goes down, then 7 goes down And at this point in the story, we have sorted the left half of the left half, and the right half of the left half So, what point in the story are we at now? Right We’re– now we have– well, we did the right half just now We now have to merge the two halves together And, frankly, if you do this at home, if you want to kind of retrace your steps, literally just write down a to-do list, like, from top to bottom on the sheet of paper And then as you do something, cross it off, and go back to the previous thing in the list, you would actually see, or feel, even more what it was, that mental baggage you were accumulating that you need to attend to But now I have two lists of size 2, so let’s do the finger thing again here So, I start pointing at the left list, start pointing at the right list The first number to merge in is, presumably, going to be 2 Then what comes after that? I’m going to update my left finger, so now 1– my left hand’s pointing at the 4, at this point; my right hand, still pointing at the 5, so which comes next? 4 There’s no more work for my left hand, so it’s probably going to be pretty trivial– 5 and 7 But I do need to do the merging It looks merged already, but we have to do it And I’m going to do it in some new space, just as before So, 2 and 4 and 5 and 7 And now you can really see it for the first time The left half of the original list is finally sorted Unfortunately, like three minutes ago is when we started the story And now we need to unwind, in our mind, to go back to the original right half

So if you think about it now, even though I’ve said a lot of words, this is technically the second step in our algorithm Or at least the first invocation thereof All right, so we’ll do it a little faster, but same idea Sort the left half How do I do that? Sort the left half, then the right half, which are stupidly easy and dumb, but now I have to merge 6 and 8 So, merging in this case didn’t have much effect, but it needed to be done to be sure Next, let’s sort the right half of the right half Now I’m going to sort the left, sort the right Now the key step is merging And now we’re doing some actual work And now we really have some work to be done– now we have to sort the left half and the right half of the original right half So it’s 1, then 3, then 6, then 8 Now we’re finally, almost at the end Now what do we do with these? Now we have two halves, the original left and the original right, and you can think of the fingers as doing the work again 1 is going to go here, 2 is going to go here, 3 is going to go here, then 4– and I constantly compare where my fingers are pointing, but my fingers are constantly moving from left to right As soon as I deal with a number, it advances to the next number in the list So it’s obviously going to be, now, 1, 2, 3, 4, 5, 6 But notice, if you imagine my fingers doing this work, they’re constantly moving toward the right, to the end of the list So, as soon as my fingers hit the ends of those lists, I must be done merging And voila We’ve now sorted the elements It’s a huge number of words, and it would be a nightmare to kind of do it with humans, because there’s just so much going on, and you have to remember, or buffer, so many of those steps But in the end, we’ve done something that is kind of captured even by this picture So it turns out that merge sort, even though it sounds like a long story, is fundamentally faster, and it’s fundamentally faster because we’re dividing the problem in half, as we have been doing with binary search, in the phone book example even days ago So if we look on the screen, you can kind of see the remnants of work that we’ve done Like, how many times did we move the elements, from one row to another? They started up here, then they eventually made their way here, and then here, and then here So that’s one, two, three movements of the letters, or of the numbers, in memory, if you will So if you imagine each of these rows as just a different chunk of memory and RAM, I’m just moving things around in memory So, three is just a number But it turns out, and if we did more general cases of this, turns out that log base 2 of n, where n is 8– 8 is the number of elements we started with– log base 2 of 8 is 3 And so indeed– and if you’ll take on faith for now, so that we don’t have to go through an even bigger example, to show it even more– the number of times we move the numbers is going to equal, turns out, log base 2 of n Which, in this case, happens to be 3 And so that, then, invites the question– on each of the rows, every time you move these numbers into a new location in memory, how many times are you touching that number while it’s in that position? Or, how many times, equivalently, are you looking at it, to do something about it? What do I mean by this? Well, the movement from top to bottom was happening anytime we did the merging Right? We would move the numbers from here to here But as soon as we did that, we had to do some work, with the left pointer and right pointer I needed to then merge those together And I emphasized earlier that anytime I’m comparing numbers, my left hand and right hand are constantly advancing from left and right I never double back Much like I constantly was doubling back with bubble sort, insertion sort, selection sort– there was so much damn comparison going on, it felt like a lot of work, and it physically was But here, you know, merging, I’m moving things around, but my hands are constantly moving forward, looking at, on each row, n numbers total My left hand or right hand pointed at each of the numbers once Never doubled back So, it was never n plus 1, or 2 n, it was just n So, we have log n movements of the numbers, in memory And every time we do that, we merge them from left to right, effectively touching each number once So we’re doing n things log n times And so, that would be mathematically equal to n log n So, again, even if you’re not super comfy with logarithms, you do know, from our picture, with the straight lines and the curved line, that which is smaller? Log of n, or n, generally speaking? AUDIENCE: Log of n DAVID MALAN: Like, log of n is smaller, right? That’s why the green line was lower, and it was also curved It was below the linear line n So, generally speaking, the bigger n gets, the more slowly log n grows And again, if you just take on faith that this is a mathematical expression that communicates the time required to do something, it’s less So, which, therefore, is smaller? N squared, which of course is n times n? Or n log n? AUDIENCE: N log n

DAVID MALAN: N log n So, we’ve now found an algorithm that’s unlike all of the others we’ve seen And even though it took a while to explain, and even though, frankly, you might have to kind of sift through it again to really wrap your mind around it– it took me a while, too– it is fundamentally faster as well So, just to take one other stab at this, let me show one other perspective At least if you’re more mathematically comfortable it after today, if you’re worried that this is way more math than you were expecting, realize we very quickly abstract away from these details, and we start to wave our hands using big 0 and big omega Let’s consider how we could look at this a different way If the picture wasn’t really working for you, let’s see if we can just, like, jot down how many steps each of these lines of code is And there’s not many lines of code here, so it shouldn’t be a very long expression So, how long does it take to decide if n is less than 2? And, if so, return? Well, you’re past a bunch of numbers, so, you know, I’m going to call it constant time Like, you know how many numbers you’ve been handed– nope, it’s not less than 2, or yes, it is You’re just answering yes or no Constant time Big O of one All right? So I’m going to describe that as this This is the formal way of saying it T of n, which is just how much time does it take, given a problem of size n– just a fancy way of saying that It’s on the order of one step Maybe it’s two, maybe it’s three, because you kind of got to look at something But it’s a fixed number of steps to decide, there are fewer than n elements in front of me It’s not going to take you very long So, that piece of the puzzle takes big O of one step So now, we have three other questions to ask That’s, like, kind of a throwaway That’s really quick, if it’s just one step, or two steps, or three steps So, are these the expensive things? Well, let’s see Sort the left half of elements Well, here, too, I can be kind of clever and propose the following You know what? The amount of time required to sort n elements is technically equal to the amount of time it takes to sort half of those elements, plus the amount of time required to sort the other half of those elements, plus, to be fair, some merging time And it’s essentially n, but I’m going to generalize it as big O of n, because I did have to move my hands around But again, the key thing was, my hands were constantly moving to the right There was no looping back and again, and again, like with the other algorithms So it’s, like, n steps to do the merging If I’ve got 4 numbers here, 4 numbers here, I have to touch a total of 8 elements 8 is n, so it feels like, yes, on the order of n steps to do the merging Unfortunately, this is like a recursive answer to the question of how efficient is merge sort But that’s kind of consistent with the fact that merge sort is being implemented recursively in this code And it turns out here, too, if you’ve got one of those old-school textbooks that’s got a cheat sheet in the front or the back of your physics or your math book, this is a series that you can actually– that mathematicians know actually sum up to something known, which is n times log n And we won’t go into the weeds of why that is, mathematically, but if you take a problem of size n, and add the running time for first half, second half, and then add an n, this is what, mathematically, it ends up being And so, if you’re more comfortable with that, realize that this derives from just counting up of those several steps And ultimately, this is much better than that And in fact, we can kind of feel this here You’ll be able to feel it even better with other data sets, but let me go ahead and reload here, and go ahead, at the same speed as we were before, choosing merge sort Let me fit it onto the screen at once Actually, we should speed it up to the original So, what is it doing? It’s using a bit of extra memory, just as we were on the screen, using some additional space But notice, as it does that work, it’s kind of moving things back and forth And it’s actually saving space Even though I used log n amount of memory by keep moving it, this was stupid Like, I didn’t need to keep using more memory, more memory, more memory, because I wasn’t using the stuff anymore above So with merge sort, you really just need twice as much memory as those other algorithms, because the first time you need to move them, move them here And then, even though I did it visually, deliberately to move it to yet another location, just keep moving things back and forth as needed And that’s what’s happening with that algorithm there It’s not quite as clear to see with this visualization, so let me open up this other one here Now, go And you’ll see merge sort all the way on the right– done All right, so, insertion sort got a little lucky here, just because of the order of the elements, and the size of the dataset isn’t that big, which is why I wanted to show the other one But if we do it once more, you’ll see, again, that merge sort is pretty darn quick And you can see it doing things in halves And selection sort, and bubble sort, are still doing their thing And if we did this using not, like, what is that– 10, 20, bars total, but 100 bars? Or a million bars? You would really, really feel the difference, just as we did with the phone book example as well Any questions there on that? And we won’t walk through the code, but if you’re curious to actually see how some of these ideas map to C code, you will find, in the CS50 appliance, in the source code from today, a couple of files– binary zero and binary one in linear.c, all of which implement binary search and linear search in a couple of different ways,

if you actually want to see how those map But what we thought we would do, in our remaining time today, is tee up one of the next algorithmic challenges It turns out that there are wonderful opportunities in computer science to intersect with other fields– among them the arts, and very specifically, music And it turns out that music, whether you’re an audiophile or even a musical theoretician, there are relationships among the sounds that we hear and the rate at which we hear those notes Which is to say, they follow patterns And these patterns can be produced by computers, they can be generated by computers, and what we’ll do, ultimately, in problem set three, in fact, is introduce you to a bit of the musical world, whether you have some prior experience or not And Brian, if you wouldn’t mind coming up just to assist with this teaser Here are just a few keys from a keyboard, and here are 88 keys from an actual keyboard And Brian will help us, in just a moment, with some of these definitions But you’ll see here that there are eight keys, or one, two, three, four, five, six, seven white keys and five black keys on the board And it turns out that mankind– at least, in Western music, years ago– decided to standardize on how we describe these keys And we assigned letters to them And you might have heard of middle C, even if you’ve never played a piano before, and you might think of that as being the leftmost key there And then it’s D, E, F, G, and then A, B. And of course, on a real piano, there’s keys to the left, and there’s keys to the right Do you want to play what C might sound like here? [PIANO PLAYS] So, that’s C. And then, if you want to– very well done [APPLAUSE] Do you want to go all the way up through the scale to B? [PIANO PLAYS] That’s kind of unresolved, too, because what should have come next– [PIANO PLAYS] That would be another C. And so what Brian’s played for us is a full octave, now, referring to eight So, C to C inclusive, in this case And those of us who are kind of are familiar with music, or like listening to certain music, you’ll notice that certain things sound good And there’s actually mathematical and formulaic, or algorithmic, reasons that some of these sounds sound actually good But what about these black keys? They actually can be defined in a couple of different ways And if you’ve ever heard of flats, or sharps– Brian, do you want to explain what the relationship now is among the white keys and the black keys, and how they sound different? BRIAN: Yeah, sure So, a bit of terminology first A semi-tone is just the distance from one note to the note immediately after that, both white and black notes included And all it means for something to be sharp, represented by the hashtag or pound sign up there, is take a note and move it up by one semi-tone So, if we start with C, and make that note sharp, to C sharp, we move one semi-tone to the note immediately after it, which is that black note in between C and D. So, that’s C sharp And likewise, if we add E sharp, that is one semi-tone, or the note immediately after, E, which in this case, is the same thing as F. So, F and E sharp are the same note And in the meantime, flat is just the opposite of that If sharp means move up one semi-tone, flat means move down one semi-tone So if I have E, E flat is one semi-tone moving left on the piano keyboard DAVID MALAN: And so even though a typical piano keyboard wouldn’t be labeled as such, it does follow a pattern, and it does repeat, to the left and to the right as well And so as you learn to play piano, you learn what these notes sound like, you learn where these keys are, and you also learn, ultimately, how to read music, which might look like this This is a familiar song, now officially in the public domain And you’ll see here that there are these little shapes called notes, or little circles, that happen to be on specific lines And it turns out that if a note is on one line, it might represent the note A; if it’s on a different line, higher above or down below, it might represent B or C or D or E or F or G And if there is a sharp symbol, or a flat symbol, in front of it, that might shift it ever so slightly, so that you’re actually touching, in many cases, one of the black keys as well Which is to say that once you have the vocabulary, and you know what the alphabet is to which you have access, can you start to write it out, much like we write computer programs But this is what a musician would actually see And just to give us maybe a teaser of what you can actually do when you take into account the different sounds of notes, and the different pace at which you play notes, can you give us a little something more than just a scale? BRIAN: Sure [PIANO PLAYS] [APPLAUSE] DAVID MALAN: So, if you’re a little worried what we’re getting into, not only computer science and programming, but now music– I am absolutely among the least comfortable with this, and this is why Brian has kindly joined us here today But it’ll be fun, we hope, ultimately, to explore these relationships, and also the intersection of one field with another And to now tie these topics together, we thought we’d end by looking at a short visualization here, about a minute’s worth of computer-generated sounds, that

give you not just a visual feel of some of the algorithms and others that we’ve looked at today, but also associate sounds with the operations of moving things, swapping things, and ultimately touching bigger and smaller numbers digitally So, here we have, up first, insertion sort [COMPUTER SOUNDS] Again, it’s inserting into the right place the number This, now, is bubble sort And again, you can both see, and now kind of feel, all of the swaps that it’s doing We will get the satisfaction this time This, now, is selection sort, whereby you go through the list selecting the next smallest element, and plop it in its place And the bars are getting higher and higher, just like the notes, or the frequencies This, now, is merge sort And notice the halves that are developing And this is by far the most gratifying sound, at the end of this one This is gnome sort, which we didn’t look at, but very distinctly has a different shape, too It’s not quite as ordered as the others And that, then, are algorithms And Brian, play us out for today Otherwise, we will see you next week