13. Breadth-First Search (BFS)

Just another WordPress site

13. Breadth-First Search (BFS)

The following content is provided under a Creative Commons license Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu PROFESSOR: Today we’re going to introduce graph search in general and talk about one algorithm, which is breadth-first search, and understand how in principle you can solve a puzzle like the Rubik’s Cube So before I get to Rubik’s Cubes let me remind you of some basic stuff about graphs Or I can tell you to start out with, graph search is about exploring a graph And there’s many different notions of exploring a graph Maybe I give you some node in a graph, s, and some other node in a graph, t, and I’d like to find a path that’s going to represent a problem like I give you a particular state of a Rubik’s Cube and I want to know is there some path that gets me into a solved state? Do I really want to solve this on stage? What the hell? We started So this is a particularly easy state to solve, which is why I set up this way All right, so there you go Seven by seven by seven Rubik’s Cube solved in 10 seconds Amazing New world record So you’re given some initial state of the Rubik’s Cube You’re given the targets that you know what solved looks like You want to find this path Maybe you want to find all paths from s Maybe you just want to explore all the nodes in a graph you can reach from s Maybe you want to explore all the nodes in a graph or maybe all the edges in a graph These are all exploration problems They’re all going to be solved by algorithms from this class and next class So before we go further though, I should remind you what a graph is and sort of basic features of graphs that we’re going to be using This is also 6042 material so you should know it very well If you don’t, there’s an appendix in the textbook about it We have a set of vertices We have a set of edges Edges are either unordered pairs– some sets of two items– or ordered pairs In this case, we call the graph undirected In this case, we call the graph directed Usually, there’s only one type Either all the edges are directed or all the edges are undirected There is a study of graphs that have both, but we are not doing that here Some simple examples Here is a graph This is an undirected graph This is a directed graph The set of vertices here is a, b, c, d The set of vertices here is a, b, c The set of edges here is– E is going to be things like a, b; b, c; c, d– I think you get the idea Just for completeness, V is a, b, c, d Just so you remember notations and so on One of the issues we’re going to talk about in this class is how do you represent a graph like this for an algorithm? So it’s all fine to say, oh, this is a set of things This is a set of things An obvious representation is, you have a list or an array of vertices You have an array of edges Each edge knows its two end points That would be a horrible representation for a graph because if you’re, I don’t know, at vertex, a, and you want to know, well what are the neighbors of a? b and c You’d have to go through the entire edge list to figure out the neighbors of a So it’s been linear time just to know where you can go from a So we’re not going to use that representation We’re going to use some better representations Something called an adjacency list Over here, you’ve got things like a, c; b, c; and c, b

So you can have edges in both directions What am I missing? b, a So that’s E, in that case There are a whole lot of applications of graph search I’ll make you a little list to talk about few of them So we’ve got web crawling You’re Google You want to find all the pages on the web Most people don’t just tell you, hey, I’ve got a new page, please index it You have to just keep following links– in the early days of the web, this was a big deal– following links finding everything that’s out there It’s a little bit of an issue because if you define it wrong, the internet is infinite because of all those dynamically generated pages But to deal with that, Google goes sort of breadth-first for the most part It’s prioritized You want to see all the things you can reach from pages you already have and keep going At some point, you give up when you run out of time Social networking You’re on Facebook You use Friend Finder It tries to find the friends that are nearest to you Or friends of friends is sort of a level to search That’s essentially a graph search problem You want to know what’s two levels or three levels of separation from you And then you loop over those and look for other signs that you might be good friends You are on a network like the internet or some intranet You want to broadcast a message So here’s you You want to send data out That’s essentially a graph exploration problem That message, that packet, is going to explore the graph Garbage collection I hope you all know that modern languages have garbage collection This is why you don’t have to worry about freeing things Even in Python– even in CPython, I learned– there is a garbage collector as of version two But also in PyPy, and JPython and in Java– pretty much every fairly modern language you have garbage collection Meaning, if there’s some data that’s unreachable from– So you have your variables Variables that can be accessed by the program Everything that’s reachable from there you have to keep But if some data structure becomes no longer reachable, you can throw it away and regain memory So that’s happening behind the scenes all the time, and the way it’s being done is with their breadth-first search, which is what we’re going to talk about today Another one Model checking Model checking is– you have some finite model of either a piece of code, or a circuit, or chip, whatever, and you want to prove that it actually does what you think it does And so you’ve drawn a graph The graph is all the possible states that your circuit or your computer program could reach, or that it could possibly have You start in some initial state, and you want to know among all the states that you can reach, does it have some property And so you need to visit all the vertices that are reachable from a particular place And usually people do that using breadth-first search I use breadth-first search a lot, myself, to check mathematical conjectures So if you’re a mathematician, and you think something is true Like maybe– It’s hard to give an example of that But you can imagine some graph of all the possible inputs to that theorem, and you need to check them for every possible input– If this is true– the typical way to do that is breadth-first searching through that entire graph of states Usually, we’re testing finite, special cases of a general conjecture, but if we find a counter-example, we’re done Don’t have to work on it anymore If we don’t find a counter-example, usually then we have to do the mathematics It doesn’t solve everything, but it’s helpful And then, the fun thing we’re going to talk about a little bit today, is if you want to solve something like a two by two by two Rubik’s Cube optimally, you can do that using breadth-first search And you’re going to do that on your problem set

To do it solving this one optimally using breadth-first search would probably– would definitely– take more than the lifetime of the universe So don’t try seven by seven by seven Leave that to the cubing experts, I guess I think no one will ever solve a seven by seven by seven Rubik’s Cube optimally There are ways to find a solution just not the best one So let me tell you just for fun, as an example This Pocket Cube, which is a two by two by two Rubik’s Cube What we have in mind is called the configuration graph or sometimes configuration space But it’s a graph, so we’ll call it a graph This graph has a vertex for each possible state of the cube So this is a state This is a state This is a state This is a state Now I’m hopelessly lost Anyone want to work on this? Bored? No one? Alright, I’ll leave it unsolved then So all those are vertices There’s actually a lot of vertices There are 264 million vertices or so If you want To the side here Number of vertices is something like 8 factorial times 3 to the 8 And one way to see that is to draw a two by two by two Rubik’s Cube So these are what you might call cubelets, or cubies I think is the standard term in Rubik’s Cube land There’s eight of them in a two by two by two Two cubed You can essentially permute those cubies within the cube however you like That’s 8 factorial And then each of them has three possible twists It could be like this It could be like this Or it could be like this So you’ve got three for each And this is actually an accurate count You’re not over-counting the number of configurations All of those are, at least in principle, conceivable If you take apart the cube, you can reassemble it in each of those states And that number is about 264 million Which is not so bad for computers You could search that Life is a little bit easier You get to divide by 24 because there’s 24 symmetries of the cube Eight times three You can divide by three, also, because only a third of the configuration space is actually reachable If you’re not allowed to take the parts apart, if you have to get there by a motion, you can only get to 1/3 of the two by two by two So it’s a little bit smaller than that, if you’re actually doing a breadth-first search, which is what you’re going to be doing on your problem set But in any case, it’s feasible That was vertices We should talk about edges For every move– every move takes you from one configuration to another You could traverse it in one direction and make that move You could also undo that move Because every move is undoable in a Rubik’s Cube, this graph is undirected Or you can think of it as every edge works in both directions So this is a move It’s called a quarter twist This is a controversy if you will Some people allow a whole half twist as a single move Whether you define that as a single move or a double move is not that big a deal It just changes some of the answers But you’re still exploring essentially the same graph So that’s the graph and you’d like to know some properties about it So let me draw a picture of the graph I’m not going to draw all 264 million vertices But in particular, there’s the solved state– we kind of care about that one, where all the colors are aligned– then there’s all of the configurations you could reach by one move So these are the possible moves from the solved state And then from those configurations, there’s more places you can go Maybe there’s multiple ways to get to the same node

But these would be all the configurations you can reach in two moves And so on And at some point, you run out of graph So there might be a few nodes out here The way I’m drawing this, this is everything you can reach in one move, in two movies, in three moves At the end, this would be 11 moves, if you allow half twists And as puzzlers, we’re particularly interested in this number, which you would call, as a graph theorist, the diameter of the graph Puzzlers call it God’s number If you were God or some omni– something being You have the optimal algorithm for solving the Rubik’s Cube How many moves do you need If you always follow the best path? And the answer is, in the worst case, 11 So we’re interested in the worst case of the best algorithm For two by two by two, the answer is 11 For three by three by three, the answer is 20 That was just proved last summer with a couple years of computer time For four by four by four– I don’t have one here– I think we’ll never know the answer For five by five by five, we’ll never know the answer For six, for seven, same deal But for two by two by two, you can compute it You will compute it on your problem set And it’s kind of nice to know because it says whatever configuration I’m in, I can solve it in 11 moves But the best known way to compute it, is basically to construct this graph one layer at a time until you’re done And then you know what the diameter is The trouble is, in between here this grows exponentially At some point, it decreases a little bit But getting over that exponential hump is really hard And for three by three by three, they used a lot of tricks to speed up the algorithm, but in the end it’s essentially a breadth-first search What’s a breadth-first search? This going layer by layer So we’re going to formalize that in a moment But that is the problem So just for fun, any guesses what the right answer is for an n by n by n Rubik’s cube? What’s the diameter? Not an exact answer, because I think we’ll never know the exact answer But if I want theta something, what do you think the something is? How many people here have solved the Rubik’s Cube? Ever? So you know what we’re talking about here Most people have worked on it To think about an n by n by n Rubik’s Cube, each side has area n squared So total surface area is 6 n squared So there’s, roughly, stata n squared little cubies here So what do you think the right [INAUDIBLE] is for n by n by n? No guesses? AUDIENCE: n cubed? PROFESSOR: n cubed? Reasonable guess But wrong It’s an upper bounds Why n cubed? AUDIENCE: [INAUDIBLE] PROFESSOR: Oh, you’re guessing based on the numbers Yeah The numbers are misleading, unfortunately It’s the law of small numbers I guess It doesn’t really look right I know the answer I know the answer because we just wrote a paper with the answer This is a new result From this summer But I’m curious To me the obvious answer is n squared because there’s about n squared cubies And it’s not so hard to show in a constant number moves you can solve a constant number of cubies If you think about the general algorithms, like if you’ve ever looked up professor’s cube and how to solve it, you’re doing like 10 moves, and then maybe you swap two cubies which you can use to solve a couple of cubies in a constant number of moves So n squared would be the standard answer if you’re following standard algorithms But it turns out, you can do a little bit better And the right answer is n squared divided by log n I think it’s cool Hopefully, you guys can appreciate that Not a lot of people can appreciate n squared divided by log n, but here in algorithms, we’re all about n squared over log n If you’re interested, the paper’s on my website

I think its called, Algorithms For Solving Rubik’s Cubes There’s a constant there Current constant is not so good Let’s say it’s in the millions [LAUGHTER] You’ve got to start somewhere The next open problem will be to improve that constant to something reasonable that maybe is close to 20 But we’re far from that Let’s talk about graph representation Before we can talk about exporting a graph, we need to know what we’re given as input And there’s basically one standard representation and a bunch of variations of it And they’re called adjacency lists So the idea with an adjacency list, is you have an array called Adj, for adjacency of size V. Each element in the array is a pointer to a linked list And the idea is that this array is indexed by a vertex So we’re imagining a world where we can index arrays by vertices So maybe, you just label your vertices zero through v minus 1 Then that’s a regular array Or, if you want to get fancy, you can think of a vertex as an arbitrary hashable thing, and Adj is actually a hash table And that’s how you probably do it in Python Maybe your vertices are objects, and this is just hashing based on the address of the object But we’re not going to worry about that We’re just going to write Adj of u Assume that somehow you can get to the linked list corresponding to that vertex And the idea is, for every vertex we just store its neighbors, namely the vertices you can reach by one step from u So I’m going to define that a little more formally Adj of u is going to be the set of all vertices, V, such that u, v is an edge So if I have a vertex like b, Adj of b is going to be both a and c because in one step there are outgoing edges from b to a and b to c So Adj of b is a, c In that graph I should have labeled the vertices something different Adj of a is going to be just c because you can’t get with one step from a to b The edge is in the wrong direction And Adj of c is b I think that definition’s pretty clear For undirected graphs, you just put braces here Which means you store– I mean, it’s the same thing Here Adj of c is going to be a, b, and d, as you can get in one step from c to a, from c to b, from c to d For pretty much every– At least for graph exploration problems, this is the representation you want Because you’re at some vertex, and you want to know, where can I go next And Adj of that vertex tells you exactly where you can go next So this is what you want There’s a lot of different ways to actually implement adjacency lists I’ve talked about two of them You could have the vertices labeled zero to v minus 1, and then this is, literally, an array And you have– I guess I should draw In this picture, Adj is an array So you’ve got a, b, and c Each one of them is a pointer to a linked list This one’s actually going to be a, c, and we’re done Sorry, that was b Who said it had to be alphabetical order? A is a pointer to c, c is a pointer to b That’s explicitly how you might represent it This might be a hash table instead of an array, if you have weirder vertices You can also do it in a more object-oriented fashion For every vertex, v, you can make the vertices objects, and v dot neighbors could store what

we’re defining over there to be Adj of v. This would be the more object-oriented way to do it I’ve thought a lot about this, and I like this, and usually when I implement graphs this is what I do But it is actually convenient to have this representation There’s a reason the textbook uses this representation Because, if you’ve already got some vertices lying around and you want to have multiple graphs on those vertices, this lets you do that You can define multiple Adj arrays, one for graph one, one for graph two, one for graph three but they can all talk about the same vertices Whereas here, vertex can only belong to one graph It can only have one neighbor structure that says what happens If you’re only dealing with one graph, this is probably cleaner But with multiple graphs, which will happen even in this class, adjacency lists are kind of the way to go You can also do implicitly-represented graphs Which would be to say, Adj of u is a function Or v dot neighbors is a method of the vertex class Meaning, it’s not just stored there explicitly Whenever you need it, you call this function and it computes what you want This is useful because it uses less space You could say this uses zero space or maybe v space One for each vertex It depends Maybe you don’t even need to explicitly represent all the vertices You start with some vertex, and given a vertex, somehow you know how to compute, let’s say in constant time or linear time or something, the neighbors of that vertex And then from there, you can keep searching, keep computing neighbors, until you find what you want Maybe you don’t have to build the whole graph, you just need to build enough of it until you find your answer Whatever answer you’re searching for Can you think of a situation where that might be the case? Where implicit representation would be a good idea? Yes Rubik’s Cubes They’re really good I never want to build this space It has a bajillion states A bajillion vertices It would take forever There’s more configurations of this cube than there are particles in the known universe I just computed that in my head [LAUGHTER] I have done this computation recently, and for five by five by five it’s like 10 to the 40 states Or 10 to the 40, 10 to the 60 There’s about 10 to the 80 particles in the known universe 10 to the 83 or something So this is probably 10 to the 200 or so It’s a lot You never want to build that But, it’s very easy to represent this state Just store where all the cubies are And it’s very easy to see what are all the configurations you can reach in one move Just try this move, try this move, try this move Put it back and try the next move And so on For an m by n by n cube in order n time, you can list all the order n next states You can list all the order n neighbors And so you can keep exploring, searching for your state Now you don’t want to explore too far for that cube, but at least you’re not hosed just from the problem of representing the graph So even for two by two by two, it’s useful to do this mostly to save space You’re not really saving time But you’d like to not have to store all 264 million states because it’s going to be several gigabytes and it’s annoying Speaking of space– ignoring the implicit representation– how much space does this representation require? V plus E. This Is going to be the bread and butter of our graph algorithms Most of the things we’re going to talk about achieve V plus E time This is essentially optimal It’s linear in the size of your graph You’ve got V vertices, E edges Technically, in case you’re curious, this is really the size of V plus the size of E But in the textbook, and I guess in the world, we just omit those sizes of whenever they’re in a theta notation or Big O notation So number vertices plus number of edges that sort of the bare minimum you need if you want an explicit representation of the graph And we achieve that because we’ve got we’ve got v space just to store the vertices in an array

And then if you add up– Each of these is an edge You have to be a little careful In undirected graphs, each of these is a half edge So there’s actually two times e nodes over here But it’s theta E So theta V plus E is the amount of space we need And ideally, all our algorithms will run in this much time Because that’s what you need just to look at the graph So let’s do an actual algorithm, which is breadth-first search So to the simplest algorithm you can think of in graphs I’ve already outlined it several times You start at some node You look at all the nodes you can get to from there You look at all the nodes you can get to from there Keep going until you’re done So this is going to explore all of the vertices that are reachable from a node The challenge– The one annoying thing about breadth-first search and why this is not trivial is that there can be some edges that go sort of backwards, like that, to some previous layer Actually, that’s not true, is it? This can’t happen You see why? Because if that edge existed, then from this node you’d be able to get here So in an undirected graph, that can’t happen In a directed graph, you could conceivably have a back edge like that You’d have to realize, oh, that’s a vertex I’ve already seen, I don’t want to put it here, even though it’s something I can reach from this node, because I’ve already been there We’ve got to worry about things like that That’s, I guess, the main thing to worry about So our goal is to visit all the nodes– the vertices– reachable from given node, s We want to achieve V plus E time And the idea is to look at the nodes that are reachable first in zero moves Zero moves That’s s Then in one move Well that’s everything you can reach from s in one step That’s adjacency of s And then two moves, and three moves, and so on until we run out of graph But we need to be careful to avoid duplicates We want to avoid revisiting vertices for a couple of reasons One is if we didn’t, we would spend infinite time Because we’d just go there and come back, and go there and come back As long as there’s at least one cycle, you’re going to keep going around the cycle forever and ever if you don’t try to avoid duplicates So let me write down some code for this algorithm It’s pretty straightforward So straightforward, we can be completely explicit and write pseudocode There’s a few different ways to implement this algorithm I’ll show you my favorite The textbook has a different favorite I’m going to write in pure Python, I believe Almost done

I think I got that right So this is at the end of the while-loop And at that point we should be done We can do an actual example, maybe I’m going to do it on an undirected graph, but this algorithm works just as well on directed and undirected graphs There’s an undirected graph We’re given some start vertex, s, and we’re given the graph by being given the adjacency lists So you could iterate over the vertices of that thing Given a vertex, you can list all the edges you can reach in one step And then the top of the algorithm’s just some initialization The basic structure– We have this thing called the frontier, which is what we just reached on the previous level I think that’s going to be level i minus one Just don’t want to make an index error These are going to be all the things you can reach using exactly i minus one moves And then next is going to be all the things you can reach in i moves So to get started, what we know is s s is what you can reach in zero moves So we set the level of s to be zero That’s the first line of the code There’s this other thing called the parent We’ll worry about that later It’s optional It gives us some other fun structure We set i to be one because we just finished level zero Frontier of what you can reach in level zero is just s itself So we’re going to put that on the list That is level zero. i equals one So one minus one is zero All good And then we’re going to iterate And this is going to be looking at– The end of the iteration is to increment i So you could also call this a for-loop except we don’t know when it’s going to end So it’s easier to think of i incrementing each step not knowing when we’re going to stop We’re going to stop whenever we run out of nodes So whenever frontier is a non-empty list the bulk of the work here is computing what the next level is That’s called next It’s going to be level i We do some computation Eventually we have what’s on the next level Then we set frontier next Because that’s our new level We increment i, and then invariant of frontier being level i minus 1 is preserved Right after here And then we just keep going till we run out of nodes How do we compute next? Well, we look at every node in the frontier, and we look at all the nodes you can reach from those nodes So every node, u, in the frontier and then we look at– So this means there is an edge from u to v through the picture We look at all the edges from all the frontier nodes where you can go And then the key thing is we check for duplicates

We see, have we seen this node before? If we have, we would have set it’s level to be something If we haven’t seen it, it will not be in the level hash table or the level dictionary And so if it’s not in there, we’ll put it in there and add it to the next layer So that’s how you avoid duplicates You set its level to make sure you will never visit it again, you add it to the next frontier, you iterate, you’re done This is one version of what you might call a breadth-first search And it achieves this goal, visiting all the nodes reachable from s, in linear time Let’s see how it works on a real example So first frontier is this thing Frontier just has the node s, so we just look at s, and we look at all the edges from s We get a and x So those get added to the next frontier Maybe before I go too far, let me switch colors Multimedia here So here’s level one All of these guys, we’re going to set their level to one They can be reached in one step That’s pretty clear So now frontier is a and x That’s what next becomes Then frontier becomes next And so we look at all the edges from a That’s going to be s and z s, we’ve already looked at, it already has a level set, so we ignore that So we look at z Z does not have a level indicated here, so we’re going to set it to i which happens to be two at this point And we look at x It has neighbors s, d, and c We look at s again We say, oh, we’ve already seen that yet again So we’re worried about this taking a lot of time because we look at s three times in total Then we look at d d hasn’t been set, so we set it to two. c hasn’t been set, so we set it to two So the frontier at level two is that Then we look at all the neighbors of z There’s a. a’s already been set Look at all the neighbors of d There’s x There’s c Those have been set There’s f This one gets added Then we look at c There’s x That’s been done. d’s been done f’s been done v has not been done So this becomes a frontier at level three Then we look at level three There’s f D’s been done, c’s been done, b’s been done We look at v. c’s been done. f’s been done Nothing to add to next Next becomes empty Frontier becomes empty The while-loop finishes TA DA! We’ve computed– we’ve visited all the vertices Question AUDIENCE: [INAUDIBLE] What notation? PROFESSOR: This is Python notation You may have heard of Python This is a dictionary which has one key value, s, and has one value, zero So you could– That’s shorthand in Python for– Usually you have a comma separated list The colon is specifying key value pairs I didn’t talk about parent We can do that for a little bit So parent we’re initializing to say, the parent of s is nobody, and then whenever we visit a new vertex, v, we set its parent to be the vertex that we came from So we had this vertex, v. We had an edge to v from some vertex, u We set the parent of v to be u So let me add in what that becomes I’ll change colors yet again Although it gets hard to see any color but red So we have s When we visited a, then the parent of a would become s When we visited z, the parent of z would be a Parent of x is going to be s Parent of d is going to be x The parent of c is going to be x The parent of f– it could have been either way, but the way I did it, d went first, and so that became its parent And I think for v, c was its parent So that’s what the parent pointers will look like They always follow edges They actually follow edges backwards If this was a directed graph, the graph might be directed that way but the parent pointers go back along the edges So it’s a way to return It’s a way to return to s If you follow these pointers, all roads lead to s Because we started at s, that’s the property we have In fact, these pointers always form a tree, and the root of the tree is s In fact, these pointers form what are called shortest paths Let me write down a little bit about this

Shortest path properties If you take a node, and you take its parent, and you take the parent of the parent, and so on, eventually you get to s And if you read it backwards, that will actually be a path in the graph And it will be a shortest path, in the graph, from s to v Meaning, if you look at all paths in the graph that go from s to v– So say we’re going from s to v, how about that, we compute this path out of BFS Which is, follow a parent of v is c, parent of c is x, parent of x is s Read it backwards That gives us a path from s to v The claim is, that is the shortest way to get from s to v. It might not be the only one Like if you’re going from s to f, there’s two short paths There’s this one of length three There’s this one of length three Uses three edges Same length And in the parent pointers, we can only afford to encode one of those paths because in general there might be exponentially many ways to get from one node to another We find a shortest path, not necessarily the only one And the length of that path– So shortest here means that you use the fewest edges And the length will be level of v. That’s what we’re keeping track of If the level’s zero, you can get there with zero steps If the level’s one, you get there with one steps Because we’re visiting everything you can possibly get in k steps, the level is telling you what that shortest path distance is And the parent pointers are actually giving you the shortest path That’s the cool thing about BFS Yeah, BFS explores the vertices Sometimes, that’s all you care about But in some sense, what really matters, is it finds the shortest way to get from anywhere to anywhere For a Rubik’s Cube, that’s nice because you run BFS from the start state of the Rubik’s Cube Then you say, oh, I’m in this state You look up this state You look at its level It says, oh, you can get there in nine steps That’s, I think, the average So I’m guessing I don’t know how to do this in nine steps Great, so now you know how to solve it You just look at the parent pointer The parent pointer gives you another configuration You say, oh, what move was that? And then you do that move I’m not going to solve it Then you look at the parent pointer of that You do that move You look at the parent pointer of that You do that move Eventually, you’ll get to the solved state, and you will do it using the fewest possible moves So if you can afford to put the whole graph in memory, which you can’t for a big Rubik’s Cube but you can for a small one, then this will give you a strategy, the optimal strategy, God’s algorithm if you will, for every configuration It solves all of them Which is great What is the running time of this algorithm? I claim it’s order V plus E But it looked a little wasteful because it was checking vertices over and over and over But if you think about it carefully, you’re only looking– what’s the right way to say this– you only check every edge once Or in undirected graphs, you check them twice, once from each side A vertex enters the frontier only once Because once it’s in the frontier, it gets a level set And once it has a level set, it’ll never go in again It’ll never get added to next So s gets added once then we check all the neighbors of s a gets added once, then we check all the neighbors of a Each of these guys gets added once We check all the neighbors So the total running time is going to be the sum over all vertices of the size of the adjacency list of v. So this is the number of neighbors that v has And this is going to be? Answer? AUDIENCE: Two times the number of edges PROFESSOR: Sorry AUDIENCE: Double the number of edges PROFESSOR: Twice the number of edges for undirected graphs It’s going to be the number of edges for directed graphs This is the Handshaking Lemma If you don’t remember the Handshaking Lemma, you should read the textbook Six o four two stuff Basically you visit every edge twice

For directed graphs, you visit every edge once But it’s order E. We also spend order V because we touch every vertex So the total running time is order V plus E In fact, the way this is going, you can be a little tighter and say it’s order E. I just want to mention in reality– Sometimes you don’t care about just what you can reach from s, you really want to visit every vertex Then you need another outer loop that’s iterating over all the vertices as potential choices for s And you then can visit all the vertices in the entire graph even if it’s disconnected We’ll talk more about that next class That’s it for BFS