Week 5: Friday – CS50 2007 – Harvard University

All right, good morning. Welcome to Friday of Week 5. So yesterday was Shuttleboy’s ninth birthday. He was born in 1998, hence the cupcakes. Some of you may have realized that, so that we’d have enough quantity, some Twinkies might have slipped in there too, but otherwise help yourselves on your way out as well.

So how many of you have maybe spent the past few moments staring at this thing, maybe crossing your eyes just a little bit, hoping that some Magic Eye kind of thing is going to pop out? Yeah, OK, it’s not one of those. But there is something hidden in there. So Problem Set 4, which will be released tonight even though PS3 is due tomorrow, is quite fun, I think. Among the challenges it poses to you is to give you this image, which is a BMP file, a bitmap file, and somewhere hidden in there is the answer to a murder mystery. Only by writing a piece of software that’s going to decode this thing will you be able to unravel that mystery. So it’s not one of those Magic Eye puzzles, but if you ever had as a kid one of those little pieces of red plastic that you could hold in front of a message like this, it would then reveal to you the answer. That’s essentially the idea that we’ve implemented here in software. So if you still have your little 3D glasses, those too might work on a puzzle like this, but take a look at the spec when it’s released for more on that. All right, any questions before we move ahead?

All right, in an English sentence, what’s a pointer? And we seem to have a few parents in the audience today, so you’d better impress. What’s a pointer? What’s a pointer? This is actually pretty meaningless, because what parents watch this? What’s a cupcake, say? You get nothing with that either. So, all right, a pointer. I’ll get you started: a pointer is an address in memory. All right, so how about a more interesting question: why are pointers a potentially dangerous feature of a language like C? What can you do with these things that perhaps isn’t
such a good thing? Yeah. Yeah, so you can read from or, even worse, write to memory that is not yours, so to speak, memory that doesn’t belong to you. Later in the semester we’re actually going to come back to this topic in our secure coding lecture, where we’ll look quite specifically at something called the buffer overflow exploit, which I’ve mentioned a few times already. But we’ll actually see mechanically how something like that is achieved and how you can in fact take over a system simply by exploiting some programmer’s mistake with regard to pointers.

So one other piece of functionality that’s useful to know when it comes to pointers: this file here is pointers1.c. It’s among your handouts from this week, among your printouts, and it demonstrates something called pointer arithmetic. We know from Wednesday that there’s sort of this equivalence between pointers and arrays, whereby you can access an array’s elements using that square bracket notation. But we’ve also seen that you can actually get at some of the contents via pointer notation, using the dereference operator, the star operator. Well, it turns out that we can take that one step further. You don’t need to be totally comfortable with this just yet, because we’ll see it again, but know that we can do the following.

So this program, pointers1, in this blinking line clearly gets a string from the user. We’re then doing a sanity check, saying if s == NULL, just return immediately. And GetString, recall, can return NULL if something goes wrong. What’s one thing that could go wrong in the process of getting a string from the user? Yeah, sure. So if that user hammers on the keyboard for quite some time and types in way too many characters, well, our GetString function, as we’ll see today, isn’t going to be able to handle that, and rather than return just part of that string, it’s going to return none of it at all and instead return NULL. But assuming everything goes okay,
clearly the comment suggests that we’re going to print this string one character per line. But what’s interesting for now is how we’re doing this. Typically, if you wanted to print out the ith character of a string called s, what would you write? printf of "%c", comma what? s[i], right? Pretty straightforward; you’ve been doing that for a few weeks now. Well, it turns out that you can also do this via the pointer that s really is. Recall that string is just synonymous with char *, so that’s why I’ve written it that way today: the string s is really typed to be a char *. But now notice, a few lines down from where my cursor is here, that what I’m printing is indeed a character.
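The two notations just mentioned, s[i] and access through the pointer that s really is, can be sketched side by side. This is a minimal sketch rather than the course’s pointers1.c itself: the helper names char_by_index, char_by_pointer, and print_one_per_line are hypothetical, and the functions take any C string so that only the standard library is needed.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the i-th character via array (bracket) notation. */
char char_by_index(const char *s, size_t i)
{
    return s[i];
}

/* Hypothetical helper: the same character via pointer notation. */
char char_by_pointer(const char *s, size_t i)
{
    return *(s + i);
}

/* Prints s one character per line; does nothing if s is NULL. */
void print_one_per_line(const char *s)
{
    if (s == NULL)
        return;
    for (size_t i = 0, n = strlen(s); i < n; i++)
        printf("%c\n", *(s + i));
}
```

Both helpers return the same character for the same s and i; the bracket form is simply shorthand for the pointer form.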

But what’s going on with *(s + i)? I mean, that too will work, but take a guess as to why. Anyone? Yeah. Yeah, exactly. So remember that *s is whatever’s at the location that s is pointing at. For the zeroth character in the array, well, *s is literally the first character in the string. But if you then want to get at the next character and the next character, you need to do an offset: plus one, plus one, plus one. So what I’m doing here is simply saying: star of the address s plus this offset, go get what’s at that memory location. The compiler goes ahead and essentially treats that as though you were accessing this thing via an array. In fact, it actually works the other way around. This bracket notation that you guys have been using thus far: what the compiler is really doing is translating that shorthand notation into this, which is a more accurate representation of what’s going on underneath the hood.

Let’s see it in one other context. In the file called pointers2.c we have the same idea using pointer arithmetic. The line got a little long here given my font size, but take a look at the first line of code here, and this will perhaps make more clear a discussion that was going on on the bulletin board recently. This first line of code, in English, does what? What does it do? Yep, perfect. It declares an array and it statically initializes it, so to speak, with those five numbers. So the nice trick here, if you haven’t seen it before, is that if you’re declaring an array and simultaneously giving it some values, you actually don’t have to specify its size within the brackets. In fact, if I really wanted to be anal and clear, I could say that, and that would be equivalent. But one of you noticed on the bulletin board (pull up this thread if you haven’t seen it yet) that this creates a problem if you say, give me an array of size four, but here are five numbers for it. You’re going to get some kind of
warning, most likely, from the compiler, and even if the thing does compile, what’s going to happen? Yeah, so you’re going to lose the last number that you’ve tried to provide. All right, so that’s a bit of syntax. Now let’s move on to the pointer stuff.

Down here, notice that I’m printing out first "the size of the array is %d". So this is kind of cool. We’ve seen sizeof before, but thus far we’ve used sizeof to print the size of data types. Well, you can also use it to print out the size of a data structure like an array, albeit with some exceptions. If I simply put sizeof(numbers) at the end of that line, what the compiler is going to do is figure out how many bytes the whole array numbers takes up, and it’s going to return that value. So that’s going to tell us how many bytes constitute that array.

The second line of code there, meanwhile, is printing out what instead? Well, obviously the size of each element. Now that’s an easy one, right? It’s just getting the zeroth element of numbers, the zeroth integer in numbers. Obviously it’s going to be of what data type, based on what we’ve seen? It’s an int. So what should that return? Four. It returns bytes, recall, so that value should be 4.

And then finally, one line of craziness here. In this loop I’m initializing some variable i to 0; all right, that’s old school. Now, n = sizeof(numbers) / sizeof(numbers[0]). What’s that doing? Yeah. So it’s dynamically figuring out how many elements are in the array. Because if the first call, sizeof(numbers), returns the size of the whole array in memory, so how many bytes it takes up, divided by the size of just one of those things, well, that tells you mathematically how many things there are in the array. But here’s the gotcha: this works in the function in which you declared the array. If you start writing more complicated code that passes arrays around as parameters, that’s not going to work, because
unlike Java, for those of you familiar, C does not remember how big an array is outside of the scope in which it was initially declared. So calling sizeof on an array that’s been passed as a parameter to some function is not going to behave as such. So realize you can use this here, but not necessarily in all contexts.

But finally, the last piece of magic: this last line here is printing each element in the array via %d, and notice what’s happening here: *(numbers + i). Well, what’s numbers? To be clear, be as technical as possible: what is numbers? Perfect, it’s the pointer to the first element in an array, and that element happens to be an int, because the pointer itself was declared as a pointer to an

int. Right? Now, that’s not totally obvious, but recall the equivalence of these things; that’s essentially what we’re seeing there as well. All right, so back to the original. There’s an interesting issue here, though. In this example we’re doing numbers + i, essentially plus one, plus one, plus one. But numbers is an address, and we have ints in this array. So what should happen if you take the starting address of that array and then just add one? Take the address plus one: are you in fact going to get the second element in the array? Think about how big an int is, though. You’d be misaligned. So that seems to be a little dangerous here, right? If these are ints, and I know these things are of size four, don’t I really want to be doing this? i is incrementing by one each time, because of the i++ that’s wrapping around onto the long line there. So don’t I actually want to be doing plus four each time, essentially i times four, not just plus i? OK, so it’s a perfect observation. Turns out the compiler fixes that for you. The nice thing about this so-called pointer arithmetic is that the compiler figures out the size of the data type to which you’re applying this plus operator, and in this case it would actually add the numeric value four, even though visually it looks like you’re just adding plus one, plus one, plus one. So enough on that for now, but just tuck that away as a feature that we’ll likely revisit. Are there any questions on what just happened? If anything, at this point it hopefully makes a little more clear exactly what is in fact going on underneath the hood.

All right, so as promised, we can finally now take a look inside of CS50’s library. This is something you’ve been using for quite some time. Some of you have already sort of ditched these training wheels, so to speak, and begun implementing user input on your own. Frankly, there’s no reason not to continue using CS50’s library for user input throughout the rest of the semester. It makes
much, much easier a process that C does not, out of the box, make very accessible: getting input from the user. But let’s actually use this as an interesting opportunity to see what’s been going on with some of these functions.

So this, recall, is cs50.h, and recall we saw this weeks ago. We’re declaring a Boolean data type using this syntax called enum, which allows you to declare a bunch of constants in between those curly braces. The way enum works, for future reference, though you haven’t had to use this in problem sets yet, is that enum automatically assigns the values zero, then one, then two, then three, then four to the tokens that you put in between the curly braces. So that is to say, what does false actually equal in reality? Zero. And true is one. So it’s a nice piece of syntax to save me the trouble of actually assigning that explicitly. Here, meanwhile, is our synonym for char *. GetChar is declared with this declaration. There’s GetDouble, there’s GetFloat. I mean, there’s really not much going on inside this header file.

But to be clear, why have you guys been #including this file for all these weeks? What’s the point of doing that in the first place? Well, think about those function prototypes that you guys have been writing yourselves for some of your functions. When writing a program like Fifteen that has multiple functions, or any program you’ve written yourself thus far that has multiple functions, what happens if you don’t either put the function itself above main or, equivalently, declare the prototype of that function above main? What happens? What’s that? Good. Yeah, exactly. So GCC yells at you and says "implicit declaration of function" something or other, right? And you can fix that by adding the prototype or just moving your functions around. Well, the way #include works is literally like a copy-paste. Any time you’ve had atop your file a line like #include, in this case, cs50.h, what
this is literally doing is taking the contents of cs50.h and pasting them into this C file, so that when GCC then compiles this file, it has not only all the stuff you wrote but the contents of cs50.h as well, thereby giving you, sort of automatically, all of those function prototypes at the top of your file. So that’s what’s been going on there. And even in Problem Set 3 or 4, you’ll see us increasingly using header files to factor out function prototypes and maybe data structure declarations, just so that we can start organizing what are increasingly large files more effectively.

So this is now the C file. Let’s scroll down to GetInt, since that’s perhaps one of the most common ones you guys have used. Here is GetInt, implemented in CS50’s library.
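The copy-paste behavior of #include and the automatic numbering done by enum can be sketched in a single file. This is an illustration under stated assumptions, not the contents of cs50.h: the header name answers.h, the type answer, and the functions is_even and describe are all hypothetical, and the "header" is written inline to mimic what the preprocessor would paste in.

```c
/* ---- begin: what a hypothetical header "answers.h" might contain ---- */

/* enum numbers its tokens 0, 1, 2, ... so NO == 0 and YES == 1. */
typedef enum { NO, YES } answer;

/* Prototype: tells the compiler that is_even exists before it is defined. */
answer is_even(int n);

/* ---- end of the "pasted" header ---- */

/* Appears before is_even's definition, yet compiles cleanly
   because the prototype above has already been seen. */
const char *describe(int n)
{
    return (is_even(n) == YES) ? "even" : "odd";
}

/* The actual definition can come later in the file. */
answer is_even(int n)
{
    return (n % 2 == 0) ? YES : NO;
}
```

Deleting the prototype line would bring back the kind of "implicit declaration" complaint described above, since describe would then call is_even before the compiler had seen it.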

Most of you probably haven’t seen this before. So GetInt returns an int; that’s pretty easy. It takes no arguments; we knew that. All right, what happens next? Well, I’ve declared a few variables atop this file, and then I appear to have this infinite loop, so hopefully there’s actually a break statement somewhere in here that gets me out of it. Well, what do we do? Within GetInt we actually call our own GetString, just because we already spent all this time writing GetString, which gets a whole line of text from the user. Why not leverage the code we’ve already written? Well, the next line is doing this sanity check: if for some reason GetString flakes out and doesn’t return to us a valid string, well, GetInt shouldn’t return a valid value either. And there’s a tricky thing here. What typically has been the value we return when there’s some error in a function? Yeah, like 1, maybe -1, 2, you know, anything other than 0. The problem with doing that when implementing a function called GetInt is what? I mean, those are integers, right? And those are very reasonable numbers for the user to be able to type into your program. So you need a sort of special value, and arbitrarily we decided to return this constant called INT_MAX, which is defined in one of those other header files, and which really is the biggest possible int. So we are sacrificing, essentially, the value 2 to the 31st minus 1. You effectively can’t use that number in your program, because we’re using it as a special marker to indicate that a problem happened. So maybe not ideal, in that we’re wasting a value, but at least now the user, by checking whether GetInt returns INT_MAX, which is again just a constant, can at least check for themselves whether something went wrong, rather than confusing a return value of 1 with a valid number, potentially.

Now here’s the magic of GetInt, and that’s all there is to it. We saw this scanf function on Wednesday. Or rather, we saw scanf; this is sscanf. But what did scanf
do, you recall? Yeah, it gets a line of input from the user, and you tell scanf what type of data to expect. So for now just assume this function is the same. What I’m doing is saying to this function: all right, expect on the line you’re given an integer and then maybe some character. Now that’s a little strange, but let’s see what’s going on with this syntax here. So n, recall, is an int defined up here, and c is a char. Why am I passing to this function, scanf or sscanf, &n and &c? Again, think back to Wednesday. Yeah, so it’s the address-of operator. And why is it crucial to pass the address of these variables to sscanf rather than just n and c? So that you pass by reference. But again, who cares? So it knows where to look and, more importantly, it knows where to put the values that the user typed in. It’s the same issue as with those swap functions that we looked at a few weeks ago, one of which was just broken, right? When you pass by value, you don’t have the ability to change a variable’s value. If you pass by address, aka by reference, you do have that ability, and it really boils down to that same idea.

So this bit of trickery here is just so that we can detect errors. Ideally, the string that the user has typed, which is called line, is going to contain just an int, and so we use %d so that sscanf looks at that string and says, here’s the first int I find; I’m going to put it into n for you. But we also wanted to be able to do some error checking. If the user screws up and types in the number 5, space, foo, we don’t want to just return 5, because maybe we want to be able to detect that the user was messing around and did not provide just an int. So the trick that we’re using, and this is a more, say, sophisticated use of this function, is declaring that sscanf should also potentially expect %c. Why are we doing that? Given the goal that I just had in mind, what’s going to happen if the user does type in 5, space, foo, as opposed to just 5? So scanf will
return... OK, true, but let’s ignore return values for a moment, focusing just on why I’m specifying both %d and %c. So that you can tell, but how? What’s going to happen if the user also typed in foo after the number 5? Exactly. So the character f: if the user has messed around like that and typed in not only an int but some character or characters, the first of those characters is going to end up in %c, or rather the variable that that’s mapping to, which is the variable c. And it turns out that scanf and sscanf return to you, the caller, the total number of

variables that were filled with values from the user. So the idea here is that if only one of those variables, out of n and c, was filled with a value, sscanf is going to return 1, and that’s a good thing: I’m going to go ahead and return the value n. But if any other number was captured, namely 2, something went wrong, and I’m going to bail, and I’m going to force the user with this last line of code to retry, retry, retry.

And now here’s one thing worth pointing out, though we’ll again come back more to memory management in future problem sets. GetString, it turns out, has all this time been leaking memory, so to speak. If GetString is going to return to you a string that the user’s typed, we’ve been allocating space for that in RAM, but you’ve never been returning it to the operating system. You just keep getting string after string from the user, and you yourselves, odds are, have never freed that memory. Well, we’ve been more careful underneath the hood with, say, GetInt, if anything goes wrong. Suppose the user types in the number 5,000 at the prompt and hits Enter. How many characters presumably did our GetString implementation allocate for 5,000? Careful: five. Why five for the value 5,000? It’s null-terminated, right? Because when the user types it, it’s still a string; we’ve not yet done any kind of conversion. So 5000\0 is five total characters. So GetString presumably, and if you look closely it does in fact, allocates dynamically, using what function, to tie everything together? malloc. Five bytes, five chars, for that string. The problem is, if something ever goes wrong, we want our function to free up that memory. So know for today that the means by which you do that is simply calling free on whatever pointer is holding the address that malloc returned.

OK, so any questions on GetInt? You don’t have to totally understand everything that’s going on, but at least now you should be able to pick up on how this library has been functioning all
along. Yeah. So, excellent question: where is the conversion happening from the string 5000\0 to the integer we know as 5,000? Well, it turns out that’s one of the features of scanf and sscanf. So whereas for the Caesar cipher most of you guys probably used atoi or manually converted things yourself, scanf does that for you, and it does it en masse for a potentially large number. So that’s happening there. And now, just to be clear, the difference between scanf and sscanf is that scanf reads its input from the keyboard, whereas sscanf reads its input from a string, which is why I’m able to pass sscanf line rather than expecting the user to type something else altogether. Any other questions? OK, so if you’re curious and you kind of like understanding how some of this stuff’s been working underneath the hood, by all means check out the rest of the library. The path to it is on this particular slide.

All right, so two final features, both of which you’ll employ for Problem Set 4. We promised a while back that you have this ability with C to declare your own data types, because the only data types you get out of the box are int, char, float, double, char *, all these pointers as well. But you don’t get anything particularly interesting, anything particularly large. So suppose that you actually wanted to implement a program that somehow manages a database’s worth of students. If this program is supposed to work for, say, twenty different students, how could you go about declaring enough space in memory so that for each student you can store their ID number, their name, and their house? Up until this point, what might you have to do? We know how to get a big chunk of memory, so what could we do if you wanted to store all of the users’ ID numbers? Yeah, right. So int, let’s call it ids, and we’ll give it size 20, or we could figure it out dynamically. But if we also want to keep around every student’s name, we’re also going to need to do something like string names[20], and even that
is really just the pointers to those strings; we haven’t even actually allocated space for the strings themselves. But then finally, for houses we could do this. The problem with this approach, though, is that there’s no inherent linking of these data structures together. You yourself can just infer, or can just assume, that ids[0] is the first student’s ID and names[0] is the first student’s name and houses[0] is the first student’s house, and repeat for all of the other students. But this very quickly becomes unwieldy, particularly because this does not scale very well for a database that has more fields than just ID, name, and house, and you can imagine how much data

FAS maintains on each of you in its database. Well, structs are the answer. You now have the ability to define your own data structure, and you can plop into this structure any primitives or even any other data structures that you want. The syntax is simply the following: you specify typedef struct, and then in between the curly braces you enumerate all the different fields that you want this new data structure to have, in this case an ID, a name, and a house. Then outside the curly brace you specify the name that you want to give this structure. Henceforth, what I’m now able to do, to declare 20 students in memory, is get away with declaring that, and that’s just it. In one fell swoop I’ve got what I was enumerating line by line by line separately. Moreover, inside each of this array’s elements is an entire structure for a student.

But we need a piece of syntax. If I wanted to assign the first student an ID number, well, odds are we’d do that to access the zeroth student; you can probably guess this. But for those of you who already know, what’s the syntax for accessing the zeroth student’s ID number? Yeah, it’s dot, right? And we’ll see this in print in a moment. That piece of code accesses the zeroth student’s ID number, and then we can set it to, say, 1234, and that tucks that particular number away in the variable called id, which is part of this structure. We can go one step further and say students[0].name gets GetString(), right? And without doing any error checking, I’ll just let the user type in the name, even though that might not necessarily be a safe thing if something goes wrong.

All right, but let’s take a look at this in action. In structs1, we have the following code. Notice atop this file I’m including structs.h, so already I’m sort of adhering to this principle of factoring out some code, like data structures or other constants. Except for this one, STUDENTS, which is defined here as 3, right? I just wanted a quick and dirty example, so
I hard-coded the value 3 as the value of STUDENTS. All right, so this is declaring a class of students, so to speak. The goal of this program is to model all of the students in a class. It’s a small class, only three students, but that is equivalent to what I just did here for, of course, 20 students instead. Now what am I doing here? Well, this thing’s an array at the end of the day, this whole class, so I can just iterate over it, and for each of these students’ IDs and names and houses I can just ask the user to provide them. Then by the end of this loop I have an array of three students, each of whose three members is populated with everything the user typed in. So the code is very simple in concept, and the only pieces of new syntax we’re seeing are this so-called dot notation and the ability to declare the struct in the first place. In fact, notice that in structs.h is that data structure. It’s common practice, though not required, to put data structures in a header file, especially if it makes your code more readable, because it makes more obvious where some of your data structures are defined.

And now, just for kicks (I needed an arbitrary example to illustrate the syntax), I wanted to print out any student who happens to live in Mather House. So I’m iterating again over the array of students, and then what’s this line of code presumably doing? Most of you probably haven’t used this function before, but it’s documented in that website we keep referring you to, and it’s pretty straightforward. Yeah, exactly. So if the value of the ith student’s house, which is a string, is the same as, quote unquote, "Mather", capital M-a-t-h-e-r... It turns out that this function, strcmp, if two strings are equal, returns 0, because that means they are truly equal. If instead one is greater than or less than the other in terms of lexicographic order, that is, dictionary order, it will return instead a negative number if one belongs before the other, or a positive number if it belongs after the
other. So you can look at the documentation for specifics, but that’s the idea: it does a comparison of the strings. Why couldn’t I just do class[i].house == "Mather"? Yeah, exactly. We’d be comparing the pointer values, which is not the goal; we want to compare the things character for character. And then finally, notice that what I’m doing in this last line is evidence of good practice: once you’re done using memory that you have yourself

allocated, which in this case you have, because you called GetString, which you now know does call malloc, you have to free every one of the strings. We don’t need to free an int, because that’s just a primitive; it wasn’t allocated with malloc. But anything that’s a string or a dynamically allocated array you need to free up ultimately, so that your system doesn’t eventually run out of memory. Now, the bit of a white lie there is that when your program terminates, it frees up all of its memory anyway, so it’s sort of a moot point for these simple examples. But I think I mentioned weeks ago: have any of you ever had the experience where you’re running your computer without rebooting for maybe five days, a week, two weeks, and just gradually the damn thing’s getting slower and slower and slower, even if you’ve quit all of your applications and nothing appears to be running? One of the explanations for that is that whoever wrote one or more programs that you were running at some point during the week had memory leaks, where they called malloc or the equivalent again and again and again and never got around to calling free. Either because of that, or because the operating system itself didn’t properly terminate the process and didn’t free up that memory, your computer thinks it’s out of RAM. It thinks RAM is filled with programs and data even though none of it’s in use anymore. So why does a reboot fix that? Well, it just restores the computer to its default state, in which most of the memory is free for you.

All right, so let’s run this just to make it clear. I’m going to compile structs1. OK, compiled successfully. I’m going to run structs1. All right, student’s ID: I’ll give it a 1, call him David, Mather. All right, 2, Joe, Quincy. 3, Jill, Pfoho. OK, so David is in Mather, so the code does appear to work. But let’s see if we can break it. So student’s ID will be 1. You know, I’m not even going to bother typing a name; I don’t have time for this. Instead I’m going to pretend like I typed
nothing. And as you saw, I think, in Problem Set 3, if you hit Ctrl-D, that’s like saying "end of file" to a program. So for those of you who worked on Problem Set 3’s Fifteen with our test inputs, 3×3 dot whatever and 4×4 dot whatever, and were piping those files into your program: well, the only way the program knows that those files have ended is that the very last thing that happens, when you send one file as input to a program, is that the end essentially triggers this special marker, EOF, meaning here’s the end of the file. You as a user can mimic that same idea and pretend like the file that’s feeding this program is done by hitting Ctrl-D. And notice what happens if I hit Ctrl-D: it’s just proceeding, but then breaking. Because what did I never have throughout my program? I never mentioned NULL, and in particular I never checked for NULL. I got lazy with this example and didn’t check the return values of GetString. In structs1, recall that what I was doing constantly was making this leap of faith and just assuming whatever the user typed got put into the array and got returned by its address. But again, that might not be the case.

Yeah? Hmm, good question. So let’s run it. So structs1: 1, David, mather. OK, 2, Joe, Quincy, and... all right, it doesn’t really matter. So apparently no one’s in Mather this time, because the comparison is case-sensitive. If you wanted to fix that, then you’d need to either implement yourself the idea of lowercasing the string or uppercasing the string, or call a library function that might do that for you, like toupper or tolower. I think you’ve seen some of those even if you haven’t used them.

All right, so now a cool feature, and this one’s going to be very useful for Problem Set 4, which obviously is going to have you decoding what’s in that sort of non-Magic Eye puzzle. There’s also a second piece, recall, to Problem Set 4. In fact, I went around campus earlier this week with an expert photographer friend. We had a
gigabyte CompactFlash card in his nice digital camera, and we took photographs of identifiable but non-obvious locations on campus. They’re all in here. Unfortunately, I’m an idiot and I formatted the thing. So your task for Problem Set 4, besides that non-Magic Eye puzzle, is to recover as many of the photos on this CompactFlash card as you can. I haven’t retaken photos using the same card, so hopefully the bits that constitute all of my photos are in fact still on this thing. Since obviously there’s one of these and 300 of you, you’re not physically going to get this, but what we did was make a forensic copy of this CompactFlash card, so that what you will get is a big file on nice.fas that’s identical to the contents of this CompactFlash card as they are now. That way you can then poke around the insides of this file and recover as many JPEGs as possible. And the icing on the cake, so to speak, for that part of
the problem set is that you’ll have an extra week after submission time to, with zero or more other students in your section, locate as many of those locations that we shot on campus, take a photo of yourself or someone in your section with your camera phone or your digital camera, and email it to a certain address. And the section that identifies the most photographs on the CompactFlash card will have a nice night out on the town with your Teaching Fellow some evening soon. So there’s the fun challenge. So how do we get there? Well, we need to be able to manipulate files. So file I/O is actually not all that complicated. Thus far, any time you’ve been getting input from the user, essentially C treats the keyboard as a file called standard in, or standard input. You don’t often see this written because it’s just assumed. But similarly, when you print something to the screen, C essentially behaves like you’re printing to a file; that file just happens to be called standard out, or stdout, or standard output. That is just a sort of imaginary file whose destination isn’t the printer, it isn’t the disk itself, but it’s the screen. So thus far you’ve essentially been familiar with these concepts, but there are functions that specifically let you name files and open files from the local hard drive or from nice.fas. So let’s take a look now at structs2.c. Notice that this thing, too, makes use of structs.h; so structs.h again defines a student structure. The goal of this program now is to have whatever data I provide persist after the program ends. Thus far, most if not all of you have written programs where, the moment they quit, any work that the program did, any data that it created, is lost, because you didn’t save it anywhere, and RAM’s contents obviously are ephemeral: they disappear, effectively, when your program quits. Now, if you have access to file I/O, I/O meaning input/output, if you have the ability to write to disk, obviously your data can persist. So how do we do this? Well, the top
of this file is identical to before: I’m just iterating over the class and I’m getting an ID, a name and a house for each student, and again I’m being lazy and not checking for NULL for this purpose. Now I’m going to go ahead and print out anyone in Mather, so that code is identical. But I’m going to go ahead now and save these students that have been provided to disk, just so that later I can read them in, or at least I can reference it as though this were a database that I’m accumulating on disk. Well, how do we do this? Well, the first line of code that’s relevant is this blinking one, and notice that we couldn’t really express file I/O until we got to pointers, for this reason. When you call the function fopen, you first pass to it the name of the file you’d like to open, and then you pass a string that represents the mode that you’d like to open it in; popular modes are going to be w for write or r for read. So in this case I’m obviously opening a file called database for writing, and because I didn’t specify any slashes, any path, it’s going to assume my current working directory, wherever I am, so maybe your ps4 directory. OK, the next line of code is more important these days than ever, because I do need to check the return value of fopen as to whether it’s NULL, because if it is NULL, that means something went wrong. Maybe there’s no space left on the system, maybe I can’t overwrite a file that’s still there, or maybe I just don’t have permissions to write to the directory I am in, because maybe I cd’d to someone else’s directory and I don’t have write access there. Whatever the case, you need to check for this. But assuming it’s not NULL, that is, fp, file pointer being the convention here, is a pointer to essentially that file, not so much on disk but in memory currently, because fopen sort of gives you access to that file in RAM, effectively. But what I’m going to do now is iterate over my class, and I’m just going to print every field from the student objects into that file, not using printf
but using fprintf, which is different only in that the first argument has to be a pointer to the file that you want to write to. But notice, if I instead change this to stdout, that would be equivalent to writing that with printf, so that’s why this isn’t all that dissimilar to what you’ve been doing thus far. Finally I get down here and I call fclose on the pointer, which closes the file, which means I’m done; it’s like quitting Microsoft Word when you have a file open. Now I go ahead and free memory as before. So the only new code here is these lines here, which I’d argue are pretty straightforward once you understand that they exist and what they’re supposed to do. And ideally, now, in my current directory I should have a file called database that has all of those strings. So let’s run it. So make structs2; we’ll run structs2. All right, student’s ID, David, Mather; let’s do, oh, whatever, you get used to typing [unclear] a lot, so that’s what happens; Jill will be
in Quincy; 3 will be Jack this time, in Pfoho. OK, so no one’s in Mather; nothing appears to have happened. But if I do an ls, there’s a lot of stuff in here, but notice that there is now a new file called database. So I’m going to go ahead and open that with nano, and notice that inside of this file is, line by line, all of the data that the user provided. Now this sort of doesn’t get the whole job done, because I don’t seem to have any program in here that actually reads this data in. But it turns out you can do that as well. In fact, just like there’s sscanf and scanf, guess what also exists: fscanf, which allows you to read from a file line by line. Now here’s an interesting thing to note: we’ve just written out a so-called ASCII file, a text file. Problem set 4 is going to involve binary files, the difference being that ASCII files are characters (alphabetical characters, numeric characters, punctuation), whereas binary files, underneath it all, are just zeros and ones. So the syntax will be slightly different in terms of the mode in which you need to open the file: you’ll open it in binary mode as opposed to the default ASCII mode, but the spec will walk you through that. And now the domain in which you’ll be playing for problem set 4. So perhaps the coolest job I’ve ever had before this one was working, while I was a grad student, for a few months as a forensic investigator for the Middlesex County DA’s office. My research in grad school was predominantly in security, and so this job essentially involved working with their full-time forensic investigator; just down the road near Lechmere is where the DA’s office is. And the job for me and for my mentor was to receive on a daily or weekly basis hard drives and flash media and other hardware that the Mass. State Police or local sheriffs had confiscated during executions of search warrants. They would dump these things off in the office and typically say, find evidence on this media. What was kind of
funny was that, and they were wonderful people to work with, but sometimes what would also be dropped off were things like mice and monitors, and there’s certainly not all that much evidence inside of a mouse or a monitor. But that is OK; we had a big storage area. But it was fascinating work, because literally we would be writing tools or using tools to unearth on people’s hard drives files that they hoped were no longer there, files that they had deleted. Because the funny thing about computers, and most of you have probably heard this already, is that when you delete a file by dragging it to your Recycle Bin or the so-called trash can, and then even empty the Recycle Bin or trash can, what happens? Yeah, I mean, it’s still on the hard drive. So in short, not much at all happens when you delete a file. In fact, the fact that I formatted my CompactFlash card accidentally really means nothing, because even formatting a flash drive or formatting a whole hard drive rarely means eliminating the data. It typically means writing a few bytes, a few kilobytes’ worth of data structures, to the very front of the hard drive, the very beginning of the hard drive or flash media, just to sort of lay the foundation for what’s called a file system, like NTFS or FAT32 or HFS, if you’re familiar with those acronyms. So the rest of the data is still kind of there. Effectively, what happens when you start saving files to a disk, and this is sort of file systems, or CS 161, in a nutshell: you have these things called file allocation tables, FATs, and also directories. In fact, a directory as you all know it is a folder that you double-click on your hard drive and such. It looks like a folder, it looks like a directory, but what that technically means is that that directory is implemented in memory as a table. So there is, somewhere in RAM or somewhere on disk, essentially something like an Excel spreadsheet that has at least two columns, the first of which specifies a file’s name, the second of which specifies a file’s location on
disk. So if you have a file like your resume somewhere on your system, aesthetically it looks like it’s in this nice, pretty, double-clickable folder, but what that folder really represents is something like this. And underneath the hood it’s saying that resume.doc, if it’s in this directory, is actually at location FFBB8-something, not in RAM but on disk. So you have this directory that maps filenames to addresses. If we have another file like essay.doc, this might be at 0xFFAA32, somewhere completely different. Moreover, it’s even possible for your files to be spread out among multiple locations, right? Imagine a scenario where, over the course of a week or a year, you’re saving lots of files. Well, then you delete some of the files in the middle, so to speak, of some of those other files, and then you save some new files. Well, your operating system, Mac OS, Windows and whatnot, is going to try to use that available space,
so sometimes, to fit your files onto your hard drive or flash media, it will fragment them, putting parts in the free locations and then the rest wherever it can find space, so that you’re not wasting space. So again, in a nutshell, if any of you have ever been told to, or have, defragmented your hard drive, all it means is to make sure that all files are completely contiguous: you move their parts around so that all of their pieces are back to back to back. I’ll also say, as today’s CS50 lesson, it is of questionable performance value to bother defragmenting your hard drive. So if any IT guy ever tells you, go defragment your hard drive, odds are that’s not going to fix much of anything. There are other bottlenecks in computers today, and hard drives are big, 200 gigabytes, 400 gigabytes; fragmentation of files is not as much of an issue these days. But enough on that. When you drag essay.doc to the Recycle Bin and empty the Recycle Bin, what happens? The file’s no longer associated with the location, which just means that this goes away. But guess what: on disk, which we’ll represent as one of those platters from, say, day 0 of week 0, this is still an address on disk. So somewhere on disk is 0xFFAA32, and let’s say that it’s this part of the disk, so that’s referring to this location on disk, and you have in here, obviously, a whole bunch of zeros and ones. It looks like a slice, doesn’t it? That’s kind of how it works. So what’s actually happening when you delete a file? Nothing on this side of the world. The only thing that typically happens is that the directory entry gets deleted, so the computer forgets where the file is. But all those zeros and ones are still there, all those web pages you visited are still there, all those URLs you visited are still there, even when you empty your cache or empty your Recycle Bin. So one of the challenges for problem set 4 is going to be to exploit this reality in today’s file systems, for CompactFlash cards and also for the real
world hard drives and other such devices: that all of my JPEGs hopefully are still on that CompactFlash card, and ideally on your forensic image, and problem set 4 will empower you to go recover them for us. So that said, we’ll see you on Monday.
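
As a recap of the file I/O walked through in this lecture, here is a minimal sketch in C of the same sequence: open with fopen, check for NULL, write each field with fprintf, close with fclose, then read the data back with fscanf. It is not the lecture’s structs2.c. The student type, the function names save_students and load_students, and the 31-character limit on names are all assumptions made for illustration, and the sketch assumes names and houses are single words with no spaces.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_LEN 32

/* Hypothetical record type for illustration; the lecture's own
   student structure is defined in its header file. */
typedef struct
{
    int id;
    char name[FIELD_LEN];
    char house[FIELD_LEN];
} student;

/* Writes one field per line to the file at path.
   Returns 0 on success, 1 if the file could not be opened. */
int save_students(const char *path, const student *students, int n)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
    {
        return 1; /* e.g. no permission or no space left */
    }
    for (int i = 0; i < n; i++)
    {
        /* same as printf, except the first argument names the file */
        fprintf(fp, "%d\n", students[i].id);
        fprintf(fp, "%s\n", students[i].name);
        fprintf(fp, "%s\n", students[i].house);
    }
    fclose(fp);
    return 0;
}

/* Reads up to max records back from the file at path.
   Returns the number of records read, or -1 if the file
   could not be opened. */
int load_students(const char *path, student *students, int max)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
    {
        return -1;
    }
    int n = 0;
    /* %31s leaves room for the terminating '\0' in a 32-byte field */
    while (n < max &&
           fscanf(fp, "%d %31s %31s", &students[n].id,
                  students[n].name, students[n].house) == 3)
    {
        n++;
    }
    fclose(fp);
    return n;
}
```

A caller would fill an array of student records, pass it to save_students along with a filename such as database, and later pass the same filename to load_students. Both functions report failure through their return value rather than assuming fopen succeeded, which is the check the lecture insists on.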