Ep. 11: Yandex Practicum Lesson 4.66: Group Analysis 3

What's up guys, and welcome back to the tutorials. We're going to continue with module four like we did last time and just pick up where we left off. Looks like we're starting on the in operator, so here we go, let's start it.

Okay, the field genre in our table has the list type. Let's try to figure out how to filter the table based upon that. The problem is that fields with the type list can contain a multitude of values and therefore cannot be tested mathematically. That is why Python has a special in operator. You can see in this example here, 'drama' in this list: drama is one of the elements in the list, so it's returning True for that one. Then it's looking for 'comedy' in this list, and since you only have drama, crime and history, comedy is not in that list, so it's returning False. It's read as "the element drama is in the list sci-fi, drama." This statement can only have a value of either True or False. This means the in operator and conditional constructions naturally complement each other. So you can do: if 'crime' is in the Birdman genres, print "Birdman has the genre crime"; otherwise don't.

Okay, so let's filter this table by genre. In order to do that, write the function filter_by_genre. It should take two arguments: the data table and the name of the genre. It should return a completely new table that includes the films from that specific genre. Once you've written out the function logic, call the new function and pinpoint every melodrama film.

Okay, so we're just going to repeat what they had before. We're going to check every element in the list, and we're going to see if genre is in element. It's going to loop through every list in the table, so it might grab this whole Rain Man row, and it's going to ask whether a certain genre is in this element. We don't actually want to check the whole row, we want to check the genres list, which is at the 0, 1, 2, 3, 4 index, so we should do element[4]. So it's going to look at that list of genres and check to see if the genre that we give it as a parameter is in that list.

Okay, so if the genre is in that element, we're going to add it to what we're going to return. So we do to_return equals an empty list. Since this is Python, I should use the underscore notation like that; that's the variable name. Then we're going to do to_return.append. I think it wants the whole row, right? Let's see: it should return a completely new table that includes films from that specific genre. Yeah, I think it wants the whole row, so to_return.append gets the whole element. Then, when we've looped through all of them and checked whether they're in that genre, we return the list that we've created, which is to_return.

Then they want us to print the table and get the melodrama category. So we do filtered_data equals filter_by_genre, and we send it the Oscar table. Let's see what the table is called: oscar_data. And the genre that we want is melodrama. Okay, so let's play that and see if we get some results. It doesn't say to sort it, I don't think, so we'll just check that. Correct. Next.

Some genres are more common than others. Let's try to find the most popular ones. Create a table and save it in the variable genre_counts. It should have two columns: genre name and number of films. We've collected all the genres in the variable all_genres. First fill in the table with the data, then sort by the number-of-films column in descending order. The code to process data in the following format is already written. Okay, so they give us the whole table, they give us this function that's going to filter by genre, and they give us this all_genres list. We create a table, a list of lists, and for each genre in all_genres we're going to get the data for just that category, which is filter_by_genre: we send it oscar_data and the genre we want, which is genre in this case. That's how we get the filtered data for just that genre category.

Then we're going to calculate the length of the filtered table, so we just do len of filtered_data. Then add the genre name and the result of the calculation to the table; to do this, use the function append. What's the table called that we're adding to? genre_counts. So we do genre_counts.append. We actually want to append the whole list, so we need a square bracket, then the genre as the first column and the count as the second column.

Now sort the table in descending order. This goes outside the loop, because we've gone through all the genres already and now we're ready to sort. So we do genre_counts.sort where reverse is equal to True. Then I think we should print it, so let's run it. Oh, we didn't give it what key to use, so it'll probably sort by the name. I also have a typo, it looks like, in genre_counts. The key we're going to use is again a lambda: lambda row: row[1], because 0 would be the genre and we want to sort by the count, which is the one index. And it's key equals instead of key colon; my bad, typos. Now it looks right. It looks like drama is number one by a long shot. Correct. Next.

Multiple functions within a program. We've spent a lot of time writing functions; time for us to use them. Python allows programmers to use several functions in a single program. In the last lesson we obtained the following results, and they give us our table again. We're only going to focus on analyzing the five core genres; the rest are either too rare or too dominant. Now let's figure out which characteristics correlate with each of the five genres. We already have functions that create supplementary columns for return on investment and cost per minute. Now let's add them to the output table to organize the genres better. So we call add_roi and then add_cost_per_minute, passing the oscar_data output table to both functions as a parameter. Then calculate the rating, running time, ROI and cost per minute mean values for every genre, using the column_mean function to do so.

Okay, so here they've got all the functions that we've already written in previous sections. The first thing we need to do is add the columns for ROI and cost per minute of the film to the table, using add_roi and add_cost_per_minute. Let's see what add_roi takes: it just takes the Oscar data, it looks like. Then add_cost_per_minute of the Oscar data. Let's make sure that it only has one parameter. Yeah, add_cost_per_minute takes just the data.

Then, for the genre means, filter the table by genre. To filter by genre we call filter_by_genre, sending it the Oscar data and the genre that we want, which is genre. We need to store this in something, so we do filtered equals filter_by_genre. Then calculate the filtered table's means. There should be a function for that, right? Yep, column_mean. We send it filtered, and the column number is 2; they give that in the comment. We do the same thing for length here: column_mean of filtered and 3, then column_mean of filtered and 7, and column_mean of filtered and 8. Then it should append all of them and print it. Let's see if that works. Seems about right. Are we supposed to sort it or anything? Doesn't look like it, so I'll just check it. Correct. Next.

We obtained the following results. Melodramas are the most expensive and the most profitable films. They feature the biggest stars, making them expensive to film as well as raking in enormous revenues at the box office. With its massive budget and box-office gross, Titanic had a gigantic influence on the statistics overall. Crime films and thrillers are the shortest and least profitable; they just can't handle the pressure for too long. History films are the least expensive to produce. It's also possible that history films have a more reliable fanbase.

So we continue with lesson 12, the not operator. I think I used this in a previous lesson that I got wrong, but I think that was because of the weird wording; we'll see. Our list only includes one film that isn't a drama. Let's try to dig it up; the not operator will help us accomplish this task. It can reverse any given condition by changing True to False and vice versa, so you can see not True is False and not False is True. For example, we can rewrite the function that checks whether a film is new, meaning released after 2007. First it's going to check this condition: is the year greater than 2007? Let's say it's 2008. Normally 2008 would return True, because 2008 is greater than 2007, but you have the not here, so it actually returns False, and so you get old films. It's like how people say you shouldn't use two negatives, as in "that ain't not true," where they use not twice, which actually means it is true because it's a double negative. It's that sort of thing.

Okay, so finish the function filter_without_genre. It should toss all the films of a specified genre from the table to the cutting room floor. The arguments of this function are the table of films and the genre that needs to be excluded. You don't have to change the output data; just let the function produce a new, updated table. So we scroll past the initialization to filter_without_genre. Let's see: if genre in genres, and where does the not go? I guess you put the not before the in. So we check that the genre is not in genres, that's the first check right here, and if it's not in there then you append the element. Let's see if that's right. They said there's only one film, right? Looks like it's Chicago. Correct. Next.

The and and or operators. There are two additional operators used for building more complex conditions in Python: and and or. The and operator combines two conditions into one, only returning True when both of the included conditions have been met. So True and True returns True, True and False returns False, False and True returns False, and False and False returns False. Let's use this operator to find every historical film released after 2000. You see they have to check whether it is history and also whether it was released after 2000, so they have the and operator in there. You could potentially just do two if statements nested within each other and you wouldn't need the and; the and just makes it a little bit faster to write, because you don't write two different if statements.

The or operator also combines two conditions into one; however, the structure will return True when at least one of the specified conditions has been met. So True or True returns True, True or False returns True, False or True returns True, and False or False returns False. Only if both of them are False will it return False. Now let's search for every fantasy and sci-fi film in our table; they have the code for that there.

So now they want us to write the function filter_suspense, which will compile the most suspenseful films, meaning those of the thriller or horror genres. The function should accept the source table as an argument and return it filtered but unchanged. Okay, so filtered equals a new list, and then for each element in oscar_data we have a whole row, so we need to grab the genres, which is element at some index. Let's see which index: zero is the title, one is the year, two is the rating, three is the length, four is the genres. Then, if thriller is in genres, and I'll put parentheses around this just to make it clear, though you don't have to, that's the first condition, or if horror is in genres, then we want to append the element to filtered. Then we return filtered.

Okay, let's see if that works. Check it: "Make sure the filter_suspense function runs when the right condition is met: thriller and horror genres." Okay, so I'm using the wrong keyword here; we should have and, not or. It's like only Silence of the Lambs, huh, interesting.
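The filter_by_genre, genre_counts and column_mean steps walked through above can be sketched roughly as follows. This is a minimal sketch, not the course's own code: the sample rows and their numbers are made-up placeholders, the shortened row layout [title, year, rating, length, genres] leaves out the ROI and cost-per-minute columns (indexes 7 and 8 in the video), and column_mean here is an assumed stand-in for the course-provided helper.

```python
# Hypothetical sample rows in the layout described in the video:
# [title, year, rating, length, genres]. Ratings and lengths are
# illustrative placeholders, not the course's dataset.
oscar_data = [
    ["Rain Man", 1988, 8.0, 133, ["drama"]],
    ["The Silence of the Lambs", 1991, 8.6, 118, ["crime", "drama", "thriller"]],
    ["Titanic", 1997, 7.8, 194, ["drama", "melodrama"]],
    ["Chicago", 2002, 7.1, 113, ["comedy", "crime", "musical"]],
]


def filter_by_genre(data, genre):
    """Return a new table with only the rows whose genre list contains genre."""
    to_return = []
    for element in data:
        # element[4] is the list of genres; "in" tests membership in that list.
        if genre in element[4]:
            to_return.append(element)
    return to_return


def column_mean(data, column):
    """Assumed stand-in for the course helper: mean of one numeric column.

    Expects a non-empty table.
    """
    total = 0
    for row in data:
        total += row[column]
    return total / len(data)


all_genres = ["drama", "crime", "melodrama", "comedy"]

# Build [genre name, number of films] rows, then sort by the count column.
genre_counts = []
for genre in all_genres:
    filtered_data = filter_by_genre(oscar_data, genre)
    genre_counts.append([genre, len(filtered_data)])
genre_counts.sort(key=lambda row: row[1], reverse=True)

# Build [genre name, mean rating, mean length] rows (columns 2 and 3).
genre_means = []
for genre in all_genres:
    filtered = filter_by_genre(oscar_data, genre)
    genre_means.append([genre, column_mean(filtered, 2), column_mean(filtered, 3)])

print(filter_by_genre(oscar_data, "melodrama"))
print(genre_counts)
print(genre_means)
```

Passing key=lambda row: row[1] is what makes the sort use the count rather than the genre name, which is the fix made in the video.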

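As a rough illustration of the not and or exercises, here is a minimal sketch of filter_without_genre and filter_suspense. The sample rows are made-up placeholders in the assumed layout [title, year, rating, length, genres], and both functions read their data parameter rather than a global table.

```python
# Hypothetical sample rows: [title, year, rating, length, genres].
# The numbers are illustrative placeholders, not the course's dataset.
sample_rows = [
    ["Rain Man", 1988, 8.0, 133, ["drama"]],
    ["The Silence of the Lambs", 1991, 8.6, 118, ["crime", "drama", "thriller"]],
    ["Titanic", 1997, 7.8, 194, ["drama", "melodrama"]],
    ["Chicago", 2002, 7.1, 113, ["comedy", "crime", "musical"]],
]


def filter_without_genre(data, genre):
    """Return a new table without the rows that belong to genre."""
    filtered = []
    for element in data:
        # "genre not in ..." gives the same result as "not genre in ...".
        if genre not in element[4]:
            filtered.append(element)
    return filtered


def filter_suspense(data):
    """Return a new table with the rows tagged thriller or horror."""
    filtered = []
    for element in data:
        genres = element[4]
        # "or" is True when at least one of the two conditions holds.
        if ("thriller" in genres) or ("horror" in genres):
            filtered.append(element)
    return filtered


print(filter_without_genre(sample_rows, "drama"))
print(filter_suspense(sample_rows))
```

Both functions build and return a new list, so the table that was passed in stays unchanged.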
Reading the task again: the most suspenseful films, meaning those of the thriller and horror genres. The function should accept the source table as an argument and return it filtered but unchanged. That took a second, but I figured out the reason it wasn't working. For one, I had the list up here as oscar_data when it should have just been data. We shouldn't be using the global variable; we should be using the parameter that they send us. So originally I had oscar_data here, and now it's just data. In addition, it was supposed to be or instead of and. So I switched the and back to or, ran it, and it says I completed the task. But again, I think their description is a little weak there, and that's why we couldn't get it right off the bat.

Okay, so now we go to the second question. To analyze films by their years of release, we must first learn to delineate ranges from the table, for instance from 2000 to 2010. Write the function filter_first_decade to filter the table by the aforementioned condition. The function should accept the source table as input and return it filtered but unchanged. So we go up here and do the same thing we did before: filtered equals an empty list, and then for each element in the data we grab the year. Zero is the title and one is the year, so the year is element[1]. We check if the year is less than 2010 and the year is greater than 1999, then we append the element with filtered.append, and then we return filtered.

Okay, so if we play that and check it... okay, so it's probably supposed to be less than or equal to, or something like that. Let's see if they give us an example here: 2000 to 2010. So maybe greater than or equal to 2000, and it has to be less than 2010, because 2010 would be the second decade. Let's try that and see if that fixes what they're not happy about. Nope. I just played around with the greater-than-or-equal-to and less-than-or-equal-to conditions, and this was the one that worked. It doesn't make a lot of sense to me, because 2010 seems like it's in the second decade, so I don't get that one, but that's what they wanted. So, next.

Okay, abbreviating conditions. Before moving on to analyzing the films by year, let's take a look at some of the convenient ways to abbreviate our code. First, let's try to write a function that will place films into one of three categories: short, medium and long. This code has a nested condition, and we can simplify it using the keyword elif. You can see the elif condition lets you basically do a third condition on top of just if or else. And the condition "length is less than or equal to 130 and length is greater than 120" can also be abbreviated as "120 is less than length, which is less than or equal to 130." In the end our function is going to look like this.

Okay, so now they want us to write filter_by_years to select films by year. It should return a table of films released within a specified year range. The film data and the limits of the set period serve as input arguments to the function. Films released in the last year of the period should be excluded, whereas those released in the first year should be included. This is done to ensure that no periods overlap. Use abbreviated coding for your conditions.

Okay, so we do filtered equals an empty list, and then for element in data, so we're going through each row of the table. We get the year for it, which is the one index, so it should be element[1]. Then, for the condition: films released in the first year should be included, so begin is less than or equal to year, which is then less than end. We can put spaces around the operators so it doesn't look so clunky. If that condition is met, we append the element to filtered, and then we return filtered. Okay, so if we play and check that... oh, I see, a typo somewhere; there should be an L here. Okay, check that. Correct. Next.

Now let's pull everything together. We obtained the following outputs. It looks like film production is in a downward spiral. However, that assessment would be short-sighted, because we haven't properly considered the context of the time period and the market economy. A more accurate hypothesis would be that the Academy is moving away from a preference for commercially successful films toward what are commonly referred to as festival pieces. Okay, so you see more box-office grosses from 1988 to 1998. And then they have the conclusions for the data here. Oscar films are becoming shorter and cheaper to produce, and their box-office profits tend to be dropping, as well as their ratings from viewers. The preferences are shifting towards artsy films and away from money-making blockbusters. Melodramas are the most expensive and profitable genre, stemming from its innate popularity and the outrageous cost of casting big-name actors. Titanic has greatly impacted the stats. Crime films and thrillers are the shortest films, which earn the least amount of money.

What's next? Our first business task will mirror real life: analyzing data from Yandex Music and conducting experiments for this service. You'll become acquainted with the pandas library, which will allow you to advance from the training tables. So with that, this lesson is done and we can finish this module. Thanks for watching, and see you next time.
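For reference, the abbreviated conditions from this lesson might look roughly like the sketch below. It is an illustration under stated assumptions rather than the course's solution: the name length_category and the "short" cutoff of 120 minutes are assumed (the video only states the 120 to 130 range), and the sample rows are made-up placeholders in the layout [title, year, rating, length, genres].

```python
# Hypothetical sample rows: [title, year, rating, length, genres].
# The numbers are illustrative placeholders, not the course's dataset.
films = [
    ["Rain Man", 1988, 8.0, 133, ["drama"]],
    ["Titanic", 1997, 7.8, 194, ["drama", "melodrama"]],
    ["Shakespeare in Love", 1998, 7.1, 123, ["comedy", "drama", "history"]],
    ["Chicago", 2002, 7.1, 113, ["comedy", "crime", "musical"]],
]


def length_category(length):
    """Place a running time into short, medium or long using elif."""
    if length <= 120:  # assumed cutoff for "short"
        return "short"
    elif 120 < length <= 130:  # chained form of: length > 120 and length <= 130
        return "medium"
    else:
        return "long"


def filter_by_years(data, begin, end):
    """Return a new table of films with begin <= year < end."""
    filtered = []
    for element in data:
        year = element[1]
        # The first year is included and the last year is excluded,
        # so consecutive periods never overlap.
        if begin <= year < end:
            filtered.append(element)
    return filtered


print(filter_by_years(films, 1988, 1998))
print([length_category(row[3]) for row in films])
```

Because the upper bound is excluded, calling the function with 1988 to 1998 and then 1998 to 2008 places every film in exactly one period.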