How fast-growing apps move quickly with Crashlytics (Firebase Summit 2019)
[UPBEAT MUSIC PLAYING]

MICHAEL REED: Hi, folks. Thanks for coming. We're here to talk about Crashlytics and how Crashlytics can help you move quickly, even as your app is scaling. I'm Mike Reed. I'm a developer on the Crashlytics product.

SHOBHIT CHUGH: And hi, I'm Shobhit Chugh. I'm a product manager for Crashlytics.

To compete, you must move fast. You must release features as fast as your competitors. The stakes are high. Now, if you're a fast-growing app, you might remember the early days, when you had just started off. You started with an idea, you ran some experiments, you started to get some initial traction, and then the idea started to catch on. You got more users, more funding, more revenue. You hired a bigger team. But with this growth comes competition, and you're no longer the only one in the market. So to compete, you must keep moving fast. But as you grow, you often become more risk averse.

We've found four factors related to app stability that cause teams to slow down as they scale. Let's look into them.

First of all, even small issues can affect a lot of users. Crashes have a real downside: we have found that, when leaving one-star reviews, Google Play users often mention stability issues or bugs. So you might be tempted to slow down as a result and try to fix every single issue. How do you make sure you keep an eye on important issues and keep moving fast?

Second, you face more edge cases. Once you have more functionality, a more complex code base, so many users, and such a variety of devices, a lot of edge cases start to emerge. You might never have seen these edge cases when you were building the app or testing it. So how do you make sure you quickly triage, reproduce, and fix these edge cases so that you can keep moving fast?
Third, the effort it takes just to release and monitor goes up dramatically. You might spend a lot of time just keeping an eye on your test, alpha, beta, gamma, theta, and production releases, and it can become very onerous. So how do you make sure it does not become onerous, and you keep moving fast?

And last but not least, the number and variety of issues that impact your app's stability increase. Your issue list can look like an endless to-do list, which is what my days look like most days. But let's talk about Crashlytics instead. So how do you make sure you don't react to crashes one by one, but instead start systematically addressing root causes, so that you can keep moving fast?

So here we are. You've grown. You're a successful app, but you've started to move slowly. Now, you might think of Crashlytics as a place where you come to see how bad things really are. But what if you used it as a tool to ship faster? Using examples from real customers, we're going to tell you how to do just that.

Let's begin with the first topic. For the first problem, your tolerance for shipping [INAUDIBLE] to customers, for instability, has decreased, because even small issues can affect a lot of users. But sometimes, in spite of all your code reviews and testing, you might ship a version to production that starts crashing. And you don't want negative app reviews or customer support tickets to be your first sign that something's really broken. Instead, you want to know immediately if any issue has become a major concern, so you can do something about it: fix the code, turn off an experiment, roll back the release. But also, since you have so many different issues to deal with, so many different crashes, you want to be notified at just the right time. For example, for low- and medium-priority issues, you might just want to monitor passively. But for high-priority issues, you might want to be notified immediately, or even paged, if needed.

Crashlytics offers three kinds of alerts that map to this priority order. There are new issue alerts, which fire when a new type of crash occurs for the first time. These are low-priority at that point; we've just seen the crash for the first time. Then there are regression alerts. You fixed a bug and marked it as closed, and then you ship a new release, and the app starts crashing again in the new version. Those are low- to medium-priority. And then there are velocity alerts, which are a true signal that something is significantly wrong in your app.

Let's see how they work. Crashlytics monitors every single issue in your dashboard to see what percentage of user sessions it is affecting. When a crash starts to affect more than a certain percentage of users, we raise a velocity alert. By default, this percentage is set to 1%. That's a significant issue. Earlier this year, we also launched the ability to fine-tune this threshold, so that you can change it based on your app size, your workflow, and your particular needs.

Now, let's see how a customer uses these alerts to keep moving fast. Swiggy is one of India's largest food delivery services, and if the app crashes, users cannot order food, and Swiggy loses money. Speaking of food, I'm starting to get a little hungry. The Swiggy team wants to know about all issues, but they want to focus on the most impactful issues first. So in order to do that, they connected velocity alerts to PagerDuty and JIRA, which are out-of-the-box integrations that Firebase offers. This way, if something is truly wrong, the on-call person will be notified, woken up in the middle of the night if necessary, and they will have a JIRA issue waiting for them to investigate.

In addition, Swiggy also wanted to make sure that their engineers were kept in the loop for lower-priority issues. That is why they connected new issue alerts, as well as regression alerts, to rooms in Slack. They created separate rooms for their iOS and Android teams so that the teams can passively monitor, triage, and pick up any crashes, especially ones related to code they own and might have changed recently. This allows Swiggy to keep shipping fast with the confidence that they'll be notified in the right manner, for high-priority crashes as well as for low-priority crashes.

Now, that was keeping an eye on issues even as you ship fast. To tell you about edge cases, here's my teammate, Mike Reed.

MICHAEL REED: Thanks, Shobhit. So let me make sure I get this straight: the velocity alerts become pages, and the new issues go to the Slack room.

SHOBHIT CHUGH: You catch on fast, Mike.

MICHAEL REED: I think, for my temperament, I should hang out in the Slack room. I don't like to be woken up at night.

SHOBHIT CHUGH: As long as you don't send me any messages, you're cool.

MICHAEL REED: Deal.

Handling edge cases. As your user base grows, your users are going to experience bugs that you did not discover during development. We've all heard this joke, right? "Works fine on my machine." But usually it's not a joke, right? I bet we've all spent countless hours trying to debug issues without enough information. I know I've found myself wishing I could meet my end users. I'd ask them questions: What were you doing? What steps had you taken prior to the crash? Were there items in your shopping cart? But I can't meet every end user, and I can't ask them all these questions. So how am I going to debug the issue? Well, we want to know what caused the crash, and we can see this through the stack trace, right?
It shows us which line of code caused the crash. But stack traces aren't the whole story. What's also important is the state of the application and the steps taken leading up to the crash, and Crashlytics can help gather this data. We have three tools that help developers capture the state and sequence of application usage prior to a crash. So let's see how we can use these tools to find and fix edge cases quickly.

Back to our questions for the end user. One of them was, "Did you have items in your shopping cart?" Wouldn't it be nice if we could capture this information, add it to our crash reports, and send it to our server at crash time? Well, that's a leading question, a trick question, because we can do that. You can do that. It's a feature Crashlytics calls Custom Keys, and here we see how easy it is to do. It's a function. You call the function, you name your key, and you programmatically set a value for that key. Our SDK captures these key-value pairs and sends them to our server with your next crash.

So, OK, cool. We see it's easy to instrument the code. Where does the data go? Well, the key-value pairs are visible right next to your stack trace. So now, at crash time, no more regrets that you can't meet all of your end users and ask them a hundred questions, because we've captured custom application state of your choosing, and it's available at crash time.

But how about logs? Often, we do our best debugging with logs. But they're usually a luxury available at development time, right on your development machine. That's not the case when you're using Crashlytics, because since the beginning of the product, we have supported custom logging. And here is how it's done. It's a one-liner: you call a function and you provide your log string. That log is uploaded to our servers with the next crash, just like Custom Keys. And where are these logs available?
Well, like Custom Keys, they're next to your stack trace. So now you can have your application state. You can have your Custom Logs. And, just as always, you get your stack traces, too. So debugging these edge cases, these hard-to-reproduce bugs, is getting a little bit easier.

But suppose you're experiencing crashes and you haven't yet instrumented your code with Custom Keys or Custom Logs. Well, if you integrate the Google Analytics SDK with Crashlytics, we can automatically capture predefined analytics events. We call them Breadcrumbs. With no code modification at all, they're captured, and they show the actions taken prior to the crash. Now, if you're using Google Analytics, you've probably already defined custom events with finer details specific to your application, and these are also captured. Like Custom Keys and Custom Logs, they're shipped to our server with the next crash. And the theme repeats: next to your stack trace, you'll also find your Google Analytics events.

You can use these Breadcrumbs to understand the events that led up to the crash. For example, maybe you have an event that shows the provisioning of a shopping cart, and maybe you have a Custom Key that shows how many items were added to that shopping cart. So now, rather than imagining and reasoning about what might have occurred prior to a crash, you can do what some of our other users do. They go to the device locker. They grab the device. And they use the Custom Keys, the Custom Logs, and the Google Analytics Breadcrumbs to repeat the steps the user took prior to the crash. It helps them reproduce these bugs.

So I hope you'll give it a try. Instrument your code with Custom Keys. Add Custom Logs. Integrate the Google Analytics SDK with Crashlytics. And then we can help with your debugging efforts.

SHOBHIT CHUGH: So, Breadcrumbs. I mentioned I was feeling a little hungry earlier, and those Breadcrumbs sound really yummy right now. Can you get me some of those?
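To make the three kinds of crash context concrete (Custom Keys, Custom Logs, and Breadcrumbs), here is a minimal Python sketch of the state a crash reporter might bundle next to a stack trace. It is a hypothetical model, not the Crashlytics SDK: the `CrashContext` class, its method names, and its size caps are illustrative assumptions. In the current Android SDK, the real calls are `FirebaseCrashlytics.getInstance().setCustomKey(key, value)` and `FirebaseCrashlytics.getInstance().log(message)`, and Breadcrumbs come from the Google Analytics integration rather than from app code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CrashContext:
    """Hypothetical model of what a crash reporter attaches to a crash.

    Not the Crashlytics SDK: the class, method names, and caps are
    illustrative assumptions.
    """

    max_keys: int = 64               # assumed cap on distinct custom keys
    max_log_bytes: int = 64 * 1024   # assumed cap on total log size
    max_breadcrumbs: int = 20        # assumed number of recent events kept
    keys: dict = field(default_factory=dict)
    logs: deque = field(default_factory=deque)
    log_bytes: int = 0
    breadcrumbs: deque = field(default_factory=deque)

    def set_custom_key(self, key, value):
        # Last write wins; a new key beyond the cap is dropped.
        if key in self.keys or len(self.keys) < self.max_keys:
            self.keys[key] = value

    def log(self, message):
        # Append, then evict the oldest messages until under the byte budget.
        self.logs.append(message)
        self.log_bytes += len(message.encode("utf-8"))
        while self.logs and self.log_bytes > self.max_log_bytes:
            self.log_bytes -= len(self.logs.popleft().encode("utf-8"))

    def record_event(self, name, **params):
        # Breadcrumbs: keep only the most recent events, oldest first.
        self.breadcrumbs.append((name, params))
        while len(self.breadcrumbs) > self.max_breadcrumbs:
            self.breadcrumbs.popleft()

    def build_report(self, stack_trace):
        # At crash time, bundle the captured state next to the stack trace.
        return {
            "stack_trace": stack_trace,
            "custom_keys": dict(self.keys),
            "logs": list(self.logs),
            "breadcrumbs": list(self.breadcrumbs),
        }


# Usage: the shopping-cart example, with made-up values.
ctx = CrashContext(max_breadcrumbs=2)
ctx.record_event("cart_created")
ctx.set_custom_key("cart_items", 0)
ctx.record_event("add_to_cart", item="dosa")
ctx.set_custom_key("cart_items", 3)
ctx.log("checkout tapped")
ctx.record_event("begin_checkout")
report = ctx.build_report("NullPointerException at CheckoutActivity.onClick")
print(report["custom_keys"], report["breadcrumbs"])
```

Overwriting a key on each write means the report carries the value as of the crash, such as the final cart count, while the bounded log and breadcrumb buffers keep only the most recent entries, which are the ones that matter for retracing the steps before the crash.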
MICHAEL REED: I'll hook you up. I will bake you a loaf of bread.

SHOBHIT CHUGH: Thank you. Thank you.

MICHAEL REED: My pleasure.

SHOBHIT CHUGH: Now, even with all these tools, you may still have a fair amount of manual work to do, which you could automate. We could try to tell you how to automate it, but why? Why not bring onstage a customer who has automated their release processes? So please join me in welcoming Mattis, product manager for the release team at Spotify.

[UPBEAT MUSIC PLAYING] [APPLAUSE]

MATTIS CASTEGREN: All right. Hi. My name is Mattis Castegren. I am the product owner of the Spotify release team, and we are the ones responsible for releasing new versions of the Spotify app for Android and iOS. I will share a little bit about our journey and how we use Crashlytics. But I wanted to start by sharing some numbers, just to set the scale. I think the first one may be a bit surprising to some of you: more than 65 teams contribute to the main Spotify app. Each platform has more than a million lines of code. We ship new mobile releases every week for both Android and iOS. And of course, each version is rolled out to more than 200 million monthly active users. So I would say our biggest challenge in the release team is to do all this in a way that isn't stressful, either for us as a release team or for our developers.

The release team was started three years ago. One of the first things we did was an inventory of all the manual steps required to release our app. This is an actual picture from that meeting. We basically filled an entire whiteboard with all the things you had to do, from signing and uploading builds to monitoring tests, pinging teams with open bugs, sending status emails, all those things. Then, over time, we started automating these things one by one.

But one of the really big manual steps eluded us for a long time, and that was crash monitoring. We used Fabric for this, and for a long time, whoever was the release manager had to log into Fabric pretty much every day to see if there were any new crashes on our alpha or beta builds. Of course, this was tedious. But it was also stressful, because this was one of the few times you could mess up as the release manager. Maybe you were busy, fighting other fires, and you forgot to look at Fabric for one day, two days. Well, maybe there was a big crash, and maybe we lost two days of debugging, or had to cancel the release. So that was a big issue.

That is why we were really excited last year when we could move over to Firebase Crashlytics. There was exactly one feature that I was really, really eager to start using, and that was crash data in BigQuery. That had been my dream feature ever since the release team was formed. A quick note here, since we're talking about big data: in order to protect the privacy of our users, we always disable data sharing for all the crash data we collect.
But anyway, having direct access to the crash data has been a game changer for us in the release team. We let Crashlytics handle the gathering and grouping of crashes and the visualization of individual crashes. But apart from that, our automated tooling now does everything. The release manager never has to log into Firebase. Our tooling monitors incoming crashes for alpha and beta builds. We have exact rules for when to file tickets, and we can even use the stack trace to assign those tickets to the right teams.

I want to end with a quick screenshot of what we're working on right now, and that is taking the crash data from Crashlytics and integrating it into our own tools. This is a screenshot of our own tooling, where you can list the crashes for one specific team, shown in the same tool where that team would look at back-end requests or test results. To me, that is the power of Crashlytics and Firebase. You have a UI with a lot of really powerful features that works for most apps and most use cases. But as you grow, you can start writing your own tooling and take full control of the automation any way you like.

And that is what I have. Thanks for having me. I've included my ancient Twitter handle; I haven't used it in years, but feel free to reach out if you want to discuss releasing at scale. Thanks for having me. Back to Mike.

MICHAEL REED: Thanks, Mattis. [APPLAUSE] Love your app. In fact, I use it all day long.

Identifying broader trends. What do we mean when we talk about broader trends in the context of app stability and app quality? Well, as we know, the landing page for Crashlytics shows crashes sorted by impact, highest to lowest. But is fixing crashes one by one, top to bottom, necessarily the only way to pursue your bugs? Maybe not. It's a good start, but you may be missing an important quality signal if it's the only way you visualize your crash data and the only way you prioritize which bugs to fix. So if not this ordered list, then what?
You have all this data. How can you best visualize it? You've got pie charts and line graphs. Now, I'm no data scientist, but I believe that's a lampshade. That's a lampshade, right?
OK. This is where BigQuery and Data Studio come in. Firebase products allow you to export your data to BigQuery, and BigQuery lets you run custom queries on that data. It also enables integrations into in-house custom workflows, as Mattis just showed us. Data Studio is a dashboard and report builder backed by your BigQuery data. So you get a data warehouse and a UI builder provisioned for you with just a few clicks.

This is the BigQuery IDE, and as developers, you're going to feel quite at home there. You write SQL queries, and they're linted and validated in real time against the Crashlytics schema. But most importantly, you'll have the opportunity to run ad hoc queries on your data in a custom manner.

So that's great. You can write SQL queries. I can write SQL queries. But how do we present our learnings to our teammates? Well, that's where I would use Data Studio. With Data Studio, you can build visualizations, and you build them graphically, using an IDE. You'll enjoy all the features you've come to appreciate from Google Docs, like real-time collaboration, version history, and the like.

And Crashlytics helps you get started. We've created a template, and you can clone it and then connect it to your BigQuery data source. Once you've cloned it, you're dropped into this IDE, and you start with a real dashboard backed by your real data. In fact, it looks a lot like the Crashlytics user interface. But it's easy to customize. You can add new visualizations; you can choose charts from the pull-down. Once you've chosen your charts, you can populate them with data by choosing dimensions and metrics graphically from the user interface, shown on the right. And there's a style panel there, so you can round your corners and touch up your color schemes. And there we have it. We've added a pie chart, and it shows that, in this case, 21% of our crashes are caused by the embarrassing NullPointerException.

So, we have some friends who develop an app. They have a nice, large user base, and they use the default issue list, as we all do. But they were also interested in whether there were broader app quality issues spanning their issues. And they found one by using Data Studio and applying the technique we just talked about. They added a pie chart that showed their issues grouped by exception type, and there was a surprise waiting for them: more than half of their issues were caused by out-of-memory exceptions. Now, they knew their app was using memory, of course, but they didn't know how much. They were allocating heaps, and maps, and lists, and queues, you name it, and it all adds up. By using BigQuery and Data Studio, they could visualize the impact.

So what would you do if you learned that half your crashes were caused by memory usage? Well, our friends, rather than fixing issues one by one, top to bottom, switched focus and targeted reducing their memory usage. They had a few sprints focused solely on reducing their memory footprint, and then they were back on task. So with BigQuery and Data Studio, you can perform custom queries, visualize your data, and perhaps uncover problems that wouldn't be apparent under a traditional view of independent issues.

SHOBHIT CHUGH: Thank you, Mike.

MICHAEL REED: You're welcome.

SHOBHIT CHUGH: I really like the lampshade. And the one in purple matches my shirt perfectly. So can you get me one of those?
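The pie chart in this story boils down to an aggregation: count crash events per exception type and divide by the total. In practice that grouping would run as a query in BigQuery and be charted in Data Studio; the minimal Python sketch below shows the same arithmetic on rows already fetched into plain dicts. The `exception_type` field name is a hypothetical stand-in, since the actual Crashlytics export schema may name or nest exception data differently, and the sample rows are made up.

```python
from collections import Counter


def share_by_exception_type(events):
    """Return (exception_type, share) pairs, largest share first.

    `events` holds one dict per crash with a hypothetical "exception_type"
    key; the real Crashlytics export may name or nest this field differently.
    """
    counts = Counter(event["exception_type"] for event in events)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(exc_type, n / total) for exc_type, n in counts.most_common()]


def dominant_types(shares, threshold=0.5):
    # Exception types accounting for more than `threshold` of all crashes.
    return [exc_type for exc_type, share in shares if share > threshold]


# Usage with made-up rows, not real data.
events = (
    [{"exception_type": "OutOfMemoryError"}] * 6
    + [{"exception_type": "NullPointerException"}] * 3
    + [{"exception_type": "IllegalStateException"}] * 1
)
shares = share_by_exception_type(events)
for exc_type, share in shares:
    print(f"{exc_type}: {share:.0%}")
print(dominant_types(shares))
```

Counting rows measures the share of crashes. To measure the share of issues instead, as in the out-of-memory story, count distinct issues per exception type rather than individual crash rows.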
MICHAEL REED: I can hook you up. Thank you.

SHOBHIT CHUGH: OK. Thank you. Cool.

So, to wrap up: we're here to tell you that there's no need to sacrifice your development philosophy, or how quickly you ship, even as your app grows. With Crashlytics, you can keep an eye on important issues via notifications, so you can ship faster with confidence. You can reproduce and fix edge cases faster. You can start to automate release monitoring. And you can identify and act on broader trends, so you can fix your crashes faster.

I'm Shobhit.

MICHAEL REED: And I'm Mike Reed.

SHOBHIT CHUGH: Thank you very much for joining us today. If you have any questions, we will be in the AskFirebase area, back over there somewhere. I hope you enjoy the rest of the summit, and have a great day. Thank you.

[APPLAUSE]

MICHAEL REED: Thanks.

[UPBEAT MUSIC PLAYING]