Let’s build a JavaScript Engine in Rust by Jason Williams | JSConf EU 2019

Just another WordPress site

Let’s build a JavaScript Engine in Rust by Jason Williams | JSConf EU 2019

Wow I mean, thank you by the way to JSConf EU for having me here It’s an absolute privilege to be at the last JSConf EU of its time Can you guys give it up for JSConf EU? I can’t believe it [Applause] I cannot believe it So, you know, biggest JavaScript conference in Europe, I think biggest JavaScript conference in the world Going to be talking about building a JS engine No pressure at all! It will be completely fine! I should probably introduce myself I’m Jason Williams I’m a senior software engineer at Bloomberg in the UK in London, dare I say Europe? We’re still clinging on I used to work at the BBC as well Are there BBC people? Here? Yes, wow! I’ve also been involved in sort of TC39, so I’ve been working on the proposal over the past year which allows you to get on board all of your promises, regardless of the state whether fulfilled or rejected, and you have control then to do what you want with them On the Rust side, I’ve also been involved in Rust as well I used to be the maintainer of the Sublime Text Rust plugin I’ve since moved into the Rust DevTools team, and they focus on sort of ID plugins, tooling, and then the Rust language server as well which is pretty cool which is a single process that handles things like debugging, hinting, and all the editors can make use of that So I thought doing some specifications with JavaScript, really interesting, you know, working a little bit on Rust as well, my two favourite things, how can I marry these two things together? My two languages that I love so much? How can I bring these together? And by the way, the biggest JavaScript conference in the world, I know it’s a ballsy move talking about Rust, but I’m going to go with it Bear with me I thought why not build a JavaScript engine written in Rust? Take the fun and accessibility of JavaScript and then power that with Rust, and then see what we can get? It was a bit ambitious It was quite checking I actually started off by looking for a place where I could contribute to this Nowhere existed, like no-one had built one in Rust before, at least two years ago, anyway There was one single project called js.rs, and the maintainer abandoned it It was six years old I tried to build it, and there were 650 errors It was basically because he made it at a time when Rust wasn’t 1.0 and so the syntax had changed and moved on, and it was faster to build one from scratch than it was to try and get this working That’s exactly what I did I’m not an expert by the way in parsing, or writing compilers, or any of that I mean, I even struggle to read CSS in the morning, so I’m not an expert in parsing source code, so I thought let’s start from somewhere I thought it was a jump Doing small things in Rust, and going from like “hello, world” to full-on JavaScript interpreter was probably a bit of a jump I probably should have done something in between like a to-do MVC, or something like that I knew I was on to something here that I had a good language, something to work with There were a couple of reasons that I wanted to use Rust to build this engine One was control One of the things Rust says to you we’re going to give you the control you can have plus all the safety And so I can utilise a sort of close to the metalness of Rust and do the things I need to do, and then the compile time means I don’t fall over myself It was quite fast, so it is built on top of LOVM which means the years of the LOVM compiler they’ve done, Rust can tap into that and make use of it Sometimes, you can get faster CodeGen which produces C. There’s memory safety as well Rust has an ownership system which is quite unique Basically, if you were passing a value into a function, the function is now the owner of that value, so there can only be one owner to a value That means when Rust wants to get rid of data it can do so with 100 per cent confidence because nothing else is pointing to that thing You don’t have dangling pointers, or anything like that Currency, I might want to make use of this at some point Again, because of the ownership model, I won’t have to worry too much about multiple threads,

trying to access the same value, and then what I need to worry about data races All of that is taken care of at compile time which means at run time, I’m not penalised on performance So, yes I thought it’s a good start Let’s get going So I started About two years ago, I started, I made a Git Repo, an engine called Boa Someone asked me why I called that? An evil boa in a zoo in Australia I thought I’m naming my engine after that I’m going to start with a high-level view here So the architecture that I’m going to go with for now is going to be a few things missing, for those of you who are familiar with compilers, but the view that I’m going to start with is taking some source code, and that can be from, say, a network, or in this case reading from a file, basically Taking the source code, bringing it into as a string buffer, so just imagine a big string of JavaScript And I’m going to use the scanner that I’ve made to break that down into some various tokens And these tokens, they’re basically like groups of characters that are bunched together to have some sort of semantic meaning In this case, we’ve got function as a tone, or fu, or the opening parenthesis here We are going to send them on to the parser operation These expressions are different Expressions are hold other for example s If you’ve got the opening of a function, you’ve got function declaration The block inside that function, we’ve got some variable declaration, so the variable declarations would be children to that function That’s generally the idea Let’s just step into that a little bit closer I’m going to take that line of JS here I showed this to someone earlier, and they were like, “You should have used const!” Sorry, I forgot I’m going to start with Let Imagine you’re scanning this, so the thing that reads the file has done its job now and it’s going to pass this through The first thing we are sitting on is the Let We will produce a token from that The token in this case is our object We’ve got the value We want to keep hold of the value That’s useful We want to make a category for the tone In this case, it’s a key word The reason I know that it is a key word is because I took a look on the spec, grabbed key words and codified against that So, when I take a string like let that’s got no quotes at the beginning, I know it’s going to be either an identifier, or maybe even a token, so what I start off by doing is checking off against my list of key words, and I know that this one is a key word Conf does the same thing but it doesn’t exist as a key word so we assume it’s an identifier We are keeping track of the position The compiler doesn’t really need that per se, but it’s good if you’re debugging and you can output that there’s a problem on this line, or if there is a token we’re not expecting, we can say there was an issue here, and it’s on this column, this line number Punctuation gets its own category Again with, we want to keep the value and where we are and so on So here we’ve got Some punctuation, a string, and an identifier, and a key word We’re going to place this into an array, and then pass it on to our parser This makes it easier for us to deal with when we’re traversing through the source code If you want to look at Rust, don’t worry about understanding the syntax or anything like that, but I make use of Rust pattern-matching Think of Switch Case It is similar but more powerful The idea is that you can take a character, or anything, actually, and then the pattern-matching allows you to do some destructuring on that You can say is it this or this? You can do bindings which I don’t have an example about here, or we can say is it a digit, or is it alphabetic? This this case, we’re on let, so it is alphabetic, so we go into a state of trying to parse and an identifier And then we take this, put it in a loop, and of through each character in our source code and you do the same thing With each character you land on, that’s a state It’s a state machine going on If you’re in a quote, it’s a state of parsing a string If you’re in a character, it’s parsing an identifier and a key word, and so on I’ve got a demo This is something I did earlier I was going to do it live, but Paul Irish is around, so I just wanted to make sure that

he’s not going to get interference, or anything! This is a quick writing conf, equals JSConf We will output some tones Very similar to what I showed you earlier We have key words, an identifier, and our punctuation, et cetera This is now ready to go We’re Aldershot Town done in this area With parsing, we’re going to sort of take that array of tokens, and we are going to try to build a tree structure now So we are going to start building expressions from these tokens The way we do that is similar to how we were iterating for the tokens as well With the characters, we had some pattern-matching going on We’re going to do the same thing This time, though, we’re able to pattern-match on tokens Now we can say we are sitting on a token that is representing a key word, or representing an identifier If it is a key word, we can say this is the key word that we are currently sitting on A quick side note about pattern-matching It’s been so successful in Rust, it’s had a lot of success that there’s been, there’s been a proposal in stage one to bring pattern-matching across into JavaScript Keep an eye on that In this case here, we are sitting on top of an “if”, and we know our next token should be an open parenthesis I’ve made a utility function here called expect punc, expect to have a premises or error out at that point We can then parse our condition This is between the parenthesis, and we can make an expression there, and we can do the same thing with the “if” body, and that is the expression underneath “Next” there chemicals it if the token is “else”, which is optional We may or may not have angles If we do, we parse that as well After that, we’ve got three expressions: the condition, the body, and then potentially the else, and we add those expressions underneath the if “if” expression, so you can start to see the free structure we’ve got Don’t worry about the syntax of this, but this is implemented using an enum The expressions are what they call variants Each variant can hold another expression You get the recursive structure, and the if variant holds an expression which is a condition which is another expression that might hold other things The if body, and we wrap the “else” in an option which basically means we may or may not have a value there It’s basically a way of way of us checking to say is there angles? If not, we move on and we get a non-value The box you can see as well, so these are like the types and the box is basically saying that these are going to live on the heap And it will manage that for us So, take a look at this I’ve added a bit more code now We had a let const I’ve added more around that If we started with our function, we get a function declaration expression And as we work our way down, we get a let const, and that generates is a “let” generation expression underneath the function, and the const expression is just a little, so that is basically saying it’s a value that we don’t need to evaluate, it’s already there That is also a child of the let declaration The reason for that is that you can declare variables without defining what they are In this case, we are defining it as well So we go down to our “if”, and we have our if expression which holds an operation, a condition, a then, and angles As I said earlier, the else was wrapped in an option, and we didn’t use else in this case so we just return as non Then we go into our if body which is the block expression, and we have another let declaration, and that also has a concept underneath that as well The last thing is the return which gives us the return expression So the source code that you see on the left is what would generate the tree we have on the right Again, a demo, so, hope you can run this And then there we go Yes And so there is a lot more going on there, and that is because I can only cover one more on one slide, but there are more expressions In this case, there’s a block expression at the top, and that represents the global scope, and you’ve got the function declaration underneath But you kind of get the idea So, now we are going to try and evaluate the tree that we’ve generated

And there are a couple of things that we need to do and bear in mind The idea is that we want to walk through the tree and then make decisions based on which expression we land on So, in this example here, we’ve got a let declaration expression, and the JavaScript underneath is just an example of what that has generated, and so, on the right, we’ve got the Rust code What I decided to do was to represent JavaScript values in an email, and so we’ve got the sort of values that you can have on the right-hand side A symbol is missing PRs are welcome But we’ve got null undefined, boolean string number, integer, object, and function Some people wandering why we have integer if all numbers are 64, 64-bit floating points? If you’re doing stuff around bit-wise operations, we need to convert them to a number, do the operation on that, and then convert it back into a floating point So it’s kind of used internally Then we take a value and wrap it in a GC so there is a great GC library that somebody made and it was super useful, and the reason I needed that is because in JavaScript, you can reference one value for multiple functions, and so that value goes out of scope You need to make sure that it stays alive, and so the idea here is I can take a value that I’ve made, put it in the GC, and that GC will automatically handle not garbage-collecting that until nothing else needs it, or at least until it goes out of scope and nothing else can possibly point to it Manesh worked on that He did a great job on that I have my value data which is a string, and I pass my string in, and I now have my value, which is great We need to put the value somewhere So I started off by using hashmaps The idea was that I had a naïve approach which was whenever we come across a function, create a hashmap Anything inside that function, I will throw in there This is to get going It got me surprisingly far As you saw the tree earlier, every time I stepped on to a function declaration, I would step on to a hashmap Every time I came across a let declaration, I would add the value straight in Real engines do more than this, but if you want to get going, it was pretty effective You can see here, though, that quickly it start to fail, because we have our B here that is in same environment as the conf and the B should probably be removed once that block expression finishes So I realised I do need to have more environments than just the one for function Took a look at the spec The specification actually points out five of them, and it calls them records I’m not going to read them out, but you kind of get the idea And is to I then took the environment section of the spec and basically implemented that It was not easy It was not easy at all But basically, what I do now is I created function records, I created block scope records, global scope records, and even modular records, and then the idea is every time I made a new record, I would add metadata to add value, and I would give it a parent property as well, so that I could point to the record I made of previously and then you start to have a sort of chain, the scope chain, where we can do our look-ups If you go back to our tree, the idea is that I make a function declaration, and I make a function environment I have a declaration, I add my value in there Have a block expression, I make a declarative environment, add my value in there So far, so good Return is interesting, because the Conf is not in the turn the current environment I’ve just made, so I have to make use of the patient property to do a lockup In this case, I’m trying to find where const is I’m saying can you give me the value for const? It doesn’t find it It gets it from the function declaration scope and gets it from there This is basically the scope chain that you probably heard obviously people talk about You can actually put your code into various tools that show you the tree, so there’s AST Explorer I think it’s astexplorer.com or dot net, and you can put your view in there and see how it would be expressed as an abstract stream

With all of this Let’s give this a try I’m going to console.log it out It implements the console.log which prints the value given There we go And so we have JSConf EU Thank you! [Applause] It’s pretty fast as well Yes, there we go Got it working To begin with, I got small things working, then I slowly started adding some other stuff, so I think this week, you know, it can do let and comps, a bit better, arrow functions were easy to add Once you get the building blocks there, it becomes easier and easier to add more stuff A couple of performance opportunities: so one is maybe don’t parse everything Most functions are declared Don’t end up evening being … so it’s not worth parsing every single thing you come across because that can slow things down Spider Monkey and V8 do lazy parsing and may not add it to the syntax free straightaway and make a note of it and parse it will have before it is ran and parse lazily It will check through functions but not actually add them to the AST That’s one tip Don’t parse functions that aren’t going to be called All the JS engines in use today already do this Another opportunity is to make use of the concurrency in Rust, so I can actually break these tasks apart, have them run on separate threads and pass data between each etch other Got a scanner, pass the tokens across into the parser, because a scanner doesn’t need those values anymore, and it doesn’t need to hold on to them I can go further than this I can go further and have a work er reading the files into the parser This is something that script engines do today It’s called script-screaming The new WebAssembly cider monkey does this As the script is downloading, it can start running straightaway V8 does this already as well A couple of good things – implementing JavaScript, I need to do that JIT compilation, hopeful working for us with a good package called Holy JIT – I like that I will do more stuff on the event loop, and more tests around that, and United States advertising tests 262 That’s it I’ve brought JavaScript to the Rust ecosystem People in the Rust world can make use of it if they want to tokenise It’s quite modular, if they want to have a parser in the Rust world, they can do it They can use it They have some JavaScript Then I thought have I married my love of these two things together? I brought JS into the world of Rust Can I bring this piece of Rust into the world of JavaScript? Yes, I can! There’s WebAssembly I can bring this whole thing into the browser I know! I know! The Rust team and the WebAssembly working group have been working together for quite a while now to make Wasm really awesome especially via Rust, working on great tooling over the past few years Why not take of those One of those tools is Wasm bind be and let’s take that and use it with Boa, bring it into the browser, running JS The way that works is pretty peaceful You take WebAssembly, bring it on to a place where you want expose Something like this, exposed as a ES6 module I thought let’s add Wasm into that, build my package It will generate WebAssembly, and I can use that There are some advantages to this It is quite fast When the browser pulls it in, it can run it straightaway But it’s compiled, it’s more compact, so you’ve got a faster download because you’re downloading less code There’s still the safety aspect there as well because it follows the same rules, like same origin and browser limitations, so you don’t get the ballot box making requests where it shouldn’t be I can expose it to JavaScript I can use it as though it’s a JS module

It’s actually Rust I’m importing there It’s not JS I can pass it a string And I can run it That’s the Wasm attribute You put that on to be of the function of any function you want to expose in the JS world and it will build your package into a Wasm file and then export it Like this I’m going to build it I’m doing Yarn serve because there’s a WebAssembly plugin That’s going to build that out Take a look on the left It’s going to generate a package folder It’s pretty quick I can make changes to Rust code and that are long-run, and the browser will reload It’s like I’m writing JavaScript, basically Now we have a Boa bind gen Wasm file, and you can import that into – we’re going to write it, scan it, parse it would be evaluate it And then we’re going to bring that back into JavaScript, and then basically print the value out I’m going to try to do something challenging Over time, it will get more challenging, but this is the most I can do right now Hopefully, we get – we can return “hello”, we can do plus JSConf, second function Are you ready? Oh! Come on! [Applause] That’s what keeps me up This is why I’m in the industry This is that’s Rust running there There’s no network request That’s Rust running in the browser executing that JS, scanning it would be parsing it, evaluating, and the JavaScript is there as well They work hand in hand The takeaway is you can take the … performance-sensitive parts of your application, build them in Rust, as and you only need to change those bits, not the whole application The next time you’re thinking of building a JS parse er like this one, or a game engine, it was super easy and worth doing It’s definitely worth thinking about You can actually play with the demo as well It’s on that GitHub repo You can go with there and play it live Also, there’s the Wasm book, and people here will be talking about WebAssembly as well