CppCon 2018: Gabor Horvath “Dealing with aliasing using contracts”

Just another WordPress site

CppCon 2018: Gabor Horvath “Dealing with aliasing using contracts”

– Welcome Does this work? Everybody’s hearing me okay? Great So let’s get started This talk is about aliasing but the title might not be the most appropriate because actually it is about the lack of aliasing So this talk is mainly for those who are interested in performance and to look at the assembly outlook from the compiler and also for implementers because I’m looking for feedback So since this is a half an hour talk, a lot of things I will say is actually a simplification If you wonder what else is there, find me offline And if you have any question, feel free to ask but use the microphone in front in the middle Okay, so let’s talk about aliasing Let’s check out this function that we have two arguments by reference, A and B We write to both of these arguments and return A And if we are looking at the LLVM IR which is produced by the compiler, we will see that if you return the constant to So actually, we write twice into the memory and then the return value, we will not do any memory operation But if we look at the slightly different example, there we will take two reference to integers The compiler will generate slightly different code where we will write into both of these arguments and then read back the value of A from memory and return the memory, return the value that we just read So there is one more memory operation here and the reason is that because the code that the compiler generates have to work in those circumstances than they pass the same argument twice to this function So basically in that case, the second assignment will overwrite the memory And in this case, we will still want to return the right value So since the parameters might alias, the compiler needs to be conservative and needs to generate code that works in this case But if the developer already knows that aliasing will never happen, and if the developer had a way to tell the compiler the absence of aliasing, then the compiler could generate more efficient code And one way to do that is to use types So the reason why we didn’t have this load in the first function because their arguments had a different type The arguments had different types and we cannot access the same memory as integer and float at the same time in the standard C++ So basically this gave the compiler a hint about the absence of aliasing So why do we care? One reason is that memory operations are usually slow, much slower than other operations Even if the value that we are going to read is in a cache Sometimes it’s slow but not just, the reason is not just the memory operation but the more important reason is that what the compiler can do So when the compiler applies some optimizations to our code, the compiler have to prove that this optimization will not break the semantics of the code And a lot of the optimizations are only safe if the compiler can prove that two pointers may not alias One of the optimizations, for example, that requires alias analysis is vectorization Another is invariant code motion So these optimizations can do a lot Sometimes, you can get a very huge performance increase from these transformations So it can be very substantial in performance critical code to utilize this If we look at other languages, for example, Fortran Fortran will have a different assumption So Fortran will assume that no aliasing is going on

with the arguments And using this fact, it can generate faster code than C++ But of course, that code is not correct if the user is passing something twice So one of the reason Fortran does that because when Fortran was born, the CPU time was very expensive So in order to convince people to write code in Fortran instead of Assembly, the compiler needed to generate very fast code And of course, C has this keyword called restrict or qualifier So C also has a way to handle this but unfortunately, in standard C++, we don’t have such facilities So let’s look at an example that is written in C So here, we have a matrix and we will summarize the rows into a vector And this loop will not get vectorized because the compiler cannot prove that this pointers may not alias If we decorate this function with the restrict qualifiers, then this code will be vectorized and it will be much faster So this is how it works in C and the question is can’t we just use the restrict qualifier for C++? Isn’t it good enough? So one of the problem is that the function we had earlier is very C-like And if we were to rewrite this in C++, we would end up having something different For example, something like this And how can we annotate this function? So what should we do? If we just decorate vector with restrict, what should be the semantics of that? Should we apply restrict to all of the pointers we get back from vector or only a subset of them? So the problem is that we are no longer using rogue pointers but we have user defined types and we don’t have the granularity to tell which part of the user defined type is affected by this restrict qualifier and in what way So in order to support this, we need something different So let’s look at this code example to get some intuition what we could do instead So for example, here we pass the same object twice through a function that accepts restrict pointers So this is clearly an error and will cause some problems So the problem is that when the compiler generates code for the function F, it has some assumptions and those assumptions are not met by the caller So basically whenever we write restrict qualifier for functions, we make that function harder to use We impose some burden on the caller because the caller has the responsibility to pass the right thing to this function And this is basically a precondition Only if we had a way to write preconditions in C++ Well, soon we will have a way to do this So let’s explore what can we do with the current proposal and what extensions we might need if we want to generalize this to get similar facilities to the C restrict qualifier So if we look at the same example again, then we can add precondition pretty easily to the function that is requiring the caller to pass through separate objects to this function And this way, it is clear from the function definition that the function called f(x, x) is undefined This precondition is well documented And if the caller has code like this, we have two mitigations We could either have runtime checks if we use contracts in a way

that we ask compiler to insert runtime checks Or we can use this (mumbling) static analysis tools that can help us catch this kind of failure So while we have the new way to introduce and define behavior, at the same time, we have a way to mitigate the problems this might cause Okay So this was easy, but what if we have a slightly more complicated function that is, for example, merging to erase with the same size Some of the constraints I have on these slides might seem silly but the reason why the (mumbling) have the same slides to fit all the concept that’s on the slides So what can we write in the expects clause there? If anyone has any idea, let me know, but as far as I know, there is no standard way in C++ to express that these two arrays must not overlap Because we could write such code, but that would be user undefined behavior (faint speaking) So the comment was that using std::array can help us– (faint speaking) Ah, I see, I see So the comment was that we could pass the whole array rather than passing a point or two on array And in that case, we know that– (faint speaking) Mm-hmm (faint speaking) Okay, thank you for the comment Okay, so my approach was slightly different I was looking into how can we extend the language to support this use case as well And I came up with a kind of a non-notation function that has no meaning in terms of program execution but it has some semantic meaning that it conveys to the compiler So here, disjoint (a, b, num) means that That these two arrays that are consisting num elements are two non-overlapping regions of the memory We might have some combinatorial problems if we use this interface exactly as is because if we have more arguments like A, B, C, then we would need to declare that A is disjoint from B, B is disjoint from C, A is disjoint from C So if we go through this direction, we might want to have a (mumbling) version of this notation And a slightly different option is to say that A and points into unique memory region and these two preconditions are not (mumbling) because, for example, if we have one big array and we would like to pass two sub-arrays, then the first one is a suitable precondition for it but the second is not So these are two different kinds of assumptions that a compiler can make Okay, so what about user defined types? So if we would like to express that some numbers are disjoint, we are free to do that That works pretty well If you would like to do that for the return values of some methods, that’s also easy But the problem is what if we need to pass arguments through these methods So should we use some dummy symbols? These dummy symbols should be existentially or universally quantified? So these are not easy questions So is the first thing this direction, I would recommend something different So whenever we have something like a vector, we sometimes have a helper or a view into that owner like a span So what about having a special version of span with a different set of guarantees?

So if we had a span that would give the user the guarantee that all the memory accesses are, all the memory that’s been referring is unique for the caller, then the compiler could optimize upon this assumption And here is an example So here, we have a square bracket operator of a unique span and we have a notation regarding the return value And this notation might be surprising at first So basically, we did an ensures close We have an identifier That identifier is the identifier for the return value And then, what this post condition says basically, whatever we return is unique for each unique span object and index So the reason why we need all this in this notation is because if we pass the same index through the same unique span, of course, we will get something So we will get aliasing memory location So in order to get something that is not aliased, we need either the indices to be separate or the unique span objects to be separate So this way, the compiler can analyze whether we are using this square bracket operator on the same objects with different indices or on different objects with the same indices And this can have the compiler to be derive facts about aliasing So if we had a unique span like this, we could rewrite the previous code example in a way that the compiler could reason about the aliasing and it will still look like C++ So some of you might wonder isn’t that a lot of work to have a completely new type just for having a non-aliasing view into an object And for that, I have some questions So who thinks that these three functions should be considered the same? Who thinks that these three functions are fundamentally different? Okay, I think this is the majority If you don’t look at the function but we only look at the signature of the function, (mumbling) contracts, who thinks these function signatures are or should be considered equivalent if they are in three different code bases? Who thinks we should consider them to be separate? So I think this is something similar So the reason why we should consider the unique span to be a separate type than span because we have different set of guarantees and the compiler has a different way of reasoning about them So this is the reason that I think justifies why we should have a different type But having a different type also has some advantages that I will come back to later But why can’t we just do some annotations in line rather than defining a new type? And my argument for that is that we would need to annotate a lot of methods meaning span, for example, and not just span itself, but also some other types that are related for span For example, if we have iterators So I think it is not feasible and clean at all to do all this annotation at the function signature We don’t want to have even bigger function signatures than we have today with all the (mumbling) trickery So I think having them as the separate types is much cleaner Also, it has some other advantages like we can have vocabulary types that not aliasing and this is some kind of encouragement to use them sparingly And each time we see a function is using such a type as argument, we can see that this function is special in some ways And also, it having separate function types, it can do overloads and overload resolution

So we can have an algorithm that is doing the more efficient thing with a non-aliasing view and we can have a general one that works with the aliasing cases So in this case, the user can decide what guarantees can she give And this way, it is not just a burden on the caller but an option to either choose the more performant better with the more narrow precondition or choose one with the wider precondition that is likely to be less performant And the other question might be C++ is already hard enough Why to reason about aliasing as well? And my argument is that you already have to reason about aliasing all the time because we have functions in the standard library that have aliasing related preconditions For example, we cannot use std::copy with developing ranges And if you are using C, we need to decide whether we could use memcpy or memmove to copy (mumbling) region depending on what the aliasing guarantees we have So we already have aliasing created and defined behavior in the language but we do not have the mitigation for that Supporting these kind of annotations, of course, would not give us the mitigations immediately because we would need kind of sanitizer like an aliasing sanitizer which is a hard problem to solve But having these annotations could help us develop such a tool Okay, this looks like I’m doing well with time so I have some bonus slides as well and I will come back to the related work So let’s look at this code example This is a loop that will not get vectorized by any of the major compilers at this point And the reason is that the compiler cannot be sure that the num will not change during (mumbling) iterating So if num is aliased with A, then whenever we assign to A, we might update the bound of the loop And this way, the behavior of the loop might change But if we pass the bound by value, then this will be vectorized So you might wonder who writes code like the first one? We always pass values like integers by value rather than by reference But if you ever wrote templated code, I think this rings some bells So basically, as a result of template associations, we might end up having code like this which can hurt performance So even though we are advertising C++ as a quiet performant language where we have the full control, we also advocating patterns that are sometimes not efficient enough Of course, if someone wants to have full control, could use templates (mumbling) and this way, can get rid of this problem, but it is a lot of work So we could, of course, have a template, not a problem, that decides for each argument whether it should be passed by value or by reference That could help a bit but that is only local optimum To determine which arguments are best passed by value and best passed by reference, we have a lot more things to consider like the number of parameters, the types of these parameters, how many parameters are there Actually, if we could give the compiler more freedom to figure out what’s the best way to pass arguments, it might help us with the performance of argument passing But of course, if you also need to have some kind of common ABI to make actually linking things together work Okay, so Basically, that’s it and these are some related works if you are interested And if you have some questions, feel free to ask them (audience applauding)

– [Male Speaker] These are difficult problems you’re looking at I know that in the linear algebra community and that, what a lot of the approaches have been is that they build up tokens, proxy objects, whatever you want to call it, and the idea is that they load the data into storage and they execute functions Basically, everything is hand optimized and, you know, might not even be C++ at all The computation might just be C with restrict and stuff like that or they might be Fortran actually But the real point, one of the companies who was here yesterday, is that commercial libraries provide almost all of the way to do these problems in a lot of engineering applications They’re so hard to try to deal with in a compiler that most companies would just soon bite the bullet and pay for a commercial library software to rent it, or use it, or whatever I’m not sure how feasible it’s going to be to actually try to deal with these problems in a compiler compared to those other techniques That may not be what we want to hear but it might be better to try to focus on things that we can actually do or the aspects of what we can actually do and let the other approach win even if it’s inelegant and, you know, not free That’s the other thing is the free sourced software movement is really big, but in engineering commercial software, it’s exactly the opposite – Mm-hmm. Thank you – [Male Speaker] Hey In slide 11, do you mind looking at slide 11? Maybe I don’t understand the syntax of how the contracts work but can’t you just check that the A plus num is smaller or bigger than B Just check that inside the expect – Okay, so the question was that there are in the first line, you mean? – [Male Speaker] Yeah – So could I write something like A plus num is smaller than B, or? – [Male Speaker] Yeah, just check the ranges and stuff like that So what’s wrong with that? – So– – [Male Speaker] And check the null pointer obviously – So doing that is currently undefined behavior as far as I understand And the interesting thing about this is that checks like that is being (mumbling) by the compiler a lot But the user is not allowed to write something like that – [Male Speaker] Okay – Maybe this is something that could be fixed in the standard, but my current understanding is that writing that down in your code is undefined – [Male Speaker] What exactly is undefined behavior, if you don’t mind? – So the standard says that if you have two pointers that are pointing to different objects, comparing these two pointers is undefined – [Male Speaker] Okay Okay, thank you – [Male Speaker] Hello First of all, pretty cool that you are looking into this problem The restrict discussion kind of reminds me about like volatile, and specifically, my background in counter programming is that we, in the counter, volatile isn’t allowed to be used anywhere in the counter And whatever like needs to do a volatile read where the compiler needs to guarantee that you actually read or write to memory, that particular instance of excess has to be cast to volatile and then has to be dereferenced And so I wonder like, I think what is important here is that they tried to encapsulate the temporal aspect of like we are just like not, we are just volatile at this moment And I think this also kind of applies to the restrict discussion because like if you embed something in a unique span, there’s no notion of like temporal You can’t express it So basically, you make pretty clear to a programmer, okay, if I pass it in, it must be unique, but my fear is that basically, the return they use Will they correctly be decapsulated or will the programmers just like you, some afterwards as unique spans And maybe later on, they can very much actually be aliased And so I wonder if like you actually would generate too much of like non-reloading operations And my question is have you looked into like if you could make this

kind of a property of a block in some way? So you basically enforce it The uniqueness is ended at a particular point in time in the code – Mm-hmm I didn’t look into that yet but thank you for the suggestion – [Male Speaker] Hi Earlier you had asked whether some of us felt that the contract should be part of the function signature or not part of the function signature and I didn’t raise my hand for either of the answers as I didn’t come up with a reasonable thought in my head at the time I like the fact that the function accomplishes what it says it accomplishes And the contract itself may not be part of it, but I can see why virtualism and other things might want you to direct that to be part of a function signature I didn’t arrive on how I felt about it My question is with the way that contracts are going to be coming out, will they be considered part of the function signature? I’m unaware of what the answer was – So there are two different things So I think you do not need to repeat all of the contracts and all of your declarations So in this sense, it is not part of the declaration What I was trying to get at, if you see these functions in three separate projects, what’s your assumption about these functions? So basically, in the first case, you know nothing about the function In the second case, you know something about the precondition And in the third case, you something about the post condition As a developer, you think fundamentally different ways about these three signatures – [Male Speaker] Sure My curiosity would have extended to overwrites at that particular point and resolution when it comes to templates Okay, but thank you – Okay – [Male Speaker] Can I try to use something like (mumbling) run application with profile guided optimization and collect information about how each pointers can be aliased and then compiled Again, our program with this information and some compilers which like F unsafe alias optimization Yes, I understand that profile guided information is not, cannot be right in 100% But with Swish, we can improve (mumbling) alias mode – Yes, so actually, compilers already doing something called look versioning which means that the compiler might have the assumption that most of the time, the arguments will not alias So the compiler ends up generating both versions of the loop, the vectorized and the unvectorized And then starts a runtime check to decide which loop to execute So I think something like this could be augmented by looking at some kind of statistics from the profile, but I’m not aware of any project doing that at this point – [Male Speaker] Is there a way to ask the compiler to tell us if it’s unsure about aliasing in a specific function and just tell us hey, if you can restrict one of them, it will be great – So LLVM has something called optimization remarks and you can use it to get certain information about some transformations that have failed due to similar reasons This is, I’m not sure what the current state is I think they are constantly improving what the passes they have to report more and more optimization failures So you could definitely look at those and check out why the compiler couldn’t do a certain transformation But in some cases, you will not have the language feature to actually tell the compiler that it is safe to do this kind of transformation here – [Male Speaker] Thank you – [Male Speaker] Great talk Sort of similar to the question about profile guided optimization,

have you, I guess, compared aliasing results using link time optimization and seeing if certain functions have managed to avoid kind of obvious failures? – Mm-hmm Sorry, can you repeat that question? – [Male Speaker] So if I was confronted with one of these functions and it was doing something stupid like reading memory that I didn’t want it to read, I’d see if it was still reading that memory after I enabled link time optimization because at link time, kind of, some of these ambiguities should be gone Have you tried comparing results? No? – No, not yet – [Male Speaker] Okay, thanks – [Male Speaker] I just had another idea how to maybe express it and some other programming languages mostly from the dependent type area have a concept that you can basically pass an approve as an argument or as its own standing object My idea was basically to use like RAEE to basically go ahead and construct a witness that witnesses like okay, I have like two ranges that are basically unique And you can pass this proof as an additional argument down to the function and can use like a destructor to basically tell the compiler at which point the aliasing would stop – I do like dependent typing but I do not see that standardized any time soon I think most of the users of C++ are not used to this kind of work flow and also compiling C++ takes a long time currently and usually, dependent type systems, with dependent type systems, we have some very expensive algorithms in the type system So I could imagine that adding such a feature would make compiling large projects taking forever Okay Well, thank you for your attention (audience applauding)