Functional Programming And Formal Software Verification For Non Industry Applications
Introduction
One and a half years ago, I read an article on Medium titled “Why Object Oriented is a dead walking thing?”[1]. While I don’t fully agree with the article (OOP has a couple of flaws, but I still think it is simply misused often), it stuck in my head for a long time.
Shortly afterwards, I started a fascinating discussion with one of my university professors, in which he told me that I should give Haskell a try; just for the fun of tackling problems in a different way.
At that time, I was often told, especially by OO programmers, that “functional programming wouldn’t add any value to their work”. (I toned down the original, often annoyed or passive-aggressive statements.) As I don’t like being biased, I simply gave the language a try; and thereby fell in love with the functional paradigm.
After exploring this new world for a bit, I came across something called formal verification that was apparently used in the hardware industry for developing chips. I knew that chips were being developed using languages like VHDL or Verilog, which are essentially just programming (or description) languages. As chips always perform some specific function that essentially represents an algorithm, why not use this cool sounding technique for verifying that a chip works as expected for ‘normal’ software as well?
I found out that the field of formal software verification along with the functional paradigm was quite old and had had a couple of breakthroughs that actually made it usable. It made me wonder why these powerful things were predominantly used in academia.[2]
Therefore, I went out and tried to learn all these cool techniques I had found, with the aim of making them more popular in the world of software development. The rest of the article does exactly that:
(Note that I still have a lot to learn, but at this stage, I can write about it in an understandable, not too abstract way.)
Application in the legal tech sector
For most people it is fairly easy to understand why for example hardware (which is designed by code in VHDL, Verilog and Co.) might be something that benefits from formal verification:
The designers have to be 100% sure that the chip they designed performs its intended function in 100% of cases. The chip will probably be incorporated into a larger architecture and therefore needs to work reliably. A simple chip with very few inputs and outputs could of course just be tested exhaustively, but as the chip grows in complexity, we find that we simply cannot test every possible input-output pair, as we don’t have the time and resources to do that.
However, the same also holds true for most software that is being written.
Nowadays we perform a lot of code review and end-to-end tests (which is not inherently a bad system), but like with hardware, we cannot test software for an infinite amount of time.
Working a lot with people who aren’t programmers, I often get asked how I can be sure that the stuff I wrote really does the things it should do under all circumstances.
Using formal verification and functional programming when writing a library or even just a short but complex script, in addition to testing and code review, can be of enormous value when one has to explain why the thing works, or how one could prove that some error was caused by wrong usage on the client side rather than by faulty code. Thanks to functional programming, we can state that the pure functions we wrote will always produce the results intended, because the global state of the computing system cannot alter the calculations. (It also holds some psychological value when we can state that we have a formal/mathematical proof for the correctness of the function or program.)
Especially in the legal tech sector, the developers as well as their clients (be it the company they are working for or an actual client using the software) have to be very sure that the things they do are correct, as the fees for misconduct or for failing to do something are pretty high and can cause not only an enormous monetary loss, but also a loss of trust by clients.
So let’s look into
Functional Programming (FP)
What it is:
FP is all about not writing object-oriented, imperative code. That means that your code won’t be executed in the way most people are used to (normally from top to bottom, following the path laid out in the code). I will explain the concept using Haskell and pseudocode for the examples. (The headings with — … — describe some of the differences between the functional and the imperative paradigm.)
In Haskell (like in imperative languages) you have a main function which is the entry point of your program. Programs in imperative languages enter at the main function and then follow the instructions given to them. E.g.:
function main():
    print assemble(3)

function assemble(x):
    var y = 0
    var s = ""
    do until (y > x)
        s += y++
    loop
    return s

// output: 0123
However, in FP languages we don’t have loops. One of the most important concepts is something nearly every imperative programmer hates[3]: recursion.
In Haskell, the example above could look like this:
Note that in Haskell a comment begins with --.
main = putStrLn $ assemble 3          -- this is simply to print the result of assemble 3

assemble 0 = "0"                      -- the base case of the recursion
assemble y = assemble (y-1) ++ show y -- the recursion itself and the assembling of the string
                                      -- "show y" is to convert from Int to String

-- output: 0123
Wait; why are there multiple assemble functions in the Haskell version? Is the function overloaded?
— Pattern Matching —
The short answer is: no. What you see here is called pattern matching: when we call the function (e.g. in main), the parameter is matched against every “version” (every row) of the function. So in this example, 3 is first matched with 0 [from assemble 0 = ...], which of course does not apply, as 3 is not 0. Then 3 is matched with assemble y. As y is generic (it matches any value), the expression behind assemble y = is evaluated. [Note that we could just as easily have written assemble val instead of assemble y; it is just the name we use to refer to the value later on (e.g. in show y).]
Now what is going on here? First, the main function is evaluated, resulting in the evaluation of the function assemble with the parameter 3; so far, so clear. We can see that we have “two” assemble functions: one with 0 and one with a variable y. The 0 and the y are the parameters of the function: so there is really only one function assemble, which simply decides (by its parameter) whether it should use the result "0" [if the parameter given is 0] or the recursive call in the second line [if the parameter given is something other than 0].
The recursive call itself should be pretty obvious: it counts the value of y down until it reaches 0 and then appends the values 1, 2 and 3 to the respective results using [result of call] ++ show y.
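To make this expansion concrete, here is the same function again, with the evaluation of assemble 3 written out step by step (the trace comments are my own annotation):

```haskell
-- the assemble function from above
assemble :: Int -> String
assemble 0 = "0"                       -- base case
assemble y = assemble (y-1) ++ show y  -- recursive case

-- evaluation of `assemble 3`, step by step:
--   assemble 3
-- = assemble 2 ++ "3"
-- = (assemble 1 ++ "2") ++ "3"
-- = ((assemble 0 ++ "1") ++ "2") ++ "3"
-- = (("0" ++ "1") ++ "2") ++ "3"
-- = "0123"

main :: IO ()
main = putStrLn (assemble 3)           -- prints 0123
```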
— Where we want to get, instead of how we want to get there —
One other way to explain the difference between the functional and the imperative approach is:
In the imperative paradigm, we specify exactly how an algorithm should run; in the functional paradigm, we specify what the result of the algorithm looks like.
Take a look at the implementation and description of quicksort in Haskell on this website: http://learnyouahaskell.com/recursion. (Just search for quicksort on the site… ;) Here we see that instead of saying ‘okay, let’s do this and then that, and then depending on the result do that’, we say ‘what is a sorted list? Well, it is the stuff smaller than the first element of the list, plus this element, plus the stuff bigger than that element. The smaller lists are again the stuff smaller than their first el…’ I think you see where this is going ;)
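For reference, that quicksort looks roughly like this (I have spelled out the where-bindings with my own names, but the structure is the standard one): a sorted list is defined in terms of what it is, not how to build it step by step.

```haskell
-- quicksort in the declarative style described above
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []                      -- an empty list is already sorted
quicksort (x:xs) =
  quicksort smaller ++ [x] ++ quicksort bigger
  where
    smaller = [a | a <- xs, a <= x]    -- everything not bigger than the first element
    bigger  = [a | a <- xs, a >  x]    -- everything bigger than it

main :: IO ()
main = print (quicksort [5,1,9,4,6,7,3])  -- prints [1,3,4,5,6,7,9]
```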
— No Variables —
The second thing you may have noticed is that there are no variables in the FP code[4]. Therefore, when we code in a functional way, we cannot modify the global state (i.e. we cannot create global variables that influence how our functions work)[5]. Most of the functions in a functional program are pure, which means that they don’t have any side effects (no user input, no screen output, no global state changes, etc.).
Having no variables leads to two consequences: (1) whenever we call a function (say, one that increments every number in a list and returns it) with some list as a parameter, the list we passed stays unchanged, and instead a new one is constructed and returned; and (2) we use a lot of anonymous functions (lambdas), which you may know from languages like Java, JavaScript, C++, etc.
It is often argued that the first consequence is very bad, as we often want good performance. However, the compiler is smart enough to keep that in mind and optimizes the code internally (we just don’t see it)[6]. Don’t take my word for it; take a look at some Haskell benchmarks ;)
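Here is a small, self-contained sketch of consequence (1) (the function and value names are my own): the passed list is never modified; map simply builds a new one.

```haskell
-- consequence (1): incrementing every number builds a *new* list
incrementAll :: [Int] -> [Int]
incrementAll xs = map (+1) xs   -- (+1) is a small anonymous function

main :: IO ()
main = do
  let xs = [1, 2, 3]
  print (incrementAll xs)       -- prints [2,3,4]
  print xs                      -- prints [1,2,3]; the original list is untouched
```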
— Strong Type System —
However, one of the most important features of FP languages is what we call their strong type system. If you have ever programmed, you are undoubtedly familiar with types (int, float, double, string, etc.).
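Before the formal explanation, a tiny self-written example of what “strong” means in practice: Haskell refuses to convert an Int into a Double behind our backs.

```haskell
-- `half` only accepts a Double; the names here are my own
half :: Double -> Double
half x = x / 2

main :: IO ()
main = do
  let n = 7 :: Int
  -- print (half n)               -- rejected at compile time: Int is not Double
  print (half (fromIntegral n))   -- the conversion has to be explicit; prints 3.5
```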
If a language has a strong type system, types may not be implicitly converted into one another (float to int, for example), meaning that if you call a function with an object as a parameter, the object’s type has to be identical to the type of the parameter the function wants to have. This feature can be used/exploited in a way that when we encode our logic, we can “use types as values” and therefore let the compiler catch logical errors in the program. Let me explain that again:
Instead of using values that have some type, we could just interpret a specific type as a specific value and work with the types instead of the values. This way, the compiler, which checks if all the functions get the correct types as parameters, checks if the “values” [the types we regard as values] are correct; leading to an implicit logical check by the compiler itself. An example:
If we want to write a function that subtracts 2 from some value and rely on the function to always return some value greater than 0 (maybe for using it as a divisor), in imperative style we would have to write it like this:
function calculateDiv(x):
    if (x <= 2)
        return 1
    else
        return x - 2
However, when not being careful, we could make the mistake of writing < 2 or even > 2. Depending on where the function is used, this might only surface in very special cases of the program, which we must not miss in testing. Writing that function with a strong type system could result in this:
Note that this is just pseudocode to prove a point!
calculateDiv :: (Num n) => Successor (Successor (Successor n)) → n
calculateDiv n = n - 2
What happens here?
First of all: calculateDiv
is the name of the function. So why is it there multiple times? As explained before, functions in Haskell work primarily with pattern matching. Here, however, there is just one line that actually does something. The first line is just the type of the function. This can also be omitted, but is often written, because we can deduct a lot about a function when we see its’ type; especially in more complex cases. (In this case it actually couldn’t be omitted, because it is the integral part of how this works. It can be omitted in normal Haskell code.)
In the first line, we define the type of the function: it is a function which takes something (in front of the last arrow →) and returns something (after that arrow).
The thing it takes (the parameter) should have the type Successor (Successor (Successor n)), where n is a Num. In this case, Num would be a positive number (so from 0 to +infinity). However, Num is not a value, it is a type. A value of that type can either be Z (for zero) or Successor n, where n is again either Z or Successor n.
To clarify this: we can now interpret Successor (Successor (Successor Z)) as the number 3.
So now we have a function that takes a value of the type Successor (Successor (Successor n)), which we can identify as a number that is at least Successor (Successor (Successor Z)) [3], or greater if the n in there is not Z but again Successor n [the successor of some other n].
Now, at compile time, the compiler would check whether every call of calculateDiv is made with some parameter of the type Successor (Successor (Successor n)) [at least 3].
(Of course we would need to use this scheme of writing numbers all over our program and would therefore probably introduce some syntactic sugar, etc.)
This way of writing the function saves us the hassle of testing whether, for every value we give to the program, we always pass a number of at least 3 into that function.
If you didn’t get it right away, please read the section again carefully. After it, you should be able to see why we are able to know a lot about a function that is written in the functional paradigm when we just see the type of that function.
This is just a simple example, but when done correctly, this can also be applied for complex types in complex functions, leading to much shorter[7] testing time (which is of course far more interesting).
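The pseudocode above can be approximated in real Haskell with a Peano-style number type (the names Nat, Z and Succ are my own; note that the full compile-time guarantee from the pseudocode needs type-level programming, e.g. GHC’s DataKinds extension, so this value-level sketch only illustrates the encoding idea):

```haskell
-- Peano-style numbers: Z is 0, Succ n is n+1
data Nat = Z | Succ Nat

-- only matches values of shape Succ (Succ (Succ n)), i.e. at least 3;
-- in this value-level sketch, smaller inputs fail pattern matching at runtime
calculateDiv :: Nat -> Nat
calculateDiv (Succ (Succ (Succ n))) = Succ n   -- (n+3) - 2 = n+1

-- convert back to an ordinary Int for printing
toInt :: Nat -> Int
toInt Z        = 0
toInt (Succ n) = 1 + toInt n

main :: IO ()
main = print (toInt (calculateDiv (Succ (Succ (Succ (Succ Z))))))  -- 4 - 2: prints 2
```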
These were some of the differences between the functional and the imperative paradigm.
Why use it?
As mentioned above, functional programming languages help (or at least encourage) you to write pure code that will always produce the same results for the same inputs. This makes testing such programs much faster, as we can reason a great deal about how the code works and therefore prevent bugs before we run it, and more importantly: let the compiler catch logical errors. Additionally, thanks to the syntax of e.g. Haskell, complicated algorithms become very easy and concise to read, and therefore also easier to reason about and debug.
The second argument for using it is its good compatibility with formal software verification, which we will hear about in the next section.
Why Haskell?
The short answer for this question is: it is the first functional programming language that I came across. Additionally, there is a big community and lots of libraries and code examples out there that range from simple mathematical algorithms to libraries for building UIs and even web frameworks[8] in Haskell.
Mainstream Applications
If you go to this website: https://wiki.haskell.org/Haskell_in_practice you can find some major use cases for Haskell. You will probably find that most of the cases are either very abstract and applicable mainly in academia and research, or that the examples are not really used by end users but by other software developers. However, the more one gets to know Haskell, the more one sees that it is actually applicable in a lot of cases outside these areas. If explained to end users in an accessible way, using functional programming languages can benefit both parties.