First Example: Counting Set Bits

Most developers are used to techniques like testing, continuous integration, and thoughtful documentation that can help prevent mistakes from being introduced into a system during its development and evolution. These techniques can be relatively effective, but they risk missing certain classes of bugs. For the most important systems, like those that protect human life or information security, it can make sense to also use formal verification, in which a program is mathematically proved to be correct for all inputs.

Testing takes the actual binary executable and runs it on a subset of the possible inputs to check for expected outputs. The downside of this approach is that tests may miss some critical case. Compared to testing, verification is the process of building a mathematical model of the software and proving properties of that model for all possible inputs.

In this lesson you’ll learn how to use a system called SAW, the Software Analysis Workbench, to build models of functions written in C. You’ll learn how to specify what those functions are supposed to do, and how to write a program in SAWScript that orchestrates the proof that the functions meet their specifications.

The Code

The first program to be verified is pop_count. This function takes a 32-bit integer and returns the number of bits that are set (“populated”). For example pop_count(0) is 0, pop_count(3) is 2, and pop_count(8) is 1. This description is an English language specification of the pop_count function. A specification can be written in a number of formats, including English sentences, but also in machine-readable forms. The advantage of machine-readable specifications is that they can be used as part of an automated workflow.

Note

The pop_count function has uses in many kinds of algorithms and has an interesting folklore.

Here is a sophisticated implementation of pop_count from the book Hacker’s Delight by Henry S. Warren Jr.:

int pop_count(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0F0F0F0F;
    x = x + (x >> 8);
    x = x + (x >> 16);
    return x & 0x0000003F;
}

Exercise: A Safe and a Broken pop_count

Write a version of pop_count that you believe to be correct, and also a version that includes the kind of error that could be made by accident (perhaps by adding a typo to the optimized version). Add them as pop_count_ok and pop_count_broken1 to popcount.c in the examples/intro directory.

Testing Programs

You’re not likely to be able to convince yourself that the optimized pop_count function is correct just by inspection. A unit test, like the following pop_check can help:

/* Test pop_count on a few values to make sure it's at least sometimes correct */
bool pop_check() {
    return (pop_count(0x0) == 0) &&
           (pop_count(0x3) == 2) &&
           (pop_count(0xFFFFFFFF) == 32) &&
           (pop_count(0xAAAAAAAA) == 16) &&
           (pop_count(0x55555555) == 16);
}

There are some downsides to testing only with chosen values, however. First off, these tests are usually selected by the author of the code, and there is a risk that important values are not tested. This can be ameliorated by a systematic, disciplined approach to choosing test values, but it can never be completely eliminated. This particular unit test is likely to catch egregiously buggy versions of popcount, but not subtle or tricky bugs.

A second approach to testing is to choose many random values at each execution. This approach may eventually find subtle or tricky mistakes, but not reliably or in a predictable amount of time.

Testing with random values requires an executable specification. This specification may just describe some properties of the output (e.g. that the length of two appended lists is the sum of the lengths of the input lists, or that the output of a sorting function is sorted), or it may be a simpler, more straightforward version of the code that uses an easier algorithm. An executable specification for popcount can loop over the bits in the word, masking them off one at a time. While this implementation is straightforward, it is also slow.

int pop_spec(uint32_t x) {
    uint32_t pop = 0;
    uint32_t mask = 1;
    for (int i = 0; i < 32; i++) {
        if (x & mask) { pop++; }
        mask = mask << 1;
    }
    return pop;
}

The function random_value_test performs randomized testing of a provided population count function, comparing its output to that of pop_spec. When they are not identical, it prints the offending input, which can aid in debugging.

void random_value_test(int (*fun)(uint32_t), char *name) {
    srand(time(NULL));

    int failures = 0;
    for (int i = 0; i < 100000; i ++) {
        uint32_t x = rand();
        int test = (*fun)(x);
        int check = pop_spec(x);
        if (test != check) {
            printf("Test failure: %s(%u) was %u, != %u\n",
                    name, x, test, check);
            failures++;
        }
    }
    if (failures == 0) {
        printf("Testing %s succeeded!\n", name);
    }
}

Finally, one could attempt to exhaustively check the values by enumerating and testing all possible combinations. In the simple case of pop_count, which only takes one 32-bit integer, this will take about 20 seconds. With a 64-bit version of the program, however, the test would take longer than a normal human lifetime, so this technique is not practical for ongoing software development.

The way formal verification addresses this is by reasoning about mathematical models of a program, which allows it to eliminate huge regions of the state space with single steps. There are many tools and techniques for performing full formal verification, each suitable to different classes of problem. SAW is particularly suited to imperative programs that don’t contain potentially-unbounded loops. In general, the cost of verification is that it requires specialized knowledge and developing mathematical proofs can take much longer than writing test cases. However, for many programs, automated tools like SAW can be used with similar levels of effort to testing, but resulting in much stronger guarantees. At the same time, re-checking a proof can sometimes be much faster than testing large parts of the input space, leading to quicker feedback during development.

Exercise: Testing popcount

Write a test that detects the defects in your pop_count_broken1 function, and also check that your pop_count_ok and the optimized pop_count function have no defects by using manual and random testing. How much confidence do those techniques provide?

Finally, consider pop_count_broken2, which is only incorrect for exactly one input value. Check how often the randomized test detects the one buggy input.

int pop_count_broken2(uint32_t x) {
    if (x == 0xDEADBEEF) return 22;
    return pop_count(x);
}

Symbolic Execution

The way SAW can prove properties about programs is by converting them into an internal representation that is much closer to a pure mathematical function. For instance, pop_count might be converted to a function like:

\[pop\_count(bit\_string) = \sum_{i=0}^{32}{bit\_string_i}\]

In this version, the details of the call stack, registers vs. memory and the specific execution model of the CPU have been removed. The technique for doing this conversion is called symbolic execution or symbolic simulation. It works by first replacing some of the inputs to a program with symbolic values, which are akin to mathematical variables. The term concrete values is used to describe honest-to-goodness bits and bytes. As the program runs, operations on symbolic values result in descriptions of operations rather than actual values. Just as adding 1 to the concrete value 5 yields the concrete value 6, adding 1 to the symbolic value \(x\) yields the symbolic value \(x+1\). Incrementing the values again yields 7 and \(x+2\), respectively. By simulating the entire function this way, SAW creates a mathematical function out of the C function you provide.

There are complications, of course, such as what to do with conditional branches, but as a user of the tool you won’t have to worry about them except when they introduce limitations to what you can reason about. The main such limitation is that symbolic simulation can’t effectively deal with loops whose termination depends on a symbolic value. For example, this simple implementation of add would not be easily analyzed:

unsigned int add(unsigned int x, unsigned int y) {
    for (unsigned int i = 0; i < y; i ++) {
        x++;
    }
    return x;
}

The problem is that the loop termination depends on the symbolic value \(y\), rather than on some pre-determined concrete number. This means that each time through the for loop two new branches must be explored: one in which the present concrete value of i is less than the symbolic value of \(y\), and one in which it is not. The key thing to remember is that symbolic execution is most applicable to programs that “obviously” terminate, or programs in which the number of loop iterations do not depend on which specific input is provided.

Running SAW

Note

This section uses a library of SAW helpers, in the file helpers.saw. If you’re comparing this text to the SAW manual, you may notice that a few operations have been abbreviated.

SAW is a tool for extracting models from compiled programs and then applying both automatic and manual reasoning to compare them against a specification of some kind. SAW builds models of programs by symbolically executing them, and is capable of building models from LLVM bitcode, JVM bytecode, x86 machine code, Rust’s MIR internal representation, and a number of other formats.

The first step to verifying pop_count with SAW is to use clang to construct its representation in LLVM bitcode. It is important to pass clang the -O1 flag, because important symbols are stripped at higher optimization levels, while lower optimization levels yield code that is less amenable to symbolic execution. The -g flag leaves symbols in the output which helps SAW produce helpful messages when verification fails. It can be convenient to include this rule in a Makefile:

.SUFFIXES: .c .bc

%.bc : %.c
	clang -g -O1 -c -emit-llvm $< -o $@

After building the LLVM bitcode file (by typing make popcount.bc), the next step is to use SAW to verify that the program meets its specification. SAW is controlled using a language called SAWScript. SAWScript contains commands for loading code artifacts, for describing program specifications, for comparing code artifacts to specifications, and for helping SAW in situations when fully automatic proofs are impossible.

The specific fact to be verified using SAW is that pop_count and pop_spec always return the same answer, no matter their input. For any particular input, this can be checked using pop_spec_check:

bool pop_spec_check(uint32_t x) {
    return (pop_spec(x) == pop_count(x));
}

The SAWScript to verify pop_count is really checking that pop_spec_check always returns true.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
include "helpers.saw";
popmod <- llvm_load_module "popcount.bc";

let pop_is_ok = do {
     x <- symbolic_variable uint32_t "x";
     execute [x];
     returns TRUE;
};

llvm_verify popmod "pop_spec_check" [] pop_is_ok;

To execute the verification, we invoke saw on the SAWScript file:

$ saw pop.saw
[20:24:45.159] Loading file "/.../pop.saw"
[20:24:45.160] Loading file "/.../helpers.saw"
[20:24:45.282] Verifying pop_spec ...
[20:24:45.282] Simulating pop_spec ...
[20:24:45.291] Checking proof obligations pop_spec ...
[20:24:46.212] Proof succeeded! pop_spec

The Proof succeeded! message indicates to us that our pop_spec_check function returns True for all possible inputs. Hooray!

Returning to the SAWScript we used, it has three parts:

  1. Lines 1–2 load helper functions and the LLVM module to be verified. This step builds the model from your code.

  2. Lines 4–8 defines the pop_is_ok SAWScript specification, which sets up the symbolic inputs to the pop_spec function, calls the function on those symbolic inputs, and asserts that the return value is True.

  3. Line 10 instructs SAW to verify that pop_is_ok is true for all possible input values.

The LLVM module is loaded using the llvm_load_module command. This command takes a string that contains the filename as an argument, and returns the module itself. In SAWScript, the results of a command are saved using the <- operator; here, the name popmod is made to refer to the module.

SAW specifications have three main parts:

  1. Preconditions which state what the code being verified may assume to be true when it is called,

  2. Instructions for executing the code.

  3. Postconditions which state what the code must ensure to be true after it is called.

Here, the precondition consists of creating one symbolic variable. Internally, symbolic variables are represented in the internal language SAWCore. symbolic_variable takes two arguments: the new variable’s type and a string that names the symbolic variable (which may show up in error messages). After the precondition, the SAWScript variable x is bound to the respective symbolic value \(x\). In more complicated verifications the preconditions are more interesting, as we’ll see soon.

The function is invoked using the execute command, which takes an array of SAWScript variables that correspond to the function’s arguments. The function being executed is the one named by the string argument in the call to llvm_verify.

In the postcondition, the expected return value of the function is specified using returns. In this example, the function is expected to return TRUE.

Translated to English, pop_is_ok says:

Let \(x\) be a 32-bit integer. The result of calling pop_spec_check on \(x\) is TRUE.

If verification reports success, we know that this is the case for all possible values of \(x\) and \(y\).

In other words, pop_is_ok wraps the C function pop_spec_check. This C function computes the believed-correct result (by calling pop_spec), calls the pop_count function we are analyzing and returns TRUE if the results agree. The SAW wrapper creates the symbolic input variable, executes the function on its input, and ensures that the return value is TRUE.

Note

SAWScript distinguishes between defining a name and saving the result of a command. Use let to define a name, which may refer to a command or a value, and <- to run a command and save the result under the given name. Defining a command with let is analogous to defining a C function, and invoking commands with <- is analogous to calling it.

The arguments to llvm_verify (on line 10 above) are popmod, which specifies the LLVM module that contains the code to be verified; "pop_spec_check", the C function to be symbolically executed; and pop_is_ok, the SAW specification to check "pop_spec_check" against. The empty list ([]) is an optional list of previously proven statements, which is used in larger verification projects as described later in this tutorial. This verification script provides the same level of assurance that exhaustive testing would provide, but it completes in a tiny fraction of the time, fast enough to be part of a standard CI (continuous integration) workflow.

Exercise: Verifying Clever Versions of popcount

The following versions of popcount are quite different from the preceding implementations, but they should always return the same value. For both pop_count_mul and pop_count_sparse, do the following:

  1. Write a C function, analogous to pop_spec_check, that relates pop_spec to the new implementation.

  2. Use pop_is_ok in pop.saw together with additional calls to llvm_verify to asserts that the modified versions pop_spec_check also always return true. The string argument to llvm_verify states the name of the C function being verified - modify it to point to your new specification.

  3. Use SAW to verify the implementation. Remember to rebuild the bitcode file after modifying the C sources.

/* A version of popcount that uses multiplication */
int pop_count_mul(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = ((x + (x >> 4)) & 0x0F0F0F0F);
    return (x * 0x01010101) >> 24;
}
/* A version of popcount that uses an indefinite while loop(!) */
int pop_count_sparse(uint32_t x) {
    int n;
    n = 0;
    while (x != 0) {
        n = n + 1;
        x = x & (x - 1);
    }
    return n;
}

Exercise: Verifying Your pop_count Implementations

Verification is useful for more than just carefully-chosen examples. This exercise is about your programs.

Start with your solutions pop_count_ok and pop_count_broken1 from the first exercise. Repeat the tasks from the previous exercise, creating specifications and extending pop.saw to attempt to verify the functions.

$ make popcount.bc
$ saw pop.saw
...
[19:27:38.518] Proof succeeded! pop_ok_check
[19:27:38.520] Verifying pop_broken1_check ...
... many lines deleted
[19:27:38.856] ----------Counterexample----------
[19:27:38.856]   x: 3735928559
[19:27:38.856] ----------------------------------

As in the output above, you should see one successful verification (for the wrapper corresponding to pop_count_ok) and one failed one (for pop_count_broken1). SAW’s messages for failed verifications are quite verbose, but the most important part is the counterexample, which is a concrete input value for which the program fails to return TRUE. Next apply verification popcount_broken2 from the exercise above, which is only incorrect for exactly one input value, you will see SAW comes up with exactly that counterexample without any guidance from you.