Compositional Verification and Salsa20

First Example: Counting Set Bits demonstrates verification and maintenance for a small standalone function. Most interesting programs are not just single functions, however. Good software engineering practice entails splitting programs into smaller functions, each of which can be understood and tested independently. Compositional verification in SAW allows this structure to be reflected in proofs as well, so that each function can be verified independently. In addition to being more maintainable, this can greatly increase the performance of a verification script.

This section describes the verification of an implementation of the Salsa20 encryption algorithm. Complete example code can be found in the examples/salsa20 directory of the example code.

Salsa20 Verification Overview

Salsa20 is a stream cipher developed in 2005 by Daniel J. Bernstein, built on a pseudorandom function utilizing add-rotate-XOR (ARX) operations on 32-bit words. The original specification can be found here.

The specification for this task is a trusted implementation written in Cryptol. This is analogous to what is covered in Cryptol in the minmax example, but for a larger system. Some examples from this specification are explored below for the sake of showing what larger Cryptol programs look like.

The implementation to be verified is written in C. This implementation is shown in part alongside the specification for comparison purposes.

A SAWScript containing the specifications of memory layouts and orchestration of the verification itself ties everything together. This will be covered last, including some performance comparisons between compositional and non-compositional verification.

A Cryptol Specification

The Cryptol specification in examples/salsa20/salsa20.cry directly implements the functions defined in Bernstein’s specification. Because there is so much code, this section will only go through some of the functions in detail, in order to highlight some features of Cryptol.

The first example function is quarterround. Its type is [4][32] -> [4][32], which means that it is a function that maps sequences of four 32-bit words into sequences of four 32-bit words. The [y0, y1, y2, y3] notation is pattern matching that pulls apart the four elements of the input sequence, naming each 32-bit word. The Cryptol operator <<< performs a left rotation on a sequence.

quarterround : [4][32] -> [4][32]
quarterround [y0, y1, y2, y3] = [z0, z1, z2, z3]
  where
    z1 = y1 ^ ((y0 + y3) <<< 0x7)
    z2 = y2 ^ ((z1 + y0) <<< 0x9)
    z3 = y3 ^ ((z2 + z1) <<< 0xd)
    z0 = y0 ^ ((z3 + z2) <<< 0x12)

This Cryptol code closely resembles the definition in Section 3 of the specification. The definition reads:

If \(y = (y_0, y_1, y_2, y_3)\) then \(\mathrm{quarterround(y) = (z_0, z_1, z_2, z_3)}\) where

\[\begin{split}\begin{array}{rcl} z_1 & = & y_1 \oplus{} ((y_0 + y_3) <\!\!<\!\!< 7)\\ z_2 & = & y_2 \oplus{} ((z_1 + y_0) <\!\!<\!\!< 9)\\ z_3 & = & y_3 \oplus{} ((z_2 + z_1) <\!\!<\!\!< 13)\\ z_0 & = & y_0 \oplus{} ((z_3 + z_2) <\!\!<\!\!< 18)\\ \end{array}\end{split}\]

Contrast this with the C implementation of s20_quarterround, which makes heavy use of in-place mutation rather than the functional paradigm of building and returning a new sequence:

static void s20_quarterround(uint32_t *y0, uint32_t *y1, uint32_t *y2, uint32_t *y3)
{
  *y1 = *y1 ^ rotl(*y0 + *y3, 7);
  *y2 = *y2 ^ rotl(*y1 + *y0, 9);
  *y3 = *y3 ^ rotl(*y2 + *y1, 13);
  *y0 = *y0 ^ rotl(*y3 + *y2, 18);
}

This function directly modifies the targets of its argument pointers, a shift in paradigm that will be highlighted by the SAW specification since that is where the memory management of the C is connected to the pure computation of the Cryptol.

quarterround is used in the definition of two other functions, rowround and columnround, which perform the operation on the rows and columns of a particular matrix, represented as a flat sequence of 16 32-bit words.

These two operations are composed (rowround after columnround) to form the doubleround operation. The Cryptol code for this composition closely resembles the definition in the specification:

doubleround : [16][32] -> [16][32]
doubleround(xs) = rowround(columnround(xs))

Combined with some utility functions for mapping sequences of four bytes to and from little-endian 32-bit words, doubleround gives us the Salsa20 hash function:

Salsa20 : [64][8] -> [64][8]
Salsa20 xs = join ar
  where
    ar = [ littleendian_inverse words | words <- xw + zs@10 ]
    xw = [ littleendian xi | xi <- split xs ]
    zs = [xw] # [ doubleround zi | zi <- zs ]

All three definitions in the where clause are sequence comprehensions, which are similar to Python’s generator expressions or C#’s LINQ. A sequence comprehension consists of square brackets that contain an expression, and then one or more branches. Branches begin with a vertical bar, and they contain one or more comma-separated bindings. Each binding is a name, an arrow, and another sequence.

The value of a comprehension with one branch is found by evaluating the expression for each element of the sequence in the branch, with the name to the left of the arrow set to the current element. The value of [x + 1 | x <- [1,2,3]] is [2, 3, 4]. If there are multiple bindings in the branch, later bindings are repeated for each earlier value. So the value of [(x + 1, y - 1) | x <- [1,2], y <- [11, 12]] is [(2, 10), (2, 11), (3, 10), (3, 11)]. The value of a comprehension with multiple branches is found by evaluating each branch in parallel; thus, the value of [(x + 1, y - 1) | x <- [1,2] | y <- [11,12]] is [(2, 10), (3, 11)].

In the where clause, the definition of xw can be read as “First split xs into 4-byte words, then combine them in a little-endian manner to obtain 32-bit words.” The specific sizes are automatically found by Cryptol’s type checker.

The definition of zs is an infinite sequence. It begins with xw, the little-endian reorganization of xs from the previous paragraph. The # operator appends sequences. The rest of the sequence consists of doubleround applied to each element of zs itself. In other words, the second element is found by applying doubleround to xw, the third by applying doubleround to the second, and so forth. Stepping through the evaluation yields this sequence:

[xw] # [ doubleround zi | zi <- zs ]

[xw] # [ doubleround zi | zi <- [xw] # [doubleround zi | zi <- zs] ]

[xw] # [doubleround xw] # [ doubleround zi | zi <- [doubleround zi | zi <- zs] ]

[xw] # [doubleround xw] # [ doubleround zi | zi <- [doubleround zi | zi <- [xw] # [doubleround zi | zi <- zs]] ]

[xw] # [doubleround xw] # [ doubleround zi | zi <- [doubleround xw] # [doubleround zi | zi <- [doubleround zi | zi <- zs]] ]

[xw] # [doubleround xw] # [doubleround (doubleround xw)] # [ doubleround zi | zi <- [doubleround zi | zi <- [doubleround zi | zi <- zs]] ]

The resulting sequence consists of doubleround applied \(n\) times to xw at position \(n\). This process could, in principle, continue forever. In Cryptol, however, sequences are computed lazily, so as long as nothing ever asks for the last element, the program will still terminate.

The final definition is ar, which adds xw to the tenth element of zs, which is the result of applying doubleround ten times to xw. In Cryptol, + is extended over sequences so that adding two sequences adds their elements. The final result of Salsa20 is computed by re-joining the split words into the appropriate-sized sequence.

The C implementation uses in-place mutation and an explicit loop. Due to the use of mutation, it must be careful to copy data that will be used again later.

// The core function of Salsa20
static void s20_hash(uint8_t seq[static 64])
{
  int i;
  uint32_t x[16];
  uint32_t z[16];

  // Create two copies of the state in little-endian format
  // First copy is hashed together
  // Second copy is added to first, word-by-word
  for (i = 0; i < 16; ++i)
    x[i] = z[i] = s20_littleendian(seq + (4 * i));

  for (i = 0; i < 10; ++i)
    s20_doubleround(z);

  for (i = 0; i < 16; ++i) {
    z[i] += x[i];
    s20_rev_littleendian(seq + (4 * i), z[i]);
  }
}

Note again the pervasive use of in-place mutation - as with s20_quarterround, the connection between this and the functionally pure Cryptol specification will be made clear through the SAW specification.

Salsa20 supports two key sizes: 16 and 32 bytes. Rather than writing two separate implementations, Salsa20_expansion uses two unique feature of Cryptol’s type system to implement both at once. These features are numbers in types and arithmetic predicates. Numbers in types, seen earlier, are used for the lengths of sequences, and it is possible to write functions that work on any length.

In Cryptol, some types accept arguments, which are written at the beginning of the type in curly braces. For instance, the most general type signature for a swap function on pairs is swap : {a, b} (a, b) -> (b, a). This is equivalent to the Java signature Pair<B, A> swap<A, B> (Pair<A, B> x). The {a, b} corresponds to the <A,B> immediately after swap. Arguments to types can be both ordinary types, like [8] or ([16][8], [8]), or numbers.

Type arguments can additionally be constrained. This means that a type or number argument must satisfy certain properties in order to be used. These constraints are written in parentheses and followed by a double arrow. For instance, the type of a function that takes the first element of a sequence is {n, a} (n > 0) => [n]a -> a, where n must be greater than zero (because empty sequences have no first element).

The beginning of the type signature for Salsa20_expansion reads {a} (a >= 1, 2 >= a) => ..., which says that a can only be 1 or 2. Later on in the type, [16*a][8] is used for the key length, resulting in a length of either 16 or 32 8-bit bytes. The back-tick operator allows a program to inspect the value of a length from a type, which is used in the if expression to select the appropriate input to Salsa20. Cryptol strings, like C string literals, represent sequences of ASCII byte values. The specific strings used here come from the Salsa20 specification.

// Salsa 20 supports two key sizes, [16][8] and [32][8]
Salsa20_expansion : {a} (a >= 1, 2 >= a) => ([16*a][8], [16][8]) -> [64][8]
Salsa20_expansion(k, n) = z
  where
    [s0, s1, s2, s3] = split "expand 32-byte k" : [4][4][8]
    [t0, t1, t2, t3] = split "expand 16-byte k" : [4][4][8]
    x = if(`a == 2) then s0 # k0 # s1 # n # s2 # k1 # s3
                    else t0 # k0 # t1 # n # t2 # k0 # t3
    z = Salsa20(x)
    [k0, k1] = (split(k#zero)):[2][16][8]

The encryption function takes a tuple of three parameters: a key k, an eight-byte nonce v, and a message m of at most \(2^{70}\) bytes. In accordance with Section 10 of the specification, it computes the Salsa20_expansion of the nonce and sufficient subsequent numbers, and take truncates it to the desired length. The message is combined with this sequence, yielding the result.

Salsa20_encrypt : {a, l} (a >= 1, 2 >= a, l <= 2^^70) => ([16*a][8], [8][8], [l][8]) -> [l][8]
Salsa20_encrypt(k, v, m) = c
  where
    salsa = take (join [ Salsa20_expansion(k, v#(reverse (split i))) | i <- [0, 1 ... ] ])
    c = m ^ salsa

SAW Specification and Verification

The SAW specification for this Salsa20 implementation is comprised of a couple of convenient helper functions, a specification for each of the interesting functions in the Salsa20 specification (i.e. the functions detailed in Bernstein’s specification document), and a defined command main that performs the actual verification.

One big difference between the Cryptol specification and the C implementation is that Cryptol, a functional language, returns new values, while programs in C, an imperative language, tend to write new values to a pointer’s target. In this case, the C version of the program overwrites an argument with the value that the Cryptol version returns. This pattern is abstracted over in oneptr_update_func, a SAWScript command that describes this relationship between the C and Cryptol versions of a function. The arguments are type : LLVMType that describes the parameter type, name : String that names the parameter for pretty-printing, and the function f : Term to apply to the parameter.

let oneptr_update_func (type : LLVMType) (name : String) (f : Term) = do {
    (x, p) <- pointer_to_fresh type name;
    crucible_execute_func [p];
    crucible_points_to p (crucible_term {{ f x }});
};

Note

If you haven’t already, look at the file helpers.saw - it defines a number of SAW functions that factor out common patterns as in oneptr_update_func, but also give more user-friendly names to various functions. Feel free to use, modify or ignore helpers.saw in SAW programs you write, and be on the lookout for new helpful functions when you work with SAW programs written by others. Good choice of names can make SAW programs much more readable.

All of Salsa20 depends on s20_quarterround. Here is its specification:

let quarterround_setup : CrucibleSetup () = do {
    (y0, p0) <- pointer_to_fresh (llvm_int 32) "y0";
    (y1, p1) <- pointer_to_fresh (llvm_int 32) "y1";
    (y2, p2) <- pointer_to_fresh (llvm_int 32) "y2";
    (y3, p3) <- pointer_to_fresh (llvm_int 32) "y3";

    crucible_execute_func [p0, p1, p2, p3];

    let zs = {{ quarterround [y0,y1,y2,y3] }};
    crucible_points_to p0 (crucible_term {{ zs@0 }});
    crucible_points_to p1 (crucible_term {{ zs@1 }});
    crucible_points_to p2 (crucible_term {{ zs@2 }});
    crucible_points_to p3 (crucible_term {{ zs@3 }});
};

The helper pointer_to_fresh is the same as the one in Specifying Memory Layout. It allocates space for a new symbolic variable of the given type, returning both the symbolic value and the pointer to it. The symbolic values are passed to the Cryptol function quarterround to compute the expected result values. Because the function’s inputs are symbolic, the outputs are also mathematical expressions that reflect the function’s behavior. These expected result values are then used as the expected targets of the pointers in the post-condition of the SAW specification.

The specification for s20_hash is an example of one for which oneptr_update_func is useful.

let salsa20_setup =
  oneptr_update_func (llvm_array 64 (llvm_int 8)) "seq" {{ Salsa20 }};

Putting everything together, main verifies the implementation functions according to these specifications. main has the type TopLevel () — this is the type of commands that can be run at the top level of a SAWScript program. In First Example: Counting Set Bits, crucible_llvm_verify was used on its own, and its return value was discarded. However, verification actually returns a useful result: it returns an association between a specification and the fact that the given function has been verified to fulfill it. In SAWScript, this association has the type CrucibleMethodSpec. Because crucible_llvm_verify is a command, the returned value is saved using the <- operator.

The third argument to crucible_llvm_verify is a list of CrucibleMethodSpec objects. While performing verification, the work that was done to construct a CrucibleMethodSpec is re-used. Specifically, instead of recursively symbolically executing a verified function, the prior specification is used as an axiomatization of its behavior. In the definition of main, the results of earlier verifications are passed along:

let main : TopLevel () = do {
    m      <- llvm_load_module "salsa20.bc";
    qr     <- crucible_llvm_verify m "s20_quarterround" []      false quarterround_setup   z3;
    rr     <- crucible_llvm_verify m "s20_rowround"     [qr]    false rowround_setup       z3;
    cr     <- crucible_llvm_verify m "s20_columnround"  [qr]    false columnround_setup    z3;
    dr     <- crucible_llvm_verify m "s20_doubleround"  [cr,rr] false doubleround_setup    z3;
    s20    <- crucible_llvm_verify m "s20_hash"         [dr]    false salsa20_setup        z3;
    s20e32 <- crucible_llvm_verify m "s20_expand32"     [s20]   true  salsa20_expansion_32 z3;
    s20encrypt_63 <- crucible_llvm_verify m "s20_crypt32" [s20e32] true (s20_encrypt32 63) z3;
    s20encrypt_64 <- crucible_llvm_verify m "s20_crypt32" [s20e32] true (s20_encrypt32 64) z3;
    s20encrypt_65 <- crucible_llvm_verify m "s20_crypt32" [s20e32] true (s20_encrypt32 65) z3;

    print "Done!";
};

This example also uses the fourth argument to crucible_llvm_verify. During symbolic execution, conditionals require that both branches be explored. If the fourth argument is true, then an SMT solver is used to rule out impossible branches. For some problems, the overhead of the solver exceeds the time saved on exploring branches; for others, a short time spent in the solver saves a long time spent in the symbolic execution engine. Ruling out impossible branches can also allow termination of programs in which the number of iterations can depend on a symbolic value. This is called path satisfiability checking.

The 16-byte version of Salsa20 is not verified, because the C program does not implement it. Also, Salsa20 is verified only with respect to some particular message lengths, because SAW is not yet capable of verifying infinite programs. This is why main verifies multiple lengths, in the hope that this is sufficient to increase our confidence.

Comparing Compositional and Non-compositional Verification

In examples/salsa20, there are two SAW specifications: salsa20_compositional.saw, which contains main as presented above, and salsa20_noncompositional.saw, which replaces the CrucibleMethodSpec list parameter in each call to crucible_llvm_verify with the empty list, effectively disabling compositional verification. The one exception to this is in the verification of s20_hash; not using compositional verification for this function did not terminate in a reasonable amount of time.

These two verification tasks were run on a 2019 15-inch MacBook Pro, 2.4 GHz 8-Core Intel i9 processor, 32 GB DDR4 RAM. The values shown are the average over five runs:

Compositional

Non-Compositional

Average Time (s)

2.64

5.39

Even with this limited data set, the benefits of using compositional verification are clear: There’s effectively a 2x increase in speed in this example, even accounting for the fact that the verification of s20_hash is still treated compositionally.

Exercise: Rot13

Rot13 is a Caesar cipher that is its own inverse. In it, each letter is mapped to the letter that is 13 places greater than it in the alphabet, modulo 26. Non-letters are untouched, and case is preserved. For instance, “abc” becomes “nop”, and “SAW is fun!” becomes “FNJ vf sha!”.

Your task is to implement rot13 in C, and verify it using SAW.

Start by writing a function that performs a single character of rot13, assuming 7-bit ASCII encoding. Verify it using SAW and Cryptol.

Then, write a function that uses your single-character rot13 to perform rot13 on a string with precisely 20 characters in it. Verify this using SAW and Cryptol with compositional verification.