Paul Kocher
February 13, 2018
Introduction
Microsoft announced support
in the Visual C/C++ compiler for mitigating the conditional branch variant of the Spectre attack (aka "variant 1"
in Jann Horn's post).
This form of Spectre can be used to attack a broad range of software -- operating systems, device drivers, web APIs, database systems,
and almost anything else that receives untrusted input and may run on the same computer as untrusted code. Because large programs
can contain literally millions of conditional branches and a lot of legacy software needs to be updated, automated tools to add protections are essential.
The countermeasure approach being used is conceptually straightforward but challenging in practice.
Intel has redefined
the LFENCE instruction as stopping speculative execution. (Although this post focuses on x86, ARM has defined
an instruction named CSDB that works
similarly.) If an LFENCE instruction is placed at every vulnerable conditional branch destination, this variant of
Spectre is fully addressed.
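Applied by hand, the countermeasure looks roughly like the following sketch. The SPECULATION_BARRIER macro, array names, and sizes are mine, added so the fragment is self-contained; on MSVC one would invoke the _mm_lfence intrinsic directly. This is illustrative, not a vetted mitigation.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the barrier: _mm_lfence() on x86 (using Intel's
   redefined serializing semantics); other targets get a no-op so the
   sketch still compiles. */
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <emmintrin.h>
#define SPECULATION_BARRIER() _mm_lfence()
#else
#define SPECULATION_BARRIER() ((void)0)
#endif

size_t array1_size = 16;
uint8_t array1[16], array2[256 * 512], temp = 0xFF;

/* The Spectre-paper victim with a barrier at the destination of the
   vulnerable branch: the dependent loads cannot issue until the bounds
   check has actually resolved. */
void victim_function_fenced(size_t x) {
    if (x < array1_size) {
        SPECULATION_BARRIER();
        temp &= array2[array1[x] * 512];
    }
}
```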
It's important that no exploitable paths get missed. An attacker needs only a single vulnerable code pattern,
which can be anywhere in a process's address space
(including libraries and other code that have nothing to do with security). It doesn't help much to lock some doors while
leaving others wide open. Likewise, speculation barriers only work if they are inserted in all necessary locations.
Inserting an LFENCE on every path leaving a conditional jump would be effective and conceptually simple, but unfortunately the
performance cost would be substantial. For example, I ran an experiment where I took a simple SHA-256 implementation and
manually added LFENCEs around the conditional jumps in the main loop. Performance on my
Haswell-based laptop fell from 94857 iterations/sec to 38476 iterations/sec, a decrease of 59.4 percent. Some other operations would likely
have an even greater performance impact, since in my test the entire compression function was LFENCE-free.
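As a sanity check, the 59.4 percent figure follows directly from the two measured rates. A minimal check (the helper name is mine):

```c
/* slowdown = 100 * (1 - after/before), for throughputs in iterations/sec */
double slowdown_percent(double before, double after) {
    return 100.0 * (1.0 - after / before);
}
/* slowdown_percent(94857.0, 38476.0) is approximately 59.4 */
```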
Microsoft's compiler attempts to reduce the performance impact by using static
analysis to select where to insert LFENCE instructions. Although Microsoft's post lacks any quantified performance data,
I was surprised by the statement that Microsoft has "built all of Windows with /Qspectre enabled and did not notice
any performance regressions of concern".
Results
To see how well Microsoft's compiler implementation works, I wrote several Spectre-vulnerable source code
examples and compiled them using Microsoft's 64-bit C/C++ compiler version 19.13.26029 with the Spectre mitigation
enabled. I then looked at the resulting assembly language listings.
At first, I thought I had the wrong version of the compiler, since I wasn't seeing any LFENCEs. Finally, I tried
compiling my example code from the Appendix of the Spectre paper, and saw LFENCEs appearing. Still,
even small variations in the source code resulted in unprotected code being emitted, with no warning to
developers.
The code examples below include 15 vulnerable functions. The compiler adds LFENCEs to the first two, which closely
resemble the example code in the Spectre paper. The remaining 13 examples compile to unsafe output code which,
if included in an application where adversaries control the input parameter x, would potentially compromise
the entire application.
Furthermore, my examples are far from comprehensive -- for example, they all rely on cache modification as
a covert channel and they all reside in simple functions that are more amenable to static analysis.
Discussion
The strictest security requirement for speculation barrier countermeasures would be to ensure that no unauthorized
(e.g. out-of-bounds) memory reads occur during speculative execution.
A weaker requirement would be to allow unsafe reads to occur provided that the results are only used in 'safe' operations
that are 'guaranteed' to not leak information. Because future processor implementations may add new optimizations,
these guarantees should ideally be architecturally defined. Unfortunately, the Microsoft compiler does not do either
of these, and simply produces unsafe code when the static analyzer is unable to determine whether a code pattern will be exploitable.
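For contrast, the weaker requirement can be met without any LFENCE by clamping the index with a branchless mask, so that even a mispredicted bounds check reads in-bounds memory. The sketch below is mine (the function name and fixed array sizes are illustrative); it mirrors index-masking schemes such as the Linux kernel's array_index_nospec, and is not something the Microsoft compiler does.

```c
#include <stddef.h>
#include <stdint.h>

size_t array1_size = 16;
uint8_t array1[16], array2[256 * 512], temp = 0xFF;

/* mask is all-ones when x < array1_size and all-zeros otherwise, computed
   purely via data flow (subtract, shift down the sign bit, negate). Even if
   the branch below is mispredicted, the speculative read uses index 0
   rather than an attacker-chosen out-of-bounds index. (A compiler could in
   principle turn this back into a branch; real deployments verify the
   generated code.) */
void victim_function_masked(size_t x) {
    if (x < array1_size) {
        size_t mask = (size_t)0 - ((x - array1_size) >> (sizeof(size_t) * 8 - 1));
        temp &= array2[array1[x & mask] * 512];
    }
}
```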
While there is room for improvement, the real issue is one of approach rather than implementation.
A compiler cannot reliably determine whether arbitrary sequences of instructions will be
exploitable. For example, when compiling a function, the compiler often has no way to infer the properties of the
parameters that will be passed when the function is called. A post-compilation analysis tool (e.g. using symbolic execution) could do
a somewhat better job, but would still be imperfect, since code analysis
is an inherently undecidable problem.
The underlying issue is one of security versus performance. Automated tools will inevitably encounter many locations
where they are uncertain whether a speculation barrier is required. Inserting LFENCEs in all these places will hurt performance,
omitting them is a security risk, and alerts will drown the programmer in a sea of confusing warning messages.
Microsoft's current compiler mitigation is designed to minimize the performance overhead.
I've been in touch with the Microsoft compiler team and have had an excellent conversation with them. They understand the
trade-offs involved. Given the limitations of static analysis and messiness of the available Spectre mitigations,
they are struggling to do what they can without significantly impacting performance.
They would welcome feedback – should /Qspectre (or a different option) automatically
insert LFENCEs in all potentially non-safe code patterns, rather than the current approach of protecting
known-vulnerable patterns? Would you make use of a /Qspectre variant that provides the most secure mitigation
assistance -- at the cost of a significant performance loss?
(The Visual Studio team can be reached
online
or by email.)
In my opinion, the best approach is to address the security issue fully when a developer explicitly passes
a compilation flag (e.g. /Qspectre) requesting protection.
Although there would be a performance impact at first, developers can (relatively) easily rework performance-critical routines
as needed.
In contrast, manually wading through the compiled code to find missing LFENCEs is entirely impractical.
Speculative execution commonly proceeds 180+ instructions past a cache miss, and vulnerabilities can
involve multiple functions, macros, "?:" operators, etc. It would also be enormously
helpful to have a flag in output files (object files, DLLs, executables, etc.) indicating whether comprehensive LFENCE
insertion was performed on the components.
Static analysis still has an important role to play; instead of identifying known-bad code patterns for LFENCE insertion,
its job is to identify safe
code patterns where LFENCEs can be omitted.
Unreliable defenses should be avoided, since even a single
exploitable code pattern in an application or its libraries can leak the entire memory contents of the process to an attacker.
Conclusions
Developers and users cannot rely on Microsoft's current compiler mitigation to protect against the conditional branch variant of Spectre.
Speculation barriers are only an effective defense if they are applied to all vulnerable code
patterns in a process, so compiler-level mitigations need to instrument all potentially-vulnerable code patterns.
Microsoft's blog post states that "there is no guarantee that all possible instances of variant 1 will be instrumented
under /Qspectre". In practice, the current implementation misses many (and probably most) vulnerable code patterns,
leading to unrealistically optimistic performance results as compared to robust countermeasures while creating a potentially
false sense of security.
I gave Microsoft an opportunity to review this post. In addition to edits to how they can receive
feedback, they asked me to highlight that the compiler cannot instrument all possible instances of variant 1 without
over-inserting barriers, incurring a significant performance cost. I completely agree with this comment and made
some edits to reflect this. Still, the opinions expressed here (as well as any errors) are mine.
Code Examples
// ----------------------------------------------------------------------------------------
// Define the types used, and specify as extern's the arrays, etc. we will access.
// Note that temp is used so that operations aren't optimized away.
//
// Compilation flags: cl /c /d2guardspecload /O2 /Faout.asm
// Note: Per Microsoft's blog post, the /d2guardspecload flag will be renamed /Qspectre
//
// This code is free under the MIT license (https://opensource.org/licenses/MIT), but
// is intentionally insecure so is only intended for testing purposes.
#include <stdlib.h>
#include <stdint.h>
#include <string.h> // for memcmp() in Example 11
extern size_t array1_size, array2_size, array_size_mask;
extern uint8_t array1[], array2[], temp;
// ----------------------------------------------------------------------------------------
// EXAMPLE 1: This is the sample function from the Spectre paper.
//
// Comments: The generated assembly (below) includes an LFENCE on the vulnerable code
// path, as expected.
void victim_function_v01(size_t x) {
if (x < array1_size) {
temp &= array2[array1[x] * 512];
}
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// lfence
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rdx+rcx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 2: Moving the leak to a local function that can be inlined.
//
// Comments: Produces identical assembly to the example above (i.e. LFENCE is included)
// ----------------------------------------------------------------------------------------
void leakByteLocalFunction_v02(uint8_t k) { temp &= array2[(k)* 512]; }
void victim_function_v02(size_t x) {
if (x < array1_size) {
leakByteLocalFunction_v02(array1[x]);
}
}
// ----------------------------------------------------------------------------------------
// EXAMPLE 3: Moving the leak to a function that cannot be inlined.
//
// Comments: Output is unsafe. The same results occur if leakByteNoinlineFunction()
// is in another source module.
__declspec(noinline) void leakByteNoinlineFunction(uint8_t k) { temp &= array2[(k)* 512]; }
void victim_function_v03(size_t x) {
if (x < array1_size)
leakByteNoinlineFunction(array1[x]);
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// lea rax, OFFSET FLAT:array1
// movzx ecx, BYTE PTR [rax+rcx]
// jmp leakByteNoinlineFunction
// $LN2@victim_fun:
// ret 0
//
// leakByteNoinlineFunction PROC
// movzx ecx, cl
// lea rax, OFFSET FLAT:array2
// shl ecx, 9
// movzx eax, BYTE PTR [rcx+rax]
// and BYTE PTR temp, al
// ret 0
// leakByteNoinlineFunction ENDP
// ----------------------------------------------------------------------------------------
// EXAMPLE 4: Add a left shift by one on the index.
//
// Comments: Output is unsafe.
void victim_function_v04(size_t x) {
if (x < array1_size)
temp &= array2[array1[x << 1] * 512];
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rdx+rcx*2]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 5: Use x as the initial value in a for() loop.
//
// Comments: Output is unsafe. (Note: since i has the unsigned type size_t, the loop
// condition i >= 0 is always true; the compiled output below reflects the resulting
// infinite loop.)
void victim_function_v05(size_t x) {
size_t i;
if (x < array1_size) {
for (i = x - 1; i >= 0; i--)
temp &= array2[array1[i] * 512];
}
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN3@victim_fun
// movzx edx, BYTE PTR temp
// lea r8, OFFSET FLAT:__ImageBase
// lea rax, QWORD PTR array1[r8-1]
// add rax, rcx
// $LL4@victim_fun:
// movzx ecx, BYTE PTR [rax]
// lea rax, QWORD PTR [rax-1]
// shl rcx, 9
// and dl, BYTE PTR array2[rcx+r8]
// jmp SHORT $LL4@victim_fun
// $LN3@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 6: Check the bounds with an AND mask, rather than "<".
//
// Comments: Output is unsafe.
void victim_function_v06(size_t x) {
if ((x & array_size_mask) == x)
temp &= array2[array1[x] * 512];
}
// mov eax, DWORD PTR array_size_mask
// and rax, rcx
// cmp rax, rcx
// jne SHORT $LN2@victim_fun
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rdx+rcx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 7: Compare against the last known-good value.
//
// Comments: Output is unsafe.
void victim_function_v07(size_t x) {
static size_t last_x = 0;
if (x == last_x)
temp &= array2[array1[x] * 512];
if (x < array1_size)
last_x = x;
}
// mov rdx, QWORD PTR ?last_x@?1??victim_function_v07@@9@9
// cmp rcx, rdx
// jne SHORT $LN2@victim_fun
// lea r8, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[r8+rcx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+r8]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// cmovb rdx, rcx
// mov QWORD PTR ?last_x@?1??victim_function_v07@@9@9, rdx
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 8: Use a ?: operator to check bounds.
//
// Comments: Output is unsafe.
void victim_function_v08(size_t x) {
temp &= array2[array1[x < array1_size ? (x + 1) : 0] * 512];
}
// cmp rcx, QWORD PTR array1_size
// jae SHORT $LN3@victim_fun
// inc rcx
// jmp SHORT $LN4@victim_fun
// $LN3@victim_fun:
// xor ecx, ecx
// $LN4@victim_fun:
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rcx+rdx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 9: Use a separate value to communicate the safety check status.
//
// Comments: Output is unsafe.
void victim_function_v09(size_t x, int *x_is_safe) {
if (*x_is_safe)
temp &= array2[array1[x] * 512];
}
// cmp DWORD PTR [rdx], 0
// je SHORT $LN2@victim_fun
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rcx+rdx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 10: Leak a comparison result.
//
// Comments: Output is unsafe. Note that this vulnerability is a little different, namely
// the attacker is assumed to provide both x and k. The victim code checks whether
// array1[x] == k. If so, the victim reads from array2[0]. The attacker can try
// values for k until finding the one that causes array2[0] to get brought into the cache.
void victim_function_v10(size_t x, uint8_t k) {
if (x < array1_size) {
if (array1[x] == k)
temp &= array2[0];
}
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN3@victim_fun
// lea rax, OFFSET FLAT:array1
// cmp BYTE PTR [rcx+rax], dl
// jne SHORT $LN3@victim_fun
// movzx eax, BYTE PTR array2
// and BYTE PTR temp, al
// $LN3@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 11: Use memcmp() to read the memory for the leak.
//
// Comments: Output is unsafe.
void victim_function_v11(size_t x) {
if (x < array1_size)
temp = memcmp(&temp, array2 + (array1[x] * 512), 1);
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// lea rax, OFFSET FLAT:array1
// movzx ecx, BYTE PTR [rax+rcx]
// lea rax, OFFSET FLAT:array2
// shl rcx, 9
// add rcx, rax
// movzx eax, BYTE PTR temp
// cmp al, BYTE PTR [rcx]
// jne SHORT $LN4@victim_fun
// xor eax, eax
// mov BYTE PTR temp, al
// ret 0
// $LN4@victim_fun:
// sbb eax, eax
// or eax, 1
// mov BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 12: Make the index be the sum of two input parameters.
//
// Comments: Output is unsafe.
void victim_function_v12(size_t x, size_t y) {
if ((x + y) < array1_size)
temp &= array2[array1[x + y] * 512];
}
// mov eax, DWORD PTR array1_size
// lea r8, QWORD PTR [rcx+rdx]
// cmp r8, rax
// jae SHORT $LN2@victim_fun
// lea rax, QWORD PTR array1[rcx]
// lea r8, OFFSET FLAT:__ImageBase
// add rax, r8
// movzx ecx, BYTE PTR [rax+rdx]
// shl rcx, 9
// movzx eax, BYTE PTR array2[rcx+r8]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 13: Move the safety check into an inline function
//
// Comments: Output is unsafe.
__inline int is_x_safe(size_t x) { if (x < array1_size) return 1; return 0; }
void victim_function_v13(size_t x) {
if (is_x_safe(x))
temp &= array2[array1[x] * 512];
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rdx+rcx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 14: Invert the low bits of x
//
// Comments: Output is unsafe.
void victim_function_v14(size_t x) {
if (x < array1_size)
temp &= array2[array1[x ^ 255] * 512];
}
// mov eax, DWORD PTR array1_size
// cmp rcx, rax
// jae SHORT $LN2@victim_fun
// xor rcx, 255 ; 000000ffH
// lea rdx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rcx+rdx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rdx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0
// ----------------------------------------------------------------------------------------
// EXAMPLE 15: Pass the index by pointer
//
// Comments: Output is unsafe.
void victim_function_v15(size_t *x) {
if (*x < array1_size)
temp &= array2[array1[*x] * 512];
}
// mov rax, QWORD PTR [rcx]
// cmp rax, QWORD PTR array1_size
// jae SHORT $LN2@victim_fun
// lea rcx, OFFSET FLAT:__ImageBase
// movzx eax, BYTE PTR array1[rax+rcx]
// shl rax, 9
// movzx eax, BYTE PTR array2[rax+rcx]
// and BYTE PTR temp, al
// $LN2@victim_fun:
// ret 0