VigNAT: A Formally Verified Performant NAT

Overview

We present a NAT written in C and proven to be semantically correct according to RFC 3022, as well as crash-free and memory-safe. There exists a lot of recent work on network verification, but it mostly assumes models of network functions and proves properties specific to network configuration, such as reachability and absence of loops. Our proof applies directly to the C code of a network function, and it demonstrates the absence of implementation bugs.

We tackle the scalability challenge with a new combination of symbolic execution and theorem proving using separation logic; this combination matches well the typical structure of a network function. We then demonstrate that formally proven correctness in this case does not come at the cost of performance.

Software Network Functions

Software NFs have always been popular in low-rate environments, such as home gateways or wireless access points. More recently, they have also appeared in experimental IP routers and industrial middleboxes that support multi-Gbps line rates. Moreover, we are witnessing a push for virtual network functions that can be deployed on general-purpose platforms on demand, much like virtual machines are being deployed in clouds.

Hardware NF

Software NF

  • Higher unit cost
  • Bulky
  • Needs care

Steel Cup

VS

Paper Cup

  • Low unit cost
  • Compact
  • Recyclable
  • Needs no care

Inspired by this post.

Reliability Perspective

Such popular NF as NAT however has proven hard to get right over time: the NAT on various Cisco devices can be crashed or hung using carefully crafted inputs; similar problems exist in Juniper’s NAT, the NAT in Windows Server, and NATs based on NetFilter.

The bugs periodically reach consumers affecting hundreds of thousands of households and often remain unknown for the users. A recent example is this firewall implementation that undermined the isolation of many private networks. Vigor certification would have flagged the missing physical interface check that lead to the possibility of the partial MAC brute-force attack.

On the other side, formal verification techniques have grown strong enough to certify complete and practical systems. Such systems as a compiler, an operating system kernel, a distributed key value store or a file system.

Method

The rationale behind our approach is that different verification techniques are best suited for different types of code. The beauty of symbolic execution lies in its ease of use: it enables automatic code analysis, hence can be used by developers without verification expertise. The challenge with symbolic execution is its notorious lack of scalability: applying it to real C code typically leads to path explosion [1,2]. The part of real NF code that typically leads to unmanageable path explosion is the one that manipulates state. Hence, we split NF code into two parts:

The challenge lies in combining the results of these two verification techniques, and for that we developed a technique we call “lazy proofs”. A lazy proof consists of sub-proofs structured in a way that top-level proofs proceed assuming lower level properties, and the latter are proven lazily a posteriori. For example, symbolic execution requires the use of models that must be correct; we first do the symbolic execution and only afterward validate automatically the correctness of the models. This approach enables us to avoid having to prove that our models are universally valid—which is hard—but instead only prove that they are valid for the specific NF and the specific properties we verified earlier with symbolic execution. This is much easier.

Vigor Method

Results

We compared key performance characteristics of four NFs (see our paper for details):

Latency

The Verified NAT (5.13μsec) has 2% higher latency than the Unverified NAT (5.03μsec), and 8% higher than No-op forwarding (4.75μsec). So, on top of the latency due to packet reception and transmission, the Unverified and Verified NAT add, respectively, 0.28μsec and 0.38μsec of NAT-specific packet processing. For all three NFs, latency remains stable as flow-table occupancy grows, which shows that the two NATs use good hash functions to spread the load uniformly across their tables. The only case where latency increases (to 5.3μsec) is for the Verified NAT, when the flow table becomes almost completely full (the green line curves upward at the last data point). The Linux NAT has significantly higher latency (20μsec).

Latency Plot

Throughput

Throughput Plot

This plot shows the maximum throughput achieved by each NF with less than 0.1% packet loss, as a function of the number of generated flows. The Verified NAT (1.8 Mpps) has 10% lower throughput than the Unverified NAT (2 Mpps). This difference in throughput comes from the difference in NAT-specific processing latency (0.38μsec vs. 0.28μsec) imposed by the two NATs: in our experimental setup, this latency difference cannot be masked, as each NF runs on a single core and processes one packet at a time. The Linux NAT achieves significantly lower throughput (0.6 Mpps).

Contact

References