Bullet Trains, Express Lanes and Aspen Trees
San Diego, Calif., Dec. 11, 2013 -- Computer scientists from the University of California, San Diego’s Jacobs School of Engineering didn’t have far to travel to attend the 9th ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT) in Santa Barbara, Calif. So why the big focus on “bullet trains” and “express lanes”?
For researchers and students in UC San Diego’s Center for Networked Systems (CNS) and Computer Science and Engineering (CSE) department, those transportation terms do not mean quite the same thing in computer networking as they do for consumers eager to get home by train, bus or car for the holidays.
CNS research scientist George Porter co-authored two papers with CSE colleagues, both of which were presented on Dec. 11 during a CoNEXT session called “Trains, Lanes and Autobalancing.”
According to the first paper presented, “Bullet Trains: A Study of NIC Burst Behavior at Microsecond Timescales,” a lot is known at a macro level about the behavior of traffic in data center networks. This includes the ‘burstiness’ – the relative tendency of data traffic to transmit in short, uneven spurts – of TCP (a protocol that controls how data is transmitted across the Internet), variability based on destination, and overall size of network flows. However, according to Porter and his co-authors, CSE grad student Rishi Kapoor and CSE professors Geoff Voelker and Alex Snoeren, “little information is available on the behavior of data center traffic at packet-level timescales,” i.e., at timescales below 100 microseconds.
Some 30 years ago, an MIT study compared packets of data with train cars – sent from a source to the same destination back-to-back like train cars pulled by a locomotive. In the context of data centers, however, the UC San Diego researchers came to the conclusion that those trains are more aptly termed “bullet trains” when viewed at microsecond timescales. Porter and his colleagues examined the various sources of traffic bursts and measured the traffic from different sources along the network stack, as well as the burstiness of different data-center workloads, and the burst behavior of bandwidth-intensive applications such as data sorting (MapReduce) and distributed file systems (NFS and Hadoop).
“Our analysis showed that network traffic exhibits large bursts at sub-100 microsecond timescales,” said Porter. “Regardless of application behavior at the higher layer, packets come out of a 10 Gigabit-per-second server in bursts due to batching.” The larger the burst, he added, the greater the likelihood of packets being dropped.
The researchers focused primarily on the network interface controller (NIC) layer, because the controller is directly implicated in the burst behavior that most affects the speed of computer networking. The assumption has been that packets transmitted within a single flow would be uniformly paced, but real life turns out to be more complex. This is primarily because packets are batched differently across the network stack in order to achieve link rates of 10Gbps or higher.
For their paper, Porter and his co-authors studied the burst behavior of traffic emanating from a 10Gbps end-host across a variety of data center applications. “We found that at 10- to 100-microsecond timescales, the traffic exhibits large bursts, tens of packets in length,” said Porter. “We also found that this level of burstiness was largely outside of application control, and independent of the high-level behavior of applications.”
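To make the idea of a packet-level burst concrete, here is a minimal sketch of how back-to-back packets might be grouped into bursts from a list of send timestamps. It is an illustration only, not the measurement method used in the paper: the function name, the 1-microsecond gap threshold and the sample trace are all assumptions chosen for the example.

```python
from typing import List


def split_into_bursts(timestamps_us: List[float],
                      max_gap_us: float = 1.0) -> List[List[float]]:
    """Group sorted packet timestamps (in microseconds) into bursts.

    A packet joins the current burst when it follows the previous packet
    by at most max_gap_us; otherwise it starts a new burst.
    The threshold is a hypothetical value, not one taken from the study.
    """
    bursts: List[List[float]] = []
    for ts in timestamps_us:
        if bursts and ts - bursts[-1][-1] <= max_gap_us:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


# Hypothetical trace: two tight packet trains followed by a lone packet.
trace = [0.0, 0.5, 1.0, 1.5, 50.0, 50.5, 51.0, 120.0]
bursts = split_into_bursts(trace, max_gap_us=1.0)
burst_lengths = [len(b) for b in bursts]
print(burst_lengths)
```

Under these assumptions, a trace with long runs of closely spaced packets would show up as a few long bursts, whereas uniformly paced traffic would produce many bursts of length one.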
The second Dec. 11 presentation at CoNEXT, “FasTrak: Enabling Express Lanes in Multi-Tenant Data Centers,” was co-authored by Porter, computer science Ph.D. student Radhika Niranjan Mysore, and CSE Prof. Amin Vahdat (currently on leave at Google). They explored an issue specifically facing operators of cloud services, such as Amazon EC2, Microsoft Azure and Google Compute Engine. These so-called multi-tenant data centers may host tens of thousands of customers.
No customer wants their data or service to leak into those of other customers in the cloud, and typically, cloud operators rely on virtual machines (VMs) as well as network-level rules and policies that are enforced on every packet going in and out of the host in order to ensure network isolation. As a result, however, VMs carry innate costs in the form of latency (delays) and the increased cost of processing packets in what’s called the hypervisor, which affect both the provider and the tenant.
The researchers came up with a solution called FasTrak, which keeps the functionality but curbs the cost of processing rules by offloading some of the virtualization functionality from the hypervisor software to the network switch hardware through so-called “express lanes.” There is limited space on a switch – not enough to take care of all the rules required by a server – so for FasTrak, the researchers determined the subset of data flows that could benefit most from offloading via express lanes to hardware. The result: latency cut roughly in half (i.e., 50 percent shorter delays), combined with a 21 percent drop in the server load (volume of traffic). According to the study’s conclusion, FasTrak’s actual benefits are workload dependent, but “services that should benefit the most are those with substantial communication requirements and some communication locality.”
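The constraint described above, many flows competing for a small number of hardware rule slots, can be illustrated with a short sketch. It is not FasTrak’s actual selection logic, which this article does not describe. Ranking flows by packet rate is an assumed stand-in for “benefit,” and the flow names, rates and slot count are hypothetical.

```python
from typing import Dict, List


def pick_flows_to_offload(packets_per_sec: Dict[str, int],
                          rule_slots: int) -> List[str]:
    """Choose which flows get a hardware 'express lane'.

    Assumption for illustration: the busiest flows benefit most, so flows
    are ranked by packet rate and the top rule_slots are offloaded.
    All remaining flows stay on the software (hypervisor) path.
    """
    ranked = sorted(packets_per_sec,
                    key=lambda flow: packets_per_sec[flow],
                    reverse=True)
    return ranked[:max(rule_slots, 0)]


# Hypothetical per-flow packet rates observed on one server.
flows = {
    "vm1->db": 90_000,
    "vm2->cache": 40_000,
    "vm3->web": 1_500,
    "vm4->log": 200,
}
offloaded = pick_flows_to_offload(flows, rule_slots=2)
print(offloaded)
```

With only two slots available in this example, the two heaviest flows move to the switch while the lighter ones continue to be processed in the hypervisor.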
Fat Trees vs. Aspen Trees
CNS research scientist Porter also moderated a CoNEXT session on Dec. 10 about…. During the session, recent CSE alumna Meg Walraed-Sullivan (Ph.D. ’12), who is now at Microsoft Research, presented a paper on “Aspen Trees: Balancing Data Center Fault Tolerance, Scalability and Cost.” The paper was co-authored by her Ph.D. advisors, CSE professors Amin Vahdat (on leave at Google) and Keith Marzullo (recently on leave at NSF). The paper flows from Walraed-Sullivan’s dissertation at UCSD, which introduced a new class of network topologies called ‘Aspen trees,’ named after Aspen trees in nature, which share a common root system.
Large-scale data center infrastructures typically use a multi-rooted, 'fat tree' topology, which provides diverse yet short paths between end hosts. A drawback of this type of topology is that a single link failure can disconnect a portion of the network’s hosts for a substantial period of time (while updated routing information propagates to every switch in the tree). According to the CoNEXT paper, this shortcoming makes the fat tree less suited for use in data centers that require the highest levels of availability. Alternatively, Aspen tree topologies can provide the high throughput and path multiplicity of current data center network topologies, while also allowing a network operator to select a particular point on the spectrum of scalability, network size, and fault tolerance – affording data center operators the ability to react to failures locally.
Walraed-Sullivan outlined a corresponding failure-notification protocol, ANP, whose “notifications require less processing time, travel shorter distances, and are sent to fewer switches, significantly reducing re-convergence time and control overhead in the wake of a link failure or recovery.” The paper spelled out the tradeoffs among fault tolerance, scalability and network costs for data centers using an Aspen tree topology, concluding that “Aspen trees provide decreased convergence times to improve a data center’s availability, at the expense of scalability (e.g., reduced host count) or financial cost (e.g., increased network size).”
Jacobs School of Engineering