Overcoming Tail Latency with Netronome SmartNICs

By Netronome | Jul 11, 2019

Netronome delivers smart network solutions for today’s most demanding problems. One problem that continues to plague hyperscalers is tail latency: the slow delivery of a small fraction of requests, which has an outsized effect on Web 2.0 applications. This blog and video explain why tail latency occurs, why it is important for hyperscalers to overcome it, and how Netronome can solve the problem.

Web 2.0 apps such as Facebook, Twitter or LinkedIn are not single applications. They rely on a multitude of sub-applications or services that run in parallel to provide a complete web experience. These services can include location services, image serving, video streaming, trending stories and ad serving, each with its own application stack and data located on disaggregated storage. Every one of these services must complete its operation before the final product (the Web 2.0 app) is delivered to the customer, and that has to happen within milliseconds. The slowest service to receive all of its data therefore slows down the entire user experience.
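To make the fan-out effect concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, the latency numbers are made up, and the probability calculation assumes the services are slow independently of one another, which real services may not be.

```python
def page_latency_ms(service_latencies_ms):
    """A page that waits for every parallel service finishes with the slowest one."""
    return max(service_latencies_ms)


def prob_slow_page(n_services, per_service_slow_prob):
    """Chance that at least one of n services is slow.

    Assumes each service is slow independently with the same probability.
    """
    return 1.0 - (1.0 - per_service_slow_prob) ** n_services


if __name__ == "__main__":
    # Four fast services and one straggler: the straggler sets the page latency.
    print(page_latency_ms([12, 8, 250, 15, 9]))
    # Each service is slow 1% of the time; more services means more slow pages.
    for n in (1, 5, 20, 100):
        print(n, round(prob_slow_page(n, 0.01), 3))
```

Under these assumptions, a page that fans out to many services hits a slow path far more often than any single service does, which is why the tail of the latency distribution matters more than the average.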

The OCP Yosemite v2 server platform is highly optimized to satisfy the stringent power, performance and cost requirements that are critical for Web 2.0 applications. Yosemite’s form factor, hosting four 12-core Intel Xeon-D CPUs in a single sled and four sleds per 4U chassis, provides an ultra-low-power server with excellent density: sixteen CPUs, or servers, are packed into a 4U form factor. As with all designs, there are tradeoffs. An inherent architectural issue with the Yosemite v2 server platform is on the network side. Each Yosemite sled has only one network port, shared between its four CPUs, and that shared port creates a congestion point.




At peak times, the CPUs can require more data than the single network port, running at 50Gb/s, can provide. This congestion causes significant packet drops, which require re-transmission of data and lead to untenable tail latency that stalls the application. The end result is a degraded user experience.

Reducing Tail Latency with SmartNICs and NFP
At the core of the Netronome Agilio SmartNIC is our Network Flow Processor (NFP). Offering up to 17MB of on-chip memory, the NFP can be used for network buffering. Because Netronome SmartNICs are programmable, this on-chip buffering is used to predictively set the Explicit Congestion Notification (ECN) flag, telling the network to proactively reduce packet flow. That eliminates packet drops and the resulting re-transmission of data. Because each packet gets to its intended host without a drop or re-transmit, tail latency is significantly reduced and Web 2.0 application performance is increased.
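The general idea of marking packets before a buffer overflows can be sketched in a few lines of Python. This is not Netronome's algorithm or NFP code: it is a simple fixed-threshold marker of the kind used with DCTCP, and the class name, method names and byte sizes are all assumptions made for illustration. Only the ECN codepoint values come from RFC 3168.

```python
from collections import deque

# ECN codepoints from RFC 3168 (the two ECN bits of the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class MarkingBuffer:
    """Hypothetical packet buffer that marks CE once occupancy passes a threshold."""

    def __init__(self, capacity_bytes, mark_threshold_bytes):
        self.capacity = capacity_bytes
        self.mark_threshold = mark_threshold_bytes
        self.occupancy = 0
        self.queue = deque()

    def enqueue(self, size_bytes, ecn):
        """Return 'queued', 'marked' or 'dropped' for the arriving packet."""
        if self.occupancy + size_bytes > self.capacity:
            return "dropped"  # no room left: the case early marking tries to avoid
        result = "queued"
        # Only ECN-capable packets may be marked Congestion Experienced.
        if self.occupancy >= self.mark_threshold and ecn in (ECT0, ECT1):
            ecn = CE
            result = "marked"
        self.queue.append((size_bytes, ecn))
        self.occupancy += size_bytes
        return result

    def dequeue(self):
        """Remove and return the oldest (size, ecn) pair, or None if empty."""
        if not self.queue:
            return None
        size, ecn = self.queue.popleft()
        self.occupancy -= size
        return size, ecn
```

With a deep buffer and a marking threshold set well below capacity, senders that honor ECN see the congestion signal and slow down while there is still room to absorb the packets already in flight.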




Through a series of experiments using a number of Facebook benchmarks, Netronome has shown that using Adaptive Buffer Management on the Netronome Agilio SmartNIC effectively eliminates packet drops. The SmartNIC also reduces tail latency by 30X when measured at the 99.9th percentile of packets transmitted, enabling the Yosemite platform to handle more applications and transactions. More applications on a single server means more efficient data center operation and maximized performance per Intel Xeon-D CPU.
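For readers unfamiliar with percentile-based latency figures, the sketch below shows one common way to compute them, the nearest-rank method. It is a generic illustration with a hypothetical function name and made-up samples, not the measurement method used in the experiments above.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(pct)) keeps values such as 99.9 exact, so the rank is not
    # pushed up by floating-point rounding.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in ms: 998 fast packets and two slow ones.
    latencies = [1.0] * 998 + [50.0, 60.0]
    print(percentile(latencies, 50))    # typical packet
    print(percentile(latencies, 99.9))  # tail packet
```

The median hides the two slow packets entirely, while the 99.9th percentile exposes them, which is why tail latency is reported at high percentiles rather than as an average.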


Below is a short video based on this demonstration that was showcased at the Netronome booth during the OCP Global Summit.



During the OCP Global Summit 2019, Netronome and Facebook presented “DCTCP in the OCP Data Center.” The talk highlighted how adaptive buffer management, which is available only on the OCP-inspired Netronome Agilio 50GbE SmartNIC, eliminates packet drops. The slides can be downloaded here and the session can be viewed below.