Traverse Ten64

by Traverse Technologies

An eight-core ARM64 networking platform with mainline Linux support

Nov 16, 2020

Crypto & AI Acceleration

by Mathew M

One key feature that differentiates Ten64 from general-purpose and media-oriented appliances is the networking-oriented acceleration built into its LS1088 System-on-Chip.

The previous 10G Options & Performance post described some of the options available for improving packet routing performance, up to and including the programmable offload engine (AIOP).

There are two other workloads you can accelerate on Ten64. In this post, we will describe how Ten64 can accelerate cryptography (important for VPNs) and AI workloads using an AI acceleration card.

Cryptographic & VPN Acceleration

The LS1088 SoC provides two separate methods of cryptography acceleration:

Method 1: Acceleration via the ARMv8 cryptography extension

This provides acceleration for AES, SHA-1, SHA-224 and SHA-256. It is analogous to AES-NI on most modern x86 processors. The extension is optional and not present on all ARM-powered processors, but the LS1088 implements it. You can check whether it is available on your ARM machine by looking at the Features flags in /proc/cpuinfo:

Ten64 supports AES, SHA1, SHA2, and PMULL (polynomial long multiply)
$ cat /proc/cpuinfo  | grep Features | head -n 1
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Raspberry Pi 3 and 4 do not implement the crypto extension
$ cat /proc/cpuinfo  | grep Features | head -n 1
Features        : fp asimd evtstrm crc32 cpuid
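The same check can be scripted. Below is a minimal Python sketch, assuming the arm64 Linux /proc/cpuinfo layout shown above (a "Features : ..." line of space-separated flags). The function name crypto_features is our own hypothetical helper, not part of any Ten64 tooling.

```python
# Sketch: report which ARMv8 crypto extension flags appear in /proc/cpuinfo.
# Assumes the arm64 Linux format: "Features : fp asimd evtstrm aes ...".

CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_features(cpuinfo_text: str) -> dict:
    """Map each crypto flag to True/False using the first 'Features' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags = set(value.split())
            return {flag: flag in flags for flag in CRYPTO_FLAGS}
    # No 'Features' line (e.g. an x86 machine): treat every flag as absent.
    return {flag: False for flag in CRYPTO_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(crypto_features(f.read()))
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

Fed the Ten64 line above, every flag comes back True; fed the Raspberry Pi line, every flag comes back False. Matching is on whole flags, so "sha2" does not accidentally match a longer flag such as "sha512".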

To illustrate the difference, we ran OpenSSL’s speed benchmark on the Ten64 and on the Raspberry Pi 3 and 4. The Raspberry Pi 3 uses the same Cortex-A53 core as the LS1088, but does not have the ARMv8 crypto extension.

The newer Raspberry Pi 4 uses the Cortex-A72, a faster out-of-order core, but it also lacks the cryptography extension.

In our results, the lack of AES acceleration was a major handicap: the LS1088 was 18-22x faster in this particular benchmark.

The ARMv8 cryptography extension is used by OpenSSL and wolfSSL, and by the arm/sha*-ce modules in the Linux kernel, so most applications using these libraries should be able to take advantage of it.

Method 2: Acceleration via the NXP SEC engine

The NXP SEC engine (also known as CAAM) is NXP’s encryption acceleration block. It is designed to accelerate communications workloads such as IPSec, as well as some earlier versions of TLS and the ciphers used in standards such as 3G/UMTS (Kasumi, Snow) and Wi-Fi. It also implements some older but still relevant algorithms, such as RSA and 3DES.

The SEC engine is best at accelerating packets flowing to and from the network stack in the kernel (or similar environments such as DPDK). Because data must be transferred in and out of the engine via DMA, each operation has a higher latency than with the ARMv8 crypto extensions, which are part of the CPU instruction set.

It is possible to use the SEC engine from userspace through mechanisms such as cryptodev, but you may get better performance from the CPU instructions.
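The latency trade-off can be put into rough numbers with a simple cost model: the CPU pays only for processing the data, while an offload engine pays a fixed DMA/setup overhead per request but may then process the data faster. This is a hypothetical sketch; the model, the function names, and the example rates are assumptions for illustration, not measurements of the LS1088 or its SEC engine.

```python
# Hypothetical cost model, for illustration only: the rates and overhead used
# below are made-up values, not measurements of the LS1088 or its SEC engine.
# Rates are in MB/s (1 MB = 1e6 bytes), so bytes / rate gives microseconds.


def cpu_time_us(nbytes: float, cpu_mb_s: float) -> float:
    """Inline CPU crypto: no setup cost, just processing time."""
    return nbytes / cpu_mb_s


def offload_time_us(nbytes: float, engine_mb_s: float, overhead_us: float) -> float:
    """Offloaded crypto: fixed DMA/setup overhead plus engine processing time."""
    return overhead_us + nbytes / engine_mb_s


def break_even_bytes(cpu_mb_s: float, engine_mb_s: float, overhead_us: float):
    """Request size at which both paths take equal time, or None if never."""
    if engine_mb_s <= cpu_mb_s:
        return None  # a slower-or-equal engine never recovers its overhead
    # overhead + n/E = n/C  =>  n = overhead / (1/C - 1/E)
    return overhead_us / (1 / cpu_mb_s - 1 / engine_mb_s)


if __name__ == "__main__":
    CPU, ENGINE, OVERHEAD = 500.0, 1000.0, 5.0  # hypothetical values
    for size in (64, 1500, 16384):
        print(size, cpu_time_us(size, CPU), offload_time_us(size, ENGINE, OVERHEAD))
    print("break-even:", break_even_bytes(CPU, ENGINE, OVERHEAD), "bytes")
```

With these made-up values, any single request smaller than about 5,000 bytes finishes sooner on the CPU. The model deliberately ignores two things that favour the SEC engine in the kernel IPSec path: the CPU is free to do other work while the engine runs, and many packets can be in flight at once, which hides the per-request latency.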

IPSec throughput comparison between ARMv8 crypto and the SEC engine. We anticipate that SEC engine throughput can be improved further in the future.

Nonetheless, the SEC engine delivers impressive performance for IPSec workloads, because it can accelerate not only the encryption cipher but a whole chain of related operations, such as combined encryption and HMAC authentication (AEAD). These combined algorithms are visible in /proc/crypto when the SEC drivers are compiled into the kernel:

$ cat /proc/crypto | grep aes | grep caam
driver       : cmac-aes-caam
driver       : xcbc-aes-caam
driver       : seqiv-authenc-hmac-sha512-rfc3686-ctr-aes-caam
driver       : authenc-hmac-sha512-rfc3686-ctr-aes-caam
driver       : seqiv-authenc-hmac-sha384-rfc3686-ctr-aes-caam
driver       : authenc-hmac-sha384-rfc3686-ctr-aes-caam
driver       : seqiv-authenc-hmac-sha256-rfc3686-ctr-aes-caam
driver       : authenc-hmac-sha256-rfc3686-ctr-aes-caam
driver       : seqiv-authenc-hmac-sha224-rfc3686-ctr-aes-caam
driver       : authenc-hmac-sha224-rfc3686-ctr-aes-caam
driver       : seqiv-authenc-hmac-sha1-rfc3686-ctr-aes-caam
driver       : authenc-hmac-sha1-rfc3686-ctr-aes-caam
driver       : seqiv-authenc-hmac-md5-rfc3686-ctr-aes-caam
driver       : authenc-hmac-md5-rfc3686-ctr-aes-caam
driver       : echainiv-authenc-hmac-sha512-cbc-aes-caam
driver       : authenc-hmac-sha512-cbc-aes-caam
driver       : echainiv-authenc-hmac-sha384-cbc-aes-caam
driver       : authenc-hmac-sha384-cbc-aes-caam
driver       : echainiv-authenc-hmac-sha256-cbc-aes-caam
driver       : authenc-hmac-sha256-cbc-aes-caam
driver       : echainiv-authenc-hmac-sha224-cbc-aes-caam
driver       : authenc-hmac-sha224-cbc-aes-caam
driver       : echainiv-authenc-hmac-sha1-cbc-aes-caam
driver       : authenc-hmac-sha1-cbc-aes-caam
driver       : echainiv-authenc-hmac-md5-cbc-aes-caam
driver       : authenc-hmac-md5-cbc-aes-caam
driver       : gcm-aes-caam
driver       : rfc4543-gcm-aes-caam
driver       : rfc4106-gcm-aes-caam
driver       : ecb-aes-caam
driver       : xts-aes-caam
driver       : rfc3686-ctr-aes-caam
driver       : ctr-aes-caam
driver       : cbc-aes-caam
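To inspect this output programmatically, for example to see which driver the kernel will prefer for a given algorithm, a small parser is enough. This is a minimal Python sketch, assuming the blank-line-separated "key : value" layout of /proc/crypto. The kernel generally selects the registered implementation with the highest priority for an algorithm name, which is what the hypothetical preferred_driver helper approximates; all helper names here are our own.

```python
# Sketch: parse /proc/crypto into dictionaries and query the drivers.
# Assumes the usual layout: "key : value" lines, entries separated by blank lines.


def parse_proc_crypto(text: str) -> list:
    """Return one dict per registered algorithm implementation."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line ends the current entry
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def drivers_matching(entries: list, *substrings: str) -> list:
    """Sorted driver names containing every given substring."""
    return sorted(
        e["driver"] for e in entries
        if "driver" in e and all(s in e["driver"] for s in substrings)
    )


def preferred_driver(entries: list, name: str):
    """Highest-priority driver registered for an algorithm name, or None."""
    candidates = [e for e in entries if e.get("name") == name and "priority" in e]
    if not candidates:
        return None
    return max(candidates, key=lambda e: int(e["priority"]))["driver"]


if __name__ == "__main__":
    try:
        with open("/proc/crypto") as f:
            entries = parse_proc_crypto(f.read())
    except OSError:
        entries = []  # not Linux, or /proc/crypto unavailable
    # Roughly the same list as the grep pipeline above, driver names only.
    for driver in drivers_matching(entries, "aes", "caam"):
        print(driver)
    print("cbc(aes) ->", preferred_driver(entries, "cbc(aes)"))
```

Passing different substrings, such as "aes" and "-ce", lists the ARMv8 crypto extension drivers instead, which makes it easy to compare which implementation wins for each algorithm name.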

(For a full output from /proc/crypto, see the cryptographic acceleration page in the Ten64 manual.)

IPSec may not be the easiest VPN solution to use, especially compared with alternatives like OpenVPN and WireGuard, but this is balanced by its ubiquity (many operating systems and network appliances implement it) and its ability to leverage hardware offloads such as the SEC engine.

AI acceleration

If you are interested in machine learning and AI, you may like to know that the Coral AI EdgeTPU cards work in the Ten64. The Coral PCIe cards are available in both Mini PCIe and M.2 form factors.

The Coral Mini PCIe card installed on a Ten64 board

While we haven’t had an opportunity to piece together an AI/ML demo of our own, the TensorFlow Lite image classification example shows an impressive speedup:

Unaccelerated (CPU only)

----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
140.4ms
138.9ms
139.1ms
139.3ms
139.3ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.77734

EdgeTPU accelerated

----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.6ms
2.5ms
2.4ms
2.4ms
2.4ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.77734

That is a speedup of over 50x, which opens up possibilities for real-time processing, such as classifying objects in a video feed.
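The speedup figure can be reproduced from the timings above. This sketch drops the first run of each list since, as the script's own note says, the first EdgeTPU inference includes loading the model into Edge TPU memory. The helper name steady_state_speedup is hypothetical.

```python
from statistics import mean

# Inference times (ms) copied from the two TensorFlow Lite runs above.
CPU_MS = [140.4, 138.9, 139.1, 139.3, 139.3]
EDGETPU_MS = [12.6, 2.5, 2.4, 2.4, 2.4]


def steady_state_speedup(baseline: list, accelerated: list, warmup: int = 1) -> float:
    """Ratio of mean run times, ignoring the first `warmup` runs of each list."""
    return mean(baseline[warmup:]) / mean(accelerated[warmup:])


if __name__ == "__main__":
    print(f"{steady_state_speedup(CPU_MS, EDGETPU_MS):.1f}x")  # prints 57.4x
```

Including the first run in both averages (warmup=0) gives roughly 31x instead, which is why the model-loading run is excluded when quoting steady-state performance.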

For information on how to set up a development environment for the Coral EdgeTPU, see our application note.

About the Author

Mathew M

mcbridematt  ·   Melbourne, Australia



Product Choices

$650

Ten64 Complete Kit

You get a fully assembled and tested Ten64 mainboard installed in a custom metal enclosure with a fan, a 60 W power supply with regional power cord, a USB-C console cable, a recovery microSD card, a SIM eject tool, and a hex key, as you'd expect with any good piece of hardware. RAM with ECC not included.


$56

NVMe SSD

This SanDisk solid-state drive (SSD) fits inside the standard Ten64 enclosure and interfaces to the mainboard via NVMe. The 128 GB drive (P/N SDAPMUW-128G-1022) is compatible with both the M.2 Key M and M.2 Key B slots on Ten64's mainboard, whereas the 256 GB drive (P/N SDBPNPZ-256G) and 512 GB drive (P/N SDBPNPZ-512G) are only compatible with the M.2 Key M slot. These drives are only available when purchased with a Ten64. User installation required.


$70

NAS-grade SATA 2.5" SSD

These NAS-grade solid-state drives (SSDs) are rated to last much longer than consumer models, so they are well suited to NAS bulk storage. Choose from 256 GB (AP256GPPSS25-R), 512 GB (AP512GPPSS25-R), and 1 TB (AP1TPPSS25-R) capacities. These drives are only available when purchased with a Ten64. User installation required.


$4

Flexible SATA Cable

One flexible cable (3M part number 5602-44-0142A-300) for connecting a SATA drive to a SATA controller board. You will need one cable per drive. Free shipping only available when shipped with another Ten64 product.

Credits

Traverse Technologies

Traverse is a design house focusing on broadband and machine-to-machine applications. Our key areas of expertise are in wireline (xDSL), wireless (LTE), and embedded Linux with an aim to leverage open source technologies such as Linux and OpenWrt as much as possible.


Guy Ellis

SI and DFM Engineer

Mathew McBride

Product Architect

Brett Hahnel

PCB Layout and CAD

Sean Yang

SW Developer

Dennis Monks

SW Dev Leader

Vaughn Coetzee

Firmware Developer


SRXGlobal

Recommended

Contract Manufacturer
