Subject: sntrup761 browsing demo: 166000 cycles keygen
From: "D. J. Bernstein"
Date: Thu, 16 Apr 2020 08:48:35 +0200
To: pqc-forum@list.nist.gov
Message-ID: <20200416064835.26539.qmail@cr.yp.to>

I'm pleased to announce software online for a demo of web browsing taking just 166000 Haswell cycles to generate a new one-time sntrup761 public key for each TLS 1.3 session. This demo uses

   (1) the Gnome web browser (client) and stunnel (server) using
   (2) a patched version of OpenSSL 1.1.1f using
   (3) a new OpenSSL ENGINE using
   (4) a new sntrup761 library.

This is joint work. Authors in alphabetical order: Daniel J. Bernstein, Billy Bob Brumley, Ming-Shing Chen (leader for #4), and Nicola Tuveri (leader for #3). Email address: authorcontact-opensslntru@box.cr.yp.to.

The new speed is much faster than previously announced speeds for sntrup761 keygen. In combination with the (recently announced) 48780 Haswell cycles for enc and 59120 Haswell cycles for dec, this new keygen speed means a total of just 273900 cycles for sntrup761 keygen+enc+dec.

The TLS 1.3 integration here uses the same basic data flow as the CECPQ2 experiment carried out by Google and Cloudflare: the client generates a one-time public key, the server encapsulates to that one-time key, and the client decapsulates, obtaining a one-time session key. Beware that this data flow is designed only to protect against attacks by future quantum computers ("transitional" security); stopping active attacks will also require long-term post-quantum identity keys.

CECPQ2 used (a minor variant of) ntruhrss701 for this data flow. The state-of-the-art (March 2020) software for ntruhrss701 takes 272028 cycles for keygen, 26116 cycles for enc, and 63632 cycles for dec, for a total of 361776 cycles. The CECPQ2 experiments showed that ntruhrss701's CPU time consumes very little of the overall TLS time. The new sntrup761 software here consumes even less time.
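To make the CECPQ2-type data flow concrete, here is a toy sketch of which party runs each KEM operation. Loud assumption: the "KEM" below is a Diffie-Hellman-style stand-in over the integers mod the prime 2^127 - 1, chosen only because Python's standard library can run it. It is not sntrup761, not post-quantum, and not secure, and the names `keygen`, `encaps`, `decaps` are hypothetical, not the libsntrup761 or OpenSSL API.

```python
import hashlib
import secrets
from typing import Tuple

# ASSUMPTION: toy DH-style KEM standing in for sntrup761.
# It is NOT post-quantum and NOT secure; it only shows who sends what.
P = 2**127 - 1  # Mersenne prime used as the toy group modulus
G = 3           # toy base element


def _kdf(shared: int) -> bytes:
    """Hash the shared group element into a 32-byte session key."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def keygen() -> Tuple[int, int]:
    """Client: fresh one-time (public, secret) key pair for one TLS session."""
    sk = 2 + secrets.randbelow(P - 3)  # secret exponent in [2, P-2]
    return pow(G, sk, P), sk


def encaps(pk: int) -> Tuple[int, bytes]:
    """Server: encapsulate to the client's one-time public key."""
    r = 2 + secrets.randbelow(P - 3)
    ciphertext = pow(G, r, P)
    return ciphertext, _kdf(pow(pk, r, P))


def decaps(sk: int, ciphertext: int) -> bytes:
    """Client: recover the same session key from the server's ciphertext."""
    return _kdf(pow(ciphertext, sk, P))


# One handshake: pk travels client -> server, ciphertext server -> client.
pk, sk = keygen()                # client (key_share in ClientHello)
ct, server_key = encaps(pk)      # server (key_share in ServerHello)
client_key = decaps(sk, ct)      # client
assert client_key == server_key  # both sides now hold a one-time session key
```

The client runs both keygen and decapsulation, and every session needs a fresh keygen, which is why keygen speed matters so much in this data flow.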
The CECPQ2 experiments showed a somewhat more noticeable impact of network traffic on the slowest connections; sntrup761 sends 2197 bytes (one-time key+ciphertext) where ntruhrss701 sends 2276. Here's the comparison table (all numbers are from SUPERCOP except for the new 166000 for sntrup761 keygen):

                              sntrup761   ntruhrss701
    public-key bytes               1158          1138
    ciphertext bytes               1039          1138
    pk+ciphertext bytes            2197          2276
    keygen cycles                166000        272028
    enc cycles                    48780         26116
    dec cycles                    59120         63632
    keygen+enc+dec cycles        273900        361776
    1000*bytes+cycles           2470900       2637776

This should put an end to the idea that sntrup761 keygen is too slow for TLS.

Both sntrup761 and ntruhrss701 are designed for IND-CCA2 security, as recommended in most of the NISTPQC lattice submissions and in Google's CECPQ2 announcement: "CCA2-security is worthwhile, even though TLS can do without. ... CPA vs CCA security is a subtle and dangerous distinction, and if we're going to invest in a post-quantum primitive, better it not be fragile." Taking away IND-CCA2 security would speed up both ntruhrss701 and sntrup761 by removing some hashing and removing (basically) a copy of enc from dec.

For comparison, Google's earlier CECPQ1 experiment used an early non-IND-CCA-secure version of newhope1024, with approximately 200000 cycles of total computation and more than 4000 bytes of network traffic (more than 4 million in the 1000*bytes+cycles metric), and concluded that this "would be practical to quickly deploy".

Algorithmically, the new sntrup761 keygen speed comes from generating 32 independent keys at once, using Montgomery's trick for batch inversion. This option has been pointed out before. The new demo shows that this option fits into a CECPQ2-type data flow in TLS 1.3. The total latency of generating 32 keys is around two milliseconds; even better, keys can be generated in advance of being used, reducing the impact on TLS latency to zero (with or without Montgomery's trick).
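Montgomery's trick for batch inversion can be sketched as follows. This is a simplified illustration under a stated assumption: sntrup761 keygen inverts polynomials (in a ring mod q and a ring mod 3), whereas this sketch inverts plain integers modulo the prime q = 4591 (sntrup761's q) so that the bookkeeping is easy to follow. The name `batch_invert` is hypothetical, not a libsntrup761 function. It computes prefix products, performs one shared inversion of the full product, and then walks backwards to peel off each individual inverse.

```python
from typing import List


def batch_invert(values: List[int], modulus: int) -> List[int]:
    """Invert every element of `values` mod `modulus` with ONE inversion.

    Montgomery's trick: one shared inversion plus about 3 multiplications
    per element. Needs Python 3.8+ for pow(x, -1, m).
    """
    n = len(values)
    if n == 0:
        return []
    # prefix[i] = values[0] * ... * values[i]  (mod modulus)
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        acc = acc * v % modulus
        prefix[i] = acc
    # The single shared inversion; raises ValueError if the product
    # (hence some element of the batch) is not invertible.
    inv = pow(prefix[-1], -1, modulus)
    out = [0] * n
    # Invariant: at the top of each iteration, inv = (values[0]*...*values[i])^-1.
    for i in range(n - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % modulus  # strips values[0..i-1], leaving values[i]^-1
        inv = inv * values[i] % modulus         # now (values[0]*...*values[i-1])^-1
    out[0] = inv
    return out


Q = 4591                     # sntrup761's modulus q
batch = list(range(1, 33))   # 32 nonzero elements, one per "key"
inverses = batch_invert(batch, Q)
assert all(a * b % Q == 1 for a, b in zip(batch, inverses))
```

The cost is one inversion plus roughly three multiplications per element, so the per-key share of the inversion shrinks as the batch grows. The same bookkeeping works in any commutative ring as long as every element of the batch is invertible; if one element is not, the shared inversion fails for the whole batch.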
Of course, one still has to generate each new key at some point, but the new sntrup761 software shows that Montgomery's trick provides excellent throughput.

Montgomery's trick replaces each batch of inversions with one shared inversion and a batch of multiplications. In the context here, there is a batch of 32 inversions mod q and a batch of 32 inversions mod 3, handled with one shared inversion mod q and one shared inversion mod 3. Out of the 166000 cycles per key for a batch of 32 keys, about 30000 cycles per key are spent on the shared inversions, and simply increasing the batch size further reduces this cost. With slightly more work it is possible to share transforms across various multiplications. Consequently, the current software speed is not the limit of what can be achieved.

One can also use Montgomery's trick for some other NISTPQC submissions that rely on inversion as part of keygen, but the dramatic speedup for sntrup761 doesn't imply a similarly dramatic (or even nonzero) speedup for those other submissions. In particular, the current ntruhrss701 keygen already exploits the power-of-2 structure of its q for a Hensel lift. In the Montgomery context, the Hensel speedup rapidly vanishes, while multiplication speeds and other overheads become more important. There's _some_ gap between ntruhrss701 and sntrup761 in multiplication speed (sntrup761 aims for a higher security level, uses larger polynomials, and requires a field), but this is only about 8000 cycles per multiplication with the current software.

Demo instructions appear below.

---Dan

### Demo overview

Warning: This demo comes with no cryptographic warranties and no other security warranties. The software here is experimental, and is built upon other software with a long history of security problems, such as OpenSSL. The purpose of this demo is purely to show the sntrup761 performance achievable with a CECPQ2-type data flow for TLS 1.3.

The demo has two parts: a server side and a client side.
We recommend running each side in its own VM.

The server side uses stunnel for SSL termination. It receives TLS connections, including sntrup761 connections, and passes along the answers provided by a preexisting back-end web server, which does not need to support sntrup761 connections. For example, the demo site https://test761.cr.yp.to looks just like the preexisting site https://ntruprime.cr.yp.to, but with the extra feature of supporting sntrup761 connections. Internally, https://test761.cr.yp.to passes requests along through a local connection to the preexisting back-end web server for ntruprime.cr.yp.to. You can use https://test761.cr.yp.to as the server side of this demo, or you can set up the server side for a web server of your choice.

The client side uses Epiphany, the Gnome web browser, with no modifications to the Epiphany source code. The glib-networking library used inside Epiphany already supports OpenSSL as an option for outgoing connections, and is configured below to use this option.

Both sides use a version of OpenSSL 1.1.1f patched inside libssl to support sntrup761 as experimental group 0xfe00 for TLS 1.3, and patched inside libcrypto to include a reference implementation of sntrup761. Our new engntru library then overrides this reference implementation with a fast implementation, which in turn is built on top of our new libsntrup761. This way of using the OpenSSL ENGINE feature allows OpenSSL to take advantage of fast software implementations while allowing those implementations to be developed in separate libraries; see https://eprint.iacr.org/2018/354.

Various other applications that use OpenSSL have been verified to work with libsntrup761 via engntru. This demo focuses on stunnel on the server side and Epiphany on the client side.

### Server side

The following instructions for setting up the server side have been tested in a VM running Debian 11 (Bullseye) on a CPU supporting AVX2.
You can skip down to the client side if you simply want to try https://test761.cr.yp.to as the server.

As root:

    apt install wget python3 build-essential clang cmake ruby pkg-config -y
    adduser --disabled-password --gecos opensslntru opensslntru

As the new opensslntru user (change the first three lines for your own demo server name, demo server address, and preexisting back-end server address; of course, you should use your favorite VPN to protect the connection from this SSL terminator to the back-end server):

    EXTERNALNAME=test761.cr.yp.to
    EXTERNALADDRESS=1.2.3.4:65024 # provide TLS service on this address
    INTERNALADDRESS=5.6.7.8:80 # use existing server on this address
    export PATH=$HOME/bin:$PATH

    cd
    wget https://www.openssl.org/source/openssl-1.1.1f.tar.gz
    wget https://ntruprime.cr.yp.to/opensslntru/openssl-1.1.1f-ntru.patch
    tar -xf openssl-1.1.1f.tar.gz
    mv openssl-1.1.1f openssl-1.1.1f-ntru
    cd openssl-1.1.1f-ntru
    patch -p1 < ../openssl-1.1.1f-ntru.patch
    ./config shared --prefix=$HOME --openssldir=$HOME -Wl,-rpath=$HOME/lib
    make -j8 # a few minutes
    make test # more minutes
    make install_sw

    cd
    wget https://ntruprime.cr.yp.to/opensslntru/libsntrup761-20200415.tar.gz
    tar -xf libsntrup761-20200415.tar.gz
    cd libsntrup761-20200415
    env USE_RPATH=RUNPATH DESTDIR=$HOME CPATH=$HOME/include LIBRARY_PATH=$HOME/lib make all install test

    cd
    wget https://ntruprime.cr.yp.to/opensslntru/engntru-20200415.tar.gz
    tar -xf engntru-20200415.tar.gz
    cd engntru-20200415
    mkdir build
    cd build
    cmake -DCMAKE_PREFIX_PATH="$HOME;$HOME/usr/local" ..
    make
    make test
    make install

    cd
    wget https://www.stunnel.org/downloads/stunnel-5.56.tar.gz
    tar -xf stunnel-5.56.tar.gz
    cd stunnel-5.56
    ./configure --prefix=$HOME --with-ssl=$HOME LDFLAGS=-Wl,-rpath=$HOME/lib
    make
    make install

    cd
    mkdir service
    cd service
    openssl req -x509 -sha256 -nodes -newkey rsa:2048 \
      -keyout "$EXTERNALNAME.key" -days 730 -out "$EXTERNALNAME.crt" \
      -subj "/CN=$EXTERNALNAME" -config /etc/ssl/openssl.cnf
    (
      echo "key = $EXTERNALNAME.key"
      echo "cert = $EXTERNALNAME.crt"
      echo 'foreground = yes'
      echo 'engine = engntru'
      echo 'engineDefault = ALL'
      echo '[forward]'
      echo "accept = $EXTERNALADDRESS"
      echo "connect = $INTERNALADDRESS"
      echo 'curves = SNTRUP761:X25519:P-256'
      echo 'config = MinProtocol:TLSv1.2'
      echo 'ciphers = ECDHE+CHACHA20:ECDHE+AES256:ECDHE+AES128:!aNULL:!eNULL:!LOW:!EXPORT:!DES:!3DES:!RC4:!MD5:!PSK:!SRP:!DSS:!aECDSA'
    ) > stunnel.conf

As root:

    (
      echo '[Unit]'
      echo 'Description=opensslntru forwarding'
      echo 'DefaultDependencies=no'
      echo 'After=network.target'
      echo ''
      echo '[Service]'
      echo 'Type=simple'
      echo 'User=opensslntru'
      echo 'Group=opensslntru'
      echo 'WorkingDirectory=/home/opensslntru/service'
      echo 'ExecStart=/home/opensslntru/bin/stunnel stunnel.conf'
      echo ''
      echo '[Install]'
      echo 'WantedBy=default.target'
    ) > /etc/systemd/system/opensslntru.service
    systemctl restart opensslntru

At this point the server should be working. Try connecting to the server's external address with any browser. The certificate is self-signed; signing it with Let's Encrypt is recommended but is outside the scope of these instructions.

This stunnel configuration passes SNI along from the client to the server, so the client is free to access any server name provided by the server. For example, almost all *.cr.yp.to sites are hosted on the same back-end server and can now be retrieved through sntrup761, although for the moment this is announced to the client (and signed) only for test761.cr.yp.to.
You can advertise multiple names on the same server through the same stunnel configuration by adding those names to DNS and creating an appropriate certificate. You can instead configure stunnel to forward different SNI choices to different servers with different certificates.

### Client side

The following instructions for setting up the client side have been tested in a VM running Debian 10 (Buster) on a CPU supporting AVX2.

As root:

    apt install wget python3 build-essential clang cmake \
      ruby pkg-config epiphany-browser meson gnome-pkg-tools \
      libglib2.0-dev libproxy-dev \
      gsettings-desktop-schemas-dev ca-certificates -y
    adduser --disabled-password --gecos opensslntru opensslntru

As the new opensslntru user:

    export PATH=$HOME/bin:$PATH

    cd
    wget https://www.openssl.org/source/openssl-1.1.1f.tar.gz
    wget https://ntruprime.cr.yp.to/opensslntru/openssl-1.1.1f-ntru.patch
    tar -xf openssl-1.1.1f.tar.gz
    mv openssl-1.1.1f openssl-1.1.1f-ntru
    cd openssl-1.1.1f-ntru
    patch -p1 < ../openssl-1.1.1f-ntru.patch
    ./config shared --prefix=$HOME --openssldir=$HOME -Wl,-rpath=$HOME/lib
    make -j8 # a few minutes
    make test # more minutes
    make install_sw

    cd
    wget https://ntruprime.cr.yp.to/opensslntru/libsntrup761-20200415.tar.gz
    tar -xf libsntrup761-20200415.tar.gz
    cd libsntrup761-20200415
    env USE_RPATH=RUNPATH DESTDIR=$HOME CPATH=$HOME/include LIBRARY_PATH=$HOME/lib make all install test

    cd
    wget https://ntruprime.cr.yp.to/opensslntru/engntru-20200415.tar.gz
    tar -xf engntru-20200415.tar.gz
    cd engntru-20200415
    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH="$HOME;$HOME/usr/local" ..
    make
    make test
    make install

    cd
    git clone --branch 2.60.2 https://gitlab.gnome.org/GNOME/glib-networking.git
    cd glib-networking
    mkdir build
    cd build
    env PKG_CONFIG_PATH=$HOME/lib/pkgconfig CPATH=$HOME/include LIBRARY_PATH=$HOME/lib meson --prefix=$HOME -Dopenssl=enabled -Dgnutls=disabled ..
    ninja
    ninja install

    cd
    wget https://ntruprime.cr.yp.to/opensslntru/openssl-engntru.cnf
    export OPENSSL_CONF=$HOME/openssl-engntru.cnf
    export LD_LIBRARY_PATH=$HOME/lib
    export GIO_MODULE_DIR=$HOME/lib/x86_64-linux-gnu/gio/modules
    export ENGNTRU_DEBUG=4 # to watch engntru activating
    ln -s /etc/ssl/certs $HOME/certs
    epiphany https://test761.cr.yp.to

You should be able to browse to this demo server (using sntrup761), to any other demo servers you set up above (using sntrup761), and to other sites (typically not using sntrup761 yet). The ENGNTRU_DEBUG=4 log information in the terminal includes a note for each sntrup761 keygen, a note for each sntrup761 dec, and a note for each computation of a batch of 32 keys.
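The log notes described above (one per keygen, plus one per batch of 32 keys) fit a pattern in which keys are computed 32 at a time and then handed out one per TLS session. The sketch below illustrates that pattern, including generating keys in advance so that handshakes do not wait. It is a hypothetical illustration only: it does not describe engntru's actual internals, and `KeyPool`, `take`, `refill`, and `toy_batch_keygen` are made-up names; the toy keygen returns dummy byte strings rather than sntrup761 keys.

```python
import itertools
from collections import deque
from typing import Callable, Deque, List, Tuple

# Placeholder (public key, secret key) pair; real keys would come from sntrup761.
KeyPair = Tuple[bytes, bytes]


class KeyPool:
    """Hypothetical pool: hands out one fresh key pair per session,
    refilling in batches (where batch inversion would pay off)."""

    def __init__(self, batch_keygen: Callable[[int], List[KeyPair]],
                 batch_size: int = 32) -> None:
        self._batch_keygen = batch_keygen
        self._batch_size = batch_size
        self._keys: Deque[KeyPair] = deque()
        self.batches_computed = 0

    def refill(self) -> None:
        """Generate a whole batch at once; can be called ahead of demand."""
        self._keys.extend(self._batch_keygen(self._batch_size))
        self.batches_computed += 1

    def take(self) -> KeyPair:
        """Return a never-used key pair; each pair serves exactly one session."""
        if not self._keys:
            self.refill()  # lazy fallback: this handshake pays for the batch
        return self._keys.popleft()


_next_id = itertools.count()


def toy_batch_keygen(n: int) -> List[KeyPair]:
    """Stand-in for a batched keygen: returns n distinct dummy key pairs."""
    pairs = []
    for _ in range(n):
        i = next(_next_id)
        pairs.append((b"pk%d" % i, b"sk%d" % i))
    return pairs


pool = KeyPool(toy_batch_keygen)
pool.refill()                                    # precompute before any handshake
session_keys = [pool.take() for _ in range(40)]  # 40 TLS handshakes
print(pool.batches_computed)                     # 2: the 33rd handshake refilled
```

Here the 33rd handshake pays for the refill. Calling `refill()` from a background task before the pool runs dry would keep batch generation entirely off the handshake path, which is the "generated in advance" option mentioned earlier.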