OpenSSH Key Shielding

18 December 2019

1. How key shielding works

On June 21, 2019, support for SSH key shielding was introduced into
the OpenBSD tree, from which the OpenSSH releases are derived. SSH key
shielding is a measure intended to protect private keys in RAM against
attacks that abuse bugs in speculative execution that current CPUs
exhibit.[0] This functionality has been part of OpenSSH since the 8.1
release.

SSH private keys are now held in memory in a shielded form; keys are
only unshielded when they are used and are re-shielded as soon as they
are no longer in active use. When a key is shielded, it is encrypted
in memory with AES-256-CTR. This is how it works:

  1. A prekey is generated, which is 16 KiB of random bytes obtained
     through arc4random_buf(3).

  2. The prekey is then hashed using SHA-512; the first 32 bytes of
     the digest form the encryption key and the next 16 bytes form
     the IV (the initial counter block for CTR mode).

  3. The private key is serialized.

  4. The serialized private key is padded to the cipher block size
     (16 bytes).

  5. The serialized private key is then encrypted using AES-256-CTR
     with the key and IV derived in step 2.

  6. The SSH key struct is replaced with one that only contains the
     public key, the encrypted private key and the prekey.

  7. All secrets that were handled are zeroed: the cipher context,
     the derived key, the derived IV, the old SSH key structs and
     the serialized private key.

In short, 16 KiB of random data are hashed to derive an encryption key
and IV, which are then used to encrypt the key in memory.

2. Thoughts on the prekey

Because cryptographic hash functions exhibit the avalanche effect,[1]
getting even a single bit of the prekey wrong results in a completely
different hash, and thus in a different key and IV. A new prekey is
generated every time the key is used, so any partial progress on
exfiltrating the prekey is lost at that point. However, there is an
attractive target with significantly less state than 16 KiB: the
random number generator.

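As a rough illustration of steps 1 and 2 and of the avalanche argument
above, here is a minimal Python sketch. It is not the OpenSSH code
(which is C) and it stops before the AES-256-CTR step. os.urandom is
used as a stand-in for arc4random_buf(3), and the helper names
derive_key_iv, flip_bit and differing_bits are made up for this
example.

```python
import hashlib
import os

PREKEY_LEN = 16 * 1024  # 16 KiB of random bytes, as in step 1


def derive_key_iv(prekey):
    """Hash the prekey with SHA-512 (step 2).

    The first 32 bytes of the digest become the AES-256 key,
    the next 16 bytes become the IV.
    """
    digest = hashlib.sha512(prekey).digest()
    return digest[:32], digest[32:48]


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bits(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Assumption: os.urandom stands in for arc4random_buf(3).
prekey = os.urandom(PREKEY_LEN)
key, iv = derive_key_iv(prekey)

# An attacker who recovered the prekey with a single wrong bit
# (bit 12345 is an arbitrary choice for the demonstration).
guess = flip_bit(prekey, 12345)
bad_key, bad_iv = derive_key_iv(guess)

print(len(key), len(iv))  # 32 16
print(differing_bits(key, bad_key), "of 256 key bits differ")
```

The second line of output varies from run to run, but it should hover
around half of the 256 key bits, which is why a nearly correct prekey
is of no use to an attacker.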

The arc4random_buf(3) random number generator operates largely in
userspace. It gets entropy either from OpenSSL (if linked with
OpenSSL) or from the operating system (the latter is always true on
OpenBSD); external entropy to seed itself is obtained on
initialization and thereafter only every 1600000 bytes (1.6 MB). Its
state consists of only 64 bytes (namely, a ChaCha20 context; see
openbsd-compat/arc4random.c and openbsd-compat/chacha_private.h in
OpenSSH-portable). Once that state is recovered, it becomes fairly
trivial to reconstruct the prekey by trying all possible start and end
positions within the generated random bytes until decryption with the
derived key and IV succeeds.

I’m not sure if this is practical, however. While 64 bytes is
significantly less data than 16 KiB, it’s still a decent amount of
data to be extracted with limited verifiability: it is hard to locate
in memory as it is pseudo-random, and checking it against the actual
output of the random number generator is likely to be difficult.
Chances are that the ChaCha20 state has already changed by the time
all the bits required to reconstruct it have been obtained. All of
that also assumes that the side-channel attack does not need the code
paths interacting with the state to run more than once: every code
path that accesses the ChaCha20 state also modifies it, so the whole
state must be exfiltrated in one go before it is lost on the next
invocation. Furthermore, a busy server will likely have torn through
the 1.6 MB of random data in the meantime, causing fresh entropy to be
retrieved from the operating system as well.

There is also an in-memory buffer of random bytes, which consists of
1024 bytes. This is only a sixteenth of the size of the prekey.
Extracting it is useless unless all of it can be extracted several
times in succession while the prekey generation is taking place,
which strikes me as difficult.
The random bytes in the buffer are also replaced with zeroes after
they are consumed.

(Disclaimer: I am not very well-versed in the intricacies and
practicability of exploiting speculative execution vulnerabilities.
Corrections would be greatly appreciated!)

3. Cryptographic notes

Notably, there is no authentication of the encrypted key; I’d imagine
that authentication is not necessary because modification of memory
is not part of the threat model (key shielding tries to guard against
key exfiltration through limited side channels). The code does,
however, check that deserialization succeeds and, for some reason,
that the padding is valid as well.

Padding the serialized key is not necessary for AES-256-CTR, as CTR
mode effectively turns AES into a stream cipher. The serialized key is
likely padded because the OpenSSH project may be planning to swap out
the cipher algorithm later down the road; this is suggested by a
comment in the code:

    #define SSHKEY_SHIELD_CIPHER "aes256-ctr" /* XXX want AES-EME */

I can only speculate why AES-EME is not actually used. Perhaps it
proved to be too computationally expensive, as it requires two
invocations of AES per block; perhaps the authors were simply unaware
that the EME patent application had been abandoned.[2]

4. Leftover data and blind spots

While keys are mostly stored encrypted in memory, there are still
brief periods of time during which attacks using speculative execution
could take place, namely whenever the keys are unshielded to be
actually used. I assume that such attacks will be significantly harder
to mount, however.

There may also be leftovers of the key data in other places, such as
the CPU cache. While explicit_bzero(3) guarantees that the given block
of memory is overwritten with zeroes, compilers make no guarantee that
there are no extraneous copies of the data.
Stronger guarantees regarding clearing important data would be helpful
in this area, both on a language standard level for C and C++ and on a
compiler level (e.g. in LLVM, for other languages like Rust).

5. Calls to action

I would personally suggest that all applications handling important or
critical key data shield their keys in a similar manner wherever
feasible, despite possible shortcomings of the method. This may be
inhibited by performance (I would imagine that a web server would be
able to serve considerably fewer requests if it had to shield and
unshield the certificate private keys for every request) or other
resource constraints.

Finally, I strongly urge readers to consider hardware tokens and
hardware security modules for all non-trivial key data wherever
possible. OpenSSH has been taking steps in the direction of allowing
host keys and client keys to be backed by security keys.[3]

To the greatest extent possible under applicable law, I have waived
all copyright and related or neighboring rights to this blog post
under the CC0 1.0 Universal Public Domain Dedication; for details, see:
https://creativecommons.org/publicdomain/zero/1.0/legalcode