OpenSSH Key Shielding

                      18 December 2019




1.  How key shielding works

On June 21, 2019, support for SSH key shielding was introduced
into the OpenBSD tree, from which the OpenSSH releases are
derived. SSH key shielding is a measure intended to protect
private keys in RAM against attacks that abuse bugs in the
speculative execution of current CPUs.[0] This functionality
has been part of OpenSSH since the 8.1 release. SSH private
keys are now held in memory in a shielded form; a key is only
unshielded while it is being used and is re-shielded as soon
as it is no longer in active use. When a key is shielded, it
is encrypted in memory with AES-256-CTR; this is how it works:

1.   A  prekey is generated, which is 16 KiB of random bytes
     obtained through arc4random_buf(3).

2.   The prekey is then hashed using SHA‐512, of  which  the
     first  32  bytes  form  the encryption key and the next
     16 bytes form the IV (CTR).

3.   The private key is serialized.

4.   The serialized private key  is  padded  to  the  cipher
     block size (16 bytes).

5.   The serialized private key is then encrypted using
     AES-256-CTR with the key and IV derived in step 2.

6.   The  SSH key struct is replaced with one that only con‐
     tains the public key, the encrypted private key and the
     prekey.

7.   All  secrets  that  were handled are zeroed: the cipher
     context, the derived key, the derived IV, the  old  SSH
     key structs and the serialized private key.
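
     Steps 1 and 2 can be sketched in a few lines. This is a
minimal illustration in Python rather than OpenSSH's C, and it
makes some assumptions: os.urandom() stands in for
arc4random_buf(3), the names PREKEY_LEN, KEY_LEN, IV_LEN and
derive_key_iv are made up for this example, and the AES
encryption itself is left out.

```python
import hashlib
import os

PREKEY_LEN = 16 * 1024  # 16 KiB of random bytes (step 1)
KEY_LEN = 32            # AES-256 key length in bytes
IV_LEN = 16             # IV (initial counter block) length in bytes


def derive_key_iv(prekey):
    """Hash the prekey with SHA-512 and split the digest (step 2)."""
    digest = hashlib.sha512(prekey).digest()  # 64 bytes
    key = digest[:KEY_LEN]                    # first 32 bytes
    iv = digest[KEY_LEN:KEY_LEN + IV_LEN]     # next 16 bytes
    return key, iv


# Assumption: os.urandom() is only a stand-in for arc4random_buf(3).
prekey = os.urandom(PREKEY_LEN)
key, iv = derive_key_iv(prekey)
print(len(prekey), len(key), len(iv))  # prints "16384 32 16"
```

     The last 16 bytes of the 64-byte digest are simply not
used in this sketch.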

In short, 16 KiB of random data are hashed to derive an
encryption key and IV, which are then used to encrypt the key
in memory.

2.  Thoughts on the prekey

Because cryptographic hash functions exhibit the avalanche
effect,[1] getting even one bit of the prekey wrong results in
a completely different hash, and therefore in the wrong key
and IV. A new prekey is generated every time the key is used,
so any partial progress on exfiltrating the prekey is lost
every time the key is actually used.
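
     The effect of a single wrong bit is easy to observe. The
following sketch flips one bit in a 16 KiB buffer and counts
how many bits of the SHA-512 digest change. The all-zero
buffer and the helper differing_bits are assumptions made for
reproducibility; a real prekey is random.

```python
import hashlib

PREKEY_LEN = 16 * 1024


def differing_bits(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A fixed all-zero "prekey" keeps the example reproducible.
prekey = bytearray(PREKEY_LEN)
guess = bytearray(prekey)
guess[1234] ^= 0x01  # the attacker's copy is wrong in exactly one bit

good = hashlib.sha512(prekey).digest()
bad = hashlib.sha512(guess).digest()

print(differing_bits(prekey, guess))  # prints 1
print(differing_bits(good, bad))      # roughly half of the 512 digest bits
```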

     However, there is an attractive target with significantly
less state than 16 KiB: the random number generator. The
arc4random_buf(3) random number generator operates largely in
userspace. It gets entropy either from OpenSSL (if linked with
OpenSSL) or from the operating system (the latter is always
the case on OpenBSD); it obtains external entropy to seed
itself on initialization and thereafter only every 1600000
bytes (1.6 MB) of output. Its state consists of only 64 bytes
(namely, a ChaCha20 context, see openbsd-compat/arc4random.c
and openbsd-compat/chacha_private.h in OpenSSH-portable). Once
that state is recovered, it becomes fairly trivial to predict
the prekey: regenerate the random output and try every
possible start/end position of the prekey within it until
decryption with the derived key and IV succeeds.

     I’m not sure if this is practical, however. While 64 bytes
is significantly less data than 16 KiB, it’s still a decent
amount of data to extract with limited verifiability: it is
hard to locate in memory because it is pseudo-random, and
checking a candidate state against the generator’s actual
output is likely to be difficult. Chances are that the
ChaCha20 state has already changed by the time all the bits
required to reconstruct it have been obtained. All of that
also assumes that the side-channel attack does not need the
code paths that touch the state to run more than once: every
code path that accesses the ChaCha20 state also modifies it,
so the whole state must be exfiltrated in one go before it is
lost on the next invocation. Furthermore, a busy server will
likely have torn through the 1.6 MB of random data and caused
fresh entropy to be retrieved from the operating system as
well.

     There is also an in-memory buffer of random bytes, which
is 1024 bytes long. That is one sixteenth of the size of the
prekey, so extracting it is useless unless all of it can be
extracted many times in succession while the prekey is being
generated, which strikes me as difficult. The random bytes in
the buffer are also replaced with zeroes after they are
consumed.
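
     The sizes mentioned in this section can be put side by
side with a little arithmetic. The constant names below are
made up for this example. The last two figures are rough
bounds: they assume that nothing else draws from the generator
and that every byte of the buffer is handed out to callers.

```python
PREKEY_LEN = 16 * 1024     # bytes per prekey
CHACHA_STATE_LEN = 64      # bytes of ChaCha20 state
RNG_BUFFER_LEN = 1024      # bytes in the in-memory buffer of random output
RESEED_INTERVAL = 1600000  # bytes of output between reseeds

state_ratio = PREKEY_LEN // CHACHA_STATE_LEN     # prekey size vs. state size
buffers_per_prekey = PREKEY_LEN // RNG_BUFFER_LEN  # lower bound on buffer fills
prekeys_per_reseed = RESEED_INTERVAL // PREKEY_LEN  # upper bound per reseed

print(state_ratio)         # prints 256
print(buffers_per_prekey)  # prints 16
print(prekeys_per_reseed)  # prints 97
```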

     (Disclaimer: I am not very well-versed in the intricacies
and practicability of exploitation of speculative execution
vulnerabilities. Corrections would be greatly appreciated!)

3.  Cryptographic notes

Notably, there is no authentication of the encrypted key; I’d
imagine that authentication is not necessary because
modification of memory is not part of the threat model (key
shielding tries to guard against key exfiltration through
limited side channels). The code does, however, check that
deserialization succeeds and, for some reason, that the
padding is valid.
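
     A padding check of this kind could look like the sketch
below. It assumes that the padding is deterministic, with pad
bytes counting up 1, 2, 3 and so on; that is an assumption
about sshkey.c which is not stated above, and the function
names and the placeholder data are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pad(serialized):
    """Append bytes 1, 2, 3, ... until the length is a multiple of BLOCK_SIZE."""
    pad_len = -len(serialized) % BLOCK_SIZE
    return serialized + bytes(range(1, pad_len + 1))


def padding_is_valid(trailing):
    """Check that the bytes left over after deserialization are 1, 2, 3, ..."""
    return all(b == ((i + 1) & 0xFF) for i, b in enumerate(trailing))


serialized = b"hypothetical serialized key"  # 27 bytes of placeholder data
padded = pad(serialized)
print(len(padded))                                 # prints 32
print(padding_is_valid(padded[len(serialized):]))  # prints True
print(padding_is_valid(b"\x01\x02\x00"))           # prints False
```

     Such a check is not authentication: it only catches
leftover bytes that do not look like padding.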

     Padding the serialized key is not necessary for
AES-256-CTR, as CTR mode effectively turns AES into a stream
cipher. The key is likely padded anyway because the OpenSSH
project may be planning to swap out the cipher algorithm later
down the road; this is suggested by a comment in the code:

#define SSHKEY_SHIELD_CIPHER  "aes256-ctr" /* XXX want AES-EME */
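
     Why counter mode needs no padding can be shown with a toy
construction. The sketch below is not AES: SHA-256 stands in
for the AES block function so that it runs with the Python
standard library alone, and the function names and the fixed
key and IV are made up for this example. Only the structure
(encrypt a counter, XOR the result with the data) matches CTR
mode.

```python
import hashlib


def keystream(key, iv, length):
    """Counter-mode keystream; SHA-256 stands in for the AES block function."""
    counter = int.from_bytes(iv, "big")
    out = b""
    while len(out) < length:
        block = hashlib.sha256(key + counter.to_bytes(16, "big")).digest()
        out += block[:16]                      # one 16-byte keystream block
        counter = (counter + 1) % (1 << 128)   # wrap like a 128-bit counter
    return out[:length]                        # truncate to the data length


def ctr_xor(key, iv, data):
    """Encrypt or decrypt by XORing the data with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, iv, len(data))))


key, iv = bytes(32), bytes(16)  # fixed placeholder values, not real secrets
plaintext = b"27 bytes, not a block size."
ciphertext = ctr_xor(key, iv, plaintext)
print(len(plaintext), len(ciphertext))            # prints "27 27"
print(ctr_xor(key, iv, ciphertext) == plaintext)  # prints True
```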


     I can only speculate why AES‐EME is not actually  used.
Perhaps  it proved to be too computationally expensive as it
requires two invocations of AES per block; perhaps  the  au‐
thors  were  simply  unaware that the EME patent application
had been abandoned.[2]

4.  Leftover data and blind spots

While keys are mostly stored encrypted in memory, there is
still a window during which attacks using speculative
execution could take place, namely the brief periods of time
when the keys are unshielded to be actually used. I assume
that such attacks will be significantly harder to mount,
however.

     There may also be other leftovers of the key data in
other places, such as the CPU cache. While explicit_bzero(3)
guarantees to clear the given block of memory by overwriting
it with zeroes, compilers make no guarantee that there are no
extraneous copies of data. Stronger guarantees regarding
clearing important data would be helpful in this area, both on
a language standard level for C and C++ as well as on a
compiler level (e.g. in LLVM, for other languages like Rust).
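
     The problem of extraneous copies can be illustrated at the
language level. The sketch below is only an analogy in Python,
which itself gives no guarantees about where data ends up in
memory; the placeholder secret is hypothetical. Zeroing the
original buffer does nothing to a copy that was made earlier.

```python
secret = bytearray(b"hypothetical private key bytes")
copy = bytes(secret)  # an extra copy, e.g. made while serializing

# Overwrite the original buffer in place, loosely analogous to explicit_bzero(3).
for i in range(len(secret)):
    secret[i] = 0

print(secret == bytearray(len(secret)))           # prints True
print(copy == b"hypothetical private key bytes")  # prints True: the copy survives
```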

5.  Calls to action

I would personally suggest that all applications handling
important or critical key data shield their keys in a similar
manner wherever feasible, despite possible shortcomings of the
method. This may be inhibited by performance (I would imagine
that a web server would be able to serve considerably fewer
requests if it had to shield and unshield the certificate
private keys for every request) or other resource
constraints.
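
     A rough feel for part of that cost can be had by timing
only the prekey generation and hashing. This sketch is a lower
bound under stated assumptions: os.urandom() stands in for
arc4random_buf(3), and serialization, AES encryption and
zeroing are not measured. The result depends entirely on the
machine it runs on.

```python
import hashlib
import os
import time

PREKEY_LEN = 16 * 1024
ROUNDS = 1000

start = time.perf_counter()
for _ in range(ROUNDS):
    prekey = os.urandom(PREKEY_LEN)           # stand-in for arc4random_buf(3)
    digest = hashlib.sha512(prekey).digest()  # key and IV come from this digest
elapsed = time.perf_counter() - start

per_op_us = elapsed / ROUNDS * 1e6
print(f"{per_op_us:.1f} microseconds per prekey generation and hash")
```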

     Finally, I strongly urge readers to consider hardware
tokens and hardware security modules for all non-trivial key
data wherever possible. OpenSSH has been taking steps in the
direction of allowing host keys and client keys to be backed
by security keys.[3]

     To the greatest extent possible under applicable law, I
have waived all copyright and related or neighboring rights
to this blog post under the CC0 1.0 Universal Public Domain
Dedication; for details, see:
https://creativecommons.org/publicdomain/zero/1.0/legalcode