The aarch64-dit crate has two notable flaws:
- It is possible for the compiler to reorder operations outside the critical section where DIT is enabled.
- It does not perform speculation fencing, which is documented as necessary at least on apple platforms in this document: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Enable-DIT-for-constant-time-cryptographic-operations.
Flaw 1: Compiler Reordering
Here's an example of the first flaw: https://godbolt.org/z/8fdsnxrc5. Essentially the compiler is free to move instructions outside of the asm!("msr DIT #1") and asm!("msr DIT #0") guarded region, and you can see that it essentially compiles:
if have_dit {
dit_enable();
}
let res = sensitive(a, b);
if have_dit {
dit_disable();
}
return res;
into
if have_dit {
dit_enable();
dit_disable();
}
return sensitive(a, b);
which defeats the point, since the instructions which are data-dependent are no longer in the block where DIT is enabled.
In an ideal world, inserting compiler fences (or atomic fences, or adjusting the asm! options...) would fix this, however they only apply to memory accesses, so that does not work. The only approach I can think of that does work is to use an indirect jump whose target we obscure from the vision of the compiler. Something like:
impl Dit {
// ...snip ...
/// Call `f` with DIT enabled, returning the result. Takes care to ensure
/// that that any computations which occur during `f` do not leak outside
/// the critical section.
pub fn with<R, F: FnOnce() -> R>(s: &self, f: F) -> R {
let guard = self.enable();
let mut o: Option<F> = Some(f);
// Note: Could be unwrap_unchecked().
let f: &mut dyn FnMut() -> R = &mut || o.take().unwrap()();
core::hint::black_box(f)()
}
// ...snip...
}
This works, as demonstrated here: https://godbolt.org/z/fv1Gofjxc (note: the add instruction was changed to floating point addition just so that it's clearer in the assembly listing what is happening—you can see that the fadd takes place in the indirectly called function, which is called (the blr x8) between the msr DIT #1; and msr DIT #0 instructions.
Flaw 2: Speculation
I'm unsure if this is needed on targets besides apple (my suspicion is yes, or at least that there's no guarantee for it to be a no), but apple documents it as necessary in the linked documentation.
Essentially, you need something like this after you enable DIT, and before you disable it (actually, I think you might not need it before disabling it, since it's fine if a DIT-mode processor speculates operations outside of DIT mode).
if is_aarch64_feature_detected!("sb") {
// note: in the real impl this must happen
// in a `#[target_feature(enable = "sb")]` function)
asm!("sb");
} else {
asm!("dsb nsh", "isb sy");
}
Of course, a real impl needs to:
- Put
asm!("sb") inside a #[target_feature(enable = "sb")] function.
- Use
cpufeatures rather than is_aarch64_feature_detected (which requires adding sb support to cpufeatures).
- And so on.
A patch fixing these issues is here: master...thomcc:rust-crypto-utils:aarch64-dit-sb-fix. I don't have time to do a PR and respond to review feedback, feel free to mess with it however you please, or to take a different approach that solves the same problem.
One issue with my patch is that it leaves the Dit::enable() function present and non-deprecated, even though it's a footgun. I think probably the right call would be to remove it. But that begs the question of whether or not the API should have a different structure in other ways too.
The
aarch64-ditcrate has two notable flaws:Flaw 1: Compiler Reordering
Here's an example of the first flaw: https://godbolt.org/z/8fdsnxrc5. Essentially the compiler is free to move instructions outside of the
asm!("msr DIT #1")andasm!("msr DIT #0")guarded region, and you can see that it essentially compiles:into
which defeats the point, since the instructions which are data-dependent are no longer in the block where DIT is enabled.
In an ideal world, inserting compiler fences (or atomic fences, or adjusting the
asm!options...) would fix this, however they only apply to memory accesses, so that does not work. The only approach I can think of that does work is to use an indirect jump whose target we obscure from the vision of the compiler. Something like:This works, as demonstrated here: https://godbolt.org/z/fv1Gofjxc (note: the
addinstruction was changed to floating point addition just so that it's clearer in the assembly listing what is happening—you can see that thefaddtakes place in the indirectly called function, which is called (theblr x8) between themsr DIT #1;andmsr DIT #0instructions.Flaw 2: Speculation
I'm unsure if this is needed on targets besides apple (my suspicion is yes, or at least that there's no guarantee for it to be a no), but apple documents it as necessary in the linked documentation.
Essentially, you need something like this after you enable DIT, and before you disable it (actually, I think you might not need it before disabling it, since it's fine if a DIT-mode processor speculates operations outside of DIT mode).
Of course, a real impl needs to:
asm!("sb")inside a#[target_feature(enable = "sb")]function.cpufeaturesrather thanis_aarch64_feature_detected(which requires addingsbsupport tocpufeatures).A patch fixing these issues is here: master...thomcc:rust-crypto-utils:aarch64-dit-sb-fix. I don't have time to do a PR and respond to review feedback, feel free to mess with it however you please, or to take a different approach that solves the same problem.
One issue with my patch is that it leaves the
Dit::enable()function present and non-deprecated, even though it's a footgun. I think probably the right call would be to remove it. But that begs the question of whether or not the API should have a different structure in other ways too.