Securing against timing attacks with Twisted
What are timing attacks?
Timing attacks are side-channel attacks that rely on inferring secret information from operations by measuring how long they take to execute.
A complete explanation is outside of the scope of this article, but the Wikipedia article might be a good starting point for the interested reader.
Why should I care about them?
Because they can creep in before you know it, and break your otherwise fine system.
A common way they're introduced are string comparisons. String comparisons in many langauge implementations, including all implementations of Python I know of, short-circuit. They work roughly like this:
def strcmp(s1, s2): if len(s1) != len(s2): return False for c1, c2 in zip(s1, s2): if c1 != c2: return False return True
This means that as soon as they can prove the two strings are not
equal, they return False
and ignore the rest of the string. This is
a very simple and effective performance optimization. And that's
precisely why, from a security point of view, it's a liability.
Since it takes longer to compare "The quick brown fox jumps over the lazy dog" to "The quick brown fox jumps over the lazy god" than it does to compare it to "Lorem ipsum", or even a Lipsum of the same length, an attacker can use that timing information to figure out what the original string is that is being compared to, even when he shouldn't.
Why did you care?
Password resets.
The typically recommended way of doing them is to generate a random number, one that's sufficiently long that an attacker can't guess it. Then, you relay that number to the user using some alternative channel (usually a URL in an e-mail).
Bar the random number, those URLs are predictable. That means an attacker can trivially try any number he wants. If an attacker is allowed to do that enough, he could measure timing differences between different numbers (introduced by string comparison functions that short-circuit or even by the database's index) to efficiently deduce the value of a "good" number.
Also, if membership is private, an attacker may exploit timing differences in the password reset request function to farm e-mail addresses.
How do I protect against timing attacks?
Just to be clear: timing attacks are a complicated problem, and this article describes just one strategy I've applied to help secure against them.
The easiest way to prevent a timing attack is to make sure that the timing your hypothetical attacker can measure is unrelated to the work you have to do.
Since I was using Twisted and AMP, this was actually quite easy. I wrote a decorator for AMP responder functions that does exactly that:
def _immediateResponder(f): """ A decorator for responder functions that should return immediately and execute asynchronously, as a defense against timing attacks. The responder decorator should be applied after (above) this decorator:: @SomeCommand.responder @_immediateResponder def responder(...): .... This should be timing attack resistant since it is unconditional: the the AMP response is returned immediately, and the real responder is scheduled to run at the next chance the reactor has to do so. This only works with AMP commands with empty responses. That's probably a good idea anyway: almost all information you could add to the response is liable to introduce a timing attack vulnerability. Since this precludes your ability to communicate success or failure to the caller, the decorated function should return quite quickly (or, if it can't, that should be clearly documented). Otherwise, you may end up in a a race condition, where the caller assumes the operation has completed, but it is in progress or hasn't started yet. The original responder function is available on the decorated function as the ``responderFunction`` attribute. """ @functools.wraps(f) def wrapped(self, *args, **kwargs): reactor.callLater(0, f, self, *args, **kwargs) return {} wrapped.responderFunction = f return wrapped
When I get an incoming RPC call, the function doing the actual work is scheduled to run at the next reactor iteration. Then, an empty response is returned. All of this happens in amortized constant time, and independent of any secrets. As a result, it can't really leak much about them.
An abstraction too high
The above was an effective response to a proof-of-concept timing attack exploit. Unfortunately, that doesn't mean you've fixed every timing attack.
In particular, this example is a few layers of abstraction removed
from the grit of real-world I/O. Just because I returned {}
(an
empty response) immediately, doesn't mean the underlying IO happens
immediately.
In particular, the write output latency could be coerced to depend on
the computation time, because the function passed to it could be
executed before the write
. If that happens, and the time it takes is
dependant on some secret, delaying the write, the latency on the
attacker's side could be used to measure the work done.
I have not yet been able to turn the above into a working exploit.
There are a number of ways this could be mitigated. Since there's no working exploit, it's unclear if this mitigation would render a timing attack infeasible.
One way I've considered to mitigate this is to limit the time that the
reactor is blocked. In my concrete example, this was fortunately
already the case, since one of the first things it did was defer to a
thread that released the GIL (to compute the key from the password
using scrypt
). Alternatively, if you're doing this for Python
code, you could write your function cooperatively.