Deep Learning and Security

Attacking the security of Deep Learning systems is quite the “in” thing these days. After all, the headlines are so catchy, y’know? 
• “Hacking Stop Signs Will Cause CarCrashApocalypse!”
• “This One Sticker Will Change A Banana Into A Toaster!”
A common thread across most (almost all!) of these papers and articles is that they focus on indistinguishable changes — flipping these wee tiny pixels will cause the panda to become a vulture! — and whatnot. The weird thing is that, outside of movie-plots, there are virtually no real-world examples of Deep Learning security threats that are predicated solely on indistinguishability!
Before you get too hot-and-heavy about this though, let’s unpack Deep Learning-based security threats a bit more. When we talk about attacks based on content, we are actually talking about a continuum between Content Preservation and Distinguishability.
By definition, attacks that are indistinguishable (“We only flip one pixel, it’s invisible to the human eye!”) are completely content preserving.
Y’see, the point behind the attack is to make the classifier screw up, so that the system thinks that the panda is a vulture. As far as we humans are concerned, it’s still a panda. And that’s what makes this an attack — humans don’t know anything has changed, but the Deep Learning system thinks it is a vulture, not a panda (and then…what? But hold that thought, we’ll get to it).
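To make that “flip these wee tiny pixels” bit concrete, here is a minimal sketch of the classic trick these papers use: nudge every pixel by an invisibly small amount in whichever direction most increases the classifier’s loss (the Fast Gradient Sign Method). The use of PyTorch, and every name in the snippet (`model`, `panda_image`, `panda_label`), is my own illustrative assumption, not something from any particular paper.

```python
# A minimal FGSM-style sketch of an "indistinguishable" perturbation.
# Assumes PyTorch and some pretrained classifier `model` (hypothetical names).
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Return an adversarial copy of `image` that still looks identical to a human.

    image:      tensor of shape (1, C, H, W), values in [0, 1]
    true_label: tensor of shape (1,) holding the correct class index
    epsilon:    max per-pixel change -- small enough to be invisible
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge each pixel a tiny amount in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Hypothetical usage: the perturbed panda still looks like a panda to us,
# but the classifier may now cheerfully call it something else entirely.
# adv = fgsm_perturb(model, panda_image, panda_label)
# print(model(adv).argmax(dim=1))
```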
OK, so, to recap: one end of the spectrum consists of attacks that seemingly don’t change anything, and the other end of attacks that can do any damn thing they want.
A new paper by Gilmer et al. (•) breaks this down rather neatly into the following five categories.
  1. Content Preserving Attacks: You want the content to get across, even if the specific format of the content changes. Think of all the pirate streaming of the World Cup recently — where the content was either letter-boxed, or a TV screen was re-filmed — which evaded the content-filtering mechanisms of most of the world’s ISPs.
  2. Non-Suspicious Attacks: Here, you don’t know that something bad happened, but behind the scenes there was an attack, oh yes indeed. Imagine, for example, you get your phone to ultrasonically tell Alice’s Alexa to buy 1337 copies of Ivanhoe (easily the most boring book ever). It’s ultrasonic, so the first inkling Alice has is when the shipments start showing up.
  3. Content Constrained Attacks: Getting malware onto somebody’s computer, getting email past a spam filter, or banned content onto Twitter/Facebook/whatever — these are the attacks that we are all so familiar with, and have been dealing with since the dawn of the InterTubes.
  4. Unconstrained Input Attacks: Here, you get to do whatever you want. And yes, it’s a bit of a catch-all, but the point remains that you have minimal constraints on your attack. For example, you have somebody’s iPhone, you want to unlock it, and you have all the time in the world…
  5. Indistinguishable Attacks: Here, indistinguishability is required (emphasis necessary). And, as mentioned earlier, outside of movie plots, this category is pretty much the empty set.
So, when we’re talking about indistinguishable attacks — i.e., -5- above — you need to ask yourself “How likely is it that the attacker would prefer to use an indistinguishable attack, instead of something in -1- through -4-?” Or, heck, how likely is it that the attacker would use an attack that has no machine learning component whatsoever?
/via https://mcurphey.wordpress.com/2010/04/17/security-bullshit-23-pci-application-security/
After all, attackers have their own constraints and capabilities. If they can waltz in the front door, they are probably not going to Tom Cruise their way in using harnesses, y’know?
The point here being that if your system is susceptible to really stupid attacks (“the front door”), then you have a ways to go before you need to start worrying about indistinguishable attacks!
Exhibit A here is the Stop Sign Attack, where a Bad Guy hacks the sign to cause car crashes. Everybody talks about making little tiny scratches on the stop-sign to fool detectors, but the reality is that a much, much easier attack is to just knock over the stop sign and be done with it.
The point here, of course, being that you want to design systems that correct for missing signs well before you get to robustness against tiny tiny scratches. Heck, that’s pretty much the way humans work — our vision is pretty seriously fallible, but we correlate a lot of other information to make sure that we don’t f**k up!
What’s more, we should keep in mind that there are defenders in the loop too! So, for example, to defend against suddenly getting 1337 copies of Ivanhoe in the mail, Alexa gets your permission before placing orders. Or extend this to having devices log and/or notify every time a voice-command is received. Which, in the Stop Sign example above, translates to devices tracking their inputs, and reporting unexpected stuff (“Huh, everybody is reporting that there is a Toaster at E. 76th where there should be a Stop Sign”).
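Just to make that “reporting unexpected stuff” idea concrete, here is a toy sketch of what such a cross-check might look like: log every detection, and flag the ones that disagree with what a map says should be at that spot. Everything in it (the map, the intersection names, the function) is hypothetical and purely for illustration.

```python
# Toy sketch of a "defenders in the loop" cross-check (all names hypothetical).
EXPECTED_SIGNS = {
    ("E. 76th", "Main"): "stop_sign",
}

def check_detection(intersection, detected_label, reports):
    """Log the detection and flag it if it disagrees with the map."""
    expected = EXPECTED_SIGNS.get(intersection)
    reports.append((intersection, detected_label))
    if expected is not None and detected_label != expected:
        # Don't just trust the classifier: treat this as a missing/odd sign,
        # fall back to conservative behaviour, and report it upstream.
        return f"Anomaly at {intersection}: expected {expected}, saw {detected_label}"
    return None

reports = []
print(check_detection(("E. 76th", "Main"), "toaster", reports))
```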
Or, to put it differently, instead of freaking out with “OMG STOP-SIGN FAIL APOCALYPSE NOW WE ARE DOOMED!”, it might be better to look at “indistinguishable perturbation issues” as machine learning related, and not security related. As Gilmer et al. put it, “At the end of the day, errors found by adversaries solving an optimization problem are still just errors and are worth removing if possible”. Focusing on these as Security Issues does both Security and Machine Learning a disservice. After all, if they are Security-related, then it’s worth having at least some association with real-world scenarios where this could matter (and not just Movie-plot ones). Security doesn’t exist in a vacuum; it must be within the context of an attacker’s constraints and capabilities!
