Data Augmentation? What's That?

“Dog on the left, cat on the right”.
It’s pretty obvious, right? I mean, you just looked at the picture, and you just knew.
Why?
Well, what with all the dogs, cats, and memes that you’ve seen over the years, in movies, on TV, IRL, etc — the very concept of “dog”-ness and “cat”-ness has embedded itself in your brain to the point where you barely even think about it. Maybe, once upon a time, you heard “dog”, and you thought “Rover, my dog, with the defiant left ear”. But nowadays you hear “dog”, and there is just a vague concept in your head that is the embodiment of “dog”-ness.
Regardless, you see a dog, and your brain goes — “that’s a dog”.
Classification
This process is called classification, and it’s pretty much what neural networks also do when they look at a picture and say “dog”. To be a wee bit more precise, when you train your deep learning model, what you actually end up doing is getting it to the point where, given an image as input, it outputs a label (“dog!”).
So, how d’you train it? Well, basically, you do what you did when you learned “dog”-ness: give it a whole bunch of images of dogs, so that it learns to recognize them.
OK, when you get down to it, dogs and cats kinda look the same, right? Ears, nose, four legs, furry; it can get tricky telling the two apart (imagine you’re a kid, seeing a cat for the first time — “look at the tiny dog!”).
So what you also do is train the model on not-dogs (cats, rhinos, etc.), and hey, now your neural network knows that this is a dog, and that isn’t. Cool, right?
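To make that concrete, here is a minimal sketch of what such a classifier might look like, assuming a recent TensorFlow/Keras install and a folder of labeled images. The layer sizes, image size, and folder name are illustrative guesses, not from any particular project.

```python
# A minimal sketch of a dog / not-dog classifier, assuming TensorFlow/Keras.
# Layer sizes and the "training_images" folder are illustrative, not tuned.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # normalize pixel values
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # 1 = "dog", 0 = "not a dog"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# "training_images/" is a hypothetical folder with one subfolder per label
# (dogs/, not_dogs/, ...)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "training_images", image_size=(128, 128), batch_size=32, label_mode="binary"
)
model.fit(train_ds, epochs=10)
```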
Augmentation
Mind you, it’s easy to identify things when everything is Just Right — bright day, full frontal, up close, “yup, that’s not a dog, that’s a sloth”.
But what about when you only get a fleeting glimpse? Or it’s dark, and you’re looking through a blurry window?
What you need to do is train the neural network on images taken under all sorts of conditions. Think partials, size, scale, orientation, brightness, lighting, postures, locations, and a whole bunch more.
(Image source: https://rock-it.pl/images-augmentation-for-deep-learning-with-keras/)
So, where d’you get images under all these conditions? Well, you basically fake it, by taking a regular image and using data augmentation. You take the original, and then start photoshopping the heck out of it — flip, rotate, crop, etc. In fact, you can even translate the image (change photos into art, and whatnot).
The thing is, as far as the neural network is concerned, each of these images is a totally different thing! (remember, neural networks start out dumb). What you’re trying to do is teach your neural network about something called invariance, which is basically the ability to recognize (“classify”) the object regardless of the conditions that it is presented in — conditions like size, viewpoint, illumination, and translation.
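To make the flip/rotate/crop part concrete, here is a rough sketch using Keras’s ImageDataGenerator, in the spirit of the article linked above. The specific parameter values are just illustrative guesses, and “dataset/” is a hypothetical folder name.

```python
# A rough sketch of classic data augmentation with Keras's ImageDataGenerator.
# Parameter values are illustrative, not tuned.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,             # rotate up to 20 degrees either way
    width_shift_range=0.1,         # shift horizontally by up to 10% of the width
    height_shift_range=0.1,        # shift vertically by up to 10% of the height
    zoom_range=0.2,                # zoom in/out (size and scale)
    brightness_range=(0.5, 1.5),   # vary the lighting
    horizontal_flip=True,          # mirror the image
    fill_mode="nearest",           # fill in pixels exposed by shifts/rotations
)

# "dataset/" is a hypothetical folder of class-labeled subfolders (dogs/, cats/, ...)
train_generator = augmenter.flow_from_directory(
    "dataset", target_size=(128, 128), batch_size=32, class_mode="binary"
)
# Each batch the generator yields is a freshly augmented variant of the originals.
```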
Relevance
The tricky part here, however, is that you do need to be careful about the type of data you are using — you need to make sure that the data is relevant to the problem at hand.
For example, if you’re building an app to see if your dog is sleeping on the bed when you’re gone, then you want to train it on pictures of dogs on beds, and empty beds. Dogs frolicking on the lawn are cool, but not exactly relevant to the problem at hand!
The same applies to all the data augmentation techniques that you use on the data. If you’re building a license-plate reader (for BigBrother?), then you need to assume that a lot of the pictures you take will be blurry, out of focus, at odd angles, and cropped — your data augmentation should apply these transformations to your sample images.
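As a sketch of that kind of problem-specific augmentation, here is one way to simulate blur, odd angles, and cropping with Pillow. The blur radius, rotation range, and crop margins are made-up values, not tuned for real license plates, and the file name is hypothetical.

```python
# A rough sketch of problem-specific augmentation for the license-plate example,
# using Pillow. The blur radius, angle range, and crop margins are illustrative.
import random
from PIL import Image, ImageFilter

def augment_plate(image: Image.Image) -> Image.Image:
    # Simulate an out-of-focus shot
    image = image.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 3)))
    # Simulate an odd camera angle
    image = image.rotate(random.uniform(-15, 15), expand=True)
    # Simulate a partial / cropped view by trimming 10% off each edge
    w, h = image.size
    left, top = int(w * 0.1), int(h * 0.1)
    return image.crop((left, top, w - left, h - top))

# "plate.jpg" is a hypothetical sample image
augmented = augment_plate(Image.open("plate.jpg"))
```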
 OTOH, if your app is supposed to recognize the architectural style of houses, then you probably want pictures of houses in different seasons, right? As it turns out, you can actually do that too (Deep Learning is amazing).
Choices, Choices
So, how do you make sure that your data augmentation is actually relevant? Well, you typically do that manually — you figure out what the problem space is (dogs on beds), the possible scenarios (daytime only, camera on the dresser, …) and go for it.
The problem with this approach is that we all have biases, conscious or otherwise. These don’t have to be evil biases; they could be something as simple as incorrect assumptions (“I thought daytime-only, but it gets dark at 4pm in Helsinki!”). These biases, along with the fact that this is still an evolving field and we just don’t know what, or how much, augmentation to do, result in a lot of guesswork.
This, mind you, isn’t the end of the world. Guesswork is a reasonable start, and as you find out your limitations (“4pm in Helsinki”), you can adjust things. It’s manual, and it could get a lot better, but it certainly gets you going.
On top of this, there is some excellent work being done out there on figuring out the optimal data augmentation strategy for a given problem. Google, for example, recently released AutoAugment, where they use a Machine Learning technique called Reinforcement Learning to figure out which combination of data augmentation techniques works best for your specific problem space and dataset.
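If you want to try a learned policy without running the search yourself, recent versions of torchvision ship AutoAugment policies learned on ImageNet, CIFAR-10, and SVHN. A minimal sketch, with a hypothetical file name:

```python
# A minimal sketch of applying a learned AutoAugment policy, assuming a
# reasonably recent torchvision. Each call applies a randomly chosen pair of
# operations from the policy learned on ImageNet.
from PIL import Image
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

policy = AutoAugment(policy=AutoAugmentPolicy.IMAGENET)

# "dog.jpg" is a hypothetical sample image
augmented = policy(Image.open("dog.jpg"))
```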
The above should serve as a primer. If you need to get more into the weeds, take a look at this excellent article by Bharat Raj.
