Facial Recognition, and Bias

As you’ve probably heard, IBM is about to release a pretty monster dataset — over a million images — along with tooling, all with the aim of helping get rid of bias in facial analysis. The cool part for me is actually the announcement of a second dataset — around 36,000 images — that are “equally distributed across skin tones, genders, and ages”.
So, why does this matter? Before answering this, let’s first take a brief diversion.
Let’s say you are doing something involving Machine Learning and facial recognition. You’d need a dataset to train your models against — think about how you would select your dataset. You’d probably take into consideration the specifics of the task (“I need to know if the face is smiling or not”), the details of the algorithm that you’re working on (“Can I still tell it’s a smile if the background changes?”), and the like. You’d then go to one of the handy-dandy collections of facial-recognition databases, and pick the most appropriate one (there’s a minimal sketch of this workflow right after the list below). e.g.
• For straightforward, unconstrained facial recognition, you might pick LFW-a, or
• For cartoon versions of celebrities, you’d pick IIT-CFW
etc.
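To make that concrete, here’s a minimal sketch of that workflow in Python, using scikit-learn’s built-in LFW loader. (A couple of hedges: scikit-learn ships plain LFW rather than the aligned LFW-a variant, and the SVM-on-raw-pixels classifier is purely illustrative, not a recommendation.)

from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Download (on first run) and load LFW faces, keeping people with >= 70 images.
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X, y = lfw.data, lfw.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Train a deliberately simple classifier on raw pixel features.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X_train, y_train)

# A single aggregate accuracy number; as we're about to see, it hides a lot.
print("Overall accuracy: %.3f" % clf.score(X_test, y_test))
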
This works.
Or, I should say, it “works” (scare quotes intentional!), as long as you don’t care about bias in the results.
Which means that it actually fails pretty horribly when you start looking at things like how the results work for women, or for people with dark skin, and so forth. Because it turns out that the underlying data used in these databases is not well distributed across genders, skin tones, and the like! And that means that your algorithms are being trained against biased data, which means that they themselves are biased!
I’m not just making this up — in a recent paper (•), Buolamwini and Gebru studied two common facial analysis benchmarks (IJB-A and Adience), and found that the data was overwhelmingly that of light-skinned subjects (79.6% for IJB-A and 86.2% for Adience). Even worse, they tested commercial products from IBM, Microsoft, and Face++, and found huge disparities when analyzing darker-skinned people.
In fact, dark-skinned women fared the worst, with error rates of almost 35%, as compared to < 1% for light-skinned men!!! (••)
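For what it’s worth, the core idea behind those numbers — a disaggregated evaluation — is not complicated. Here’s a tiny Python sketch; the records and column names are made up for illustration, and in practice they would come from your test set’s annotations plus your model’s predictions.

import pandas as pd

# Hypothetical per-image results: annotated skin tone, gender, and whether
# the classifier got that image right. Illustrative values only.
records = [
    ("lighter", "male", True),
    ("lighter", "female", True),
    ("darker", "male", True),
    ("darker", "female", False),
    # ... the rest of your test set ...
]
df = pd.DataFrame(records, columns=["skin_tone", "gender", "correct"])

# Error rate broken out per (skin tone, gender) subgroup, rather than a
# single flattering overall number.
error_rates = 1 - df.groupby(["skin_tone", "gender"])["correct"].mean()
print(error_rates.sort_values(ascending=False))
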
Check out the following video about this. And yeah, if you don’t want to sit through it, skim the results at GenderShades.org — it’ll take you less than a minute, and should leave you pretty horrified. Then watch the video.

Which takes us to the recent test by the ACLU of Amazon’s facial recognition system, which happily identified a whole bunch-a members of Congress as people who had been arrested for a crime, and a disproportionate number of those misidentified were people of color!
And yeah, if you talk to Amazon (or, frankly, any of the vendors), you’ll probably get back responses like “You didn’t calibrate it correctly”, “We just supply the algorithms, it’s up to you to implement it correctly”, and “Caveat emptor”. Which doesn’t really get to the underlying issue, which is that you shouldn’t have to bend over backwards to do the right thing!
And that brings us back to the IBM dataset, the 36,000 images “equally distributed across skin tones, genders, and ages”. This, at the very least, will allow people to test their algorithms for bias against a really diverse dataset, and see how they fare. And if they don’t test, we can see that they haven’t, and hold their feet to the fire appropriately.
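And “testing for bias” here needn’t be anything exotic. A reasonable first step is simply to confirm that the evaluation set really is balanced, and then score your model per subgroup against it, exactly as in the earlier sketch. Here’s a hedged sketch of the balance check; the annotation file name and column names are assumptions, so substitute whatever metadata the dataset actually ships with.

import pandas as pd

# Hypothetical annotations file for the balanced evaluation set.
meta = pd.read_csv("balanced_faces_annotations.csv")

# Confirm the "equally distributed" claim before trusting any bias numbers
# you compute against this set.
for col in ("skin_tone", "gender", "age_bucket"):
    print(col, "distribution:")
    print(meta[col].value_counts(normalize=True).round(3))
    print()
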
There is more, much more about Algorithmic Bias at The Algorithmic Justice League. I strongly urge you to go check it out…
(••) Most everyone involved has taken these results quite seriously. You can see IBM’s and Microsoft’s responses at this FAQ. That said, this is not a “quick fix” kinda thing. As I said earlier, go check out The Algorithmic Justice League.
