Showing posts from August, 2018

Abstractions, Performance, and Documentation

Abstractions are good when it comes to software development. There are any number of reasons for this, from Loose Coupling, to SOLID, to, well, just google it if it isn’t already obvious to you. Let’s add a touch of nuance though, because, after all, Nuance is what we do in Software Development. Abstractions are mostly good when it comes to software development. The key here is the ever-present tradeoff between Flexibility and Performance. Or, to put it differently, if you’re going through the “Make it Work → Make it Right → Make it Fast” sequence, and have somehow (hallelujah!) managed to get to the Make it Fast stage, well, this is almost exactly where the abstractions can start biting you in the butt. Sometimes that one optimization you need to make is hidden behind an interface, and you really don’t want to compromise the interface (because the abstraction it provides is ridiculously valuable everyw...

Test Coverage Applies To *All* Your Code!

Way back in 2014, Yuan et al. conducted an analysis of production failures that brought down distributed systems (•). They went through 198 random failures in well-known distributed systems like Cassandra, Hadoop, and whatnot, and, well, the results were remarkably depressing. Basically, they found that for most of the catastrophic failures (where the entire system went down) the root cause was ridiculously simple. How simple, you ask? Well, it turns out that around 90% of them boiled down to non-fatal errors that weren’t handled correctly. Yeah, I’ll let that sink in for a bit. Bad error handling. That’s it. Ok, now that it’s sunk in, let’s make it worse. 35% of the catastrophic failures boiled down to one of the following three scenarios:
1. The error-handler catches an overly general exception (e.g. an Exception or Throwable in Java), and then shuts the whole damn thing down. BOOM.
2. Or, better yet, the error-handler a...
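To make the first scenario concrete, here is a minimal sketch in Python, with `except Exception` standing in for Java's `catch (Exception e)`. Everything in it (`TransientStoreError`, `flaky_read`, `read_overly_general`, `read_with_retry`) is a hypothetical name invented for illustration; none of it is taken from the Yuan et al. study or from the systems it looked at. Bounded retry is just one plausible way to handle a non-fatal error, not the recommended fix for any particular system.

```python
class TransientStoreError(Exception):
    """Hypothetical non-fatal error, e.g. a replica that timed out."""


def flaky_read(key, failures):
    """Hypothetical operation: fails failures[key] times, then succeeds."""
    if failures.get(key, 0) > 0:
        failures[key] -= 1
        raise TransientStoreError(f"timeout reading {key}")
    return f"value-of-{key}"


def read_overly_general(key, failures):
    """Anti-pattern: catch everything, then take the whole process down."""
    try:
        return flaky_read(key, failures)
    except Exception as exc:  # far too broad: a hiccup is treated as fatal
        raise SystemExit(f"fatal: {exc}")


def read_with_retry(key, failures, attempts=3):
    """Catch only the error known to be non-fatal, and retry a bounded number of times."""
    for attempt in range(1, attempts + 1):
        try:
            return flaky_read(key, failures)
        except TransientStoreError:
            if attempt == attempts:
                raise  # out of retries: let the caller decide what happens next
```

With a single transient failure queued up, `read_overly_general` exits the process, while `read_with_retry` absorbs the failure and returns the value.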

“The Requirements Weren’t Clear”

Timeframes are rarely absolutes. Oh yes, there absolutely might be real deadlines, the kind that are actually important (“We’re releasing in time for the Christmas season, the company depends on it…”). That said, assuming that the deadline is actually achievable (and not something made up to get people to “work harder”), there is usually some level of uncertainty involved in the timeframes. Take the chart below, for example. People generally have a very good idea of how long it’ll take them to get Short Term stuff done. And Long Term stuff is usually beholden to the aforementioned deadlines, so there isn’t much uncertainty there (remember, if it’s a real deadline, the uncertainty is around what gets delivered, not when!). The Stuff in Between, however, can be all over the place. It’s the domain of shuffled priorities, unexpected kinks, flu season, market-induced curve balls, and ambiguous requirements ...

Agile - I do not think that word means what you think it means

/via http://www.commitstrip.com/en/2017/01/09/that-little-problem-with-agile/

The thing is, the higher up you go in the organizational food-chain, the more lip-service you get towards agility/flexibility. Actually, that’s not quite true: you’ll get the commitment, as long as everything goes just right. The easy part to get buy-in on, the thing that they totally get, is the bit about user feedback, short sprints, incremental releases, and so on. The part that you have to be very careful about though, is that in their minds, this, literally, translates to a linear progression, where, for example, over 10 sprints, you do 10% of the work each sprint. User Feedback, in their minds, is User Validation! To belabor the obvious, their intuitive understanding is that there are no “wrong roads taken” and no “failed hypotheses”. Be very, very, careful about this, and do your damnedest to overcome this. I...

Deep Learning and Workflows — It *Can* Get Easier!

/via https://xkcd-excuse.com/

If you’ve built out anything with Deep Learning, you pretty much know how the drill goes, right? You
1. Start off with a basic workflow. Or, alternatively, copy over a workflow from something you did previously, and mess with it for a while before you realize that it doesn’t quite match what you’re doing, throw it away, and then, well, Start off with a basic workflow.
2. Start tweaking the workflow. There’s a step over here that takes way too long and needs to be optimized, there’s a step over there where you need to change a parameter, and so on.
3. Wait a lot. Seriously, that’s pretty much 99% of what you’re doing. Waiting.

Step 3 is really quite the killer. Every time you tweak the workflow, every time, you run the whole damn thing again. Which is really quite ridiculous, because, really, the entire pipeline can take quite a while to run. It’s actually quite a bit wor...

Nuance, and The Necessity For Trust

“When you live on the shoreline, you can forget that there are people out there who don’t even know the ocean exists” -- Me

Remember when you started in the world of Software Development (or, frankly, any field whatsoever)? How wide open the vistas were, how challenging the problems were, and how much you didn’t know? And how, years later, the vistas are still just as wide open, the problems are just as challenging as they used to be, and you’ve realized that, if anything, there is even more that you don’t know? (•) The thing is, it’s not that you haven’t learned anything. The more in depth you get into a field, the more nuance you discover, the more weird/fascinating edge cases you find, and the more complexity you discover when you pull back the curtains. Edge Cases and Nuance essentially become the cornerstone of your existence, to the point where most of your discussions with your peers revolve around them. And the...

Life on the Edge

So yeah, this was years ago, before AWS was a thing. We provided business phone services (•), and had all our servers at a hosting facility (With cages. Where we had our own servers. Remember, this was before AWS!). This was a real hosting facility, with batteries, and a generator to back the whole thing up. And we were a real phone company, with tens of thousands of customers. All good fun, for a given definition of fun, mind you. Until the one day when there was a massive snowstorm, and a huge percentage of D.C. lost power. Including our hosting facility. Which was fine, because the batteries took over immediately. And the generator kicked in, because the batteries were only good for, like, a minute or two of power. And after two minutes, all our servers crashed. Because, as it turned out, during routine maintenance, some dude had disconnected the generator from the circuits, and never reconnected it. The good news was that a decent chunk ...

Pragmatism — It’s A Survival Trait

You’ve probably seen some variant of this diagram, right? It’s really quite an interesting blend of cynicism and pragmatism. What you have is
1. Where Marketing Meets Engineering: The nice folks in Marketing have looked at everything that the Market asks for (or that they have seeded the Market with). They then match that up against the universe of everything that Engineering could ever deliver (“Rollerblades. With energy harvesting brakes that charge your phone”)
2. Where Revenue Requirements Meet Marketing: So yeah, the company actually has to make money, right? Which means it actually needs to sell stuff. So the happy folks in sales look at their sales targets, and match that up against what the Market is asking for (“Rollerblades with propellers! That can fly! People would actually pay $100K for them! And if I sell 10 of them a year I make Quota Club!”)
3. Where Engineering Meets Revenue Requirements: And finally, the engineering team should really ...

Tests, and Bug Fixes

“Bug fixes must include a test that exercises the bug, and the fix”

This really shouldn’t be controversial, y’know? I mean, after all
1. There is a bug. We all know there is a bug. There is clearly something bad happening (“Why did the service restart? I didn’t ask it to do so!”), and bad is not good.
2. If we’re lucky, the bug even comes with a test case that exercises the bug (“Send in an int instead of a string, and watch the fun!”)
3. If we’re very lucky, the bug report includes code (“To dream the impossible dream…”)

Regardless of where one is in the spectrum above, once you admit to yourself that there is a bug (and this can be an awfully hard admission to make sometimes), then you’re going to have to fix the damn thing. And that is going to involve some level of process where you’ll be doing something to make sure that there is a bug, right? After all, taking the above in...
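As a rough illustration of the rule in the opening quote, here is a minimal Python sketch built around the "int instead of a string" bug report. The function `normalize_name` and the test class are hypothetical, made up for this example; the assumption is that the buggy version called `value.strip()` directly and blew up with an `AttributeError` when handed an int. One test exercises the bug, the other confirms the fix left the normal path alone.

```python
import unittest


def normalize_name(value):
    """Hypothetical function after the fix.

    Assumed buggy version: `return value.strip().lower()`, which raised
    AttributeError when a caller passed an int.
    """
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value.strip().lower()


class NormalizeNameRegressionTest(unittest.TestCase):
    def test_int_input_is_rejected_cleanly(self):
        # Exercises the bug: an int used to crash with AttributeError.
        with self.assertRaises(TypeError):
            normalize_name(42)

    def test_string_input_still_works(self):
        # Exercises the fix: the normal path is unchanged.
        self.assertEqual(normalize_name("  Alice "), "alice")
```

Saved as a module, this can be run with `python -m unittest`. Rejecting the int with a `TypeError` is one possible fix; converting the input with `str(value)` would be another, and the regression test would change to match.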