Posts

Showing posts from January, 2018

Sum Types! Get your Sum Types here!

Sum Types (or, more to the point, Algebraic Data Types (•)) are one of the cooler things in Rust (••). Long familiar to folks in the Haskell / OCaml world (and other ML-derived languages), they are, at heart, fairly simple things. In short, a “sum type” is a type whose values can take any one of several distinct forms. e.g. in Haskell, if you wrote data Bool = False | True you’d basically be saying that Bool could take the values “False” or “True”. Extending this, if you wrote (in Haskell again!) data Event = ClickEvent Int Int | PaintEvent Color you’d be saying there is a data type Event that contains two cases: it is either a ClickEvent containing two Ints or a PaintEvent containing a Color. It’s the kind of thing that is ridiculously useful, and impossible to live without once you’ve had it. By the way, these tend to go by a bunch of different names (tagged union, variant record, disjoint union, and a whole host more). Chad Austin has an excellent writeup on Sum
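For the Rust-inclined, here is a minimal sketch of what the Haskell Event example above might look like as a Rust enum. The variant names, the Color type, and the describe helper are all made up for illustration, and treating the two Ints as x/y coordinates is an assumption.

```rust
// Hypothetical stand-in for the `Color` type in the Haskell example.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

// A sum type: an Event is exactly one of these variants at any time.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    // Carries two integers (assumed here to be x/y coordinates).
    Click(i32, i32),
    // Carries a Color.
    Paint(Color),
}

// `match` has to cover every variant; leaving one out is a compile error.
fn describe(event: Event) -> String {
    match event {
        Event::Click(x, y) => format!("click at ({}, {})", x, y),
        Event::Paint(color) => format!("paint with {:?}", color),
    }
}

fn main() {
    println!("{}", describe(Event::Click(3, 4)));
    println!("{}", describe(Event::Paint(Color::Red)));
}
```

The bit that makes this hard to live without is the exhaustiveness check: add a third variant to Event, and the compiler points at every match that forgot about it.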

Drones in the Streets

So you want drones in your cities, doing, well, whatever the heck you want them to do (take-out, packages, snooping …). The trick, of course, is to have the drones do their thing without running into lamp-posts, cranes, people, wires, and so forth. The even trickier bit is that a lot of these (yes, even lamp-posts) have the habit of suddenly showing up one day, making for a constantly changing topography. The trickierest part is, well, how do you get all this 3-D info? I mean, you can’t just have a plane flying around doing the “Google StreetView” thing, or even a drone being flown around. Loquercio et al. have come up with a remarkably nifty solution, which they call DroNet (•). It’s a fast (and tiny!) 8-layer residual network, which outputs a steering angle and a collision probability. (Think “if you fly straight ahead, you will crash”.) They’ve also figured out how to collect the relevant data from cars and bikes, and it turns out that the data is good en

Deep Learning, in the end, still requires Software Development

So yes, the field is new, the tooling is *not* mature, and the skillset is still being figured out. That, however, does not excuse us from having to follow good software practices. I mean, we’ve been through this before with Javascript: in the early days, the sheer horror that was *any* JS code-base would blow your mind (•). So yeah, when building out whatever you’re working on, don’t forget the basics, like Unit Testing, Evolutionary Architecture, Clarifying Requirements, Validating Results, and so forth. I could go on, but luckily, Radek has already done a lot of the heavy lifting here: https://hackernoon.com/doing-machine-learning-efficiently-8ba9d9bc679d (go read it).
(•) Or maybe not. I mean, it’s better now, but still pretty horrible (••)
(••) Note that I’m differentiating between the *average* badness of development, and the *uniquely* bad thing that was JS back in the day, when anybody who could insert a <script> tag was now a “JS Developer”

Debugging Distributed Services with Squash

“Debugging distributed services is Teh Sux0r”: this is a very common complaint from people new to the world of distributed computing. Mind you, it’s true, but it also kind of misses the point. “Old School” monolithic single-node apps may support things like setting breakpoints, stepping through code, following (and changing) variables on the fly, and so forth, but once you start thinking in distributed terms (Consistency, Latency, Resilience, etc.), the very act of “debugging” can interfere with the thing being debugged. This, however, does not mean that there is no room for debugging. The more you focus on debugging the components, instead of the interactions between the components, the more value there is in “old school” debugging, and that’s where Squash comes in ( https://github.com/solo-io/squash ). From the docs: “Squash brings the power of modern popular debuggers to developers of microservices apps that run on container orchestration platforms. Squash bridge

Fuzzing, and … Deep Learning?

Fuzzing, as you know, is a remarkably useful tool from the world of Security Testing. As Wikipedia (•) puts it, you provide unexpected or invalid data as inputs, and then look for exceptions, crashes, memory leaks, etc. The trick here is to generate inputs that are “valid enough” in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program, and are “invalid enough” to expose corner cases that have not been properly dealt with. The difficulty with fuzzing is that, with limited exceptions, it is hard/impossible to exhaustively explore all possible inputs. As a result, you tend to use heuristics to figure out what to fuzz, when to fuzz, the order to fuzz in, etc., all of which makes it a little bit of a black art. Herewith a paper by Böttinger et al. (••) where they use #DeepLearning, specifically reinforcement learning, to narrow down the “fuzz space” … Basically, fuzzing is modeled as a feedback driven learning process wher
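To make the “valid enough / invalid enough” idea concrete, here is a toy sketch of plain random-mutation fuzzing. To be clear, this is the dumb baseline, not the reinforcement-learning approach from the paper. The parse_record target (with its planted bug), the mutate and fuzz helpers, and the little xorshift generator are all invented for illustration.

```rust
use std::panic;

// Tiny xorshift64 PRNG so the sketch needs no external crates.
// The state must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Hypothetical target: a length-prefixed record parser with a planted bug.
// It rejects inputs without the magic byte 'R', but trusts the length byte.
fn parse_record(input: &[u8]) -> Result<&[u8], &'static str> {
    if input.len() < 2 || input[0] != b'R' {
        return Err("rejected by parser");
    }
    let len = input[1] as usize;
    // Planted bug: no bounds check, so an oversized length panics here.
    Ok(&input[2..2 + len])
}

// Overwrite one random byte of a valid, non-empty seed. Most mutants keep
// the magic byte ("valid enough"); some corrupt the length ("invalid enough").
fn mutate(seed: &[u8], rng: &mut XorShift) -> Vec<u8> {
    let mut out = seed.to_vec();
    let idx = (rng.next() as usize) % out.len();
    out[idx] = (rng.next() & 0xff) as u8;
    out
}

// Feed `iterations` mutants to the target; return the ones that crashed it.
fn fuzz(seed: &[u8], iterations: usize, rng_seed: u64) -> Vec<Vec<u8>> {
    let mut rng = XorShift(rng_seed);
    let mut crashes = Vec::new();
    for _ in 0..iterations {
        let candidate = mutate(seed, &mut rng);
        let probe = candidate.clone();
        let crashed = panic::catch_unwind(move || {
            let _ = parse_record(&probe);
        })
        .is_err();
        if crashed {
            crashes.push(candidate);
        }
    }
    crashes
}

fn main() {
    // Silence the default panic message; crashes are expected here.
    panic::set_hook(Box::new(|_| {}));
    let crashes = fuzz(b"R\x03abc", 1000, 0x2545F4914F6CDD1D);
    println!("found {} crashing inputs", crashes.len());
}
```

Every mutation here is a blind guess, which is exactly the “black art” problem: a smarter fuzzer would use feedback from earlier runs to decide which byte to poke next.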

Hiring in an Up economy (v2)

Image
  /via CommitStrip

AlphaGoZero and Multi-Task Learning

As you probably know (unless you’ve been hiding under a rock), AlphaGoZero beat AlphaGo 100–0, with no human training. When you look under the hood, the fascinating thing is that almost 50% of the gain was accomplished through simply updating the architecture from a “convolutional” one to a “residual” one. (°) The other 50%, interestingly, came from moving to Multi-Task Learning (MTL), where you train your model across two related tasks. In human terms, think of this as the Karate Kid approach, where “wax on wax off” served to also teach karate (and, don’t forget, wax the car!). In particular, MTL is useful when you want to
• Focus Attention : It provides additional evidence of whether data is relevant or not
• Eavesdrop : Sometimes it is easier to learn something via a different task, à la the Karate Kid
• Prevent Overfitting : It keeps you honest
• Avoid Representation Bias : It keeps you generalized, so that you can apply your model to other things too
It’s a fa

Calling Bullshit on the “AI Gaydar” (aka: “Science! It Works!”)

Herewith an excellent takedown of Wang & Kosinski’s “AI-based sexual orientation detector” (•), which comes to its conclusions based on a whole bunch of selfies (••). It is worth reading in its entirety, both for the takedown and for the methodology involved: science-based, and showing alternative/simpler hypotheses that produce the same results, if not better. It turns out that a handful of Yes/No questions about Makeup, Eyeshadow, Facial hair, Glasses, Selfie angle, and Sun Exposure are pretty much as good as Wang & Kosinski’s AI at guessing sexual orientation. As Blaise puts it, “it’s hard to credit the notion that this AI is in some way superhuman at “outing” us based on subtle but unalterable details of our facial structure.” #DeepLearning has a tremendous future; heck, it has a tremendous present. But bad science is really not a good stepping stone on the way to the future…
(•) “Deep neural networks are more accurate than humans at detecting sexual orientation

OpenCensus — towards harmonizing your Instrumentation

You’ve really gotten into this whole Observability thing, and have started plugging Prometheus into, well, everything that doesn’t already have it. And you’ve also started implementing OpenTracing because, well, you’ve got a distributed something, and not doing it, or something like it, would be just plain dumb. And you realize that you’ve now spent the last week pulling together some kind of consistent wire protocol that works seamlessly across the metrics and the traces, and you haven’t even started on the APIs, and surely, surely somebody has already done this? Enter OpenCensus (•), the open-sourced version of Google’s Census library, which gives you
• Standard wire protocols and consistent APIs for handling trace and metric data.
• A single set of libraries for many languages, including Erlang, Java, C++, Go, Python, Ruby, and more
• Integrations with your favorite web and RPC frameworks
• Exporters for storage and analysis tools like Prometheus, Zipkin, Data

Stateless Services — and Performance

So, you’ve drunk the ServerLess Kool-Aid (Lambda / Azure Functions / GCF / whatever) and are frenetically converting all your micro-services to StateLessMicroServices™, right? And everything is just going to be peachy-keen, right? Well, maybe. As you go down the path, one of the things you might find is that you now have different performance issues that you need to pay attention to. (•) In particular
• Resources : Yeah, you’re going to have to open/clean-up those db connections, file handles and network connections each time.
• State : is now going to be in your data-store. And oh, you do have a data-store, right? Which you are dealing with per the Resources point above, right? Which means you’re going to have to retrieve/store it each time you run your function…
• Concurrency : Yup, welcome to the world of concurrency. You’re going to have to deal with your serializability, linearizability, consistency, and that entire ball-of-wax, each of which comes with its own performan

At scale, all IFs become WHENs (and rapidly at that)

How do YOU validate the structure of your containers?

Speaking for myself, the phases I went through looked something like
1) “Wut?”
2) “Notify me if/when the build fails”
3) “Notify me if/when it fails during use”
4) “I’ve done all that I can do, now leave me the f**k alone.”
The thing is, going from -1- to -4- is really pretty damn rapid, since, at scale, all “IF”s become “WHEN”s (and rapidly at that). And now, finally, a solution (or a very promising start to one): Google’s Container Structure Test ( https://goo.gl/spNXEq ), which promises to verify and validate the contents and structure of containers. It’s declarative (YAML!), and is really quite clever, covering the following aspects
• File Existence and Contents : Checks that a file exists (or doesn’t!) in the image, and has the correct contents/metadata
• Commands : It’ll run a command inside the image (with setup/teardown!) and validate the output
• Metadata : Validate the image’s metadata (basically, “does the image match the Dockerfile?” (•