Posts
Showing posts from December, 2017
Distributed Systems, and Coding for Failure
The world of Distributed Systems is as much about philosophy as it is about implementation: the specific context, architecture, tooling, and developers all come together to make your way different from everybody else’s. And yet, there are overarching themes and patterns that hold true, and that you ignore at your own peril. The most important of these is the necessity to architect for failure; everything springs from this. (°) Complexity grows geometrically with the number of components, and no amount of fakes, strict contracts, mocks, and the like can reduce it. If anything, poor practices will only increase this complexity. The key is to accept this, and align the risk associated with a given service/component with the risk the business is willing to take. In short, make the component reliable, but no more reliable than necessary. As a developer, what this means is that you must
• Understand the operational semantics of the system as a whole
• Internalize t
Lossless Data Compression, and… #DeepLearning?
Way back, at the dawn of information theory, Claude Shannon showed that there was a lower bound to how much you could losslessly compress data. Since then, any number of “encoding” systems have been dreamt up to hit this bound. These systems all (roughly!) boil down to:
° Find patterns in the data
° Associate a symbol with each pattern
° Transmit the symbol instead of the pattern
The trick is in identifying the patterns. For example, if you and I have the same book, I can just say “page 37, line 6” to specify any line, but if I don’t know which book you have… Enter #MachineLearning, Recurrent Neural Networks specifically, which are particularly well suited to identifying long-term dependencies in the data. They are also capable of dealing with the “complexity explosion” as the number of symbols increases. In this paper (https://goo.gl/5SPwJB), Kedar Tatwawadi works through this approach, to excellent results, usually beating the pants off of “old school” arit
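To make the pattern → symbol → transmit loop concrete, here’s a minimal sketch in Python. This is a hypothetical dictionary-substitution scheme for illustration only — it has nothing to do with the paper’s RNN-based model, and as written it only handles up to ten patterns:

```python
def compress(text, patterns):
    """Replace each known pattern with a short symbol (\x00 plus an index)."""
    for i, p in enumerate(patterns):
        text = text.replace(p, f"\x00{i}")
    return text

def decompress(data, patterns):
    """Expand each symbol back into its pattern."""
    for i, p in enumerate(patterns):
        data = data.replace(f"\x00{i}", p)
    return data

# Both sides must share the same "book" (dictionary) for this to work.
patterns = ["the quick brown fox", "jumps over"]
msg = "the quick brown fox jumps over the lazy dog"
packed = compress(msg, patterns)
assert decompress(packed, patterns) == msg
assert len(packed) < len(msg)  # shorter, because the patterns actually occur
```

Real compressors (LZ77, arithmetic coding, and friends) discover the patterns automatically; the hard part, as in the book example above, is agreeing on — or learning — the dictionary.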
Everything you wanted to know about Load-Balancing, but were afraid to ask (°)
Herewith an excellent overview of the field, as well as a bit of a deep dive to get you acquainted with what’s going on these days. The future (and present!) is getting way fascinating, with L4 load-balancers moving towards distributed consistent-hashing solutions, and L7 ones becoming increasingly popular thanks to the growth of microservices, sidecars, and service-meshes. “Global load balancing and a split between the control plane and the data plane is the future of load balancing and where the majority of future innovation and commercial opportunities will be found.” (https://goo.gl/7m9sAu) The above is particularly relevant given that the industry seems to have (finally!) embraced Observability, and in particular, the robust and protocol/system-specific Observability needed by distributed systems.
(°) Or, frankly, didn’t know it even existed in the first place. Especially if you haven’t been paying attention for the last year or two.
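For a rough feel of what “consistent hashing” buys an L4 balancer, here’s a toy hash ring in Python. This is a sketch of the general idea only — the backend addresses and vnode count are made up, and production L4 balancers use considerably more sophisticated algorithms:

```python
import hashlib
from bisect import bisect

def _h(key: str) -> int:
    """Stable hash: map a string onto a large integer ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy consistent-hash ring: each backend owns many virtual points."""
    def __init__(self, backends, vnodes=100):
        self.ring = sorted(
            (_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    def route(self, flow_key: str) -> str:
        """Send a flow to the first backend point clockwise of its hash."""
        idx = bisect(self.keys, _h(flow_key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# The same flow key always lands on the same backend.
assert ring.route("client-flow-42") == ring.route("client-flow-42")
```

The point of the ring-plus-vnodes structure is that adding or removing a backend only remaps the flows that hashed near its points, instead of reshuffling nearly everything the way a naive `hash(flow) % N` would.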
Mortgage Risk, and … #DeepLearning?
If you have been in a poker game for a while and you still don’t know who the patsy is, you’re the patsy. Now, for poker, substitute Finance, and in this case, Mortgage risk. If you’re still doing it by hand, or using complex curve-fitting, etc., well, you’re the patsy. Kay Giesecke from Stanford has a fascinating paper where he collected 294 parameters per loan across 120 million loans (from 1995 to 2014), and used these to predict loan performance via #MachineLearning. The entirely unsurprising result: it beat the pants off of the “traditional methods” (logistic regression, curve fitting, etc. And note, this isn’t simple stuff; it is basically what institutions have been throwing Quants at for years now). The thing is, this is exactly the kind of problem that is really hard to model (hence the quants): there are too many parameters, internal and external, and the differences in performance tend to be more about making the correct guesses about samples when
The Importance of Soft Skills
The thing is, these days, any non-trivial system is but a cog in a much larger system which includes humans as elements, and the complexities around communication and collaboration are the dominant factor in success or failure. #TechBros and #CowboyDevelopers, however, are concerned about the increasing importance of these “soft skills” (•), usually because of
• Imposter Syndrome: “I’m not good at that stuff”
• Diminishment: “What does that mean wrt my mad golang skillz?”
• Kubler-Ross: “I am the smartest architect, therefore I am also the best manager. Also, everybody loves me”
Does your manager get this? Worse, does your manager respect this? Remember, the only thing worse than a “Professional Manager” is the “#TechBro turned manager who hates management”. You know the kind: they suffered through the former, and when given a chance at control, either turn into dictators (they know best!) or devolve into chaos (let a thousand flowers bloom!) Soft skillz a
Chaos Monkey meets Kubernetes
For when you *really* need to be sure, herewith #chaoskube, which periodically kills random pods in your #Kubernetes cluster (°). Via https://goo.gl/SgdD4u
(°) Note, you probably want to mess around with the label selector to make sure you don’t nuke any of the “critical” or “kube-system” pods. Unless you actually want to do that, of course.
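The core of the chaos-monkey idea is tiny: filter out protected pods, pick a random survivor, kill it. Here’s a hypothetical sketch of that selection logic in Python — the pod and label structures are invented for illustration, and this is not chaoskube’s actual API or flag handling:

```python
import random

def pick_victim(pods, protected_labels, rng=random):
    """Chaos-monkey victim selection: exclude pods carrying any
    protected label, then choose one of the rest at random."""
    candidates = [
        p for p in pods
        if not (p["labels"].items() & protected_labels.items())
    ]
    return rng.choice(candidates) if candidates else None

pods = [
    {"name": "web-1", "labels": {"app": "web"}},
    {"name": "kube-dns", "labels": {"tier": "kube-system"}},
]
# kube-dns is protected, so only web-1 is eligible for termination.
victim = pick_victim(pods, {"tier": "kube-system"})
assert victim["name"] == "web-1"
```

The real tool does this against the live Kubernetes API on a timer; the label-selector footnote above is exactly the `protected_labels` filter in this sketch.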
In Praise of Boring Tech (and Erlang)
Let’s get something out of the way first: Boring =/= Bad. Of course there is any amount of Boring and Bad (°) out there, but here the reference is to stuff that #JustWorks, whose implementation surface is well defined, whose failure modes are well understood, and whose edge cases tend to qualify as “Good Problems To Have”. #Postgres is Boring. #Varnish is Boring. #AWS is Boring. #Erlang is Boring. Wait, what? Erlang? Yes indeed, it is, in many ways, the quintessential Boring technology. Remember, your job, as a technologist, is to keep the company in business. Everything else (reducing costs, increasing revenue, using the best tool, “having fun”, etc.) is subordinate to keeping the company in business (°°). And keeping a system working reliably is waaay more expensive than building the damn thing in the first place. And, to date, there are few, if any, environments other than Erlang in which to build reliable systems. Oh yes, you can do
One of these IS like the other
So, you’re using #MachineLearning to do image classification (of course you are). The issue at hand is: how do you differentiate the important stuff from the background? It’s a bit of a trick question, because, well
• What is “important”? That it’s a Cat (next to piglets)?
• What is “background”? (You’re looking for piglets!)
And so on. In #DeepLearning, we call these “core” features vs “orthogonal” (or “style”) features, and differentiating between the two can be, well, hard. In a paper by Heinze-Deml and Meinshausen (°), the authors come up with a neat trick to deal with this, one that requires a lot less effort than good old #SupervisedLearning: they assume that somewhere in the image is the uniquely identifiable thing you are looking for. For example: These images have my piglets in them. That’s pretty much it. Now that the model “knows” that my piglets are in there somewhere, it’ll happily ignore all the extraneous stuff, including image qua
Differential Privacy — Under the Hood
Apple released a little more information about #DifferentialPrivacy with a paper and an excellent blog post (°). A quick refresher: under differential privacy, you
• Collect users’ data (locally, or on the server)
• Add noise to mask the data, but in such a way that when you
• Average out the data, the noise averages out, and meaningful information emerges
The trick, of course, is in the details. At the very least, you need to be careful in
• calibrating the noise so that it averages out (algorithms matter!)
• collecting and transmitting information in a manner that protects privacy (secure storage! encryption!)
• keeping the information under privacy-exposure thresholds (don’t combine data across different use cases! restrict the number of use cases!)
etc. It’s a fascinating read, and, I suspect, the very beginning of the field. I expect signal processing (Huffman coding, etc.) to come into play any day now.
(°) Learning with Privacy at Scale - the paper - h
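The “noise averages out” step can be seen in a few lines of Python. This is a toy sketch of the general mechanism, not Apple’s actual algorithm — the epsilon value and the Laplace-noise construction are purely illustrative:

```python
import random
random.seed(42)  # deterministic demo

EPSILON = 1.0  # per-contribution privacy budget (illustrative value)

def privatize(value: float, sensitivity: float = 1.0) -> float:
    """Mask a single user's value with Laplace noise.
    The difference of two i.i.d. exponential draws with rate 1/scale
    is a Laplace(0, scale) sample, calibrated to sensitivity/epsilon."""
    scale = sensitivity / EPSILON
    return value + random.expovariate(1 / scale) - random.expovariate(1 / scale)

# 10,000 users each "report" the value 0.25. Any single report is
# heavily noised, but the mean converges back to the true signal.
true_value = 0.25
reports = [privatize(true_value) for _ in range(10_000)]
estimate = sum(reports) / len(reports)
assert abs(estimate - true_value) < 0.1  # noise has (mostly) averaged out
```

Note how the three bullets above map directly onto the code: collect (`reports`), add noise (`privatize`), average (`estimate`). Everything else in the post — encryption, exposure budgets — is about protecting the individual noisy reports on their way to that average.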
DB/2 has HOW MANY code bases?
(aka: “Mechanical Sympathy is, very much, a thing”) Turns out that DB/2 has FOUR separate code bases, for
• AS/400
• MVS
• VM
• R/6000 (originally OS/2)
They’re all, basically, distinct lines (°), with wildly different hardware/OS models. For example, the AS/400 line has a single-level store where memory and disk addresses are indistinguishable and objects can transparently move between disk and memory. It’s a capability-based system where pointers, whether to disk or memory, include the security permissions needed to access the object referenced. Why would you not take advantage of this for your features, let alone for optimization purposes? So yeah, in this light, it does make sense, especially when you add in the dictum that market share is even more important than engineering efficiency. Via http://perspectives.mvdirona.com/2017/12/1187/
(°) Yeah, VM/CMS is a bit of a special case; it’s the productization of the System R research system, and is pretty wil
Clustered vs Non-Clustered Indexes
(aka: “Phonebooks vs Book Indexes”) A clustered index is best thought of as a phonebook (°): you index by a column, and all the data for that row is right there, next to that key. A non-clustered index, OTOH, is like the index in the back of a book: topics listed in alphabetical order, with pointers to every page where that topic is covered. The key difference being that in non-clustered indexes, getting at the data is a two-step process (find the topic, then find the page). And yeah, it’s not quite this straightforward, but it should get you started…
(°) “Note to future SQL developers in 2030 who have no idea what a phonebook is: it’s something that stores the names, addresses, and landline phone numbers in your area. What’s a landline? Oh boy…” https://goo.gl/VXCPG3
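The one-step vs two-step lookup is easy to see with a toy in-memory model — dictionaries standing in for pages on disk. This is entirely hypothetical and not how any real engine lays out storage:

```python
# Clustered "index": the key maps straight to the entire row (the phonebook).
clustered = {
    "alice": {"name": "alice", "city": "Oslo", "phone": "555-0101"},
}

# Non-clustered "index": the key maps to a row id, and a separate
# structure (the "heap") maps row ids to rows (the book index).
heap = {1: {"name": "alice", "city": "Oslo", "phone": "555-0101"}}
nonclustered = {"alice": 1}

row_a = clustered["alice"]            # one step: key -> row
row_b = heap[nonclustered["alice"]]   # two steps: key -> pointer -> row
assert row_a == row_b                 # same data, different number of hops
```

The extra hop is why a table typically gets only one clustered index (the rows themselves can only be physically ordered one way) but can have many non-clustered ones.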
User Interfaces, Design, and ... #DeepLearning?
Good #UI design isn’t just about being “user-friendly”; it’s about reifying deep principles about the world, making them the working conditions that users live in and create with, making them explicit yet comfortable and familiar, becoming part of the user’s patterns of thought. #MachineLearning can be valuable here, removing vast swathes of cognitive burden to expose these deeper truths. Consider Font Design, for example: bolding a font isn’t just about “making it thicker”. There are a bajillion heuristics that type designers have figured out over the ages, things like “preserve the enclosed negative space”, “move bars lower”, “lag interior strokes to the exterior”, and hundreds more (Thousands? Even more? Quite likely…) The benefit of using #AI to help is that you can train it on these heuristics, using available fonts, and it will automatically generate the bolded versions for you. Oh, it won’t be perfect the first time around, but use your corrections as inputs, and,