Lossless Data Compression, and… #DeepLearning?

Way back, at the dawn of information theory, Claude Shannon showed that there is a hard limit on how much you can losslessly compress data. Since then, any number of “encoding” systems have been dreamt up to hit this bound. These systems all (roughly!!) boil down to:
 ° Find patterns in the data
 ° Associate a symbol with these patterns
 ° Transmit the symbol instead of the pattern
The trick is in identifying patterns. For example, if you and I have the same book, I can just say “page 37, line 6” to specify any line, but if I don’t know which book you have…
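To make those three bullets concrete, here is a toy sketch of my own (in the spirit of LZ78-style dictionary coding, not any particular real codec): it learns patterns as it goes, assigns each one a small integer symbol, and transmits the symbol instead of repeating the pattern.

```python
# Toy dictionary coder: learn patterns, assign them integer symbols, and
# transmit the symbol in place of the pattern. (A sketch, not a real codec.)
def toy_compress(data: str):
    dictionary = {"": 0}          # pattern -> symbol (integer index)
    output, current = [], ""
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate   # keep extending a pattern we already know
        else:
            # transmit (symbol for the known prefix, plus the one new character)
            output.append((dictionary[current], ch))
            dictionary[candidate] = len(dictionary)   # learn the new pattern
            current = ""
    if current:                   # flush any leftover matched pattern
        output.append((dictionary[current], ""))
    return output

print(toy_compress("abababababab"))
# -> [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (5, 'b')]
```

The longer and more repetitive the input, the more of it gets replaced by short symbols; the whole game is how cleverly you find the patterns in the first place.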
Enter #MachineLearning — Recurrent Neural Networks specifically — which are particularly well suited to identifying long-term dependencies in the data. They are also capable of dealing with the “complexity explosion” as the number of symbols increases.
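As a rough illustration of how an RNN slots into a compressor, here is a minimal numpy sketch of my own (made-up sizes, untrained random weights, not the paper’s code): the recurrent state carries a summary of everything seen so far, the output layer gives P(next byte | context), and a symbol predicted with probability p costs about -log2(p) bits once it is handed to an entropy coder, so better predictions mean a smaller file.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 256, 64                       # byte alphabet size, hidden size (arbitrary choices)
Wxh = rng.normal(0, 0.1, (H, V))     # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))     # hidden-to-hidden weights (carry long-range context)
Why = rng.normal(0, 0.1, (V, H))     # hidden-to-output weights

def next_symbol_probs(data: bytes):
    """Yield P(next byte | everything consumed so far) after each input byte."""
    h = np.zeros(H)
    for b in data:
        x = np.zeros(V); x[b] = 1.0          # one-hot encode the byte
        h = np.tanh(Wxh @ x + Whh @ h)       # recurrent state = compressed history
        logits = Why @ h
        p = np.exp(logits - logits.max())    # softmax over the 256 possible next bytes
        yield p / p.sum()

# Entropy-coding view: a byte predicted with probability p costs ~ -log2(p) bits.
data = b"abababababab"
bits = 0.0
for p, actual in zip(next_symbol_probs(b"\x00" + data), data):  # dummy start byte to seed context
    bits += -np.log2(p[actual])
print(f"~{bits:.1f} bits vs {8 * len(data)} bits raw")
# With untrained weights this lands near 8 bits/byte; training the model on the
# data is what pushes it down toward the entropy of the source.
```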
In this paper — https://goo.gl/5SPwJB — Kedar Tatwawadi works through this approach with excellent results, usually beating the pants off “old school” arithmetic encoding.
I suspect this will be moving very rapidly into the field!!
