A while ago, I recommended multi-backend Keras to someone asking which neural net framework to use.
I want to rescind that – my attitude at the time was “Keras kind of sucks, but it’s not the worst and I have the most experience with it,” and now my attitude has moved to “Keras really sucks, Keras is BAD, use pytorch, or if you have to use tensorflow, just use raw ops.”
I may elaborate later… this is just to “clear my conscience” :P
I reblogged this earlier with a bunch of words elaborating the claim, but then I removed it after a few hours … I guess I’m just feeling weird about becoming this guy who has a blog where he does ~Epic Software Rants~, and even as those go it was kind of unfocused and weird.
The short version:
Keras objects usually do pretty trivial things, like simple for-loops around tensorflow code. Often, even this is buggy or feels incomplete, and it becomes obvious that writing your own version will be easier than trying to work around theirs.
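To make that concrete: the core of what a `fit()` wrapper does is a pair of nested loops. A stdlib-only sketch (here `step` is a placeholder for whatever your framework’s forward/backward/update call is):

```python
def fit(batches, epochs, step):
    """Hand-rolled training loop: the skeleton a fit() wrapper is built on.

    `step` is whatever your framework's train step is; everything else
    (callbacks, metrics, progress bars) hangs off this structure.
    """
    history = []
    for _ in range(epochs):
        for batch in batches:
            history.append(step(batch))  # forward/backward/update goes here
    return history

# Toy usage: `sum` stands in for a real train step returning a loss.
losses = fit(batches=[[1, 2], [3]], epochs=2, step=sum)
```

Once the loop is yours, adding logging or early stopping is a line of code rather than a callback class.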
The objects are hard to serialize, or have been historically anyway. Compare the vast and complex Keras serialization doc to the tiny pytorch one. The python parts of Keras don’t like to be pickled, and define their own serialization protocol with worse UX (I never want to see the phrase “custom objects” again).
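For contrast, the pytorch idiom that tiny doc describes is essentially `torch.save(model.state_dict(), path)`: parameters go into a plain dict, and plain dicts serialize without ceremony. A stdlib-only stand-in (this `Linear` class is illustrative, not torch code):

```python
import pickle

class Linear:
    """A plain-python 'layer': just data, no framework machinery."""

    def __init__(self, weights, bias):
        self.weights = weights  # list of rows
        self.bias = bias

    def state_dict(self):
        # pytorch-style: expose parameters as an ordinary dict
        return {"weights": self.weights, "bias": self.bias}

layer = Linear([[1.0, 2.0]], [0.5])
blob = pickle.dumps(layer.state_dict())  # that's the whole protocol
restored = pickle.loads(blob)
```

No registry of “custom objects,” no bespoke save format: ordinary python serialization already handles it.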
The Keras project was originally trying to define an abstraction layer not tied to tensorflow, and now it’s tied to tensorflow but wants to be independent of python (because tensorflow aspires to be). You lose the clarity and language-independence of tensorflow graphs, and no longer gain the portability across ML backends that Keras used to offer.
A tensorflow graph is a clearly scoped and defined concept, so if you know something is a tensorflow graph, that gives you various assurances. Keras objects are usually glorified tensorflow (sub)graphs, yet they have arbitrarily shaped python utilities attached to them like malware, making it hard to reason about their exact behavior and contents.
Ultimately, writing down a neural net is just not that hard. GPT-2’s architecture was specified as raw tensorflow ops and it is wonderfully straightforward, crisp, and readable. Neural net code presents other challenges, mostly related to compute graphs, and Keras makes this worse by trying to hide what the graph is and how it got made.
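As a toy illustration of how little machinery a forward pass needs, here is a two-layer MLP written as explicit ops in dependency-free python (the shapes and weights are made up for the example):

```python
def linear(x, weights, bias):
    """y = Wx + b, written out as explicit loops."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, u) for u in v]

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron: linear -> relu -> linear."""
    return linear(relu(linear(x, w1, b1)), w2, b2)

# Toy weights chosen so the arithmetic is easy to follow by hand.
out = mlp([2.0, -3.0],
          w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
          w2=[[1.0, 1.0]], b2=[0.5])
# hidden = relu([2.0, -3.0]) = [2.0, 0.0]; out = [2.0 + 0.0 + 0.5]
```

The real versions add tensors and autodiff, but the structure of the computation stays this legible when nothing is hiding it.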