This article is very, very good, and I’m linking it in lieu of some longer post I was planning to write about “explainability” vs. intrinsic interpretability, since that post would have mostly covered the same ground.
(To say something briefly, though: we really need a distinction between machine perception, i.e. automatic feature extraction, which can and should be a complicated, hard-to-compress function of raw low-level inputs, and machine judgment, i.e. making a classification or decision on the basis of high-level extracted features, which had damn well better be locally expressible as a pretty shallow decision tree, since that’s what human explanations of our own behavior amount to, and those explanations are both invaluable for working together and apparently good enough for that purpose.
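A minimal sketch of that split, with all names and rules hypothetical: an opaque perception step that maps raw input to a few named high-level features, feeding a judgment step that is deliberately nothing more than a shallow tree of if/else rules you could read aloud as an explanation.

```python
# Hypothetical sketch: machine perception (opaque feature extraction)
# separated from machine judgment (a shallow, human-readable rule tree).

def extract_features(raw_email_text):
    """Stand-in for a learned perception model: maps raw input to a few
    named high-level features. In practice this would be a neural net
    and could be as complicated as it likes."""
    text = raw_email_text.lower()
    return {
        "spammy_language": "free money" in text,
        "sender_known": "alice" in text,
    }

def judge(features):
    """Judgment step: shallow enough to double as an explanation,
    e.g. 'flagged because spammy language from an unknown sender'."""
    if features["spammy_language"]:
        if features["sender_known"]:
            return "deliver"
        return "flag"
    return "deliver"

print(judge(extract_features("FREE MONEY inside!")))  # flag
print(judge(extract_features("hi, alice here, free money joke attached")))  # deliver
```

The point of the toy is that interpretability only needs to live in `judge`; `extract_features` can stay a black box without costing us the ability to explain decisions in terms of the features.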
Relatedly, there’s a lot of current research in NLP — cf. the GLUE benchmark paper and those citing it — on the dumb heuristics that modern NLP models learn, not because they can’t express complex features but because all the standard datasets are easy to game. They have excellent feature extractors plugged into really stupid judgment-makers, which probably perform worse than a few handcrafted rules on top of the same features; the tendency to view both as one magical, ineffable black box hampers progress, as does the closely related assumption that a high-performing decision process must be ineffable in itself. Meanwhile, much of the explainability literature is off in a weird scholastic rabbit hole, trying to decide how much of max(5, 4, 3) = 5 is “caused” by each of the inputs [I wish I were kidding, but see Figure 4 here]).
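For concreteness, here is the flavor of calculation at the bottom of that rabbit hole: exact Shapley values attributing max(5, 4, 3) = 5 across its inputs. This is a sketch assuming a baseline of 0 for the empty coalition, and Shapley values are just one standard attribution scheme, not necessarily the exact method in the figure referenced above.

```python
import math
from itertools import permutations

def shapley_of_max(values, baseline=0.0):
    """Exact Shapley attribution for f(S) = max of a coalition S (or the
    baseline if S is empty), averaging each input's marginal contribution
    over all n! arrival orders."""
    n = len(values)
    phi = [0.0] * n
    for order in permutations(range(n)):
        prev = baseline              # value of the coalition so far
        best = baseline
        for i in order:
            best = max(best, values[i])
            phi[i] += best - prev    # marginal contribution of input i
            prev = best
    return [p / math.factorial(n) for p in phi]

print(shapley_of_max([5, 4, 3]))  # [2.5, 1.5, 1.0]
```

The attributions sum to the output (2.5 + 1.5 + 1.0 = 5), which is exactly the efficiency axiom at work, and the inputs that lost the max still receive nonzero credit; whether that answers any question anyone actually had is the scholastic part.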




