stumpyjoepete asked:
Thoughts on this? https://twitter.com/SamuelAinsworth/status/1569719494645526529
This is a very, very cool paper!!
The underlying idea feels so obvious in retrospect. Like, yeah, of course different training runs will end up with "shuffled" channels relative to one another in the general case, and of course that can make mode connectivity fail in the worst case. So unless you try to "unshuffle" things first, mode connectivity is always a gamble.
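To make the "shuffling" concrete: hidden units of a network have a permutation symmetry, so you can reorder the neurons of a layer (and permute the adjacent weight matrices to match) without changing the function the network computes. A minimal NumPy sketch of this, with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP: y = W2 @ relu(W1 @ x). Shapes are arbitrary.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

relu = lambda z: np.maximum(z, 0)
y = W2 @ relu(W1 @ x)

# Permute the 4 hidden units: rows of W1 and columns of W2 together.
perm = rng.permutation(4)
W1_shuffled = W1[perm, :]
W2_shuffled = W2[:, perm]
y_shuffled = W2_shuffled @ relu(W1_shuffled @ x)

# The network computes exactly the same function.
assert np.allclose(y, y_shuffled)
```

Two independently trained networks land on some arbitrary pair of these equivalent orderings, which is why naively interpolating their weights mixes mismatched units.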
Apparently this point was raised in the Entezari et al. 2021 paper that they cite. In any case I'd never thought about it.
And then this paper goes ahead and … tries to compute the optimal unshuffling, in a straightforward way, and it just works (on MNIST/CIFAR models at least)? Amazing!
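The "straightforward way" is (roughly) a linear assignment problem: match units across the two networks by how similar their activations are, then permute one network's weights accordingly. A toy sketch with SciPy in the spirit of the paper's activation matching, where the setup and all sizes are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Hypothetical hidden-layer activations of two nets on the same 100
# inputs; pretend net B's 8 units are secretly a noisy permutation
# of net A's.
acts_a = rng.normal(size=(100, 8))
true_perm = rng.permutation(8)
acts_b = acts_a[:, true_perm] + 0.01 * rng.normal(size=(100, 8))

# "Unshuffle": maximize total activation correlation between matched
# units -- a linear assignment problem, solved exactly.
cost = acts_a.T @ acts_b            # similarity of A-unit i vs B-unit j
_, col = linear_sum_assignment(cost, maximize=True)

# col[i] is the B-unit matched to A-unit i: the permutation is recovered.
assert np.array_equal(true_perm[col], np.arange(8))
assert np.allclose(acts_a, acts_b[:, col], atol=0.1)
```

In the actual paper this matching is done layer by layer (they also describe weight-based and learned variants), but the core step really is this kind of assignment solve.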
I’m not sure what the bigger picture implications are, but it was very intellectually satisfying to read.
EDIT: see also this thread and its replies, and this one too
