(Attention conservation notice: about technicalities in deep reinforcement learning)
I was like “holy shit, these ‘A3C’ methods somehow achieve non-terrible performance immediately, their learning curves shoot up so fast they barely look like learning curves, what is this magic”
and then I looked up what “A3C” is, and apparently it’s an “asynchronous” method which just runs many copies of the environment at once, so when it says it’s experienced “1 episode” it really means “my N parallel workers have experienced N totally distinct episodes”
Is this correct??? If so it seems like it’s ruining a perfectly good metric – you used to be able to plot a learning curve and treat it as a measure of data efficiency, i.e. “this algorithm could do this well in this environment after only T timesteps of experience with it.” But now people are reporting N*T in place of T, for arbitrary N, so learning curves (in themselves) no longer say anything about data efficiency.
And like, data efficiency is kind of important? Perhaps the most important thing? The asynchronous methods are computationally fast, which is important in the real world, yes. But another fact about the real world is, you don’t get to simultaneously interact with N copies of it and pool all that knowledge together. If your robot is based on an algorithm that requires it to have 200 separate timelines in which to try out different actions in parallel, your robot does not work, IRL.
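To make the bookkeeping complaint concrete, here's a toy sketch (my own illustration, not code from any A3C implementation): if the x-axis of a learning curve counts timesteps per worker, the experience actually consumed is that number times the worker count.

```python
def total_env_steps(num_workers: int, steps_per_worker: int) -> int:
    """Environment interactions actually consumed across all parallel workers."""
    return num_workers * steps_per_worker

# A curve that looks great "after 1M timesteps" with 16 async workers
# really reflects 16M interactions with the environment:
assert total_env_steps(16, 1_000_000) == 16_000_000
```

So to compare data efficiency across papers, you'd want everyone plotting against `total_env_steps`, not per-worker steps, with N stated explicitly.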
@you-have-just-experienced-things (because you seem knowledgeable about this stuff)
nostalgebraist reblogged this from type12error and added:
Thanks, that’s a good point. I was getting the same impression from learning curves on the OpenAI gym, which have...
type12error reblogged this from nostalgebraist and added:
Not an expert, but. I don’t think “iteration” and “episode” are synonyms here. One iteration is one weight update /...
regexkind said: Your robot will need to infer but won’t necessarily be the one to train the neural net
