Braham Snyder
(rhymes with "Graham" and "Sam")
I create more efficient algorithms for sequential decision-making. I
focus on reinforcement learning because (i) it is likely important
for outperforming the best decisions in prior data, and yet (ii)
existing algorithms are often unstable or inefficient.
Three ingredients of data- and compute-efficient RL (function
approximation, bootstrapping, and off-policy learning) are so
unstable when combined that they are known as the deadly
triad. One of my goals is to fix this instability. I think
moving closer to residual minimization might be one part of the
simplest solution.
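For readers newer to this terminology, the sketch below contrasts the two updates for Q-learning in PyTorch: semi-gradient temporal-difference (TD) learning holds the bootstrapped target constant, while residual minimization differentiates through it. The function and variable names are illustrative, not taken from any of the papers below.

```python
import torch

def td_and_residual_losses(q_net, batch, gamma=0.99):
    """Illustrative Q-learning losses on a batch of (s, a, r, s', done) tensors.

    In practice the bootstrap usually comes from a separate target network;
    that is omitted here for brevity.
    """
    s, a, r, s2, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a)
    bootstrap = r + gamma * (1 - done) * q_net(s2).max(dim=1).values

    # Semi-gradient TD: the bootstrapped target is treated as a constant.
    td_loss = ((q_sa - bootstrap.detach()) ** 2).mean()

    # Residual minimization: gradients also flow through the target,
    # i.e. we directly minimize the empirical Bellman residual.
    residual_loss = ((q_sa - bootstrap) ** 2).mean()

    return td_loss, residual_loss
```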
I'm a student researcher at UT Austin. I'm fortunate to be advised
by Yuke Zhu, and to
be collaborating with Amy Zhang.
Email
|
Google Scholar
|
Twitter
Target Rate Optimization: Avoiding Iterative Error Exploitation
Braham Snyder,
Amy Zhang,
Yuke Zhu
NeurIPS Foundation Models for Decision Making Workshop, 2023
paper
|
(code forthcoming)
To mitigate the deadly-triad instability of two standard algorithms,
we optimize the rate at which their bootstrapped targets are
updated. Our best approach, TROT, uses residual
minimization. TROT increases return on almost half of the domains we
test, by up to ~3x.
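For context on what "the rate at which bootstrapped targets are updated" refers to: in standard deep RL, bootstrapped targets come from a target network that tracks the online network at a fixed rate (a Polyak coefficient, often written tau, or a hard-update period). The sketch below shows only that conventional fixed-rate baseline, with illustrative names; it is not TROT itself, which instead optimizes the rate (details are in the paper).

```python
import copy
import torch

def make_target_net(q_net):
    """A frozen copy of the online network, used to compute bootstrapped targets."""
    target = copy.deepcopy(q_net)
    for p in target.parameters():
        p.requires_grad_(False)
    return target

@torch.no_grad()
def polyak_update(q_net, target_net, tau=0.005):
    """Conventional fixed-rate target update: target <- tau * online + (1 - tau) * target.

    Here `tau` is the target update rate, usually hand-tuned and held fixed.
    """
    for p, p_t in zip(q_net.parameters(), target_net.parameters()):
        p_t.mul_(1 - tau).add_(tau * p)
```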
Towards Convergent Offline Reinforcement Learning
Braham Snyder
MS thesis, UT Austin, 2023
paper
Extends Raisin with a higher-level abstract and introduction, and an
updated conclusion. It includes more of my ideas for fixing residual
minimization, and discusses preliminary experiments in those
directions.
Raisin: Residual Algorithms for Versatile Offline Reinforcement
Learning
Braham Snyder,
Yuke Zhu
NeurIPS Offline Reinforcement Learning Workshop, 2022
paper
|
ICLR reviews (rejected, top ~30%)
We revisit residual algorithms: weighted averages of residual
minimization and temporal-difference learning. We add residual
algorithms to a modern, high-scoring, but inefficient offline RL
algorithm. With no other changes, tuning the residual-weight
hyperparameter reduces the number of neural networks required by 50x
on a standard benchmark domain.
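To make the residual weight concrete: a residual algorithm interpolates between the semi-gradient TD update and the pure residual-minimization update. Below is a minimal sketch, assuming a weight named `phi` and the same illustrative Q-learning setup as the first sketch above; Raisin's exact formulation is in the paper.

```python
import torch

def residual_algorithm_loss(q_net, batch, phi, gamma=0.99):
    """phi-weighted average of the semi-gradient TD and residual losses.

    phi = 0 recovers semi-gradient TD; phi = 1 recovers pure residual
    minimization. `phi` and the variable names are assumptions for
    illustration only.
    """
    s, a, r, s2, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)             # Q(s, a)
    bootstrap = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
    td_loss = ((q_sa - bootstrap.detach()) ** 2).mean()              # target held constant
    residual_loss = ((q_sa - bootstrap) ** 2).mean()                 # gradient flows through target
    return (1 - phi) * td_loss + phi * residual_loss
```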