Nan Jiang

@nanjiang_cs

6,926 Followers · 73 Following · 110 Media · 1,888 Statuses

machine learning researcher, with a focus on reinforcement learning. asst prof @ uiuc cs. Course on RL theory (w/ videos):

Joined November 2017
Pinned Tweet
@nanjiang_cs
Nan Jiang
4 years
Learning Q* with + poly-sized exploratory data + an arbitrary Q-class that contains Q* ...has seemed impossible for yrs, or so I believed when I talked at @RLtheory 2mo ago. And what's the saying? Impossible is NOTHING. Exciting new work w/ @tengyangx ! 1/
Tweet media one
Tweet media two
5
11
92
@nanjiang_cs
Nan Jiang
4 years
after consulting my colleagues, I decided to make my 598 lectures publicly available. The video links can be found on the course website, or from this list (). just started proofs of VI and PI, and check out if you are interested in a stat theory of RL!
@nanjiang_cs
Nan Jiang
4 years
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on rl theory . I am also teaching a phd seminar course on this topic (w/ recordings): ; just did 1st lec 2h ago! still figuring out if I can share the videos publicly...
7
25
147
5
154
779
@nanjiang_cs
Nan Jiang
5 years
The entire RL theory is built on objects like V^π, Q*, π*, T (Bellman up. op.), etc... until you realize that this foundation is quite shaky. Spoiler: no big deal (yet) but thinking thru this is super useful for resolving some confusions. (1/x)
4
77
388
@nanjiang_cs
Nan Jiang
2 years
I received the NSF CAREER award. Each submission was a month+ effort and I'm glad I got it the 2nd time. Also, the detailed reviews & the process were not as delightful as the decision. Some experience & thoughts below: 1/
Tweet media one
33
10
390
@nanjiang_cs
Nan Jiang
22 days
friends must have been bored of me saying this, but clearly not nearly enough ppl know this: not all equations can be turned into an optimization loss (sketch below)
Tweet media one
@kchonyc
Kyunghyun Cho
22 days
once @ylecun told me (heavily paraphrased), it's not F=ma but \min (F-ma)^2. i didn't realize its importance, but it is perhaps the most enlightening perspective i've ever heard.
44
41
608
6
10
160
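A sketch of the double-sampling argument behind "not all equations can be turned into an optimization loss", stated for the Bellman optimality equation in a discounted MDP (standard textbook reasoning; the image attached to the tweet is not reproduced here). With (Tf)(s,a) = r(s,a) + γ E_{s'~P(·|s,a)}[max_{a'} f(s',a')], the naive squared loss on transitions (s,a,r,s') satisfies, for every fixed (s,a),

\[ \mathbb{E}_{r,s'}\Big[\big(f(s,a) - r - \gamma \max_{a'} f(s',a')\big)^2\Big] = \big(f(s,a) - (Tf)(s,a)\big)^2 + \mathrm{Var}_{r,s'}\big[r + \gamma \max_{a'} f(s',a')\big]. \]

The variance term depends on f, so in a stochastic environment the minimizer of this loss need not satisfy Q = TQ even when Q* lies in the class: squaring the equation does not yield a valid loss, unlike \min (F-ma)^2.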
@nanjiang_cs
Nan Jiang
2 years
this paper got Outstanding Paper Award! Congrats to my coauthors (esp. Ching-An and Tengyang). More reasons to check out the details! List of all paper awards:
@nanjiang_cs
Nan Jiang
2 years
Tmr @icmlconf 2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation by PD lemma, we turn initial-state pessimistic term from our prior work into *relative* pess and smoothly bridge IL & offline RL, with robust improvement guarantees.
Tweet media one
1
3
40
10
11
149
@nanjiang_cs
Nan Jiang
4 years
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on rl theory . I am also teaching a phd seminar course on this topic (w/ recordings): ; just did 1st lec 2h ago! still figuring out if I can share the videos publicly...
@marcgbellemare
Marc G. Bellemare
4 years
@thienan496 @quocleix @Miles_Brundage @mpd37 We have a monograph on deep reinforcement learning () which covers some of the recent work. Otherwise, much of the non-deep RL work is theory, in which case I am not the expert but perhaps @nanjiang_cs has suggestions.
0
1
22
7
25
147
@nanjiang_cs
Nan Jiang
1 year
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Jt w/ students Audrey Huang and Jinglin Chen 1/
Tweet media one
Tweet media two
4
29
137
@nanjiang_cs
Nan Jiang
6 months
As the semester draws to an end, I want to share this *identity* (h/t @tengyangx ) that connects so many fundamental pieces of RL theory together: optimism, pessimism, policy opt, proved by the PD lemma + Bellman-error telescoping, all in one equation! 1/3
Tweet media one
2
16
125
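One standard identity of this flavor, stated from the usual Bellman-error telescoping argument (the image in the tweet may present a different or more general form): in a discounted MDP with normalized occupancy d^π and return J(π), for any f: S×A → R and any policy π,

\[ \mathbb{E}_{s_0 \sim d_0}\big[f(s_0,\pi)\big] - J(\pi) = \frac{1}{1-\gamma}\, \mathbb{E}_{(s,a)\sim d^\pi}\Big[f(s,a) - r(s,a) - \gamma\, \mathbb{E}_{s'\sim P(\cdot|s,a)}\big[f(s',\pi)\big]\Big], \]

where f(s,π) = E_{a~π(·|s)}[f(s,a)]. Choosing f to over- or under-estimate recovers the usual optimism and pessimism bounds, and applying it to two policies gives performance-difference-style statements.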
@nanjiang_cs
Nan Jiang
3 months
🙏
@SloanFoundation
Sloan Foundation
3 months
We have today announced the names of the 2024 Sloan Research Fellows! Congratulations to these 126 outstanding early-career researchers:
Tweet media one
6
40
247
24
0
118
@nanjiang_cs
Nan Jiang
11 months
the first person I've ever seen outside the (core?) RL community to spontaneously realize this (ridiculous) problem...
@kchonyc
Kyunghyun Cho
11 months
how do people tune hyperparameters in offline reinforcement learning???
24
14
122
2
17
115
@nanjiang_cs
Nan Jiang
1 year
ICML results out! 3/4 acc (congrats to students; thread later). And @tengyangx eventually got a rejection after all. I was worried if I should graduate him, like c'mon, how can a PhD be complete w/o rejections 😜. Now such a relief 😆
0
0
83
@nanjiang_cs
Nan Jiang
12 days
In a few years the next gen of young researchers will find it weird that you all use the word "agent" in RL, as it is supposed to be dedicated terminology for LLM agents 🫠
9
4
82
@nanjiang_cs
Nan Jiang
6 years
had a very intriguing conversation w/ Alyosha Efros who visited us. agreed on many issues but also debated quite a bit on "RL tests on training data hence overfits". thought it's a good time to organize my thoughts on this... bottom line: the statement is wrong if taken literally.
2
17
81
@nanjiang_cs
Nan Jiang
4 years
2 papers accepted to #icml2020 ! the MWQL one I have tweeted about quite a bit b4. the other one is an interesting connection between variance reduction in IS for OPE and that in PG—guess what, they are the same thing! w/ my student Jiawei Huang. congrats Jiawei!
@nanjiang_cs
Nan Jiang
5 years
@neu_rips the 1001st way to derive PG (originally by Jie Tang & @pabbeel here ). turns out you can also derive its entire var reduction family this way... and a new estimator that subsumes most previous ones pops up in this process!
Tweet media one
1
6
20
1
3
78
@nanjiang_cs
Nan Jiang
5 years
Our ICML paper () is online! We revisit core assumptions in the analysis of batch RL (ADP) algorithms and ask whether they are inevitable & hold in interesting settings. 1/x
Tweet media one
1
14
72
@nanjiang_cs
Nan Jiang
5 years
The densest paper I've written in a while: . My fav part is how a new world pops up when you swap the roles of importance weights & value functions in the "breaking the curse of horizon" method (Liu, @LihongLi20 et al) 😲 (1/x)
3
16
71
@nanjiang_cs
Nan Jiang
10 months
super lucky to get this tea with a mere 30min wait (peak wait time can be ~4h). pretty sure any other milk tea will feel basically tasteless for a while…
Tweet media one
5
0
68
@nanjiang_cs
Nan Jiang
2 years
An often confused point: Worst-case regret minimization & return maximization are 𝐧𝐨𝐭 the same in offline RL! This is perhaps retrospectively obvious (see🧵below), but do you know there are 𝐢𝐧𝐟𝐢𝐧𝐢𝐭𝐞𝐥𝐲 𝐦𝐚𝐧𝐲 alternatives to regret min and return max? 1/x
Tweet media one
3
11
65
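One common way to formalize the distinction (the tweet's image may use different notation): let M_D be the set of MDPs consistent with the offline data and J_M(π) the return of π in M. Then

\[ \hat\pi_{\mathrm{return}} \in \arg\max_{\pi}\ \min_{M \in \mathcal{M}_D} J_M(\pi), \qquad \hat\pi_{\mathrm{regret}} \in \arg\min_{\pi}\ \max_{M \in \mathcal{M}_D} \Big[\max_{\pi'} J_M(\pi') - J_M(\pi)\Big]. \]

The two generally disagree because the MDP that minimizes achievable return need not be the one that maximizes the gap to optimality, and changing the reference against which the gap is measured yields further objectives, presumably the sense in which there are infinitely many alternatives.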
@nanjiang_cs
Nan Jiang
3 months
that feeling of "ok I am now considered to give ok talks" when your advisor, who used to stop students' practice talks (mine no exception) within the first 3 slides, praises your presentation 😅 can't thank Satinder enough tho for the communication skills I learned from him
Tweet media one
1
1
63
@nanjiang_cs
Nan Jiang
4 years
#icml2020 causal RL tutorial is interesting! quick notes: (1) combine confounded offline data + online exploration: identify the lower/upper bound of treatment effect from offline data and use it to refine model space (keep those whose predicted effect is in range).
1
7
62
@nanjiang_cs
Nan Jiang
15 days
Coverage is the core concept in offline RL, and in MDPs we use state density ratios… but what is the right concept for POMDPs? Extremely proud of this ICML *rejection* where we discover the right coverage condition for model-free OPE in POMDPs! 1/
Tweet media one
@nanjiang_cs
Nan Jiang
15 days
Causal inf community: am I missing something super basic? Claim: if behavior/logging policy only depends on observables, then there is no confounding whatsoever, no??? rev claims that reward depending latent state creates confounding. AC doubles down and further claims following
Tweet media one
6
2
8
1
10
60
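For reference, the MDP density-ratio notion of coverage mentioned above is the standard concentrability coefficient (the POMDP condition from the rejected paper is not reproduced here):

\[ C^{\pi} = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)}, \]

where μ is the data distribution and d^π the (normalized discounted) occupancy of the evaluation policy; offline evaluation and learning guarantees typically scale with C^π.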
@nanjiang_cs
Nan Jiang
2 years
Will #neurips provide free reg & hotel for top reviewers? @kchonyc My student Jinglin Chen is a top reviewer (his *3rd* (!) reviewer award at neurips), has a 1st-author paper at the main conf, and was not given a travel award 🙃
1
0
55
@nanjiang_cs
Nan Jiang
2 years
appreciate the recognition and grateful for working with a group of great students :)
Tweet media one
0
0
53
@nanjiang_cs
Nan Jiang
2 years
very first time getting 5/5 acc. SHOULD'VE BOUGHT A LOTTERY TICKET WITH THIS LUCK
0
0
50
@nanjiang_cs
Nan Jiang
4 years
moved to new house on thursday and had no internet. At some point I was prepared to give the @RLtheory talk with my neighbor’s wifi in the yard, holding an umbrella as sun shade... now that the internet is fixed, I’m sry u guys will miss that fun part :P
1
0
50
@nanjiang_cs
Nan Jiang
2 years
writing teaching statement for 3rd yr rev. thought it'd be painful and useless. turns out it brought up nice memory I'd like to share! in fa18 I taught RL thry 1st time, a student frequently challenged me like: "practice doesn't work acc to ur thry. is this really relevant?" 1/
1
2
50
@nanjiang_cs
Nan Jiang
2 years
After yrs I am eventually gonna teach regret min in linear MDPs properly...! A long note but most is "tech prep" on topics of relevance outside RL (eg elliptical potential). Core analysis is surprisingly short: merely *2 pgs* (excl standard covering arg)!
2
7
48
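For context, the elliptical potential lemma mentioned above is usually stated as follows (standard form from the linear bandit literature; the note itself may use a variant): with V_0 = λI, V_t = V_{t-1} + x_t x_t^T, and ||x_t|| ≤ L,

\[ \sum_{t=1}^{T} \min\big(1, \|x_t\|_{V_{t-1}^{-1}}^2\big) \le 2\log\frac{\det V_T}{\det V_0} \le 2d\log\Big(1 + \frac{T L^2}{d\lambda}\Big). \]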
@nanjiang_cs
Nan Jiang
10 months
En route to #ICML2023 . My first in-person big conf since the pandemic. The last time I did this, I had just started my faculty job and still felt like a PhD student :) Looking fwd to meeting old & new friends in RL & its theory. Happy to chat and you can also find me at the posters.
Tweet media one
0
1
48
@nanjiang_cs
Nan Jiang
2 years
so long since I last had small hot pot…!
Tweet media one
2
0
46
@nanjiang_cs
Nan Jiang
3 years
First dose!
Tweet media one
0
1
47
@nanjiang_cs
Nan Jiang
4 years
paper accepted to neurips! we are also changing terminology ("confidence interval" to "value interval") to avoid possible confusions pointed out by the reviewers.
@nanjiang_cs
Nan Jiang
4 years
Previously we split minimax OPE into 2 styles (value-learning, in addition to existing weight/ratio-learning), and now it's time to merge them back---a surprising byproduct when we try to quantify bias and relax realizability of these methods: (1/3)
Tweet media one
Tweet media two
2
6
25
2
4
45
@nanjiang_cs
Nan Jiang
1 year
Yes this is a slide I always include. To add another classical one (from Pieter Abbeel I believe):
Tweet media one
@AnnaLeptikon
Anna Riedl
1 year
Still, the most insightful slide in all artificial intelligence introductions, if you ask me (From David Silver's 2015 Introduction to Reinforcement Learning)
Tweet media one
24
78
614
1
6
45
@nanjiang_cs
Nan Jiang
3 years
prep for lecture on LP for mdp and shocked: it is said the dual constraint characterizes occupancy of all stat policies, and I was always under the impression that non-stat/history-dep policies might induce occupancies outside the space. turns out… no?? (1/x)
5
0
45
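The constraint set in question, written in the usual normalized-occupancy form for a discounted MDP with initial distribution d_0 (the lecture's notation may differ):

\[ \mathcal{K} = \Big\{ d \ge 0 \;:\; \sum_{a} d(s,a) = (1-\gamma)\, d_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \ \ \forall s \Big\}. \]

Any policy, stationary or history-dependent, induces an occupancy satisfying these flow constraints, and any feasible d is realized by the stationary policy π(a|s) ∝ d(s,a), which resolves the surprise in the tweet.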
@nanjiang_cs
Nan Jiang
4 years
recent paper accepted to #UAI2020 w/ my student @tengyangx , on how Bellman error minimization style algorithms for learning Q* save you a factor of horizon in error propagation and give you a more cleanly defined concentrability coeff compared to AVI.
1
6
44
@nanjiang_cs
Nan Jiang
3 months
Re planning w/ a representation learned w/o reconstruction loss: The discussion (not specifically here, but more generally in the community) will be so much more informed if everyone knows what a bisimulation is.
@sirbayes
Kevin Patrick Murphy
3 months
Yann is advocating Model predictive control in a latent space , which is learned without a reconstruction loss, as a way to solve planning, and get truly controllable behavior. I agree.
9
41
392
2
2
44
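For readers who don't know the term: a bisimulation on an MDP (in the classic Givan and Dean sense; given here only as background) is an equivalence relation ~ on states such that s ~ s' implies, for every action a,

\[ r(s,a) = r(s',a) \quad\text{and}\quad \Pr(S_{+} \in C \mid s,a) = \Pr(S_{+} \in C \mid s',a) \ \text{ for every equivalence class } C \text{ of } \sim. \]

Bisimilar states are behaviorally indistinguishable (same optimal values and policies), so a latent representation that only merges bisimilar states can support planning without ever reconstructing observations, presumably why the term comes up in this discussion.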
@nanjiang_cs
Nan Jiang
6 months
En route to Ann Arbor (homecoming!) to give a talk at the CSP seminar at EECS Umich tomorrow!
0
1
44
@nanjiang_cs
Nan Jiang
3 years
Harsh advice I got during my PhD: "No one is obliged to read a poorly written doc unless you proved P!=NP." Write the draft, let it sit for a while, read and edit the parts that read as nonsense to ppl other than the authors, and iterate a couple times before submission. ...seems a luxury these days?
@david_picard
David Picard
3 years
@neu_rips On the other hand, if reviewers did not understand what you wrote, I tend to think it's because you didn't explain it well enough.
1
0
13
1
3
43
@nanjiang_cs
Nan Jiang
4 months
after getting stuck on q1 for ~2 weeks, found a surprisingly simple & elegant proof: see the bottom of . All other answers (incl. in a diff thread) are complicated with unknown dim-dependent constants, while this one is a few lines & elementary. yet almost no upvotes???
@nanjiang_cs
Nan Jiang
5 months
Concentration ineq twitter(?): in the setting of linear reg (X in R^d, Y in R, Σ=E[XX^T], ||X|| and |Y| bounded), I want to bound the estimation errors of the plug-in estimators for 1. Σ^{½} 2. Σ^{-½} E[XY] w/o paying σ_min(Σ) or alike. Pointers plz (ideally ready to use...)!
2
2
31
1
8
42
@nanjiang_cs
Nan Jiang
2 years
General learnability conditions of Offline & Online RL are being better understood in recent years, tho mostly in parallel. In , we show an interesting connection that the good-old “concentrability” in offline RL implies online learnability!
1
5
43
@nanjiang_cs
Nan Jiang
5 years
My talk at MSR is online now! on our findings and open problems in figuring out minimal assumptions that enable theoretical guarantees for RL. talk was meant to offer a minimalist view of RL accessible to learning theoreticians or even TCS audience. (1/2)
2
8
42
@nanjiang_cs
Nan Jiang
11 months
I've been telling this to many ppl recently: I can't believe I missed this technical point for so long... What's the right notion of coverage in linear MDP? Poll below! A thread that discusses the nuances, connections to OOD/mean matching, and subtle (open?) questions... 1/
Tweet media one
3
5
40
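Without reproducing the poll, two coverage notions that appear in the linear-MDP offline literature (listed as background, not as the thread's answer): with feature map φ, data distribution μ, target occupancy d^π, and Σ_ν = E_ν[φφ^T],

\[ C_{\mathrm{ratio}} = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)} \qquad\text{vs.}\qquad C_{\mathrm{rel}} = \sup_{w \ne 0} \frac{w^{\top} \Sigma_{d^{\pi}} w}{w^{\top} \Sigma_{\mu} w}. \]

The relative condition number on the right can be finite even when the raw density ratio is unbounded, which is presumably part of the nuance the thread discusses.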
@nanjiang_cs
Nan Jiang
3 years
let me follow up with one on offline RL...
Tweet media one
@neu_rips
Gergely Neu
3 years
let's see if twitter is ready for RL theory memes
Tweet media one
4
2
123
4
0
40
@nanjiang_cs
Nan Jiang
1 month
me: I _really_ need to start writing this offline RL theory survey I agreed to. Also me: * get into rabbit hole with authors in ICML AC batch * play with OPE code * tweak visualization until satisfaction 🫠
1
0
40
@nanjiang_cs
Nan Jiang
3 months
what's 1x1 convolutions???
11
0
40
@nanjiang_cs
Nan Jiang
2 years
Tmr @icmlconf 2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation by PD lemma, we turn initial-state pessimistic term from our prior work into *relative* pess and smoothly bridge IL & offline RL, with robust improvement guarantees.
Tweet media one
1
3
40
@nanjiang_cs
Nan Jiang
3 years
I am co-organizing an ICERM virtual workshop on theory and algos for Deep RL on Aug 2-4, with Sanjay Shakkottai, R Srikant, and Mengdi Wang. You can check out the line-up of speakers & the tentative schedule and register for the event at:
2
6
39
@nanjiang_cs
Nan Jiang
4 years
I am surprised by how many people showed up at the poster session for the DR-PG paper () and that we had an hour-long in-depth discussion! (esp. given that I forgot to tweet about it...😂) thanks everyone, this was an amazing night!
0
1
35
@nanjiang_cs
Nan Jiang
1 year
Prospective student interested in RL4edu said he's scared of meeting w/ me in 2 ways: b4, he thought I'd kick him out when he mentions the word "applied", & after, he's scared of my enthusiasm. Oh am I SUPER HYPED when it's RL for *real* x instead of RL for a simulator of x. (1/x)
3
0
37
@nanjiang_cs
Nan Jiang
4 years
wonderful talks in the morning! what I *particularly* liked is that these talks not only tell you how well their methods worked, but also *when they will fail*, both by theoretical reasoning and simple and intuitive examples, which I feel is missing in many deep RL papers
@marcgbellemare
Marc G. Bellemare
4 years
The deep RL workshop continues today on the topic of Exploration. Fantastic line up of speakers, @IanOsband @chelseabfinn @wwdabney Alekh Agarwal, discussion chair @joelbot3000 See you there! Schedule: @LihongLi20
0
5
52
0
5
37
@nanjiang_cs
Nan Jiang
23 days
@deliprao you sure there is long-term reputation? I thought internet (and research community) has no memory of subpar things (famous) people did
2
0
35
@nanjiang_cs
Nan Jiang
5 months
Boarding flight to free company t-shirts… I mean NeurIPS. Happy to chat! I mean seriously, don't take all the t-shirts and leave some for me 🫠 @jasondeanlee
2
0
34
@nanjiang_cs
Nan Jiang
1 month
wrote this down more formally so that I can get it off my mind... If you find the original tweets lack context/background but find the topic interesting, the note might be helpful
@nanjiang_cs
Nan Jiang
2 months
At CISS hearing nice talks on model-based RL. MBRL has a reputation for bad "error compounding", but I realized recently that its theoretical root may be different from what ppl think... The problem may not be error accumulation over *time*, but the one-step error itself! 1/
1
3
28
1
2
34
@nanjiang_cs
Nan Jiang
3 years
I will talk at the offline RL workshop in 5min on off-policy cross-validation and evaluation. Please come and ask questions if you are interested.
1
2
34
@nanjiang_cs
Nan Jiang
1 year
frost on door handle. -20C outside
Tweet media one
1
0
33
@nanjiang_cs
Nan Jiang
5 years
went to grab a lunch box at visit day, and the volunteer looked at me and was like "hey, a grad student that hasn't signed up shouldn't steal the food here". glad that a staff member beside her recognized me as faculty... this has happened to me a couple of times already 😜🤣
2
0
31
@nanjiang_cs
Nan Jiang
6 months
@tengyangx and I already have a paper w Q* in the title, so no hurry writing one and uploading to arxiv 😂
Tweet media one
0
2
30
@nanjiang_cs
Nan Jiang
3 months
Bonus (& shameless plug): want to know how to get the most variance reduction out of the baseline-type control variate? see here:
Tweet media one
@QuanquanGu
Quanquan Gu
3 months
It's intriguing to observe the use of REINFORCE in RLHF. REINFORCE is a classical algorithm utilized to estimate policy gradients for episodic Markov Decision Processes (MDPs). Another notable method is GPOMDP. While both are effective estimators, it's worth noting that neither
Tweet media one
3
20
92
0
2
31
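On "the most variance reduction out of the baseline-type control variate": the classical constant-baseline answer (the linked note may prove something more general) is that for the REINFORCE estimator

\[ \hat g = \nabla_\theta \log \pi_\theta(\tau)\,\big(R(\tau) - b\big), \]

any fixed b keeps the estimator unbiased since E[∇_θ log π_θ(τ)] = 0, and the b minimizing the trace of the covariance is the score-weighted average return

\[ b^{*} = \frac{\mathbb{E}\big[\|\nabla_\theta \log \pi_\theta(\tau)\|^2\, R(\tau)\big]}{\mathbb{E}\big[\|\nabla_\theta \log \pi_\theta(\tau)\|^2\big]}, \]

rather than the plain average return E[R(τ)] commonly used in practice.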
@nanjiang_cs
Nan Jiang
5 months
Concentration ineq twitter(?): in the setting of linear reg (X in R^d, Y in R, Σ=E[XX^T], ||X|| and |Y| bounded), I want to bound the estimation errors of the plug-in estimators for 1. Σ^{½} 2. Σ^{-½} E[XY] w/o paying σ_min(Σ) or alike. Pointers plz (ideally ready to use...)!
2
2
31
@nanjiang_cs
Nan Jiang
4 years
as #icml2020 starts, I eventually got time to... catch up on the real-life RL conf I missed! among the amazing talks, I highly recommend the one by @prasadNiranjani . check out how various principled RL methods are adapted and integrated in a medical scenario!
0
1
30
@nanjiang_cs
Nan Jiang
1 year
@ylecun By RL do you mean 1. Current algorithms in RL 2. Current problem paradigms in RL research 3. RL as a problem formulation? I’d think world models that you advocate for are captured in 3
1
0
30
@nanjiang_cs
Nan Jiang
5 months
seeing other tweets about Lean recently on how difficult it is to formalize proofs in Lean, and was wondering if LLMs can help… and see this..! How long until we have software that could just take one of my papers and turn its proofs into Lean? 🤔
@AnimaAnandkumar
Prof. Anima Anandkumar
5 months
Launching Lean Co-pilot for LLM-human collaboration to write formal mathematical proofs that are 100% accurate. We use LLMs to suggest proof tactics in Lean and also allow humans to intervene and modify in a seamless manner. Automating theorem proving
21
327
2K
2
0
29
@nanjiang_cs
Nan Jiang
2 years
every semester I teach the RL thry course, crazy restructuring ideas always come to mind. like getting rid of tabular learning section (that’s just a special case of Tf in F for all f… right??) or neutral DP algs (don’t optimistic/pess algs basically cover all use cases…?)
3
2
29
@nanjiang_cs
Nan Jiang
2 months
when can I post my #icml rebuttal??? it’s ready 🙂
1
0
29
@nanjiang_cs
Nan Jiang
4 years
just received this today! T-shirt for 50th anni of dept of automation in Tsinghua. The only thing is that the design of the T-shirt is quite... simplistic...
Tweet media one
1
0
27
@nanjiang_cs
Nan Jiang
2 months
At CISS hearing nice talks on model-based RL. MBRL has a reputation for bad "error compounding", but I realized recently that its theoretical root may be different from what ppl think... The problem may not be error accumulation over *time*, but the one-step error itself! 1/
1
3
28
@nanjiang_cs
Nan Jiang
2 years
yes yes, dimensional analysis. In RL, think of every reward / value function as carrying a $ sign. make sure to cancel them out cleanly in your sample complexity expressions etc
@bremen79
Francesco Orabona
2 years
You cannot take the logarithm of the Lipschitz constant of a function! A 🧵 about a super common mistake in ML papers 1/10
19
107
804
1
2
28
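A worked example of the "$ sign" heuristic, under the illustrative assumption that rewards are measured in dollars and bounded by V_max: then V, Q, and any value-accuracy target ε also carry dollars, so a dimensionally sound sample-complexity bound has the shape

\[ n \;\gtrsim\; \frac{V_{\max}^2}{\varepsilon^2}\, \log\frac{|\mathcal{F}|}{\delta}, \]

where V_max/ε is a dimensionless ratio, whereas a stray log V_max is a log of a dollar-valued quantity and signals a mistake, the same issue as the quoted thread's log of a Lipschitz constant.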
@nanjiang_cs
Nan Jiang
1 year
Envy all of you at NeurIPS! While I'm not there*, my students will present their 1st-author works Wed/Thu. Please stop by their posters if interested! I will tweet about each paper when it gets close to the session.
Tweet media one
1
0
27
@nanjiang_cs
Nan Jiang
6 months
The talk (which I put a lot of effort into preparing) is up
@nanjiang_cs
Nan Jiang
6 months
En route to Ann Arbor (homecoming!) to give a talk at the CSP seminar at EECS Umich tomorrow!
0
1
44
2
5
27
@nanjiang_cs
Nan Jiang
5 years
In some cases we probably need to ask whether individual states are physically meaningful at all. This totally shook the basic understanding of RL I'd had since I was a grad student. If u had similar confusions, read the paper and let me know what u think (like, at ICML)! (6/end)
1
0
26
@nanjiang_cs
Nan Jiang
3 years
speaking of which, can schools make the “top X%” q’s optional? I never look at them at recruiting and it is such a pain when I have to submit a letter to ~20 places. I always tell students not to worry about # apps, but hey grad programs plz make it easier 4 all of us.
@thegautamkamath
Gautam Kamath
3 years
This is the time of year that I apologize to all fellow faculty about Waterloo's absolutely atrocious recommendation letter submission system. I hate it too.
3
0
24
0
1
27
@nanjiang_cs
Nan Jiang
2 years
Life story. If anyone is curious why I had to read 30%-50% of the papers in my batch as AC, here is the thread for you
@shortstein
Thomas Steinke
2 years
Perspective: You are the AC and need to write a meta review. You don’t have time to read the paper, so you rely on the reviews. But all of the reviews are this template. What do you do?
7
9
67
1
1
26
@nanjiang_cs
Nan Jiang
5 months
I find myself often referring to tweets like this, and it can be hard (even for me) to find them, so I decided to set up a page: Blogposts are probably much better for dissemination (some of my tweets are not very readable...), but I'm too lazy 🫠
@nanjiang_cs
Nan Jiang
5 months
Robust MDP folks: (1) How common is the computational step of finding the worst-case transition against a given policy? (2) Have you seen algs that run natural policy gradient (i.e., state-wise mirror descent) on the Q-fn from the worst-case transition? (1/3)
2
1
11
2
5
26
@nanjiang_cs
Nan Jiang
4 years
Does anyone realize that #neurips initial reviews are due next friday?
1
0
25
@nanjiang_cs
Nan Jiang
3 years
thanks for the reminder. just found the rej email in my spam :)
@guyvdb
Guy Van den Broeck
3 years
One day I dream of having a Google research grant accepted. 5th rejection in a row.
10
3
240
1
0
25
@nanjiang_cs
Nan Jiang
3 years
I had the impression that we would be able to upload new figures during the #NeurIPS rebuttal (probably b/c other confs using openreview allow pdf updates?). the char limit is generous and there will even be rolling discussions, so why limit the response format to text only?
4
1
26
@nanjiang_cs
Nan Jiang
4 years
yup, conf is weird when u attend for the 1st time, and becomes fun as u make friends. my own story: phil thomas and I met at icml-15 as student volunteers and we talked endlessly at the reg desk, as back then it was quite difficult to find an RL person to talk to :) (1/2)
@edchi
Ed H. Chi
4 years
Many years ago, my late PhD advisor John Riedl said: "Not to worry about not knowing anybody at conferences, because if you keep going, they will all become your friends." Great advice. Gratefully true, as some confs are now inviting me to give talks to my friends. #PhDChat
2
20
227
1
0
26
@nanjiang_cs
Nan Jiang
2 months
Still #MBRL : @KaiqingZhang 's #ciss talk touched on MuZero loss. I happened to have discussed its issues w/ someone else recently: 1. Wrong model can get lower loss than true model in stoch env. 2. Even in dtmn env, dist shift can be exponentially bad! Detail & proof in 🧵 1/
1
1
26
@nanjiang_cs
Nan Jiang
10 days
I'm not at @iclr_conf , but Phil will present our spotlight poster in a few hours. Come see online RL using density ratios---which are **not even well defined**😱 b4 exploratory data is collected, plus very cool **black-box** online-to-offline reduction!
Tweet media one
1
1
27
@nanjiang_cs
Nan Jiang
4 years
Previously we split minimax OPE into 2 styles (value-learning, in addition to existing weight/ratio-learning), and now it's time to merge them back---a surprising byproduct when we try to quantify bias and relax realizability of these methods: (1/3)
Tweet media one
Tweet media two
2
6
25
@nanjiang_cs
Nan Jiang
4 years
every time I type Geoff Gordon's averagers nowadays, I can't help but check multiple times that I am not typing avengers... am I the only one...?
1
1
24
@nanjiang_cs
Nan Jiang
5 years
most interesting paper to me today: seems a very nice middle-ground between importance sampling (exp variance) and model-based (bias amplified by horizon). In their method, you need func approx similar to model-based, but impact of bias is much milder
0
5
25
@nanjiang_cs
Nan Jiang
2 years
@swetaagrawal20 sometimes you find related content in the appendix. Also, a legitimate reason may be that ppl typically spend less time rigorously verifying that what failed *really* didn't work, so these experiences may not stand up to the same level of scrutiny as the main (positive) results
2
1
24
@nanjiang_cs
Nan Jiang
4 years
caught the tail of @svlevine 's talk and caught up w/ @tengyuma 's and @EmmaBrunskill 's videos. great to see a focused discussion on pessimism for off-policy RL! looking forward to the afternoon sessions starting w/ @ofirnachum
@LihongLi20
Lihong Li
4 years
Next week, @marcgbellemare and I are organizing a Deep RL workshop as part of Simons Institute's Theoretical RL program, with a great lineup of speakers. All talks will be recorded, and can be viewed live on YouTube channel. See for more details!
2
33
188
1
1
23
@nanjiang_cs
Nan Jiang
4 years
the really neat part of this work is the "traj simulator" that shows that off-policy Monte-Carlo (which we usually associate w/, say, imp sampling) can be very sample-efficient, at least in the tabular setting. (1/3)
@ShamKakade6
Sham Kakade
4 years
2/ This was the COLT 2018 open problem from @nanjiang_cs and Alekh, who conjectured a poly(H) lower bound. New work refutes this, showing only logarithmic in H episodes are needed to learn. So, in a minimax sense, long horizons are not more difficult than short ones!
1
0
19
4
2
22
@nanjiang_cs
Nan Jiang
22 days
now try (Q*-TQ*)^2
@kchonyc
Kyunghyun Cho
22 days
once @ylecun told me (heavily paraphrased), it's not F=ma but \min (F-ma)^2. i didn't realize its importance, but it is perhaps the most enlightening perspective i've ever heard.
44
41
608
0
0
23
@nanjiang_cs
Nan Jiang
11 days
In case anyone who read it is still wondering: see the simple lemma below (h/t Akshay Krishnamurthy). z = c log p/p* for different c (+-1, +- 1/2 etc) gives different useful results.
Tweet media one
@nanjiang_cs
Nan Jiang
12 days
and apparently log(p/p*) can only be way better? Hence the question in the quoted thread. My attempt so far: p/p* is non-neg & E[p/p*] = 1, so by Markov ineq: Pr[p/p* >= t] <= 1/t. So Pr[log(p/p*) >= t] <= exp(-t). ok, sub-exponential instead of sub-gaussian 🫠 thoughts? 5/end
1
0
1
2
1
22
@nanjiang_cs
Nan Jiang
2 years
Just arrived! First in-person event since forever & likely my only travel in the summer. looking fwd to tmr!
@BeyondrlT
BeyondRL TTIC
2 years
** Workshop, TTIC, July 13-15th: Online decision-making and real-world applications ** -) Why is it challenging to deploy online decision-making alg. in real-world problems?🤨 -) Which models describe these challenges?🤔 -) What is the path towards making RL be practical?😲
3
4
26
1
0
22
@nanjiang_cs
Nan Jiang
1 year
It’s RL problem with a specific kind of structure: rand init state + deterministic transitions. You can also view it as SL with a huge label space (and thus reduction to RL makes sense). And guess what, this exact setting is called structured prediction
@bodonoghue85
Brendan O'Donoghue
1 year
If we view LLMs in an RL way, then outputs are just rollouts from the policy. The 'prompt' is just the initial state, it encodes the starting condition of the agent and, implicitly, a goal. Prompt engineering == finding a good initial state for the agent to achieve your goal.
8
19
203
0
0
22
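Spelling out the structure described above (a direct formalization of "rand init state + deterministic transitions"): the state is the prompt plus the tokens generated so far, the action is the next token, so

\[ s_0 \sim \text{prompt distribution}, \qquad a_t = y_{t+1}, \qquad s_{t+1} = (s_t, a_t) \ \text{(deterministic append)}, \]

with reward typically only at the end of the rollout; a random initial state plus known deterministic transitions is exactly the structured-prediction / learning-to-search setting.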
@nanjiang_cs
Nan Jiang
4 years
The connection between symmetry and conservation laws discovered by Emmy Noether is so beautiful and profound, and was completely eye-opening to me when I thought I had enough exposure (as an outsider) to modern physics.
@Daily_Epsilon
Your Daily Epsilon of Math
4 years
HAPPY BIRTHDAY EMMY NOETHER! Perhaps the greatest woman of mathematics of all time, colleague of Einstein and profoundly influential scientist who basically invented abstract algebra, Noether's legacy is massive. Learn about her below!
1
91
178
2
3
22
@nanjiang_cs
Nan Jiang
2 years
just gave a virtual talk in this (in-person) workshop at RLDM. pity that I couldn't go, and thx organizers for being accommodating! looking fwd to the panel shortly (the talk is on a 4-yr-old paper w/ 3 conf rej & 1 journal desk rej😜 )
@dabelcs
David Abel
2 years
We have an *amazing* line up of speakers: Sara Aronowitz, Brian Christian ( @brianchristian ), Maria Eckstein ( @eckstein_maria ), and Nan Jiang ( @nanjiang_cs ) Co-organized with the dream team: @aharutyu & @Mark_Ho_ Hope to see you in Providence 😀
0
1
9
3
0
21
@nanjiang_cs
Nan Jiang
3 years
speaking of the noisy TV, I have always had a problem: isn't *non-noisy* TV more problematic? after all, pure noise is unpredictable; in contrast, things like TV shows are somewhat predictable yet highly complex, which will distract learning algs into spending resources predicting them
@nanjiang_cs
Nan Jiang
3 years
guess what I was teaching today...
Tweet media one
7
0
18
3
0
21
@nanjiang_cs
Nan Jiang
2 years
One big problem I still dunno how to address: I do thry. The panel wants a 5-yr research plan w/ ambitious goals, and also wants concrete & credible technical solutions. That seems an unresolvable conflict for (at least some types of) theory research. 2/
1
2
20
@nanjiang_cs
Nan Jiang
2 years
A blog post that explains our recent offline RL works (Bellman-consistent pessimism + ATAC) by @chinganc_rl and @tengyangx
@MSFTResearch
Microsoft Research
2 years
Online RL agents learn by trial-and-error—not an option for tasks like training self-driving cars. Learn how Microsoft researchers used game theory to design offline RL algorithms that can learn good policies with state-of-the-art empirical performance:
2
19
154
0
1
20
@nanjiang_cs
Nan Jiang
2 months
Out of the ctx of the og post, this is more true for TD than for PG. Next time you see someone try to simplify RL w/ supervised-learning intuition and say TD/Q-learning minimizes Bellman error, ask them how they obtain the stop-grad on the next-state term from that reasoning 🫠
Tweet media one
@y0b1byte
yobibyte
2 months
Next time you ask a question why RL researchers are so cynical
Tweet media one
5
3
88
0
4
20
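Making the stop-grad point concrete for linear TD(0) (the standard textbook comparison, not taken from the tweet's image): with V_w(s) = w^T φ(s) and TD error δ = r + γ V_w(s') - V_w(s),

\[ \text{TD(0):}\ \ w \leftarrow w + \alpha\,\delta\,\phi(s), \qquad \text{gradient descent on } \tfrac12\delta^2:\ \ w \leftarrow w + \alpha\,\delta\,\big(\phi(s) - \gamma\,\phi(s')\big). \]

TD drops the γφ(s') term, i.e., it treats the target r + γV_w(s') as a constant (stop-gradient), so "TD = gradient descent on Bellman error" does not follow from supervised-learning reasoning; and the full-gradient (residual) version has its own double-sampling bias in stochastic environments.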
@nanjiang_cs
Nan Jiang
5 years
@neu_rips the 1001st way to derive PG (originally by Jie Tang & @pabbeel here ). turns out you can also derive its entire var reduction family this way... and a new estimator that subsumes most previous ones pops up in this process!
Tweet media one
1
6
20
@nanjiang_cs
Nan Jiang
2 years
gave a tutorial on rl thry virtually @ NUS yesterday & enjoyed the interaction w/ the audience. Also got my fav questions: once FQI/E are intro'd, ppl start asking about convergence for SGD under convexity etc. TD: hold my *divergence* under inf data and 1-d realizable linear features 😎
1
0
20
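The divergence alluded to is presumably the classic "w, 2w" construction (in the spirit of Tsitsiklis and Van Roy; a reconstruction, not necessarily the tutorial's exact example): two states with features φ(s_1) = 1, φ(s_2) = 2, all rewards zero (so the true value is 0 and w* = 0 is realizable), and TD(0) updates performed, off-policy, on the transition s_1 → s_2. The expected update is

\[ \Delta w = \alpha\, \phi(s_1)\,\big(0 + \gamma\,\phi(s_2)\, w - \phi(s_1)\, w\big) = \alpha\,(2\gamma - 1)\, w, \]

so for γ > 1/2 every update multiplies w by 1 + α(2γ - 1) > 1 and |w| diverges from any w ≠ 0, even with infinite data and a single realizable linear feature.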
@nanjiang_cs
Nan Jiang
6 months
I keep hearing eg "digital twins", but maybe (a large frac of) the real world is just not simulatable? if we want decisions to matter, perhaps we just have to deal w/ the "messiness" of reality like any other "traditional principle"
@EugeneVinitsky
Eugene Vinitsky @ ICRA
6 months
RL struggles as a field because it does not yet have one common, simulated, easily reproduced benchmark where progress would immediately improve an exciting application
35
5
266
2
0
20
@nanjiang_cs
Nan Jiang
2 years
funny, I opened up the simons RL reunion page, saw Scott Aaronson's face, and was like "what happened...?"
Tweet media one
2
0
20
@nanjiang_cs
Nan Jiang
10 months
Two posters today at 11a: #119 : Audrey will present our work on learning occupancies for RL (thread below). #330 : Phil will present how we pin down (nearly) the exact form of the blowup factors for approx errors in OPE. Stop by and say hi :)
Tweet media one
@nanjiang_cs
Nan Jiang
1 year
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Jt w/ students Audrey Huang and Jinglin Chen 1/
Tweet media one
Tweet media two
4
29
137
0
4
19