Nan Jiang

@nanjiang_cs

6,926 Followers · 73 Following · 110 Media · 1,888 Statuses

machine learning researcher, with a focus on reinforcement learning. asst prof @ uiuc cs. Course on RL theory (w/ videos):

Joined November 2017
Pinned Tweet
@nanjiang_cs
Nan Jiang
4 years
Learning Q* with + poly-sized exploratory data + an arbitrary Q-class that contains Q* ...has seemed impossible for yrs, or so I believed when I talked at @RLtheory 2mo ago. And what's the saying? Impossible is NOTHING. Exciting new work w/ @tengyangx ! 1/
Tweet media one
Tweet media two
5
11
92
@nanjiang_cs
Nan Jiang
4 years
after consulting my colleagues, I decided to make my 598 lectures publicly available. The video links can be found on the course website, or from this list (). just started proofs of VI and PI, and check out if you are interested in a stat theory of RL!
@nanjiang_cs
Nan Jiang
4 years
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on rl theory . I am also teaching a phd seminar course on this topic (w/ recordings): ; just did 1st lec 2h ago! still figuring out if I can share the videos publicly...
7
25
147
5
154
779
@nanjiang_cs
Nan Jiang
5 years
The entire RL theory is built on objects like V^π, Q*, π*, T (Bellman up. op.), etc... until you realize that this foundation is quite shaky. Spoiler: no big deal (yet) but thinking thru this is super useful for resolving some confusions. (1/x)
4
77
388
@nanjiang_cs
Nan Jiang
2 years
I received the NSF CAREER award. Each submission was a month+ effort and I'm glad I got it the 2nd time. Also, the detailed reviews & the process were not as delightful as the decision. Some experience & thoughts below: 1/
Tweet media one
33
10
390
@nanjiang_cs
Nan Jiang
22 days
friends must have been bored of me saying this, but clearly not nearly enough ppl know this: not all equations can be turned into an optimization loss (sketch below)
Tweet media one
@kchonyc
Kyunghyun Cho
22 days
once @ylecun told me (heavily paraphrased), it's not F=ma but \min (F-ma)^2. i didn't realize its importance, but it is perhaps the most enlightening perspective i've ever heard.
44
41
608
6
10
160
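A sketch of the double-sampling argument behind "not all equations can be turned into an optimization loss", stated for the Bellman optimality equation in a discounted MDP (standard textbook reasoning; the image attached to the tweet is not reproduced here). With (Tf)(s,a) = r(s,a) + γ E_{s'~P(·|s,a)}[max_{a'} f(s',a')], the naive squared loss on transitions (s,a,r,s') satisfies, for every fixed (s,a),

\[ \mathbb{E}_{r,s'}\Big[\big(f(s,a) - r - \gamma \max_{a'} f(s',a')\big)^2\Big] = \big(f(s,a) - (Tf)(s,a)\big)^2 + \mathrm{Var}_{r,s'}\big[r + \gamma \max_{a'} f(s',a')\big]. \]

The variance term depends on f, so in a stochastic environment the minimizer of this loss need not satisfy Q = TQ even when Q* lies in the class: squaring the equation does not yield a valid loss, unlike \min (F-ma)^2.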
@nanjiang_cs
Nan Jiang
2 years
this paper got Outstanding Paper Award! Congrats to my coauthors (esp. Ching-An and Tengyang). More reasons to check out the details! List of all paper awards:
@nanjiang_cs
Nan Jiang
2 years
Tmr @icmlconf 2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation by PD lemma, we turn initial-state pessimistic term from our prior work into *relative* pess and smoothly bridge IL & offline RL, with robust improvement guarantees.
Tweet media one
1
3
40
10
11
149
@nanjiang_cs
Nan Jiang
4 years
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on rl theory . I am also teaching a phd seminar course on this topic (w/ recordings): ; just did 1st lec 2h ago! still figuring out if I can share the videos publicly...
@marcgbellemare
Marc G. Bellemare
4 years
@thienan496 @quocleix @Miles_Brundage @mpd37 We have a monograph on deep reinforcement learning () which covers some of the recent work. Otherwise, much of the non-deep RL work is theory, in which case I am not the expert but perhaps @nanjiang_cs has suggestions.
0
1
22
7
25
147
@nanjiang_cs
Nan Jiang
1 year
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Jt w/ students Audrey Huang and Jinglin Chen 1/
Tweet media one
Tweet media two
4
29
137
@nanjiang_cs
Nan Jiang
6 months
As the semester draws to an end, I want to share this *identity* (h/t @tengyangx ) that connects so many fundamental pieces of RL theory together: optimism, pessimism, policy opt, proved by the PD lemma + Bellman-error telescoping, all in one equation! 1/3
Tweet media one
2
16
125
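One standard identity of this flavor, stated from the usual Bellman-error telescoping argument (the image in the tweet may present a different or more general form): in a discounted MDP with normalized occupancy d^π and return J(π), for any f: S×A → R and any policy π,

\[ \mathbb{E}_{s_0 \sim d_0}\big[f(s_0,\pi)\big] - J(\pi) = \frac{1}{1-\gamma}\, \mathbb{E}_{(s,a)\sim d^\pi}\Big[f(s,a) - r(s,a) - \gamma\, \mathbb{E}_{s'\sim P(\cdot|s,a)}\big[f(s',\pi)\big]\Big], \]

where f(s,π) = E_{a~π(·|s)}[f(s,a)]. Choosing f to over- or under-estimate recovers the usual optimism and pessimism bounds, and applying it to two policies gives performance-difference-style statements.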
@nanjiang_cs
Nan Jiang
3 months
🙏
@SloanFoundation
Sloan Foundation
3 months
We have today announced the names of the 2024 Sloan Research Fellows! Congratulations to these 126 outstanding early-career researchers:
Tweet media one
6
40
247
24
0
118
@nanjiang_cs
Nan Jiang
11 months
the first person I've ever seen outside the (core?) RL community to spontaneously realize this (ridiculous) problem...
@kchonyc
Kyunghyun Cho
11 months
how do people tune hyperparameters in offline reinforcement learning???
24
14
122
2
17
115
@nanjiang_cs
Nan Jiang
1 year
ICML results out! 3/4 acc (congrats to students; thread later). And @tengyangx eventually got a rejection after all. I was worried if I should graduate him, like c'mon, how can a PhD be complete w/o rejections 😜. Now such a relief 😆
0
0
83
@nanjiang_cs
Nan Jiang
12 days
In a few years the next gen of young researchers will find it weird that you all use the word "agent" in RL, as it is supposed to be dedicated terminology for LLM agents 🫠
9
4
82
@nanjiang_cs
Nan Jiang
6 years
had a very intriguing conversation w/ Alyosha Efros who visited us. agreed on many issues but also debated quite a bit on "RL tests on training data hence overfits". thought it's a good time to organize my thoughts on this... bottom line: the statement is wrong if taken literally.
2
17
81
@nanjiang_cs
Nan Jiang
4 years
2 papers accepted to #icml2020 ! the MWQL one I have tweeted about quite a bit b4. the other one is an interesting connection between variance reduction in IS for OPE and that in PG—guess what, they are the same thing! w/ my student Jiawei Huang. congrats Jiawei!
@nanjiang_cs
Nan Jiang
5 years
@neu_rips the 1001st way to derive PG (originally by Jie Tang & @pabbeel here ). turns out you can also derive its entire var reduction family this way... and a new estimator that subsumes most previous ones pops up in this process!
Tweet media one
1
6
20
1
3
78
@nanjiang_cs
Nan Jiang
5 years
Our ICML paper () is online! We revisit core assumptions in the analysis of batch RL (ADP) algorithms and ask whether they are inevitable & hold in interesting settings. 1/x
Tweet media one
1
14
72
@nanjiang_cs
Nan Jiang
5 years
The densest paper I've written in a while: . My fav part is how a new world pops up when you swap the roles of importance weights & value functions in the "breaking the curse of horizon" method (Liu, @LihongLi20 et al) 😲 (1/x)
3
16
71
@nanjiang_cs
Nan Jiang
10 months
super lucky to get this tea with a mere 30min wait (peak wait time can be ~4h). pretty sure any other milk tea will feel basically tasteless for a while…
Tweet media one
5
0
68
@nanjiang_cs
Nan Jiang
2 years
An often confused point: Worst-case regret minimization & return maximization are 𝐧𝐨𝐭 the same in offline RL! This is perhaps retrospectively obvious (see🧵below), but do you know there are 𝐢𝐧𝐟𝐢𝐧𝐢𝐭𝐞𝐥𝐲 𝐦𝐚𝐧𝐲 alternatives to regret min and return max? 1/x
Tweet media one
3
11
65
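One common way to formalize the distinction (the tweet's image may use different notation): let M_D be the set of MDPs consistent with the offline data and J_M(π) the return of π in M. Then

\[ \hat\pi_{\mathrm{return}} \in \arg\max_{\pi}\ \min_{M \in \mathcal{M}_D} J_M(\pi), \qquad \hat\pi_{\mathrm{regret}} \in \arg\min_{\pi}\ \max_{M \in \mathcal{M}_D} \Big[\max_{\pi'} J_M(\pi') - J_M(\pi)\Big]. \]

The two generally disagree because the MDP that minimizes achievable return need not be the one that maximizes the gap to optimality, and changing the reference against which the gap is measured yields further objectives, presumably the sense in which there are infinitely many alternatives.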
@nanjiang_cs
Nan Jiang
3 months
that feeling of "ok I am now considered to give ok talks" when your advisor, who used to stop students' practice talks (mine no exception) within the first 3 slides, praises your presentation 😅 can't thank Satinder enough tho for the communication skills I learned from him
Tweet media one
1
1
63
@nanjiang_cs
Nan Jiang
4 years
#icml2020 causal RL tutorial is interesting! quick notes: (1) combine confounded offline data + online exploration: identify the lower/upper bound of treatment effect from offline data and use it to refine model space (keep those whose predicted effect is in range).
1
7
62
@nanjiang_cs
Nan Jiang
15 days
Coverage is the core concept in offline RL, and in MDPs we use state density ratios… but what is the right concept for POMDPs? Extremely proud of this ICML *rejection* where we discover the right coverage condition for model-free OPE in POMDPs! 1/
Tweet media one
@nanjiang_cs
Nan Jiang
15 days
Causal inf community: am I missing something super basic? Claim: if behavior/logging policy only depends on observables, then there is no confounding whatsoever, no??? rev claims that reward depending latent state creates confounding. AC doubles down and further claims following
Tweet media one
6
2
8
1
10
60
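For reference, the MDP density-ratio notion of coverage mentioned above is the standard concentrability coefficient (the POMDP condition from the rejected paper is not reproduced here):

\[ C^{\pi} = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)}, \]

where μ is the data distribution and d^π the (normalized discounted) occupancy of the evaluation policy; offline evaluation and learning guarantees typically scale with C^π.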
@nanjiang_cs
Nan Jiang
2 years
Will #neurips provide free reg & hotel for top reviewers? @kchonyc My student Jinglin Chen is a top reviewer (his *3rd* (!) reviewer award at neurips), has a 1st-author paper at the main conf, and was not given a travel award 🙃
1
0
55
@nanjiang_cs
Nan Jiang
2 years
appreciate the recognition and grateful for working with a group of great students :)
Tweet media one
0
0
53
@nanjiang_cs
Nan Jiang
2 years
very first time getting 5/5 acc. SHOULD'VE BOUGHT A LOTTERY TICKET WITH THIS LUCK
0
0
50
@nanjiang_cs
Nan Jiang
4 years
moved to new house on thursday and had no internet. At some point I was prepared to give the @RLtheory talk with my neighbor’s wifi in the yard, holding an umbrella as sun shade... now that the internet is fixed, I’m sry u guys will miss that fun part :P
1
0
50
@nanjiang_cs
Nan Jiang
2 years
writing teaching statement for 3rd yr rev. thought it'd be painful and useless. turns out it brought up nice memory I'd like to share! in fa18 I taught RL thry 1st time, a student frequently challenged me like: "practice doesn't work acc to ur thry. is this really relevant?" 1/
1
2
50
@nanjiang_cs
Nan Jiang
2 years
After yrs I am eventually gonna teach regret min in linear MDPs properly...! A long note but most is "tech prep" on topics of relevance outside RL (eg elliptical potential). Core analysis is surprisingly short: merely *2 pgs* (excl standard covering arg)!
2
7
48
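For context, the elliptical potential lemma mentioned above is usually stated as follows (standard form from the linear bandit literature; the note itself may use a variant): with V_0 = λI, V_t = V_{t-1} + x_t x_t^T, and ||x_t|| ≤ L,

\[ \sum_{t=1}^{T} \min\big(1, \|x_t\|_{V_{t-1}^{-1}}^2\big) \le 2\log\frac{\det V_T}{\det V_0} \le 2d\log\Big(1 + \frac{T L^2}{d\lambda}\Big). \]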
@nanjiang_cs
Nan Jiang
10 months
En route to #ICML2023 . My first in-person big conf since the pandemic. The last time I did this, I had just started my faculty job and still felt like a PhD student :) Looking fwd to meeting old & new friends in RL & its theory. Happy to chat and you can also find me at the posters.
Tweet media one
0
1
48
@nanjiang_cs
Nan Jiang
2 years
so long since I last had small hot pot…!
Tweet media one
2
0
46
@nanjiang_cs
Nan Jiang
3 years
First dose!
Tweet media one
0
1
47
@nanjiang_cs
Nan Jiang
4 years
paper accepted to neurips! we are also changing terminology ("confidence interval" to "value interval") to avoid possible confusions pointed out by the reviewers.
@nanjiang_cs
Nan Jiang
4 years
Previously we split minimax OPE into 2 styles (value-learning, in addition to existing weight/ratio-learning), and now it's time to merge them back---a surprising byproduct when we try to quantify bias and relax realizability of these methods: (1/3)
Tweet media one
Tweet media two
2
6
25
2
4
45
@nanjiang_cs
Nan Jiang
1 year
Yes this is a slide I always include. To add another classical one (from Pieter Abbeel I believe):
Tweet media one
@AnnaLeptikon
Anna Riedl
1 year
Still, the most insightful slide in all artificial intelligence introductions, if you ask me (From David Silver's 2015 Introduction to Reinforcement Learning)
Tweet media one
24
78
614
1
6
45
@nanjiang_cs
Nan Jiang
3 years
prep for lecture on LP for mdp and shocked: it is said the dual constraint characterizes occupancy of all stat policies, and I was always under the impression that non-stat/history-dep policies might induce occupancies outside the space. turns out… no?? (1/x)
5
0
45
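The constraint set in question, written in the usual normalized-occupancy form for a discounted MDP with initial distribution d_0 (the lecture's notation may differ):

\[ \mathcal{K} = \Big\{ d \ge 0 \;:\; \sum_{a} d(s,a) = (1-\gamma)\, d_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \ \ \forall s \Big\}. \]

Any policy, stationary or history-dependent, induces an occupancy satisfying these flow constraints, and any feasible d is realized by the stationary policy π(a|s) ∝ d(s,a), which resolves the surprise in the tweet.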
@nanjiang_cs
Nan Jiang
4 years
recent paper accepted to #UAI2020 w/ my student @tengyangx , on how Bellman error minimization style algorithms for learning Q* save you a factor of horizon in error propagation and give you a more cleanly defined concentrability coeff compared to AVI.
1
6
44
@nanjiang_cs
Nan Jiang
3 months
Re planning w/ a representation learned w/o reconstruction loss: The discussion (not specifically here, but more generally in the community) will be so much more informed if everyone knows what a bisimulation is.
@sirbayes
Kevin Patrick Murphy
3 months
Yann is advocating Model predictive control in a latent space , which is learned without a reconstruction loss, as a way to solve planning, and get truly controllable behavior. I agree.
9
41
392
2
2
44
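For readers who don't know the term: a bisimulation on an MDP (in the classic Givan and Dean sense; given here only as background) is an equivalence relation ~ on states such that s ~ s' implies, for every action a,

\[ r(s,a) = r(s',a) \quad\text{and}\quad \Pr(S_{+} \in C \mid s,a) = \Pr(S_{+} \in C \mid s',a) \ \text{ for every equivalence class } C \text{ of } \sim. \]

Bisimilar states are behaviorally indistinguishable (same optimal values and policies), so a latent representation that only merges bisimilar states can support planning without ever reconstructing observations, presumably why the term comes up in this discussion.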
@nanjiang_cs
Nan Jiang
6 months
En route to Ann Arbor (homecoming!) to give a talk at the CSP seminar at EECS Umich tomorrow!
0
1
44
@nanjiang_cs
Nan Jiang
3 years
Harsh advice I got during my PhD: "No one is obliged to read a poorly written doc unless you proved P!=NP." Write the draft, let it sit for a while, read and edit the parts that read as nonsense to ppl other than the authors, and iterate a couple times before submission. ...seems a luxury these days?
@david_picard
David Picard
3 years
@neu_rips On the other hand, if reviewers did not understand what you wrote, I tend to think it's because you didn't explain it well enough.
1
0
13
1
3
43
@nanjiang_cs
Nan Jiang
4 months
after getting stuck on q1 for ~2 weeks, found a surprisingly simple & elegant proof: see the bottom of . All other answers (incl. in a diff thread) are complicated with unknown dim-dependent constants, while this one is a few lines & elementary. yet almost no upvotes???
@nanjiang_cs
Nan Jiang
5 months
Concentration ineq twitter(?): in the setting of linear reg (X in R^d, Y in R, Σ=E[XX^T], ||X|| and |Y| bounded), I want to bound the estimation errors of the plug-in estimators for 1. Σ^{½} 2. Σ^{-½} E[XY] w/o paying σ_min(Σ) or alike. Pointers plz (ideally ready to use...)!
2
2
31
1
8
42
@nanjiang_cs
Nan Jiang
2 years
General learnability conditions of Offline & Online RL are being better understood in recent years, tho mostly in parallel. In , we show an interesting connection that the good-old “concentrability” in offline RL implies online learnability!
1
5
43
@nanjiang_cs
Nan Jiang
5 years
My talk at MSR is online now! on our findings and open problems in figuring out minimal assumptions that enable theoretical guarantees for RL. talk was meant to offer a minimalist view of RL accessible to learning theoreticians or even TCS audience. (1/2)
2
8
42
@nanjiang_cs
Nan Jiang
11 months
I've been telling this to many ppl recently: I can't believe I missed this technical point for so long... What's the right notion of coverage in linear MDP? Poll below! A thread that discusses the nuances, connections to OOD/mean matching, and subtle (open?) questions... 1/
Tweet media one
3
5
40
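Without reproducing the poll, two coverage notions that appear in the linear-MDP offline literature (listed as background, not as the thread's answer): with feature map φ, data distribution μ, target occupancy d^π, and Σ_ν = E_ν[φφ^T],

\[ C_{\mathrm{ratio}} = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)} \qquad\text{vs.}\qquad C_{\mathrm{rel}} = \sup_{w \ne 0} \frac{w^{\top} \Sigma_{d^{\pi}} w}{w^{\top} \Sigma_{\mu} w}. \]

The relative condition number on the right can be finite even when the raw density ratio is unbounded, which is presumably part of the nuance the thread discusses.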
@nanjiang_cs
Nan Jiang
3 years
let me follow up with one on offline RL...
Tweet media one
@neu_rips
Gergely Neu
3 years
let's see if twitter is ready for RL theory memes
Tweet media one
4
2
123
4
0
40
@nanjiang_cs
Nan Jiang
1 month
me: I _really_ need to start writing this offline RL theory survey I agreed to. Also me: * get into rabbit hole with authors in ICML AC batch * play with OPE code * tweak visualization until satisfaction 🫠
1
0
40
@nanjiang_cs
Nan Jiang
3 months
what's 1x1 convolutions???
11
0
40
@nanjiang_cs
Nan Jiang
2 years
Tmr @icmlconf 2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation by PD lemma, we turn initial-state pessimistic term from our prior work into *relative* pess and smoothly bridge IL & offline RL, with robust improvement guarantees.
Tweet media one
1
3
40
@nanjiang_cs
Nan Jiang
3 years
I am co-organizing an ICERM virtual workshop on theory and algos for Deep RL on Aug 2-4, with Sanjay Shakkottai, R Srikant, and Mengdi Wang. You can check out the line-up of speakers & the tentative schedule and register for the event at:
2
6
39
@nanjiang_cs
Nan Jiang
4 years
I am surprised by how many people showed up at the poster session for the DR-PG paper () and that we had an hour-long in-depth discussion! (esp. given that I forgot to tweet about it...😂) thanks everyone, this was an amazing night!
0
1
35
@nanjiang_cs
Nan Jiang
1 year
Prospective student interested in RL4edu said he's scared of meeting w/ me in 2 ways: b4, he thought I'd kick him out when he mentions the word "applied", & after, he's scared of my enthusiasm. Oh am I SUPER HYPED when it's RL for *real* x instead of RL for a simulator of x. (1/x)
3
0
37
@nanjiang_cs
Nan Jiang
4 years
wonderful talks in the morning! what I *particularly* liked is that these talks not only tell you how well their methods worked, but also *when they will fail*, both by theoretical reasoning and simple and intuitive examples, which I feel is missing in many deep RL papers
@marcgbellemare
Marc G. Bellemare
4 years
The deep RL workshop continues today on the topic of Exploration. Fantastic line up of speakers, @IanOsband @chelseabfinn @wwdabney Alekh Agarwal, discussion chair @joelbot3000 See you there! Schedule: @LihongLi20
0
5
52
0
5
37
@nanjiang_cs
Nan Jiang
23 days
@deliprao you sure there is long-term reputation? I thought internet (and research community) has no memory of subpar things (famous) people did
2
0
35
@nanjiang_cs
Nan Jiang
5 months
Boarding flight to free company t-shirts… I mean NeurIPS. Happy to chat! I mean seriously, don't take all the t-shirts and leave some for me 🫠 @jasondeanlee
2
0
34
@nanjiang_cs
Nan Jiang
1 month
wrote this down more formally so that I can get it off my mind... If you find the original tweets lack context/background but find the topic interesting, the note might be helpful
@nanjiang_cs
Nan Jiang
2 months
At CISS hearing nice talks on model-based RL. MBRL has a reputation for bad "error compounding", but I realized recently that its theoretical root may be different from what ppl think... The problem may not be error accumulation over *time*, but the one-step error itself! 1/
1
3
28
1
2
34
@nanjiang_cs
Nan Jiang
3 years
I will talk at the offline RL workshop in 5min on off-policy cross-validation and evaluation. Please come and ask questions if you are interested.
1
2
34
@nanjiang_cs
Nan Jiang
1 year
frost on door handle. -20C outside
Tweet media one
1
0
33
@nanjiang_cs
Nan Jiang
5 years
went to grab a lunch box at visit day, and the volunteer looked at me and was like "hey, a grad student that hasn't signed up shouldn't steal the food here". glad that a staff member beside her recognized me as faculty... this has happened to me a couple of times already 😜🤣
2
0
31
@nanjiang_cs
Nan Jiang
6 months
@tengyangx and I already have a paper w Q* in the title, so no hurry writing one and uploading to arxiv 😂
Tweet media one
0
2
30
@nanjiang_cs
Nan Jiang
3 months
Bonus (& shameless plug): want to know how to get the most variance reduction out of the baseline-type control variate? see here:
Tweet media one
@QuanquanGu
Quanquan Gu
3 months
It's intriguing to observe the use of REINFORCE in RLHF. REINFORCE is a classical algorithm utilized to estimate policy gradients for episodic Markov Decision Processes (MDPs). Another notable method is GPOMDP. While both are effective estimators, it's worth noting that neither
Tweet media one
3
20
92
0
2
31
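On "the most variance reduction out of the baseline-type control variate": the classical constant-baseline answer (the linked note may prove something more general) is that for the REINFORCE estimator

\[ \hat g = \nabla_\theta \log \pi_\theta(\tau)\,\big(R(\tau) - b\big), \]

any fixed b keeps the estimator unbiased since E[∇_θ log π_θ(τ)] = 0, and the b minimizing the trace of the covariance is the score-weighted average return

\[ b^{*} = \frac{\mathbb{E}\big[\|\nabla_\theta \log \pi_\theta(\tau)\|^2\, R(\tau)\big]}{\mathbb{E}\big[\|\nabla_\theta \log \pi_\theta(\tau)\|^2\big]}, \]

rather than the plain average return E[R(τ)] commonly used in practice.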
@nanjiang_cs
Nan Jiang
5 months
Concentration ineq twitter(?): in the setting of linear reg (X in R^d, Y in R, Σ=E[XX^T], ||X|| and |Y| bounded), I want to bound the estimation errors of the plug-in estimators for 1. Σ^{½} 2. Σ^{-½} E[XY] w/o paying σ_min(Σ) or alike. Pointers plz (ideally ready to use...)!
2
2
31
@nanjiang_cs
Nan Jiang
4 years
as #icml2020 starts, I eventually got time to... catch up on the real-life RL conf I missed! among the amazing talks, I highly recommend the one by @prasadNiranjani . check out how various principled RL methods are adapted and integrated in a medical scenario!
0
1
30
@nanjiang_cs
Nan Jiang
1 year
@ylecun By RL do you mean 1. Current algorithms in RL 2. Current problem paradigms in RL research 3. RL as a problem formulation? I’d think world models that you advocate for are captured in 3
1
0
30
@nanjiang_cs
Nan Jiang
5 months
seeing other tweets about Lean recently on how difficult it is to formalize proofs in Lean, and was wondering if LLMs can help… and see this..! How long until we have software that could just take one of my papers and turn its proofs into Lean? 🤔
@AnimaAnandkumar
Prof. Anima Anandkumar
5 months
Launching Lean Co-pilot for LLM-human collaboration to write formal mathematical proofs that are 100% accurate. We use LLMs to suggest proof tactics in Lean and also allow humans to intervene and modify in a seamless manner. Automating theorem proving
21
327
2K
2
0
29
@nanjiang_cs
Nan Jiang
2 years
every semester I teach the RL thry course, crazy restructuring ideas always come to mind. like getting rid of tabular learning section (that’s just a special case of Tf in F for all f… right??) or neutral DP algs (don’t optimistic/pess algs basically cover all use cases…?)
3
2
29
@nanjiang_cs
Nan Jiang
2 months
when can I post my #icml rebuttal??? it’s ready 🙂
1
0
29
@nanjiang_cs
Nan Jiang
4 years
just received this today! T-shirt for 50th anni of dept of automation in Tsinghua. The only thing is that the design of the T-shirt is quite... simplistic...
Tweet media one
1
0
27
@nanjiang_cs
Nan Jiang
2 months
At CISS hearing nice talks on model-based RL. MBRL has a reputation for bad "error compounding", but I realized recently that its theoretical root may be different from what ppl think... The problem may not be error accumulation over *time*, but the one-step error itself! 1/
1
3
28
@nanjiang_cs
Nan Jiang
2 years
yes yes, dimensional analysis. In RL, think of every reward / value function as carrying a $ sign. make sure to cancel them out cleanly in your sample complexity expressions etc
@bremen79
Francesco Orabona
2 years
You cannot take the logarithm of the Lipschitz constant of a function! A 🧵 about a super common mistake in ML papers 1/10
19
107
804
1
2
28
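A worked example of the "$ sign" heuristic, under the illustrative assumption that rewards are measured in dollars and bounded by V_max: then V, Q, and any value-accuracy target ε also carry dollars, so a dimensionally sound sample-complexity bound has the shape

\[ n \;\gtrsim\; \frac{V_{\max}^2}{\varepsilon^2}\, \log\frac{|\mathcal{F}|}{\delta}, \]

where V_max/ε is a dimensionless ratio, whereas a stray log V_max is a log of a dollar-valued quantity and signals a mistake, the same issue as the quoted thread's log of a Lipschitz constant.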
@nanjiang_cs
Nan Jiang
1 year
Envy all of you at NeurIPS! While I'm not there*, my students will present their 1st-author works Wed/Thu. Please stop by their posters if interested! I will tweet about each paper when it gets close to the session.
Tweet media one
1
0
27
@nanjiang_cs
Nan Jiang
6 months
The talk (which I put a lot of effort into preparing) is up
@nanjiang_cs
Nan Jiang
6 months
En route to Ann Arbor (homecoming!) to give a talk at the CSP seminar at EECS Umich tomorrow!
0
1
44
2
5
27
@nanjiang_cs
Nan Jiang
5 years
In some cases we probably need to ask whether individual states are physically meaningful at all. This totally shook the basic understanding of RL I'd had since I was a grad student. If u had similar confusions, read the paper and let me know what u think (like, at ICML)! (6/end)
1
0
26
@nanjiang_cs
Nan Jiang
3 years
speaking of which, can schools make the “top X%” q’s optional? I never look at them at recruiting and it is such a pain when I have to submit a letter to ~20 places. I always tell students not to worry about # apps, but hey grad programs plz make it easier 4 all of us.
@thegautamkamath
Gautam Kamath
3 years
This is the time of year that I apologize to all fellow faculty about Waterloo's absolutely atrocious recommendation letter submission system. I hate it too.
3
0
24
0
1
27
@nanjiang_cs
Nan Jiang
2 years
Life story. If anyone is curious why I had to read 30%-50% of the papers in my batch as AC, here is the thread for you
@shortstein
Thomas Steinke
2 years
Perspective: You are the AC and need to write a meta review. You don’t have time to read the paper, so you rely on the reviews. But all of the reviews are this template. What do you do?
7
9
67
1
1
26
@nanjiang_cs
Nan Jiang
5 months
I find myself often referring to tweets like this, and it can be hard (even for me) to find them, so I decided to set up a page: Blogposts are probably much better for dissemination (some of my tweets are not very readable...), but I'm too lazy 🫠
@nanjiang_cs
Nan Jiang
5 months
Robust MDP folks: (1) How common is the computational step of finding the worst-case transition against a given policy? (2) Have you seen algs that run natural policy gradient (i.e., state-wise mirror descent) on the Q-fn from the worst-case transition? (1/3)
2
1
11
2
5
26
@nanjiang_cs
Nan Jiang
4 years
Does anyone realize that #neurips initial reviews are due next friday?
1
0
25
@nanjiang_cs
Nan Jiang
3 years
thanks for the reminder. just found the rej email in my spam :)
@guyvdb
Guy Van den Broeck
3 years
One day I dream of having a Google research grant accepted. 5th rejection in a row.
10
3
240
1
0
25
@nanjiang_cs
Nan Jiang
3 years
I had the impression that we would be able to upload new figures during the #NeurIPS rebuttal (probably b/c other confs using openreview allow pdf updates?). the char limit is generous and there will even be rolling discussions, so why limit the response format to text only?
4
1
26
@nanjiang_cs
Nan Jiang
4 years
yup, conf is weird when u attend for the 1st time, and becomes fun as u make friends. my own story: phil thomas and I met at icml-15 as student volunteers and we talked endlessly at the reg desk, as back then it was quite difficult to find an RL person to talk to :) (1/2)
@edchi
Ed H. Chi
4 years
Many years ago, my late PhD advisor John Riedl said: "Not to worry about not knowing anybody at conferences, because if you keep going, they will all become your friends." Great advice. Gratefully true, as some confs are now inviting me to give talks to my friends. #PhDChat
2
20
227
1
0
26
@nanjiang_cs
Nan Jiang
2 months
Still #MBRL : @KaiqingZhang 's #ciss talk touched on MuZero loss. I happened to have discussed its issues w/ someone else recently: 1. Wrong model can get lower loss than true model in stoch env. 2. Even in dtmn env, dist shift can be exponentially bad! Detail & proof in 🧵 1/
1
1
26
@nanjiang_cs
Nan Jiang
10 days
I'm not at @iclr_conf , but Phil will present our spotlight poster in a few hours. Come see online RL using density ratios---which are **not even well defined**😱 b4 exploratory data is collected, plus very cool **black-box** online-to-offline reduction!
Tweet media one
1
1
27
@nanjiang_cs
Nan Jiang
4 years
Previously we split minimax OPE into 2 styles (value-learning, in addition to existing weight/ratio-learning), and now it's time to merge them back---a surprising byproduct when we try to quantify bias and relax realizability of these methods: (1/3)
Tweet media one
Tweet media two
2
6
25
@nanjiang_cs
Nan Jiang
4 years
every time I type Geoff Gordon's averagers nowadays, I can't help but check multiple times that I am not typing avengers... am I the only one...?
1
1
24
@nanjiang_cs
Nan Jiang
5 years
most interesting paper to me today: seems a very nice middle-ground between importance sampling (exp variance) and model-based (bias amplified by horizon). In their method, you need func approx similar to model-based, but impact of bias is much milder
0
5
25
@nanjiang_cs
Nan Jiang
2 years
@swetaagrawal20 sometimes you find related content in the appendix. Also, a legitimate reason may be that ppl typically spend less time rigorously verifying that what failed *really* didn't work, so these experiences may not stand up to the same level of scrutiny as the main (positive) results
2
1
24
@nanjiang_cs
Nan Jiang
4 years
caught the tail of @svlevine 's talk and caught up w/ @tengyuma 's and @EmmaBrunskill 's videos. great to see a focused discussion on pessimism for off-policy RL! looking forward to the afternoon sessions starting w/ @ofirnachum
@LihongLi20
Lihong Li
4 years
Next week, @marcgbellemare and I are organizing a Deep RL workshop as part of Simons Institute's Theoretical RL program, with a great lineup of speakers. All talks will be recorded, and can be viewed live on YouTube channel. See for more details!
2
33
188
1
1
23
@nanjiang_cs
Nan Jiang
4 years
the really neat part of this work is the "traj simulator" that shows that off-policy Monte-Carlo (which we usually associate w/, say, imp sampling) can be very sample-efficient, at least in the tabular setting. (1/3)
@ShamKakade6
Sham Kakade
4 years
2/ This was the COLT 2018 open problem from @nanjiang_cs and Alekh, who conjectured a poly(H) lower bound. New work refutes this, showing only logarithmic in H episodes are needed to learn. So, in a minimax sense, long horizons are not more difficult than short ones!
1
0
19
4
2
22
@nanjiang_cs
Nan Jiang
22 days
now try (Q*-TQ*)^2
@kchonyc
Kyunghyun Cho
22 days
once @ylecun told me (heavily paraphrased), it's not F=ma but \min (F-ma)^2. i didn't realize its importance, but it is perhaps the most enlightening perspective i've ever heard.
44
41
608
0
0
23
@nanjiang_cs
Nan Jiang
11 days
In case anyone who read it is still wondering: see the simple lemma below (h/t Akshay Krishnamurthy). z = c log p/p* for different c (+-1, +- 1/2 etc) gives different useful results.
Tweet media one
@nanjiang_cs
Nan Jiang
12 days
and apparently log(p/p*) can only be way better? Hence the question in the quoted thread. My attempt so far: p/p* is non-neg & E[p/p*] = 1, so by Markov ineq: Pr[p/p* >= t] <= 1/t. So Pr[log(p/p*) >= t] <= exp(-t). ok, sub-exponential instead of sub-gaussian 🫠 thoughts? 5/end
1
0
1
2
1
22
@nanjiang_cs
Nan Jiang
2 years
Just arrived! First in-person event since forever & likely my only travel in the summer. looking fwd to tmr!
@BeyondrlT
BeyondRL TTIC
2 years
** Workshop, TTIC, July 13-15th: Online decision-making and real-world applications ** -) Why is it challenging to deploy online decision-making alg. in real-world problems?🤨 -) Which models describe these challenges?🤔 -) What is the path towards making RL be practical?😲
3
4
26
1
0
22
@nanjiang_cs
Nan Jiang
1 year
It’s RL problem with a specific kind of structure: rand init state + deterministic transitions. You can also view it as SL with a huge label space (and thus reduction to RL makes sense). And guess what, this exact setting is called structured prediction
@bodonoghue85
Brendan O'Donoghue
1 year
If we view LLMs in an RL way, then outputs are just rollouts from the policy. The 'prompt' is just the initial state, it encodes the starting condition of the agent and, implicitly, a goal. Prompt engineering == finding a good initial state for the agent to achieve your goal.
8
19
203
0
0
22
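Spelling out the structure described above (a direct formalization of "rand init state + deterministic transitions"): the state is the prompt plus the tokens generated so far, the action is the next token, so

\[ s_0 \sim \text{prompt distribution}, \qquad a_t = y_{t+1}, \qquad s_{t+1} = (s_t, a_t) \ \text{(deterministic append)}, \]

with reward typically only at the end of the rollout; a random initial state plus known deterministic transitions is exactly the structured-prediction / learning-to-search setting.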
@nanjiang_cs
Nan Jiang
4 years
The connection between symmetry and conservation laws discovered by Emmy Noether is so beautiful and profound, and was completely eye-opening to me when I thought I had enough exposure (as an outsider) to modern physics.
@Daily_Epsilon
Your Daily Epsilon of Math
4 years
HAPPY BIRTHDAY EMMY NOETHER! Perhaps the greatest woman of mathematics of all time, colleague of Einstein and profoundly influential scientist who basically invented abstract algebra, Noether's legacy is massive. Learn about her below!
1
91
178
2
3
22
@nanjiang_cs
Nan Jiang
2 years
just gave a virtual talk in this (in-person) workshop at RLDM. pity that I couldn't go, and thx organizers for being accommodating! looking fwd to the panel shortly (the talk is on a 4-yr-old paper w/ 3 conf rej & 1 journal desk rej😜 )
@dabelcs
David Abel
2 years
We have an *amazing* line up of speakers: Sara Aronowitz, Brian Christian ( @brianchristian ), Maria Eckstein ( @eckstein_maria ), and Nan Jiang ( @nanjiang_cs ) Co-organized with the dream team: @aharutyu & @Mark_Ho_ Hope to see you in Providence 😀
0
1
9
3
0
21
@nanjiang_cs
Nan Jiang
3 years
speaking of the noisy TV, I have always had a problem: isn't *non-noisy* TV more problematic? after all, pure noise is unpredictable; in contrast, things like TV shows are somewhat predictable yet highly complex, which will distract learning algs into spending resources predicting them
@nanjiang_cs
Nan Jiang
3 years
guess what I was teaching today...
Tweet media one
7
0
18
3
0
21
@nanjiang_cs
Nan Jiang
2 years
One big problem I still dunno how to address: I do thry. The panel wants a 5-yr research plan w/ ambitious goals, and also wants concrete & credible technical solutions. That seems an unresolvable conflict for (at least some types of) theory research. 2/
1
2
20
@nanjiang_cs
Nan Jiang
2 years
A blog post that explains our recent offline RL works (Bellman-consistent pessimism + ATAC) by @chinganc_rl and @tengyangx
@MSFTResearch
Microsoft Research
2 years
Online RL agents learn by trial-and-error—not an option for tasks like training self-driving cars. Learn how Microsoft researchers used game theory to design offline RL algorithms that can learn good policies with state-of-the-art empirical performance:
2
19
154
0
1
20
@nanjiang_cs
Nan Jiang
2 months
Out of the ctx of the og post, this is more true for TD than for PG. Next time you see someone try to simplify RL w/ supervised-learning intuition and say TD/Q-learning minimizes Bellman error, ask them how they obtain the stop-grad on the next-state term from that reasoning 🫠
Tweet media one
@y0b1byte
yobibyte
2 months
Next time you ask a question why RL researchers are so cynical
Tweet media one
5
3
88
0
4
20
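Making the stop-grad point concrete for linear TD(0) (the standard textbook comparison, not taken from the tweet's image): with V_w(s) = w^T φ(s) and TD error δ = r + γ V_w(s') - V_w(s),

\[ \text{TD(0):}\ \ w \leftarrow w + \alpha\,\delta\,\phi(s), \qquad \text{gradient descent on } \tfrac12\delta^2:\ \ w \leftarrow w + \alpha\,\delta\,\big(\phi(s) - \gamma\,\phi(s')\big). \]

TD drops the γφ(s') term, i.e., it treats the target r + γV_w(s') as a constant (stop-gradient), so "TD = gradient descent on Bellman error" does not follow from supervised-learning reasoning; and the full-gradient (residual) version has its own double-sampling bias in stochastic environments.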
@nanjiang_cs
Nan Jiang
5 years
@neu_rips the 1001st way to derive PG (originally by Jie Tang & @pabbeel here ). turns out you can also derive its entire var reduction family this way... and a new estimator that subsumes most previous ones pops up in this process!
Tweet media one
1
6
20
@nanjiang_cs
Nan Jiang
2 years
gave a tutorial on rl thry virtually @ NUS yesterday & enjoyed the interaction w/ the audience. Also got my fav questions: once FQI/E are intro'd, ppl start asking about convergence for SGD under convexity etc. TD: hold my *divergence* under inf data and 1-d realizable linear features 😎
1
0
20
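The divergence alluded to is presumably the classic "w, 2w" construction (in the spirit of Tsitsiklis and Van Roy; a reconstruction, not necessarily the tutorial's exact example): two states with features φ(s_1) = 1, φ(s_2) = 2, all rewards zero (so the true value is 0 and w* = 0 is realizable), and TD(0) updates performed, off-policy, on the transition s_1 → s_2. The expected update is

\[ \Delta w = \alpha\, \phi(s_1)\,\big(0 + \gamma\,\phi(s_2)\, w - \phi(s_1)\, w\big) = \alpha\,(2\gamma - 1)\, w, \]

so for γ > 1/2 every update multiplies w by 1 + α(2γ - 1) > 1 and |w| diverges from any w ≠ 0, even with infinite data and a single realizable linear feature.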
@nanjiang_cs
Nan Jiang
6 months
I keep hearing eg "digital twins", but maybe (a large frac of) the real world is just not simulatable? if we want decisions to matter, perhaps we just have to deal w/ the "messiness" of reality like any other "traditional principle"
@EugeneVinitsky
Eugene Vinitsky @ ICRA
6 months
RL struggles as a field because it does not yet have one common, simulated, easily reproduced benchmark where progress would immediately improve an exciting application
35
5
266
2
0
20
@nanjiang_cs
Nan Jiang
2 years
funny, I opened up the simons RL reunion page, saw Scott Aaronson's face, and was like "what happened...?"
Tweet media one
2
0
20
@nanjiang_cs
Nan Jiang
10 months
Two posters today at 11a: #119 : Audrey will present our work on learning occupancies for RL (thread below). #330 : Phil will present how we pin down (nearly) the exact form of the blowup factors for approx errors in OPE. Stop by and say hi :)
Tweet media one
@nanjiang_cs
Nan Jiang
1 year
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Jt w/ students Audrey Huang and Jinglin Chen 1/
Tweet media one
Tweet media two
4
29
137
0
4
19