@Allison_Dupont
Engineers will use the concepts tested here all the time. Question 2 tests a formula which almost everyone forgets after taking calculus, but you can't take most college engineering classes without an understanding of the answers to questions 1, 3, 4, and 5.
@paulg
@ikirigin
That article contains a total of one specific example where the Hamas figures broadly matched reliable estimates by other sources.
The rest of it is just blandly supportive quotes from various NGOs. In the only other example cited (Al Ahli hospital bombing) we have the Hamas
Happy to share this paper, which was recently accepted to SIAM Journal on Control and Optimization.
Actor-critic methods are widely used in reinforcement learning but there is a significant gap between theory and practice...
1/3
You can play 20 questions with GPT-4 by asking it to give you the base64 encoding of the object you want at the beginning of the conversation: I've tried it and it works, see screenshot.
On the other hand, ChatGPT fails at this.
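To make the trick concrete: base64 works as a cheap commitment the model writes down up front and anyone can decode at the end. A minimal Python sketch of the idea (the object "giraffe" is just a made-up example):

```python
import base64

# The model "commits" to its object at the start of the game by
# printing the base64 encoding of its name:
secret = "giraffe"  # hypothetical object the model picked
commitment = base64.b64encode(secret.encode()).decode()
print(commitment)  # Z2lyYWZmZQ==

# After 20 questions, anyone can decode the commitment to check that
# the answer was fixed in advance rather than improvised:
print(base64.b64decode(commitment).decode())  # giraffe
```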
The fact ChatGPT can’t play 20 Questions reveals an important limitation vs. a human: it can’t keep secrets. It has nowhere to put a memory of an unspoken decision.
In effect, it’s like each token is chosen by a new person, guessing from prior context.
I disagree with this and want to explain why.
In the thread below,
@aryehazan
clarifies that his opinion that current LLMs don't understand is based on interactions with them -- and that he is not philosophically opposed to claiming LLMs understand something.
1/
I'll repeat for the (n+1)th time: current LLMs cannot be said to "understand" any topic, for any reasonable notion of "understand". That said, it's incredibly surprising and impressive (and downright amazing) what they *can* do. Turns out lots of "intelligent" tasks don't require
A recent paper on the distributed subgradient method (with exact gradient evaluations): I show that under a wide range of step-sizes, the distributed version has the linear speedup property, i.e., a network of n nodes is n times faster than a single node.
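For context, the method in question is the standard distributed subgradient update (notation mine, not necessarily the paper's): each node i averages with its neighbors and takes a subgradient step on its local objective f_i:

```latex
x_i(t+1) = \sum_{j=1}^{n} w_{ij}\, x_j(t) - \alpha(t)\, g_i(t),
\qquad g_i(t) \in \partial f_i\bigl(x_i(t)\bigr),
```

where W = [w_ij] is a doubly stochastic mixing matrix supported on the network and alpha(t) is the step-size.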
@michael_nielsen
I remember really enjoying this one as an undergraduate. Instead of developing Abel's theorem as a sequence of theorems and lemmas, it gives you a sequence of not-too-difficult exercises along with hints (and solutions in the back). In the process of solving all the exercises,
@RichardHanania
Arguably you already see a version of this dynamic in Lebanon. Opinion polls show that (i) overwhelming majorities of Lebanese have contempt for Israel and support the Oct 7 massacres, and (ii) a majority of Lebanese want to stay out of the current conflict between Israel and Hamas.
@DimitrisPapail
Along the same lines, a couple of years ago I found this document very helpful,
(as opposed to reading standard ML fare, which has the horrible tendency to describe things in words without just giving you all the equations).
My productivity hack for grinding out math: find a place to work with wifi that has bad wifi quality.
You need the wifi to look up results from the literature. But you also need bad wifi quality to prevent yourself from wasting time on the internet.
for a year when I was in school, I lived very close to a Somali coffee shop
Unlimited coffee for $1 and they were open till 2am
AND, the best part -- no wifi and poor cell reception
I studied more in that year than I have before or since
I wasn't fully satisfied with existing expositions of the policy gradient theorem -- I wanted a short proof I could present to undergrads, without mathematically dubious steps, and each step seeming well-motivated by what preceded it -- so I wrote this up:
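For reference, the statement being proved is the standard policy gradient theorem (this is the textbook form, not a quote from the write-up):

```latex
\nabla_\theta J(\theta)
= \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}
\bigl[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \bigr],
```

where d^{pi_theta} is the (discounted) state visitation distribution under pi_theta.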
I have noticed the same, and I think this is strong evidence against the analogy between LLM hallucinations and compression artifacts proposed recently by Ted Chiang (see ).
Chiang's starting point was that LLMs are effectively compressing a vast amount
One odd thing about ChatGPT: I may be the one hallucinating, but simply telling it to please not make up paper references seems to substantially improve performance
@wfithian
There's a general pattern prevalent on Twitter and other social networks where people love a pile-on:
(1) Someone gives advice which works in our world but not in an ideal world
(2) Said advice actually involves tradeoffs that wouldn't exist in an ideal world
(3) The
Excited to be giving a tutorial at Allerton in a couple of weeks:
I'll talk about some recently elaborated connections between reinforcement learning and gradient descent. If you'll be at Allerton this year, I hope to see you there.
But if I beg it to tell the truth and tell it that it would pain me to hear false information, the hallucinated information gets discarded and I get a 100% correct reply.
3/3
@lreyzin
If you had told me one month ago that soon there'd be something called a "B-word" that people are not spelling out in reviews, there's about a zero percent probability I would have guessed what it turned out to be.
Together with Julien Hendrickx, I'm teaching a week-long course on "Dynamics and Algorithms on Networks" at Université Paris-Saclay in June 2022. This is aimed at grad students who want an introduction to recent developments in the area. Registration is at
@darengb
@TheStalwart
I've tried that and it didn't work: it seems too hard for it to produce a valid hash on the examples I tried. On the other hand, asking it to write the number in base64 seems to work:
Several of us are organizing a special issue of TCNS on Social Networks. Please consider sending us your work: we welcome both papers with a methodological contribution and interdisciplinary papers containing experimental research.
@goodside
Doesn't it seem intuitive that the prompt could be improved by asking for the justification *first*, and only then the yes/no/unknown?
All my intuition from playing around with language models suggests that asking for the answer first and only then the reasoning will occasionally lead
I wrote a short post about using Metropolis weights in consensus -- a very simple trick for avoiding bad network scaling that seems to be underused in the multi-agent control and distributed optimization communities.
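The rule itself fits in a few lines. A minimal sketch (mine, not from the post): put weight 1/(1 + max(d_i, d_j)) on each edge and the leftover mass on the diagonal, which yields a symmetric, doubly stochastic matrix:

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weight matrix for an undirected graph.

    adj: symmetric 0/1 adjacency matrix without self-loops.
    Returns a symmetric, doubly stochastic matrix W.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)  # node degrees
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                # weight on edge (i, j): 1 / (1 + larger endpoint degree)
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # remaining mass stays at node i
    return W

# Example: path graph on 3 nodes
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(metropolis_weights(A))
```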
@kamilkazani
And yet at one time both Egypt and Jordan maintained that Israel has no right to exist and sought to end it through military force. A peaceful settlement only followed repeated military failures on the part of those nations, and others, to destroy Israel.
@y0b1byte
The definition in terms of inner products is already geometric:
Given a linear map A, one defines A* to be the unique map with the property that,
⟨Ax, y⟩ = ⟨x, A*y⟩ for all x and y.
@thesasho
Makes sense in a way: if you spend the vast majority of your day writing code or reading/commenting on other people's code, then making sense of code will be second nature and feel instantaneous to you in a way that parsing equations is not.
@Ike_Saul
Simon posted a convincing rebuttal to just one of the assertions you made ("Israel is unwilling..."). Israel is clearly willing, and has tried on several occasions to create a viable peace plan, only to be rejected. Your response here doesn't really defend what you originally
@RealDianeYap
As others in your replies have said, this story is likely false. There is now a video of the explosion in question and it does not appear to be consistent with an airstrike. Some stills are at
but you can also find lots of high-quality discussion on
I'm many months late to the party here, but image generation from text feels amazing. Here's the output after putting in "the ocean at dusk | unreal engine" into VQGAN + CLIP:
Agree with this.
When AlphaZero learns it needs to keep its King safe before it launches an attack, I couldn't care less if it "really" understands chess or merely simulates such understanding. Either way it kicks my ass.
I cannot tell you on what date deeply superhuman AGI systems will appear, but when they do, I can guarantee that a considerable fraction of the chattering classes will dismiss them as trickery or say “we don’t even have a good definition of intelligence.”
While I'm not an expert in this area, I thought this post was interesting and hope it stimulates a discussion.
One of the strengths of ML as a scientific field is the willingness of people to offer criticisms in public. In other areas I work in, people tend to keep their
New blog post: Yet Another ICML Award Fiasco
The story of the
@icmlconf
2023 Outstanding Paper Award to the D-Adaptation paper with worse results than the ones from 9 years ago
Please share it to start a needed conversation on mistakenly granted awards
Apropos of nothing, here is a horror story that happened 4-5 years ago when I was a reviewer on a COLT paper. Fortunately for me, this story happened to someone else.
The paper that I was a reviewer for was reasonable but not amazing. I thought it was neither a clear accept nor
@Osinttechnical
@oryxspioenkop
There is probably strong sampling bias in these numbers because Ukrainians are more likely to put out the videos on which this data is based (both to drum up morale and because of the Ukrainian civilians who like to make videos of damaged/abandoned Russian equipment).
@shortstein
Having worked in both, I felt much more satisfied with work in journal-driven fields: because there is no deadline, there's not an incentive to send out the paper before it is fully finished, completely to your satisfaction, with every last bit polished and revised as needed.
A writeup of some recent research from my group on finding a lockdown that minimizes job losses while holding down the reproduction number of an epidemic. Results turned out to be really counter-intuitive: the best lockdown was sometimes harshest in places with few infections.
New research by CISE Affiliated Faculty
@alexolshevsky1
(ECE) attempts to minimize job losses due to COVID-19 lockdowns. Olshevsky hopes to influence policymakers in future pandemic lockdowns. Learn more about the research here:
#COVID19
#OptimalLockdown
Very excited to share this paper
with Haoxing Tiang and
@YPaschalidis
which is scheduled to appear in ICLR 2023 in Kigali. Quick summary of the result below. (1/6)
Many professors are reporting GPT-4 is getting good grades on their exams.
Well, I tried giving it a midterm from my graduate RL class and it performed abominably.
Below, see two attempts by GPT-4 to argue that a symmetric, stochastic matrix is nonexpansive.
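For reference, one standard argument (mine, not GPT-4's): a symmetric matrix has an orthonormal eigenbasis with real eigenvalues, and stochasticity pins those eigenvalues inside [-1, 1] by Gershgorin, so the map cannot expand Euclidean norms:

```latex
% A stochastic: each Gershgorin disc is centered at a_{ii} \ge 0 with
% radius 1 - a_{ii}, so every eigenvalue satisfies |\lambda_i| \le 1.
% A symmetric: write x = \sum_i c_i v_i in an orthonormal eigenbasis. Then
\|Ax\|_2^2 = \sum_i \lambda_i^2 c_i^2 \le \sum_i c_i^2 = \|x\|_2^2 .
```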
@natfriedman
100% accurate summarization: GPT-4 can one-shot book summaries with almost 100% accuracy while GPT-3.5 gets confused whenever it's not obvious what information needs to be kept in and what should be left out.
Where this comes up: I'm using GPT-4 to generate 1-3 sentence book
GPT-4 seems to have improved a lot in the last month, especially on math problems.
It used to be that asking it for a proof that gradient descent converges under some standard assumptions produced nonsense. Now you more or less get the correct standard analysis for non-convex
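For reference, the "correct standard analysis" here is presumably the usual smooth non-convex rate (my rendering): for L-smooth f with step-size 1/L, the descent lemma gives per-step progress, and summing over T steps yields the O(1/T) rate on squared gradient norms:

```latex
f(x_{t+1}) \le f(x_t) - \tfrac{1}{2L}\,\|\nabla f(x_t)\|^2
\quad\Longrightarrow\quad
\min_{0 \le t < T} \|\nabla f(x_t)\|^2 \le \frac{2L\,\bigl(f(x_0) - f^\ast\bigr)}{T}.
```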
I respect institutions like Georgia Tech which have a policy against taking institutional positions on controversial issues. I believe all universities need to adopt this approach.
But...
This is a thousand percent correct. Personally:
-- The $20 a month I pay for access to GPT-4 (through ChatGPT+) just might be the best money I've ever spent.
-- There is a large gap between the abilities of ChatGPT and GPT-4. ChatGPT will occasionally act like a "stochastic
There's a weird divergence right now where people skeptical of LLMs of course don't pay $20 to access chatgpt4, think chatgpt3/bard are state of the art, feeding into their low opinion of LLMs
vs those who paid the $20 and hence have a completely different experience with LLMs
@EugeneVinitsky
One change I'd love to see: evaluate the meaningfulness of a citation using some NLP, with more meaningful citations counting more (a toy sketch follows the examples below). For example,
"Related works include [1]-[23]..."
should count differently from
"Our main result is an extension of a theorem from [7]."
The University of Toronto
@UofT
math department
@UofTMath
held an "Equity Forum" on October 31. Faculty attendance was conditional on attendees signing a petition denouncing Israel...
Zhong had a 3.97 unweighted & 4.42 weighted GPA, scored 1590 out of 1600 on the SATs, founded his own startup, but was rejected by 16 colleges. They include MIT, Carnegie Mellon, UC Berkeley, Cal Poly SLO. But Google called. Watch interview w/
@abc7kristensze
:
@florian_dorfler
@aanna_mit
I wish all the control journals would adopt the LCSS model: if you submit by a certain date and the first round of review comes back sufficiently positive, the reviews are then forwarded to a conference, and the conference will typically invite you to give a presentation.
I don't support cancel culture -- I don't support firing people for their controversial beliefs -- unless their "controversial beliefs" are that people of a certain ethnicity who may be living in a certain region need to die, in which case I absolutely support cancel culture.
What should we conclude from this? That the model doesn’t understand causality?
No -- the follow-up contradicts this -- rather, GPT-4 misreads
@yudapearl
question in exactly the same way a typical person on the internet would.
7/
@DegenRolf
Not quite in the same category, but
argues (as far as I've been able to make out) that governments are neglecting to investigate UFOs because aliens could undermine the notion of "anthropocentric sovereignty" on which modern state power is based.
Spent some more time this weekend playing around with one of the public VQGAN + CLIP notebooks. I had to use several prompts sequentially to generate this image, with the final one being "Storm on the Sea of Galilee by John Constable | Unreal Engine | Matte Painting"
@aryehazan
Sadly, the kind of papers that you are not crazy about will often sail through the review process: often reviewers will be intimidated by all the technicalities needed to make things work in the super-general setting.
@ESYudkowsky
@skdh
We really should be talking about "passing the (n, x)-Turing test" where:
-- x is a quantification of the expert level of the opponent (e.g., 0 for a random person who doesn't know anything about AI, 0.85 for someone who does research on LLMs, 1.0 for the principal inventors of the
@srchvrs
I've noticed a similar phenomenon with the standard Lion/Goat/Grass puzzle where you have to ferry all three across the river.
GPT-4 solves the puzzle perfectly, but if you rename the requirements (e.g., "can't leave Lion and Grass alone"), it will produce nonsense even though
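For contrast, the underlying search problem is trivial for a machine. A minimal brute-force sketch (naming and encoding mine), which also makes it easy to swap in renamed constraints:

```python
from collections import deque

ITEMS = ("lion", "goat", "grass")
UNSAFE = [{"lion", "goat"}, {"goat", "grass"}]  # pairs that can't be left alone

def safe(bank):
    # 'bank' holds the items left on a shore WITHOUT the farmer
    return not any(pair <= bank for pair in UNSAFE)

def solve():
    # State: (items on the left bank, farmer's side); everyone starts left.
    start, goal = (frozenset(ITEMS), "L"), (frozenset(), "R")
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (left, farmer), path = queue.popleft()
        if (left, farmer) == goal:
            return path
        here = left if farmer == "L" else frozenset(ITEMS) - left
        for cargo in [None, *here]:  # cross alone or with one item
            new_left = set(left)
            if cargo:
                (new_left.remove if farmer == "L" else new_left.add)(cargo)
            state = (frozenset(new_left), "R" if farmer == "L" else "L")
            behind = state[0] if state[1] == "R" else frozenset(ITEMS) - state[0]
            if safe(behind) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "alone"]))

print(solve())  # a 7-move plan, e.g. goat, alone, lion, goat, grass, alone, goat
```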
I have no respect for certain university presidents who have never had problems taking institutional stances but discovered the virtues of neutrality immediately after the largest single-day slaughter of Jews since the Holocaust.
Likewise, there are two reasons to get a question wrong: lack of understanding, and poor alignment toward giving correct answers.
If you get a bad answer, you never know which of the two is at fault. So you can't go from bad answers to lack of understanding
12/
For example, asking ChatGPT for control theorists at Boston University gives an answer that is 50% accurate.
Items 1 and 4 in the list below are not correct.
2/3
Wish this sort of thing was more common. The default norms in every scientific community I've been a part of favor puffing up papers to match page limits. Personally, I find "short and sweet" papers infinitely preferable.
All this runs counter to our intuition as professors: we examine students all the time and while students can emulate some of the things we do in class, the only way to get the right answers consistently is genuine understanding.
So we look for that consistency.
9/
We've mostly forgotten about this but 8 years ago every political webzine had their own scientists building election models, and most of these geniuses made predictions by assuming differences between polls and election results were independent across all 50 states.
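The mistake is quantitatively huge. A toy Monte Carlo (all numbers are made up for illustration): give every state a 2-point poll lead, then compare an independent-errors model against one with a shared national polling miss:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_sims, lead = 50, 100_000, 0.02  # 2-point lead in every state

# Independent model: each state's polling miss is its own draw.
indep = rng.normal(0, 0.03, (n_sims, n_states))
# Correlated model: a shared national miss plus smaller state-level noise.
corr = rng.normal(0, 0.025, (n_sims, 1)) + rng.normal(0, 0.015, (n_sims, n_states))

# Probability that at least 40 of the 50 "safe" states flip:
for name, miss in [("independent", indep), ("correlated", corr)]:
    print(name, ((miss > lead).sum(axis=1) >= 40).mean())
# Under independence a broad systematic miss is essentially impossible;
# the correlated model gives it a very real probability.
```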
I'm very frustrated reading a proof that says "the conclusion is immediate from equation (XX)", when in fact the conclusion is not obvious from the stated equation.
What's worse is that the author of the paper is me, 2 years ago
Very much hope this proof stands up to scrutiny: it would be the most spectacular refutation of the Hardy quote from A Mathematician's Apology:
"No mathematician should ever allow himself to forget that mathematics, more than any other art or science, is a young man's game. ...
100% this.
To those who didn't follow this controversy, it seems to have occurred because the editor in chief of a journal linked to a (satirical) Onion article entitled "Dying Gazans Criticized For Not Using Last Words To Condemn Hamas."
Completely ridiculous to suggest
I have seen the tweet and I don’t understand what the basis for this investigation is.
If the
@eLife
code of conduct can be construed as forbidding affiliated scientists from publicly and disagreeably expressing unpopular political opinions, then it should be revised.
This is true but there are notable methods which are fancy and work well. I was shocked by how many bells and whistles went into PPO (this ICLR blog post was traumatic reading: ) and yet PPO generalizes to new domains better than many competing methods.
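For context, the core PPO objective itself is only one line (the standard clipped-surrogate form); the trauma is all in the implementation details stacked on top of it:

```latex
L^{\mathrm{CLIP}}(\theta)
= \hat{\mathbb{E}}_t\!\left[
    \min\!\Bigl( r_t(\theta)\,\hat{A}_t,\;
    \mathrm{clip}\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t \Bigr)
  \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.
```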
To sum up: *if* you believe “understanding” is a term which can, in principle, be reasonably applied to an LLM, you should reason from its successes to conclude that, at least in some cases, it is capable of genuine understanding.
18/18
@aminkarbasi
@thegautamkamath
Doron Zeilberger, a champion of computer-aided mathematics, has put his PC as a co-author on over 30 papers:
I can't decide if this is really, really stupid or A+ trolling (though I guess the two are not mutually exclusive).
Best paper awards make sense in "static" areas where what is considered important changes slowly. I imagine no one in math would object to an award to a paper that made important progress towards a resolution of the Riemann Hypothesis.
But in fields where what is considered
It’s obvious to every thinking person in the ML community that we should kill the “best paper” award as an institution. It’s an impossible task. Our highest aspiration for it is “don’t embarrass ourselves”. Kill it. I wouldn’t even put it to a vote.
I want to share this letter put together by some of my colleagues which I signed in the aftermath of the Hamas massacre.
I am a little late on this, but it is as relevant today as it was two weeks ago.
The letter is open to signatures from members of BU faculty.
An open letter to the
@BU_Tweets
community on the massacres in Israel.
TL;DR: Historical context and nuance are important, but *nothing* serves as a justification for Hamas's acts against humanity.
Link to sign is within for the interested BU faculty.
Intriguing research by
@CollinBurns
suggests that you can sometimes tell whether the model is telling the truth by looking at its activations directly. In other words, the model can “know” it’s making stuff up:
13/
How can we figure out if what a language model says is true, even when human evaluators can’t easily tell?
We show () that we can identify whether text is true or false directly from a model’s *unlabeled activations*. 🧵
@lreyzin
I enjoy watching human players more than AIs. Humans have understandable goals and plans and make occasional blunders. Watching Stockfish is not fun and even Alpha/Leela Zero, which have more exciting playing styles, have plans that are typically beyond human comprehension.
@florian_dorfler
As others have said, the videos by
@brianbdouglas
are great. There is also this set of lectures from the 1980s,
which looks like a good resource. Finally, I've also recommended this set of lectures
to students in the past.
@SebastienBubeck
Every time I see people on Twitter talk about how Bard is much improved following some update, I give it the same test and it fails every time (see screenshot).
Interesting case where a surprising amount of anger was directed at the creator of a website summarizing books with statistics on writing style (frequency of adjectives, proportion of passive voice, etc).
Not sure if this is an answer to
@thegautamkamath
's challenge, but
Interesting to see how polarizing this was, from Twitter reactions:
- authors were strongly against the site, insisting that it be shut down and the data deleted
- CS/ML folks were astounded by such a gross overreaction
Anyone want to argue contrary to their group?
Wow -- this obtains a 4/5 on the AP Calculus BC test (whereas GPT-3 scored close to the zeroth percentile).
How long until this thing is better than me at proving theorems?
We’re releasing GPT-4 — a large multimodal model (image & text in, text out) which is a significant advance in both capability and alignment.
Still limited in many ways, but passes many qualification benchmarks like the bar exam & AP Calculus:
@aryehazan
....but to get to that real solution, you have imaginary terms that cancel from the formula (e.g., something like 2+3i + (2-3i) = 4); and there was no known way to get to the real solution using steps that didn't have square roots of negative numbers.
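The classic worked example (Bombelli's; adding it here for concreteness):

```latex
x^3 = 15x + 4 \ \text{has the real root } x = 4,
\text{ yet Cardano's formula gives}
\quad
x = \sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}}
  = (2 + i) + (2 - i) = 4,
```

using (2 ± i)^3 = 2 ± 11i; every intermediate step passes through complex numbers even though both the problem and the answer are real.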