Mystery AI Hype Theater 3000
Episode 5: Sam Bankman-Fried's Future Fund, November 9 2022
Emily and Alex discuss Sam Bankman-Fried's Future Fund, its essay contest, and the problems with using AI for prediction and resource allocation--mere days before the collapse of FTX. Also, we introduce our "What in the Fresh AI Hell?" segment!
This episode was recorded on November 9, 2022.
Watch the video of this episode on PeerTube.
References:
Bill Howe - Applied AI in High-Expertise Settings or Curation as Programming
Samir Passi and Solon Barocas - "Problem Formulation and Fairness"
Vinodkumar Prabhakaran, William Isaac, Donald Martin Jr. - Participatory Problem Formulation for Fairer Machine Learning
David Ribes, Andrew S. Hoffman, Steven C. Slota and Geoffrey C. Bowker - The Logic of Domains
Shoshana Zuboff - The Age of Surveillance Capitalism
Lee Vinsel on Criti-hype - Notes on Criticism and Technology Hype
You can check out future livestreams at https://twitch.tv/DAIR_Institute.
Subscribe to our newsletter via Buttondown.
Follow us!
Emily
- Twitter: https://twitter.com/EmilyMBender
- Mastodon: https://dair-community.social/@EmilyMBender
- Bluesky: https://bsky.app/profile/emilymbender.bsky.social
Alex
- Twitter: https://twitter.com/alexhanna
- Mastodon: https://dair-community.social/@alex
- Bluesky: https://bsky.app/profile/alexhanna.bsky.social
Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Christie Taylor.
ALEX HANNA: Welcome everyone!...to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype! We find the worst of it and pop it with the sharpest needles we can find.
EMILY M. BENDER: Along the way, we learn to always read the footnotes. And each time we think we’ve reached peak AI hype -- the summit of bullshit mountain -- we discover there’s worse to come.
I’m Emily M. Bender, a professor of linguistics at the University of Washington.
ALEX HANNA: And I’m Alex Hanna, director of research for the Distributed AI Research Institute.
This is episode 5, which we first recorded on November 9th of 2022. And we're talking about the Future Fund, which was a grant-making arm of Sam Bankman-Fried's now-notorious cryptocurrency exchange, FTX.
EMILY M. BENDER: And if you note the date, we recorded this conversation *before* FTX collapsed. Because there was plenty to criticize even before it all went to crap.
Plus, we kick off a new segment we like to call, “What in the Fresh AI Hell?!” with a lightning round of assorted piping hot hype.
EMILY M. BENDER: Good morning, Alex! How are you?
ALEX HANNA: Good morning, Emily. I'm doing well. How are you doing?
EMILY M. BENDER: I'm doing pretty good. I've been up for three and a half hours and I spent one of them running so–
ALEX HANNA: That's exciting.
EMILY M. BENDER: Life is good.
ALEX HANNA: I'm glad that you're getting that. I can't really work out in the morning because it makes me tired for the rest of the day.
EMILY M. BENDER: Everybody's different. I am so much better for the rest of the day if I work out in the morning, right?
ALEX HANNA: That's awesome. Well, welcome everyone to Mystery Science-- Mystery AI Hype Theater 3000. If you caught us last time, we were talking with a great panel of experts about AI art, but now we are really going to get into it. We're going to be dealing with some really bad AI hype this week.
EMILY M. BENDER: Right, but keep in mind the level of discourse is going to go sinking right back down to our usual, because we don't have those experts with us.
ALEX HANNA: Exactly, right? We're just going to be doing this, and you know, I'm here for it. This is great. All right, wonderful. So let's get into it. What's the first thing on tap today? These are all things-- this one you sent to me a few weeks ago. You want to give a little background on it?
EMILY M. BENDER: Uh, sure. Should I share the screen and let people know what it is, let people look at it while I'm talking? Okay, I can't quite talk and share at the same time, so.
ALEX HANNA: Let's do it.
EMILY M. BENDER: All right, do you see what I want you to see? Okay, yes. So this is something called the Future Fund, which I gather is associated with the longtermist, effective altruist side of things, and they put out this contest a few weeks ago. And it's basically, you know, a "change my mind" thing.
ALEX HANNA: Yeah.
EMILY M. BENDER: Right. And I was just so riled up on reading it, I'm like, Alex, we should do this one.
ALEX HANNA: Right.
EMILY M. BENDER: For the next Mystery AI Hype Theater 3000.
ALEX HANNA: Totally.
EMILY M. BENDER: But then we had better things to do, um, talking to experts about AI art, so it sort of sat for a bit.
ALEX HANNA: Right.
EMILY M. BENDER: Um, and then I haven't actually looked at it in the meantime, so I'm coming into this pretty fresh. What I remember from looking at it was, first of all, it's just so deep in its own assumptions about AI and how to think about AI, so that'll be fun to take apart. But also, this Future Fund is money, right? Money
ALEX HANNA: Totally.
EMILY M. BENDER: meant to be used for research, and they're setting this thing up as a prize. So they're basically asking a bunch of people to do some work and then maybe paying some of them for it after the fact, which just makes me mad.
ALEX HANNA: Right, right. Please do a bunch of work, and then, yeah--
EMILY M. BENDER: Yeah.
ALEX HANNA: In any case, let's get into what this is, because I was a bit befuddled about what was happening in this actual prize until I started reading it. The first few paragraphs try to describe it, and what they're doing is they want people to write essays on the future of AI, and they're going to give out this kind of wild range of money for it. And so-- I'm seeing our screen is kind of clipping. No, this is my fault on the broadcaster, on my OBS end. I'm going to adjust, so it's going to cut off our faces a little bit, but it's fine.
EMILY M. BENDER: I'll sit over here. Does this help?
ALEX HANNA: The other way, the other way! Oh no, here, I'm just going to center us, actually. I'm gonna do this-- this works better. Squeezing our screen in. All right, great. Okay, now we're totally in all the tech stuff. This is it.
EMILY M. BENDER: Great
ALEX HANNA: So what it is: they're offering this prize for writing on what is considered the future of AI, and already from the premises you kind of know what the politics are, given that first they expose their assumptions: "We think that AI is the development most likely to dramatically alter the trajectory of humanity in this century, and it is consequently one of our top funding priorities--"
Um, and then they have as one of the premises, they say: "We think it's really possible that: one, all of this AI stuff is a misguided sideshow; two, we should be even more focused on AI; or three, a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem."
EMILY M. BENDER: All right, should we put something up on PredictIt about which of these they're actually going to award money for? Because my money's on this one: they're gonna like some essay that says, no no no, you should be even more focused on AI.
ALEX HANNA: Right, right. And in the way this thing is basically written out, the focus is much more on this second one, and not really on "all this AI stuff is a misguided sideshow."
You know, I think you're already sort of showing your cards pretty early on, right? And a lot of this is written in a way-- even the format-- where the way they're adjudicating this is by giving a lot of it to what are called superpredictors-- or superforecasters. These are the judges, and my sense is they're building off the work of Phil Tetlock here, who is a social psychologist, I believe. These are people who are considered to have some kind of track record of being right some of the time, and they're the sort of authority, the judges.
So to me this sort of reads like: we're gonna give this to a bunch of Nate Silvers, who may have been right a few times, then had a spectacular failure in forecasting the 2016 US presidential election, and have become incredibly terrible contrarians on Twitter since.
EMILY M. BENDER: Yeah, yeah. So the superforecaster thing is put forward as, here's a property that people have: they're good at forecasting "the future," as if that is one specialty. And not, yeah, there are some people who have expertise in some science or technology or social phenomenon, and so they know a little bit better how to look at what's happening right now and make some educated guesses about what comes next, in that domain where they have some expertise. As opposed to, these people are just so good at it, they're the Nate Silvers who know how to do it. And I'm sorry, I'm getting distracted by Kate Silver, who is a far better...
ALEX HANNA: I know. Kate Silver is my roller derby name and, unfortunately, the name of the stream, and it's been haunting me since, but I'm too bought into the brand-- I need to stick with it.
EMILY M. BENDER: Yeah no I get it we can we can we can reclaim the *ate Silver space with something better, right?
ALEX HANNA: Right.
EMILY M. BENDER: Um, or you can, and I will hopefully help you. But so there's this, like, Nate Silver version of it, or there's the, we found some people who we think actually have crystal balls and know how to operate them. Like, you know, it just-- ah, um.
ALEX HANNA: Yeah, it's already infuriating, the sort of judgment here. And I love-- Em is in the chat. We were actually on Masto last night making jokes about Nates and various metals, and they're in the chat saying-- I think someone said "Nate Cobalt-60." Anyways, there's a whole thread. It was fun. Anyways, we digress from the rest of this.
So they wanted to go into this superforecaster thing, and then, I mean, let's go down, because they're also saying, we're sort of testing this idea of philanthropy based on prizes, which is problematic in its own right, and talking about the ways in which you need some kind of discrete, impact-guided philanthropy.
That is... You know, philanthropy is a messy-ass field, and I'm working in the nonprofit space right now, which is new for me. It is also the case that you're not going to have that quick a turnaround, especially for people working on really important kinds of things. If you're working in the space of digital rights, the sort of change that you want to see is going to take three, five, ten years, and it's really bizarre as a mode of fundraising. Can you go ahead.
EMILY M. BENDER: Right, well, and so the thing about prizes here is that you can only attract the work of people who can do the work without getting paid first.
ALEX HANNA: Yeah.
EMILY M. BENDER: Um, and so, you know, given the current world we live in, they are just setting it up so that they are not going to listen to the people who probably have the most information about what's going on, about the actual things threatening humanity, right? Because this is all about "what is the biggest threat to humanity."
ALEX HANNA: Right, you know, 40 years from now.
EMILY M. BENDER: The people who, like I was saying before, are experiencing the things that would let them speak to that very well do not have the time to do unpaid work in the hope of maybe getting a prize of, you know, somewhere between $15,000 and $1.5 million, right?
ALEX HANNA: Yeah, it's already gonna be incredibly biased. All right, shall we get into the actual premises? Because there's one part-- okay, let's talk about this, Emily, because it's a doozy.
EMILY M. BENDER: "We think AI is the development most likely to dramatically alter the trajectory of humanity in this century" all right climate change?
ALEX HANNA: Yeah right.
EMILY M. BENDER: Okay, I'd put climate change on top, and it's all connected. But, you know, what about the failures of various democracies? What about increasing inequality? What about global pandemics? Like, I can probably do ten things off the top of my head, but okay.
ALEX HANNA: Yeah.
EMILY M. BENDER: Um "It's already posing serious challenges transparency interpretability algorithmic bias and robustness to name just a few" uh yeah there's really easy ways to address those challenges and basically not deploying it. Like it's not something that's just sort of out there happening on its own.
Um people are building this and people can decide how much we want to use it.
Um but "Before too long advanced AI could automate the process of scientific and technological discovery leading to economic growth rates well over 10 percent per year".
ALEX HANNA: Yes, there's a lot in that, and I think the next paragraph is one I looked at quite askance: "With the help of advanced AI, we can make enormous progress towards ending global poverty, animal suffering, early death and debilitating disease."
Um, okay, first off, the premise of basically thinking about these-- "ending global poverty," which, okay, I want to take that; "animal suffering," which is a particular-- I'm curious what the premises of that are; "early death and debilitating disease--"
Okay, so now you have the health space, but I really want to focus on this "ending global poverty" thing, because why in the world do we need an AI for that? Ending global poverty seems to be a problem of will-- political will-- as well as being exacerbated by massive shifts in climate, you know, reduction of food supply and growth, and I--
EMILY M. BENDER: Concentrations of funding and power, which, yeah, the stuff that's called AI these days is definitely exacerbating.
ALEX HANNA: Right. And so I can't imagine any kind of world in which advanced AI is going to facilitate the ending of global poverty. I'm curious what they mean by animal-- what happened?
EMILY M. BENDER: Did I move something in a bad way?
ALEX HANNA: Oh yeah, you moved something, and then everything went out of frame.
EMILY M. BENDER: Sorry, that's weird. I'm trying to set it up so that I can also see the chat, so.
ALEX HANNA: Oh, okay, let me move just a little bit so that I can kind of see the chat, and then-- well, okay, one more thing, sorry, gonna make that more narrow. Okay.
Yeah, that works for me. And I'm checking the chat, you know. If anybody wants to be a producer, I would love that. And in the chat, Pete Forward says this is the wildest Mad Lib. Yeah, they're just putting in words here.
EMILY M. BENDER: And I love the Star Trek Cameo too.
ALEX HANNA: Oh yeah.
EMILY M. BENDER: Save the whales, so the plot of Star Trek IV doesn't happen.
ALEX HANNA: I had a student that came in with a Star Trek 4 shirt once, and I was like, where did you get that shirt? I need a whale shirt. So good.
EMILY M. BENDER: All right, so we don't see a plausible path towards ending global poverty. Like, even if we posit that AI is a thing, right, that that mathy math leads to what these guys are talking about as AI, how does that help us end global poverty? Um, okay: early death and debilitating disease.
Like, yes, there are applications of pattern matching at scale to facilitate, you know, going through large piles of biomedical scientific literature to find the next hypothesis to test. Okay. And the protein folding thing is a reasonable application of pattern matching over a very large data set that was carefully curated and constructed, and that is probably going to be helpful to the people doing science that requires that information, right.
ALEX HANNA: And not to be too down-- I don't know enough about protein folding, so this is somewhere I'm going to quickly get out of my depth, but it's probably one of the cases in which AI is actually very helpful, or drug discovery. So I will give them that, but this is dramatically overstating the claims alongside these other problems.
EMILY M. BENDER: Right, right. Um, and so there was-- I had an interesting discussion with my colleague Bill Howe at UW, where he was talking about how protein folding is a really interesting case, because the problem is well scoped, and--
ALEX HANNA: Yeah.
EMILY M. BENDER: --there's a large, carefully constructed data set, yeah, and all of the work that went into that data set is a really important part of how we got to where we are now.
ALEX HANNA: Yeah, and I think that's an important thing to highlight, because-- and I was having this conversation with Raesetje Sefala, who is one of our fellows-- a lot of what works with AI is where the task is pretty narrowly defined and the problem is well-defined. And I think a lot of the mismatch that happens in these spaces is that people take something that they don't know a lot about, they kind of wrap a task around it, and this conceptualization of a task then becomes a stand-in for a much more complicated problem. And so this is what Samir Passi and Solon Barocas have called the problem of problem formulation: the idea that problem formulation doesn't get squared up right. And there's interesting work done by our former colleagues at Google, including Donald Martin Jr., Vinodkumar Prabhakaran, William Isaac, and a few others, who have tried to step back and think about problem formulation and reframe it in different ways.
So they had a workshop at Data for Black Lives a few years ago, in which they ran a community-based problem formulation workshop. So I think those are interesting alternative approaches. But so much work goes into even framing the problem into one that is interpretable by AI-- framing that mangles the problem in a certain way, misses so much, and ignores so much prior research that's been done in these areas.
EMILY M. BENDER: Yeah, and I get the feeling that this "we are making general-purpose systems that can do things like solve early death and debilitating disease" makes it harder for the people who actually are well positioned to come up with a good problem formulation-- where pattern matching at scale is applicable and helpful, and they can construct a relevant data set. It makes it harder for that work to get done, because it looks less sexy when it's held up against these people making these wild claims, and somehow the wild claims are being treated as reasonable parts of the discourse.
ALEX HANNA: Yeah yeah.
EMILY M. BENDER: Which is not okay with me. I'm also reminded of-- and I was tweeting about this, and tooting about it, recently-- I got invited to be on a panel about task design in NLP, and one of the sample questions in the invitation was something along the lines of, is domain expertise, like linguistics, necessary for task formulation?
ALEX HANNA: Oh dear.
EMILY M. BENDER: And I was like, uh, no thank you, I do not wish to be invited onto a panel in order to debate whether or not I should be on the panel. Like, you know. And there are lots of interesting things that you could talk about next to that question if you presuppose yes, this is necessary. Then it could be things like, 'Okay, what are effective ways of bringing in that domain expertise?'
ALEX HANNA: Yeah.
EMILY M. BENDER: Right: how do you structure the interaction? How do you, um, you know-- and with what you were talking about, about community-designed problems, right, that expertise isn't necessarily, like, book-learning expertise. It could also be, you know, boots-on-the-ground, this-is-my-lived-experience expertise: I see the problems before me and in my community.
ALEX HANNA: Yeah. And I mean, I didn't even come across this idea of domain expertise until I started focusing on computer science and AI, because elsewhere people are like, oh yeah, we expect you to actually know what you're talking about in this field. And it really is this kind of blunt move of AI engineering to weigh in on a particular problem that many people have focused on for years and years and years.
EMILY M. BENDER: And as you say that, it strikes me that that phrase, "domain expertise," in contrast to whatever it is that computer scientists know, kind of suggests that computer science has the view from nowhere, and everybody else is mired in their specific domain. So.
ALEX HANNA: Exactly. Which-- actually, there's a great paper here, and it's called "The Logic of Domains." I was going to say by Jenna Burrell-- oh, sorry, it's not Jenna Burrell. It's by David Ribes, who's a science and technology studies scholar, Andrew Hoffman, and Geoffrey Bowker, and it talks about this idea of domains, and how there is this idea that, you know, computer science sort of sits outside of domains themselves.
I'll drop that in the chat, and we can put it in the show notes when this is up.
EMILY M. BENDER: Yeah, um, excellent.
ALEX HANNA: Yeah, so it's a really great article, and I think it really touches on that. Let's get into, like, the meat of this. I really--
EMILY M. BENDER: We have some aversion to it.
ALEX HANNA: Yeah, let's get into it, because I want to get into it. I want to get to that terrible table that they have.
EMILY M. BENDER: Okay, so, um, let's see, where are we? "Our world could soon look radically different." We were saying, okay, with the help of AI these things could happen, "but two formidable new problems for humanity could also arise." I can't read this without doing silly voices.
ALEX HANNA: Do it do it do a silly voice yeah.
EMILY M. BENDER: "Loss of control to AI systems" My silly voice is like you know trying to sound like I'm taking this seriously. "Advanced AI systems might acquire undesirable
objectives and pursue power in unintended ways causing humans to lose all or most of their influence over the future" This is the paperclip nonsense right and yeah like you don't have to cede control to AI systems. Like we don't have to build the thing!
ALEX HANNA: But they could take power but what if they do? What if it was intentioned? Okay.
EMILY M. BENDER: Okay, so this next one: "Concentration of power. Actors with an edge in advanced AI technology could acquire massive power and influence. If they misuse this technology, they could inflict lasting damage on humanity's long-term future." Um, you know, I object to calling it "advanced AI technology," but otherwise this sounds like it's happening.
ALEX HANNA: Yeah, I was about to say that this is happening, and it's not necessarily even about advanced AI technology-- and, yeah, P. Forward said it too, jinx! Um, so this is actually already happening. It's not necessarily about the AI technology that they're wielding; it's the data and compute that they're wielding, the way that it's suffused through so many different existing technologies. And the fact that, I think, there's a way in which AI rhetoric then gets leveraged-- which is kind of the idea behind this stream series, pushing back against the ways that rhetoric gets used and wielded to express a sort of technical acumen, a We Know Better Than You, et cetera.
EMILY M. BENDER: Yeah. So, "for more on these problems"-- we decided that we don't have to subject ourselves to Bostrom, right?
ALEX HANNA: I don't-- no, I really don't want to. There are other things I want to read. You know, there's a reading list on my kitchen table that's this high, and I really don't have the spoons to hate-read Bostrom, yeah.
EMILY M. BENDER: No, me either. But just looking at this, thinking about point two, what I would recommend reading is Shoshana Zuboff's The Age of Surveillance Capitalism, which-- uh, oh, I can't remember his name, the person who coined "criti-hype"-- Vinsel? Lee Vinsel?
ALEX HANNA: Lee Vinsel the STS scholar?
EMILY M. BENDER: Yeah. He points out that she buys in a little bit too much to what the AI people say they're doing, but she also documents what they're trying to do, and that's damning enough. And I recommend it-- people have complained about this book as being a little bit redundant, but I listened to it as an audiobook while running, and a little bit redundant is great, because if you missed something, it comes back again.
ALEX HANNA: Right.
EMILY M. BENDER: So if you want to hear about, you know, sort of how people building stuff they're calling AI are using that in this process of concentration of power, that's a great book, in my opinion.
Timnit is live-tooting us on Mastodon.
ALEX HANNA: I know, yes. Um, and so then the next part of this is where it really gets pretty bad-- I mean, it's already pretty bad-- but the idea here is that now they put probabilities on each of these things happening.
So, probability one-- and it's worth going to the footnotes on this-- is the probability of misalignment existential risk, conditional on AGI being developed by 2070: "Humanity will go extinct or drastically curtail its future potential due to a loss of control of AGI." First off, now that I read this again: the conditional statement, just from a statistical perspective, is conditional on AGI being developed, and then there's an additional causal claim of humanity's extinction due to a loss of control of AGI.
So I'm like, okay, because one thing that I would think we should say is: there's a probability of humanity's extinction conditional on AGI, and what is the delta between that and the probability of human extinction without that condition? Which I think is still rather high, given climate change, right?
EMILY M. BENDER: Right.
ALEX HANNA: And so it's really kind of weird that they're introducing this probabilistic notation and then also introducing this other joint probability. Because it should technically be: misalignment existential risk conditional on AGI, comma-- it's a joint probability with the loss of control of that AGI. Anyways.
EMILY M. BENDER: Yeah, yeah. So I guess "misalignment x-risk" is short for that-- it stands for this whole last part here, right? And so they somehow believe they can reason about the probability of these things.
ALEX HANNA: Yeah.
EMILY M. BENDER: Right. So, their current position: there's this conditional probability of existential-- so, x-risk isn't a-- you can't have a probability of a risk; that doesn't make any sense. But the probability of this outcome that they label existential risk, given AGI happening, they've set at 15%. And they don't mean 15% likelihood just from right now; they mean assuming this other thing happens, which by 2070-- so I think they say there's a 20% chance that there will be AGI by 2043, and then within that 20% chance there's a 15% chance of-- Like, these numbers, where do they come from?
ALEX HANNA: In the footnotes, I mean, in the footnotes they say-- okay, footnote one says: "We pose many of these beliefs in terms of subjective probabilities, which represent the betting odds that we consider fair, in the sense that we'd be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them." Okay?
EMILY M. BENDER: So it's their gut feeling, is what that is.
ALEX HANNA: It's their gut feeling, right. And so they call these "subjective probabilities." And they have in the Q&A a line that says something of the nature of, where are you pulling these numbers from, basically: "Are these statistically significant probabilities grounded in detailed, published models that are confirmed by strong empirical regularities that you're really confident in?" Which, you know, if you have numbers, it probably would be good if they actually had empirics behind them.
This is, I think, the last or second-to-last Q&A item. They say, "No, they are what we consider fair betting odds." So what you're saying is, this really is your gut feeling that you're putting some numbers to, and then, you know, basically trying to establish some kind of priors. So it's incredibly shaky, empirically.
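For anyone following the numbers here, a minimal sketch of the probability structure under discussion, using only the figures quoted on the stream. Chaining the two headline numbers together is our illustration, not something the Future Fund itself does (their 15% is conditioned on AGI by 2070, while their 20% is for AGI by 2043), and the odds-to-probability conversion is just the standard convention their footnote invokes.

```latex
% Figures quoted from the Future Fund page, as read on stream:
%   P(AGI by 2043) = 0.20
%   P(x-risk | AGI) = 0.15   % "misalignment x-risk," conditional on AGI by 2070
%
% Alex's point: the footnoted quantity is really a joint event inside the
% conditional, i.e. P(extinction, loss of control of AGI | AGI),
% not a simple conditional.
%
% Loosely chaining the two headline numbers (our gloss only; the time
% horizons do not match) gives an implied unconditional estimate:
P(\text{x-risk}) \approx P(\text{x-risk} \mid \text{AGI}) \cdot P(\text{AGI}) = 0.15 \times 0.20 = 0.03
%
% And the footnote's "fair betting odds" convert to a probability in the
% standard way:
p = \frac{a}{a+b} \qquad \text{for fair odds of } a : b \text{ in favor.}
```

Notably, 3% is also the lower threshold the contest sets: an essay wins the top prize only if it moves the judges' estimate below 3% or above 75%.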
EMILY M. BENDER: Okay, here's one question. I'm gonna be really crude here.
ALEX HANNA: Yeah, do it.
EMILY M. BENDER: How is it that people are pulling these numbers out of their asses while their heads are shoved up their asses at the same time?
ALEX HANNA: Hey, talking about elevating the discourse-- we really went from down here to up in the stratosphere. And what's infuriating about this is-- you could play some numerical games with this if you wanted to, but there's millions of dollars on the line here. Do you really need-- there are so many other things: climate change, racial disparities in incarceration and policing, global hunger. And you are literally putting millions of dollars on these gut feelings. It would be funny if it wasn't such an unjust thing to do, in disservice of having more empirics on problems that we know about, that you're granting so much money on things that are so subjective and so pulled out of the ass.
EMILY M. BENDER: Yeah. And granting so much money that could have been used in better ways, and at the same time trying to redirect further effort of people who could be spending their time doing more valuable things. These are the "change my mind" bros, and it's obnoxious.
ALEX HANNA: Yeah, okay. It's pretty frustrating.
EMILY M. BENDER: All right, so we're at the table. Is there anything else we need to say about this table?
ALEX HANNA: I don't think so. It really is just pulling numbers out of the ass. And the kind of thing that is wild about this is that they are going to award money to people who publish analysis that moves these probabilities. And I'm just thinking about-- it's kind of fascinating, like, okay, how are the probabilities going to be moved, right? And what they say is you can also participate in the contest basically by publishing these things to a blog or arXiv, which is hilarious. Anyways.
EMILY M. BENDER: Yeah, right. So, okay, so--
ALEX HANNA: All right, so--
EMILY M. BENDER: That's five million dollars if you-- and they're interested, of course, in either direction, right? We saw that at the top. So, if you move them below 3% or above 75%-- and this is the one that starts at 15%. And then "AGI will be developed..." to below 3% or above 75%, and that one starts at 20%. "With award prizes of intermediate size for intermediate updates, at our discretion." Like, the whole thing's at their discretion.
ALEX HANNA: Yeah, this is effectively at the discretion of these super-things, the superpredictors. I'm kind of floored by this. I'm looking at the rest of this and saying, this is all very frustrating. I kind of want to move to the other things-- we had two other things we wanted to talk about, and I wanted to spend the last 20 or 30 minutes on those. Because it is frustrating, but I don't think we're going to move beyond adding more of the same. So maybe a final word, and then moving on to the new segment that we're gonna have.
EMILY M. BENDER: Yeah, right. So yes, I guess you're right, we could just keep saying different iterations of the same thing over and over again. But I just want to point out that the superforecaster judges are not actually the only judges, right? "As a check slash balance on our reasonableness as judges," they're going to have this panel do something independent. So it really is, "Change my mind."
ALEX HANNA: Yeah, it is effectively for the people who are at this fund. So, an experiment in prize-making philanthropy, but not very different from existing philanthropy in having a series of judges-- except for these superforecasters, who are problematic in their own right.
EMILY M. BENDER: All right um so on to the new segment?
ALEX HANNA: On to the new segment.
EMILY M. BENDER: Do we have music?
ALEX HANNA: So we have a new segment. Do we have music? I haven't set up any kind of music in my stream or anything. It'd be cool if you could connect something. But our new segment is called-- and maybe when we put this on YouTube we'll have a banner or something: What in the Fresh AI Hell?!
So maybe we could come up with a theme song: "What in the Fresh AI Hell, doot-doot-doot-doo-doo." I don't know, maybe we have some musicians among the-- yeah, if you're a musician and you want to come on the stream, but yeah.
EMILY M. BENDER: Yeah, all right. So I've got a couple. Do you have a couple too?
ALEX HANNA: I don't have a couple. Let's just go on your things, the ones you dropped in our group chat, yeah.
EMILY M. BENDER: Yeah, so I've got two that are cued up here; the other ones I'd have to look a little bit harder for, but I could. Um, okay, so the freshest one-- this is one from just yesterday.
ALEX HANNA: This piping-fresh hell.
EMILY M. BENDER: Yeah.
ALEX HANNA: Is this from yesterday? Because I saw the original post, and the paper was posted in September.
EMILY M. BENDER: Oh wow.
ALEX HANNA: Maybe there's an update, yeah, I don't know.
EMILY M. BENDER: I came across it yesterday, so it's fresh to me.
ALEX HANNA: It's fresh to you, okay.
EMILY M. BENDER: Yeah, I don't know, maybe it was Abeba who retweeted it, but yes, okay. September.
Okay. "In a new paper we ask whether you can use GPT-3 to survey humans by simulating those humans and asking them questions as opposed to interviewing actual humans?"
ALEX HANNA: Oh my God. Okay, and this is really-- as a social scientist, I am infuriated by this. It is such a-- woo, dog. Like, this is kind of amazing. I'm reading the abstract now, and then--
EMILY M. BENDER: Wait, wait, hold on, I've got to pull up the abstract. I've got to get to the rest of this.
ALEX HANNA: And first off, let's also call out that Drake is trash. Thank you. All right.
EMILY M. BENDER: Yeah, but so Drake is trash, and the captions are backwards on this meme, like--
ALEX HANNA: Wait, no, they're not backwards, because they're saying they're not surveying humans.
EMILY M. BENDER: I know.
ALEX HANNA: They want to survey.
EMILY M. BENDER: Yeah, but like, if the meme were sensible, it'd be the other way.
ALEX HANNA: Well, then, yeah, if it was-- yes, yes, okay.
EMILY M. BENDER: So, November 8th, someone asks why, and the tweeter replies: "Good social science is extraordinarily expensive, and if we could safely do it on simulated subjects, we could increase our sample size, save money for other studies, iterate more quickly, survey unreachable populations, etc. I would suggest reading section 8 of the paper." Which maybe we should, but, like--
ALEX HANNA: Maybe we should, because I'm already saying, uh, like: who are your unreachable populations? Why are they unreachable? Have you-- oh--
EMILY M. BENDER: Why do you think that GPT-3 would--?
ALEX HANNA: Is actually going to model those unreachable samples? I want to read part of this abstract real quick, in which they say: "We create 'silicon samples' by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States."
So we're already just in the United States, you're already talking about people that are hard to reach through sampling, and then you're also positing some kind of knowledge space where you know everything about individuals-- not reflecting on things like survey bias or response bias or selection bias, the things that survey researchers have been doing research on for decades.
EMILY M. BENDER: So there's that layer, and there's the fact that the underlying data set for GPT-3 is completely undocumented.
ALEX HANNA: Yes.
EMILY M. BENDER: Um, so we don't know what that represents. So they're proposing to use GPT-3 as a model of something serious. Maybe they've done a little bit of, like, poorly done control on the backstories that they're feeding in, but the underlying thing is still just, you know, general web garbage, which, as we document in Stochastic Parrots, is in no way representative--
ALEX HANNA: Right.
EMILY M. BENDER: --um, and then on top of that: you could maybe talk about using a language model to-- or using statistics over language form to gather information about what people have said about something, right? Because yes, it is built up out of what people have said. I wouldn't want to use GPT-3 for that, because I have no control over the sampling that was done there. But you could, with a curated data set: you know, if you were interested in, for example, what people using a certain hashtag on Twitter are saying about a certain topic, you can use a language model in that context.
But they're proposing using it to determine what people would say.
ALEX HANNA: Yeah.
EMILY M. BENDER: Which is, like, you may as well use a Magic 8 Ball. Like, there's no grounding in reality there.
ALEX HANNA: And I do want to note that the lead authors on this, I'm sorry to say, are political scientists, and I'm very upset that they're political scientists, because I'm thinking, what kind of methodology class-- you know, who hurt you? To make you think about it this way.
Um, so it's quite upsetting to see this. And I guess it's worth going into it and poking at it a bit, because the examples that they're focusing on here-- and maybe if you're thinking about an incredibly limited sort of survey-- given that they're political scientists, the three questions that they're talking about are: first, free-form partisan text, so basically what GPT-3 says when you ask it to describe Democrats and Republicans, which, okay.
Um, the second one is vote choice, which of course is a perennial problem for political scientists-- vote prediction-- in which they focus on the ANES, the American National Election Studies, which is kind of the go-to for political scientists collecting panel data on vote choice. This is an incredibly narrow kind of outcome, though, focusing on what that vote choice is. Um, and then the last one is closed-ended questions, which I'm just looking at now, so I don't know what that necessarily entails.
Um, and so it is pretty frustrating to see this. First off, you're focused on a very narrow set of human behavior, which is vote behavior in the U.S. And if you're doing a survey and you want to do vote behavior-- there's already some way that political scientists model some of these behaviors, based on existing demographic variables. Forecasters such as Nate Cobalt-60 do this in their models as well, doing some kind of forecasting based on the demographics of particular individuals. But with GPT-3, you're effectively trying to go from demographics-- and we already know that demographics are pretty reductive in terms of predicting vote choice, and they're incredibly more reductive when you talk about other types of surveys-- and then trying to generate responses based on those demographics.
So, you know, this is already pretty bizarre. I would say that maybe, from their own intensely constrained framing of vote choice, this may be interesting, but it is still incredibly bizarre, yeah.
EMILY M. BENDER: I mean, when you are doing any kind of science and scholarship, you build a model with the full understanding that the model is not, you know, an exact map of the world, right, that there are simplifying assumptions. But if you're doing it carefully, you have some sense of what those simplifying assumptions are--
ALEX HANNA: Yeah.
EMILY M. BENDER: --and you, um, like, you know, you might in a survey ask people a bunch of questions, and you realize that you are nowhere near the full range of thoughts that person had. You only have this small sample from the questions you asked them.
ALEX HANNA: Yeah.
EMILY M. BENDER: Um, but that feels so different to me than: never mind actual people, we are going to come up with synthetic personas out of GPT-3 and ask those instead. It's like, like you said, who hurt you? What made you think that that might actually stand in some interesting relationship to what you want to study?
ALEX HANNA: And I mean, if you are posing this as a replacement for something like the ANES, that would be a dramatic category error in terms of methodology-- well, not category error, I'm sorry, I'm probably misusing that. It would be a dramatic departure from that type of methodology, especially given what we know of even trying to do vote prediction from 2016 to 2020.
Um, and that seems like a huge gap, especially if you think there are going to be major shifts in things like demographics, or major shifts in, say, the overtness of racial animus. You know, I'd imagine the shift coming even post-2020 is that there's going to be a much larger rise in people being overtly white supremacist.
And are people going to reflect that in the ANES? Who knows, maybe. Um, you know, that was sort of the thing coming out of the 2016 polling debacle: that people weren't going to express their preferences for Trump because they didn't want to be labeled racist or white supremacist, but people are more overtly doing so now. So if this is posing to be a replacement for the ANES, I really don't think that's going to be the case. And they say here at the end that the study cost $29 on GPT-3. All right, but you're not going to replace the ANES in 2024 with this by any means.
EMILY M. BENDER: Right, yeah. Is that enough for one item in this segment?
ALEX HANNA: That's enough, and yeah, let's go to this other thing.
EMILY M. BENDER: Okay, and I'm going to apologize to Timnit for this other thing, because I know this is directly in the kinds of things that make her rightfully and righteously angry, and I really appreciate what she has contributed.
Um, where's the thing? Okay, here. Arvind Narayanan claims-- so this is in response to a question about--
ALEX HANNA: Um, up a bit more. Eva Wolfangel, yeah, oh--
EMILY M. BENDER: Yeah, yeah. Um: "I need your support. I'm looking for examples where hashtag artificial intelligence"-- that sounds like a really fun hashtag to follow-- "is used for moderation, and how this can go wrong, or if it really helps, how it helps." Um, and Arvind comes in with-- and I should say, I appreciate the work that he and his co-author are doing in that AI Snake Oil book taking shape on Substack. I guess they're being very transparent, like putting out drafts and taking feedback, and they do seem to take the feedback, so that's good. So I was surprised to see him saying: "AI for content moderation generally used to assist humans rather than autonomously, which keeps the failures low. Overall it works pretty well--"
And I was like, that's not what I hear about this. And he says, "That's why we haven't heard more failure stories." I'm like, have you been listening?
ALEX HANNA: Yeah.
EMILY M. BENDER: Um, so what do you think, Alex?
ALEX HANNA: What do you think I think? I mean-- so I'm reading the rest of this. He continues: "a big area of failure is mistaken copyright strikes and account suspensions." Another example he gives is CSAM. "There's a lot of research in NLP on the fact that toxicity detection and other tools are biased against minority identities. I haven't seen evidence on how that manifests on real platforms."
Probably because they're really covering their asses. And then a few more-- he says: "The common reason you'll hear about why AI isn't and won't be good for moderation is they can't handle nuanced context and humor. This is definitely true today, but I'm in the minority in believing this is a mostly solvable technical problem and will get solved in, say, the next decade." Um, okay.
EMILY M. BENDER: Wildly optimistic.
ALEX HANNA: Yeah, very optimistic about it. "I think the end state of AI for content moderation is that it will be decent at implementing policy and handling the easy cases, but the hard problem of making policy and tackling the hard cases will remain, and AI has no role there and shouldn't."
Um, and he says this person and he are writing about this in their upcoming book on AI Snake Oil. And Tarleton Gillespie at Microsoft Research has this other paper on content moderation at scale, which is a good paper, and there's a shorter version of it that he published in Logic Magazine, on scale as well.
And so, I mean, there are a lot of assumptions being made here. One of them is this notion of what the "easy cases" are, and this is already a very Western-centric view of the kinds of things that get paid attention to: the idea that in the U.S. this is probably going to get better, and in Western Europe, but that this is going to do horribly in other places, on the African continent.
In places where value alignment is done in particular ways-- and he's in the minority, and should stay in the minority on this, insofar as these assumptions already go into the value propositions of policy, and there's a kind of refusal to do that. And this is something that people have been talking about on Twitter: the idea that you want to have no content moderation-- and that he-who-should-not-be-named, or as I posted on Twitter, the blood-emerald man-baby, has been speedrunning into content moderation policies-- is because you need to make some kind of a value judgment to even have a profitable platform.
And so you're going to have to plant your flag in the grass someplace, and you're going to have to make pretty clear political distinctions about what you're actually doing. And if you're trying to understand only a view of the world in which there are only conservatives and liberals, only Democrats and Republicans, that's going to look a bit easier than a place that has much more complicated political and ideological entrenchments in it, right? So you have to make a value judgment someplace.
And I want to call out Sarah Roberts, because, given that everybody's an expert on content moderation these days: Sarah Roberts was, you know, the first person to write pretty extensively about content moderation on platforms, and how it gets outsourced to people, typically in Global South countries.
And what she said in one of her talks on this is that these policies are not clear, and they're not explicit; they're kind of in the minds of policy managers, and she calls these "100,000 lines in the sand." These things get drawn and redrawn, really from the kinds of decisions that happen every day at the quote-unquote street level, and they don't get externalized.
So yeah, we actually don't have a lot of policy out in the open, and there are reasons for that: gaming-- once you put these out and publish them, they can be gamed. But knowing that there are a hundred thousand different policies being made on any given day means that these are technically intractable problems, means that you actually need people to look at this, and it's not going to be about quote-unquote hard cases.
This is the bulk of cases, right?
EMILY M. BENDER: And I think people need tools. And you mentioned already this is very Western-centric: to the extent that it's working at all, it's working for, you know, English in a North American, Australian, New Zealand, and European context. And you talk about the workers in the Global South who are doing this highly traumatizing work-- they are doing it, I think, largely for these contexts where it's considered quote-unquote solved. Meanwhile, as Timnit points out in this thread, there aren't even the basic tools the people doing the content moderation would need for other languages, where you have people posting flat-out incitement to genocide, and if you flag it, there's not the expertise among the people who look at the flagged things to be able to say, yeah, that's it. But there aren't even, you know, part-of-speech taggers and basic keyword search and that kind of stuff. So there are ways in which natural language processing techniques can help build useful tools, things that can help if you've got something that's gotten too big and you've got to sift through the mess and find a bunch of likely things to get rid of, or likely accounts to ban. But for many, many languages, which are associated with communities where terrible things are happening, those tools don't exist. And so to say, oh yeah, this mostly works pretty well, is just to not listen to the people who are saying how it's not working.
ALEX HANNA: That's right, that's right. Yeah, you have a very Western bias on that, even in seeing whether the technological tools are there.
EMILY M. BENDER: Yeah, and I noticed-- so Leon, who's in the chat, has done some research on how, if people have to make content-specific and context-specific judgments but they don't share the context, they are not well set up to do it.
Um, yeah, so he points out that the moderators themselves are also diverse, and policies are both open-source and country-specific, but then also aren't implemented evenly, and so they reflect the social norms of individual moderators. It's really interesting to be looking at this in this moment, as we're trying out Mastodon.
ALEX HANNA: Yeah.
EMILY M. BENDER: Because the Mastodon notion of moderation is very different, right? Instead of trying to scale it, and do something where one big network has to evenly apply a policy everywhere, there's this notion of smaller federated servers where people maintain their own communities and make their own decisions, which feels much more human-scale to me. And I know that that is also a lot of work, but I also wonder if the work is maybe more evenly distributable, if people are working to tend their own communities-- and, I guess, defend their own communities. So, you know, thinking about the DAIR community server and what you all might be thinking about who you want to block, right? Like, which other things you have to defederate from. That feels like-- um, Ali Alkhatib was talking about this on Mastodon today. It's like, yeah, you're basically asking the people who are suffering to do the work to clean it up, but it's also empowering, maybe?
ALEX HANNA: Um, yeah. So I mean, it's this sort of idea where, if you can actually identify these problems and then have policies around them, and start developing a set of community rules, then that makes sense, right, as a way of doing it-- rather than these decisions being ones where you have to be a supplicant to the platform and ask for things. And I mean, this is sort of the idea of the scale of these things, and the idea that they are always going to be hard to scale-- this is a point that Tarleton makes in his piece, and in the Logic piece.
Um, and some great points in the chat that Timnit's making, about the kind of infiltration that happens at these corporations. There's a problem that we saw at Twitter, where there were Hindu nationalists in the organization, as well as people who were sympathetic to the Saudi regime, and then also knowing the context of these things in terms of scale as well. And Frigg in the chat says it's even scarier that the chief AI guy at the Faceplace is oblivious to these biases-- one of the things he said was, you know, you need a place that has unlimited space and is really good at content moderation? Come on over to Facebook. And, yeah, Timnit mentioned her keynote, which we can link in the chat, where she's talking about these things as well.
Um, I wanna-- I have a meeting right after this, so I selfishly want to end this a little early, but this discussion in the chat is really great, and we're gonna try to gather these up and put them in the show notes. So yeah, this is great, and if y'all like this segment, you know, say so-- the Fresh AI Hell-- and we'll come up with a theme song and a nice, like-- I'm probably gonna basically just take a picture of that "everything is fine" dog, and then--
EMILY M. BENDER: I was gonna say, yeah, that one--
ALEX HANNA: And put, like, you know, just have him say, "What in the fresh AI hell?" And then we'll have a theme song. So, like always-- oh, hold on, to do this right, let's unshare the screen, because I want to-- so, if you like this, please click like and subscribe.
If you want to catch more about, let's say, content moderation, click here for Maliha Ahmed's AIES keynote. If you want to look at what YouTube thinks you should watch, go ahead and click on Emily's face.
EMILY M. BENDER: That is definitely not my recommendation. Thank you so much, Alex.
ALEX HANNA: All right, thank you, Emily, this is great. All right, see y'all next time, hopefully in two weeks or so. Bye-bye now.
ALEX HANNA: That’s it for this week!
Our theme song is by Toby Menon. Graphic design by Naomi Pleasure-Park. Production by Christie Taylor. And thanks, as always, to the Distributed AI Research Institute. If you like this show, you can support us by rating and reviewing us on Apple Podcasts and Spotify. And by donating to DAIR at dair-institute.org. That’s D-A-I-R, hyphen, institute dot org.
EMILY M. BENDER: Find us and all our past episodes on PeerTube, and wherever you get your podcasts! You can watch and comment on the show while it’s happening LIVE on our Twitch stream: that’s Twitch dot TV slash DAIR underscore Institute…again that’s D-A-I-R underscore Institute.
I’m Emily M. Bender.
ALEX: And I’m Alex Hanna. Stay out of AI hell, y’all.