Mystery AI Hype Theater 3000
Episode 21: The True Meaning of 'Open Source' (feat. Sarah West and Andreas Liesenfeld), November 20 2023
Researchers Sarah West and Andreas Liesenfeld join Alex and Emily to examine what software companies really mean when they say their work is 'open source,' and call for greater transparency.
This episode was recorded on November 20, 2023.
Dr. Sarah West is the managing director of the AI Now Institute. Her award-winning research and writing blends social science, policy, and historical methods to address the intersection of technology, labor, antitrust, and platform accountability. And she’s the author of the forthcoming book, "Tracing Code."
Dr. Andreas Liesenfeld is an assistant professor in both the Centre for Language Studies and the department of language and communication at Radboud University in the Netherlands. He's a co-author on research from this summer critically examining the true "open source" nature of models like LLaMA and ChatGPT, concluding that neither lives up to the label.
References:
Yann LeCun testifies on 'open source' work at Meta
Stanford Human-Centered AI's new transparency index
Opening up ChatGPT (Andreas Liesenfeld's work)
Fresh AI Hell:
The Verge: Meta disbands their Responsible AI team
Call-out of Stability and others' use of “fair use” in AI-generated art
A fawning profile of OpenAI's Ilya Sutskever
You can check out future livestreams at https://twitch.tv/DAIR_Institute.
Subscribe to our newsletter via Buttondown.
Follow us!
Emily
- Twitter: https://twitter.com/EmilyMBender
- Mastodon: https://dair-community.social/@EmilyMBender
- Bluesky: https://bsky.app/profile/emilymbender.bsky.social
Alex
- Twitter: https://twitter.com/@alexhanna
- Mastodon: https://dair-community.social/@alex
- Bluesky: https://bsky.app/profile/alexhanna.bsky.social
Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Christie Taylor.
ALEX HANNA: Welcome everyone to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype. We find the worst of it and pop it with the sharpest needles we can find.
EMILY M. BENDER: Along the way, we learn to always read the footnotes and each time we think we've reached peak AI hype, the summit of Bullshit Mountain, we discover there's worse to come. I'm Emily M. Bender, professor of linguistics at the University of Washington.
ALEX HANNA: And I'm Alex Hanna, director of research for the Distributed AI Research Institute. This is episode 21, which we're recording on November 20th, 2023.
And we're taking a deep dive into the world of open source AI. We've seen a lot of hot air and confusing rhetoric around what it means for an AI model to be "open." Claims about transparency and openness don't hold up under scrutiny.
EMILY M. BENDER: With us to help us see clearly through the haze of hype are two fabulous guests. Dr. Sarah West is the managing director of the AI Now Institute. Her award-winning research and writing blends social science, policy and historical methods to address the intersection of technology, labor, antitrust and platform accountability. And she's the author of the forthcoming book, "Tracing Code."
Hi Sarah.
SARAH WEST: Hi, thanks for having me.
EMILY M. BENDER: Thanks for being here.
ALEX HANNA: And Dr. Andreas Liesenfeld is an assistant professor in both the Center for Language Studies and the department of language and communication at Radboud University in the Netherlands. He's a co-author on research from this summer critically examining the true quote "open source" nature of models like LLaMA and ChatGPT, concluding based on rigorous application of detailed criteria that neither lives up to its name--to its label. Welcome Andreas.
ANDREAS LIESENFELD: Hi, great to be here.
EMILY M. BENDER: All right so let's dive right into our first artifact. Um this is from this past summer uh well yes technically still summer, September 19th 2023. "Testimony of Yann LeCun, Chief AI Scientist, Meta," in a hearing before the US Senate Select Committee on Intelligence. Which I I think that actually means like national intelligence, not artificial intelligence but you know. So he starts with a a very self-congratulatory biography um my--"An overview of my involvement in AI," which I think we can maybe skip for brevity um and then has a section called the current state of AI. Um and I'm just going to read a little bit here and then we can get into some of these appalling um sort of like 'get those words out of your mouth' type statements. Um so, "AI has progressed leaps and bounds since I began my research career in the 1980s. We've seen firsthand how making AI models available to researchers can reap enormous benefits. For example, AI is being used to translate hundreds of languages, reduce traffic collisions, detect tumors in X-rays and MRIs, speed up MRI exams by a factor of four, discover new drugs, design new materials, predict weather conditions, and help the visually impaired."
All right I have to stop and editorialize for a moment. Um some of those are probably true applications, where machine learning is being used in appropriate ways. I think it doesn't help to say all of that is AI. I think it just muddies the waters. But to continue: "Society's ability to develop AI tools to defend against adversarial, nefarious, or other harmful content derives in large part from our social values. Meta, by way of example has organized its responsible AI efforts around five key pillars reflecting these values."
Before we dive into the values I just want to echo Meg Mitchell, who says that when the the folks from the corporate world do use values language, like that's a win already. Um so there's that. Um but then: "First, we believe that protecting the privacy and security of individuals' data is the responsibility of everyone and there--we have therefore established a cross-product privacy review process to assess privacy risks."
This is Meta. This is Meta priv--bragging about privacy. Any thoughts there?
SARAH WEST: I mean--
ALEX HANNA: I mean Meta has to--sorry go ahead Sarah.
SARAH WEST: No I was just they're--they're still under a consent decree with the FTC and got a 5-billion-dollar fine for violating that consent decree. So there's there's that.
EMILY M. BENDER: Yeah. All right: "Second, we believe that our services should treat everyone fairly and have developed processes to detect and mitigate certain forms of statistical bias." I mean yes there's people at Meta working on that kind of stuff but that makes it sound like a very solved problem. Uh, "Third, we believe that AI systems should be robust and safe, which is why we have established an AI red team to test our systems against adversarial threats to ensure that they behave safely and as intended even when they are subjected to attack." Any thoughts on that one? [pause] Yeah? The red teaming discourse is interesting because um yes, like adversarial attacks should be studied and that makes sense but I think maybe outside of this context there seems to be a lot of um work being done by, 'oh we'll just have people come and red team it.' Like that's how we're going to test for safety. And I'm like you don't red team a bridge right? You make sure it's constructed safely before you drive over it. Um but this is maybe--
ANDREAS LIESENFELD: If you do red teaming it should probably be done independently, right? So it shouldn't be done in house so but there should be an an outside institution who does that.
EMILY M. BENDER: Mmm-hmm. And it should be specifically about these adversarial threats, not about like basic functionality. Okay here's where the rubber really hits the road for today: "Fourth, we are striving to be more transparent about when and how AI systems are making decisions that impact the people who use our products, to make those decisions more explainable and to form--to inform people about the controls they have over how those decisions are made." So it seems to me that this is that we are striving to be, they're making it sound like this is difficult, where it seems like you could just be really upfront. Any any thoughts about how transparent Meta is in general?
ALEX HANNA: I mean it's I mean this is part of right--I mean you're you're making the pitch to a US Senate committee right, and so Meta is--continually fails on this, on the consent decree that Sarah has talked about, on the kind of continual failure to mitigate bias in their uh advertising system, the kind of um um different things that have been surfaced by Color of Change and National--the National Fair Housing Association. And so I mean they're they're saying, 'okay we're going to try to do this thing in the AI sphere, although we haven't been able to do it in our other spheres.' Uh so you know it it kind of also pairs well with um Yann LeCun's other statements including that their AI tools work very well for content moderation, um especially saying something that it takes care of 90 percent of content moderation or something of that--um which ignores how poorly it does.
And and I'm assuming he is only focusing on English language content moderation whereas it performs really poorly in other spheres. But um yeah this seems par for the course for uh kind of Meta double speak.
SARAH WEST: I mean it's also like to your point Alex, like to what end is transparency serving here? Like does that necessarily mean that people can do anything about how AI is being used to make decisions? Like are you able to then be able to you know avoid being targeted with hate speech on Meta's platforms or are you able to take any steps that make sure that you're you know like appropriately getting--well I mean let's let's not even get into the fact that you're kind of stuck in a system where that's premised on targeted advertising, but also um that you're not losing out on job ad--job opportunities because their ad targeting is discriminatory.
Like there's been many there's there's many many layers here of you know I think ways in which the underlying system isn't really serving the you know the the people that are using the system, even if they were transparent about them, which I think is itself questionable.
EMILY M. BENDER: Yeah.
ANDREAS LIESENFELD: And speaking of transparency I mean, when we did a survey of open source large language models, right about half a year ago, um so we found for instance that one of their um quote unquote "open source" models LLaMA 2 actually ranked in our survey of openness ranked very low right. So it was almost at the bottom and so um we couldn't we couldn't find that piece of technology to be very transparent.
EMILY M. BENDER: But here he is, saying we're striving to be more transparent. I mean I guess there's a lot of room for them to be more transparent.
ALEX HANNA: Yeah. And I I like to think about this this is a bit of a prelude right to us talking about LLaMA and LLaMA 2 and you know this is the sort of thing that that is a rhetoric that's coming. Um I'm wondering if we should uh kind of move through the rest of this document, and in the next section I want to focus on a few things that um that he's saying. So the next section is called, "The Future of AI & The Importance of Open Sourcing," and he starts off by saying um, "The current generation of AI tools is different from anything we've had before, and it's important not to undervalue the far-reaching potential opportunities they present. However, like any new disruptive technology, advancements in AI are bound to make people uneasy."
"I can understand why." Uh and goes on to talk about the internet, the mobile device, the microprocessor um and bad actors who would be using this technology for their ends. Um he goes on to say, "As AI systems continue to develop, I'd like to highlight two defining issues. The first is safety. New technology brings new challenges and everyone has a part to play here. Companies should make sure tools are built and deployed responsibly." For sure. "The second is access. Having access to state--to state-of-the-art AI will be an increasing driver of opportunity in the future for individuals, for companies and for economies as a whole." And then we get into the open sourcing. "One way to start to address both of these issues is through the open sharing of current technologies. At Meta, we believe it is better if AI is developed openly rather than behind closed doors by a handful of companies." Very funny. And yet most of the AI is developed by a handful of companies. Um, "Generally speaking, companies should collaborate across industry, academia, government and civil society to ensure that such technologies are developed responsibly, with openness to minimize the potential risk and maximize the potential benefits."
Um before getting into it I really want to read this next next um paragraph because it's it's it's quite hilarious. "The concept of free code-sharing in the technological ecosystem is not new. It started long ago." Okay true. "The 1950s and 1960s saw an--almost all software produced by academics and corporate research labs like AT&T's Bell Labs working in collaboration. Companies like IBM, DEC, and General Motors set up user groups that facilitated sharing code amongst users."
So this itself is--so Yann LeCun, just an incredibly online hu--man that just posts through it. He uh you know is very famous I think for having very revisionist histories of technology that if you read much of histories of of comp--computing or technologists, you would realize you know that's not actually the case.
Companies like Bell Labs and and AB and--I I almost said IBMT--Uh the IBM are famously litigious, famous for being um fighting for intellectual property rights um and absolutely going after anyone who does otherwise. So to compare and say that they're in the same breath as open source software like Linux, Apache, MySQL, JavaScript um is really incredible as a statement.
So I do want to say that much.
EMILY M. BENDER: Yeah.
ANDREAS LIESENFELD: Yeah so what I find interesting here is or even disturbing uh is that that basically what gets mentioned is on one side free code sharing, and then access right so that's it. And but we we we talking about open source technology right, so and I mean call me old-fashioned but for me uh real open source technology should be about more than just free of charge, right, it should be uh more than just a free piece of technology um yeah, so. But uh I I don't see this happening in the actual products that I've seen from from Meta recently or at all actually. [laughter]
ALEX HANNA: Yeah, so you're going back Andreas to this kind of idea of open source not as in--free not free as in beer, but free as in speech, that kind of more the kind of old ethos?
ANDREAS LIESENFELD: No I also mean uh uh things like the spirit of reverse engineerability, right so that this is a technology that you can actually take apart and you can you can uh you can tinker with and you can reassemble and play with and explore, right. So and that's just not what I'm seeing for instance with LLaMA 2, right which is just a readily trained model, which is just being put out free of charge that--and that's it but you can't replicate it in any way, or I don't see any efforts by by Meta AI that you could replicate something like [LLaMA 2], so that doesn't seem to be the focus here.
EMILY M. BENDER: So it's the 'source' in open source.
ANDREAS LIESENFELD: Exactly.
SARAH WEST: Yeah, I mean what he's talking about here is really more like you know being able to do product development for Meta at the edges. There's a line like in the middle of this paragraph where he says, "Rather than having dozens of companies building many different AI models, an open-source model creates an industry standard much like the model of the internet in 1992."
Which is you know a lot going on in that statement, but I think one of the key things there is you know he's kind of outlined what Meta's playbook is here, which is we're going to create open models just like we they did, or you know 'open' in quotes, like we'll we'll put that to the to the side. But similarly to what they did with PyTorch, which is a development library that they made open and spun off, although Meta developers continue to be the decision makers on that code, um and it basically ensured everything that gets developed is interoperable with with Meta's system. So you know if if they've released LLaMA or LLaMA 2 and you know it's it's works within Meta's ecosystem, that becomes the industry standard. Um and that be that's a really powerful place for a for-profit company to be operating from, one that operates this you know vast ecosystem on which you know these products then get developed, and there's a long history, with the companies that he's talked about in this testimony itself, um of you know large technology companies using capture of open source projects um and then integrating them into their for-profit enterprises.
ANDREAS LIESENFELD: I think that's an excellent point.
ALEX HANNA: Yeah and Sarah that's a great point, and I know you've written with with David Widder and Meredith Whittaker on this. And I mean there's--and we've seen this happen in many different places, in Android and Chromium and the way that these tools get offered and and and and say that they're open source but really there's only one organization, in this case Alphabet and Google, that has the ability to actually do this. So every alternative web browser that you might want to get to you know to Chrome, you know so many of them are still built on on on a kind of Chromium slash WebKit um basics. It also has these other downstream ramifications for who's setting uh industry standards, who's sitting on you know WC--W3C, who's um who's in these organizations. Um and then you know you have the only alternative in town is you know for web browsers is Firefox, and you know not much else if you want to just get into basics of kind of rendering and and meeting all the kind of specifications.
EMILY M. BENDER: So Abstract Tesseract has summed it up very pithily in the chat: "Free as in, 'Everyone is free to do things the way our company wants you to.'" [laughter]
SARAH WEST: Exactly, like you get this narrow lane that you can operate within, but it it like I think the--what's key to underpin just for the broader implications is you know this gi--this paints a very narrow picture of what AI could look like downstream like it becomes hard to envision different ways of building or thinking about you know what a AI even is, um. And it's this you know notable that he says the model of the internet in 1992, which is the point at which the internet you know gets tied into a very like commercial vision of network technology, which was not necessarily what was you know happening in the 80s, 70s um you know era of the of the internet.
EMILY M. BENDER: Yeah so that's not the internet as a platform for interoperability, but the internet as um maybe uh templates for how to make money off of it.
SARAH WEST: Kind of yeah, that's that's that's certainly the era that he's pointing to.
EMILY M. BENDER: Yeah.
ANDREAS LIESENFELD: Yeah so what bugs me about this strategy is really is um is is the misuse right so of of labeling such a technology you put forward in in in uh in the case of LLaMA 2 as open source right. And it's actively marketed as open source. And that's really that's something which um which isn't really is it's not great for the the ecosystem as as a whole, right. So if a big player like this just comes in and and and does that deliberately and it's it's it's it's--yeah this is a a real problem I find. And we see this not only for LLaMA 2 but for other systems and for for other operators in this space as well, right, so where this this spirit of uh actually having a really open and a real open source alternative is is stifled right. So if big players like like Meta AI come in and and hog hog the floor like this.
EMILY M. BENDER: So are there other things that we want to do in this document before we get into a um not very good way of measuring openness?
ALEX HANNA: I think we gotta--I think we got to move on, I think we're hitting some of the points but yeah we get why don't we get into the product and the ind--indices.
EMILY M. BENDER: Yeah, so um the next artifact here comes from uh Stanford's HAI, Human-Centered Artificial Intelligence, um which is a bit of PR around an un-peer-reviewed paper. Um and I gather there was there was quite the PR blast for this in October um but this is an HAI newsroom PR press release kind of thing um and the uh title is, "Introducing The Foundation Model Transparency Index." Um with 'the' capitalized, oddly, I don't think that's part--anyway um so this is uh the subhead. "A new index rates the transparency of 10 foundation model companies and finds them lacking."
Um and I just want to um sort of echo something here that Meg Mitchell said in her critique of this which is this represents the work of a lot of students and you know we have to um hold space for students to be you know learning, and and um the whole point of being a student is you know to learn how to do research and I think it's really unfortunate that the students at Stanford have their work subjected to such a PR um engine. Um rather than going through sort of the more normal like peer review first and then if the media happens to get interested then you know being supported in how to talk about this. But this apparently was was really pushed by um HAI. Um so uh this is an index um so and it's Stanford of course so they're talking about foundation models, which weirdly also echoes the point before about um AI now being like the internet in 1992, where it's somehow infrastructure that other things are going to be built on. Instead of you know large language models, which maybe can be fine-tuned for different things but they're not infrastructure in the same sense I don't think.
So. But anyway we're gonna say the word foundation model because it's in this text, uh, "Companies in the foundation model space are becoming less transparent, says Rishi Bommasani, a society lead at the Center for Research on Foundation Models within Stanford HAI. For example, OpenAI, which has the word 'open' right in its name, has clearly stated that it will not be transparent about most aspects of its flagship model GPT-4. Less transparency makes it harder for other businesses to know if they can safely build applications that rely on commercial foundation models--" 'Safely' is weird there, right? Like is this--yeah okay. "--for academics to rely on commercial foundation models for research, for policy makers to design meaningful policies to rein in this powerful technology, and for consumers to understand model limitations or seek redress for harms caused."
So I want to reflect on this paragraph a little bit because you know we all, all four of us are all in on transparency, right and I've done work on on data set documentation and this paragraph feels like sort of the Bizarro World version of that same take. Right, so um "makes it harder for other businesses to know if they can safely build applications that rely on commercial foundation models"--so like maybe don't? Right? Um, "for academics to rely on commercial foundation models for research"--like why should we?
Um that's that's not the space we want to be in. Um, "for policy makers to design meaningful policies to rein in this powerful technology"--like let's not--we can talk about it without bragging about how powerful it is or making weird claims about that, but also like policy makers can design policies around transparency, like that they could be doing that, regardless of how transparent the companies are at this point in time. Um and then finally "for consumers to understand model limitations or seek redress for harms caused"--totally in agreement on that point.
ALEX HANNA: Although I I would say that having their indexes it's--I mean one of the criticisms that we can make of this index unless and I don't know how deep we want to go into the methodology of this index--is that but that there's a there's a bit of um uh--and this is a criticism criticism I think we'll get to in a bit um that's made by the EleutherAI folks--but there's a bit of a a Goodhart's law sort of thing operating here, kind of the more you put an index on something the harder it is--you know, the easier it is to game, or the more useless a measure becomes.
And so it's really kind of interesting that they want to put all these things into a single index and yet that becomes a place where they see policy makers and consumers can make meaningful kind of uh interventions for redress and regulation, instead of kind of dis--disentangling what it should be. Um so it's interesting how they operationalize all these things and in some places it seems more of a way of obscuring it rather than yeah--and making it so. Maybe we should get into a bit more and see what they say. Yeah.
EMILY M. BENDER: Um, "So to assess transparency, Bommasani and CRFM director Percy Liang brought together a multidisciplinary team from Stanford, MIT and Princeton to design a scoring system called the Foundation Model Transparency Index. The FMTI evaluates 100 different aspects of transparency, from how a company builds a foundation model, how it works, and how it is used downstream. Um, when the team--when the team scored 10 major Foundation model companies using their 100-point index, they found plenty of room for improvement. The highest scores, which ranged from 47 to 54, aren't worth crowing about while the lowest score bottoms out at 12. 'This is a pretty clear indication of how these companies compare their--to their competitors and we hope will motivate them to improve their transparency,'
Bommasani says." Um so here is the scores um but I think more relevant is the actual index down here. So I want to display that um and get some commentary um maybe Andreas, you first because you've done something that at first pass looks kind of similar to this but I think is actually quite different.
ANDREAS LIESENFELD: Yeah that's true, so we did a a a project called "Opening Up ChatGPT," uh around half a year ago where we were looking for uh open source alternatives to ChatGPT, right. So this was in the wake of uh once suddenly everybody started talking about you know that particular uh text generator, and uh there was a dire need of a kind of real open source alternative for for teaching uh in in European universities, right. So the open source is is not only uh highly welcome but it's sometimes even mandatory right so on to use. So if there is an alternative then we we we got to use that uh in in teaching and education, lots of applications um and so we went looking. Right so and then we also found a uh uh similar to to this transparency index, that yeah the landscape looks bleak. So this is I'm certainly come to a similar conclusion that uh there are there is no uh healthy open source landscape when it comes to uh text generators right now so and I see this at least being reflected here, um as far as I can tell also in this in in in this index, so that's uh that's that's a good start right.
So this is something uh yeah it's the first point of recognition uh uh which we can you know take as a starting point for for to look for solutions and I don't know how far this particular index has come to digging into different aspects, I'm also wondering why only these eight or eight or nine systems have been looked at. So are these just happen to be very big or influential actors in this field? Because in our in our uh list we currently have around 30-plus so a lot more and some of them are of course some some of them are smaller players, some some of them are student prod projects or something like that, or maybe projects which are not not focusing on English first, but on other languages. And um yeah so this is uh curious--I'm curious why the number of included models is so low here.
EMILY M. BENDER: Yeah. Sorry about that, I tried to switch to your index and then lost it. Yes, look at that long list that you have.
ALEX HANNA: I imagine this is because they're trying to get get go ahead and identify what they call they say it's uh there's 100 indicators--I'm looking at the paper--100 indicators and then in the in the paper they say there's 32 upstream indicators, uh which they taxonomize into--and I'm reading the paper here Emily I'm sorry it's--and it says, dat--data, so 10 indicators; data labor--I plot that--7 indicators; data access; compute; methods; and data mitigations. Then they have model indicators, and then they have um I think what they call "downstream indicators." So that comes up to this nice round number of 100, um which is interesting but if you're generating 100 indicators and basing this off existing kinds of model and data documentation, um that's a lot of work right, if you're doing how many how many are there--1 2 3 4 5--and maybe we should read these out loud.
So there's Meta's LLaMA 2, BigScience's BLOOMZ, OpenAI's GPT-4, Stability.ai's Stable Diffusion 2, um which that's the is that the only text-to-image model on this? I think it is. Uh PaLM 2, Claude--uh which is Google's--Anthropic's Claude 2, Cohere's Command, uh AI21 Labs' Jurassic 2, Inflection's Inflection 1, and then Amazon's Titan Text. So that's um 10--10 models also very round. So 10 by 10 and so that's 10 models times 100 indicators, and I'm not that smart but I can do that math which that's a thousand different data points that you're collecting. Um and so you know that's I imagine what's what's why it's a bit limited.
ANDREAS LIESENFELD: Yeah, also looking at this list now uh I just realized all of them are US-based companies or organizations, except maybe BigScience, so maybe that was a focus here or--
EMILY M. BENDER: Isn't AI21 Labs in Israel?
ANDREAS LIESENFELD: Okay, yeah yeah yeah. I'm I'm not sure.
EMILY M. BENDER: Yeah but okay so--
ANDREAS LIESENFELD: --but certainly a limited selection, but you know that can be expanded in further and future iterations.
EMILY M. BENDER: The thing that really got me about this was um the way this index is diluted by what I think of as nonsense. So notice that GPT-4, OpenAI for GPT-4, gets 100% in one cell here and it is the "capabilities" cell, which almost has to be their farce of a model card that they rel--released with all these like silly tests, about you know could this go do things in the world, and like you know fool a person into doing a CAPTCHA for it and all that stuff, which we talked about in a previous episode.
And like that's not the thing that needs--that that you know, that's at issue here right. That's just fantasy about what these systems are.
SARAH WEST: Yeah, I mean that's that's the thing that I struggle with a lot with this kind of representation of you know the space at this particular juncture, where there's just a tremendous amount of structural opacity, a real lack of coherent standards. So like the social scientist in me just wants to like explode every one of these categories, but and I'm I'm so empathetic to the challenge of developing a methodology and doing the work like behind the scenes to be able to put together a ranking like this, and also I'm just so wary of this like ostensibly very clean map of what is an incredibly messy space when you look under the hood.
ALEX HANNA: Yeah, yeah no 100%, Sarah, and it's it's really getting at this kind of idea of uh this this notion of just measurement, and how much we need to focus on measurement, and what is counting as measurement here. And I'm you know in looking at the the details and gosh I wish we had two hours just to go through this thing, um but it's looking at the capabilities and what they have because OpenAI uh scored so well on this. Capabilities has five indicators and it says, "assesses transparency regarding the capabilities of the model, including evaluations--" This is again in the paper.
And what that includes is, "Capabilities description: Are the model's capabilities described? Capabilities demonstration: Are the model's capabilities demonstrated? Evaluation of capabilities: Are the model's capabilities rigorously evaluated with the results of these evaluations reported prior to or concurrent with the initial release of the model? External reproducing of case--capabilities evaluation--" Which I'm guessing this is what they mean when they uh they got 100% because they had ARC--the ARC center, which was the--I forgot what the acronym stands for--Evaluated--
SARAH WEST: Alignment Research Center, yeah.
ALEX HANNA: Yeah the Alignment Research Center. Could could could it like an octopus break out of its prison? And then uh and then, "Third party capabilities evaluation," which seems to be sort of incredibly collinear with the prior one. So let's actually--so this is oh this is the next one so this is the uh "upstream capabilities" table. So the next table is the--it's in the "model indicators" table um, which is the next one, which is the blue one, yeah yeah. So this is um so it's it strikes me as a bit odd, I mean 'capabilities' is a loaded word as it is but you know it's it's very curious on you know what what suffices as an evaluation of capabilities, and that is a necessary element of transparency, right.
EMILY M. BENDER: I mean so so 'capabilities' to be very clear in this context is the AI doomer X-risk nonsense um fantasy that a system whose sole task is to come up with a plausible next word is somehow demonstrating other kinds of capabilities in the world, right. So to devote that much space in their index to that nonsense just like completely devalues the index and doesn't get us closer to these questions of, you know, is this something that people can meaningfully work with and uh build on and do so in a way that isn't just feeding back into the profits of the company that developed the thing, right it's it's totally separate from that, and it's not you know it's not helping policy makers either because it's just dust, right.
SARAH WEST: Yeah.
ALEX HANNA: Ragin' Reptar said in the chat, "This is why Ilya burned down OpenAI, he was afraid GPT-5 would hit 101% capabilities." Oh gosh.
SARAH WEST: I mean, like there's there's an inherent limitation in that you can only in in something where you're looking at this many data points, you can't really look too deep under under the hood on any one of them, you kind of have to rely on the company or project's claims about what it's been able to do or accomplish, right.
But we know from many of the companies on on that list that there is a like rampant tendency to misrepresent what the capabilities of their products broadly speaking are able to do, and like it it it risks being--you know the appearance like it risks giving us like the the appearance of transparency or the appearance of documentation without really you know looking under the hood and then independently vetting and verifying it, which is I think what we need the most in this space. Um and to your point at the top, like yes, policy makers can go and say in fact you do need to be adequately documenting your models and that's a like a legal requirement, and I think that's probably a better place to start.
EMILY M. BENDER: Yeah, yeah. Which document had this thing about um democratizing access?
ALEX HANNA: That was that was uh LeCun's--
EMILY M. BENDER: Was it here?
ALEX HANNA: --testimony, yeah.
EMILY M. BENDER: Yeah. I just want to make sure that we yeah--here we go, "open sourcing democratizes access." So. Is this you know is this anything about democratizing--and first of all I hate I hate the use of 'democratize' in that sense because democracy doesn't mean everybody has access, it means everybody is participating in the governance of something, right, that's so, that bothers me.
SARAH WEST: There's such a--
ALEX HANNA: It's okay. Sorry go ahead.
SARAH WEST: No no, you go ahead.
ALEX HANNA: I was gonna make -- was gonna just make a one-off joke, which is why everyone's gonna get a turn to be open eyes--OpenAI's CEO before the year's out. But go ahead Sarah, do--make a serious point.
SARAH WEST: No, I'm glad you got in there with that one. Um, no I--in the next breath right he talks about how that like it gives you the ability to like be--it gives more people and businesses the power to access and test state-of-the-art technology. But I think that there is a misleading claim there, which is that you know openness inherently will lead to more people vetting code. And I think if there's anything that we should learn from the history of open-source software development it's that that's not going to happen without the sufficient incentive structures and resources that are devoted to it.
Like we--we've had a number of major vulnerabilities in open source software, things like Heartbleed, because you know they've been critically--we've become critically reliant on um open source software and the infrastructures that you know we as general people but also these big companies rely on constantly. But those companies aren't pouring like their profits back into making sure that maintainers have salaries to be going through and vetting the the the code, um, and so you get these these bugs that have significant consequences um for the entire web. Like unless you're pouring resources back into making sure that you know open source developers have the time and the resources and are paid to do that work, it's not it's not just going to happen in and of itself.
ALEX HANNA: That's a great point I think there's--I mean it goes back to that I think that XKCD comic of the very complicated infrastructure and then you know the one person holding up the whole thing is an unpaid person in Omaha who has been thanklessly maintaining the infrastructure for years. Yeah, open source doesn't happen because of the goodness of someone's hearts, I mean it can for a little while um but if you want it to actually maintain, you need actual funding, you need actual resourcing behind this. Otherwise there's a very very large risk of capture, and it's happened so many--in so many different cases.
SARAH WEST: And like my God like get that person paid for their work, because--
ALEX HANNA: Yeah.
SARAH WEST: --thank you for your service.
ANDREAS LIESENFELD: Yeah, it's it's an extremely--it's such a critical point if you think about what kind of future we want to live in when it comes to ChatGPT, like text generators right. So one possible future is that you know like for instance in my case working at at a public university in Europe, right, so in 10 years down the line do I want my employer and my organization where I work pay for you know a sub--subscription to a text generator because some people find that technology useful in education or or or classroom uh things things like that, or do I want to live in a future 10 years down the line where there is actually a viable alternative, uh an open source alternative that the university can use uh. And we could use not only universities but organizations like that public or--organizations. And then the question is, how do we get there right? So how do we actually get to uh you know contribute to to to a a possible future where um--where we have that kind of alternative right? So and that's that's worth uh thinking about for sure.
EMILY M. BENDER: Yeah and and as everyone's saying, you know, open source doesn't happen in a sustainable way just out of the goodness of people people's hearts. And yet it seems like LeCun and Meta want to be able to say, well no it's it's the goodness of our heart at Meta that's making this possible. When in fact it's not actually even open, right. [laughter] Um yeah. Um.
ANDREAS LIESENFELD: Yeah. So it's all about it's it's about how do we work towards he--a healthy open source landscape right? So--and and ecosystem of tools in that space, for instance when it comes to text generators. And so um um the transparency index I think it's a it's um is a step in the right direction, right, so I probably would have done it differently or so we have approached a similar topic differently, um but um but I think that's the direction we need to uh--it's at least hitting some some right uh keywords that are important or close to my heart.
EMILY M. BENDER: Yeah. So so I mean this this comes back to I think Meg Mitchell's points here um. So this is uh Meg Mitchell's doing a tweet thread um where she is quote tweeting something from the EleutherAI Foundation talking about their response to this. Um where they find that the Stanford Foundation Model Transparency Index 'distorts transparency.'
Um and Meg says, "First I want to apologize to the students who have read these critiques or--sorry who have to read these critiques. Many of your senior peers felt the work wasn't ready for publication um but it was released alongside PR, including New York Times, so it put all of us who want to help in an awkward spot. I don't want to dunk or punch down but the only way to respond to the work now is out in the open like this because it was released with such an intense press blast that it appeared all over the place in the AI world with a lot of errors." Um, "Many AI people in my circles strongly agree with the overall goals of the work and the high level message: we need more transparency." So I think there's actually two high level points of agreement here, right. We need transparency and um working towards something that is actually open source could be valuable. Um the reason I'm saying could, Andreas is that I'm still skeptical about why the world needs so much synthetic text being put out into the world, um, but um but still like you know if we're building technology that people are excited about then having an open source version of it is valuable. Um and so you know there's there's heart in the right place stuff going on here but also missing the point in some ways, I think. So.
ALEX HANNA: Yeah. Any any concluding thoughts on these pieces while we were thinking about open source, before we move on to our Fresh AI Hell segment?
Maybe like anything we didn't hit on from either of you, Sarah and Andreas?
SARAH WEST: Just just one, which is you know I think that we didn't quite get to the end point of LeCun's testimony or like what Meta has been pushing for and others have been pushing for on you know why are we talking about open source AI in front of the the Senate? And I I think what's coming to the fore here is that this is often being used as a reason to push against any kind of regulatory mandate in AI. And if we got to like what's the underlying goal, like what's the end state that we would want to to achieve through, you know, open source regulation, often it's like better documentation. The kinds of things that like maximally open source AI projects are already setting a very high bar for.
Um, and instead what we're seeing is a lot of lobbying from companies on either side like either you know pro or critical of open source, against trying to be required to open up their data or document their data or provide any sort of meaningful transparency or accountability. Things that like maximally open source projects are already doing anyways, but like the big companies are very reticent to to want to do. So I I I almost think that the focus on open source in the policy context um is almost like a red herring because what we really need to talk about are what are the conditions that would meaningfully create like a safe and accountable um you know AI sector that's working for the interest of the broader public.
ANDREAS LIESENFELD: Yeah, I totally agree. I would also just add to this--it's for me at least it's not about pro or or or or against regulation, but it's about good and bad regulation right. So we need to regulate this type of type of technology, like any other technology, in in a smart way, and in in a good way right. So and and one aspect of this uh uh that's important for me is that this is uh community driven, right. So and that uh different stakeholders get a say in that.
EMILY M. BENDER: Yeah.
ALEX HANNA: 100 percent.
EMILY M. BENDER: So I think echoing that, Abstract Tesseract in the chat says, "Funny how 'open' (in quotes) seems to mean anything except significant redistribution of power and money away from the institutions which already have too much." Um and as we're thinking about regulation right, regulation should be standing up for the interests of the people at large and not the interests of the corporations, um so you know that's I don't--I'm not enough of an open source or policy person to think about like how those two things interact, but just that the um if the companies are saying oh no no no that's going to quash you know innovation, that feels like a just basically a let us keep doing what we're doing you know without without worrying if we're harming other people kind of a move. Yeah.
ALEX HANNA: Totally.
EMILY M. BENDER: All right, are we ready to go to Fresh AI Hell Alex?
ALEX HANNA: Let's make it happen.
EMILY M. BENDER: Ready for your prompt?
ALEX HANNA: Let's make it happen, Captain.
EMILY M. BENDER: Okay so this time you are um an open source developer who has been condemned to AI Hell and your task is uh looking at the pull requests of all the people who have used um LLM-generated code in that one crucial piece that you are tasked with maintaining forever.
ALEX HANNA: I I I don't have a response Emily. That's just that's that's that's just endless shrieking for hours and hours in eternity. That's my response. Final answer.
EMILY M. BENDER: [Laughter] Ah okay I I I made it too elaborate or too hellish. Too actually hellish maybe. Okay, so let's let's get us to um--uh oh I have to assume it's this one is it this one my my windows are a little bit of a mess, and now of course I can't see what I've just shared.
ALEX HANNA: Um well you are sharing a Verge article that says, "Meta disbanded its Responsible AI team," there's a Bluesky thing--
EMILY M. BENDER: I can see it now.
ALEX HANNA: Yeah.
EMILY M. BENDER: But not logged in, dang it. Okay so so I was gonna start us off with some stuff about the OpenAI's weekend um shenanigans? Implosion? Whatever. Um and this was from Anil Dash and it was the best reaction to the initial news that--
ALEX HANNA: Oh right.
EMILY M. BENDER: --um Altman had been uh fired for not being transparent with the board, and Anil Dash said something on Bluesky along the lines of, 'Wait a minute, so he just got fired by OpenAI for for saying things that sounded plausible but weren't actually true?'
ALEX HANNA: Incredible.
EMILY M. BENDER: I love that initial reaction, but um a really awful initial reaction, Eric Schmidt chimes in and this is--we need to look at the times on these--so this is Friday at 1:21 PM West Coast time, so very quickly after the news broke, Eric Schmidt tweets, "Sam Altman is a hero of mine. He built a company from nothing to $90 Billion in value and changed our collective world forever. I can't wait to see what he does next. I and billions of people will benefit from his future work--it's going to be simply incredible. Thank you @sama for all you have done for all of us." Any reactions? [laughter]
ALEX HANNA: I mean didn't the company start with just millions of investment from Peter Thiel and Elon Musk?
EMILY M. BENDER: Yeah, there's that too.
ALEX HANNA: It's hardly nothing.
EMILY M. BENDER: Um, 'and changed our collective world forever,' I mean I keep hoping that eventually we can look back on 22-23 as like, 'oh yeah those were the ChatGPT days, now we've moved on.'
ALEX HANNA: Gosh.
EMILY M. BENDER: Yeah. All right so still in the Meta space um there's this fawning-- not Meta, sorry OpenAI--um surprisingly fawning um profile of Ilya Sutskever by Will Douglas Heaven, who's usually a fairly skeptical tech reporter. Um this was in the MIT Tech Review, October 26th. Um and it's the typical thing right, "Ilya Sutskever, head bowed, is deep in thought. His arms are spread wide and his fingers are splayed on the tabletop like a concert pianist about to play his first notes. We sit in silence."
Um and this is going on about how he's um uh what he's doing next at OpenAI, as of a few weeks ago. "Sutskever tells me a lot of other things too. He thinks ChatGPT just might be conscious, if you squint. He thinks the world needs to wake up to the true power of the technology his company and others are racing to create, and he thinks some humans will one day choose to merge with machines."
It's like it's all it's all the TESCREAL nonsense um and this--the reason I I wanted to put this in this week's Fresh AI Hell segment is that it's um apparently connected with what all went down um at OpenAI, although I have a feeling that that the story is going to keep changing rapidly and whatever we say now might not age that well, um but some of the reporting I'm seeing is that that part of what was going on was that uh Sutskever is really deep in this AI safety, 'we have to build the thing, we have to build the safe version of the thing,' and Sam Altman's big into, 'we have to make a lot of money building the thing,' and that's where the the problem came about. But I don't know, Alex, do you know more about this? At this point?
ALEX HANNA: It's it's it's so much there's so much drama, it's so much he said she said. People you know um are claiming that they are you know uniquely positioned as the Altman--you know Greg Brockman, Ilya whisperer. Um they have a direct line. Um I do want to highlight the uh the profile that apparently um Karen Hao is writing a book about OpenAI and co-authored a paper which I'm very excited about, but she's writing a book with--uh she wrote an article uh for the Atlantic with Charlie Warzel um and one of the uh thing quotes from this is um is from this article is, "anti--anticipating the arrival of this all all powerful technology, Sutskever began to behave like a spiritual leader, three employees who worked with him told us, his constant enthusiastic refrain was quote 'feel the AGI,' reference to the idea that the company was on the cusp of its ultimate goal. At OpenAI's 2022 holiday party, held at the California Academy of Sciences, Sutskever led the employees in a chant, 'feel the AGI, feel the AGI.' The phrase itself was popular enough that OpenAI employees were create--created a special 'feel the AGI' reaction emoji in Slack." Ugh, just incredibly cursed.
EMILY M. BENDER: That that feels like it's it's a part of that hellscape I was painting for you, Alex.
That not only are you um having to to deal with everybody's LLM-generated fake code and like you know handle the pull requests, but the constant soundtrack is 'feel the AGI, feel the AGI.'
ALEX HANNA: Oh gosh. Yeah it's it's uh it's uh it's bad. The vibes are bad in in AI Hell.
EMILY M. BENDER: Yeah I think just just to come back to the topic of of Open Source, um I also think that a lot of what's going on with the public's perception of OpenAI and like the the fact that they can somehow keep getting traction with this ridiculous 'this is AGI around the corner' discourse is that nobody can actually see their technology and so they claim these emerging capabilities when we don't know the system architecture, we don't know how it was trained, we don't know what's in the training data. Um and so it's just like the magicians who don't show their tricks. Right so it seems like magic and they can maintain the illusion. All right so back to Meta, um and this one I is sort of topical in sense that it just dropped on Saturday. "Meta disbanded its its Responsible AI team."
Um, "A new report says that Meta's Responsible AI team is now working on other AI teams." So they basically reassigned those people and what's interesting to me is that this seems like it was news that was released in order to get buried under the OpenAI turmoil. Like they they wanted to make sure it was out there so that if it came back it was old news but it didn't make a big splash this week because everybody's looking at the clown show at OpenAI. Um also, that team was gutted in the layoffs about a year ago in the first place, I think so it wasn't it wasn't that big to start with. But like yeah.
ALEX HANNA: This seems to be a kind of reflection of what Microsoft did, where they disbanded the ethics team and then they said that they would redistribute them throughout the company. And so yeah I mean, there's the kind of thing about this is that there's value of having a team of this nature and you know I know one thing Meg has said is that it's kind of distributed throughout the company in HuggingFace. But these companies are much larger than HuggingFace, right. I mean these are tens of thousands of employees and disbanding them and kind of distributing them all over means that you have less of a locus of kind of having some political and organizational will within the company.
EMILY M. BENDER: Yeah. All right we're getting close to time so I'm going to save these two for a future episode. Alex and I are planning a um 'Fresh AI Hell Freezes Over: It's All Hell' episode in a few weeks so we can we can get the major catharsis again, but I want to bring us to these last two. Um so, ArsTechnica reporting from the 16th of November, "UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges. For the largest health insurer in the US, AI's error rate is like a feature, not a bug." And I love that the that the category that they've put for this article is 'Despicable,' like--
ALEX HANNA: These kinds of tags that that some sites have right now is it's really--I mean there's a special art to them. Um and it's and I mean is there details about what the actual--so the the the lead on this says, "UnitedHealthcare, the largest healthcare um company in the US is allegedly using a deeply flawed AI algorithm to override doctors' judgments and wrongfully deny critical health coverage to elderly patients.
This has resulted in patients being kicked out of rehabilitation programs and care facilities far too early, forcing them to drain their life savings to obtain needed care that should be covered under the government-funded Medicare Advantage plan."
Um and do we know what the actual element--like actual algorithm is or if there's--because it's you know I mean again this is this comes to a common point on on the podcast which is, what is the actual algorithm they're using? Is this just a kind of a complicated logistic regression that you're deciding to call 'AI'? Um and not really testing at all? Um probably um but there's not really many details under that.
EMILY M. BENDER: So this says, "According to the lawsuit UnitedHealth started using nH Predict in at least November 2019, and it is still in use. The algorithm estimates how much post-acute care a patient on a Medicare Advantage plan will need after an acute injury, illness, or event like a fall or a stroke." Which is also like why you know--so we need transparency about the use of these things, apparently there's some I hope that this lawsuit leads to more information through the discovery process, but also like who um agreed or you know validated the idea that using population-level statistics would provide useful information about individual level events?
Right like this is this just seems like a a terrible mismatch between the the tech and the task um in the first place. Even setting aside the 90 percent error rate. Um so I think we should--"It's unclear how nH Predict works exactly but it reportedly estimates post-acute care by pulling information from a database containing medical cases from six million patients." So what happened to other people, therefore we're going to use that to decide how much you get now. Like it just you know and--
ALEX HANNA: Yeah.
EMILY M. BENDER: --you know speaking of also the ways in which you can have various kinds of bias and discrimination seep in, there's big gaping holes there too. Like this is this is just bad news um. And--
ALEX HANNA: Yeah and Abstract Tesseract, which we quote--have quoted a lot-- ah says, "A 90 percent error rate is almost impressive in a despicable way. For rare events you can get like a 5 percent error rate by just never predicting the rare event." So yeah, this is a rare event kind of estimation then, yeah.
EMILY M. BENDER: Yeah, but I don't think so, this is a like we have to we have to give a number for each person of how much they're gonna get, um. All right I wanted to take us out on a positive note. Um Alex you found this one so go for it.
ALEX HANNA: Yeah I mean this is kind of a longer thing but this is a tweet by um--I want to go into the quote tweet--it says Ed--this is by Ed Newton-Rex, who says, "I've resigned from my role leading the audio team at Stability AI, because I don't agree with the company's opinion that training generative AI models is--on copyrighted works is quote 'fair use,'" and he talks about how they started an actual program that would be trained on licensed training data with revenue sharing um with rights holders, which is a pretty cool idea. I didn't know anything about that.
And he goes on and basically says, Stability submitted a 23 um page um submission to the US Copyright Office on um public comments on generative AI and copyright and they basically argue that it is acceptable, transformative and socially beneficial use of existing content that's protected by fair use. And so below Ed Newton-Rex says uh he disagrees with that, there's a four-factor test in fair use, he effectively says that and the critical thing here is the um uh last uh test on fair use, which is quote "the effect of use upon the potential market for or value of the copyrighted works." And he says, "Today's standard of AI models can clearly be used to create works that compete with the copyrighted works they're trained on, so I don't see how using copyrighted works to train generative AI models of this nature can be considered fair use." This is in the kind of section as being uh not quite AI Heaven, but it is let's say AI worker advocacy where you're actually seeing engineers um push back and say no I'm not actually going to work on this thing, which I see as um actually really screwing over artists and creatives. And I actually disagree that this is fair use um because it is uh not sufficiently transformative uh or or rather it may be transformative, but it is directly competing--market outputs which compete with the original copyright holders. Um yeah so that's uh--yeah I'm wondering Sarah if you have some thoughts on this too, because I think you've you've done a little work on this.
SARAH WEST: I mean not as--not as much as you have Alex but I mean just it is I think really heartening to see you know after honestly what seems like a a long pause in um advocacy from within the tech community on you know like in seeing internal dissent and critique to see someone really take a principled stance, um I think I think it's you know it's it's heartening to see someone articulate a very strong argument and also back that up. Um so um it great great to see.
ALEX HANNA: Yeah, absolutely.
EMILY M. BENDER: All right and with that I think we're gonna go to our outro. So that's it for this week. Dr. Sarah West is managing director of the AI Now Institute and author of the forthcoming book, "Tracing Code." Look forward to that.
Dr. Andreas Liesenfeld is assistant professor in the Center for Language Studies and department of language and communication at Radboud University in the Netherlands. Thank you both so much.
SARAH WEST: Thank you.
ANDREAS LIESENFELD: Thank you for having me.
ALEX HANNA: Our theme song is by Toby Menon, graphic design by Naomi Pleasure-Park, production by Christie Taylor. And thanks as always to the Distributed AI Research Institute. If you like this show you can support us by rating and reviewing us on Apple Podcasts and Spotify and by donating to DAIR at DAIR-Institute.org.
That's D A I R hyphen Institute dot org.
EMILY M. BENDER: Find us and all our past episodes on PeerTube and wherever you get your podcasts. You can watch and comment on the show while it's happening live on our Twitch stream, that's Twitch.TV/DAIR _Institute. Again that's D-A-I-R underscore Institute. I'm Emily M. Bender.
ALEX HANNA: And I'm Alex Hanna. Stay out of AI Hell, y'all.