Trial By Wire

Episode 12: What is AI? Demystifying ChatGPT

June 16, 2024 Denton Wood Season 1 Episode 12

You've seen the headlines, but what really is AI? We talk about machine learning, large language models, and why you need to be careful browsing the Internet nowadays, even if you haven't jumped in on the AI craze.

Links:

Keep up with the show! https://linktr.ee/trialbywireshow

Questions? Comments? Email trialbywireshow@gmail.com

Music:


Welcome back to Trial by Wire! My name is Denton, and today, we're talking about the hottest new technology on the market. You know it, you love it, you're scared it's taking over the world - it's AI! We're going to dig into some high-level fundamentals of how AI works and some caution points when using it. By the end of the episode, you should be able to understand a little better what AI is and why it can be problematic. Let's get started.

So first of all, what is "artificial intelligence"? Well, it's not "sentience" - machines aren't taking over the world just yet. Intelligence is defined by Merriam-Webster as "the ability to learn or understand or to deal with new or trying situations." Artificial, in this case, means that this intelligence is being programmed by a person into a machine. Basically, AI is technology that lets machines understand a problem the way a human would and come up with a solution.

But, how do humans understand problems? Let's take an example using "machine learning", a popular field of AI which deals with data processing and identification. Imagine that you're trying to teach a toddler what a dog is. You might try to describe it: something that stands on four legs, is furry, and has a wet nose. The problem is that cats, horses, foxes, and a lot of other mammals also fit that definition, and your toddler doesn't yet speak enough English to understand you anyway. So what do you do? You point to a dog and say "that's a dog". You point to a cat and say "that's not a dog". When your toddler points at a squirrel and says "doggy!", you laugh and say "no sweetie, that's a squirrel." You keep doing this over and over with a large number of examples until your toddler finally understands what a dog is. They learned by example, not by definition.

Machine learning is the process of teaching a bunch of metal toddlers how to recognize something. Machine learning algorithms require large quantities of data as input, both positive and negative examples: if you want to teach someone what a dog is, you also need examples of what a dog isn't. All of this data gets fed into something called a "neural network", a program modeled after the neurons in our brains. The data gets sent through the model in a process called "training", which adjusts the individual "neurons" and their connections as the data passes through the network. As a developer trains a neural network, the network "learns" the data by identifying patterns in it, much like a human would. As a result, it gains the ability to do whatever you're trying to train it to do (for example, recognize a dog in a picture).
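If you're curious what "learning by example" looks like in practice, here's a toy sketch in Python using the scikit-learn library's small neural network classifier. The animal features and numbers are completely made up for illustration; a real image recognizer trains on millions of pixels, not three hand-picked measurements.

# A toy "dog or not a dog" classifier trained on labeled examples.
# (Illustrative sketch only; the features and values are invented.)
from sklearn.neural_network import MLPClassifier

# Each example: [weight in kg, ear length in cm, barks? (1 = yes, 0 = no)]
X = [
    [30, 10, 1],   # labrador   -> dog
    [8, 7, 1],     # beagle     -> dog
    [4, 6, 0],     # house cat  -> not a dog
    [6, 5, 0],     # fox        -> not a dog
    [0.5, 3, 0],   # squirrel   -> not a dog
]
y = [1, 1, 0, 0, 0]  # 1 = dog, 0 = not a dog

# "Training" adjusts the network's internal connections to fit the examples.
model = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000, random_state=0)
model.fit(X, y)

# Ask about an animal the network has never seen before.
print(model.predict([[25, 9, 1]]))  # hopefully [1], i.e. "that's a dog"

The idea is exactly the toddler scenario: show the network labeled examples, let it adjust itself, then ask it about something new.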

The important thing to note about training a neural network is that it requires a lot of data; specifically, a lot of diverse data. If I showed our metaphorical toddler only brown dogs, they might think that all dogs were brown. Then, when I pointed out a yellow lab walking down the street, they might incorrectly guess "kitty!" If I only give the model still pictures of dogs, it might not be able to recognize a blurry dog in motion. If I only give the model full pictures of dogs, it might not be able to recognize a picture of only a dog's head. Neural networks need as many data points as possible to work with. If a developer can't scrounge up enough data, the network may not be able to correctly perform its identification task. There's a funny example of this in the movie "The Mitchells vs. the Machines" on Netflix. The robots taking over the world (presumably running on some form of AI) are so confused by the incredibly fat family dog that they incorrectly identify it as a loaf of bread, and it fries their systems. In this case, the AI's training data was insufficient: it had presumably never seen a dog that fat, so it couldn't correctly identify one.
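One common trick developers use to squeeze more diversity out of limited data is "augmentation": generating varied copies of the pictures they already have. Here's a small sketch using the torchvision library's built-in transforms; the file path is hypothetical, and the specific transforms are just examples of the kind of variation you might add.

# Generate varied training examples from a single dog photo.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                 # dogs facing either direction
    transforms.ColorJitter(brightness=0.4, hue=0.1),   # not just brown dogs
    transforms.RandomResizedCrop(224),                 # partial views, like just a head
    transforms.GaussianBlur(kernel_size=5),            # blurry dogs in motion
])

dog = Image.open("photos/dog_001.jpg")                 # hypothetical file
variants = [augment(dog) for _ in range(10)]           # ten varied copies to train on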

But, what about ChatGPT and all these chatbots out there? This is something called "generative AI". As the name implies, these networks don't just recognize things; they generate them. These models have been trained on enough data to be able to understand requests for something and respond to those requests. ChatGPT itself is something called a "large language model", or LLM, meaning that it can understand and respond in human language. Let's think about the complexity of this. Learning how to recognize a dog is easy - most kids can do that. LLMs are the adults of the AI world - they can learn entirely new languages. That requires some very fancy algorithms and a metric ton of training data. Other generative AI models can generate pictures, audio, and even entire videos. AI research has actually been around since the 1950s, but generative AI is what has caught the attention of the masses in recent years as a really useful tool.
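To give a concrete sense of what "responding to a request" looks like, here's a minimal sketch of asking a hosted LLM a question through the OpenAI Python library. The model name and prompt are just examples, and you would need your own API key for this to run.

# Ask an LLM a question programmatically (requires the OPENAI_API_KEY
# environment variable to be set; model name is an example).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a large language model is in one sentence."},
    ],
)
print(response.choices[0].message.content)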

AI is pretty cool, but we're all about the ethics on this show. Let's get into the good and the bad of AI.

When you're thinking about AI, the number one problematic area is the training data. What data was the model trained on? Like we've said, in order for the model to work, you need a lot of diverse data that gives the model good examples and patterns to learn on. For LLMs like ChatGPT, you need tons of examples of actual humans talking via text in order for the model to be able to speak like a human; you can't generate that data. But, where are the developers getting that data from? Well, historically, they may have just scraped the Internet. In 2023, Reddit, a popular online forum site, started charging for access to its data so that companies wanting to train LLMs could no longer pull it for free. The New York Times and other newspapers have sued OpenAI, the company that made ChatGPT, and its major investor Microsoft over copyright infringement for training on the newspapers' articles. Remember our discussion on "who owns the content" with user-generated content in Episode 5? These sites are going to bat over that argument, and we haven't even talked about getting the users' permission to train on their data.

The AI boom was made possible in part by the large amount of user-generated content on the Internet. However, AI is opening up a new can of worms as we wrestle with acceptable uses of data online. Is it ok to make a new product (an LLM) using content that people have made freely available? Part of the discussion goes back to the intention with which people make content available. If you're a content creator like me, you frequently post content that you want other people to see and share (by the way, be sure to share this episode with your friends if you liked it. Thanks!). You're probably ok with people taking your content and making something else out of it. For example, I sourced multiple articles when writing the script for this video, and I would be ok with someone sourcing this video to make content on AI (as long as they don't say mean things about me). Am I ok with a generative AI training on my videos to be able to make videos for other people? Probably less so, especially because AI is frequently being used to fool people.

Speaking of fooling people, AI is getting really, really good at generating convincing content. That's a success of the technology, sure, but it's also particularly dangerous. You may have heard of something called a “deepfake video”, or an AI-generated video which simulates a person saying or doing something that they haven’t. Deepfakes are made by generative AI models using existing video or audio footage of a person. Celebrities and politicians are notoriously easy targets since there is so much footage of them out there. So, just because you see a video of a candidate for office saying something doesn't mean they actually said it. Keep that in mind when browsing social media; just like with phishing scams, always check your sources. It's never been more true that you cannot trust everything you see on the Internet.

If you decide to start using AI, particularly LLMs like ChatGPT, keep in mind that you should treat the generated content like search engine results. Not everything the AI says is true. A neural network is only as good as the data it's trained on, and there is a lot of misinformation out there on the Internet for it to be trained on. Can the AI be correct about things? Absolutely, and it can be useful too. I tried out Microsoft Copilot by generating an outline for a podcast about emails, and it served as a reference for me to write Episode 7. Notably, it helped me find a few points that I had missed about email usage, and I was able to write a better episode more quickly because of it. However, I used the AI alongside my own research and experience, which allowed me to analyze its results rather than blindly trusting them. If you start treating the AI like an authoritative source instead of a tool, you're going to be misled. AIs can lie, and they are not sentient enough to feel bad about it.

Is AI inherently bad? Not necessarily. There are a lot of ethical problems surrounding the technology, and it's being used for a lot of bad things. But, just like search engines, social media, and pitchforks, AI is a tool that can be used for good or for bad. It doesn't look like AI is going away any time soon, so it's important that we learn how to live in a world with it. We're going to have to work hard to fight fake content online and find the truth.

For your homework today, do a quick search on "deepfake videos" and see what you can find. As a warning, it's a little scary, so do it when you have some time to process what you're seeing, maybe with someone else sitting alongside you. Understand that this is a real thing and it happens, and you need to be able to recognize it so that you don't fall for it. Think about that, and I'll see you next time!

Hey, thanks for listening! Subscribe for more if you like what you heard. If you’re on YouTube, give us a like and a comment, or rate and review us on your favorite podcast feed. It helps out a lot! If you want to talk to us, you can find us on X (formerly known as Twitter) or on Instagram at @trialbywireshow or on Facebook at facebook.com/trialbywirepodcast. You can also send me an email at trialbywireshow@gmail.com. See you soon!