AI Safety Fundamentals: Alignment

Intro to Brain-Like-AGI Safety

June 17, 2024 · BlueDot Impact · Season 13

(Sections 3.1–3.4, 6.1–6.2, and 7.1–7.5)


Suppose we someday build an Artificial General Intelligence algorithm using principles of learning and cognition similar to those of the human brain. How would we use such an algorithm safely?

I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowledge all the way up to the front line of unsolved problems as I see them.


If this whole thing seems weird or stupid, you should start right in on Post #1, which contains definitions, background, and motivation. Then Posts #2–#7 are mainly neuroscience, and Posts #8–#15 are more directly about AGI safety, ending with a list of open questions and advice for getting involved in the field.


Source:

https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8


Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

Chapter Markers

3. Two subsystems: Learning & Steering
3.1 Post summary / Table of contents
3.2 Big picture
3.2.1 Each subsystem generally needs its own sensory processor
3.3 “Triune Brain Theory” is wrong, but let’s not throw out the baby with the bathwater
3.4 Three types of ingredients in a Steering Subsystem
3.4.1 Summary table
3.4.2 Aside: what do I mean by “drives”?
3.4.3 Category A: Things the Steering Subsystem needs to do in order to get general intelligence (e.g. curiosity drive)
3.4.4 Category B: Everything else in the human Steering Subsystem (e.g. altruism-related drives)
3.4.5 Category C: Every other possibility (e.g. drive to increase my bank account balance)
6. Big picture of motivation, decision-making, and RL
6.1 Post summary / Table of contents
6.2 Big picture
6.2.1 Relation to “two subsystems”
6.2.2 Quick run-through
7. From hardcoded drives to foresighted plans: A worked example
7.1 Post summary / Table of contents
7.2 Reminder from the previous post: big picture of motivation and decision-making
7.3 Building a probabilistic generative world-model in the cortex
7.4 Credit assignment when I first bite into the cake
7.5 Planning towards goals via reward-shaping
7.5.1 The other Thought Assessors. Or: The heroic feat of ordering a cake for next week, when you’re feeling nauseous right now