Joe Carlsmith Audio
Audio versions of essays by Joe Carlsmith. Philosophy, futurism, and other topics. Text versions at joecarlsmith.com.
Arguments for/against scheming that focus on the path SGD takes (Section 3 of "Scheming AIs")
Joe Carlsmith
This is section 3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
3. Arguments for/against scheming that focus on the path that SGD takes
3.1 The training-game-independent proxy-goals story
3.2 The “nearest max-reward goal” story
3.2.1 Barriers to schemer-like modifications from SGD’s incrementalism
3.2.2 Which model is “nearest”?
3.2.2.1 The common-ness of schemer-like goals in goal space
3.2.2.2 The nearness of non-schemer goals
3.2.2.3 The relevance of messy goal-directedness to nearness
3.2.3 Overall take on the “nearest max-reward goal” argument
3.3 The possible relevance of properties like simplicity and speed to the path SGD takes
3.4 Overall assessment of arguments that focus on the path SGD takes