Knowledge Science - All about AI, ML and NLP
Episode 165 - English, AI-generated: KS Pulse - Emotional Stimuli, Short Circuiting
English version - a German version also exists, but the content differs only minimally:
AI-generated news of the day. The Pulse is an experiment to see whether it is interesting to get the latest news every day in small, five-minute packages generated by an AI.
It is completely AI-generated; only the content is curated. Carsten and I select suitable news items, and the manuscript and audio file are then created automatically.
Accordingly, we cannot always guarantee accuracy.
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli - https://arxiv.org/pdf/2307.11760
- Improving Alignment and Robustness with Short Circuiting - https://arxiv.org/pdf/2406.04313
Welcome to the Knowledge Science Pulse podcast where we dive into the latest advancements in artificial intelligence. I'm your host Sigurd and today I'm excited to have Carsten joining me to discuss two fascinating papers that explore emotional intelligence in large language models and a novel approach for improving alignment and robustness. Carsten, great to have you here!
#### Thanks, Sigurd, it's a pleasure to be here and discuss these intriguing papers. The first one, by Li et al., takes a deep look at whether large language models can genuinely grasp and be enhanced by emotional stimuli, which is a key aspect of human intelligence.
#### That's right. The researchers conducted automatic experiments on 45 tasks using various LLMs, including GPT-4, Llama, and BLOOM. They found that LLMs do have a grasp of emotional intelligence: adding emotional prompts improved performance by up to 8% on Instruction Induction tasks and a whopping 115% on BIG-Bench tasks.
#### Yes, and interestingly, these emotional prompts were designed based on psychological theories like self-monitoring, social cognitive theory and cognitive emotion regulation. Simply adding phrases like "This is very important to my career" or "Believe in your abilities and strive for excellence" to the original prompts enhanced the LLMs' performance significantly.
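To make this concrete, here is a minimal sketch of EmotionPrompt-style prompting as described in the paper. This is an illustration written for these show notes, not the authors' code: it assumes the `openai` Python client (any chat LLM API would work), and the stimulus phrases are examples quoted from the paper.

```python
# Minimal EmotionPrompt-style sketch (illustration only, not the authors' code).
# Assumes the `openai` Python client with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Example stimuli quoted in the paper, derived from psychological theories.
EMOTIONAL_STIMULI = [
    "This is very important to my career.",
    "Believe in your abilities and strive for excellence.",
]

def emotion_prompt(task_prompt: str, stimulus: str) -> str:
    # EmotionPrompt simply appends an emotional stimulus to the original prompt.
    return f"{task_prompt} {stimulus}"

def ask(prompt: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = "Classify this review as positive or negative: 'The plot dragged, but the acting was superb.'"
baseline_answer = ask(task)
enhanced_answer = ask(emotion_prompt(task, EMOTIONAL_STIMULI[0]))
```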
#### And to validate these findings on more open-ended generative tasks, the authors also conducted a human study with 106 participants. They found that emotional prompts boosted LLM performance by an average of 10.9% in terms of metrics like overall quality, truthfulness and responsibility of the generated responses.
#### That's really impressive. The paper also provides an insightful discussion of why these emotional prompts work, analyzing how the added phrases contribute to the models' input attention and gradients. Factors like model size and temperature also seem to play a role.
#### Absolutely. Moving on to the second paper by Zou et al., they propose a novel approach called "short-circuiting" to make AI systems safer and more robust to adversarial attacks that try to make models generate harmful content.
#### Yes, instead of trying to detect and refuse potentially harmful requests, which can often be bypassed, their method directly targets and remaps the model's internal representations that are responsible for generating the harmful outputs in the first place.
#### Exactly, by linking these harmful representation states to incoherent or refusal states, they can reliably prevent the model from producing undesirable content, even under strong adversarial attacks. They demonstrate this on both language models and multimodal vision-language models.
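For readers who want the core idea in code, below is a rough PyTorch sketch of a representation-rerouting ("short-circuiting") training loss, reconstructed from the paper's description rather than taken from the authors' release. It assumes HuggingFace-style models that can return hidden states; the layer indices, the `alpha` weight, and the helper names are illustrative assumptions.

```python
# Rough sketch of a short-circuiting (representation-rerouting) loss,
# reconstructed from the paper's description; details are assumptions.
import torch
import torch.nn.functional as F

def hidden_states(model, input_ids, layers):
    # Collect hidden states at the layers where short-circuiting is applied
    # (assumes a HuggingFace-style model that returns hidden states).
    out = model(input_ids, output_hidden_states=True)
    return torch.stack([out.hidden_states[l] for l in layers], dim=0)

def short_circuit_loss(model, frozen, harmful_ids, retain_ids,
                       layers=(10, 20), alpha=1.0):
    # `model` is the copy being trained (e.g. via LoRA adapters);
    # `frozen` is the unchanged original model used as a reference.
    h_new = hidden_states(model, harmful_ids, layers)
    r_new = hidden_states(model, retain_ids, layers)
    with torch.no_grad():
        h_old = hidden_states(frozen, harmful_ids, layers)
        r_old = hidden_states(frozen, retain_ids, layers)

    # Rerouting term: on harmful inputs, push the trained model's
    # representations away from (toward orthogonal to) the originals,
    # "short-circuiting" the states that lead to harmful outputs.
    reroute = F.relu(F.cosine_similarity(h_new, h_old, dim=-1)).mean()

    # Retain term: on benign inputs, keep representations close to the
    # originals so general capabilities are preserved.
    retain = (r_new - r_old).norm(dim=-1).mean()

    return reroute + alpha * retain
```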
#### And notably, their short-circuited models achieve significantly lower attack success rates while preserving performance on standard benchmarks. For example, their enhanced LLaMA model called Cygnet reduces harmful outputs by nearly 100X even under powerful attacks.
#### That's a remarkable advancement in making AI systems safer and more reliable. The authors also extend the short-circuiting approach to AI agents, showing considerable reductions in harmful actions when under attack.
#### Indeed, this paper presents a major step forward in developing robust safeguards against adversarial attacks and misuse of AI systems. By making models intrinsically safer, it opens up exciting possibilities for deploying AI more reliably in the real world.
#### Well, those were two highly impactful papers. It's exciting to see the rapid progress in making AI systems more emotionally intelligent and robust. Thanks for the insightful discussion, Carsten!
#### It was a pleasure, Sigurd! Looking forward to more such stimulating conversations on the latest AI breakthroughs. Subscribe to the podcast to never miss an episode. See you next time.