Researchers from the University of North Carolina, Chapel Hill, recently published pre-print research outlining the difficulties in deleting information from LLMs and developing mitigation methods to prevent adversarial prompting attacks.
A trio of scientists from the University of North Carolina, Chapel Hill recently published pre-print artificial intelligence research showcasing how difficult it is to remove sensitive data from large language models such as OpenAI’s ChatGPT and Google’s Bard.
Once a model is trained, its creators cannot, for example, go back into the database and delete specific files in order to prohibit the model from outputting related results. Essentially, all the information a model is trained on exists somewhere inside its weights and parameters, where it’s undefinable without actually generating outputs. This is the “black box” of AI.
Here, we see that despite being “deleted” from a model’s weights, the word “Spain” can still be conjured using reworded prompts.
However, as the UNC researchers point out, reinforcement learning from human feedback (RLHF) relies on humans finding all the flaws a model might exhibit and, even when successful, it still doesn’t “delete” the information from the model.
“A possibly deeper shortcoming of RLHF is that a model may still know the sensitive information.”