Making models more resistant to prompt injection and other adversarial "jailbreaking" measures is "an area of active research," says Michael Sellitto, interim head of policy and societal impacts at Anthropic. "We are experimenting with ways to strengthen base model guardrails to make them more harmless, while also investigating additional layers of defense."
ChatGPT and its brethren are built atop large language models: enormously large neural network algorithms geared toward using language, which have been fed vast amounts of human text and which predict the characters that should follow a given input string.
These algorithms are very good at making such predictions, which makes them adept at generating output that seems to tap into real intelligence and knowledge. But these language models are also prone to fabricating information, repeating social biases, and producing strange responses as answers prove more difficult to predict.
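For a concrete sense of that prediction step, here is a minimal sketch that uses the small, open source GPT-2 model via the Hugging Face transformers library as a stand-in; the proprietary chatbots discussed in this story work on the same principle, only at far larger scale and over tokens rather than individual characters.

```python
# Minimal sketch of next-token prediction, the core operation behind chatbots.
# GPT-2 is used here purely as a small, freely available stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The best way to learn a new language is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token at every position

# The model's guess for what comes next is the highest-scoring token
# at the last position of the prompt.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```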
Adversarial attacks exploit the way that machine learning picks up on patterns in data to produce aberrant behaviors. Imperceptible changes to images can, for instance, cause image classifiers to misidentify an object, or make speech recognition systems respond to inaudible messages.
Developing such an attack typically involves looking at how a model responds to a given input and then tweaking it until a problematic prompt is discovered. In one well-known experiment, from 2018, researchers added stickers to stop signs to bamboozle a computer vision system similar to the ones used in many vehicle safety systems. There are ways to protect machine learning algorithms from such attacks, by giving the models additional training, but these methods do not eliminate the possibility of further attacks.
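To make that trial-and-error loop concrete, the sketch below shows one classic recipe, the fast gradient sign method, which nudges every pixel of an image slightly in the direction that increases a classifier's error; the model and image here are toy stand-ins, and the CMU chatbot attack follows the same spirit while searching over text tokens rather than pixels.

```python
# Sketch of the fast gradient sign method (FGSM), a simple adversarial attack
# on an image classifier. The classifier and image below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.01):
    """Return a copy of `image` perturbed to increase the model's error."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step each pixel a tiny amount in the direction that raises the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()

# Toy demo: a stand-in classifier and a random "image". A real attack would
# target a trained vision model and repeat the step until the predicted label flips.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([0])
adversarial = fgsm_attack(model, image, label)
print((adversarial - image).abs().max())  # the perturbation stays imperceptibly small
```

The additional training mentioned above, often called adversarial training, essentially folds perturbed examples like these back into the training data so the model learns to handle them, though as the researchers note it does not rule out new attacks.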
Armando Solar-Lezama, a professor in MIT's College of Computing, says it makes sense that adversarial attacks exist in language models, given that they affect many other machine learning models. But he says it is "extremely surprising" that an attack developed on a generic open source model should work so well on several different proprietary systems.
Solar-Lezama says the issue may be that all large language models are trained on similar corpora of text data, much of it downloaded from the same websites. "I think a lot of it has to do with the fact that there's only so much data out there in the world," he says. He adds that the main method used to fine-tune models to get them to behave, which involves having human testers provide feedback, may not, in fact, adjust their behavior that much.
Solar-Lezama adds that the CMU study highlights the importance of open source models to the open study of AI systems and their weaknesses. In March, a powerful language model developed by Meta was leaked, and the model has since been put to many uses by outside researchers.
The outputs produced by the CMU researchers are fairly generic and do not seem harmful. But companies are rushing to use large models and chatbots in many ways. Matt Fredrikson, another associate professor at CMU involved with the study, says that a bot capable of taking actions on the web, like booking a flight or communicating with a contact, could perhaps be goaded into doing something harmful in the future with an adversarial attack.
To some AI researchers, the attack primarily points to the importance of accepting that language models and chatbots will be misused. "Keeping AI capabilities out of the hands of bad actors is a horse that's already fled the barn," says Arvind Narayanan, a computer science professor at Princeton University.
Narayanan says he hopes that the CMU work will nudge those who work on AI safety to focus less on trying to align models themselves and more on trying to protect systems that are likely to come under attack, such as social networks that are likely to experience a rise in AI-generated disinformation.
Solar-Lezama of MIT says the work is also a reminder to those who are giddy with the potential of ChatGPT and similar AI programs. "Any decision that is important should not be made by a [language] model on its own," he says. "In a way, it's just common sense."
See more here:
A New Attack Impacts ChatGPT and No One Knows How to Stop It - WIRED