In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
When sensing defeat in a match against a skilled chess bot, advanced models sometimes hack their opponent, a study found.
From October 2009 to October 2024, ransomware and hacking increasingly drove healthcare data breaches, a May 14 study published in JAMA Network Open found. The study examined ransomware attacks ...
In a discovery that could reshape how the tech world thinks about AI security, a new study by Anthropic has revealed a surprisingly simple method for compromising large language models (LLMs).
Academic study finds 25 attack methods in major cloud password managers, exposing vault, recovery, and encryption design risks.