In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
When sensing defeat in a match against a skilled chess bot, advanced models sometimes hack their opponent, a study found.
From October 2009 to October 2024, ransomware and hacking increasingly drove healthcare data breaches, a May 14 study published in JAMA Network Open found. The study examined ransomware attacks ...
In a discovery that could reshape how the tech world thinks about AI security, a new study by Anthropic has revealed a surprisingly simple method for compromising large language models (LLMs).
Academic study finds 25 attack methods in major cloud password managers exposing vault, recovery, and encryption design risks.