An organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, Metr, suggests that it wasn’t given much time to test one of the company’s highly capable new releases, o3. In a blog post published Wednesday, Metr writes that one red teaming benchmark of o3 was “conducted […]