OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied
A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices.
When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on
OpenAI’s new reasoning AI models hallucinate more
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.
Hallucinations have proven to be one of the biggest and most
ChatGPT is referring to users by their names unprompted, and some find it ‘creepy’
Some ChatGPT users have noticed a strange phenomenon recently: Occasionally, the chatbot refers to them by name as it reasons through problems. That wasn’t the default behavior previously, and several users claim ChatGPT is mentioning their names despite never having been told what to call th
ChatGPT will now use its ‘memory’ to personalize web searches
OpenAI is upgrading ChatGPT’s “memory” again.
In a changelog and support pages on OpenAI’s website Thursday, the company quietly announced “Memory with Search,” a feature that lets ChatGPT draw on memories — details from past conversations, such as your favori
Google’s latest AI model report lacks key safety details, experts say
On Thursday, weeks after launching its most powerful AI model yet, Gemini 2.5 Pro, Google published a technical report showing the results of its internal safety evaluations. However, experts say the report is light on details, making it difficult to determine which risks the model might pose.
The latest viral ChatGPT trend is doing ‘reverse location search’ from photos
There’s a somewhat concerning new trend going viral: People are using ChatGPT to figure out the location shown in pictures.
This week, OpenAI released its newest AI models, o3 and o4-mini, both of which can uniquely “reason” through uploaded images. In practice, the models can cr
xAI adds a ‘memory’ feature to Grok
Elon Musk’s AI company, xAI, is slowly bringing its Grok chatbot to parity with top rivals like ChatGPT and Google’s Gemini.
On Wednesday night, xAI announced a “memory” feature for Grok that enables the bot to remember details from past conversations with a user. Now if yo
OpenAI’s latest AI models have a new safeguard to prevent biorisks
OpenAI says that it deployed a new system to monitor its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. The system aims to prevent the models from offering advice that could instruct someone on carrying out potentially harmful attacks, according
OpenAI partner says it had relatively little time to test the company’s o3 AI model
Metr, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, suggests that it wasn’t given much time to test one of the company’s highly capable new releases, o3.
In a blog post published Wednesday, Metr writes that one
OpenAI debuts Codex CLI, an open source coding tool for terminals
In a bid to inject AI into more of the programming process, OpenAI is launching Codex CLI, a coding “agent” designed to run locally from terminal software.
Announced on Wednesday alongside OpenAI’s newest AI models, o3 and o4-mini, Codex CLI links OpenAI’s models with local
OpenAI launches a pair of AI reasoning models, o3 and o4-mini
OpenAI announced on Wednesday the launch of o3 and o4-mini, new AI reasoning models designed to pause and work through questions before responding.
The company calls o3 its most advanced reasoning model ever, outperforming the company’s previous models on tests measuring math, coding, reason
Microsoft researchers say they’ve developed a hyper-efficient AI model that can run on CPUs
Microsoft researchers claim they’ve developed the largest-scale 1-bit AI model, also known as a “bitnet,” to date. Called BitNet b1.58 2B4T, it’s openly available under an MIT license and can run on CPUs, including Apple’s M2.
Bitnets are essentially compressed models
Capsule captures $12M to build the next version of its AI video editor for brands
Capsule is upgrading its AI-powered video editing assistant for marketing, sales, and media teams following the close of a $12 million round of Series A funding, the company announced on Wednesday.
The upgraded editor will include new features like AI suggestions and support for real-time collabor
A dev built a test to see how AI chatbots respond to controversial topics
A pseudonymous developer has created what they’re calling a “free speech eval,” SpeechMap, for the AI models powering chatbots like OpenAI’s ChatGPT and X’s Grok. The goal is to compare how different models treat sensitive and controversial subjects, the developer told