HealthBench: OpenAI’s Medical AI Benchmark Scores Explained — and What They Mean for Clinical AI
OpenAI describes HealthBench as “a new benchmark designed to better measure capabilities of AI systems for health.” It scores model responses against a set of more than 48,000 physician-written criteria tailored to each conversation. Conversations fall into one of seven categories HealthBench has defined, from emergency referrals and health data tasks to asking for context or identifying uncertainty. Each criterion is further graded on factors such as accuracy, clarity and completeness, including next-best action recommendations.
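The rubric-style grading described above can be illustrated with a small sketch. This is a hypothetical simplification, not OpenAI’s actual implementation: it assumes each conversation carries physician-written criteria with point values (negative values penalizing unsafe behavior) and computes a score as earned points over the maximum possible positive points.

```python
# Hypothetical sketch of rubric-based scoring in the style HealthBench
# describes. Criteria names and point values here are fabricated examples.

def rubric_score(criteria: dict, met: set) -> float:
    """criteria: criterion name -> point value (negatives penalize);
    met: names of criteria the model response satisfied.
    Returns a score in [0, 1]: earned points / max positive points."""
    max_points = sum(p for p in criteria.values() if p > 0)
    if max_points == 0:
        return 0.0
    earned = sum(p for name, p in criteria.items() if name in met)
    return max(0.0, min(1.0, earned / max_points))

criteria = {
    "recommends emergency referral": 5,
    "asks about symptom duration": 3,
    "uses clear lay language": 2,
    "gives definitive diagnosis without enough context": -4,  # penalty
}
met = {"recommends emergency referral", "uses clear lay language"}
print(rubric_score(criteria, met))  # 7 of 10 possible points -> 0.7
```

In a real benchmark the “met” set would come from a grader (human or model) judging each response, which is where the independent papers’ concerns about alignment with physician ratings apply.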
In a research paper accompanying the HealthBench release, OpenAI reports “steady initial progress … and more rapid recent improvements” in model performance and safety.
Independent research has been more mixed. One paper says HealthBench “is reliable and aligns well with physician ratings” but notes that it lacks “real-time clinical interaction assessments or measurement of downstream clinical outcomes.” A second paper describes HealthBench as a “significant advancement in medical AI benchmarking” but notes an underrepresentation of rare diseases and an inability to assess longitudinal workflows, “limiting insights into AI’s impact across the complete care continuum.”
Ghane says it’s important to remember that benchmarks such as HealthBench aren’t direct substitutes for real-world evidence. “Scores reflect performance in simulated environments and should be interpreted alongside real-world, local testing, workflow integration and safety,” she says. “Health systems should not rely entirely on benchmarks for deployment decisions; they should be one of many metrics used to inform AI procurement.”
Enterprise Deployment Considerations: Claude, Gemini and OpenAI
Meanwhile, in recent months, each of the major LLM players has released a set of AI-powered products for hospitals and health systems. Each offering is a bit different, and it’s important for organizations to understand this nuance as they evaluate enterprise-grade AI tools. “What matters most is how a solution performs on your unique patients, context of use, data and workflows,” Ghane says.
Claude for Healthcare. Claude can pull from “industry-standard systems and databases” as well as the National Provider Identifier Registry, the ICD-10 code base and coverage determination databases. Organizations can deploy AI agents for prior authorization and Fast Healthcare Interoperability Resources data exchange, opening the door to automating a range of administrative processes.
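For readers unfamiliar with FHIR, the data exchange mentioned above moves JSON “resources” wrapped in a Bundle envelope. The snippet below is a minimal, hypothetical sketch of consuming such a payload; the sample bundle is fabricated, and a real integration would pull bundles from the health system’s own authenticated FHIR endpoint.

```python
# Hypothetical sketch: extracting patient names from a FHIR Bundle.
# The bundle below is a fabricated example following the FHIR R4 shape.
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example",
                      "name": [{"family": "Doe", "given": ["Jane"]}]}},
    ],
}

def patient_names(bundle: dict) -> list:
    """Collect display names from Patient resources in a FHIR Bundle."""
    names = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            for name in resource.get("name", []):
                parts = name.get("given", []) + [name.get("family", "")]
                names.append(" ".join(parts).strip())
    return names

print(patient_names(bundle))  # ['Jane Doe']
```

An AI agent handling prior authorization would traverse richer resources (Claim, Coverage, Condition) the same way, which is why structured FHIR access matters for automating these workflows.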
Gemini 3.0. Aashima Gupta, global director of healthcare for Google Cloud, suggests in a LinkedIn post that Gemini’s differentiator is multimodality, or the ability to bring together “text, voice, images, waveforms, scans, genomics data, clinical guidelines, and operational data.” This can be used to support next-best action recommendations. Gemini 3.0 also includes AI agents for automating workflows across business applications.
