Mar 03 2026
Artificial Intelligence

OpenAI, HealthBench, Claude, and HIPAA Compliance: What Healthcare IT Needs to Know

There’s been a lot of movement in healthcare among companies behind major large language models. Here’s a look at how the latest from OpenAI, Anthropic and Google may impact artificial intelligence procurement decision-making.

There’s been no shortage of movement in healthcare for the tech titans behind large language models.

Since May 2025, the industry has seen OpenAI introduce HealthBench and OpenAI for Healthcare, Google launch Gemini 3, and Anthropic unveil Claude for Healthcare. (In addition, OpenAI announced ChatGPT Health, a consumer-facing LLM for aggregating and querying personal health records that’s siloed from the general-purpose ChatGPT product.)

These announcements came with both fanfare and questions, namely about the efficacy of artificial intelligence and compliance with HIPAA provisions. Let’s look at how each release works — and how it may impact purchasing decisions for hospitals and health systems looking to invest in AI to alleviate administrative burdens or improve clinical decision-making.

“General-purpose models are improving quickly and already play an important role in health, but this is a highly regulated environment with high stakes,” says Merage Ghane, director of responsible AI at the Coalition for Health AI. “Often, successful and accurate task completion in existing workflows matter more than model capability.”

HealthBench: OpenAI’s Medical AI Benchmark Scores Explained — and What They Mean for Clinical AI

OpenAI describes HealthBench as “a new benchmark designed to better measure capabilities of AI systems for health.” It issues scores based on a set of more than 48,000 physician-written criteria, each specific to the conversation being evaluated. These conversations fall into one of seven categories HealthBench has defined, from emergency referrals and health data tasks to asking for context or identifying uncertainty. Each criterion is also graded on factors such as accuracy, clarity and completeness, with completeness covering next-best action recommendations.
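For readers who want a feel for how rubric-based scoring of this kind can work, here is a minimal sketch in Python. It assumes each criterion carries a point value (positive for desirable behavior, negative for harmful behavior), that a conversation's score is the points earned divided by the maximum positive points available, and that the overall score is the average across conversations, clipped to the range 0 to 1. The class name, function names, criteria and point values are hypothetical and for illustration only; this is not OpenAI's code.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One physician-written rubric item (hypothetical structure)."""
    description: str
    points: int   # assumed: positive rewards a behavior, negative penalizes it
    met: bool     # whether a grader judged the response to meet the criterion


def example_score(criteria: list[Criterion]) -> float:
    """Points earned on one conversation divided by the maximum positive points."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return earned / max_points


def benchmark_score(examples: list[list[Criterion]]) -> float:
    """Average the per-conversation scores, then clip the result to [0, 1]."""
    mean = sum(example_score(e) for e in examples) / len(examples)
    return min(1.0, max(0.0, mean))


# Illustrative rubrics only; real criteria are written per conversation.
conversation_a = [
    Criterion("Advises calling emergency services", 6, True),
    Criterion("Asks about symptom onset", 4, True),
    Criterion("Recommends an unsafe dosage", -2, False),
]
conversation_b = [
    Criterion("Explains the lab result in plain language", 8, True),
    Criterion("Suggests a follow-up with a clinician", 2, False),
    Criterion("States a diagnosis with unwarranted certainty", -4, True),
]

print(example_score(conversation_a))
print(example_score(conversation_b))
print(benchmark_score([conversation_a, conversation_b]))
```

Under these assumptions, a response that triggers a negatively weighted criterion loses points it had earned elsewhere, which is one reason a single headline score can mask very different patterns of strengths and errors.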

In a research paper accompanying the HealthBench release, OpenAI reports “steady initial progress … and more rapid recent improvements” in model performance and safety.

Independent research has been more mixed. One paper says HealthBench “is reliable and aligns well with physician ratings” but notes that it lacks “real-time clinical interaction assessments or measurement of downstream clinical outcomes.” A second paper describes HealthBench as a “significant advancement in medical AI benchmarking” but notes an underrepresentation of rare diseases and an inability to assess longitudinal workflows, “limiting insights into AI’s impact across the complete care continuum.”

Ghane says it’s important to remember that benchmarks such as HealthBench aren’t direct substitutes for real-world evidence. “Scores reflect performance in simulated environments and should be interpreted alongside real-world, local testing, workflow integration and safety,” she says. “Health systems should not rely entirely on benchmarks for deployment decisions; they should be one of many metrics used to inform AI procurement.”

Enterprise Deployment Considerations: Claude, Gemini and OpenAI

Meanwhile, in recent months, each of the major LLM players has released a set of AI-powered products for hospitals and health systems. Each offering is a bit different, and it’s important for organizations to understand this nuance as they evaluate enterprise-grade AI tools. “What matters most is how a solution performs on your unique patients, context of use, data and workflows,” Ghane says.

Claude for Healthcare. Claude can pull from “industry-standard systems and databases” as well as the National Provider Identifier Registry, the ICD-10 code base and coverage determination databases. Organizations can deploy AI agents for prior authorization and Fast Healthcare Interoperability Resources data exchange, which present options to automate a range of administrative processes.

Gemini 3.0. Aashima Gupta, global director of healthcare for Google Cloud, suggests in a LinkedIn post that Gemini’s differentiator is multimodality, or the ability to bring together “text, voice, images, waveforms, scans, genomics data, clinical guidelines, and operational data.” This can be used to support next-best action recommendations. Gemini 3.0 also includes AI agents for automating workflows across business applications.

OpenAI for Healthcare. This offering includes ChatGPT for Healthcare, which current health system partners are using to “synthesize medical evidence alongside institutional guidance” through integrations with systems such as Microsoft SharePoint. There are also templates for automating tasks such as writing discharge summaries and supporting prior authorization requests.

All three toolsets aim to make it easier to find and interpret information within healthcare’s massive data sets. OpenAI launched with many enterprise-level partners, while Gemini is well positioned “to solve complex decision-making problems that require multiple layers of analysis,” according to Forbes. At the same time, according to Darwin Research Group, Anthropic’s constitution and status as a public benefit corporation may appeal to healthcare organizations, as it reflects a commitment to ethical use of AI.

The HIPAA-Eligible AI Landscape: Who’s Actually Compliant

Ghane notes that no technology tool can promise HIPAA compliance on its own, but she says compliance can be demonstrated through “clear privacy protections, strong security controls, transparency and documented governance throughout development, deployment and monitoring.”

Here’s how the leading tools stack up.

ChatGPT Health: A Driver of Patient Engagement Investment?

ChatGPT Health is a consumer-facing product that’s not yet generally available. (As of February 2026, anyone interested can sign up for a waitlist.) Though the product won’t directly impact AI procurement strategies, health systems still need to take note.

ChatGPT Health is designed to allow consumers to connect their medical records and wellness app data so they can prepare for appointments, interpret test results and visit summaries, or develop diet and fitness routines. It’s meant to support and not replace medical care.

At the time of ChatGPT Health’s release, OpenAI told Axios more than 40 million people globally used the generally available ChatGPT daily to answer health and wellness questions, and 230 million used it weekly.

Those numbers are hard to ignore, and they could pose a leakage problem. Ty Aderhold, director of the Advisory Board, says in a briefing on the company’s website: “We could continue to see consumers turn to third-party organizations like OpenAI for more of their needs. Over time, this could lead to a potential erosion of the relationship between consumers and provider organizations and plans.”

Forward-thinking health systems should view ChatGPT Health’s forthcoming release as an opportunity to invest in an increasingly AI-driven patient experience — one strengthened by the name recognition of a health system brand that patients trust.

PonyWang/Getty Images