Medical AI startups like OpenEvidence are hot. But there might be more value ‘down the AI stack’

Among the most interesting AI stories this week was an item about a Boston-area startup called OpenEvidence that uses generative AI to answer clinical questions based on data from leading medical journals. The free-to-use app has proved enormously popular among doctors, with some surveys suggesting at least 40% of all U.S. physicians are using OpenEvidence to stay on top of the latest medical research and to ensure they are offering the most up-to-date treatments to patients. On the back of that kind of viral growth, OpenEvidence was able to raise $210 million in a venture capital deal in July that valued the company at $3.5 billion. OpenEvidence is also the same company that a few weeks back said its AI system was able to score 100% on the U.S. medical licensing exam. (See the “Eye on AI Numbers” section of the August 21st edition of this newsletter.) All of which may explain why, just a month later, the company is reportedly in talks for another venture deal that would almost double that valuation to $6 billion. (That’s according to a story in tech publication The Information, which cited three unnamed people it said had knowledge of the discussions.)

A lot of the use of OpenEvidence today would qualify as “shadow AI”—doctors are using it and finding value, but they aren’t necessarily admitting to their patients or employers that they are using it. They are also often using it outside enterprise-grade systems that are designed to provide higher levels of security, data privacy, and compliance, and to integrate seamlessly with other business systems.

Ultimately, that could be a problem, according to Andreas Cleve, the cofounder and CEO of Corti, a Danish medical AI company that is increasingly finding traction by offering healthcare institutions “AI infrastructure” designed specifically for medical use cases. (Full disclosure: Corti’s customers include Wolters Kluwer, a huge software company that markets a clinical evidence engine called UpToDate that competes with OpenEvidence.)

From medical assistants to ‘AI infrastructure’ for healthcare

AI infrastructure is a pivot for Corti, which was founded way back in 2013 and spent the first decade of its existence building its own speech recognition and language understanding systems for emergency services and hospitals. The company still markets its “Corti assistant” as a solution for healthcare systems that want an AI-powered clinical scribe that can operate well in noisy hospital environments and integrate with electronic health records. But Cleve told me in a recent conversation that the company doesn’t see its future in selling a front-end solution to doctors, but rather in selling key components in “the AI stack” to the companies that are offering front-end tools.

“We tried to be both a product vendor for healthcare and an infrastructure vendor, and that meant competing with all the other apps in healthcare, and it was, like, terrible,” he says. Instead, Corti has decided its real value lies in providing the “healthcare grade” backend on which AI applications, many of them produced by third parties, run. The backend Corti provides includes medical AI models—which others can wrap user-facing products around—as well as the platform on which AI agents for healthcare use cases can run. For instance, it has built an API called FactsR, an AI reasoning model designed to check the facts that medical notetaking scribes or clinical AI systems produce. It uses a lot of tokens, Cleve says, which would make it too expensive for general-purpose voice transcription. But because of how much is riding on clinical notes being accurate, it can be worth it for a vendor to pay for FactsR, Cleve says.
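To make the idea concrete, here is a minimal sketch of the general pattern FactsR represents: take the source conversation, break the generated note into individual claims, and flag any claim the source doesn’t appear to support. To be clear, this is not Corti’s actual API. The function names are hypothetical, and a crude word-overlap check stands in for the reasoning model so the example runs on its own.

```python
# Hypothetical illustration of the "fact-check the scribe" pattern described above.
# This is NOT Corti's FactsR API; all names here are invented for the sketch, and a
# simple lexical-overlap test stands in for the token-hungry reasoning model.
import re

def extract_claims(note: str) -> list[str]:
    """Split a generated clinical note into sentence-level claims."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]

def is_supported(claim: str, transcript: str, min_overlap: int = 2) -> bool:
    """Placeholder grounding check: a claim counts as supported if at least
    `min_overlap` of its content words also appear in the transcript."""
    claim_words = {w for w in re.findall(r"[a-z]+", claim.lower()) if len(w) > 3}
    source_words = set(re.findall(r"[a-z]+", transcript.lower()))
    return len(claim_words & source_words) >= min_overlap

def flag_unsupported_claims(transcript: str, note: str) -> list[str]:
    """Return the claims in the note that the transcript does not appear to back up."""
    return [c for c in extract_claims(note) if not is_supported(c, transcript)]

if __name__ == "__main__":
    transcript = ("Patient reports a dry cough for the past week. "
                  "No fever today. She takes lisinopril daily.")
    note = "One-week history of dry cough. Afebrile. Patient prescribed amoxicillin."
    for claim in flag_unsupported_claims(transcript, note):
        print("Needs review:", claim)
```

In this toy version, the invented medication gets flagged, but so does “Afebrile,” because a word-overlap check can’t connect it to “no fever.” That is exactly the kind of gap a dedicated, token-hungry reasoning model is meant to close.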

Another example: earlier this summer, Corti announced a partnership with Voicepoint, a speech recognition and digital transcription service used by doctors across Switzerland. Voicepoint will use Corti’s AI models to help with tasks such as summarizing conversations into medical notes and, possibly in the future, with diagnostic support. To do this, though, Corti had to set up dedicated AI infrastructure, including data centers located in Switzerland, to comply with strict Swiss data residency rules. Now, Corti is able to offer this same backbone infrastructure to other healthcare companies that want to deploy AI solutions in Switzerland. And Corti has similar AI infrastructure in place in countries like Germany that also have strict data residency and data privacy rules.

Cleve tells me that healthcare is increasingly part of the discussions around “sovereign AI.” This is particularly true in Europe, where many governments are worried about having their citizens’ medical information stored on the servers of U.S. companies, which might be subject to U.S. government pressure, legal or otherwise, to provide data access. “None of these things are doable today, because the majority of all the AI apps are running on OpenAI, Anthropic, or Gemini, and they are all American companies over which America asserts jurisdiction,” Cleve says.

But even within the U.S., strict cybersecurity and patient privacy requirements often mean that using an off-the-shelf, general-purpose AI system won’t cut it. “A lot of customers have requirements like, ‘Hey, we will never want to have data leave premises, or we will never share a tenant, or we will never co-encrypt with our consumer customer on the GPU rack, because we want to know where our data is because we have to prove that to legislators,’” he says.

It’s unlikely one medical AI model will rule them all

Cleve also tells me that he thinks the giant, general-purpose AI makers—the likes of OpenAI, Anthropic, and Google—are unlikely to conquer healthcare, despite the fact that they have been making moves to build models either fine-tuned or specifically trained to answer clinical questions. He says this is because healthcare isn’t a single vertical, but rather a collection of highly specialized niches, most of which are too narrow to be interesting to these tech behemoths. The note-taking needs of a GP in a relatively quiet office who needs to summarize a 10-minute consultation are quite different from those of a doctor working in the chaos and noise of a busy city ER, which are different again from those of a psychiatrist who needs to summarize not just a 10-minute consultation, but maybe an hour-long therapy session. As an example, Cleve says another Corti customer is a company in Germany that makes software just to help dentists automate billing based on audio transcripts of their sessions with patients. “They’re a vertical within a vertical,” he says. “But they are growing like 100% a year and have done so for several years. But they are super niche.”

It will be interesting to watch Corti going forward. Perhaps Cleve is correct that the AI stack is wide enough, deep enough, and varied enough to create opportunities for lots of different vertical and regional players. Or it could be that OpenAI, Microsoft, and Google devour everyone else. Time will tell.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

FORTUNE ON AI

British lawmakers accuse Google DeepMind of ‘breach of trust’ over delayed Gemini 2.5 Pro safety report—by Beatrice Nolan

How the AI data center boom is breathing new life into dirty, old coal plants—by Jordan Blum

Forget the golden age of fraud, the billionaire investor who shorted Enron warns we might be in the ‘diamond or platinum level’ amid the AI boom—by Sasha Rogelberg

Nvidia’s China-based rival posts 4,300% revenue jump as chipmaker’s earnings reported no H20 chip sales to the country—by Nino Paoli

EYE ON AI NEWS

Microsoft unveils first frontier-level language models built in-house. The company said it has begun publicly testing MAI-1-preview, its first large foundation AI model built fully in-house, as well as MAI-Voice, a fast voice generation model that is compact enough to run on a single GPU. The new models mark a significant step by Microsoft to reduce its reliance on OpenAI, despite remaining a major investor in the AI company. MAI-1-preview was trained with 15,000 Nvidia H100 chips, according to Microsoft, and will be rolled out for some Microsoft Copilot text tasks in the coming weeks. On LMArena, a public benchmark for LLMs, the model currently ranks below rival models from Anthropic, OpenAI, Google, and others. You can read more from CNBC here.

OpenAI says it will add parental controls and additional safeguards to ChatGPT. The company said it would, within the next month, allow parents to link their accounts to those of their teenage children, giving them more control over their kids’ interactions with ChatGPT. It also said it would soon institute better safeguards in general, actively screening interactions for signs of emotional distress on the part of users and routing those conversations to its GPT-5 Thinking model, which the company says does a better job of adhering to guardrails meant to prevent the model from encouraging self-harm or delusional behavior. The changes come after a high-profile lawsuit accused the company’s ChatGPT model of encouraging the suicide of a 16-year-old, as well as several other cases in which people allege ChatGPT encouraged suicide, self-harm, or violence. You can read more here from Axios.

Anthropic valued at $183 billion after $13 billion venture capital round. The AI company announced that it had raised a $13 billion Series F funding round led by the venture capital firm Iconiq and “co-led” by Fidelity and Lightspeed Venture Partners. A long list of other investors—including Goldman Sachs, BlackRock, Blackstone, Coatue, Jane Street, and T. Rowe Price—also participated. The company, which was previously valued at $61.5 billion in a March funding round, said it now serves more than 300,000 business customers and that its Claude Code coding tool was generating at least $42 million in revenue each month. Anthropic’s blog post announcing the funding round is here.

X.ai enters the battle for coders with new AI “agentic coding” model. The company debuted its first AI coding model, called grok-code-fast-1, which is supposed to be both “speedy and economical,” according to the company. It is being made available for free for a limited time, both on GitHub and through coding “wrappers” such as Cursor and Windsurf. The model is an indication that X.ai founder Elon Musk is serious about taking on OpenAI, Anthropic, and Google DeepMind across the entire range of AI applications, not just in the consumer chatbot space. It also shows just how intense the competition to capture market share among software developers is becoming. You can read more from Reuters here.

X.ai sues former engineer for allegedly taking trade secrets to OpenAI. Elon Musk’s AI startup x.AI has filed suit against former engineer Xuechen Li, alleging he stole trade secrets about its Grok chatbot and took them to his new job at OpenAI, Reuters reported. Li was not immediately available to respond to the allegations. The lawsuit follows other legal battles Musk has launched against OpenAI and Apple, accusing them of monopolistic practices and of straying from OpenAI’s original mission.

Anthropic says it will start training AI models on user chats unless users opt out. The move is a major shift in data privacy policy for the AI model maker, which also said it was extending the length of time it retains user data to five years. Users can opt out by September 28. Some users and data privacy experts criticized the decision, noting that the design of Anthropic’s “Accept” button could cause many to agree without noticing the toggle that controls data sharing. Others speculated that Anthropic was making the change because it is running out of other ways to obtain enough data to train models that can compete with OpenAI and Google. Enterprise users of Anthropic’s Claude models, as well as government, education, and API users, are not affected. Anthropic says it filters sensitive information and does not sell user data to third parties. You can read more from The Verge here.

EYE ON AI RESEARCH

More evidence emerges that AI may be leading to job losses. Last week in Eye on AI, I covered research from economists at Stanford University indicating that AI was leading to job losses, particularly among entry-level employees in professions highly exposed to AI. This week, more evidence emerged from another well-designed study, this one carried out by economists at the Federal Reserve Bank of St. Louis. Although this study did not look at whether younger and older workers were affected differently, it did examine the link between occupations that had adopted AI most intensively and job losses, and it found a distinct correlation. The impacts were greatest in occupations that use mathematics and computing intensively, such as software development, and much smaller in blue-collar work and in fields such as healthcare that are less prone to being automated with AI. You can read the St. Louis Fed study here.

Both the Stanford and St. Louis Fed research suggest that job losses from the implementation of AI are likely to be concentrated in some sectors, rather than economy-wide. That said, as good as these studies are, I still think both fail to disentangle the effects of AI from the possible effects of the unwinding of the tech hiring boom that took place during the COVID-19 pandemic. During the pandemic, many large companies bulked up their software development and IT departments. Major tech firms such as Google, Meta, and Microsoft hired tens of thousands of new employees, sometimes hiring people before there was even any work for them to do, just to prevent rivals from snapping up the same coders. Then, when the pandemic ended and it was clear that some ideas, such as Meta’s pivot to the metaverse, were not going to pan out, these same companies laid off tens of thousands of workers. I don’t think I’ve seen research yet that can separate the technology sector’s shedding of jobs created during the pandemic from the impact of AI. But I am sure someone is working on it, and when they crack that nut, we’ll definitely report it here.

AI CALENDAR

Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

Oct. 6-10: World AI Week, Amsterdam

Oct. 21-22: TedAI San Francisco.

Dec. 2-7: NeurIPS, San Diego

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

BRAIN FOOD

Preventing ‘parasocial’ chatbots. It’s increasingly clear that chatbots can encourage ‘parasocial’ relationships, in which a user develops a harmful emotional attachment to the chatbot or the chatbot encourages the user to engage in self-harm of some kind. The parents of several teenagers who took their own lives after conversations with chatbots are now suing AI companies, saying the companies did not do enough to prevent their chatbots from encouraging self-harm. And, short of suicide, there is mounting evidence of people developing other harmful dependencies on chatbots.

Well, a new benchmark from researchers at Hugging Face, called INTIMA (Interactions and Machine Attachment Benchmark), aims to evaluate LLMs on their “companionship-seeking” behavior. The benchmark looks at 31 distinct behaviors across four different categories of interaction, using 368 prompts. Testing Gemma-3, Phi-4, o3-mini, and Claude-4, the researchers found that the models more often reinforced companionship than maintained boundaries, though they varied: Claude, for instance, was more likely to resist personification, while Gemma tended to reinforce intimacy. You can read the Hugging Face paper here.

At the same time, researchers from Aligned AI, an Oxford startup I’ve covered before, published research showing that one LLM can be used to successfully screen the outputs of another LLM for parasocial behavior and then prompt that chatbot to steer the conversation in a less harmful direction. Aligned AI wanted to show that major AI model producers could implement such systems easily if they wished (but that they were too often choosing to “optimize for user engagement” instead). You can read more from Aligned AI’s blog here.


