Our current stance on AI at Cliniko

AI is an amazing technological advancement, and we’ve been testing and experimenting with using it in Cliniko for quite some time. But while it’s already very useful in many industries, we need to tread carefully in healthcare and only use it when we can do so accurately and safely.

Joel Friedlaender

An illustration of a robot experiencing hallucinations, complete with magic mushrooms

There’s a lot of hype around AI’s capabilities right now, and rightfully so. It’s almost magical in the right scenario. You’ve probably seen OpenAI’s ChatGPT do some amazing things, and maybe you’ve already found ways to use it for yourself too. But there’s currently far less attention on, and transparency around, the accuracy of the content these new tools produce. And that can be a problem.

What is AI?

For the purpose of this post, I’m talking specifically about generative AI, and by that I mean artificial intelligence that produces content. Even more specifically, I’m referring to Large Language Models (LLMs), which produce text. The other popular form of generative AI is image generation, with tools like Midjourney and DALL-E; however, these are not commonly used in the healthcare space yet.

OpenAI is probably the biggest and most recognisable AI company in the world, and is best known for its product, ChatGPT. ChatGPT is an LLM that can operate as a chatbot and is able to generate content in response to your questions and prompts. It can also be incorporated by businesses into their products.

So, what is an LLM? LLMs are very sophisticated systems built on a simple premise: they generate text in response to a prompt (your input/question to them), based on their training data. When you give an LLM a prompt, it produces a text-based response word by word, choosing the most statistically probable next word each time, based on its training data.
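
To make that “most probable next word” idea a little more concrete, here’s a deliberately simplified sketch in Python. The tiny lookup table and its probabilities are invented purely for illustration; a real LLM learns statistical relationships between billions of word fragments from its training data rather than having them written out by hand.

```python
# A toy illustration of next-word prediction. The "model" here is a hand-made
# lookup table of invented probabilities; real LLMs learn these relationships
# from enormous amounts of training data and work on word fragments (tokens).
toy_model = {
    ("the", "patient"): {"reported": 0.6, "was": 0.3, "sang": 0.1},
    ("patient", "reported"): {"pain": 0.7, "improvement": 0.3},
    ("reported", "pain"): {"in": 0.8, "relief": 0.2},
}

def next_word(context):
    """Pick the most statistically probable next word for the last two words."""
    options = toy_model.get(context, {})
    return max(options, key=options.get) if options else None

words = ["the", "patient"]
while True:
    word = next_word(tuple(words[-2:]))
    if word is None:
        break
    words.append(word)

print(" ".join(words))  # -> "the patient reported pain in"
```

The point isn’t the code itself; it’s that nothing in that loop “understands” the sentence being produced, which is why the output can read confidently while still being wrong.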

The results can feel quite magical because of the amount of training data that it’s had. Nobody controls the results that it gives—just the data that it’s trained on, and the way that it formulates an answer. Even developers of AI can be surprised by the way it behaves, and it can also be very difficult to have it behave how you want.

While the responses generated by ChatGPT are very human-like, the term “artificial intelligence” is still quite generous. An LLM doesn’t have the ability to reason; it doesn’t understand what you’re asking, or even what it’s telling you.

Concerns with using AI in healthcare

The answers you get from an LLM can be both surprisingly good and surprisingly bad, in both accuracy and comprehensiveness.

Interestingly, Google recently released AI Overviews into Search in the USA (which allows Google’s AI tool Gemini to answer your queries) and claims there’s a high rate of user satisfaction with the responses. However, research has found that “the average amount of search results where AI Overviews appear have dropped to just 11 percent of queries, down from around 27 percent when the tool launched publicly in mid-May.” This was before the AI Overview made headlines for recommending that people add glue to their pizza sauce to help the cheese stick, or eat rocks to get their daily mineral requirements.

Hallucinations

Probably the biggest worry with LLMs in healthcare is hallucinations, the term for when an LLM makes up content that isn’t accurate. This is a problem that plagues LLMs’ effectiveness, and there’s no solution in sight for it.

There was a notable example where a Canadian lawyer used ChatGPT to find cases for their legal research, and then cited them in court. Unfortunately, ChatGPT had completely fabricated those cases, and even invented web addresses to point to them.

As a test, I asked ChatGPT to make a Wikipedia page about me. While it did get some things right, as is often the case, it also used a fair bit of creative licence (made stuff up) and included this:

A screenshot from ChatGPT with false information about Joel Friedlaender's hobbies

I have never played guitar or done photography, and to say I explore the outdoors would be a stretch.

I also decided to ask ChatGPT-4o to write me a referral letter for a patient. I didn’t provide a lot of information, hoping for a very simple letter. This was my prompt:

A ChatGPT prompt for a referral letter from a physiotherapist to a doctor.

The resulting letter contains a lot of hallucinated information:

The letter generated by ChatGPT from a physiotherapist to a doctor, with added inaccurate information about their patient's condition

The hallucinations include:

  • made-up descriptions of the manual therapy
  • incorrect patient outcomes
  • false results of a physical examination
  • suggested results to look for in imaging that did not come from the prompt

Now, whilst it’s impressive that this letter was generated from such a simple prompt, the accuracy is extremely poor.

Not only are these hallucinations concerning and unavoidable, but the LLMs also don’t know when they’re doing it. One approach engineers have been exploring is to attach a confidence score to the output, to indicate when the LLM is less sure of the facts. However, this hasn’t proved to be a viable solution so far, as an LLM can’t reliably evaluate its own output.

Omissions

Whilst hallucinations are a significant risk with LLMs in healthcare, omissions of core information are possibly an even bigger concern, particularly when AI is used to record transcripts or summarise information. If you ask an LLM to summarise a long passage of text, it typically struggles to give all of it equal weight, and may leave out information from the middle while strongly favouring the early and later parts. You can even see this when providing AI with a long prompt: often the first and last parts of your prompt will be respected more than the middle.

If you’re using AI to automatically record a session and summarise the notes, or to generate a letter, it’s very strongly recommended that you thoroughly review the output you receive. In doing so, hopefully you’ll catch any hallucinations or misinformation, but it can be much harder to remember what was said and spot the omissions. It’s generally easier to review the content you’re given than to consider what isn’t there. What if important patient medical history is omitted? What kind of impact could that have?

Earlier this year, a joint study by Princeton University, UMass Amherst, Adobe, and the Allen Institute for AI put several different LLMs to the test by having them summarise recently published fiction books, and then analysed how faithful the generated summaries were to the original texts. The results were disappointing to say the least, particularly for the GPT-4 Turbo model (one of the most commonly used LLMs): close to 70% of the book summaries it generated had factual errors and 80% had omissions. Only 23% of the summaries by GPT-4 Turbo were deemed “well done”. In fact, the top score on this criterion achieved by any of the LLMs tested was just 50%.

In many cases this kind of accuracy can be good enough, considering the time-saving it brings, but we need to be very careful in the healthcare space. Anything less than 100% accuracy needs to be carefully considered, and appropriate steps put in place to mitigate the risks.

Legal compliance with AI in healthcare

The issue of compliance depends on your practice location and the tool you want to use. As an example, while OpenAI does not speak directly about the Australian Privacy Principles (APPs), they do acknowledge GDPR and say that in order to be compliant, you cannot send any sensitive information to their API. I’ve also spoken with the OpenAI privacy office, and they will not give a direct answer on the APPs. They do offer the option of signing a Business Associate Agreement for HIPAA compliance, but this is relevant only in the United States.

Whilst there’s no certainty right now, it is extremely questionable to send sensitive information to OpenAI. And even if it were compliant, considering there is a much more private option available, I’d consider it very irresponsible to entrust OpenAI with sensitive healthcare information. For example, it is possible to host OpenAI’s models in Microsoft Azure and create a contained version that can be used in a compliant way (Microsoft Azure is clear about its compliance with the Australian Privacy Principles, GDPR, HIPAA, and so on). So, you can use OpenAI’s tooling and be compliant, but sending sensitive data directly to OpenAI’s API, or using the free ChatGPT application, is not recommended.
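
For the technically inclined, the practical difference is mostly about where your request is sent. As a rough sketch only (the endpoint, key, and deployment names below are placeholders, and compliance ultimately depends on your Azure configuration and agreements rather than on the code), calling a model deployed in your own Azure resource looks something like this:

```python
from openai import AzureOpenAI

# Requests go to your own Azure OpenAI resource (covered by Microsoft's
# compliance commitments), not to OpenAI's public API.
client = AzureOpenAI(
    azure_endpoint="https://your-clinic-resource.openai.azure.com",  # placeholder
    api_key="YOUR_AZURE_KEY",       # a key for your Azure resource, not an OpenAI account key
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="your-gpt-4o-deployment",  # the deployment name you created in Azure
    messages=[{"role": "user", "content": "Draft a short, generic appointment reminder."}],
)
print(response.choices[0].message.content)
```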

ChatGPT is, of course, not the only LLM available, and in many scenarios, not the most effective either. Another way for systems to incorporate AI is through Amazon’s Bedrock platform on AWS, which gives access to other popular LLMs, such as Meta’s Llama or Anthropic’s Claude, in a way that is compliant.
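
Again, purely as an illustrative sketch (the region and model ID below are examples, and you’d still need the appropriate AWS agreements and configuration in place), accessing a model through Amazon Bedrock from Python might look something like this:

```python
import boto3

# The request stays within your own AWS account and chosen region.
bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-2")  # example region

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID from Bedrock's catalogue
    messages=[{"role": "user", "content": [{"text": "Draft a short, generic appointment reminder."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```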

When making the choice for your own business, you’ll likely be getting access to LLMs through the software tools that you use, and you need to do some basic due diligence to make sure that your provider handles sensitive information in a compliant way. They should be able to confirm for you categorically that their use of AI with healthcare information complies in your region.

Our experience with AI in Cliniko so far

AI can seem amazing at first glance, but when you dig deeper, the magic disappears fast. We’ve experienced it ourselves at Cliniko with features we’ve explored building. We know there’ll be appropriate places to apply AI to make your life easier (there’s even one going live shortly), but we’ll only do so when we’re sure that releasing it is doing right by you, not just serving as a marketing gimmick.

It might be that we need to work harder to make it reliable, or maybe we need to wait until the tools get better. But right now, despite having highly proficient developers with AI experience on our team, we’ve not achieved the reliability you’d expect from us, and we take the trust you put in us seriously. We are relied upon for our technical and business knowledge; we can’t be swept up in the hype and let you, your business, or your patients down.

Where we’re headed at Cliniko

Every day we are exploring ways that we can embed AI in Cliniko to make your life easier. We’re not giving up on it; it’s a space that is changing rapidly, and we’re sure AI can be an integral part of Cliniko in the future.

Specifically, we’re very aware that there’s a desire to use AI to save time through summarisation when creating notes and reports. We certainly want to bring this to you within Cliniko, but we won’t release something that could produce inaccurate results. Even though practitioners can and should check that the content of anything produced by AI is accurate, it still seems quite likely that many errors will manage to find their way through (at least as of right now). Nonetheless, you do have options to use AI within Cliniko; more on that later.

Our guiding principle for AI in Cliniko is to find places where its flaws are acceptable, or better still, where its strengths are utilised and its drawbacks don’t matter. There are going to be ways to apply AI that haven’t been seen yet; it’s still very early days. But we also know that a lot of great things can be done without AI, often with better results. While it can be tempting to only look down the AI path, we have a lot of development on the way offering good, old-fashioned, hand-made improvements to Cliniko that we’re sure you’ll benefit greatly from. And they’ll work every time, with 100% accuracy.

Where is AI headed?

AI is here to stay, and it will only get better from here. It’s already been adopted in the healthcare space by many, and that number will only rise. It’s important to understand its limitations, and have appropriate processes in place to compensate for them. This means taking advantage of AI with our eyes open, not just trusting the marketing pitch from companies that stand to make a lot of money from it.

Are there any AI tools that I can use in my clinic right now?

There are plenty of systems already available if you want to use AI in your clinic. Many have already connected with Cliniko and will work seamlessly for you. We are big believers in choice, and we don’t want anyone feeling restricted by our approach to AI.

If you’re looking for AI note-taking and summaries, and even report writing, there are Patient Notes, Heidi, and CliniScripts, to name a few that are connected with Cliniko and rapidly gaining popularity. An advantage of dedicated AI note-taking systems is that the teams behind them spend all their time working on this one function. We know that LLM efficacy can be improved with certain development techniques, and companies like these, with this as their sole focus, should be able to do a better job of it. Of course, you still need to do your due diligence in choosing a third-party application, and know that it cannot be 100% accurate or free of omissions in the current state of AI. So always carefully check any AI-generated note or report.

AI is clearly here to stay, and we are spending significant time working with it, experimenting with it, and evaluating it. However, we won’t rush to release AI-powered features just to meet the hype when they’re not ready or suitable. We are trusted by our customers to look after them and their businesses with technology, and we won’t jeopardise that to make a quick buck. You can be sure that any AI features we do release will have the best interests of you and your patients in mind.

Author information

Joel Friedlaender is the founder of Cliniko. He writes about productivity, team-work, and how we do things differently. Follow him on Twitter at @jfriedlaender.
