like low-hanging fruit that would be easy to sell to the burnt-out clinician with the promise of alleviating some of the burden of clinical documentation.5 This has led to an explosion of startups offering such applications, with additional features such as recommendation of International Classification of Diseases codes, patient instructions in simple language, and even some degree of clinical decision-making.

2. The challenges

The AI scribe offers a potential solution to a problem that seemed impossible to solve. However, this rapid adoption has not been devoid of challenges. Current popular large language models (LLMs) like those that power ChatGPT and Google’s Gemini are very good at general tasks, but their performance is suboptimal in domain-specific tasks. Thus, despite early excitement and some good feedback, it was soon realized that the level of diligence and accuracy needed in clinical documentation may be out of reach for these models.8 Some of the challenges are as follows:

2.1. Hallucinations and unfaithfulness

LLMs are known to hallucinate, meaning that they can “make up” information that may not be accurate.6,7 This is because, having been trained on a large amount of textual data, the model tries to “fill in the gaps” with generated text based on its training data. This can be very helpful in tasks where accuracy is not a major concern; in healthcare, however, it poses a significant risk of introducing inaccuracies into clinical documentation, which may compromise patient care.

2.2. Omission of information

The transcript of a clinical encounter may contain information that is not clinically relevant, such as small talk between the patient and the clinician. The LLM may decide to include that information or, conversely, decide not to include information that is clinically relevant, generating a note deficient in clinical information.

2.3. Note formatting inconsistencies

Even though there are generally accepted formats for clinical notes, each clinician has their own unique style of note-taking. Some may prefer to document problem-wise, while others prefer a system-wise organization; some like their notes in a descriptive format, while others prefer bullet points. LLMs can be prompted to draft a note in a certain format; however, their responses are not always consistent, potentially leading to frustration for clinicians who expect their notes to be laid out in their preferred format.
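One pragmatic mitigation is to embed the clinician’s preferred layout directly in the prompt. The snippet below is a minimal sketch of such a format-constraining prompt, assuming a simple SOAP-style template; the template text and the build_note_prompt helper are illustrative and not part of any particular AI scribe product.

```python
# Minimal sketch: pin a clinician's preferred note layout inside the prompt.
# The SOAP-style template and the helper name are illustrative assumptions.

PREFERRED_TEMPLATE = """\
Subjective:
- <bulleted summary of the patient's complaints>
Objective:
- <examination and investigation findings>
Assessment:
- <problem list, one problem per bullet>
Plan:
- <plan for each problem, in the same order as the assessment>"""

def build_note_prompt(transcript: str) -> str:
    """Combine the encounter transcript with an explicit formatting instruction."""
    return (
        "Draft a clinical note from the transcript below.\n"
        "Follow this template exactly; do not add, remove, or reorder sections:\n\n"
        f"{PREFERRED_TEMPLATE}\n\nTranscript:\n{transcript}"
    )
```

Even with such a template, outputs can still drift between runs, which is part of why the approaches in Section 3 remain relevant.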
2.4. Context window limitations

LLMs have a context window, which means they will take into consideration a certain amount of input textual data to craft a response for the user. If the length of the input data exceeds the context window, some information will likely be missed. In the context of AI scribes, if the encounter goes on for too long and there is a large amount of text in the input transcript, it is possible that the LLM misses information because of the narrow context window, leading to incomplete documentation.
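To make this failure mode concrete, a scribe pipeline can estimate the token count of a transcript before sending it to the model and flag encounters that approach the limit. The sketch below uses a rough characters-per-token ratio; the 128,000-token window, the ratio, and the overhead figure are assumptions for illustration, and a production system would count tokens with the model’s own tokenizer.

```python
# Minimal sketch: flag transcripts that may overflow an LLM's context window.
# The context size and the ~4 characters-per-token ratio are rough assumptions;
# a real pipeline would count tokens with the model's own tokenizer.

ASSUMED_CONTEXT_TOKENS = 128_000   # hypothetical model limit
CHARS_PER_TOKEN = 4                # coarse approximation for English text
PROMPT_OVERHEAD_TOKENS = 2_000     # room reserved for instructions and the reply

def transcript_fits(transcript: str) -> bool:
    """Return True if the estimated token count leaves room for the prompt and reply."""
    estimated_tokens = len(transcript) / CHARS_PER_TOKEN
    return estimated_tokens + PROMPT_OVERHEAD_TOKENS <= ASSUMED_CONTEXT_TOKENS

if __name__ == "__main__":
    long_visit = "Patient reports chest pain... " * 20_000
    if not transcript_fits(long_visit):
        print("Transcript may exceed the context window; consider chunking or summarizing.")
```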
2.5. Data security/Health Insurance Portability and Accountability Act (HIPAA) compliance

Many of the AI scribe applications utilize third-party LLMs through APIs, requiring data to be passed on to external servers. This poses a data security risk, as the organization loses control of the security and privacy of the data once it leaves its systems. In addition, if the organization that owns the AI scribe application does not implement HIPAA-compliant technologies for data transmission and storage, the confidentiality of patient data may be compromised.

3. The way forward

Even though AI scribes come with a unique set of challenges, their place in healthcare is undeniable. Therefore, a lot of work is being done to improve their performance. Some of the potential solutions are as follows:

3.1. Fine-tuning

LLMs, such as OpenAI’s Generative Pre-trained Transformer, can be fine-tuned for a specific task with the right data. In this process, the model is provided with sample input data and the expected response. The model then learns from this data and modifies its future output to match the desired output. Fine-tuning is relatively easy to implement and may improve performance.
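As a concrete illustration of “sample input data and the expected response,” the sketch below prepares de-identified transcript-to-note pairs as a JSONL file in the chat-message layout used by OpenAI’s fine-tuning service; the example pairs, file name, and system prompt are assumptions made for illustration, and other providers expect different formats.

```python
# Minimal sketch: prepare transcript -> note pairs for fine-tuning a chat model.
# The example data, file name, and model choice are illustrative assumptions;
# the JSONL "messages" layout follows OpenAI's chat fine-tuning format.
import json

training_pairs = [
    {
        "transcript": "Doctor: What brings you in today? Patient: I've had a dry cough for two weeks...",
        "note": "HPI: 2-week history of dry cough, no fever.\nAssessment: likely post-viral cough.\nPlan: supportive care, return if worse.",
    },
    # ...more de-identified transcript/note pairs would go here
]

with open("scribe_finetune.jsonl", "w", encoding="utf-8") as f:
    for pair in training_pairs:
        example = {
            "messages": [
                {"role": "system", "content": "You are an AI scribe. Draft a clinical note from the transcript."},
                {"role": "user", "content": pair["transcript"]},
                {"role": "assistant", "content": pair["note"]},
            ]
        }
        f.write(json.dumps(example) + "\n")

# The resulting file would then be uploaded and a fine-tuning job created through the
# provider's API, yielding a model whose output is nudged toward the desired note style.
```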
3.2. Selective information extraction

One potential solution to improve accuracy could be labeling information in the transcript based on its relevance, then omitting information that is labeled as not clinically relevant and including the relevant information. This could provide the model with data that is relevant and concise, reducing the amount of data provided as input, fitting it within the context window, and reducing computation time.
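One way to realize this is to label each utterance with a lightweight relevance classifier before building the prompt, keeping only the clinically relevant lines. The sketch below stands in for that classifier with a simple keyword heuristic; the keyword list and the example transcript are illustrative assumptions, and in practice a trained classifier or a small labeling model would perform this step.

```python
# Minimal sketch: label each utterance as clinically relevant or not, then keep
# only the relevant ones as LLM input. The keyword heuristic is an illustrative
# stand-in for a trained relevance classifier.

CLINICAL_KEYWORDS = {"pain", "cough", "fever", "medication", "allergy", "blood pressure"}

def is_clinically_relevant(utterance: str) -> bool:
    """Crude relevance label: does the utterance mention any clinical keyword?"""
    text = utterance.lower()
    return any(keyword in text for keyword in CLINICAL_KEYWORDS)

def filter_transcript(utterances: list[str]) -> list[str]:
    """Drop utterances labeled as not clinically relevant before prompting the model."""
    return [u for u in utterances if is_clinically_relevant(u)]

transcript = [
    "Doctor: How was your weekend?",
    "Patient: Lovely, thanks, we went hiking.",
    "Patient: The cough started right after that, and I've had some fever at night.",
    "Doctor: Are you taking any medication for it?",
]
print(filter_transcript(transcript))  # keeps only the last two utterances
```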
3.3. Domain-specific models

Models trained on curated medical data and designed specifically for the task of clinical note generation from