The use of patient messaging has increased steadily year over year, including a 157% surge during the COVID-19 pandemic, and volumes remained elevated as the pandemic waned.1 In the 2023 report on burnout from the American Urological Association, responsibilities within the EHR, such as managing patient portal messages, were a primary or secondary source of dissatisfaction for 43.7% of urologists.2 Burnout negatively affects patient care quality and service, potentially leading to reduced patient satisfaction and poorer health outcomes. Furthermore, responding to these messages is often a cost center that falls outside of reimbursement models: fewer than 1% of patient messages were billed in a recent analysis of traditional Medicare claims.3
Our recent study, “Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians,” explored using large language models (LLMs) to help with patient inquiries. Responding to messages is repetitive and time-consuming. Chatbots, a common LLM implementation, excel at generating clear, comprehensive responses to routine questions. Chatbots in our study gave broader answers than the urologists, offering additional information that patients found valuable. An example of one exchange is provided:

Chatbot responses had higher word counts, which likely contributed to their higher comprehensiveness scores compared with physicians in our study. However, it is often impractical for clinicians to type out such lengthy responses. Notably, while patients preferred detailed messages, they tended to attribute the messages they preferred to human authors and the less preferable ones to chatbots, suggesting a bias against AI-generated communications. This underscores the need for physician involvement to maintain trust and address patient concerns effectively.
While LLMs are adept at generating responses, their output is the result of pattern matching over a body of training data, often from undefined or undisclosed sources. Reverse engineering the decision-making process of LLMs is an active field of research, but it is generally accepted that LLMs lack the capacity for the genuine logical reasoning that physicians employ. A recent paper from Apple's Machine Learning Research group highlighted the difficulty LLMs have with formal reasoning tasks, particularly in mathematics and complex problem-solving.4 The authors found that altering input phrasing or adding a single piece of irrelevant information could cause significant performance drops, up to a 65% decrease in their study. This limitation underscores that physicians bring indispensable reasoning skills to patient care, skills that AI tools currently cannot replicate.
As we gain insight into these limitations, we believe physician oversight is essential for responsible AI integration in clinical practice. Patients are already turning to AI tools for medical information, and it's crucial for healthcare providers to engage with emerging technologies.
Integrating chatbots into practice can offer patients detailed messages without increasing clinicians' workload, potentially enhancing patient satisfaction and reducing staff burnout. Implementing a "human-in-the-loop" approach ensures critical issues are appropriately identified and addressed, a safeguard that should not be entrusted solely to AI. A balanced approach combines the use of AI with physician involvement in patient communication, respecting patients' preference for comprehensive responses while ensuring the patient-physician relationship remains integral to providing care.
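The "human-in-the-loop" safeguard described above can be thought of as a simple gating step: the chatbot only drafts, and nothing reaches the patient without explicit physician approval. The sketch below illustrates that workflow in Python; all names and the canned draft are hypothetical placeholders, not part of our study's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftReply:
    patient_message: str
    draft: str
    approved: bool = False  # no message is sent until a physician flips this


def draft_reply(patient_message: str) -> DraftReply:
    """Placeholder for the LLM step that drafts a patient-facing reply.

    A real deployment would call a model API here; this canned response
    simply lets the workflow be shown end to end."""
    return DraftReply(
        patient_message=patient_message,
        draft=f"Thank you for your message about: {patient_message}",
    )


def physician_review(
    reply: DraftReply, approve: bool, edited_text: Optional[str] = None
) -> DraftReply:
    """The human-in-the-loop step: a clinician reviews, may edit, and
    must explicitly approve the draft before it can be sent."""
    if edited_text is not None:
        reply.draft = edited_text
    reply.approved = approve
    return reply


def send_to_patient(reply: DraftReply) -> str:
    """Gate: refuse to send anything a physician has not signed off on."""
    if not reply.approved:
        raise PermissionError("Draft has not been approved by a physician.")
    return reply.draft
```

The key design point is that approval is enforced at the send step, not merely requested earlier in the flow, so an unreviewed draft cannot slip through to the patient.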
Written by: Eric J. Robinson,1 Chunyuan Qiu,2 Stuart Sands,3 Mohammad Khan,4 Shivang Vora,5 Kenichiro Oshima,6 Khang Nguyen,7 L. Andrew DiFronzo,8 David Rhew,9 Mark I. Feng10
1. Department of Urology, Los Angeles Medical Center, Kaiser Permanente, Los Angeles, CA, USA.
2. Department of Anesthesiology, Baldwin Park Medical Center, Kaiser Permanente, Baldwin Park, CA, USA.
3. Kaiser Permanente, Pleasanton, CA, USA.
4. Microsoft Health & Life Sciences, Irvine, CA, USA.
5. Microsoft Health & Life Sciences, Dallas, TX, USA.
6. Kaiser Permanente, Oakland, CA, USA.
7. Department of Family Medicine, Kaiser Permanente, Pasadena, CA, USA.
8. Kaiser Permanente, Pasadena, CA, USA.
9. Microsoft Health & Life Sciences, Redmond, WA, USA.
10. Department of Urology, Baldwin Park Medical Center, Kaiser Permanente, 1011 Baldwin Park Blvd., Baldwin Park, CA, USA.
1. Holmgren AJ, Downing NL, Tang M, Sharp C, Longhurst C, Huckman RS. Assessing the impact of the COVID-19 pandemic on clinician ambulatory electronic health record use. J Am Med Inform Assoc. 2022;29(3):453-460. doi:10.1093/jamia/ocab268
2. Harris AM, Teplitsky S, Kraft KH, Fang R, Meeks W, North A. Burnout: a call to action from the AUA Workforce Workgroup. J Urol. 2023;209(3):573-579. doi:10.1097/JU.0000000000003108
3. Liu T, Zhu Z, Holmgren AJ, Ellimoottil C. National trends in billing patient portal messages as e-visit services in traditional Medicare. Health Aff Sch. 2024;2(4). doi:10.1093/haschl/qxae040
4. Mirzadeh I, Alizadeh K, Shahrokhi H, Tuzel O, Bengio S, Farajtabar M. GSM-Symbolic: understanding the limitations of mathematical reasoning in large language models. Preprint. Published online October 7, 2024.