Med-LLM

Curacel 2024

Project Overview

Med-LLM is a specialized healthcare AI platform that extracts diagnoses from clinical documentation and assigns the corresponding ICD-10 codes. It combines a fine-tuned language model with a Retrieval Augmented Generation (RAG) architecture to handle medical terminology with high precision. It supports both English and Arabic text and two ICD coding standards: ICD-10-CM and ICD-10-AM.

As one of two engineers on this project, I focused on the LLM fine-tuning components and on cross-lingual support. The goal was to bridge the gap between raw clinical documentation and standardized medical coding, so that healthcare information could be processed more efficiently and accurately.

Challenges & Solutions

Medical Knowledge Integration and ICD Code Accuracy

Ensuring that the system correctly identified medical conditions and matched them to the corresponding ICD codes required integrating specialized medical knowledge into the model.

Solution: I implemented a fine-tuning framework that combines domain-specific training with retrieval-augmented generation, so the model can consult official ICD-10 documentation when making coding decisions. Pairing learned medical knowledge with context-aware retrieval from authoritative sources significantly improved coding accuracy.
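To illustrate the retrieval step, here is a minimal sketch of how a clinical note can be grounded in reference text before it reaches the model. This is not the project's implementation. The three ICD-10-CM entries are an illustrative sample, and simple word-count cosine similarity stands in for the vector embeddings and index a production RAG system would use. `retrieve` and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter

# Illustrative sample of ICD-10-CM entries; a real system indexes the full official code set.
ICD10_ENTRIES = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J18.9": "Pneumonia, unspecified organism",
}


def tokenize(text: str) -> Counter:
    # Lowercased word counts; a stand-in for a learned embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(note: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (code, description) pairs most similar to the note."""
    query = tokenize(note)
    ranked = sorted(
        ICD10_ENTRIES.items(),
        key=lambda item: cosine(query, tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(note: str, k: int = 2) -> str:
    """Put the retrieved reference entries in front of the note so the model codes against them."""
    context = "\n".join(f"{code}: {desc}" for code, desc in retrieve(note, k))
    return (
        "Using only the ICD-10 reference entries below, assign codes to the diagnoses.\n"
        f"Reference:\n{context}\n\nClinical note:\n{note}\nCodes:"
    )
```

For a note such as "Patient with type 2 diabetes, no complications noted.", `retrieve` ranks E11.9 first. The model then sees the official description next to the note instead of relying only on what it memorized during fine-tuning.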

Arabic Language Support Implementation

Extending the system to handle Arabic medical documentation presented complex NLP challenges due to the unique characteristics of Arabic script and medical terminology.

Solution: I developed Arabic-specific text processing workflows covering language detection, correct right-to-left rendering, and bilingual formatting of diagnoses. I also fine-tuned the language model so that it correctly interprets Arabic medical terminology and keeps its contextual understanding consistent across both languages.
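The sketch below shows two of these steps in a minimal form. It assumes script-based detection is sufficient: language is inferred from the share of letters that fall in the Unicode Arabic blocks. For bilingual lines, the Arabic part is wrapped in the Unicode directional isolates RLI (U+2067) and PDI (U+2069) so it renders right-to-left inside an otherwise left-to-right line. The function names and the 0.5 threshold are illustrative assumptions, not the project's actual workflow.

```python
# Unicode blocks that contain Arabic-script characters.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
]

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def is_arabic_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def arabic_ratio(text: str) -> float:
    # Only letters count, so digits, punctuation and diacritics don't skew the ratio.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_arabic_char(ch) for ch in letters) / len(letters)


def detect_language(text: str, threshold: float = 0.5) -> str:
    """Hypothetical helper: 'ar' if most letters are Arabic script, else 'en'."""
    return "ar" if arabic_ratio(text) >= threshold else "en"


def format_bilingual(code: str, english: str, arabic: str) -> str:
    """Hypothetical helper: isolate the Arabic part so bidi reordering stays local to it."""
    return f"{code}: {english} / {RLI}{arabic}{PDI}"
```

Using isolates rather than the older embedding controls keeps the Arabic run's direction from leaking into the code and English text around it.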

Multi-Standard ICD Code Support

Supporting both ICD-10-CM (the US Clinical Modification) and ICD-10-AM (the Australian Modification, also adopted by several Middle Eastern health systems) required a flexible architecture that could switch between coding systems.

Solution: I developed a modular design in which each standard has its own vectorized knowledge base and tailored prompts. This let the system maintain high accuracy under different regional coding requirements without duplicating code or degrading performance.
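The modular design can be sketched as one shared pipeline that looks up a per-standard configuration at runtime. Under this approach, adding a standard means adding configuration rather than copying code. `StandardConfig`, `CodingPipeline`, the prompt wording, and the one-entry code indexes are hypothetical placeholders. In a real deployment, each standard would point to its own vector store built from the official code set, and the substring match here would be a vector search.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StandardConfig:
    """Hypothetical per-standard settings: a code index plus prompt wording."""
    name: str
    prompt_header: str
    entries: dict[str, str] = field(default_factory=dict)  # code -> description


class CodingPipeline:
    """One shared pipeline; only the selected StandardConfig differs per request."""

    def __init__(self, configs: list[StandardConfig]):
        self._configs = {cfg.name: cfg for cfg in configs}

    def _config(self, standard: str) -> StandardConfig:
        try:
            return self._configs[standard]
        except KeyError:
            raise ValueError(f"unsupported standard: {standard}") from None

    def lookup(self, standard: str, term: str) -> list[str]:
        # Case-insensitive substring match stands in for a vector search over this standard's index.
        needle = term.lower()
        return [code for code, desc in self._config(standard).entries.items()
                if needle in desc.lower()]

    def build_prompt(self, standard: str, note: str) -> str:
        # Same prompt skeleton for every standard; only the header text changes.
        cfg = self._config(standard)
        return f"{cfg.prompt_header}\nClinical note:\n{note}\nCodes:"


# Placeholder registry; entries are illustrative, not copied from the official code sets.
pipeline = CodingPipeline([
    StandardConfig("ICD-10-CM", "Assign ICD-10-CM codes (US Clinical Modification).",
                   {"I10": "Essential (primary) hypertension"}),
    StandardConfig("ICD-10-AM", "Assign ICD-10-AM codes (Australian Modification).",
                   {"I10": "Essential (primary) hypertension"}),
])
```

Calling `pipeline.lookup("ICD-10-AM", "hypertension")` uses the AM index, and an unknown standard fails with a clear `ValueError` instead of falling back to the wrong code set.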

Results & Impact

The Med-LLM system delivered the following results in medical documentation processing:

  • Achieved 94% accuracy in ICD-10 code assignment when compared to expert human coders
  • Successfully processed multilingual medical documentation with 90%+ accuracy for Arabic text
  • Reduced manual coding time by approximately 75% compared to traditional methods
  • Implemented dual ICD standard support (CM/AM), increasing the system's international applicability
  • Enabled rapid processing of multiple clinical documents with consistent results

Key Learnings

Working on the Med-LLM project deepened my expertise in healthcare AI applications and provided valuable insights into developing specialized language models. I gained significant experience in:

  • Fine-tuning large language models for domain-specific applications
  • Implementing effective retrieval-augmented generation architectures
  • Building robust multilingual AI systems that maintain high performance across languages
  • Bridging technical AI capabilities with practical healthcare applications, particularly in standardized medical coding that's critical for healthcare operations worldwide