Introducing mCodeGPT: A Revolutionary Step in Cancer Data Standardization and Sharing

Hi community,

I hope this post finds everyone well. I’m thrilled to introduce a game-changing tool we’ve been developing: mCodeGPT.

As many are aware, the diverse nature of cancer data has made standardization and sharing difficult. To tackle this, the mCODE™ initiative was set in motion, focusing on building a core set of structured data elements specifically for oncology EHRs.

While this was a step in the right direction, we felt there was more to be done. We’ve combined the cancer ontology with Large Language Models to create mCodeGPT, which aims to bridge the gap between unstructured medical texts and structured cancer ontologies. The goal is to speed up the process of cancer data standardization and sharing.
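To give a rough feel for what "bridging unstructured text and a structured ontology" can look like, here is a minimal sketch. It is not mCodeGPT's actual implementation: the element names, the prompt wording, and the `fake_llm` stand-in are all hypothetical and only illustrate the general idea of asking a model to fill ontology elements from a note.

```python
import json

# Hypothetical subset of ontology elements, for illustration only.
# The real mCODE element set is larger and is not reproduced here.
ELEMENTS = ["primary_cancer_condition", "stage", "medication"]


def build_prompt(note, elements):
    """Ask the model to fill each ontology element from the free-text note."""
    fields = "\n".join(f"- {name}" for name in elements)
    return (
        "Extract the following elements from the clinical note. "
        "Reply with a JSON object and use null when an element is absent.\n"
        f"Elements:\n{fields}\n\nNote:\n{note}"
    )


def parse_reply(reply, elements):
    """Keep only the expected elements; anything missing becomes None."""
    data = json.loads(reply)
    return {name: data.get(name) for name in elements}


def fake_llm(prompt):
    # Stand-in for a real model call (an assumption), so the sketch runs offline.
    return json.dumps({
        "primary_cancer_condition": "breast cancer",
        "stage": "II",
        "unrelated_key": "ignored",
    })


note = "Patient diagnosed with stage II breast cancer."
record = parse_reply(fake_llm(build_prompt(note, ELEMENTS)), ELEMENTS)
print(record)
```

Restricting the parsed result to the known element list keeps the output aligned with the ontology even if the model returns extra keys.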

For more details, feel free to check out our tool on
Hugging Face: mCodeGPT - a Hugging Face Space by paopaoka3325
Tool Website:

Our ongoing work: developing a HIPAA-compliant version of mCodeGPT, and generalizing the tool to domains beyond cancer. With collaborators from multiple universities across the United States, we have already implemented several of these, including a dental ontology, a stroke ontology, and an adversarial safety event ontology. We welcome collaborations, as well as users interested in our tool and our ongoing research!