Show HN: ReadMyMRI, a DICOM-native preprocessor with multi-model consensus and ML pipelines

github.com

2 points by daftpixie 6 hours ago

I'm building ReadMyMRI to solve a problem I kept running into: getting medical imaging data (DICOM files) ready for machine learning without violating HIPAA or losing critical context.

What it does: ReadMyMRI is a preprocessing pipeline that takes raw DICOM medical images (MRIs, CTs, etc.) and:

- Strips all Protected Health Information (PHI) automatically while preserving DICOM metadata integrity
- Compresses images to manageable sizes without destroying diagnostic quality
- Links de-identified scans to user-provided clinical context (symptoms, demographics, outcomes)
- Uses multi-model AI consensus analysis for both consumer-facing second opinions and clinical decision support at the bedside
- Outputs everything into a single dataframe ready for ML training using Daft (Eventual's distributed dataframe library)
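A minimal sketch of the PHI-stripping and linking steps, to make the idea concrete. This is not the actual ReadMyMRI code: the `PHI_KEYWORDS` list is a small illustrative subset, and `pseudonym` and `strip_phi` are hypothetical helper names. It only relies on attribute access by DICOM keyword, so the stand-in object below is a plain `SimpleNamespace`; a pydicom `Dataset` exposes elements the same way and should work too, though that is untested here.

```python
import hashlib
import hmac
from types import SimpleNamespace

# Illustrative subset of identifying DICOM keywords (assumption: a real
# de-identification profile covers far more, including private tags).
PHI_KEYWORDS = (
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
)


def pseudonym(patient_id: str, key: bytes) -> str:
    """Derive a stable, keyed pseudonym so scans can still be linked
    to clinical context without storing the original identifier."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_phi(ds, key: bytes) -> list:
    """Delete identifying attributes in place and replace PatientID
    with a pseudonym. Returns the keywords that were removed."""
    removed = []
    for keyword in PHI_KEYWORDS:
        if hasattr(ds, keyword):
            delattr(ds, keyword)
            removed.append(keyword)
    if hasattr(ds, "PatientID"):
        ds.PatientID = pseudonym(str(ds.PatientID), key)
    return removed


# Stand-in for a parsed DICOM header (hypothetical values).
header = SimpleNamespace(PatientName="Doe^Jane", PatientID="12345", Modality="MR")
removed = strip_phi(header, key=b"per-project-secret")
```

The keyed hash means the same patient maps to the same pseudonym across scans, which is what lets the clinical context be joined back on later, while the key itself stays out of the dataset.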

Technical approach:

- Built on pydicom for DICOM manipulation
- Uses Pillow/OpenCV for quality-preserving compression
- Daft integration for distributed processing of large medical imaging datasets
- Frontier models for multi-model analysis (still debating this)
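Since the multi-model part is still under debate, here is one simple way the consensus step could look: a majority vote with an agreement threshold. This is a sketch under assumptions, not the project's method. The `consensus` function, the model names, and the labels are all hypothetical, and it assumes each model's output has already been normalized to a shared label vocabulary.

```python
from collections import Counter


def consensus(findings: dict, min_agreement: float = 0.6):
    """Majority vote over per-model labels.

    findings maps a model name to its label. Returns (label, agreement),
    where label is None when the top label's share of votes falls below
    min_agreement, so low-agreement cases can be flagged for human review.
    """
    if not findings:
        return None, 0.0
    counts = Counter(findings.values())
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(findings)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


# Hypothetical outputs from three models on one scan.
label, agreement = consensus(
    {"model_a": "no_finding", "model_b": "no_finding", "model_c": "lesion"}
)
```

Returning `None` below the threshold rather than the plurality label keeps disagreement visible instead of hiding it behind a single answer.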

What I'm looking for:

- Feedback from anyone working with medical imaging ML
- Edge cases I haven't thought about
- Whether the Daft integration actually makes sense for your use case or if plain pandas would be better
- HIPAA/privacy concerns I am not thinking about

Happy to answer questions about the architecture, HIPAA considerations, or why medical imaging data is such a pain to work with.