What it does: ReadMyMRI is a preprocessing pipeline that takes raw DICOM medical images (MRIs, CTs, etc.) and:
- Strips all Protected Health Information (PHI) automatically while preserving DICOM metadata integrity
- Compresses images to manageable sizes without destroying diagnostic quality
- Links deidentified scans to user-provided clinical context (symptoms, demographics, outcomes)
- Uses multi-model AI consensus analysis for both consumer-facing second opinions and clinical decision support at the bedside
- Outputs everything into a single dataframe ready for ML training using Daft (Eventual's distributed dataframe library)
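For anyone wondering what the strip-then-link step could look like in practice, here is a minimal sketch, not the actual ReadMyMRI code. It assumes a pydicom-style dataset whose attributes are reachable by DICOM keyword (in real use you would pass the result of pydicom.dcmread), and it uses a SimpleNamespace stand-in so it runs without any dependencies. The names PHI_KEYWORDS, link_key, and deidentify are made up for illustration, the keyword list is a small subset rather than a complete de-identification profile, and the HMAC-based pseudonym is just one possible linking scheme.

```python
import hashlib
import hmac
from types import SimpleNamespace

# Illustrative subset only (assumption). DICOM PS3.15 Annex E lists the
# full set of attributes a de-identification profile has to cover.
PHI_KEYWORDS = (
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
    "AccessionNumber",
)


def link_key(patient_id: str, secret: bytes) -> str:
    """Derive a stable pseudonym from the original ID using HMAC-SHA256."""
    return hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(ds, secret: bytes) -> str:
    """Blank PHI attributes in place and return the pseudonymous link key."""
    original_id = str(getattr(ds, "PatientID", ""))
    if not original_id:
        # Without an ID every scan would hash to the same key and collide.
        raise ValueError("dataset has no PatientID to derive a link key from")
    key = link_key(original_id, secret)

    for keyword in PHI_KEYWORDS:
        if hasattr(ds, keyword):
            # Blank rather than delete, so attributes that must be present
            # (but may be empty) are still there afterwards.
            setattr(ds, keyword, "")

    # pydicom datasets provide remove_private_tags(); other objects are skipped.
    remover = getattr(ds, "remove_private_tags", None)
    if callable(remover):
        remover()

    # Store the pseudonym so the scan can be joined to clinical context later.
    ds.PatientID = key
    return key


# Stand-in for a pydicom Dataset, so the sketch runs on its own.
scan = SimpleNamespace(PatientName="DOE^JANE", PatientID="MRN-0042", Modality="MR")
key = deidentify(scan, secret=b"replace-with-a-managed-secret")

# Clinical context is stored under the same key, never under the original ID.
clinical_context = {key: {"symptoms": "headache", "age_band": "40-49"}}
```

The hex digest is 64 characters, which fits the length limit for PatientID. The sketch only touches header attributes: it does nothing about text burned into the pixel data, and whether a keyed hash of the original ID is acceptable for a given HIPAA de-identification approach is something I would want a compliance person to confirm.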
Technical approach:
- Built on pydicom for DICOM manipulation
- Uses Pillow/OpenCV for quality-preserving compression
- Daft integration for distributed processing of large medical imaging datasets
- Frontier models for multi-model analysis (still debating this)
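Since the multi-model part is still undecided, here is only a rough sketch of what "consensus" could mean in its simplest form: a majority vote over the labels each model returns, with an agreement threshold below which nothing is reported. The function name consensus, the model names, the labels, and the 0.66 threshold are all assumptions for illustration, not how ReadMyMRI currently works.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus(
    findings: Dict[str, str], min_agreement: float = 0.66
) -> Tuple[Optional[str], float]:
    """Return (label, agreement) if enough models agree, else (None, agreement).

    `findings` maps a model name to the label that model produced.
    """
    if not findings:
        raise ValueError("need at least one model output")

    # Normalise case and whitespace so trivially different strings still match.
    counts = Counter(label.strip().lower() for label in findings.values())
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(findings)

    # Below the threshold, report no consensus instead of the plurality label.
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical outputs from three models for one deidentified scan.
result, agreement = consensus(
    {
        "model_a": "no acute finding",
        "model_b": "No acute finding",
        "model_c": "possible lesion",
    }
)
```

Exact string matching is obviously crude. Real model outputs are free text, so something would have to map them onto a fixed label set before a vote like this means anything.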
What I'm looking for:
- Feedback from anyone working with medical imaging ML
- Edge cases I haven't thought about
- Whether the Daft integration actually makes sense for your use case or if plain pandas would be better
- HIPAA/privacy concerns I am not thinking about
Happy to answer questions about the architecture, HIPAA considerations, or why medical imaging data is such a pain to work with.