Useful Medical Images Datasets For ML Projects


Figure: Medical image datasets.

Artificial intelligence (AI) has made impressive strides in healthcare, but one major challenge remains: lack of data. Deep learning algorithms need lots of data to be effective, and medical images are expensive and difficult to obtain due to ethical and resource constraints. This makes it hard for researchers outside the medical field to develop new AI tools.

This article aims to help by collecting medical image datasets that can support deep learning research. The datasets cover several body regions and are grouped by imaging modality (CT, MRI, and X-ray), with a separate section for COVID-19.

Health and scientific research

  • MedPix: A free online collection of over 59,000 medical images from various patients.
  • The Cancer Imaging Archive (TCIA): A public archive of de-identified cancer images (MRI, CT scans, etc.) organized by disease, type, or research area. Supporting data such as patient outcomes and expert analysis is also available.
  • Re3Data: This global registry helps you find research data repositories across various disciplines, including medical imaging.

CT datasets

  • CT Medical Images: A small subset of CT scans from The Cancer Imaging Archive. It keeps the middle slice of each scan for which valid age, modality, and contrast tags were available, and includes data from 69 patients.
  • DeepLesion: A large collection of CT images released by the NIH. It contains over 32,000 annotated lesions from more than 4,000 patients and is intended to improve lesion detection accuracy.
  • Public Lung Database: Access a limited set of annotated CT lung scans to study lung lesion measurement challenges. All images are downloadable for free.
  • VIA Group Public Databases: Two publicly available datasets with lung CT scans (DICOM format) featuring radiologist-documented abnormalities.
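
CT data distributed in DICOM format usually stores raw detector values that need two preprocessing steps before they are fed to a network: conversion to Hounsfield units (HU) and intensity windowing. The sketch below illustrates both steps on plain Python lists. The default slope and intercept stand in for the DICOM Rescale Slope and Rescale Intercept attributes and are assumptions, as is the lung window (center -600 HU, width 1500 HU); read the real values from each file's header and choose a window that suits the task.

```python
def to_hounsfield(raw_values, slope=1.0, intercept=-1024.0):
    """Convert stored CT values to Hounsfield units.

    slope and intercept are placeholders for the per-file DICOM
    Rescale Slope / Rescale Intercept values (assumed defaults here).
    """
    return [value * slope + intercept for value in raw_values]


def apply_window(hu_values, center=-600.0, width=1500.0):
    """Clip HU values to a display window and scale them to [0, 1].

    The default center/width is an assumed lung window, not a value
    prescribed by any of the datasets listed above.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    scaled = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # clamp to [low, high]
        scaled.append((clipped - low) / (high - low))
    return scaled


# Example: three raw values from a hypothetical slice.
hu = to_hounsfield([0, 424, 1174])
normalized = apply_window(hu)
```

Windowing before normalization keeps the contrast range that matters for the target tissue instead of spreading the network's input range over air and bone.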

MRI datasets

  • OASIS Brains Datasets: This free collection offers MRI scans of the brain across various ages, cognitive abilities, and genetic backgrounds. Researchers can use this data for studies on normal aging, cognitive decline, and other brain-related topics.
  • MRNet: Knee MRIs: This dataset focuses on knee MRIs. It includes over 1,300 scans, with manual labels identifying knee injuries such as ACL and meniscal tears. This resource can support research on knee structure and injury diagnosis.
  • IVDM3Seg: This dataset delves into intervertebral discs (IVDs) in the lower spine. It offers 3D MRI scans from 12 individuals at different stages of a bed rest study. Each scan comes with a detailed segmentation map (binary mask) for specific IVDs. Researchers can use this data to investigate the effects of factors like bed rest on IVDs.
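
Datasets that ship binary segmentation masks, such as the IVD masks mentioned above, are typically evaluated with an overlap score. The following is a minimal sketch of the Dice coefficient on flattened binary masks, written in plain Python for clarity. The function name and the convention of returning 1.0 when both masks are empty are choices made for this illustration, not part of any dataset's official evaluation code.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice overlap between two flattened binary masks.

    Dice = 2 * |A intersect B| / (|A| + |B|), where A and B are the sets
    of foreground voxels. Any truthy value counts as foreground.
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)

    if total == 0:
        # Both masks are empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


# Example with a hypothetical 4-voxel prediction and ground truth.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

For real 3D volumes the same formula is usually applied with an array library; the list version here only shows the arithmetic.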

X-Ray datasets

  • NIH Chest X-ray Dataset: A large collection of over 112,000 chest X-rays from more than 30,000 unique patients.
  • ChestX-Det-Dataset: This dataset focuses on specific abnormalities. It contains 3,578 images with detailed annotations for 13 categories, including atelectasis, cardiomegaly, and nodules.
  • CheXpert: A large dataset of over 224,000 chest X-rays, with labels extracted automatically from the corresponding radiology reports. This resource supports research on various aspects of chest X-ray analysis.
  • SCR Database: This database offers segmented chest X-rays focusing on lung fields, heart, and clavicles. It includes images with and without lung nodules.
  • MURA: A dedicated collection of musculoskeletal X-rays, including over 40,000 images of upper extremity regions such as the elbow, hand, and shoulder.
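
Large X-ray collections often contain several images per patient, so splitting at the image level can leak a patient into both the training and validation sets. The sketch below shows one way to split at the patient level instead, by hashing the patient ID into a bucket. The record layout (image name, patient ID) and the function names are hypothetical; adapt them to the metadata file of whichever dataset you use.

```python
import hashlib


def assign_split(patient_id, val_fraction=0.2):
    """Deterministically map a patient ID to 'train' or 'val'."""
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # uniform-ish value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_by_patient(records, val_fraction=0.2):
    """Group image names into splits so that no patient spans both.

    records: iterable of (image_name, patient_id) pairs (assumed layout).
    """
    splits = {"train": [], "val": []}
    for image_name, patient_id in records:
        splits[assign_split(patient_id, val_fraction)].append(image_name)
    return splits


# Example with made-up file names and patient IDs.
example_records = [
    ("img_001.png", "patient_a"),
    ("img_002.png", "patient_a"),
    ("img_003.png", "patient_b"),
    ("img_004.png", "patient_c"),
]
example_splits = split_by_patient(example_records, val_fraction=0.5)
```

Because the assignment depends only on the patient ID, the split stays stable when new images are added. The realized validation fraction is only approximate on small datasets.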

COVID-19 datasets

V7 COVID-19 X-Ray dataset

  • 6,500 chest X-rays: Includes a mix of COVID-19 and non-COVID-19 cases (517 confirmed COVID-19).
  • Detailed lung segmentation: Each image has precise annotations outlining the lungs, including the region behind the heart.
  • Pneumonia type labels: Images are labeled as viral, bacterial, or fungal pneumonia, or as healthy.
  • Additional COVID-19 patient information: For confirmed cases, data such as age, sex, and outcome are available.

This dataset is well-suited for researchers studying lung features in COVID-19 patients and comparing them to other types of pneumonia.
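
With 517 confirmed COVID-19 cases among 6,500 images, the positive class is a small minority, and a classifier trained without correction may simply favor the majority class. A common mitigation is to weight the loss by inverse class frequency. The sketch below computes such weights with the heuristic N / (K * n_c), where N is the total number of images, K the number of classes, and n_c the count for class c. The two-class grouping and the derived count of 5,983 non-COVID images are assumptions made for illustration; the dataset's actual label set is finer-grained.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: total / (num_classes * count)."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        label: total / (num_classes * count)
        for label, count in class_counts.items()
    }


# Assumed two-class view of the dataset: 517 COVID-19 images and the
# remaining 6,500 - 517 = 5,983 treated as non-COVID (illustrative only).
counts = {"covid": 517, "non_covid": 5983}
weights = balanced_class_weights(counts)
```

The resulting weights can be passed to most loss functions that accept per-class weights, so that each class contributes equally to the total loss.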

Additional COVID-19 image resources:

  • COVID-19 image dataset: This smaller dataset includes 317 chest X-rays categorized as COVID-19, viral pneumonia, or normal.
  • COVID-19 CT scans: This limited dataset offers 20 CT scans of patients with COVID-19, along with expert segmentation.
