medical imaging | Vanderbilt Advanced Lab for Immersive AI Translation (VALIANT)

MedPTQ: a practical pipeline for real post-training quantization in 3D medical image segmentation
/valiant/2026/03/26/medptq-a-practical-pipeline-for-real-post-training-quantization-in-3d-medical-image-segmentation/
Thu, 26 Mar 2026 18:45:00 +0000
Chongyu Qu; Ritchie Zhao; Ye Yu; Bin Liu; Tianyuan Yao; Junchao Zhu; Bennett A. Landman; Yucheng Tang; Yuankai Huo (2026). Journal of Medical Imaging, 13(1), 014006.

This study focuses on making advanced deep learning models for medical imaging more efficient and practical to use, especially in settings with limited computing power. One common approach is quantization, which reduces the numerical precision (or bit-width) of a model's calculations, for example using 8-bit integers instead of standard 32-bit floating-point numbers, to shrink model size and speed up processing. However, many previous methods only simulate this lower precision without actually improving real-world performance. To address this gap, the researchers developed MedPTQ, an open-source pipeline that enables true 8-bit (INT8) quantization for complex 3D medical imaging models, such as U-Net and transformer-based architectures. Their method works in two stages: first, it uses a tool called TensorRT to simulate lower-precision computations using sample data, and then it converts this into real low-precision execution on GPUs (graphics processing units), which are commonly used for high-performance computing.
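The core idea of post-training quantization can be pictured with a toy sketch. This illustrates the general technique only, not the MedPTQ/TensorRT pipeline itself: calibrate a scale from sample values, then map 32-bit floats onto 8-bit integers and back.

```python
# Minimal sketch of symmetric post-training INT8 quantization.
# All function names here are illustrative, not from MedPTQ.
import numpy as np

def calibrate_scale(samples: np.ndarray) -> float:
    """Symmetric calibration: choose a scale so the max |value| maps to 127."""
    return float(np.max(np.abs(samples))) / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """FP32 -> INT8, rounding to nearest and clipping to the int8 range."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """INT8 -> approximate FP32."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=1000).astype(np.float32)

scale = calibrate_scale(weights)
q = quantize(weights, scale)
recovered = dequantize(q, scale)

# Rounding error is at most half a quantization step (scale / 2) per element.
max_err = float(np.max(np.abs(weights - recovered)))
print(max_err <= scale / 2 + 1e-7)
```

Note that a toy like this only simulates low precision, which is exactly the gap the paper describes: the speed and memory wins require executing the integer arithmetic natively on hardware, as TensorRT does.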

The results show that MedPTQ can significantly reduce model size (by up to nearly four times) and speed up processing (by almost three times) while maintaining nearly the same accuracy as full-precision models, as measured by the Dice similarity coefficient, a standard metric for evaluating how well predicted image segments match the true regions. Importantly, the approach was tested across multiple types of models and datasets, including scans of the brain, abdomen, and entire body from CT and MRI imaging, demonstrating strong flexibility and reliability. Overall, this work shows that real, not just simulated, low-precision AI models can be effectively deployed in medical imaging, making them more accessible and efficient without sacrificing performance.
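For readers unfamiliar with the metric, the Dice similarity coefficient is 2·|A∩B| / (|A| + |B|) over the predicted and true masks. A minimal sketch with toy masks (not the paper's data):

```python
# Dice similarity coefficient between two binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Convention: two empty masks agree perfectly.
    return 2.0 * intersection / total if total > 0 else 1.0

pred  = np.array([[0, 1, 1], [0, 1, 0]])  # toy predicted mask
truth = np.array([[0, 1, 0], [0, 1, 1]])  # toy ground-truth mask
print(dice(pred, truth))  # 2*2 / (3+3) ≈ 0.667
```

A Dice score of 1.0 means perfect overlap; "almost the same accuracy" in the study means the INT8 model's Dice stays very close to the FP32 model's.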

Fig. 1

We introduce MedPTQ, an open-source pipeline for real post-training quantization that converts FP32 PyTorch models into INT8 TensorRT engines. By leveraging TensorRT for real INT8 deployment, MedPTQ reduces model size and inference latency while preserving segmentation accuracy for efficient GPU deployment.

The Hidden Impact of Radiography and Fluoroscopy: An Environmental Life Cycle Assessment
/valiant/2025/12/19/the-hidden-impact-of-radiography-and-fluoroscopy-an-environmental-life-cycle-assessment/
Fri, 19 Dec 2025 17:00:14 +0000
Snyder, E. J., Thiel, C. L., Struk, O., Vigil-Garcia, M., Meijer, C., Gehrels, J., Omary, R. A., Scheel, J. R., & Carver, D. E. (2025). Journal of the American College of Radiology.

Medical imaging techniques like radiography and fluoroscopy have a measurable environmental impact, primarily due to energy use and associated emissions. This study assessed the carbon footprint of these imaging services at a large academic medical center over one year using life cycle assessment (LCA) methods, collecting data through observations, records, staff interviews, and energy metering.

The analysis estimated that radiography and fluoroscopy generated about 55,100 kilograms of CO₂ equivalent annually. Energy use was the largest contributor, responsible for 47% of emissions. Per-scan emissions were higher for fluoroscopy (4.8–9.6 kg CO₂e per scan) than for radiography (0.8 kg CO₂e per scan). Medical linens contributed 24% of total emissions, and additional environmental effects included ozone depletion, smog, acidification, and eutrophication.

The study highlights that reducing energy consumption, through decarbonized electricity and optimized equipment use, can significantly cut greenhouse gas emissions. Sustainable management of linens, responsible procurement, and minimizing unnecessary imaging are also important strategies for lowering the environmental footprint of radiography and fluoroscopy.

Fig. 1 Flow diagram of components included in the study. ∗This study could not account for the production of all additional capital equipment. See e-only here and in the previous publication [] for more information. ∗∗The environmental footprint of waste disposal was not included in this study.

Measuring the Environmental Impact of MRI and CT: A Life Cycle Assessment
/valiant/2025/12/19/measuring-the-environmental-impact-of-mri-and-ct-a-life-cycle-assessment/
Fri, 19 Dec 2025 16:58:47 +0000
Carver, D. E., Pruthi, S., Struk, O., Vigil-Garcia, M., Meijer, C., Gehrels, J., Omary, R. A., Scheel, J. R., & Thiel, C. L. (2025). Journal of the American College of Radiology.

Medical imaging, such as MRI and CT scans, has a notable environmental footprint due to energy use, equipment production, and disposable supplies. This study evaluated the environmental impact of MRI and CT services at a large academic medical center in the Southeastern United States over one year using life cycle assessment methods. Researchers collected data from direct observation, records, staff interviews, and energy metering, and assessed impacts with established environmental databases and software.

Results showed that MRI and CT services produced an estimated 221 and 108 tons of carbon dioxide equivalent annually, comparable to the emissions of 52 and 25 cars driven for a year, respectively. Energy use contributed most to emissions (58% for MRI, 33% for CT), followed by disposable supplies, equipment production, and linens. Switching to solar power could cut MRI emissions by 70% and CT emissions by 40%, though the relative contribution of supplies and equipment would then become more significant.

These findings highlight the importance of energy consumption in imaging services and suggest that renewable energy adoption, efficient scanner use, reusable supplies, and circular business practices, such as extending equipment life, can meaningfully reduce the environmental impact of medical imaging.

Fig. 1 Flow diagram of components included in the study. ∗This study could not account for the production of all additional capital equipment. See e-only here and in the previous publication [] for more information.

Evaluating cell AI foundation models in kidney pathology with human-in-the-loop enrichment
/valiant/2025/12/19/evaluating-cell-ai-foundation-models-in-kidney-pathology-with-human-in-the-loop-enrichment/
Fri, 19 Dec 2025 16:47:48 +0000
Guo, J., Lu, S., Cui, C., Deng, R., Yao, T., Tao, Z., Lin, Y., Lionts, M., Liu, Q., Xiong, J., Wang, Y., Zhao, S., Chang, C. E., Wilkes, M., Fogo, A. B., Yin, M., Yang, H., & Huo, Y. (2025). Communications Medicine, 5(1), 495.

Large artificial intelligence foundation models are becoming important tools in healthcare, including digital pathology, where they help analyze medical images. Many of these models have been trained to handle complex tasks such as diagnosing diseases or measuring tissue features using very large and diverse datasets. However, it is less clear how well they perform on more focused tasks, such as identifying and outlining cell nuclei within images from a single organ like the kidney. This study examines how well current cell foundation models perform on this task and explores practical ways to improve them.

To do this, the researchers assembled a large dataset of 2,542 kidney whole slide images collected from multiple medical centers, covering different kidney diseases and even different species. They evaluated three widely used, state-of-the-art cell foundation models (Cellpose, StarDist, and CellViT) for their ability to segment cell nuclei. To improve performance without requiring extensive, time-consuming pixel-level annotations from experts, the team introduced a "human-in-the-loop" approach. This method combines predictions from multiple models to create higher-quality training labels and then refines a subset of difficult cases with corrections from pathologists. The models were fine-tuned using this enriched dataset, and their segmentation accuracy was carefully measured.

The results show that accurately segmenting cell nuclei in kidney pathology remains challenging and benefits from models that are more specifically tailored to this organ. Among the three models, CellViT showed the best initial performance, with an F1 score of 0.78. After fine-tuning with the improved training data, all models performed better, with StarDist reaching the highest F1 score of 0.82. Importantly, combining automatically generated labels from foundation models with a smaller set of pathologist-corrected "hard" image regions consistently improved performance across all models.
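For context, the F1 score reported here is the harmonic mean of precision and recall over matched, spurious, and missed nuclei. A minimal sketch, with counts made up purely for illustration:

```python
# F1 score from detection counts: true positives (matched nuclei),
# false positives (spurious detections), false negatives (missed nuclei).
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 78 matched nuclei, 22 spurious, 22 missed -> F1 = 0.78
print(round(f1_score(78, 22, 22), 2))
```

An F1 of 1.0 would mean every nucleus is found with no spurious detections, so the jump from 0.78 to 0.82 after fine-tuning reflects fewer misses and false alarms combined.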

Overall, this study provides a clear benchmark for evaluating and improving cell AI foundation models in real-world pathology settings. It also demonstrates that high-quality nuclei segmentation can be achieved with much less expert annotation, supporting more efficient and scalable workflows in clinical pathology without sacrificing accuracy.

Fig. 1: Overall framework.

The upper panel (a–c) illustrates the diverse evaluation dataset consisting of 2542 kidney WSIs. a shows the number of kidney WSIs in publicly available cell nuclei datasets versus our evaluation dataset, which exceeds existing datasets by a large margin. b depicts the diverse data sources included in our dataset. c indicates that these WSIs were stained using Hematoxylin and Eosin (H&E), Periodic acid–Schiff methenamine (PASM), and Periodic acid–Schiff (PAS). Performance: Kidney cell nuclei instance segmentation was performed using three SOTA cell foundation models: Cellpose, StarDist, and CellViT. Model performance was evaluated based on qualitative human feedback for each prediction mask. Data Enrichment: A human-in-the-loop (HITL) design integrates prediction masks from performance evaluation into the model's continual learning process, reducing reliance on pixel-level human annotation.

Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets
/valiant/2025/08/25/scalable-quality-control-on-processing-of-large-diffusion-weighted-and-structural-magnetic-resonance-imaging-datasets/
Mon, 25 Aug 2025 19:45:55 +0000
Kim, Michael E., Gao, Chenyu, Newlin, Nancy R., Rudravaram, Gaurav, Krishnan, Aravind R., Ramadass, Karthik, Kanakaraj, Praitayini, Schilling, Kurt G., Dewey, Blake E., & Bennett, David Alan. (2025). PLOS ONE, 20(8), e0327388.

Careful quality control (QC) is essential when working with large medical imaging datasets, because poor-quality data can lead to wrong conclusions or poorly trained machine learning models. However, QC can be very time consuming. Most existing methods try to save time using automated tools that detect unusual data points, but these tools cannot catch every mistake. This means researchers still need to visually check the results of data processing in a reliable and scalable way.

In this study, we designed a QC pipeline for a large collection of brain scans, including diffusion-weighted and structural MRI. Our method was built to: (1) provide a consistent way for teams of researchers to perform and manage QC, (2) allow fast visualization of preprocessed data so the process is quicker without sacrificing quality, and (3) make it easy to combine and share QC results across datasets and pipelines.

We tested our method by comparing it to an automated QC approach on a set of 1,560 brain scans, and by measuring how much agreement there was between different researchers performing QC. The results showed mostly high agreement among researchers and only small differences compared to the automated method. Overall, while visual QC still takes time, our approach makes the process more streamlined and efficient.

Fig 1. Issues with automatic and team-based QC.

When maintaining large neuroimaging datasets with multiple processing pipelines, shallow quality control processes that rely on derived metrics can fail to catch instances of algorithmic failures. However, deep QC processes quickly become unscalable and inefficient as the amount of data available increases due to the required time for mass visualization of outputs. For example, opening 50,000 T1w images separately in an image viewer for deep QC can take over 60 hours if it takes five seconds to load images in and out of the viewer. Team driven efforts to alleviate such large time costs come with additional challenges due to inconsistencies in reporting and methods of performing QC.
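The time estimate above is simple arithmetic; as a quick sanity check using the numbers from the text:

```python
# 50,000 T1w images at roughly 5 seconds each of viewer load/unload time.
n_images = 50_000
seconds_per_image = 5

hours = n_images * seconds_per_image / 3600
print(round(hours, 1))  # ≈ 69.4 hours, i.e. "over 60 hours"
```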

MAISI: Medical AI for Synthetic Imaging
/valiant/2025/05/21/maisi-medical-ai-for-synthetic-imaging/
Wed, 21 May 2025 16:14:44 +0000
Guo, Pengfei; Zhao, Can; Yang, Dong; Xu, Ziyue; Nath, Vishwesh; Tang, Yucheng; Simon, Benjamin; Belue, Mason; Harmon, Stephanie; Turkbey, Baris; Xu, Daguang. Proceedings – 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 (2025): 4430–4441.

Medical imaging, like CT scans, is extremely valuable for diagnosing and treating health conditions. But creating these images for research or training AI tools comes with big challenges, such as not having enough data, the high cost of having experts label the images, and concerns about patient privacy.

This study introduces a new tool called MAISI (Medical AI for Synthetic Imaging), which uses AI and a technique called diffusion modeling to create realistic, 3D synthetic CT scans. These synthetic images can be made in high resolution and with flexible sizes to match different medical needs.

MAISI also includes a tool called ControlNet, which allows the system to generate CT scans that already have important organs labeled (up to 127 anatomical structures), saving time and effort for researchers and doctors.

The results from tests show that MAISI can create very lifelike and medically accurate images for a variety of body parts and conditions. This suggests that synthetic images created with MAISI could help solve major problems in medical imaging by reducing the need for real patient data and expensive manual labeling.

Figure 1.

(a) A generated high-resolution CT volume (with volume dimensions of 512 × 512 × 768 and voxel spacing of 0.86 × 0.86 × 0.92 mm³) by the proposed method and its corresponding segmentation condition overlaid on the generated volume. We show the axial, sagittal, and coronal views from top to bottom, respectively. (b) 3D volume rendering of a generated CT by MAISI. The rendering setting is tuned to highlight bone structures and demonstrate the realism of the generated CT volume.

GloFinder: AI-empowered QuPath plugin for WSI-level glomerular detection, visualization, and curation
/valiant/2025/04/23/glofinder-ai-empowered-qupath-plugin-for-wsi-level-glomerular-detection-visualization-and-curation/
Wed, 23 Apr 2025 14:05:14 +0000
Yue, Jialin; Yao, Tianyuan; Deng, Ruining; Lu, Siqi; Guo, Junlin; Liu, Quan; Xiong, Juming; Yin, Mengmeng; Yang, Haichun; Huo, Yuankai. Journal of Pathology Informatics 17 (2025): 100433.

Artificial intelligence (AI) has made it easier to automatically detect glomeruli (the tiny filtering units in the kidney) using high-resolution images of kidney tissue. But many of the existing AI tools are hard to use unless you have advanced programming skills, which makes them less useful for doctors and other healthcare professionals. On top of that, current tools are often trained on only one type of data and don't let users adjust how confident the system needs to be before marking something as a glomerulus.

To solve these problems, we created GloFinder, a user-friendly tool that works as a plugin for the QuPath image viewer. With just one click, GloFinder can scan an entire kidney slide image and find glomeruli automatically. It also lets users review and edit the results directly on the screen.

GloFinder uses an advanced detection method called CircleNet, which represents glomeruli as circles to help the system find them more precisely. It was trained using around 160,000 manually labeled glomeruli to boost accuracy. To make the results even better, GloFinder uses a smart technique that combines results from several AI models, weighting their confidence levels to improve overall performance.
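The confidence-weighted combination step can be pictured with a small sketch. This is an illustrative toy in the spirit of weighted fusion adapted from boxes to circles; the function names, distance threshold, and greedy clustering here are assumptions for the example, not GloFinder's actual code.

```python
# Merge overlapping circle detections from several models by
# confidence-weighted averaging of center and radius.
import math

def fuse_circles(detections, dist_thresh=20.0):
    """detections: list of (x, y, radius, confidence). Greedily clusters
    detections whose centers lie close together, then averages each cluster
    weighted by confidence."""
    fused = []
    used = [False] * len(detections)
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i][3], reverse=True)
    for i in order:
        if used[i]:
            continue
        cluster = [detections[i]]
        used[i] = True
        for j in order:
            if used[j]:
                continue
            if math.dist(detections[i][:2], detections[j][:2]) < dist_thresh:
                cluster.append(detections[j])
                used[j] = True
        w = sum(c for *_, c in cluster)
        fused.append((
            sum(x * c for x, _, _, c in cluster) / w,   # weighted center x
            sum(y * c for _, y, _, c in cluster) / w,   # weighted center y
            sum(r * c for _, _, r, c in cluster) / w,   # weighted radius
            w / len(cluster),                           # averaged confidence
        ))
    return fused

# Two models detect the same glomerulus at slightly different positions:
dets = [(100.0, 100.0, 30.0, 0.9), (104.0, 102.0, 32.0, 0.6)]
merged = fuse_circles(dets)
print(len(merged))  # the two detections collapse into one fused circle
```

The higher-confidence detection pulls the fused circle toward itself, which is the intuition behind weighting model outputs by their confidence levels.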

This tool is designed to make it easier for clinicians and researchers to analyze kidney images quickly and accurately鈥攏o programming required鈥攎aking it a valuable resource for kidney disease research and diagnosis.

Fig. 1.

Glomerular detection results using the GloFinder plugin. Detected glomeruli are represented as circles with various colors indicating detection confidence.

mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
/valiant/2025/04/23/mtree-multi-level-text-guided-representation-end-to-end-learning-for-whole-slide-image-analysis/
Wed, 23 Apr 2025 14:00:09 +0000
Liu, Quan; Deng, Ruining; Cui, Can; Yao, Tianyuan; Yang, Yuechen; Nath, Vishwesh; Li, Bingshan; Chen, You; Tang, Yucheng; Huo, Yuankai. IS&T International Symposium on Electronic Imaging Science and Technology 37, no. 12 (2025): HPCI-183.

Researchers have developed a new way to help computers better understand and analyze large, detailed images of human tissue鈥攍ike those used to diagnose diseases鈥攂y combining both image data and written medical notes.

In medicine, especially in fields like cancer diagnosis, doctors often study massive, high-resolution images of tissue samples under a microscope. These images are so large (sometimes called "gigapixel images") that it's hard for computers to analyze them in one go. Existing computer methods usually break the image into smaller pieces, study those, and then try to make sense of the whole picture later. But it's tricky to combine that image data with written reports from pathologists in a smooth, efficient way.

This study introduces a new method called mTREE (Multi-Level Text-Guided Representation End-to-End Learning). It uses written descriptions, like those found in pathology reports, to guide the computer in figuring out which parts of a tissue image are important. The model learns to focus on both small, detailed parts of the image and the big picture all at once, using the text to improve its understanding of both.

The written notes help in two ways: first, they guide the model to zoom in on key areas, and second, they help it blend that information into a full understanding of the image. The researchers tested their approach on tasks like predicting what kind of disease is present and estimating patient survival, and their new method outperformed other existing tools.

This work could make it easier for doctors and researchers to get useful information from complex medical images by making better use of the text that already comes with them. The tool is publicly available for others to use and build upon.

Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence
/valiant/2025/04/23/write-sentence-with-images-revisit-the-large-vision-model-with-visual-sentence/
Wed, 23 Apr 2025 13:58:37 +0000
Liu, Quan; Cui, Can; Deng, Ruining; Yao, Tianyuan; Yang, Yuechen; Tang, Yucheng; Huo, Yuankai. IS&T International Symposium on Electronic Imaging Science and Technology 37, no. 12 (2025): HPCI-172.

This paper presents a new method for creating high-quality images from "visual sentences," essentially meaningful snapshots pulled from video clips. The team combined two types of AI models: a lightweight model that predicts sequences and another that helps create realistic images. This combination allows the system to generate accurate and detailed images while using fewer computing resources than traditional approaches.

Unlike other methods that need lots of data and power, this approach works efficiently even with only partially labeled video frames. It produces smooth, context-aware images and performs especially well in real-time situations or on devices with limited computing power.

The method also shows promise in medical imaging, helping clean up noisy images, adjust lighting, and separate different parts of an image. In short, this work offers a smart, efficient way to generate high-quality images across many fields, from everyday video content to medical analysis.

Explainable AI for medical image analysis
/valiant/2025/03/24/explainable-ai-for-medical-image-analysis/
Mon, 24 Mar 2025 18:39:48 +0000
Brás, Carolina; Montenegro, Helena; Cai, Leon Y.; Corbetta, Valentina; Huo, Yuankai; Silva, Wilson; Cardoso, Jaime S.; Landman, Bennett A.; Išgum, Ivana. Trustworthy AI in Medical Imaging, 2024, pp. 347–366.

As AI-driven solutions are becoming more common in medical imaging, there’s a growing need to make these AI models more understandable and trustworthy. This chapter focuses on different ways to explain how AI models work in medical image analysis, which is essential for building trust in these systems. We look at four main types of explanations: visual, example-based, textual, and concept-based.

For visual explanations, we explore methods like backpropagation and perturbation, which help us understand how the model uses different parts of the image to make decisions. For example-based explanations, we focus on techniques that compare images to prototypes, measure distances between examples, retrieve similar images, or provide counterfactual explanations (showing how changing an image would change the AI's decision). Lastly, for textual and concept-based explanations, we look at how to generate image captions and use concept activation vectors to interpret the AI's understanding of different concepts.
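As a toy illustration of the backpropagation family of visual explanations (all names and shapes here are made-up assumptions, not examples from the chapter): for a linear classifier, the gradient of a class score with respect to each pixel is simply that class's weight for that pixel, so the saliency map is the reshaped absolute weight vector.

```python
# Vanilla saliency for a toy linear classifier: pixels with large
# |d(score)/d(pixel)| are the ones that most influence the decision.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 64))   # toy 2-class linear classifier over 8x8 images
image = rng.random(64)         # toy flattened "image"

scores = W @ image
pred = int(np.argmax(scores))  # predicted class

# For a linear model, d(score_pred)/d(pixel_i) = W[pred, i], so the
# pixel-importance (saliency) map is just the absolute weights, reshaped.
saliency = np.abs(W[pred]).reshape(8, 8)
print(saliency.shape)  # (8, 8)
```

For deep networks the same quantity is obtained by backpropagating the class score through the layers; perturbation methods instead occlude image regions and watch how the score changes.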

The goal of this chapter is to explain how each method works, their strengths and weaknesses, and how to interpret their explanations in the context of medical image analysis.

Figure 16.1. Overview of explainable AI approaches used in medical image analysis and corresponding classification as in-model or post-hoc explanations.
