datasets | VALIANT /valiant Vanderbilt Advanced Lab for Immersive AI Translation (VALIANT) Thu, 26 Mar 2026 20:17:22 +0000 en-US hourly 1 Loneliness, Anxiety Symptoms, Depressive Symptoms, and Suicidal Ideation in the All of Us Dataset /valiant/2026/03/26/loneliness-anxiety-symptoms-depressive-symptoms-and-suicidal-ideation-in-the-all-of-us-dataset/ Thu, 26 Mar 2026 20:17:22 +0000 /valiant/?p=6368 Katherine Musacchio Schafer; Jacob Franklin; Peter J. Embí; Colin G. Walsh (2026). JAMA Network Open, 9(3), e260596.

This study examined how feelings of loneliness may help explain the link between anxiety, depression, and suicidal ideation (thinking about suicide). Using survey data from over 62,000 adults in the U.S., researchers measured anxiety symptoms, depressive symptoms, loneliness, and suicidal thoughts using standard mental health questionnaires. They found that all three (anxiety, depression, and loneliness) were positively related to suicidal ideation, meaning higher levels of each were associated with more frequent suicidal thoughts.

Importantly, the study showed that loneliness acts as a mediator, meaning it partly explains how anxiety and depression are connected to suicidal ideation. In other words, people with anxiety or depression may be more likely to feel lonely, and that loneliness, in turn, increases the likelihood of suicidal thoughts. While anxiety and depression still had direct effects on suicidal ideation, loneliness accounted for a meaningful portion of this relationship.
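To make the idea of mediation concrete, here is a minimal sketch of the classic product-of-coefficients decomposition on synthetic data. It is not the study's model or data: the variable names, effect sizes, and the linear-regression setup are all assumptions chosen for illustration. The sketch splits a total effect into a direct part and an indirect part that runs through the mediator.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Population covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def simple_slope(x, y):
    """OLS slope of y regressed on x (intercept included)."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x, m, y):
    """OLS slopes of y regressed on x and m together (intercept included),
    obtained by solving the 2x2 normal equations."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    slope_x = (sxy * smm - smy * sxm) / det
    slope_m = (smy * sxx - sxy * sxm) / det
    return slope_x, slope_m

# Synthetic data only: the coefficients 0.5, 0.3, and 0.4 are made up.
rng = random.Random(0)
n = 5000
symptoms = [rng.gauss(0, 1) for _ in range(n)]
loneliness = [0.5 * s + rng.gauss(0, 1) for s in symptoms]
ideation = [0.3 * s + 0.4 * lon + rng.gauss(0, 1)
            for s, lon in zip(symptoms, loneliness)]

a = simple_slope(symptoms, loneliness)           # symptoms -> loneliness
direct, b = two_predictor_slopes(symptoms, loneliness, ideation)
total = simple_slope(symptoms, ideation)         # symptoms -> ideation overall
indirect = a * b                                 # path through loneliness
proportion_mediated = indirect / total

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
print(f"proportion mediated={proportion_mediated:.2f}")
```

For ordinary least squares fitted on the same sample, the total effect equals the direct effect plus the indirect effect, which is why the share "accounted for" by the mediator can be expressed as a proportion. Published mediation analyses typically add covariates and confidence intervals, which this sketch omits.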

Overall, the findings suggest that addressing loneliness could be a key strategy for reducing suicide risk, alongside treating anxiety and depression. By targeting loneliness through social support, community engagement, or other interventions, it may be possible to interrupt the pathway from mental health symptoms to suicidal thinking.

Figure 1. Participant Flowchart From the All of Us Research Program

Microenvironment-aware spatial modeling for accurate inference of cell identity /valiant/2026/01/28/microenvironment-aware-spatial-modeling-for-accurate-inference-of-cell-identity/ Wed, 28 Jan 2026 15:26:03 +0000 /valiant/?p=5652 Liu, Qi; Wang, Yu; Hsu, Chihyuan; Wanjalla, Celestine N.; Lau, Ken S.; & Shyr, Yu. (2026). Nucleic Acids Research, 54(1).

Spatial omics technologies make it possible to measure many molecular features in cells while also preserving information about where those cells are located in a tissue. This spatial context provides valuable insight into how cells are organized and how tissues are structured. New platforms that work at single-cell resolution have further improved our ability to detect cell states that depend on the surrounding microenvironment. However, most existing computational tools for analyzing spatial omics data focus on identifying broad spatial regions rather than determining the identities of individual cells.

Traditional single-cell clustering methods define cell identities using only molecular features inside the cell, such as gene expression, and do not account for how nearby cells and the local tissue environment influence cell behavior. To address this limitation, we introduce MEcell, a method that directly incorporates spatial information and automatically determines how much influence the surrounding environment should have when identifying cell types. MEcell does not require users to tune parameters, making it easier to apply across datasets.

We tested MEcell on 90 simulated datasets and 7 real-world datasets from multiple spatial transcriptomics platforms and tissue types, including MERFISH (Vizgen), Xenium, CosMx, Visium HD, Slide-seqV2, and Open-ST. Across all tests, MEcell consistently performed better than existing methods at accurately identifying cell identities. These results show that the local microenvironment plays a crucial role in defining cell identity and demonstrate that MEcell is a powerful tool for capturing the full diversity of cells in spatial omics data.

Figure 1.

The rationale of MEcell. (A) A toy example where a cell's transcriptionally similar neighbors are located within the same microenvironment, indicating that the microenvironment will play a minimal role in shaping the nearest-neighbor graph. (B) A toy example where a cell's transcriptionally similar neighbors are located across distinct microenvironments, suggesting that the microenvironment will have a significant influence on the nearest-neighbor graph.

Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step /valiant/2025/12/19/biomedical-data-repositories-require-governance-for-artificial-intelligence-machine-learning-applications-at-every-step/ Fri, 19 Dec 2025 16:44:58 +0000 /valiant/?p=5557 Clayton, E. W., Rose, S., Nebecker, C., Novak, L., Bensoussan, Y. E., Chen, Y., Collins, B. X., Cordes, A., Evans, B. J., Ferryman, K. S., Hurst, S., Jiang, X., Lee, A. Y., McWeeney, S., Parker, J., Bélisle-Pipon, J.-C., Rosenthal, E. S., Yin, Z., Yracheta, J. M., & Malin, B. A. (2025). JAMIA Open, 8(6), ooaf134.

This article examines the experience of the NIH's Bridge2AI Program, which funded four large biomedical and behavioral datasets designed to be well documented and ready for use with artificial intelligence (AI) and machine learning (ML). The goal of these datasets is to encourage responsible and effective use of AI in research, but building them raised many ethical, legal, social, and practical challenges. The authors describe the key steps involved in creating and managing these AI-ready datasets, including deciding which data to collect and why, responding to public concerns, handling participant consent based on how the data were obtained, ensuring responsible future use, determining where and how data are stored, clarifying how much control participants have over data sharing, and setting rules for data access and downloading.

Across these steps, the projects faced important questions about long-term data storage, future uses of the data, and how to balance openness with privacy and participant protection. The authors highlight the different choices made by the four projects, such as how they gathered public input, selected data storage solutions, and defined criteria for who can access and download the data. Although the governance approaches varied, common themes emerged, suggesting shared best practices.

Overall, the article summarizes key lessons learned from the Bridge2AI Program about how to collect, manage, and govern large datasets intended for AI and ML. These insights can guide future initiatives in designing datasets that are not only technically useful for AI, but also ethically sound, socially responsible, and trustworthy.

Figure 1.

Steps in governance of data collection and decision-making and responsible use for the development of AI with greater attention to public concerns throughout. The first 2 steps (promoting responsible selection) address the primary work of the data generation projects (DGPs), while the remaining 4 steps (promoting responsible use) are crucial factors the DGPs must consider.

Fine-grained multiclass nuclei segmentation with molecular empowered all-in-SAM model /valiant/2025/11/23/fine-grained-multiclass-nuclei-segmentation-with-molecular-empowered-all-in-sam-model/ Sun, 23 Nov 2025 16:56:52 +0000 /valiant/?p=5466 Li, Xueyuan., Cui, Can., Deng, Ruining., Tang, Yucheng., Liu, Quan., Yao, Tianyuan., Bao, Shunxing., Chowdhury, Naweed Iffat., Yang, Haichun., & Huo, Yuankai. (2025). Journal of Medical Imaging, 12(5), 57501.

Recent advances in computational pathology (the use of computers to analyze tissue images) have been driven by Vision Foundation Models (VFMs), particularly the Segment Anything Model (SAM). SAM, a type of VFM, can segment, or outline, cell nuclei using either prompts (zero-shot segmentation) or specialized cell-focused models, allowing it to work across many types of cells. However, general VFMs often struggle with fine-grained tasks, such as identifying specific nuclei subtypes or particular cells.

To address this, we developed the molecular empowered all-in-SAM model, which enhances SAM and VFMs for more precise pathology analysis. Our approach has three key components: (1) annotation, where molecular-informed guidance allows even non-experts to label images without detailed pixel-level work; (2) learning, where SAM is adapted with a SAM adapter to focus on specific cell types and biological features; and (3) refinement, which improves segmentation accuracy through molecular-oriented corrective learning.

Testing on both in-house and public datasets showed that all-in-SAM greatly improves cell classification, even when annotation quality varies. This approach reduces the workload for human annotators and makes precise biomedical image analysis more accessible, especially in resource-limited settings, supporting advances in automated pathology and medical diagnostics using VFMs.

Fig. 1

Overall idea of our work: this diagram illustrates the distinctions between our approach (bottom panel) and existing methods. (1) Traditional: expert annotators manually label cells using only PAS (periodic acid-Schiff) images. (2) MOCL: lay annotators provide pixel-level labels under the guidance of IF (immunofluorescence) molecular images, followed by the application of deep learning for segmentation. (3) SAM-L: the SAM technique is utilized to expedite the annotation process, requiring only minimal (box) annotations. (4) All-in-SAM (our method): we integrate SAM in the annotation phase and adaptively fine-tune it during the training of the model.

Multimodal state-dependent connectivity analysis of arousal and autonomic centers in the brainstem and basal forebrain /valiant/2025/08/25/multimodal-state-dependent-connectivity-analysis-of-arousal-and-autonomic-centers-in-the-brainstem-and-basal-forebrain/ Mon, 25 Aug 2025 19:49:23 +0000 /valiant/?p=5012 Pourmotabbed, Haatef, Martin, Caroline G., Goodale, Sarah E., Doss, Derek J., Wang, Shiyu, Bayrak, Roza G., Kang, Hakmook, Morgan, Victoria L., Englot, Dario J., & Chang, Catie E. (2025). Imaging Neuroscience, 3, IMAG.a.91.

Vigilance, or how alert and awake we are, constantly changes and affects our thinking and behavior. This state can be disrupted in many brain disorders. Certain areas deep in the brain, called neuromodulatory nuclei in the brainstem and basal forebrain, help regulate alertness and drive widespread brain activity and communication. However, it is not well understood how the brain's large-scale networks change when we shift between being alert and drowsy.

In this study, we used simultaneous EEG (which measures brain electrical activity) and advanced fMRI scans to explore how these arousal centers connect with other parts of the brain depending on vigilance. We found that when people are drowsy, most of these nuclei show stronger global connections, especially to regions like the thalamus, precuneus, and sensory and motor areas. When people are more alert, the nuclei connect most strongly to networks involved in attention, internal thought, and hearing. These patterns remained consistent even after controlling for blood flow effects.

To confirm our findings, we analyzed two large brain imaging datasets and showed that these connectivity patterns are reproducible across different types of fMRI scans. Overall, this study provides new insights into how brain regions that regulate arousal influence large-scale brain activity depending on our level of alertness.

Fig 1 – Reproducible static connectivity profiles of neuromodulatory arousal centers. (a) Static functional connectivity (FC) t-maps of the locus coeruleus (LC), cuneiform/subcuneiform nucleus (CSC), and nucleus basalis of Meynert (NBM) in the VU 3T-ME, HCP 3T, and HCP 7T datasets for the mCSF/WM preprocessing pipeline. The FC t-maps were thresholded at 40% of the top t-values in the gray matter and at p < 0.05 (voxel-wise false discovery rate [FDR]-corrected over the entire gray matter volume). AFNI was used for visualization of the t-maps (@chauffeur_afni function; upper functional range set to the 98th percentile). (b) Spatial overlap of the thresholded static FC t-maps of the subcortical arousal regions with 16 canonical brain network templates from the FINDLAB and Melbourne atlases (Shirer et al., 2012; Tian et al., 2020). A positive value for the spatial overlap corresponds to mostly positive correlations within the brain network template while a negative value corresponds to mostly negative correlations. (c) Spatial reproducibility (Dice similarity coefficient) of the thresholded static FC t-maps between the three fMRI datasets.
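The Dice similarity coefficient mentioned in panel (c) measures how much two binary maps overlap: twice the size of the intersection divided by the sum of the two sizes. The sketch below is a generic illustration, not the authors' pipeline. The voxel indices are made up, and returning 1.0 for two empty maps is a convention assumed here.

```python
def dice_coefficient(voxels_a, voxels_b):
    """Dice similarity between two collections of above-threshold voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        # Assumed convention: two empty maps count as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical thresholded maps from two datasets, as voxel indices.
map_dataset_1 = {1, 2, 3, 4}
map_dataset_2 = {3, 4, 5, 6}

print(dice_coefficient(map_dataset_1, map_dataset_2))  # 0.5
```

A value of 1 means the two thresholded maps cover exactly the same voxels, and 0 means they share none.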

Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets /valiant/2025/08/25/scalable-quality-control-on-processing-of-large-diffusion-weighted-and-structural-magnetic-resonance-imaging-datasets/ Mon, 25 Aug 2025 19:45:55 +0000 /valiant/?p=5008 Kim, Michael E., Gao, Chenyu, Newlin, Nancy R., Rudravaram, Gaurav, Krishnan, Aravind R., Ramadass, Karthik, Kanakaraj, Praitayini, Schilling, Kurt G., Dewey, Blake E., & Bennett, David Alan. (2025). PLOS ONE, 20(8), e0327388.

Careful quality control (QC) is essential when working with large medical imaging datasets, because poor-quality data can lead to wrong conclusions or poorly trained machine learning models. However, QC can be very time consuming. Most existing methods try to save time using automated tools that detect unusual data points, but these tools cannot catch every mistake. This means researchers still need to visually check the results of data processing in a reliable and scalable way.

In this study, we designed a QC pipeline for a large collection of brain scans, including diffusion-weighted and structural MRI. Our method was built to: (1) provide a consistent way for teams of researchers to perform and manage QC, (2) allow fast visualization of preprocessed data so the process is quicker without sacrificing quality, and (3) make it easy to combine and share QC results across datasets and pipelines.

We tested our method by comparing it to an automated QC approach on a set of 1,560 brain scans, and by measuring how much agreement there was between different researchers performing QC. The results showed mostly high agreement among researchers and only small differences compared to the automated method. Overall, while visual QC still takes time, our approach makes the process more streamlined and efficient.

Fig 1. Issues with automatic and team-based QC.

When maintaining large neuroimaging datasets with multiple processing pipelines, shallow quality control processes that rely on derived metrics can fail to catch instances of algorithmic failures. However, deep QC processes quickly become unscalable and inefficient as the amount of data available increases due to the required time for mass visualization of outputs. For example, opening 50,000 T1w images separately in an image viewer for deep QC can take over 60 hours if it takes five seconds to load images in and out of the viewer. Team-driven efforts to alleviate such large time costs come with additional challenges due to inconsistencies in reporting and methods of performing QC.
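The time estimate in the caption can be checked with a few lines of arithmetic. The image count and per-image load time come from the caption. The team size of 10 reviewers is a hypothetical value added here only to show how the cost divides.

```python
n_images = 50_000          # from the caption
seconds_per_image = 5      # from the caption: time to load an image in and out

total_hours = n_images * seconds_per_image / 3600
print(f"{total_hours:.1f} hours")  # 69.4 hours

# Hypothetical: splitting the work evenly across 10 reviewers.
n_reviewers = 10
hours_per_reviewer = total_hours / n_reviewers
print(f"{hours_per_reviewer:.1f} hours per reviewer")  # 6.9 hours per reviewer
```

This counts loading time only. Time spent actually inspecting each image would come on top of it.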

Edge Classification on Graphs: New Directions in Topological Imbalance /valiant/2025/04/23/edge-classification-on-graphs-new-directions-in-topological-imbalance/ Wed, 23 Apr 2025 13:57:14 +0000 /valiant/?p=4214 Cheng, Xueqi; Wang, Yu; Liu, Yunchao; Zhao, Yuying; Aggarwal, Charu C.; Derr, Tyler. WSDM 2025 – Proceedings of the 18th ACM International Conference on Web Search and Data Mining (2025): 392-400.

Recent years have seen great success in using Graph Machine Learning (GML) for tasks like node and graph classification, and predicting links between nodes. However, edge classification, which has many real-world uses such as analyzing social networks and improving cybersecurity, has not progressed as much, even with the rise of GML methods.

To address this gap, our study presents a comprehensive approach to edge classification. We identify a novel problem called the Topological Imbalance Issue, which happens when edges are unevenly distributed across classes. This imbalance affects the structure around each edge and reduces classification performance.

Inspired by recent work showing how node classification accuracy can vary with local graph patterns, we explore whether similar local structure differences affect edge classification. To do this, we introduce Topological Entropy (TE), a new metric that measures how imbalanced the local edge class distribution is. Our results show that TE accurately reflects this local imbalance and that focusing on edges with high TE can improve edge classification.
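The summary does not give the formula for TE, so the following is only a rough illustration of the general idea of scoring each edge by the class distribution of the edges around it. It uses plain Shannon entropy over the labels of edges that share an endpoint with the target edge. This is a stand-in chosen for the example, not the paper's definition, and the direction and scaling of the real metric may differ. The function name, toy graph, and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def local_edge_class_entropy(edges, labels):
    """Shannon entropy (in bits) of the class labels of edges that share
    an endpoint with each edge. Returns {edge index: entropy}."""
    incident = defaultdict(set)          # node -> indices of edges touching it
    for idx, (u, v) in enumerate(edges):
        incident[u].add(idx)
        incident[v].add(idx)

    scores = {}
    for idx, (u, v) in enumerate(edges):
        neighbors = (incident[u] | incident[v]) - {idx}
        counts = Counter(labels[j] for j in neighbors)
        total = sum(counts.values())
        if total == 0:
            scores[idx] = 0.0            # isolated edge: no local distribution
            continue
        scores[idx] = sum((c / total) * math.log2(total / c)
                          for c in counts.values())
    return scores

# Hypothetical toy graph: four edges with two classes.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
labels = ["friend", "friend", "colleague", "colleague"]

scores = local_edge_class_entropy(edges, labels)
for idx, score in scores.items():
    print(edges[idx], labels[idx], round(score, 3))
```

In this stand-in, an edge whose neighbors all belong to one class scores 0, and an edge whose neighbors are evenly split between two classes scores 1 bit. A per-edge score of this kind is the sort of quantity that could then drive a reweighting scheme during training.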

Based on this insight, we propose two strategies:

  1. Topological Reweighting, which adjusts training weights based on TE, and
  2. TE Wedge-based Mixup, which creates synthetic edges between highly imbalanced areas to improve training.

We combine these into a new edge classification strategy called TopoEdge, designed specifically to address topological imbalance. Experiments on real-world datasets show that our methods significantly improve performance. Our code and data are available at , and we also provide curated datasets and testing setups to serve as a new benchmark for future edge classification research.
