
The premier multi-omics discovery index for cross-repository data orchestration and meta-analysis.
OmicsDI (Omics Discovery Index) is a unified metadata harvester and indexing engine for biological datasets. It bridges the silos between proteomics, genomics, metabolomics, and transcriptomics repositories by providing a standardized search interface. The platform uses a metadata schema based on Schema.org and Bioschemas to harmonize data from more than 20 global repositories, including PRIDE, PeptideAtlas, GEO, and MetaboLights.
For AI-driven drug discovery, OmicsDI offers a "ground truth" metadata layer for training Large Biological Models (LBMs): it helps identify high-quality, peer-reviewed datasets across molecular levels. Its architecture supports semantic linking, so researchers can track a single biological study across multiple omics domains. This cross-omics integration matters for systems biology, where it enables the identification of multi-layered biomarkers and regulatory networks.
OmicsDI is managed by the European Bioinformatics Institute (EMBL-EBI) and international partners, and it supports the FAIR principles (findability, accessibility, interoperability, and reusability) for the global research community.
Algorithms that identify datasets belonging to the same biological experiment across different omics repositories using metadata matching.
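As a rough illustration of metadata matching, the sketch below links datasets from different repositories when they cite the same publication. This is not OmicsDI's actual algorithm; the record layout, the `pubmed_ids` field, the accession values, and the `link_by_publication` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: field names and accessions are placeholders,
# not the index's real schema or real dataset identifiers.
datasets = [
    {"id": "PXD-EXAMPLE-1", "repo": "PRIDE", "pubmed_ids": {"111"}},
    {"id": "GSE-EXAMPLE-1", "repo": "GEO", "pubmed_ids": {"111", "222"}},
    {"id": "MTBLS-EXAMPLE-1", "repo": "MetaboLights", "pubmed_ids": {"333"}},
]

def link_by_publication(records):
    """Return sorted pairs of dataset IDs from different repositories
    that share at least one publication identifier."""
    by_pub = defaultdict(list)
    for rec in records:
        for pmid in rec["pubmed_ids"]:
            by_pub[pmid].append(rec)

    links = set()
    for group in by_pub.values():
        for a, b in combinations(group, 2):
            if a["repo"] != b["repo"]:
                links.add(tuple(sorted((a["id"], b["id"]))))
    return sorted(links)

links = link_by_publication(datasets)
print(links)
```

A production matcher would likely combine several signals (shared authors, sample identifiers, title similarity) instead of relying on a single shared field.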
Natural Language Processing (NLP) is used to extract and index biological entities (genes, proteins, tissues) from dataset descriptions.
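The simplest form of this kind of entity extraction is dictionary lookup, sketched below. It is a stand-in for illustration only: the `LEXICON` contents and the `extract_entities` helper are hypothetical, and a real pipeline would presumably rely on curated ontologies and trained models.

```python
import re

# Hypothetical mini-lexicon keyed by entity type.
LEXICON = {
    "gene": ["TP53", "BRCA1"],
    "tissue": ["liver", "plasma"],
}

def extract_entities(text):
    """Return {entity_type: sorted lexicon terms found as whole words in text}."""
    found = {}
    for entity_type, terms in LEXICON.items():
        hits = [
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        ]
        if hits:
            found[entity_type] = sorted(hits)
    return found

entities = extract_entities("Proteomic profiling of TP53-mutant liver tumours")
print(entities)
```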
A scoring mechanism based on metadata completeness and compliance with community standards (e.g., MIAPE).
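One way such a score could be computed is as a weighted fraction of expected metadata fields that are actually filled in. The sketch below assumes a hypothetical field list and weights (`FIELD_WEIGHTS`); it does not reproduce the index's real scoring rule or the MIAPE checklist.

```python
# Hypothetical expected fields and weights, chosen for illustration only.
FIELD_WEIGHTS = {
    "title": 1.0,
    "description": 1.0,
    "species": 2.0,
    "instrument": 1.0,
    "publication": 2.0,
    "protocol": 1.0,
}

def completeness_score(metadata):
    """Weighted fraction (0.0 to 1.0) of expected fields that are present and non-empty."""
    total = sum(FIELD_WEIGHTS.values())
    present = sum(
        weight for field, weight in FIELD_WEIGHTS.items() if metadata.get(field)
    )
    return present / total

record = {
    "title": "Example study",
    "species": "Homo sapiens",
    "publication": "",  # empty values count as missing
    "protocol": "LC-MS/MS",
}
score = completeness_score(record)
print(score)
```

Weighting fields such as species and publication more heavily reflects the assumption that they matter more for reuse; the actual weights would be a policy decision.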
Full compliance with Bioschemas.org for structured data representation in biology.
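To give a sense of what structured data in this style looks like, the sketch below builds a schema.org `Dataset` description as JSON-LD. All values are placeholders, the `dataset_jsonld` helper is hypothetical, and the exact set of properties a Bioschemas profile requires should be checked against the profile itself.

```python
import json

def dataset_jsonld(name, description, identifier, url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "keywords": keywords,
    }

doc = dataset_jsonld(
    name="Example proteomics study",
    description="Placeholder description of an example dataset.",
    identifier="EXAMPLE-1",
    url="https://example.org/dataset/EXAMPLE-1",
    keywords=["proteomics", "example"],
)
serialized = json.dumps(doc, indent=2)
print(serialized)
```

Markup like this is typically embedded in a dataset's landing page so that harvesters can read the metadata without repository-specific parsers.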
Integration with ORCID to allow researchers to claim datasets they authored.
Provides bubble charts and heatmaps of dataset distributions across repositories and species.
A RESTful microservices architecture that queries multiple backend indexes in real time.
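A client would typically talk to such a service by composing a search URL and reading back JSON. The sketch below only builds the URL; the endpoint in `BASE_URL` and the `query`, `start`, and `size` parameter names are assumptions and should be verified against the service's own API documentation.

```python
from urllib.parse import urlencode

# Assumed search endpoint; confirm against the official API documentation.
BASE_URL = "https://www.omicsdi.org/ws/dataset/search"

def build_search_url(query, start=0, size=20):
    """Compose a paginated search URL from a free-text query."""
    params = {"query": query, "start": start, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"

url = build_search_url("breast cancer", size=5)
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and the response decoded with `json.load`; the shape of the response is not assumed here.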
Aggregating diverse data types for the same biological condition to train LBMs.
Registry Updated: 2/7/2026
Validating a genomic biomarker at the protein level.
Finding all available datasets for a specific species to increase statistical power.