
Last update: 20240919

Data flow

The data generation and analysis process involves several stages, beginning with sample collection and culminating in the presentation of final results. It is designed to integrate with existing clinical databases using newly developed OMICS concepts based on the SPHN ontology. The process is summarised in Figure 1.

We have worked with SPHN and TheHyve to develop new concepts covering the generation and analysis of sequencing data, recently published in @van2023bridging and illustrated in Figure 3. We have also developed the concepts that describe the final outputs of downstream analysis, the omic results. This work enhances the integration of omics data into the SPHN Semantic Interoperability Framework, which primarily handles routine clinical data.

Our new genomics extension enriches the framework to include comprehensive descriptions of genomics experiments, encompassing both clinical and research applications. It outlines the entire omics process flow, detailing steps from sample processing to data analysis, including specifics like library preparation and sequencing analysis. The extension also integrates additional omics metadata, such as details on sequencing instruments and quality control metrics. By aligning with established semantic models and leveraging common biomedical vocabularies (e.g., EDAM, OBI, and FAIR genomes), it promotes semantic interoperability and aims to FAIRify data for shared use within the Swiss network, enhancing data reuse in a unified knowledge graph.

Precision medicine unit data flow from sample collection to final result presentation. **Sample collection**: Sample collection occurs in . SMOC typically processes most physical samples of DNA, RNA, serum, or other tissues. Multi-omic data is generated and transferred to BioMedIT. **Analysis on BioMedIT**: Bioinformatic analysis pipelines process the data and produce a main analysis output, which is stored long-term. Key actionable results from this large dataset are prepared according to the SPHN ontology, using concepts such as "sequencing assay" and "omic result". **Transfer results to** : Two datasets are prepared before transfer to the network. (1) The data for which we have SPHN concepts and which is suitable for a clinical data warehouse, prepared in TSV, SQL, RDF, or another format. (2) Supplemental reports with extensive metadata, visualisations, and contextual information in formats such as TSV, PDF, or HTML. Both datasets are transferred to the network for database integration and file storage, respectively.{#fig:precision_med_dataflow width="60%"}

Figure extract and text quoted from @van2023bridging: (A) Basic excerpt of the schema for the (gen)omics process flow. (B) Diagram visualising an instance of a sequencing assay that analyses one sample and produces one FASTQ file. The "Sequencing Assay" concept, together with the "Instrument", "Library Preparation", "Standard Operating Procedure", and "Quality Control Metric" concepts from which it is composed.{#fig:concept width="80%"}
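To make the modelling concrete, the sketch below builds a single RDF instance of a sequencing assay with rdflib, mirroring the composition shown in the figure: one sample analysed, one FASTQ file produced, and links to instrument, library preparation, standard operating procedure, and quality control metric. All IRIs, property names, and identifiers are illustrative placeholders, not published SPHN identifiers; the authoritative schema is the one described in @van2023bridging.

```python
# Minimal sketch, not the official SPHN schema: every IRI and property name
# below is an illustrative placeholder.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("https://example.org/sphn-omics/")  # placeholder namespace

g = Graph()
g.bind("ex", EX)

assay = EX["SequencingAssay/assay-001"]
g.add((assay, RDF.type, EX.SequencingAssay))

# Concepts the assay is composed of, as in panel (B) of the figure
g.add((assay, EX.hasInstrument, EX["Instrument/sequencer-01"]))
g.add((assay, EX.hasLibraryPreparation, EX["LibraryPreparation/lp-42"]))
g.add((assay, EX.hasStandardOperatingProcedure, EX["SOP/sop-wgs-v3"]))
g.add((assay, EX.hasQualityControlMetric, EX["QualityControlMetric/qc-7"]))

# One analysed sample and one FASTQ output file
g.add((assay, EX.analyzesSample, EX["Sample/sample-123"]))
g.add((assay, EX.producesFile, EX["DataFile/sample-123_R1.fastq.gz"]))
g.add((EX["DataFile/sample-123_R1.fastq.gz"], EX.hasFormat, Literal("FASTQ")))

print(g.serialize(format="turtle"))
```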

Sample collection and initial processing

  • Location: .

  • Details: Sample collection encompasses various biological materials including DNA, RNA, serum, and other tissue types. These are initially processed by the Swiss Multi-Omics Center (SMOC), which is responsible for the physical handling and preliminary omics data generation.

Data transfer and bioinformatics analysis

  • Transfer: The raw multi-omic data generated by SMOC is transferred to BioMedIT using secure protocols such as SFTP; a minimal transfer sketch follows this list.

  • Bioinformatics processing: At BioMedIT, advanced bioinformatics pipelines are employed to analyse the data. This includes comprehensive analyses across metabolomics, proteomics, and genomics disciplines.

  • Outputs: The main outputs from these analyses include:

    • A large dataset stored for long-term reuse and research purposes.

    • Standard reports generated in formats such as TSV, PDF, and HTML.

    • Result data formatted in RDF, SQL, or TSV, which is then adapted to meet the SPHN connector's requirements for merging into clinical data warehouses.
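As referenced in the transfer step above, the following is a minimal sketch of an SFTP upload to BioMedIT, assuming key-based SSH access to a landing zone. The host name, account, key path, and file names are hypothetical placeholders, and the real transfer follows BioMedIT's prescribed procedures.

```python
# Hedged sketch of an SFTP upload; host, user, key, and paths are placeholders.
import paramiko

HOST = "sftp.biomedit.example"          # placeholder endpoint
USER = "smoc-transfer"                  # placeholder service account
KEY = "/home/smoc/.ssh/id_ed25519"      # placeholder private key

ssh = paramiko.SSHClient()
# In production, load and verify known host keys instead of auto-adding them.
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(HOST, username=USER, key_filename=KEY)

sftp = ssh.open_sftp()
# Upload one raw multi-omic data file (e.g. an encrypted FASTQ archive).
sftp.put("runs/sample-123_R1.fastq.gz.gpg", "incoming/sample-123_R1.fastq.gz.gpg")
sftp.close()
ssh.close()
```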

Data conversion and integration

  • Conversion: Key actionable results are extracted from the large dataset and prepared according to our reporting evidence guidelines and formatted using the SPHN ontology. This preparation uses specific OMICS concepts such as “omic result” to ensure that the data can be seamlessly integrated and interpreted within the clinical framework.

  • Integration: The processed results are converted to fit the database requirements of the hospital's clinical data warehouse; a minimal loading sketch follows this list.
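The sketch below illustrates the loading side of the integration step, using sqlite3 as a stand-in for the hospital's clinical data warehouse. The table and column names are assumptions made for illustration; the real target schema is dictated by the warehouse and the SPHN connector.

```python
# Hedged sketch: load prepared "omic result" records from TSV into a relational
# table. Table and column names are illustrative assumptions only.
import csv
import sqlite3

conn = sqlite3.connect("warehouse_stub.db")  # stand-in for the warehouse
conn.execute(
    """CREATE TABLE IF NOT EXISTS omic_result (
           patient_id     TEXT,
           assay_id       TEXT,
           gene           TEXT,
           variant        TEXT,
           interpretation TEXT
       )"""
)

with open("omic_results.tsv", newline="") as fh:
    reader = csv.DictReader(fh, delimiter="\t")
    rows = [
        (r["patient_id"], r["assay_id"], r["gene"], r["variant"], r["interpretation"])
        for r in reader
    ]

conn.executemany("INSERT INTO omic_result VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```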

Presentation of final results

  • Internal Network Transfer: Outputs, including the standard reports and result data, are transferred back to the network. This step is crucial as clinicians do not have direct access to secured BioMedIT servers.

  • Access and Presentation: Final analysis results are made accessible to clinicians through an internal webpage and as downloadable TSV, PDF, or other formats; a minimal export sketch follows this list. This ensures that the results are readily available for clinical decision-making and further research. The clinical data warehouse will maintain the main omic result data in RDF, SQL, or other suitable formats that best match the current system.
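As a small illustration of the export step, the sketch below turns a prepared result TSV into a simple HTML table and a clinician-facing TSV download. The file and column names are assumed for illustration only; the actual presentation layer is the internal webpage described above.

```python
# Hedged sketch of exporting clinician-facing result files; names are placeholders.
import pandas as pd

results = pd.read_csv("omic_results.tsv", sep="\t")

# Keep only the columns a clinician needs (assumed names, for illustration).
summary = results[["patient_id", "gene", "variant", "interpretation"]]

summary.to_html("omic_results_summary.html", index=False)          # internal webpage snippet
summary.to_csv("omic_results_summary.tsv", sep="\t", index=False)  # downloadable TSV
```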

By aligning with the SPHN RDF ontology and implementing it through newly developed OMICS concepts, this data flow ensures that genomic and other omic data types are integrated into the hospital's clinical operations, enhancing the capacity for precision medicine and personalised patient care. The entire process is illustrated in Figure 1, which provides a visual representation of the data flow from sample collection to final result presentation within the hospital's infrastructure.