
Last update: 20240919

Data flow

The data generation and analysis process involves several stages, beginning with sample collection and culminating in the presentation of final results. It is designed to integrate with existing clinical databases using newly developed OMICS concepts based on the SPHN ontology. The process is summarised in Figure 1.

We have worked with SPHN and TheHyve to develop new concepts covering the generation and analysis of sequencing data, recently published in @van2023bridging and illustrated in Figure 3. We have also developed the concepts that describe the final outputs of downstream analysis, the omic results. This work enhances the integration of omics data into the SPHN Semantic Interoperability Framework, which primarily handles routine clinical data.

Our new genomics extension enriches the framework to include comprehensive descriptions of genomics experiments, encompassing both clinical and research applications. It outlines the entire omics process flow, detailing steps from sample processing to data analysis, including specifics like library preparation and sequencing analysis. The extension also integrates additional omics metadata, such as details on sequencing instruments and quality control metrics. By aligning with established semantic models and leveraging common biomedical vocabularies (e.g., EDAM, OBI, and FAIR genomes), it promotes semantic interoperability and aims to FAIRify data for shared use within the Swiss network, enhancing data reuse in a unified knowledge graph.

Precision medicine unit data flow from sample collection to final result presentation. **Sample collection**: Sample collection occurs in . SMOC typically processes most physical samples of DNA, RNA, serum, or other tissues. Multi-omic data is generated and transferred to BioMedIT. **Analysis on BioMedIT**: Bioinformatic analysis pipelines process the data and produce a main analysis output, which is stored long-term. Key actionable results from this large dataset are prepared according to the SPHN ontology, using concepts such as "sequencing assay" and "omic result". **Transfer results to** : Two datasets are prepared before transfer to the network. (1) The data for which we have SPHN concepts and which is suitable for a clinical data warehouse, prepared in TSV, SQL, RDF, or another format. (2) Supplemental reports with extensive metadata, visualisations, and contextual information in formats such as TSV, PDF, or HTML. Both datasets are transferred to the network for database integration and file storage, respectively.{#fig:precision_med_dataflow width="60%"}

Figure extract and text quoted from @van2023bridging: (A) Basic excerpt of the schema for the (gen)omics process flow. (B) Diagram visualising an instance of a sequencing assay that analyses one sample and produces one FASTQ file. The "Sequencing Assay" concept, together with the "Instrument", "Library Preparation", "Standard Operating Procedure", and "Quality Control Metric" concepts from which it is composed.{#fig:concept width="80%"}
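To make the modelling concrete, the sketch below builds a single RDF instance of a sequencing assay with rdflib, mirroring the composition shown in the figure: one sample analysed, one FASTQ file produced, and links to instrument, library preparation, standard operating procedure, and quality control metric. All IRIs, property names, and identifiers are illustrative placeholders, not published SPHN identifiers; the authoritative schema is the one described in @van2023bridging.

```python
# Minimal sketch, not the official SPHN schema: every IRI and property name
# below is an illustrative placeholder.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("https://example.org/sphn-omics/")  # placeholder namespace

g = Graph()
g.bind("ex", EX)

assay = EX["SequencingAssay/assay-001"]
g.add((assay, RDF.type, EX.SequencingAssay))

# Concepts the assay is composed of, as in panel (B) of the figure
g.add((assay, EX.hasInstrument, EX["Instrument/sequencer-01"]))
g.add((assay, EX.hasLibraryPreparation, EX["LibraryPreparation/lp-42"]))
g.add((assay, EX.hasStandardOperatingProcedure, EX["SOP/sop-wgs-v3"]))
g.add((assay, EX.hasQualityControlMetric, EX["QualityControlMetric/qc-7"]))

# One analysed sample and one FASTQ output file
g.add((assay, EX.analyzesSample, EX["Sample/sample-123"]))
g.add((assay, EX.producesFile, EX["DataFile/sample-123_R1.fastq.gz"]))
g.add((EX["DataFile/sample-123_R1.fastq.gz"], EX.hasFormat, Literal("FASTQ")))

print(g.serialize(format="turtle"))
```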

Sample collection and initial processing

  • Location: .

  • Details: Sample collection encompasses various biological materials including DNA, RNA, serum, and other tissue types. These are initially processed by the Swiss Multi-Omics Center (SMOC), which is responsible for the physical handling and preliminary omics data generation.

Data transfer and bioinformatics analysis

  • Transfer: The raw multi-omic data generated by SMOC is transferred to BioMedIT using secure protocols such as SFTP; a minimal transfer sketch follows this list.

  • Bioinformatics processing: At BioMedIT, advanced bioinformatics pipelines are employed to analyse the data. This includes comprehensive analyses across metabolomics, proteomics, and genomics disciplines.

  • Outputs: The main outputs from these analyses include:

    • A large dataset stored for long-term reuse and research purposes.

    • Standard reports generated in formats such as TSV, PDF, and HTML.

    • Result data formatted in RDF, SQL, or TSV, which is then adapted to meet the SPHN connector's requirements for merging into clinical data warehouses.
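As referenced in the transfer step above, the following is a minimal sketch of an SFTP upload to BioMedIT, assuming key-based SSH access to a landing zone. The host name, account, key path, and file names are hypothetical placeholders, and the real transfer follows BioMedIT's prescribed procedures.

```python
# Hedged sketch of an SFTP upload; host, user, key, and paths are placeholders.
import paramiko

HOST = "sftp.biomedit.example"          # placeholder endpoint
USER = "smoc-transfer"                  # placeholder service account
KEY = "/home/smoc/.ssh/id_ed25519"      # placeholder private key

ssh = paramiko.SSHClient()
# In production, load and verify known host keys instead of auto-adding them.
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(HOST, username=USER, key_filename=KEY)

sftp = ssh.open_sftp()
# Upload one raw multi-omic data file (e.g. an encrypted FASTQ archive).
sftp.put("runs/sample-123_R1.fastq.gz.gpg", "incoming/sample-123_R1.fastq.gz.gpg")
sftp.close()
ssh.close()
```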

Data conversion and integration

  • Conversion: Key actionable results are extracted from the large dataset and prepared according to our reporting evidence guidelines and formatted using the SPHN ontology. This preparation uses specific OMICS concepts such as “omic result” to ensure that the data can be seamlessly integrated and interpreted within the clinical framework.

  • Integration: The processed results are converted to fit the database requirements of the hospital's clinical data warehouse; a minimal loading sketch follows this list.
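The sketch below illustrates the loading side of the integration step, using sqlite3 as a stand-in for the hospital's clinical data warehouse. The table and column names are assumptions made for illustration; the real target schema is dictated by the warehouse and the SPHN connector.

```python
# Hedged sketch: load prepared "omic result" records from TSV into a relational
# table. Table and column names are illustrative assumptions only.
import csv
import sqlite3

conn = sqlite3.connect("warehouse_stub.db")  # stand-in for the warehouse
conn.execute(
    """CREATE TABLE IF NOT EXISTS omic_result (
           patient_id     TEXT,
           assay_id       TEXT,
           gene           TEXT,
           variant        TEXT,
           interpretation TEXT
       )"""
)

with open("omic_results.tsv", newline="") as fh:
    reader = csv.DictReader(fh, delimiter="\t")
    rows = [
        (r["patient_id"], r["assay_id"], r["gene"], r["variant"], r["interpretation"])
        for r in reader
    ]

conn.executemany("INSERT INTO omic_result VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```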

Presentation of final results

  • Internal Network Transfer: Outputs, including the standard reports and result data, are transferred back to the network. This step is crucial as clinicians do not have direct access to secured BioMedIT servers.

  • Access and Presentation: Final analysis results are made accessible to clinicians through an internal webpage and as downloadable TSV, PDF, or other formats; a minimal export sketch follows this list. This ensures that the results are readily available for clinical decision-making and further research. The clinical data warehouse will maintain the main omic result data in RDF, SQL, or other suitable formats that best match the current system.
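As a small illustration of the export step, the sketch below turns a prepared result TSV into a simple HTML table and a clinician-facing TSV download. The file and column names are assumed for illustration only; the actual presentation layer is the internal webpage described above.

```python
# Hedged sketch of exporting clinician-facing result files; names are placeholders.
import pandas as pd

results = pd.read_csv("omic_results.tsv", sep="\t")

# Keep only the columns a clinician needs (assumed names, for illustration).
summary = results[["patient_id", "gene", "variant", "interpretation"]]

summary.to_html("omic_results_summary.html", index=False)          # internal webpage snippet
summary.to_csv("omic_results_summary.tsv", sep="\t", index=False)  # downloadable TSV
```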

By aligning with the SPHN RDF ontology and implementing it through newly developed OMICS concepts, this data flow ensures that genomic and other omic data types are integrated into the hospital's clinical operations, enhancing the capacity for precision medicine and personalised patient care. The entire process is illustrated in Figure 1, which provides a visual representation of the data flow from sample collection to final result presentation within the hospital's infrastructure.