Systems epidemiology is a new research discipline that seeks to integrate pathway analyses into observational study designs. Its goal is to improve our understanding of biological processes in humans, viewed as time-dependent changes, or trajectories, in functional genomics. This chapter guides the reader through the different parts of the book. The aim is to improve understanding of the data structures that arise from complex study designs, of data handling, of new statistical methods, and of the interpretation of results.
The increasing use of omics data in epidemiology enables many novel study designs, but it also introduces challenges for data analysis. We describe the possibilities for systems epidemiological designs in the Norwegian Women and Cancer (NOWAC) study and show how the complexity of NOWAC enables many elegant new study designs. We discuss the challenges of implementing these designs and analyzing the resulting data. Finally, we propose a systems architecture for the swift design and exploration of epidemiological studies.
Standardizing and documenting computational analyses is necessary to ensure reproducible results. We describe an R-based implementation of data management and preprocessing that is well integrated with the analysis tools typically used for statistical analysis of omics data. We have used these tools to organize data storage and documentation, and to standardize the analysis of gene expression data, in the Norwegian Women and Cancer study.
For tissue-based studies of breast cancer, getting access to truly normal, well-annotated tissue can be a challenge. To address that need, we collected 368 breast tissue biopsies and buffered blood samples from healthy postmenopausal women. The volunteers were part of the Norwegian Women and Cancer (NOWAC) Postgenome cohort and were recruited through the national mammography screening program. The NOWAC normal breast tissue biobank for gene expression analysis will provide an appropriate reference for comparison in case-control studies.
Omics researchers routinely rely on hypothesis tests, but used by default, these tests can make highly inefficient use of omics data. Through a familiar example, we demonstrate the need for exploratory approaches and show how common statistical tools such as p-values and confidence intervals can be used in exploratory omics research. We discuss the often-misunderstood hypothesis test and emphasize its lesser-known flexibility. This work aims to help non-statisticians make better use of statistical tools in omics research.
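One common exploratory use of p-values is to look at their distribution across all genes instead of asking which individual genes pass a cut-off. This is offered here only as an illustration and is not necessarily the example used in the chapter. Under the null hypothesis, p-values are uniform on [0, 1]. The fraction of p-values above a threshold λ therefore estimates the share of genes with no effect (a Storey-type π0 estimator). The function names and the default λ = 0.5 are assumptions of this sketch.

```python
import numpy as np


def pvalue_histogram(pvalues, bins=10):
    """Counts of p-values per bin on [0, 1]; a flat histogram suggests no signal."""
    counts, _ = np.histogram(np.asarray(pvalues), bins=bins, range=(0.0, 1.0))
    return counts


def estimate_null_fraction(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of genes with no true effect.

    Null p-values are uniform, so roughly pi0 * m * (1 - lam) of the m
    p-values should fall above lam; solving for pi0 gives the estimate.
    lam = 0.5 is an illustrative default, not a recommendation.
    """
    p = np.asarray(pvalues)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return float(min(1.0, pi0))  # cap at 1, since a proportion cannot exceed it


def estimated_signal_genes(pvalues, lam=0.5):
    """Rough number of genes carrying some signal, without saying which ones."""
    return len(pvalues) * (1.0 - estimate_null_fraction(pvalues, lam))
```

For example, with 10,000 genes and an estimated π0 of 0.8, about 2,000 genes would be expected to carry some signal, even if few of them pass a multiple-testing threshold individually.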
We develop new statistical methods for analyzing sparsely sampled curves that vary in time. A typical dataset consists of case-control differences in log gene expression for a large number of genes, with each pair sampled at some time relative to diagnosis. We focus on weak signals spread across many genes rather than strong signals in a few genes. The methods are based on moving windows in time, hypothesis testing, dimension reduction, and randomization of the time to observation.
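The four ingredients listed above can be combined as in the following sketch. It is a simplified illustration, not the chapter's actual method. The sketch makes several assumptions:

- each row of `diffs` holds one case-control pair's log-expression differences;
- `times` gives each pair's sampling time relative to diagnosis, with negative values meaning before diagnosis;
- the dimension reduction within a window is a count of genes with |t| > 1.96;
- the randomization permutes sampling times among pairs.

All function names are hypothetical.

```python
import numpy as np


def window_statistic(diffs, times, center, width, z_crit=1.96):
    """Count genes whose mean case-control difference is large within one time window.

    diffs: (n_pairs, n_genes) log-expression differences (case minus control)
    times: (n_pairs,) sampling time of each pair relative to diagnosis
    """
    in_window = np.abs(times - center) <= width / 2
    x = diffs[in_window]
    n = x.shape[0]
    if n < 3:
        return 0
    # One-sample t-statistic per gene, using only the pairs inside the window
    t = x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))
    # Dimension reduction: collapse all per-gene statistics into a single count
    return int(np.sum(np.abs(t) > z_crit))


def moving_window_profile(diffs, times, centers, width):
    """Window statistic at each window center (the 'moving window')."""
    return np.array([window_statistic(diffs, times, c, width) for c in centers])


def randomization_test(diffs, times, centers, width, n_perm=200, seed=0):
    """Compare the largest window count with counts after shuffling sampling times.

    Using the maximum over windows accounts for scanning several windows.
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    observed = moving_window_profile(diffs, times, centers, width)
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(times)  # breaks any link between time and signal
        null_max[b] = moving_window_profile(diffs, shuffled, centers, width).max()
    p = (1 + np.sum(null_max >= observed.max())) / (n_perm + 1)
    return observed, p
```

Permuting times keeps each pair's expression profile intact and destroys only its position in time. A small p-value therefore points to a time-localized signal rather than merely an overall case-control difference.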
In systems epidemiology, findings from transcriptomic analyses are usually interpreted by analogy with mostly reductionist experiments in mice. Such transfer of knowledge from one scientific discipline to another depends on the validity of the comparison. The potential fallacies of analogical reasoning span all the differences between mice and humans, both genetic and in lifestyle. Better classification of the experimental information in standard databases is needed.
Our understanding of temporal changes in the processes related to human carcinogenesis is limited. Here we compile gene trajectories of differential expression based on measurements from many case-control pairs. We propose a new statistical method that does not assume any parametric shape for the gene trajectories. Under minimal assumptions, this approach showed good statistical power and good control of the type I error rate. It was able to discriminate between groups of genes that share similar non-linear patterns before diagnosis.
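To illustrate what "no parametric shape" can mean in practice, the sketch below estimates each gene's trajectory with a Gaussian kernel smoother (Nadaraya-Watson). It then compares the mean trajectories of two gene groups with a permutation test on the group labels. This is a generic stand-in chosen for illustration, not the method proposed in the chapter. The function names, the Gaussian kernel, the bandwidth, and the squared-distance statistic are all assumptions.

```python
import numpy as np


def kernel_smooth(times, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | time] at each grid point (Gaussian kernel).

    times: (n_pairs,), y: (n_pairs, n_genes); returns (n_genes, n_grid).
    No parametric form is imposed on the trajectory shape.
    """
    times = np.asarray(times, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # w[k, i] is the kernel weight of pair i at grid point k
    w = np.exp(-0.5 * ((grid[:, None] - times[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)  # weights at each grid point sum to 1
    return (w @ y).T


def group_difference_test(curves, labels, n_perm=500, seed=0):
    """Permutation test of whether two gene groups have different mean trajectories.

    curves: (n_genes, n_grid) smoothed trajectories; labels: boolean (n_genes,)
    """
    def stat(lab):
        diff = curves[lab].mean(axis=0) - curves[~lab].mean(axis=0)
        return float(np.sum(diff ** 2))  # squared distance between group mean curves

    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = stat(labels)
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p
```

Permuting gene labels treats genes as exchangeable. When genes are strongly correlated, a resampling scheme over case-control pairs, which preserves that correlation, would be more appropriate.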
Using the time-dependent dynamics of gene expression in blood immune cells, we explored single-gene expression trajectories as biomarkers for death after a breast cancer diagnosis. For this purpose, we introduced a new statistical method called Difference in Time Development Statistics (DTDS). As a proof of principle, this work shows that gene expression profiles of blood immune cells in the postdiagnostic period differed depending on later vital status.
A deep understanding of cancer as a biological phenomenon is still lacking. Several theories have been proposed, and many of them are not mutually exclusive. One intriguing hypothesis is that a cancer cell, although it may be triggered by mutations, is essentially a self-activated throwback to an ancestral cell phenotype. On this view, the cell runs an ancient core program that preserves vital functions such as survival and uncontrolled proliferation.
Over the decades, many theories or models of carcinogenesis have been proposed. Based on systems epidemiology research on gene expression in peripheral blood immune cells, the concept of a dynamic interface between the immune system and carcinogen-driven carcinogenesis is put forward. This concept combines traditional exposure research in cancer epidemiology with emerging knowledge of the immunological response to cancer, from clones of cancer cells to clones of immune cells.