Epidemiology and Biostatistics

This section presents material that primarily supports the conceptualization of epidemiological studies, their statistical analysis and the dissemination of results.

Conceptualization

Design of Study

In Theory

General documentation and guidelines regarding the conception of epidemiological studies.

NHMRC Biobanks Information Paper (PDF)
The aim of this paper is to provide information relevant to the establishment, management and governance of biobanks in Australia. National and international documentation, as well as other relevant case studies, are used to identify best practices with regard to standardization of biobank policies, practices and procedures. Prepared by the NHMRC (2010).

Recommendation for indicators, international collaboration, protocol and manual of operations for chronic disease risk factor surveys (PDF)
Part I of this document lists recommended population indicators for chronic disease risk factors. Part II deals with issues related to defining every step of the survey organization, from the target population to the recruitment itself. Written by the European Health Risk Monitoring project; copyright is reserved for the Finnish National Public Health Institute (2002).

HuGENet™ HuGE Review Handbook (PDF)
This document aims to help biomedical researchers undertaking studies and systematic reviews in human genome epidemiology to make sensible decisions about methods, from study design through to final analysis. A HuGENet Systematic Review and Meta-Analysis Working Group handbook (2006).

In Practice

Real-life examples related to conception of specific studies.

Please visit the following section:

The National Children's Study (external link)
This website displays the "Reviews and Analytic Reports" section of the website of the National Children's Study (NCS). It offers access to a wide variety of literature reviews and "white papers" on study design and conduct relevant to the construction of a large (approximately 100 000 live births) national birth cohort. Other sections of the website deal with issues specific to the NCS itself (e.g. study hypotheses). The website is co-supported by the U.S. Environmental Protection Agency, U.S. Department of Health and Human Services and USA.gov Agency.

UK Biobank: Protocol for a large-scale prospective epidemiological resource (PDF)
Part 1 of this document outlines the scientific rationale for constructing a very large cohort-based biobank in middle-aged adults in the UK. It also considers the basic design of such a study, reviewing and justifying decisions about baseline questionnaires, physical measures and biological samples. Prepared by UK Biobank (2007).

UK Biobank: Report of the integrated pilot phase (PDF)
This document reports on, and provides recommendations based on, the results obtained from the pilot phase of UK Biobank. The pilot phase was conducted between February and June 2006, in order to assess the basic design of the study and the approaches to data and sample collection. Prepared by UK Biobank (2006).

Power and Sample Size

In Practice

Real-life examples related to power and sample size issues and calculations.

UK Biobank: distribution of incident and prevalent cases of chronic disease and the statistical power of nested case-control studies (PDF)
This document explores and describes the incidence rates anticipated for common chronic diseases in UK Biobank (a cohort of 500 000 adults aged 40-69 years at recruitment). It also generates and describes the power profiles for nested case-control studies (for direct and interactive effects) based on the UK Biobank resource. A UK Biobank-commissioned document (2005).

The National Children's Study (external link)
This website displays the "Reviews and Analytic Reports" section of the website of the National Children's Study (NCS). It offers access to a wide variety of literature reviews and "white papers" on study design and conduct relevant to the construction of a large (approximately 100 000 live births) national birth cohort. Other sections of the website deal with issues specific to the NCS itself (e.g. study hypotheses). The website is co-supported by the U.S. Environmental Protection Agency, U.S. Department of Health and Human Services and USA.gov Agency.

IT Tools

Selected software for power and sample size calculation.

Please visit the following section:

Rockefeller University Linkage Software (external link)
This website is a repository of software dealing with: genetic linkage analysis for human pedigree data, QTL analysis for animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype construction, pedigree drawing, population genetics, TDT, and power and sample size calculation (such as QUANTO, TDTASP and PBAT). Website maintained by the Laboratory of Statistical Genetics, based at the Rockefeller University.

STATA power and sample size (external link)
STATA is a powerful, flexible, integrated statistical analysis application that provides data management, data analysis and graphics. Read how to access external utilities and services to enhance basic use of STATA and how to interface with the international STATA community. Ado and help files are also available to calculate power and sample size for gene-gene and gene-environment interactions. Program maintained by the Genetic Epidemiology Division, based at the Cancer Research UK Clinical Centre, University of Leeds. Website maintained by StataCorp, USA.

Power and Sample Size Calculator (PS) (external link)
This software performs power and sample size calculations. PS can be used for studies with dichotomous, continuous, or survival response measures. Developed by Drs W.D. Dupont and W.D. Plummer Jr., Department of Biostatistics, Vanderbilt University.

Bioconductor (external link)
Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data. It is primarily based on the R programming language. Bioconductor and this website are maintained by the Fred Hutchinson Cancer Research Center.
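
As a rough illustration of the kind of calculation the power and sample size tools listed above perform, the sketch below estimates the number of participants needed per group to compare two proportions with a two-sided test, using the standard normal-approximation formula without continuity correction. It is not taken from any of the listed packages; the function name `sample_size_two_proportions` and its default values (5% significance level, 80% power) are assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions.

    Hypothetical helper for illustration only. Uses the normal
    approximation for a two-sided test with equal group sizes and
    no continuity correction.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("p1 and p2 must lie strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")

    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Pooled proportion under the null hypothesis of no difference.
    p_bar = (p1 + p2) / 2

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Example: detect a rise in outcome prevalence from 10% to 15%.
    print(sample_size_two_proportions(0.10, 0.15))
```

The example relies only on the Python standard library. Because the required sample size scales with the inverse square of the difference to be detected, small changes in the assumed effect size have a large impact on the result.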

Design and Conduct

Please visit the following sections:

Analysis

In Theory

General documentation and guidelines regarding statistical analysis of genomic and epidemiological data.

Please also visit the following section(s):

Comparison Chart of Websites for Genetic and Genomic Statistics
This comparison chart summarises the content of a number of websites offering information and tools that are useful for undertaking a wide spectrum of analyses in genetic and genomic statistics. A P3G-prepared document (2009).

GeneStat (external link)
This website presents a series of tutorials on the design, analysis and interpretation of empirical studies involving genetic data. Aimed at geneticists, epidemiologists and molecular scientists working within the field of functional genomics, it includes links to other relevant websites and computer programs for analyzing genetic data. A wiki-based webpage, maintained by the Department of Medical Epidemiology and Biostatistics at the Karolinska Institute.

HuGENet™ HuGE Review Handbook (PDF)
This document aims to help biomedical researchers undertaking studies and systematic reviews in human genome epidemiology to make sensible decisions about methods, from study design through to final analysis. A HuGENet Systematic Review and Meta-Analysis Working Group handbook (2006).

In Practice

Real-life examples reporting specific analytic approaches.

National Health and Nutrition Examination Survey (NHANES) (PDF)
This document presents analytic and reporting guidelines that should be used for NHANES data analyses and publications. It represents the latest information from the National Center for Health Statistics on recommended approaches for analysis of all NHANES data, but with a particular focus on data collected in the continuous NHANES (since 1999). Written by the US Department of Health and Human Services (2006).

IT Tools

Selected software for statistical analysis.

SAIL (external link)
SAIL is an open source system designed to hold phenotype availability information and metadata about samples, experiments and phenotypes, submitted by data owners or by databases that contain actual measurement data. This can apply to epidemiological data as well as biosamples. SAIL is produced by SIMBioMS, itself powered by EBI, IMCS and FIMM.

GeneStat (external link)
This website presents a series of tutorials on the design, analysis and interpretation of empirical studies involving genetic data. Aimed at geneticists, epidemiologists and molecular scientists working within the field of functional genomics, it includes links to other relevant websites and computer programs for analyzing genetic data. A wiki-based webpage, maintained by the Department of Medical Epidemiology and Biostatistics at the Karolinska Institute.

SAS Software for Distribution (external link)
This website lists SAS macros that may be useful when working with the SAS statistical software. Provided by the University of Bonn and by SAS Institute Inc.

Rockefeller University Linkage Software (external link)
This website is a repository of software dealing with: genetic linkage analysis for human pedigree data, QTL analysis for animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype construction, pedigree drawing, population genetics and TDT (such as Unphased, PDT, PLINK, QTDT, TDT-PC and FBAT). Website maintained by the Laboratory of Statistical Genetics, based at the Rockefeller University.

R (external link)
R is a powerful, flexible, integrated statistical analysis environment that provides data management, analysis, modelling and graphics. R is a programming language closely related to S, and user-defined functions provide an open source route to the rich array of specialist statistical methodology usually delivered through the S language. StatLib (software and extensions for the S (Splus) and R languages) provides access to S, SPlus and R functions across a wide range of specialist areas. R was initially written by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland. Website maintained by the R Foundation.

Epi Info™ (external link)
Epi Info is versatile software developed for use at the interface between epidemiology and public health. It can be used to develop a simple questionnaire or form, customize the data entry process or undertake basic analyses. The emphasis is on speed and ease of use; it is not a platform for sophisticated modelling. The software and website are maintained by the Centers for Disease Control and Prevention (CDC) in the United States.

PEDSYS (external link)
PEDSYS is a database system developed to provide a software environment in which to manage and analyse genetic and demographic data, particularly pedigree (family-based) data. The system supports integrated collection, management and analysis of constantly evolving data sets. The software and website are supported by the Southwest Foundation for Biomedical Research.

Dr David Clayton's website (external link)
This website contains the C and C++ source files for SPLINK, TRANSMIT, PED2SPL, GH2STAT and SNPHAP. It also offers packages for the analysis of genetic association studies in the STATA statistical system, together with some exercises and some miscellaneous programs in the R language. Website maintained by the Cambridge Institute for Medical Research.

Gonçalo's website (external link)
This website, from Gonçalo Abecasis' research group, focuses on developing the computational and statistical software required for understanding human genetic variation, with a particular focus on complex human disease and encompassing both linkage and association analysis. The website provides access to a number of packages widely used by genetic epidemiologists, including Merlin, QTDT and Gold. Website maintained by the University of Michigan.

STATA power and sample size (external link)
STATA is a powerful, flexible, integrated statistical analysis application that provides data management, data analysis and graphics. Read how to access external utilities and services to enhance basic use of STATA and how to interface with the international STATA community. Ado and help files are also available to calculate power and sample size for gene-gene and gene-environment interactions. Program maintained by the Genetic Epidemiology Division, based at the Cancer Research UK Clinical Centre, University of Leeds. Website maintained by StataCorp, USA.

Bioconductor (external link)
Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data. It is primarily based on the R programming language. Bioconductor and this website are maintained by the Fred Hutchinson Cancer Research Center.

Broad Institute Software (external link)
This website displays a wide spectrum of software tools (such as Arachne, Argo, Conrad, PLINK, EIGENSTRAT, Haploview, Locusview, Sweep, Tagger, GeneHunter, MapMaker3, GenePattern, GSEA, GeneCruiser and ConnectivityMap) for the analysis of genomic and genetic datasets: genome sequence, genetic variation, linkage analysis, high-throughput image analysis and expression analysis. Website maintained by the Broad Institute, a collaborative effort involving MIT, Harvard and its affiliated hospitals, and the Whitehead Institute.

HuGE Navigator (external link)
This website provides access to a continuously updated knowledge base in human genome epidemiology through three main sections. The HuGEpedia, including Phenopedia and Genopedia, is an encyclopedia of human genetic variation in health and disease. HuGEtools allow users to search and mine the literature in human genome epidemiology (HuGE Literature Finder, GWAS Integrator, HuGE Investigator Browser, Gene Prospector, Genotype Prevalence Catalog, HuGE Watch, Variant Name Mapper, HuGE Risk Translator). A series of HuGE-related informatics utilities and projects, such as GAPscreener, HuGE Track and Open Source, are available in the HuGEmix section. Website developed and maintained by the Human Genome Epidemiology Network (HuGENet™) (2009).

Dissemination

In Theory

General documentation and guidelines regarding dissemination of results.

STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) (external link)
This website provides guidelines to help authors write complete and well-planned scientific articles based on observational studies in epidemiology. STROBE is an initiative (analogous to the CONSORT initiative for randomized trials) involving an international collaboration of epidemiologists, methodologists, statisticians, researchers and editors involved in the conduct and dissemination of observational studies, who share the common aim of STrengthening the Reporting of OBservational studies in Epidemiology. The initiative is currently led from the University of Bern.

Consolidated Standards of Reporting Trials (CONSORT) (external link)
This website provides the CONSORT Statement, an evidence-based, minimum set of recommendations for reporting randomized controlled trials. Supported by Family Health International, Cancer Research UK, Ottawa University and NHS.

In Practice

GAPPNET (external link)
This organization aims to accelerate and streamline the effective and responsible use of validated and useful genomic knowledge and applications. GAPPNET has four functions: knowledge synthesis and dissemination, development of evidence-based recommendations, translational research and translational programs. Some tools have been developed in support of these functions, such as ACCE, EGAPP and USPSTF. GAPPNet was formed in 2009 by CDC's Office of Public Health Genomics, NCI's Division of Cancer Control and Population Sciences, and other stakeholders.

IT Tools

ORCID (external link)
This website aims to solve the author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID and other current author ID schemes. ORCID is an independent organization.

© 2005 Public Population Project in Genomics.
All rights reserved.