ID

45646

Description

Principal Investigator: Neal Freedman, PhD, MPH, National Cancer Institute, Rockville, MD, USA
MeSH: Neoplasms
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001286
The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial is a large, population-based randomized trial designed and sponsored by the National Cancer Institute (NCI) to determine the effects of screening on cancer-related mortality and secondary endpoints in more than 150,000 men and women aged 55 to 74. The screening component of the trial was completed in 2006, but participants have continued to be followed for cancer incidence and mortality since then. PLCO also includes a large biospecimen repository that has served as a unique resource for cancer research, particularly for etiologic and early-marker studies. As part of these efforts, PLCO has been used in many genome-wide association and exome sequencing studies of different cancer types.

Link

dbGaP study = phs001286

Keywords

  1. 20/03/2023 20/03/2023 - Simon Heim
Rights holder

Neal Freedman, PhD, MPH, National Cancer Institute, Rockville, MD, USA

Uploaded on

20 March 2023

License

Creative Commons BY 4.0


dbGaP phs001286 The Prostate, Lung, Colon, Ovary Screening Trial (PLCO)

Eligibility Criteria

Inclusion and exclusion criteria

Alias
UMLS CUI [1,1]
C1512693
UMLS CUI [1,2]
C0680251
Here, we are posting a harmonized and imputed dataset of PLCO GWAS and exome data, consisting of all harmonizable PLCO genotype data from each completed scan of cancer cases and controls, together with the key covariates sex and participant ID. Because PLCO is a prospective cohort, incident cancers and other diseases continue to accrue, so researchers should use current follow-up data to define cancer case/control status precisely. To use these data, researchers should therefore obtain the genetic data from dbGaP and, in parallel, obtain up-to-date data on cancer and other diseases through the PLCO Cancer Data Access System (CDAS): http://prevention.cancer.gov/major-programs/prostate-lung-colorectal/cancer-data-access-system. CDAS also provides a wide range of covariates and endpoints, as well as published biomarker data, which can be used for both main-effect and gene × environment studies. Together, we believe these data will serve as a helpful resource for the entire scientific community.

Data type

boolean

Alias
UMLS CUI [1,1]
C3274646
UMLS CUI [1,2]
C0150098
UMLS CUI [1,3]
C2350277
UMLS CUI [1,4]
C1514515
UMLS CUI [1,5]
C5446360
UMLS CUI [1,6]
C0006826
UMLS CUI [1,7]
C1706256
UMLS CUI [1,8]
C1882979
UMLS CUI [1,9]
C3165543
UMLS CUI [1,10]
C1709709
UMLS CUI [1,11]
C1551358
UMLS CUI [1,12]
C0035173
UMLS CUI [1,13]
C1522577
UMLS CUI [1,14]
C3274646
UMLS CUI [1,15]
C2698971
UMLS CUI [1,16]
C4684740
UMLS CUI [1,17]
C2349179
UMLS CUI [1,18]
C1879847
UMLS CUI [1,19]
C0596609
UMLS CUI [1,20]
C1273305
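As a minimal sketch of the workflow described above (genotypes from dbGaP, current case/control status from CDAS), the snippet below joins the sample list of a PLINK .fam file to a phenotype extract by PLCO ID. It assumes the synchronized PLCO ID sits in the .fam IID column and that the CDAS extract is a CSV with columns named plco_id and case_status; the ID placement and both column names are hypothetical, not the documented dbGaP or CDAS layout.

```python
import csv
import io


def read_fam_ids(fam_text):
    """Return the IID column (2nd field) of PLINK .fam content.

    Assumption: the synchronized PLCO ID is stored as the IID.
    """
    ids = []
    for line in fam_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            ids.append(fields[1])
    return ids


def attach_phenotypes(fam_text, cdas_csv_text,
                      id_col="plco_id", status_col="case_status"):
    """Join genotyped samples to a CDAS extract by PLCO ID.

    id_col and status_col are hypothetical column names.
    Returns (matched, missing): matched maps PLCO ID -> status;
    missing lists genotyped IDs with no phenotype record.
    """
    reader = csv.DictReader(io.StringIO(cdas_csv_text))
    pheno = {row[id_col]: row[status_col] for row in reader}
    matched, missing = {}, []
    for pid in read_fam_ids(fam_text):
        if pid in pheno:
            matched[pid] = pheno[pid]
        else:
            missing.append(pid)
    return matched, missing


# Toy inputs (invented IDs and statuses, for illustration only)
fam = "F1 PLCO1 0 0 1 -9\nF2 PLCO2 0 0 2 -9\nF3 PLCO3 0 0 2 -9\n"
cdas = "plco_id,case_status\nPLCO1,case\nPLCO2,control\n"
matched, missing = attach_phenotypes(fam, cdas)
print(matched)  # {'PLCO1': 'case', 'PLCO2': 'control'}
print(missing)  # ['PLCO3']
```

Genotyped samples without a current CDAS record are returned separately rather than being silently treated as controls.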
This PLCO dataset contains data genotyped on the Illumina GSA and OncoArray, as well as historical data from the Illumina OmniExpress (OmniX), Omni2.5M (Omni25) and Omni5M (Omni5) arrays. Most platforms were run, processed and quality-controlled (QC) separately, at different times. GSA data were generated at CGR within a relatively short period. OncoArray data were genotyped at CGR and at multiple external institutes. OmniX, Omni25 and Omni5 data were genotyped at CGR in earlier years. Genotypes from OmniX and Omni25 were called with different cluster files.

Data type

boolean

Alias
UMLS CUI [1,1]
C1514515
UMLS CUI [1,2]
C0150098
UMLS CUI [1,3]
C4687476
UMLS CUI [1,4]
C1285573
UMLS CUI [1,5]
C2987304
UMLS CUI [1,6]
C0179312
UMLS CUI [1,7]
C3846158
UMLS CUI [1,8]
C0035172
All genotype data were prepared in binary PLINK format (.bed/.bim/.fam). All released data should be on GRCh37/hg19. Chip data generated within CGR underwent internal QC (iterative sample- and variant-level call-rate filters at 80% and 95%) but not the more stringent pre-imputation minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE) filtering; QC of externally generated data is inconsistent because of their varied provenance. Samples genotyped in more than one dataset are released in every applicable dataset under the same synchronized PLCO ID.

Data type

boolean

Alias
UMLS CUI [1,1]
C1285573
UMLS CUI [1,2]
C5401465
UMLS CUI [1,3]
C3844091
UMLS CUI [1,4]
C3844095
UMLS CUI [1,5]
C0600596
UMLS CUI [1,6]
C3846158
UMLS CUI [1,7]
C0034378
UMLS CUI [1,8]
C0180860
UMLS CUI [1,9]
C2699638
UMLS CUI [1,10]
C0919481
UMLS CUI [1,11]
C3846158
UMLS CUI [1,12]
C1514515
UMLS CUI [1,13]
C2348585
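The iterative call-rate filtering mentioned above can be sketched as follows. This is an illustration, not CGR's pipeline: it assumes the filters alternate a sample-level pass and a variant-level pass, first at 80% and then at 95%, with each pass computed only over the samples and variants still retained. Genotypes are a plain list of rows (one per sample), with None marking a missing call.

```python
def call_rate(calls):
    """Fraction of non-missing (not None) calls; 0.0 for an empty list."""
    return sum(c is not None for c in calls) / len(calls) if calls else 0.0


def iterative_call_rate_filter(geno, sample_ids, variant_ids,
                               thresholds=(0.80, 0.95)):
    """Alternate sample- and variant-level call-rate filters.

    geno[i][j] is the call for sample i at variant j (None = missing).
    Assumed order: samples then variants at 80%, then again at 95%.
    """
    samples = list(range(len(sample_ids)))
    variants = list(range(len(variant_ids)))
    for t in thresholds:
        # Sample pass, over variants still retained
        samples = [s for s in samples
                   if call_rate([geno[s][v] for v in variants]) >= t]
        # Variant pass, over samples still retained
        variants = [v for v in variants
                    if call_rate([geno[s][v] for s in samples]) >= t]
    return [sample_ids[s] for s in samples], [variant_ids[v] for v in variants]


geno = [
    [0, 1, 2, 0, 1],           # S1: 5/5 called
    [1, 1, 0, 2, 0],           # S2: 5/5 called
    [0, None, 1, 1, 2],        # S3: 4/5 called
    [None, None, None, 1, 0],  # S4: 2/5 called
]
kept = iterative_call_rate_filter(geno, ["S1", "S2", "S3", "S4"],
                                  ["v1", "v2", "v3", "v4", "v5"])
print(kept)  # (['S1', 'S2', 'S3'], ['v1', 'v3', 'v4', 'v5'])
```

In this toy example S3 survives the 95% round only because the poorly called variant v2 was removed in the 80% round, which is why the passes are iterated rather than applied once.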
All subjects were split by GRAF ancestry and cleaned before imputation. More specifically, the data from each platform were split into 7 ancestry groups (African+African American, East Asian+Other Asian, European, Hispanic1, Hispanic2, Other, South Asian) based on ancestry assignment with GRAF (https://github.com/ncbi/graf).

Data type

boolean

Alias
UMLS CUI [1,1]
C5447420
UMLS CUI [1,2]
C2699638
UMLS CUI [1,3]
C1710360
UMLS CUI [1,4]
C0085756
UMLS CUI [1,5]
C0027567
UMLS CUI [1,6]
C0078988
UMLS CUI [1,7]
C4540996
UMLS CUI [1,8]
C1519427
UMLS CUI [1,9]
C0239307
UMLS CUI [1,10]
C0086409
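One way to carry out the per-ancestry split is to build one PLINK keep list per group. The sketch below assumes the GRAF assignments have already been reduced to a mapping from IID to one of the seven group labels listed above; that mapping, and how it would be parsed from GRAF output, are hypothetical here.

```python
from collections import defaultdict

# The seven groups named in the text
GROUPS = ("African+African American", "East Asian+Other Asian", "European",
          "Hispanic1", "Hispanic2", "Other", "South Asian")


def split_by_ancestry(samples, ancestry):
    """Build one PLINK keep list ("FID IID" lines) per ancestry group.

    samples: list of (FID, IID) pairs from a platform's .fam file.
    ancestry: hypothetical dict IID -> group label derived from GRAF.
    A sample with no assignment raises KeyError.
    """
    keep = defaultdict(list)
    for fid, iid in samples:
        group = ancestry[iid]
        if group not in GROUPS:
            raise ValueError(f"unexpected group {group!r} for {iid}")
        keep[group].append(f"{fid} {iid}")
    return dict(keep)


samples = [("F1", "S1"), ("F2", "S2"), ("F3", "S3")]
ancestry = {"S1": "European", "S2": "South Asian", "S3": "European"}
print(split_by_ancestry(samples, ancestry))
# {'European': ['F1 S1', 'F3 S3'], 'South Asian': ['F2 S2']}
```

Each list could then be written to a text file and passed to PLINK, for example plink --bfile <platform> --keep <group>.txt --make-bed --out <platform>_<group>, where the file names are placeholders.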
The TOPMed reference panel (release 5b) was used for imputation on the Michigan Imputation Server (https://imputationserver.sph.umich.edu). Pre-phasing against the phased TOPMed release 5b reference data was performed with Eagle 2.4 (doi: 10.1038/ng.3679), and imputation against the same reference panel was performed with minimac4 (https://genome.sph.umich.edu/wiki/Minimac4). Because the Michigan Imputation Server limits the number of samples per submission, the GSA/European dataset was split into 4 batches for imputation.

Data type

boolean

Alias
UMLS CUI [1,1]
C1706462
UMLS CUI [1,2]
C2699638
UMLS CUI [1,3]
C1554143
UMLS CUI [1,4]
C3846158
UMLS CUI [1,5]
C0242618
UMLS CUI [1,6]
C0449295
UMLS CUI [1,7]
C0150098
UMLS CUI [1,8]
C1534709
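Splitting a dataset into a fixed number of imputation batches can be sketched as below. The text does not say how samples were assigned to the 4 GSA/European batches; this example simply assumes contiguous, near-equal chunks of an ordered sample list, and the IDs are invented.

```python
def split_into_batches(sample_ids, n_batches=4):
    """Split an ordered sample list into n_batches contiguous chunks
    whose sizes differ by at most one."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, extra = divmod(len(sample_ids), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional sample
        end = start + size + (1 if i < extra else 0)
        batches.append(sample_ids[start:end])
        start = end
    return batches


ids = [f"PLCO{i}" for i in range(1, 11)]  # 10 invented IDs
print([len(b) for b in split_into_batches(ids)])  # [3, 3, 2, 2]
```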
Each platform/ancestry pair was cleaned following the filtering method described in https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008500. Briefly, all variants with Rsq < 0.3 were first removed, consistent with traditional quality filters on MACH-style output. The remaining variants were then partitioned into minor allele frequency (MAF) bins {[0, 0.0005], (0.0005, 0.002], (0.002, 0.005], (0.005, 0.01], (0.01, 0.03], (0.03, 0.05], (0.05, 0.5]}. Within each bin, variants were removed in order of increasing Rsq until the mean Rsq of the remaining variants in that bin was at least 0.9. (Kowalski et al., the cited study, suggest 0.8; using the more stringent threshold has no impact on common variants.)

Data type

boolean

Alias
UMLS CUI [1,1]
C1710360
UMLS CUI [1,2]
C5447420
UMLS CUI [1,3]
C1709450
UMLS CUI [1,4]
C0180860
UMLS CUI [1,5]
C3846158
UMLS CUI [1,6]
C4722262
UMLS CUI [1,7]
C0205419
UMLS CUI [1,8]
C0919481
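The two-stage Rsq filter described above can be sketched as below, as a reading of the description rather than the code actually used. Each variant is given as a hypothetical (id, maf, rsq) tuple; the bin edges are the upper bounds of the MAF bins listed above, and within each bin the lowest-Rsq variants are dropped until the bin mean reaches the target.

```python
from bisect import bisect_left

# Upper edges of the MAF bins listed above:
# [0,0.0005], (0.0005,0.002], ..., (0.05,0.5]
MAF_UPPER_EDGES = [0.0005, 0.002, 0.005, 0.01, 0.03, 0.05, 0.5]


def maf_bin(maf):
    """Index of the bin whose right-closed interval contains maf."""
    return bisect_left(MAF_UPPER_EDGES, maf)


def filter_imputed(variants, rsq_floor=0.3, target_mean=0.9):
    """Return the IDs kept by the two-stage Rsq filter.

    variants: iterable of (variant_id, maf, rsq) tuples (hypothetical layout).
    """
    bins = {}
    for vid, maf, rsq in variants:
        if rsq < rsq_floor:  # stage 1: hard Rsq floor
            continue
        bins.setdefault(maf_bin(maf), []).append((rsq, vid))
    kept = set()
    for members in bins.values():
        members.sort()  # ascending Rsq, so the worst variant comes first
        total = sum(rsq for rsq, _ in members)
        start = 0
        # stage 2: drop lowest-Rsq variants until the bin mean reaches target
        while start < len(members) and total / (len(members) - start) < target_mean:
            total -= members[start][0]
            start += 1
        kept.update(vid for _, vid in members[start:])
    return kept


variants = [
    ("a", 0.20, 0.99), ("b", 0.20, 0.95), ("c", 0.20, 0.50),
    ("d", 0.20, 0.25),                       # removed by the 0.3 floor
    ("e", 0.001, 0.96), ("f", 0.001, 0.86),  # bin mean 0.91, both kept
]
print(sorted(filter_imputed(variants)))  # ['a', 'b', 'e', 'f']
```

Because removing the lowest value can never lower a mean, the loop stops at the first point where the bin mean reaches the target, which keeps as many variants as possible in that bin. Note that f is kept even though its own Rsq is below 0.9: the criterion applies to the bin average, not to each variant.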
