iPSCORE Samples and Datasets

Here we provide information about the iPSCORE samples and molecular datasets including how:

      1) to properly cite references if you use the samples

      2) to understand the nomenclature of the samples

      3) to obtain the datasets, VCFs and summary statistics

Detailed information on how to access and the methods used to generate the iPSCORE molecular datasets can be found here

 

The iPSC Collection for Omic Research (iPSCORE) Resource consists of whole genome sequences (WGS) for 273 iPSCORE subjects, 238 iPSCs derived from 221 of these individuals, as well as iPSC-derived cell types (cardiovascular progenitor cells [iPSC-CVPCs] and pancreatic progenitor cells [iPSC-PPCs]) with RNA-seq, ATAC-seq and H3K27 acetylation ChIP-seq data. The resource was created as part of the Next-Gen Consortium funded by NHLBI with the overarching purpose of providing a large collection of human induced pluripotent stem cells (iPSCs) for use in studying the impact of genetic variation on molecular and physiological phenotypes. The resource has been used in a number of studies in Dr. Kelly Frazer’s lab examining both the characteristics of human iPSCs and a variety of iPSC-derived cell types, including iPSC-CVPCs, iPSC-PPCs, and retinal pigment epithelium cells (iPSC-RPEs). We have shown that the iPSCs, CVPCs, PPCs, and RPEs are suitable surrogate models to identify genetic factors active in early developmental processes because they exhibit fetal-like molecular properties.

Of the 273 participants in the study, 181 are part of 55 families that include 24 monozygotic twin pairs and 5 dizygotic twin pairs, allowing for the incorporation of familial relationships into genetic analyses. 

Germline DNA has been sequenced from blood or fibroblast samples for all 273 individuals (available through dbGaP phs001325) and other genomic data (RNA-seq, DNA methylation, genotype arrays, ATAC-seq, H3K27ac ChIP-seq, CTCF ChIP-seq, NKX2-5 ChIP-seq and HiC-seq) have been generated from the 238 iPSCs as well as derived cell types (available through dbGaP phs000924). 

QTL analyses were conducted for multiple omics data types, and summary statistics are available through dbGaP phs001325.

Important note: Of the 273 individuals, 268 are consented for general research use and 5 are consented for cardiac research only. 

The 238 well-characterized iPSC lines that constitute the iPSCORE resource are available. If you are interested in the collection, please contact Dr. Kelly Frazer (kafrazer@health.ucsd.edu).

Cite Us

The generation of the iPSCORE datasets was described in multiple publications. Please see below for the references to cite for the datasets that you use. Thank you.

General iPSCORE references

  1. Panopoulos AD, D'Antonio M, Benaglio P, Williams R, Hashem SI, Schuldt BM, DeBoever C, Arias AD, Garcia M, Nelson BC, Harismendy O. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem cell reports. 2017 Apr 11;8(4):1086-100.

  2. Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Jaureguy J, Silva N, Henson B, iPSCORE Consortium, Panopoulos AD, Belmonte JC, D'Antonio M, McVicker G. Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci. bioRxiv. 2024:2024-04.

Molecular Samples Summary Table

The iPSCORE molecular samples in the following table are described in greater detail below. The Methods used to generate the iPSCORE molecular datasets can be found in detail in publications and here.

iPSC – YF (Yamanaka Factors)

iPSCs derived from fibroblasts using Yamanaka Factors. The majority of iPSCs were generated using Sendai virus.

iPSC (unmatched)

Molecular data generated from iPSCs harvested at earlier passages (typically p9-p12 but sometimes later)

iPSC (matched UDID)

Molecular data generated from iPSCs harvested at D0 of the differentiation protocol. These samples were assigned a UDID that matches the UDID of the derived cells.

iPSC-CMs (15-day)

iPSC-derived cardiomyocytes obtained using a 15-day differentiation protocol

iPSC-CVPC (25-day)

iPSC-derived cardiovascular progenitor cells obtained using a 25-day differentiation protocol with lactate selection

CM differentiation stages

iPSC-CM time course study: iPSC-derived cardiomyocytes collected in the same study at D0 (iPSC), D2, D5, D9, and D15

IFN-γ-treated iPSC-CVPC

25-day CVPCs treated with interferon-gamma and controls

iPSC-PPC

iPSC-derived pancreatic progenitor cells

iPSC-RPE

iPSC-derived retinal pigment epithelium

hFetal RPE

human fetal retinal pigment epithelium

CVPC + EPI isogenic mixtures

Four CVPC:EPI mixtures were made from isogenic CVPCs and EPIs at defined ratios. 

Bulk RNA-seq Ribo

Six “Total RNA” samples to examine non-coding RNAs

UDID Naming Strategy

Each iPSC differentiation into a derived cell type was given a unique identifier (UDID).  The names of all molecular samples generated from that differentiation contain the UDID.  Some iPSC RNA-seq samples and most iPSC ATAC-seq and H3K27ac ChIP-seq samples have a UDID associated with them, as they were obtained at D0 of the iPSC-CM 25-day plus lactate protocol. The UDID for the molecular samples generated from these iPSCs is the same UDID used for the molecular samples generated from the corresponding iPSC-CVPCs.

Sample Naming Convention

Samples are typically named with the following convention:

[internal individual ID]_[histological_type]_[clone][passage]_[UDID differentiation ID]_[assay_type]_[pellet replicate][library replicate][sequencing replicate]

Example: “S08003_CM_C3P18_UDID012_ChIPH3K27ac_R02L01S01”

This sample corresponds to a bulk H3K27ac ChIP-seq sample (“ChIPH3K27ac”) generated from a “CVPC” sample (differentiation ID 'UDID012') derived from iPSC clone 3 ('C3') at passage 18 ('P18') from individual ‘S08003’. The sample was produced using cell pellet (i.e. replicate) #2 ('R02') (either because cell replicate #1 failed or a new replicate was desired for a different study). This particular sample is from the first sequencing run ('S01') of library preparation #1 ('L01').

Samples that do not follow the standard naming convention were collected from earlier studies before this convention was implemented.
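The naming convention above can be parsed mechanically. The following is a minimal Python sketch, not an official iPSCORE tool: the regular expression and the field names (individual, cell_type, and so on) are assumptions inferred from the single example name shown above, including the "C", "P", "R", "L" and "S" prefixes. Names that do not follow the standard convention return None.

```python
import re
from typing import Optional

# Pattern inferred from the example name in this section (an assumption).
# Group names are labels chosen for this sketch, not official field names.
SAMPLE_NAME = re.compile(
    r"^(?P<individual>[^_]+)_"
    r"(?P<cell_type>[^_]+)_"
    r"C(?P<clone>\d+)P(?P<passage>\d+)_"
    r"(?P<udid>UDID\d+)_"
    r"(?P<assay>[^_]+)_"
    r"R(?P<pellet>\d+)L(?P<library>\d+)S(?P<sequencing>\d+)$"
)


def parse_sample_name(name: str) -> Optional[dict]:
    """Split a standard sample name into its fields; None if it does not match."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric parts so that 'R02' becomes pellet replicate 2, etc.
    for key in ("clone", "passage", "pellet", "library", "sequencing"):
        fields[key] = int(fields[key])
    return fields


example = parse_sample_name("S08003_CM_C3P18_UDID012_ChIPH3K27ac_R02L01S01")
print(example)
```

Returning None rather than raising an error lets a caller set aside the older samples that predate the convention and handle them separately.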

CM vs CVPC Naming Convention

iPSC-CMs (cardiomyocytes) are derived using a 15-day protocol, while iPSC-CVPCs (cardiovascular progenitor cells) are derived using a 25-day protocol that includes a lactate-selection step. These samples are distinguished in the DIFFERENTIATION_PROTOCOL column of the SampleAttributes table by the values “iPSC-CM_D15” and “iPSC-CM_D25”, respectively. The SampleAttributes table is available on dbGaP phs000924 and a redacted version is available on figshare (https://doi.org/10.6084/m9.figshare.26866672.v3).
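As an illustration, the two protocols can be separated programmatically once the SampleAttributes table has been downloaded. This is a sketch under stated assumptions: the table is treated as tab-delimited text, the mapping of “iPSC-CM_D15” to iPSC-CM and “iPSC-CM_D25” to iPSC-CVPC is inferred from the 15-day and 25-day protocol lengths, and the rows in the toy table are invented for the example.

```python
import csv
import io

# Assumed mapping, inferred from the protocol lengths described above.
PROTOCOL_TO_CELL_TYPE = {
    "iPSC-CM_D15": "iPSC-CM",
    "iPSC-CM_D25": "iPSC-CVPC",
}


def label_cardiac_samples(table_text, delimiter="\t"):
    """Map SAMPLE_ID to cell type for rows with a known cardiac protocol value."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    labels = {}
    for row in reader:
        cell_type = PROTOCOL_TO_CELL_TYPE.get(row["DIFFERENTIATION_PROTOCOL"])
        if cell_type is not None:
            labels[row["SAMPLE_ID"]] = cell_type
    return labels


# Toy table with invented sample IDs, for illustration only.
toy_table = (
    "SAMPLE_ID\tDIFFERENTIATION_PROTOCOL\n"
    "sample_a\tiPSC-CM_D15\n"
    "sample_b\tiPSC-CM_D25\n"
    "sample_c\tNA\n"
)
labels = label_cardiac_samples(toy_table)
print(labels)
```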

Pooled samples

A pooled sample is one in which cells from multiple samples are combined, processed, and sequenced as a single sample. We have pooled samples for scRNA-seq, snATAC-seq, and control ChIP inputs.

NOTE: A unique SUBJECT_ID was generated for each pooled sample. These SUBJECT_IDs do not correspond to any individual.

To determine whether a sample is pooled, go to the SampleAttributes table and filter the “POOLED_SAMPLE” column for “1”. A 1 indicates that the sample is pooled (i.e., contains a mixture of samples) and a 0 indicates that the sample is not pooled (i.e., contains cells from a single sample). This table is available on dbGaP phs000924 and a redacted version is available on figshare (https://doi.org/10.6084/m9.figshare.26866672.v3).

To get information about what samples were used to create the mixture:

  1. Get the SAMPLE_ID for the pooled sample from the SampleAttributes table.

  2. Then, go to the SampleSubjectMapping (SSM) table, filter the SAMPLE_ID column by that SAMPLE_ID, and get the SUBJECT_ID.

  3. Then, go to the SubjectConsents table and filter the SUBJECT_IDS_OF_POOLED_SAMPLES column by the SUBJECT_ID of the pooled sample.

The SampleSubjectMapping and SubjectConsents tables are available on dbGaP phs000924 and redacted versions are available on figshare (https://doi.org/10.6084/m9.figshare.26866672.v3).
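The three lookup steps above can be sketched in Python as follows. Several details are assumptions made for illustration rather than facts about the dbGaP files: the tables are read as tab-delimited text, the SubjectConsents table is assumed to carry a SUBJECT_ID column for each individual, and the SUBJECT_IDS_OF_POOLED_SAMPLES column is assumed to hold a ";"-separated list of pooled SUBJECT_IDs. All IDs in the toy tables are invented.

```python
import csv
import io


def read_rows(text, delimiter="\t"):
    """Parse delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def pooled_sample_members(sample_attributes, sample_subject_mapping, subject_consents):
    """Return {pooled SAMPLE_ID: [SUBJECT_ID of each individual in the pool]}."""
    sample_to_subject = {r["SAMPLE_ID"]: r["SUBJECT_ID"] for r in sample_subject_mapping}
    members = {}
    for row in sample_attributes:
        if row["POOLED_SAMPLE"] != "1":          # step 1: keep pooled samples only
            continue
        pooled_subject = sample_to_subject[row["SAMPLE_ID"]]   # step 2
        members[row["SAMPLE_ID"]] = [                           # step 3
            c["SUBJECT_ID"]
            for c in subject_consents
            # Assumed separator: ';' between pooled SUBJECT_IDs.
            if pooled_subject in c["SUBJECT_IDS_OF_POOLED_SAMPLES"].split(";")
        ]
    return members


# Toy tables with invented IDs, for illustration only.
attrs = read_rows("SAMPLE_ID\tPOOLED_SAMPLE\nS1\t1\nS2\t0\n")
ssm = read_rows("SAMPLE_ID\tSUBJECT_ID\nS1\tPOOL1\nS2\tSUBJ_A\n")
consents = read_rows(
    "SUBJECT_ID\tSUBJECT_IDS_OF_POOLED_SAMPLES\n"
    "SUBJ_A\tPOOL1\n"
    "SUBJ_B\tPOOL1;POOL2\n"
    "SUBJ_C\t\n"
)
pools = pooled_sample_members(attrs, ssm, consents)
print(pools)
```

If the real column uses a different separator or layout, only the membership test in step 3 needs to change.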

Universally Unique Identifier (UUID) Naming Strategy

A Universally Unique Identifier (UUID) is assigned for each sequencing run for each library. When a library is sequenced multiple times and the data from these runs are combined for downstream analysis, this merged library is given a new UUID. We typically report the UUIDs of the samples used in our studies, whether they are merged or individual runs. 

Datasets in dbGaP 

Subject and Sample Attribute Information

For information about each subject and sample in phs001325 and phs000924, users may download the Phenotype Datasets either from dbGaP using the instructions below or a redacted version from figshare (https://doi.org/10.6084/m9.figshare.26866672):

  1. Go to the main project page on dbGaP (linked here: phs001325 or phs000924)

  2. Select “Phenotypes Datasets” at the top of the page

  3. Select “List all datasets within this study”

  4. Copy the “Dataset Accession” for the dataset you are interested in

  5. Log onto dbGaP > “My Projects” at the top of page > “file selector” next to the Project name

  6. Select the files you are interested in and click “Cart file”, which will download the *.krt file. 

  7. Follow the steps under “Downloading phenotype files with ngc” to download the phenotype files: https://www.ncbi.nlm.nih.gov/sra/docs/sra-dbgap-download/

Whole Genome Sequences (phs001325.v6.p1)

dbGaP release notes describing all data in the accession are available here:

Note: The below link is for version 5 and will be updated with the new version once released.

https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs001325/phs001325.v5.p1/release_notes/Release_Notes.phs001325.iPSCORE_WGS.v5.p1.MULTI.pdf

The accession contains raw WGS data (FASTQ) for 295 samples (from 273 individuals), including 258 WGS from blood (254 individuals; 4 samples were sequenced twice, see below for more details), 19 from fibroblasts (19 individuals), and 18 from iPSCs (18 individuals).

For four of the blood samples, WGS data was generated twice (see table below). For the first sequences, which were generated at the same time as the WGS for all the other iPSCORE individuals, we have hg19 genotype calls from the WGS, but individual FASTQ files are not available. For the second sequences, the FASTQ files, as well as the hg38 genotype calls (SNPs and INDELs), are available in version 6:

Table description: Four individuals (column “Subject ID”) had their blood samples sequenced twice. Column “Previous biospecimen_sample_repository_id” indicates the sequences for which we do not have FASTQ files but do have hg19 genotype calls (hg38 genotype calls are not available for these samples). Column “New biospecimen_sample_repository_id” indicates the sequences for which we have both FASTQ files and hg38 genotype calls (hg19 genotype calls are not available for these samples).

Subject ID | Previous biospecimen_sample_repository_id | New biospecimen_sample_repository_id
65ee0ab5-4a77-454f-9112-a2f245d63396 | 09b0ae5a-59d7-40de-b3f2-45c39773b7d0 | e9def219-d732-4723-9e8c-d429b2dd284a
a8f2c942-d9ae-486a-b87f-1fb4ef45ae99 | dfd22868-f51c-4708-97d0-d447cf7a1f6c | d9822a00-15e6-4a18-8350-809553466ab0
0c24b869-9532-40cb-ad97-f579d1b1fa8e | ec1334b1-db5e-4842-affe-fb99dfe90328 | 0736ae79-3dc6-488c-a377-ca2e20b25022
59ba8d64-a92b-4ae2-bcb4-847de2fdf1dd | f7bc631e-d073-4df3-bbdb-f98c74c36fd5 | 92328532-773e-4931-a298-b4fae9b0934a

VCF and Summary Statistics (phs001325.v6.p1)

In addition to WGS FASTQs, dbGaP phs001325.v6.p1 also provides the following five VCF and five QTL summary statistic files. Four QTL summary statistic files are also available on figshare (see links below). 

dbGaP File accession

File description

phg000904

VCF containing GATK SNP and INDEL genotype calls (hg19) for 273 samples (273 individuals: 254 blood + 19 fibroblasts) in DeBoever et al., Cell Stem Cell, 2017 (PMID 28388430). See the table above: the FASTQ files for 4 of these samples are not available.

phg000904

VCF containing CNV genotype calls (hg19) for the 273 subjects using Genome STRiP or LUMPY in DeBoever et al., Cell Stem Cell, 2017 (PMID 28388430).

phg001172

VCF containing GATK SNP and INDEL genotype calls (hg19) for 18 iPSC samples (from 18 individuals) in D’Antonio M et al., Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell Reports. 2018 (PMID: 30044985).

phg001172

Phased SNPs for 7 individuals in Greenwald et al., Nature Communications, 2019 (PMID 30837461).

phg001407

VCF containing genotype calls (hg19) for the 273 subjects for SVs and STRs (DUP, DEL, mCNV, INV, BND, rMEI, MEI) in Jakubosky et al., Nature Communications, 2020 (PMID 32522985)

phg001408

iPSC eQTL summary statistics from DeBoever et al., Cell Stem Cell, 2017 (PMID 28388430). Also available on figshare: https://doi.org/10.6084/m9.figshare.27321060

phg001408

iPSC eQTL summary statistics from Jakubosky et al., Nature Communications, 2020 (PMID 32522982). Also available on figshare: https://doi.org/10.6084/m9.figshare.27328224.v1

Accession not yet created

CVPC eQTL summary statistics from D’Antonio et al., Nature Communications, 2023 (PMID 36854752). Also available on two figshare pages:

https://doi.org/10.6084/m9.figshare.c.5594121

https://doi.org/10.6084/m9.figshare.26240339.v1

Note: The two links contain the same dataset in different formats. The first link has the data stored in RDS format and contains a list of genes and isoforms and their QTL statistics. The second link is a text version of the summary statistics formatted to dbGaP requirements and was generated from the RDS data in the first link.

Accession not yet created

PPC eQTL summary statistics from Nguyen et al., Nature Communications, 2023 (PMID 37903777). Also available on two figshare pages:

https://figshare.com/projects/Large-scale_eQTL_analysis_of_iPSC-PPC/156987 

https://doi.org/10.6084/m9.figshare.26240618.v1

Note: The two links contain the same dataset in different formats. The first link has the data stored in an Robj format. The second link is a text file version of the summary statistics formatted to dbGaP requirements and was generated from the Robj in the first link. 

Accession not yet created

Multi-omic QTL summary statistics from Arthur et al., bioRxiv, 2024 (preprint). Also available on figshare: https://plus.figshare.com/articles/dataset/iPSCORE_Multiomic_QTL_Summary_Statistics/27328071

Accession not yet created

VCF containing GATK SNP and INDEL genotype calls (hg38) for 291 samples (273 individuals; 254 blood + 19 fibroblasts + 18 iPSC)

(These data are not published – methods provided below). 

Sequence alignment and variant calling 

Paired-end sequence reads were mapped to the human reference genome GRCh38 using 'bwa mem' (v0.7.17; PMID: 19451168). Germline SNVs and indels were called following the recommendations of the GATK best practice variant calling guidelines (Poplin 2020; bioRxiv doi: https://doi.org/10.1101/201178). Briefly, base quality scores of the aligned reads were recalibrated using a model developed from known SNPs and indels ('BaseRecalibrator', 'ApplyBQSR'). The genotype likelihood of each sample was determined at a given site, and the most likely genotypes were assigned by the ‘gatk’ 'HaplotypeCaller' function in tiled regions of exome sequencing kits. Genotypes were called jointly for all samples using the 'GenotypeGVCFs' function.

Variant recalibration models were constructed separately for SNVs and indels using 'VariantRecalibrator', considering known variants for the recalibration. The models were applied to the newly generated variant genotypes with the ‘ApplyVQSR’ function and assigned quality scores.
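To make the order of the steps above easier to follow, here is a rough Python sketch that only assembles the corresponding command lines as argument lists and does not run anything. It is not the command set actually used for these data: all file names, the single VQSR resource and annotation ("QD"), and the use of GVCF mode ("-ERC GVCF") are assumptions for illustration. Intermediate steps that the text does not describe (sorting, duplicate marking, combining per-sample GVCFs, interval lists) are omitted, and only the SNP recalibration pass is shown.

```python
def sketch_commands(sample, ref="GRCh38.fa", known="known_sites.vcf.gz"):
    """Return (step name, argument list) pairs; nothing is executed."""
    # All paths below are hypothetical placeholders.
    bam = f"{sample}.bam"
    table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    cohort_gvcf = "cohort.g.vcf.gz"   # assumed combined GVCF for all samples
    cohort_vcf = "cohort.vcf.gz"
    return [
        ("align", ["bwa", "mem", ref,
                   f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"]),
        ("bqsr_model", ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
                        "--known-sites", known, "-O", table]),
        ("bqsr_apply", ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
                        "--bqsr-recal-file", table, "-O", recal_bam]),
        ("call", ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam,
                  "-O", gvcf, "-ERC", "GVCF"]),
        ("joint_genotype", ["gatk", "GenotypeGVCFs", "-R", ref,
                            "-V", cohort_gvcf, "-O", cohort_vcf]),
        ("vqsr_model", ["gatk", "VariantRecalibrator", "-R", ref, "-V", cohort_vcf,
                        "--resource:known_snps,known=false,training=true,truth=true,prior=15.0",
                        known, "-an", "QD", "-mode", "SNP",
                        "-O", "cohort.snp.recal",
                        "--tranches-file", "cohort.snp.tranches"]),
        ("vqsr_apply", ["gatk", "ApplyVQSR", "-R", ref, "-V", cohort_vcf,
                        "-mode", "SNP",
                        "--recal-file", "cohort.snp.recal",
                        "--tranches-file", "cohort.snp.tranches",
                        "-O", "cohort.snp.vqsr.vcf.gz"]),
    ]


steps = sketch_commands("sample1")
for name, args in steps:
    print(name, ":", " ".join(args))
```

The indel pass would repeat the last two steps with "-mode INDEL" and indel-specific resources, since the text states that the recalibration models were built separately for SNVs and indels.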

 

Molecular Data (phs000924.v5.p1)

All molecular data (ATAC, RNA, H3K27ac ChIP, CTCF ChIP, HiC, DNA methylation) and SNP genotype array data generated from blood, fibroblasts, iPSCs, and iPSC-derived tissues (PPC, CVPC, RPE) are available in phs000924.

dbGaP release notes for the data in phs000924 (which data are available, how many samples and subjects there are for each data type, the file accessions for each data type, etc.) are available here:

https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000924/phs000924.v5.p2/release_notes/Release_Notes.phs000924.iPSCORE.v5.p2.MULTI.pdf

SNP Array (HumanCoreExome) 

To examine DNA rearrangements after reprogramming, SNP Array data was generated by the Institute of Genomic Medicine (IGM) at UCSD using either the HumanCoreExome 12v1 or 24v1 Kit. The HumanCoreExome Chips were used for 497 samples, including 206 blood (206 individuals), 269 iPSC (251 clones, 224 individuals), and 22 fibroblasts (22 individuals). 

Publications:

  1. Panopoulos AD, D'Antonio M, Benaglio P, Williams R, Hashem SI, Schuldt BM, DeBoever C, Arias AD, Garcia M, Nelson BC, Harismendy O. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem cell reports. 2017 Apr 11;8(4):1086-100.

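  2. Panopoulos AD, Smith EN, Arias AD, Shepard PJ, Hishida Y, Modesto V, Diffenderfer KE, Conner C, Biggs W, Sandoval E, D’Antonio-Chronowska A. Aberrant DNA methylation in human iPSCs associates with MYC-binding motifs in a clone-specific manner independent of genetics. Cell Stem Cell. 2017 Apr 6;20(4):505-17.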

  3. D’Antonio M, Benaglio P, Jakubosky D, Greenwald WW, Matsui H, Donovan MK, Li H, Smith EN, D’Antonio-Chronowska A, Frazer KA. Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell reports. 2018 Jul 24;24(4):883-94.

SNP Array (MEGA) 

To examine DNA rearrangements in iPSCs after reprogramming, genotyping was performed through the NHLBI DNA Resequencing and Genotyping (RS&G) Service Center using the Illumina Infinium MEGA Chip for 217 samples, including 206 blood (206 subjects) and 16 fibroblasts (16 subjects). The array data for the iPSCs are not available.

Publication:

  1. Kanchan K, Iyer K, Yanek LR, Carcamo-Orive I, Taub MA, Malley C, Baldwin K, Becker LC, Broeckel U, Cheng L, Cowan C. Genomic integrity of human induced pluripotent stem cells across nine studies in the NHLBI NextGen program. Stem cell research. 2020 (PMID: 32442913)

DNA Methylation Arrays

DNA methylation array data were generated for 55 iPSC samples, which were derived from the fibroblasts of 6 individuals using retroviruses encoding the Yamanaka Factors. These data were used in one publication.

Publication:

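  1. Panopoulos AD, Smith EN, Arias AD, Shepard PJ, Hishida Y, Modesto V, Diffenderfer KE, Conner C, Biggs W, Sandoval E, D’Antonio-Chronowska A. Aberrant DNA methylation in human iPSCs associates with MYC-binding motifs in a clone-specific manner independent of genetics. Cell Stem Cell. 2017 Apr 6;20(4):505-17.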

Bulk RNA-seq

Bulk RNA-seq was generated from a variety of tissue types and used in most iPSCORE publications. Among the bulk RNA-seq samples shown in the Summary Table above, 233 of the iPSC samples (220 subjects) were from earlier passages (typically p9-p12) and are referred to as unmatched, while 89 (25 subjects) were from D0 of the iPSC-CM 25-day differentiation protocol (see above), were assigned a UDID, and are referred to as matched. The UDID for these iPSC samples is the same UDID used for the corresponding iPSC-CVPC samples. Nine of the iPSCs (D0) and 9 of the iPSC-CMs (iPSC-CM 15-day differentiation protocol) are from the time course study, in which samples at different stages were collected in the same study at D0 (iPSC), D2, D5, D9, and D15; the time course study was published in the 2017 iPSCORE resource paper. Four of the iPSC-CVPC samples are from the IFN-γ study.

Cell type | Samples | Differentiations | Subjects
Fibroblast | 6 | NA | 6
iPSC-YF | 44 | NA | 6
iPSC (unmatched) | 233 | NA | 220
iPSC (matched) | 89 | 67 | 25
iPSC-CM (15-day) | 15 | 14 | 7
iPSC-CVPC (25-day) | 426 | 180 | 139
iPSC-RPE | 6 | 6 | 6
iPSC-PPC | 107 | 107 | 106
Human fetal RPE | 1 | NA | 1
IFN-γ-treated iPSC-CVPC | 4 | 4 | 4
iPSC (D0) | 9 | 9 | 3
iPSC-MP (D2) | 9 | 9 | 3
iPSC-CP (D5) | 9 | 9 | 3
iPSC-CC (D9) | 9 | 9 | 3
iPSC-CM (D15) | 9 | 9 | 3

Publications: 

  1. Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Jaureguy J, Silva N, Henson B, iPSCORE Consortium, Panopoulos AD, Belmonte JC, D'Antonio M, McVicker G. Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci. bioRxiv. 2024:2024-04.

  2. Arthur TD, Joshua IN, Nguyen JP, D’Antonio-Chronowska A, iPSCORE Consortium, Frazer KA, D’Antonio M. IFN-γ activates an immune-like regulatory network in the cardiac vascular endothelium. bioRxiv 2024.05.03.592380.

  3. Arthur TD, Nguyen JP, D’Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WW, D’Antonio M, Pera MF. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nature Communications. 2024 Feb 23;15(1):1664.

  4. Nguyen JP, Arthur TD, Fujita K, Salgado BM, Donovan MK, Matsui H, Kim JH, D’Antonio-Chronowska A, D’Antonio M, Frazer KA. eQTL mapping in fetal-like pancreatic progenitor cells reveals early developmental insights into diabetes risk. Nature Communications. 2023 Oct 30;14(1):6928.

  5. D’Antonio M, Nguyen JP, Arthur TD, Matsui H, D’Antonio-Chronowska A, Frazer KA. Fine mapping spatiotemporal mechanisms of genetic variants underlying cardiac traits and disease. Nature communications. 2023 Feb 28;14(1):1132.

  6. D’Antonio M, Nguyen JP, Arthur TD, Matsui H, Donovan MK, D’Antonio-Chronowska A, Frazer KA. In heart failure reactivation of RNA-binding proteins is associated with the expression of 1,523 fetal-specific isoforms. PLoS computational biology. 2022 Feb 28;18(2):e1009918.

  7. Jakubosky D, D’Antonio M, Bonder MJ, Smail C, Donovan MK, Young Greenwald WW, Matsui H, D’Antonio-Chronowska A, Stegle O, Smith EN. Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nature communications. 2020 Jun 10;11(1):2927.

  8. D'Antonio M, Reyna J, Jakubosky D, Donovan MKR, Bonder MJ, Matsui M, Stegle O, Nariai N, D'Antonio-Chronowska A, Frazer KA. Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease. eLife. 2019 November 20. doi: 10.7554/eLife.48476.

  9. D'Antonio-Chronowska A, Donovan MK, Greenwald WW, Nguyen JP, Fujita K, Hashem S, Matsui H, Soncin F, Parast M, Ward MC, Coulet F. Association of human iPSC gene signatures and X chromosome dosage with two distinct cardiac differentiation trajectories. Stem cell reports. 2019 Nov 12;13(5):924-38.

  10. Benaglio P, D’Antonio-Chronowska A, Ma W, Yang F, Young Greenwald WW, Donovan MK, DeBoever C, Li H, Drees F, Singhal S, Matsui H. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nature genetics. 2019 Oct;51(10):1506-17.

  11. Smith EN, D'Antonio-Chronowska A, Greenwald WW, Borja V, Aguiar LR, Pogue R, Matsui H, Benaglio P, Borooah S, D'Antonio M, Ayyagari R. Human iPSC-derived retinal pigment epithelium: a model system for prioritizing and functionally characterizing causal variants at AMD risk loci. Stem cell reports. 2019 Jun 11;12(6):1342-53.

  12. Greenwald WW, Li H, Benaglio P, Jakubosky D, Matsui H, Schmitt A, Selvaraj S, D’Antonio M, D’Antonio-Chronowska A, Smith EN, Frazer KA. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nature communications. 2019 Mar 5;10(1):1054.

  13. D’Antonio M, Benaglio P, Jakubosky D, Greenwald WW, Matsui H, Donovan MK, Li H, Smith EN, D’Antonio-Chronowska A, Frazer KA. Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell reports. 2018 Jul 24;24(4):883-94.

  14. Nariai N, Greenwald WW, DeBoever C, Li H, Frazer KA. Efficient prioritization of multiple causal eQTL variants via sparse polygenic modeling. Genetics. 2017 Dec 1;207(4):1301-12.

  15. Panopoulos AD, D'Antonio M, Benaglio P, Williams R, Hashem SI, Schuldt BM, DeBoever C, Arias AD, Garcia M, Nelson BC, Harismendy O. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem cell reports. 2017 Apr 11;8(4):1086-100.

  16. Panopoulos AD, Smith EN, Arias AD, Shepard PJ, Hishida Y, Modesto V, Diffenderfer KE, Conner C, Biggs W, Sandoval E, D’Antonio-Chronowska A. Aberrant DNA methylation in human iPSCs associates with MYC-binding motifs in a clone-specific manner independent of genetics. Cell Stem Cell. 2017 Apr 6;20(4):505-17.

  17. DeBoever C, Li H, Jakubosky D, Benaglio P, Reyna J, Olson KM, Huang H, Biggs W, Sandoval E, D’Antonio M, Jepsen K. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell stem cell. 2017 Apr 6;20(4):533-46.

Bulk ATAC-seq

Bulk ATAC-seq was generated from a variety of tissue types and used in many iPSCORE publications. Among the bulk ATAC-seq samples shown in the Summary Table above, 284 of the iPSC samples (135 individuals) were from D0 of the iPSC-CM 25-day differentiation protocol (see above), were assigned a UDID, and are referred to as matched. The UDID for these iPSC samples is the same UDID used for the corresponding iPSC-CVPC samples. Fifteen of the iPSCs (D0) and 18 of the iPSC-CMs (iPSC-CM 15-day differentiation protocol) are from the time course study, in which samples at different stages were collected in the same study at D0 (iPSC), D2, D5, D9, and D15. Four of the iPSC-CVPC samples are from the IFN-γ study.

Cell type | Samples | Differentiations | Subjects
iPSC (matched) | 284 | 162 | 135
iPSC-CVPC (25-day) | 339 | 148 | 125
iPSC-PPC | 109 | 109 | 108
iPSC-RPE | 6 | 6 | 6
Human fetal RPE | 1 | 1 | 1
IFN-γ-treated iPSC-CVPC | 4 | 4 | 4
iPSC (D0) | 15 | 9 | 3
iPSC-MP (D2) | 9 | 9 | 3
iPSC-CP (D5) | 9 | 9 | 3
iPSC-CC (D9) | 9 | 9 | 3
iPSC-CM (D15) | 18 | 9 | 3

Publications:

  1. Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Jaureguy J, Silva N, Henson B, iPSCORE Consortium, Panopoulos AD, Belmonte JC, D'Antonio M, McVicker G. Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci. bioRxiv. 2024:2024-04.

  2. Arthur TD, Joshua IN, Nguyen JP, D’Antonio-Chronowska A, iPSCORE Consortium, Frazer KA, D’Antonio M. IFN-γ activates an immune-like regulatory network in the cardiac vascular endothelium. bioRxiv 2024.05.03.592380

  3. Arthur TD, Nguyen JP, D’Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WW, D’Antonio M, Pera MF. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nature Communications. 2024 Feb 23;15(1):1664.

  4. Benaglio P, D’Antonio-Chronowska A, Ma W, Yang F, Young Greenwald WW, Donovan MK, DeBoever C, Li H, Drees F, Singhal S, Matsui H. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nature genetics. 2019 Oct;51(10):1506-17.

  5. Smith EN, D'Antonio-Chronowska A, Greenwald WW, Borja V, Aguiar LR, Pogue R, Matsui H, Benaglio P, Borooah S, D'Antonio M, Ayyagari R. Human iPSC-derived retinal pigment epithelium: a model system for prioritizing and functionally characterizing causal variants at AMD risk loci. Stem cell reports. 2019 Jun 11;12(6):1342-53.

  6. Greenwald WW, Li H, Benaglio P, Jakubosky D, Matsui H, Schmitt A, Selvaraj S, D’Antonio M, D’Antonio-Chronowska A, Smith EN, Frazer KA. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nature communications. 2019 Mar 5;10(1):1054.

Bulk H3K27ac ChIP-seq

Bulk H3K27ac ChIP-seq was generated from a variety of tissue types and used in several iPSCORE publications. Among the bulk H3K27ac ChIP-seq samples shown in the Summary Table above, 68 of the iPSC samples (42 individuals) were from D0 of the iPSC-CM 25-day differentiation protocol (see above), were assigned a UDID, and are referred to as matched. The UDID for these iPSC samples is the same UDID used for the corresponding iPSC-CVPC samples. One of the iPSCs (D0) and 3 of the iPSC-CMs (iPSC-CM 15-day differentiation protocol) are from the time course study, in which samples at different stages were collected in the same study at D0 (iPSC), D2, D5, D9, and D15.

Cell type | Samples | Input | Differentiations | Subjects
iPSC (matched) | 68 | 9 | 60 | 42
iPSC-CM (15-day) | 16 | 2 | 14 | 7
iPSC-CVPC (25-day) | 250 | 21 | 21 | 102
iPSC-RPE | 9 | 3 | 6 | 6
Human fetal RPE | 1 | 1 | 1 | 1
iPSC (D0) | 1 | 1 | 1 | 1
iPSC-MP (D2) | 3 | | 3 | 3
iPSC-CP (D5) | 3 | | 3 | 3
iPSC-CC (D9) | 3 | | 3 | 3
iPSC-CM (D15) | 3 | | 3 | 3

Publications:

  1. Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Jaureguy J, Silva N, Henson B, iPSCORE Consortium, Panopoulos AD, Belmonte JC, D'Antonio M, McVicker G. Multi-omic QTL mapping in early developmental tissues reveals phenotypic and temporal complexity of regulatory variants underlying GWAS loci. bioRxiv. 2024:2024-04.

  2. Benaglio P, D’Antonio-Chronowska A, Ma W, Yang F, Young Greenwald WW, Donovan MK, DeBoever C, Li H, Drees F, Singhal S, Matsui H. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nature genetics. 2019 Oct;51(10):1506-17.

  3. Smith EN, D'Antonio-Chronowska A, Greenwald WW, Borja V, Aguiar LR, Pogue R, Matsui H, Benaglio P, Borooah S, D'Antonio M, Ayyagari R. Human iPSC-derived retinal pigment epithelium: a model system for prioritizing and functionally characterizing causal variants at AMD risk loci. Stem cell reports. 2019 Jun 11;12(6):1342-53.

  4. Greenwald WW, Li H, Benaglio P, Jakubosky D, Matsui H, Schmitt A, Selvaraj S, D’Antonio M, D’Antonio-Chronowska A, Smith EN, Frazer KA. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nature communications. 2019 Mar 5;10(1):1054.

Hi-C

We generated high-resolution Hi-C chromatin maps from iPSCs and iPSC-CMs from seven individuals in a three-generation family, and used these data along with phased gene expression (RNA-seq) and enhancer activity (H3K27ac ChIP-seq) data generated from the same iPSCs and iPSC-CMs.

Cell type | Samples | Subjects
iPSC | 11 | 7
iPSC-CM | 13 | 7

Publications:

  1. Benaglio P, D’Antonio-Chronowska A, Ma W, Yang F, Young Greenwald WW, Donovan MK, DeBoever C, Li H, Drees F, Singhal S, Matsui H. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nature genetics. 2019 Oct;51(10):1506-17.

  2. Greenwald WW, Li H, Benaglio P, Jakubosky D, Matsui H, Schmitt A, Selvaraj S, D’Antonio M, D’Antonio-Chronowska A, Smith EN, Frazer KA. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nature communications. 2019 Mar 5;10(1):1054.

Single-cell RNA-seq

We generated scRNA-seq for iPSC, CVPC and PPC samples, which were used in several studies.

For iPSC, we generated 10X single-cell RNA-seq data for 2 iPSC lines from 2 subjects. 

For CVPC, we generated data for 8 CVPC samples and 1 pooled CVPC sample, for a total of 9 CVPC scRNA-seq samples. The pooled sample contains cells from 10 UDIDs (6 of which overlap with the 8 individual CVPC samples). In total, we have scRNA-seq data for 12 CVPC samples from 12 individuals.

Similarly, for PPC, we have 7 independent scRNA-seq samples for 7 PPCs and 2 pooled samples, where one pooled sample contains cells from 4 PPCs (which overlap with the 7 independent scRNA-seq samples) and the other pooled sample contains cells from 3 PPCs (which do not overlap with the 7 independent scRNA-seq samples). For the 7 independent scRNA-seq samples, the cells were harvested fresh from culture and used directly to generate libraries. For the two pooled samples, the cells were first harvested from culture, cryopreserved, and then thawed to generate the libraries. We observed no difference in gene expression profiles between the fresh and cryopreserved cells; however, the fresh samples had replicating cells, which were not observed in the cryopreserved samples.

For the samples used to generate each pool, see the “Pooled Samples” section above. 
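The sample totals described above follow from taking the union of the individually sequenced samples and the pool members. The sketch below is only an illustration of that counting: the identifiers are placeholders (not real iPSCORE UDIDs) and `count_unique_samples` is a hypothetical helper, not part of any iPSCORE tooling.

```python
def count_unique_samples(individual, pools):
    """Count distinct samples across individually sequenced samples and pools."""
    unique = set(individual)
    for pool in pools:
        unique |= set(pool)  # pool members already sequenced individually are not double counted
    return len(unique)


# Placeholder identifiers standing in for UDIDs (assumption, for illustration only).
cvpc_individual = [f"cvpc_{i}" for i in range(1, 9)]                   # 8 individual CVPC samples
cvpc_pool = cvpc_individual[:6] + [f"cvpc_{i}" for i in range(9, 13)]  # 10 members, 6 overlapping

ppc_individual = [f"ppc_{i}" for i in range(1, 8)]   # 7 independent PPC samples
ppc_pool_a = ppc_individual[:4]                      # 4 members, all overlapping
ppc_pool_b = [f"ppc_{i}" for i in range(8, 11)]      # 3 members, none overlapping

cvpc_total = count_unique_samples(cvpc_individual, [cvpc_pool])
ppc_total = count_unique_samples(ppc_individual, [ppc_pool_a, ppc_pool_b])

print(f"CVPC: {cvpc_total} unique samples")  # 8 + (10 - 6) = 12
print(f"PPC:  {ppc_total} unique samples")   # 7 + 0 + 3 = 10
```

With real data, the placeholder lists would be replaced by the UDIDs listed in the “Pooled Samples” section.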

| Cell type | Samples | Differentiations | Subjects | # Pooled Samples |
| --- | --- | --- | --- | --- |
| iPSC | 2 | NA | 2 | 0 |
| iPSC-CVPC | 9 | 12 | 12 | 1 |
| iPSC-PPC | 9 | 10 | 10 | 2 |

Publications:

  1. D'Antonio-Chronowska A, Donovan MK, Greenwald WW, Nguyen JP, Fujita K, Hashem S, Matsui H, Soncin F, Parast M, Ward MC, Coulet F. Association of human iPSC gene signatures and X chromosome dosage with two distinct cardiac differentiation trajectories. Stem Cell Reports. 2019 Nov 12;13(5):924-38.

  2. Nguyen JP, D’Antonio-Chronowska A, Fujita K, Salgado BM, Matsui H, Arthur TD, iPSCORE Consortium, Donovan MK, D’Antonio M, Frazer KA. Regulatory variants active in iPSC-derived pancreatic progenitor cells are associated with Type 2 Diabetes in adults. bioRxiv. 2021 Oct 21; https://doi.org/10.1101/2021.10.20.465206.

  3. Nguyen JP, Arthur TD, Fujita K, Salgado BM, Donovan MK, Matsui H, Kim JH, D’Antonio-Chronowska A, D’Antonio M, Frazer KA. eQTL mapping in fetal-like pancreatic progenitor cells reveals early developmental insights into diabetes risk. Nature Communications. 2023 Oct 30;14(1):6928.

Single-nuclei ATAC-seq

We generated snATAC-seq for PPC samples. These data were used in one study.

We generated 10X single-nuclei ATAC-seq for two “Pooled Samples” that together comprise 7 PPC samples: 4 in one pool (39ee03a2-3d9a-4a41-8e46-42d4828e8d20) and 3 in the other (ab418c3a-1c32-4b13-9f09-7b0b9195f341). 

For the samples and subjects used to generate each pool, see the “Pooled Samples” section above. 

| Cell type | Samples | Differentiations | Subjects | # Pooled Samples |
| --- | --- | --- | --- | --- |
| iPSC-PPC | 2 | 7 | 7 | 2 |

Publications:

  1. Nguyen JP, D’Antonio-Chronowska A, Fujita K, Salgado BM, Matsui H, Arthur TD, iPSCORE Consortium, Donovan MK, D’Antonio M, Frazer KA. Regulatory variants active in iPSC-derived pancreatic progenitor cells are associated with Type 2 Diabetes in adults. bioRxiv. 2021 Oct 21; https://doi.org/10.1101/2021.10.20.465206.

Bulk NKX2-5 ChIP-seq

We generated NKX2-5 ChIP-seq from iPSC-CMs from seven individuals in a three-generation family. These data were used in one study.

| Cell type | Samples | Input | Differentiations | Subjects |
| --- | --- | --- | --- | --- |
| iPSC-CM | 15 | 1 | 12 | 7 |

Publications:

  1. Benaglio P, D’Antonio-Chronowska A, Ma W, Yang F, Young Greenwald WW, Donovan MK, DeBoever C, Li H, Drees F, Singhal S, Matsui H. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nature Genetics. 2019 Oct;51(10):1506-17.

Bulk CTCF ChIP-seq

We generated CTCF ChIP-seq from iPSCs from five individuals. These data were used in one study.

| Cell type | Samples | Input | Differentiations | Subjects |
| --- | --- | --- | --- | --- |
| iPSC | 10 | 3 | NA | 5 |

Publications:

  1. DeBoever C, Li H, Jakubosky D, Benaglio P, Reyna J, Olson KM, Huang H, Biggs W, Sandoval E, D’Antonio M, Jepsen K. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell. 2017 Apr 6;20(4):533-46.

IFN-γ Treatment Study

In this study, we treated 4 iPSC-CVPC samples with IFN-γ for 24 hours or left them untreated (control), and then generated bulk RNA-seq and bulk ATAC-seq for each sample. We examined the effects of IFN-γ on the regulatory landscape and gene expression. 

| Data Type | Samples | Subjects |
| --- | --- | --- |
| Bulk ATAC | 4 Treated + 4 Ctrl | 4 |
| Bulk RNA | 4 Treated + 4 Ctrl | 4 |

Publications:

  1. Arthur TD, Joshua IN, Nguyen JP, D'Antonio-Chronowska A, iPSCORE Consortium, Frazer KA, D'Antonio M. IFN-γ activates an immune-like regulatory network in the cardiac vascular endothelium. bioRxiv. 2024:2024-05. https://doi.org/10.1101/2024.05.03.592380

iPSC-CM Differentiation Time Course Study

In this study, we generated bulk RNA-seq, bulk ATAC-seq, and bulk H3K27ac ChIP-seq on 3 iPSC lines at 5 timepoints of CM differentiation using the 15-day protocol. The ATAC-seq and ChIP-seq samples have not been published. The RNA-seq samples from the time course study were published in the 2017 iPSCORE resource paper (see below).

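| Data Type | Cell type | Samples | Input | Differentiations | Subjects |
| --- | --- | --- | --- | --- | --- |
| RNA | iPSC (D0) | 9 | NA | 9 | 3 |
| RNA | iPSC-MP (D2) | 9 | NA | 9 | 3 |
| RNA | iPSC-CP (D5) | 9 | NA | 9 | 3 |
| RNA | iPSC-CC (D9) | 9 | NA | 9 | 3 |
| RNA | iPSC-CM (D15) | 9 | NA | 9 | 3 |
| ATAC | iPSC (D0) | 15 | NA | 9 | 3 |
| ATAC | iPSC-MP (D2) | 9 | NA | 9 | 3 |
| ATAC | iPSC-CP (D5) | 9 | NA | 9 | 3 |
| ATAC | iPSC-CC (D9) | 9 | NA | 9 | 3 |
| ATAC | iPSC-CM (D15) | 18 | NA | 9 | 3 |
| H3K27ac | iPSC (D0) | 1 | 1 | 1 | 1 |
| H3K27ac | iPSC-MP (D2) | 3 |  | 3 | 3 |
| H3K27ac | iPSC-CP (D5) | 3 |  | 3 | 3 |
| H3K27ac | iPSC-CC (D9) | 3 |  | 3 | 3 |
| H3K27ac | iPSC-CM (D15) | 3 |  | 3 | 3 |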

Publications:

  1. Panopoulos AD, D'Antonio M, Benaglio P, Williams R, Hashem SI, Schuldt BM, DeBoever C, Arias AD, Garcia M, Nelson BC, Harismendy O. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem Cell Reports. 2017 Apr 11;8(4):1086-100.

CVPC+EPI Isogenic Mixture Study

In this study, we sought to determine how accurately we could make isogenic mixes of CVPCs and EPIs at defined ratios. These data have not been published but the methods used to generate them are described below.  

| Data Type | Cell type | Samples | UDID | Subjects |
| --- | --- | --- | --- | --- |
| RNA | CVPC | 1 | 1 | 1 |
| RNA | EPI | 1 | 0 | 1 |
| RNA | CVPC + EPI | 3 | 1 | 1 |

Methods: From the iPSCORE collection, we selected a highly pure iPSC-CVPC sample (mostly cardiomyocytes) and, using the same iPSC line, derived isogenic pure iPSC-EPIs following the protocol of Whitehead et al. To make cellular mixtures (CMs:EPIs) at three ratios, we thawed the cryopreserved iPSC-CMs and cultured them for 5 days to ensure full recovery after cryopreservation, obtaining iPSC-CMs at D30. Similarly, we thawed and expanded the isogenic iPSC-EPIs and cultured them to D35. Flow cytometry and RNA-seq analyses showed that the cells maintained their high purity before and after cryopreservation. Next, we collected the iPSC-CMs and iPSC-EPIs using Accutase, counted them, and prepared the following samples: 1) pure iPSC-CM at D30 (CM 100); 2) pure iPSC-EPI at D35 (EPI 100); 3) CM:EPI mixture at 75:25; 4) CM:EPI mixture at 50:50; 5) CM:EPI mixture at 25:75. We generated bulk RNA-seq for all five samples (CM:EPI ratios of 0:100, 25:75, 50:50, 75:25, and 100:0). The cellular ratios (CMs:EPIs) in each sample were validated by co-staining the cells for two markers (the cardiac marker cTNT and the epicardium marker WT1) and analyzing them by flow cytometry.
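The mixing step in the methods above amounts to splitting a fixed number of counted cells between the two cell types according to each target ratio. The following is a minimal sketch of that arithmetic only. The total of 1,000,000 cells per sample and the helper name `mixture_counts` are assumptions made for illustration; the source does not state how many cells were used per mixture.

```python
# Target CM:EPI ratios (percent) for the five samples described in the methods.
RATIOS_CM_EPI = [(100, 0), (75, 25), (50, 50), (25, 75), (0, 100)]

# Hypothetical number of cells per sample (not stated in the source).
TOTAL_CELLS = 1_000_000


def mixture_counts(total_cells, cm_pct, epi_pct):
    """Return (CM cells, EPI cells) needed to reach the target ratio."""
    if cm_pct + epi_pct != 100:
        raise ValueError("CM and EPI percentages must sum to 100")
    cm_cells = round(total_cells * cm_pct / 100)
    return cm_cells, total_cells - cm_cells  # remainder goes to EPI so the total is preserved


plan = {
    f"{cm}:{epi}": mixture_counts(TOTAL_CELLS, cm, epi)
    for cm, epi in RATIOS_CM_EPI
}

for label, (cm_cells, epi_cells) in plan.items():
    print(f"CM:EPI {label:>6}  ->  {cm_cells:>9,} CMs + {epi_cells:>9,} EPIs")
```

The achieved ratios still need to be confirmed experimentally, as was done here by flow cytometry for cTNT and WT1, because counting and pipetting errors shift the realized composition away from the target.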