SCAP-T Data Guide

A Guide to the SCAP-T Data

The first set of SCAP-T data was released on February 17, 2015. This first release comprises phenotypic information and whole-transcriptome next-generation sequencing data for 697 single cells from human brain and heart. This document serves as a tutorial on this data.

Where does this data come from?

The 697 human single cells break down as follows:

  1. 185 single cardiomyocyte cells from SCAP-T (University of Pennsylvania)
  2. 497 single neuronal cell nuclei from SCAP-T (University of California, San Diego)
  3. 15 single neuronal cells from SCAP-T (University of Southern California)


What does the data release comprise?

Each center uses its own experimental protocols to process the single cells. The detailed experimental protocols can be obtained here. All centers also track a common set of metadata and phenotypic attributes for these single cells and for the human subjects from whom the cells were obtained. The list of collected attributes is provided here. After RNA sequencing has been carried out for these cells, all the raw sequencing data is transferred to the data coordination team at the University of Pennsylvania, which then performs the NGS analysis for all the samples using the PennSCAP-T pipeline. The details of this pipeline can be obtained here. The pipeline also generates several QC metrics for each sample. To see what these metrics are, click here.

The data release comprises the compiled metadata (including image files and protocols), NGS BAM files, and gene expression counts (for exons as well as introns).


Where can I access the data?

The following is an overview of the steps required to submit an application for the SCAP-T data:

  1. All applications must be submitted to dbGaP through the dbGaP Authorized Access web page. For instructions on how to request dbGaP data, please view the dbGaP Request Procedures (PDF). Please log in using your eRA Commons account (NIH intramural investigators should use their NIH login) and enter your application.
  2. The dbGaP dataset ID for SCAP-T is phs000833.v1.p1. Click here to go to our project website on dbGaP.
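
dbGaP study accessions such as the one above have three parts: the study number (`phs` followed by six digits), the data version (`v`), and the participant set version (`p`). When scripting around downloads, it can help to split an accession into these parts. The following is a minimal sketch; the function and class names are illustrative and are not part of any dbGaP or SCAP-T tool.

```python
import re
from typing import NamedTuple


class DbGapAccession(NamedTuple):
    """Parts of a dbGaP study accession (hypothetical helper type)."""
    study_id: int         # numeric part after "phs"
    version: int          # data version, the number after ".v"
    participant_set: int  # participant set version, the number after ".p"


# Pattern for study accessions of the form phs000833.v1.p1
_ACCESSION_RE = re.compile(r"^phs(\d{6})\.v(\d+)\.p(\d+)$")


def parse_dbgap_accession(accession: str) -> DbGapAccession:
    """Split a dbGaP study accession into its numeric parts."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a dbGaP study accession: {accession!r}")
    study_id, version, participant_set = (int(g) for g in match.groups())
    return DbGapAccession(study_id, version, participant_set)


scap_t = parse_dbgap_accession("phs000833.v1.p1")
print(scap_t.study_id, scap_t.version, scap_t.participant_set)
```

A later release of the same study would keep the study number and change the `v` (and possibly `p`) component, so comparing the parsed fields is one way to check which release a downloaded file belongs to.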


Once you are authorized, you will be able to access the data via dbGaP, or you may use the SCAP-T data portal, which has been developed as a user-friendly tool to browse the data, filter it using custom queries, and download the samples of interest with ease. We also plan to provide informative summary statistics about the data on the data portal.


Are there any usage rules associated with the SCAP-T data?

The SCAP-T data are embargoed for a period of six months for each available wave of data. Applicants for data are prohibited from publishing findings during this period.
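
As a rough illustration of the six-month rule, the sketch below computes the date six calendar months after a release date. It assumes that the embargo clock starts on the release date of a wave, which this guide does not state explicitly; the authoritative date is the one given in the data use terms on dbGaP. The helper `add_months` is a hypothetical name.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. Aug 31 + 6
    months), the result is clamped to the last day of the target month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Release date of the first SCAP-T wave, taken from this guide.
RELEASE_DATE = date(2015, 2, 17)
# Embargo length in months, taken from this guide.
EMBARGO_MONTHS = 6

# ASSUMPTION: the embargo is counted from the release date of the wave.
embargo_end = add_months(RELEASE_DATE, EMBARGO_MONTHS)
print(embargo_end.isoformat())
```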


Where can I go for more information?

We will provide all relevant information, including dates for future releases, on the SCAP-T website. If you have any other questions, please contact us.