- Cell line view
- To search information on ONE selected cell line
- Gene view
- To search alterations of ONE selected gene in MULTIPLE cell lines
- Intergenic view
- To search alterations in ONE selected gene and its flanking regions in MULTIPLE cell lines
- Chromosome view
- To search alterations of ONE selected chromosome or cytoband in MULTIPLE cell lines
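The Gene view and the Intergenic view differ mainly in the genomic window they search: the gene alone, or the gene plus its flanking regions. Below is a minimal Python sketch of such a query across cell lines. It assumes alterations are stored as half-open [start, end) intervals per cell line. The `Alteration` record, the `intergenic_query` function and the `flank` width are hypothetical, because the flanking distance the database actually uses is not stated here.

```python
from dataclasses import dataclass

@dataclass
class Alteration:
    cell_line: str
    chrom: str
    start: int   # 0-based start, inclusive
    end: int     # end, exclusive
    kind: str    # e.g. "amplicon" or "homozygous deletion"

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end

def intergenic_query(alterations, chrom, gene_start, gene_end, flank=0):
    """Group alterations overlapping the gene extended by `flank` bp on each side, by cell line."""
    lo = max(0, gene_start - flank)
    hi = gene_end + flank
    hits = {}
    for alt in alterations:
        if alt.chrom == chrom and overlaps(alt.start, alt.end, lo, hi):
            hits.setdefault(alt.cell_line, []).append(alt)
    return hits

# Hypothetical alterations in three cell lines.
alts = [
    Alteration("A", "chr1", 1_005_000, 1_006_000, "homozygous deletion"),
    Alteration("B", "chr1", 1_050_000, 1_060_000, "amplicon"),
    Alteration("C", "chr2", 1_005_000, 1_006_000, "amplicon"),
]
print(sorted(intergenic_query(alts, "chr1", 1_000_000, 1_010_000)))                 # ['A']
print(sorted(intergenic_query(alts, "chr1", 1_000_000, 1_010_000, flank=100_000)))  # ['A', 'B']
```

Calling it with `flank=0` corresponds to a Gene view search. A positive `flank` widens the window in the way the Intergenic view does.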
- Workflow chart:
- 338 and 182 Affymetrix GeneChip SNP array sets, covering the 250K, 500K and 6.0 platforms, were downloaded via caArray (NCI) and GEO (NCBI), respectively. After data processing (see the SNP data processing section), amplification and homozygous deletion regions were defined, and these amplicons and homozygous deletions were stored in the central database.
- Cell line characteristics provided by three well-known, reputable bioresource providers (ATCC, DSMZ and ECACC) were retrieved with web crawlers and then parsed into the central database.
- UCSC hg18 annotation data (the cytoBand, refGene and refFlat tables) were used to integrate all parts of the data in the central database.
- Catalogue of Somatic Mutations in Cancer (COSMIC) data, version 47, were downloaded from the Sanger Institute, and the point mutation data were extracted and incorporated into the central database.
- TP53 mutation data (release 2008_R2) were downloaded from the UMD TP53 Mutation Database and parsed into the central database.
- Gene expression data from 950 cancer cell line experiments and 140 normal tissue samples, all profiled on the Affymetrix GeneChip Human Genome U133 Plus 2.0 array platform, were downloaded via caArray (NCI) and GEO (NCBI), respectively. After data processing (see the Expression data processing section), log2 fold-change values were stored in the central database.
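As an illustration of how UCSC annotation tables can link a stored region (for example an amplicon) to cytobands and genes, here is a minimal Python sketch. It assumes tab-separated text in the UCSC cytoBand column layout (chrom, chromStart, chromEnd, name, gieStain) and the refFlat layout (geneName, name, chrom, strand, txStart, txEnd, ...), with 0-based, end-exclusive coordinates. The helper names, gene names and coordinates are hypothetical and are not real hg18 values.

```python
import csv
import io

def read_rows(text):
    """Yield non-empty rows from tab-separated UCSC table text."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:
            yield row

def cytobands_for_region(cytoband_text, chrom, start, end):
    """Band labels such as '8p23.1' for cytoBand rows overlapping [start, end)."""
    label = chrom[3:] if chrom.startswith("chr") else chrom
    hits = []
    for row in read_rows(cytoband_text):
        c, s, e, name = row[0], int(row[1]), int(row[2]), row[3]
        if c == chrom and s < end and start < e:
            hits.append(label + name)
    return hits

def genes_for_region(refflat_text, chrom, start, end):
    """Gene symbols whose transcript span (txStart, txEnd) overlaps [start, end)."""
    genes = set()
    for row in read_rows(refflat_text):
        gene, c, tx_start, tx_end = row[0], row[2], int(row[4]), int(row[5])
        if c == chrom and tx_start < end and start < tx_end:
            genes.add(gene)
    return sorted(genes)

# Illustrative rows only; gene names and coordinates are made up.
CYTOBAND = (
    "chr8\t0\t2200000\tp23.3\tgneg\n"
    "chr8\t2200000\t6200000\tp23.2\tgpos75\n"
    "chr8\t6200000\t12700000\tp23.1\tgneg\n"
)
REFFLAT = (
    "GENEA\tTX1\tchr8\t+\t5100000\t5200000\t5110000\t5190000\t2\t5100000,5150000,\t5120000,5200000,\n"
    "GENEB\tTX2\tchr8\t-\t9000000\t9100000\t9010000\t9090000\t1\t9000000,\t9100000,\n"
)

print(cytobands_for_region(CYTOBAND, "chr8", 5_000_000, 7_000_000))  # ['8p23.2', '8p23.1']
print(genes_for_region(REFFLAT, "chr8", 5_000_000, 7_000_000))       # ['GENEA']
```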
- SNP data processing:
Copy number alteration (CNA) data were generated from GeneChip Human Mapping 250K, 500K and 6.0 SNP array sets, downloaded via caArray from caBIG (maintained by NCI) and via Gene Expression Omnibus (GEO, maintained by NCBI). The downloaded CEL files were analyzed with dChip 6.0.
Homozygous deletions and amplicons were called with the protocol we published recently (Hepatology 2010, PMID: 20799341).
In brief, CEL data are normalized with the invariant set normalization algorithm, and normalized intensity data are then generated for each chip relative to a reference data set of 50 normal individuals genotyped on the same platform.
From these signal values, the raw copy number of a SNP in a sample is computed as log2(2 × I_SNP / mean(I_ref)), where I_SNP is the SNP's intensity in the sample and mean(I_ref) is the mean intensity of that SNP across the reference set.
Median smoothing with a window of 3 consecutive SNPs is then applied to the raw copy numbers to obtain the inferred copy number (ICN).
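The raw copy number formula and the 3-SNP median smoothing described above can be sketched in Python with NumPy. This is a minimal illustration, not dChip's implementation: it assumes intensities are already normalized, SNPs are sorted by genomic position, and the smoothing window is simply truncated at the ends (dChip's edge handling is not described here). The function names are hypothetical.

```python
import numpy as np

def raw_copy_number(sample, reference):
    """log2(2 * I_SNP / mean(I_ref)) for every SNP.

    sample:    shape (n_snps,), normalized intensities of one sample
    reference: shape (n_refs, n_snps), normalized intensities of the reference set
    """
    ref_mean = reference.mean(axis=0)  # mean reference intensity per SNP
    return np.log2(2.0 * sample / ref_mean)

def median_smooth(values, window=3):
    """Running median over `window` consecutive SNPs; windows are truncated at the edges."""
    half = window // 2
    n = len(values)
    return np.array([np.median(values[max(0, i - half):min(n, i + half + 1)])
                     for i in range(n)])

sample = np.array([100.0, 100.0, 400.0, 100.0, 50.0])
reference = np.full((2, 5), 100.0)
raw = raw_copy_number(sample, reference)
print(raw.tolist())                 # [1.0, 1.0, 3.0, 1.0, 0.0]
print(median_smooth(raw).tolist())  # [1.0, 1.0, 1.0, 1.0, 0.5]
```

The single-SNP spike at the third position disappears after smoothing, which is the purpose of the median filter. In practice the smoothing would presumably be run separately on each chromosome so that a window never spans two chromosomes.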
- Expression data processing:
Microarray gene expression data sets for 309 cancer cell lines (run in duplicate or triplicate, 950 experiments in total), profiled on the GeneChip Human Genome U133 Plus 2.0 Array platform, were downloaded via caArray from NCI.
These cancer cell lines were classified into 29 tissue categories.
Matching these categories, 140 normal human tissue expression data sets from the same platform were downloaded via GEO from NCBI to serve as references.
All downloaded CEL files were first normalized with dChip's invariant set normalization and then converted into intensity values with absent/present calls.
We then used the Linear Models for Microarray Data (limma) package in R to calculate the fold change of each probe.
In limma, the log2 intensity of each probe in the reference data from the same tissue category was subtracted from the log2 intensity of that probe in the cancer cell line, yielding logFC, the log2 expression fold change for the cancer cell line.
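For a simple two-group comparison (one cell line's replicate arrays against the normal references of its tissue category), the logFC that limma reports for the group coefficient equals the difference between the two groups' mean log2 intensities, because limma's default per-probe fit is ordinary least squares. The NumPy sketch below computes that difference directly. It is an illustration under stated assumptions, not the limma code path: it omits limma's moderated statistics, ignores the absent/present calls, and assumes all intensities are positive. The function name is hypothetical.

```python
import numpy as np

def log2_fold_change(cancer, reference):
    """Per-probe log2 fold change of a cancer cell line versus matched normal tissues.

    cancer:    shape (n_replicates, n_probes), normalized intensities of one cell line
    reference: shape (n_refs, n_probes), normal tissues of the same tissue category
    """
    cancer_mean = np.log2(cancer).mean(axis=0)       # mean log2 intensity per probe
    reference_mean = np.log2(reference).mean(axis=0)
    return cancer_mean - reference_mean              # logFC > 0: higher in the cell line

cancer = np.array([[4.0, 16.0], [16.0, 16.0]])
reference = np.array([[2.0, 4.0], [2.0, 4.0]])
print(log2_fold_change(cancer, reference).tolist())  # [2.0, 2.0]
```

Averaging on the log2 scale, rather than averaging raw intensities and then taking the log, matches how a linear model on log-expression values estimates the group difference.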