Signature Gene Identification of Cancer Occurrence and Pattern Recognition

This study aimed to identify signature genes for the pathogenesis of cancer, providing theoretical support for cancer prevention and early diagnosis. A pattern recognition method was used to analyze genome-wide gene expression data collected from The Cancer Genome Atlas (TCGA) database. For the transcriptomes of seven cancers (invasive breast carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, renal clear-cell carcinoma, thyroid carcinoma, and hepatocellular carcinoma), signature genes were selected using a combination of statistical methods, including correlation, t-test, and confidence interval. With an artificial neural network model, accuracy reached as high as 98% on the TCGA data and as high as 92% on independent Gene Expression Omnibus (GEO) data. The recognition accuracy for stage I was more than 95%, which is higher than that reported in the previous study. Two genes, PID1 and SPTBN2, appeared among the signature genes of five of the seven cancers. In addition, three pathways common to the cancers were obtained by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A functional analysis of these pathways shows their close relationship at the level of gene regulation, which indicates that the identified signature genes play an important role in the pathogenesis of cancer and are important for understanding that pathogenesis and for early diagnosis.
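As a rough illustration of one ingredient named above, the t-test, the following minimal sketch ranks genes by Welch's t-statistic between tumor and normal samples and keeps those above a cutoff. It is not the authors' pipeline: the function names, the gene labels, the expression values, and the threshold are all hypothetical, and the correlation, confidence-interval, and neural network steps are not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant; treat the gene as uninformative.
        return 0.0
    return (mean(a) - mean(b)) / se


def select_signature_genes(tumor, normal, threshold):
    """Return genes with |t| above threshold, strongest first.

    tumor and normal map a gene name to a list of expression values.
    The threshold is an illustrative assumption, not a value from the study.
    """
    scores = {g: welch_t(tumor[g], normal[g]) for g in tumor if g in normal}
    picked = [g for g, t in scores.items() if abs(t) > threshold]
    return sorted(picked, key=lambda g: abs(scores[g]), reverse=True)


# Toy expression values, invented purely for illustration.
tumor = {
    "GENE_A": [8.1, 7.9, 8.4, 8.0],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [2.0, 2.3, 1.9, 2.1],
}
normal = {
    "GENE_A": [5.0, 5.3, 4.8, 5.1],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
    "GENE_C": [4.0, 4.2, 3.9, 4.1],
}

selected = select_signature_genes(tumor, normal, threshold=4.0)
print(selected)
```

In this toy data, GENE_A is higher in tumor samples and GENE_C is lower, while GENE_B barely differs, so only the first two pass the filter. A real analysis would also need multiple-testing correction across thousands of genes.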

Journal of computational biology : a journal of computational molecular cell biology. 2018 Jun 29 [Epub ahead of print]

Jian-Xin Wen, Xiao-Qin Li, Yu Chang

College of Life Science and Bioengineering, Beijing University of Technology, Beijing, P.R. China.