Ph.D. Molecular Biology
1999, University of Georgia
Office: Greenwood Genetic Center
Phone: (864) 941-8120
Research Focus Areas
Gene Function and Regulation
We are currently focused on two areas of bioinformatics research: biological knowledge discovery and genomic data integration. Our research in the first area aims to build machine learning models for understanding gene regulation and protein function. Machine learning is particularly appealing for knowledge discovery in biological data: for many biological problems, a number of experimental observations have been made, yet the underlying mechanisms remain unclear. Machine learning techniques are therefore well suited to modeling the complex patterns hidden in the available data. We have been developing new machine learning approaches with relevant biological features for modeling protein-DNA/RNA interactions and for predicting protein stability changes upon amino acid substitutions. Several web servers have been set up to make our predictive models freely available to the biological research community. These include BindN (http://bioinfo.ggc.org/bindn/) and BindN+ (http://bioinfo.ggc.org/bindn+/) for sequence-based prediction of DNA- or RNA-binding residues, and MuStab (http://bioinfo.ggc.org/mustab/) for predicting protein stability changes. In collaboration with scientists at the Greenwood Genetic Center (GGC), we are using both sequence- and structure-based methods to predict point mutations that may cause intellectual disability. We are also interested in several other machine learning problems related to gene regulation and protein function.
Genomic data integration is the other focus area of our research. High-throughput experiments generate large datasets from which useful information may be extracted for understanding biological systems. The scale and complexity of these datasets give rise to substantial challenges in data management and integration. We previously developed the model organism database BeetleBase and the BioStar framework for genomic data integration in biomedical warehouses.
Recently, we have been developing an integrated data resource for human genetics research. In particular, we have compiled a compendium of publicly available microarray gene expression profiles of various human tissues, and we have developed a computational pipeline to integrate the microarray data from heterogeneous sources. The integrated dataset is used for computational identification of tissue-selective genes, co-expression network analysis, and regulatory network modeling. Together with GGC scientists, we are also using the dataset to study the function and regulation of intellectual disability genes and to prioritize candidate disease genes. A database system is currently being developed to make the integrated data accessible to the biomedical research community.
Courses
Biochemistry Senior Seminar
Genetics Senior Seminar
Publications
Teng, S., Luo, H. and Wang, L. (2011) Predicting protein sumoylation sites from sequence features. Amino Acids, in press.
Zhou, Z., Marepally, S.R., Nune, D.S., Pallakollu, P., Ragan, G., Roth, M.R., Wang, L., Lushington, G., Visvanathan, M. and Welti, R. (2011) LipidomeDB data calculation environment: online processing of direct-infusion mass spectral data for lipid profiles. Lipids, 46(9):879-884.
Teng, S., Srivastava, A.K., Schwartz, C.E., Alexov, E. and Wang, L. (2011) Structural assessment of the effects of amino acid substitutions on protein stability and protein-protein interaction. International Journal of Computational Biology and Drug Design, 3(4):334-349.
Wang, L., Huang, C. and Yang, J.Y. (2010) Predicting siRNA potency with random forests and support vector machines. BMC Genomics, 11(S3):S2.
Wang, L., Srivastava, A.K. and Schwartz, C.E. (2010) Microarray data integration for genome-wide analysis of human tissue-selective gene expression. BMC Genomics, 11(S2):S15.
Teng, S., Srivastava, A.K. and Wang, L. (2010) Sequence feature-based prediction of protein stability changes upon amino acid substitutions. BMC Genomics, 11(S2):S5.
Zhang, Z., Teng, S., Wang, L., Schwartz, C.E. and Alexov, E. (2010) Computational analysis of missense mutations causing Snyder-Robinson Syndrome. Human Mutation, 31(9):1043-1049.
Wang, L., Huang, C., Yang, M.Q. and Yang, J.Y. (2010) BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Systems Biology, 4(S1):S3.
Smith, C.M., Liu, X.M., Wang, L., Liu, X., Chen, M.S., Starkey, S. and Bai, G. (2010) Aphid feeding activates expression of a transcriptome of oxylipin-based defense signals in wheat involved in resistance to herbivory. Journal of Chemical Ecology, 36(3):260-276.
Li, D., Wang, L., Yang, X., Zhang, G. and Chen, L. (2010) Proteomic analysis of blue light-induced twining response in Cuscuta australis. Plant Molecular Biology, 72(1-2):205-213.