ChengXiang Zhai, text analysis, is an Assistant Professor of Computer Science, with joint appointments in Information Science and the Institute of Genomic Biology. He holds a Ph.D. in Computer Science from Nanjing University and a Ph.D. in Language and Information Technologies from Carnegie Mellon University. He has extensive research experience in natural language text analysis and information retrieval, in both academia and industry. His research interests span many topics in information management, especially information retrieval, information filtering, and text mining. He received an NSF CAREER award from IDM-CISE and a 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), recognizing his work in user-centered adaptive information retrieval. He also received a best paper award from ACM SIGIR in 2004.

His work on applying statistical language models to information retrieval represents a new generation of models for searching text. He developed a new general framework for information retrieval based on Bayesian decision theory, which facilitates modeling complex retrieval problems and automatic tuning of performance. Recently, he has focused more on developing techniques for personalized information management and on applying text retrieval and mining techniques to biological data analysis and biomedical literature mining.

He also has significant experience with developing information management software. While working at Clairvoyance Corp., he was involved in developing a commercial toolkit underlying the ConceptBase software product, which won the "Software of the Year" award in Japan. He also led a team working on a personalized information filtering system. The filtering techniques he developed consistently perform well in TREC, the premier international text retrieval evaluation workshop sponsored by NIST. These techniques resulted in four US patents. While at Carnegie Mellon University, he was a major architecture designer and implementor of Lemur, an information retrieval and language model toolkit that is now used worldwide for both research and education.

