Data Science in Healthcare

Since mid-2020, I have been the Head of Engineering and Data Science at Dascena, a healthcare startup based in the San Francisco Bay Area, where I lead research and development on applying machine learning to clinical decision support and pharmaceutical development. Previously, I worked at Dascena as a lead machine learning engineer from 2015 to 2017.

I have published research articles on applying machine learning to a variety of clinical decision support problems, and I have worked as a reviewer for many journals. I currently serve on the editorial board of Computers in Biology and Medicine.

Clinical Decision Support Systems with a Machine Learning Approach

A clinical decision support system (CDSS) is an information technology system that assists health professionals with clinical decision-making. CDSS is a major topic in artificial intelligence (AI) for healthcare.

Figure (BMJ Open sepsis ROC): Receiver operating characteristic curves for Dascena InSight versus competing methods at the time of (A) sepsis onset, (B) severe sepsis onset, and (C) 4 hours before septic shock onset (Mao et al. 2018).

Many traditional clinical decision support tools are based on simple scoring systems that use only tabulation or linear combinations of a limited number of patient vital signs and laboratory results. These traditional systems are often inefficient and have limited predictive value. Modern computer-based predictive analytics have been demonstrated to improve diagnostic outcomes and to reduce overtreatment and failures in care delivery and care coordination.
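To make the idea of a tabulation-based score concrete, here is a minimal sketch in Python. It counts how many of four thresholds a patient crosses, in the style of the well-known SIRS criteria. It is a simplified illustration only: the PaCO2 and immature band criteria are left out, the class, field, and function names are hypothetical, and it is not the method used by any Dascena product.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """One set of bedside measurements (field names are hypothetical)."""
    temperature_c: float     # body temperature, degrees Celsius
    heart_rate: float        # beats per minute
    respiratory_rate: float  # breaths per minute
    wbc_k_per_ul: float      # white blood cell count, thousands per microliter


def sirs_style_score(v: Vitals) -> int:
    """Count how many simplified SIRS-style criteria are met (0 to 4)."""
    criteria = [
        v.temperature_c > 38.0 or v.temperature_c < 36.0,
        v.heart_rate > 90,
        v.respiratory_rate > 20,
        v.wbc_k_per_ul > 12.0 or v.wbc_k_per_ul < 4.0,
    ]
    # Each criterion contributes exactly one point, regardless of severity.
    return sum(criteria)


def is_flagged(v: Vitals, threshold: int = 2) -> bool:
    """Flag the patient when the point total reaches the threshold."""
    return sirs_style_score(v) >= threshold


patient = Vitals(temperature_c=38.6, heart_rate=104,
                 respiratory_rate=18, wbc_k_per_ul=9.5)
print(sirs_style_score(patient), is_flagged(patient))  # prints: 2 True
```

A score like this treats a heart rate of 91 the same as one of 150 and ignores trends over time and interactions between measurements, which is one reason such systems tend to have limited predictive value compared with models learned from data.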

At Dascena, I lead a team of scientists and engineers who use modern machine learning methods to develop next-generation clinical decision support tools for various clinical applications, such as early prediction of sepsis onset, patient stability and mortality prediction, and automated transfer recommendations. Our team uses freely accessible datasets such as the MIMIC database, and we also collaborate with large hospitals, such as the University of California San Francisco (UCSF) Medical Center, to analyze their comprehensive datasets and develop data-driven, efficient, and secure clinical decision support tools that benefit both health systems and patients.

Healthcare Data Security


Data security is a major concern in digital healthcare and one of the biggest challenges for any company in the healthcare business, whether it is a startup or an established corporation. Healthcare-related information is considered sensitive and is strictly regulated to protect patients' privacy and prevent abuse and fraud. In the United States, sensitive healthcare data are regulated under the Health Insurance Portability and Accountability Act (HIPAA). A platform used to transfer, store, and process Protected Health Information (PHI) must be HIPAA compliant.

How to build a healthcare data platform that is secure and compliant yet also efficient and low-cost is a large and evolving topic. Recently, cloud service providers have been pushing hard to offer cloud-based HIPAA-compliant solutions that speed up the development of healthcare platforms and applications. For example, Amazon Web Services (AWS) has made many of its services HIPAA compliant and has published architecture strategies and whitepapers that outline general principles and best practices.

At Dascena, we frequently need to collect and process sensitive data. I served as the company's first Security Officer and led the design and deployment of our data pipeline on AWS. Compliance is not only about technology, however, but also about proper policies and procedures. Working with consultants and security experts, I developed the company's set of security policies and procedures, which formed the foundation of our security operations.