By Rabail Baig
Since it was declared a global pandemic in March 2020, COVID-19 has upended university and college campuses across the United States, causing major disruption to student life. As Duke’s campus went into a full lockdown following a steep uptick in COVID-19 infections in North Carolina last spring, Duke’s Harshavardhan (Harsha) Srijay, a 19-year-old second-year undergraduate majoring in math and data science, saw his plans for summer 2020 crumble. As earlier opportunities fell through, the Duke Plus Data Science (+DS) Advanced Projects summer program gave him a platform not only to stay engaged and productive through a very difficult summer, but also to come out of it with a successful project, which he recently presented at the American Medical Informatics Association (AMIA) 2021 Virtual Informatics Summit.
The AMIA Informatics Summit is an annual scientific conference that brings together physicians, scientists, students, economists, analysts, policy makers and entrepreneurs from across the country to learn about best practices, new methodologies and shared challenges in translational and clinical research, implementation informatics, and data science. Harsha received a conference grant from the Duke Undergraduate Research Support Office (URS) that provided funding for him to attend the conference.
Harsha worked with mentor Ricardo Henao, PhD, Assistant Professor in Duke Biostatistics and Bioinformatics and the scientific director of the +DS Advanced Projects program. His project on predictive modeling using transcriptomic signatures of COVID-19 and other infectious diseases was among the 58 accepted podium abstracts presented at AMIA this year. Harsha joined the Duke +DS Advanced Projects program in summer 2020 in the health data science track, which offers Duke undergraduate and graduate students the opportunity to be part of research teams applying advanced machine learning to important areas of medicine.
We caught up with Harsha after the presentation to ask him about his experience of presenting at a preeminent informatics conference despite going through one of the hardest academic years students have faced in recent history.
Tell us about yourself and how you ended up applying to the +DS Advanced Projects summer program.
I am a Duke sophomore double majoring in math and data science (an interdepartmental major between computer science and statistics) and am thinking about pursuing a certificate in decision sciences. My interests generally lie in helping others by optimizing decision making, so I am very interested not only in machine learning and AI, but also in approaching real-life problems through the interdisciplinary lens of economics, neuro-economics, neuroscience and psychology.
I came across Duke +DS programs and projects in March 2020, just as quarantine started and we were sent back home. My original plan for the summer was getting involved with Duke Engage, but that was canceled because of the pandemic. I applied to the +DS Advanced Projects program and was lucky to be accepted, which was quite thrilling, all the more so because I got the chance to work with Dr. Ricardo Henao.
How did the pandemic disrupt or affect your academic plans for 2020?
Being away from campus and having courses and programs canceled was very disappointing and demoralizing. Like many others, I was unsure about what I was going to do over the summer. The positive aspect was definitely the +DS Advanced Projects summer session extending its application deadline so that students like me could apply. The program gave me the opportunity to work independently, with Dr. Henao overseeing my progress. I have worked in a lab before, but I found +DS Advanced Projects to be a lot more engaging, enabling and intensive. The program is rigorously structured in a good way: your mentor holds you accountable and keeps you busy and engaged through one-on-one interactions three times a week and journal clubs. It was a tremendous learning opportunity that gave me a sense of direction and purpose in an unexpected and challenging summer.
What was it like working with Dr. Ricardo Henao?
Working with Ricardo Henao was great. He is incredibly knowledgeable about machine learning and its applications, and all around a very good mentor to work with. It is strange how I have never met him face-to-face, though, despite working together for almost a year.
Given my previous experience working with transcriptomic and other next-generation sequencing data, Dr. Henao encouraged me to develop my idea of predicting disease states for patients who have COVID-19 and other upper respiratory infections. I ended up working on the project all summer and continued throughout the school year as well. With his help, I was able to submit a podium presentation abstract to AMIA by August 2020, and I am thrilled to have had the opportunity to present at the AMIA 2021 Virtual Informatics Summit.
Can you shed some light on your research?
Given the state of global disease today, I wanted to draw attention to the applications of machine learning algorithms to diseases like COVID-19, specifically trying to use these algorithms to predict patient disease state. We looked at four upper respiratory infections (including COVID-19) in our study.
Symptoms of these acute upper respiratory infections are not specific. This means that patients infected with different pathogens can present with overlapping symptoms, which can make it difficult for physicians to perform targeted therapy and optimize diagnosis and treatment for these patients. Given that clinical signs and symptoms of such infections are not pathogen-specific, the goal of our project was essentially to enable targeted therapy by building pathogen-specific, multi-disease predictive models. These predictive models should be able to predict whether a patient has a certain disease, using only the patient’s gene expression. This will enable a better understanding of how these infections cause differentiated responses in patients, which will yield valuable information about these diseases and how we can effectively treat them.
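To make the idea of a multi-disease predictive model more concrete, the sketch below trains a very simple classifier that assigns one of four infection labels from a vector of gene expression values. It is an illustration only and not the model used in the study: the nearest-centroid technique, the synthetic data, the class names (`infection_A` to `infection_D`), and every function name are assumptions made for this example.

```python
import random

# Hypothetical labels standing in for four upper respiratory infections.
CLASSES = ["infection_A", "infection_B", "infection_C", "infection_D"]


def make_synthetic_data(n_per_class, n_genes, classes, seed=0):
    """Generate synthetic 'expression' vectors (not real patient data)."""
    rng = random.Random(seed)
    # Assumption: each class has its own mean expression profile.
    profiles = {c: [rng.gauss(0.0, 2.0) for _ in range(n_genes)] for c in classes}
    samples = []
    for c in classes:
        for _ in range(n_per_class):
            # A sample is its class profile plus per-gene noise.
            x = [m + rng.gauss(0.0, 0.5) for m in profiles[c]]
            samples.append((x, c))
    rng.shuffle(samples)
    return samples


def fit_centroids(samples):
    """Average the expression vectors of each class into one centroid."""
    sums, counts = {}, {}
    for x, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


data = make_synthetic_data(n_per_class=30, n_genes=20, classes=CLASSES)
split = int(0.7 * len(data))          # 70% for training, the rest held out
train, test = data[:split], data[split:]

centroids = fit_centroids(train)
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because each class is summarized by a single average profile, a model like this is easy to inspect, which is one reason simple classifiers are attractive when the inputs are gene expression measurements. Real transcriptomic data would be far noisier and higher-dimensional than this toy example.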
Can you briefly touch upon the key findings?
Results from our study confirm the potential for robust multi-class disease state predictive models using representation learning methods. We were able to build simpler models that performed very strongly. These simpler models are more interpretable, and more likely to perform well when deployed in the real world.
What experiments or studies could be done to build on your work and to address unanswered questions?
While this project was done mainly in the realm of model building, we believe future work should emphasize the understanding of the biological relevance of the findings in the study.
We are currently extending the results of this study, specifically looking into the biological intuition behind the model’s ability to discriminate between classes and investigating how these findings can further our understanding of what differentiates these acute upper respiratory infections.
Anything else you’d like to add?
I would like to thank the Center for Applied Genomics and Precision Medicine at Duke University for providing the data that we used in this project. I would also like to thank Dr. Ricardo Henao, Shelley Rusincovitch, and the Duke +DS program for their role in advising and overseeing this project.