By Rabail Baig
In the second session of this summer’s 8-week COVID + Data Science Virtual Seminar Series sponsored by Duke University’s Plus Data Science (+DS) program, Duke Assistant Professor of Civil and Environmental Engineering David Carlson introduced natural language processing concepts with the challenge of parsing important findings in scientific literature. After touching upon the basics of natural language processing, Carlson introduced the algorithmic foundation of these tools, and discussed how the same toolkit can be applied in diverse scenarios.
“There has been an explosion of scientific literature on COVID-19 in the past several months with thousands of articles already available,” said David Carlson, “and in order to try to make sense of this quickly evolving literature, people have turned to data science and natural language processing tools to process the vast literature and find the most relevant articles.”
Among the participants from across Duke who attended Carlson’s talk was David Bradway, Ph.D., a staff research scientist in Biomedical Engineering. He later emailed Carlson to share how he had applied the methods from the talk to a dataset he had been working with at home.
“I had been following the announcements of seminars presented at Duke on data science both in-person before COVID-19, and now remotely via Zoom,” said Bradway, who has an avid interest in machine learning methods that operate on imaging data, which applies to his work in ultrasound imaging.
David Bradway earned his Ph.D. in biomedical engineering in 2013 from Duke, after which he was a guest postdoc at the Technical University of Denmark, supported by a Whitaker International Program Scholarship. Working on ultrasound research since 2002, Bradway is interested in diagnostic ultrasound imaging and developing methods to improve image quality and medical diagnoses. More recently, he has been picking up skills in newer methods in data science and statistics through Duke’s Machine Learning Summer School, workshops, and online tutorials.
Aside from his work at Duke, Dr. Bradway is also currently involved with a group of volunteers at EngageDurham, a pilot initiative that facilitates community outreach and engagement for the Durham Comprehensive Plan process. The Comprehensive Plan is a broad policy document that outlines the community’s vision and recommendations for decision-making across areas including housing, health, education, development, environment, land use, and transportation, and that sets an umbrella strategy for how the city and county want to grow and develop over the next 30 years.
“The Planning Staff at EngageDurham had acknowledged that the process of creating previous Comprehensive Plans was quite biased, with much of the decision-making power resting in the hands of overwhelmingly white people in power including the Planning Commission and City Council members,” said Bradway. “The process being used this time was intentionally more inclusive and equitable.”
After the first round of community engagement in late 2019, Bradway realized that planning staff had to enter and tag the collected text responses by hand. To augment that manual tagging with methods less reliant on human judgment for categorizing comment topics, he took the opportunity to learn natural language processing and apply automated algorithms in search of additional insights.
“I learned through my analyses that the topics important to people varied with the demographics of those giving the feedback,” said Bradway. “The different channels of feedback that the planning staff utilized had quite different racial and socioeconomic demographics, and this was reflected in the contents of the comments that were received.”
“Dr. Carlson’s talk on applying natural language processing to COVID-19 literature caught my eye due to its applicability to the EngageDurham community project, and I really appreciated how Dr. Carlson actually used the methods he was teaching on real data, included examples in the talk, and made the whole analysis available in a notebook and public repository. This made it really easy to adapt to my dataset and project,” said Bradway.
Bradway applied methods from Dr. Carlson’s talk to EngageDurham data acquired from its Listening and Learning engagement program, which comprised text results from online surveys, large in-person facilitated sessions, and small community feedback sessions held primarily with underrepresented groups.
“To use the data, I copied the spreadsheet that the planning staff posted and did some manual data cleaning including the headings, spelling, and standardizing the tags applied across the sheets,” said Bradway. He then adapted Dr. Carlson’s analyses and made a copy of his data and code available on his GitHub profile.
“One of the goals of working with the community in this process is to improve transparency and to rebuild trust,” said Lisa Miller, a senior planner at EngageDurham who worked directly with Bradway. “Dr. Bradway has been involved in this important work since it began last summer and most recently, his work developing an app to improve access and searchability of the database of resident input from engagement so far has been invaluable.”
“I hope that the tools and analyses I am helping with will inform the Comprehensive Plan and help shape the next 30 years of Durham’s growth in a more equitable, racially just way,” said Bradway, who hopes that people with appropriate skills can use data science and statistics methods to improve the lives of people around them.