In late spring 2020, Sidi “Daniel” Liu 20BBA/MSBA received a LinkedIn notification about the 2nd IKCEST “The Belt and Road” International Big Data Competition. The competition was co-hosted by several Chinese educational and technology organizations, and its topic, forecasting the future incidence of highly pathogenic contagious diseases, got Liu’s attention. It was a timely, real-world problem, and Liu thought the competition would be worth entering. He spoke with three fellow Goizueta MS in Business Analytics students, Fei Hou 20MSBA, Zihan Wang 20MSBA and Patrick He 20MSBA, and they decided to form a team.
Liu had “confidence in our business analytics background,” he said, but he’d never taken part in such a large-scale competition. The competition took place over a six-month span and was divided into three rounds: preliminary (May 1st to June 25th), semi-final (July 1st to July 31st) and the final round and award ceremony (September and October). More than 5,000 students from 580 universities participated, most of them computer science doctoral students from artificial intelligence or machine learning programs. Since none of Liu’s teammates had a background in applying machine learning in healthcare, the team asked Yingtian Hu, a 2021 doctoral candidate in biostatistics in Emory University’s Rollins School of Public Health, to join. Then they got to work.
Of the more than 3,000 teams spread across 22 countries that took part in the competition, the Emory team placed 6th overall and was the top-finishing team from a U.S. university (teams from 28 U.S. universities competed). The team credited George Easton, associate professor of information systems & operations management, with helping refine its approach to the data challenge. Easton had taught the students Machine Learning I, a core course in the master’s in business analytics program.
“George had new ideas, ‘You can do this, try that,’” said Hou. “After we took his approach, we made big progress.”
Using data to predict infection rates
Over the course of the competition, teams studied the infection rates in 11 virtual cities, divided into key regions, for a period of 90 days. Teams were given access to the first 45 days of data in each region. Based on this data, they were tasked with predicting the number of newly infected people in each region of each city over the last 45 days. Easton met with the students four or five times over Zoom to talk about strategy, framing, and how to look at the data to better understand “things that would affect the infection rate,” he said.
To make accurate overall predictions, the students needed to design models that best predicted infection rates and specify in what regions they occurred. Since the infection had a lag—infections occur up to two weeks before victims get sick or show symptoms—the team needed insight “on how to build models on lag variables,” explained Liu.
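For readers unfamiliar with the term, a lag variable is simply an earlier value of a series used as an input for predicting a later one. The sketch below is only an illustration of that general idea, not the team’s actual model: the function name `make_lag_features`, the chosen lags, and the sample counts are all hypothetical and do not come from the competition data.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    daily_counts: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's count (the target) with counts from `lag` days earlier.

    Hypothetical helper for illustration; not the competition team's code.
    """
    max_lag = max(lags)
    features: List[List[float]] = []
    targets: List[float] = []
    # Start at the first day that has history available for every lag.
    for day in range(max_lag, len(daily_counts)):
        features.append([daily_counts[day - lag] for lag in lags])
        targets.append(daily_counts[day])
    return features, targets


# Made-up daily new-infection counts for a single region.
counts = [3, 5, 8, 13, 21, 34, 55, 89]

# Use the counts from 1 day ago and 3 days ago as predictors.
X, y = make_lag_features(counts, lags=[1, 3])

for row, target in zip(X, y):
    print(row, "->", target)
```

Each row of `X` could then be fed to any regression model. With a two-week incubation period like the one described above, lags of up to 14 days would be a natural starting point, though that choice is an assumption here.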
As in most data competitions, the students were given a data set with randomly selected pieces (in this case, future data pieces) excluded. The team was allowed to submit up to three predictive models a day. Each submission was then compared to the “actual” rates of infection, and teams were ranked in real time according to how well their models predicted those rates.
Embracing fierce competition
As the competition progressed, the Emory team benchmarked itself against the other U.S.-based teams, taking note of their positions in the daily ranking. The team took special notice of a competing team from Columbia University, which was advised by a professor well known for his academic work on pandemics.
“We felt the pressure,” said Liu. “But George was super awesome. There was no disadvantage.” In fact, Easton was one of a handful of advisors given an “Excellent Instructor” award by the IKCEST judges.
In addition to Easton’s guidance and hard work, Hou credited teamwork for the team’s success. Both Hou and He, who graduated in May 2020 at the beginning of the competition, began new jobs over the summer: He as a senior data analyst, assurance and advisory, at The Home Depot, and Hou as the chief data scientist at Optimized Payments. “It was a little bit challenging for us. Some of us were thinking we don’t want to continue anymore,” said Hou. But the team’s persistence paid off. “Working with colleagues from a company or any organization, you have to use teamwork to figure out complex tasks as a whole team. Our contribution as a member of that team was the most important thing for us.”
Easton gave the students high marks for their top-notch presentation skills and for keeping their algorithm simple.
“The students did sensible things. That’s key,” Easton explained. “And they did an exceptionally good job of telling their story. I was very impressed.”
He was also impressed with the fact that the team entered a six-month competition in the first place. “I really think these students are role models in terms of being proactive and having both the courage and the drive to tackle a problem like this,” Easton said. “A lot of the things we talked about in class become a whole lot more real when you’re confronted with a problem and with data like this. I was happy to work with them.”
Established in 2014, the International Knowledge Centre for Engineering Sciences and Technology (IKCEST) is a comprehensive and international knowledge center devoted to the engineering sciences, technology and applied tech. The Chinese Academy of Engineering is responsible for the operation and management of IKCEST.