Oct. 26, 2021

Arjun Krishnan: Organizing the ocean of public data using machine learning

Each year, the National Science Foundation’s Faculty Early Career Development Program, or CAREER, awards grants to “early-career faculty who have the potential to serve as academic role models in research and education.” The following story is part of a series highlighting Michigan State University’s recipients of NSF’s 2021 CAREER grant awards.

This Spartan computational scientist is developing a way to standardize data and make it searchable 

Data from more than 2 million biological experiments are available through public online databases, but much of this information is not used by the biological and biomedical researchers who need it.

One reason is a lack of standards for the information reported with each sample, which can be vague or incomplete. The record for a human tissue sample, for example, may not state its origin, such as heart, kidney or brain.

Michigan State University’s Arjun Krishnan is working to standardize the information reported about each sample and make it searchable for researchers through a web interface.

“We will develop machine learning approaches to automatically annotate publicly available samples from six species (human and five animal models) on a massive scale to enable researchers to seamlessly discover relevant published data,” says Arjun Krishnan, assistant professor in the Department of Computational Mathematics, Science and Engineering.

Once the data has been labeled and organized, Krishnan and his team will create a web interface so that researchers can search for data that fits their research needs. As the field of biology has gradually shifted toward data science, Krishnan was inspired by the role that computation can play.

“Data-driven computational biology represents a confluence of ideas from diverse scientific and technological disciplines, including computer science, statistics, physics and applied mathematics,” says Krishnan, who is also in the Department of Biochemistry and Molecular Biology within the College of Natural Science. “A constant reminder that good ideas can come from anywhere and from anyone.”

With the support of a 2021 National Science Foundation CAREER award, Krishnan will analyze large datasets using high-performance computing. It is important for researchers to access the data and to develop research skills that can be applied elsewhere using rapidly changing tools and techniques.

Krishnan also has a passion for training, educating and providing research opportunities for the next generation of computational scientists.

“Our educational and training activities will result in formalizing and openly disseminating a curriculum in modern bioinformatics,” he says. “The resulting new curriculum, online instructional content and training activities will provide learners with bioinformatics research experience combined with practical primers, case studies, conversations with scientists and co-work sessions.”

This content will be offered to the broader MSU community as Open Community Un-Workshops, with the aim of drawing underrepresented students and trainees into biological data science. The workshops will be run in collaboration with R-Ladies East Lansing, a local group of aspiring and experienced R programmers focused on mentoring, networking and teaching. Through existing partnerships with R-Ladies and local high schools in East Lansing and Lansing, Krishnan and his team will recruit high school and undergraduate students, especially from underrepresented groups, to participate in their research program.

“The students involved in research will contribute case studies, co-lead and facilitate workshops/co-work sessions and serve as near-peer mentors, thus gaining valuable teaching experience and professional development,” Krishnan says. “It is the difference between learners knowing things about computational science and learners being able to think and work as a computational scientist.”


By: Emilie Lorditch and Kelsie Lane

SERIES

Passion and purpose: Meet the researchers changing tomorrow today