
2006 Rochester Computational Science and Education Conference

Computational approaches for microRNA identification

Authors: Praveen Sethupathy 1, 2, Molly Megraw 1, 2, Martin Reczko 4, and Artemis G. Hatzigeorgiou 1, 2, 3

  1. Penn Center for Bioinformatics
  2. University of Pennsylvania, Department of Genetics, School of Medicine
  3. University of Pennsylvania, Department of Computer and Information Sciences, School of Engineering
  4. Institute of Computer Science, Foundation of Research and Technology Hellas, Heraklion, Greece

Abstract

Computational biology is a field in flux. In recent years, driven largely by the genome sequencing projects, the volume of biological data has grown exponentially. The challenge has been to develop methods that analyze and integrate diverse types of biological data into statistically sound models and predictions that explain biological phenomena. In this report, we describe the development of computational methods based on machine learning approaches, such as Support Vector Machines, that have been applied toward understanding the biogenesis and function of microRNAs (miRNAs), a recently discovered class of important gene regulators.

Gene regulation is the complex set of processes within the cells of any living organism that dictate which genes are "turned on" or "turned off" in a particular tissue at a particular time. "Turning on" a gene encoded in the genomic DNA requires a pipeline of molecular events. Briefly, the DNA is transcribed into RNA, the RNA is translated into protein, and the protein is often modified into an activated form. Cells have regulatory mechanisms at each of these stages to ensure that exactly the right set of genes is turned on. One such mechanism is miRNA-based silencing. MiRNAs are short RNA molecules that bind to longer RNA molecules (messenger RNAs) and prevent their translation into protein, effectively "turning off" a gene. To understand when miRNAs act and which longer RNAs they target, it is first necessary to characterize the miRNAs themselves carefully: how many miRNAs are there, where are they located in the genome, and what do they look like? The lack of high-throughput experimental methods to answer these questions has provided the impetus for the development of computational approaches. The central idea behind these approaches has been to identify features of miRNAs that best distinguish them from other genomic elements and to use those features to predict where miRNAs are located. In this report, we describe the progression of computational miRNA characterization over the past half decade. Because errors in miRNA-based silencing can contribute to various forms of cancer, developmental defects, and other serious medical conditions, we also use computational miRNA analysis to highlight the practical value of computational science in biology and medicine.
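The idea of using discriminative features to separate miRNAs from other genomic sequences can be illustrated with a minimal sketch. It assumes a hypothetical two-feature representation of a candidate sequence (GC fraction and length) and hand-picked, hypothetical weights in place of trained ones. It shows only the form of a linear decision function, the kind a trained linear SVM evaluates; it is not the authors' method, feature set, or model.

```python
from typing import List, Sequence


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def features(seq: str) -> List[float]:
    """Hypothetical feature vector: GC fraction and length (in units of 100 nt)."""
    return [gc_content(seq), len(seq) / 100.0]


def decision(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear decision value w.x + b, the form a trained linear SVM evaluates."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical, hand-picked parameters; a real model would learn them from
# labeled examples of known miRNAs and non-miRNA sequences.
WEIGHTS = [4.0, -1.0]
BIAS = -1.5


def classify(seq: str) -> bool:
    """Predict 'candidate miRNA' when the decision value is positive."""
    return decision(features(seq), WEIGHTS, BIAS) > 0


print(classify("GGCC" * 10))  # True under these made-up weights
print(classify("AAUU" * 10))  # False under these made-up weights
```

In a real predictor, the weights and bias would be learned from training data, and the feature vector would encode whatever sequence and structural properties prove most discriminative; the two features here were chosen only because they are easy to compute.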