Building Better Phylogenetic Trees
This new work from researchers at UC San Diego and Brown University will be published online by PNAS the week of December 18, 2006.
Microinversions – usually tens to thousands of base pairs in length – can only be detected if you have the exact nucleotide sequence of the same genomic region for all the species you are considering. Many recent studies have pointed to microinversions as large sources of genetic diversity that have not previously been characterized, and the new research from UCSD provides a more careful and accurate approach to identifying microinversions.
“As more fine-grained genomic data becomes available, microinversions will be increasingly important in understanding genetic diversity both between and within species,” said Mark Chaisson, the first author on the paper and a Bioinformatics Ph.D. student from UCSD’s Jacobs School of Engineering.
“This method might be able to provide evidence for the entire mammalian phylogeny, such as the presence of an Afrotheria clade,” he said. A clade is a group of organisms that all descend from a common ancestor. DNA analysis has so far indicated, but not proved, that Afrotheria includes nearly a third of all mammalian orders currently found in Africa and Madagascar, including many species that are at risk of extinction.
|The phylogenetic tree reconstructed by the authors is on the left and the corresponding canonical mammalian phylogeny is on the right.|
Using data from their microinversion detection technique – an open-source software system called InvChecker – the researchers reconstructed the phylogenetic tree for 15 mammals. This work largely confirmed the existing phylogenetic tree that connects these mammals.
“Three years ago, we didn’t know microinversions existed,” explained Pavel Pevzner, the senior author on the paper, a computer science and engineering professor at UCSD’s Jacobs School of Engineering, and director of the newly established Center for Algorithmic and Systems Biology (CASB) at the UCSD Division of Calit2.
“When they were discovered, there was a lot of skepticism. In the last year, scientists have realized just how common microinversions are in evolution – even in variation between humans, which is why they are such a hot topic today.”
“We’ve only looked for microinversions in 0.1 percent of the genomic sequence from several mammals, and we can already confirm many of today’s ideas about the history of evolution. When similar analyses extend to one percent of the genomes under investigation, we’ll have a 10-fold increase in data. This should shed light on splits between species that have long been debated in molecular evolution,” explained Pevzner.
“This microinversion detection method could be used for detecting human structural variants once we have the necessary data,” explained Ben Raphael, a professor of computer science at Brown University. Raphael is the second author on this paper and a former postdoctoral researcher in Pevzner’s lab at UCSD. Raphael helped establish CASB and is one of the organizers of the RECOMB Satellite Workshop on Computational Cancer Biology at Calit2 on September 16-18, 2007.
To create InvChecker, the researchers modified an existing software system created by UCSD mathematics professor Glenn Tesler (a former postdoctoral researcher in Pevzner’s lab at UCSD) to make it better at detecting microinversions and at distinguishing them from other genomic rearrangements. Other rearrangements mistaken for microinversions are false positives, which are generally not useful in understanding the history of evolution and can introduce errors into the reconstruction of phylogenetic trees.
With InvChecker, the researchers analyzed the CFTR region in a collection of mammal species. CFTR is a heavily studied, highly conserved, gene-rich area of human chromosome 7 that is home to the cystic fibrosis gene.
“It’s quite a subtle problem to find microinversions. Our goal is to use these tiny inversions to develop a history of species,” said Pevzner.
The researchers also used InvChecker to study the specific differences between humans and chimpanzees. They found that 80 percent of the microinversions between humans and chimps that were proposed last year are, in fact, repeat-induced artifacts and not microinversions. The researchers also uncovered 167 human-chimp microinversions recently missed by scientists using software other than InvChecker.
“This finding doesn’t change the conclusions between humans and chimps, but it does say that the detection of microinversions needs to be done carefully,” said Chaisson. “InvChecker does a more careful job of comparing sequences than previous attempts to find microinversions.”
With InvChecker, you can take the same genomic region from two species, partition the two sequences into regions that are unique to one species or common to both (orthologous), and determine how the order and orientation of the orthologous regions differ between the two species.
“We are looking for orthologous sequences in reverse order that are surrounded by elements in forward order. That’s a microinversion,” Chaisson explained.
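The pattern Chaisson describes can be illustrated with a small sketch. This is not InvChecker's code. It assumes, purely for illustration, that the orthologous regions have already been numbered 1, 2, 3, ... in the order they appear in the first species, and that the second species is given as a list of signed integers, where a negative sign marks a region in reverse orientation. The function name `find_flanked_inversions` is hypothetical, and the sketch ignores length thresholds, species-specific regions, and repeat-induced artifacts.

```python
def find_flanked_inversions(signed_blocks):
    """Return (start, end) index pairs of reversed runs that look like inversions.

    signed_blocks: orthologous regions of species B, numbered by their order
    in species A; a negative value means reverse orientation (assumed encoding).
    A run is reported only if it has a forward-oriented neighbour on each side
    and reversing it restores the consecutive order seen in species A.
    """
    hits = []
    n = len(signed_blocks)
    i = 0
    while i < n:
        if signed_blocks[i] < 0:
            # Extend to the maximal run of reversed regions.
            j = i
            while j + 1 < n and signed_blocks[j + 1] < 0:
                j += 1
            # Both neighbours must exist; they are forward because the run is maximal.
            if i > 0 and j < n - 1:
                left, right = signed_blocks[i - 1], signed_blocks[j + 1]
                # Undo the candidate inversion: reverse the run and flip each sign.
                restored = [-b for b in reversed(signed_blocks[i:j + 1])]
                if [left] + restored + [right] == list(range(left, right + 1)):
                    hits.append((i, j))
            i = j + 1
        else:
            i += 1
    return hits
```

For example, `find_flanked_inversions([1, 2, -4, -3, 5, 6])` reports the run at indices 2 to 3, because regions 3 and 4 appear reversed between forward regions 2 and 5. A reversed region at the very start of the list is not reported, since it has no forward element on its left.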
|The character matrix for 67 microinversions in 15 species (Upper) and the matrix after performing the first 49 good inversions (Lower). Each column represents an orthologous inversion locus. Red and green cells represent inversion loci in opposite orientation, and gray cells correspond to ? signs (unknown orientation). Columns with a single green cell are inversions unique to a species. The number of inversions performed on each species is shown to the left of Lower.|
Microinversions have certain advantages over other signals used for studying evolution, such as amino acid changes, Chaisson explained. “With microinversions, it’s easy to develop evolutionary relationships between species and difficult to debate whether one species is inverted relative to another species.”
With InvChecker and microinversions, researchers are not limited to comparing species that are evolutionarily close, as is the case when using other genomic features like repetitive sequences and deletions for phylogenetic analysis. The new process can also detect microinversions that are the result of convergent evolution and thus do not play a role in tracking evolution and defining phylogenies.
Once the researchers have the microinversion data, they use it to reconstruct phylogenies using an algorithm that attempts to move “back in time” by iteratively undoing microinversions and bringing the existing species closer to the ancestral mammalian genome.
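A single "back in time" step of this kind can be sketched on a small character matrix. This is a simplified illustration and not the authors' algorithm. It assumes each species is a row of 0/1 orientation values, one per inversion locus, with `None` for unknown orientation. The function name `undo_singleton_inversions` and the rule that at least two other species must agree on the opposite state are assumptions made for the example.

```python
from collections import Counter


def undo_singleton_inversions(matrix):
    """Undo inversions carried by exactly one species (one illustrative round).

    matrix: dict mapping species name -> list of 0, 1, or None (unknown),
    one entry per inversion locus (assumed encoding).
    Returns (new_matrix, undone), where undone counts flips per species.
    """
    species = list(matrix)
    n_loci = len(next(iter(matrix.values())))
    result = {s: list(row) for s, row in matrix.items()}  # leave the input untouched
    undone = Counter()
    for col in range(n_loci):
        ones = [s for s in species if result[s][col] == 1]
        zeros = [s for s in species if result[s][col] == 0]
        # A locus where one species disagrees with at least two others is
        # treated as an inversion unique to that species and is reverted.
        if len(ones) == 1 and len(zeros) >= 2:
            result[ones[0]][col] = 0
            undone[ones[0]] += 1
        elif len(zeros) == 1 and len(ones) >= 2:
            result[zeros[0]][col] = 1
            undone[zeros[0]] += 1
    return result, dict(undone)
```

With the toy input `{"human": [0, 1, 0], "chimp": [0, 1, 1], "mouse": [1, 1, 0], "dog": [1, None, 0]}`, only the third locus is reverted, and it is counted against chimp. The human and chimp rows then become identical, which is the kind of signal a fuller procedure could use to group the two species before repeating the step.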