Scientists sequence a genome seven times bigger than yours

Using haploid DNA and advanced computer technology, researchers have finally managed to sequence the genome of the loblolly pine tree.

Conifers are the predominant members of the 300-million-year-old Gymnosperm clade. Conifers are also distinguished by their leviathan genomes. The reference genome sequence of loblolly pine appears in the March issue of the journal GENETICS, published by the Genetics Society of America. Its 22-Gb genome is the largest sequenced and assembled to date.

Dr. Ronald Billings, Texas A&M Forest Service

March 20, 2014

After fitting 16 billion separate fragments together, scientists have finally managed to sequence the genome of the loblolly pine tree, the largest genome sequenced so far.

The scientists, who published their papers in GENETICS and the journal Genome Biology, used DNA extracted from a single haploid seed of a loblolly pine tree.

To obtain the DNA, the scientists first had to remove the embryo from the seed, says Indiana University's Keithanne Mockaitis, an author on the paper. What remains is haploid tissue, whose cells carry just one set of chromosomes.

Using next-generation sequencing technology, researchers obtained billions of short sequences of bases. The challenge then was to sift through the data, identify the overlapping sequences, and assemble them, a computational puzzle called "genome assembly."

In the case of loblolly pine, the huge size of the genome made this process difficult.
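
The overlap-and-merge idea behind genome assembly can be illustrated with a toy example. The Python sketch below is a simple greedy approach, not the software the loblolly team used: it repeatedly finds the two fragments whose ends overlap the most and joins them. The function names, the `min_len` threshold, and the three short reads are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of reads (quadratic, fine for a toy).
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical reads, each sharing three bases with the next.
    print(greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"]))
```

Comparing every pair of fragments works for three reads but not for billions, which is one reason a genome of this size is so hard to put together.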

The "challenge isn't just collecting all the sequence data. The problem is assembling that sequence into order," said David Neale, a professor of plant sciences at the University of California, Davis, who led the loblolly pine genome project.

"You have this big pile of tiny pieces and now you have to reassemble the book," said Steven Salzberg, professor of medicine and biostatistics at Johns Hopkins University, one of the directors of the loblolly genome assembly team, who was also an author on the papers.

As a solution, researchers developed software that strips redundant sequence from the original data so that it all fits within the memory of a supercomputer.

Getting rid of the redundancies is important because it leaves the computer with 100 times less sequence data to deal with, researchers say.
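
To see why removing redundancy shrinks the problem, consider a deliberately simplified Python sketch. It is an assumption-laden stand-in, not the researchers' method: it discards duplicate reads and any read that appears in full inside a longer read that is kept. The function name and the sample reads are hypothetical.

```python
def drop_redundant(reads):
    """Keep only reads that are not exact copies of, or contained in, a kept read."""
    # Visit longer reads first so shorter ones can be checked against them.
    unique = sorted(set(reads), key=lambda r: (-len(r), r))
    kept = []
    for r in unique:
        if not any(r in k for k in kept):
            kept.append(r)
    return kept


if __name__ == "__main__":
    # Hypothetical reads: two are duplicates, two lie inside the longest read.
    reads = ["ATGGCGTGCAAT", "GGCGTG", "ATGGCG", "ATGGCG", "TTTT"]
    kept = drop_redundant(reads)
    before = sum(len(r) for r in reads)
    after = sum(len(r) for r in kept)
    print(kept, f"{before} bases reduced to {after}")
```

In this toy case the data shrinks by about half. The article's 100-fold figure comes from the real software working on real sequencing data.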

The loblolly will serve as a good "reference" genome because "the size of the pieces of consecutive sequence that we assembled are orders of magnitude larger than what's been previously published," said Dr. Neale.

The tree is the source of most American paper products. It is also an important feedstock for biofuel.

"In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied," say researchers.