A fish with a genome 30 times larger than ours has been sequenced.

African lungfish showing off its thin, light fins.

When it was first discovered, the coelacanth caused a lot of excitement. It was a living example of a group of fish that was thought to exist only as fossils. And not just any group of fish. Coelacanths and their relatives, with their long, stalk-like fins, are thought to include the ancestors of all vertebrates that aren’t fish—the tetrapods, or vertebrates with four limbs. That means, among many other things, us.

Since then, however, evidence has accumulated that we are more closely related to lungfish, freshwater fish found in Africa, Australia and South America. Lungfish are a bit weird, though. Both the African and South American species have seen their ancestors' limb-like fins reduced to thin, flexible filaments. Getting some perspective on their evolutionary history has proven difficult because they have the largest known animal genomes: the South American lungfish genome contains more than 90 billion base pairs. That's 30 times as much DNA as ours.

But new sequencing technologies have made it possible to meet these kinds of challenges, and an international collaboration has now completed the largest genome ever sequenced, with all but one of its chromosomes carrying more DNA than the entire human genome. The work points to a history in which the South American lungfish has been adding an extra 3 billion bases of DNA every 10 million years for the past 200 million years, all without adding a significant number of new genes. Instead, it appears to have lost the ability to keep unwanted DNA in check.

Going long

This work was enabled by a technique commonly called “long-read sequencing.” Most genomes have been sequenced using short reads, typically in the range of 100–200 base pairs. The key was to do enough sequencing so that each base in the genome was sequenced multiple times on average. A cleverly designed computer program could then determine where two pieces of sequence overlapped and record that as a single, longer piece of sequence, repeating the process until the computer had produced long strings of contiguous bases.
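The overlap-and-merge idea described above can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions (error-free reads, a single strand, a greedy merge order); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are made up for this example, and real assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces with the largest overlap.

    Toy assumption: reads are error-free and all come from the same strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three short reads that tile a 12-base stretch of sequence.
print(greedy_assemble(["TTAACG", "ATGGCCT", "GCCTTAA"]))
```

With these three reads the pieces chain together into one contiguous sequence. With reads that overlap in more than one way, a greedy merge like this can pick the wrong partner.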

The problem here is that most non-microbial species contain stretches of repetitive sequence (think hundreds of copies of the bases G and A in a row) that are more than a few hundred bases long, as well as nearly identical sequences that appear at multiple locations in the genome. It is impossible to assign these sequences to a unique location, so the output of a genome assembly program will contain many gaps of unknown length and sequence.

This creates a huge challenge for genomes like that of the lungfish, which is full of non-functional "junk" DNA, all of which is typically repetitive. An assembly program working from short reads would tend to produce a genome with more gaps than sequence.

Long-read technology gets around this problem by doing exactly what its name suggests. Instead of being limited to fragments of 200 or so bases, it can generate sequences thousands of base pairs long, easily covering entire repeats that would otherwise create a gap. One early version of long-read technology involved stuffing long DNA molecules through pores and watching for changes in the voltage across the pore as different bases passed through. Another had a DNA replication enzyme make a copy of a long strand and watched for changes in fluorescence as different bases were added. These early versions tended to be a bit error-prone but have since improved, and there are several newer competing technologies on the market now.
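To see why read length matters, here is a small, hedged Python illustration. The toy "genome" below is invented for this example (a 20-base GA repeat placed twice between made-up unique flanks), and `placements` is a hypothetical helper, not part of any sequencing toolkit. A short read taken from inside the repeat matches many positions, while a longer read that reaches the unique sequence on both sides matches only one.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Invented toy genome: unique flank, repeat, unique spacer, same repeat, unique flank.
repeat = "GA" * 10
genome = "ACCTGTCATC" + repeat + "TTGCACGTCA" + repeat + "CGTTAGCCAT"

short_read = genome[12:20]  # 8 bases lying entirely inside the first repeat
long_read = genome[5:35]    # 30 bases: the whole first repeat plus unique flanks

print("short read fits at", len(placements(genome, short_read)), "positions")
print("long read fits at", len(placements(genome, long_read)), "position(s)")
```

The short read cannot be placed unambiguously, which is what leaves a gap in a short-read assembly. The long read carries enough unique flanking sequence to pin the repeat to one spot.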

In 2021, researchers used this technology to complete the genome of the Australian lungfish. Now, scientists have produced genomes from the African and South American species, each of which appears to have gone its own way during the breakup of the supercontinent Gondwana, a process that began nearly 200 million years ago. Having the genomes of all three species should give us some insight into the traits that all lungfish have in common, and that they are thus likely to have shared with the distant ancestors that gave rise to tetrapods.
