Haplotagging is a fast and easy technique for, quite literally, haplotype tagging (and yes, sequencing, too!).
We developed this technique because we wanted to: 1. sequence a large number of samples; 2. achieve high molecular resolution; and 3. achieve 1. and 2. within a reasonable budget.
We realized that existing sequencing techniques all left us wanting. Short-read Illumina sequencing is cheap and highly accurate, but it breaks up the genome into tiny pieces and so loses haplotype information. Long-read sequencing tends to be expensive, laborious, and error-prone.
Then there is linked-read sequencing, which combines the advantage of ease and accuracy of short-read sequencing with the haplotype information of long-read sequencing.
But existing commercial options are very expensive and very low throughput.
We needed a new way forward.
A couple of papers on contiguity-preserving tagmentation sequencing (CPTseq) from Jay Shendure (UW Seattle) and Frank Steemers (Illumina) provided the answer (Amini et al., Nat Biotech, 2014; Zhang et al., Nat Biotech, 2017). They showed a way of molecular barcoding completely different from the leading commercial method at the time, by 10X Genomics. Unlike 10X Genomics's Chromium platform, which isolates and manipulates individual DNA fragments ("molecules") in microfluidic droplets, CPTseq takes advantage of Tn5, a transposase protein that is capable of transferring molecular adapters to DNA sequences in a single enzymatic reaction.
This is a BIG deal, because Tn5 has already been in use under the name "Nextera" to make generating Illumina sequencing libraries a simple, high-throughput process.
In Zhang et al., 2017, they have also shown, intriguingly, that in a test tube, DNA wraps itself around microbeads in solution.
Separately, in our lab we have also picked up and optimized a simple way to attach Tn5 transposase complexes ("transposomes") onto microbeads.
So what if we put all of these components together? If we can individually barcode each microbead that carries Tn5 transposomes, wouldn't that give us a way to molecularly barcode each long DNA molecule in solution? In other words, putting Tn5 transposomes onto individually barcoded beads can give us a simple way of doing linked-read sequencing.
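To make the idea concrete: once every read carries the barcode of the bead its DNA molecule wrapped around, reads can be grouped back into their original long molecules. Here is a minimal Python sketch of that grouping step. It is an illustration only, not our actual analysis code. The read format (barcode, chromosome, position), the bead names, the function name, and the 50 kb gap cutoff are all assumptions made for this example.

```python
from collections import defaultdict


def group_reads_into_molecules(reads, max_gap=50_000):
    """Group mapped reads into inferred long molecules.

    reads:   iterable of (barcode, chrom, pos) tuples (hypothetical format).
    max_gap: reads sharing a barcode but lying further apart than this are
             treated as coming from different molecules (assumed cutoff).
    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    # Collect read positions per (bead barcode, chromosome).
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Toy data: two beads, one of which happened to capture two molecules.
reads = [
    ("bead_1", "chr1", 10_000),
    ("bead_1", "chr1", 32_000),
    ("bead_1", "chr1", 61_000),
    ("bead_1", "chr1", 900_000),
    ("bead_2", "chr1", 15_000),
    ("bead_2", "chr1", 40_000),
]
molecules = group_reads_into_molecules(reads)
for molecule in molecules:
    print(molecule)
```

In this toy data, the three nearby reads from bead_1 collapse into one molecule spanning 10,000 to 61,000, while the distant read at 900,000 is split off as a separate molecule. Variants covered by reads from the same inferred molecule must sit on the same haplotype, which is what makes phasing possible.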
We have now shown, in Meier et al., PNAS, 2021, that it works! Not only can we make linked-read libraries and phase variants into megabase-long phased blocks in single individuals, just like 10X's Chromium platform, but because it is now a 10-minute enzymatic reaction with PCR, we have also made thousands of haplotagged libraries from humans, mice, butterflies and more organisms.
This allows us to generate genomic data at a scale and quality unlike anything before.
Using haplotagging, we can map genes, detect signatures of selection, track the breakdown of haplotypes, and also detect large structural rearrangements like inversions. All of this comes at minimal cost (in fact, in many ways it is cheaper than making standard TruSeq libraries), but with all the analytical benefits.
Check the paper out!
Want to know how this works? You can watch a webinar I made with Illumina by following this link.
For the associated analytical code, check out our GitHub repositories here and here.
We are actively compiling experimental protocols. Here are a few:
This work is generously supported by the European Research Council Project HybridMiX and the Max Planck Society.