We’re planning on submitting some dual-indexed, paired-read amplicon sequencing samples, and depending on how many we have, we may submit for sequencing on a Miseq or a Nextseq. Since this is all custom (and we generated the indices), I had to figure out exactly how the process plays out on the two instruments to see what we need to put in the sample sheet for demultiplexing. I figured I’d illustrate it for people in the lab so they understand the process as well.
The common steps
Everything starts with bridge amplification, where both the forward and reverse strands are physically bound to the flow cell by their 5′ ends. I’ve denoted where the DNA strands are physically bound to the flow cell with the grey circles on the ends.
Next is denaturing the strands so they are no longer bound to each other, and also cleaving the strand that is bound to the flow cell via the p5 sequence. The result is something that looks like this.
Now the single stranded molecule is ready for sequencing from the read 1 primer. The light blue box is showing the nucleotides we’ll be reading in this particular library.
Once that read is done, next is reading the first index.
We didn’t do a ton of dual indexing of libraries in my postdoc, and since Jason used to run all of the kits anyway, I didn’t really need to know these steps. Now that I’m setting everything up in my own lab, I’ve had to figure them out. I’ll go over what happens with the Nextseq first, since that’s conceptually a bit easier. Here’s what the Illumina docs say.
The Nextseq way
So this setup allows for dual indexing, with both indices read off of custom primers. For this specific library, this means that the complementary strand is synthesized first, making a double-stranded bridged molecule again.
And then after everything is denatured and the original template strand removed (presumably by cleaving at the p7 sequence this time), the second index can now be sequenced.
Followed by sequencing of the second read.
The Miseq way
Looking at what Illumina says in their documents for dual indexing on the Miseq, it looks a little different:
Note: I didn’t realize this until we did the exercise, but the p5 cluster generator we used to use in the Fowler lab is longer than the actual p5 sequence Illumina gives in their manuals. I’m not quite sure of the reason for this discrepancy, though I’m assuming it may mean that the bases read immediately after the p5 oligo during this step won’t be our rather variable index sequences, and may instead be some constant bases preceding the indexes. I’m guessing this is not a deal-killer, but it’s something we’ll still have to be cognizant of when determining the run programming.
After that comes the complete second strand synthesis, resulting in a double-stranded bridged molecule again.
This is followed by denaturing and cleavage (again, presumably at the p7 sequence) as described above, and then annealing of the read 2 primer.
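One rough way to check how many constant bases would come before the index is to compare the two sequences directly. The sketch below is only an illustration: the function name `leading_constant_bases` and both sequences are made-up placeholders (not the real p5 oligo or our cluster generator sequence), and it assumes the longer sequence simply extends the shorter one at its 3′ end.

```python
def leading_constant_bases(flowcell_oligo: str, library_adapter: str) -> str:
    """Return the part of library_adapter that extends past flowcell_oligo.

    ASSUMPTION: both sequences are written 5'->3' on the same strand, and the
    library adapter starts with the full flow cell oligo sequence.
    """
    flowcell_oligo = flowcell_oligo.upper()
    library_adapter = library_adapter.upper()
    if not library_adapter.startswith(flowcell_oligo):
        raise ValueError("library adapter does not begin with the flow cell oligo")
    # Whatever is left over would be read before the index itself.
    return library_adapter[len(flowcell_oligo):]


# Placeholder sequences for illustration only.
toy_flowcell_oligo = "ACGTACGTAC"
toy_library_adapter = "ACGTACGTACGGTTA"

extra = leading_constant_bases(toy_flowcell_oligo, toy_library_adapter)
print(f"{len(extra)} constant base(s) before the index: {extra}")
```

If the result is non-empty, its length would be the number of cycles the index 2 read spends on constant sequence before reaching the index, assuming the read really does start right after the flow cell oligo.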
So why does this matter?
Well, I think there are a few practical implications. Firstly, Miseq dual indexing won’t need that second custom index read primer, since it will be reading off of the p5 sequence. This is further complicated by the fact that the p5 adapter sequence we added onto the amplicons may be a bit longer than what’s actually on the chip, so we may have to factor this into the run parameters. Secondly, the strand used for reading that second index is different. With the Miseq, it’s being read off that first strand and thus in the same orientation as index 1. On the Nextseq, it’s being read on the second strand, so it will be read in the opposite orientation to index 1. Thus, I think this matters in terms of whether we’re putting the forward sequence or the reverse complement in the sample sheet, which will differ depending on whether we’re going Miseq or Nextseq with these samples.
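To make the sample sheet consequence concrete, here is a minimal Python sketch of picking the orientation of the second index per instrument. The helper names and the `INDEX2_NEEDS_REVCOMP` table are hypothetical, and which instrument gets which orientation is an assumption that simply follows the reasoning above. It should be checked against Illumina’s documentation (and ideally a test demultiplex) before anyone relies on it.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# ASSUMPTION: follows the reasoning in this post, not a verified spec.
# Miseq reads index 2 in the same orientation as index 1; Nextseq reads it
# off the other strand, so the sample sheet entry gets reverse complemented.
INDEX2_NEEDS_REVCOMP = {"miseq": False, "nextseq": True}


def index2_for_sample_sheet(index2: str, instrument: str) -> str:
    """Return index 2 in the orientation assumed for the given instrument."""
    key = instrument.lower()
    if key not in INDEX2_NEEDS_REVCOMP:
        raise ValueError(f"unknown instrument: {instrument}")
    return reverse_complement(index2) if INDEX2_NEEDS_REVCOMP[key] else index2


# Toy index, not one of our real indices.
toy_index2 = "AACCGGTA"
for instrument in ("Miseq", "Nextseq"):
    print(instrument, index2_for_sample_sheet(toy_index2, instrument))
```

Keeping the orientation rule in one small table means that if the assumption turns out to be backwards for our custom primers, only that table needs to change.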