Nanopore de novo assembly

Plasmidsaurus is great, but it looks like some additional companies aside from Primordium are trying to develop a “nanopore for plasmid sequencing” service. We just tried the Plasmid-EZ service from Genewiz / Azenta(?), partially because there’s daily pickup from a dropbox on our campus. At first glance, the results were rather mixed. Read numbers seemed decent, but 4 of the 8 plasmid submissions didn’t yield a consensus sequence. Instead, all we were given for those were the “raw” fastq files. To even see whether these reads were useful or not, I had to figure out how to derive my own consensus sequence from the raw fastq files.

After some googling and a failed attempt or two, I ended up using a program called “flye”. Here’s what I did to install and use it, following the instructions here.

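## Make a conda environment for this purpose, then activate it
$ conda create -n flye
$ conda activate flye

## Pretty easy to install with the below command (flye comes from the bioconda channel, so name it explicitly if it isn't already in your conda config)
$ conda install -c conda-forge -c bioconda flye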

## It didn't like my file being in a path that had spaces (darn Google Drive default naming), so I ended up dragging my files into a folder on my desktop. With the two commands above done, you should already be able to run it on your file, like so:
$ flye --nano-raw J202B.fastq --out-dir assembled
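
If you have several fastq files sitting in a folder with spaces in its path, dragging them around by hand gets old. Below is a minimal sketch of how the same steps could be scripted: copy the reads into a space-free working folder, then run the same flye command once per file. The function names (`stage_reads`, `assemble_all`) and the `_assembled` folder suffix are my own made-up names, not anything from flye, and this assumes the working folder you pick has no spaces in its path.

```shell
# Hypothetical helper: copy every FASTQ from a source folder (whose path may
# contain spaces) into a working folder, replacing spaces in file names with "_".
stage_reads() (
    src_dir="$1"
    work_dir="$2"
    mkdir -p "$work_dir"
    for f in "$src_dir"/*.fastq "$src_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue                      # skip glob patterns that matched nothing
        base="$(basename "$f" | tr ' ' '_')"         # space-free file name
        cp "$f" "$work_dir/$base"
    done
)

# Hypothetical helper: run flye on each staged file, one output folder per sample.
assemble_all() (
    work_dir="$1"
    for f in "$work_dir"/*.fastq "$work_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue
        sample="$(basename "$f")"
        sample="${sample%%.fastq*}"                  # J202B.fastq.gz -> J202B
        flye --nano-raw "$f" --out-dir "$work_dir/${sample}_assembled"
    done
)
```

With the flye environment active, usage would look something like `stage_reads "$HOME/Google Drive/plasmid runs" "$HOME/Desktop/flye_work"` followed by `assemble_all "$HOME/Desktop/flye_work"` (both paths are placeholders for wherever your files actually live).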

This worked for all four of the fastq files that were returned without consensus sequences. Two worked perfectly and gave sequences of the expected size. The remaining two returned consensus sequences that were twice as large as the expected plasmid. Looking at the read lengths, these plasmids did show a few reads that were twice as long as expected. That said, those reads being in there weren't what made the program return the doubly-long consensus sequence, as it still made that consensus seq even after the long reads were filtered out (I did this with fastq-filter; “pip install fastq-filter”, “fastq-filter -l 500 -L 9000 -o J203G_filtered.fastq J203G.fastq.gz”). So yeah, I still haven’t figured out why this happened or whether it’s real, but even a potentially incorrectly assembled consensus sequence was helpful, as I could import it into Benchling, align it with my expected sequence, and see if there were any errors.
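
For anyone wanting to do the same sanity checks without extra tools, here is a rough sketch using plain awk. It tallies read lengths from a fastq file (to spot those doubly-long reads) and compares each assembled contig against the size you expected. The function names are my own inventions, the fastq is assumed to be the standard four-lines-per-read layout, and the contig check only compares lengths, so it says nothing about whether the sequence itself is right.

```shell
# Print one read length per line from a plain or gzipped FASTQ.
# Assumes the standard 4-line record layout (no wrapped sequence lines).
read_lengths() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR % 4 == 2 { print length($0) }'
}

# Count reads per length bin; bin width defaults to 1000 bp.
# Output: "<bin start> <read count>", sorted by bin start.
length_histogram() {
    read_lengths "$1" | awk -v w="${2:-1000}" '
        { bins[int($1 / w) * w]++ }
        END { for (b in bins) print b, bins[b] }' | sort -n
}

# For each contig in a FASTA, print "<name> <length> <length / expected size>".
# A ratio near 2.00 flags the doubled consensus described above.
contig_size_ratio() {
    awk -v expected="$2" '
        /^>/ { if (name != "") printf "%s %d %.2f\n", name, len, len / expected
               name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END  { if (name != "") printf "%s %d %.2f\n", name, len, len / expected }' "$1"
}
```

For example, `length_histogram J203G.fastq.gz` gives a quick text histogram of the reads, and `contig_size_ratio assembled/assembly.fasta 4500` reports how each contig in flye's `assembly.fasta` output compares to a plasmid you expected to be 4500 bp (that number is just a placeholder).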

After this experience, I’ve come to better appreciate how well Plasmidsaurus is run (and how good their pipeline for returning data is). We’ll probably try the Genewiz Plasmid-EZ service another couple of times, but so far, in terms of quality of service, it doesn’t seem as good.