Plasmidsaurus fasta standardizer

I really like plasmidsaurus, and it’s an integral part of our molecular cloning pipeline. That said, I’ve found analyzing the resulting consensus fasta file to be somewhat cumbersome, since where they inevitable start their sequence string in the fasta file is rather arbitrary (which, I don’t blame them for at all, since these are circular plasmids with no particular starting nucleotide, and every plasmid they’re getting is unique), and obviously doesn’t match where my sequence starts on my plasmid map in Benchling.

For the longest time (the past year?) I dealt with each file / analysis individually, where I would either 1) reindex my plasmid map on Benchling to match up with how the Plasmidsaurus fasta file is aligning, or 2) Manually copy-pasting sequence in the Plasmidsaurus fasta file, after seeing hwo things match up after aligning.

Anyway, I got tired of doing this, so I wrote a Python script that standardizes things. This will still require some up-front work in 1) Running the script on each plasmidsaurus file, and 2) Making sure all of our plasmid maps in Benchling start at the “right” location, but I still think it will be easier than what I’ve been doing.

1) Reordering the plasmid map.

I wrote the script so that it reordering the Plasmidsaurus fasta file based on the junction between the stop codon of the AmpR gene, and the sequence directly after it. Thus, you’ll have to reindex your Benchling plasmid map so it exhibits that same break at that junction point. Thus, if your plasmid has AmpR in the forward direction, it should look like so on the 5′ end of your sequence:

And like this on the 3′ end of your sequence:

While if AmpR is in the reverse direction, it should look like this on the 5′ end of your sequence:

And like this on the 3′ end of your map:

Easy ‘nuf.

2) Running the Python script on Plasmidsauru fasta file.

The python script can be found at this GitHub link:

If you’re in my lab (and have access to the lab Google Drive), you don’t have to go to the GitHub repo. Instead, it will already be in the “[…additional_text_here…]/_MatreyekLab/Data/Plasmidsaurus” directory.

Open up Terminal, and go to that directory. Then type in “python3”. Before hitting return to run, you’ll have to tell it which file to perform this on. Because of the highly nested structure of how the actual data is stored, it will probably be easier just to navigate to the relevant folder in Finder, and then drag the intended file into the Terminal window. The absolute path of where the file sits in your directory will be copied, so the command will now look something like “python3 /[…additional_text_here…]/_MatreyekLab/Data/Plasmidsaurus/PSRS033/Matreyek_f6f_results/Matreyek_f6f_5_G1131C.fasta”

It will make a new fasta file suffixed with “_reordered” (such as “Matreyek_f6f_1_G1118A_reordered.fasta”), which you can now easily use for alignment in Benchling.

Note: Currently, the script only works for ampicillin resistant plasmids, since that’s somewhere between 95 to 99% of all of the plasmids that we use in the lab. That said, plasmidsaurus sequencing of the rare KanR plasmid won’t work with this method. Perhaps one day I’ll update the script for also working with KanR plasmids (ie. the first time I need to run plasmidsaurus data analysis on a KanR plasmid, haha).