Anna has been testing her transposon sequencing pipeline and needed some help processing some of her Illumina reads. In short, she needed to remove the sequenced invariant transposon region (essentially a 5′ adapter sequence), trim the remaining (hopefully genomic) sequence to a reasonable 40 nt, and then tabulate the reads, since there were likely to be duplicates that don’t need to be considered independently. Here is what I did.
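To make the first two steps concrete, here is a toy sketch in plain shell that applies them to a single made-up read. It assumes an exact match to the transposon sequence and ignores the FASTQ quality line. The 3-nt prefix and the 50-nt "genomic" tail are invented for illustration. It is not how the real data were processed: cutadapt also tolerates sequencing errors and trims the quality string along with the sequence.

```shell
# Toy illustration only, not a replacement for cutadapt: assumes an exact
# match to the transposon sequence (cutadapt also tolerates some errors).
adapter=AGAATGCATGCGTCAATTTTACGCAGACTATCTTTGTAGGGTTAA

# Made-up read: 3 nt of junk, the transposon sequence, then 50 nt of
# hypothetical "genomic" sequence (ACGT x 12, then AC)
raw_read="GGC${adapter}ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC"

# Step 1: remove everything up to and including the first copy of the adapter
genomic=${raw_read#*"$adapter"}

# Step 2: keep at most the first 40 nt of what remains
trimmed=$(printf '%s\n' "$genomic" | cut -c 1-40)

echo "$trimmed"   # prints ACGT repeated 10 times (40 nt)
```

The third step (tabulating duplicates) is then just counting how many times each trimmed sequence occurs.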
```bash
# For removing the adapter and trimming the reads down, I used a program called cutadapt. Here's information on it, as well as how I installed and used it.
# https://cutadapt.readthedocs.io/en/stable/installation.html
# https://bioconda.github.io/

## Run the commands below in Bash (they tell conda where else to look for the program)
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict

## Since my laptop uses an M1 processor, install the Intel (osx-64) build, which runs under Rosetta 2
$ CONDA_SUBDIR=osx-64 conda create -n cutadaptenv cutadapt

## Activate the conda environment
$ conda activate cutadaptenv

## Now trying this for the actual transposon sequencing files:
## -g gives the 5′ adapter (the invariant transposon sequence); it and anything before it are removed
## -l 40 then shortens each remaining read to at most 40 nt
$ cutadapt -g AGAATGCATGCGTCAATTTTACGCAGACTATCTTTGTAGGGTTAA -l 40 -o sample1_trimmed.fastq sample1.assembled.fastq
```
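Before moving on, it can be worth checking that the trimming did what was intended. cutadapt prints a summary report when it finishes, including how many reads contained the adapter. As an extra check, here is a minimal sketch of a small shell function that counts how many reads have each length. It is hypothetical (my own naming, not part of cutadapt) and assumes plain 4-line FASTQ records with no wrapped sequence lines. After `-l 40`, no read in sample1_trimmed.fastq should be longer than 40 nt. Note that cutadapt keeps reads in which it did not find the adapter by default (`--discard-untrimmed` drops them), so those reads also end up in the output, shortened to 40 nt.

```shell
# Hypothetical helper, not part of cutadapt: tabulate read lengths in a FASTQ
# file. Assumes the standard 4-line FASTQ layout (sequence on line 2 of every
# record). Output: one "length<TAB>number_of_reads" line per length, sorted by length.
fastq_length_counts() {
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) printf "%d\t%d\n", len, count[len] }' "$1" |
        sort -n -k1,1
}
```

Usage: `fastq_length_counts sample1_trimmed.fastq`. Running it on sample1.assembled.fastq as well gives a quick before/after comparison.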
This should have created a file called “sample1_trimmed.fastq”. OK, next is tabulating the reads in that file. I used a program called 2FAST2Q (the pip package is called fast2q) for this.
```bash
## I like to do this in the same cutadaptenv environment, so in case it was deactivated, here I am activating it again.
$ conda activate cutadaptenv

## Installing with pip, which is easy.
$ pip install fast2q

## Now running it on the actual file. I think you have to already be in the directory with the file you want, since you don't specify the file in the command.
$ python -m fast2q -c --mo EC --m 2

## Note: "python -m fast2q -c" runs it on the command line rather than in the graphical interface. "--mo EC" runs it in Extract and Count mode. "--m 2" allows up to 2 mismatched nucleotides.
```
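As an independent cross-check on the tabulation, the same kind of count can be sketched with standard Unix tools. This is a hypothetical shell function (my own naming, not part of 2FAST2Q) that assumes plain 4-line FASTQ records and counts only exact duplicates. Because it ignores the `--m 2` mismatch allowance, its counts won't necessarily match the 2FAST2Q output exactly, but they make a useful sanity check.

```shell
# Hypothetical helper, not part of 2FAST2Q: count identical sequences in a
# FASTQ file by exact match only (no mismatch tolerance, unlike "--m 2").
# Assumes the standard 4-line FASTQ layout. Output: "count<TAB>sequence",
# most abundant first (ties broken alphabetically by sequence).
count_unique_reads() {
    awk 'NR % 4 == 2' "$1" |
        sort |
        uniq -c |
        awk '{ printf "%d\t%s\n", $1, $2 }' |
        sort -k1,1nr -k2,2
}
```

Usage: `count_unique_reads sample1_trimmed.fastq > sample1_counts.tsv`, where sample1_counts.tsv is just an example output name.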