Using PEAR to deal with paired reads

Got some paired Illumina sequencing back today, and wanted to pair the R1 and R2 fastq files. I vaguely remember using PEAR to do this before, so gave it a shot. Since the files are relatively small, I figured I’d just try to run this locally, which meant trying to install it (and it’s various dependencies) on my Macbook. Here are the steps…

  1. Download “autoconf-2.65.tar.gz” from this website. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

2. Download “automake-1.14.1.tar.gz” from this website. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

3. Get PEAR academic from this website by entering in your info, receiving an email, and downloading “pear-src-0.9.11.tar.gz”. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

Huzzah. It should now be installed.

Now you can go to whatever directory has your R1 and R2 files, and type in something like so:
“pear -f KAM-IDT-HM_R1_001.fastq -r KAM-IDT-HM_R2_001.fastq -o IDT_HM”

You should now have a file called “IDT_HM.assembled.fastq” that has all of the joined paired reads.

5/18/21 Update: I was working on something else that required trimming reads. I decided to use fastp to do this. The instructions to install fastp is as follows (essentially following the instructions here):

  1. Download the zip file (under the “Code” button) from, and unzip it.
  2. Use terminal to go to the directory that is made, and run
    $ make
  3. Run the following command:
    $ sudo make install
    … and then type in your computer password when prompted. Fastp should now be installed.

Landing pad plasmid maps

I still intend to get around to posting most if not all landing pad plasmids to Addgene, but it’s taken me forever to get around to it. Once those are there, the plasmid sequences will obviously be publicly available. In the mean time, I figure I’d post some of the most common plasmid maps here, so other people can benefit form them (or I don’t have to send specific emails to each person who asks for them). So here are some of the most popular, published plasmids, in GenBank (.gb) format.
















Chemical transformation efficiencies

We do a lot of molecular cloning in the lab. Standard practice in the workflow is to make your own home-made chemically competent NEB 10-beta bacteria to be used fresh the day of the transformation. This has worked surprisingly well, and we have made ~ 350 different plasmid constructs in the first ~ 600 days). Each time you do the transformation, it’s important to include a positive control (we use 40 ng of attB-mCherry plasmid) to make sure the transformation step was performing properly (to help interpret / troubleshoot what may have gone wrong if you had few / zero colonies in your actual molecular cloning transformation plates). I’ve done this enough times now to know what is “normal”. Thus, especially for new members in the lab, please reference this plot to see how your own transformation results compare to how it has worked for me, where I typically get slightly more than 10,000 transformants (Note: you may get numbers better than me, which is a good thing!).

Edit 2/24/2022: It recently dawned on me that we haven’t necessarily been using a standard number of bacteria per transformation (ie. we typically prep the same amount of bacteria, and that bacteria normally just gets split into however many transformations are being done at the time. Thus, if there are 10 plasmids being transformed, the number of bacteria per transformation is ~ one fifth of the number of bacteria for when two plasmids are being transformed.

Well, since I had a bunch of data recorded for our positive control transformations (40ng of attB-mCherry plasmid), I decided to see what number of bacteria we had used and if there were any patterns in the transformation efficiencies:

So it looks like, depending on the instance, we used anywhere between 300 million to 10 billion bacteria per transformation. Notably, there was no real pattern in the transformation efficiency, were on average, it worked just as well with the half a billion as it was for 5 billion. Thus, it’s likely that the limiting factor at this point is the DNA and the bacteria are in vast excess.

For those curious, I had previously mentioned that 1mL of our e.coli at an OD of 1 corresponded to 5e8 bacterial cells / colony forming units. Thus, for a 180 mL culture at an OD of 0.2, I would do [5e8 cells/mL at OD1] * [0.2 of OD1] * [180 mL] * [1 billion cells / 1e9 cells] = ~ 18 billion cells. So if we were doing 9 transformations with this culture, it would be about 2 billion cells for each transformation.

HEK 293T Landing Pad Dose Response Curves

Positive selection for recombined selection is a pretty useful technique when working with the HEK 293T landing pads, but people might not know what concentrations are best. Well, here is that info, all in one place. In each case, the antibiotic resistance gene had been placed after an IRES element in an attB recombination plasmid, and was linked to mCherry using a 2A stop-start sequence. Thus, the Y axis is showing how well the mCherry+ cells were enriched at various concentrations of antibiotic.

Consistent with the above plots, I generally suggest people use 1 ug/mL Puro, 10 ug/mL Blast, 100 ug/mL Hygro, and 100 ug/mL Zeocin for selections.

Nucleotide biases in primers

We generally talk about degenerate nucleotide positions in synthesized oligo (like primers) as if they are this automatic and even thing, but like everything else, things can be a lot more complicated. We recently ordered some primers with six degenerate positions (so NNNNNN) from ThermoFisher, and they were predominantly filled with T’s, while the same oligo ordered from IDT was a lot more balanced in terms of nucleotide composition. While this was done at really small (Sanger) scale and not quantitative, this reminded me of nucleotide biases I observed in libraries created in the Fowler lab using Inverse PCR (where the degenerate sequence is not hybridizing with a partner strand during initial hybridization, so no degenerate sequence should have a competitive edge over another, at least in terms of hybridization). These were all made with IDT primers. I dug up that slide, and have posted it below:

So in this analysis, you can see that there is a ubiquitous albeit somewhat variable G bias across the libraries. This often resulted in the presence of more glycines than expected. Overall, I don’t think this is ALL that bad; certainly still WAY better than the rampant (like, 86% T) bias with the ThermoFisher primer (When I emailed them about this, they told me “Unfortunately, we do know exactly what the issue is. This is an inherent instrument issue with how automated degeneracies are produced on the existing synthesizer. The only way to fix it from a customer perspective is to order using special instructions to “hand mix” the degeneracy in an optimal ratio to produce as close to a 25% balance as possible.”). So while I’ll still buy regular cloning primers from Thermo (b/c they are the cheapest for me), all degenerate primers will have to go through IDT (or maybe eventually EuroFins, as the Roth lab has noted good luck with them).

5/15/21 update: OK, got some Illumina sequencing back to actually quantitate this, and the ThermoFisher degenerate primer definitely suffered from too many T’s, with ~ 49% of all nucleotides being T’s. Actually, the IDT hand mixed did worse than IDT machine mixed, which had the closest to the target values. Interesting.

Mutational Scanning Symposium 2021

The 2021 symposium was held remotely, and videos from roughly half of the talks are posted onto YouTube, at the CMAP_CEGS channel. This includes Kenny’s talk from the workshop portion this year. In contrast to his talk from last year (which really focused on the basic with few specifics), the talk from this year was much more about a specific example fitting into some of those more general considerations (with unpublished data to boot!).

Quantitating Western blots

Almost everybody hates doing Western blots since they’re so labor intensive (and somewhat finicky), but they are undeniably useful and will forever have a place in molecular biology / biomedical research. We’re currently putting together a manuscript where we express and test variants of ACE2, which requires some Western blotting quantitation. Since I’m about to do some of the quantitation now, I figured I’d just record the steps I do it so trainees in the lab have a basic set of instructions they can follow in the future.

This already assumes you have a “.tif” file of western blot. While I someday hope to do fluorescent westerns, this will likely be a chemiluminescent image. Hopefully this isn’t a scan of a chemiluminescent image captured on film, b/c film sux. So you presumably took it on a Chemidoc or an equivalent piece of equipment. Who knows; maybe I finally found the time to finish / standardize my chemiluminescent capture using a standard mirrorless camera procedure (unlikely). I digress; all that matters right now is you already have such an image file ready to quantitate.

My favorite method for Western blot quantitation is to use “Image Lab” from BioRad. You know, I like BioRad. And I definitely like the fact that they provide this software for free. Anyway, download and install it as we’ll be using it.

Once you have it installed, start it up and open your image file of interest. The screen will probably look something like this:

First off, stop lying to yourself. By default, the imagine is going to be auto-scaled so that the darkest grey values in your image get turned to black, while the lightest grey values get turned white. But in actuality, the image you see is not going to be the “raw” range of your image, so you may as well turn off the auto-scaling so you see your image for what it really is. Thus, press that “Image Transform” button…

… and get a screen that looks like below:

See where those low and high values are? Awful. Set the low value to 0 and the high value to the max (65535).

And now the image looks like this, which is great, since this is what the actual greyscale values in your image actually look like.

OK, now we can actually start quantitating the bands in the plot. First off, don’t expect your western blot to look 100% clean. Lysates are messy, and you can’t always get pure, discrete bands. Sure, some of the lower-sized bands may be degradation products that happened when you accidentally left the lysates out of ice (you should avoid that, of course). Then again, you may have done everything perfectly bench-wise, and the lower-sized bands may be because you’re overexpressing a transgene, and proteins naturally get degraded, and overexpressed proteins may tax the normal machinery and get degraded more obviously. I say the best thing is to acknowledge that happens, show all your results / work, and make the more educated interpretations that you can. Regardless, in that above plot, we’re going to try to quantitate the density of the highest molecular weight band, since that should be the full-length protein. To do that, first select the “Lane and bands” button on the left.

I then press the “Manual” lane annotation button.

In this case, I know I have 11 lanes, so I enter that and get something that looks like this:

Clearly that’s wrong, so grab the handles on the edges and move the grid such that it actually falls more-or-less on the actual lanes.

Sure the overall grid is pretty good, but maybe it’s not perfect for every single individual lane. The program also lets you adjust those. To do that, click off the “Resize frame” button on the left so it’s no longer blue…

And then adjust the individual lanes so they fit the entire band as your human eyes see them, resulting in an adjust grid that looks like this:

Nice. Now go to the left and select the tab at the top that says “Bands”, and then click on the button that says “Add”.

Once you do that, start clicking on the bands you want to quantitate across all of the lanes. You may have to grab the dotted magenta lines in each lane to adjust them so that the actual band is within them (and presumably somewhere near the solid magenta line which should be somewhere in between them). This is what it looks like after I do that:

It’s good to check how well the bands are being seen by the program. Go to the top and press the “Lane profile” button. It should give you a density plot. This is also the window where you can do background subtraction. Find a number that seems sensible (in this case, a disk size of 20 mm seems reasonable), and make sure you hit the “apply to all lanes” button so it propagates this across lanes. While I’m only showing the picture for lane 5, it’s probably worth scanning across the lanes to make sure the settings are sensible.

Now with those settings fine, close out, and then click on “Analysis table” at the top. Once that is open, go to the bottom and click on “lane statistics”. These should be the numbers you’re looking for.

Now export the statistics (either pressing the “copy analysis table to clipboard” button and pasting in an spreadsheet you want to use, or the “export analysis table to spreadsheet”). The number you’ll be looking to analyze will be those in the “Adj. Total Band Vol” column.

Note: Now that I’m doing this, the “standard curve” button is now ringing a bell. I’m fairly certain that in my PhD work, when I ran a ton of Western blots or just straight up protein gels stained with coomassie, that I would run dilutions of lysates / proteins to make a standard curve of known proteins amounts that I could calibrate the densitometry against. We obviously didn’t do that here, since we didn’t have the space, so these numbers aren’t going to be quite as accurate if we had done so. Still, getting some actual numbers we can compare across replicates is still a major step up than not quantitating and having everything be even more subjective.

Terrific for lentivector growth?

At some point, I was chatting with Melissa Chiasson about plasmid DNA yields, and she mentioned that her current boss had suggested using terrific broth instead of Luria broth for growing transformed bacteria. I think both of us were skeptical at first, but she later shared data with me showing that DNA from e.coli grown in TB had actually given her better yield. I thus decided it was worth trying myself to see if I could reproduce it in my lab.

There are two general types of plasmids we tend to propagate a lot in my lab. attB recombination vectors, for expressing transgenes within the landing pad, and also lentiviral vectors of the “pLenti” variety, which play a number of different roles including new landing pad cell line generation and pseudovirus reporter assays.

I first did side-by-side preps of the same attB plasmids grown in TB or LB, and TB-grown cultures yielded attB plasmid DNA concentrations that were slightly, albeit consistently worse. But I eventually I tested some lentiviral vector plasmids and finally saw the increase in yield from TB that I had been hoping for. Relaying this to Melissa, she noted she had been doing transformations with (presumably unrelated sets of) lentiviral vectors herself, so these observations had been consistent after all.

Thus, if you get any attB or pLenti plasmids from me, you should probably grow them in LB (attB plasmids) and TB (pLenti plasmids), respectively, to maximize the amount of DNA yields you get back for your efforts.

COVID testing at CWRU

As a PI, I feel it’s important to know how safe my employees are if coming to the campus to work during the pandemic. While CWRU was rather slow to respond in getting on-campus testing set up, they did set up a surveillance testing program and a public website to post the results, which has largely been reassuring. I’ve been keeping track of the results every week for the last few months and will continue to do so for the foreseeable future. This is what things currently look like:

As of writing this (the first week of February), the absolute numbers of infected students / faculty / staff in a given week are firmly in the double digits, but thankfully the test percent positivity has been at or under 1%, unlike November & December. Now that the students are back for the new semester, we will see how the pattern may change, but at least the pandemic has felt largely under control here, at least in the broader context of the conflagration of viral spread we’ve been seeing in this country over the past year.