Split mCherry

I had a project that could have benefited from using split mCherry to serve an AND function. I put the split mCherry in my usual mCherry spot in the recombination vector (2A'd with puromycin, as usual), but I couldn't see any visible fluorescence when both the small and big fragments were in the same cell. Some time later, I saw a paper using split super-folder mCherry fused with the SpyCatcher system, so I tried that, and it seemed to give us a visible shift in fluorescence off the background. I then found another paper using an improved split super-folder mCherry (sfmCherry3C) and tested that, which seemed to work slightly better.

So while not perfect, in that there isn't *complete* separation from the background distribution, it's shifted away enough that it should serve my purposes (for now). Still, I'll probably keep playing around with this (perhaps adding something like a leucine zipper?) to see if that helps increase the fluorescence.

Identity matrix of indices used in the lab

We’ll be doing a lot of multiplex amplicon-based Illumina sequencing, which means we’ll eventually have a lot of different indices (I think some people refer to these as barcodes) used to multiplex the samples. I’m doing everything as 10 nt indices, so theoretically there are 4^10, or slightly over one million, unique nucleotide combinations that could be made with an index of that length. I don’t intend to have anywhere close to 1 million different primers, so I think we’re pretty safe.

That said, I’d like to ensure our indices are a sufficient distance away from each other such that erroneous reads don’t result in switching of one index for another. Anh has come up with a way to make sure our randomly generated indices don’t overlap with previous indices, but it’s still useful for me to keep track and make sure things are running smoothly. Thus, I generated an identity matrix of all of the indices we have in the lab right now.

The diagonal is, of course, a perfect match, and serves as a good positive control for seeing what close matches look like. By eye, the closest matches between any two unique indices seem to be about 70% identity, which I can live with.
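For anyone wanting to make a similar matrix, here's a minimal sketch of how it can be computed, assuming the indices live in a simple Python dictionary (the names and sequences below are made-up placeholders, not our actual indices):

indices = {
    "idx01": "ACGTTGCAAT",
    "idx02": "GGATCCTTAC",
    "idx03": "TTGACGGCTA",
}

def percent_identity(a, b):
    # Positional percent identity between two equal-length index sequences
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

names = list(indices)
for n1 in names:
    row = [f"{percent_identity(indices[n1], indices[n2]):4.0f}" for n2 in names]
    print(n1, " ".join(row))

The diagonal comes out as 100 (the positive control), and any off-diagonal values creeping up toward it would flag a pair of indices worth replacing.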

Dual indexing on the MiSeq and NextSeq

We’re planning on submitting some dual indexed, paired read, amplicon sequencing samples, and depending on how many we have, we may submit them for sequencing on a MiSeq or NextSeq. Since this is all custom (and we generated the indices), I had to figure out exactly how the process plays out on the two instruments to see what we need to put in the sample sheet for demultiplexing. I figured I’d illustrate it for people in the lab so they understand the process as well.

The common steps

Everything starts with bridge amplification, where both the forward and reverse strands are physically bound to the flow cell by their 5′ ends. I denote where the DNA strands are physically bound to the flow cell with the grey circles on the ends.

Next is denaturing the strands so they are no longer annealed to each other, and also cleaving the strand that is bound to the flow cell via the p5 sequence. The result is something that looks like this.

Now the single stranded molecule is ready for sequencing from the read 1 primer. The light blue box shows the nucleotides we’ll be reading in this particular library.

Once that read is done, next is reading the first index.

The dissimilar steps

We didn’t do a ton of dual indexing of libraries during my postdoc (and since Jason used to run all of the kits anyway, I didn’t really need to know these steps), but I’ve had to figure them out now that I’m setting everything up in my own lab. I’ll go over what happens with the NextSeq first, since that’s conceptually a bit easier. Here’s what the Illumina docs say.

The NextSeq way

This setup allows both indices to be read off of two custom primers. For this specific library, this means that the complementary sequence is going to be synthesized, making a double-stranded bridged molecule again.

And then after everything is denatured and the original template strand removed (presumably by cleaving at the p7 sequence this time), the second index can now be sequenced.

Followed by sequencing of the second read.

The MiSeq way

Looking at what Illumina says in their documentation for dual indexing on the MiSeq, it looks a little different:

So the clear difference here is that the second index on the MiSeq system is read not from a second custom index sequencing primer, but instead *during* the second-strand synthesis step primed by the p5 oligo. With our specific library, it will look like this, starting with sequencing of that second index:

Note: I didn’t realize this until we did this exercise, but the p5 cluster generator we used to use in the Fowler lab is longer than the actual p5 sequence Illumina gives in their manuals. I’m not quite sure of the reason for this discrepancy, though I’m assuming it may mean that the sequences immediately after the p5 oligo during this step won’t be our rather variable index sequences, but instead some constant bases preceding the indexes. I’m guessing this is not a deal-killer, but it is something we’ll still have to be cognizant of when determining the run programming.

After that comes the complete second-strand synthesis, resulting in a double-stranded bridged molecule again.

That's followed by denaturing and cleavage (again, presumably of the p7 sequence) as described above, and then annealing of the read 2 primer.

So why does this matter?

Well, I think there are a few practical implications. Firstly, MiSeq dual indexing won’t need that second custom index read primer, since the index will be read off of the p5 sequence. This is further complicated by the fact that the p5 adapter sequence we added onto the amplicons may be a bit longer than what’s actually on the chip, so we may have to factor this into the run parameters. Secondly, the strand the second index is read from is different. With the MiSeq, it’s read off that first strand, and thus in the same orientation as index 1. On the NextSeq, it’s read off the second strand, so it will be read in the opposite orientation to index 1. Thus, this matters in terms of whether we put the forward or reverse complement sequence in the sample sheet, which will differ depending on whether these samples go on the MiSeq or NextSeq.
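To make that last point concrete, here's a small sketch of how I think about which index 2 (i5) sequence ends up in the sample sheet; the instrument logic just follows the reasoning above, and the example index is made up:

def revcomp(seq):
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def samplesheet_index2(index2, instrument):
    # MiSeq reads index 2 in the same orientation as index 1, so enter it as-is;
    # NextSeq reads it off the second strand, so enter the reverse complement.
    if instrument.lower() == "miseq":
        return index2
    elif instrument.lower() == "nextseq":
        return revcomp(index2)
    raise ValueError("unknown instrument")

print(samplesheet_index2("ACGTTGCAAT", "miseq"))    # ACGTTGCAAT
print(samplesheet_index2("ACGTTGCAAT", "nextseq"))  # ATTGCAACGT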

First paper from the lab published!

The first paper from our lab is now out in PLOS Pathogens! We created a panel of ACE2 variant cells and found that (pseudo)viruses with the SARS-CoV spike, or the WT or N501Y SARS-CoV-2 spikes, differentially use the ACE2 protein surface during entry. This was a team effort, with Nidhi and Sarah performing the pseudotyped virus assays and molecular cloning, and Anna and Vini performing the BSL3 SARS-CoV-2 work. Great job, everyone!

The lab is awarded an ESI R35 from NIGMS!

Our application, titled “Recombinant DNA Technologies for Multiplex Genetic Assays in Human Cells”, was funded by the National Institutes of Health, National Institute of General Medical Sciences. This five-year grant, with $250,000 in direct costs per year, will support our continued efforts pairing landing pad-based cell engineering with multiplex assays to unlock new aspects of protein and cell biology, as well as improve our understanding of human genetics. Goals include creating generalizable, multiplex methods for functional complementation, fluorescent transcriptional reporters, and large-scale cDNA screening. Thank you NIH NIGMS for supporting us with this wonderful funding mechanism!

Estimating coverage for NNK SSM transformations

We were recently doing some small-scale Gibson-based NNK site saturation mutagenesis PCR reactions. In this scheme, we transform each position separately, so the number of transformants (i.e. colonies) on a given plate should be directly related to the likelihood that all of the desired variants we want to see are there at least once.

In fact, there are three parameters that factor into how good the variant coverage is at a given position: 1) nucleotide biases in the creation of the NNK degenerate region of the primer, 2) the number of transformants, and 3) the fraction of the total transformants that are actually variants, rather than undesired molecules such as carryover of the WT plasmid used as the template.

For any given experiment, you’re not going to know what the nucleotide bias is like until you actually Illumina sequence your library… but at that point, you’ll already know the variant coverage of your library, so there’s no need to estimate it anymore. On the other hand, if you know the nucleotide biases you observed for similar libraries, then you can do this estimation far before you get around to Illumina sequencing. Based on previous libraries, I have a pretty good idea of what the biases from machine-mixed NNK primers from IDT are like. For simplicity’s sake, I’m using 40% G, 20% C, 20% A, and 20% T as a rough estimate for the nucleotide bias I saw in the most biased NNK libraries.

The other two parameters are going to be very much experiment-specific, and can be determined shortly after generating the library. The number of transformants can be determined by counting colonies from the transformation, and the amount of template contamination can be roughly determined by performing Sanger sequencing on a handful of colonies from those plates. Thus, I chose a few reasonable values for each: colony counts ranging from the very small (10 and 20) to quite large (400 and 1000), and template contamination percentages from almost impossibly low (0%), through much more likely (10 or 20%), all the way to possibly prohibitively high (50% and 75%). I then simulated the entire process, bootstrapping 20 times to get a sense of the average output, and made a plot showing what variant coverage you get depending on the combinations of those observed parameters. This is what the plot looks like:

So there you go. In a reasonable scenario where you have, let’s say, 10 or 20% template contamination, then you’d really be hoping to see at least 200 colonies, and hopefully around 400, at which point you can really pat yourself on the back. If things went awry with the DpnI step, for example, and between a quarter and a half of your colonies were template, then you’d minimally want 400 or so colonies, and shouldn't feel too safe until you’ve got a fair bit more than that. Though that’s only to make sure you have at least one copy of every variant at that position; if your library is half template, chances are you’ll be running into a bunch of other problems down the line.
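For anyone who wants to run these numbers for their own plates, here's a stripped-down sketch of the kind of simulation described above. It tracks coverage of the 32 possible NNK codons (the real analysis can translate these into amino acids), and the nucleotide bias values are the rough estimates mentioned earlier; the colony counts and contamination fractions are just the example values from the plot:

import random
from statistics import mean

# Rough nucleotide bias estimate for machine-mixed NNK primers
N_WEIGHTS = {"G": 0.4, "C": 0.2, "A": 0.2, "T": 0.2}  # the two "N" positions
K_WEIGHTS = {"G": 0.5, "T": 0.5}                       # the "K" position (G or T); assumed even here

ALL_NNK_CODONS = {a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"}  # 32 codons

def draw_base(weights):
    return random.choices(list(weights), weights=list(weights.values()))[0]

def simulate_coverage(n_colonies, template_fraction, n_boot=20):
    # Average fraction of the 32 NNK codons seen at least once among the variant colonies
    coverages = []
    for _ in range(n_boot):
        seen = set()
        for _ in range(n_colonies):
            if random.random() >= template_fraction:  # this colony is a variant, not WT template
                seen.add(draw_base(N_WEIGHTS) + draw_base(N_WEIGHTS) + draw_base(K_WEIGHTS))
        coverages.append(len(seen) / len(ALL_NNK_CODONS))
    return mean(coverages)

for colonies in (10, 20, 100, 200, 400, 1000):
    for contam in (0, 0.1, 0.2, 0.5, 0.75):
        print(colonies, contam, round(simulate_coverage(colonies, contam), 2))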

Vacuum Concentration

I hate the high cost of research lab materials / equipment, especially when the underlying principles are pretty simple and mundane. For example, I’ve used blue LEDs and light-filtering sunglasses to visualize DNA with SYBR Safe. And I’ve used a mirrorless digital camera paired with a Python script to visualize Western blots.

Well, this time around I was thinking about vacuum concentration. Many of the lab spaces I’ve been around have had speed-vacs accessible, though I’ve never really used them since I don’t ever really need to lyophilize or concentrate aqueous materials. But the other day, we had some DNA that was 1.5 to 2-fold less concentrated than we needed for submission to a company, and I was reluctant to ethanol precipitate or column-concentrate the sample at the risk of losing some of the total yield. Thus, I became curious about taking advantage of vacuum concentration.

The lab already has built-in vacuum lines, so I just needed a vessel to serve as a vacuum chamber. I bought this 2-quart chamber from Amazon for $40, and started seeing what rates of evaporation I get if I leave 200 µL of ddH2O in an open 1.5 mL tube out on the bench, versus leaving it in the vacuum chamber.

Vacuums are measured in “inches of mercury”, ranging from 0″ Hg, which is atmospheric pressure, to 29.92″ Hg, which is a perfect vacuum (so no air left). As you can see, the built-in vacuum lines at work top out at ~21″ Hg; somewhat devoid of air, yes, but far from a perfect vacuum. I even did a test where I put a beeping lab timer into it, and while the vacuum chamber did make it a lot quieter, it was far from completely silent, unlike the vacuum chamber exhibit at the Great Lakes Science Center (here’s the Peeps version). But what does it do for vacuum concentrating liquid? Here’s a graph of the results, when performed at room temperature.

So the same sample in the vacuum is clearly evaporating much faster. I can make a linear model of the relationship between time and amount of sample lost (which is the line in the above plot), and it looks like the water is evaporating at about 1% (or 2 µL) per hour in atmospheric conditions (on the bench), while it’s evaporating at about 2% (or 4 µL) per hour in the vacuum chamber. Thus, leaving the liquid in the vacuum chamber for 24 hours resulted in half the volume, or presumably, a 2-fold concentration of the original sample.
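The fit itself is nothing fancy; here's a minimal sketch of it in Python, using made-up time points in place of the actual measurements:

import numpy as np

hours     = np.array([0, 4, 8, 16, 24])
bench_ul  = np.array([200, 192, 184, 168, 152])  # illustrative volumes, ~2 uL lost per hour
vacuum_ul = np.array([200, 184, 168, 136, 104])  # illustrative volumes, ~4 uL lost per hour

for label, volumes in (("bench", bench_ul), ("vacuum", vacuum_ul)):
    slope, intercept = np.polyfit(hours, volumes, 1)  # simple linear model of volume vs time
    print(f"{label}: {-slope:.1f} uL lost per hour ({-slope / 2:.1f}% of the 200 uL start)")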

Clearly, this is not a speed-vac. If I understand it correctly, speed-vacs also increase the temperature to speed up the evaporation process. I could presumably recreate that by putting a heating block under the vacuum chamber, but I haven’t gotten around to trying that yet. There also is no centrifuge. While I could probably modify and fit one of my Lego minicentrifuges inside, the speed of evaporation at room temp has been slow enough that everything has stayed at the bottom of the tube anyway, so it’s not really a worry so far. At some point, I’ll also perform a number of comparisons at 4°C (since the vacuum chamber is so small, I can just put it in my double-deli lab fridge), which may make more sense for slowly concentrating more sensitive samples.

Overall, for a $40 strategy to achieve faster evaporation, this doesn’t seem too bad. In the future, if we need to concentrate a DNA sample 2-fold or so, maybe it’s worth just leaving it in the vacuum chamber overnight. Furthermore, the control sample is kind of interesting to consider, as it now defines how fast samples left uncapped on the bench evaporate (I suppose I’ll try this with capped samples at some point as well, which will presumably evaporate a little more slowly). Same thing with samples kept in the fridge, which are also evaporating at a slow but definable rate. After all, “everything is quantifiable“.

1/25/2023 Update: In explaining this as a potential option, I used the word “slow-vac”, which is a good name for this. Time to trademark it! Though other people were onto this name a while back, so maybe they already did (obviously they didn’t).

HEK 293T Bxb1 Landing Pad recombination protocol

Due to popular request, I’m going to put my *most current* version of the HEK 293T Landing Pad recombination protocol here for others’ benefit. Much of the credit goes to Sarah Roelle, who wrote up this current version of the protocol.

Recombination:

[This protocol is for a 24-well plate:]
Day 1:

1) Make 2 transfection mixtures per sample:
Tube 1: 23 μL Opti-MEM + 1 μL Fugene6

Tube 2 (if using cells that don’t already express the Bxb1 recombinase enzyme): the volume (μL) of DNA corresponding to 16 ng of Bxb1 expression plasmid + the volume corresponding to 224 ng of attB plasmid, plus Opti-MEM to a final tube volume of 24 μL (see the volume calculation sketch after the scaling notes below).
-OR-
Tube 2 (if using cells that already express the Bxb1 recombinase enzyme): the volume (μL) of DNA corresponding to 240 ng of attB plasmid, plus Opti-MEM to a final tube volume of 24 μL.

2) Mix Tube 2 into Tube 1 for each sample. Mix up and down a couple of times with the pipet, then let the mixtures sit for 15-30 min while you get the cells ready (unless you trypsinized and counted the cells first, to know how many you had in case they were limiting).

3) [Meanwhile] Trypsinize and count the cells. Add 120,000 cells in a final volume of 300 μL media to each well.

4) Once at least 15 minutes have passed since mixing, add the mixtures dropwise throughout the well of cells being transfected.

Day 2:
Add at least 500 μL media to each well.

Notes about scaling up / down:
We typically pilot new plasmids at the 24-well scale, and transfect larger populations of cells by transfecting 6-wells and increasing the number of plates / wells as needed.
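For what it's worth, here's a hypothetical little helper for the Tube 2 volumes in step 1, assuming you know each plasmid stock concentration in ng/μL (the concentrations in the example are made up):

def tube2_volumes(amounts_ng, stock_ng_per_ul, final_ul=24.0):
    # Volumes of each plasmid, plus the Opti-MEM needed to reach the final tube volume
    plasmid_ul = {name: ng / stock_ng_per_ul[name] for name, ng in amounts_ng.items()}
    optimem_ul = final_ul - sum(plasmid_ul.values())
    return plasmid_ul, optimem_ul

# Example: cells without Bxb1, so 16 ng of Bxb1 plasmid + 224 ng of attB plasmid
plasmids, optimem = tube2_volumes({"Bxb1": 16, "attB": 224}, {"Bxb1": 100, "attB": 250})
print(plasmids, round(optimem, 2))  # {'Bxb1': 0.16, 'attB': 0.896} 22.94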

Negative selection (if applicable):

Negative selection with iCasp9 is wonderful, and hopefully you are using one of those landing pad versions. I typically use a 10 nM final concentration of AP1903, although I’m fairly certain 1 nM is just as effective. Death occurs within a couple of hours, so you can come back to your plate / flask later in the day and change the media to get rid of the dying cells. I typically wait until at least 72 hours after recombination to perform this step.

Important note: If you’re doing a library-based experiment, then make sure you leave some cells aside (even 100k to 500k cells will be plenty) that DO NOT go through any selection steps, since this will allow us to estimate the number of recombined cells (and thus estimate the coverage of your library; see below for the calculation). If you’re just doing individual samples, this isn’t nearly as big of a deal, since you’ll be able to visually inspect the number of recombinants (i.e. do you see only a handful of surviving cells, or is it 100+ individual cells surviving?).

Positive selection (if applicable):

I tend to do the positive selection step AFTER negative selection, since the negative selection step has usually thinned the cells down enough that the positive selection is going to be most effective (I find that over-confluent wells tend not to do super-great with positive selection). Thus, while it can be done as soon as 72 hours post recombination, I tend to do this a week or more after recombination. You can find effective concentrations for positive selection of landing pad HEK 293T cells here.

Some additional notes:

  1. Due to a phenomenon of (annotated) promoter-less expression from the thousands of un-recombined plasmids that remain in the cell following transfection, I typically wait ~5 to 7 days before running any cells on the flow cytometer, so that this background signal has time to die down.
  2. Other transfection methods may work, but will need to be optimized. For example, I have observed transfection with lipofectamine 3000 to result in prolonged promoter-less expression, probably due to increased transfection and greater toxicity preventing cell division and thus plasmid dilution.
  3. Calculating the number of recombinants. OK, as I alluded to above, it’s definitely worth estimating the number of recombinants for any library-based experiment. To do this, take your observed recombination rate from running flow on your unselected cells (say, you see 5% of cells being mCherry+ / recombined when you ran flow on unselected cells 7 days after transfection), and multiply that fraction by the number of cells you transfected. So, if we pretend that you transfected 20 million HEK 293T cells and observed 5% of cells recombined in your unselected well, then the rough calculation is 2e7 cells * 0.05, amounting to roughly 1 million cells estimated to have been recombined (see the quick calculation sketch after this list). Of course, directly sequencing what’s in the recombined library is a more direct measurement of library coverage at the recombined cell step, but doing this calculation is still usually worthwhile as another line of evidence.
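And here's that note 3 calculation written out, just to make the arithmetic explicit (the numbers are the hypothetical ones from above):

transfected_cells   = 20_000_000  # cells transfected
recombined_fraction = 0.05        # mCherry+ fraction in the unselected well
estimated_recombinants = transfected_cells * recombined_fraction
print(f"{estimated_recombinants:,.0f} estimated recombined cells")  # 1,000,000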

Conda virtual environments

If you’re going to do things with the bash command line, you’ll inevitably have to install a number of packages / dependencies. Having a package manager can help with this, along with virtual environments, which allow you to keep the versions of packages you need for different purposes relatively organized. For package managers, I have tended to like Anaconda the most, so install that. I’m not going to go through the installation steps here, since I already have it installed and I’d really have to go out of my way to describe them, but it should be pretty straightforward.

Well, I want to create and run a script using biopython, but I don’t already have it installed on this computer, so this seems like a nice time to make a virtual environment for it. The steps I did were as follows:

  1. Create a virtual environment for biopython.
    $ conda create -n biopython
  2. Activate the virtual environment
    $ conda activate biopython
  3. Next, install biopython.
    $ conda install -c conda-forge biopython

That should be it. Whenever you want to deactivate the virtual environment, you can type:
$ conda deactivate
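To check that the environment actually works, you can save a couple of lines like the following as a script (say, check_biopython.py, a made-up name) and run it with python check_biopython.py while the environment is active:

from Bio.Seq import Seq  # biopython

seq = Seq("ATGGCCATTGTAATGGGCCGC")
print(seq.reverse_complement())  # GCGGCCCATTACAATGGCCAT
print(seq.translate())           # MAIVMGR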

5/17/22 edit: I’ve been doing some image analysis in Python, which requires installing some of the following packages:
$ conda config --env --add channels conda-forge
$ conda install numpy
$ conda install matplotlib
$ conda install scipy
$ conda install opencv
$ conda install -c anaconda scikit-image
$ conda install -c conda-forge gdal