HEK 293T Bxb1 Landing Pad recombination protocol

Due to popular request, I’m going to put my *most current* version of the HEK 293T Landing Pad recombination protocol here for others’ benefit. Much of the credit goes to Sarah Roelle, who wrote up this current version of the protocol.

Recombination:

[This protocol is for a 24-well plate:]
Day 1:

1) Make 2 transfection mixtures per sample:
Tube 1: 23 μL Opti-MEM + 1 μL Fugene6

Tube 2 (If using cells that don’t already express the Bxb1 recombinase enzyme): μL of DNA corresponding to 16 ng Bxb1 plasmid + μL corresponding to 224 ng attB plasmid and OPTI-MEM to a final tube volume of 24 μL.
-OR-
Tube 2 (If using cells that already express the Bxb1 recombinase enzyme): μL of DNA corresponding to 240 ng attB plasmid and OPTI-MEM to a final tube volume of 24 μL.

2) Mix Tube 2 into Tube 1 for each sample. Mix up and down a couple times with pipet. Then let the mixtures sit for 15-30 min while you get the cells ready (Unless you trypsinized and counted the cells first, to know how many you had in case they were limiting).

4) [Meanwhile] Trypsinize and count the cells. Add 120,000 cells per well in a final volume of 300 μL media to each well.

5) Once at least 15 minutes have passed since mixing, add the mixtures dropwise throughout the well of cells being transfected.

Day 2:
Add at least 500 μL media to each well.

Notes about scaling up / down:
We typically pilot new plasmids in a 24-well scale, and transfect larger populations of cells by transfecting 6-wells and increasing the number of plates / wells as needed.

Negative selection (if applicable):

Negative selection with iCasp9 is wonderful, and hopefully you are using one of those landing pad versions. I typically use 10nM final concentration of AP1903, although I’m fairly certain 1nM is just as effective. Death occurs in a couple of hours, so you can come back to your plate / flask later on in the day and change media to get rid of the dying cells. I typically wait until at least 72 hours after recombination to perform this step.

Important note: If you’re doing a library-based experiment, then make sure you leave some cells aside (even 100k to 500k cells will be plenty) that DO NOT go through any selection steps, since this will allow us to estimate the number of recombined cells (and thus estimate the coverage of your library; see below for the calculation). If you’re just doing individual samples, this isn’t nearly as big of a deal, since you’ll be able to visually inspect the number of recombinants (ie. do you see only a handful of surviving cells, or is it 100+ individual cells surviving?).

Positive selection (if applicable):

I tend to do the positive selection step AFTER negative selection, since the negative selection step has usually thinned the number of cells down enough such that the positive selection is going to be most effective (I find that over-confluent wells tend not to do super-grant with positive selection). Thus, while it can be done as soon as 72 hours post recombination, I tend to do this a week or later after recombination. You can find effective concentrations for positive selection of landing pad HEK 293T cells here.

Some additional notes:

  1. Due to a phenomenon of (annotated) promoter-less expression from the thousands of un-recombined plasmids that remain in the cell following transfection, I typically wait ~ 5 to 7 days before running any cells in the flow cytometer, to wait for that background signal to die down.
  2. Other transfection methods may work, but will need to be optimized. For example, I have observed transfection with lipofectamine 3000 to result in prolonged promoter-less expression, probably due to increased transfection and greater toxicity preventing cell division and thus plasmid dilution.
  3. Calculating the number of recombinants. OK, so as I alluded to above, it’s definitely worth estimating the number of recombinants of any library-based experiments. To do this, take your observed recombination rate from running flow on your unselected cells (say, you see 5% of cells as being mCherry+ / recombined when you ran flow on unselected cells 7 days after transfection), and multiply that fraction with the number of cells you transfected. So, if we pretend that you transfected 20 million HEK 293T cells and you had observed 5% of cells recombined in your unselected well, then the rough calculation is 2e7 cells * 0.05, amounting to roughly 1 million cells estimated to have been recombined. Of course, directly sequencing what’s there in the recombined library is a more direct measurement of library coverage at the recombined cell step, but doing this calculation is still usually worthwhile as another line of evidence.

Conda virtual environments

If you’re going to do things with the bash command line, you’ll inevitably have to install a number of packages / dependencies. Having a package manager can help with this, along with virtual environments, that allow you to keep the versions of packages you need for different purposes relatively organized. For package managers, I have tended to like Anaconda the most. So install that. I’m not going to go through the steps here, since I already have it installed, so I’d really have to go out of my way to try to describe that. But it should be pretty straightforward.

Well, I want to create and run a script using biopython, but I don’t already have it installed on this computer, so this seems like a nice time to make a virtual environment for it. The steps I did were as follows:

  1. Create a virtual environment for biopython.
    $ conda create -n biopython
  2. Activate the virtual environment
    $ conda activate biopython
  3. Next, install biopython.
    $ conda install -c conda-forge biopython

That should be it. Whenever you want to deactivate the virtual environment, you can type:
$ conda deactivate

5/17/22 edit: I’ve been doing some image analysis in python, that requires installing some of the following packages:
$ conda config –env –add channels conda-forge
$ conda install numpy
$ conda install matplotlib
$ conda install scipy
$ conda install opencv
$ conda install -c anaconda scikit-image
$ conda install -c conda-forge gdal

Analyzing Illumina Fastq data

We recently got some Illumina sequencing back from GeneWiz, and I realized that this is good opportunity to show people in the lab how to do some really basic operations to handle such types of sequencing data, so I’ll write those instructions here. Since essentially everybody in the lab uses a Mac as their primary computer, these instructions will be directly related to performing these steps on a Mac, though the same basic steps can likely be applied to PCs. Also, since these files are small, everything will be done locally; once the files get big enough and the analyses more complicated, we’ll start doing things on a computer cluster. Now to get to the actual info:

  1. First, find the data we’ll be using for practice today. If you’re in the lab, you can go to the lab GoogleDrive into the Data/Illumina/Amplicon_EZ/30-507925014/00_fastq directory to find the files.

We won’t need to analyze everything there for this tutorial; instead, let’s focus on the “KAM-IDT-Std_R1_001.fastq.gz” and “KAM-IDT-Std_R2_001.fastq.gz” files.

2. Copy the files to a directory on your local computer. You can do the old “drag and drop” using the GUI, or you can do it in the command line like so, once you adjust the paths for your own computer:

$ cp /Volumes/GoogleDrive/My\ Drive/MatreyekLab_GoogleDrive/Data/Illumina/Amplicon_EZ/30-507925014/00_fastq/KAM-IDT-Std_R1_001.fastq.gz /Users/kmatreyek/Desktop/Illumina_data

$ cp /Volumes/GoogleDrive/My\ Drive/MatreyekLab_GoogleDrive/Data/Illumina/Amplicon_EZ/30-507925014/00_fastq/KAM-IDT-Std_R2_001.fastq.gz /Users/kmatreyek/Desktop/Illumina_data

3. Un-gzip the files. You can do this in the GUI by double-clicking the files, or you can do it in the terminal (if you’re now in the right directory) like so.

$ gzip -dk KAM-IDT-Std_R1_001.fastq.gz

$ gzip -dk KAM-IDT-Std_R2_001.fastq.gz

Optional: Take a look at your fastq files. You won’t want to open the files in their entirety, so what makes more sense it just looking at the first 4 or so lines of the file, corresponding to the first read. To do this, type:

$ head -4 KAM-IDT-Std_R1_001.fastq

And you should get an output that looks like so:

4. They won’t always be paired reads, but this time it is. So we’ll pair them. I you don’t already have a method for doing this, then download PEAR and install it like I described here. Once you have it installed, you can type in a command like so:

$ pear -f KAM-IDT-Std_R1_001.fastq.gz -r KAM-IDT-Std_R2_001.fastq.gz -o IDT_HM

It took my desktop a couple minutes for this process to complete. You’ll get an output that looks like this.

Your directory should now have all of these files:

You can look at the first read again (now that it’s been paired), using the following line, it should look like so:

$ head -4 IDT_HM.assembled.fastq

As you can tell, the quality scores in the first line went from mostly F’s (Q-Score of 37) to almost all I’s (Q-Score of 40).

5. Now that we’ve prepped the Illumina data, it’s time for the downstream analysis. This will be far more project or experiment specific, so these next steps won’t apply for every situation. But in this case, we made a library of Kozak variants to try to get a range of expression levels of the protein of interest. Furthermore, the template DNA used for the PCR lacked a Kozak sequence and a start codon, and these will inevitably be in the sequencing data also. So the goal of this next step is to identify the reads that are template vs those that have the Kozak sequence, and if it does have a Kozak and ATG introduced, to extract the Kozak sequence from the read.

I went ahead and wrote a short python script that achieves this. So, grab that file (ie. Right click, and it “Save link as…”), stick it in the same directory as the data you want to analyze, and run it with the following command.

$ python3 Extract_Kozak.py IDT_STD.assembled.fastq

The script should then create a file called “IDT_STD.assembled.tsv” that should look like this:

This can now be easily analyzed with whatever your favorite data analysis language is, whether it’s R or Python. Huzzah!

Dark culture media with RUBY

I like playing around with various recombinant DNA tools to see how they work and figure out if they’ll do something useful for me. Sometimes it works out amazingly, like iCasp9, which is a fantastic negative selection transgene. Other times, it’s not so clearly a success…

I recently ordered RUBY from Addgene (originally used by the depositing lab to darken plant roots) to see if I could turn my cultured cells visibly darker. I had actually messed around a little with this concept previously using tyrosinase, where it worked in making the cells darker, albeit one could only see the darker color either as a centrifuged cell pellet, or perhaps as a large overgrowing “colony” on an originally sparse culture plate. Well, I shuttled RUBY into my recombination vector and selected for cells expressing it. I don’t even think I saw dark cells this time (though I didn’t look very closely, since this is just a fun side-experiment), but I did notice that these cells had much darker media than their recombined siblings, like the cells expressing fluorescent proteins in the two wells to the right of it. So I’m not exactly sure what’s happening, but the chromogenic small molecule is definitely making it out into the culture media.

I guess I’ll just file this observation for now and see if it ever comes in useful at some point the future! hahaha.

Update: FYI, RUBY is *BIG*. Like ~ 4kb kind of big, since it encodes multiple enzymes within a chemical pathway. So def not a small chromoprotein kind of thing.

Dealing with paired fastq reads

10/25/2022 update:
Ran into some paired sequences that didn’t work well with fastq-join, so went back to pear. Found an easy way to install it via bioconda, using the following command. So simple.

conda install -c bioconda pear

July 19, 2022 update:
OK, I’m trying to reinstall PEAR on my new-ish laptop and running into errors. After struggling a bit, I decided to switch over to using Fastq-join. This is way easier to install and run, as you can see below:

Based on the information on this website, and assuming you have anaconda installed, run:
conda install -c bioconda fastq-join

After that, actually pair your files running a command like:
fastq-join ACE2-Kozak-mini-library-pooled_R1_001.fastq ACE2-Kozak-mini-library-pooled_R2_001.fastq -o ACE2-Kozak-mini-library-pooled.fastq

Note 1: You can run fastq-join on gzipped files, like so:
fastq-join No-infection-maybe_R1_001.fastq.gz No-infection-maybe_R2_001.fastq.gz -o No-infection-maybe.fastq.gz

Note 2: If working with gzipped fastq data, it’s a little trickier to look at your files compressed files. To look at the first read, for example, you need to use a command like this:
gunzip -c SARS2-MOI-0pt1-1_R1_001.fastq.gz | head -4

Here’s the original (now deprecated) text in this post:

Got some paired Illumina sequencing back today, and wanted to pair the R1 and R2 fastq files. I vaguely remember using PEAR to do this before, so gave it a shot. Since the files are relatively small, I figured I’d just try to run this locally, which meant trying to install it (and it’s various dependencies) on my Macbook. Here are the steps…

  1. Download “autoconf-2.65.tar.gz” from this website. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

2. Download “automake-1.14.1.tar.gz” from this website. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

3. Get PEAR academic from this website by entering in your info, receiving an email, and downloading “pear-src-0.9.11.tar.gz”. Uncompress it, and then go into the now uncompressed directory, and type in “./configure && make && make install”

Huzzah. It should now be installed.

Now you can go to whatever directory has your R1 and R2 files, and type in something like so:
“pear -f KAM-IDT-HM_R1_001.fastq -r KAM-IDT-HM_R2_001.fastq -o IDT_HM”

You should now have a file called “IDT_HM.assembled.fastq” that has all of the joined paired reads.

5/18/21 Update: I was working on something else that required trimming reads. I decided to use fastp to do this. The instructions to install fastp is as follows (essentially following the instructions here):

  1. Download the zip file (under the “Code” button) from https://github.com/OpenGene/fastp, and unzip it.
  2. Use terminal to go to the directory that is made, and run
    $ make
  3. Run the following command:
    $ sudo make install
    … and then type in your computer password when prompted. Fastp should now be installed.

Landing pad plasmid maps

I still intend to get around to posting most if not all landing pad plasmids to Addgene, but it’s taken me forever to get around to it. Actually, the institution takes just as forever to do it, since apparently they have to get permissions to post any plasmid with parts I may have amplified from another source(eg. the puromycin resistance gene)? Once those are there, the plasmid sequences will obviously be publicly available. In the mean time, I figure I’d post some of the most common plasmid maps here, so other people can benefit form them (or I don’t have to send specific emails to each person who asks for them). So here are some of the most popular, published plasmids, in GenBank (.gb) format.

G542A_pLenti-Tet-coBxb1-2A-BFP_IRES_iCasp9-2A-Blast_rtTA3 (Addgene #171588)

G698C_AttB_ACE2_IRES_mCherry-H2A-P2A-PuroR (Addgene #171594)

G758A_AttB_ACE2(del)-IRES-mCherry-H2A-P2A-PuroR (Addgene #171596)

p0489_AttB_EGFP-Link-PTEN-IRES-mCherry_Bgl2.gb

p0669_AttB_sGFP-PTEN-IRES-mCherry-P2A-bGFP_Bgl2

G384A_LLP-iCasp9-Blast

G273A_AttB_sGFP-PTEN-IRES-mCherry-P2A-HygroR

G274A_attB-sGFP-PTEN_IRES-mCherry-P2A-PuroR

attB-EGFP (Addgene #171597)

attB-mCherry (Addgene #171598)

dAAVS1_TetBxb1_mTagBFP2_rtTa3

dAAVS1_CMV-rtTA3-Term

dAAVS1_Dummy

G371A_LLP-rEF1a

G375F_attB-eGFP-rIRES-mCherry

G382C_attB-eGFP-rIRES-mCherry-2A-PuroR

G163A/B_AttB_PuroR-P2A-mCherry_Bgl2

G310A_pLenti-TetBxb1BFP-2A-coBxb1-2A-Blast_rtTA3

pLenti-TetBxb1BFP_coBxb1-rtTA3_Blast

pLenti-TetBxb1BFP-rtTA3_Blast

G57A_attB_TP53-link-EGFP-IRES-mCherry

AttB_sGFP-VHL-IRES-mCherry-P2A-bGFP_Bgl2

AttB_VHL-sGFP_IRES-mCherry-P2A-bGFP_Bgl2

Chemical transformation efficiencies

We do a lot of molecular cloning in the lab. Standard practice in the workflow is to make your own home-made chemically competent NEB 10-beta bacteria to be used fresh the day of the transformation. This has worked surprisingly well, and we have made ~ 350 different plasmid constructs in the first ~ 600 days). Each time you do the transformation, it’s important to include a positive control (we use 40 ng of attB-mCherry plasmid) to make sure the transformation step was performing properly (to help interpret / troubleshoot what may have gone wrong if you had few / zero colonies in your actual molecular cloning transformation plates). I’ve done this enough times now to know what is “normal”. Thus, especially for new members in the lab, please reference this plot to see how your own transformation results compare to how it has worked for me, where I typically get slightly more than 10,000 transformants (Note: you may get numbers better than me, which is a good thing!).

Edit 2/24/2022: It recently dawned on me that we haven’t necessarily been using a standard number of bacteria per transformation (ie. we typically prep the same amount of bacteria, and that bacteria normally just gets split into however many transformations are being done at the time. Thus, if there are 10 plasmids being transformed, the number of bacteria per transformation is ~ one fifth of the number of bacteria for when two plasmids are being transformed.

Well, since I had a bunch of data recorded for our positive control transformations (40ng of attB-mCherry plasmid), I decided to see what number of bacteria we had used and if there were any patterns in the transformation efficiencies:

So it looks like, depending on the instance, we used anywhere between 300 million to 10 billion bacteria per transformation. Notably, there was no real pattern in the transformation efficiency, were on average, it worked just as well with the half a billion as it was for 5 billion. Thus, it’s likely that the limiting factor at this point is the DNA and the bacteria are in vast excess.

For those curious, I had previously mentioned that 1mL of our e.coli at an OD of 1 corresponded to 5e8 bacterial cells / colony forming units. Thus, for a 180 mL culture at an OD of 0.2, I would do [5e8 cells/mL at OD1] * [0.2 of OD1] * [180 mL] * [1 billion cells / 1e9 cells] = ~ 18 billion cells. So if we were doing 9 transformations with this culture, it would be about 2 billion cells for each transformation.

HEK 293T Landing Pad Dose Response Curves

Positive selection for recombined selection is a pretty useful technique when working with the HEK 293T landing pads, but people might not know what concentrations are best. Well, here is that info, all in one place. In each case, the antibiotic resistance gene had been placed after an IRES element in an attB recombination plasmid, and was linked to mCherry using a 2A stop-start sequence. Thus, the Y axis is showing how well the mCherry+ cells were enriched at various concentrations of antibiotic.

Consistent with the above plots, I generally suggest people use 1 ug/mL Puro, 10 ug/mL Blast, 100 ug/mL Hygro, and 100 ug/mL Zeocin for selections.

11/19/2024 Update: I was recently reminded that I had since created this data a bit over year ago using resazurin dye on our plate reader. Gosh, now that I’m looking at this data side-by-side, I’m finding that they correspond with each other even better than I expected. How lovely.

Nucleotide biases in primers

We generally talk about degenerate nucleotide positions in synthesized oligo (like primers) as if they are this automatic and even thing, but like everything else, things can be a lot more complicated. We recently ordered some primers with six degenerate positions (so NNNNNN) from ThermoFisher, and they were predominantly filled with T’s, while the same oligo ordered from IDT was a lot more balanced in terms of nucleotide composition. While this was done at really small (Sanger) scale and not quantitative, this reminded me of nucleotide biases I observed in libraries created in the Fowler lab using Inverse PCR (where the degenerate sequence is not hybridizing with a partner strand during initial hybridization, so no degenerate sequence should have a competitive edge over another, at least in terms of hybridization). These were all made with IDT primers. I dug up that slide, and have posted it below:

So in this analysis, you can see that there is a ubiquitous albeit somewhat variable G bias across the libraries. This often resulted in the presence of more glycines than expected. Overall, I don’t think this is ALL that bad; certainly still WAY better than the rampant (like, 86% T) bias with the ThermoFisher primer (When I emailed them about this, they told me “Unfortunately, we do know exactly what the issue is. This is an inherent instrument issue with how automated degeneracies are produced on the existing synthesizer. The only way to fix it from a customer perspective is to order using special instructions to “hand mix” the degeneracy in an optimal ratio to produce as close to a 25% balance as possible.”). So while I’ll still buy regular cloning primers from Thermo (b/c they are the cheapest for me), all degenerate primers will have to go through IDT (or maybe eventually EuroFins, as the Roth lab has noted good luck with them).

5/15/21 update: OK, got some Illumina sequencing back to actually quantitate this, and the ThermoFisher degenerate primer definitely suffered from too many T’s, with ~ 49% of all nucleotides being T’s. Actually, the IDT hand mixed did worse than IDT machine mixed, which had the closest to the target values. Interesting.