Spec comparisons

Well, I was going to talk about some of these experiments during lab meeting, but why make a PowerPoint or Google Colab link people won't follow when I can write it as a blog post.

Regardless, we’ve recently been looking at how our various possible methods of spectrophotometry compare.

  1. "Amazon Spec" purchased for $235 back in November 2020.
  2. ThermoFisher Nanodrop in a departmental common room (I don’t actually know what kind as I’ve never used it)
  3. BioTek Synergy plate reader, either… A) with 200uL of bacteria pipetted into a flat-bottom 96-well plate, or B) using their "BioCell", which is a $290 cuvette that fits onto one of their adapter plates. I mistakenly label this one as "BioCube" in the plots, but they probably should have just named it that in the first place so I don't feel too bad.

To test the methods, Olivia sampled bacterial optical densities while a batch of E. coli were growing out to make competent cells. Thus, the different densities in the subsequent data correspond to different timepoints of the same culture growing out. Each time point was measured with all four methods.

Well, all of the methods correlated pretty well, so no method was intrinsically problematic. I'm not sure what settings were used for any automated calculation of absorbance values, but the BioCell numbers were just off by an order of magnitude (the BioCell data also had a clear outlier). The Amazon spec and Nanodrop generally gave similar values, although the Nanodrop readings were slightly higher.

The plate reader option was also perfectly fine, although it required more back-end math to convert the absorbance values to actual optical density. This is also not the raw data, as the media-only absorbance has to be collected and subtracted to yield the graph below.

Rather than try to figure out the path length and calculate the conversion formula from first principles, I just used the above dataset to create a calibration for "nanodrop-esque optical density". (Note: there was a second, independently collected set of data I added for this analysis.) Here, the goal was to use the raw values from the plate reader output, so people could do the conversion calculation on the fly.

Say you have a particular nanodrop-esque A600 of 0.5 in mind. The formula to convert to plate reader units is 0.524 * [nanodrop value] + 0.123, or in this case, 0.385. Checks out with the linear model line shown above.

Or, if you already have raw plate reader values and want to convert to nanodrop-esque values, the formula is 1.79 * [BioTek value] - 0.2. Here, let's pretend we have an absorbance value of 0.3, which calculates to a nanodrop-esque value of 0.338. So perhaps that's a decent raw plate reader value to aim for with nearly grown bacterial cultures when making chemically competent cells.
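
To make the conversion easy to do on the fly, here's a minimal sketch wrapping those two formulas in Python (the coefficients are just the ones quoted above, so expect small rounding differences versus the fitted models):

# Convert between "nanodrop-esque" OD600 and raw BioTek plate reader A600,
# using the calibration coefficients quoted above.
def nanodrop_to_platereader(od_nanodrop):
    return 0.524 * od_nanodrop + 0.123

def platereader_to_nanodrop(a600_platereader):
    return 1.79 * a600_platereader - 0.2

print(nanodrop_to_platereader(0.5))   # ~0.385
print(platereader_to_nanodrop(0.3))   # ~0.34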

Lastly, it's worth noting the surprisingly large dynamic range there seems to be for spec readings of bacterial cultures. Since we mostly handle either mid-to-late log phase growth or saturated / stationary cultures, we're used to dealing with values in the 0.2 to 1.2 range, but the log-scale plots above suggest that we can detect cultures reasonably well within the 0.01 to 0.1 range as well.

HEK cell small molecule toxicities

I’ve now done a *bunch* of kill curves with HEK cells in various forms (WT HEK cells, single or double landing pad cells). Here’s a compendium of observed toxicity of serial dilutions of various small molecules in HEK cells not engineered to be resistant in any way. (This is mostly for my own reference, when I’m in the TC room and need to check on some optimal concentrations).

Quantitating DNA via the plate reader

As another installment of the "demoing the plate reader" series, we're also trying to see how well it quantitates DNA. It's probably good that I'm actively involved in this process, as it's making me learn the details of some of the existing and potential assays regularly used by the lab. Well, today, we're going to look a bit at DNA-binding dyes for quantitating DNA.

We have a Qubit in the lab, which is like a super-simplified plate reader: it only reads a single sample at a time, it only does two fluorescent colors, and the interface is very streamlined / foolproof, but in that way it loses a bunch of customizability. But anyway, we started there, making a serial dilution of a DNA of known high concentration and running it on the Qubit (200uL recommended volume). We then took those same samples and ran them on the plate reader. This is what the data looked like:

So pretty decent linearity of fluorescence between 1ug and 1ng. Both the Qubit and plate reader gave very comparable results. Qubit readings are missing at the upper range because it said those samples were over the upper limit of the assay (and thus would not return a value). That's reasonable, as you can see from the plate reader values that they're above the linear response of the assay.

So what if you instead run the same sample as 2uL on a microvolume plate instead of 200uL in a standard 96-well plate? Well, you get the above result in purple. The data is a bit more variable, which probably makes sense with the 100-fold difference in volume used. It also seems the sensitivity of the assay decreased some, in that the results became unreliable around 10ng instead of 1ng for the full volume in the plate format, although again, I think that makes sense given there's less sample to take up and give off light.
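
Whichever format is used, if you wanted to turn plate reader fluorescence readings into actual DNA amounts (rather than just eyeballing linearity), fitting a standard curve over the linear range would do it. A minimal sketch, where the csv and column names are just placeholders for however the standards get recorded:

import numpy as np
import pandas as pd

# Placeholder file / column names: a table of the known standards and their fluorescence readings
std = pd.read_csv("dna_standards.csv")
linear = std[(std["ng_dna"] >= 1) & (std["ng_dna"] <= 1000)]  # stick to the ~1 ng to 1 ug linear range

# Fit fluorescence as a linear function of input DNA, then invert it for unknown samples
slope, intercept = np.polyfit(linear["ng_dna"], linear["fluorescence"], 1)

def fluorescence_to_ng(fluor):
    return (fluor - intercept) / slope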

7/3/23 update:

Just tried AccuGreen from Biotium. Worked like a charm. They suggest mixing in ~200uL of DNA dye reagent (in this case, added to 1uL of DNA already pipetted in), but I tried 100uL and 50uL as well, and if anything, 100uL arguably looked the best.

Also, I just ran these on the same plate Olivia had used for her samples on Friday. So in running these new samples, I ended up re-measuring those same samples 3 days later. And, well, they looked quite similar. So the prepared samples are quite stable, even after sitting on the bench for 3 days.

Oh, and lastly, this is what it looked like doing absorbance on 2uL samples on the Take3 plate. Looks perfectly fine, although as expected, the sensitivity isn't nearly as good as AccuGreen. That said, you do get more linearity at really high concentrations (4000 ng/uL), so that's kind of nice.

Quantitating cells via the plate reader

In moving down the floor and having some more space, we’ll be purchasing a plate reader. Based on previous experience / history, I’ve started by testing the BioTek Synergy H1. We’re going to put it through a number of the more traditional paces (such as DNA and protein quantitation), but we clearly have a bunch of cell-based experiments and potential assays that are worth testing on it. Since I’m curious about some of the possibilities / limitations here, I’ve taken a pretty active role in testing things out.

For these tests, I essentially did a serial dilution of various fluorescent protein expressing cells, to assess what the dynamic range of detection could be. Here, I did half-log dilutions of the cells in a 96-well plate, starting with 75,000 cells (in 150uL) for the "highest" sample and doing 6 serial dilutions from there down to 75 cells per well, with a final row of media only to tell us the background fluorescence. Everything was plated in triplicate, with analyses done on the averages.

For the purposes of not having a ton of graphs, I'm only going to show the background-subtracted graphs, where I've fit a linear model based on the perceived dynamic range and plotted black dots showing the datapoints that the linear model predicts, to see how well it corresponds to the actual data (in color).
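
For what it's worth, the modeling itself is nothing fancy. Here's a minimal sketch of the idea, assuming arrays of cell numbers and background-subtracted fluorescence (fitting in log-log space is an assumption on my part, since the plots are on log scales):

import numpy as np

def fit_linear_range(cells, signal, lo, hi):
    # cells: cells plated per well; signal: background-subtracted fluorescence (same order)
    # lo, hi: cell-number bounds of the perceived linear range (judged by eye)
    cells, signal = np.asarray(cells, float), np.asarray(signal, float)
    in_range = (cells >= lo) & (cells <= hi)
    slope, intercept = np.polyfit(np.log10(cells[in_range]), np.log10(signal[in_range]), 1)
    predicted = 10 ** (slope * np.log10(cells) + intercept)  # the "black dot" predictions
    return slope, intercept, predicted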

The plot above shows serial dilutions of high UnaG-expressing landing pad cells. Seems like a decent linear range of cell number-dependent fluorescence there, where we can see perhaps a little more than two orders of magnitude of predictable range. Of course, this is one of the least relevant (but easiest to measure) cases, so it's not super applicable to most of our experiments.

On the other hand, this is a situation that is far more relevant to most of our engineered cell lines. Here, it's the same serial dilutions of cells, except we're looking for histone 2A-fused mCherry. The signal here is going to be lower for three reasons: it's behind an IRES instead of cap-dependent translation, red fluors are typically less bright than green fluors, and the histone fusion limits the fluorescence to the nuclei, so the per-cell amount of fluorescence is decreased. Here, there was roughly a 20-fold range to the assay down from a confluent well. Maybe useful for some experiments quantitating the effects of some perturbation on cell growth / survival (without exogenous indicators!), but obviously we can't expect to reliably quantitate more than that 20-fold effect.

Those are direct cell counting assays (or, well, relative counts based on total fluorescence), but what about enzymatic assays, such as the common cell counting / titering assays based on dye conversion? I could certainly see these assays having more sensitivity due to a "multiplier" effect through that enzymatic activity. Well, this is what it looks like for Resazurin / Alamar Blue / Cell Titer Blue.

The above is based on Resazurin fluorescence, where we have a pretty decent 2-log range, although even with these conditions, the confluent wells already saturated dye conversion and lost linearity.

Here, I'm testing a different cell titering dye, CCK-8 / WST-2, which only uses absorbance (A460) as its readout. The linear range of this assay seemed to be a lot less; roughly 1 order of magnitude. I'm not showing the resazurin conversion absorbance (A570, A600) here, but it looked pretty similar in overall range to CCK-8.

How about timing for the Resazurin and CCK-8 assays? Well, here is what it looks like… Note, I didn't do the same linear modeling here, b/c I got lazy, but you can still tell what may be informative by eye:

So clearly, we lose accuracy at the top end but gain sensitivity at the bottom end by letting the reaction run overnight. This is looking at fluorescence. How about absorbance? That's below:

Same shift toward increased sensitivity for looking at fewer cells, but same dynamic range window, so we lose accuracy in the more confluent wells as it shifts.

Here, the overnight incubation really didn’t do anything. Same linear range, really.

Lentivector supe collection

Most lentiviral production protocols (usually with VSV-G pseudotyped particles) tell the user to collect the supe at 48 or 72 hours. My protocols tend to say to collect the supe twice a day (once when coming into the lab, and once when leaving) starting at 24 hours and ending a few days later (96 hours? more?).

This largely stems from a point in my PhD when I was generating a bunch of VSV-G pseudotyped lentiviral particles to study the "early stage" of the HIV life cycle (ie. the points preceding integration into the genome, such as the trafficking steps to get into the nucleus). After thinking about the protocol a bit, I realized that there's really nothing stopping the produced VSV-G pseudotyped particles from attaching to and re-entering the cells they emerged from, which is useless for viral production purposes. Even the particles that are lucky enough not to re-enter the producer cells are going to be more stable in a lower-energy environment (such as 4°C) than floating around in the supe at 37°C in the incubator.

But, well, data is always better for backing up such ideas. So back in April 2014 (I know this since I incorporated the date into the resulting data file name), I did an experiment where I produced VSV-G pseudotyped lentiviral particles as normal, and collected the supe at ~12 hour intervals, keeping the collections separate in the fridge. After they were all collected, I took ~10uL from each collected supe, put them on target cells, and measured luciferase activity a couple of days later (these particles had a lentiviral vector genome encoding firefly luciferase). Here's the resulting data.

Some observations:

  • Particles definitely being produced by 24 hours, and seemingly reaching a peak production rate between 24 and 48 hours.
  • The producer cells kept producing particles at a reasonably constant rate. Sure, there was some loss between 48 and 72 hours, but still a ton being produced.
  • I stopped this experiment at 67 hours, but one can imagine extrapolating that curve out, and presumably there’s still ample production happening after 72 hours.

So yea, I suppose if the goal is to have the highest singular concentration, then taking a single collection at 48 or 72 hours will probably give you that. That said, if the goal is to have the highest total yield (which is usually the situation I’m in), then it makes much more sense to collect at various intervals, and then use the filtered, pooled supe in downstream experiments.

Also, I consider being able to dig up and discuss 10-year old data as a win!

Trimming and tabulating fastq reads

Anna has been testing her transposon sequencing pipeline, and needed some help processing some of her Illumina reads. In short, she needed to remove the sequenced invariant transposon region (essentially a 5′ adapter sequence), trim the remaining (hopefully genomic) sequence to a reasonable 40nt, and then tabulate the reads, since there were likely going to be duplicates in there that don't need to be considered independently. Here is what I did.

# For removing the adapter and trimming the reads down, I used a program called cutadapt. Here's information for it, as well as how I installed and used it below.
# https://cutadapt.readthedocs.io/en/stable/installation.html
# https://bioconda.github.io/

## Run the commands below in Bash (they tell conda where else to look for the program)
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict

## Since my laptop uses an M1 processor
$ CONDA_SUBDIR=osx-64 conda create -n cutadaptenv cutadapt

## Activate the conda environment
$ conda activate cutadaptenv

## Now trying this for the actual transposon sequencing files
$ cutadapt -g AGAATGCATGCGTCAATTTTACGCAGACTATCTTTGTAGGGTTAA -l 40 -o sample1_trimmed.fastq sample1.assembled.fastq

This should have created a file called “sample1_trimmed.fastq”. OK, next is tabulating the reads that are there. I used a program called 2fast2q for this.

## I like to do this in the same cutadaptenv environment, so in case it was deactivated, here I am activating it again.
$ conda activate cutadaptenv

## Installing with pip, which is easy.
$ pip install fast2q

## Now running it on the actual file. I think you have to already be in the directory with the file you want (since you don't specify the file in the command).
$ python -m fast2q -c --mo EC --m 2

## Note: the "python -m fast2q -c" is to run it on the command line, rather than the graphical interface. "--mo EC" is to run it in the Extract and Count mode. "--m 2" is to allow 2 nucleotides of mismatches.

Nanopore de novo assembly

Plasmidsaurus is great, but it looks like some additional companies aside from Primordium are trying to develop a "nanopore for plasmid sequencing" service. We just tried the Plasmid-EZ service from Genewiz / Azenta(?), partially b/c there's daily pickup from a dropbox on our campus. At first glance, the results were rather mixed. Read numbers seemed decent, but 4 of the 8 plasmid submissions didn't yield a consensus sequence. Instead, all we were given were the "raw" fastq files. To even see whether these reads were useful or not, I had to figure out how to derive my own consensus sequence from the raw fastq files.

After some googling and a failed attempt or two, I ended up using a program called "flye". Here's what I did to install and use it, following the instructions here.

## Make a conda environment for this purpose, and activate it
$ conda create -n flye
$ conda activate flye

## Pretty easy to install with the below command (the bioconda channel was already added to the conda config above)
$ conda install flye

## It didn't like my file being in a path that had spaces (darn Google Drive default naming), so I ended up dragging my files into a folder on my desktop. Just based on the above commands, you should already be able to run it on your file, like so:
$ flye --nano-raw J202B.fastq --out-dir assembled

This worked for all four of the fastq files returned without consensus sequences. Two worked perfectly and gave sequences of the expected size. The remaining two returned consensus sequences that were twice as large as the expected plasmid. Looking at the read lengths, these plasmids did show a few reads that were twice as long as expected. That said, those reads being in there didn't cause the program to return that doubly-long consensus sequence, as it still made that consensus seq even after the long reads were filtered out (I did this with fastq-filter; "pip install fastq-filter", then "fastq-filter -l 500 -L 9000 -o J203G_filtered.fastq J203G.fastq.gz"). So ya, I still haven't figured out why this happened and whether it's real or not, but even a potentially incorrectly assembled consensus was helpful, as I could import it into Benchling, align it with my expected sequence, and see if there were any errors.

After this experience, I've come to better appreciate how well Plasmidsaurus is run (and how good their pipeline for returning data is). We'll probably try the Genewiz Plasmid-EZ service another couple of times, but so far, in terms of quality of the service, it doesn't seem as good.

Plasmidsaurus fasta standardizer

I really like Plasmidsaurus, and it's an integral part of our molecular cloning pipeline. That said, I've found analyzing the resulting consensus fasta file to be somewhat cumbersome, since where they inevitably start their sequence string in the fasta file is rather arbitrary (which I don't blame them for at all, since these are circular plasmids with no particular starting nucleotide, and every plasmid they're getting is unique), and it obviously doesn't match where my sequence starts on my plasmid map in Benchling.

For the longest time (the past year?) I dealt with each file / analysis individually, where I would either 1) reindex my plasmid map on Benchling to match up with how the Plasmidsaurus fasta file was aligning, or 2) manually copy-paste sequence within the Plasmidsaurus fasta file after seeing how things matched up in the alignment.

Anyway, I got tired of doing this, so I wrote a Python script that standardizes things. This will still require some up-front work in 1) running the script on each Plasmidsaurus file, and 2) making sure all of our plasmid maps in Benchling start at the "right" location, but I still think it will be easier than what I've been doing.

1) Reordering the plasmid map.

I wrote the script so that it reorders the Plasmidsaurus fasta file based on the junction between the stop codon of the AmpR gene and the sequence directly after it. You'll thus have to reindex your Benchling plasmid map so it exhibits that same break at that junction point. So, if your plasmid has AmpR in the forward direction, it should look like so on the 5′ end of your sequence:

And like this on the 3′ end of your sequence:

While if AmpR is in the reverse direction, it should look like this on the 5′ end of your sequence:

And like this on the 3′ end of your map:

Easy ‘nuf.

2) Running the Python script on the Plasmidsaurus fasta file.

The python script can be found at this GitHub link: https://github.com/MatreyekLab/Sequence_design_and_analysis/tree/main/Plasmidsaurus_fasta_reordering

If you’re in my lab (and have access to the lab Google Drive), you don’t have to go to the GitHub repo. Instead, it will already be in the “[…additional_text_here…]/_MatreyekLab/Data/Plasmidsaurus” directory.

Open up Terminal, and go to that directory. Then type in “python3 Plasmidsaurus_fasta_standardizer.py”. Before hitting return to run, you’ll have to tell it which file to perform this on. Because of the highly nested structure of how the actual data is stored, it will probably be easier just to navigate to the relevant folder in Finder, and then drag the intended file into the Terminal window. The absolute path of where the file sits in your directory will be copied, so the command will now look something like “python3 Plasmidsaurus_fasta_standardizer.py /[…additional_text_here…]/_MatreyekLab/Data/Plasmidsaurus/PSRS033/Matreyek_f6f_results/Matreyek_f6f_5_G1131C.fasta”

It will make a new fasta file suffixed with “_reordered” (such as “Matreyek_f6f_1_G1118A_reordered.fasta”), which you can now easily use for alignment in Benchling.

Note: Currently, the script only works for ampicillin-resistant plasmids, since that's somewhere between 95 and 99% of all of the plasmids we use in the lab. This means Plasmidsaurus sequencing of the rare KanR plasmid won't work with this method. Perhaps one day I'll update the script to also work with KanR plasmids (ie. the first time I need to run Plasmidsaurus data analysis on a KanR plasmid, haha).
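
For the curious, the core of what the script does is conceptually just a string rotation. Here's a minimal sketch of that idea (not the actual script; the junction sequence is a placeholder and the reverse-complement handling is my own shorthand, so see the GitHub link above for the real thing):

def reorder_plasmid(seq, junction):
    # junction: placeholder anchor spanning the AmpR stop codon and the sequence directly after it
    seq = seq.upper()
    i = seq.find(junction)
    if i == -1:
        # AmpR might be on the other strand, so try the reverse complement
        seq = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
        i = seq.find(junction)
    if i == -1:
        raise ValueError("Junction not found (KanR plasmid, or miscalls in the consensus?)")
    start = i + len(junction)
    return seq[start:] + seq[:start]  # rotate the circular sequence to start right at that junction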

Flow cytometry compensation

So I tend to use fluorescent protein combinations that are not spectrally overlapping (eg. BFP, GFP, mCherry, miRFP670), which circumvents the need for any compensation (at least on the flow cytometer configurations that we normally use). That being said, I apparently started using mScarlet-I in some of our vectors, and there is some bleedover into the green channel if it's bright enough.

These cells only express mScarlet-I, and yet green signal is seen when red MFI values > 10^4… booo……

Well, that's annoying, but the concept of compensation seems pretty straightforward to me, so I figure we can also do it post hoc if necessary. The idea here is to take the red fluorescence value, multiply it by a constant fraction (< 1) representing the amount of bleedover happening into the second channel, and then subtract this product from the original green fluorescence to get the compensated green measurement.

To actually work with the data, I exported the cell measurements shown in the FlowJo plot above as a csv and imported it into R. Very easy to execute the above formula, but how does one figure out the relevant constant value to use for this particular type of bleedover? Well, I wrote a for-loop testing values from 0.0010 to 0.1, and checked whether the adjusted values now resulted in a straight horizontal line with ~zero slope (since then, regardless of red fluorescence, green fluorescence would be unchanged).

Now, if that value becomes too large, more will be subtracted than should be, resulting in an inverse relationship between red and green fluorescence. To make finding the best value easier, I took the absolute value of the resulting slope, which pointed me to 0.0046 as the minimum for mScarlet-I red fluorescence bleeding over into my green channel on this particular flow cytometer with these particular settings.
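
For reference, the sweep is nothing fancy. Here's a minimal sketch of the same logic (shown in Python here rather than the R I actually used; the csv and column names are placeholders):

import numpy as np
import pandas as pd

# Placeholder file / column names for the FlowJo csv export of the mScarlet-I-only control
df = pd.read_csv("mScarletI_only_control.csv")
red = df["red_MFI"].to_numpy(float)
green = df["green_MFI"].to_numpy(float)

best_k, best_abs_slope = None, np.inf
for k in np.arange(0.0010, 0.1001, 0.0002):      # candidate bleedover constants
    compensated = green - k * red                # subtract the estimated bleedover from green
    slope = np.polyfit(red, compensated, 1)[0]   # slope of compensated green vs red
    if abs(slope) < best_abs_slope:              # the flattest line wins
        best_abs_slope, best_k = abs(slope), k

print(best_k)   # came out around 0.0046 on this cytometer with these settings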

Great, so what does the data actually look like once I compensate for this bleedover? Well, with this control data, this is the before and after (on a random subset of 1000 datapoints)

Hurrah. Crisis averted. Assuming we now have a sample with both actual green and red fluorescence (previously confounded by the red-to-green bleedover from mScarlet-I), we can presumably now analyze that data in peace.

Just for fun, here’s a couple of additional samples and their before and after this compensation is performed.

First, here are cells that express both EGFP and mScarlet-I at high levels. You can see that the compensation does almost nothing. This makes sense, since the bleedover contributes such a small percentage of the total green signal (EGFP itself is contributing most of it) that removing that small portion is almost imperceptible.

Here's a sample that's a far better example. Here, there's a bunch of mScarlet-I positive cells (as well as some intermediates), and a smattering of lightly EGFP positive cells throughout. But aside from the shape of the mScarlet-I positive, GFP negative population changing from a 45-degree line to a circular cloud, the overall effects aren't huge. Still, even that is useful, b/c if one didn't look at this scatterplot (and know about the concept of bleedover and compensation), one might interpret that slight uptick in green fluorescence in that population as a real, biologically meaningful difference.

Common Plasmidsaurus Errors

OK, so we all know that Plasmidsaurus nanopore sequencing isn't perfect. Every time I see the mistake at the 5′ end of the IRES sequence I know to ignore it, but there are a bunch of other ones that I still repeatedly run into (though not quite as frequently), such that I don't have them memorized and am not sure if I should be ignoring them right off the bat. Thus, I'm going to keep a list of repeated erroneous calls here on this page so I'm reminded to ignore them in the future.

Visual evidence of individual examples is listed below. But here's a summary of Plasmidsaurus errors to ignore:

  1. IRES – deletions near the 5′ end
  2. mCherry – W63R or Q114R
  3. mScarlet – errors at R71 (sometimes R71G) and S113/L114 (including L114P).
  4. mKG – L96P
  5. Puromycin resistance gene PAC – R18G or L125P
  6. shBleR – Q56R
  7. Silent or frameshift mutations at the NPGP motif at the 3′ end of the P2A sequence

In fact, be very suspicious of any unexpected L -> P mutant through Plasmidsaurus seq. And maybe Q -> R muts too.

Since almost all of my plasmids have this IRES sequence in them, I almost always run across this error (although it's usually a 1nt miscall rather than 2nt like this example).
This Puromycin R18G error is annoying b/c it looks like it could be really problematic.
I don’t use mScarlet-I all that often, but when I do, Plasmidsaurus sometimes gives me this L114P erroneous call.
Here it gets screwed up in the same area but mysteriously called an A insertion, making it an S113fs.
It also has issues with mScarlet-I R71. Sometimes it calls it a silent mutation, but other times it calls it as R71G.
An insertion (which, if true, would cause a frameshift) in the NPGP motif toward the 3′ end of P2A.
Puro L125P
mCherry W63R
mCherry Q114R
mKG L96P
shBleR Q56R.

Edit 2/9/24: Here's another one. A nucleotide insertion in the Asp residue at around position 4 or so of the histone 2A protein.