A personal clingen vignette

Back in 2017, when I was a postdoc and still first learning the human genetics field, I encountered a clinically relevant situation in my own family’s genetics. In short, my bloodline seemed to harbor a quite rare recessive allele in a gene encoding the “battery pack” for a set of critical hormone (and other small molecule) modifying / metabolizing enzymes. When two dysfunctional copies are harbored in the same individual, there are devastating consequences on the developing child.

Figuring this out ended up being an interesting academic exercise, since it was a case where I could synthesize a lot of the knowledge I had built up, from more recently acquired knowledge in human genetics, clinical genetics, genomic technologies, and genomic databases, to somewhat older knowledge I had on literature searches, molecular biology, cell biology, and protein structure-function. Here’s a report I had provided a family member upon embarking on my quest to understand the situation. Aside from my family, I had directly shared it with various friends / colleagues / trainees over the years, but a recent conversation with people in the lab made me realize that I may as well share it more broadly at this point (any lingering minor concerns about publicly sharing “private” health-related information – for both me and at least some of my family members – are offset by what I think is a nice case-study of the importance of science and genetics for understanding human disease).

Note: In the process of having Avery, we did get Anna tested, and she was not a carrier for any variants in POR thus making Avery’s risk of ABS essentially zero (rather than the number I had calculated at the end of the above document). That said, I remember that getting Anna tested was somewhat of a struggle, between it not being fully covered under insurance and there being some miscommunication between our hospital system and the external genetic testing lab, such that the test results were pretty delayed. Such is the US health care system…

Quantitating DNA via the plate reader

As another installment of the “demoing the plate reader” series, we’re also trying to see how well it quantitates DNA. Probably good that I’m actively involved in this process, as it’s making me learn some details about some of the existing and potential assays regularly used by the lab. Well, today, we’re going to look a bit at DNA binding dyes for quantitating DNA.

We have a qubit in the lab, which is like a super-simplified plate reader, in that it only reads a single sample at a time, it only does two fluorescent colors, and the interface is made very streamlined / foolproof, but in that way loses a bunch of customizability. But anyway, we started there, running a serial dilution of a DNA of known high concentration and running it on the qubit (200uL recommended volume). We then took that same sample and ran it on the plate reader. This is what the data looked like:

So pretty decent linearity of fluorescence between 1ug and 1ng. Both the qubit and plate reader gave very comparable results. The qubit readings are missing at the upper end of the range, since it reported those samples as being over the upper limit of the assay (and thus would not return a value). This is reasonable, as you can see from the plate reader values that those samples are above the linear response of the assay.

So what if you instead run the same sample as 2uL on a microvolume plate instead of 200uL in a standard 96-well plate? Well, you get the above result in purple. The data is a bit more variable, which probably makes sense with the 100-fold difference in volume used. Also seems the sensitivity of the assay decreased some, in that the results became unreliable around 10ng instead of the 1ng for the full volume in plate format, although again, I think that makes sense with there being less sample to take up and give out light.

7/3/23 update:

Just tried AccuGreen from Biotium. Worked like a charm. They suggest mixing in ~200uL of DNA dye reagent (in this case, to 1uL of DNA already pipetted in), but I tried 100uL and 50uL as well, and if anything, 100uL arguably even looked the best.

Also, I just used the same plate as Olivia had run the samples on Friday. So in running these new samples, I ended up re-measuring those same samples 3 days later. And, well, it looked quite similar. So the samples for reading are quite stable, even after sitting on the bench for 3 days.

Oh, and lastly, this is what it looked like doing absorbance on 2uL samples on the Take3 plate. Looks perfectly fine, although as expected, the sensitivity isn’t nearly as good as with AccuGreen. That said, you do get more linearity at really high concentrations (4000 ng/uL), so that’s kind of nice.

Kenny gets an R21 to study Sarbecovirus receptor interactions

The Matreyek lab is awarded a 2-year R21 grant to study how various Sarbecovirus (ie. SARS-like coronavirus) receptor binding domain sequences correspond to binding and infection with various ACE2 receptor sequences (eg. variants of human ACE2, or sequences of diverse ACE2 orthologs across mammals) to find the rules governing molecular compatibilities between these viruses and their potential hosts. More information here.

Quantitating cells via the plate reader

In moving down the floor and having some more space, we’ll be purchasing a plate reader. Based on previous experience / history, I’ve started by testing the BioTek Synergy H1. We’re going to put it through a number of the more traditional paces (such as DNA and protein quantitation), but we clearly have a bunch of cell-based experiments and potential assays that are worth testing on it. Since I’m curious about some of the possibilities / limitations here, I’ve taken a pretty active role in testing things out.

For these tests, I essentially did a serial dilution of various fluorescent protein expressing cells, to assess what the dynamic range of detection could be. Here, I did half-log dilutions of the cells in a 96-well plate, starting with 75,000 cells (in 150uL) for the “highest” sample, and doing 6 serial dilutions from there down to 75 cells per well, with a final row with media only to tell us background fluorescence. Everything was plated in triplicate, with analyses done on the average.
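As a quick sanity check on the plate layout described above, here is a minimal sketch of the half-log series arithmetic. The function name `half_log_series` is just something made up for illustration; the only numbers taken from the post are the 75,000-cell starting well and the 6 dilution steps.

```python
def half_log_series(start_cells, n_dilutions):
    """Return cells per well for a half-log serial dilution.

    One half-log step is a 10**0.5 (~3.16) fold dilution, so two steps
    make one full order of magnitude.
    """
    factor = 10 ** 0.5
    return [start_cells / factor ** i for i in range(n_dilutions + 1)]


# 75,000 cells in the top well, followed by 6 half-log dilutions
wells = half_log_series(75_000, 6)
for i, cells in enumerate(wells):
    print(f"dilution {i}: ~{cells:,.0f} cells per well")
```

Six half-log steps span three orders of magnitude, which is why the series starting at 75,000 cells ends at roughly 75 cells per well.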

For the purposes of not having a ton of graphs, I’m only going to show the background-subtracted graphs. In each one, I’ve fit a linear model to the region that I perceived to be the dynamic range, and plotted the values predicted by that linear model as black dots, to see how well they correspond to the actual data (in color).
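For anyone wanting to try this kind of analysis themselves, here is a rough sketch of one way it could be done. It is not the actual analysis code: it assumes the fit is an ordinary least-squares line on log10-transformed values, the window limits (`lo`, `hi`) are hypothetical, and the "fluorescence" numbers are synthetic placeholders generated from a formula rather than measurements.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic placeholder data, NOT measurements: a perfectly linear signal
# of 0.5 units per cell on top of a media-only background of 200 units.
cells = [75_000 / 10 ** (0.5 * i) for i in range(7)]
background = 200.0
raw = [background + 0.5 * c for c in cells]

# Step 1: subtract the media-only background from every well
subtracted = [r - background for r in raw]

# Step 2: fit only within the window judged (by eye) to be the dynamic range
lo, hi = 700, 75_000  # hypothetical window, in cells per well
in_range = [(c, s) for c, s in zip(cells, subtracted) if lo <= c <= hi and s > 0]
slope, intercept = fit_line(
    [math.log10(c) for c, _ in in_range],
    [math.log10(s) for _, s in in_range],
)

# Step 3: predict every well from the model (the "black dots")
predicted = [10 ** (intercept + slope * math.log10(c)) for c in cells]
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope near 1 in log-log space means the signal scales proportionally with cell number; wells where the real data peel away from the predicted values mark the edges of the usable range.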

The above shows serial dilutions of high UnaG expressing landing pad cells. Seems like a decent linear range of cell number-dependent fluorescence there, where we can see perhaps a little more than two orders of magnitude in predictable range. Of course, this is one of the easiest cases to measure, but also one of the least relevant to most of our experiments.

On the other hand, this is a situation that is far more relevant to most of our engineered cell lines. Here, it’s the same serial dilutions of cells, except we’re looking for histone 2A-fused mCherry. The signal here is going to be lower for three reasons: it’s behind an IRES instead of cap-dependent translation, red fluors are typically less bright than green fluors, and the histone fusion limits the fluorescence to the nuclei, so the per-cell amount of fluorescence is decreased. Here, the assay had roughly a 20-fold range down from a confluent well. Maybe useful for some experiments quantitating effects of some perturbation on cell growth / survival (without exogenous indicators!) but obviously can’t expect to reliably quantitate more than that 20-fold effect.

Those are direct cell counting assays (or, well, relative counts based on total fluorescence), but what about enzymatic assays, such as common cell counting / titering assays based on dye conversion? I could certainly see these assays potentially having more sensitivity due to a “multiplier” effect through that enzymatic activity. Well, this is what it looks like for Resazurin / Alamar Blue / Cell Titer Blue.

The above is based on Resazurin fluorescence, where we have a pretty decent 2-log range, although even with these conditions, the confluent wells already saturated dye conversion and lost linearity.

Here, I’m testing a different cell titering dye, CCK-8 / WST-8, which only uses absorbance (here read at A460) as its readout. Here, the linear range of the assay seemed to be a lot less; roughly 1 order of magnitude. I’m not showing resazurin conversion absorbance (A570, A600) here, but it looked pretty similar in overall range to CCK-8.

How about timing for the Resazurin and CCK-8 assays? Well, here is what it looks like… Note, I didn’t do the same linear modeling here, b/c I got lazy, but you can still tell what may be informative by eye:

So clearly, we lose accuracy at the top end but gain sensitivity at the bottom end by having the reaction run overnight. This is looking at fluorescence. How about absorbance? That’s below:

Same shift toward increased sensitivity for looking at fewer cells, but same dynamic range window, so we lose accuracy in the more confluent wells as it shifts.

Here, the overnight incubation really didn’t do anything. Same linear range, really.

LP publications

For my own curiosity (as well as for some data in case I want to show to granting agencies that I’m not hoarding the research tools / materials that I make), I keep track of the papers that have used the landing pad cells / materials in their work. Here’s a chart of publications using some version of the LP cells over time.

The first few papers were mine, but from there, there have been roughly 10 or more papers per year. It’s seemingly been a little higher in the most recent couple of years, although the 2023 date has a dotted line projection for the end of year number (in the instance of my originally writing this paragraph, it was early June).

Manuscript acceptance timing

I’ve now been an author on enough papers to have a reasonable sampling of what the experiences can be like. In short, it minimally takes 2 months to go from initial submission to eventual manuscript acceptance. These experiences typically are those that require little to no experiments for revision. The process, especially when requiring hefty experiments or multiple rounds of revision, can easily stretch to half a year. In some cases it can take *much* longer (in one experience I’ve seen, a journal did the “rejected but amenable to resubmission if sufficient additional impact is added”, which resulted in an informal span of ~ 800 days!!! ). Anyway, at least for the manuscript submissions I was involved with where I had access to an author portal (or received emails when things happened), I noted when the reviews were returned and revisions submitted, to also keep track of how much of that time was technically under one’s control (manuscript with authors; red) or completely out of one’s control (manuscript with journal; blue). See below:

Also, part of the reason it makes sense to post manuscripts to bioRxiv; why have a completed manuscript that is essentially publication-worthy sit in the dark for half a year?

Note: Obviously this data is for manuscripts that eventually got accepted. Rather pleasantly, I’ve never been first or corresponding author for a paper that got rejected, so I don’t have nearly as good a sampling of that experience. But sitting on the sideline as a middle-author for a handful of such occasions, it would seem to take anywhere from a week (eg. immediate desk rejection) to a couple of months (eg. rejected upon peer review) per submission; when sequentially shopping across multiple journals to find a taker, this would seem to add up.

Lentivector supe collection

Most lentiviral production protocols (usually with VSV-G pseudotyped particles) tell the user to collect the supe at 48 or 72 hours. My protocols tend to say collect the supe twice a day (once when coming into the lab, and once when leaving) starting at 24 hours and ending a few days later (96 hours? more?).

This largely stems from a point in my PhD when I was generating a bunch of VSV-G pseudotyped lentiviral particles to study the “early stage” of the HIV life cycle (ie. the points preceding integration into the genome, such as the trafficking steps to get into the nucleus). After thinking about the protocol a bit, I realized that there’s really nothing stopping the produced VSV-G pseudotyped particles from attaching to and re-entering the cells they emerged from, which is useless for viral production purposes. Even the particles that are lucky enough not to re-enter the producer cells are going to be more stable in a lower-energy environment (such as 4°C) than floating around in the supe at 37°C in the incubator.

But, well, data is always better to back such ideas. So back in April 2014 (I know this since I incorporated the date into the resulting data file name), I did an experiment where I produced VSV-G pseudotyped lentiviral particles as normal, and collected the supe at ~12 hour intervals, keeping them separate in the fridge. After they were all collected, I took ~ 10uL from each collected supe, put them on target cells, and measured luciferase activity a couple of days later (these particles had a lentiviral vector genome encoding firefly luciferase). Here’s the resulting data.

Some observations:

  • Particles definitely being produced by 24 hours, and seemingly reaching a peak production rate between 24 and 48 hours.
  • The producer cells kept producing particles at a reasonably constant rate. Sure, there was some loss between 48 and 72 hours, but still a ton being produced.
  • I stopped this experiment at 67 hours, but one can imagine extrapolating that curve out, and presumably there’s still ample production happening after 72 hours.

So yea, I suppose if the goal is to have the highest singular concentration, then taking a single collection at 48 or 72 hours will probably give you that. That said, if the goal is to have the highest total yield (which is usually the situation I’m in), then it makes much more sense to collect at various intervals, and then use the filtered, pooled supe in downstream experiments.

Also, I consider being able to dig up and discuss 10-year old data as a win!

Neutralizing Supe

Purchasing purified antibodies is expensive. Furthermore, purchased antibodies are a black box, from an amino-acid sequence perspective. For example, you may have a favorite anti-HA antibody from a company (my favorite from my PhD was an HRP direct conjugate of 3F10), but you likely have no clue what the sequence of that antibody is. Then again, if one had the sequence of the antibody, then they could order DNA encoding that antibody themselves, and then produce unlimited supplies of the antibody protein.

Well, I was curious enough to try this proof-of-principle. Thus, I ordered a DNA sequence encoding Bamlanivimab. It originally had an EUA to treat people infected with SARS-CoV-2, although the EUA eventually got pulled once variants resistant to it started circulating. Well, I engineered cells stably expressing it. Then to test it, I used it in a neutralization experiment, where I mixed SARS-CoV-2 spike pseudotyped lentiviral particles (encoding GFP) with high ACE2 expressing cells, and simultaneously added various amounts of the presumed Bamlanivimab-containing supernatant, or just supernatant from unmodified 293T cells as a control. Well, here are the results:

So definitely a dose-dependent decrease in pseudotyped virus infection, with the max amount used in this experiment (I believe 4 mL of supe out of 6 mL total in the well, with the cells and virus each also taking 1 mL) giving a greater than 10-fold neutralizing effect. Cool.


So this doesn’t quite count as “synbiofun” since it didn’t work, so it’s not that fun. But figure I may as well post negative data here when we have it…

Based on this paper, I felt compelled to test some of the tricks they had published on that improved recombination efficiency. First up was DNA sequences that may help with nuclear targeting / import. Tried the NFkB DTS (DNA nuclear-targeting sequences) since that seemed to perform the best for them.

Didn’t exactly reproduce what they did, since I wanted to 1) Use my own construct that we normally use for recombination reactions, and 2) insert the sequence at a convenient location that I could put in with a single molecular cloning step and didn’t get in the way of any other elements we already had in the plasmid.

We cloned the sequences into the “G1180C_AttB_ACE2(del)-IRES-mScarletI-H2A-P2A-PuroR” backbone, where we could use the percentage of mScarlet fluorescent cells to tell us if recombination efficiency increased. Because of the repetitive nature of the NFkB DTS sequence, neither of the two clones we ended up with had exactly the intended sequence: clone D had the indicated NFkB DTS sequence plus an additional repeat (for 5 repeats total), while clone E had the NFkB DTS sequence missing a repeat (for 3 repeats total).

Clone D
Clone E

Sarah recombined these into landing pad HEK 293T cells, and these were the results of red+ cells.
Negative control: 0.002%
G1180C (unmodified): 17.4%
Clone D (5 repeats): 4.78%
Clone E (3 repeats): 17.8%

So yea. Really didn’t seem to do anything. Not sure why Clone D is worse, although this is an n=1 experiment. If we really wanted to continue this, we would probably need to re-miniprep the plasmids to make sure it’s nothing about that specific prep. That said, nothing about the above results makes me optimistic that this will actually help our current system, so this avenue is likely going on ice.

Trimming and tabulating fastq reads

Anna has been testing her transposon sequencing pipeline, and needed some help processing some of her Illumina reads. In short, she needed to remove the sequenced invariant transposon region (essentially a 5′ adapter sequence), trim the remaining (hopefully genomic) sequence to a reasonable 40nt, and then tabulate the reads, since there were likely going to be duplicates in there that don’t need to be considered independently. Here is what I did.

# For removing the adapter and trimming the reads down, I used a program called cutadapt. Here's information for it, as well as how I installed and used it below.
# https://cutadapt.readthedocs.io/en/stable/installation.html
# https://bioconda.github.io/

## Run the commands below in Bash (they tell conda where else to look for the program)
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict

## Since my laptop uses an M1 processor
$ CONDA_SUBDIR=osx-64 conda create -n cutadaptenv cutadapt

## Activate the conda environment
$ conda activate cutadaptenv

## Now trying this for the actual transposon sequencing files
$ cutadapt -g AGAATGCATGCGTCAATTTTACGCAGACTATCTTTGTAGGGTTAA -l 40 -o sample1_trimmed.fastq sample1.assembled.fastq

This should have created a file called “sample1_trimmed.fastq”. OK, next is tabulating the reads that are there. I used a program called 2FAST2Q for this (the pip package is named fast2q).

## I liked to do this in the same cutadaptenv environment, so in case it was deactivated, here I am activating it again.
$ conda activate cutadaptenv

## Installing with pip, which is easy.
$ pip install fast2q

## Now running it on the actual file. I think you have to already be in the directory with the file you want (since you don't specify the file in the command).
$ python -m fast2q -c --mo EC --m 2

## Note: the "python -m fast2q -c" is to run it on the command line, rather than the graphical interface. "--mo EC" is to run it in the Extract and Count mode. "--m 2" is to allow 2 nucleotides of mismatches.
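
To make the logic of those two steps a bit more concrete, here is a small pure-Python sketch of the same idea: strip the 5′ adapter, keep the first 40nt of what remains, and count identical reads. This is not how cutadapt or 2FAST2Q work internally; it only does exact adapter matching (no mismatches or partial adapters), and the toy reads are made-up sequences rather than real data. The adapter sequence and the 40nt length are the ones from the cutadapt command above; the function names are just for illustration.

```python
from collections import Counter
from io import StringIO

# Adapter and trim length taken from the cutadapt command above
ADAPTER = "AGAATGCATGCGTCAATTTTACGCAGACTATCTTTGTAGGGTTAA"
TRIM_LENGTH = 40


def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def trim_read(seq, adapter=ADAPTER, length=TRIM_LENGTH):
    """Drop everything up to and including an exact adapter match,
    then shorten the read to `length` bases.

    Reads without the adapter are kept and only shortened.
    """
    pos = seq.find(adapter)
    if pos != -1:
        seq = seq[pos + len(adapter):]
    return seq[:length]


def tabulate(handle):
    """Count how many times each trimmed sequence occurs."""
    return Counter(trim_read(seq) for seq in read_fastq_sequences(handle))


# Toy example with made-up "genomic" inserts (NOT real data)
toy_insert_a = "ACGT" * 12   # 48 nt
toy_insert_b = "TTGCA" * 10  # 50 nt
records = [ADAPTER + toy_insert_a, ADAPTER + toy_insert_a, ADAPTER + toy_insert_b]
toy_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(records)
)

counts = tabulate(StringIO(toy_fastq))
for seq, n in counts.most_common():
    print(n, seq)
```

For real files, the `StringIO` object would be swapped for an opened FASTQ file handle. The actual tools are still the better choice for real data, since they tolerate sequencing errors in the adapter and the reads.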