Anh officially leaves the lab, as he graduates from CWRU (in three years!), and will be attending the Rice University Dept of Bioengineering in the fall to start his PhD. Good luck, Anh!
I’ve been keeping track of what the COVID situation has been like at Case since they first started posting the data every week, back in the fall of 2020 (https://case.edu/covid19/health-safety/testing/covid-19-testing-vaccination-and-case-data). Whenever the cases seem to be higher than usual, I’ve been messaging the below graph out to my group, so they can be informed and make the best risk assessments about their activities on campus.
Anyway, figured other people may be interested in this information too, and I’m getting kind of tired of sending the same exact message out like the last four weeks, so I figured I’d just post the plot here so people can see the current stats.
As of writing this (first week of May), cases have been the highest they’ve ever been, although at least almost everyone should be vaccinated and perhaps even boosted. Still, would certainly be nice to see that number come down some…
As you can tell from the above graph, the people in the lab (including me) are by far its most costly resource, accounting for the majority of all lab expenditures. Thus, while there are other important reasons, there’s always this very “bottom line” reason for me wanting to minimize how much personnel time and effort is wasted by confusion and mismanagement!
Here is some real-world data describing the yields we may expect from some of these routine lab procedures or services.
Oh, and this is a good one:
How well my determination of flask “confluency” actually correlated with cell counts. I mean, sure, there must be some error being imparted by the actual measurement of the cells when counting, but I think we all know it’s mostly that my estimate really isn’t precisely informative.
Amplicon-EZ is a pretty convenient service from Genewiz. In short, they’ll perform Illumina sequencing on a 150-500 nt DNA fragment you send them (they’ll perform 2 x 250 cycles of sequencing, so fragments smaller than 500 nt will have regions covered by both paired reads). For $50 per sample, they’ll return ~50,000 reads (although in our experience, they tend to return more than this). Turnaround times can be kind of slow (while you can minimize the delay if you time things perfectly, it’s taken between 14 and 19 days to get data back following submission). That said, we’re still only running full kits a couple of times a year, so it’s obviously a lot faster turnaround than that. Thus, it’s definitely good for getting an initial look into something you may want to sequence more deeply later. My general policy for the lab is that if you make any library, it’s worth submitting the library to Illumina sequencing via Amp-EZ pretty early on so you can be confident that the library is good and worthy of further experiments.
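To make the "smaller than 500 nt" point concrete, here is a minimal sketch of how much of a fragment gets covered by both reads in a 2 x 250 run. The function name is made up for illustration, and it assumes the fragment length refers to the sequenced region itself (not counting adapters).

```python
def paired_read_overlap(fragment_len: int, read_len: int = 250) -> int:
    """Number of bases covered by BOTH the forward and reverse read.

    Assumes fragment_len is the length of the sequenced region only
    (adapters not counted); this is an illustrative simplification.
    """
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    # A read can never cover more than the fragment itself.
    covered = min(read_len, fragment_len)
    # Two reads starting from opposite ends overlap by whatever
    # their combined coverage exceeds the fragment length.
    return max(0, 2 * covered - fragment_len)


for length in (150, 300, 500):
    print(length, "nt fragment:", paired_read_overlap(length), "nt read twice")
```

So a 300 nt fragment would have 200 nt read from both directions, while a 500 nt fragment has none.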
Primers are pretty simple to design. Essentially, you’ll want to make a pair of PCR primers with Amp-EZ adapters on the 5′ ends (and of course, DNA hybridizing sequences on the 3′ ends). As shown in the above link, the adapter sequences are:
For the forward sequencing read: 5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’
For the reverse sequencing read: 5’-GACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’. <- Reverse strand on plasmid map
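As a rough sketch of the primer design described above, the snippet below glues the two adapter sequences onto the 5′ ends of the hybridizing sequences. The target sequence, the 20 nt binding length, and the function names are placeholders for illustration; the binding regions you actually pick should come from your own plasmid map.

```python
# Adapter sequences as listed above.
FWD_ADAPTER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
REV_ADAPTER = "GACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amp_ez_primers(target: str, bind_len: int = 20) -> tuple[str, str]:
    """Build forward and reverse primers for a target region.

    target   : top-strand sequence (5'->3') of the region to amplify
    bind_len : hypothetical length of the hybridizing 3' portion
    """
    if len(target) < 2 * bind_len:
        raise ValueError("target is too short for the requested bind_len")
    # Forward primer anneals to the start of the target (top strand).
    fwd = FWD_ADAPTER + target[:bind_len].upper()
    # Reverse primer anneals to the end, so it is the reverse
    # complement of the last bind_len bases of the top strand.
    rev = REV_ADAPTER + reverse_complement(target[-bind_len:])
    return fwd, rev


# Made-up target sequence, purely for demonstration.
example_target = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC"
fwd_primer, rev_primer = amp_ez_primers(example_target)
print("Forward:", fwd_primer)
print("Reverse:", rev_primer)
```

This does not check melting temperatures or secondary structure, so treat the output as a starting point rather than a finished primer pair.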
Here’s an example of a map and corresponding annotated primers used to sequence the ACE2 Kozak library plasmid, and a similar map for sequencing the library after it’s been integrated into landing pad cells.
Amplifying your fragment
For the actual protocols to do this, it’s probably worth asking Sarah or Nidhi how they do it. The basic steps are going to be PCR, gel extraction of the band, and Qubit quantitation of the extracted DNA. Some things to keep in mind are that they’ll want a fair amount of DNA (500 ng), so you’ll either want to make sure you do a lot of cycles and extract a pretty hefty band, or you’ll need to do a second amplification from the initially extracted DNA.
About the same time I got inspired to try making the vesicular protease assay, I figured I’d also build an assay to try to look at cell-surface localization of proteins.
Going back to that post about dL5 and malachite green (MG), I believe that MG doesn’t cross the plasma membrane particularly well (although now that I look at it, there are ways to both increase and decrease its cell permeability). Thus, I figured I’d test the dL5 fluorogen activating peptide in two forms: one where it’s sent out to the cell surface using a signal peptide (but anchored to the plasma membrane with a transmembrane domain), and another where it’s expressed as an intracellular, cytoplasmic protein.
Well, we made the constructs, and Sarah recombined the cells and tested them, and here are the results.
Well, ignore the blue distribution for now (since this is a construct testing a different hypothesis), and only look at the red and orange distributions. As you can tell, in the absence of MG, the signal is pretty low (I should probably throw in some unrecombined landing pad cells on that plot to show what the background level is). On the other hand, MG addition causes cells encoding the extracellular dL5 to exhibit ~ 3e4 near infrared MFI, while the intracellular dL5 cells had about 1e4 MFI. So while that’s only a 3-fold difference, the standard deviations of those distributions were pretty tight, such that there was rather small overlap between those two distributions. So while this is a single, one-off experiment, it looks like this assay format may work.
There are probably some additional knobs that can be turned to try to improve signal over background. First, maybe this effect is somewhat MG concentration dependent, and reducing the amount of MG that is added may add some more dynamic range. Also, there are those less cell permeant MG derivatives, which will likely improve the range (albeit, these are likely far harder to get than OG MG).
Obviously we make a lot of recombinant DNA constructs and create a bunch of different assays to try to understand biology. Well, at some point I became curious if I could come up with a protease cleavage assay for measuring how various peptide sequences could be substrates for vesicular proteases inside cells. I figured we’d start to test this by looking at furin cleavage, since 1) furin is quite ubiquitous, and 2) furin cleavage is relevant for multiple disease-relevant membrane proteins, like many viral entry glycoproteins, including SARS-CoV-2 spike, where the furin cleavage site is thought to contribute to its dynamics of spread and pathogenicity during infection.
Well, the first construct that may have worked is a pretty simple one, where I have EGFP targeted to extracellular space using an N-terminal signal peptide, but retained on the cell surface by having it C-terminally fused to a transmembrane domain. Anh has actually been using this construct as part of his undergraduate research project. Well, to turn this construct into one that can study furin cleavage, I modified it to encode an R-R-A-R peptide between EGFP and the transmembrane domain. Thus, if furin cleaves this construct, the EGFP protein is now no longer tethered to the cell, and presumably escapes into the media once it reaches the cell surface.
So this is an N=1 experiment so far, so it’s not the most conclusive. That said, there does seem to be decent separation between the construct with and without the furin cleavage site, where cells with the construct with the EGFP that potentially gets released had ~ 10-fold less fluorescence than cells with the construct that could not get released. I suppose if this reproduces, it could be worth trying to turn this into a library-based experiment for studying protease cleavage within intracellular compartments.
… Though now that I’m thinking about it further, I’ll definitely need to ask how these cells were prepped. If they were prepped with trypsin (which is likely), this was likely more of a trypsin cutting assay rather than a furin one.
Well, so to avoid the whole trypsin complication, we recombined the above constructs in suspension HEK cells that are floating and don’t ever need to be detached off the plate for flow. Here’s what that plot looks like.
So ya, we’re still seeing the ~ 10-fold effect there, so it looks like that’s the real dynamic range of the assay for measuring furin protease cleavage.
It’s taken us a while, but at this point we’ve now submitted two multiplex NextSeq Illumina kits through CWRU genomics. We’ve performed the multiplexing by appending dual indices during targeted amplification of the barcodes. In order to mix everything together, we’ve taken a rather simple approach of gel extracting the amplified bands and quantitating the amount of extracted DNA using Qubit, and mixing every extracted amplicon to equimolar amounts. This mixture is then submitted to the CWRU genomics core for qPCR-based quantification, and loaded onto the kit for sequencing.
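For anyone wanting to see the arithmetic behind the equimolar mixing step, here is a minimal sketch. It assumes the commonly used approximation of ~660 g/mol per base pair of double-stranded DNA; the sample names, concentrations, amplicon lengths, and the 50 fmol target are all made up for illustration and are not our actual values.

```python
AVG_BP_MASS = 660.0  # approx. g/mol per base pair of dsDNA (assumption)


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a Qubit reading (ng/µL) to molarity (nM) for dsDNA."""
    if conc_ng_per_ul <= 0 or length_bp <= 0:
        raise ValueError("concentration and length must be positive")
    # ng/µL equals mg/L, so dividing by g/mol and scaling by 1e6 gives nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(samples: dict, fmol_each: float) -> dict:
    """Volume (µL) of each sample needed to contribute fmol_each.

    samples maps a sample name to (ng/µL, amplicon length in bp).
    """
    volumes = {}
    for name, (conc, length_bp) in samples.items():
        nM = ng_per_ul_to_nM(conc, length_bp)  # 1 nM == 1 fmol/µL
        volumes[name] = fmol_each / nM
    return volumes


# Hypothetical Qubit readings and amplicon lengths.
example = {
    "sample_A": (10.0, 300),
    "sample_B": (4.2, 300),
    "sample_C": (7.5, 320),
}
for name, vol in equimolar_volumes(example, fmol_each=50).items():
    print(f"{name}: {vol:.2f} µL")
```

Each sample ends up contributing the same number of molecules to the pool, so any unevenness in the final read counts reflects error in the quantitation or pipetting rather than the math.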
Since everything has been mixed to equimolar amounts so far, it’s quite simple to see how well our samples were quantitated and mixed together using the Qubit readings. This is done by taking the number of paired reads associated with each pair of indices, and seeing how their counts / frequencies compare to each other. The distributions tended to be a log-normal distribution, as seen in the below plots:
The second kit seemed to do a little better than the first one. Each kit ended up having a single poorly sequenced outlier, with the sample in the first kit lower than the median by ~300-fold, and the sample in the second kit lower by about 10-fold. The first kit also had about 4 samples that were below the median by about 10-fold.
Regardless, this log-normal distribution is showing what kind of mixing precision we can expect to consistently get with this scheme, even when it’s working well. Both distributions had a sd(log) of ~0.5. The coefficients of variation were ~ 0.04.
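Here is a minimal sketch of how one might summarize per-sample read counts in this way. It assumes the logs are base 10 and that the sample standard deviation is used, neither of which is spelled out above; the read counts are invented for demonstration.

```python
import math
import statistics


def summarize_read_counts(counts: dict) -> dict:
    """Summarize per-sample read counts on a log10 scale.

    counts maps a sample (index pair) name to its paired read count.
    Returns the median count, the sd of log10(counts), and how many
    fold each sample sits below the median (values < 1 mean above).
    """
    if len(counts) < 2 or any(c <= 0 for c in counts.values()):
        raise ValueError("need at least two samples with positive counts")
    median = statistics.median(counts.values())
    logs = [math.log10(c) for c in counts.values()]
    return {
        "median": median,
        "sd_log10": statistics.stdev(logs),  # sample sd (assumption)
        "fold_below_median": {k: median / c for k, c in counts.items()},
    }


# Invented read counts for five index pairs.
example_counts = {
    "idx01": 2_100_000,
    "idx02": 900_000,
    "idx03": 1_500_000,
    "idx04": 150_000,
    "idx05": 3_300_000,
}
summary = summarize_read_counts(example_counts)
print("median reads:", summary["median"])
print("sd(log10):", round(summary["sd_log10"], 2))
for name, fold in summary["fold_below_median"].items():
    print(f"{name}: {fold:.1f}-fold below median")
```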
What are the red lines you ask? Well, those are the idealized / hypothetical number of reads we should have gotten if we got all of the reads from the kit (130 million for the mid kit used in kit 1, and 400 million for the high kit used in kit 2) divided by the total number of samples. Clearly, the real life read numbers aren’t hitting that idealized number, which is as expected.
To frame it another way though, sure, reading a sample with too much read-depth is inefficient, but what is arguably more annoying is if we end up under-reading a sample because we were off in our estimates. What seems like a reasonable approach is to choose a target read-count that will ensure that ~95% of our log-normal distribution hits the minimum read-number needed to get interpretable data. Looking at these n = 2 results, it seems reasonable that we would want to give each sample ~ 5 to 10-fold more reads than one would expect based on idealized numbers.
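A back-of-the-envelope version of that reasoning can be sketched as follows. This assumes the sd(log) of ~0.5 is in log10 units and that the distribution is centered on the targeted read count, which, per the red lines, is optimistic. The sample count and minimum read number are hypothetical.

```python
from statistics import NormalDist


def ideal_reads_per_sample(kit_reads: float, n_samples: int) -> float:
    """Idealized reads per sample if the whole kit were split evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return kit_reads / n_samples


def oversampling_factor(sd_log10: float, fraction: float = 0.95) -> float:
    """Fold-excess to target so `fraction` of samples reach the minimum.

    Assumes read counts are log-normal with the given sd in log10
    units, centered on the targeted read count (a simplification).
    """
    z = NormalDist().inv_cdf(fraction)  # one-sided quantile, ~1.645 for 95%
    return 10 ** (z * sd_log10)


factor = oversampling_factor(0.5)
print(f"fold more reads to target: {factor:.1f}")

# Hypothetical planning numbers: 40 samples on a 400 million read kit,
# each needing at least 1 million reads to be interpretable.
print("ideal reads per sample:", ideal_reads_per_sample(400e6, 40))
print("reads to target per sample:", round(1e6 * factor))
```

With an sd(log10) of 0.5 this lands at roughly 6 to 7-fold, which is in the same ballpark as the ~ 5 to 10-fold eyeballed from the plots.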
Nidhi recently needed to design new reverse primers for making dual-indexed amplicons for Illumina sequencing, but this involved generating novel indexes that could be used with existing primer sets. Luckily, I had already asked Anh to create a script for such a purpose, so once I remembered that he had already done that, all I had to do was find it and implement it. It’s pretty neat, since it goes to a Google Sheet in the cloud (so not a local file), and extracts the existing indices so we know to avoid things similar to them. Since it was a pretty easy script to understand, I ended up making some tweaks to it: 1) Adding a user input function, rather than hard-coding the number of new indices needed into the script itself, 2) Having the script consider the forward and reverse sequences of the existing indices to avoid both, and 3) spitting out an identity matrix of the pairwise comparisons of the existing 10 nucleotide indices, so I can visually verify that none are really close in sequence.
The script lives here in this GitHub repo, with the newest version I modified called “generateIndex_user_input.py”.
To be specific, rather than an identity matrix, it’s actually a distance matrix (how many positions within the 10 nucleotides are NOT identical), with the white diagonal in the above plot essentially a positive control, since that’s showing that in the pairwise comparisons, the same sequence when compared to itself shows 0 nucleotides of difference. Notably, there are no other purely white blocks in that matrix, partially b/c 1) the chance that two randomly generated sequences are identical is ~ 1/(4^10), which is a very small number, and 2) Anh’s script is likely working, and keeping the number of identical matches between any newly generated indices and existing indices to a minimum. Actually, it looks like most differ by 7 or 8 nucleotides, and only a tiny fraction differ by only 3 nucleotides or so (and thus match at 7 positions). If those are the combinations with the highest similarity, it really isn’t bad at all.
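For illustration, here is a minimal sketch of the general idea (this is not Anh’s actual script, and it skips the Google Sheet step entirely). It generates random 10 nt indices, rejects any that are too close to existing ones, and builds the pairwise distance matrix. The minimum distance of 3, the example indices, and the choice to treat "reverse" as the reverse complement are all assumptions.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def generate_indices(existing, n_new, length=10, min_distance=3,
                     seed=None, max_tries=100_000):
    """Draw random indices at least min_distance away from everything else."""
    rng = random.Random(seed)
    # Avoid existing indices in both orientations (assumed: reverse complement).
    avoid = list(existing) + [reverse_complement(s) for s in existing]
    new = []
    for _ in range(max_tries):
        if len(new) == n_new:
            return new
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(cand, s) >= min_distance for s in avoid + new):
            new.append(cand)
    if len(new) == n_new:
        return new
    raise RuntimeError("could not find enough indices; relax min_distance")


def distance_matrix(seqs):
    """Pairwise Hamming distances; the diagonal is always 0."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


existing = ["ACGTACGTAC", "TTGACCATGA"]  # made-up existing indices
new = generate_indices(existing, n_new=3, seed=0)
for row in distance_matrix(existing + new):
    print(row)
```

As a sanity check on the plot, two random 10-mers mismatch at each position with probability 3/4, so the expected distance is 7.5, which fits with most pairs differing by 7 or 8 nucleotides. The chance of an exact match is 1/4^10, or about 1 in a million.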
So ya, I think we’re doing a decent job of managing our indices to make sure they are not overlapping!
So I had collected this data over 10 months ago, when I made an initial foray into trying the fluorogen activating peptide dL5 in the presence of Malachite Green, a dirt cheap small molecule (you can use it to try to kill fungus eating at the fish in your aquarium). A recent email reminded me about fluorogen activating peptides again, so I went back and looked at this data again, and it really isn’t bad.
In summary: 1) Works well, giving ~ 100-fold increase to fluorescence from background when highly expressed. 2) Probably want to do at least 500 ng/mL or perhaps even 1 µg/mL to improve signal over background. 3) Still get specific signal after washout, but you may as well leave it on and get max signal.