Nidhi Shukla, PhD, joins the Matreyek lab as a postdoc, exactly one year after the lab started. Welcome, Nidhi!
TL;DR -> If you’re doing infection assays, it’s worth sticking to < 20% EGFP positive, and depending on how many cells you count, > 0.03% EGFP positive cells.
Can’t say I expected it, but the SARS CoV-2 pandemic has brought virology back into my sights. This includes some familiar tools, such as lentiviral pseudovirus infection assays, which I used a lot in graduate school. It’s the same basic process as using lentiviral vectors for cell engineering, except you’re swapping out the transgene of interest for a reporter protein, like EGFP. If the pseudovirus is able to (more-or-less) accurately model an aspect of the infection process, then you now have yourself a pretty nice molecular / cell biology surrogate to learn some biology with. Essentially, it tells you how efficiently the virus is getting into cells. Since all of the “guts” can look like HIV, you can use it to study parts of the HIV life cycle after it’s entered the cell but before it’s integrated into the host cell genome (this is what my PhD was in). Its arguably more common use is instead as a vehicle for studying the protein interactions that various viruses make to enter the cell cytoplasm.
The loveliness of the assay (especially the fluorescent protein version) is that it is inherently a binary readout (infected or uninfected) at a per-cell level. But, since there are *MANY* cells in the well, it becomes a probability distribution, where the number of infected cells captures how efficient the conditions of entry / infection were.
But of course there is also some nuance to the stats here. For example, if you mixed a volume containing 100,000 cells with another volume containing 100,000 viral particles (and thus a multiplicity of infection (MOI) of 1), not all of the cells get infected. In actuality, you would only get about 63% of cells getting infected. That’s b/c each virus isn’t a heat-seeking missile capable of infecting only the remaining uninfected cells around; instead, each infection event is (more or less) an independent event. Thus, cells can be multiply infected. Notably, based on an EGFP readout, you can’t really figure that out (i.e. cells that are doubly infected are essentially indistinguishable from singly infected cells based on their fluorescence levels). But, since we know how the system should function, we know that it should largely follow a Poisson distribution.
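To make the Poisson argument concrete, here’s a minimal Python sketch (not the script behind the plot; the function names are just ones I made up for illustration). It assumes every infection event is independent, so the number of events per cell is Poisson-distributed with a mean equal to the MOI.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells with at least one infection event (Poisson)."""
    # P(k >= 1) = 1 - P(0) = 1 - e^-moi
    return 1.0 - math.exp(-moi)


def fraction_multiply_infected(moi: float) -> float:
    """Expected fraction of cells with two or more infection events."""
    # P(k >= 2) = 1 - P(0) - P(1) = 1 - e^-moi * (1 + moi)
    return 1.0 - math.exp(-moi) * (1.0 + moi)


def moi_from_fraction(fraction_positive: float) -> float:
    """Back out the effective MOI from an observed fraction of EGFP+ cells."""
    return -math.log(1.0 - fraction_positive)


for moi in (0.01, 0.1, 0.3, 1.0, 3.0):
    print(f"MOI {moi:>4}: {100 * fraction_infected(moi):5.1f}% infected, "
          f"{100 * fraction_multiply_infected(moi):5.1f}% multiply infected")
```

At an MOI of 1 this gives 1 - e^-1, or about 63% infected, which is where the number above comes from. Going the other way, `moi_from_fraction(0.20)` is about 0.22, so at 20% EGFP positive the readout is already under-reporting the true number of infection events a bit.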
Here’s a plot I made with this script to demonstrate what that relationship is between the MOI and the % of green cells you should expect to see:
Things look nice and linear at low MOIs, but as soon as you start getting close to an MOI of 1, the number of multiply infected cells starts taking on a bigger role, until eventually every cell becomes multiply infected as the last few “lucky” uninfected cells get so overrun that there is no escaping infection. In terms of trying to get nice quantitative data, that saturation of the assay is kind of problematic. Generally speaking, I find it’s somewhere around 20-30% GFP positive cells where you really start losing that linearity (shown by the dotted line).
Why not just always do small numbers, like 1% or even 0.1%? Well, b/c you’re kind of caught between a rock and a hard place: I guess the rock is the unbreakable boundary of 100% infected cells, while if you go too low it gets hard to sample enough events to accurately quantitate something that’s really rare.
I do think there’s still a lot of meat in that assay though, as long as you count sufficient cells. I generally quantitate the number of infected + uninfected cells using flow cytometry. Being able to count 10,000+ events a second means you can run through all of the cells of a 24-well in under a minute. That means you can pretty reasonably sample 100,000 cells per condition. Assuming that’s the case, how does our ability to accurately quantitate our sample drop as the number of infected cells falls?
IMO, it begins to become unusable when you get to roughly 0.03 percent GFP positive cells. That’s because past that, the amount of variability you get in trying to measure the mean is roughly the size of the mean itself. Furthermore, your chances of encountering replicate experiments where you get *zero* GFP positive cells really start to increase. And those zeros are pretty annoying, since they are quite uninformative.
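If you want to play with this yourself, here’s a small Python sketch of the counting statistics (again, not my actual script; it just treats every counted cell as an independent draw, so the number of GFP positive cells is binomial). Where exactly the spread catches up to the mean depends heavily on how many cells you actually end up counting per replicate, so treat the cell numbers in the loop as placeholders.

```python
import math


def counting_stats(fraction_positive: float, cells_counted: int):
    """Return (expected positives, coefficient of variation, P(zero positives))."""
    expected = fraction_positive * cells_counted
    # Binomial standard deviation divided by the mean
    cv = math.sqrt((1.0 - fraction_positive) / expected)
    # Chance that a replicate contains no GFP positive cells at all
    p_zero = (1.0 - fraction_positive) ** cells_counted
    return expected, cv, p_zero


for cells in (1_000, 10_000, 100_000):  # placeholder numbers of counted cells
    for pct in (1.0, 0.1, 0.03, 0.01):
        expected, cv, p_zero = counting_stats(pct / 100.0, cells)
        print(f"{cells:>7} cells, {pct:>5}% positive: "
              f"expect {expected:8.1f} events, CV {cv:5.2f}, P(zero) {p_zero:.3f}")
```

The two things to watch are the CV (once it approaches 1, the noise is as big as the signal) and P(zero), which is the fraction of replicates you should expect to come back completely empty.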
So just understanding the basic stats, it looks like my informative range is going to be ~ 0.03 percent positive to ~ 30 percent positive. So roughly 3 orders of magnitude. Ironically, that’s just about exactly what I saw when I ran serial dilutions of a SARS CoV pseudovirus on ACE2 overexpressing cells.
So there you have it. The theoretical range of the assay, backed experimentally by what I saw in real life. The wonderfulness of science. You can find the script I used to make these estimations here.
We do a lot of molecular cloning in the lab. Sarah has been working on a great protocols.io page dedicated to writing up the entire process, which I’ll link to once it’s completely set. But once you’ve successfully extracted DNA from individual transformants, a key step in the molecular cloning pipeline is finding the clonal DNA prep that has the intended recombinant DNA in it. We use Sanger sequencing to identify those clones. That is what this instructional post will be about.
First, gather all of your data. I have a plasmid map folder on the lab google drive, where all of the physical .gb files for each unique construct are stored. Create a new folder (named after your .gb file), and *COPY* all of the relevant Sanger sequencing traces (these are .ab1 files) into that folder. That will just help organize everything down the road. *PREFIX* the Sanger sequencing files with the clone id (usually “A”, “B”, or “C”, etc). This will make things easier to interpret down the road.
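If you’d rather not do the copying and renaming by hand, something like the following Python sketch would work. Everything here is hypothetical (the function name, the folder paths, the file names); it just copies each trace into a folder named after the construct and sticks the clone id on the front.

```python
import shutil
from pathlib import Path


def collect_traces(traces_by_clone, plasmid_dir, construct_name):
    """Copy .ab1 traces into a construct-named folder, prefixed by clone id.

    traces_by_clone: dict mapping a clone id ("A", "B", ...) to a list of
    paths to that clone's .ab1 files.
    """
    destination = Path(plasmid_dir) / construct_name
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for clone_id, trace_paths in traces_by_clone.items():
        for trace in trace_paths:
            trace = Path(trace)
            target = destination / f"{clone_id}_{trace.name}"
            shutil.copy2(trace, target)  # copy, never move, the originals
            copied.append(target)
    return copied


# Hypothetical usage (these paths do not exist anywhere):
# collect_traces({"A": ["downloads/run1.ab1"], "B": ["downloads/run2.ab1"]},
#                "Plasmid", "my_construct")
```

Copying rather than moving means the original downloads stay untouched if something goes wrong.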
You presumably already have this plasmid map on Benchling, since that’s likely where you designed the plasmid, so open that up. If you don’t already have it on there, then import it. Once open, click on the “alignment” button on the right hand side.
If this is your first time trying to align things to this plasmid, then your only option will be to “Create new alignment”; press this button.
Next you’ll get to another screen which will allow you to add in your ab1 files. Click the choose files button, go into the new folder you had made in the “Plasmid” directory of the lab google drive, and import the selected ab1 files.
If you’ve successfully added the ab1 files, then the screen should look like this:
The default settings are fine for most things, so you can go ahead and hit the “create alignment” button. It will take a few seconds, but at the end you should get a new screen that looks like this.
The above screen is showing you your template at the top, as well as the Sanger seq peaks for each of your Sanger runs. If you’re trying to screen miniprep clones to see which prep might have the right construct, then go to the part of the map / alignment that was at the junction. For example, in this above construct, I had shuttled in the iRFP670 in place of mCherry in the parental construct, so anything that now has iRFP670 in there in place of mCherry is an intended construct. Looks like I went 3/3 this time.
This is a good chance to look for any discrepancies, which are signified by red tick marks in the bottom visualization. I’ve now moved the zoomed portion to that area, and as you can see, the top two clones seem to have an extra G in the sequence triggering these red lines. This is where a bit of experience / intuition is important. Since this is toward the end of the sequencing reaction and the peaks are getting really broad (compare to the crisp peaks in the prior image), the peak-calling program was having a hard time with this stretch of multiple Gs, thus calling it 4 Gs instead of 3. I’m not concerned about this at all, since it’s more of an artifact of the sequencing rather than a legit mutation in the construct. If we were to sequence this area again with a primer that was closer, I’m almost sure all of the constructs will show they only have 3 Gs.
Since all three constructs seem to have the right insertion without any seemingly legit errors (yet), I typically choose one clone and move forward with fully confirming it. All things being equal, I make my life simpler by choosing the clone that is earliest in the alphabet (so A in this case). That said, if all three looked fine but Clone C had by far the highest quality sequencing (say, the Clone A trace only went 400nt while the Clone C trace went 800+), I’d instead go with Clone C.
Next is sequencing the rest of the open reading frame. If you already chose a sequencing primer, then you’ve probably already attached a bunch of primers onto this map. But let’s pretend you haven’t already done this. To do this, go to the “primer” button and hit “attach new primers”, as below.
You may already have a primer folder selected. In that case, you’re all set to go. Otherwise, you’ll have to select the most recent primer folder. Click the “Add locations” area, open the triangle for the folder that says “O_Primers” and select the most recent folder (since this is currently August, this would be “20200811”). If selected, it should show up in the window, like the below picture.
Hit “Find binding sites” and it will come up with a long list of primers *that we already have* that are located in your construct.
This part gets kind of confusing. To actually use these primers, you’ll first have to hit the top left box, located on the header row of the big table of primers. You should then see a check mark next to every primer.
Once you do that, hit the “Attach selected primers” button that’s in green at the top right of the screen.
Once you do that, all of the primers that were previously listed but colored white should now be colored green. NOW you’re free to switch back to your alignment.
Once you switch back to your alignment, the top should have a big yellow box that says “Out of Sync” and have a blue button that says “Realign”. Hit the realign button, which will call up one of your earlier screens, and you just have to hit realign once more.
Now all of the attached primers should show up in the top row of the alignment window. Since I have so many overlapping primers, this eats up a bunch of the space on the screen (you can turn off the primers if needed by clicking on the arrow next to “Template” and clicking off the box next to Primers). Still, now you can move around the map and find primers that will be able to sequence different parts of the plasmid you have not sequenced yet. In this case, I’ll likely just move forward with one clone (such as Clone A), and I’ll start sequencing other parts of the ORF, such as the remaining parts of mCherry which I can likely get with the primer KAM1042.
Congrats, you’re now a Sanger sequencing analysis master.
We do a lot of flow cytometry in the lab. Inevitably, what ends up being the most practical tool for analysis of flow cytometry data is FlowJo. While I’ve been using FlowJo for a long time, I realize it isn’t super intuitive and people new to the lab may struggle with it at first. Thus, here’s a short set of instructions for using it to do a basic process, such as determining what percentage of live cells are also GFP positive.
Obviously, if you don’t have FlowJo yet, then download it from the website. Next, log into FlowJo Portal. I’m obviously not going to share my login and password here; ask someone in the lab or consult the lab google docs.
Once logged in, you’ll be starting with a blank analysis workspace, as below.
Before I forget, an annoying default setting of FlowJo is that it only lists two decimal places in most of its values. This can be prohibitively uninformative if you have very low percentages that you’re trying to accurately quantitate. Thus, click on the “Preferences” button:
Then click on the “Workspaces” button:
And finally once in that final window, change the “Decimal Precision” value to something like 8.
With that out of the way, now you can perform your analysis. Before you start dragging in samples, I find it useful to make a group for the specific set of samples you may want to analyze. Thus, I hit the “Create Group” button and type in the name of the group I’ll be analyzing.
Now that the group is made, I select it, and then drag the new sample files into it, like below:
Now to actually start analyzing the flow data. Start by choosing a representative sample (eg. the first sample), and double clicking on it. By default, a scatterplot should show up. Set it so forward scatter (FSC-A) is on the X-axis, and side scatter (SSC-A) is on the Y-axis. Since we’re mostly using HEK cells, that means the main thing we will be doing in this screen is gating for the population of cells while excluding debris (small FSC-A but high SSC-A). Thus, make a gate like this:
Once you have made that gate, you’ll want to keep it constant between samples. Thus, right click on the “Live” population in the workspace and hit “Copy to Group”. Once you do that, the population should now be in bold, with the same text color as the group name.
Next is doublet gating. So the live cell population will already be enriched for singlets, but having a second “doublet gating” step will make it that much more pure. Here is the best description of doublet gating I’ve seen to date. To do this, make a scatterplot where FSC-A is on the X-axis, and FSC-H is on the Y-axis. Then only gate the cells directly on the diagonal, thus excluding those that have more FSC-A relative to FSC-H. Name these “Singlets”.
And like before, copy this to the group.
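For anyone who thinks better in code, here’s a rough Python sketch of the idea behind the singlet gate. To be clear, this is not what FlowJo is doing (there you just draw the gate by hand); the function names and the 15% tolerance are made up for illustration. Singlets all have about the same FSC-A to FSC-H ratio, so they sit on one diagonal, while doublets have extra area for their height.

```python
import statistics


def typical_area_to_height(events):
    """Median FSC-A / FSC-H ratio; events are (fsc_a, fsc_h) pairs."""
    return statistics.median(a / h for a, h in events if h > 0)


def gate_singlets(events, tolerance=0.15):
    """Keep events whose FSC-A / FSC-H ratio sits near the main diagonal.

    tolerance is a made-up illustrative value: the allowed fractional
    deviation from the median ratio.
    """
    diagonal = typical_area_to_height(events)
    singlets = []
    for fsc_a, fsc_h in events:
        if fsc_h <= 0:
            continue
        ratio = fsc_a / fsc_h
        if abs(ratio - diagonal) <= tolerance * diagonal:
            singlets.append((fsc_a, fsc_h))
    return singlets


# Toy events: three on the diagonal, one with too much area for its height
toy_events = [(120, 100), (60, 50), (240, 200), (200, 100)]
print(gate_singlets(toy_events))
```

Using the median as the “diagonal” only makes sense when most events are already singlets, which should be the case after the first gate.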
Next is actually setting up the analysis for the response variable we were looking to measure. In this case, it’s GFP positivity, captured by the BL1-A detector. While this can be done in histogram format, I generally also do this with a scatterplot, since it allows me to see even small numbers of events (which would be smashed against the bottom of the plot if it were a smoothed histogram). Of course, a scatterplot needs a second axis, so I just used mCherry fluorescence (or the lack of it, since these were just normal 293T cells), captured by the YL2-A detector.
And of course copy that to the group as well (you should know how to do this by now). Lastly, the easiest way to output this data is to hit the Table Editor button near the top of the screen to open up a new window. Once in this window, select the populations / statistics you want to include from the main workspace, and drag it into the table editor, so you have something that looks like this.
Some of those statistics aren’t what we’re looking for. For example, I find it much more informative to have the singlets show total count, rather than Freq of parent. To do this, double click on that row, and select the statistic you want to include.
And you should now have something that looks like this:
With the settings fixed, you can hit the “Create Table” button at the top of the main workspace. This will make a new window, holding the table you wanted. To actually use this data elsewhere (such as with R), export it into a csv format which can be easily imported by other programs.
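As an example of what you can do with the exported csv, here’s a minimal Python sketch that recomputes the percent GFP positive per sample from the counts. The column names and the demo values are invented; check the header row of your own export and swap in whatever it actually says.

```python
import csv
import io


def percent_gfp_positive(handle, sample_col, singlet_count_col, gfp_count_col):
    """Return {sample name: percent of singlets that are GFP positive}."""
    results = {}
    for row in csv.DictReader(handle):
        try:
            singlets = float(row[singlet_count_col])
            gfp = float(row[gfp_count_col])
        except (TypeError, ValueError):
            # Skip any row that doesn't hold per-sample numbers
            continue
        if singlets > 0:
            results[row[sample_col]] = 100.0 * gfp / singlets
    return results


# Invented demo table; the column names are assumptions, not FlowJo's own
demo_csv = io.StringIO(
    "Sample,Singlets Count,GFP+ Count\n"
    "well_A1.fcs,98000,294\n"
    "well_A2.fcs,101000,0\n"
)
print(percent_gfp_positive(demo_csv, "Sample", "Singlets Count", "GFP+ Count"))
```

Working from raw counts rather than the pre-computed frequency also sidesteps the decimal precision issue mentioned earlier.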
Congratulations. You are now a FlowJo master.
The people at the CWRU flow cytometry core recently did a clean reinstall of one of their instruments, which meant that we had to re-set up our acquisition template. I still ended up eyeballing what would be the best laser / filter sets based on the pages over at FPbase.org, but I had a little bit of free time today, so I decided to work on a project I had been meaning to do for a while.
In short, between the downloadable fluorescent spectra at FPbase, as well as known instrument lasers and detector bandpass filters, I figured I could just write a script that takes in whatever fluorescent protein spectra you have downloaded, and makes a table showing you which laser + detector filter combinations give you the highest amount of fluorescence.
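To give a flavor of the calculation, here’s a stripped-down Python sketch. It is not the R script itself, just one plausible way to score a configuration: multiply how well the laser line excites the protein by the fraction of the emission spectrum that falls inside the bandpass filter. The triangular “spectra” are placeholder shapes rather than real FPbase data, and the laser / filter combinations are arbitrary examples.

```python
def toy_spectrum(peak_nm, half_width_nm):
    """Placeholder triangular spectrum: {wavelength in nm: relative intensity}."""
    return {wl: max(0.0, 1.0 - abs(wl - peak_nm) / half_width_nm)
            for wl in range(300, 801)}


def excitation_efficiency(ex_spectrum, laser_nm):
    """Relative excitation (0 to 1) at the laser wavelength."""
    return ex_spectrum.get(laser_nm, 0.0)


def emission_fraction(em_spectrum, center_nm, width_nm):
    """Fraction of total emission passing a center/width bandpass filter."""
    low, high = center_nm - width_nm / 2, center_nm + width_nm / 2
    total = sum(em_spectrum.values())
    if total == 0:
        return 0.0
    captured = sum(v for wl, v in em_spectrum.items() if low <= wl <= high)
    return captured / total


def rank_configurations(ex_spectrum, em_spectrum, configurations):
    """Score each (laser, filter center, filter width) and sort best-first."""
    scored = [(excitation_efficiency(ex_spectrum, laser)
               * emission_fraction(em_spectrum, center, width),
               (laser, center, width))
              for laser, center, width in configurations]
    return sorted(scored, reverse=True)


# Placeholder red protein and three arbitrary laser / filter combinations
ex, em = toy_spectrum(587, 60), toy_spectrum(610, 60)
for score, config in rank_configurations(
        ex, em, [(488, 530, 30), (561, 610, 20), (561, 582, 15)]):
    print(config, round(score, 3))
```

A real version would also want to account for things like laser power and detector sensitivity, which this ignores entirely.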
Here’s the R script on the lab GitHub page. I made it for the two flow cytometers and two sorters I use at the CWRU cytometry core, although it would presumably be pretty easy to change the script to make it applicable for whatever instruments are at your place of work. So here’s a screenshot of a compendium of the results for these instruments:
Nothing too surprising here, although it’s still nice / interesting to see the actual results. While it’s somewhat obvious since the standard Aria has no Green or Yellow-Green laser, we should not do any sorting with mCherry on it. Instead, we should use the Aria-SORP, which has the full complement of lasers we need.
Now that my lab is fully equipped, I’m taking on rotation students. Unfortunately, with the pandemic, it’s harder to have one-on-one meetings where I can sit down and walk the new students through every method. Furthermore, why repeat teaching the same thing to multiple students when I can just make an initial written record that everyone can reference and just ask me questions about? Thus, here’s my instructional tutorial on how I design primers in the lab.
First, it’s good to start out by making a new benchling file for whatever you’re trying to engineer. If you’re just making a missense mutation, then you can start out by copying the map for the plasmid you’re going to use as a template. Today, we’ll be mutating a plasmid called “G619C_AttB_hTrim-hCPSF6(301-358)-IRES-mCherry-P2A-PuroR” to encode the F321N mutation in the CPSF6 region. This should abrogate the binding of this peptide to the HIV capsid protein. Eventually every plasmid in the lab gets a unique identifier based on the order it gets created (this is the GXXXX name). Since we haven’t actually started making this plasmid yet, I usually just stick an “X” in front of the name of the new file, to signify that it’s *planned* to be a new plasmid, with G619C being used as the template. Furthermore, I write in the mutation that I’m planning to make in it. Thus, this new plasmid map is now temporarily being called “XG619C_AttB_hTrim-hCPSF6(301-358)-F321N-IRES-mCherry-P2A-PuroR”.
That’s what the overall plasmid looks like. We’ll be mutating a few nucleotides in the 4,000 nt area of the plasmid.
I’ve now zoomed into the part of the plasmid we actually want to mutate. The residue is Phe321 in the full length CPSF6 protein, but in the case of this Trim-fusion, it’s actually residue 344.
I next like to “write in” the mutation I want to make, as this 1) makes everything easier, and 2) is part of the goal of making a new map that now incorporates that mutation. Thus, I’ve now replaced the first two T’s of the Phe codon “TTT” with two A’s, making the “AAT” codon which encodes Asn (see the image above).
Next is planning the primers. So there are a few ways one could design primers to make the mutation. I like to create a pair of overlapping (~ 17 nt), inverse primers, where one of the primers encodes the new mutation in it. PCR amplification with these primers should result in a single “around-the-circle” amplicon, where there is ~ 17 nt of homology on the terminal ends. These ends can then be brought together and closed using Gibson assembly.
So first to design the forward primer. This is the primer that will go [5’end] –[17 nt homology] — [mutated codon] — [primer binding region] — [3’end]. So the first step is to figure out the primer binding region.
In a cloning scheme like this, I like to start selecting the nucleotides directly 3′ of the codon to be mutated, and select enough nucleotides such that the melting temperature is ~ 55°C. In actuality, the melting temperature will be slightly higher, since 1) we will end up having 17 nt of matching sequence 5′ of the mutated codon, and 2) the 3rd nt in the codon, T, will actually be matching as well.
Now that I’ve determined how long I need that 3′ binding region to be, I select the entire set of nucleotides I want in my full primer. In this case, this ended up being a primer 36 nt in length (see below).
Since this is the forward primer, I can just copy the “sense” version of this sequence of nucleotides.
OK, so next to design the reverse primer. This is simpler, since it’s literally just a series of nucleotides going in the antisense orientation directly 5′ of the codon (as it’s shown in the sense-stranded, plasmid map). I shoot for ~ 55°C to 60°C, usually just doing a little bit under 60°C.
Since this is the reverse primer, we want the REVERSE COMPLEMENT of what we see on the plasmid map.
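Here’s the same logic as a small Python sketch, in case seeing it spelled out helps. The template sequence is made up (it is not the real plasmid), and the primer lengths are fixed stand-ins: in practice I pick the lengths by melting temperature in Benchling, which this sketch doesn’t attempt to calculate. The 16 nt binding region is chosen only so the toy forward primer comes out at 36 nt, like the example above.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template, codon_start, new_codon,
                   overlap_len=17, binding_len=16, reverse_len=20):
    """Sketch of overlapping inverse primers around one mutated codon.

    codon_start is the 0-based index of the codon's first nt in the
    sense-strand template. The lengths are placeholders, not Tm-derived.
    """
    homology = template[codon_start - overlap_len:codon_start]
    binding = template[codon_start + 3:codon_start + 3 + binding_len]
    # Forward: [homology] - [mutated codon] - [binding region], sense strand
    forward = homology + new_codon + binding
    # Reverse: antisense of the nucleotides directly 5' of the codon
    reverse = reverse_complement(template[codon_start - reverse_len:codon_start])
    return forward, reverse


# Made-up template with a TTT (Phe) codon starting at index 24
upstream = "ATGGCTAGCAAGGGCGAGGAGCTG"
downstream = "ACCGGGGTGGTGCCCATCCTGGTC"
template = upstream + "TTT" + downstream

forward, reverse = design_primers(template, len(upstream), "AAT")
print("Forward:", forward, len(forward), "nt")
print("Reverse:", reverse, len(reverse), "nt")
```

Note that the first 17 nt of the forward primer are the exact reverse complement of the first 17 nt of the reverse primer; that shared stretch is the homology that Gibson assembly uses to close the circle.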
Voila, we now have the two primers we need. We just now need to order these oligos (we order from ThermoFisher, since it’s the cheapest option at CWRU), and can then perform the standard MatreyekLab cloning workflow.
As of this posting, we’ve cloned 176 constructs in the lab. I’ve kept pretty meticulous notes about what standard protocol we’ve used each time, how many clones we’ve screened, and how many clones had DNA where the intended insertions / deletions / mutations were present. With this data, I wondered whether I could take a quick retrospective look at my observed success / failure rates to see if my basic workflow / pipeline was optimized to maximize benefit (i.e. getting the recombinant DNA we want) while limiting cost (i.e. time, effort, $$$ for reagents and services). I particularly focused on 2-part Gibsons, since that’s the workhorse approach utilized for most molecular cloning in the lab.
First, here’s a density distribution reflecting reaction-based success rates (X number of correct clones in Y number of total screened clones, or X / Y = success rate).
I then repeatedly sampled at random from that distribution, with N ranging from 1 through 5, effectively pretending that I was screening 1 clone, 2 clones … up to 5 clones for each PCR + Gibson reaction we were performing. Since 1 good clone is really all you need, for each sampling of N clones, I checked whether any of them were a success (giving that reaction a value of “1”) or whether all of them failed (giving that reaction a value of “0”). I repeated this process 100 times, counted the sum of “1” and “0” values, and divided by 100 to get an overall success rate. I repeated this whole process 50 times to get a sense of the variability of outcome with each condition. Here are the results:
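For those who’d like to see the resampling written out, here’s a minimal Python sketch of one way to do it. It reflects my reading of the procedure rather than the exact analysis script: pick a reaction’s success rate at random from the observed ones, “screen” N clones that each succeed with that probability, and call the reaction a win if any clone is good. The list of success rates is a made-up placeholder, not the lab’s data.

```python
import random
import statistics


def simulated_success_rate(reaction_rates, n_clones, n_reactions=100, rng=None):
    """Fraction of simulated reactions yielding at least one correct clone."""
    rng = rng or random.Random()
    wins = 0
    for _ in range(n_reactions):
        rate = rng.choice(reaction_rates)  # this reaction's per-clone success rate
        if any(rng.random() < rate for _ in range(n_clones)):
            wins += 1  # at least one good clone, so the reaction scores a "1"
    return wins / n_reactions


# Placeholder per-reaction success rates (correct clones / screened clones)
placeholder_rates = [0.0, 1 / 3, 2 / 3, 1.0, 1.0]
rng = random.Random(2020)  # fixed seed so reruns give the same numbers

for n_clones in range(1, 6):
    repeats = [simulated_success_rate(placeholder_rates, n_clones, rng=rng)
               for _ in range(50)]
    print(f"{n_clones} clones screened: mean {statistics.mean(repeats):.2f}, "
          f"sd {statistics.stdev(repeats):.2f}")
```

Reactions with a success rate of zero never produce a hit no matter how many clones you screen, which is why the curve flattens out rather than climbing to 100%.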
We screen 3 clones per reaction in our standard protocol, and I think that’s a pretty good number. We capture at least 1 successful clone 3/4 of the time. Sure, maybe we increase how often we get the correct clone on the first pass if we instead screen 4 or 5 clones at a time, but the extra effort / time / cost doesn’t really seem worth it, especially since it’s totally possible to screen a larger number on a second pass for those tough-but-worth-it clones. Some of those reactions are also going to be ones that are just bad, period, and need to be re-started from the beginning (perhaps even by designing new primers), which is a screening hill that certainly isn’t worth dying on.
9/10/20 edit: In my effort to make it easier for trainees to learn / recreate what I’m doing, I posted the data and analysis script to the lab GitHub.
The SARS CoV-2 pandemic-caused research ramp-down period was a weird time for me / the lab. I sent Sarah to work from home for 10 or so weeks, meaning I had to do the lab work myself if I wanted to make any progress on any of the existing grant-work, or for any of the SARS CoV-2 research I was trying to boot up. This has resulted in some VERY long weeks over the last few months, as I was really trying to do everything at that point. Cognizant of this, I even started timing myself doing some of the more routine / mundane tasks, to see if I could try to maximize my efficiency. Perhaps the most consistent / predictable of the tasks were minipreps. In particular, I was curious whether doing more minipreps simultaneously saved me time in the long run.
So the short answer was yes. 24 is a very comfortable / logical number for me (I just fill up my mini-centrifuge, and the result is divisible by three so easy for processing as complete 8-strip PCR tubes for Sanger later on), and I consistently processed those in about an hour. Doing fewer would be somewhat less efficient, though sometimes you have to do that if you’re in a rush to get some particular clone of recombinant DNA plasmid. Then again, doing more than 24, while somewhat exhausting, does save me some time overall. Thus, I found that was a worthwhile strategy to plan for during that period.
That said, I’m very glad to have Sarah back in the lab helping me with some of the wet-lab work again. Not only does it save me time, but also saves me focus; I’ve gotten pretty good at multi-tasking, but I still do hit a limit in terms of the number of DIFFERENT things I can do / think about at the same time.
I do a lot of molecular cloning, which means a lot of transformations of chemically competent E. coli. Using 50 uL of purchased competent bacteria would cost about $10 per transformation, which would be an AWFUL waste of money, especially with this being a highly recurring expense in the lab. I had never made my own competent cells before, so I had to figure this out shortly after starting my lab. It took a couple of days of dedicated effort, but it ended up being quite simple (I’ll link to my protocol a bit later on). Though my frozen stocks ended up working fine, I became quite used to creating fresh cells every time I needed to do a transformation. The critical step here is taking a saturated overnight starter culture, and diluting it so you can harvest a larger volume of log-phase bacteria some short time later. A range of ODs [optical density here defined as absorbance at 600 nm] work, though I like to use bacteria at an OD around 0.2. I had gotten pretty good at being able to eyeball when a culture was ready for harvesting (for LB in a 250 mL flask, I found this was right when I started seeing turbidity), but I figured there was a better way to know when it’s worth sampling and harvesting.
I started keeping good notes about 1) the starting density of my prep culture (OD of the overnight culture divided by the dilution factor), 2) the amount of time I left the prep culture growing, and 3) the final OD of the prep culture. I converted everything into cell density, which is a bit more intuitive than OD (I found 1 OD[A600] of my bacteria roughly corresponded to 5e5 bacteria per mL), and worked in those units from there on out. Knowing bacteria exhibit exponential growth, I log base-10 transformed the counts. Much like the increasing number of COVID-19 deaths experienced by the US from early March through early April, exponential growth becomes linear in log-transformed space. I figured I could thus estimate the growth of my prep culture of competent cells by making a multivariate linear model, where the final density of the bacteria was dependent on the starting bacterial density and how long I left it growing. I figured the lag-phase from taking the saturated culture and sticking it into cold LB would end up being a constant in the model. Here’s my dataset, and here’s my R Markdown analysis script. My linear model seemed to perform pretty well, as you can see in the below plot. As of writing this, the Pearson’s r was 0.98.
The aforementioned analysis script has a final chunk that allows you to input the starting OD of your starter culture, and assuming a 1000-fold dilution, tells you how long you likely need to wait to hit the right OD of your prep culture. Then again, I don’t think anyone really wants to enter this info into a computer every time they want to set up a culture, so I made a handy little “look-up plot”, shown below, where a lab member could just look at their starter culture OD on the x-axis, choose the dilution they want to do (staying within 2x of 1000-fold since I don’t know if smaller dilutions can affect bacterial competency), and figure out when they need to be back to harvest (or at least stick the culture on ice). I’ve now printed this plot out and left it by my bacterial shaker-incubator.
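The math behind that final chunk is just the linear model solved for time. Here’s a Python sketch of the idea (the real script is in R Markdown). It works directly in OD rather than cell density, since a constant conversion factor only shifts the intercept, and the coefficients are hypothetical placeholders corresponding to a 30 minute lag and a 20 minute doubling time, NOT my fitted values.

```python
import math
from dataclasses import dataclass


@dataclass
class GrowthModel:
    """log10(final OD) = intercept + coef_log_start * log10(start OD) + coef_minutes * minutes"""
    intercept: float
    coef_log_start: float
    coef_minutes: float


def minutes_to_target(starter_od, dilution_factor, target_od, model):
    """Solve the linear model for the growth time needed to reach target_od."""
    log_start = math.log10(starter_od / dilution_factor)
    return ((math.log10(target_od) - model.intercept
             - model.coef_log_start * log_start) / model.coef_minutes)


# Hypothetical coefficients: pure exponential growth (coefficient of 1 on the
# starting density), a 20 min doubling time, and a 30 min lag in the intercept
per_minute = math.log10(2) / 20
placeholder_model = GrowthModel(intercept=-per_minute * 30,
                                coef_log_start=1.0,
                                coef_minutes=per_minute)

for dilution in (500, 1000, 2000):
    wait = minutes_to_target(starter_od=2.0, dilution_factor=dilution,
                             target_od=0.2, model=placeholder_model)
    print(f"1:{dilution} dilution: harvest after ~{wait:.0f} min")
```

Each 2-fold change in the dilution shifts the wait by one doubling time, which is why the lines on the look-up plot are evenly spaced.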
I’m still much more of a wet-lab scientist than a computational one. That said, god damn do I think the moderate amount of computational work I can do is empowering.