TL;DR -> If you’re doing infection assays, it’s worth staying below ~20% EGFP-positive cells and, depending on how many cells you count, above ~0.03% EGFP-positive cells.

Can’t say I expected it, but the SARS-CoV-2 pandemic has brought virology back into my sights. This includes some familiar tools, such as lentiviral pseudovirus infection assays, which I used a lot in graduate school. It’s the same basic process as using lentiviral vectors for cell engineering, except you’re swapping out the transgene of interest for a reporter protein, like EGFP. If the pseudovirus is able to (more or less) accurately model an aspect of the infection process, then you have yourself a pretty nice molecular / cell biology surrogate to learn some biology with. Essentially, it tells you how efficiently the virus is getting into cells. Since all of the “guts” can look like HIV, you can use it to study the parts of the HIV life cycle after it has entered the cell but before it’s integrated into the host cell genome (this is what my PhD was in). Its arguably more common use is instead as a vehicle for studying the protein interactions that various viruses make to enter the cell cytoplasm.

The loveliness of the assay (especially the fluorescent protein version) is that it is inherently a binary readout (infected or uninfected) at a per-cell level. But, since there are *MANY* cells in the well, it becomes a probability distribution, where the number of infected cells captures how efficient the conditions of entry / infection were.

But of course there is also some nuance to the stats here. For example, if you mixed a volume containing 100,000 cells with another volume containing 100,000 viral particles (and thus a multiplicity of infection (MOI) of 1), you don’t get 100% of the cells infected. In actuality, only about 63% of cells would get infected. That’s b/c each virus isn’t a heat-seeking missile capable of infecting only the remaining uninfected cells; instead, each infection event is (more or less) an independent event. Thus, cells can be multiply infected. Notably, based on an EGFP readout, you can’t really figure that out (i.e., cells that are doubly infected are essentially indistinguishable from singly infected cells based on their fluorescence levels). But, since we know how the system should function, we know that it should largely follow a Poisson distribution, where the fraction of cells with at least one infection is 1 - e^(-MOI), hence the ~63% at an MOI of 1.
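As a minimal sketch of that Poisson expectation (assuming the number of infection events per cell is Poisson-distributed with a mean equal to the MOI; the function names are mine, for illustration only), here is how the expected fraction of infected and multiply infected cells falls out:

```python
import math

def fraction_infected(moi: float) -> float:
    """Expected fraction of cells with at least one infection event."""
    # Poisson: P(0 events) = exp(-moi), so P(>= 1 event) = 1 - exp(-moi)
    return 1.0 - math.exp(-moi)

def fraction_multiply_infected(moi: float) -> float:
    """Expected fraction of cells with two or more infection events."""
    # P(>= 2) = 1 - P(0) - P(1) = 1 - exp(-moi) * (1 + moi)
    return 1.0 - math.exp(-moi) * (1.0 + moi)

for moi in (0.01, 0.1, 1.0, 3.0):
    print(
        f"MOI {moi:>4}: "
        f"{100 * fraction_infected(moi):5.1f}% infected, "
        f"{100 * fraction_multiply_infected(moi):5.1f}% multiply infected"
    )
```

At an MOI of 1 this gives the ~63% figure, with roughly a quarter of all cells carrying more than one infection event.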

Here’s a plot I made with this script to demonstrate the relationship between the MOI and the % of green cells you should expect to see:

Things look nice and linear at low MOIs, but as soon as you start getting close to an MOI of 1, the number of multiply infected cells starts taking on a bigger role, until eventually every cell becomes multiply infected as the last few “lucky” uninfected cells get so overrun that there is no escaping infection. In terms of trying to get nice quantitative data, that saturation of the assay is kind of problematic. Generally speaking, I find that somewhere around 20-30% GFP-positive cells is where you really start losing that linearity (shown by the dotted line).
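To put a rough number on that loss of linearity, here is a small sketch under the same Poisson assumption (the helper names `moi_from_fraction` and `percent_underestimate` are hypothetical, not from the original script). It back-calculates the MOI from an observed fraction of green cells and reports how far the raw percentage falls short of it:

```python
import math

def moi_from_fraction(frac_positive: float) -> float:
    """Invert f = 1 - exp(-MOI) to recover the MOI from the fraction of green cells."""
    if not 0.0 <= frac_positive < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - frac_positive)

def percent_underestimate(frac_positive: float) -> float:
    """Shortfall of the observed fraction relative to the true MOI, in percent."""
    moi = moi_from_fraction(frac_positive)
    if moi == 0.0:
        return 0.0  # nothing infected, nothing to underestimate
    return 100.0 * (moi - frac_positive) / moi

for pct in (1, 10, 20, 30, 50):
    frac = pct / 100
    print(
        f"{pct:>2}% green -> MOI {moi_from_fraction(frac):.3f} "
        f"(raw readout low by {percent_underestimate(frac):.1f}%)"
    )
```

Under this model, a readout of 20% green cells underestimates the MOI by roughly 10%, and 30% green by roughly 16%, while at 1% green the two are nearly identical.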

Why not just always do small numbers, like 1% or even 0.1%? Well, b/c you’re kind of caught between a rock and a hard place: I guess the rock is the unbreakable boundary of 100% infected cells, while the hard place is that if you go too low, it gets hard to sample enough events to accurately quantitate something that’s really rare.

I do think there’s still a lot of meat in that assay though, as long as you count sufficient cells. I generally quantitate the number of infected + uninfected cells using flow cytometry. Being able to count 10,000+ events a second means you can run through all of the cells in a well of a 24-well plate in under a minute. That means you can pretty reasonably sample 100,000 cells per condition. Assuming that’s the case, how does our ability to accurately quantitate our sample drop as the number of infected cells falls?

IMO, it begins to become unusable when you get to roughly 0.03 percent GFP positive cells. That’s because past that, the amount of variability you get in trying to measure the mean is roughly the size of the mean itself. Furthermore, your chances of encountering replicate experiments where you get *zero* GFP positive cells really start to increase. And those zeros are pretty annoying, since they are quite uninformative.
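Here is a hedged sketch of the sampling side of the problem, assuming the number of green cells you count is binomial (n cells counted, true positive fraction p). The function name and the example cell counts are my own choices for illustration and are not taken from the original script. For each condition it reports the expected number of green events, the coefficient of variation (SD divided by mean) of that count, and the chance that a replicate comes back with zero green cells:

```python
import math

def sampling_stats(frac_positive: float, cells_counted: int):
    """Return (expected positives, CV of the count, P(zero positives)) for binomial sampling."""
    if not 0.0 < frac_positive < 1.0:
        raise ValueError("frac_positive must be strictly between 0 and 1")
    expected = cells_counted * frac_positive
    # Binomial standard deviation: sqrt(n * p * (1 - p))
    sd = math.sqrt(cells_counted * frac_positive * (1.0 - frac_positive))
    cv = sd / expected
    # Probability that none of the n counted cells is positive
    p_zero = (1.0 - frac_positive) ** cells_counted
    return expected, cv, p_zero

# Example cell counts are assumptions; plug in however many events you actually collect.
for cells in (1_000, 10_000, 100_000):
    for pct in (1.0, 0.1, 0.03, 0.01):
        expected, cv, p_zero = sampling_stats(pct / 100, cells)
        print(
            f"{cells:>7,} cells, {pct:>5}% positive: "
            f"{expected:7.1f} expected events, CV {100 * cv:6.1f}%, "
            f"P(zero) {p_zero:.3g}"
        )
```

With 100,000 cells counted, 0.03% positive works out to about 30 green events per sample; count fewer cells and both the scatter and the chance of an all-zero replicate climb quickly, which is why the lower bound depends on how many events you collect.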

So just from understanding the basic stats, it looks like my informative range is going to be ~0.03 percent positive to ~30 percent positive, so roughly 3 orders of magnitude. Ironically, that’s just about exactly what I saw when I ran serial dilutions of a SARS-CoV pseudovirus on ACE2-overexpressing cells.

So there you have it. The theoretical range of the assay, backed experimentally by what I saw in real life. The wonderfulness of science. You can find the script I used to make these estimations here.