## Simulating sampling during routine lab procedures

TL;DR: Statistics is everywhere, and simulating bottlenecks that happen during routine lab procedures such as dilutions of cells can potentially help you increase reproducibility, and at the least, help you better conceptualize what is happening with each step of an experiment.

I’m still working on getting a cell counter for the lab. In the meantime, we’ve been using an old-school hemacytometer to count cells before an experiment. Sarah had used a hemacytometer more recently than I had, and knew to dilute the cells from a T75 flask 10-fold to get them into a countable range. She said she had performed the dilution by putting 10 ul of cells in 90 ul of media (and then putting 10 ul of the dilution into the hemacytometer). But as she said this, she asked whether it was OK to perform the dilution that way; a grad student in her previous lab had taught her to do it like this, but a postdoc there said it was a bad idea. My immediate response was that if the cells are sufficiently mixed, it should be fine. While that was my gut reaction, I realized it was something I could simulate and answer myself using available data. Would the accuracy of the count increase if we diluted 100 ul of cells into 900 ul of media, or 1 ml of cells into 9 ml of media?

Here are the methods (skip this paragraph if you don’t want to dive in): To me, the answer to whether the dilution volume matters depends on how the cells are dispersed in the media, i.e. how variable the count is when the same volume is sampled numerous times. Sarah’s standard practice is to count four squares of the hemacytometer, so I had four replicate counts for each volume pipetted. She had repeated this process three times by the time I performed the analysis, giving me a reasonable dataset to roll with. I calculated the mean and standard deviation for each of the three instances, with each count corresponding to a volume of 0.1 ul. They were all quite similar, so I created a hypothetical normal distribution from the averaged mean and standard deviation. Next was seeing how different ways of performing the same dilution impacted the accuracy of individual readings. I simulated pipetting 10 ul of cells by sampling from this distribution 100 times, 100 ul by sampling 1,000 times, and 1 ml by sampling 10,000 times, taking the mean each time. I repeated this process 5,000 times for each condition, and looked at how wide each resulting distribution was.
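For anyone who wants to see the dilution step concretely, here is a minimal Python sketch of the resampling described above. It is not the original analysis code: the mean and standard deviation are placeholder values (the measured ones aren’t reported in this post), the function names are made up for illustration, and the repeat count is kept small so it runs quickly.

```python
import random
import statistics

# Hypothetical placeholder values; the measured mean and standard deviation
# of the per-square (0.1 ul) counts are not reported in the post.
MEAN_PER_SQUARE = 27.0
SD_PER_SQUARE = 5.0


def simulate_pipetted_mean(n_units, rng):
    """Mean count per 0.1 ul unit when n_units such units are pipetted together."""
    draws = [rng.gauss(MEAN_PER_SQUARE, SD_PER_SQUARE) for _ in range(n_units)]
    return statistics.fmean(draws)


def simulate_condition(n_units, n_repeats, rng):
    """Repeat the simulated pipetting n_repeats times and return all the means."""
    return [simulate_pipetted_mean(n_units, rng) for _ in range(n_repeats)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# 10 ul corresponds to 100 units of 0.1 ul; 100 ul corresponds to 1,000 units.
# The post used 5,000 repeats per condition; 200 is used here only for speed.
spread_10ul = statistics.stdev(simulate_condition(100, 200, rng))
spread_100ul = statistics.stdev(simulate_condition(1000, 200, rng))

print(f"spread of 10 ul means:  {spread_10ul:.3f}")
print(f"spread of 100 ul means: {spread_100ul:.3f}")
```

Because each simulated pipetting averages over more 0.1 ul units, the spread of the means should shrink roughly with the square root of the volume pipetted.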

I then turned the counts into concentration (cells / ml):

Instead of stopping there, I thought about the number of cells I was actually trying to plate, which was 250,000. The number the distributions were converging to was ~27.3 (black line), so I used that as the “truth”, and asked how many “true” cells would be plated if I had determined the plating volume from each of the simulated concentrations under each dilution condition. The resulting plot looked like this:

So as you can tell from the plot, there are slight differences in cells plated depending on the imprecision propagated by the manner in which the same 10-fold dilution was performed: while all distributions are centered around 250k, the 10 ul dilution distribution was quite wide, while the 1 ml in 9 ml dilution resulted in cell counts very close to 250k each time. To phrase it another way, ~15% of the time, a 10 ul in 90 ul dilution would cause the “wrong” number of cells to be plated (less than 240k, or more than 260k). In contrast, due to the increased precision, a 100 ul in 900 ul dilution would never result in the “wrong” number of cells being plated. So speaking solely about the dilution, the way it was being performed could have some light impact on how many cells would actually be plated.
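The plating arithmetic behind those percentages can be sketched as follows. This is an illustration under stated assumptions, not the original code: it assumes each hemacytometer square holds 0.1 ul, that the count comes from the 10-fold dilution, and that “wrong” means outside 240,000 to 260,000 cells. The function names, the “true” count of 27.0, and the 3% relative error on the estimates are all hypothetical.

```python
import random

SQUARE_VOLUME_UL = 0.1   # volume per counted square, as described in the post
DILUTION_FACTOR = 10     # 10-fold dilution before counting
TARGET_CELLS = 250_000   # number of cells we intend to plate


def count_to_cells_per_ml(mean_count):
    """Convert a mean per-square count into cells/ml of the undiluted stock."""
    cells_per_ul_diluted = mean_count / SQUARE_VOLUME_UL
    return cells_per_ul_diluted * 1000 * DILUTION_FACTOR


def true_cells_plated(estimated_conc, true_conc, target=TARGET_CELLS):
    """Cells actually plated when the volume is chosen from an estimated concentration."""
    volume_ml = target / estimated_conc
    return volume_ml * true_conc


def fraction_outside(values, low, high):
    """Fraction of values that fall below low or above high."""
    return sum(1 for v in values if v < low or v > high) / len(values)


rng = random.Random(0)
TRUE_COUNT = 27.0  # hypothetical "true" mean count per square
true_conc = count_to_cells_per_ml(TRUE_COUNT)

# Hypothetical estimated counts with a 3% relative error around the truth.
estimates = [rng.gauss(TRUE_COUNT, 0.03 * TRUE_COUNT) for _ in range(5000)]
plated = [true_cells_plated(count_to_cells_per_ml(c), true_conc) for c in estimates]

wrong_fraction = fraction_outside(plated, 240_000, 260_000)
print(f"fraction plated outside 240k-260k: {wrong_fraction:.3f}")
```

Note that overestimating the concentration leads to plating too few cells, and vice versa, since the plated volume is inversely proportional to the estimate.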

I was going to call this exercise complete, but I ran this analysis by Anna, and she mentioned that I wasn’t REALLY recreating the entire process; sure I had recreated the dilution step, but we would have also counted cells from the dilution in the hemacytometer to actually get the cell counts in real life. Thus, I modified the code such that each dilution step was followed by a random sampling of four counts (using the coefficient of variation determined from the initial hemacytometer readings), and taking the mean of those counts; this represented how we would have ACTUALLY followed up each dilution in real life. The results were VERY different:

In effect, the imprecision imparted by the hemacytometer counts seemed to almost completely drown out the imprecision caused by the suboptimal dilution step. This was pretty mind-blowing for me, especially considering that I would have totally missed this effect had I not run this post by Anna. Now fully modeling the counting process, a 10 ul in 90 ul dilution would cause the “wrong” number of cells (less than 240k, or more than 260k) to be plated ~42.5% of the time, and a 100 ul in 900 ul dilution would still cause a “wrong” cell number to be plated ~42.2% of the time; almost identical! Thus, while a 100 ul in 900 ul dilution does impart slightly increased accuracy, the improvement over a 10 ul in 90 ul dilution is negligible. So while in a sense this wasn’t the initial question asked, it’s still effectively the real answer.
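Here is a rough sketch of what adding the counting step looks like, again as an illustration rather than the original code. The mean count and the coefficient of variation are placeholder values (the measured ones aren’t given in this post), the function names are hypothetical, and the repeat count is reduced for speed.

```python
import random
import statistics

# Hypothetical placeholder values, not the measured ones.
MEAN_PER_SQUARE = 27.0
CV = 0.20  # coefficient of variation of individual hemacytometer counts
SD_PER_SQUARE = CV * MEAN_PER_SQUARE


def diluted_mean(n_units, rng):
    """Dilution step: mean per-0.1 ul count of the pipetted volume."""
    draws = [rng.gauss(MEAN_PER_SQUARE, SD_PER_SQUARE) for _ in range(n_units)]
    return statistics.fmean(draws)


def counted_estimate(n_units, rng, n_squares=4):
    """Dilution step followed by counting n_squares squares and averaging them."""
    concentration = diluted_mean(n_units, rng)
    counts = [rng.gauss(concentration, CV * concentration) for _ in range(n_squares)]
    return statistics.fmean(counts)


def spread(n_units, n_repeats, rng):
    """Standard deviation of the final estimate across repeated simulations."""
    estimates = [counted_estimate(n_units, rng) for _ in range(n_repeats)]
    return statistics.stdev(estimates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
spread_10ul = spread(100, 300, rng)    # 10 ul = 100 units of 0.1 ul
spread_100ul = spread(1000, 300, rng)  # 100 ul = 1,000 units of 0.1 ul

print(f"10 ul dilution + 4 counts:  {spread_10ul:.2f}")
print(f"100 ul dilution + 4 counts: {spread_100ul:.2f}")
```

With only four squares counted, the counting noise is averaged over far fewer draws than the dilution noise, so it dominates the final spread regardless of the dilution volume.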

At the end of the day, I think the more impactful aspect of this exercise is the idea that even routine aspects of wet-lab work are deeply rooted in stats (in this case, propagation of errors caused by poor sampling), and that modern computational simulations can be used to optimize these procedures. There’s something truly empowering about having a new tool / capability that gives you new perspectives on procedures you’ve done a bunch of times, and allows you to fully rationalize them rather than relying on advice given to you by others.

Here’s the code if you want to try running it yourself.

Acknowledgements: Thanks to Sarah for bringing this question to my attention. Also, BIG THANKS to Anna for pointing out where I was being myopic in my analysis, which got me to a qualitatively different (and more real-life relevant) answer. It really is worth having smart people look over your work before you finalize it!