Using prior data to optimize the future

As of this posting, we’ve cloned 176 constructs in the lab. I’ve kept pretty meticulous notes about which standard protocol we’ve used each time, how many clones we’ve screened, and how many clones had DNA where the intended insertions / deletions / mutations were present. With this data, I wondered whether I could take a quick retrospective look at my observed success / failure rates to see if my basic workflow / pipeline was optimized to maximize benefit (i.e. getting the recombinant DNA we want) while limiting cost (i.e. time, effort, $$$ for reagents and services). I particularly focused on 2-part Gibsons, since that’s the workhorse approach used for most molecular cloning in the lab.

First, here’s a density distribution reflecting reaction-based success rates (X number of correct clones in Y number of total screened clones, or X / Y = success rate).

I then repeatedly sampled at random from that distribution N times, with N ranging from 1 through 5, effectively pretending that I was screening 1 clone, 2 clones … up to 5 clones for each PCR + Gibson reaction we were performing. Since 1 good clone is really all you need, for each sampling of N clones, I checked whether any of them were a success (giving that reaction a value of “1”) or whether all of them failed (giving that reaction a value of “0”). I repeated this process 100 times, counted the number of “1” values, and divided by 100 to get an overall success rate. I then repeated that whole procedure 50 times to get a sense of the variability of the outcome with each condition. Here are the results:
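For anyone who wants to play with the idea, here is a minimal Python sketch of that resampling procedure. It is not the lab's actual analysis script, and it rests on a couple of assumptions: that each simulated reaction draws one per-reaction success rate from the observed rates and then screens N clones that each succeed with that probability, and that the numbers in `PLACEHOLDER_RATES` are made-up stand-ins for the real data. The function names are hypothetical too.

```python
import random
import statistics

# Made-up placeholder values, NOT the lab's data: each entry stands in for one
# reaction's observed success rate (correct clones / total screened clones).
PLACEHOLDER_RATES = [0.0, 0.25, 0.33, 0.5, 0.67, 0.75, 1.0, 1.0]


def simulate_success_rate(rates, n_clones, n_reactions=100, rng=random):
    """Fraction of simulated reactions yielding at least one correct clone."""
    hits = 0
    for _ in range(n_reactions):
        # Assumption: draw one reaction-level success rate from the observed rates.
        p = rng.choice(rates)
        # Screen n_clones clones; the reaction scores "1" if any clone is correct.
        if any(rng.random() < p for _ in range(n_clones)):
            hits += 1
    return hits / n_reactions


def run_experiment(rates, max_clones=5, n_reactions=100, n_repeats=50, seed=0):
    """Repeat the simulation n_repeats times for each N from 1 to max_clones."""
    rng = random.Random(seed)
    return {
        n: [simulate_success_rate(rates, n, n_reactions, rng)
            for _ in range(n_repeats)]
        for n in range(1, max_clones + 1)
    }


results = run_experiment(PLACEHOLDER_RATES)
for n_clones, outcomes in results.items():
    print(n_clones, round(statistics.mean(outcomes), 3),
          round(statistics.stdev(outcomes), 3))
```

Swapping `PLACEHOLDER_RATES` for the real per-reaction rates gives, for each N, 50 simulated overall success rates whose spread shows the variability. The fixed `seed` is only there so that reruns are reproducible.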

We screen 3 clones per reaction in our standard protocol, and I think that’s a pretty good number. We capture at least 1 successful clone 3/4 of the time. Sure, maybe we’d increase how often we get the correct clone on the first pass if we instead screened 4 or 5 clones at a time, but the extra effort / time / cost doesn’t really seem worth it, especially since it’s totally possible to screen a larger number on a second pass for those tough-but-worth-it clones. Some of those reactions are also going to be ones that are just bad, period, and need to be restarted from the beginning (perhaps even by designing new primers), which is a screening hill that certainly isn’t worth dying on.

9/10/20 edit: In my effort to make it easier for trainees to learn / recreate what I’m doing, I posted the data and analysis script to the lab GitHub.