Instructional Archives - Matreyek Lab

BCA protein quantitation

Posted on July 1, 2025 by kmatreyek

I guess the jury is still out on exactly how well the Qubit Protein Assay works. Now that I’m looking into the details, it looks like the Qubit Protein Assay (Q33211) that we have is incompatible with RIPA buffer, which is likely the buffer people had tried using it with in the past. The BCA assay, however, is perfectly compatible with RIPA buffer, so we still do it reasonably often. To make sure everyone in the lab is on the same page, we may as well look at some real data and see how I view it. This will be based on some data that Olivia S had generated a month or so ago.

Link to the code below:
https://colab.research.google.com/drive/16jIz0Qq_jvb7R1s-fIJW804ffahZDHPG?usp=sharing

Here’s what the raw values of the standard curve looked like:

The value of the background is shown as the datapoint on the y-axis. Clearly, the lowest standard dilution loses linearity from the rest of the values, likely b/c a larger fraction of it is due to background signal. Accordingly, if we subtract out the background value from it, we might be able to salvage use of that dilution. That is indeed the case.

Using those values, I’ve now fit a linear model. This will allow us to predict the concentration of a protein sample based on its absorbance, relative to the standard curve. The points and line in red denote the values calculated based on this linear model. The real standard curve values do straddle it pretty well, which is good.
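For anyone who wants to see the arithmetic without opening the notebook, here is a minimal Python sketch of the background subtraction and linear fit. The standard concentrations and absorbances are invented placeholders (not Olivia's data), the function names are hypothetical, and fitting absorbance against concentration and then inverting the line is my assumption about the direction of the fit.

```python
# Illustrative only: these standards and absorbances are invented placeholders,
# not the measurements from the post.
standards_ng_per_ul = [125.0, 250.0, 500.0, 1000.0, 2000.0]
raw_absorbance = [0.20, 0.30, 0.50, 0.90, 1.70]
background = 0.10  # absorbance of the no-protein blank


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Subtract the blank from every standard before fitting.
corrected = [a - background for a in raw_absorbance]
slope, intercept = fit_line(standards_ng_per_ul, corrected)


def predict_concentration(absorbance):
    """Invert the fitted line to turn a raw absorbance into ng/uL."""
    return ((absorbance - background) - intercept) / slope


print(f"slope={slope:.6f}, intercept={intercept:.4f}")
print(f"A=0.70 -> {predict_concentration(0.70):.0f} ng/uL")
```

The same blank is subtracted from the unknowns before they are run through the inverted line, so standards and samples are treated identically.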

Cool, so what do the experimental samples of unknown concentration look like based on this model? Since Olivia did half-log dilutions of these, we can actually see how the predicted values change based on the dilution tested and its resulting absorbance value.

So for all of the samples, the 33-fold dilution sample was erroneous. Otherwise, 10 uL of undiluted sample in 190 uL BCA working solution, or 10 uL of 3- or 10-fold dilutions of that lysate in 190 uL BCA working solution, gave values that were relatively similar. Taking the geometric mean of those, we calculated values of 1184, 1894, and 2339 ng / uL for these lysates. Seems reasonable enough.
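The back-calculation across dilutions can be sketched like this in Python. The per-well predictions are placeholder numbers (not the values behind the 1184, 1894, and 2339 ng / uL results), and the helper name is hypothetical.

```python
import math

# Hypothetical example: concentrations read off the standard curve for the
# diluted wells (ng/uL), keyed by the fold-dilution of the lysate.
# These numbers are placeholders, not the values from the post.
predicted_in_well = {1: 1200.0, 3: 410.0, 10: 115.0}


def geometric_mean(values):
    """Geometric mean computed via the mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Multiply each reading by its dilution factor to get back to the
# undiluted lysate concentration, then average on a log scale.
back_calculated = [conc * fold for fold, conc in predicted_in_well.items()]
lysate_conc = geometric_mean(back_calculated)
print(f"{lysate_conc:.0f} ng/uL")
```

A geometric mean is a natural fit here because the dilutions are spaced on a log scale, so errors tend to be multiplicative rather than additive.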

There we go. A primer on the data analysis portion of the BCA assay, or anything else that requires using dilutions of a standard of known concentration to determine the likely concentrations of unknown samples.

Example Barcoded Variant Library Counts

Posted on May 27, 2025 by kmatreyek

As part of the AVE-ETS, we had been discussing barcoded variant libraries. I don’t quite remember the context, but I think I suggested we look at some real sequencing data of a barcoded variant library, and I offered to dig up the PTEN VAMP-Seq library data. Of course, I had other things to do in the month following this statement so I didn’t look for those files until the long weekend immediately prior to the next meeting. My original plan was to just find this blog post [https://www.matreyeklab.com/simulating-sampling-during-recombination/1175/] and say we use that, but then I realized that those data tables were for variant frequencies, and not barcode counts. All of my old data from my postdoc was on flash drives in the office at work, but I didn’t feel like making the commute in just for that. I decided to try to find and re-process the raw data uploaded to GEO / SRA.

First, a number of steps that aren’t explicitly coded in this markdown file.

I downloaded the files from the relevant GEO sites. The first dataset from the NatGenet paper can be found here [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108727]. The second dataset from the Genome Med paper where we published on a “fill-in” library we created to reintroduce some missing variants can be found here [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE159469].
I counted the barcodes using the original method we had used, which was just having Enrich2 do it, with a minimum quality score filter of 30.
I imported the data into R to do some analyses here

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

theme_set(theme_bw())
theme_update(panel.grid.minor = element_blank())

first_1 <- read.delim(file = "Data/SRR6437841.tsv.gz", sep = "\t")
first_2 <- read.delim(file = "Data/SRR6437842.tsv.gz", sep = "\t")

first <- merge(first_1, first_2, by = "X")
first$count = rowMeans(first[,c("count.x","count.y")])

ggplot() + scale_x_log10() + #scale_y_log10() +
  geom_histogram(data = first %>% filter(count > 3), aes(x = count)) + geom_vline(xintercept = 100)

## As a rough but effective approach, I like to look at the histogram of read counts to identify the relative minimum between the population containing counts of 1 (largely erroneous barcodes from sequencing error) and the next non-zero population (assuming the sample was sequenced to enough depth). For this sample, since it was sequenced so deeply, this is around a count of 100.

first_filtered <- first %>% filter(count > 100)

#first_key <- read.delim(file = "Data/GSE108727_PTEN_barcodeInsertAssignments.tsv", sep = "\t", header = F)
first_key <- read.delim(file = "Data/first_key.tsv", sep = "\t", header = F)
colnames(first_key) <- c("X","variant")

first_df <- merge(first_filtered, first_key, by = "X", all.x = T)

First_lib_histogram <- ggplot() + scale_x_log10() + #scale_y_log10() +
  geom_histogram(data = first_df, aes(x = count), fill = "grey90", bins = 50) +
  geom_histogram(data = first_df %>% filter(!is.na(variant)), aes(x = count), bins = 50) + 
  geom_vline(xintercept = 100) + 
  labs(x = "Num of reads", y = "Count", title = "Lib1: Grey, all barcodes; Black, subassembled variants") +
  theme(panel.grid.minor = element_blank()) + 
  NULL; First_lib_histogram

ggsave(file = "Output/First_lib_histogram.pdf", First_lib_histogram, height = 4, width = 5)

Some notes for the above plot. The grey bars of the histogram denote counts for all barcodes that were observed. The black bars are barcodes that were linked to a particular PTEN coding variant via PacBio subassembly.

Now, let’s look at the next library.

second_1 <- read.delim(file = "Data/SRR12818211.tsv.gz", sep = "\t")
second_2 <- read.delim(file = "Data/SRR12818212.tsv.gz", sep = "\t")

second <- merge(second_1, second_2, by = "X")
second$count = rowMeans(second[,c("count.x","count.y")])

ggplot() + scale_x_log10() + #scale_y_log10() +
  geom_histogram(data = second %>% filter(count > 1), aes(x = count)) + geom_vline(xintercept = 10)

## As a rough but effective approach, I like to look at the histogram of read counts to identify the relative minimum between the population containing counts of 1 (largely erroneous barcodes from sequencing error) and the next non-zero population (assuming the sample was sequenced to enough depth). For this sample, it's around 10.

second_filtered <- second %>% filter(count > 10)

second_key <- read.delim(file = "Data/Second_key.tsv", sep = "\t", header = T)
colnames(second_key)[2] <- "variant"

second_df <- merge(second_filtered, second_key, by = "X", all.x = T)

Second_lib_histogram  <- ggplot() + scale_x_log10() + #scale_y_log10() +
  geom_histogram(data = second_df, aes(x = count), fill = "grey90") +
  geom_histogram(data = second_df %>% filter(!is.na(variant)), aes(x = count)) + 
  geom_vline(xintercept = 10) + # match the count > 10 filter used for this library
  labs(x = "Num of reads", y = "Count", title = "Lib2: Grey, all barcodes; Black, subassembled variants") +
  theme(panel.grid.minor = element_blank()) + 
  NULL; Second_lib_histogram

ggsave(file = "Output/Second_lib_histogram.pdf", Second_lib_histogram, height = 4, width = 5)

## I have no idea why it looks bimodal, btw. Probably a problem with library mixing.

write.table(file = "Output/First_df.tsv", first_df, quote = F, row.names = F)
write.table(file = "Output/Second_df.tsv", second_df, quote = F, row.names = F)

For anyone that wants to test this, the github repo for the above scripts and data can be found here: https://github.com/MatreyekLab/Barcodes
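For those more comfortable in Python, the core of the R steps above (inner-join the two replicates, average the counts, apply a read-count threshold, then left-join the barcode-variant key) can be sketched with plain dictionaries. The barcodes, counts, and variant labels below are toy placeholders, and the function names are hypothetical.

```python
# Hypothetical toy inputs: barcode -> read count for two sequencing replicates,
# and barcode -> variant from the subassembly key. Not real data.
rep1 = {"AAAC": 250, "CCGT": 90, "GGTA": 400, "CATG": 300, "TTAG": 1}
rep2 = {"AAAC": 150, "CCGT": 70, "GGTA": 600, "CATG": 320}
key = {"AAAC": "variant_1", "GGTA": "variant_2"}


def mean_counts(a, b):
    """Inner join on barcode, averaging the two replicate counts."""
    return {bc: (a[bc] + b[bc]) / 2 for bc in a.keys() & b.keys()}


def filter_and_annotate(counts, variant_key, min_count):
    """Keep barcodes with count > min_count; variant is None if unassigned."""
    return {
        bc: (n, variant_key.get(bc))
        for bc, n in counts.items()
        if n > min_count
    }


averaged = mean_counts(rep1, rep2)
annotated = filter_and_annotate(averaged, key, min_count=100)
for bc in sorted(annotated):
    print(bc, annotated[bc])
```

Barcodes that pass the threshold but have no variant (None here, NA in the R data frame) correspond to the grey-only portion of the histograms.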

Barcoding the epilepsy vector

Posted on May 13, 2025 by kmatreyek

We have a redesigned attB vector purposefully made to carry and barcode a bunch of epilepsy-associated membrane protein genes (and to a lesser extent, cytoplasmic and secreted protein genes). We’ll eventually need to make a number of barcoded libraries from it, so we’ve been figuring out the kinks of barcoding at a rare-cutter site. I’m realizing that if I want things to scale (whether it’s across labs, or even in the lab), it probably makes sense to make things as easy for the next person to pick up as possible. So, I’m showing my work in terms of how barcoding can be analyzed with this vector.

Alright, so to QC the barcoding vector, we’re trying two things: an initial Plasmidsaurus run (quick turnaround, hundreds to low thousands of reads, sufficient to estimate unbarcoded contamination), and then a submission to the Genewiz / Azenta (formerly Brooks Life Sciences) 2x250nt MiSeq Amp-EZ service (2 week turnaround, hundreds of thousands of reads, likely enough to fully analyze small barcoded libraries). I’ll talk about the Plasmidsaurus reads on another day.

Today is figuring out how many barcodes seem to exist per prep with the Amp-EZ data.

First was pairing the reads.

% pear -f BC-L062-G2_R1_001.fastq.gz -r BC-L062-G2_R2_001.fastq.gz -o L062_G2

Next was treating the sequence next to the barcode as adapters to identify the barcodes themselves.

% cutadapt -a ATAAGATCTGGTCCTCTGATCCGA...CTATCGGTAACGCATTCGCC -o G2_linked.fastq L062_G2.assembled.fastq

% sh Tally_sequences.sh G2_linked.fastq G2_linked_tally.csv

At this point, I have a csv file, which I imported into R since I'm more nimble there (obviously, one can do something similar in Python). If the adapters were indeed there, then the resulting read is returned as the 20nt sequence of the barcode. If the adapters weren't there (eg. if the plasmid was unbarcoded, or if the read had so many errors that the adapter wasn't identified), then it was returned as the full sequence. Thus, I can subset for reads that were 20nt (barcoded) vs those that were not (unbarcoded), resulting in the following histogram based on that designation.
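As a rough Python equivalent of that subsetting step: the sketch below assumes the tally has already been loaded as a mapping from trimmed sequence to read count (the actual column layout of the csv from Tally_sequences.sh is not shown here, so that is an assumption), and the sequences and counts are made up.

```python
BARCODE_LENGTH = 20

# Hypothetical tally: trimmed read sequence -> number of times it was seen.
tally = {
    "ACGTACGTACGTACGTACGT": 40,  # 20 nt: both adapters found, barcode returned
    "TTGCATTGCATTGCATTGCA": 35,
    "ACGTACGTACGTACGTACGA": 1,
    # Not 20 nt: adapters were not trimmed, so the read came back whole.
    "ATAAGATCTGGTCCTCTGATCCGACTATCGGTAACGCATTCGCC": 24,
}


def split_by_length(counts, barcode_length=BARCODE_LENGTH):
    """Split a tally into 20 nt (barcoded) and everything else (unbarcoded)."""
    barcoded = {s: n for s, n in counts.items() if len(s) == barcode_length}
    other = {s: n for s, n in counts.items() if len(s) != barcode_length}
    return barcoded, other


barcoded, other = split_by_length(tally)
total_reads = sum(tally.values())
fraction_barcoded = sum(barcoded.values()) / total_reads
print(f"{fraction_barcoded:.0%} of reads barcoded")
```

Note that the fractions are computed over reads (summed counts), not over unique sequences.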

For AEZ035_G1, 83% of the reads were barcoded, whereas 17% of reads were not. This included 11.5% of the reads being the unbarcoded template. While we can get a bit fancier with things like error-correcting the barcodes (eg. to account for sequencing error, which is presumed to produce low-count barcodes a small Hamming distance away from true barcodes with much higher counts), for today’s purpose, I’m just going to use a minimum threshold value, such as 7, to distinguish likely true barcodes from the likely erroneous non-barcodes in the “barcoded” subset. This yields a plot like so:
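The thresholding itself is a one-liner. Here is a small Python sketch with invented counts; whether the cutoff is inclusive (>=) or exclusive (>) is my assumption, as is the function name.

```python
MIN_COUNT = 7  # threshold value mentioned in the post

# Hypothetical counts for sequences already classified as 20 nt barcodes.
barcode_counts = {
    "ACGTACGTACGTACGTACGT": 40,
    "TTGCATTGCATTGCATTGCA": 35,
    "ACGTACGTACGTACGTACGA": 1,  # one mismatch from the first barcode
    "GGGTTTAAACCCGGGTTTAA": 2,
}


def likely_true_barcodes(counts, min_count=MIN_COUNT):
    """Return barcodes seen at least min_count times (>= is an assumption)."""
    return {bc for bc, n in counts.items() if n >= min_count}


kept = likely_true_barcodes(barcode_counts)
print(len(kept), "likely true barcodes")
```

A flat threshold like this drops the single-read, one-mismatch neighbor without needing any explicit error correction, which is the trade-off described above.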

Great, so based on this relatively simple analysis scheme, it seems to indicate that there are ~ 5,400 unique barcodes in this sample. We also repeated this process in another independently derived sample. What does that look like?

For this independent barcoding attempt, we got 92% of all reads being barcoded reads, and the remaining 8% something else (with 3.5% being clearly unbarcoded plasmid).

Based on the same analysis scheme, there are ~ 5,100 unique barcodes in this sample. So despite the slight difference in the fraction of unbarcoded reads between the two preps made with Nidhi’s Gibson barcoding protocol, the total number of unique barcodes seemed to be similar.

Finally, since we have two independent barcoding attempts with a highly diverse oligo (N20 – so a potential diversity of 1.1 trillion barcodes), we would expect to get very little overlap in barcodes between these two datasets. Does that actually play out? Well, here are the actual results: G1 barcoded library only: 5439, G2 barcoded library only: 5095, and the number of barcodes found in both: 34. So, pretty non-overlapping… so that’s good! The vast majority of these overlapping barcodes had reasonably high counts in the G2 library, but counts barely above the threshold filter (9, 10, 11, 12) in the G1 library. Thus, these are likely due to sequencing errors, which can be dealt with either by using a more stringent threshold, or by using a more involved barcode error-correcting scheme (perhaps to come in a future blog post).
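To put numbers on that expectation, here is a short Python sketch. It computes the N20 diversity (4^20, about 1.1 trillion) and a back-of-envelope estimate of how many barcodes two preps of this size would share purely by chance, assuming every N20 sequence is equally likely (an idealization, since synthesized oligo pools are not perfectly uniform). The set comparison at the end uses toy barcodes just to show the operation.

```python
# Potential diversity of a fully random 20 nt barcode (4 bases per position).
n20_diversity = 4 ** 20

# Totals implied by the numbers in the post: "only" counts plus the shared ones.
g1_total = 5439 + 34
g2_total = 5095 + 34

# Assumption: barcodes are drawn uniformly at random from all N20 sequences.
expected_shared_by_chance = g1_total * g2_total / n20_diversity


def overlap(a, b):
    """Return (only in a, only in b, shared) for two sets of barcodes."""
    return a - b, b - a, a & b


# Toy sets just to show the call; real inputs would be the filtered barcode sets.
only_g1, only_g2, shared = overlap({"AAAA", "CCCC", "GGGG"}, {"GGGG", "TTTT"})
print(f"{n20_diversity:,} possible barcodes")
print(f"expected chance overlap: {expected_shared_by_chance:.1e}")
print(len(only_g1), len(only_g2), len(shared))
```

Under that uniform assumption the expected chance overlap is far below a single barcode, so the 34 shared barcodes are unlikely to be random collisions, which fits with them needing some other explanation.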

Plasmidsaurus decision tree

Posted on March 25, 2025 (updated March 28, 2025) by kmatreyek

Plasmidsaurus whole-plasmid nanopore sequencing is a fantastic service. As of today (3/25/25), we’ve sent 1357 samples to them(!!). And they’ve essentially been worth every penny. But there are a bunch of different reasons to use them, and I wanted to make sure everyone in the lab was on the same page in terms of reasons that are well justified vs those that are more arguable, so I made the following decision tree (this also codifies the lab policy of sequencing plasmids from unverified sources before working with them). For anyone in the lab: let me know if you think we should make any specific changes to it (I tried to remember what we discussed in lab meeting, but I may have forgotten something).

Pymol figures

Posted on January 8, 2025 (updated January 18, 2025) by kmatreyek

I end up having to Google search the relevant commands every time I need to make publication-quality figures in Pymol, so I’m just going to note them here to save myself some time.

set bg_rgb, white

This sets the background to white. Usually a safe bet for any image meant for a normal presentation (ie. slides with white backgrounds), or a publication (where it’s text and images against a white page).

set ray_trace_mode, 1

This allows it to have the “black outline” representations, which I think look a little nicer than the default.

ray [Some number, usually between 900 and 1500]

This makes a static fully-rendered image that is much higher quality than the fast render in the interactive interface.

See above for an example of what a resulting image looks like.

When iCasp9 doesn’t kill

Posted on November 11, 2024 (updated November 12, 2024) by kmatreyek

iCasp9 as a negative selection cassette is amazing. Targeted protein dimerization with AP1903 / Rimiducid is super clean and potent, and the speed of its effect is a cell culturist’s dream (cells floating off the plate in 2 hrs!). It really works.

But when there are enough datapoints, sometimes it doesn’t. I have three recorded instances of email discussions with people that have mentioned it not working in their cells. First was Jeff in Nov 2020 with MEFs. Then Ben in June 2021 with K562s. And Vahid in July 2021 with different MEFs. Very well possible there’s one or two more in there I missed with my search terms.

Reading those emails, it’s clear that I had already put some thought into this (even if I can’t remember doing so), so I may as well copy-paste what some of them were:

1) Could iCasp9 not work in murine cells, due to the potential species-based sequence differences in downstream targets? Answer seems to be no, as a quick Google search yields a paper that says “Moreover, recent studies demonstrated that iPSCs of various origin including murine, non-human primate and human cells, effectively undergo apoptosis upon the induction of iCasp9 by [Rimiducid].” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7177583/

Separately, after the K562s (human-derived cells) came into the picture:

This is actually the second time this subject has come up for me this week; earlier, I had a collaborator working in MEF cells note that they were seeing slightly increased but still quite incomplete cell death. That really made me start thinking about the mechanism of iCasp9-based killing, which is chemical dimerization and activation of caspase 9, which then presumably cleaves caspases 3 and 7 to then start cleaving the actual targets actually causing apoptosis. So this is really starting to make me think / realize that perhaps those downstream switches aren’t always available to be turned on, depending on the cellular context. In their case, I wondered whether the human caspase 9 may not recognize the binding / substrate motif in murine caspase 3 or 7. In yours, perhaps K562’s are deficient in one (or both?) of those downstream caspases?

Now for the most recent time, which happened in the lab rather than by email: It was recently brought up that there is a particular landing pad line (HEK293T G417A1) which we sometimes use, that apparently has poor negative selection. John and another student each separately noticed it. Just so I could see it in a controlled, side-by-side experiment, I asked John if he’d be willing to do that experiment, and the effect was convincing.

So after enough attempts and inadvertently collecting datapoints, we see the cases where things did not go the way we expected. Perhaps all of these cases share a common underlying mechanism, or perhaps they all have unique ones; we probably won’t ever know. But there are also some potentially interesting perspective shifts (eg. a tool existing only for a singular practical purpose morphing into a potential biological readout), along with the practical implications (ie. if you are having issues with negative selection, you are not alone).

This is the post I will refer people to when they ask about this phenomenon (or what cell types they may wish to avoid if they want to use this feature).

Spec comparisons

Posted on March 27, 2024 by kmatreyek

Well, I was going to talk about some of these experiments during lab meeting, but why make a Powerpoint or Google Colab link people won’t follow when I can write it as a blog post?

Regardless, we’ve recently been looking at how our various possible methods of spectrophotometry compare.

“Amazon Spec” purchased for $235 back in November 2020.
ThermoFisher Nanodrop in a departmental common room (I don’t actually know what kind as I’ve never used it)
BioTek Synergy plate reader, either A) with 200uL of bacteria pipetted into a flat-bottom 96-well plate, or B) using their “BioCell”, which is a $290 cuvette that fits onto one of their adapter plates. I mistakenly labeled this one as “BioCube” in the plots, but they probably should have just named it that in the first place so I don’t feel too bad.

To test the methods, Olivia sampled bacterial optical densities while a batch of E. coli were growing out to make competent cells. Thus, the different densities in the subsequent data will correspond to different timepoints of the same culture growing out. Each time point was measured with all four methods.

Well, all of the methods correlated pretty well, so no method was intrinsically problematic. I’m not sure if it’s due to the settings for some automated calculation of absorbance values, but the BioCell numbers were off by an order of magnitude (the BioCell data also had a clear outlier). The Amazon spec and Nanodrop generally gave similar values, although the Nanodrop gave slightly higher numbers, comparatively.

The plate reader option was also perfectly fine, although it required more back-end math to convert the absorbance values to actual optical density. This is also not the raw data, as the media only absorbance has to be collected and subtracted to yield the below graph.

Rather than try to figure out the path length and try to calculate the formula, I just used the above dataset to create a calibration for “nanodrop-esque optical density”. (Note: There was a second independently collected set of data I added for this analysis). Here, the goal was to actually use the raw values from the plate reader output, so people could do the conversion calculation on the fly.

Say you have a particular nanodrop-esque A600 of 0.5 in mind. The formula to convert to plate reader units is 0.524 * [nanodrop value] + 0.123, or in this case, 0.385. Checks out with the linear model line shown above.

Or, if you already have raw plate reader values and want to convert to nanodrop-esque values, the formula here is 1.79 * [biotek value] – 0.2 to get the converted value. Here, let’s pretend we have an absorbance value of 0.3, which calculates to a nanodrop-esque value of about 0.34. So perhaps that’s a decent raw plate reader value to get with nearly grown bacterial cultures during the chemically competent cell generation process.
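For doing the conversion on the fly, the two formulas can be wrapped as small Python helpers. The coefficients are the rounded ones quoted above; the function names are hypothetical. Plugging in the rounded coefficients gives 0.385 for a nanodrop-esque 0.5, and roughly 0.337 for a raw plate reader value of 0.3.

```python
def nanodrop_to_plate_reader(a600_nanodrop):
    """Expected raw plate reader value for a given nanodrop-esque A600."""
    return 0.524 * a600_nanodrop + 0.123


def plate_reader_to_nanodrop(a600_plate):
    """Nanodrop-esque A600 estimated from a raw plate reader value."""
    return 1.79 * a600_plate - 0.2


print(round(nanodrop_to_plate_reader(0.5), 3))
print(round(plate_reader_to_nanodrop(0.3), 3))
```

The two formulas come from separate fits rather than one being the algebraic inverse of the other, so a round trip through both will not return exactly the starting value.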

Lastly, it’s worth noting how surprisingly large the dynamic range seems to be for spec readings of bacterial cultures. Likely because we mostly handle either mid-to-late log phase growth or saturated / stationary cultures, we’re used to dealing with values in the 0.2 to 1.2 range, although the log-scale plots above suggest that we can detect cultures reasonably well within the 0.01 to 0.1 range as well.

HEK cell small molecule toxicities

Posted on December 19, 2023 by kmatreyek

I’ve now done a *bunch* of kill curves with HEK cells in various forms (WT HEK cells, single or double landing pad cells). Here’s a compendium of observed toxicity of serial dilutions of various small molecules in HEK cells not engineered to be resistant in any way. (This is mostly for my own reference, when I’m in the TC room and need to check on some optimal concentrations).

Quantitating DNA via the plate reader

Posted on July 1, 2023 (updated July 3, 2023) by kmatreyek

As another installment of the “demoing the plate reader” series, we’re also trying to see how well it quantitates DNA. Probably good that I’m actively involved in this process, as it’s making me learn some details about some of the existing and potential assays regularly used by the lab. Well, today, we’re going to look a bit at DNA binding dyes for quantitating DNA.

We have a Qubit in the lab, which is like a super-simplified plate reader: it only reads a single sample at a time, it only does two fluorescent colors, and the interface is made very streamlined / foolproof, but in that way loses a bunch of customizability. Anyway, we started there, making a serial dilution of a DNA of known high concentration and running it on the Qubit (200uL recommended volume). We then took that same sample and ran it on the plate reader. This is what the data looked like:

So pretty decent linearity of fluorescence between 1ug and 1ng. Both the Qubit and plate reader gave very comparable results. Qubit readings are missing at the upper end of the range since the Qubit said the sample was over the upper limit of the assay (and thus would not return a value). This is reasonable, as you can see by the plate reader values that it is above the linear response of the assay.

So what if you instead run the same sample as 2uL on a microvolume plate instead of 200uL in a standard 96-well plate? Well, you get the above result in purple. The data is a bit more variable, which probably makes sense with the 100-fold difference in volume used. Also seems the sensitivity of the assay decreased some, in that the results became unreliable around 10ng instead of the 1ng for the full volume in plate format, although again, I think that makes sense with there being less sample to take up and give out light.

7/3/23 update:

Just tried AccuGreen from Biotium. Worked like a charm. They suggest mixing in ~200uL of DNA dye reagent (in this case, to 1uL of DNA already pipetted in), but I tried 100uL and 50uL as well, and if anything, 100uL arguably even looked the best.

Also, I just used the same plate as Olivia had run the samples on Friday. So in running these new samples, I ended up re-measuring those same samples 3 days later. And, well, it looked quite similar. So the samples for reading are quite stable, even after sitting on the bench for 3 days.

Oh, and lastly, this is what it looked like doing absorbance on 2uL samples on the Take3 plate. Looks perfectly fine, although as expected, the sensitivity isn’t nearly as good as with AccuGreen. That said, you do get more linearity at really high concentrations (4000 ng/uL), so that’s kind of nice.

Quantitating cells via the plate reader

Posted on June 29, 2023 (updated July 1, 2023) by kmatreyek

In moving down the floor and having some more space, we’ll be purchasing a plate reader. Based on previous experience / history, I’ve started by testing the BioTek Synergy H1. We’re going to put it through a number of the more traditional paces (such as DNA and protein quantitation), but we clearly have a bunch of cell-based experiments and potential assays that are worth testing on it. Since I’m curious about some of the possibilities / limitations here, I’ve taken a pretty active role in testing things out.

For these tests, I essentially did a serial dilution of various fluorescent protein expressing cells, to assess what the dynamic range of detection could be. Here, I did half log dilutions of the cells in a 96-well plate, starting with 75,000 cells (in 150uL) for the “highest” sample, and doing 6 serial dilutions from there down to 75 cells per well, with a final row with media only to tell us background fluorescence. Everything was plated in triplicate, with analyses done on the average.
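As a quick check of the plate layout arithmetic: a half-log step is a 10^0.5 (about 3.16-fold) dilution, so six steps is a 1000-fold total dilution, which takes 75,000 cells down to 75. A minimal Python sketch (the function name is hypothetical):

```python
import math


def half_log_series(start, steps):
    """Cell numbers for a starting well plus `steps` half-log dilutions."""
    factor = math.sqrt(10)  # one half-log step, ~3.16-fold
    return [start / factor ** i for i in range(steps + 1)]


cells_per_well = half_log_series(75_000, 6)
for n in cells_per_well:
    print(round(n))
```

This gives seven cell-containing rows, with the media-only row as the eighth.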

For the purposes of not having a ton of graphs, I’m only going to show the background subtracted graphs, where I’ve created a linear model based on the perceived area of dynamic range, and denote black dots showing datapoints that linear model predicts, to see how well it corresponds to the actual data (in color).

The above shows serial dilutions of high UnaG-expressing landing pad cells. Seems like a decent linear range of cell number-dependent fluorescence there, where we can see perhaps a little more than two orders of magnitude in predictable range. Of course, this is the easiest case to measure, but also one of the least relevant to most of our experiments.

On the other hand, this is a situation that is far more relevant to most of our engineered cell lines. Here, it’s the same serial dilutions of cells, except we’re looking for histone 2A -fused mCherry. The signal here is going to be lower for three reasons: it’s behind an IRES instead of cap-dependent translation, red fluors are typically less bright than green fluors, and the histone fusion limits the amount of fluorescence to the nuclei, so the per-cell amount of fluorescence is decreased. Here, the assay had roughly a 20-fold range down from a confluent well. Maybe useful for some experiments quantitating effects of some perturbation on cell growth / survival (without exogenous indicators!) but obviously can’t expect to reliably quantitate more than that 20-fold effect.

Those are direct cell counting assays (or, well, relative counts based on total fluorescence), but what about enzymatic assays, such as common cell counting / titering assays based on dye conversion? I could certainly see these assays potentially having more sensitivity due to a “multiplier” effect through that enzymatic activity. Well, this is what it looks like for Resazurin / Alamar Blue / Cell Titer Blue.

The above is based on Resazurin fluorescence, where we have a pretty decent 2-log range, although even with these conditions, the confluent wells already saturated dye conversion and lost linearity.

Here, I’m testing a different cell titering dye, CCK-8 / WST-8, which only uses absorbance (A460) as its readout. Here, the linear range of the assay seemed to be a lot less; roughly 1 order of magnitude. I’m not showing resazurin conversion absorbance (A570, A600) here, but it looked pretty similar in overall range to CCK-8.

How about timing for the Resazurin and CCK-8 assays? Well, here is what it looks like… Note, I didn’t do the same linear modeling, b/c I got lazy, but you can still tell what may be informative by eye:

So clearly, we lose accuracy at the top end but gain sensitivity at the bottom end by having the reaction run overnight. This is looking at fluorescence. How about absorbance? That’s below:

Same shift toward increased sensitivity for looking at fewer cells, but same dynamic range window, so we lose accuracy in the more confluent wells as it shifts.

Here, the overnight incubation really didn’t do anything. Same linear range, really.