Designing Amplicon-EZ primers

Amplicon-EZ is a pretty convenient service from Genewiz. In short, they’ll perform Illumina sequencing on a 150-500 nt DNA fragment you send them (they’ll perform 2 x 250 cycles of sequencing, so the paired reads will overlap for fragments smaller than 500 nt). For $50 per sample, they’ll return ~50,000 reads (although in our experience, they tend to return more than this). Turnaround times can be kind of slow (while you can minimize the delay if you time things perfectly, it’s taken between 14 and 19 days for us to get data back following submission). That said, we’re still only running full kits a couple of times a year, so it’s obviously a lot faster than waiting for one of those. Thus, it’s definitely good for getting an initial look into something you may want to sequence more deeply later. My general policy for the lab is that if you make any library, it’s worth submitting it for Illumina sequencing via Amp-EZ pretty early on, so you can be confident that the library is good and worthy of further experiments.

Designing primers

Primers are pretty simple to design. Essentially, you’ll want to make a pair of PCR primers with Amp-EZ adapters on the 5′ ends (and of course, DNA hybridizing sequences on the 3′ ends). As shown in the above link, the adapter sequences are:

For the forward sequencing read: 5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’

For the reverse sequencing read: 5’-GACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’

Here’s an example of a map and corresponding annotated primers used to sequence the ACE2 Kozak library plasmid, and a similar map for sequencing the library after it’s been integrated into landing pad cells.
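If it helps to see the primer layout spelled out, here’s a minimal Python sketch that tacks the two adapters onto the 5′ ends of a primer pair. The adapter sequences are the ones listed above; everything else (the function names, the placeholder template, and the fixed 20 nt annealing length) is just an assumption for illustration, and real primers should still be checked for Tm and specificity.

```python
# Amp-EZ adapter sequences as listed above (written 5' to 3').
FWD_ADAPTER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
REV_ADAPTER = "GACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amp_ez_primers(template: str, start: int, end: int, anneal_len: int = 20):
    """Build a primer pair amplifying template[start:end] (top strand, 0-based).

    anneal_len is a hypothetical fixed length for the hybridizing 3' portion.
    """
    amplicon = template[start:end].upper()
    insert_len = len(amplicon)
    # The service described above takes 150-500 nt fragments.
    if not 150 <= insert_len <= 500:
        raise ValueError(f"Amplicon is {insert_len} nt; expected 150-500 nt")
    fwd = FWD_ADAPTER + amplicon[:anneal_len]
    rev = REV_ADAPTER + reverse_complement(amplicon[-anneal_len:])
    return fwd, rev, insert_len


# Placeholder template (a made-up repeat, not a real plasmid sequence).
template = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACC" * 10
fwd_primer, rev_primer, insert_len = amp_ez_primers(template, 0, 300)
print(fwd_primer)
print(rev_primer)
print(insert_len)
```

Note that the length check here is on the template-derived portion only; whether the 150-500 nt window is meant to include the adapters is something to confirm with the service.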

Amplifying your fragment

For the actual protocols to do this, it’s probably worth asking Sarah or Nidhi how they do it. The basic steps are going to be PCR, gel extraction of the band, and Qubit quantitation of the extracted DNA. One thing to keep in mind is that they’ll want a fair amount of DNA (500 ng), so you’ll either want to do a lot of cycles and extract a pretty hefty band, or you’ll need to do a second amplification from the initially extracted DNA.

Cell surface localization assay

About the same time I got inspired to try making the vesicular protease assay, I figured I’d also build an assay to try to look at cell-surface localization of proteins.

Going back to that post about dL5 and malachite green (MG), I believe that MG doesn’t cross the plasma membrane particularly well (although now that I look at it, there are ways to both increase and decrease its cell permeability). Thus, I figured I’d test the dL5 fluorogen activating peptide in two forms: one where it’s sent out to the cell surface using a signal peptide (but anchored to the plasma membrane with a transmembrane domain), and another where it’s expressed as an intracellular, cytoplasmic protein.

Well, we made the constructs, and Sarah recombined the cells and tested them, and here are the results.

Ignore the blue distribution for now (since this is a construct testing a different hypothesis), and only look at the red and orange distributions. As you can tell, in the absence of MG, the signal is pretty low (I should probably throw in some unrecombined landing pad cells on that plot to show what the background level is). On the other hand, MG addition causes cells encoding the extracellular dL5 to exhibit ~ 3e4 near infrared MFI, while the intracellular dL5 cells had about 1e4 MFI. So while that’s only a 3-fold difference, the standard deviations of those distributions were pretty tight, such that there was rather small overlap between the two. So while this is a single, one-off experiment, it looks like this assay format may work.

There are probably some additional knobs that can be turned to try to improve signal over background. First, maybe this effect is somewhat MG concentration dependent, and reducing the amount of MG that is added may add some more dynamic range. Also, there are those less cell permeant MG derivatives, which will likely improve the range (albeit, these are likely far harder to get than OG MG).

Vesicular protease cleavage assay

Obviously we make a lot of recombinant DNA constructs and create a bunch of different assays to try to understand biology. Well, at some point I became curious if I could come up with a protease cleavage assay for measuring how well various peptide sequences serve as substrates for vesicular proteases inside cells. I figured we’d start to test this by looking at furin cleavage, since 1) furin is quite ubiquitous, 2) furin cleavage is relevant for multiple disease-relevant membrane proteins, like many viral entry glycoproteins, and 3) this includes SARS-CoV-2 spike, where the furin cleavage site is thought to contribute to its dynamics of spread and pathogenicity during infection.

Well, the first construct that may have worked is a pretty simple one, where I have EGFP targeted to extracellular space using an N-terminal signal peptide, but retained on the cell surface by having it C-terminally fused to a transmembrane domain. Anh has actually been using this construct as part of his undergraduate research project. Well, to turn this construct into one that can study furin cleavage, I modified it to encode an R-R-A-R peptide between EGFP and the transmembrane domain. Thus, if furin cleaves this construct, the EGFP protein is no longer tethered to the cell, and presumably escapes into the media once it reaches the cell surface.

So this is an N=1 experiment so far, so it’s not the most conclusive. That said, there does seem to be decent separation between the constructs with and without the furin cleavage site, where cells with the construct whose EGFP potentially gets released had ~ 10-fold less fluorescence than cells with the construct whose EGFP could not get released. I suppose if this reproduces, it could be worth trying to turn this into a library-based experiment for studying protease cleavage within intracellular compartments.

… Though now that I’m thinking about it further, I’ll definitely need to ask how these cells were prepped. If they were prepped with trypsin (which is likely), this was likely more of a trypsin cutting assay rather than a furin one.

Illumina Library Mixing

It’s taken us a while, but at this point we’ve now submitted two multiplex NextSeq Illumina kits through CWRU genomics. We’ve performed the multiplexing by appending dual indices during targeted amplification of the barcodes. In order to mix everything together, we’ve taken a rather simple approach of gel extracting the amplified bands and quantitating the amount of extracted DNA using Qubit, and mixing every extracted amplicon to equimolar amounts. This mixture is then submitted to the CWRU genomics core for qPCR-based quantification, and loaded onto the kit for sequencing.
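For anyone who hasn’t done the equimolar mixing math before, here’s a rough Python sketch of it. It uses the common approximation of ~660 g/mol per base pair of double-stranded DNA to turn a Qubit reading (ng/µL) into a molar concentration; the sample names, concentrations, amplicon lengths, and the 100 fmol target are all made-up placeholders, not our actual numbers.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(samples: dict, fmol_each: float) -> dict:
    """Return the volume (uL) of each sample that contains fmol_each of DNA.

    samples maps a sample name to (Qubit ng/uL, amplicon length in bp).
    Since 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in samples.items()
    }


# Hypothetical Qubit readings and amplicon lengths, for illustration only.
samples = {
    "sample_A": (10.0, 300),
    "sample_B": (4.2, 300),
    "sample_C": (25.0, 450),
}
volumes = equimolar_volumes(samples, fmol_each=100.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

The more dilute a sample is, the more volume it contributes, so in practice the lowest-concentration sample tends to set how many fmol you can afford to take from each.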

Since everything has been mixed to equimolar amounts so far, it’s quite simple to see how well our samples were quantitated and mixed together using the Qubit readings. This is done by taking the number of paired reads associated with each pair of indices, and seeing how their counts / frequencies compare to each other. The read counts tended to follow a log-normal distribution, as seen in the below plots:

The second kit seemed to do a little better than the first one. Each kit ended up having a single poorly sequenced outlier, with the sample in the first kit lower than the median by ~300-fold, and the sample in the second kit lower by ~10-fold. The first kit also had about 4 samples that were below the median by about 10-fold.

Regardless, this log-normal distribution is showing what kind of mixing precision we can expect to consistently get with this scheme, even when it’s working well. Both distributions had a sd(log) of ~0.5. The coefficients of variation were ~ 0.04.

What are the red lines you ask? Well, those are the idealized / hypothetical number of reads we should have gotten if we got all of the reads from the kit (130 million for the mid kit used in kit 1, and 400 million for the high kit used in kit 2) divided by the total number of samples. Clearly, the real life read numbers aren’t hitting that idealized number, which is as expected.

To frame it another way though: sure, sequencing a sample with too much read-depth is inefficient, but what is arguably more annoying is if we end up under-reading a sample because we were off in our estimates. What seems like a reasonable approach is to choose a target read-count that will ensure that ~95% of our log-normal distribution hits the minimum read-number needed to get interpretable data. Looking at these n = 2 results, it seems reasonable that we would want to give each sample ~ 5 to 10-fold more reads than one would expect based on idealized numbers.
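Here’s a back-of-the-envelope Python sketch of that reasoning. It assumes the sd(log) of ~0.5 is in log10 units and that per-sample read counts are log-normal around the per-sample target (both are assumptions on my part), and then asks how far above the minimum the target needs to sit so that ~95% of samples still clear it. The minimum read number and sample count in the example are hypothetical.

```python
from statistics import NormalDist


def ideal_reads_per_sample(total_reads: float, n_samples: int) -> float:
    """Idealized reads per sample if the kit output were split perfectly evenly."""
    return total_reads / n_samples


def oversampling_factor(sd_log10: float, coverage: float = 0.95) -> float:
    """Fold-excess over the minimum needed so that `coverage` of samples clear it.

    Assumes log10(read count) is normally distributed around the target
    with standard deviation sd_log10.
    """
    z = NormalDist().inv_cdf(coverage)  # one-sided z-score, ~1.645 for 95%
    return 10 ** (z * sd_log10)


factor = oversampling_factor(0.5)
print(f"Aim for ~{factor:.1f}-fold more reads than the minimum needed")

# Hypothetical planning example: 100 samples on the 400 million read kit,
# each needing at least 500,000 reads to be interpretable.
ideal = ideal_reads_per_sample(400e6, 100)
target = 500_000 * factor
print(f"Idealized reads per sample: {ideal:,.0f}")
print(f"Target reads per sample:    {target:,.0f}")
```

Under those assumptions the factor comes out to about 6.6-fold, which sits inside the ~5 to 10-fold range eyeballed from the plots.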



Generating new indices for multiplex Illumina sequencing

Nidhi recently needed to design new reverse primers for making dual-indexed amplicons for Illumina sequencing, but this involved generating novel indexes that could be used with existing primer sets. Luckily, I had already asked Anh to create a script for such a purpose, so once I remembered that he had already done that, all I had to do was find it and implement it. It’s pretty neat, since it goes to a Google Sheet in the cloud (so not a local file), and extracts the existing indices so we know to avoid things similar to them. Since it was a pretty easy script to understand, I ended up making some tweaks to it: 1) Adding a user input function, rather than hard-coding the number of new indices needed into the script itself, 2) Having the script consider the forward and reverse sequences of the existing indices to avoid both, and 3) spitting out an identity matrix of the pairwise comparisons of the existing 10 nucleotide indices, so I can visually verify that none are really close in sequence.

The script lives here in this GitHub repo, with the newest version I modified called “generateIndex_user_input.py”.

To be specific, rather than an identity matrix, it’s actually a distance matrix (how many positions within the 10 nucleotides are NOT identical). The white diagonal in the above plot is essentially a positive control, since it shows that in the pairwise comparisons, the same sequence compared to itself has 0 nucleotides of difference. Notably, there are no other purely white blocks in that matrix, partially because 1) the chance that two randomly generated sequences are identical is ~ 1/(4^10), which is a very small number (roughly 1 in a million), and 2) Anh’s script is likely working, and keeping the number of identical matches between any newly generated indices and existing indices to a minimum. Actually, it looks like most pairs differ by 7 or 8 nucleotides, and only a tiny fraction differ by just 3 nucleotides or so (and thus match at 7 positions). If those are the combinations with the highest similarity, it really isn’t bad at all.
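To make the logic concrete, here’s a stripped-down Python sketch of the same idea. To be clear, this is NOT Anh’s script (it doesn’t read the Google Sheet, for one): the existing indices below are made-up placeholders, the minimum distance of 3 is an arbitrary choice, and I’m assuming “reverse” means the reverse complement.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def generate_indices(existing, n_new, length=10, min_dist=3, seed=None,
                     max_tries=100_000):
    """Randomly draw n_new indices that stay at least min_dist mismatches away
    from every existing index, its reverse complement, and each other."""
    rng = random.Random(seed)
    avoid = list(existing) + [reverse_complement(s) for s in existing]
    new = []
    for _ in range(max_tries):
        if len(new) == n_new:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, s) >= min_dist for s in avoid + new):
            new.append(candidate)
    if len(new) < n_new:
        raise RuntimeError("Could not find enough indices; relax min_dist")
    return new


def distance_matrix(seqs):
    """Pairwise Hamming distances; the diagonal is always 0."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


# Placeholder "existing" indices, not the lab's real ones.
existing = ["ACGTACGTAC", "TTGACCGATA", "GGCATTACGA"]
new_indices = generate_indices(existing, n_new=4, seed=1)
matrix = distance_matrix(existing + new_indices)
for seq, row in zip(existing + new_indices, matrix):
    print(seq, row)

# Chance that two random 10-mers are identical.
print(1 / 4 ** 10)
```

Plotting `matrix` as a heatmap would give the same kind of picture as above, with the zero-distance diagonal as the built-in positive control.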

So ya, I think we’re doing a decent job of managing our indices to make sure they are not overlapping!

Fluorogen activating peptides

So I had collected this data over 10 months ago, when I made an initial foray into trying the fluorogen activating peptide dL5 in the presence of Malachite Green, a dirt cheap small molecule (you can use it to try to kill fungus eating at the fish in your aquarium). A recent email reminded me about fluorogen activating peptides, so I went back and looked at this data again, and it really isn’t bad.

In summary: 1) Works well, giving a ~ 100-fold increase in fluorescence over background when highly expressed. 2) Probably want to use at least 500 ng/mL or perhaps even 1 µg/mL to improve signal over background. 3) Still get specific signal after washout, but you may as well leave it on and get max signal.

Posters

So a couple of the trainees in the lab are preparing posters for the Dept retreat. In helping them (and other future trainees) get started, I dug out some of my own digital versions of posters out of the digital storage closet, and put them in a “Poster_examples” directory on the lab google drive. But along with those examples, here are a series of tips that I’ve learned over the years, which I’m remembering and committing to writing in this process.

  1. Be cognizant of the size of your poster. I think the most common dimensions are 36″ high by 48″ wide. For some conferences, they will allow you to make larger posters (probably depends on what type of poster boards they have). But also, different poster printers may have different dimensions. I’m sure most do 36″ or 48″ as one of their standard sizes, though you also want to figure out sizes your poster printer will do as one of its standard dimensions.
  2. Know how long it takes to print the poster. There are logistics to everything. Where are you going to get the poster printed? How far ahead of the guaranteed printing deadline do they need the submission? Again, it’s worth knowing this up front so you’re not scrambling at the end. Also, getting started a little bit early can definitely save you some misery later on.
  3. Plan for column-based viewing rather than rows. IMO, it makes much more sense to tell the story in a series of columns, where the viewer starts on the left of the poster, reads down the column, takes a step to their right, reads down the column, and so forth 3 or so times until they’ve seen the whole poster. Probably generally easier to read this way rather than horizontally through rows, and probably the ONLY way to see / go through a poster when it’s really crowded.
  4. Plan out how you’re going to talk about your poster as you’re making it. What normally happens in a poster session is that you stand around all awkward and lonely in front of your poster until someone comes by that seems interested, and then you’ll introduce yourself and offer to explain it (and you can ask them who they are and what their background is so you can figure out what parts they may be particularly interested in or may struggle in understanding). Then you’ll explain your story, starting at the top left of your poster and winding down to the bottom right. Now, what would really help in telling this story is if the things on your poster actually illustrate what you’re trying to say. So, think about what exactly is the story you’re telling as you’re making the poster, so it’s all there when you need it.
  5. Always start by presenting the problem. It’s easy to get sucked into just saying what you did. But nobody is going to understand what you did until you tell them WHY you’re doing it. Which means there must be a problem presented, and your data is part of the solution in answering it. So if you want people to be on board, you definitely want to spell out the problem, and why it’s so important, up front.
  6. Do a light (white) background. The backing paper is going to be white. So why force them to cover the entire surface of the paper with ink?
  7. Consistency in fonts / font sizes. This is one of the hard ones. If you’re essentially making a scrapbook of different figures from different sources, it may be hard to get all of the fonts from the figures. In those cases, you may need to crop out the existing labels / text and add in your own of known / controllable size. Also, this is a good reason to always use the same fonts (eg. Arial) in all of your figures.
  8. Text size hierarchy. I hadn’t really thought about this before, but I think this makes sense: For the main text of the poster, it probably makes sense to have about 2 different text sizes: one for the main narrative text you want the reader to read, and a smaller size for text that holds some detailed information (like a figure legend). The larger one is always meant to be read, while the smaller one is only to be read when necessary. The larger one should be readable from a good couple feet away, so it’s definitely gotta be >= size 20 or something. Headers / section titles may also make sense, and you can denote those using bold or by using a slightly larger text size. Of course, this is all not counting the poster title, which should be large and visible from across the room. That said, you don’t want to overcomplicate things by having like 5 different font sizes.
  9. Not stretching / squishing figures. Nobody likes seeing images in weird proportions; sure, maybe it works with mirrors in a circus funhouse, but it definitely doesn’t work for conveying scientific information in a professional setting. So ya, make sure that if you’re adjusting a figure’s size, that you have the “height and width proportionality” box checked as you do it.
  10. Avoid having too much text. Nobody is at the poster to read the next great American novel. You’ll literally want the bare bones amount of text that you need to give just about all of the big structural information you need to. Everything else, like all of the details, you can convey in person.
  11. Don’t make it too crowded. Again, you want it to be inviting and easy to follow, so you want to give figures ample space to breathe. I’d also say don’t make it too sparse, but while it’s technically possible, I can’t imagine any serious poster has ever had that be an issue (eg. too much stuff to want to get in there).
  12. Consider expanding your illustrative palette. If you’re on your first few posters, it totally makes sense to use Powerpoint (or equivalent) to create your poster. That said, eventually, you may want to include other options. I’ve slowly shifted to using Inkscape, but partially since I’m much better / faster in Inkscape now than I am in Powerpoint. Also, BioRender is a great option for making some really aesthetically pleasing images / graphics relatively quickly with an easy-to-use interface. At worst, I suggest you generate some plots there, and just take screenshots that you can insert into your posters made in Powerpoint.

Also, I realize some of my own posters probably break some of these rules (eg. the “Avoid having too much text” rule). Nobody’s perfect!

Addendum:

Nisha just did the legwork on figuring out how we print posters here, and this is what she found out:
– FedEx office in Thwing (9am-5pm Monday thru Friday).
– General size that they work with is 36″x48″ , so that might be what you want your poster size to be.
– You can have it on a usb drive, or email it to them directly at [email protected]
– If submitted early enough in the day, they may be able to do same day turnaround, but better to submit 2 to 3 days before you need it just in case.
– Will update this once we know how payment works exactly, but I suggest showing up knowing one of the lab speedtypes to charge directly to one of them (R35 is a good candidate, but can always use the startup speedtype if needed).

Inverse PCR SSM Primer Python Script

Time to make our first concerted effort in making a small site-saturation mutagenesis library. Last time I did this in any deliberate way was with the PTEN library. We used the original Jain et al protocol for doing this with inverse PCR followed by DNA phosphorylation and ligation. One of the first Python scripts I ever really wrote (back in 2016) was to generate the list of primers we needed to order to make this library. You can find the script here.

Running it should be pretty easy: If on a Mac, you just have to open “Terminal” or an equivalent Bash terminal, go to the directory with the script, and type “python 160421_PTEN_SSM_30mer.py”.

Now, if you’ve never run a script for bioinformatics-type things before, then you may not already have biopython installed. In that case, trying to run it will likely throw an error. If that’s the case, I suggest you follow the directions here to install biopython (potentially in a virtual environment). Obviously, you aren’t going to want to make SSM primers for making a PTEN library like in the script, and will instead want to make primers for an SSM library of your gene of interest in a different plasmid (ie. with different sequences flanking your gene of interest). You can make those changes to the script by opening the “.py” file with a basic text editor (like TextEdit), or you can use a slightly more souped up text editor like “Sublime Text”. Or, if you want a slightly more complicated but perhaps better long-term experience, you can install Spyder or PyCharm and make the changes in an Integrated Development Environment (IDE). Either way, you’ll want to replace the pretty self-explanatory PTEN-library specific sequences that are currently written into the script.
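To give a sense of what a script like this is doing, here’s a minimal, dependency-free Python sketch of designing inverse PCR primers for site-saturation mutagenesis. It is not the PTEN script linked above, and the specific layout is my assumption for illustration: the forward primer carries a degenerate NNK codon (N = A/C/G/T, K = G/T) at its 5′ end followed by the sequence just downstream of the target codon, and the reverse primer is the reverse complement of the sequence just upstream. The flanking sequences, ORF, and 18 nt annealing length are placeholders.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ssm_primers(upstream: str, orf: str, downstream: str, anneal_len: int = 18):
    """Return (codon_number, forward_primer, reverse_primer) for every codon.

    upstream/downstream are the plasmid sequences flanking the ORF; each must
    be at least anneal_len long so the edge codons still get full primers.
    """
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if len(upstream) < anneal_len or len(downstream) < anneal_len:
        raise ValueError("Flanking sequences are shorter than anneal_len")
    full = (upstream + orf + downstream).upper()
    primers = []
    for i in range(len(orf) // 3):
        codon_start = len(upstream) + 3 * i
        # Forward primer: degenerate codon, then sequence after the codon.
        fwd = "NNK" + full[codon_start + 3: codon_start + 3 + anneal_len]
        # Reverse primer: anneals immediately upstream of the codon.
        rev = reverse_complement(full[codon_start - anneal_len: codon_start])
        primers.append((i + 1, fwd, rev))
    return primers


# Placeholder sequences, not a real gene or plasmid.
upstream = "ACGTTGCAAGCTAGCGCCACC"
orf = "ATGACAGCCATCATCAAAGAG"
downstream = "TGAGGATCCGTACGATCGATC"

primers = ssm_primers(upstream, orf, downstream)
for codon_number, fwd, rev in primers:
    print(codon_number, fwd, rev)
```

This version mutates every codon, including the start codon, so you’d likely want to skip codon 1 for a real library, and a real design would also adjust primer lengths for Tm rather than using a fixed length.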

Some additional considerations:
1. As I noted, this is for a ligation-based inverse PCR protocol. For all future libraries in the lab, I suggest we take a Gibson / In Vitro Assembly -compatible format, where instead of a blunt ended ligation product, the primers will create a DNA product with ~18 nucleotides of homologous sequence on the ends. Instead of writing this script myself, I’m leaving it up to future students to make that script by modifying my above script. Once we create this, I’ll perhaps post it here.
2. Once we come up with the list of primers, we can order them as machine-mixed primers from IDT (and not ThermoFisher, like our standard primer orders), since I’ve recently discovered that degenerate oligos from ThermoFisher have a pretty problematic T-bias.

NSF GRFP at CWRU

One of the hardest things to navigate at an institution is the convoluted layers of policies and administration. It’s still hard for me now as faculty, and must be exponentially harder as a trainee.

For any CWRU students in School of Medicine labs / departments planning to submit an NSF GRFP application, feel free to contact me to hear what I’ve learned about how things work here; it may save you a lot of misery.