Illumina sequencing of barcoded amplicons from the landing pad

Illumina sequencing of barcoded amplicons is going to be a large factor in the work we’ll be doing. Having moved to a completely new place, I now need to explain to people how it works. To keep this post simpler, I’m skipping all of the landing pad details (hopefully you already understand it). Let’s just start with what was genomically integrated after the recombination reaction. In the case of the PTEN library, it looks like this:

As you can see, the barcode is the blue “NNNNNNNNNNNNNNNNNN” region at the bottom of the above image. We can’t directly sequence it with primers directly flanking it, since those sequences will also be present in the unexpressed plasmid sequences likely contaminating our genomic DNA. Thus, we first have to create an amplicon containing our barcode of interest, but spanning the recombination junction (“Recomb jxn” above).

For the PTEN library, the barcode is located behind the EGFP-PTEN ORF, so we have to amplify across the whole thing. We did this by using a forward primer located in the landing pad prior to recombination (such as KAM499, found in the Tet-inducible promoter and shown in red as the “Forward Primer”), as well as a reverse primer located behind the barcode associated with the PTEN coding region (KAM501 in this case).

The above is a simplistic representation. The forward primer / KAM499 is indeed just the sequence needed for hybridization, since we aren’t going to do anything else with this end of the amplicon. On the other hand, we’ll add some more sequence to the reverse primer to help us with the next steps. In this case, this is a nucleotide sequence that wasn’t present before, so that we can amplify this specific amplicon in the next step. The actual amplicon will thus look something like below:

OK, so the above amplicon is huge; too big for efficient cluster generation. Thus, we’ll now use that amplicon to make a much smaller, Illumina-compatible second amplicon. We’ll also include an index, drawn here as “nnnnnnnn”, that can be used to distinguish different amplicons from each other when we mix multiple samples together before doing the actual sequencing step. This second amplicon looks something like this:

This time, the forward primer has both the hybridizing portion (in the blue-purple) as well as one of the cluster generators, shown as that light blue. At the complete other end, you can see the “nnnnnnnn” index sequence in indigo, followed by the other cluster generator sequence in orange. Please note: Here, the index is a KNOWN sequence, like GTAGCTAC, or GATCGAGC. It’s just that each sample will have a different KNOWN sequence, so it was simpler just to denote it with “n”s for the purpose of this explanation.

There’s more in the above map though. We’ll likely want to do paired sequencing of the barcode, so we’ll give the Illumina sequencer two read primers: read 1 (in red) which reads through the barcode in the forward direction, as well as read 2 (in green), which goes through the barcode in the reverse direction. We will also need to sequence the index, though we’ve tended to only do that with one primer. This primer is Index 1, and is colored in yellow.
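To make the roles of the index and the two barcode reads a bit more concrete, here is a minimal Python sketch of how the reads could be tallied once they come off the sequencer. It is not our actual analysis pipeline. The two index sequences are the examples given above; the sample names, the function names, and the input format (one tuple of index read, read 1, read 2 per cluster) are assumptions made for illustration. It also assumes the index read comes back in the same orientation as listed, which may not hold for every instrument.

```python
from collections import Counter, defaultdict

# Known per-sample index sequences (the two examples from the text).
# The sample names are hypothetical.
SAMPLE_INDEXES = {"GTAGCTAC": "sample_A", "GATCGAGC": "sample_B"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_barcodes(records):
    """Tally barcodes per sample.

    records: iterable of (index_read, read1, read2) strings.
    A read pair is kept only if read 2, reverse-complemented,
    agrees exactly with read 1 (the paired reads cover the same barcode
    from opposite directions).
    """
    counts = defaultdict(Counter)
    for index_read, read1, read2 in records:
        sample = SAMPLE_INDEXES.get(index_read)
        if sample is None:
            continue  # index does not match any known sample
        if reverse_complement(read2) != read1:
            continue  # the two reads disagree about the barcode
        counts[sample][read1] += 1
    return counts
```

Requiring exact agreement between the two reads is a deliberately strict choice for the sketch; a real pipeline might instead tolerate a mismatch or use quality scores.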

For the actual primer sequences and everything, look at Supplementary Table 7 of my 2018 Nature Genetics paper.

Firmware flaw in recent Stirling SU780XLE -80C freezers

Wow, I never thought I’d learn so much about a freezer company, but here we are. I took a deep dive on this issue with the Stirling SU780XLE ULT freezers. It’s still second-hand (through company reps and people on social media) and I don’t know if I believe everything about the explanations I’ve received (for example, I count roughly 8 instances of freezer firmware getting stuck through various contacts, and I vaguely remember a company rep saying this has happened <= 10 times), but this is my understanding of the situation:

The issue is indeed a firmware problem, and it affects all units produced between ~ Aug 2019 and ~ Sep / Oct 2020. Aug 2019 is when they switched one of their key electronic components to a Beagle Bone (apparently a circuit-board akin to a Raspberry Pi). Part of its job is to relay messages from one part of the circuitry to another. The firmware they wrote for it had a flaw where, in certain circumstances that the company still does not understand, one part of the relay no longer works, and the other part of the relay just keeps piling up commands that go unexecuted. So that’s the initial issue. There is also supposed to be “watchdog” code that recognizes these types of instances, but this was not working either. Thus, the freezer becomes stuck in the last state it was in before the relay broke. If it was in a “run the engine to cool down the freezer” mode, then it would have been stuck in a state that kept things cold. If it was in a “stay on but don’t do anything b/c it’s cool enough” mode, then it would have been stuck in a state where it didn’t cool the freezer at all. This is the state my freezer was stuck in**.
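To illustrate the failure mode as I understand it, here is a toy Python model. To be clear, this is not Stirling’s firmware and I have never seen their code; the class, the backlog threshold, and the reset behavior are all invented for illustration. It only shows the general idea: when the consuming half of a relay stalls, commands pile up and the engine stays in its last state, unless a working watchdog notices the backlog and resets things.

```python
from collections import deque


class ToyRelay:
    """Toy model of a command relay; NOT the real freezer firmware."""

    def __init__(self):
        self.pending = deque()      # commands waiting to be executed
        self.consumer_alive = True  # whether the executing half still works
        self.engine_on = False      # last state actually applied

    def send(self, engine_on):
        """Queue a command asking for the engine to be on or off."""
        self.pending.append(engine_on)

    def step(self):
        """Execute one queued command, if the executing half is alive."""
        if self.consumer_alive and self.pending:
            self.engine_on = self.pending.popleft()
        # If the consumer has stalled, nothing happens: the engine
        # silently stays in whatever state it was last put in.

    def watchdog(self, max_backlog=3):
        """Hypothetical failsafe: reset when too many commands pile up."""
        if len(self.pending) > max_backlog:
            latest = self.pending[-1]
            self.pending.clear()
            self.consumer_alive = True
            self.engine_on = latest
            return True
        return False
```

In this toy version, a relay that stalls while the engine is off never cools again no matter how many “turn on” commands are sent, which is the catastrophic case; calling `watchdog()` is what rescues it.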

[** I’m actually not 100% convinced on this. My freezer stopped logging temperatures / door openings, etc. at the end of August. If I look at the number of freezer hours, it says ~8,000 hrs (consistent with Oct’19 through Aug’20) rather than the ~10,000 hrs expected for Oct’19 to Nov’20. It is definitely within the realm of possibility that my Stirling has been a zombie for the last 70+ days, and either slowly reached 5*C over time or had a second event over the last weekend that triggered the thaw in its susceptible state.]
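The hour counts in that footnote are easy to sanity-check. In the sketch below, the exact dates are assumptions: Oct 1, 2019 stands in for “purchased ~ October 2019”, Aug 26, 2020 is the last date the freezer log recorded, and Nov 8, 2020 is the day I found it thawed.

```python
from datetime import date


def hours_between(start, end):
    """Whole days between two dates, expressed in hours."""
    return (end - start).days * 24


# Assumed dates (see the paragraph above for where each comes from).
START = date(2019, 10, 1)
LAST_LOG = date(2020, 8, 26)
DISCOVERY = date(2020, 11, 8)

hours_to_last_log = hours_between(START, LAST_LOG)    # 7920, i.e. ~8,000 hrs
hours_to_discovery = hours_between(START, DISCOVERY)  # 9696, i.e. ~10,000 hrs
zombie_days = (DISCOVERY - LAST_LOG).days             # 74, i.e. "70+ days"
```

So the ~8,000 hrs on the counter lines up with the clock stopping around the end of August, not with a freezer that had been counting until November.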

It sounded like they had seen numerous freezers get stuck in the former mode, which was the less devastating one since it didn’t result in freezer thawing and product loss. They had seen one freezer get stuck in the catastrophic mode before mine, back on Aug 20th. They brought it back to their workspace, and couldn’t recreate the failure. They could artificially break the relay to reproduce the condition, allowing them to create additional firmware that actually triggers the “watchdog” (and other failsafes) to reset the system when it has sensed that things have gone wrong, even though they still don’t know what the original cause of the issue is. The reason the freezers produced after Sep / Oct 2020 are unaffected is that these have already been programmed with the new firmware. The firmware I had when it encountered the problem was 1.2.2, while it became 1.2.7 after it got updated.

Freezers made / distributed(?) within the last month were pre-programmed with the updated firmware, and are supposedly not susceptible to the GUI freezes. Apparently they’re having trouble updating the firmware in the units b/c the update requires a special 4-pin programming unit that is in short supply due to the pandemic.

I won’t get into the details of my experience with Stirling (it apparently even includes a local rep who contracted COVID). They completely dropped the ball in responding, and they know that (and I’m sure they regret it). What will remain a major stain on this situation is that THEY HAVE KNOWN ABOUT THIS FLAW FOR MONTHS AND DID NOT WARN ANY OF THEIR CUSTOMERS. I received an email ~ 8 days ago saying they were going to schedule firmware updates to “improve engine performance at warmer set points, enhance inverter performance and augment existing functionality to autonomously monitor and maintain freezer operation”. Other customers with susceptible units did not even receive this vague and rather misleading email. My guess is that they chose to try to maintain an untarnished public perception of their company over the well-being of the samples stored by their customers. My suspicion is that their decisions may have been exacerbated by the current demand for -80*C freezers for the SARS-CoV-2 mRNA vaccine cold chain distribution (Stirling has a major deal with UPS, for example), though there is no way I will ever confirm that.

After my catastrophic experience, they bungled their response, and only jumped to action after I tweeted about my experience. I really wanted to like this company, as they are local and not one of the science supply mega-companies (eg. ThermoFisher). My fledgling lab is still out almost $3k in commercial reagents, and many of my non-commercial reagents and samples were compromised. They did make a special effort to update my firmware today and answer my questions, but I still can’t help but feel like a victim of poor manufacturing and service. All of the effort I’ve put in over the last few days was to get some answers and help others avoid the same situation I was put in.

I’ll post any updates to this page if I learn any more, but I’m now satisfied with my understanding of what happened. Now back to some actual science.

Stirling -80C Freezer Failure

I’m getting really tired of wasting time and brain-power on this, but unlike buying regular consumer goods (like the items on Amazon with hundreds to thousands of reviews), buying and dealing with research equipment is subject to really small sample sizes, so the more information that’s out there the better. Thus, I’ll keep this page as a running log of my experience with Stirling’s XLE Ultra Low Temperature (aka. -80*C) freezer.

TL;DR -> My 1 year-old freezer failed in the most catastrophic way: the firmware froze and displayed -80*C while the contents slowly thawed; the inside had reached 5*C by the time I noticed it wasn’t working. No alarms, as the firmware had crashed and was frozen (again, displaying -80*C the whole time). While I’ve had no issue with their mechanics, I suspect their firmware is potentially critically flawed.

Part 1) Discovering that the freezer had failed: I purchased a Stirling Ultracold SU780XLE a little over a year ago (~ October 2019), shortly after I started up my lab at CWRU. I’ve been in labs that had poor experiences with the ThermoFisher TSU series freezers, and the reviews for the Stirling seemed pretty good on twitter. Furthermore, CWRU has a rebate program with Stirling due to their energy efficiency, and probably also because they are local (they are based in Ohio).

I went into the lab last Sunday evening (Nov 8) to do some work. I went to retrieve something from the Stirling -80*C, and saw that the usual ice on the front of the inner doors was gone. I opened up the inner doors and looked at the shelves, and there was water pooled on every shelf. I looked at some of the most recently preserved cryovials of cells we had temporarily stored on one of the shelves, and they were all liquid. Things had clearly thawed inside the freezer. I closed the outer door and looked at the screen at the top, and it was displaying -80*C. The screen is actually a touchscreen, so I tried to flip through its settings, but it was completely unresponsive to my touch. It became pretty clear to me in that moment that the freezer firmware had crashed with the screen displaying -80*C. Ooof.

The picture I took of the frozen screen, timestamped Sun, Nov 8, 7:25pm.

I pulled the freezer out from the wall, found the on/off switch, and switched it to OFF. The first time, I actually flipped the switch back to ON too soon, as the screen never reset. I’m guessing there must be some short term battery / capacitor that allows the freezer to keep running through momentary interruptions in power. So I then set it to OFF, waited for the screen to go blank, and then set it back to ON. After booting up, the screen displayed 5*C. So there we go. It was indeed stuck on that screen, and rebooting the firmware got it to show the real temperature again. Which is a VERY BAD real temperature.

The picture I took of the screen after resetting the freezer, timestamped Sun, Nov 8, 7:28pm.

I immediately emailed Stirling (email timestamped Sun, Nov 8, 7:37 PM). I received a response from a customer service representative Mon, Nov 9, 8:01 AM saying “I’m sorry to hear that you are having issues.” and that they were referring me to the service dept. Got an email from the Stirling service department Mon Nov 9, 8:39 AM asking for more information and a picture of the device’s service screen. I replied to this email with all requested information Mon, Nov 9, 10:43 AM. I got an email telling me I was “Incident-7576” on Mon, Nov 9, 11:00 AM. Complete radio silence from them as of writing this section of this post, which is ~ 72 hours later (Thurs, Nov 12, ~ 11:00 AM), even after I sent them a pretty strongly worded email yesterday at 6:00 AM. I’ll follow up on my continued experience interacting with the company in section 3 of this post.

Otherwise, the mechanics for the freezer seemed to be fine. It took me about an hour to mop up all of the water, and look through my boxes to see what had thawed (which was everything except the 15ml conicals, which seemed to have enough mass to them to have not fully thawed). I was still very aggravated and in a bit of shock to have had to deal with this, but still went about my work. Two hours later, the freezer was back down to -30*C. The next morning, it was back at -80*C. So the reset was clearly sufficient to make the freezer operational* again. (*Operational with an asterisk, since it presumably still carries the same firmware glitch which caused the problem in the first place.)

Part 2) Taking stock of my lost items and forming my interpretation of what happened: Over the next couple of days, I had a chance to take stock of everything I had lost during the thaw. Being a new lab (and thus with a ~ 1 year old freezer) we didn’t have a ton of items in there, but they were not inconsequential. The commercial reagents were largely competent bacterial cells, which amounted to ~ $2,110 of lost material. There were also ~ $720 worth of chemicals, which upon freeze thaw cycles, are of somewhat questionable potency, and will likely need to be purchased again before use in publication. There were also dozens of cryovials of cell lines made in house. There were also a few cryovials of cells, dozens of tubes of patient serum, and viral stocks for SARS-CoV-2 research either given by other labs or provided by BEI resources, which would need to be replaced as we have no backups. While there is no monetary value associated with these reagents, the amount of work-time used in creating them and now replacing them is a major loss.

As a scientist, I think it’s natural for me to try to synthesize all the information I have to piece together what happened. There was no power loss (it was a sunny weekend without any storms, and no other equipment in the lab had any aberrant behavior). Nobody had gone into it for any extended amount of time, especially since it was over the weekend. The last time I had gone into it was Friday afternoon, when it seemed fine. That said, it is very well possible it had already crashed at that time. I don’t think I can visually tell the difference between a freezer at -80*C, -40*C, or maybe even -10*C. Frozen looks frozen. In lieu of any alarms or temperature readings provided by the freezer itself, the only visual clue was going to be water from the thawed ice in the freezer, which by that point was going to be too late.

To see if I could figure out when the freezer may have crashed / failed, I tried going back into the freezer log. This is all the information I could glean from the freezer:

So, uh, that history feature wasn’t all that informative, but still a couple of points I could glean from looking at it.
1) It goes from -80*C in the data points directly preceding the event, to being > 0*C to when I restarted it. So it completely stopped logging during the event. This is entirely consistent with the software having crashed, and the reason it was still showing -80*C on the screen while it had thawed.
2) Uhhhh. I can’t actually figure out what day and time it failed b/c it had apparently logged its most recent operation as August 26th. Clearly it wasn’t August 26th when it had failed, since August 26th was 72 days before Fri, Nov 6, which was the last time I had looked in the freezer before the event, when it was clearly still completely frozen. Weirdly, I didn’t have to tell it what day it was after I reset it, so it must have had an internal clock that knew it was Nov 8th upon the reset. So here’s another indication of there being something glitchy with their firmware.

Ironically, I had a separate low-temperature thermometer plugged into it (TraceableLIVE® ULT Thermometer, Item#: LABC3-6510), which really isn’t a bad thermometer, but it eats up batteries and I ran out of disposable AAA batteries (I don’t think it takes a wall plug, which it should, so that it only needs the batteries during power outages!), so I was waiting for some rechargeable AAAs to come in from Amazon. TBH, they had already come in a week or two ago, but the freezer was operating perfectly fine until this, so it wasn’t high up on my to-do list to charge and replace the batteries and get the secondary thermometer up and running again. In hindsight, a very naive and critical mistake!

Part 3) Stirling’s response to this:

Thurs, Nov 12, 11:00 AM: So far, it’s been pretty nonexistent. I wrote them an email yesterday (Nov 11) saying 1) Everything I’ve seen is telling me this is a catastrophic failure of the freezer itself, so are you going to take responsibility for it? 2) I’m still quite worried about the freezer’s operation, since the glitch that caused this has not been addressed. I’m yet to get any non-automated response from them past the most recent email on Nov 9, 11 AM.

Thurs, Nov 12, ~ 5:00 PM: Tweeting about my experience seemed to have escalated things, as I got two phone calls. The first was from the technician handling my case (“Incident-7576”), who asked if anyone had been in touch with me about scheduling the fix on the previous Monday and Tuesday. I said no, this is the first response I had gotten. I also pointed out that I had emailed him yesterday with some questions. Apparently he had not seen the email. So, a rather poorly managed customer and technical service response.

As soon as I got off the phone, the VP of Global Services called me (this is where I think the tweets likely made a difference). Provided apologies (as expected), but I also got to ask for answers to my specific questions. Here are things I learned:
1) “We’re not responsible for sample loss”. So they won’t cover anything that you lose if the freezer fails and thaws, even if it was in the most spectacularly bad way, completely due to flaws in freezer design or production that torpedo its operation.
2) The mechanics are covered for 7 yrs, but the material and labor warranty is only for 2 yrs. This includes things like “door handles and electronics”, with electronics clearly being the most relevant item here. They offered to extend this warranty to 3 yrs. I don’t think I’m unreasonable to feel like that is a pretty weak gesture based on the freezer failing the way it did.
3) I’ve had people tell me I should ask for a refund to get it replaced. Well, they don’t do that.
4) Apparently there are three parts to their firmware. One of them is called the “Beagle Bone”, which they said is responsible for making the real-time connection between the freezer settings and the parts. Quick google search suggests it’s something like this.

The saga continues. Let’s see what the technicians tomorrow say.

Fri, Nov 13th: Causing a stir on twitter apparently kicked things into action. I also put my detective hat on and I think I figured out what was going on. Too much to bury way down here, so I made a new post.

Chemiluminescent images with standard cameras

I work with proteins, so I’ve done Western blots throughout my career. Originally that meant using film and developers, and later imagers. Imagers are way better than having to deal with film, so as soon as I knew I was going to start up a lab, I started looking at various imagers and quoting them out. Even the most basic imagers with chemiluminescent capabilities quoted in the $24-27k range. But then it dawned on me….. are these imagers nothing but kind of old cameras with a light-proof chassis and dedicated acquisition and analysis software? During my stint in Seattle, I dabbled with taking some long-exposure photography of stars in my parents’ back yard. Perhaps I could do something similar for taking images of blots?

I had bought an Olympus E-PM2 16.1MP mirrorless camera for $320 back in 2014. While I used it a decent amount at first, I eventually stopped using it as often as I started using my smartphone for quicker snaps, while using Anna’s Nikon DSLR with an old telephoto lens for more long-distance pictures. So, with the E-PM2 now not doing much at home, I figured I’d bring it in and try it with this. I cut out a hole in the top of a cardboard box I could stick the camera into. I dug up the intervalometer I had used for those long-exposure photos of the sky. Nidhi had been doing some western blots recently, and had kept her initial attempts in the fridge, which was good since I could just grab one of those membranes instead of running and transferring a gel just for this. I incubated it in some anti beta-actin HRP antibody I had recently used for a blot, washed it, and exposed.

Above is something like a 5 minute exposure. My cardboard box wasn’t perfectly sealed around the sides, so there’s a decent amount of light bleeding in. I had the blot lifted up within the box on a metal pedestal (some heat-blocks that weren’t being used), so the blot itself is actually pretty free from being affected by the bleed-over light. Notably, the beta-actin bands are blue! Which makes sense, as if you’ve ever mixed bleach with luminol, you see a flash of blue light. Furthermore, if you google “hrp luminol nm”, you see that the reaction should emit 425nm light (which is in the indigo / violet range). Notably, this would be a difference between my regular use Olympus camera, which is a color camera, and the cameras you’d normally encounter on equipment like fluorescence microscopes, which are normally black-and-white.

I had actually been playing around a bit with image analysis in python over the last week or so (to potentially boot up an automated image analysis pipeline). That work reminded me that color images are a mixture of red-green-blue. Thus, I figured I could isolate the actual signal I cared about (the chemiluminescent bands) from the rest of the image by keeping signal in the blue channel but not the others. So I wrote a short python script using the scikit-image, matplotlib, and numpy libraries and ran code to isolate only the blue channel and convert it to greyscale, and to invert it so the bands would appear dark against a white background.
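For anyone wanting to try the same thing, the core of the idea fits in a few lines. This is a minimal sketch rather than the exact script I ran: the function name is made up, and it assumes the photo has already been loaded as an 8-bit RGB array with the channels in red-green-blue order.

```python
import numpy as np


def blue_channel_inverted(rgb):
    """Keep only the blue channel of an H x W x 3 uint8 RGB image,
    scale it to 0-1, and invert it so that bright (blue) bands
    come out dark on a light background."""
    rgb = np.asarray(rgb)
    blue = rgb[..., 2].astype(np.float64) / 255.0  # channel order: R, G, B
    return 1.0 - blue
```

A photo can be loaded with `skimage.io.imread("blot.jpg")` (the filename is hypothetical) and the result displayed with `matplotlib.pyplot.imshow(result, cmap="gray", vmin=0, vmax=1)`.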

To be honest, the above picture isn’t the first ~ 5-minute exposure I mentioned and showed earlier. Knowing this seemed to be working, I started playing around with another aspect that I thought should be possible, which was combining the values from multiple exposures to make an ensemble composition. The reason for this being that a single large exposure might saturate the detector, making you lose quantitation at the darkest parts of the band. I figured why couldn’t one just take a bunch of shorter exposures and add them up in-silico? So I took five one-minute exposures. The above image is the inverted first image (with an exposure of one minute).

And the above image here is what it looks like if I make an ensemble plot from 5 separate 1-minute exposures. With it now effectively being a “longer exposure” (due to the combining of data in silico), the signal over the background has been improved, with no risk of over-saturating any of the detectors.
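The “adding up in silico” step can be sketched as below. Again, this is an illustration under assumptions rather than my exact code: the function names are invented, and each frame is assumed to be a single-channel (e.g. blue-only) 8-bit array of the same size. The one important detail is summing in a wider integer type, since adding 8-bit arrays directly would wrap around at 255.

```python
import numpy as np


def sum_exposures(frames):
    """Add up same-sized single-channel uint8 frames.

    The frames are converted to a wider integer type first, so the
    total can exceed 255 without wrapping around."""
    stack = np.stack([np.asarray(f, dtype=np.uint32) for f in frames])
    return stack.sum(axis=0)


def invert_for_display(summed):
    """Scale the summed image to 0-1 and invert it (dark bands on white)."""
    peak = summed.max()
    scaled = summed / peak if peak > 0 else summed.astype(float)
    return 1.0 - scaled
```

One caveat: summing only helps if each individual exposure is short enough to stay below saturation. A pixel that already clipped in a single frame stays clipped in the sum.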

So while I’m sure there are many suboptimal parts of what I did (for example, the color camera may have less sensitivity for looking at chemiluminescent signals), it still seemed to have worked pretty well. And it was essentially free since I had already had all of the equipment sitting around unused (and would have cost < $400 if I had to buy them just for this). And also gave me a chance to look under the hood of this a bit, practice some python-based image analysis, and prove to myself that I was right.

Bleed-over on the Nikon scope

Anna was using the Nikon microscope a couple of days ago, and noticed some level of bleed-over of mCherry fluorescence into the CY5 filter-set channel. I previously played around with making a bespoke script for figuring out the optimal fluorescent protein : laser : detector combinations on the two flow cytometers we routinely use at the core here, and figured I could modify the script to understand how our existing Nikon filter sets work with the various fluorescent proteins we use here. It took me a couple hours, but I made a script posted on my GitHub. Here’s a figure that script produces:

Based on the filter sets we currently have, looks like it may be near impossible to avoid bleedover of a near infra-red FP fluorescence into the red channel. Also, this seems to reproduce / explain the effect Anna was seeing with mCherry fluorescence bleedover into the NIR channel. Seems this problem doesn’t happen with mScarlet.I, so maybe I’ll slowly shift toward using that FP more.
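The underlying calculation is simple: for each fluorescent protein and each filter set, ask what fraction of the protein’s emission falls inside the emission filter’s passband. The sketch below shows that idea only; it is not the script on my GitHub. The function names are made up, the Gaussian “spectrum” and the passband numbers used with it are toy values rather than real mCherry or filter data, and a full treatment would also account for how well each filter set’s excitation light excites the protein.

```python
import numpy as np


def fraction_in_band(wavelengths, emission, band):
    """Fraction of the total emission that falls within a passband.

    wavelengths: evenly spaced wavelengths in nm.
    emission: relative emission intensity at each wavelength.
    band: (low_nm, high_nm) edges of the emission filter, inclusive.
    """
    lo, hi = band
    wavelengths = np.asarray(wavelengths, dtype=float)
    emission = np.asarray(emission, dtype=float)
    in_band = (wavelengths >= lo) & (wavelengths <= hi)
    return emission[in_band].sum() / emission.sum()


def toy_gaussian_spectrum(wavelengths, peak_nm, width_nm):
    """Stand-in emission spectrum; real spectra are not Gaussian."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)
```

With real spectra loaded in place of the toy Gaussian, running `fraction_in_band` for every protein and filter combination gives a table of expected signal and bleed-over per channel.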

Pymol Basic Tutorial

Alright CWRU students; today we will talk about installing PyMol and using it for some basic analysis of protein structure.

1) INSTALLATION

Firstly, download PyMol. CWRU has a subscription, so go to this link and read the license agreement. If you agree, press “I agree”. Then, enter your CWRU SSO info to be logged into the institutional software page. Hit any of the logos in the “PyMOL AxPyMOL v2.4” section (they all redirect to the same place).

As of 10/20/20, they do some weird web-store thing (never seen this before). You’ll have to add the version for whatever platform / OS you’re on, and hit add to cart, and then on the next pop up, hit “Check Out”. It’s confusing, but it’s all free, so this seems to just be a formality. You’ll get an email confirmation, but this email confirmation really doesn’t do much.

You will get to a screen that looks like this though:

First, while you’re still on the above screen, make sure you hit the “Important Notice” link. This will tell you the information you’ll need later for registering. The important notice screen will look like this (just without the blacked out parts).

Now that you know that info, go back to the previous screen (with the big “Download” button), add the version for the platform / OS you’re on, and hit download. You’ll then get to another screen where you’ll have to hit download again.

Now you’re on the registration screen. Type in the info from earlier, and type in your case email address. You’ll get an email to your school email address bringing you to the download link.

I whited out some of the token link, so you’ll have a link that looks much longer than above.

Follow that link and you’ll get to another page that looks like this:

Once you type in your case email address again, you should FINALLY get to the download page, which will look like this:

There are a lot of links there, but the most important one is the very first link, under “Download PyMOL”. Hitting that should bring you to this page:

You’ll need to do two things here. First, before you forget, download the license file. This is the button on the bottom right. Following the instructions will allow you to download a “pymol-license.lic” file. I like to put it into the same Application folder, so it’s easy to find. As shown in the picture provided by the website, you’ll need this file when you first open up PyMol on your computer.

Next, download the program itself, based on your platform (again, I’m on a Mac). Once downloaded, follow the steps for installing Pymol (for macs, it’s opening up the disk image, and then dragging the Pymol application into your computer Application folder). Double click to open the program, and it will ask about activation. Click the “Browse for License File” button and select that license file you recently downloaded. Voila!, you have successfully navigated the gauntlet of website complexity to achieve your goal. A world of exploring protein structures awaits.

2) USAGE

Now that it’s downloaded and properly linked to the license, you should get a nice blank screen like this.

I could explain what everything is, but it’s easier just to get started. If you’re on an internet connection, the easiest way to load protein structures is to use the “fetch” command. Essentially 99% of the protein structures you’ll ever want to see will be deposited at the RCSB Protein Data Bank. Go to that website, type in your favorite protein, and see what hits you have. For the purposes of today, I’ll just use a structure I’ve stared at many times over the last 5 years: that of the tumor suppressor protein PTEN. I have the code memorized, and it’s “1d5r”. So, type “fetch 1d5r” (all lowercase) into either the bottom “PyMOL>_” prompt area, or the top “PyMOL>” prompt area. Both spaces work / are largely redundant, although the top area actually records what your previous commands were, so I have a slight preference for this area. You should now see the below protein pop up in your PyMol window.

From here, it’s easiest if you have a two button mouse connected. If you do, you can do a left-click hold and drag to flip the protein around. You can also do a right-click hold and drag to zoom in and out. And finally, you can also hold down either option or command (if you’re on a mac) and left click to actually move the entire molecule around on the screen. If you have a scroll wheel, you may notice that turning it makes part of the protein appear or disappear. This is called “clipping” and will be useful in certain cases where you need just the right picture of part of the protein, but we won’t be getting into this for today.

On the other hand, you may be using a laptop without a two-button mouse. While slightly less ideal, you can still do everything pretty easily as long as you go to the top menu-bar, click or hover over “Mouse”, and click on “1 Button Viewing Mode”. Now, if you go back to the Pymol window, you can still turn the object by clicking and dragging on your trackpad, but you zoom by holding down control while clicking and dragging, and move the entire molecule around (relative to your frame of view) by first holding down option before clicking and dragging.

The default is a “cartoon” view, which shows the backbone of the peptide, as well as secondary structure with alpha helices shown as helices, and beta strands shown as those flat arrows. I find cartoon views to be the most generally useful, so we’ll keep it like that for now. The small red plusses are water molecules the authors have in their model. I find these to be kind of annoying, so I like to hide them in the beginning, only bringing them back in much later when we’re considering how various side-chains may be interacting with water in the medium. To do this, I just type in “hide all” which makes everything disappear, and then “show cartoon”. Now all of the extra molecules aside from the peptide backbone should be gone. In the case of PTEN, there’s a small tartrate molecule that was bound to the active site. It’s worth making this visible. The easiest way to select it is to click the “S” button in the bottom right menubar…

This will call up the sequence at the top of the protein viewer window, below the command-line area. The tartrate molecule is after the entire PTEN sequence, but before the waters. Scroll there, and click on “TLA” to select it.

The sequence pops up, and defaults to the very beginning (starting with chain A).
After scrolling to the right, you can see the TLA substrate mimic in the sequence list.

This will make a new selection called “sele”. Now you can tell the program to make a visual representation of the selected “TLA” molecule. One option is to type in “show sticks, sele”. This will be pretty subtle. On the other hand, if you want to be able to see the atoms of the TLA molecule from pretty far, you can instead type “show spheres, sele”. The molecule will then look like this:

Nice. From here, feel free to explore. If you want to select particular residues, it’s helpful to use the “resi” denotation. For example, if you wanted to select residues 124, 129, 130, and 38, you can type in “select important_residues, resi 124+129+130+38”. There will now be a new selection called “important_residues”. To show where these are, you can choose to show these as spheres also (“show spheres, important_residues”), and color them a different color to make them easier to see, such as by typing “color yellow, important_residues”. You’ll have something that looks like this.
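If you find yourself retyping these for different sets of residues, here’s a minimal sketch in plain Python that just builds the three command strings from a list of residue numbers. The function names (“resi_selection”, “highlight_commands”) are my own hypothetical helpers, not part of Pymol; the output is simply text you could paste into the Pymol command line.

```python
def resi_selection(name, residues):
    """Build a 'select' command string for a list of residue numbers."""
    # Pymol joins multiple residue numbers with '+', e.g. resi 124+129+130+38
    resi_expr = "+".join(str(r) for r in residues)
    return f"select {name}, resi {resi_expr}"


def highlight_commands(name, residues, color="yellow"):
    """Return the select / show / color commands described in the text."""
    return [
        resi_selection(name, residues),
        f"show spheres, {name}",
        f"color {color}, {name}",
    ]


commands = highlight_commands("important_residues", [124, 129, 130, 38])
for line in commands:
    print(line)
```

Note that the selection name can’t contain a space, which is why the underscore in “important_residues” matters.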

3) OUTPUTS

So you’re done doing all of your exploratory analyses, and feel like making an image to put into a presentation. First thing to consider is the background. It defaults to black, but white is usually better for most presentations / paper figures. You can change this either by selecting Display/Background/White from the top menu, or by simply typing in “set bg_rgb, white”.

While you can get a decent picture even with this, you can get much nicer images if you tell the program to render the image first. You can do this by clicking the “Draw/Ray” button at the top right corner of the screen and filling out the values you want, or you can tell it to do its ray-tracing / rendering from the command line by typing something like “ray 900”. After a little bit of waiting, you’ll get a rendered image.

Now you can go to “File/Export image as/PNG…” and export your image file to one of your computer directories. You’ve made a simple, high quality figure. Hurrah! You may want to save your Pymol analysis file by going to “File/Save Session As…”. Thus, if you want to get back to this step without starting from scratch, all you need to do is open this session file again and you’re back to the same spot.

Hopefully that was a useful introduction to some basic operations in Pymol. There’s a ton more you can do with the program, but I’ll leave it here for now.

4) ADDENDUMS

So the above are the most basic things to get you started, but as I interact with trainees here (and see what we need to do for their work), I’ll amend this page with other more specific operations.

A) Show the protein surface.
Cartoon diagrams are great for understanding how the protein is structured, but they don’t give you a sense of what the surface of the protein looks like. That’s where showing the protein surface comes in handy. You can do this quite simply by typing in “show surface, all” (replacing “all” with whatever object name there is). This will create something like this:

The reason for the blue, red, and yellow is that the default coloring scheme in Pymol is to color carbon atoms green, nitrogen atoms blue, oxygen atoms red, and sulfur atoms yellow. If that’s too distracting, you can just color everything green by typing in “color green, all”.

But what if you want to see both the protein surface as well as the underlying secondary structure? One approach is to make the surface representation semi-transparent. You can do this by typing something like “set transparency, 0.75”.

B) Selecting atoms near another selection of atoms
This can be a pretty useful feature. For example, you may be curious which residues of a protein are near a substrate mimic molecule. Or maybe the structure shows a protein-protein interaction, and you’re trying to figure out which residues in protein A are in close contact with protein B. Below is a third example, which is figuring out which atoms of a protein are pretty close to water molecules on the surface of the protein.

As described on this page, you can use the “around” selection operator for this. Here’s a series of commands to show the atoms of the protein as spheres, color them all green, and then select only the atoms near water and color them blue:
“show spheres, 1d5r and not resn HOH”
“color green, 1d5r and not resn HOH”
“select near_water, (resn HOH) around 3.5”
“color blue, near_water”
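
To make it clearer what that “around 3.5” selection is doing, here’s a rough sketch of the same logic in plain Python: keep every atom that sits within a cutoff distance of any atom in the reference set, and leave out the reference atoms themselves. The atom names and coordinates below are made up for illustration, and this is only my approximation of the behavior, not Pymol’s actual implementation.

```python
import math


def around(atoms, reference_ids, cutoff):
    """Return ids of atoms within `cutoff` of any reference atom.

    The reference atoms themselves are excluded from the result.
    """
    reference = [xyz for atom_id, xyz in atoms.items() if atom_id in reference_ids]
    near = []
    for atom_id, xyz in atoms.items():
        if atom_id in reference_ids:
            continue  # skip the reference atoms (e.g. the waters)
        if any(math.dist(xyz, ref) <= cutoff for ref in reference):
            near.append(atom_id)
    return near


# Hypothetical coordinates in Angstroms, purely for illustration
atoms = {
    "protein_N": (0.0, 0.0, 0.0),
    "protein_O": (3.0, 0.0, 0.0),
    "protein_C": (10.0, 0.0, 0.0),
    "water_1": (1.5, 0.0, 0.0),
}

near_water = around(atoms, {"water_1"}, 3.5)
print(near_water)
```

Only the two protein atoms close to the water end up in the result; the water itself and the distant carbon are left out, which is why the command colors just the protein atoms near water blue.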

Ordering primers

This is going to be a very lab-specific instructional post, since it’s literally only going to talk about how we normally order primers here in the lab.

So as I mentioned in this previous post, at least for us here at CWRU, primers are cheapest when ordered from ThermoFisher. There’s an unfortunate delay on how long you have to wait until you can receive the primers, but for the most part it isn’t a big problem. Regardless, I’ve essentially streamlined how we order and organize primers as much as possible, so here are the steps.

Any permanent member of the lab should have their own tab (titled with their initials) in the “MatreyekLab_Primer_Inventory” sheet. I also have my own tab, which is called “KAM”. Every primer you order should have its own unique identifier (like KAM2199). Since you’re the only one adding to your own tab, there is no reason you should ever have duplicate identifiers. You want to do this step first, so you know what identifier to give the primer you intend on ordering now (The most recent primer I ordered was KAM3495, so the primer I’ll order today is KAM3496).
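
The “next identifier” rule is simple enough to write down as code. Here’s a small sketch, assuming identifiers are always initials followed by a number (like KAM3495); the function name “next_primer_id” is just a hypothetical helper, not something that exists in our sheets.

```python
import re


def next_primer_id(last_id):
    """Return the identifier that follows `last_id`, e.g. KAM3495 -> KAM3496."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", last_id)
    if match is None:
        raise ValueError(f"Unexpected identifier format: {last_id!r}")
    prefix, number = match.groups()
    # Keep the same number of digits in case an identifier has leading zeros
    return f"{prefix}{int(number) + 1:0{len(number)}d}"


print(next_primer_id("KAM3495"))
```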

I usually keep a separate “scratch” google sheet where I write down things I’m in the process of doing. Here, I like to make a set of fields like this (I usually just copy-paste from the last time I ordered primers, actually). But, if you don’t already have one of these sheets, I just made a “Scratch_area” tab in the google sheet.

Here, I like to list a few things. There’s an area I intend to later copy-paste into the primer inventory, which lists the date, the length of the oligo (using the =LEN() function of the actual primer sequence), the primer identifier, the actual primer sequence, and then a short description for what the primer is for.

Next, I like to have another section which repeats a subset of the information, but in a slightly different order. The reason for this is that the ThermoFisher website requires the primer ordering sample sheets to have the data entered in a specific order (sequence, identifier, Name of person ordering). Since this is the same information as in some of the previous columns, I just have those cells point to each of the relevant cells so that I don’t need to fill in that same info again.
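
If it helps to see the two layouts side by side, here’s a minimal sketch of the same idea in Python: one record for the primer inventory (date, length, identifier, sequence, description), and a second row that reuses those values in the order the order sheet wants (sequence, identifier, name of person ordering). The primer sequence, date, and description below are made up, and the field names are my own labels rather than the actual column headers.

```python
from datetime import date


def inventory_row(order_date, primer_id, sequence, description):
    """Build the inventory record; length is computed like =LEN() in the sheet."""
    sequence = "".join(sequence.split())  # drop any stray spaces or line breaks
    return {
        "date": order_date.isoformat(),
        "length": len(sequence),
        "identifier": primer_id,
        "sequence": sequence,
        "description": description,
    }


def order_row(row, orderer):
    """Reuse the inventory values in the order: sequence, identifier, orderer."""
    return [row["sequence"], row["identifier"], orderer]


# Hypothetical example values, for illustration only
row = inventory_row(
    date(2021, 1, 15),
    "KAM3496",
    "ACGT ACGT ACGT ACGT ACGT",
    "Hypothetical sequencing primer",
)
print(row)
print(order_row(row, "KAM"))
```

The point of deriving the second row from the first is the same as having cells point at other cells: the sequence only gets typed once, so the inventory and the order can’t drift apart.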

If you’re ordering primers for sequencing, then you’re all set. But if you’re ordering primers for some cloning, I like to also write a short section that writes down which primers are supposed to be paired with which other primers to make a particular plasmid, also listing the template plasmid that will be used for the reaction. It’s nice to write this info now, so you don’t have to try to remember everything / redo the effort you just did to remember how you need to pair things to actually perform the PCR when you receive the primer. I usually copy-paste these cells into the “MatreyekLab_Hifi_Reactions” google sheet, into an area below where I have written “** denotes primers we may not have yet“.

So next is actually ordering the primers. There is a folder in the lab google drive where all of the primer order sheets are kept (MatreyekLab_GoogleDrive/Primer_order_templates). I usually just find a recent file, duplicate it, and rename it for today’s date and whoever is ordering (eg. KAM).

Once I copy-paste the new info in, it should look like this in the end.

K, great. Now to actually order it. Log into our thermofisher account by going to www.thermofisher.com, hitting “Sign In” at the top right. Lab account email is “[email protected]” and the password is the lab password (which you should already have memorized!). Then I hover over “Popular” at the top left, and click on “Oligos, Primers, & Probes”.

Then hit “Order now” under “Custom made to order”

Then I always do the “Bulk Upload” button:

Next hit “choose file”:

Then click on the samplesheet you want to upload. Your primer should now be entered.

Cool, so now is actually checking out. Hover over the cart and then click “View cart & check out”.

Next is a series of somewhat annoying confirmation screens. Scroll down and hit “Begin Checkout”

Default settings are fine, so scroll down and hit “Continue to Payment”

Default settings (including the speed type) are still fine, so scroll down and hit “Continue to Review Order”

You’re almost done! Now scroll down, click on the “You agree to the thermofisher.com terms and conditions…” and then click “Submit Order”.

You should now see a screen that says “Thank you for your order!” at the top. The “[email protected]” account will also have received a confirmation email (but you don’t need to check this or anything). You are FINALLY done. Log out of the account and go on with your day.

Setting up a shared Github repo for an RStudio project

As I work with more trainees as they join the lab, it could be helpful to have a single RStudio GitHub repo that we share so we can work on the same project / analysis script. Here are a series of steps we can do to set this up.

Someone first has to initiate the repo. Log into your account on GitHub (make an account if you don’t already have one). Make a new Repo. If you’ve been added to the lab, then make a new repo within the lab. Add some kind of description, click “Add a README file”, and then hit “Create Repository”.

Here’s the blank repo that should now exist:

Now go into RStudio (which you should have already installed), and go to File -> New Project.

Click on Version Control:

And then click “Git”:

Next you’ll need to enter in some info. Add in the web address of the repo you just made and want to link with your RStudio, make sure you’re happy with the project name, and then enter in where on your computer you want the project to be located.

Cool, now you have a very blank looking project.

Now let’s start adding things to the project. For example, let’s make an R Markdown file where you will start making the analysis script. To do this, go to File -> New File -> R Markdown…

A new screen will pop up. Add whatever details you want, and hit “OK”.

The file isn’t saved, so save it in the same directory as your Rproj and Readme file, giving it whatever name you want to give it (probably the same name as the repo).

If you use the default settings, you’ll get a generically prepopulated RMarkdown file. Get rid of everything but the first chunk, since you’ll just reuse the first chunk. Add in some generally useful initial code as I have in the picture below:

So the repository has already changed from when you first created it, since you 1) made an R Proj file and 2) made an R Markdown file. We can now send these changes back to the repo. Go to the Git tab on the top right panel…

Click on the two things you want to update (in this case, the Rproj and Rmd files), and hit Commit.

The next screen should be the commit screen. If you click on the items, it should tell you what’s different from the previous version. New things should be green (since these files are new, the whole thing is green). Write a message to generally describe what’s different with this new set of committed files, and then hit “Commit”.

You’ll get an output screen, like this:

This essentially means you approve of the changes you made, and you’re now ready to send it back to the repo (cloud). The commit screen is now different, and this time hit “Push”.

You should now get a different commit output screen, which says something like this, which means you sent it back to the master branch on the repo:

You can check to see if the repo was indeed changed by going back into your web-browser where you had logged into GitHub and had made the new repo. If you look, you’ll see that the new files are there.

Great. So the idea now is that you can now make changes to the contents of this repo, and send it back into the Cloud so everyone can now see your updated versions. For best performance, you’ll want to hit “pull” before you start working on something (to make sure you have the most up-to-date version of the repo), and once you’ve done your work, hit “push”, so other people can see and work with what you’ve done.
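
For anyone curious what those buttons are doing, here’s a rough sketch of the equivalent git commands, written as Python lists that could be handed to “subprocess.run”. This is my assumption of how the RStudio buttons map onto plain git (ticking a file’s checkbox roughly corresponds to “git add”), and the file names and commit message are hypothetical placeholders.

```python
import subprocess


def start_of_session():
    """Commands to run before you start working: get the latest version."""
    return [["git", "pull"]]


def end_of_session(files, message):
    """Commands to run once your work is done: stage, commit, then push."""
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_all(commands, repo_dir):
    """Run each command inside the project directory, stopping on any error."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


# Hypothetical file names and message, for illustration only
commands = end_of_session(
    ["analysis.Rmd", "analysis.Rproj"], "Add initial R Markdown file"
)
for command in start_of_session() + commands:
    print(" ".join(command))
```

Nothing is actually executed here unless you call “run_all” with the path to your project folder; the loop just prints the commands in the order you’d use them.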

If you’re not the person who first initialized this repo, then you’ll have to follow a similar but slightly different series of steps (You won’t need to make the repo yourself). This resource was super helpful in helping me figure out how to do this, and will likely be quite helpful for you too!

Pseudovirus Infection Assays

TL;DR -> If you’re doing infection assays, it’s worth sticking to < 20% EGFP positive, and depending on how many cells you count, > 0.03% EGFP positive cells.

Can’t say I expected it, but the SARS CoV-2 pandemic has brought virology back into my sights. This includes some familiar tools, such as lentiviral pseudovirus infection assays, which I had used a lot in graduate school. It’s the same basic process as using lentiviral vectors for cell engineering, except you’re changing out the transgene of interest for a reporter protein, like EGFP. If the pseudovirus is able to (more-or-less) accurately model an aspect of the infection process, then you now have yourself a pretty nice molecular / cell biology surrogate to learn some biology with. Essentially, it tells you how efficiently the virus is getting into cells. Since all of the “guts” can look like HIV, you can use it to study parts of the HIV life cycle after it’s entered the cell but before it’s integrated into the host cell genome (this is what my PhD was in). Its arguably more common use is instead as a vehicle for studying the protein interactions that various viruses make to enter the cell cytoplasm.

The loveliness of the assay (especially the fluorescent protein version) is that it is inherently a binary readout (infected or uninfected) at a per-cell level. But, since there are *MANY* cells in the well, it becomes a probability distribution, where the number of infected cells captures how efficient the conditions of entry / infection were.

But of course there is also some nuance to the stats here. For example, if you mixed a volume containing 100,000 cells with another volume containing 100,000 viral particles (and thus a multiplicity of infection (MOI) of 1), you don’t get 100% of the cells infected. In actuality, you would only get about 63% of cells getting infected. That’s b/c each virus isn’t a heat-seeking missile capable of infecting the only remaining uninfected cells around; instead, each infection event is (*more or less) an independent event. Thus, cells can be multiply infected. Notably, based on an EGFP readout, you can’t really figure that out (ie. cells that are doubly infected are essentially indistinguishable from singly infected cells based on their fluorescence levels). But, since we know how the system should function, we know that it should largely follow a Poisson distribution.
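
To spell out where that ~63% comes from: under a Poisson model, the chance a cell receives zero infection events at a given MOI is exp(-MOI), so the fraction of infected (green) cells is 1 - exp(-MOI). Here’s a minimal sketch of that calculation; the function names are my own, and this isn’t necessarily how the script behind the plot is written.

```python
import math


def fraction_infected(moi):
    """Fraction of cells with at least one infection event (Poisson model)."""
    return 1.0 - math.exp(-moi)


def fraction_multiply_infected(moi):
    """Fraction of cells with two or more infection events."""
    # Subtract the zero-event and one-event terms of the Poisson distribution
    return 1.0 - math.exp(-moi) - moi * math.exp(-moi)


for moi in (0.01, 0.1, 1.0, 3.0):
    print(
        f"MOI {moi}: {fraction_infected(moi):.1%} infected, "
        f"{fraction_multiply_infected(moi):.1%} multiply infected"
    )
```

At an MOI of 1, this gives roughly 63% infected cells, with a bit over a quarter of all cells carrying more than one infection event.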

Here’s a plot I made with this script to demonstrate what that relationship is between the MOI and the % of green cells you should expect to see:

Things look nice and linear at low MOIs, but as soon as you start getting close to an MOI of 1, the number of multiply infected cells starts taking on a bigger role, until eventually every cell becomes multiply infected as the last few “lucky” uninfected cells get so overrun that there is no escaping infection. In terms of trying to get nice quantitative data, that saturation of the assay is kind of problematic. Generally speaking, I find that somewhere around 20-30% GFP positive cells is where you really start losing that linearity (shown by the dotted line).
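
One way to put a number on that loss of linearity is to go backwards: take the observed fraction of green cells, work out the MOI a Poisson model implies (MOI = -ln(1 - fraction)), and see how far the observed fraction falls short of it. This is just a sketch under the Poisson assumption, with function names I made up.

```python
import math


def moi_from_fraction(fraction_positive):
    """Infer the MOI from the observed fraction of positive cells."""
    if not 0.0 <= fraction_positive < 1.0:
        raise ValueError("fraction_positive must be in [0, 1)")
    return -math.log(1.0 - fraction_positive)


def linearity_shortfall(fraction_positive):
    """How much the observed fraction underestimates the MOI, as a proportion."""
    if fraction_positive == 0.0:
        return 0.0
    moi = moi_from_fraction(fraction_positive)
    return 1.0 - fraction_positive / moi


for fraction in (0.01, 0.05, 0.2, 0.3, 0.6):
    print(
        f"{fraction:.0%} positive -> MOI {moi_from_fraction(fraction):.3f}, "
        f"shortfall {linearity_shortfall(fraction):.1%}"
    )
```

By this measure, at 20% positive cells the readout is underestimating the true number of infection events by about 10%, and by about 16% at 30%, which fits with that being roughly where the curve starts to bend away from the line.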

Why not just always do small numbers, like 1% or even 0.1%? Well, b/c you’re kind of caught between a rock and a hard place: I guess the rock is the unbreakable boundary of 100% infected cells, while the hard place is that if you go too low, it gets hard to sample enough events to accurately quantitate something that’s really rare.

I do think there’s still a lot of meat in that assay though, as long as you count sufficient cells. I generally quantitate the number of infected + uninfected cells using flow cytometry. Being able to count 10,000+ events a second means you can run through all of the cells of a 24-well in under a minute. That means you can pretty reasonably sample 100,000 cells per condition. Assuming that’s the case, how does our ability to accurately quantitate our sample drop as the number of infected cells falls?

IMO, it begins to become unusable when you get to roughly 0.03 percent GFP positive cells. That’s because past that, the amount of variability you get in trying to measure the mean is roughly about the size of the mean itself. Furthermore, your chances of encountering replicate experiments where you get *zero* GFP positive cells really start to increase. And those zeros are pretty annoying, since they are quite uninformative.
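
Here’s a back-of-the-envelope sketch of how the counting noise scales, assuming the number of positive cells you count is Poisson distributed around (cells counted × fraction positive). In that case the relative noise (standard deviation divided by the mean) is 1 / sqrt(expected count), and the chance of seeing zero positive cells is exp(-expected count). This is a simplification I’m adding for illustration; it ignores other sources of variability (pipetting, gating, background events), so it’s not a substitute for the estimations in the script, and the cell numbers in the loop are hypothetical.

```python
import math


def expected_positive(n_cells, fraction_positive):
    """Expected number of positive cells among the cells counted."""
    return n_cells * fraction_positive


def coefficient_of_variation(n_cells, fraction_positive):
    """Relative counting noise (SD / mean) under a Poisson assumption."""
    return 1.0 / math.sqrt(expected_positive(n_cells, fraction_positive))


def probability_of_zero(n_cells, fraction_positive):
    """Chance of counting no positive cells at all (Poisson approximation)."""
    return math.exp(-expected_positive(n_cells, fraction_positive))


fraction = 0.0003  # 0.03 percent positive
for n_cells in (1_000, 10_000, 100_000):
    print(
        f"{n_cells} cells: expect {expected_positive(n_cells, fraction):.1f} positive, "
        f"CV {coefficient_of_variation(n_cells, fraction):.2f}, "
        f"P(zero) {probability_of_zero(n_cells, fraction):.3g}"
    )
```

The main takeaway is that both the relative noise and the chance of getting a zero depend heavily on how many cells you actually count, so the lower limit of the assay moves with your sampling depth.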

So just understanding the basic stats, it looks like my informative range is going to be ~ 0.03 percent positive to ~ 30 percent positive. So roughly 3 orders of magnitude. Ironically, that’s just about exactly what I saw when I ran serial dilutions of a SARS CoV pseudovirus on ACE2 overexpressing cells.

So there you have it. The theoretical range of the assay, backed experimentally by what I saw in real life. The wonderfulness of science. You can find the script I used to make these estimations here.