Chemical transformation efficiencies

We do a lot of molecular cloning in the lab. Standard practice in the workflow is to make your own home-made chemically competent NEB 10-beta bacteria to be used fresh the day of the transformation. This has worked surprisingly well (we have made ~350 different plasmid constructs in the first ~600 days). Each time you do the transformation, it’s important to include a positive control (we use 40 ng of attB-mCherry plasmid) to make sure the transformation step performed properly (this helps you interpret / troubleshoot what may have gone wrong if you had few / zero colonies on your actual molecular cloning transformation plates). I’ve done this enough times now to know what is “normal”. Thus, especially for new members in the lab, please reference this plot to see how your own transformation results compare to how it has worked for me, where I typically get slightly more than 10,000 transformants (Note: you may get numbers better than me, which is a good thing!).

Edit 2/24/2022: It recently dawned on me that we haven’t necessarily been using a standard number of bacteria per transformation (i.e. we typically prep the same amount of bacteria, and that bacteria normally just gets split into however many transformations are being done at the time). Thus, if there are 10 plasmids being transformed, the number of bacteria per transformation is ~ one fifth of the number of bacteria for when two plasmids are being transformed.

Well, since I had a bunch of data recorded for our positive control transformations (40ng of attB-mCherry plasmid), I decided to see what number of bacteria we had used and if there were any patterns in the transformation efficiencies:

So it looks like, depending on the instance, we used anywhere from 300 million to 10 billion bacteria per transformation. Notably, there was no real pattern in the transformation efficiency: on average, it worked just as well with half a billion bacteria as it did with 5 billion. Thus, it’s likely that the limiting factor at this point is the DNA, and the bacteria are in vast excess.

For those curious, I had previously mentioned that 1 mL of our E. coli at an OD of 1 corresponded to 5e8 bacterial cells / colony forming units. Thus, for a 180 mL culture at an OD of 0.2, I would do [5e8 cells/mL at OD1] * [0.2 of OD1] * [180 mL] * [1 billion cells / 1e9 cells] = ~ 18 billion cells. So if we were doing 9 transformations with this culture, it would be about 2 billion cells for each transformation.
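For anyone who wants to redo that arithmetic for a different culture volume, OD, or number of transformations, here is a minimal Python sketch. The 5e8 cells per mL at OD 1 conversion is the number quoted above; the function and variable names are made up for illustration.

```python
# Conversion factor quoted in the post: 1 mL of culture at OD 1 ~ 5e8 cells.
CELLS_PER_ML_AT_OD1 = 5e8


def cells_per_transformation(culture_ml, od, n_transformations):
    """Estimate how many bacteria end up in each transformation.

    Assumes the whole culture is split evenly across the transformations.
    """
    total_cells = CELLS_PER_ML_AT_OD1 * od * culture_ml
    return total_cells / n_transformations


# The worked example from the text: 180 mL at OD 0.2, split 9 ways.
total = cells_per_transformation(180, 0.2, 1)
per_tube = cells_per_transformation(180, 0.2, 9)
print(f"total: {total:.1e} cells, per transformation: {per_tube:.1e} cells")
```

This is only bookkeeping, and it inherits whatever error there is in the OD-to-cell-count conversion.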

HEK 293T Landing Pad Dose Response Curves

Positive selection for recombined cells is a pretty useful technique when working with the HEK 293T landing pads, but people might not know what concentrations are best. Well, here is that info, all in one place. In each case, the antibiotic resistance gene had been placed after an IRES element in an attB recombination plasmid, and was linked to mCherry using a 2A stop-start sequence. Thus, the Y axis is showing how well the mCherry+ cells were enriched at various concentrations of antibiotic.

Consistent with the above plots, I generally suggest people use 1 ug/mL Puro, 10 ug/mL Blast, 100 ug/mL Hygro, and 100 ug/mL Zeocin for selections.

Nucleotide biases in primers

We generally talk about degenerate nucleotide positions in synthesized oligos (like primers) as if they are this automatic and even thing, but like everything else, things can be a lot more complicated. We recently ordered some primers with six degenerate positions (so NNNNNN) from ThermoFisher, and they were predominantly filled with T’s, while the same oligo ordered from IDT was a lot more balanced in terms of nucleotide composition. While this was done at really small (Sanger) scale and was not quantitative, it reminded me of nucleotide biases I observed in libraries created in the Fowler lab using Inverse PCR (where the degenerate sequence is not hybridizing with a partner strand during initial hybridization, so no degenerate sequence should have a competitive edge over another, at least in terms of hybridization). These were all made with IDT primers. I dug up that slide, and have posted it below:

So in this analysis, you can see that there is a ubiquitous albeit somewhat variable G bias across the libraries. This often resulted in the presence of more glycines than expected. Overall, I don’t think this is ALL that bad; certainly still WAY better than the rampant (like, 86% T) bias with the ThermoFisher primer (When I emailed them about this, they told me “Unfortunately, we do know exactly what the issue is. This is an inherent instrument issue with how automated degeneracies are produced on the existing synthesizer. The only way to fix it from a customer perspective is to order using special instructions to “hand mix” the degeneracy in an optimal ratio to produce as close to a 25% balance as possible.”). So while I’ll still buy regular cloning primers from Thermo (b/c they are the cheapest for me), all degenerate primers will have to go through IDT (or maybe eventually EuroFins, as the Roth lab has noted good luck with them).

5/15/21 update: OK, got some Illumina sequencing back to actually quantitate this, and the ThermoFisher degenerate primer definitely suffered from too many T’s, with ~ 49% of all nucleotides being T’s. Actually, the IDT hand mixed did worse than IDT machine mixed, which had the closest to the target values. Interesting.
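If you want to run this kind of check on your own degenerate primers, the core of it is just counting bases across the degenerate positions. Below is a minimal Python sketch of that counting step. It assumes you have already pulled the degenerate stretch (e.g. the six N positions) out of each read; the function name is hypothetical and the toy reads are entirely made up, not our actual sequencing data.

```python
from collections import Counter


def base_fractions(degenerate_reads):
    """Return the fraction of A, C, G and T across all degenerate positions.

    Each element of degenerate_reads is the degenerate stretch taken from
    one sequencing read. Anything that is not A/C/G/T is ignored.
    """
    counts = Counter()
    for read in degenerate_reads:
        counts.update(read.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: counts[base] / total for base in "ACGT"}


# Made-up toy reads, for illustration only.
toy_reads = ["ACGTAC", "GTACGT", "TTTTTT", "TTTTTT"]
fractions = base_fractions(toy_reads)
for base in "ACGT":
    print(base, round(fractions[base], 3))
```

A perfectly balanced N mix would give 0.25 for each base, so the further a fraction drifts from that, the more biased the synthesis.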

Quantitating Western blots

Almost everybody hates doing Western blots since they’re so labor intensive (and somewhat finicky), but they are undeniably useful and will forever have a place in molecular biology / biomedical research. We’re currently putting together a manuscript where we express and test variants of ACE2, which requires some Western blotting quantitation. Since I’m about to do some of the quantitation now, I figured I’d just record the steps I take so trainees in the lab have a basic set of instructions they can follow in the future.

This already assumes you have a “.tif” file of your Western blot. While I someday hope to do fluorescent westerns, this will likely be a chemiluminescent image. Hopefully this isn’t a scan of a chemiluminescent image captured on film, b/c film sux. So you presumably took it on a Chemidoc or an equivalent piece of equipment. Who knows; maybe I finally found the time to finish / standardize my procedure for chemiluminescent capture using a standard mirrorless camera (unlikely). I digress; all that matters right now is you already have such an image file ready to quantitate.

My favorite method for Western blot quantitation is to use “Image Lab” from BioRad. You know, I like BioRad. And I definitely like the fact that they provide this software for free. Anyway, download and install it as we’ll be using it.

Once you have it installed, start it up and open your image file of interest. The screen will probably look something like this:

First off, stop lying to yourself. By default, the image is going to be auto-scaled so that the darkest grey values in your image get turned to black, while the lightest grey values get turned white. That means the image you see is not showing the “raw” range of your image, so you may as well turn off the auto-scaling so you see your image for what it really is. Thus, press that “Image Transform” button…

… and get a screen that looks like below:

See where those low and high values are? Awful. Set the low value to 0 and the high value to the max (65535).

And now the image looks like this, which is great, since this is what the greyscale values in your image actually look like.
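To make it clearer why the auto-scaled view is misleading, here is a small Python sketch of what a display window does to a raw 16-bit pixel value. This assumes a simple linear mapping onto 0 to 255 on-screen grey levels, which is my assumption for illustration and not something I’ve confirmed about how Image Lab works internally. The function name and the example low / high values are made up.

```python
def to_display(value, low, high, levels=255):
    """Map a raw pixel value onto an on-screen grey level.

    Values at or below `low` map to 0, values at or above `high` map to
    `levels`, and everything in between is scaled linearly (an assumed,
    simplified model of a display window).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    return round((clipped - low) / (high - low) * levels)


raw_pixel = 6000  # a made-up raw value somewhere in a band

# A narrow, auto-scaled window stretches this pixel to the top of the range.
print(to_display(raw_pixel, low=1000, high=6000))

# The full 16-bit window (0 to 65535) shows how little of the range it uses.
print(to_display(raw_pixel, low=0, high=65535))
```

The same raw value looks saturated in the narrow window and fairly faint in the full window, which is the whole point of resetting the low and high values before judging your blot.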

OK, now we can actually start quantitating the bands in the image. First off, don’t expect your western blot to look 100% clean. Lysates are messy, and you can’t always get pure, discrete bands. Sure, some of the lower-sized bands may be degradation products that formed when you accidentally left the lysates off ice (you should avoid that, of course). Then again, you may have done everything perfectly bench-wise, and the lower-sized bands may be because you’re overexpressing a transgene, and proteins naturally get degraded, and overexpressed proteins may tax the normal machinery and get degraded more obviously. I say the best thing is to acknowledge that this happens, show all your results / work, and make the most educated interpretations that you can. Regardless, in the above image, we’re going to try to quantitate the density of the highest molecular weight band, since that should be the full-length protein. To do that, first select the “Lane and bands” button on the left.

I then press the “Manual” lane annotation button.

In this case, I know I have 11 lanes, so I enter that and get something that looks like this:

Clearly that’s wrong, so grab the handles on the edges and move the grid such that it actually falls more-or-less on the actual lanes.

Sure the overall grid is pretty good, but maybe it’s not perfect for every single individual lane. The program also lets you adjust those. To do that, click off the “Resize frame” button on the left so it’s no longer blue…

And then adjust the individual lanes so they fit the entire band as your human eyes see them, resulting in an adjusted grid that looks like this:

Nice. Now go to the left and select the tab at the top that says “Bands”, and then click on the button that says “Add”.

Once you do that, start clicking on the bands you want to quantitate across all of the lanes. You may have to grab the dotted magenta lines in each lane to adjust them so that the actual band is within them (and presumably somewhere near the solid magenta line which should be somewhere in between them). This is what it looks like after I do that:

It’s good to check how well the bands are being seen by the program. Go to the top and press the “Lane profile” button. It should give you a density plot. This is also the window where you can do background subtraction. Find a number that seems sensible (in this case, a disk size of 20 mm seems reasonable), and make sure you hit the “apply to all lanes” button so it propagates this across lanes. While I’m only showing the picture for lane 5, it’s probably worth scanning across the lanes to make sure the settings are sensible.

Now with those settings fine, close out, and then click on “Analysis table” at the top. Once that is open, go to the bottom and click on “lane statistics”. These should be the numbers you’re looking for.

Now export the statistics (either by pressing the “copy analysis table to clipboard” button and pasting into whatever spreadsheet you want to use, or with the “export analysis table to spreadsheet” button). The numbers you’ll be looking to analyze will be those in the “Adj. Total Band Vol” column.

Note: Now that I’m doing this, the “standard curve” button is now ringing a bell. I’m fairly certain that in my PhD work, when I ran a ton of Western blots or just straight up protein gels stained with coomassie, I would run dilutions of lysates / proteins to make a standard curve of known protein amounts that I could calibrate the densitometry against. We obviously didn’t do that here, since we didn’t have the space, so these numbers aren’t going to be quite as accurate as if we had done so. Still, getting some actual numbers we can compare across replicates is a major step up from not quantitating and having everything be even more subjective.
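Once the “Adj. Total Band Vol” numbers are in hand, one simple way to make them comparable across blots is to express every lane relative to a reference lane (say, the wild-type lane). The Python sketch below shows that step. To be clear, normalizing to a reference lane is my assumption about a reasonable next step, not something Image Lab does for you, and the function name and the band volumes are made up for illustration.

```python
def relative_band_volumes(adj_volumes, reference_lane):
    """Divide every lane's adjusted band volume by the reference lane's.

    adj_volumes maps lane number -> "Adj. Total Band Vol" value exported
    from the analysis table.
    """
    reference = adj_volumes[reference_lane]
    if reference <= 0:
        raise ValueError("reference lane has no signal to normalize against")
    return {lane: volume / reference for lane, volume in adj_volumes.items()}


# Made-up numbers, for illustration only.
toy_volumes = {1: 2000.0, 2: 1000.0, 3: 500.0}
relative = relative_band_volumes(toy_volumes, reference_lane=1)
for lane, value in sorted(relative.items()):
    print(f"lane {lane}: {value:.2f} of reference")
```

Without a standard curve this only gives relative densities, so treat the ratios as semi-quantitative.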

Terrific for lentivector growth?

At some point, I was chatting with Melissa Chiasson about plasmid DNA yields, and she mentioned that her current boss had suggested using terrific broth (TB) instead of Luria broth (LB) for growing transformed bacteria. I think both of us were skeptical at first, but she later shared data with me showing that DNA from E. coli grown in TB had actually given her better yield. I thus decided it was worth trying myself to see if I could reproduce it in my lab.

There are two general types of plasmids we tend to propagate a lot in my lab. attB recombination vectors, for expressing transgenes within the landing pad, and also lentiviral vectors of the “pLenti” variety, which play a number of different roles including new landing pad cell line generation and pseudovirus reporter assays.

I first did side-by-side preps of the same attB plasmids grown in TB or LB, and TB-grown cultures yielded attB plasmid DNA concentrations that were slightly, albeit consistently, worse. But eventually I tested some lentiviral vector plasmids and finally saw the increase in yield from TB that I had been hoping for. When I relayed this to Melissa, she noted she had been doing transformations with (presumably unrelated sets of) lentiviral vectors herself, so these observations had been consistent after all.

Thus, if you get any attB or pLenti plasmids from me, you should probably grow them in LB (attB plasmids) and TB (pLenti plasmids), respectively, to maximize the DNA yield you get back for your efforts.

Using SpliceAI for synthetic constructs

I try to be as deliberate as I can be about designing my synthetic protein-coding constructs. While I’ve largely viewed splicing as an unnecessary complication and have thus left it out of my constructs (though, who knows; maybe transgenes would express better with splicing, such as supposedly happens with the chicken β-actin intron present in the pCAGGS promoter/5’UTR combo), there’s still a very real possibility that some of my constructs encode cryptic splice sites that could be affecting expression. In a recent conversation with Melissa Chiasson (perhaps my favorite person to talk syn-bio shop with), she noted that there is actually a way to use SpliceAI to predict splicing signals in synthetic constructs, with a nice correspondence here showing what the outputs mean. Below is my attempt to get this to work in my hands:

First is installing Splice AI, following the instructions here.

I started by making a new virtual environment in anaconda:
$ conda create --name spliceai

I then activated the environment with:
$ conda activate spliceai

Cool, next is installing spliceai. Installing tensorflow the way I have written below got me around some errors that came up.
$ conda install -c bioconda spliceai
$ conda install tensorflow -c anaconda

OK, while trying to run the spliceAI custom sequence script, I got an error along the lines of “Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized…”. I followed the instructions here, and typed this out into my virtualenv:
$ conda install nomkl

Alright, so that was fixed, but there was another error about no config or something (“UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually….”). So I got around that by writing a new flag into the load_model() function call based on this response here.

OK, so after that (not so uncommon) struggling with dependencies and flags, I’ve gotten things to work. Here’s the result when I feed it a construct from the days when I was messing around with consensus splicing signals early during my postdoc. In this case, it’s a transcript that encodes the human beta actin cDNA, with its third intron added back in. It’s also fused to mCherry, found on the C-terminal end.

And well, that checks out! The known intron is clearly observed in the plot. The rest of Actin looks pretty clean, while there seem to be some low-level splicing signals within mCherry. That said, the fact that they’re in the wrong order probably means it isn’t really splicing, and I’m guessing the signals are weak enough and far enough away that there isn’t much cross-splicing with the actin intron.
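The “wrong order” reasoning can be written down as a small Python sketch: call donor and acceptor sites from per-position scores, and only keep donor / acceptor pairs where the donor sits upstream of the acceptor. This assumes you already have one donor score and one acceptor score per nucleotide (which is the kind of output I get from the custom sequence script). The function names, the 0.5 cutoff, and the toy scores are all made up for illustration and are not part of SpliceAI itself.

```python
def call_sites(scores, threshold=0.5):
    """Return the positions whose score is at or above the threshold."""
    return [pos for pos, score in enumerate(scores) if score >= threshold]


def candidate_introns(donor_scores, acceptor_scores, threshold=0.5):
    """Pair each called donor with every called acceptor downstream of it.

    An acceptor with no donor upstream of it (or a donor with no acceptor
    downstream) cannot define an intron on its own, so it yields no pair.
    """
    donors = call_sites(donor_scores, threshold)
    acceptors = call_sites(acceptor_scores, threshold)
    return [(d, a) for d in donors for a in acceptors if d < a]


# Made-up toy scores: a strong intron (donor at 1, acceptor at 4), then a
# weaker acceptor at 6 followed by a weaker donor at 8 (the wrong order).
toy_donor = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0]
toy_acceptor = [0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.6, 0.0, 0.0, 0.0]
print(candidate_introns(toy_donor, toy_acceptor))
```

In the toy example the donor at position 8 never pairs with anything, while the acceptor at position 6 can still pair with the upstream donor at position 1, which is the cross-splicing scenario mentioned above.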

Oh, and now for good measure, here’s the intron found in transcripts made from the pCAGGS vectors, with this transcript belonging to this plasmid from Addgene encoding codon optimized T7 polymerase.

Nice. Now to start incorporating it into analyzing some of the constructs I commonly use in my research…

Shuttling one ORF into another

I’ve gone over how to make mutations of a plasmid before (since that’s the simplest molecular cloning process I could think of), but there will be many times we will need to make a new construct (via 2-part Gibson) by shuttling a DNA sequence from one plasmid to another. Here’s a tutorial describing how I do that.

First, open the maps for the two relevant plasmids on Benchling. Today it will be “G871B_pcDNA3-ACE2-SunTag-His6” serving as the backbone, and “A49172_pOPINE_GFP_nanobody” serving as the insert. The goal here will be to replace the ACE2 ectodomain in G871B with the GFP nanobody encoded by the latter construct.

Next, duplicate the map for the backbone vector. Thus, find the plasmid under the “Projects” tab in Benchling, right click on it to open up more options, and select “Copy to…”.

Then, make a copy into whatever directory you want to keep things organized. Today, I’ll be copying it into a directory called “Cell_surface_labeling”. A very annoying thing here is that Benchling doesn’t automatically open the copy, so if you start making edits to the plasmid map you already have open on the screen, you’ll be making edits to the *original*. Thus, make sure you go to the directory you just copied into, and open the duplicated file (it should start with “Copy of …”). Once open, rename the file. At this point, the plasmid won’t have a unique identifier yet, so I typically just temporarily prefix the previous identifier with “X” until I’m ready to give it its actual identifier. To rename the file, go to the encircled “i” icon on the far right of the screen, one up from the bottom icon (the graduation cap). After you enter the new name (“XG871B_pcDNA3-Nanobody[GFP-pOPINE]-SunTag-His6” in this case), make sure you hit “Update Information” or else it will not save. Hurrah, you’ve successfully made your starting canvas for in-silico designing your future construct.

Cool, now go to the plasmid map with your insert, select the part you want, and do “command C” to copy it.

Now select the part of the DNA you want to replace. For this construct, we’re replacing the ACE2 ectodomain, but we want this nanobody to be secreted, so we need to keep the signal peptide. I didn’t already have the signal peptide annotated, so I’m going to have to do it now. When in doubt, one of the easiest things to do is to consult Uniprot, which usually has things like the signal peptide already annotated. Apparently it’s the first 17 amino acids, which *should* correspond to “MSSSSWLLLSLVAVTAA”. For whatever reason, the first 17 amino acids of the existing ACE2 sequence are “MSSSSWLLLSLVAVTTA”, so there’s an A>T mutation at position 16. It’s probably not a big deal, so I’ll leave it as it is. That said, to make my construct, I now want to select the sequence I want to replace, like so:

As long as you still have the nanobody sequence copied, you can now “Command V” paste it in place of this selection. The end result should look like so:

Great, so now I have the nanobody behind the ACE2 signal peptide, but before the GCN4 peptide repeats that are part of the Sun-tag. Everything looks like it’s still in frame, so no obvious screw-ups. Now time to plan the primers. I generally start by designing the primers to amplify the insert, shooting for enough nts to give me a Tm of slightly under 60°C. I also generally append ~ 17 nt of homology onto these primers. Thus, the fwd primer would look like this.

Same thing for the reverse primer, and it would look like this (make sure to take the reverse complement).

To recap, the fwd primer seq should be “ttgctgtTactactgctgttcaactggtggaaagcggc” and the reverse should be “ccgttggatccggtaccagagctcaccgtcacctgagt”. Cool. Next, we need to come up with the primers for amplifying the vector. It could be worth checking to see if we have any existing primers that sit in the *perfect* spot, but for most constructs, that’s likely not the case. Thus, design forward and reverse primers that overlap the homology segments, and have Tm’s slightly under 60°C. The forward primer will look like this:

And the reverse primer like this:

Hmmm. I just realized the Kozak sequence isn’t a consensus one. I should probably fix that at some point. But, that’s beyond the scope of this post. So again, to recap, the fwd and rev primers for the vector are “ggtaccggatccaacggtcc” and “agcagtagtAacagcaacaaggctg”. But now you’re ready to order the oligos, so go ahead and do this. As a pro-tip, I like to organize my primer order work-areas like this:

Why? Well, the left part can be copy-pasted into the “MatreyekLab_Primer_Inventory” google sheet, while the middle section can be copy-pasted into the template for ordering primers from ThermoFisher. The right part can be copy-pasted into the “MatreyekLab_Hifi_Reactions” google sheet in anticipation of setting up the physical reaction tubes once the primers come in.
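Before ordering, it can be worth double-checking that the insert primers and vector primers really do share the ~ 17 nt of homology needed for the Gibson. Here is a minimal Python sketch of that check, using the four primer sequences recapped above. The function names are made up for illustration, and this only checks sequence overlap; it says nothing about Tm.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case is ignored)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def overlap_length(primer_a, primer_b):
    """Longest stretch where the 5' end of primer_a matches the 3' end of
    the reverse complement of primer_b, i.e. the homology arm they share."""
    a = primer_a.lower()
    b_rc = reverse_complement(primer_b)
    for k in range(min(len(a), len(b_rc)), 0, -1):
        if a[:k] == b_rc[-k:]:
            return k
    return 0


# Primer sequences from the recap above.
insert_fwd = "ttgctgtTactactgctgttcaactggtggaaagcggc"
insert_rev = "ccgttggatccggtaccagagctcaccgtcacctgagt"
vector_fwd = "ggtaccggatccaacggtcc"
vector_rev = "agcagtagtAacagcaacaaggctg"

# The insert fwd primer should overlap the vector rev primer, and the
# insert rev primer should overlap the vector fwd primer.
print(overlap_length(insert_fwd, vector_rev))
print(overlap_length(insert_rev, vector_fwd))
```

If either number comes back much shorter than the homology you intended, one of the primers was probably designed against the wrong strand or the wrong junction.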

Illumina sequencing of barcoded amplicons from the landing pad

Illumina sequencing of barcoded amplicons is going to be a large factor in the work we’ll be doing. Going to a completely new place, I now need to explain to people how it works. To keep this post simpler, I’m skipping all of the landing pad details (hopefully you already understand it). Let’s just start with what was genomically integrated after the recombination reaction. In the case of the PTEN library, it looks like this:

As you can see, the barcode is the blue “NNNNNNNNNNNNNNNNNN” region at the bottom of the above image. We can’t directly sequence it with primers directly flanking it, since those sequences will also be present in the unexpressed plasmid sequences likely contaminating our genomic DNA. Thus, we first have to create an amplicon containing our barcode of interest, but spanning the recombination junction (“Recomb jxn” above).

For the PTEN library, the barcode is located behind the EGFP-PTEN ORF, so we have to amplify across the whole thing. We did this by using a forward primer located in the landing pad prior to recombination (such as KAM499, found in the Tet-inducible promoter and shown in red as the “Forward Primer”), as well as a reverse primer located behind the barcode associated with the PTEN coding region (KAM501 in this case).

The above is a simplistic representation. The forward primer / KAM499 is indeed just the sequence needed for hybridization, since we aren’t going to do anything else with this end of the amplicon. On the other hand, we’ll add some more sequence to the reverse primer to help us with the next steps. In this case, this is a nucleotide sequence that wasn’t present before, so that we can amplify this specific amplicon in the next step. The actual amplicon will thus look something like below:

OK, so the above amplicon is huge; too big for efficient cluster generation. Thus, we’ll now use that amplicon to make a much smaller, Illumina compatible second amplicon. We’ll also include an index of degenerate nucleotides “nnnnnnnn” that can be used to distinguish different amplicons from each other when we mix multiple samples together before doing the actual sequencing step. This second amplicon looks something like this:

This time, the forward primer has both the hybridizing portion (in the blue-purple) as well as one of the cluster generators, shown as that light blue. At the complete other end, you can see the “nnnnnnnn” index sequence in indigo, followed by the other cluster generator sequence in orange. Please note: Here, the index is a KNOWN sequence, like GTAGCTAC, or GATCGAGC. It’s just that each sample will have a different KNOWN sequence, so it was simpler just to denote it with “n”s for the purpose of this explanation.

There’s more in the above map though. We’ll likely want to do paired sequencing of the barcode, so we’ll give the Illumina sequencer two read primers: read 1 (in red) which reads through the barcode in the forward direction, as well as read 2 (in green), which goes through the barcode in the reverse direction. We will also need to sequence the index, though we’ve tended to only do that with one primer. This primer is Index 1, and is colored in yellow.

For the actual primer sequences and everything, look at Supplementary Table 7 of my 2018 Nature Genetics paper.
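To make the roles of the index and the barcode a bit more concrete, here is a minimal Python sketch of what happens to the reads afterwards: each read is assigned to a sample by its KNOWN index sequence, and then the barcodes are tallied per sample. This assumes the index and barcode have already been extracted from each read as plain strings. The function name, sample names, and toy reads are made up for illustration (the two index sequences are just the examples mentioned above), and this is not the actual analysis pipeline.

```python
from collections import Counter, defaultdict


def count_barcodes(reads, index_to_sample):
    """Tally barcodes per sample.

    reads is an iterable of (index_sequence, barcode_sequence) pairs.
    Reads whose index is not in index_to_sample are skipped.
    """
    counts = defaultdict(Counter)
    for index_seq, barcode in reads:
        sample = index_to_sample.get(index_seq)
        if sample is None:
            continue  # index doesn't match any sample we mixed in
        counts[sample][barcode] += 1
    return counts


# Hypothetical sample sheet and made-up reads with 18 nt barcodes.
index_to_sample = {"GTAGCTAC": "sample_1", "GATCGAGC": "sample_2"}
toy_reads = [
    ("GTAGCTAC", "A" * 18),
    ("GTAGCTAC", "A" * 18),
    ("GATCGAGC", "C" * 18),
    ("TTTTTTTT", "A" * 18),  # unknown index, gets dropped
]
barcode_counts = count_barcodes(toy_reads, index_to_sample)
for sample, counter in sorted(barcode_counts.items()):
    print(sample, dict(counter))
```

A real pipeline would also need to deal with sequencing errors in the index and barcode, which this sketch ignores.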

Bleed-over on the Nikon scope

Anna was using the Nikon microscope a couple of days ago, and noticed some level of bleed-over of mCherry fluorescence into the CY5 filter-set channel. I previously played around with making a bespoke script for figuring out the optimal fluorescent protein : laser : detector combinations on the two flow cytometers we routinely use at the core here, and figured I could modify the script to understand how our existing Nikon filter sets work with the various fluorescent proteins we use here. It took me a couple hours, but I made a script posted on my GitHub. Here’s a figure that script produces:

Based on the filter sets we currently have, it looks like it may be near impossible to avoid bleed-over of near infra-red FP fluorescence into the red channel. Also, this seems to reproduce / explain the effect Anna was seeing with mCherry fluorescence bleed-over into the NIR channel. It seems this problem doesn’t happen with mScarlet-I, so maybe I’ll slowly shift toward using that FP more.

Pymol Basic Tutorial

Alright CWRU students; today we will talk about installing PyMol and using it for some basic analysis of protein structure.

1) INSTALLATION

Firstly, download PyMol. CWRU has a subscription, so go to this link and read the license agreement. If you agree, press “I agree”. Then, enter your CWRU SSO info to be logged into the institutional software page. Hit any of the logos in the “PyMOL AxPyMOL v2.4” section (they all redirect to the same place).

As of 10/20/20, they do some weird web-store thing (never seen this before). You’ll have to add the version for whatever platform / OS you’re on, and hit add to cart, and then on the next pop up, hit “Check Out”. It’s confusing, but it’s all free, so this seems to just be a formality. You’ll get an email confirmation, but this email confirmation really doesn’t do much.

You will get to a screen that looks like this though:

First, while you’re still on the above screen, make sure you hit the “Important Notice” link. This will tell you the information you’ll need later for registering. The important notice screen will look like this (just without the blacked out parts).

Now that you know that info, go back to the previous screen (with the big “Download” button), add the version for the platform / OS you’re on, and hit download. You’ll then get to another screen where you’ll have to hit download again.

Now you’re on the registration screen. Type in the info from earlier, and type in your case email address. You’ll get an email to your school email address bringing you to the download link.

I whited out some of the token link, so you’ll have a link that looks much longer than above.

Follow that link and you’ll get to another page that looks like this:

Once you type in your case email address again, you should FINALLY get to the download page, which will look like this:

There are a lot of links there, but the most important one is the very first link, under “Download PyMOL”. Hitting that should bring you to this page:

You’ll need to do two things here. First, before you forget, download the license file. This is the button on the bottom right. Following the instructions will allow you to download a “pymol-license.lic” file. I like to put it into the same Application folder, so it’s easy to find. As shown in the picture provided by the website, you’ll need this file when you first open up PyMol on your computer.

Next, download the program itself, based on your platform (again, I’m on a Mac). Once downloaded, follow the steps for installing Pymol (for macs, it’s opening up the disk image, and then dragging the Pymol application into your computer Application folder). Double click to open the program, and it will ask about activation. Click the “Browse for License File” button and select that license file you recently downloaded. Voila!, you have successfully navigated the gauntlet of website complexity to achieve your goal. A world of exploring protein structures awaits.

2) USAGE

Now that it’s downloaded and properly linked to the license, you should get a nice blank screen like this.

I could explain what everything is, but it’s easier to just get started. If you’re on an internet connection, the easiest way to load protein structures is to use the “fetch” command. Essentially 99% of the protein structures you’ll ever want to see will be deposited at the RCSB Protein Data Bank. Go to that website, type in your favorite protein, and see what hits you have. For the purposes of today, I’ll just use a structure I’ve stared at many times over the last 5 years: that of the tumor suppressor protein PTEN. I have the code memorized, and it’s “1d5r”. So, type in “fetch 1d5r” (all lowercase) into either the bottom “PyMOL>_” prompt area, or in the top “PyMOL>” prompt area. Both spaces work / are largely redundant, although the top area actually records what your previous commands were, so I have a slight preference for this area. You should now see the below protein pop up in your PyMol window.

From here, it’s easiest if you have a two button mouse connected. If you do, you can do a left-click hold and drag to flip the protein around. You can also do a right-click hold and drag to zoom in and out. And finally, you can also hold down either option or command (if you’re on a mac) and left click to actually move the entire molecule around on the screen. If you have a scroll wheel, you may notice that turning it makes part of the protein appear or disappear. This is called “clipping” and will be useful in certain cases where you need just the right picture of part of the protein, but we won’t be getting into this for today.

On the other hand, you may be using a laptop without a two-button mouse. While slightly less ideal, you can still do everything pretty easily as long as you go to the top menu-bar, click and hover over “mouse” and click down on “1 Button Viewing Mode”. Now, if you go back to the Pymol window, you can still turn the object by clicking and dragging on your trackpad, but you zoom by holding down control while clicking and dragging, and move the entire molecule around (relative to your frame of view) by first holding down option before clicking and dragging.

The default is a “cartoon” view, which shows the backbone of the peptide, as well as secondary structure with alpha helices shown as helices, and beta strands shown as those flat arrows. I find cartoon views to be the most generally useful, so we’ll keep it like that for now. The small red plusses are water molecules the authors have in their model. I find these to be kind of annoying, so I like to hide them in the beginning, only bringing them back in much later when we’re considering how various side-chains may be interacting with water in the medium. To do this, I just type in “hide all” which makes everything disappear, and then “show cartoon”. Now all of the extra molecules aside from the peptide backbone should be gone. In the case of PTEN, there’s a small tartrate molecule that was bound to the active site. It’s worth making this visible. The easiest way to select it is to click the “S” button in the bottom right menubar…

This will call up the sequence at the top of the protein viewer window, below the command-line area. The tartrate molecule is after the entire PTEN sequence, but before the waters. Scroll there, and click on “TLA” to select it.

The sequence pops up, and defaults to the very beginning (starting with chain A).
After scrolling to the right, you can see the TLA substrate mimic in the sequence list.

This will make a new selection called “sele”. Now you can tell the program to make a visual representation of the selected “TLA” molecule. One option is to type in “show sticks, sele”. This will be pretty subtle. On the other hand, if you want to be able to see the atoms of the TLA molecule from pretty far, you can instead type “show spheres, sele”. The molecule will then look like this:

Nice. From here, feel free to explore. If you want to select particular residues, it’s helpful to use the “resi” denotation. For example, if you wanted to select residues 124, 129, 130, and 38, you can type in “select important_residues, resi 124+129+130+38”. There will now be a new selection called “important_residues”. To show where these are, you can choose to show these as spheres also (“show spheres, important_residues”), and color them a different color to make them easier to see, such as by typing “color yellow, important_residues”. You’ll have something that looks like this.
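
If you end up highlighting different sets of residues a lot, it can be handy to generate these command strings rather than typing them out each time. Here is a minimal Python sketch of that idea. Note that “highlight_commands” is a hypothetical helper name (it is not part of Pymol), and it only assembles the text of the commands described above, which you would then paste into the Pymol command line.

```python
def highlight_commands(name, residues, color="yellow"):
    """Build the Pymol command strings for selecting, showing, and coloring residues.

    This is a hypothetical helper, not a Pymol function; it only assembles
    the text of the commands used in this tutorial.
    """
    # Multiple residue numbers in a "resi" selection are joined with "+".
    resi_list = "+".join(str(r) for r in residues)
    return [
        f"select {name}, resi {resi_list}",
        f"show spheres, {name}",
        f"color {color}, {name}",
    ]


if __name__ == "__main__":
    # Print the three commands for the example residues used above.
    for line in highlight_commands("important_residues", [124, 129, 130, 38]):
        print(line)
```

The selection name gets reused in all three commands, which avoids the easy mistake of typing it slightly differently (for example with a space instead of an underscore) in one of them.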

3) OUTPUTS

So you’re done doing all of your exploratory analyses, and feel like making an image to put into a presentation. First thing to consider is the background. It defaults to black, but white is usually better for most presentations / paper figures. You can change this either by selecting Display/Background/White from the top menu, or by simply typing in “set bg_rgb, white”.

While you can get a decent picture even with this, you can get much nicer images if you tell the program to render the image first. You can do this by clicking the “Draw/Ray” button at the top right corner of the screen, and filling out the values you want, or you can tell it to do its ray-tracing / rendering from the command line by typing something like “ray 900”. After a little bit of waiting, you’ll get a rendered image.

Now you can go to “File/Export image as/PNG…” and export your image file to one of your computer directories. You’ve made a simple, high quality figure. Hurrah! You may want to save your Pymol analysis file by going to “File/Save Session As…”. Thus, if you want to get back to this step without starting from scratch, all you need to do is open this session file again and you’re back to the same spot.
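
If you make figures often, the steps in this section can be collected into a short list of commands. Below is a minimal Python sketch that assembles them as text. The function name “figure_script” and the file names are made up for illustration, and I’m assuming that Pymol’s “png” and “save” commands are the command-line counterparts of the “File/Export image as” and “File/Save Session As…” menu items used above.

```python
def figure_script(width=900, image_file="figure.png", session_file="analysis.pse"):
    """Return the figure-export commands from this section as one block of text.

    Hypothetical helper: it only builds text and does not talk to Pymol.
    The file names are placeholders.
    """
    lines = [
        "set bg_rgb, white",     # white background for slides / paper figures
        f"ray {width}",          # ray-trace (render) the current view at this width
        f"png {image_file}",     # assumed counterpart of File/Export image as/PNG
        f"save {session_file}",  # assumed counterpart of File/Save Session As
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(figure_script(), end="")
```

You can paste the printed lines into the Pymol command line one at a time, after you have the view set up the way you like it.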

Hopefully that was a useful introduction to some basic operations in Pymol. There’s a ton more you can do with the program, but I’ll leave it here for now.

4) ADDENDUMS

So the above are the most basic things to get you started, but as I interact with trainees here (and see what we need to do for their work), I’ll amend this page with other more specific operations.

A) Show the protein surface.
Cartoon diagrams are great for understanding how the protein is structured, but don’t give you a sense of what the surface of the protein looks like. That’s where showing the protein surface comes in handy. You can do this quite simply by typing in “show surface, all” (replacing “all” with whatever object name there is). This will create something like this:

The reason for the blue, red, and yellow is that the default coloring scheme in Pymol is to color carbon atoms green, nitrogen atoms blue, oxygen atoms red, and sulfur atoms yellow. If that’s too distracting, you can just color everything green by typing in “color green, all”.

But what if you want to see both the protein surface as well as the underlying secondary structure? One approach is to make the surface representation semi-transparent. You can do this by typing something like “set transparency, 0.75”.

B) Selecting atoms near another selection of atoms
This can be a pretty useful feature. For example, you may be curious which residues of a protein are near a substrate mimic molecule. Or maybe the structure shows a protein-protein interaction, and you’re trying to figure out which residues in protein A are in close contact with protein B. Below is a third example, which is figuring out which atoms of a protein are pretty close to water molecules on the surface of the protein.

As described on this page, you can use the “around” selection operator for this. Here’s a series of commands to show the atoms of the protein as spheres, color them all green, and then select only the atoms within 3.5 angstroms of a water and color them blue:
“show spheres, 1d5r and not resn HOH”
“color green, 1d5r and not resn HOH”
“select near_water, (resn HOH) around 3.5”
“color blue, near_water”
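
To make the distance cutoff concrete, here is a minimal pure-Python sketch of the idea behind that selection: keep every candidate atom that lies within 3.5 angstroms of at least one water atom. This is only an illustration of the geometry, not how Pymol does it internally, and the atom names and coordinates are made-up toy values rather than anything taken from 1d5r.

```python
import math


def atoms_near(candidates, references, cutoff=3.5):
    """Return names of candidate atoms within `cutoff` of any reference atom.

    Both arguments map an atom name to an (x, y, z) coordinate in angstroms.
    """
    near = []
    for name, xyz in candidates.items():
        # An atom counts as "near" if at least one reference atom is close enough.
        if any(math.dist(xyz, ref) <= cutoff for ref in references.values()):
            near.append(name)
    return near


# Toy coordinates, made up for illustration only.
protein_atoms = {
    "ASP_OD1": (0.0, 0.0, 0.0),
    "LYS_NZ": (10.0, 0.0, 0.0),
    "SER_OG": (2.0, 2.0, 0.0),
}
water_atoms = {
    "HOH_1": (0.0, 3.0, 0.0),
    "HOH_2": (20.0, 20.0, 20.0),
}

near_water = atoms_near(protein_atoms, water_atoms)
print(near_water)
```

With these toy values, the two atoms sitting about 3.0 and 2.2 angstroms from the first water are kept, and the one more than 10 angstroms away is not. Making the cutoff bigger or smaller changes how generous the “near” selection is, which is the same knob as the 3.5 in the Pymol command.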