Command line BLAST

One of the pseudo-projects in the lab requires looking for a particular peptide motif in genomic data. While small scale searches can be done using the web interface, the idea is to do this in a pretty comprehensive / high throughput manner, so shifting to the command line makes sense for this work. I last did this back in 2018 for some preliminary studies, so I’m going to have to re-install the software on my new computer and re-run some of those analyses. I figure I’ll write down my notes as I re-do this, so that I (and others) can use this post as a reference.

Installing BLAST+

The instructions on how to download the program can be found here. I’m on a Mac, so I downloaded “ncbi-blast-2.13.0+.dmg”, double-clicked it, and ran the package installer.

Assuming it’s been correctly installed, writing the command …

blastp -task blastp-short -query <(echo -e ">Name\nAAWLIEKGVASAEE") -db nr -remote -outfmt 1

… into the terminal should actually reveal some BLAST-specific output, rather than throw an error.

Running protein motif-specific blast searches

Type in the following into your terminal:

psiblast -phi_pattern PHI-Blast_2A_pattern.txt -db nr -remote -query <(echo -e ">Name\nGATNFSLLKQAGDVEENPGP") -max_hsps 1 -max_target_seqs 10000 -out phi_blast_output.csv -outfmt 10

Note: The above command requires a text file specifying the pattern constraint (“PHI-Blast_2A_pattern.txt” above), which can be found here. It should yield a ~25 KB CSV output file, like so.

Extracting just the accession numbers

I don’t remember whether other BLAST+ output formats give you the full hit sequence. If they do, the method I ended up taking back in 2018 was unnecessarily roundabout, but until I figure that out, I’ll follow the old method. As you can see in the output file above, it doesn’t include the hit protein sequence, and instead just gives the accession number. Thus, the next step is using the accession number to actually retrieve the protein sequence. To do this, we’ll use Entrez Direct. To install Entrez Direct, follow the instructions here. Briefly, type the following into the terminal:

sh -c "$(curl -fsSL ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"

In order to complete the configuration process, execute the following:

echo "source ~/.bash_profile" >> $HOME/.bashrc
echo "export PATH=\${PATH}:/Users/kmatreyek/edirect" >> $HOME/.bash_profile

OK, now that it’s installed, here’s how I’ve used it:

First, the output file above has more info than just the accession numbers. To pare it down to only the accession numbers, I used this script, which can be run by entering the following into the terminal, assuming the previous output CSV file is somewhere in the directory with the script (it can even be in subfolders within that directory):

python3 3_Blast_to_accession.py

This will create a file called “3A_prot_accession_list_complete.txt” (example output file here), which is the unique-ified list of accession numbers to give to Entrez Direct. (Uniquifying is important if you have multiple .csv outputs you want to compile into a single master list.)
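For anyone curious what this step amounts to, here is a minimal Python sketch of the same idea. It is not the contents of 3_Blast_to_accession.py. It assumes the default `-outfmt 10` column order, where the subject accession is the second comma-separated field, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def accessions_from_lines(lines):
    """Yield the subject accession (2nd comma-separated field) from BLAST -outfmt 10 lines."""
    for row in csv.reader(lines):
        # Skip blank lines, comment lines, and anything without a second column
        if len(row) < 2 or row[0].startswith("#"):
            continue
        yield row[1].strip()


def collect_unique_accessions(root="."):
    """Scan root and all of its subfolders for .csv files; return sorted unique accessions."""
    unique = set()
    for path in Path(root).rglob("*.csv"):
        with path.open(newline="") as handle:
            unique.update(accessions_from_lines(handle))
    return sorted(unique)
```

Writing the list out is then one line, e.g. `Path("3A_prot_accession_list_complete.txt").write_text("\n".join(collect_unique_accessions(".")) + "\n")`. Using a set is what handles the de-duplication across multiple CSV files.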

This can be fed into Entrez Direct using this shell script, which you can run by typing in:

sh 4_Accession_to_fasta.sh
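If you’d rather stay in Python, the same accession-to-FASTA step can be sketched with Biopython’s `Bio.Entrez.efetch`. This is an alternative approach, not what 4_Accession_to_fasta.sh does. The batch size of 200 and the function names are assumptions, and NCBI asks that you set `Entrez.email` to your own address.

```python
def chunked(items, size=200):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_fasta(accessions, email, batch_size=200):
    """Fetch protein FASTA records for a list of accessions from NCBI using Biopython."""
    from Bio import Entrez  # imported here so chunked() works even without Biopython

    Entrez.email = email  # NCBI asks for a contact email address with E-utility requests
    pieces = []
    for batch in chunked(accessions, batch_size):
        handle = Entrez.efetch(db="protein", id=",".join(batch), rettype="fasta", retmode="text")
        try:
            pieces.append(handle.read())
        finally:
            handle.close()
    return "".join(pieces)
```

Batching keeps each request to a reasonable size when the accession list runs into the thousands. The returned text could then be written to “4A_prot_fasta.txt”.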

You should now have an output file called “4A_prot_fasta.txt” with the resulting protein sequences in fasta format, like so.

Now you can search for your desired sequence (in its full protein context) within the resulting file.
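As a rough sketch of that final search, the Python below parses the FASTA file and reports every match to a regular expression. The regex `D.E.NPGP` is only an illustrative 2A-like pattern, chosen because it matches the query used above (…DVEENPGP). Swap in whatever motif you actually care about. The function names are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(records, pattern):
    """Return (header, start, end, matched_text) for every regex hit; positions are 0-based."""
    regex = re.compile(pattern)
    hits = []
    for header, seq in records:
        for match in regex.finditer(seq.upper()):
            hits.append((header, match.start(), match.end(), match.group()))
    return hits
```

Usage would look like `find_motif(read_fasta(open("4A_prot_fasta.txt")), r"D.E.NPGP")`. The start/end positions make it easy to pull out flanking sequence for the “full protein context”.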

To be continued…

Are there other steps in this process related to this project? Sure. Like what do you do with all of these full sequences containing the hits? Well, that’s beyond the scope of this post.

ODs on the spec and nanodrop

So there are two ways to measure bacterial culture ODs in the lab. The first is to use the nearby ~$10,000 Thermofisher Nanodrop One (no cuvette option). The second is to use a relatively cheaply made, cuvette-based spectrophotometer I bought off of Amazon for ~$100. To be clear, this comparison is not a statement about the value of a Nanodrop (though I will say that having an instrument like a Nanodrop is essentially a must in a mol biol lab). This is more about whether, if the Nanodrop is already being used by someone and waiting would get in the way of some bacterial speccing timepoints, a $100 piece of equipment can relieve that conflict. That’s especially true for bacterial cultures, where volume isn’t really an issue and the measurement is simply the reading at 600 nm, which doesn’t even require any algebra to convert to more practical units (like ng/uL for DNA).

So to do this comparison, over a number of independent instances, I took the same bacterial culture and put 1mL into a cuvette and ran it on the old spec, and took 2 uL and put it on the Nanodrop pedestal and measured there. I made a table of the results, and graphed it in the plot below.

So the readings on the two instruments certainly correlate (that’s good), although it’s not an exact 1:1 relationship. In fact, the nanodrop gave numbers roughly 1.5 times higher than the spec. But if the two instruments give two different readings, then the question becomes “which is right?”
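One simple way to put a number on that relationship is a least-squares slope with the line forced through zero, on the assumption that a blank should read 0 on both instruments. This is a minimal sketch, and the function name is hypothetical.

```python
def ratio_through_origin(spec, nanodrop):
    """Least-squares slope b for nanodrop ~= b * spec, with the fit forced through zero."""
    if len(spec) != len(nanodrop) or not spec:
        raise ValueError("need the same non-zero number of paired readings")
    sxx = sum(x * x for x in spec)
    sxy = sum(x * y for x, y in zip(spec, nanodrop))
    return sxy / sxx
```

For example, the made-up pairs `ratio_through_origin([0.1, 0.2, 0.4], [0.15, 0.3, 0.6])` give ~1.5. Dividing a Nanodrop reading by the slope then puts it on the spec’s scale.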

And to that, I essentially say there is no right answer. Each is a proxy for bacterial cell density (i.e., billions of bacteria / mL), but there’s no “absolute” information encoded in the OD number that tells us that for our specific bacteria. We’d still have to come up with a conversion factor either way (e.g., by doing limiting dilutions of specc’d cultures and counting colonies), and once we have that, both will be right in that context. Sure, it would be nice to have a method that was most in line with the ODs described in various papers in the literature, but who knows what they used (recent papers may be using ODs from a Nanodrop [some perhaps using the cuvette option, but many others not], while older publications certainly didn’t have one and likely used some old-school spec). Even that is going to be heterogeneous, and will only give limited information anyway.
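The conversion factor itself is simple arithmetic once you have colony counts. Here is a minimal sketch. It assumes you plate a known volume of a known dilution and treat each colony as one viable cell (CFU). The function names are hypothetical.

```python
def cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Viable cells (CFU) per mL of the original culture.

    dilution_factor is the total fold-dilution of what was plated, e.g. 1e6 for a 10^-6 dilution.
    """
    if volume_plated_ml <= 0:
        raise ValueError("volume plated must be positive")
    return colonies * dilution_factor / volume_plated_ml


def cfu_per_ml_per_od(colonies, dilution_factor, volume_plated_ml, od600):
    """Instrument-specific conversion factor: CFU/mL corresponding to 1.0 OD600 unit."""
    return cfu_per_ml(colonies, dilution_factor, volume_plated_ml) / od600
```

This assumes OD is roughly linear with cell density over the range you calibrate in. You would also get separate factors for the spec and the Nanodrop, which is exactly why either one can be “right”.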

Well, good record-keeping to the rescue. We’ve transformed the positive control plasmid enough times, at a range of ODs sampled essentially by chance, that we can check whether certain bacterial ODs correlate with transformation efficiency. And boy, there’s been a whole lot of nothing there so far (which is actually quite notable; see below).

(FYI: I don’t remember which instrument I used to measure the OD A600 readings. Probably mostly the old spec, tho).
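To put a number on “a whole lot of nothing”, one option is a Pearson correlation between OD and the log of the transformant count. The log is used because counts can span orders of magnitude. This is a minimal sketch: it assumes you have paired OD and colony-count records, it does no significance testing, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables has no spread")
    return sxy / math.sqrt(sxx * syy)


def od_vs_efficiency(ods, transformants):
    """Correlate culture OD with log10(transformant count); counts must be positive."""
    return pearson_r(ods, [math.log10(t) for t in transformants])
```

With a handful of points and a fairly narrow OD range, a modest r could easily be noise, so treat this as a sanity check rather than a formal test.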

So yea, I’ve generally used cultures with ODs at the time of collection between 0.1 and 0.45, and they’ve collectively given me transformation rates of ~ 20,000 using our standard “positive control” plasmid. So there seems to be a pretty wide window of workable ODs. But generally speaking, I see no issue with having a culture of 0.1 to 0.4 OD as measured with either machine for use with chemical transformation.

Setting up a hybrid lab meeting

Due to both child-care and pandemic reasons, our originally 100% in-person lab meeting was for a time 100% remote, and for the last few months has been 100% hybrid. For overall accessibility reasons, I’ll likely keep hybrid as the default option, and only skip setting up the Zoom when it’s clear absolutely everybody is going to attend in person. Over time, I think I’ve learned how I should be setting up the hybrid lab meeting, so I figured I’d write down the steps here so I can remember them (and so anybody else setting things up can follow them).

Standard flexible format (i.e., nobody needs presenter mode)
Here, the laptop plugged into the projector provides the sights and sounds of the conference room, but is simply serving as a “viewer” of the slides in Zoom. I’ll assume this is being done with the common lab laptop (Kenny’s old laptop from 2019), although anyone’s laptop should work.

  1. Plug the 360-degree camera into the common lab laptop. If you also want to use a different external microphone (i.e., if you don’t want to use the microphone associated with the 360-degree camera), plug that in now too.
  2. Log into the lab meeting on Zoom. Confirm that the right camera and microphone are selected. Make sure the sound is up to the maximum, and that this computer remains unmuted.
  3. Using the USB-C adapter, hook up the laptop to either the projector or the wheeled TV. The adapter will allow for connecting to the projector with the existing VGA cord, or the TV with an HDMI connection.
  4. Make sure the Zoom screen is showing on the projector / TV screen.
  5. To actually use this setup, the idea will be that 1) anybody with their own computer can log into the lab meeting Zoom and share their desktop or window with the presentation (powerpoint file or google slides, for example), or 2) if it’s someone without their own computer, they can use Kenny’s or Anna’s computer to screen share (assuming the presentation file is somewhere easily accessible, like the “Lab_meetings” directory of the lab Google Drive).
    Note: While these computers are being used to share the slides, all sound (input and output) should be happening from the common lab computer connected to the projection device.

One speaker / Longer-form talk where someone needs presenter mode
The main difference here is that the laptop presenting the slides has to be plugged into the projector / monitor, and is thus not simply a “viewer” in the Zoom call. Here, I’ll assume this is being done with the common lab laptop, a

  1. Plug the 360-degree camera into the common lab laptop. If you also want to use a different external microphone (i.e., if you don’t want to use the microphone associated with the 360-degree camera), plug that in now too.
  2. Log into the lab meeting on Zoom. Confirm that the right camera and microphone are selected. Make sure the sound is up to the maximum, and that this computer remains unmuted.
  3. Open the presentation file. Windows may get much harder to navigate once connected to the second screen, so you may as well get everything set up beforehand.
  4. Using the USB-C adapter, hook up the laptop to either the projector or the wheeled TV. The adapter will allow for connecting to the projector with the existing VGA cord, or the TV with an HDMI connection.
  5. Now, go to Zoom and hit “Share screen”. Probably makes sense to choose the screen with the presentation on it, though it doesn’t really matter at this point since you can adjust it later.
  6. Once the screen is sharing, go to the slide / presentation software you’re using. If it’s Powerpoint, hit “Presenter View”.
  7. If the wrong screen is showing on the projector / TV monitor, then hit swap displays in the Zoom panel until it does.
  8. Now, if the wrong screen is being shared on Zoom (i.e., people in the Zoom call say they’re seeing your presenter view), hit the “Share screen” button again and choose the correct screen to cast.

That should do it!

Designing Amplicon-EZ primers

Amplicon-EZ is a pretty convenient service from Genewiz. In short, they’ll perform Illumina sequencing on a 150-500 nt DNA fragment you send them (they perform 2 x 250 cycles of sequencing, so fragments shorter than 500 nt will have overlapping paired-read regions). For $50 per sample, they’ll return ~50,000 reads (although in our experience, they tend to return more than this). Turnaround times can be kind of slow (while you can minimize the delay if you time things perfectly, it’s taken between 14 and 19 days to get data back following submission). That said, we’re still only running full Illumina kits a couple of times a year, so Amp-EZ is obviously a lot faster than waiting for one of those. Thus, it’s definitely good for getting an initial look at something you may want to sequence more deeply later. My general policy for the lab is that if you make any library, it’s worth submitting it for Illumina sequencing via Amp-EZ pretty early on, so you can be confident that the library is good and worthy of further experiments.
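The overlap arithmetic is roughly 2 x 250 minus the fragment length, capped at the fragment length when the fragment is shorter than a single read. Here is a small sketch. It assumes each read starts at one end of the fragment and reads inward, and that lengths are counted over the sequenced fragment itself. The function name is hypothetical.

```python
def paired_read_overlap(fragment_len, read_len=250):
    """Number of fragment bases covered by both reads of a pair.

    Assumes each read starts at one end of the fragment and reads inward; returns 0
    when the fragment is longer than the two reads combined.
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    return max(0, 2 * min(read_len, fragment_len) - fragment_len)
```

For example, `paired_read_overlap(300)` gives 200 overlapping bases, while a 500 nt fragment gives 0 (the two reads just meet in the middle).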

Designing primers

Primers are pretty simple to design. Essentially, you’ll want to make a pair of PCR primers with Amp-EZ adapters on the 5′ ends (and of course, DNA hybridizing sequences on the 3′ ends). As shown in the above link, the adapter sequences are:

For the forward sequencing read: 5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’

For the reverse sequencing read: 5’-GACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’ (this adapter goes on the reverse primer, i.e., the one on the reverse strand of the plasmid map).
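Since each primer is just adapter + annealing sequence, assembling them is simple string concatenation. Here is a minimal Python sketch. The annealing sequences you pass in are placeholders that you would design against your own plasmid, and the function name is hypothetical.

```python
FWD_ADAPTER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # goes on the forward primer
REV_ADAPTER = "GACTGGAGTTCAGACGTGTGCTCTTCCGATCT"   # goes on the reverse primer


def amp_ez_primers(fwd_anneal, rev_anneal):
    """Prepend the Amp-EZ adapters to the 3' (template-annealing) parts of a primer pair.

    Both annealing sequences are written 5'->3' as you would order them, i.e. rev_anneal
    is already the reverse-strand sequence, like any normal reverse PCR primer.
    """
    primers = []
    for adapter, anneal in ((FWD_ADAPTER, fwd_anneal), (REV_ADAPTER, rev_anneal)):
        anneal = anneal.upper()
        if not anneal or set(anneal) - set("ACGT"):
            raise ValueError(f"annealing sequence must be non-empty A/C/G/T: {anneal!r}")
        primers.append(adapter + anneal)
    return tuple(primers)
```

The adapters are fixed, so only the 3’ annealing portions need any real design work (Tm, specificity).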

Here’s an example of a map and corresponding annotated primers used to sequence the ACE2 Kozak library plasmid, and a similar map for sequencing the library after it’s been integrated into landing pad cells.

Amplifying your fragment

For the actual protocols to do this, it’s probably worth asking Sarah or Nidhi how they do it. The basic steps are going to be PCR, gel extraction of the band, and Qubit quantitation of the extracted DNA. Some things to keep in mind are that they’ll want a fair amount of DNA (500 ng), so you’ll either want to make sure you do a lot of cycles and extract a pretty hefty band, or you’ll need to do a second amplification from the initially extracted DNA.

Illumina Library Mixing

It’s taken us a while, but at this point we’ve now submitted two multiplex NextSeq Illumina kits through CWRU genomics. We’ve performed the multiplexing by appending dual indices during targeted amplification of the barcodes. In order to mix everything together, we’ve taken a rather simple approach of gel extracting the amplified bands and quantitating the amount of extracted DNA using Qubit, and mixing every extracted amplicon to equimolar amounts. This mixture is then submitted to the CWRU genomics core for qPCR-based quantification, and loaded onto the kit for sequencing.
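The equimolar mixing math converts each Qubit reading (ng/uL) into fmol/uL using the amplicon length. Here is a sketch that uses the common approximation of ~660 g/mol per base pair of dsDNA. The target amount per sample (`fmol_each`) is whatever you choose, and the names are hypothetical.

```python
def fmol_per_ul(conc_ng_per_ul, length_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration from ng/uL to fmol/uL."""
    # moles = grams / (g/mol); ng -> g is x1e-9 and mol -> fmol is x1e15, so net x1e6
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)


def equimolar_volumes(samples, fmol_each):
    """samples maps name -> (Qubit ng/uL, amplicon length in bp); returns uL of each to pool."""
    return {
        name: fmol_each / fmol_per_ul(conc, length)
        for name, (conc, length) in samples.items()
    }
```

In practice you would pick `fmol_each` so that the smallest volume is still comfortable to pipette accurately.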

Since everything has been mixed to equimolar amounts so far, it’s quite simple to see how well our Qubit-based quantitation and mixing worked. This is done by taking the number of paired reads associated with each pair of indices, and seeing how their counts / frequencies compare to each other. The distributions looked roughly log-normal, as seen in the below plots:

The second kit seemed to do a little better than the first one. Each kit ended up with a single poorly sequenced outlier: the sample in the first kit was ~300-fold below the median, and the sample in the second kit was ~10-fold below. The first kit also had about 4 samples that were ~10-fold below the median.

Regardless, this log-normal distribution shows what kind of mixing precision we can expect to consistently get with this scheme, even when it’s working well. Both distributions had a sd(log) of ~0.5. The coefficients of variation were ~0.04.

What are the red lines, you ask? Those are the idealized / hypothetical number of reads per sample we would have gotten if all of the reads from the kit (130 million for the mid-output kit used for kit 1, and 400 million for the high-output kit used for kit 2) were divided evenly among all of the samples. Clearly, the real-life read numbers aren’t hitting that idealized number, which is as expected.
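For reference, here is one way to compute these kinds of summary numbers from a list of per-sample read counts. It covers the spread on a log scale, each sample’s fold-difference from the median, and the idealized “red line” (total kit reads / number of samples). This is a sketch. The post doesn’t state the log base behind sd(log), so base 10 here is an assumption, and all counts must be positive.

```python
import math
import statistics


def read_count_summary(counts, total_reads=None):
    """Summarize per-sample read counts (all > 0) from one multiplexed run."""
    median = statistics.median(counts)
    summary = {
        "median": median,
        # Spread on a log scale; base 10 is an assumption here
        "sd_log10": statistics.stdev([math.log10(c) for c in counts]),
        # >1 means the sample got fewer reads than the median
        "fold_below_median": [median / c for c in counts],
    }
    if total_reads is not None:
        # The idealized depth: every read in the kit split evenly across samples
        summary["ideal_per_sample"] = total_reads / len(counts)
    return summary
```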

To frame it another way, though: sure, over-sequencing a sample is inefficient, but what is arguably more annoying is under-sequencing a sample because we were off in our estimates. A reasonable approach seems to be choosing a target read count that ensures ~95% of our log-normal distribution reaches the minimum read number needed to get interpretable data. Looking at these n = 2 results, it seems reasonable that we would want to give each sample ~5 to 10-fold more reads than one would expect based on idealized numbers.
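That “~95%” rule can be turned into a formula. If per-sample reads are log-normal around the intended depth, with standard deviation σ in log-base-b units, then the 5th percentile sits b^(1.645σ)-fold below the median. Here is a minimal sketch. It assumes the median lands on the intended depth; in practice our medians fell short of the red lines, which pushes the factor up. The function name is hypothetical.

```python
from statistics import NormalDist


def oversampling_factor(sd_log, coverage=0.95, base=10.0):
    """Fold extra reads to budget per sample so `coverage` of samples still reach the target.

    Assumes per-sample read counts are log-normal, centered on the intended depth, with
    standard deviation `sd_log` measured in log-`base` units.
    """
    z = NormalDist().inv_cdf(1 - coverage)  # about -1.645 when coverage is 0.95
    return base ** (-z * sd_log)
```

With σ = 0.5 in log10 units this gives ~6.6-fold, but in natural-log units only ~2.3-fold. The base behind sd(log) therefore matters a lot when budgeting reads.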



Generating new indices for multiplex Illumina sequencing

Nidhi recently needed to design new reverse primers for making dual-indexed amplicons for Illumina sequencing, but this involved generating novel indexes that could be used with existing primer sets. Luckily, I had already asked Anh to create a script for such a purpose, so once I remembered that he had already done that, all I had to do was find it and implement it. It’s pretty neat, since it goes to a Google Sheet in the cloud (so not a local file), and extracts the existing indices so we know to avoid things similar to them. Since it was a pretty easy script to understand, I ended up making some tweaks to it: 1) Adding a user input function, rather than hard-coding the number of new indices needed into the script itself, 2) Having the script consider the forward and reverse sequences of the existing indices to avoid both, and 3) spitting out an identity matrix of the pairwise comparisons of the existing 10 nucleotide indices, so I can visually verify that none are really close in sequence.

The script lives here in this GitHub repo, with the newest version I modified called “generateIndex_user_input.py”.

To be specific, rather than an identity matrix, it’s actually a distance matrix (how many positions within the 10 nucleotides are NOT identical), with the white diagonal in the above plot essentially serving as a positive control, since it shows that each sequence compared to itself has 0 nucleotides of difference. Notably, there are no other purely white blocks in that matrix, partially because 1) the chance that two randomly generated sequences are identical is ~1/(4^10) (about 1 in a million), which is a very small number, and 2) Anh’s script is likely working, keeping the number of identical matches between any newly generated indices and existing indices to a minimum. Actually, it looks like most differ by 7 or 8 nucleotides, and only a tiny fraction differ by only 3 nucleotides or so (and thus match at 7 positions). If those are the combinations with the highest similarity, it really isn’t bad at all.
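For anyone who wants the gist without opening the repo, here is a minimal sketch of the two ideas. The first is generating random indices that stay a minimum Hamming distance away from existing ones (and their reverse complements). The second is building the pairwise distance matrix. This is not Anh’s script: the `min_dist=3` threshold and the function names are assumptions, and it skips the Google Sheet step entirely.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """All pairwise Hamming distances; the diagonal is 0 by construction."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


def generate_indices(existing, n_new, length=10, min_dist=3, seed=None, max_tries=100_000):
    """Randomly draw new indices at least `min_dist` mismatches from every existing index,
    every existing index's reverse complement, and every index accepted so far."""
    rng = random.Random(seed)
    avoid = set(existing) | {reverse_complement(s) for s in existing}
    new = []
    for _ in range(max_tries):
        if len(new) == n_new:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, other) >= min_dist for other in avoid):
            new.append(candidate)
            avoid |= {candidate, reverse_complement(candidate)}
    if len(new) < n_new:
        raise RuntimeError("ran out of tries; lower min_dist or ask for fewer indices")
    return new
```

For random 10-mers, the expected Hamming distance is 10 × 3/4 = 7.5 mismatches. That fits with most pairs in the matrix differing at 7 or 8 positions.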

So ya, I think we’re doing a decent job of managing our indices to make sure they are not overlapping!

Posters

So a couple of the trainees in the lab are preparing posters for the Dept retreat. To help them (and other future trainees) get started, I dug some of my own digital posters out of the digital storage closet and put them in a “Poster_examples” directory on the lab Google Drive. Along with those examples, here are a series of tips that I’ve learned over the years, which I’m remembering and committing to writing in this process.

  1. Be cognizant of the size of your poster. I think the most common dimensions are 36″ high by 48″ wide. Some conferences will allow you to make larger posters (probably depending on what type of poster boards they have). But also, different poster printers may have different dimensions. I’m sure most do 36″ or 48″ as one of their standard sizes, but you’ll also want to figure out which dimensions your poster printer offers as standard.
  2. Know how long it takes to print the poster. There are logistics to everything. Where are you going to get the poster printed? How far ahead of the guaranteed printing deadline do they need the submission? Again, it’s worth knowing this up front so you’re not scrambling at the end. Also, getting started a little bit early can definitely save you some misery later on.
  3. Plan for column-based viewing rather than rows. IMO, it makes much more sense to tell the story in a series of columns, where the viewer starts on the left of the poster, reads down the column, takes a step to their right, reads down the column, and so forth 3 or so times until they’ve seen the whole poster. Probably generally easier to read this way rather than horizontally through rows, and probably the ONLY way to see / go through a poster when it’s really crowded.
  4. Plan out how you’re going to talk about your poster as you’re making it. What normally happens in a poster session is that you stand around all awkward and lonely in front of your poster until someone comes by who seems interested, and then you introduce yourself and offer to explain it (you can also ask them who they are and what their background is, so you can figure out which parts they may be particularly interested in or may struggle to understand). Then you’ll explain the story, starting at the top left of your poster and winding down to the bottom right. What would really help in telling this story is if the things on your poster actually illustrate what you’re trying to say. So, think about exactly what story you’re telling as you’re making the poster, so it’s all there when you need it.
  5. Always start by presenting the problem. It’s easy to get sucked into just saying what you did. But nobody is going to understand what you did until you tell them WHY you did it. That means there must be a problem presented, with your data being part of the solution. So if you want people to be on board, you definitely want to spell out the problem, and why it’s so important, up front.
  6. Do a light (white) background. The backing paper is going to be white, so why force the printer to cover the entire surface of the paper with ink?
  7. Consistency in fonts / font sizes. This is one of the hard ones. If you’re essentially making a scrapbook of figures from different sources, it may be hard to get all of the fonts in the figures to match. In those cases, you may need to crop out the existing labels / text and add in your own of known / controllable size. This is also a good reason to always use the same font (e.g., Arial) in all of your figures.
  8. Text size hierarchy. I hadn’t really thought about this before, but I think it makes sense: for the main text of the poster, you probably want about 2 different text sizes: one for the main narrative text you want the reader to read, and a smaller size for text that holds detailed information (like a figure legend). The larger one is always meant to be read, while the smaller one is only to be read when necessary. The larger one should be readable from a good couple of feet away, so it’s definitely gotta be size 20 or larger. Headers / section titles may make sense too, and you can denote those using bold or a slightly larger text size. Of course, none of this counts the poster title, which should be large and visible from across the room. But you don’t want to overcomplicate things by having like 5 different font sizes.
  9. Not stretching / squishing figures. Nobody likes seeing images in weird proportions; sure, maybe it works with mirrors in a circus funhouse, but it definitely doesn’t work for conveying scientific information in a professional setting. So ya, if you’re adjusting a figure’s size, make sure you have the “height and width proportionality” box checked as you do it.
  10. Avoid having too much text. Nobody is at the poster to read the next great American novel. You’ll literally want the bare bones amount of text that you need to give just about all of the big structural information you need to. Everything else, like all of the details, you can convey in person.
  11. Don’t make it too crowded. Again, you want it to be inviting and easy to follow, so give figures ample space to breathe. I’d also say don’t make it too sparse, but while that’s technically possible, I can’t imagine it has ever been an issue for any serious poster (everyone always has too much stuff they want to get in there).
  12. Consider expanding your illustrative palette. If you’re on your first few posters, it totally makes sense to use Powerpoint (or equivalent) to create your poster. That said, eventually you may want to include other options. I’ve slowly shifted to using Inkscape, partly because I’m now much better / faster in Inkscape than I am in Powerpoint. Also, BioRender is a great option for making some really aesthetically pleasing images / graphics relatively quickly with an easy-to-use interface. At minimum, I suggest you generate some graphics there and just take screenshots that you can insert into your posters made in Powerpoint.

Also, I realize some of my own posters probably break some of these rules (eg. the “Avoid having too much text” rule). Nobody’s perfect!

Addendum:

Nisha just did the legwork on figuring out how we print posters here, and this is what she found out:
– FedEx office in Thwing (9am-5pm Monday thru Friday).
– General size that they work with is 36″x48″ , so that might be what you want your poster size to be.
– You can have it on a usb drive, or email it to them directly at [email protected]
– If submitted early enough in the day, they may be able to do same day turnaround, but better to submit 2 to 3 days before you need it just in case.
– Will update this once we know how payment works exactly, but I suggest showing up knowing one of the lab speedtypes to charge directly to one of them (R35 is a good candidate, but can always use the startup speedtype if needed).

Inverse PCR SSM Primer Python Script

Time to make our first concerted effort at making a small site-saturation mutagenesis (SSM) library. The last time I did this in any deliberate way was with the PTEN library. We used the original Jain et al. protocol, which uses inverse PCR followed by DNA phosphorylation and ligation. One of the first Python scripts I ever really wrote (back in 2016) generated the list of primers we needed to order to make this library. You can find the script here.

Running it should be pretty easy: if you’re on a Mac, you just have to open “Terminal” (or an equivalent Bash terminal), go to the directory with the script, and type “python 160421_PTEN_SSM_30mer.py”.

Now, if you’ve never run a script for bioinformatics-type things before, you may not already have Biopython installed. In that case, trying to run the script will likely throw an error. If so, I suggest you follow the directions here to install Biopython (potentially in a virtual environment). Obviously, you aren’t going to want to make SSM primers for a PTEN library like in the script, and will instead want to make primers for an SSM library of your gene of interest in a different plasmid (i.e., with different sequences flanking your gene of interest). You can make those changes by opening the “.py” file with a basic text editor (like TextEdit) or a slightly more souped-up text editor like “Sublime Text”. Or, if you want a slightly more complicated but perhaps better long-term experience, you can install Spyder or PyCharm and make the changes in an Integrated Development Environment (IDE). Either way, you’ll want to replace the pretty self-explanatory PTEN-library-specific sequences that are currently written into the script.
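For orientation, the core of a back-to-back inverse PCR design can be sketched in a few lines. The forward primer starts with a degenerate codon followed by sequence downstream of the target codon. The reverse primer is the reverse complement of the sequence immediately upstream. Blunt ligation of the product then restores the plasmid with that codon replaced. This is a hypothetical sketch, not the 2016 script: NNK, the 21 nt annealing length, and the function names are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def ssm_primer_pair(plasmid, orf_start, codon_number, anneal_len=21, degenerate="NNK"):
    """Back-to-back inverse PCR primers that swap one codon for a degenerate codon.

    plasmid: plasmid sequence (5'->3', top strand) as a string.
    orf_start: 0-based index of the first base of the ORF's start codon in `plasmid`.
    codon_number: 1-based codon to mutate (codon 1 is the start codon).
    """
    start = orf_start + 3 * (codon_number - 1)
    end = start + 3
    downstream = plasmid[end:end + anneal_len]
    upstream = plasmid[max(0, start - anneal_len):start]
    # Ignores wrap-around at the plasmid origin; pick a linear coordinate system that avoids it
    if len(downstream) < anneal_len or len(upstream) < anneal_len:
        raise ValueError(f"not enough flanking sequence around codon {codon_number}")
    forward = degenerate + downstream  # degenerate codon sits at the 5' end of the forward primer
    reverse = reverse_complement(upstream)  # binds just upstream and extends away from the codon
    return forward, reverse
```

Looping this over every codon in the ORF gives the full primer list. A real design would likely also adjust each primer’s length to hit a target Tm rather than using a fixed length.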

Some additional considerations:
1. As I noted, this is for a ligation-based inverse PCR protocol. For all future libraries in the lab, I suggest we take a Gibson / In Vitro Assembly -compatible format, where instead of a blunt ended ligation product, the primers will create a DNA product with ~18 nucleotides of homologous sequence on the ends. Instead of writing this script myself, I’m leaving it up to future students to make that script by modifying my above script. Once we create this, I’ll perhaps post it here.
2. Once we come up with the list of primers, we can order them as machine-mixed primers from IDT (and not ThermoFisher, like our standard primer orders), since I’ve recently discovered that degenerate oligos from ThermoFisher have a pretty problematic T-bias.

NSF GRFP at CWRU

One of the hardest things to navigate at an institution is the convoluted layers of policies and administration. It’s still hard for me now as faculty, and must be exponentially harder as a trainee.

For any CWRU students in School of Medicine labs / departments planning to submit an NSF GRFP application, please know that you can apply for the fellowship directly, and don’t need to go through the grants administration associated with your department (which reports to the central grants administration team at the SOM level). You likely won’t be in the internal grants management system yet, which adds additional (inexplicable) delays and hurdles that may push your application past the internal deadline for grant submissions. Once past the internal deadline (and thus locked out of being able to submit the application), you may have to write a petition to yet another group / committee, who may not see the forest for the trees and may deny your request to be unlocked, effectively ruining all chances of submitting the application, even if you’ve already completely written it many days prior and the actual NSF deadline is still a couple of days away. Only to have your PI eventually contact the School of Graduate Studies, which sets the entire SOM grants infrastructure straight by pointing out that the student applies for the fellowship directly, and that all of those previous hurdles were inflicted on you by the school for no reason / by mistake.

If anyone seems unsure about this, I can always provide the receipts. If any of that is unclear, feel free to reach out to schedule a time to chat. After I regale you with the full tale, I can also give my thoughts on the NSF GRFP (including why, even if the chances of getting it are low, it could certainly still be worth applying for the experience), and answer any other questions you may have about it.

Using Inkscape

Vector graphics editors are SUPER useful when trying to make figures. A proprietary program some people may be used to is Adobe Illustrator. I’ve never actually used it, mostly b/c I started playing around with vector graphics editors when I was a grad student, and the idea of paying for an Adobe product with my own money was unpalatable. Inkscape is free & open source software that works on Mac or Windows. It can be a little buggy, but I’ve learned to absolutely love it over the last 10 years or so. To help people in my lab start using it, I’m writing up this primer describing a series of basic processes I perform when using it.

Step 1: Downloading Inkscape. Obviously, this is downloading the program. I’ll assume this is pretty self-explanatory unless someone tells me otherwise.

Step 2: Starting up the program. You should get a blank screen like this, which is the basic workspace. The white rectangle with the drop-shadow is the blank “page”.

Step 3: Adjusting the properties of your page. While you don’t have to start here, I like to make some basic adjustments to the page out of the get-go. To do this, go to File -> Document Properties… or just hit “Command + Shift + D” if you’re on a Mac, like me. A new pane should open up to the right of the program, like below (if it doesn’t, try adjusting the size of your window and that should force a refresh).

The main things I like to change are as follows:
1) The dimensions. Most journals have maximum figure widths between 174 and 180 mm (though a handful allow 190 or even 200 mm). It’s probably easier to plan for the smallest size of 174 mm, and just space things apart a bit / give the panels more breathing room if you end up in a situation where larger dimensions are allowed. Thus, adjust the width to 174 mm and hit enter; the width of the page should now shrink.
2) The page settings default to *NO* background color (this can be revealed by clicking on that “Checkerboard background” button). So if you put in some shapes / plots, they will be floating in emptiness rather than sitting on a white background. While this isn’t problematic per se, it usually makes sense to make figures that actually have a white background. Thus, I next click on the box next to the text that says “Background color:”, and in the bottom field that has a 0 next to “A:”, I change that to 100 (A is the alpha channel, i.e., opacity, so we are turning the opacity from 0 to 100). This will make the background white.

Step 3b: Saving the file. Now that you’ve made some modifications to your file, you may as well save it (you’re going to want to start saving it soon anyway). Go to File -> Save As… and save the file in whatever directory you want to keep it in. By default it saves as an “Inkscape SVG” (.svg) file, which is the standard, generic vector graphics format (SVG) plus some extra Inkscape-specific information. You can just keep these default settings.

Step 4: Import some existing vector graphics data. While you can start from scratch making shapes in the program itself, I most often use Inkscape to make adjustments to vector graphics plots generated in another program, like R. In this case, I’ll be importing a phylogenetic tree based on the spike protein receptor binding domain sequences from various coronaviruses. To do this, go to File -> Import…, hit “Command + I”, or press that arrow-pointing-into-a-page button at the top of the screen to bring up the import dialog.

In the ensuing screen, point to the file you’re trying to import. If it’s a vector-graphics-compatible file, it will most likely be a “.pdf” file. After you select a file, you’ll get a screen like this:

You can probably just click “OK”, since I’ve found these default settings to generally work fine. All of the shapes will get imported fine, though I’ve found that the text gets imported kind of weird. The letters look fine if you don’t make any adjustments, but if you try to rewrite any of them, you’ll notice that the spacing gets all screwed up. I usually get around that by replacing the imported text with new text I’ve created in Inkscape, though that gets a bit tedious. Maybe one day I’ll learn if there is a better way to import text from a PDF, but for now, this will do.

Step 5: Adjust the positions and sizes of things. Depending on the dimensions of your image as it was output by the preceding program (like R), your image may or may not fit well within your blank page. In my case, the image was indeed roughly the right size, though I wanted to zoom in to see it in a bit more detail. To do this, I pressed the “+” key on my keyboard a couple of times to get to the desired zoom, though you can set the zoom level more precisely at the bottom right of the screen (see below):

Another thing you may want to do is adjust the size and position of the image / plot you just imported. If you want to keep its proportions, the first thing you’ll want to do is click on the “lock” button, which makes sure the height and width stay in proportion.
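The math behind the lock is just the aspect ratio. Whichever of W: or H: you type in, the other one gets scaled by the same factor. Here’s a minimal sketch; the 250 x 180 plot and the 4000 x 3000 image are made-up numbers for illustration.

```python
def scale_locked(width, height, new_width=None, new_height=None):
    """Mimic typing into W: or H: with the lock on: the other side keeps the aspect ratio."""
    if (new_width is None) == (new_height is None):
        raise ValueError("pass exactly one of new_width or new_height")
    if new_width is not None:
        return new_width, new_width * height / width
    return new_height * width / height, new_height


# Hypothetical plot: 250 x 180, shrunk to a width of 170.
print(scale_locked(250, 180, new_width=170))    # (170, 122.4)
# Hypothetical 4000 x 3000 photo brought down to 150 wide.
print(scale_locked(4000, 3000, new_width=150))  # (150, 112.5)
```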

Then you can type whatever number you want into either the W: or H: field, hit enter, and see if the size was adjusted to what you’d like. Another useful thing is adjusting the position of the image. For example, perhaps this is the first panel of your figure, and you want it in the top left of the page. If that’s the case, enter 0 for both the X: and Y: fields, and hit enter. (In many cases, you’ll want at least a couple of pixels’ worth of blank space around the borders, in which case typing in something like 2 is better, but I’m using 0 today to prove a later point.) Now the plot should look like so:

Step 6: Ungrouping and adjusting vector objects. As you can see, the border is now missing. Why is that? Well, it’s because in this case, the phylogenetic tree pdf I imported also had a white background, so that white background is now obstructing the borders of the page under it. While this isn’t necessarily problematic, this extra white background is superfluous and may complicate things downstream, so we will now get rid of it. To do so, click on the object, and go to Object -> Ungroup, or just hit the “Command + U” keys like I do. That one object should now have been turned into a bunch of objects, each with its own highlighting, showing that the ungroup command worked.

Unselect everything, and then select only the white background. Once you’ve selected it, hit delete. If you deleted the wrong thing, you can undo. If you’re having a hard time selecting the white background because something in the foreground keeps getting selected first, select that foreground element and hit the “send to back” button at the top of the screen to get it out of the way.

Once you do that, you should be able to more easily select the background element and hit delete. (In this case, there were apparently two background elements I had to individually select and delete). The end result should now look like this, where you can now see the Inkscape “page” borders again.
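If you end up stripping the same white background off a lot of imported plots, the cleanup can also be scripted. This is a rough sketch, not an Inkscape feature. It assumes the background came in as a white `<rect>` with plain numeric width/height. PDF imports often turn shapes into `<path>` elements instead, and in that case this won’t find them. `min_w`/`min_h` are my own (hypothetical) thresholds, there so that small white shapes survive.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
WHITE = {"#fff", "#ffffff", "white"}


def fill_of(el):
    """Fill colour from the style attribute (which Inkscape tends to use) or from fill=."""
    for decl in el.get("style", "").split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            if key.strip() == "fill":
                return value.strip().lower()
    return el.get("fill", "").strip().lower()


def drop_white_backgrounds(root, min_w, min_h):
    """Remove white <rect>s at least min_w x min_h (user units); return how many went."""
    removed = 0
    for parent in list(root.iter()):      # snapshot first, so removing children is safe
        for child in list(parent):
            if child.tag != SVG + "rect":
                continue
            # Assumes plain numbers; a width like "50mm" would need unit handling.
            w = float(child.get("width", "0"))
            h = float(child.get("height", "0"))
            if fill_of(child) in WHITE and w >= min_w and h >= min_h:
                parent.remove(child)
                removed += 1
    return removed


if __name__ == "__main__":
    doc = ET.fromstring(
        '<svg xmlns="http://www.w3.org/2000/svg"><g>'
        '<rect width="500" height="400" style="fill:#ffffff"/>'
        '<rect width="10" height="10" fill="white"/>'
        '<path d="M 0,0 L 10,10"/></g></svg>'
    )
    print(drop_white_backgrounds(doc, 100, 100))  # 1: only the page-sized rect goes
```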

Step 7: Adjusting individual elements. In this case, there’s a lot of wasted space because the lines for the outgroup (MERS) are so long. That space is also kind of pointless, since I’m not going to try to make any point about how similar the MERS RBD sequence is to the rest of them… I’m just using MERS as an “outlier” sample to orient everything else with. To fix this waste of space, I can shorten the lines leading to MERS (the power of vector graphics editors!). To do this, click on the line in question. When I did that, I noticed all of the lines in the image were still getting selected, so I hit “ungroup” a few more times until EACH of the individual lines was selectable. I then selected the two lines I wanted to reduce in length: the one leading to the MERS outgroup, and the one leading to the tree. Once you do that, hit the “Edit paths by nodes” button toward the top left of the screen (see the orange highlight below):

Once you do that, the start and end points of each of those lines are selectable. Since I want to adjust both lines on the left side (shortening them by moving those points to the right), click on one of those points, and then Shift-Click on the other. If they’re both selected, they should look different from the points on the right side of the lines.

Once they’re selected, you can move them to the right. I like to do fine manipulations at this scale using the arrow keys on the keyboard. So, I *could* hit the right arrow to move those nodes to the right and thus shorten the lines. Then again, I’d probably have to hit that key ~100 times to move them as far as I want. That’s b/c by default, the arrows make “fine” adjustments. To turn them into “coarse” adjustments, hold down the Shift key as you press the arrow keys. I only had to hit “Shift + [Right Arrow]” four times to get those lines to the length I wanted.
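To put rough numbers on “fine” vs “coarse”: each arrow press moves the selection by a fixed step, and Shift multiplies that step. The step size and multiplier below are assumptions. As far as I know, 2 px and 10x are the defaults, but your copy’s values live under Preferences > Behavior > Steps, so treat the output as illustrative.

```python
import math

# Assumptions, not read from Inkscape: check Preferences > Behavior > Steps for your values.
ARROW_STEP_PX = 2.0     # distance one plain arrow press moves a node or object
SHIFT_MULTIPLIER = 10   # Shift + arrow moves this many steps at once


def nudge_distance(presses, shift=False, step=ARROW_STEP_PX):
    """How far a run of arrow presses moves the selection."""
    return presses * step * (SHIFT_MULTIPLIER if shift else 1)


def presses_needed(distance, shift=False, step=ARROW_STEP_PX):
    """Smallest number of presses that moves the selection at least `distance`."""
    per_press = step * (SHIFT_MULTIPLIER if shift else 1)
    return math.ceil(distance / per_press)


print(nudge_distance(4, shift=True))  # 80.0 px under the assumed defaults
print(presses_needed(80))            # 40 plain presses for the same move
```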

That worked well, but now that connecting line is orphaned and confusing. We can move it to the right by selecting it, and since we’re not selecting individual nodes but just moving the entire line, we can hit “Shift + [Right Arrow]” four times *without* first hitting the “Edit paths by nodes” button. If you already have “Edit paths by nodes” enabled (and thus shaded gray), you may need to switch back to the “Select and transform objects” mode by clicking on the button above it. I’m also now moving everything to the left by selecting everything and entering X: 1 and Y: 1, as I mentioned above. It should now look like so:

Progress! Though adjusting the tree like we did can be disingenuous, since the lines we shortened (for the outgroup) are the same color & texture as the lines we didn’t adjust (for the rest of the tree). To distinguish between the two, we can click on the lines leading to the outgroup and change their style to dotted or dashed. That way, we can write in the figure legend later on that the dotted lines leading to the outgroup were shortened to conserve figure space. To do this, click on the lines in question (holding down Shift while doing so, so that all four individual lines get selected), and then click the color swatch next to the “Stroke:” label at the very bottom-left of your screen (see orange highlighted circle in the image below). This should open up a new pane to the right. If you still have the “Document Properties” pane open from before, you may need to close it (by hitting the “X” button at the top right of that pane), or just scroll down until you get to the “Fill and Stroke” pane. Apparently one can also get to this pane by hitting “Command + Shift + F”.

While you still have those lines selected, you can click on the tab that says “Stroke style” and then go to the button that says Dashes, and click on a style that has dots or dashes. You may also need to adjust the width of the line, so that it’s a bit more pronounced (I set it to 1 pixel).
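Under the hood, the dash pattern is just the SVG `stroke-dasharray` property in the line’s style, listing alternating dash and gap lengths. Here’s a small sketch of making that change to an SVG element with Python’s xml.etree. The path, the colors and the 2,2 pattern are made-up examples, not what Inkscape’s presets produce.

```python
import xml.etree.ElementTree as ET


def set_style(el, **props):
    """Merge CSS properties into an element's style attribute (stroke_width -> stroke-width)."""
    decls = {}
    for decl in el.get("style", "").split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            decls[key.strip()] = value.strip()
    for key, value in props.items():
        decls[key.replace("_", "-")] = str(value)  # existing keys are updated in place
    el.set("style", ";".join(f"{k}:{v}" for k, v in decls.items()))


# Hypothetical outgroup branch: a solid, 0.5-wide black line.
outgroup = ET.Element("path", {"d": "M 10,50 H 120",
                               "style": "fill:none;stroke:#000000;stroke-width:0.5"})
set_style(outgroup, stroke_width=1, stroke_dasharray="2,2")  # thicker, 2-on / 2-off dashes
print(outgroup.get("style"))
# fill:none;stroke:#000000;stroke-width:1;stroke-dasharray:2,2
```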

The end result should look like this. A much better use of space!

Since this is potentially panel A, let’s make a panel label. To do this, click on the button labeled with “A” on the left side (the “Create and edit text objects” button), and then *drag* a text box roughly where you’ll want it. Note: While you can *click* to make a text object, dragging is better because then you can modify the size of that text box later (less relevant for things like figure labels, but *definitely* important for things like figure legends). In this case, I typed in the letter “A” and made it bold. Once I did that, I clicked on the now-bolded A, and then set X: and Y: to 1 so it’s in the top left corner, like so:

Step 8: Aligning objects together. OK, this figure doesn’t actually need this, but let’s pretend we want to make a short one-line label / title at the bottom, and we want it right at the center of the tree. Make a text box like before, and type in your label. If you select the text box and are in text editing mode (you may need to click on the “A” button on the left again), you have the option to change fonts, bold / italics, size, line spacing, and justification / centering in a series of boxes at the top. The one I adjust ALL the time is the line spacing. It defaults to 1.25, which is kind of a waste of space, so I set it to 1.
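Why 1.25 adds up: a unitless line spacing is a multiplier on the font size, so it sets the distance from one baseline to the next. Here’s a quick sketch of the arithmetic, using a made-up 4-line legend in 10 pt type.

```python
def line_advance(font_size, line_height):
    """Baseline-to-baseline distance: a unitless line-height multiplies the font size."""
    return font_size * line_height


def space_saved(n_lines, font_size, old=1.25, new=1.0):
    """Vertical space recovered across the (n_lines - 1) gaps between baselines."""
    return (n_lines - 1) * (line_advance(font_size, old) - line_advance(font_size, new))


# Hypothetical 4-line legend set in 10 pt type.
print(line_advance(10, 1.25), line_advance(10, 1.0))  # 12.5 10.0
print(space_saved(4, 10))                             # 7.5
```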

Once you’ve done all that, you should have a screen that looks like below:

Now for centering the label right in the middle of the tree. To do this, select all of the tree elements (I click and drag a selection box around the tree). If that grabs too many things, like the “A” panel label at the top left, do a “Shift + [Click]” on it to unselect just that. Then you can group all of the tree elements together for easy manipulation by going to Object -> Group, or by hitting “Command + G” like me. To now align the figure label with the center of the tree, open the alignment pane. You can get to it by going to Object -> Align and Distribute…, or by hitting “Command + Shift + A” like I normally do.

This pane will allow you to align things based on the right edge, the left edge, or the center. Or if you have multiple objects that you want to space equally, you can use the Distribute options in the middle. It looks like it defaults to “Relative to: Last selected”. That means that if you’re aligning two objects, the object you select first will be moved to line up with the object you selected second. You can of course change that by setting “Relative to:” to “First selected”. In this case, I’m leaving it as the default, so I first selected the label, then selected the tree, and then hit the “Center by Vertical Axis” button that I highlighted in orange. The resulting image looks like this, where the elements are now aligned!
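What “Center by Vertical Axis” does with “Relative to: Last selected” is simple enough to write down. The last-selected object stays put, and the first-selected one slides sideways until the two centers match. Here’s a sketch with made-up sizes, where x is the left edge and y the top edge.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center_x(self):
        return self.x + self.w / 2


def center_on_vertical_axis(moving: Box, anchor: Box) -> Box:
    """Slide `moving` sideways so its centre matches `anchor`'s; its vertical position is kept."""
    dx = anchor.center_x - moving.center_x
    return Box(moving.x + dx, moving.y, moving.w, moving.h)


# Hypothetical sizes: the grouped tree (selected last) and the title label (selected first).
tree = Box(x=1, y=1, w=170, h=120)
label = Box(x=10, y=130, w=40, h=5)
print(center_on_vertical_axis(label, tree))  # Box(x=66.0, y=130, w=40, h=5)
```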

Step 9: Exporting your image. OK, so I’ve done everything I wanted to do for this tutorial, so the last thing will be exporting the image. You have two options here. You can make a PDF that can be imported into whatever program later, by going to File -> Save As… and then, instead of saving it as a .svg file, saving it as a .pdf file. (Note: this will change the file you’re working in to the “.pdf”. If you want to keep working in the “.svg”, you’ll need to re-save in that format. Or if you don’t want to do this roundabout process, you can also do File -> Print and then select “Print to File”; it’s up to you.) But let’s pretend you want to actually submit this as a figure for a manuscript. Usually the desired filetype is an image file rather than a vector graphic. This means something like a .tif or .png file, which doesn’t lose any detail when saved (unlike a lossy, compressed .jpg file). To do this, click on File -> Export PNG Image…, or click on the “Arrow coming out of page” button next to the import button at the top of the screen. This will open up another pane to the right:

For novices, I suggest always choosing “Page” for the Export area; this means it will export the full canvas you have been looking at (with the full width of 174 mm or whatever you set before), rather than just the selected object, etc. You can adjust the dots per inch, or DPI, but the default of 300 (at least in my copy; other installs may show 96, so double-check) is usually pretty good. And of course, you can and should change where the file is being saved. Once you do that, hit the “Export” button at the bottom. A little pop-up screen will show the export progress (it should be relatively quick for files like this). And voilà! You’ve made yourself a nice little customized image!
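Two related things are worth knowing. First, the pixel size of the export is just the page size in inches times the DPI, so a 174 mm page at 300 DPI comes out about 2055 px wide. Second, if you ever need to re-export a lot of figures, Inkscape can also do this from the command line. The sketch below assumes the Inkscape 1.x option names (`--export-type`, `--export-area-page`, `--export-dpi`, `--export-filename`). Older 0.92 installs used different flags, and the file names here are hypothetical.

```python
import os
import shutil
import subprocess

MM_PER_INCH = 25.4


def export_pixels(size_mm, dpi=300):
    """Pixel count along one side when exporting `size_mm` at `dpi`."""
    return round(size_mm / MM_PER_INCH * dpi)


def inkscape_export_cmd(svg_path, png_path, dpi=300):
    """Whole-page PNG export command (Inkscape 1.x option names; 0.92 used e.g. --export-png)."""
    return ["inkscape", "--export-type=png", "--export-area-page",
            f"--export-dpi={dpi}", f"--export-filename={png_path}", svg_path]


if __name__ == "__main__":
    print(export_pixels(174))  # 2055 px across for a 174 mm page at 300 DPI
    svg, png = "figure1.svg", "figure1.png"  # hypothetical file names
    cmd = inkscape_export_cmd(svg, png)
    if shutil.which("inkscape") and os.path.exists(svg):
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```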

Update 1: Now with Western Blots.
I told myself that I would document the next time I was going to insert some image files into Inkscape for the purposes of this post, and here we are. So let’s now set the scene:

OK, so we have a multi-panel figure in the works here. The only panel relevant to today’s post is panel B, which is going to be a couple of Western Blotting images. In short, we have a negative control lysate, as well as 5 experimental samples where a different protein has been HA-tagged in each lysate. Looks like the first 3 proteins express quite highly, while the last two are expressed at pretty low levels. Thus, I’m going to try putting a lighter exposure image on the left side, and probably a cropped right portion of a longer exposure image showing the last few samples to the right.

Well, first thing to do is press that import button (circled in orange) at the top of the page. Doing that will open up a window like below, where you’ll want to select the image you want to import into your canvas.

Once you choose a file, you should get another screen like the one below. Default settings are fine in my experience, so you can just hit “OK”.

Do this for all of the images you may want. Image files of pretty normal quality will be imported at HUGE dimensions on the canvas, so reduce their dimensions to something more reasonable, like 150 pixels. You’ll end up in a situation like this:

Now, to start aligning the images, the main trick you’ll be using today is “setting a clip”. In other words, instead of cropping the edges off of an image like one would normally do in PowerPoint, you define the area you want to keep, and everything outside it gets hidden. To do this, I start by making a partially transparent box on top of the relevant bands, like these actin bands here.

I have multiple exposures I’m eventually going to want to line up on top of each other, so now that I’ve got the box the size I want, I’m first going to make a duplicate of it by hitting “Command + D”. I then move that duplicated box elsewhere on the canvas, so we can get back to it later.

Then, select both the box and the image (note: the box needs to be “above” the blot image), and then right click and hit “Set Clip”.

Doing so reduces the bottom image to the size of the top object. So now it’s the Blot image that has effectively been cropped.

Well, so I didn’t do a great job centering this crop, so I went back and fixed it but didn’t bother to retake the images. But if you’re curious, you can actually do this by right clicking on the object and hitting “Release Clip” to turn them into two separate objects again. Then just resize the box, select everything again, and then right click and hit “Set Clip” again.
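For the curious, Set Clip is built on SVG’s `clipPath`. As far as I can tell from the saved files, the top object gets moved into a `<clipPath>` definition, and the image gets a `clip-path` reference to it, so nothing is actually deleted. Here’s a minimal sketch of the same idea with Python’s xml.etree. The coordinates, IDs and the `actin_light.png` name are made up. Also, unlike Inkscape’s Release Clip, `release_clip` here just throws the box away instead of handing it back as its own object.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Namespace-qualified SVG tag name for ElementTree."""
    return f"{{{SVG_NS}}}{tag}"


def set_clip(root, target, x, y, w, h, clip_id):
    """Show only the (x, y, w, h) window of `target`; the rest is hidden, not deleted."""
    defs = root.find(q("defs"))
    if defs is None:
        defs = ET.Element(q("defs"))
        root.insert(0, defs)
    clip = ET.SubElement(defs, q("clipPath"), {"id": clip_id})
    ET.SubElement(clip, q("rect"),
                  {"x": str(x), "y": str(y), "width": str(w), "height": str(h)})
    target.set("clip-path", f"url(#{clip_id})")


def release_clip(root, target):
    """Undo set_clip: drop the reference and its clipPath so the whole image shows again."""
    ref = target.attrib.pop("clip-path", "")
    clip_id = ref[len("url(#"):-1]
    defs = root.find(q("defs"))
    if defs is None:
        return
    for clip in defs.findall(q("clipPath")):
        if clip.get("id") == clip_id:
            defs.remove(clip)


if __name__ == "__main__":
    svg = ET.Element(q("svg"), {"viewBox": "0 0 174 120"})
    blot = ET.SubElement(svg, q("image"), {
        "x": "10", "y": "10", "width": "150", "height": "112.5",
        f"{{{XLINK_NS}}}href": "actin_light.png",  # hypothetical file
    })
    set_clip(svg, blot, 20, 60, 90, 8, "actinClip")
    print(ET.tostring(svg, encoding="unicode"))
```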

Alright; now we can get one of the other exposures cropped and lined up. I’ve moved one of the other exposures close to the red box. Since I’m going to need a larger field of view for this crop, how straight the image is lined up is going to matter a lot more. For that reason, I drew in a black vertical line to visually inspect how straight it all is. For the most part, it looks pretty good.
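If you want a number instead of eyeballing it, two points along a lane edge are enough to put a figure on the tilt. Here’s a small sketch. The coordinates are hypothetical, and it assumes y increases going down the page.

```python
import math


def tilt_degrees(x_top, y_top, x_bottom, y_bottom):
    """Angle between a lane edge and true vertical, from two points on it; 0 = perfectly straight."""
    return math.degrees(math.atan2(x_bottom - x_top, y_bottom - y_top))


# Hypothetical readings: a lane edge at X = 50 near the top and X = 52 about 100 px lower.
print(round(tilt_degrees(50, 0, 52, 100), 2))  # 1.15
```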

Alright, so I don’t need that line anymore, so I went ahead and deleted it. But as I mentioned, I want to get a larger window of this blot than I did for actin (a single horizontal band). Thus, I want to keep the width of the box the same but change its height. To do this, click on the box, and you’ll see it have these arrows coming out of it.

Then just drag the top and bottom arrows to the height that you want. It should now look something like this.

I can then crop and get the most relevant part of the image. Now that I have the cropped image, I can now align the two images together.

Now move everything into place. If you have another image, like a darker exposure, you can just go back and repeat some of the previous steps for that one. You can even give the crops an outline by duplicating the clipped image, releasing the clip, deleting the duplicated image, and turning the red box into a completely clear box with a solid stroke; essentially, a see-through frame. Anyway, at this point, you should have a plot like the one above.
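That see-through frame is just a rectangle with no fill and a solid stroke. Here’s a minimal sketch of one in SVG via Python’s xml.etree, using a made-up crop window (the coordinates and stroke width are placeholders). One thing worth knowing: SVG strokes are centered on the shape’s edge, so half the stroke width sits outside the window.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def frame(x, y, w, h, stroke="#000000", stroke_width=0.5):
    """See-through outline: no fill, solid stroke. The stroke is centred on the edge,
    so half of stroke_width lands outside the (x, y, w, h) window."""
    return ET.Element(f"{{{SVG_NS}}}rect", {
        "x": str(x), "y": str(y), "width": str(w), "height": str(h),
        "fill": "none",
        "stroke": stroke,
        "stroke-width": str(stroke_width),
    })


if __name__ == "__main__":
    # Hypothetical crop window.
    print(ET.tostring(frame(20, 60, 90, 8), encoding="unicode"))
```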

Next, making a bunch of labels to denote what each sample is. You can do that by dragging a text box and typing in your label. Once you’ve written the text for the first column, duplicate that text box, replace the existing text with the text for the next sample, and keep doing this until everything is correctly labeled. When making row labels, you may need some special Greek characters like I did. In that case, while in the text box, go to Text -> Unicode Characters… (see below).

You’ll get a box that looks like below, where you’ll want to click and select “Greek and Coptic”.
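If you’d rather know exactly which character you’re grabbing, the Greek and Coptic block runs from U+0370 to U+03FF, and Python’s unicodedata module can look letters up by name. This also helps with look-alikes. For example, the micro sign µ (U+00B5) and Greek small mu μ (U+03BC) are different characters. Here’s a small sketch:

```python
import unicodedata


def greek(name, capital=False):
    """Look up a Greek letter by its Unicode name, e.g. greek('alpha') -> 'α'."""
    case = "CAPITAL" if capital else "SMALL"
    return unicodedata.lookup(f"GREEK {case} LETTER {name.upper()}")


for letter in ("alpha", "beta", "gamma", "delta"):
    ch = greek(letter)
    print(letter, ch, f"U+{ord(ch):04X}")
# alpha α U+03B1
# beta β U+03B2
# gamma γ U+03B3
# delta δ U+03B4
```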

End result after you finish fixing the text should look like so:

Last bit is adding in the size labels, at least for the blot in the middle, since it spans a pretty large size. Alright, well, how would I go about doing this? Well, it’s pretty easy for the imager that we use, since it seems to spit out a combined image of the marker and the exposure, into a file that looks like this when imported into Inkscape (and resized to be the same dimensions of the lower exposure blot from earlier, and aligned on the vertical axis); here they are side-by-side.

If you want to prove to yourself that they are perfectly aligned, you can click both the actual lower exposure image and the marker-exposure overlay image, align them on both the vertical and horizontal axis, and then toggle the order back and forth (make the top image the bottom one, then make the bottom image the top, etc.). I did this, and they indeed seem to be perfectly aligned / superimposable (as one would expect). So knowing this, I’ll move forward by leaving tick marks where each of those marker labels are. Looks like the ladder we used is this one from GenScript.
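Tracing tick marks off the overlay works because both images were scaled to the same size and lined up. If you only have pixel rows for the marker bands (e.g., from an image viewer), the mapping onto the canvas is one line of arithmetic. Here’s a sketch with made-up numbers, assuming y increases downward as in SVG.

```python
def row_to_canvas_y(row_px, image_height_px, placed_y, placed_height):
    """Map a pixel row in the original image file to a y coordinate on the canvas.

    The vertical scale factor is placed_height / image_height_px; placed_y is the image's top edge.
    """
    return placed_y + row_px * placed_height / image_height_px


# Hypothetical numbers: a 1200-px-tall marker overlay placed at y = 40 with height 60.
for label, row in (("band 1", 300), ("band 2", 660)):
    print(label, row_to_canvas_y(row, 1200, 40, 60))
# band 1 55.0
# band 2 73.0
```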

Cool, now I can delete the overlaid picture, reset the clip on the original low exposure image, and then move the marker labels to a convenient spot to the left. Voila! (see below).




Additional resources:
I haven’t vetted this tutorial page, but it looks like it could potentially be useful.