
evSeq

No sequence-function pair left behind.

Every Variant Sequencing (evSeq) is a library preparation and analysis protocol that slots neatly into existing workflows to enable extremely low-cost, massively parallel sequencing of protein variants. Designed for heterologously expressed protein variants arrayed in 96-well plates (or similar), this workflow enables sequencing every variant in targeted mutagenesis libraries from protein engineering or biochemical mutagenesis experiments at a cost of cents per variant, even for labs without expertise in, or access to, next-generation sequencing (NGS) technology.

Read the Paper!

This repository accompanies the work “evSeq: Cost-Effective Amplicon Sequencing of Every Variant in Protein Mutant Libraries”.

Read the Docs!

Find detailed documentation at the individual pages linked below or start at the overview.

Biology
Computation
Troubleshooting

General Overview

The evSeq workflow

A) evSeq amplifies a region of interest that contains variability and attaches well-specific barcodes and adapters, producing a library that is ready for NGS.

B) All that’s required to perform the evSeq laboratory procedure is:

That’s it.

Due to the two-primer, culture-based PCR methodology employed by evSeq, only a new pair of inner primers needs to be ordered when targeting new regions/sequences and no DNA isolation needs to be performed.
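Here is a minimal sketch of why only the inner primers change between targets. It assumes a typical two-step amplicon design: each inner primer is a fixed overhang (which the reusable barcoded outer primers anneal to) plus a gene-specific annealing region. The overhang sequences, the function names, and the fixed annealing length are hypothetical placeholders, not evSeq's actual primer sequences or design rules, and real primer design would also check melting temperature and specificity.

```python
from __future__ import annotations

# HYPOTHETICAL overhangs: placeholders only, NOT evSeq's real adapter sequences.
FWD_OVERHANG = "ACGTACGTAC"
REV_OVERHANG = "TGCATGCATG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_inner_primers(
    template: str, start: int, end: int, anneal_len: int = 20
) -> tuple[str, str]:
    """Build a hypothetical inner primer pair flanking template[start:end].

    The forward primer anneals just upstream of the variable region; the
    reverse primer anneals just downstream and is written 5'->3' on the
    opposite strand. Only the annealing parts depend on the target, so only
    these two primers change when a new region is targeted.
    """
    if start - anneal_len < 0 or end + anneal_len > len(template):
        raise ValueError("not enough flanking sequence for the annealing length")
    fwd_anneal = template[start - anneal_len:start]
    rev_anneal = reverse_complement(template[end:end + anneal_len])
    return FWD_OVERHANG + fwd_anneal, REV_OVERHANG + rev_anneal
```

Keeping the overhangs constant is what lets the barcoded outer primers be ordered once and reused across every new target.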

C) Once the sequences are returned by the NGS provider, the computational workup can be performed on a standard laptop by users with little-to-no computational experience.
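The core idea behind assigning reads back to wells can be sketched as barcode-pair lookup. This sketch assumes a hypothetical combinatorial layout where one of 8 forward barcodes marks the plate row and one of 12 reverse barcodes marks the column, with each barcode at the very start of its read. evSeq's actual barcode layout and read structure may differ, and its real pipeline also handles quality filtering and sequencing errors, which this sketch does not.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)


def build_barcode_map(row_barcodes, col_barcodes):
    """Map (forward barcode, reverse barcode) pairs to well IDs such as 'A01'.

    HYPOTHETICAL scheme: forward barcode = row, reverse barcode = column.
    """
    if len(row_barcodes) != len(ROWS) or len(col_barcodes) != len(COLS):
        raise ValueError("need 8 row barcodes and 12 column barcodes")
    return {
        (fb, rb): f"{row}{col:02d}"
        for (row, fb), (col, rb) in product(
            zip(ROWS, row_barcodes), zip(COLS, col_barcodes)
        )
    }


def demultiplex(read_pairs, barcode_map, bc_len):
    """Group read pairs by well using the first bc_len bases of each read.

    Returns (reads grouped by well, number of unassigned read pairs).
    Exact matching only; no error-tolerant barcode matching is attempted.
    """
    wells = defaultdict(list)
    unassigned = 0
    for fwd_read, rev_read in read_pairs:
        well = barcode_map.get((fwd_read[:bc_len], rev_read[:bc_len]))
        if well is None:
            unassigned += 1
        else:
            wells[well].append((fwd_read, rev_read))
    return dict(wells), unassigned
```

Once reads are grouped by well, each group can be aligned to the reference to call that well's variant.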

The amplicons prepared with evSeq can yield nearly 1000 high-quality protein variant sequences for just the cost of the multiplexed NGS run (typically ~$100 from commercial sequencing providers, likely lower for in-house providers).
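The per-variant cost is simply the run cost divided by the number of wells sharing the run. The sketch below uses the ~$100 run cost quoted above and the 768-well (eight-plate) example from the figure that follows; like the statement above, it counts only the sequencing run, not PCR reagents or primers. The function name is illustrative.

```python
def cost_per_variant(run_cost: float, n_variants: int) -> float:
    """Sequencing cost per variant when one multiplexed run covers n_variants wells."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return run_cost / n_variants


# ~$100 run (from the text); compare a single 96-well plate with eight plates.
for n in (96, 768):
    print(f"{n:>4} wells: ${cost_per_variant(100.0, n):.3f} per variant")
```

Filling the run with more plates is what drives the cost down to cents per variant.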

Construct and visualize sequence-function pairs

Sequencing eight site-saturation libraries (768 wells) in a single evSeq run and combining this with activity data to create low-cost sequence-function data. A) Enzyme and active-site structure highlighting mutated residues. B) Heatmap of the number of identified variants/mutations (“counts”) for each position mutated (“library”) from processed evSeq data. C) Heatmap of the average activity (“normalized rate”) for each variant/mutation in each library. D) Counts for a single library, also showing the number of unidentified wells. E) Activity for a single library, showing biological replicates. (Inset displays the mutated residue in this library.)
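Pairing sequence with function amounts to a join on well ID followed by a per-variant summary. The sketch below assumes two plain dictionaries keyed by well (a variant call per well, with `None` for unidentified wells, and an activity value per well). These inputs, the variant labels, and the function name are hypothetical and do not reflect evSeq's actual output files. It produces per-variant counts and mean activities like those summarized in panels B through E.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def sequence_function_pairs(variant_by_well, activity_by_well):
    """Join per-well variant calls with per-well activity measurements.

    variant_by_well: {well ID: variant label, or None if unidentified}
    activity_by_well: {well ID: activity value}
    Returns ({variant: (n_wells, mean_activity)}, n_unidentified).
    Wells with a variant call but no activity measurement are skipped.
    """
    grouped = defaultdict(list)
    n_unidentified = 0
    for well, variant in variant_by_well.items():
        if variant is None:
            n_unidentified += 1
        elif well in activity_by_well:
            grouped[variant].append(activity_by_well[well])
    table = {v: (len(vals), mean(vals)) for v, vals in grouped.items()}
    return table, n_unidentified
```

Wells carrying the same variant act as biological replicates, which is why the summary keeps both the count and the mean.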

Documentation

Biology

Theoretical overview

Library preparation

Computation

Computational basics

Installation

Running evSeq

Understanding the Outputs

Additional Examples

Below is a collection of Jupyter Notebooks (rendered as documents) with examples of how to get the most out of evSeq. To run them yourself, you can find them in the examples directory of the evSeq repository.

Using evSeq data

Creating barcode/index pairs

Running evSeq in a Jupyter Notebook

Troubleshooting