Running evSeq in a Jupyter Notebook¶
Jupyter Notebooks are excellent tools for reproducible code. If you are a frequent Jupyter Lab user, every step of evSeq (running, analyzing the generated HTML plots, post-processing) can be done directly in the Jupyter Lab interface.
Jupyter Notebooks allow code to be run as if from the command-line interface by prefixing a line with !. Furthermore, variables can be defined in Python and then passed into the command within {} brackets.
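Under the hood, a ! line is handed to the system shell after the {} variables are substituted. A rough Python-only sketch of that substitution (not what IPython literally does internally) looks like this:

```python
import subprocess

# Variables defined in Python, as in the cells below
refseq = 'refseqs/DefaultRefSeq.csv'
folder = '../data/multisite_runs/'

# `!evSeq {refseq} {folder}` roughly corresponds to formatting the
# command string and handing it to the shell:
cmd = f'evSeq {refseq} {folder}'
print(cmd)
# subprocess.run(cmd, shell=True)  # uncomment to actually invoke evSeq
```

The f-string substitution mirrors what happens to the {} brackets in a ! line.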
The run will take anywhere from a couple of minutes (if you have many processors available) to upwards of 20–30 minutes if you have few, or if you use the flag --jobs 1
(or some other low number) to reduce the number of multiprocessing jobs.
Not all runs will take this long. This run processes 5 plates of high-quality sequencing for multi-site libraries, so it is expensive to process but yields excellent results.
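If you want to use most of your machine without saturating it, one option is to derive the job count from the available processors. This heuristic is purely illustrative (only the --jobs flag itself comes from the text above):

```python
import os

# A sketch: pick a job count based on available processors,
# leaving one core free for the rest of the system
available = os.cpu_count() or 1
n_jobs = max(1, available - 1)
print(f'--jobs {n_jobs}')  # e.g. pass this to the evSeq call
```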
# Define the refseq and folder
refseq = 'refseqs/DefaultRefSeq.csv'
folder = '../data/multisite_runs/'
# Pass to evSeq and run
!evSeq {refseq} {folder}
Loading forward reads...
Parsing forward reads...: 100%|██████| 403077/403077 [00:21<00:00, 19189.31it/s]
Loading reverse reads...
Pairing reverse reads...: 100%|███████| 403077/403077 [00:48<00:00, 8300.91it/s]
Running read qc...
Assigning sequences to wells...
Processing wells...: 100%|████████████████████| 480/480 [10:09<00:00, 1.27s/it]
Saving outputs to disk...
High mutational frequency in DI01-C02. You may want to check alignments for accuracy.
You should see a warning of High mutational frequency in DI01-C02. You may want to check alignments for accuracy. This is expected.
Viewing the results¶
You can view the results directly in Jupyter Lab. To view plots, navigate to the output directory in the file explorer on the left:
evSeqOutput/date-time/
and then click on a plot (e.g., Qualities/QualityPlot.html). Then, in the top-left corner of the new tab that opens, click Trust HTML to allow Jupyter Lab to render the plot.
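Because each run is saved under a date-time folder, the newest run can also be located programmatically. A small sketch (the helper name and the example folder names are hypothetical):

```python
# Run folders are named date-time, so a lexicographic sort is
# also chronological and the last entry is the newest run
def newest_run(paths):
    return sorted(paths)[-1]

# Illustrative folder names in the evSeqOutput/date-time/ format
runs = ['evSeqOutput/20210824-093000', 'evSeqOutput/20210825-114038']
print(newest_run(runs))  # evSeqOutput/20210825-114038
```

In practice you would pass it `glob.glob('evSeqOutput/*')` rather than a hand-written list.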
Running with other arguments and flags¶
As stated above, the ! prefix runs code directly as if it were entered on the command line, so evSeq runs here exactly as it would from the command line, and variables can be passed in within {} brackets. Arguments and flags can be passed in as shown in the examples below:
# # Analysis-only run of evSeq, using a flag
# !evSeq {refseq} {folder} --analysis_only
# # Save the output in the 'data' folder
# !evSeq {refseq} {folder} --output ../data/
A standard call for running evSeq¶
You could even create formatted lists/dictionaries in Python that are passed to the standard call !evSeq {refseq} {folder} {flags} {args}.
The cell below could reasonably be used for any evSeq run: update the flags and args where appropriate and uncomment the last line. Commenting out or removing lines between the brackets will run the program with only the remaining flags/args (or with none, if all are removed).
# Define the refseq and folder
refseq = 'refseqs/DefaultRefSeq.csv'
folder = '../data/multisite_runs/'
# Set up flags ('--' not needed)
flags = [
'keep_parsed_fastqs',
'return_alignments',
]
# Set up args and their values
args = {
'output': '../data/',
'read_length': 150,
}
# Format
flags = ' '.join([f'--{flag}' for flag in flags])
args = ' '.join([f'--{arg} {val}' for arg, val in args.items()])
# Check on them
print(f"""
Running evSeq with the following parameters:
--------------------------------------------
refseq file: {refseq}
fastq location: {folder}
Flags: {flags}
Args + values: {args}
""")
# Run in evSeq
# !evSeq {refseq} {folder} {flags} {args}
Running evSeq with the following parameters:
--------------------------------------------
refseq file: refseqs/DefaultRefSeq.csv
fastq location: ../data/multisite_runs/
Flags: --keep_parsed_fastqs --return_alignments
Args + values: --output ../data/ --read_length 150
Comparing to expected evSeqOutput¶
import glob
from evSeq.util import compare_to_expected
The function compare_to_expected takes paths to two folders inside evSeqOutput and compares each OutputCounts file, expecting them to be identical (allowing room for small numerical errors). This passes silently or raises an AssertionError after printing out the mismatched file(s) and associated problem(s).
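As a rough illustration of the "identical up to small numerical errors" idea, a comparison like this might apply an absolute tolerance to float fields. This is a hypothetical sketch, not the real logic in evSeq.util:

```python
import math

def rows_match(expected, actual, tol=1e-6):
    """Return True if two rows agree, allowing room for float error."""
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if isinstance(e, float) and isinstance(a, float):
            # Numeric fields: compare within an absolute tolerance
            if not math.isclose(e, a, abs_tol=tol):
                return False
        elif e != a:
            # Everything else (well IDs, sequences, ...): exact match
            return False
    return True

print(rows_match(['DI01-C02', 0.95], ['DI01-C02', 0.95000001]))  # True
print(rows_match(['DI01-C02', 0.95], ['DI01-C02', 0.96]))        # False
```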
# Path to expected results
expected_path = '../data/multisite_runs/evSeqOutput/expected/'
# Get the most recent run from this folder (run folders are named
# date-time, so lexicographic sort is chronological)
# Adjust this path if necessary
recent_path = sorted(glob.glob('evSeqOutput/*'))[-1]
# Compare
print(f'Comparing expected results to results in "{recent_path}".')
compare_to_expected(recent_path, expected_path)
Comparing expected results to results in "evSeqOutput/20210825-114038".
Back to the main page.