Welcome to pyrewton’s documentation!

Logos of EastBIO, BBSRC, University of St Andrews and James Hutton Institute

Version v0.1.1 2020/06/04
DOI: 10.5281/zenodo.3876218

Overview

Pyrewton is a Python3 package for the identification of Carbohydrate Active enZymes (CAZymes) from candidate species, providing the user with a complete CAZyome (all CAZymes encoded within a genome) for each candidate species. Pyrewton invokes and statistically evaluates three CAZyme prediction tools: [dbCAN](https://github.com/linnabrown/run_dbcan), [CUPP](https://www.bioengineering.dtu.dk/english/researchny/research-sections/section-for-protein-chemistry-and-enzyme-technology/enzyme-technology/cupp), and [eCAMI](https://github.com/zhanglabNKU/eCAMI).

Pyrewton is designed to be run at the command line and is free to use under the MIT license, with proper recognition.

Pyrewton supports:

- Downloading all genomic assemblies (as GenBank files, .gbff) from the [NCBI Assembly database](https://www.ncbi.nlm.nih.gov/assembly) associated with each candidate species passed to the programme
- Retrieving all annotated protein sequences from GenBank (.gbff) files
- Retrieving protein entries from [UniProtKB](https://www.uniprot.org/), using a JSON file to configure the queries. The protein data are written to a summary dataframe and the protein sequences to a FASTA file.
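As an illustration of the FASTA output mentioned above, the following is a minimal sketch of how (identifier, sequence) pairs could be written as FASTA text. It is not pyrewton's own code: the function name `format_fasta` and the 60-character line width are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Return FASTA-formatted text for (identifier, sequence) pairs.

    Hypothetical helper for illustration only; pyrewton's own output
    routine may format headers and sequences differently.
    """
    lines: List[str] = []
    for identifier, sequence in records:
        # Each record starts with a header line beginning with '>'.
        lines.append(f">{identifier}")
        # Split the sequence into fixed-width chunks (60 is a common choice).
        lines.extend(
            sequence[start:start + width]
            for start in range(0, len(sequence), width)
        )
    return "\n".join(lines) + "\n"
```

The returned string can then be written to disk with `pathlib.Path("proteins.fasta").write_text(...)`, where the file name is again only an example.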

Features currently in development:

- Use the third-party tools dbCAN, CUPP and eCAMI to predict which proteins within a FASTA file (generated by searching genomic assemblies and querying UniProt) are CAZymes
- Evaluate the accuracy with which the CAZyme prediction tools distinguish between CAZyme and non-CAZyme protein sequences
- Evaluate the accuracy with which the CAZyme prediction tools assign proteins to the correct CAZy family
- Produce a report of the CAZyme prediction tool evaluation
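To make the CAZyme versus non-CAZyme evaluation concrete, here is a minimal sketch of standard binary-classification statistics computed from known labels and a tool's predictions. The choice of statistics (sensitivity, specificity, precision, accuracy) and the function name `binary_metrics` are assumptions for illustration; the evaluation pyrewton eventually reports may use different measures.

```python
from typing import Dict, Iterable


def binary_metrics(truth: Iterable[bool], predicted: Iterable[bool]) -> Dict[str, float]:
    """Compare known CAZyme labels against a tool's CAZyme/non-CAZyme calls.

    Hypothetical helper for illustration; not part of pyrewton's API.
    """
    tp = fp = tn = fn = 0
    for is_cazyme, called_cazyme in zip(truth, predicted):
        if is_cazyme and called_cazyme:
            tp += 1  # CAZyme correctly predicted as CAZyme
        elif is_cazyme:
            fn += 1  # CAZyme missed by the tool
        elif called_cazyme:
            fp += 1  # non-CAZyme wrongly predicted as CAZyme
        else:
            tn += 1  # non-CAZyme correctly rejected

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

For example, with `truth = [True, True, True, False, False]` and `predicted = [True, True, False, False, True]`, the tool finds two of the three CAZymes and wrongly calls one of the two non-CAZymes, giving a sensitivity of 2/3 and a specificity of 1/2.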

More detailed documentation for each module is linked in the contents table below, including links to documentation to help with troubleshooting.

Requirements

- Python version 3.7+
- A Miniconda3-managed environment. The required packages, including the incorporated code checkers, are listed in ‘requirements.txt’. A Miniconda3 environment file, ‘environment.yml’, is also available in the GitHub repository.

Installation

1. Navigate to the directory you wish to store pyrewton in, then clone this repository.
   `git clone https://github.com/HobnobMancer/pyrewton.git`

2. Create a virtual environment with dependencies, then activate the environment.
   `conda create -n <venv_name> python=3.8 diamond hmmer prodigal -c conda-forge -c bioconda`
   `conda activate <venv_name>`

3. Install all requirements from the requirements.txt file, which is stored in the root of this repository.
   `pip3 install -r <path to requirements.txt file>`

4. Install pyrewton.
   `pip3 install -e <path to directory containing setup.py file>`
   Do not forget to use the -e option when installing with pip3; otherwise a ModuleNotFoundError will be raised each time pyrewton is invoked. Pass the path to the directory containing the setup.py file, not the path to the setup.py file itself. If you are currently in the root directory of the repository, where the file is located, simply use ‘.’ to indicate the current working directory.

5. Install the third-party CAZyme prediction tools.

To install dbCAN, follow the instructions within their [GitHub repository](https://github.com/linnabrown/run_dbcan), BUT ignore steps 1 and 2 of their installation guide, because the necessary virtual environment was already created in the second step of this installation and it meets all requirements of dbCAN. Install dbCAN within the ‘pyrewton/cazymes/prediction/tools/dbcan’ directory of the repository, otherwise pyrewton will not be able to find the tool.

To install eCAMI, follow the instructions within their [GitHub repository](https://github.com/yinlabniu/eCAMI). eCAMI must be installed within the directory ‘pyrewton/cazymes/prediction/tools/ecami’. Following the method from the eCAMI repository will write eCAMI to ‘pyrewton/cazymes/prediction/tools/ecami/eCAMI’. To avoid this, perform the installation within ‘pyrewton/cazymes/prediction/tools’ and rename ‘eCAMI’ to ‘ecami’, so that eCAMI ends up in ‘pyrewton/cazymes/prediction/tools/ecami’.

To install CUPP, download the CUPP files from the [DTU Bioengineering server](https://www.bioengineering.dtu.dk/english/ResearchNy/Research-Sections/Section-for-Protein-Chemistry-and-Enzyme-Technology/Enzyme-Technology/CUPP), and store the files in ‘pyrewton/cazymes/prediction/tools/cupp’. It is not necessary to download all the files, because the .tar and .tar.gz archives each contain all the files. Therefore, either download the .tar _or_ the .tar.gz archive and unpack it, or download all the files located within ‘CUPP_v1.0.14’.
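Once the steps above are complete, a quick check can help catch the two problems this guide warns about: pyrewton not being importable, and a prediction tool sitting in the wrong directory. The sketch below is not part of pyrewton. The function names are hypothetical, and it only tests that the three tool directories named above exist, not that the tools inside them work.

```python
import importlib.util
from pathlib import Path
from typing import Dict

# Directory names taken from the installation instructions above.
TOOL_DIRS = ("dbcan", "ecami", "cupp")


def check_tool_dirs(repo_root: Path) -> Dict[str, bool]:
    """Report whether each expected tool directory exists under repo_root.

    Hypothetical helper: repo_root is assumed to be the root of the
    cloned pyrewton repository.
    """
    tools_root = Path(repo_root) / "pyrewton" / "cazymes" / "prediction" / "tools"
    return {name: (tools_root / name).is_dir() for name in TOOL_DIRS}


def is_importable(package: str = "pyrewton") -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package) is not None
```

Running `check_tool_dirs(Path("."))` from the repository root returns a dictionary such as `{"dbcan": True, "ecami": True, "cupp": False}`, which points to the tool that still needs to be moved or installed. If `is_importable()` returns False, re-check the `pip3 install -e` step.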

Notebooks

Jupyter notebooks were created to document how pyrewton was used during the EastBIO 2019-2023 PhD project; the GitHub pages for these are available here. They can be used as examples of how to use pyrewton in research.

Help, Contribute and Support

Many of the common errors expected to arise during the operation of the scripts provided in this repository are covered in this documentation, including the probable causes of these issues.

Please raise any issues with any of the programmes on the GitHub repository issues page, by following the link.

Note

pyrewton is still in development, and further functionalities are being added. Please see the GitHub repository for the latest developments.