# vcf-annotator **Repository Path**: wangzc1990/vcf-annotator ## Basic Information - **Project Name**: vcf-annotator - **Description**: Add biological annotations to variants in a given VCF file. - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-03-01 - **Last Updated**: 2022-07-12 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/vcf-annotator/README.html) [![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf-annotator/badges/downloads.svg)](https://anaconda.org/bioconda/vcf-annotator) *vcf-annotator* uses the reference GenBank file to add more details to the variant calls in a VCF. # vcf-annotator Using a reference GenBank file, *vcf-annotator* adds biological annotations to variants in a VCF file. A full list of annotations is descibed below, but these include amino acid changes, gene information, synonymous vs nonsynonymous, locus tag information, among many more. ### Added Annotations For each mutation, if applicable, the following annotations are added to the INFO column of the VCF. | Annotation | Description | |------------|-------------| | RefCodon | Reference codon | | AltCodon | Alternate codon | | RefAminoAcid | Reference amino acid | | AltAminoAcid | Alternate amino acid | | CodonPosition | Codon position in the gene | | SNPCodonPosition | SNP position in the codon | | AminoAcidChange | Amino acid change | | IsSynonymous | 0:nonsynonymous, 1:synonymous, 9:N/A or Unknown | | IsTransition | 0:transversion, 1:transition, 9:N/A or Unknown | | IsGenic | 0:intergenic, 1:genic | | IsPseudo | 0:not pseudo, 1:pseudo gene | | LocusTag | Locus tag associated with gene | | Gene | Name of gene | | Note | Note associated with gene | | Inference | Inference of feature. | | Product | Description of gene | | ProteinID | Protein ID of gene | | Comments | Example: Negative strand: T->C | | VariantType | Indel, SNP, Ambiguous_SNP | | FeatureType | The feature type of variant. | # Installation ### Requirements * Python >= 3.4 * [BioPython](http://biopython.org/) >= 1.65 * [PyVCF](https://github.com/jamescasbon/PyVCF) == 0.6.8 ### Bioconda *vcf-annotator* is available from BioConda ``` conda install -c bioconda vcf-annotator ``` ### From Source ``` git@github.com:rpetit3/vcf-annottor.git cd vcf-annottor pip3 install -r requirements.txt python3 vcf-annottor.py YOUR_VCF.vcf REFERENCE.gb ``` Nothing much else to it, just a simple to read in a VCF and GenBank file and output an annotated VCF. Feel free to drop it in your $PATH somewhere! # Usage *vcf-annotator* requires an uncompressed VCF file and the corresponding reference GenBank file. It then outputs the annotated variants, by default to STDOUT, but this can be changed on runtime. ### Usage Output ``` python3 vcf-annotator.py usage: vcf-annotator.py [-h] [--output STRING] [--version] VCF_FILE GENBANK_FILE Annotate variants from a VCF file using the reference genome's GenBank file. positional arguments: VCF_FILE VCF file of variants GENBANK_FILE GenBank file of the reference genome. optional arguments: -h, --help show this help message and exit --output STRING File to write VCF output to (Default STDOUT). --version show program's version number and exit ``` ##### --version Output ``` python3 vcf-annotator.py --version vcf-annotator.py 0.5 ``` ### Example Usage A VCF and GenBank file are included in the *example-data* directory. You can use these two files to verify the script is working properly. ``` python3 vcf-annotator.py example-data/example.vcf example-data/example.gb ``` ### Disclaimer This script has been developed only for microbial variant analysis. I've only tested on VCF files output from GATK, but I would assume if the VCF format is followed other VCF files should work as well. Currently for a ~3mb genome with ~20k mutations it takes about 10s to annotate the VCF file. Based on this information, I'm not sure how well it would work on larger genomes (if it would even work at all!).