Machine learning solutions to fight antibiotic resistance

Antibiotic Resistance

Antibiotic resistance is becoming a global healthcare problem. The exponential growth of metagenomics data has driven the creation of faster and more accurate algorithms.

What is DeepARG?

DeepARG is a machine learning solution that uses deep learning to characterize and annotate antibiotic resistance genes in metagenomes. It is composed of two models for two types of input: short sequence reads and gene-like sequences.

Short NGS reads

DeepARG can annotate short sequence reads from Next Generation Sequencing (NGS) technologies such as Illumina. This model was trained on simulated antibiotic resistance reads so that it performs better on metagenomic samples.

Gene-Like Sequences

DeepARG can predict antibiotic resistance in long gene-like sequences. This model is suited to annotating full-length genes and to discovering novel antibiotic resistance genes in assembled samples.


Antibiotic resistance Database

Click here to Download the database

The deepARG-DB (database) contains approximately 15,000 sequences, combining manually curated genes (CARD database) with high-quality predicted genes (automatic inspection, annotation and validation; see Figure 1). Unlike other resources, the database has a simple structure consisting of antibiotic classes or types (families of antibiotics, e.g., beta-lactams, aminoglycosides) and antibiotic groups or subtypes.
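
The two-level structure described above can be pictured as a mapping from antibiotic class to the groups it contains. The sketch below is illustrative only: the record layout (sequence id, class, group), the function name and the sample entries are hypothetical and do not reflect the actual deepARG-DB file format.

```python
from collections import defaultdict

# Hypothetical records: (sequence id, antibiotic class, antibiotic group).
# These are illustrative placeholders, not entries copied from deepARG-DB.
RECORDS = [
    ("seq_1", "beta-lactam", "OXA"),
    ("seq_2", "beta-lactam", "TEM"),
    ("seq_3", "aminoglycoside", "AAC(6')"),
    ("seq_4", "beta-lactam", "OXA"),
]


def index_by_class(records):
    """Group sequence ids as class -> group -> [sequence ids]."""
    index = defaultdict(lambda: defaultdict(list))
    for seq_id, arg_class, arg_group in records:
        index[arg_class][arg_group].append(seq_id)
    return index


index = index_by_class(RECORDS)
print(sorted(index["beta-lactam"]))   # groups found within one class
print(index["beta-lactam"]["OXA"])    # sequences found within one group
```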

Train Your Deep Learning Model

We have included the necessary steps to re-train the deepARG models or to create your own deep learning model using the deepARG architecture.
See the README file in the DeepARG Bitbucket repository for details.

Figure 1: Automatic annotation of highly homologous ARGs

Deep Learning Models

The deepARG models are designed for computational analysis of next generation sequencing data such as metagenomes. Their main contribution is a low false negative rate during prediction. In addition, the gene-like sequences model is designed to find novel ARGs based on sequence homology.
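
Because the claim above concerns the false negative rate, it may help to recall how that metric is usually computed: the fraction of true ARGs that a classifier fails to report. The function below is a generic sketch, not code from deepARG, and the counts in the example are made up for illustration.

```python
def false_negative_rate(true_positives, false_negatives):
    """Return FN / (FN + TP), the share of real positives that were missed."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one real positive is required")
    return false_negatives / float(total)


# Made-up counts: 95 ARGs correctly reported, 5 ARGs missed.
rate = false_negative_rate(true_positives=95, false_negatives=5)
print(rate)
```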

DeepARG pipeline

The pipeline can be used as a standalone program. It was developed in Python 2.7 and uses DIAMOND (optional) to compute the alignments. The source code can be downloaded from this Git repository hosted on Bitbucket.


DeepARG requires the following Python modules (all can be installed via pip):

  • Nolearn lasagne deep learning library.
  • Sklearn machine learning routines.
  • Theano for fast computation. For GPU usage, see the Theano documentation.
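
Before running the pipeline, it can be useful to confirm that the modules listed above are importable. The following is a minimal sketch; the helper name `find_missing` is hypothetical, and the import names used are `nolearn`, `lasagne`, `sklearn` (installed from the `scikit-learn` package) and `theano`.

```python
import importlib

# Import names for the modules listed above.
REQUIRED_MODULES = ["nolearn", "lasagne", "sklearn", "theano"]


def find_missing(modules):
    """Return the names of the modules that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure to import is treated as "not usable here".
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```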


Open a terminal and clone the source code:

git clone


  • Go to the directory where the program was saved and open the file

    Replace the value in path = '/home/gustavo1/tmp/deeparg-ss/'; with the directory where deepARG was cloned.

    For instance, if deepARG was cloned at /home/user/deeparg-ss/,
    the file should look like
    path = '/home/user/deeparg-ss/';

  • Go to ./bin under deeparg-ss and run chmod +x diamond (Linux only)
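
The permission step above can also be done from Python. This is a sketch under the assumption that deepARG was cloned at /home/user/deeparg-ss/ (the example location used above); the helper name `make_executable` is hypothetical, and the effect is similar to running chmod +x on the DIAMOND binary.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Assumed install location; replace with your own deepARG path.
DEEPARG_PATH = "/home/user/deeparg-ss/"
diamond = os.path.join(DEEPARG_PATH, "bin", "diamond")

if os.path.isfile(diamond):
    make_executable(diamond)
else:
    print("DIAMOND binary not found at " + diamond)
```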


python -h

    General options:
        --type          (nucl/prot) Molecule type of input data
        --iden          minimum percentage of identity to consider
        --reads         short sequences version
        --genes         long sequences version

    Annotate sequences when the input is a BLAST tsv delimited file:
        deepARG --predict --input  --output 
            --input         blast tab delimited file.
            --output        output of annotated reads.
    Annotate sequences when the input is a FASTA file:
        deepARG --align  --input  --output 
            --input         fasta file containing reads.
            --output        blast tab delimited alignment file.
            --iden          Identity cutoff
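
To make the identity cutoff concrete, the sketch below filters a BLAST tabular file by percent identity, which is the third column in the standard 12-column layout. It is not deepARG's own code: the function name `filter_by_identity`, the two alignment rows and the cutoff of 50 are all made up for illustration.

```python
import csv


def filter_by_identity(lines, min_identity):
    """Yield rows of a BLAST tabular file with percent identity >= cutoff."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[2]) >= min_identity:
            yield row


# Two made-up alignment rows in the standard 12-column layout:
# query, subject, identity, length, mismatches, gap opens,
# query start/end, subject start/end, e-value, bit score.
EXAMPLE_LINES = [
    "read_1\tgene_A\t92.5\t50\t3\t0\t1\t50\t10\t59\t1e-20\t95.0\n",
    "read_2\tgene_B\t41.0\t48\t28\t0\t1\t48\t5\t52\t1e-03\t30.1\n",
]

kept = list(filter_by_identity(EXAMPLE_LINES, min_identity=50.0))
print([row[0] for row in kept])
```

With a real file, pass the open file handle instead of `EXAMPLE_LINES`.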

Go to the deeparg-ss directory and run any of the following commands:

Input is a FASTA file: 
    1) Annotate gene-like sequences when the input is a nucleotide FASTA file:
        python --align --type nucl --genes --input /path/file.fasta --out /path/to/out/file.out

    2) Annotate gene-like sequences when the input is an amino acid FASTA file:
        python --align --type prot --genes --input /path/file.fasta --out /path/to/out/file.out

    3) Annotate short sequence reads when the input is a nucleotide FASTA file:
        python --align --type nucl --reads --input /path/file.fasta --out /path/to/out/file.out

    4) Annotate short sequence reads when the input is a protein FASTA file (unusual case):
        python --align --type prot --reads --input /path/file.fasta --out /path/to/out/file.out

Input is a tabular BLAST-like file:
    5) Annotate gene-like sequences when the input is a nucleotide BLAST alignment file:
        python --predict --type nucl --genes --input /path/file.tsv --out /path/to/out/file.out

    6) Annotate gene-like sequences when the input is an amino acid BLAST alignment file:
        python --predict --type prot --genes --input /path/file.tsv --out /path/to/out/file.out

    7) Annotate short sequence reads when the input is a nucleotide BLAST alignment file:
        python --predict --type nucl --reads --input /path/file.tsv --out /path/to/out/file.out

    8) Annotate short sequence reads when the input is a protein BLAST alignment file (unusual case):
        python --predict --type prot --reads --input /path/file.tsv --out /path/to/out/file.out
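
The commands above differ only in three choices: the mode (--align or --predict), the molecule type (nucl or prot) and the model (--genes or --reads). A small helper can assemble the option list for any combination. This is a sketch: `build_deeparg_args` is a hypothetical name, it uses only the flags shown in the commands above, and it returns the options alone, so the script name still has to be supplied by you.

```python
def build_deeparg_args(mode, mol_type, model, input_path, output_path):
    """Return the option list for one deepARG run (script name not included)."""
    if mode not in ("align", "predict"):
        raise ValueError("mode must be 'align' or 'predict'")
    if mol_type not in ("nucl", "prot"):
        raise ValueError("mol_type must be 'nucl' or 'prot'")
    if model not in ("genes", "reads"):
        raise ValueError("model must be 'genes' or 'reads'")
    return [
        "--" + mode,
        "--type", mol_type,
        "--" + model,
        "--input", input_path,
        "--out", output_path,
    ]


# Short nucleotide reads from a FASTA file.
args = build_deeparg_args(
    "align", "nucl", "reads", "/path/file.fasta", "/path/to/out/file.out"
)
print(" ".join(args))
```

Validating the three choices up front catches typos such as "protein" before a long run is started.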

Direct Annotation Service

New! (9/6/2017) We are excited to release a new version of our deepARG online analysis.

The DeepARG web service (beta) is a tool with a fully automated data analysis pipeline for antibiotic resistance annotation of raw metagenomic samples, using the deepARG algorithm and our database (deepARG-DB). You only need to upload your raw sequence reads (*.fastq.gz); the service takes care of everything else.

Our pipeline first removes low-quality reads using Trimmomatic. The reads are then merged into one file (VSEARCH) and submitted to the deepARG algorithm for classification. Results are normalized to the 16S rRNA abundance in the sample.
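
The exact normalization formula used by the service is not given here. One common approach expresses abundance as ARG copies per copy of the 16S rRNA gene, dividing each read count by the corresponding gene length so that long genes are not over-counted. The sketch below assumes that approach; the function and parameter names are hypothetical and the numbers in the example are made up.

```python
def normalize_to_16s(arg_reads, arg_gene_length, n_16s_reads, gene_16s_length):
    """Return ARG copies per 16S rRNA gene copy.

    Both lengths must be given in the same unit (for example nucleotides).
    """
    if arg_gene_length <= 0 or gene_16s_length <= 0:
        raise ValueError("gene lengths must be positive")
    if n_16s_reads <= 0:
        raise ValueError("at least one 16S rRNA read is required")
    arg_coverage = arg_reads / float(arg_gene_length)
    ref_coverage = n_16s_reads / float(gene_16s_length)
    return arg_coverage / ref_coverage


# Made-up example: 100 reads on a 1000 nt ARG, 2000 reads on a 1500 nt 16S gene.
relative_abundance = normalize_to_16s(100, 1000, 2000, 1500)
print(relative_abundance)
```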

The DeepARG web service has a very simple web user interface for annotating ARGs in metagenomes. It runs at the Advanced Research Computing (ARC) center at Virginia Tech, which provides a stable computing environment.

Meet our projects on metagenomics analysis

MetaStorm is a web service developed for functional and taxonomic analysis of metagenomes from next generation sequencing reads.
MetaStorm incorporates several functional databases, including ACLAME, COG, UNIPROT, CARD, ARDB and BACMET.

ARGPore (Antibiotic Resistance annotation from MinION nanopore sequencing reads) is a web platform developed for the analysis of antibiotic resistance from environmental metagenomics samples obtained with the MinION nanopore sequencer.

ARGPore will be released soon.



NSF Partnership in International Research and Education (PIRE).

HEARD: NSF Halting Environmental Antimicrobial Resistance Dissemination.
Effective Mitigation Strategies for Antimicrobial Resistance program.
The Virginia Tech Institute for Critical Technology and Applied Science Center for the Science and Engineering of the Exposome (SEE).

The Virginia Tech Sustainable Nanotechnology Interdisciplinary Graduate Education Program (IGEP).

Our team Who is behind this project

Contact us Keep in touch


  • Address: Virginia Tech, VA 24061
  • Phone number: (+1) 202 717 5300
  • Email