AMP PREDICTION

Prediction algorithms for antimicrobial peptides are incorporated in the database. These are based on Support Vector Machines (SVM), Random Forests (RF), Artificial Neural Networks (ANN) and Discriminant Analysis (DA). Users can select the algorithm to be used for prediction.
Peptide sequences in FASTA format can be pasted or uploaded for prediction. The results for RF, ANN, SVM and DA are explained below:
AMP: The sequence is predicted to be antimicrobial.
NAMP: The sequence is predicted to be not antimicrobial.
RF, SVM and ANN give a probability score (0 to 1) for the prediction. The higher the score, the more likely the peptide is to be antimicrobial.
The prediction algorithm provides three options to the users:

  • Users can scan the entire protein for predicting its antimicrobial activity.
  • Users can scan sequences for antimicrobial regions within proteins.
  • Users can rationally design antimicrobial peptides by generating all possible single-residue mutations and selecting the sequences with the highest AMP probability.
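
The third option can be illustrated with a minimal sketch that enumerates every single-residue substitution of a peptide. The function names, the restriction to the 20 standard amino acids, and the `toy_score` placeholder are assumptions made for illustration; `toy_score` is not the CAMPR3 predictor and only stands in for whichever AMP probability score is available.

```python
# Sketch only: enumerate single-residue substitutions of a peptide.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard residues


def single_residue_mutants(peptide):
    """Return every sequence that differs from `peptide` at exactly one position."""
    peptide = peptide.upper()
    mutants = []
    for i, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                mutants.append(peptide[:i] + residue + peptide[i + 1:])
    return mutants


def rank_by_score(sequences, score, top_n=5):
    """Sort sequences by a caller-supplied scoring function, highest first."""
    return sorted(sequences, key=score, reverse=True)[:top_n]


def toy_score(sequence):
    """Placeholder score (count of K and R); NOT an AMP probability."""
    return sum(sequence.count(residue) for residue in "KR")


mutants = single_residue_mutants("GLWS")
print(len(mutants))  # 4 positions x 19 substitutions = 76
best = rank_by_score(mutants, toy_score, top_n=3)
```

In practice `toy_score` would be replaced by a function that returns the probability reported by the chosen prediction algorithm.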


SEARCH

Simple search in CAMPR3 allows users to search based on keywords like "brevinin" or string searches like "human defensin". Users can restrict the search to a particular field descriptor. Searches using Boolean operators are possible using the ‘Advanced search’ option. All searches are case insensitive. A complete list of the field descriptors and their description is given below:


DESCRIPTORS          DESCRIPTION & USE IN CAMPR3

SEQUENCE             Protein sequence in single-letter amino acid code. E.g. GLWS
SEQUENCE LENGTH      Length of the antimicrobial peptide, given as a number. E.g. 29
SOURCE ORGANISM      Scientific name of the source organism of the antimicrobial peptide. E.g. Phyllomedusa oreades
ACTIVITY             E.g. antibacterial, antifungal, antiviral, antimicrobial, anticancerous
TARGET ORGANISM      E.g. E. coli
PUBMED ID            E.g. 12379643
GI                   GenInfo Identifier of NCBI. E.g. 41016983
PROTEIN NAME         E.g. Dermaseptin-01
UNIPROT ID           E.g. P83637
PDB ID               E.g. 2JQ0
AMP FAMILY           E.g. Dermaseptin
MIC                  E.g. MIC=30

SECONDARY STRUCTURE


Secondary structure   Criteria

Helical               Helical residues more than 80%
Strand                Beta residues more than 80%
Coil                  Turn + bend residues more than 80%
Majorly Helical       (Helical residues > 60% and beta residues < 5%) or (helical residues > 50% and beta residues < 10%)
Majorly Strand        Beta residues > 30% and helical residues < 5%
Majorly Coil          Turn + bend residues > 50% and helical residues < 50% and beta residues < 30%
Mixed                 Helical residues < 50% and beta residues < 30% and turn + bend residues < 50%
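
The criteria above can be read as a decision procedure over three percentages. The sketch below is one possible reading, not the database's own code: the function name, the order in which the rules are checked, and the "Unclassified" fallback for combinations that match no rule are all assumptions.

```python
def classify_secondary_structure(helix, beta, turn_bend):
    """Classify a peptide from the percentage (0-100) of helical, beta
    and turn + bend residues, checking the stricter rules first."""
    if helix > 80:
        return "Helical"
    if beta > 80:
        return "Strand"
    if turn_bend > 80:
        return "Coil"
    if (helix > 60 and beta < 5) or (helix > 50 and beta < 10):
        return "Majorly Helical"
    if beta > 30 and helix < 5:
        return "Majorly Strand"
    if turn_bend > 50 and helix < 50 and beta < 30:
        return "Majorly Coil"
    if helix < 50 and beta < 30 and turn_bend < 50:
        return "Mixed"
    # Assumption: the listed criteria do not cover every combination
    # (for example 40% helix and 40% beta), so a fallback label is returned.
    return "Unclassified"


print(classify_secondary_structure(65, 2, 33))
```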

Signatures:

Users can browse through the different AMP families. The page contains a table providing information about the AMP family and signatures captured using patterns or HMMs.

H: symbol H represents HMMs.

P: symbol P represents Patterns.

Description of Family:  This information has been obtained from Pfam, InterPro and/or published literature.

Signature IDs: The first four letters (CAMP) denote the CAMP database. They are followed by a three-letter abbreviation of the family name and then by H or P, for HMM or pattern respectively. If the pattern or HMM was built from family sequences of one specific length, that length is appended as an integer suffix.
For example:
CAMPCecH is an HMM ID for cecropins, and CAMPCecP35 is the ID of a pattern derived from cecropins that are 35 residues long.
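
The naming scheme can be checked mechanically. The following sketch splits a signature ID into its parts with a regular expression; the function name and the returned dictionary keys are illustrative choices, and the expression assumes the abbreviation is always exactly three letters, as described above.

```python
import re

# CAMP + three-letter family abbreviation + H or P + optional length suffix
SIGNATURE_ID = re.compile(r"CAMP([A-Za-z]{3})([HP])(\d+)?")


def parse_signature_id(signature_id):
    """Split a CAMP signature ID into family, signature type and length."""
    match = SIGNATURE_ID.fullmatch(signature_id)
    if match is None:
        raise ValueError(f"not a CAMP signature ID: {signature_id!r}")
    family, kind, length = match.groups()
    return {
        "family": family,
        "type": "HMM" if kind == "H" else "Pattern",
        "length": int(length) if length else None,
    }


print(parse_signature_id("CAMPCecH"))
print(parse_signature_id("CAMPCecP35"))
```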

Tools:

BLAST
BLAST in CAMPR3 lets users select the dataset to search against, such as the entire database or the sequence, structure, patent, experimentally validated, predicted, and predicted-based-on-signature datasets.
References:

  • Altschul, S. F. et al. (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402.

VAST
VAST is an algorithm for identifying similar three-dimensional protein structures based on geometric criteria, and for identifying distant homologs. The similar 3D structures identified by VAST are referred to as “structure neighbours”. Users can input a PDB or MMDB ID of interest.
References:

  • Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996 Jun; 6(3): 377-85.

Clustal Omega
The Clustal Omega tool can be used for multiple sequence alignment. It uses seeded guide trees and HMM profile-profile techniques to generate a progressive alignment of three or more biological sequences. Users can paste their sequences or upload a text file containing sequences in FASTA format.
References:

  • Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75.
  • Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010) A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W695-9. doi: 10.1093/nar/gkq313. Epub 2010 May 3.
  • McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R. (2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W597-600. doi: 10.1093/nar/gkt376. Epub 2013 May 13.

PRATT
The Pratt tool is used to search for patterns conserved in a set of protein sequences. Users can input their sequences in either FASTA or Swiss-Prot format. A multiple sequence alignment in FASTA format can also be used as input. Users can specify how many sequences must match a pattern for it to be reported.
References:

  • Jonassen I., Collins J.F., Higgins D.G. (1995) Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995 Aug; 4(8):1587-95.
  • Jonassen I. (1997) Efficient discovery of conserved patterns using a pattern graph. Comput Appl Biosci. 1997 Oct;13(5):509-22.

ScanProsite
The ScanProsite tool can be used to scan protein sequences against the PROSITE collection of motifs, or to scan user-defined motifs against one or more protein sequences.
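
To make the idea of scanning a user-defined motif concrete, the sketch below translates a motif written in PROSITE pattern syntax into a Python regular expression and searches a sequence with it. It does not call ScanProsite. The helper name `prosite_to_regex` and the test sequence are hypothetical, and only a common subset of the syntax is handled (`x`, `[...]`, `{...}`, repetition counts, and `<` / `>` anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns may end with a period
    pieces = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off a repetition suffix such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":              # x matches any residue
            core = "."
        elif core.startswith("{"):   # {ED} means any residue except E or D
            core = "[^" + core[1:-1] + "]"
        piece = core + ("{" + repeat + "}" if repeat else "")
        pieces.append(("^" if anchored_start else "") + piece
                      + ("$" if anchored_end else ""))
    return "".join(pieces)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
hit = re.search(regex, "AACGGCKKKLAA")  # made-up sequence for illustration
print(regex, hit.group(), hit.span())
```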
References:

  • de Castro E., Sigrist C.J., Gattiker A., Bulliard V., Langendijk-Genevaux P.S., Gasteiger E., Bairoch A., Hulo N. (2006) ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W362-5.

PHI-BLAST
Pattern Hit Initiated BLAST uses a regular-expression pattern to search a protein sequence database. It finds sequences that both contain the pattern and are homologous to the query protein sequence. Users have to provide a query protein sequence as well as the pattern associated with that sequence.
References:

  • Zhang Z., Schäffer A.A., Miller W., Madden T.L., Lipman D.J., Koonin E.V., Altschul S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998 Sep 1;26(17):3986-90.

HMMER
jackhmmer: The tool allows users to iteratively scan a sequence, HMM or multiple sequence alignment against a protein sequence database.
References:

  • Finn R.D., Clements J., Eddy S.R. HMMER web server: interactive sequence similarity searching. (2011) Nucleic Acids Res. 2011 Jul;39(Web Server issue):W29-37. doi: 10.1093/nar/gkr367. Epub 2011 May 18.
  • Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63. Review.

Sequence formats:
Swiss-Prot: The first line starts with 'ID' followed by the name of the sequence. An arbitrary number of annotation lines follow, then a line starting with 'SQ', then the sequence itself (on one or several lines), and finally a line starting with '//' that marks the end of the entry.
For example:
ID   DB119_HUMAN             Reviewed;          84 AA.
AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
DT   06-DEC-2002, integrated into UniProtKB/Swiss-Prot.
DT   02-FEB-2004, sequence version 2.
DT   04-FEB-2015, entry version 95.
DE   RecName: Full=Beta-defensin 119;
DE   AltName: Full=Beta-defensin 120;
DE   AltName: Full=Beta-defensin 19;
..
..
..
SQ   SEQUENCE   84 AA;  9822 MW;  0C2828612A674AB1 CRC64;
MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
MRISISGKEE NTDWSYEKQW PRLP
//
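
A minimal reader for this layout might look like the sketch below. It relies only on the 'ID', 'SQ' and '//' markers described above and ignores every other line type; the function name is an illustrative choice, and the sample entry reuses lines from the example with the annotation lines left out.

```python
def parse_swissprot(text):
    """Return (entry_name, sequence) from one Swiss-Prot entry."""
    entry_name = None
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):      # end of the entry
            break
        if in_sequence:
            # Sequence lines are split into blocks; drop the spaces.
            sequence_parts.append("".join(line.split()))
        elif line.startswith("ID"):
            entry_name = line.split()[1]
        elif line.startswith("SQ"):    # the sequence starts on the next line
            in_sequence = True
    return entry_name, "".join(sequence_parts)


ENTRY = """\
ID   DB119_HUMAN             Reviewed;          84 AA.
AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
SQ   SEQUENCE   84 AA;  9822 MW;  0C2828612A674AB1 CRC64;
MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
MRISISGKEE NTDWSYEKQW PRLP
//
"""

name, sequence = parse_swissprot(ENTRY)
print(name, len(sequence))  # DB119_HUMAN 84
```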

FASTA: A FASTA record begins with a greater-than ('>') symbol followed by a single-line description. The sequence data starts on the next line. The leading '>' is what distinguishes the description line from the sequence data.
For example:
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
MRIVYLLFPFILLLAQGAAGSSLALGKREKCLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
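
As an illustration, the sketch below reads one or more FASTA records from a string and returns each description with its sequence. The function name is an illustrative choice, and the sample input is the record shown above.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


FASTA_TEXT = """\
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
MRIVYLLFPFILLLAQGAAGSSLALGKREKCLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
"""

records = parse_fasta(FASTA_TEXT)
description, sequence = records[0]
print(description.split("|")[1], len(sequence))
```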

Stockholm Format: The Stockholm format starts with a line that contains the format and version identifier, currently “# STOCKHOLM 1.0”. The alignment is given as the sequence name followed by the aligned sequence, with each sequence on a separate line, and a “//” line marks the end of the alignment. The format can also contain mark-up lines, which carry features such as accession number, description and organism.
For example:
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
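
A small reader for this layout could be sketched as follows. It checks the header line, skips mark-up lines (those starting with '#'), and stops at '//'. The function name is an illustrative choice, and skipping the mark-up lines rather than parsing them is a simplification of this sketch.

```python
def parse_stockholm(text):
    """Return {sequence_name: aligned_sequence} from a Stockholm alignment."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM' header line")
    alignment = {}
    for line in lines[1:]:
        if line.startswith("//"):   # end of the alignment
            break
        if line.startswith("#"):    # mark-up line, ignored here
            continue
        name, aligned = line.split(None, 1)
        # Concatenate so that a name repeated across blocks is joined.
        alignment[name] = alignment.get(name, "") + aligned.strip()
    return alignment


STOCKHOLM_TEXT = """\
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
"""

alignment = parse_stockholm(STOCKHOLM_TEXT)
print(len(alignment), alignment["Sequence_1"])
```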