Help

Prediction algorithms for antimicrobial peptides are incorporated in the database. These are based on Support Vector Machines (SVM), Random Forests (RF), Artificial Neural Networks (ANN) and Discriminant Analysis (DA). Users can select the algorithm required for prediction.
Peptide sequences in FASTA format can be pasted or uploaded for prediction. The results for RF, ANN, SVM and DA are explained below:
AMP: The sequence is predicted to be antimicrobial.
NAMP: The sequence is predicted to be not antimicrobial.
RF, SVM and ANN give a probability score (0 to 1) for the prediction. The higher the probability, the greater the likelihood that the peptide is antimicrobial.
The prediction algorithm provides three options to the users:

SEARCH

Simple search in CAMPR3 allows users to search based on keywords like "brevinin" or string searches like "human defensin". Users can restrict the search to a particular field descriptor. Searches using Boolean operators are possible using the ‘Advanced search’ option. All searches are case insensitive. A complete list of the field descriptors and their description is given below:

DESCRIPTORS	DESCRIPTION & USE IN CAMPR3
SEQUENCE	Protein sequences represented as single letter amino acids. E.g. GLWS
SEQUENCE LENGTH	The length of antimicrobial peptides represented in a numerical manner. E.g. 29
SOURCE ORGANISM	Scientific name of the source organism of the antimicrobial peptide. E.g. Phyllomedusa oreades
ACTIVITY	E.g. antibacterial, antifungal, antiviral, antimicrobial, anticancerous
TARGET ORGANISM	E.g. E. coli
PUBMED ID	E.g. 12379643
GI	GenInfo Identifier of NCBI. E.g. 41016983
PROTEIN NAME	E.g. Dermaseptin-01
UNIPROT ID	E.g. P83637
PDB ID	E.g. 2JQ0
AMP FAMILY	E.g. Dermaseptin
MIC	E.g. MIC=30
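
The search behaviour described above (case-insensitive matching, optionally restricted to one field descriptor) can be illustrated with a minimal sketch. It is not the CAMPR3 implementation: the function name simple_search is hypothetical, and the records are toy entries that only borrow names from the examples on this page.

```python
# Toy records keyed by field descriptor. These are NOT real database
# entries; the values are illustrative only.
RECORDS = [
    {"PROTEIN NAME": "Dermaseptin-01", "AMP FAMILY": "Dermaseptin"},
    {"PROTEIN NAME": "Brevinin (toy entry)", "AMP FAMILY": "Brevinin"},
    {"PROTEIN NAME": "Human defensin (toy entry)", "AMP FAMILY": "Defensin"},
]


def simple_search(records, query, field=None):
    """Return records containing `query`, ignoring case.

    If `field` is given, only that field descriptor is searched;
    otherwise every field of the record is searched.
    """
    needle = query.casefold()
    hits = []
    for record in records:
        values = [record.get(field, "")] if field else record.values()
        if any(needle in str(value).casefold() for value in values):
            hits.append(record)
    return hits


if __name__ == "__main__":
    # Keyword search across all fields, case-insensitive.
    print(simple_search(RECORDS, "BREVININ"))
    # String search restricted to one field descriptor.
    print(simple_search(RECORDS, "human defensin", field="PROTEIN NAME"))
```

Boolean combinations, as offered by the 'Advanced search' option, could be built on top of such a function by intersecting or merging the hit lists.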

SECONDARY STRUCTURE

Secondary structure	Criteria
Helical	Helical residues more than 80%
Strand	Beta residues more than 80%
Coil	Turn + bend residues more than 80%
Majorly Helical	( Helical residues > 60% and beta residues < 5% ) or ( helical residues > 50% and beta residues < 10% )
Majorly Strand	Beta residues > 30% and helical residues < 5%
Majorly Coil	Turn + bend residues > 50% and helical residues < 50% and beta residues < 30%
Mixed	Helical residues < 50% and beta residues < 30% and turn+bend residues < 50%
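
The criteria in the table above can be written as a small function. The following is a minimal sketch, assuming that percentages are given on a 0 to 100 scale and that the rules are checked in the order in which they appear in the table. The function name and the "Unclassified" fallback (for compositions that match no row) are illustrative and not part of CAMPR3.

```python
def classify_secondary_structure(helix, beta, turn_bend):
    """Assign a structure class from residue percentages (0-100).

    Rules are tested in table order; the first match wins.
    """
    if helix > 80:
        return "Helical"
    if beta > 80:
        return "Strand"
    if turn_bend > 80:
        return "Coil"
    if (helix > 60 and beta < 5) or (helix > 50 and beta < 10):
        return "Majorly Helical"
    if beta > 30 and helix < 5:
        return "Majorly Strand"
    if turn_bend > 50 and helix < 50 and beta < 30:
        return "Majorly Coil"
    if helix < 50 and beta < 30 and turn_bend < 50:
        return "Mixed"
    # Assumption: compositions outside every row are reported separately.
    return "Unclassified"


if __name__ == "__main__":
    for composition in [(90, 0, 10), (65, 3, 32), (0, 40, 60), (20, 10, 70), (40, 20, 40)]:
        print(composition, classify_secondary_structure(*composition))
```

Boundary values such as exactly 50% helical and 30% beta residues satisfy none of the strict inequalities, which is why the sketch includes a fallback.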

Users can browse through the different AMP families. The page contains a table providing information about the AMP family and signatures captured using patterns or HMMs.

Description of Family: This information has been obtained from Pfam, InterPro and/or published literature.

Signature IDs: The first four letters represent the CAMP database, followed by a three-letter abbreviation of the family name, followed by H or P for HMM or pattern, respectively. If the pattern/HMM was created for a family using sequences of a specific length, that length is appended as an integer suffix.
For example:
CAMPCecH is an HMM ID for cecropins; CAMPCecP35 is a pattern ID derived from cecropins that are 35 residues long.
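
The naming scheme can be decoded mechanically. The sketch below is a hedged illustration based only on the rule stated above; the regular expression, the function name parse_signature_id and the returned field names are assumptions, not a CAMPR3 interface.

```python
import re

# "CAMP" + three-letter family abbreviation + H or P + optional length.
SIGNATURE_ID = re.compile(
    r"^CAMP(?P<family>[A-Za-z]{3})(?P<kind>[HP])(?P<length>\d+)?$"
)


def parse_signature_id(signature_id):
    """Split a signature ID into family abbreviation, type and length."""
    match = SIGNATURE_ID.match(signature_id)
    if match is None:
        raise ValueError(f"not a recognised signature ID: {signature_id!r}")
    length = match.group("length")
    return {
        "family": match.group("family"),
        "type": "HMM" if match.group("kind") == "H" else "Pattern",
        # None means the signature is not tied to a specific sequence length.
        "length": int(length) if length else None,
    }


if __name__ == "__main__":
    print(parse_signature_id("CAMPCecH"))
    print(parse_signature_id("CAMPCecP35"))
```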

BLAST
BLAST in CAMPR3 lets users select the dataset to search, such as the entire database or the sequence, structure, patent, experimentally validated, predicted, and predicted-based-on-signature datasets.

VAST
VAST is an algorithm used to identify similar three-dimensional protein structures based on geometric criteria, and to identify distant homologs. The similar 3D structures identified by VAST are referred to as “structure neighbours”. Users can enter a PDB or MMDB ID of interest.

Clustal Omega
The Clustal Omega tool can be used for multiple sequence alignment. It uses seeded guide trees and HMM profile-profile techniques to generate a progressive alignment of three or more biological sequences. Users can paste their sequences or upload a text file containing sequences in FASTA format.
References

Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75.
Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010) A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W695-9. doi: 10.1093/nar/gkq313. Epub 2010 May 3.
McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R. (2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W597-600. doi: 10.1093/nar/gkt376. Epub 2013 May 13.

PRATT
The Pratt tool searches for patterns conserved in a set of protein sequences. Users can input their sequences in either FASTA or Swiss-Prot format. A multiple sequence alignment in FASTA format can also be used as input. Users can specify the minimum number of sequences that must match a pattern for it to be reported.

ScanProsite
The ScanProsite tool can be used to scan protein sequences against the PROSITE collection of motifs, or to scan user-defined motifs against one or more protein sequences.
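
To show what scanning a user-defined motif involves, the sketch below converts a pattern written in a subset of the PROSITE syntax into a Python regular expression and reports where it matches. It is an illustration only and not how ScanProsite itself works: the function names are hypothetical, and only 'x', single residues, [..] alternatives, {..} exclusions, (n) or (n,m) repeats and the '<' and '>' anchors are supported.

```python
import re

# One pattern element: x, a residue, [ABC] or {ABC}, with optional (n) or (n,m).
_ELEMENT = re.compile(
    r"(?P<core>x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a regex string."""
    regex = ""
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = match.group("core")
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{"):  # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if match.group("lo"):
            repeat = "{" + match.group("lo")
            if match.group("hi"):
                repeat += "," + match.group("hi")
            repeat += "}"
        regex += prefix + core + repeat + suffix
    return regex


def find_motif(pattern, sequence):
    """Return (start, matched text) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]"))
    print(find_motif("C-x(2)-C", "AACGGCA"))
```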

PHI-BLAST
Pattern Hit Initiated BLAST (PHI-BLAST) uses a regular-expression pattern to search a protein sequence database. It finds sequences that contain the pattern and are also homologous to the query protein sequence. Users must provide a query protein sequence as well as a pattern that occurs in that sequence.

HMMER
jackhmmer: The tool allows users to iteratively scan a sequence, HMM or multiple sequence alignment against a protein sequence database.

Sequence formats:
Swiss-Prot: The first line starts with 'ID' followed by the name of the entry. It is followed by an arbitrary number of annotation lines, then a line starting with 'SQ', then the sequence itself (on one or more lines), and finally a line starting with '//' that marks the end of the entry.
For example:
ID   DB119_HUMAN             Reviewed;          84 AA.
AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
DT   06-DEC-2002, integrated into UniProtKB/Swiss-Prot.
DT   02-FEB-2004, sequence version 2.
DT   04-FEB-2015, entry version 95.
DE   RecName: Full=Beta-defensin 119;
DE   AltName: Full=Beta-defensin 120;
DE   AltName: Full=Beta-defensin 19;
..
..
..
SQ   SEQUENCE   84 AA;  9822 MW;  0C2828612A674AB1 CRC64;
MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
MRISISGKEE NTDWSYEKQW PRLP
//
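
The layout described above is regular enough to read with a few lines of code. The following is a minimal sketch that extracts only the entry name and the sequence and ignores all annotation lines; the function name parse_swissprot is hypothetical, and a dedicated library is preferable for real work.

```python
EXAMPLE = """\
ID   DB119_HUMAN             Reviewed;          84 AA.
AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
SQ   SEQUENCE   84 AA;  9822 MW;  0C2828612A674AB1 CRC64;
     MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
     MRISISGKEE NTDWSYEKQW PRLP
//
"""


def parse_swissprot(text):
    """Return a list of (entry name, sequence) tuples."""
    entries = []
    name, residues, in_sequence = None, [], False
    for line in text.splitlines():
        if line.startswith("//"):  # end of the current entry
            if name is not None:
                entries.append((name, "".join(residues)))
            name, residues, in_sequence = None, [], False
        elif in_sequence:  # sequence lines: drop the spacing
            residues.append("".join(line.split()))
        elif line.startswith("ID"):  # entry name is the second token
            name = line.split()[1]
        elif line.startswith("SQ"):  # sequence starts on the next line
            in_sequence = True
    return entries


if __name__ == "__main__":
    for entry_name, sequence in parse_swissprot(EXAMPLE):
        print(entry_name, len(sequence), sequence)
```

The sequence branch is tested before the 'ID' and 'SQ' branches so that a sequence line that happens to begin with those letters is not misread as a new line code.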

FASTA: A FASTA record begins with a single-line description, followed by the sequence data starting on the next line. The description line is distinguished from the sequence data by a greater-than ('>') symbol as its first character.
For example:
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
MRIVYLLFPFILLLAQGAAGSSLALGKREKCLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
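
Before pasting or uploading sequences, it can help to check that a file reads as FASTA. The sketch below is a minimal reader written under the description given above; the function name parse_fasta is hypothetical, and it performs no validation of the residue alphabet.

```python
EXAMPLE = """\
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
MRIVYLLFPFILLLAQGAAGSSLALGKREKCLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
"""


def parse_fasta(text):
    """Return a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # a new record begins
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence may span several lines
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    for description, sequence in parse_fasta(EXAMPLE):
        print(description.split()[0], len(sequence))
```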

Stockholm Format: The Stockholm format starts with a line that contains the format and version identifier, currently “# STOCKHOLM 1.0”. The alignment is written as the sequence name followed by the aligned sequence, with each sequence on a separate line, and a “//” line marks the end of the alignment. The Stockholm format can also contain mark-up lines that hold features such as accession number, description and organism.
For example:
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
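
A Stockholm alignment of this simple form can be read as follows. This is a minimal sketch: the function name parse_stockholm is hypothetical, and mark-up lines (those starting with '#') are skipped without being interpreted.

```python
EXAMPLE = """\
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
"""


def parse_stockholm(text):
    """Return a dict mapping sequence name to aligned sequence."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM' header line")
    alignment = {}
    for line in lines[1:]:
        if line == "//":  # end of the alignment
            break
        if line.startswith("#"):  # mark-up line, ignored in this sketch
            continue
        name, aligned = line.split(None, 1)
        # Concatenate in case the alignment is split into several blocks.
        alignment[name] = alignment.get(name, "") + aligned.replace(" ", "")
    return alignment


if __name__ == "__main__":
    for name, aligned in parse_stockholm(EXAMPLE).items():
        print(name, aligned)
```

All rows of a valid alignment have the same length, which gives a quick sanity check on the parsed result.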