Comprehensive database of Chorismate synthase enzyme from shikimate pathway in pathogenic bacteria

Background Infectious diseases are a major public health problem and affect more than 50 million people worldwide. The shikimate pathway is an attractive target for the development of broad-spectrum antimicrobial compounds against a variety of infectious diseases. Chorismate synthase is the shikimate pathway enzyme that catalyzes the conversion of 5-enolpyruvylshikimate 3-phosphate (EPSP) to chorismate in most bacteria. This step is crucial for bacterial growth, since chorismate acts as a precursor molecule for the synthesis of aromatic amino acids. Hence, we present the Chorismate Synthase Database (CSDB), a manually curated database that provides information on the sequence, structure and biological activity of chorismate synthase from the shikimate pathway of pathogenic bacteria. Designing suitable inhibitors of this enzyme could be a way to disrupt the pathogen's proteomic machinery and thereby inhibit bacterial growth. Description The aim of this study was to characterise the chorismate synthase enzyme of pathogenic bacteria, because its functional and structural characterization is important for both structure-based and ligand-based drug design. Conclusions The broad range of data and the easy-to-use interface make csdb.in a useful database for researchers designing drugs.


Background
Biomolecular databases generally contain information on gene function, structure, and cellular and chromosomal localization. They also cover the clinical effects of mutations, the sequence and structural properties of proteins, domains and motifs and their functional roles in a protein, and pathway information [1]. The seven enzymes of the shikimate pathway are attractive targets for the development of antimicrobial and herbicidal compounds, because the pathway is crucial for the synthesis of aromatic amino acids in bacteria and plants but is absent in mammals [2][3][4][5][6][7]. Chorismate synthase (CS) catalyzes the conversion of 5-enolpyruvylshikimate 3-phosphate (EPSP) to chorismate, the final step of the shikimate pathway [3,8]. Chorismate is also an essential precursor for the synthesis of p-aminobenzoic acid and folate [9].
Chorismate synthase also plays a remarkable role in the biosynthesis of nucleotides. The chorismate synthase reaction is unique in nature: it involves a 1,4-elimination of phosphate together with loss of the C-6 hydrogen. CS aids the formation of two of the three double bonds needed to build an aromatic amino acid, and its activity requires a reduced FMN molecule that is not consumed during the reaction. The most widely accepted mechanism for the elimination suggests a direct role for reduced FMN, which transfers an electron to the phosphate, while the substrate donates an electron to regenerate the FMN. Furthermore, the monofunctional form of chorismate synthase is found in plants and bacteria, whereas the bifunctional form occurs in fungi [3]. The pathway was characterized in bacteria largely by studying mutants lacking individual enzyme activities. The shikimate pathway is essential in bacteria, since mutations in its enzymes completely inhibit growth in culture unless aromatic supplements are provided [10]. Studies by Barea and Giles showed that the shikimate pathway in fungi plays an essential role in the synthesis of aromatic amino acids [11]. Genomic studies confirmed that this pathway could be used to develop broad-spectrum antimicrobial compounds against a variety of infectious diseases [10]. Earlier reports have shown that inhibiting one of the enzymes of the shikimate pathway could efficiently treat opportunistic pathogens such as Pneumocystis carinii, Mycobacterium tuberculosis, Cryptosporidium parvum and Toxoplasma gondii, which may simultaneously infect AIDS patients and other immunocompromised patients [12].
Blocking any enzyme of this pathway could therefore provide a promising drug target for bacterial pathogenic diseases. Designing inhibitors for this reaction would help researchers block multiple pathways essential for the survival of the microorganism. The Chorismate Synthase Database provides data covering the parameters required for the inhibition of chorismate synthase, a potential drug target for blocking the shikimate pathway, in 42 pathogenic bacterial species. A list of 48 inhibitors is reported in Table 1.

Construction and content
Data sources and curation The starting point for data curation in the Chorismate Synthase Database is manual curation of all publicly available sequence, structure and functional information for pathogens from UniProtKB [13,14]. Besides literature references and annotations of sequence and structure features, other database identifiers (e.g. NCBI taxonomy codes, Gene Ontology classifications, InterPro and Pfam accessions, Superfamily, SCOP, PROSITE, KEGG and PubChem Substance) were also imported. CSDB taxonomy is derived from the NCBI taxonomy database.
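As an illustration of how such identifiers can be imported, the following minimal sketch collects database cross-references from the DR lines of a UniProtKB flat-file entry. It is not the curation code used for CSDB; the function name is hypothetical and the sample entry contains placeholder accessions only.

```python
from collections import defaultdict

# Hypothetical excerpt of a UniProtKB flat-file entry.
# All accessions below are placeholders, not real identifiers.
SAMPLE_ENTRY = """\
ID   AROC_EXAMPLE            Reviewed;         361 AA.
OX   NCBI_TaxID=0;
DR   InterPro; IPR000000; Example_domain.
DR   Pfam; PF00000; Example_domain; 1.
DR   PROSITE; PS00000; EXAMPLE_PATTERN; 1.
DR   GO; GO:0000000; F:example activity; IEA:Example.
"""


def parse_cross_references(entry_text):
    """Collect DR (cross-reference) lines as {database: [primary identifiers]}."""
    xrefs = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue
        # Fields are separated by ';' and the line ends with a full stop.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) >= 2:
            database, primary_id = fields[0], fields[1]
            xrefs[database].append(primary_id)
    return dict(xrefs)


xrefs = parse_cross_references(SAMPLE_ENTRY)
```

Grouping identifiers by source database keeps each external link (InterPro, Pfam, PROSITE, GO and so on) separately addressable when the entry page is rendered.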
The data in CSDB are organized into 7 fields (Figure 1), such as protein resources, gene annotations, features, gene and nucleotide sequence, pathways, molecular target, taxonomical ID and literature references. The classification of pathogenic bacteria used in CSDB follows the list given in "Classification of Pathogenic Bacteria" (http://www.buzzle.com/articles/pathogenicbacteria-list.html). Links are provided to further information on each pathogenic bacterium where it is present in external resources such as Swiss-Prot, the NCBI Taxonomy Browser, EMBL-EBI, the Sanger Institute, chemical databases, PDB and PubMed. An extensive literature survey was carried out using PubMed and MEDLINE to extract information about human diseases caused by the various bacterial pathogens. Critical features of chorismate synthase for each bacterial strain, such as gene sequence, gene ID and protein sequence in FASTA format, were collected, and domain and motif information was retrieved from domain and motif databases. Structure-related information was retrieved from PDB, CATH and SCOP, kinetic data from the literature, pathway information from KEGG, and Gene Ontology information from the GO database. The database was constructed by integrating this information in a flat-file format.
The features of this database can be categorized into three broad areas: 1. Query interface: The query interface lists all the pathogenic bacteria, with the strain information available in the literature, and relates each to the disease it causes in humans. 2. Feature enrichment: This category comprises sequence annotation from well-curated databases, multiple sequence alignment of chorismate synthase from all strains, and 3D structure prediction using MODELLER v.9.10 with validation using the Ramachandran (GNR) plot. 3. External references/links: This category includes pathogenic organism databases, genome databases, databases of protein-protein interactions, systems biology pathways, DrugBank and structure prediction servers.
Molecular modeling in this work was performed with MODELLER version 9.10. The MODELLER program was fully automated to calculate comparative models for a large number of protein sequences, using many different template structures and sequence-structure alignments [15][16][17]. Sequence-structure matches were established with SALIGN [18,19] by aligning the sequence profile of the target sequence against each of the template sequences extracted from PDB [14] (Figure 2).
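The integration of FASTA sequences into a flat file can be sketched as follows. This is a minimal example under stated assumptions: the tab-separated column layout (identifier, description, length, sequence), the function names and the two input records are hypothetical and are not taken from CSDB.

```python
import csv
import io


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_flat_file(records):
    """Serialise records as tab-separated lines: id, description, length, sequence."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for header, sequence in records:
        identifier, _, description = header.partition(" ")
        writer.writerow([identifier, description, len(sequence), sequence])
    return buffer.getvalue()


# Hypothetical two-record input; the sequences are made-up fragments.
EXAMPLE_FASTA = """\
>entry1 chorismate synthase organism A
MAGNTI
GQLF
>entry2 chorismate synthase organism B
MSGNSF
"""

records = parse_fasta(EXAMPLE_FASTA)
flat = to_flat_file(records)
```

One line per entry keeps the flat file easy to load into a relational table later, since each column maps directly to a field.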
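A comparative modeling run of this kind can be sketched with MODELLER's Python interface. The example below is an assumption-laden outline, not the pipeline used for CSDB: the helper names, file names, entry codes and the number of models are hypothetical. It uses the `environ` and `automodel` class names of the MODELLER 9.x series, and the modeling function needs a licensed MODELLER installation plus an alignment file and template structures, so it is defined but not called here.

```python
def pir_target_entry(code, sequence, width=75):
    """Format a target sequence as a PIR entry of the kind MODELLER reads."""
    # Ten colon-separated fields; for a target only the type and code are filled.
    fields = ["sequence", code, "", "", "", "", "", "", "0.00", " 0.00"]
    lines = [f">P1;{code}", ":".join(fields)]
    body = sequence + "*"  # PIR sequences are terminated by an asterisk
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"


def build_models(alignment_file, template_codes, target_code, n_models=5):
    """Build comparative models with MODELLER's automodel class."""
    # Imported lazily so that pir_target_entry works without MODELLER installed.
    from modeller import environ
    from modeller.automodel import automodel

    env = environ()
    env.io.atom_files_directory = ["."]  # directory holding the template PDB files
    model = automodel(env, alnfile=alignment_file,
                      knowns=template_codes, sequence=target_code)
    model.starting_model = 1
    model.ending_model = n_models
    model.make()
    return model


# Hypothetical target code and made-up sequence fragment.
entry = pir_target_entry("target_cs", "MAGNTIGQLF")
```

With an alignment file that contains this target entry alongside the template entries, a call such as `build_models("alignment.ali", ("template1",), "target_cs")` would produce the candidate models, which can then be validated.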

Database architecture
CSDB is built on Apache HTTP Server 2.2.11 with MySQL Server 5.1.36 as the back-end and PHP 5.3.0, HTML, JavaScript and CSS as the front-end. Apache, MySQL and PHP were preferred because they are open-source and platform independent. In addition, MySQL is one of the most widely used open-source SQL (Structured Query Language) databases on the internet. MySQL (Figure 3) is a fast relational database management system that supports multiple users and multi-threading and runs on both Windows and Linux. It provides triggers, cursors and stored procedures, which improve developer productivity.
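The relational organisation can be illustrated with a small sketch. CSDB itself runs on MySQL with a PHP front-end; the example below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a database server. The table names, column names and inserted row are hypothetical and do not reproduce the CSDB schema.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    disease     TEXT
);
CREATE TABLE cs_entry (
    entry_id    INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    uniprot_acc TEXT,
    sequence    TEXT
);
""")

# Placeholder data: not a real organism, accession or sequence.
conn.execute("INSERT INTO organism VALUES (1, 'Example bacterium', 'example disease')")
conn.execute("INSERT INTO cs_entry VALUES (1, 1, 'X00000', 'MAGNTIGQLF')")


def find_entries(connection, organism_query):
    """Return (organism, disease, accession) rows whose name contains the query."""
    sql = """
        SELECT o.name, o.disease, e.uniprot_acc
        FROM organism AS o
        JOIN cs_entry AS e ON e.organism_id = o.organism_id
        WHERE o.name LIKE ?
        ORDER BY o.name
    """
    # A bound parameter avoids building SQL from user input.
    return connection.execute(sql, (f"%{organism_query}%",)).fetchall()


rows = find_entries(conn, "example")
```

Separating organisms from enzyme entries lets one organism carry several strain-level entries, and the parameterised query mirrors an organism-based search of the kind a web front-end would issue.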

Data access
Data stored in CSDB can be accessed in the following ways: (i) Search options in CSDB: CSDB can be queried to obtain pathogen information. To facilitate this, simple search and manual browse options are provided in the 'Search' section.
Select pathogenic bacteria: the user can select a pathogenic bacterium to obtain the related information (Figure 4 illustrates the result of an organism-based search).

External links
External database links are provided in the web portal as hyperlinks to other useful bioinformatics resources such as genome databases, protein-protein interaction databases, systems biology pathways, pathogenic organism databases, microarray databases, structure prediction servers and GeneCards.

Feedback
Users can submit suggestions, comments and queries using this feature.

Help
A detailed description of the use of the various features incorporated in CSDB is provided in this section for the benefit of users.

Future work
The resource will be updated regularly with enhanced features. We also intend to add bioinformatics tools for structural and sequence analysis in future versions, and to extend the database to other pathogens.

Conclusions
CSDB provides manually curated information on chorismate synthase in 42 pathogenic bacterial species. The database provides information useful for both ligand-based and structure-based drug design. For structure-based drug design, information on the protein's motifs and its InterPro/Pfam domain categorization has been added, and 48 inhibitors with IC50/Ki values are made available for designing inhibitors using ligand-based drug design strategies (Table 1). In addition, the database contains information about the protein's superfamily, SCOP IDs, GO IDs, active site residues, pathway information from KEGG, taxonomy, and structural models built using MODELLER 9.10. This facilitates its use by researchers in drug design. The database is freely available at http://www.csdb.in.
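For ligand-based work, IC50 and Ki values reported in different units are commonly brought onto a single logarithmic scale, pIC50 = -log10(IC50 in mol/L), before compounds are compared. The following minimal sketch shows that conversion; the compound names and potencies are made up for illustration and are not entries from Table 1.

```python
import math

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_p_scale(value, unit):
    """Convert an IC50 or Ki to its negative log10 in molar units (pIC50 / pKi)."""
    if value <= 0:
        raise ValueError("potency values must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# Made-up potencies for three hypothetical compounds.
measurements = [("cmpd_a", 10.0, "uM"), ("cmpd_b", 250.0, "nM"), ("cmpd_c", 1.0, "mM")]

# Higher p-values mean more potent compounds, so sort in descending order.
ranked = sorted(measurements, key=lambda m: to_p_scale(m[1], m[2]), reverse=True)
```

On this scale a tenfold gain in potency corresponds to an increase of one unit, which makes values measured in different units directly comparable.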

Availability and requirements
CSDB is freely available at http://www.csdb.in.

Download
The contents of CSDB can be downloaded from the 'Download Database' section. Users can obtain the entire collection of IDs in SQL format with a single mouse click.