Prospects of Indole derivatives as methyl transfer inhibitors: antimicrobial resistance managers

Background Novel classes of antibiotics are urgently needed to manage the WHO-prioritized multidrug-resistant (MDR) pathogens, which pose an unprecedented medical crisis. At the same time, multiple essential proteins have to be targeted to prevent easy resistance development. Methods Structure-based and ligand-based virtual screening were integrated to explore the antimicrobial properties of indole derivatives from a compound database. Results Whole-genome sequences of the target pathogens were aligned with MAUVE to identify putative common lead target proteins. MetK, which biosynthesizes S-adenosyl methionine (SAM), was taken as the lead target, and literature searches revealed that SAM is a critical metabolite. Furthermore, the SAM-utilizing enzymes CobA (B12 biosynthesis pathway), Dam (regulation of replication and protein expression), and TrmD (tRNA methylation) were also taken as drug targets. A ligand library of 715 indole derivatives, chosen based on the kinase inhibition potential of indoles, was created, from which 102 were pursued based on ADME/T scores. Among these, 5 potential inhibitors of MetK in N. gonorrhoeae were docked against the MetK proteins of all nine pathogens, and 3 derivatives exhibited inhibition potential. Docking these 3 against the other SAM-utilizing enzymes, CobA, Dam, and TrmD, gave 2 potential compounds with multiple targets. Docking with the human MetK homolog also showed probable inhibitory effects; however, SAM requirements can be replenished from external sources since SAM transporters are present in humans.
Conclusions We believe that the molecules 3-[(4-hydroxyphenyl)methyl]-6-(1H-indol-3-ylmethyl)piperazine-2,5-dione (ZINC04899565) and 1-[(3S)-3-[5-(1H-indol-3-ylmethyl)-1,3,4-oxadiazol-2-yl]pyrrolidin-1-yl]ethanone (ZINC49171024) could be a starting point for developing broad-spectrum antibiotics against infections caused by N. gonorrhoeae, A. baumannii, C. coli, K. pneumoniae, E. faecium, H. pylori, P. aeruginosa, S. aureus and S. typhi.


Background
The discovery and development of antibiotics was a milestone in medical science that prevented fatalities from simple infections. Unfortunately, the emergence of antibiotic-resistant strains among these pathogens appears to be inevitable under the selective pressure for survival [1]. Most alarming is the prevalence of resistance even to the last-resort antibiotic colistin [2], which has added a serious challenge to the current antibiotic crisis.
The cost of developing a prescription drug, as estimated by the Tufts Center for the Study of Drug Development and published in the Journal of Health Economics in March 2016, is a massive 2.558 billion dollars [3]. Huge research costs and numerous failures at various stages of drug development have lowered the interest of commercial pharmaceutical companies in drug discovery research. The rapid increase in drug resistance among pathogens, together with the excessive time and cost required to develop a drug, demands a robust and faster method of drug discovery. This is where computational strategies come into play, efficiently assisting drug discovery and development alongside the available in vitro techniques [4].
Computer-aided drug design (CADD) approaches were applied in this study to find probable drug targets and discover potential lead candidates against them. SAM is a critical metabolite involved in several biochemical reactions in bacteria. MetK, a SAM producer, and various SAM utilizers including DNA adenine methylase (Dam), uroporphyrinogen-III methyltransferase (CobA), and tRNA (guanine-N(1)-)-methyltransferase (TrmD) were taken as drug targets in this study. Dam, which is involved in DNA replication and mRNA transcription, methylates adenine in bacterial DNA, in contrast to humans, where cytosine is methylated. TrmD ensures proper reading of codons by preventing +1 frameshift errors and is thus involved in proper peptide elongation. CobA is responsible for corrin ring contraction in the synthesis of vitamin B12, an important cofactor for membrane biosynthesis. Thus, genes/proteins involved in processes ranging from DNA replication to peptide elongation, and even membrane biosynthesis, were targeted to discover new lead candidates while simultaneously preventing easy resistance buildup in these targets.

Selection of MDR strains and obtaining their genomic sequences
Nine pathogens prioritized by WHO [5] as 'critical' and 'high', against which new antimicrobials are sought, were taken as the reference organisms. Their whole-genome sequences published in NCBI were used for whole-genome alignment and were downloaded from the NCBI FTP site in the annotated gbk format.
ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Bacteria/

Multiple sequence alignment (MSA)
MSA was performed using the progressive Mauve algorithm in MAUVE, a multiple sequence alignment software. The genomic regions common to all the aligned sequences were identified in MAUVE by visual inspection of the locally collinear blocks (LCBs), which are denoted by color codes. LCBs represent homologous regions of sequence shared by two or more of the genomes under study without rearrangement [6].

Gene essentiality analysis
The common genes obtained from the MAUVE alignment were checked for essentiality in DEG and OGEE, two databases of essential genes.

Obtaining the dockable crystal structures of the target proteins
The X-ray diffraction structures of S-adenosyl methionine synthase (MetK) from N. gonorrhoeae (PDB id: 5T8S) [13], CobA from P. aeruginosa (PDB id: 2YBQ) [14], and TrmD from P. aeruginosa (PDB id: 5WYQ) [15] were obtained from the Protein Data Bank. For targets whose crystal structures were not available in the RCSB PDB, homology modeling tools including Phyre2, RaptorX, ps2v2, SWISS-MODEL, and CPHmodel were used to predict their tertiary structures, and the best models were selected based on the completeness and Z-scores of the predicted structures as computed with the ProSA server.

Preparation of ligand database
In the present work, both ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS) were performed. LBVS was done because similar compounds exhibit similar physicochemical and biological properties, so a broad chemical database with structural diversity offers an ideal starting point for effective lead discovery. In this study, a ligand database containing 715 indole derivatives, including marine indoles [16], was prepared from the ZINC database [17].

Protein and ligand preparation
SBVS was performed based on MetK, the gene common to all nine pathogens, and on the metabolite it produces, SAM, which is further utilized in methylation reactions. Prior to molecular docking, the proteins and ligands were prepared to obtain efficient and more accurate docking results. Protein preparation was done by deleting water molecules, adding hydrogen atoms, merging non-polar hydrogens, and computing Gasteiger charges in AutoDock-Tools (http://mgltools.scripps.edu/). Similarly, ligand preparation was done in the Openbabel GUI [18] available in the PyRx interface by adding hydrogens, performing energy minimization, and converting the ligands to the pdbqt file format, which is the format required for the subsequent docking.

Setting reference values for docking
The native ligands were removed from each of the target proteins in Discovery Studio Visualizer 2017 and docked back in their binding sites, a process called re-docking. The binding energy thus calculated was taken as a reference value for identifying potential leads based on their binding energy in the respective binding pockets of the target proteins.
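The reference-value rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `passes_reference`, the ligand labels, and all energy values are hypothetical placeholders, and the sketch assumes binding energies are reported in kcal/mol with more negative values indicating stronger predicted binding.

```python
def passes_reference(ligand_energy: float, reference_energy: float) -> bool:
    """Return True when a ligand is predicted to bind more strongly than
    the re-docked native ligand.

    Both energies are assumed to be in kcal/mol; a more negative value
    means stronger predicted binding.
    """
    return ligand_energy < reference_energy


# Hypothetical values for illustration only (not taken from the study).
reference = -7.0  # binding energy of the re-docked native ligand
candidates = {"ligand_A": -8.2, "ligand_B": -6.1, "ligand_C": -7.0}

# Keep only the ligands that beat the reference value.
hits = sorted(name for name, energy in candidates.items()
              if passes_reference(energy, reference))
print(hits)
```

A strict inequality is used here, so a ligand that merely ties the native ligand is not counted as a potential lead; that choice is an assumption of the sketch.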

Binding sites prediction
The ligand-binding sites in the target proteins were identified from the RCSB Protein Data Bank. For the homology-modeled 3D structures, the ligand-binding sites were predicted with 3DLigandSite, a web server that superimposes the ligands bound to structures similar to the query and thereby predicts the binding site [19]. All the amino acid residues around the binding site (Supplementary Table 2) were marked to define a site for molecular docking.

Molecular docking, rescoring and clustering analysis of docked poses
Docking of the selected ligand database against the target proteins was carried out using AutoDock Vina in the virtual screening software PyRx. For each ligand, the conformation with the lowest docked energy was chosen, since the more negative the binding energy, the stronger the binding of the ligand to the target [20].
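As an illustration of the pose-selection rule above, the following Python sketch picks the lowest-energy pose for each ligand and ranks ligands by that value. It is a simplified example under stated assumptions: the helper names `best_pose_energy` and `rank_ligands` and all energies are hypothetical, and the pose energies are assumed to have already been parsed from the docking output into plain lists of numbers (kcal/mol).

```python
from typing import Dict, List, Tuple


def best_pose_energy(pose_energies: List[float]) -> float:
    """Return the lowest (most negative) binding energy among a ligand's poses."""
    if not pose_energies:
        raise ValueError("no poses supplied for this ligand")
    return min(pose_energies)


def rank_ligands(poses: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank ligands from strongest to weakest predicted binder,
    using each ligand's best pose."""
    best = {ligand: best_pose_energy(energies) for ligand, energies in poses.items()}
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical pose energies for illustration only.
poses = {
    "ligand_A": [-7.1, -8.4, -6.9],
    "ligand_B": [-9.0, -8.8],
    "ligand_C": [-5.5],
}
ranking = rank_ligands(poses)
print(ranking)
```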
The rescoring of docked poses was done by using the python implementation of NNScore 1.0 [21] in combination with a consensus of the top 24 scoring networks.
AuPosSOM (Automatic analysis of Poses using SOM) [22] was used for the clustering analysis and to differentiate active compounds from inactive ones. AuPosSOM default parameters were used. The tree was visualized using PhyloWidget [23].

Protein-ligand interaction visualization
The 2D and 3D protein-ligand interactions for the lead compounds were observed and analyzed using Discovery Studio Visualizer 2017 and LigPlot+.

ADME/Tox screening
The toxicity profiles and drug-likeness based on the binding energies were predicted using the OSIRIS program [24]. OSIRIS calculates various drug-relevant properties such as molecular weight, cLogP, cLogS, and drug-likeness, as well as toxicity risks such as mutagenicity, tumorigenicity, reproductive effects, and irritant effects, from the functional groups present in the structures of the lead molecules [25].

Drug target identification
The MAUVE result showed MetK as one of the probable therapeutic targets, and it was taken as a reference in our search for other therapeutic targets (Fig. 1). Manual curation of the alignment revealed twenty-four common genes with diverse roles found in most of the strains. Most of these genes (Supplementary Table 1) encoded ribosomal proteins (14 proteins); the rest were involved in ATP synthesis (3 proteins), DNA-directed RNA polymerase (2 proteins), chaperones (2 proteins), elongation factor (1 protein), protein translocation (1 protein), and thiol assimilation (1 protein). These were not pursued further because the computational resources required to work on them were lacking.

Amino acid alignment
The active binding sites in MetK were found to be conserved in all the pathogens under study upon amino acid alignment.

Gene essentiality
A search in DEG and OGEE for the essentiality of the genes under study in the target organisms revealed metK as essential in H. pylori, P. aeruginosa, S. aureus and S. typhi. Similarly, dam was found to be essential in S. enterica subsp. enterica serovar Typhimurium; cobA was reported as non-essential in P. aeruginosa; and trmD was essential in P. aeruginosa, S. aureus, and S. typhi but non-essential in H. pylori and Acinetobacter sp. However, these databases do not take the interrelation between genes into account when recording gene essentiality, so genes listed as non-essential here could still become essential when the activity of another is inhibited. Since our work was concerned with multiple targets simultaneously and all these genes are SAM utilizers, they were taken for further study despite being non-essential in some instances.

Ligand database preparation
The close structural similarity of the indole ring to the adenosyl moiety of SAM (Fig. 3) makes indole derivatives probable candidates against the SAM binding pocket of MetK. Thus, indole derivatives were presumed to be potential ligand sources for virtual screening. A total of 715 indole derivatives were taken from the ZINC database, among which only 102 showed drug-like properties based on ADMET parameters (Fig. 4) and were used for molecular docking studies.

Molecular docking results
The one hundred two ligands that passed the ADME/T tests were subjected to molecular docking against the MetK protein of N. gonorrhoeae (PDB id: 5T8S). Fifty-three exhibited higher binding energy in the SAM binding pocket of MetK than its native ligand SAM (Supplementary Table 3). The common amino acid residues involved for the three top ligands were Lys274, Gly120, Ile103, Ile307, Phe235, Lys168, Ser191, Asp121, and Asp243, which could have contributed to stronger binding affinities in the binding pocket of MetK (Fig. 5).
NNScore, a neural network based scoring function was then used to re-rank the small-molecule ligands which resulted in 12 positive hits (good binders) as potential inhibitors of MetK in N. gonorrhoeae (Supplementary Table 3). ZINC04899565 was still among the top binders.
We further used a contact activity relationship (CAR) analysis to overcome the limitations of the scoring functions used for docking. AuPosSOM analyzes all the docking poses in the multiple conformations given by the docking algorithm and discriminates active from inactive compounds using only the mean protein-contact footprints. The 12 ligands along with SAM were clustered into 10 different groups with varied scores. The score is determined by the combination of contact specificity and contact intensity of the ligands with various atoms of the protein molecule. The plot (Fig. 6) shows the ligands in leaves 0, 3, 4 and 5 as the most active ones. ZINC01494627 from cluster 0, ZINC14824027 and ZINC49171024 from cluster 3, ZINC04899565 from cluster 4 and ZINC15219763 from cluster 5 can thus be considered potential MetK inhibitors of N. gonorrhoeae.
CAR results showed that these 12 potential leads were distributed in 10 different clusters (0 to 9) with different protein-contact footprints, represented in the clustering tree (Fig. 7). The different clusters indicate differences in the residues that interact with the protein. Cluster 0 contains 1 compound along with SAM (native ligand), which interacts predominantly with A41, E56, Z99, K274, D166 and I237 residues. Cluster 5 contains 1 compound that interacts predominantly with P16, D166, I237 and G242 residues, and so on. The similarity of the binding residues of the ligands to those of SAM could have contributed to their inhibition potential against MetK. CAR analysis can thus be used to cluster the compounds according to the binding residues of the protein.
Further docking of those 5 screened ligands against the SAM binding pocket of MetK from all the other pathogens under study resulted in 3 potential leads (Table 1). These 3 lead molecules were then docked against the other protein targets CobA, Dam and TrmD (Tables 2, 3, 4) to assess their inhibition potential in multiple targets, resulting in 2 potential lead candidates (Table 5).
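The multi-target selection step described above amounts to keeping only the ligands that outperform the reference value in every target. The Python sketch below illustrates that logic under stated assumptions: the function name `multi_target_hits`, the ligand labels, and all energies and reference values are hypothetical and are not the values reported in the tables; energies are assumed to be in kcal/mol, with more negative meaning stronger binding.

```python
from typing import Dict, List


def multi_target_hits(energies: Dict[str, Dict[str, float]],
                      references: Dict[str, float]) -> List[str]:
    """Return the ligands whose binding energy is lower (more negative)
    than the reference value in every target protein."""
    per_target_hits = []
    for target, reference in references.items():
        docked = energies.get(target, {})
        # Ligands that beat the native-ligand reference for this target.
        per_target_hits.append({lig for lig, e in docked.items() if e < reference})
    if not per_target_hits:
        return []
    # A multi-target lead must appear in the hit set of every target.
    return sorted(set.intersection(*per_target_hits))


# Hypothetical docking results for illustration only.
references = {"MetK": -7.0, "CobA": -6.5, "Dam": -7.2, "TrmD": -6.8}
energies = {
    "MetK": {"lead_1": -8.1, "lead_2": -7.9, "lead_3": -7.4},
    "CobA": {"lead_1": -7.0, "lead_2": -6.9, "lead_3": -6.0},
    "Dam":  {"lead_1": -7.8, "lead_2": -7.5, "lead_3": -7.6},
    "TrmD": {"lead_1": -7.1, "lead_2": -7.3, "lead_3": -7.0},
}
leads = multi_target_hits(energies, references)
print(leads)
```

In this toy data set, the third ligand fails against one target and is therefore dropped, mirroring how a three-ligand shortlist can shrink to two multi-target candidates.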

Cross-reactivity with human homologs
Questions on cross-reactivity with human homologs of S-adenosylmethionine synthase (Uniprot id: Q00266) could be raised as it has more than 50% structural similarity with that of bacteria (Table 6). Unfortunately, both leads were potential inhibitors of its human homolog as well (Supplementary Table 4).
Nevertheless, compounds that also inhibit human MetK could still be used as antibacterial therapeutics, because SAM transporters are present in humans [26] and the SAM requirements can therefore be replenished from external sources. The lack of crystal structures of human SAM transporters prevented molecular docking studies of possible inhibition of the transport system.
Human homologs of the other target proteins were not considered because of their dissimilarity to the bacterial proteins.

Drug target identification
Lead molecules with multiple target proteins in a single pathogen and with a common target in multiple pathogens are highly sought [27] for developing a broad-spectrum drug. This strategy is designed to prevent easy resistance development against these new drugs. Developing resistance in multiple targets at once could be evolutionarily challenging for any pathogen, making it probably impossible for the bacteria to survive such drugs. Hence, genome-level sequence alignment of major pathogens could reveal new common lead target proteins for the screening of lead inhibitor molecules, MetK being the common target in this study (Fig. 1). From this, the respective metabolic pathway or other target proteins could be identified based on protein-protein interactions.
MetK catalyzes the formation of SAM from ATP and methionine as substrates. SAM is utilized by three major metabolic pathways, namely transmethylation, transsulfuration, and polyamine synthesis, making SAM an important molecule in normal cell functioning and survival [28]. In addition, SAM is a primary methyl donor in multiple reactions including corrin ring methylation [29], RNA methylation [30], and DNA methylation [31]; thus, these steps were also taken as lead targets.
Quorum sensing, one of the major causes of resistance in pathogens, relies on autoinducers whose synthesis in turn uses SAM as a substrate [32]. Quorum sensing also controls biofilm formation and virulence in bacteria [33]. The literature search further verified the essentiality of MetK in many different pathogens [34,35]. Also, the lack of reports of SAM transporters in any of the mentioned target pathogens makes MetK a better target. So, in addition to MetK, the other SAM-utilizing proteins CobA, Dam, and TrmD were taken as potential drug targets against which virtual screening of compounds was done.

Indoles as potential drugs
The similarity between the adenosyl moiety of SAM and the ATP bound by the ATP-binding domain of kinases [36] suggests that kinase inhibitors are potential inhibitors of the proteins that biosynthesize or utilize SAM. Protein kinase inhibitors represent an important class of targeted therapeutic agents, particularly as anticancer drugs [37]. Several indoles have been reported to possess kinase inhibition potential [38-41]. Also, indole is reported to be   [42] and the source of indole could be tryptophan metabolism by gut microflora. This indicates that indole could be easily transported in humans through the gut, suggesting that indoles are metabolized in humans and are therefore unlikely to pose toxicity [43]. In addition, indole has been suggested as a pharmaceutical scaffold for drug development [44]. Also, its metabolized derivative indirubin has been reported to be a kinase inhibitor [45].

ADMET screening
The primary reasons why lead molecules fail clinical trials are their inability to reach the target and perform their predicted function, and toxicity issues [46]. Thus, evaluating ADMET and pharmacokinetic properties in the early stages of drug discovery is the wiser choice. The toxicity profiles and drug-likeness were predicted by OSIRIS using various parameters. Physicochemical and drug-relevant properties such as molecular weight, cLogP, cLogS, number of hydrogen bond donors and acceptors, topological polar surface area, number of rotatable bonds, and drug-likeness were analyzed for each of the lead molecules. The parameters (Fig. 4) were set based on Lipinski's rule of five, which predicts whether a chemical compound has the chemical and physical properties that would make it likely to be an orally active drug in humans. The rule states that most orally active drugs have molecular weight ≤ 500, logP ≤ 5, hydrogen bond donors ≤ 5, and hydrogen bond acceptors ≤ 10. Also, the aqueous solubility of a compound, measured as logS, significantly affects its absorption and distribution characteristics. Low solubility usually goes along with poor absorption, so poorly soluble compounds should be avoided. The number of rotatable bonds determines the flexibility of a compound and can predict oral bioavailability. It has been reported that 10 or fewer rotatable bonds in a molecule indicate good oral bioavailability [47]. The polar surface area (PSA) of a molecule can predict membrane permeability, including crossing of the blood-brain barrier. Most known drugs have PSA values of less than 120 Å² [48].
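The thresholds quoted above can be expressed as a simple filter. The Python sketch below is a minimal illustration and not the OSIRIS implementation: the `Descriptors` class, the function names, and the descriptor values are hypothetical, and the descriptors are assumed to have been computed beforehand by a separate tool. Requiring zero violations of the four Lipinski criteria is a strict choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors for one compound (hypothetical container)."""
    mol_weight: float      # g/mol
    clogp: float           # calculated logP
    h_donors: int          # hydrogen bond donors
    h_acceptors: int       # hydrogen bond acceptors
    rotatable_bonds: int
    psa: float             # polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight <= 500,
        d.clogp <= 5,
        d.h_donors <= 5,
        d.h_acceptors <= 10,
    ]
    return sum(not ok for ok in checks)


def is_drug_like(d: Descriptors) -> bool:
    """Apply the rule of five plus the rotatable-bond and PSA limits from the text."""
    return (lipinski_violations(d) == 0
            and d.rotatable_bonds <= 10
            and d.psa < 120)


# Hypothetical descriptor values for illustration only.
small = Descriptors(mol_weight=349.4, clogp=1.5, h_donors=4,
                    h_acceptors=3, rotatable_bonds=4, psa=94.2)
bulky = Descriptors(mol_weight=650.0, clogp=6.2, h_donors=6,
                    h_acceptors=12, rotatable_bonds=14, psa=160.0)
print(is_drug_like(small), is_drug_like(bulky))
```

Solubility (logS) is left out of the filter because the text gives no numeric cutoff for it.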

Lead candidates
ADMET evaluation of these two leads (Fig. 8) was done to assess their potential as drug candidates. Both met the parameters for drug-likeness, including Lipinski's rule of five (Table 5). Apart from indole, ZINC04899565 has a benzene ring and a 2,5-diketopiperazine ring. Among the naturally occurring peptide antibiotics, those containing 2,5-diketopiperazine rings are among the most numerous. Examples include cycloserine diketopiperazine, active against Mycobacterium tuberculosis; bicyclomycin, active against gram-negative bacteria; and avrainvillamide, which contains a 3-alkylidene-3H-indole-1-oxide function and is active even against multidrug-resistant bacteria [49]. Moreover, 2,5-diketopiperazines have also been reported to inhibit quorum sensing in certain gram-negative bacteria, thus blocking cell-to-cell communication and restraining virulence as well [50]. Structurally, the 2,5-diketopiperazines are the smallest possible cyclic peptides and are peptidomimetic in nature, resembling a constrained protein beta turn. They have two cis-amide bonds and thus possess two H-bond acceptors and two H-bond donors. Although they contain conformationally constrained heterocyclic scaffolds, they are flexible, since the six-membered ring can exist in either a flat or a slightly puckered boat conformation. Moreover, they are stable to proteolysis [49]. All these features enable them to bind to a wide range of enzymes and receptors, and their good bioavailability and resistance to enzymatic degradation make them excellent drug candidates. Thus, ZINC04899565 has the potential to be a broad-spectrum antimicrobial based on this study and could be pursued further. ZINC49171024 is an indole with a pyrrolidine and a benzimidazole ring. Compounds containing a pyrrolidine moiety have been reported as antimicrobials and antifungals [51]. The strong lipophilic properties of benzimidazoles contribute to their antimicrobial effects [52].
All these features indicate the high probability of these two compounds to be developed as broad-spectrum antimicrobials.

Conclusion
CADD approaches were used in this study to discover the potential methyltransferase inhibitory activities of indole derivatives. Molecular docking against multiple protein targets yielded ZINC04899565 and ZINC49171024 as probable therapeutic drug candidates with multi-target potential. They are also probable antimicrobial resistance managers: because these targets are critical for survival, pathogens would find it difficult to evolve resistance in all of them at once, and mutating these targets could be lethal rather than protective.