Computational analysis of protein-protein interfaces involving an alpha helix: insights for terphenyl–like molecules binding

Background Protein-Protein Interactions (PPIs) are key for many cellular processes. The characterization of PPI interfaces and the prediction of putative ligand binding sites and hot spot residues are essential to design efficient small-molecule modulators of PPIs. Terphenyl and its derivatives are small organic molecules known to mimic one face of protein-binding alpha-helical peptides. In this work we focus on several PPIs mediated by alpha-helical peptides. Method We performed computational sequence- and structure-based analyses in order to evaluate several key physicochemical and surface properties of proteins known to interact with alpha-helical peptides and/or terphenyl and its derivatives. Results Sequence-based analysis revealed low sequence identity between some of the analyzed proteins binding alpha-helical peptides. Structure-based analysis was performed to calculate the volume, the roughness (fractal dimension) and the hydrophobicity of the binding regions. Besides the overall hydrophobic character of the binding pockets, some specificities were detected. We showed that the hydrophobicity is not uniformly distributed in different alpha-helix binding pockets, an observation that can help to identify key hydrophobic hot spots. Conclusions The presence of hydrophobic cavities at the protein surface with a more complex shape than the entire protein surface seems to be an important property related to the ability of proteins to bind alpha-helical peptides and low molecular weight mimetics. Characterization of similarities and specificities of PPI binding sites can be helpful for further development of small molecules targeting alpha-helix binding proteins.


Background
Protein-Protein Interactions (PPIs) are key to many cellular processes. Abnormal PPIs contribute to many disease states and, as such, PPIs today represent a new class of drug targets that remains essentially unexploited for drug discovery. Indeed, the size of the human interactome has been estimated to be between 300,000 [1] and 650,000 interactions [2]. In the last decade many studies have been performed in order to target PPIs [3]. Several small-molecule inhibitors of PPIs have demonstrated therapeutic potential [4][5][6][7][8]. However, efficient targeting of PPIs is still considered an important challenge [3,9,10]. In contrast to enzyme-substrate interactions, protein-protein recognition often occurs through flat surfaces or wide shallow grooves. Recent structural analyses of PPI interfaces and small molecules disrupting PPIs suggested that such ligands might mimic the structural characteristics of the protein partner [6,11]. To facilitate the discovery of new PPI small-molecule inhibitors, the characterization of PPI interfaces [12,13] and the prediction of putative ligand binding sites are essential. Physicochemical properties of both ligand and protein, such as cavity size, shape complementarity, electrostatic potential and hydrophobicity [12,15], are key to mediating the binding [14].
The role of alpha-helical peptides in mediating many PPIs is well demonstrated, and the development of small organic molecules mimicking such peptides has become important [16]. Recent studies have been carried out on the whole Protein Data Bank (PDB) in order to establish a druggability profile of alpha-helix mediated PPIs and to predict which of them could bind a small molecule [17]. More specifically, terphenyl and its derivatives are small organic molecules [18][19][20][21][22][23][24][25][26] mimicking one face of an alpha-helical peptide, i.e. the side chains of three key residues occupying positions i, i+3 and i+7 [25,26] or i, i+4 and i+7 [20] of the bound helix. It has been suggested that terphenyl compounds can serve as pharmacological probes because they are membrane permeable [22]. Terphenyl 1 and 2, which mimic the calmodulin binding face of smooth muscle myosin light chain kinase (smMLCK), have been shown to inhibit the interactions of calmodulin (CaM) with the enzyme 3'-5'-cyclic nucleotide phosphodiesterase (PDE) and with the helical peptide C20W of the plasma membrane calcium pumps [18]. Following the similarity between the calmodulin and human centrin 2 (HsCen2) alpha-helix binding sites, we recently suggested that terphenyl 2 might also inhibit the interaction between HsCen2 and a 17-residue peptide of the Xeroderma Pigmentosum Group C (XPC) protein [27]. Terphenyl derivatives mimicking the alpha-helical structure of the p53 N-terminal peptide inhibit the p53-MDM2 [22] and the p53-HDM2 interactions [21]. These molecules also mimic the alpha-helical region of the Bak BH3 domain, which binds BCL-XL, thus disrupting the BCL-XL/Bak interaction [19,20,24].
In this work we performed a computational analysis in order to evaluate several key physicochemical and surface properties of proteins known to interact with alpha-helical peptides or to bind terphenyl and its derivatives. We calculated the binding pocket volumes and the fractal dimensions of the surface for the entire protein and for the binding pockets. We identified several similarities and specificities characterizing such protein binding sites that can be helpful for future development of more efficient small-molecule inhibitors targeting alpha-helix binding proteins.

Methods
In this study we compared the sequence and surface properties of the investigated proteins. To analyze the sequence similarities we performed sequence alignment using the CLUSTALW software [28]. Interacting residues at the protein-protein interface were identified on the basis of contact distances using the freely available online tool ContPro [29]. We identified the protein residues interacting with the three key residues of the alpha-helical peptide (occupying positions i, i+3 and i+7 or i, i+4 and i+7) whose relative positions are mimicked by terphenyl and its derivatives. The distance threshold was set to 5 Å for the side chain atoms. In order to evaluate the protein surface properties, the bound peptide was removed from each complex.
Figure 1 Sequence alignment of alpha-helix binding proteins. The amino acid residues interacting with alpha-helical peptides are presented in red. The binding area was defined here as all residues of the protein interacting with the helical peptide.
The surface characteristics of the entire protein and those of the peptide-binding cavity were analyzed. Using the approach of fractal geometry we quantitatively described the surface roughness of the entire protein and of the binding cavity, expressed by the global surface fractal dimension (D_S) and the local surface fractal dimension (D_L), respectively. To calculate the surface fractal dimension we used the method proposed by Lewis and Rees [30], based on the scaling law between the surface area (SA) and the radius (R) of the probe molecule rolled over the surface, i.e. SA is proportional to the radius to the power 2-D_S:
$$SA \propto R^{2 - D_S}$$
The surface fractal dimension was determined from the slope of the double logarithmic plot of SA versus R, the slope being equal to 2-D_S. The surface area of the protein was computed using the online software GETAREA [31]. Probe radii of 1, 1.2, 1.4, 1.6, 1.8 and 2 Å were used. For the protein cavities, the same algorithm was employed using the CASTp software [32].
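As an illustration of the slope-based estimate described above, the following minimal sketch fits a straight line to log(SA) versus log(R) and returns 2 minus the slope. It is not the GETAREA or CASTp workflow itself: the function name is hypothetical, and the surface areas are synthetic values generated from an assumed fractal dimension of 2.2 rather than data from this study.

```python
import math


def surface_fractal_dimension(probe_radii, surface_areas):
    """Estimate D from SA ~ R^(2 - D) by least squares on the log-log data."""
    xs = [math.log(r) for r in probe_radii]
    ys = [math.log(a) for a in surface_areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of log(SA) against log(R).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The scaling law gives slope = 2 - D, hence D = 2 - slope.
    return 2.0 - slope


# Probe radii (in angstroms) as listed in the Methods.
probe_radii = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

# Synthetic surface areas (assumption): generated with D = 2.2 and an
# arbitrary prefactor, only to check that the fit recovers the input value.
assumed_d = 2.2
surface_areas = [8000.0 * r ** (2.0 - assumed_d) for r in probe_radii]

d_est = surface_fractal_dimension(probe_radii, surface_areas)
print(round(d_est, 3))
```

With real data, the surface areas would be those computed at each probe radius for the whole protein (giving D_S) or for the binding cavity (giving D_L).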
Hydrophobicity and local hydrophobic density for binding pockets were determined using Fpocket [33]. Pocket volumes were computed using CASTp [32]. Molecular docking of terphenyl 2 into the alpha-helix binding sites of calmodulin (PDB entry code 2O5G) and troponin C (PDB entry code 1A2X) was performed using AutoDock 4.2 [34]. Input file preparation and docking analysis were carried out using AutoDockTools. Grid maps were centered in the alpha-helix binding site for both structures. Grid sizes were 126 Å x 126 Å x 126 Å with a grid spacing of 0.33 Å for calmodulin and 126 Å x 126 Å x 126 Å with a grid spacing of 0.28 Å for troponin C. Ligand conformational searching was performed using the Lamarckian genetic algorithm and all ligand torsion angles were flexible. The following docking parameters were used: 250 Lamarckian genetic algorithm runs, a population size of 250, a maximum of 2,500,000 energy evaluations and a maximum of 27,000 generations.
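For readers who wish to reproduce the search settings, the genetic algorithm parameters listed above map onto four standard keywords of the AutoDock 4 docking parameter file (DPF). The sketch below only formats those four lines; the helper name is hypothetical, and a complete DPF requires further entries (maps, ligand file, search and output settings) that are not given here and are therefore omitted.

```python
# Genetic algorithm settings reported in the Methods, keyed by the
# corresponding AutoDock 4 DPF keywords.
ga_settings = {
    "ga_run": 250,                # number of Lamarckian GA runs
    "ga_pop_size": 250,           # population size
    "ga_num_evals": 2500000,      # maximum number of energy evaluations
    "ga_num_generations": 27000,  # maximum number of generations
}


def dpf_fragment(settings):
    """Format keyword/value pairs as DPF lines (a fragment, not a full file)."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


print(dpf_fragment(ga_settings))
```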

Results and discussions
Sequence-based analysis
We analyze several proteins interacting with alpha-helical peptides, some of them being known to bind also terphenyl and/or its derivatives. To characterize and compare their surface properties we examine the sequences and the three dimensional (3D) structures of the complexes formed by the protein and the bound peptide. The 3D structures are retrieved from the PDB [37], the entry codes being presented in Table 1. Most of the structures are crystallographic. Two NMR structures are also used: the C-terminal domain of human centrin 2 in complex with the repeat sequence of human Sfi1 and the human BCL-XL in complex with the BAK peptide.
Multiple sequence alignment (Figure 1) shows low sequence identity for most of the analyzed proteins (Table 2), both for the entire sequences and for the binding areas. The binding areas include all residues of the protein interacting with the alpha-helical peptide. Chicken, human, E. coli and rat calmodulin have very similar sequences (rat, chicken and human calmodulin are 100% identical; the E. coli sequence has 98% identity with the others). For BCL-XL and human E3 ubiquitin-protein ligase MDM2 only those fragments of the sequences that are present in the 3D structures are considered. High similarity is found only between the calmodulin, centrin 2 and troponin C sequences. The hydrophobic residues of the bound peptide anchor into the protein binding pocket. Following the sequence similarities we superimposed the structures of the alpha-helix binding regions of calmodulin, human centrin 2, Scherffelia dubia centrin and rabbit troponin C (Figure 3a). Strong structural homology of the binding regions is seen, in line with the sequence similarity of these proteins. Figures 3b and 3c illustrate the binding pockets of BCL-XL and human E3 ubiquitin-protein ligase MDM2, respectively.
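The pairwise identity values discussed in this subsection can be computed from two rows of a multiple alignment. The sketch below is one common convention, assumed here for illustration: identical residues divided by the number of aligned columns in which neither sequence has a gap. CLUSTALW and other tools may use a different denominator, and the two aligned strings are toy sequences, not those of the proteins studied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical): 7 gap-free columns, 6 identical.
toy_a = "MKT-LLVA"
toy_b = "MKTALLIA"
print(round(percent_identity(toy_a, toy_b), 1))
```

Restricting the two strings to the alignment columns of the binding area gives the corresponding identity for the binding regions only.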

Structure-based analysis
The interacting residues of the proteins and bound peptides, identified with ContPro [29], are shown in Figures 1 and 4 and Table 1. The results reveal that hydrophobic residues such as TRP, LEU, ILE, PHE, VAL and MET are usually involved in the interactions. The presence of hydrophobic residues suggests a favorable interaction with terphenyl-like molecules anchoring in the hydrophobic cavities. Most of the residues involved in the interactions between the proteins and alpha-helices are hydrophobic for both partners, as also observed in other studies [38]. We notice several key residues involved in the interaction of the same protein with different peptide partners. For example, in the case of calmodulin, PHE92, MET124, PHE141, MET144 and MET145 are involved in the interactions with most of the peptides. These residues can thus be considered as key for the interaction with terphenyl and its derivatives, or other alpha-helix mimetics. We noticed the presence of MET residues in most of the alpha-helix binding pockets analyzed here. In a recent study, MET residues were not identified among the hot spot amino acids, in particular in alpha-helix mediated protein interfaces [39]. However, our analysis clearly indicates their presence in positions that are key for the interaction with the alpha-helical partner. Furthermore, Ma and Nussinov [40] have also concluded that the amino acids TRP, MET and PHE are important for protein-protein interactions. They showed that TRP/MET/PHE residues play roles in the dimerization of the reverse transcriptase (p51/p66) and in cell-fusion processes, including the gp120-CD4 interaction and the gp41 six-helix bundle formation. They suggested that the polarizability of MET allows it to assume the roles of both hydrophobic and hydrophilic residues [40]. Further, its greater flexibility compared to other hydrophobic residues may facilitate the plasticity of hydrophobic binding pockets, allowing them to accommodate different ligands [27].
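The distance criterion behind the identification of interacting residues (atoms within 5 Å of the key peptide residues) can be sketched as follows. This is a simplified illustration and not the ContPro implementation: the function name, the residue labels and the coordinates are all hypothetical, and parsing of a PDB file and selection of side chain atoms are left out.

```python
import math


def contact_residues(protein_atoms, peptide_atoms, cutoff=5.0):
    """Return protein residue labels having an atom within `cutoff` angstroms
    of any peptide atom.

    protein_atoms: list of (residue_label, (x, y, z)) tuples
    peptide_atoms: list of (x, y, z) tuples for the key peptide residues
    """
    contacts = set()
    for residue_label, atom_xyz in protein_atoms:
        for peptide_xyz in peptide_atoms:
            if math.dist(atom_xyz, peptide_xyz) <= cutoff:
                contacts.add(residue_label)
                break  # one close peptide atom is enough for this protein atom
    return sorted(contacts)


# Toy coordinates (hypothetical, in angstroms), not taken from any structure.
toy_protein_atoms = [
    ("PHE10", (0.0, 0.0, 0.0)),
    ("MET20", (2.0, 0.0, 0.0)),
    ("GLU30", (20.0, 0.0, 0.0)),
]
toy_peptide_atoms = [(0.0, 4.0, 0.0)]

print(contact_residues(toy_protein_atoms, toy_peptide_atoms))
```

Running the same function once per peptide partner and intersecting the results is one way to list residues, such as those noted for calmodulin, that recur across different complexes.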
We used Fpocket [33] and CASTp [32] to calculate geometrical and physicochemical characteristics of the binding pockets, taking into account the protein residues interacting with the alpha-helical peptides. The overall hydrophobic character of the binding pockets is again clearly identified. Yet some specificity is also observed: several pockets show a high hydrophobicity score but a low local hydrophobic density, or vice versa, demonstrating that the hydrophobic patches are not always regularly distributed in the binding pockets. For example, 1YCR and 3KF9 have similar hydrophobicity scores but high and low calculated hydrophobic density, respectively. The differences in the hydrophobicity distribution are illustrated in Figure 5.
The volumes of the detected pockets in the peptide-binding regions computed with CASTp are given in Table 3. The average volume of the sub-cavities present at the PPI interfaces found by Fuller et al. [41] was ~60 Å³. Sonavane & Chakrabarti [42] found PPI pocket volumes to be up to ~330 Å³. The volumes we found are similar to those reported by Bourgeas et al. [43]. Taking into account the various algorithms and different concepts for binding pocket definition, such differences in the computed volumes can be expected. Several small cavities are present in the binding region (seen in Figure 2 and Figure 5), as previously observed for other targeted PPI interfaces [39]. For the proteins studied here, the presence of several small hydrophobic cavities in the alpha-helix binding region seems to be a typical surface feature guiding the anchoring of hydrophobic residues from the peptide side. Such characteristics can also facilitate the targeting of PPIs mediated by alpha-helices with small molecules containing hydrophobic anchors (such as terphenyl or other mimetics).
Further, we decided to explore the roughness of the alpha-helix binding sites. The methodology implemented to calculate the fractal surface dimensions, used for the roughness evaluation, is illustrated in Figure 6 for the global surface roughness of chicken calmodulin. The global surface fractal dimension and the local surface fractal dimension for the binding site of chicken calmodulin are calculated to be D_S = 2.238 ± 0.006 and D_L = 2.616 ± 0.072, respectively. The global and local fractal dimensions for the other proteins are given in Table 4. Our results and other previously published data [44][45][46][47] suggest that the global fractal dimension of the protein surface is about 2. The local surface fractal dimensions for the binding cavities are computed to be larger than the global surface fractal dimensions for all studied proteins. This reflects the higher roughness and more complex shape of the binding sites, which can be considered important for ligand binding. The largest differences between D_S and D_L are obtained for human calmodulin (2VAY), centrin (3KF9, 2K2I), BCL-XL (1BXL), MDM2 (1YCR) and troponin C (1A2X). It has been experimentally demonstrated that human calmodulin [18], BCL-XL [19,20] and MDM2 [21,22] interact with terphenyl or its derivatives. Recently, we suggested a possible binding of terphenyl 2, which mimics the relative positions of the side chains of residues TRP848, LEU851 and LEU855 of the XPC peptide, into human centrin 2 following our energetic and conformational flexibility analysis performed for the alpha-helical peptide-binding pocket of centrin 2 [27]. The D_L value for the peptide-binding site of troponin C indicates a rougher surface than that of the entire protein, similarly to the terphenyl-binding proteins listed above.
Figure 6 Double logarithmic plot of the surface area versus probe radii for chicken calmodulin (PDB code 2O5G).
Taking into consideration the sequence and structural homology of troponin C and calmodulin and the other physicochemical similarities of the binding sites discussed above, we decided to probe putative terphenyl binding into troponin C. We performed docking of terphenyl 2 into the peptide-binding sites of calmodulin and troponin C using AutoDock. The best-scored docking poses are shown in Figure 7. The terphenyl orientations in the best-scored poses correspond to the position of the bound alpha-helical peptides shown in Figure 2. The predicted interaction energies of -7.98 and -8.18 kcal/mol for terphenyl binding in calmodulin and troponin C, respectively, suggest favorable interactions with the two proteins.
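To give a rough sense of scale for such predicted energies, a binding free energy can be converted into an equilibrium dissociation constant through the standard thermodynamic relation ΔG = RT ln(K_d). The sketch below applies this relation under stated assumptions: a temperature of 298.15 K, a function name introduced here for illustration, and the predicted interaction energy treated as if it were a binding free energy. Docking scores are approximate, so the result is an order-of-magnitude indication only and not a measured or reported affinity.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 0.0019872


def estimated_dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) into K_d (mol/L)
    using delta_G = R * T * ln(K_d)."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Predicted interaction energies quoted above (kcal/mol).
for label, energy in [("calmodulin", -7.98), ("troponin C", -8.18)]:
    kd = estimated_dissociation_constant(energy)
    print(f"{label}: K_d on the order of {kd:.1e} M (estimate only)")
```

The more negative energy corresponds to the smaller estimated K_d, but a difference of 0.2 kcal/mol is well within the uncertainty usually attributed to docking scoring functions.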
In the light of the results obtained here, it is interesting to discuss the physicochemical properties of known PPI modulators, such as terphenyl. In a previous work [10] we gathered a set of 66 PPI inhibitors, among which some terphenyl derivatives and other inhibitors of alpha-helix mediated PPIs were present. In that work we demonstrated the more hydrophobic character of these compounds as well as their larger size. Interestingly, we also showed the importance of a critical number of aromatic bonds and of some specific molecular shapes (T-shaped, star-shaped, or L-shaped compounds), some of which correspond to terphenyl derivatives. The present work therefore supports the view that such properties on the ligand side are cavity-driven, and that these small molecules must possess certain properties in order to efficiently modulate an alpha-helix mediated PPI and to mimic the native partner and its properties.

Conclusions
Modulating protein-protein interactions using small molecules based on surface recognition has been a field of increasing interest during the last decade. PPI interfaces are very complex and need to be analyzed in order to be efficiently targeted for drug discovery purposes. Designed compounds must bind with high affinity and selectivity to the target protein. The low sequence identity found between some of the analyzed proteins suggests that there are no sequence requirements for the ability of proteins to bind alpha-helical peptides and, consequently, small-molecule mimetics.
From the structural point of view, all investigated proteins show larger surface fractal dimensions for the peptide-binding pockets than for the entire protein surface, reflecting the higher complexity of the shape of the binding sites. Also, the presence of several hydrophobic patches at the protein surface seems to be an important property related to the ability of the protein to bind alpha-helical peptides and mimetics. Furthermore, we showed that hydrophobicity is not uniformly distributed across different alpha-helix binding pockets and that its distribution can be used to identify hydrophobic hot spots.
Many similarities between the binding sites studied here are observed, and the binding of terphenyl or its derivatives to various alpha-helix binding proteins can be suggested. However, targeting various PPI complexes with similar small molecules can raise selectivity problems in the context of drug discovery or chemical biology projects. Thus, the specificities found here for different binding sites, e.g. key residues, roughness and local hydrophobic density, can be further exploited to optimize terphenyl-like ligands in order to improve their selectivity.