General Description:
1. Creation of an ‘indel-based’ bioinformatics discovery platform
2. Generation of algorithms for the analysis of ‘host-mimicry’ among pathogen proteins
3. Development of ‘in silico’ models capable of ranking chemical substances for their potential antimicrobial properties
4. Successful in silico discovery of novel non-steroidal ligands for human sex hormone binding globulin (SHBG)
5. Can ‘Bacterial-Metabolite-Likeness’ Model Improve Odds of ‘In silico’ Antibiotic Discovery?
The dilemma of antibiotic resistance is that new antibiotics are traditionally selected for their activity against proteins that are specific to the pathogen. As a result, they exert selective pressures on the pathogen similar to those of existing drugs, which leads to resistance to the new medications. This conventional approach is aimed at minimizing the risk of developing a lead compound that will display toxic effects: the theory is that targeting proteins that are homologous between host and pathogen should be avoided because serious toxic effects would be likely. However, it is highly conserved proteins that perform the everyday, essential housekeeping functions that keep cells in all species alive and functioning. These proteins often act as critical, busy ‘hubs’ of protein-protein interactions where numerous intracellular processes are coordinated. As a result, these essential proteins are often more resistant to change through mutation, since any compromise of function would be lethal to the pathogen. Thus, it is possible that compounds that selectively target these conserved ‘hub’ proteins may be potent antibiotics from which the pathogen cannot escape via mutation. Recent studies from my laboratory have resulted in an important discovery towards a strategy for targeting ‘hub’ proteins. Namely, we have found that highly homologous and essential pathogen proteins may contain sizable insertions/deletions (indels) when compared to their human homologue(s), and that these indels provide a strategy for selective targeting [7, 8]. As a motivating example, consider the case of elongation factor-1α (EF-1α) from Leishmania donovani. It has long been established that the bulk of the genetic difference between Leishmania and human EF-1α is accounted for by a 12 amino acid insert present in the human protein and absent from the Leishmania sequence [8, 9] (Figure 2).
We have now found that a 12 amino acid deletion (compared to the human homologue) in the Leishmania donovani EF-1α may account for unexpected virulence properties of this pathogen protein [7]. Leishmania are obligate intracellular protozoa that infect and replicate exclusively within macrophages of their mammalian hosts. It has been established that Leishmania EF-1α binds to and activates the host protein tyrosine phosphatase SHP-1, which leads to macrophage deactivation [9]. This observation was somewhat surprising considering that EF-1α serves as an essential “housekeeping” enzyme for the translation of proteins from mRNA. The potential virulence properties of Leishmania EF-1α were even more surprising considering the high (82%) sequence identity between this protein and its human counterpart. Our findings suggest that nothing more than the 12 amino acid deletion in the sequence might explain the acquisition of special virulence properties by an otherwise highly conserved housekeeping protein. Importantly, the presence of such a sizable indel suggests that there is sufficient spatial difference between the three-dimensional structures of human and Leishmania EF-1α to be exploited for selective targeting of the pathogen protein. We subsequently performed ‘in silico’ protein modeling and found that the 12 amino acid insertion in human EF-1α corresponds to a surface-exposed hairpin fold, which is absent from the Leishmania protein (Figure 2). Thus, there is a significant ‘patch’ of differentially solvent-exposed amino acids between the otherwise highly conserved Leishmania and human EF-1α proteins. When antibodies were raised against the exposed patch on the Leishmania protein (the corresponding area in the human homologue being shielded by the hairpin), they demonstrated selective binding to pathogen EF-1α with no cross-reactivity to human EF-1α or any other human protein (Figure 3).
Moreover, when included in a Leishmania in vitro protein translation system, the antibodies reduced protein synthesis by ~50%, further validating the indel-differentiated region of EF-1α as an antimicrobial target [paper in preparation]. In addition, these antibodies may be useful as a simple diagnostic where microscopy is the current gold standard [10]. Further bioinformatics analysis led to the finding that the deletion in Leishmania EF-1α was not an isolated example. We found that EF-1α proteins from seven other major protozoan pathogens, including Giardia lamblia, Trypanosoma brucei brucei, Entamoeba histolytica, Cryptosporidium parvum, Plasmodium knowlesi, Plasmodium falciparum and Leishmania braziliensis, all share this same indel [9] (Figure 2). In a recent study conducted on a limited set of SWISS-PROT proteins from 136 bacterial and protozoan genomes, we also established that insertions and deletions occur in approximately 5-10% of proteins with close human homologues [11]. Furthermore, proteins from protozoan pathogens such as Trypanosoma cruzi, Plasmodium falciparum and Leishmania donovani exhibit elevated indel content of up to 25% (Figures 4 and 5) [11]. These data suggest that sequence indels may be involved in the evolution of pathogenic mechanisms in various protozoan species. Of note, our discoveries around Leishmania EF-1α have established a proof-of-principle for a novel drug-target discovery platform that can be applied to a broad range of pathogens, including antibiotic-resistant organisms. For example, we have used this approach to identify a 24 amino acid insertion in the sequence of the human heat-shock protein HSP70, which is the only close (~68%) homologue of the DnaK proteins from Staphylococcus aureus and Streptococcus pneumoniae. The superimposed structures of the DnaK protein from S. aureus and human HSP70 are presented in Figure 6.
Figure 2. Superimposed models of EF-1α proteins from human (red) and Leishmania donovani (green).
The hairpin fold corresponding to the indel is marked in red. The indel-containing alignment of human and protozoan
EF-1α proteins is displayed on the lower panel. Additional panels feature images of Leishmania.
Research Plan
Specific Aim 1: to identify essential pathogen proteins which traditionally have not been considered as suitable drug targets due to their overall high similarity to human proteins, but which nonetheless possess potentially ‘targetable’ specific sites on their surfaces.
Hypothesis 1.1: Essential pathogen proteins may contain sequence insertions or deletions (compared to their host homologues) conferring structural differences sufficient to serve as sites for selective antimicrobial targeting.
Overview and rationale
The primary objective of this part of the proposal is to identify essential proteins in 13 selected human pathogens (Staphylococcus aureus strains Mu50, MW2, N315m, MSSA 476, Streptococcus pneumoniae strains A6 and TIGR4, Enterococcus faecalis strain V583, Mycobacterium tuberculosis strains CDC1551, H37Rv, Pseudomonas aeruginosa, Acinetobacter spp., Escherichia coli O157 and Shigella flexneri A2) that are critical for their fitness and survival and which also show sizable indels compared to their human homologues.
Figure 3. Selective immunoprecipitation of leishmania EF-1α under native conditions. Stationary phase leishmania
promastigotes and exponentially growing cells of the human promonocytic cell line THP-1 were washed with Hank’s balanced
salt solution and then lysed separately in buffer containing a cocktail of protease and phosphatase inhibitors. Lysates
were incubated with either rabbit anti-peptide EF-1α antibodies (lanes 3 and 4) or normal rabbit serum for
immunoprecipitation (lanes 1 and 2). Immune complexes were separated by SDS-PAGE followed by transfer to nitrocellulose
and probed with mouse monoclonal anti-EF-1α. Lanes 1 and 3: human cell lysates; lanes 2 and 4: leishmania lysates. The
data shown are from two independent experiments that yielded similar results.
The proposed research will facilitate the identification of a dataset of novel drug targets associated with a lower risk of developing drug resistance than historical drug targets in a diverse group of major human pathogens. Quantitative data on indel occurrence amongst otherwise highly similar proteins will bring new insight into mechanisms of divergent evolution. Three-dimensional structures of the candidate target proteins will be modeled ‘in silico’ in order to evaluate differences in the spatial structures of human and pathogen homologues inferred by the indel. Significant sequence identity between indel-associated human and pathogen proteins ensures their overall structural similarity. Based on our previous results (as described in the previous sections), we anticipate that the insertion into either sequence will correspond to a compact, distinct protein fold (such as a hairpin or loop). As illustrated previously for EF-1α and DnaK, the areas on the protein surface that become exposed when the inserted fold is absent can be utilized as sites for selective targeting. More specific characteristics of protein indels, including their presence or absence in various pathogens, folding preferences, length, and tendency for solvent exposure, will be investigated in detail. There is currently a paucity of information about indel structures [26] and size distribution [27, 28]. Furthermore, these previous studies have utilized limited datasets and limited contexts. This research will represent the first comprehensive study of indels that takes into account all available genomic information.
Figure 4. Frequencies of sequence insertions compared to human homologues.
Preliminary work
Perl scripts are used to extract alignments of pathogen and human proteins. We have already developed most of the scripts required for the implementation of the proposed data pipeline in the course of our previous investigation of indels in leishmania EF-1α and in DnaK proteins from S. aureus and S. pneumoniae. The Molecular Operating Environment (MOE) is the software package we adopted for scaffold homology modeling of proteins [29]. We have tested the utility of this package for indel-associated protein modeling through the creation of homology models of EF-1α proteins from L. donovani and H. sapiens. These models were created using an experimentally resolved structure of EF-1α from yeast as a template and demonstrated that the 12 amino acid insertion in human EF-1α forms an isolated hairpin fragment. This fold was predicted to comprise two anti-parallel beta strands protruding from the protein surface. It should be mentioned that several months after the models of human and leishmania EF-1α were completed, the experimentally determined structure of EF-1α from Sulfolobus solfataricus became available (PDB code 1JNY) [30]. This protein has ~50% identity to leishmania EF-1α and has a similar deletion of 12 amino acids compared to the human EF-1α. The experimental structure of this protein confirmed that the hairpin was indeed missing from the 3D structure and reproduced the general scaffolds of the modeled EF-1α. These data support the accuracy of the homology modeling procedure we have adopted.
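The alignment-scanning step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the laboratory's actual script; the function name `find_indels`, the `Indel` record, and the `min_length` cut-off of 4 columns are assumptions made for illustration only. It takes two already-aligned sequences (gaps written as `-`) and reports runs of gap columns, noting which sequence carries the gap.

```python
from typing import List, NamedTuple


class Indel(NamedTuple):
    start: int       # 0-based alignment column where the gap run begins
    length: int      # number of consecutive gap columns
    gapped_in: str   # sequence that carries the gap: "pathogen" or "human"


def find_indels(pathogen_aln: str, human_aln: str, min_length: int = 4) -> List[Indel]:
    """Return gap runs of at least min_length columns in a pairwise alignment.

    min_length is a hypothetical cut-off for a 'sizable' indel.
    """
    if len(pathogen_aln) != len(human_aln):
        raise ValueError("aligned sequences must have equal length")
    indels: List[Indel] = []
    run_start, run_owner = None, None
    # A sentinel column is appended so a run ending at the last column is flushed.
    for i, (p, h) in enumerate(zip(pathogen_aln + "x", human_aln + "x")):
        if p == "-" and h != "-":
            owner = "pathogen"   # deletion in the pathogen relative to human
        elif h == "-" and p != "-":
            owner = "human"      # insertion in the pathogen relative to human
        else:
            owner = None
        if owner != run_owner:
            if run_owner is not None and i - run_start >= min_length:
                indels.append(Indel(run_start, i - run_start, run_owner))
            run_start, run_owner = i, owner
    return indels


# Toy alignment (made-up residues): a 12-column gap in the pathogen sequence.
toy_pathogen = "MKV" + "-" * 12 + "LLA"
toy_human = "MKV" + "ACDEFGHIKLMN" + "LLA"
print(find_indels(toy_pathogen, toy_human))
```

A gap run in the pathogen row corresponds to the EF-1α situation described in the text, where the human protein carries residues that the pathogen sequence lacks.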
Figure 5. Frequencies of sequence deletions compared to human homologues.
Significance and future directions
The proposed research represents the first attempt to examine the previously unrecognized possibility of targeting pathogen proteins with close human homologues. It provides an opportunity to explore and exploit a novel area of target identification for therapeutic drug design. We anticipate the execution of these studies will result in the identification of novel drug targets in many important human pathogens. Eventually, the computational framework can be applied to all infective agents.
Figure 6. Superimposed models of DnaK protein from S. aureus (in red) and human HSP70 (in blue). The insertion in the sequence of the human protein is shown in yellow.
The right panel features a photo of S. aureus.
References
7. Nandan D, Cherkasov A, Yi T, Reiner NE. Molecular Cloning and Characterization of Elongation Factor-1α from Leishmania donovani Reveals Structural and Functional Differences from host EF-1α. Biochem. Biophys. Res. Commun. 302: 646-652 (2003).
8. Cherkasov A, Nandan D, Reiner NE. Selective targeting of indel-inferred differences in 3D structures of highly homologous proteins. Proteins: Struct. Funct. Bioinf. 58: 950-954 (2005).
9. Nandan D, Yi T, Lopez M, Lai C, Reiner NE. Leishmania EF-1α Activates the Src Homology 2 Domain Containing Tyrosine Phosphatase SHP-1 Leading to Macrophage Deactivation. J. Biol. Chem. 277: 50190-50197 (2002).
10. Reports of the World Trade Organization, 2003.
11. Cherkasov A, Nandan D, Lee S-J., Reiner NE. Large-scale Survey for Potentially Targetable Indels in Bacterial and Protozoan Proteins. Proteins: Struct. Funct. Bioinform., (2005), in press.
26. Sibanda BL, Thornton JM. Accommodating sequence changes in beta-hairpins in proteins. J. Mol. Biol., 229: 428-447 (1993).
27. Gu X, Li WH. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol., 40: 464-475 (1995).
28. Fechteler T, Dengler U, Schomburg D. Prediction of protein three-dimensional structures in insertion and deletion regions: a procedure for searching data bases of representative protein fragments using geometric scoring criteria. J. Molec. Biol., 253: 114-131 (1995).
29. Molecular Operating Environment. V. 2004.10, Chemical Computing Group, Montreal, 2004.
30. Vitagliano L, Masullo M, Sica F, Zagari A, Bocchini V. The crystal structure of Sulfolobus solfataricus elongation factor 1alpha in complex with GDP reveals novel features in nucleotide binding and exchange. The EMBO J. 20: 5305-5311 (2001).
2. Generation of algorithms for the analysis of ‘host-mimicry’ among pathogen proteins
Pathogen proteins often manipulate host cellular functions by mimicking host activities. In some cases, mimicry is achieved through virulence factors that are direct homologues of host proteins and that have been incorporated into the genome of the pathogen through horizontal gene transfer (HGT) [12-14]. In others, convergent evolution has produced new effectors that, although having no obvious amino acid sequence similarity to host factors, mimic them at the molecular level [15]. Several examples of host mimicry have been described in the literature [15-20], with the invasin protein from Yersinia pestis being one of the best studied cases [15]. It has been demonstrated that invasin possesses an overall shape resembling human fibronectin even though there is no sequence similarity between the two proteins (Figure 7). The spatial resemblance in shape and in the surface position of critical chemical groups allows yersinial invasin to mimic human fibronectin and thus initiate bacterial internalization through binding to the integrin receptor (which normally interacts with fibronectin, its natural ligand, and serves as a ‘cellular gate’). Several other striking examples of pathogen mimicking activities have been reported, such as SptP from Salmonella and its ExoS homologue from Pseudomonas, which mimic GAP enzymes and induce pathogenic hydrolysis in the host [16-20]. My laboratory has recently developed a new bioinformatics approach for the identification of potential mimicking pathogen proteins (supported by funding from Genome Canada and Vancouver Hospital Health Sciences Centre) [21]. This led to the implementation and testing of a new structural genomics tool for screening pathogen genomes for proteins with low sequence similarity but high structural similarity to human host analogues. The method facilitated accurate identification of non-obvious structural similarities between proteins having no or limited sequence identity.
A corresponding scoring scheme for a bacterial protein to be a potential mimicker was also developed. When applied to the genome of Chlamydia trachomatis this approach yielded a list of candidate virulence factors possibly mimicking some human functions [21]. We found that out of 33 top scoring Chlamydia proteins with known functions, 11 had been previously identified as pathogenic virulence factors. Thus, our approach allowed very substantial enrichment of genome scanning for potential virulence factors. We anticipate that the development and broad application of this new tool to the complete genomes of other infectious organisms will reveal a pool of novel, candidate virulence factors including those involved in drug resistance.
Figure 7. Structures of human fibronectin (top) and Yersinia invasin (bottom). The bacterial protein mimics
the human counterpart and competes with it in binding to the beta-1 integrin cell receptors in host (human) cells. Such
competition promotes hostile bacterial internalization (Science. 286: 291-295 1999). NOTE: The sequence similarity
between the two proteins is 1.5%.
Research Plan
Specific Aim 2: to identify bacterial proteins (virulence factors) that are used by human pathogens to mimic and manipulate host functions.
Hypothesis 2.1: A combination of threading and sequence alignment techniques can be used to identify potential virulence factors having low sequence identity but high structural similarity with host proteins.
Overview and rationale
The primary objective of this part of the proposal is to identify microbial proteins that do not have close analogues in the host genome, but which possess structural similarities to specific host proteins. We anticipate that these pathogen proteins are likely to play a role in pathogenicity through mimicry of host functions. In theory, in order to identify pathogen virulence factors that share three-dimensional features with human analogues, one would need to model all proteins from pathogen and human genomes (and then compare them all one-by-one). This task is not yet achievable using the bioinformatics tools that are currently available. Currently two main approaches to protein structure prediction are being used. In the first approach, the information is represented in linear form, as a ‘profile’, which is based on empirically derived scores for the expected occurrence of residues in a particular structure [35-39]. This type of approach is relatively rapid, but an unknown protein can only be characterized if it has reasonable sequence similarity with protein(s) with known structure. The second approach involves sequence-structure threading using pair potentials that score the likelihood of two residues being at a certain distance [40-50]. This approach is based upon the assumption that nature has made certain economic decisions wherein countless different proteins fold into a limited number of shapes (estimated at approximately 4000 [51]) and that nearly all natural protein structures can be described based on these shapes. Threading attempts to assign folds for a given sequence by fitting it onto each member of a library of known folds using pseudo-energy as a measure of fit in a “tried and tested” manner. Remarkably, threading approaches have been shown to make accurate predictions even in a “twilight zone” of <25% sequence identity, where sequence-based approaches normally fail [49, 50]. 
However, neither profile–based nor threading–based approaches are capable of identifying structurally similar proteins from two different sources (genomes). To overcome this problem, we have developed an indirect approach to identify potential protein structural similarities based upon numerical outputs from protein sequence-structure threading.
Preliminary work
We recently proposed that proteins with potentially similar structures should have very similar ‘threading profiles’; i.e., their sequences behave similarly when they are threaded through a standard library of folds [52]. Thus, two sequences should produce high and low threading scores on the same templates and, as such, the ordered arrays of threading scores of the two sequences should form a high quality linear correlation. Figure 10 illustrates the similarity of the threading profiles of two structurally similar proteins, 3CAA_A (antichymotrypsin, chain A) and 9PAI_A (plasminogen activator inhibitor-1, chain A), that have less than 25% sequence identity. The structural alignment of these proteins resulted in a coordinate RMSD (root-mean-square deviation) value of 2.1Å, indicating that the structures can be superimposed very precisely. Figure 10 clearly shows that the threading profiles of 3CAA and 9PAI are analogous, and when the two arrays of 1983 Z parameters are correlated for 3CAA and 9PAI, the value of the correlation coefficient is 0.734.
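The comparison of two threading profiles can be sketched as a plain Pearson correlation between two equal-length arrays of Z scores, one score per library fold. The snippet below is an illustrative sketch only: the function name `pearson_r` and the toy score arrays are hypothetical, and it assumes that the ‘r2’ parameter used in this text denotes the squared correlation coefficient.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two threading profiles.

    x[i] and y[i] are the Z scores of two sequences threaded onto the
    same i-th template fold.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Z scores of two sequences on five template folds.
profile_a = [1.2, 3.4, 0.5, 2.8, 0.9]
profile_b = [1.0, 3.1, 0.7, 2.5, 1.1]
r = pearson_r(profile_a, profile_b)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

In practice each profile would hold one Z score for every fold in the threading library, so the arrays would be far longer than in this toy case.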
(a) Space-filling view of the ligand sitting within the protein cavity.
(b) Interactions between the ligand and protein. EF-1α residues within about 4.5Å of the ligand are shown, mostly as lines. The hydrogen bonds are shown as black dotted lines.
(c) The human protein has been superimposed on top of the leishmania protein in a darker color.
(d) The darker colored hairpin loop of the human protein can be seen to clash directly with the compound.
A rigorous evaluation of the algorithm and estimation of the ‘structure similarity’ threshold for the r² parameter was carried out using a set of 24,351 protein chains with experimentally determined 3D structures [52]. We derived Z scores representing the pseudo-energy of threading for each of the sequences threaded onto the available 1893 model folds using the THREADER2 software [46]. We then estimated the correlation between the Z scores of every aligned pair of proteins from the dataset and compared it with the coordinate RMSD of the corresponding structure-structure alignment produced by the CE (combinatorial extension) algorithm [53]. Figure 11 shows the relationship between the estimated correlation coefficients r² (which reflect the similarity of protein threading profiles) and the corresponding RMSD values produced by cross-superposition of all 24,351 protein structures studied. The graph corresponds to 846,534 individual comparisons within the studied proteins. As can readily be seen from the chart, the meaningful correlations between threading scores (i.e., those with correlation coefficients r² > 0.7) correspond to high quality structural alignments with RMSD values below 2Å (a typical threshold for distinguishing pairs of structurally similar proteins). The r² ~ 0.68 cut-off produces very few false positive observations in the top right corner of the chart (i.e., there are almost no protein pairs that combine high r² with high RMSD values). According to the estimated numbers of true and false predictions, the r² = 0.68 threshold achieves 99% specificity and 50% sensitivity in identifying structural alignments with RMSD < 2Å [52]. Thus, it appears that if the threading scores for two sequences correlate well, the corresponding experimentally resolved three-dimensional protein structures resemble each other very closely.
We have also established that at the highest levels of protein structural similarity (when the correlation between two threading profiles is above r² = 0.95), the developed scoring scheme can reach up to 99% specificity and 97% sensitivity [52].
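The sensitivity and specificity figures quoted above follow from a simple confusion-matrix count over all compared protein pairs. The sketch below shows that bookkeeping under stated assumptions: the function name `threshold_performance` and the toy data are hypothetical, a pair is ‘predicted similar’ when its r² is at or above the cut-off, and it is ‘truly similar’ when its RMSD is below 2Å.

```python
from typing import Iterable, Tuple


def threshold_performance(
    pairs: Iterable[Tuple[float, float]],
    r2_cutoff: float = 0.68,
    rmsd_cutoff: float = 2.0,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given r-squared cut-off.

    Each item in pairs is (r2, rmsd) for one compared pair of proteins.
    """
    tp = fp = tn = fn = 0
    for r2, rmsd in pairs:
        predicted_similar = r2 >= r2_cutoff
        truly_similar = rmsd < rmsd_cutoff
        if predicted_similar and truly_similar:
            tp += 1
        elif predicted_similar:
            fp += 1
        elif truly_similar:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up (r2, RMSD in angstroms) values for four protein pairs.
toy_pairs = [(0.90, 1.0), (0.50, 1.5), (0.30, 5.0), (0.20, 6.0)]
sens, spec = threshold_performance(toy_pairs)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is the trade-off that the reported figures describe.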
Figure 10. Structural alignment and threading profiles for 2 structurally similar proteins:
3CAA:A - ANTICHYMOTRYPSIN; CHAIN: A (Green)
9PAI:A - PLASMINOGEN ACTIVATOR INHIBITOR-1; CHAIN: A (Red)
RMSD = 2.1Å, Sequence identity = 24.8%
This approach was further adopted to derive the scoring scheme for the identification of tentative mimicking proteins from pathogen genomes. We considered a pathogen protein a potential mimic when it had no sequence similarity to a human protein and yet scored above r² = 0.95 (i.e., indicating a high probability that the corresponding 3D structures are very similar). When this scheme was applied to the genome of Chlamydia trachomatis, it yielded a list of candidate virulence factors. We have carefully examined the top 40 candidate proteins (ranked by r²) and found that out of 33 annotated proteins from the list, 11 have been previously identified as pathogenic virulence factors [52]. These results demonstrate that the scoring scheme can be useful for screening pathogen genomes for potential virulence factors. As further evidence of its applicability, we have subsequently found that this approach readily identifies invasin as a potential human mimicker, as well as the virulence factors SptP and ExoS (unpublished results, work in progress).
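The selection rule described in this paragraph can be sketched as a filter followed by a ranking. This is a hedged illustration rather than the published scoring scheme: the `Hit` record, the function name `rank_mimic_candidates`, the protein identifiers, and in particular the 25% identity ceiling used to stand in for ‘no sequence similarity’ are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    pathogen_protein: str   # hypothetical identifier
    human_protein: str      # hypothetical identifier
    r2: float               # squared correlation of the two threading profiles
    seq_identity: float     # percent sequence identity of the pair


def rank_mimic_candidates(
    hits: List[Hit],
    r2_min: float = 0.95,
    max_identity: float = 25.0,  # assumed stand-in for "no sequence similarity"
) -> List[Hit]:
    """Keep structurally similar but sequence-dissimilar pairs, best r2 first."""
    kept = [h for h in hits if h.r2 > r2_min and h.seq_identity < max_identity]
    return sorted(kept, key=lambda h: h.r2, reverse=True)


# Made-up hits for illustration only.
toy_hits = [
    Hit("path_001", "hum_A", r2=0.97, seq_identity=12.0),
    Hit("path_002", "hum_B", r2=0.99, seq_identity=60.0),  # too similar in sequence
    Hit("path_003", "hum_C", r2=0.80, seq_identity=10.0),  # profiles too different
    Hit("path_004", "hum_D", r2=0.98, seq_identity=8.0),
]
for hit in rank_mimic_candidates(toy_hits):
    print(hit.pathogen_protein, hit.human_protein, hit.r2)
```

The top of such a ranked list is what would then be inspected manually, as was done for the top 40 Chlamydia candidates.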
Figure 11. RMSD values (Y axis) of pairwise alignments of protein chains with known structures versus the corresponding r² parameters (X axis) established by correlating the corresponding threading scores.
Significance and future directions
Threading of whole genomes of major human pathogens will itself provide valuable information about protein folding within nosocomial and other life-threatening organisms. The application of our novel structural genomics tools to the threading results will bring interesting insights into mechanisms of pathogen evolution. The tentative virulence factors (‘mimickers’) identified from the genome studies will be further evaluated by collaborating investigators. Together, not only will the indel-bearing essential pathogen protein targets and the bacterial virulence factors mimicking host proteins represent good candidates as novel drug targets, but they will also provide a basis for research into infectious agents, host cell biology and the evolution of pathogenesis. It is anticipated that this research will build upon the existing expertise of my research group in the ‘in silico’ development of new drug candidates against drug targets [54-56] and serve as a foundation for future antibiotic drug design efforts in my laboratory.
References
12. Doolittle WF. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. 14: 307-11 (1998).
14. Stephens RS, Kalman S, Lammel C, Fan J, Marathe R, Aravind L, Mitchell W, Olinger L, Tatusov RL, Zhao Q, Koonin EV, Davis RW. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282: 754-759 (1998).
15. Stebbins CE, Galan JE. Structural mimicry in bacterial virulence. Nature, 412: 701-705 (2001).
16. Frankel G, Lider O, Hershkoviz R, Mould AP, Kachalsky SG, Candy CA, Cahalon L, Humphries M, Dougan G. The cell-binding domain of intimin from enteropathogenic Escherichia coli binds to beta1 integrins. J. Biol. Chem. 271: 20359-20364 (1996).
17. Hamburger ZA, Brown MS, Isberg RR, Bjorkman PJ. Crystal structure of invasin: a bacterial integrin-binding protein. Science. 286: 291-295 (1999).
18. Van Nhieu GT, Isberg RR. The Yersinia pseudotuberculosis invasin protein and human fibronectin bind to mutually exclusive sites on the alpha 5 beta 1 integrin receptor. J. Biol. Chem. 266: 24367-24375 (1991).
19. Chen Y, Smith MR, Thirumalai K, Zychlinsky A. A bacterial invasin induces macrophage apoptosis by binding directly to ICE. The EMBO Journal. 15: 3853-3860 (1996).
20. Van Nhieu GT, Sansonetti PJ. Mechanism of Shigella entry into epithelial cells. Curr Opin Microbiol. 2: 51-55 (1999).
21. Cherkasov A, Jones SJM. An approach to large scale identification of non-obvious structural similarities between proteins BMC Bioinformatics, 5: 61 (2004)
35. Russell RB, Saqi MAS, Bates PA, Sayle RA, Sternberg MJE. Recognition of analogous and homologous protein folds – assessment of prediction success and associated alignment accuracy using empirical substitution matrices. Protein Engineering, 11: 1-9 (1998).
36. Bowie JU, Luthy R, Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science, 253: 164-170 (1991).
37. Bates PA, Jackson RM, Sternberg MJE. Genomes, Molecular Biology and Drug Discovery. Academic Press, London, 73 (1996).
39. Rice DW, Eisenberg D. 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J Molec Biol. 267: 1026-1038 (1997).
40. Rost B, Schneider R, Sander C. Protein fold recognition by prediction – based threading. J. Molec. Biol. 270: 471-480 (1997).
41. Defay TR, Cohen FE. Multiple sequence information for threading algorithms. J. Mol. Biol. 262: 314-323 (1996).
42. Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF. IMPALA: matching a proteins sequence against a collection of PSI-BLAST – constructed position-specific score matrices. Bioinformatics, 15: 1000-1011 (1999).
43. Godzik A, Skolnick J. Sequence – structure matching in globular proteins: application to supersecondary and tertiary structure determination. Proc. Natl. Acad. Sci., 89: 12098 -12102 (1992).
44. Bryant SH, Altschul SF. Statistics of sequence – structure threading. Curr Opin Struct Biol, 5: 236-244 (1995).
45. Murzin AG, Bateman A. Distant homology recognition using structural classification of proteins. Proteins (Suppl.), 1: 105-112 (1997).
46. Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature, 358: 86-89 (1992).
47. Jones DT, Miller RT, Thornton JM. Successful protein fold recognition by optimal sequence threading validated by rigorous blind testing. Proteins, 23: 387-397 (1995).
48. Taylor WR. Multiple sequence threading: an analysis of alignment quality and stability. J Mol Biol, 269: 902-943 (1997).
49. Orengo CA, Jones DT, Thornton JM. Protein superfamilies and domain superfolds. Nature (London), 372: 631-634 (1994).
51. Machalek AZ. Structural genomics: a slice of the proteomics pie. ASM News, 67: 441-446 (2001).
53. Shindyalov IN, Bourne PE. A database and tools for 3-D protein structure comparison and alignment using the Combinatorial Extension (CE) algorithm. Nucleic Acids Research 29:228-229 (2001).
55. Li Y, Jones SJM, Cherkasov A. Selective Targeting of Indel-Inferred Differences in Spatial Structures of Homologous Proteins. Journal of Bioinformatics and Computational Biology, 2005, in press
56. Cherkasov A. Shi Z, Li Y, Jones SJM, Fallahi M, Hammond GL. ‘Inductive’ Charges on Atoms in Proteins: Comparative Docking with the Extended Steroid Benchmark Set and Discovery of a Novel SHBG Ligand. Journal of Chemical Information and Modelling, 45, 2005, in press.
3. Development of ‘in silico’ models capable of ranking chemical substances for their potential antimicrobial properties
Research Plan
Specific Aim 1: to identify essential pathogen proteins which traditionally have not been considered as suitable drug targets due to their overall high similarity to human proteins, but which nonetheless possess potentially ‘targetable’ specific sites on their surfaces.
Hypothesis 1.1: Essential pathogen proteins may contain sequence insertions or deletions (compared to their host homologues) conferring structural differences sufficient to serve as sites for selective antimicrobial targeting.
Overview and rationale
The primary objective of this part of the proposal is to identify essential proteins in 13 selected human pathogens (Staphylococcus aureus strains Mu50, MW2, N315m, MSSA 476, Streptococcus pneumoniae strains A6 and TIGR4, Enterococcus faecalis strain V583, Mycobacterium tuberculosis strains CDC1551, H37Rv, Pseudomonas aeruginosa, Acinetobacter spp., Escherichia coli O157 and Shigella flexneri A2) that are critical for their fitness and survival and which also show sizable indels compared to their human homologues.
The proposed research will facilitate the identification of a dataset of novel drug targets associated with a lower risk of developing drug resistance than historical drug targets in a diverse group of major human pathogens. Quantitative data on indel occurrence amongst otherwise highly similar proteins will bring new insight into mechanisms of divergent evolution. Three-dimensional structures of the candidate target proteins will be modeled ‘in silico’ in order to evaluate differences in the spatial structures of human and pathogen homologues inferred by the indel. Significant sequence identity between indel-associated human and pathogen proteins ensures their overall structural similarity. Based on our previous results (as described in the previous sections), we anticipate that the insertion into either sequence will correspond to a compact, distinct protein fold (such as a hairpin or loop). As illustrated previously for EF-1α and DnaK, the exposed areas on the protein surface that appear when the inserted structure is not present can be utilized as sites for selective targeting. More specific characteristics of protein indels, including their presence or absence in various pathogens, folding preferences, length, and tendency for solvent exposure, will be investigated in detail.
There is currently a paucity of information about indel structures [26] and size distribution [27, 28]. Furthermore, these previous studies have utilized limited datasets and limited contexts. This research will represent the first comprehensive study of indels that takes into account all available genomic information.
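The search for sizable indels between a pathogen protein and its human homologue can be illustrated with a short sketch. It is written in Python rather than the Perl used in our pipeline, and it assumes that a gapped pairwise alignment has already been produced by an external tool. The function name `find_indels`, the `Indel` record, the minimum length of 4 columns, and the toy sequences are all illustrative assumptions, not part of the actual scripts.

```python
from dataclasses import dataclass


@dataclass
class Indel:
    start: int      # 0-based alignment column where the gap run begins
    length: int     # number of consecutive gap columns
    gapped_in: str  # sequence that carries the gap: "pathogen" or "human"


def gap_owner(p: str, h: str, gap: str) -> "str | None":
    """Name the sequence that has a gap in this column, if exactly one does."""
    if p == gap and h != gap:
        return "pathogen"
    if h == gap and p != gap:
        return "human"
    return None


def find_indels(pathogen_aln: str, human_aln: str,
                min_length: int = 4, gap: str = "-") -> "list[Indel]":
    """Return gap runs of at least min_length columns in a pairwise alignment.

    min_length=4 is an assumed cut-off for a 'sizable' indel.
    """
    if len(pathogen_aln) != len(human_aln):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    run_start, run_owner = 0, None
    # A sentinel column closes a gap run that reaches the end of the alignment.
    columns = list(zip(pathogen_aln, human_aln)) + [("X", "X")]
    for i, (p, h) in enumerate(columns):
        owner = gap_owner(p, h, gap)
        if owner != run_owner:
            if run_owner is not None and i - run_start >= min_length:
                indels.append(Indel(run_start, i - run_start, run_owner))
            run_start, run_owner = i, owner
    return indels


# Made-up sequences: a 12-residue stretch present in "human", absent in "pathogen".
pathogen = "MKTAY" + "-" * 12 + "GLDRS"
human = "MKTAY" + "QWERTYIPASDF" + "GLDRS"
for indel in find_indels(pathogen, human):
    print(indel)
```

A gap reported in the pathogen row corresponds to an insertion in the human protein (as in the EF-1α example), and a gap in the human row to an insertion in the pathogen protein.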
Preliminary work
Perl scripts are used to extract alignments of pathogen and human proteins. We have already developed most of the scripts required for the implementation of the proposed data pipeline in the course of our previous investigation of indels in Leishmania EF-1α and in DnaK proteins from S. aureus and S. pneumoniae. The Molecular Operating Environment (MOE) is a software package we adopted for scaffold homology modeling of proteins [29]. We have tested the utility of this package for indel-associated protein modeling purposes through the creation of homology models of EF-1α proteins from L. donovani and H. sapiens.
These models were created using a template of an experimentally resolved structure of EF-1α from yeast and demonstrated that the 12 amino acid insertion in human EF-1α forms an isolated hairpin fragment. This fold was predicted to comprise two anti-parallel beta strands exposed on the protein surface. It should be mentioned that several months after the models of human and Leishmania EF-1α were completed, the experimentally determined structure of EF-1α from Sulfolobus solfataricus became available (PDB code 1JNY) [30]. This protein has ~50% identity to Leishmania EF-1α and has a similar deletion of 12 amino acids compared to the human EF-1α. The experimental structure of this protein confirmed that the hairpin was indeed missing from the 3D structure and reproduced the general scaffold of the modeled EF-1α proteins. These data support the accuracy of the homology modeling procedure we have adopted.
Significance and future directions
The proposed research represents the first attempt to examine the previously unrecognized possibility of targeting pathogen proteins with close human homologues. It provides an opportunity to explore and exploit a novel area of target identification for therapeutic drug design. We anticipate the execution of these studies will result in the identification of novel drug targets in many important human pathogens. Eventually, the computational framework can be applied to all infective agents.
References
26. Sibanda BL, Thornton JM. Accommodating sequence changes in beta-hairpins in proteins. J. Mol. Biol., 229: 428-447 (1993).
27. Gu X, Li WH. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol., 40: 464-475 (1995).
28. Fechteler T, Dengler U, Schomburg D. Prediction of protein three-dimensional structures in insertion and deletion regions: a procedure for searching data bases of representative protein fragments using geometric scoring criteria. J. Molec. Biol., 253: 114-131 (1995).
29. Molecular Operating Environment (MOE), version 2004.10. Chemical Computing Group, Montreal, 2004.
30. Vitagliano L, Masullo M, Sica F, Zagari A, Bocchini V. The crystal structure of Sulfolobus solfataricus elongation factor 1alpha in complex with GDP reveals novel features in nucleotide binding and exchange. EMBO J., 20: 5305-5311 (2001).
4. Successful in silico discovery of novel non-steroidal ligands for human sex hormone binding globulin (SHBG)
Overview and rationale
Sex hormone-binding globulin (SHBG) is a glycoprotein in blood plasma that is produced primarily by the liver [1]. Expression of the SHBG gene in the testis of several mammals also gives rise to a protein, commonly known as the testicular androgen binding protein (ABP), which is thought to play a key role in sperm maturation [2]. Plasma SHBG and testicular ABP bind biologically active androgens and estrogens and play a critical role in regulating the access of these sex steroids to their target cells [1-3]. In addition to binding steroids with high affinity, SHBG has been reported to interact directly with plasma membranes of cells in some tissues in a ligand-dependent manner, and to thereby stimulate intracellular signalling pathways that alter cell growth and/or function [4]. Numerous human diseases such as endometrial cancer [5], ovarian dysfunction [6], male and female infertility [7], osteoporosis [8, 9], diabetes [10] and cardiovascular diseases [11] are associated with abnormal levels of SHBG in plasma. Many of these disease processes, or the health problems associated with them, can be attributed to abnormalities in the plasma distribution and bioavailability of the endogenous sex steroid ligands of SHBG. In cases where the disease can be attributed to the limited activities of sex steroids, such as osteoporosis, the identification of high affinity non-steroidal SHBG ligands with no intrinsic biological properties of their own might represent a means of enhancing the bioavailability and activities of endogenous sex steroids. In the current study we therefore employed conventional ‘in silico’ drug design technologies, such as docking- and pharmacophore-based virtual screening, as well as several recently developed ‘in house’ molecular modeling solutions, to establish a virtual screening method for non-steroidal compounds that could effectively displace endogenous sex steroids from the human SHBG steroid-binding site.
Virtual screening methods are key approaches in modern computer-aided drug design. They involve analyzing electronic collections of chemical structures by means of various computational tools, with the goal of identifying manageable subsets of compounds that have higher chances of being active against the desired biological target(s). Virtual screening is thus considered an effective complement to experimental high-throughput assays and is being used increasingly in modern drug discovery practices [12-14]. There are two types of virtual screening: structure-based docking, which requires detailed information about the three-dimensional structure of the target’s binding site, and ligand-based techniques (such as pharmacophore modeling), which rely on pre-existing knowledge of compounds with biological activities of interest [12-14]. For ‘in silico’ discovery of non-steroidal SHBG ligands we employed both structure-based and ligand-based techniques, as well as our recently developed QSAR (quantitative structure-activity relationships) solutions [15-17]. We focused our efforts on screening databases of natural compounds since SHBG is known to interact with some plant-derived agents [18], and because such collections are typically rich in biologically active substances [19]. The following three-stage lead discovery procedure was implemented to identify potential ligands of the human SHBG steroid-binding site from collections of natural molecules. Stage one involved the use of existing data on known ligands of SHBG [20] to develop several pharmacophore models, which were then used to screen an electronic collection of natural compounds. As a result, we identified a smaller set of natural derivatives that met the stringent structural requirements imposed by these models. The second stage involved using the same set of known SHBG ligands to train a QSAR model that utilizes a machine-learning algorithm to distinguish known SHBG ligands from other chemicals.
The resulting structure-activity model was then applied to rank the pharmacophore-identified compounds for their potential binding affinity toward SHBG. During the third stage of the ‘in silico’ study, we conducted rigorous virtual docking of selected molecular structures into the SHBG steroid-binding pocket. Finally, those compounds that ranked highly by the QSAR model, and/or were favoured by the virtual docking, were subjected to experimental testing of their potencies as SHBG ligands in vitro.
Figure 1. The developed pharmacophores A, B and C superimposed with the structure of DHT.
a) Pharmacophore A
b) Pharmacophore B
c) Pharmacophore C
Results
Database of Natural Substances. Using various public and commercial databases we assembled a collection of natural compounds (molecules found in nature or their close synthetic analogues/derivatives) totalling more than 48,000 substances. We selected only those compounds with well-defined chemical structures that could be purchased in sufficient quantities for subsequent analysis in vitro. Those chemicals that satisfied typical ‘drug-likeness’ criteria [21] were placed in a separate group of ‘drug-like’ natural substances. This group included 23,836 molecules with molecular weights within the 200-500 Da range, and which possess 1 - 5 hydrogen bond donors (OH, NH, SH groups) and 1 - 10 hydrogen bond acceptors (O, N, S atoms). They also had fewer than 8 rotatable bonds, 1 - 3 aromatic rings and/or 1 - 5 cyclic systems, a total polar surface area of less than 140 Å², and a hydrophobicity of below logP = 8.0 (the corresponding drug-likeness filters have been implemented within the MOE (Molecular Operating Environment) package [22]). For each of these 23,836 ‘drug-like’ natural substances we generated up to 100 distinct conformations using the Catalyst 4.8 software [23]. The resulting dataset, reflecting the conformational space of the ‘drug-like’ natural derivatives, was then subjected to pharmacophore-based virtual screening.
Pharmacophore-based Virtual Screening for Novel SHBG Ligands. A pharmacophore represents a set of space-positioned molecular features determining the ability of a ligand to bind to its biological target. A pharmacophore model can thus be developed by superimposing the molecular structures of ligands with known affinities for the same target, which allows the common structural features responsible for binding interactions to be defined. In the case of SHBG, we developed three major pharmacophore models illustrated graphically in Figure 1.
Pharmacophore A was developed from the structures of 5α-dihydrotestosterone (DHT) and its close derivatives (Figure 1a). This pharmacophore model included nine major features: three hydrophobic/aromatic sites, one hydrogen bond donor/acceptor point with three donor/acceptor projection points and one hydrogen bond acceptor with two acceptor projection points. Pharmacophore B was generated by aligning the crystallographic configurations of DHT and estradiol inside the steroid-binding pocket of SHBG, and it utilized eight basic features: three hydrophobic/aromatic centers, one H-bond acceptor and one H-bond donor, two donor projection points, and one volume exclusion feature restraining the molecular boundaries (Figure 1b). The nine-featured Pharmacophore C relied on the flexible alignment of a large number of known ligands of human SHBG [20], which resulted in the identification of the following common features: three hydrophobic/aromatic centers, one hydrogen bond donor site with two donor projection points and one H-bond acceptor with two projection points (Figure 1c). The adequacy of the resulting pharmacophore models was evaluated by applying them to a test collection of compounds with high, moderate, or zero binding affinity to SHBG. The performance of each pharmacophore model was assessed through the hit- and recovery-rate parameters (for details see Materials and Methods). The results indicated that Pharmacophore A was the most successful in identifying steroidal substances (high affinity ligands), while it tended to miss moderately active non-steroidal compounds (see Table 1). Pharmacophore B identified most steroids in the test set, in addition to some non-steroidal SHBG binders, while Pharmacophore C generated the largest number of hits, including a substantial fraction of SHBG binders in the test set as well as numerous non-active substances (see Table 1). These pharmacophore models were then used as data-mining instruments of varying stringency.
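The ‘drug-likeness’ criteria applied to the natural-substance database above can be restated as a simple filter. The sketch below is not the MOE implementation: it assumes the descriptors have already been computed by some other tool, reads the ring criterion as an inclusive "or", and uses a hypothetical `Descriptors` record whose field names and example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed properties of one compound (hypothetical field names)."""
    mol_weight: float      # Da
    h_donors: int          # OH, NH, SH groups
    h_acceptors: int       # O, N, S atoms
    rotatable_bonds: int
    aromatic_rings: int
    ring_systems: int
    tpsa: float            # total polar surface area, square angstroms
    logp: float


def is_drug_like(d: Descriptors) -> bool:
    """Apply the thresholds quoted in the text to one compound."""
    # Assumption: 'and/or' for the ring criteria means either condition suffices.
    rings_ok = (1 <= d.aromatic_rings <= 3) or (1 <= d.ring_systems <= 5)
    return (
        200 <= d.mol_weight <= 500
        and 1 <= d.h_donors <= 5
        and 1 <= d.h_acceptors <= 10
        and d.rotatable_bonds < 8
        and rings_ok
        and d.tpsa < 140
        and d.logp < 8.0
    )


# Made-up descriptor values, not taken from any compound in the study.
candidate = Descriptors(mol_weight=302.4, h_donors=2, h_acceptors=4,
                        rotatable_bonds=3, aromatic_rings=1, ring_systems=2,
                        tpsa=66.8, logp=3.1)
print(is_drug_like(candidate))
```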
By applying them in different combinations to conformer sets generated for the 23,836 ‘drug-like’ natural substances, we identified 201 initial hits including 96 known steroid-like and 105 non-steroidal compounds (corresponding to the entries 283-388 in Appendix 1).
QSAR hit ranking. To prioritize the identified 105 non-steroidal chemicals for experimental testing we processed them with a recently developed QSAR approach utilizing ‘inductive’ molecular descriptors [15-17, 24-27]. These parameters represent a novel, distinct group of QSAR descriptors that cover a broad range of properties of bound atoms and molecules related to their size, polarizability, electronegativity, compactness, mutual inductive and steric influence, and distribution of electronic density [24-27]. The ‘inductive’ QSAR descriptors have been used in our previous studies for building ‘drug-likeness’ and ‘antibiotic-likeness’ models [16] and for creating QSAR solutions for the anti-microbial activity of cationic peptides [17]. Detailed information on the ‘inductive’ descriptors can be found elsewhere [15-17, 24-27]. In the current study, we used ‘inductive’ descriptors in combination with the method of Artificial Neural Networks (ANN) to rank the 105 selected non-steroidal compounds according to their potential binding affinity to human SHBG. To build a predictive QSAR model we assembled a set of more than 70 compounds known to interact with SHBG (entries 1-78 from Appendix 1), and compiled a set of about two hundred chemicals [28] with unknown affinities to SHBG as ‘negative’ controls for the model (entries 79-282 in Appendix 1). For all 282 molecules, we calculated 28 independent ‘inductive’ QSAR descriptors that are characterized in greater detail in Table 2. The rationale for building a QSAR model for SHBG ligands was that these 28 molecular parameters could be used as independent variables for describing SHBG binding affinity.
Thus, within the training set, the SHBG binders (entries 1-78) were assigned a dependent parameter with a value of 1.0, while the compounds from the ‘negative control’ set (entries 79-282) were all assigned zero. To relate the ‘inductive’ descriptors to the binary (1|0) SHBG binding criteria for the 282 molecules studied, we employed a standard back-propagation ANN configuration consisting of 24 input, 8 hidden and 1 output nodes (see Figure 2). For effective training and subsequent validation of the ANN, we used a training set of 188 molecules randomly selected to represent 2/3 of the 282 molecules under investigation. In each training run, the remaining 94 compounds were used as the testing set to assess the predictive ability of the model. In each of 20 independent training and validation runs of the ANN, the corresponding false/true positive and negative predictions were estimated using a 0.50 cut-off for the output. In other words, ANN outputs greater than 0.50 were considered as positive predictions. The results demonstrated that the ANN achieved up to 99% accuracy in distinguishing SHBG binders within the training sets and 92% accuracy within the testing sets. The results for various cut-off parameters averaged over 20 independent validation runs are shown in Table 3. These data clearly illustrate that the ‘inductive’ QSAR descriptors allowed us to distinguish with confidence the compounds in the testing and training sets that bind to SHBG from those that do not. The ANN-based binary QSAR model we created was applied to the 105 molecules identified previously by the pharmacophore model-based search for potential SHBG ligands (entries 283-388 in Appendix 1). The molecular structures of all 105 molecules were processed to calculate 28 ‘inductive’ descriptors that were then passed through the pre-trained ANN.
The resulting network outputs have been assembled in Appendix 1 and it can be seen that some candidate molecules are ranked very highly by the model. We anticipated that the corresponding top-ranked substances would represent very good candidate ligands of human SHBG, and we selected 22 compounds (molecules from Table 4 with network outputs above 0.1) for purchasing and in vitro testing. At the same time, we applied a rigorous ligand-protein docking procedure to all 105 pharmacophore-identified compounds to support the lead selection by the QSAR model and to expand the set of selected compounds.
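The validation protocol described above for the binary QSAR model (a random 2/3 training split and a 0.50 output cut-off) can be sketched as follows. The sketch covers only the bookkeeping around the network, not the ANN itself; the function names, the seed handling, and the toy outputs are illustrative assumptions.

```python
import random


def split_two_thirds(items, seed=0):
    """Randomly assign 2/3 of the items to training and the rest to testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]


def confusion_counts(outputs, labels, cutoff=0.50):
    """Count true/false positives and negatives for network outputs.

    An output strictly greater than the cut-off is a positive prediction;
    labels are 1 for SHBG binders and 0 for negative controls.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for out, label in zip(outputs, labels):
        predicted_binder = out > cutoff
        if predicted_binder and label == 1:
            counts["tp"] += 1
        elif predicted_binder and label == 0:
            counts["fp"] += 1
        elif not predicted_binder and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# 282 entries split as in the text: 188 for training, 94 for testing.
train_ids, test_ids = split_two_thirds(range(1, 283), seed=1)
print(len(train_ids), len(test_ids))

# Toy network outputs and labels, made up for illustration.
toy = confusion_counts([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
print(toy, accuracy(toy))
```

Repeating the split with different seeds and averaging the counts would mirror the 20 independent runs, and varying `cutoff` would mirror the cut-off scan summarized in Table 3.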

Figure 3. Crystal structure of human SHBG containing DHT or estradiol in the steroid-binding pocket.
a) Human SHBG steroid-binding pocket occupied by DHT.
b) Human SHBG steroid-binding pocket occupied by estradiol.
Structure-based Virtual Screening. Several human SHBG crystal structures have been solved and deposited in the Protein Data Bank [29]. These include co-crystal structures of the protein with different steroid ligands, as well as steroid-binding data for numerous human SHBG mutants that can be exploited in the structure-based design of high affinity SHBG ligands [30-33]. These published crystal structures also demonstrate that SHBG binds androgen and estrogen molecules in different (opposite) orientations (see Figure 3 for more details), and show that there are two major hydrogen bonding ‘anchors’ within the SHBG steroid-binding site: one involving the Ser42 residue, which binds to the C3-carbonyl group in C19 steroids (androgens) and to the 17β-OH group of C18 steroids (estrogens), and another formed by the side chains of Asp85 and Asn82, which binds functional groups at the C3 and C17 positions of C18 and C19 steroids, respectively. These two ‘anchors’ therefore interact selectively with functional groups of androgen and estrogen molecules with respect to their unique orientations within the SHBG steroid-binding site [30]. In addition to differences in the orientation of androgens and estrogens within the SHBG steroid-binding site, there are several other challenges associated with the virtual docking of small compounds into this hydrophobic pocket. One is that the conformation of the SHBG steroid-binding site may change upon ligand binding in an as yet undefined manner, and another relates to the fact that the presence of a Zn ion can disorder a polypeptide loop covering the human SHBG steroid-binding site and alter its ligand binding specificity [34]. Previous studies of the crystal structures of SHBG complexes demonstrated that Zn²⁺ is positioned at the ligand-entry point of the steroid-binding pocket [33]. It has also been experimentally confirmed that the presence of the zinc ion does not have any impact on binding of C19 steroids.
On the other hand, it reduces the affinity of SHBG for estrogen and its derivatives, which bind to the active site in a different orientation. In fact, site-directed mutagenesis studies demonstrated that Zn²⁺ causes reorientation of the Asn65 side chain in the SHBG binding pocket, which otherwise can form a strong H-bond with the C3 oxygen of estradiol in the absence of the metal. Taking these factors into consideration, and to capture all possible implications of the presence of Zn²⁺ in the active site, we developed a consensus docking strategy utilizing four protein structures, some of which include the zinc atom while others do not: namely the PDB entries 1D2S, 1F5F, 1KDM and 1LHU that correspond to the human SHBG co-crystallized with DHT (1D2S) or with estradiol (1LHU). All protein structures were pre-processed for docking by removing water molecules and reconstructing hydrogen atoms (see ‘Materials and Methods’ for more details). We used the Glide 2.7 program [35] to fit 32 SHBG binders with experimentally determined association constants (corresponding to entries 1-8, 10-12, 14-16, 18, 19-21, 28, 32, 33, 38, 41, 44, 46, 51, 53 from Appendix 1) into the four different crystal structures of SHBG. Based on the docking results for all four structures, we derived an averaged consensus score, which was plotted in Figure 4 against known ligand-SHBG association constants (Ka) taken from the literature [20]. As can be seen in Figure 4, the r value for the correlation is above 0.81, which indicates that the docking protocol reproduced the experimental protein binding affinities with fairly good accuracy. Figure 5 illustrates the docked DHT structure, and shows that it is very closely aligned with the known orientation of DHT within the crystal structure (1D2S) of the SHBG steroid-binding site. It was also established that, on average, a compound demonstrates good binding affinity to SHBG when the corresponding docking score is below the -7.5 threshold.
Figure 6 illustrates the distribution of the consensus docking scores among known SHBG binders and compounds with no known SHBG binding affinity (molecules 82-151 from Appendix 1 which have been docked as negative controls). These results demonstrate that the consensus docking score we have developed is a useful guide for selecting potential SHBG ligands for experimental testing. We therefore applied this procedure to 105 pharmacophore-selected non-steroidal structures and docked them into the four crystal structures of SHBG. The established consensus docking scores assembled in Table 4 indicate that this set of 105 compounds might contain a large number of potential SHBG binders, with more than half of the candidates scoring below the –7.5 threshold, and several molecules yielding a docking score below –9.0 (see Figure 6). The docking scores also demonstrated that most of the molecules selected using the QSAR ranking could be docked into SHBG steroid-binding site very precisely, and these were selected as prime candidates for testing of their SHBG binding properties in vitro. On the other hand, some compounds (9808, 823, 1372, 5605, 1924, 298, 422), which were not identified by the QSAR model, demonstrated very good docking potential (consensus scores below -7.5), and these were also selected for in vitro testing. In total, 29 non-steroidal compounds were purchased and subjected to in vitro testing (all presented in Table 4).
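The way the QSAR outputs and the consensus docking scores were combined can be sketched as below. The sketch assumes that the ‘averaged consensus score’ is the arithmetic mean of the per-structure docking scores, and it only labels compounds: in the study, just seven of the compounds that passed the docking threshold alone were actually purchased, so the second label marks eligibility rather than selection. The function names and all numeric scores are illustrative, not values from Table 4.

```python
from statistics import mean

DOCKING_THRESHOLD = -7.5  # consensus scores below this suggest good binding
QSAR_THRESHOLD = 0.10     # network outputs above this were prioritized


def consensus_score(scores_by_structure):
    """Average the docking scores obtained against each crystal structure.

    Assumption: a plain arithmetic mean over the four PDB entries.
    """
    return mean(scores_by_structure.values())


def classify_candidate(qsar_output, scores_by_structure):
    """Label a compound by the route through which it could be picked."""
    if qsar_output > QSAR_THRESHOLD:
        return "qsar"
    if consensus_score(scores_by_structure) < DOCKING_THRESHOLD:
        return "docking-only"
    return "rejected"


# Made-up docking scores for one hypothetical compound.
scores = {"1D2S": -8.0, "1F5F": -7.0, "1KDM": -9.0, "1LHU": -8.0}
print(consensus_score(scores))
print(classify_candidate(0.03, scores))
```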
Figure 7. The displacement curves for test compounds used in the in vitro competition assay to determine the relative binding affinities of human SHBG ligands. The amount of [3H] DHT bound to SHBG in the presence of increasing concentrations of competitor ligands (B) is expressed as a percentage of the amount of [3H] DHT bound to SHBG in the absence of competitor ligand (Bo).
In Vitro Testing of Potential SHBG Ligands. All 29 compounds selected by the QSAR ranking and virtual docking experiments were screened for their ability to interact with the SHBG steroid-binding site in vitro. The screening assay involved a modification of an established competitive steroid ligand-binding assay that employs tritium-labelled DHT ([3H] DHT) as the radio-labelled ligand (see Materials and Methods for details). The initial screen of compounds was conducted at a single high concentration (approximately 200 µM), and the results (presented in Table 4) demonstrate that 8 non-steroidal compounds (696, 2593, 2228, 2623, 6767, 8590, 5636 and 5597) displaced 35-95 % of the [3H] DHT from the SHBG steroid-binding site. These compounds, and a compound (5098) with no activity in the screening assay (i.e., a negative control), were then selected for a more detailed analysis of their ability to displace [3H] DHT from the human SHBG steroid-binding site relative to known concentrations of the physiologically most important androgen (testosterone) and estrogen (17β-estradiol). The resulting competitive displacement curves generated using these test compounds (see Figure 7) illustrate that their potencies as SHBG ligands are very much in line with their rank potencies obtained in the preliminary screening assay (Table 4), with the most effective competitors being 5597, 2623, and 2593. For these compounds an IC50 could be calculated from the plot shown in Figure 7. These IC50 values (5597 = 13.6 µM; 2623 = 22.4 µM; 2593 = 124.5 µM) could then be compared with those of testosterone (15.6 nM) and estradiol (61.6 nM). These data provide a measure of the relative binding affinities (RBAs) of the top three test compounds as compared to testosterone or estradiol, and their RBAs are shown in Table 5 along with their structural alignments against those of testosterone or estradiol produced by the MOE Flexible alignment routine [22].
Although small differences in the reported association constants (Ka) for testosterone and estradiol exist, these constants have been determined previously [20] under the same conditions (i.e. at 4 °C) as those employed in this study, and are as follows: testosterone (Ka = 1.1 × 10⁹ M⁻¹); estradiol (Ka = 6 × 10⁸ M⁻¹). Thus, it is also possible to estimate the Ka values for the most active test compounds based on their RBA values (see Table 5). The parallelism of the competitive displacement curves for 2623 and the natural steroids (Figure 7) also indicates that this compound is completely soluble at high concentrations and behaves in essentially the same way as a steroid ligand with respect to its kinetics of binding. By contrast, the crossing of the displacement curve for 5597 with that of 2623 (see Figure 7) indicates that 5597 may not be completely soluble at high concentrations, and this could lead to an underestimate of the potency of 5597 in the assay.
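The step from IC50 values to an estimated Ka can be illustrated with a short calculation. It assumes the common convention that the RBA is the ratio of the reference IC50 to the test-compound IC50, and that the test compound's Ka scales the reference Ka by that ratio; the exact convention behind Table 5 is not restated here, so the numbers are indicative only. The function names are illustrative.

```python
def relative_binding_affinity(ic50_reference, ic50_test):
    """RBA as the IC50 ratio (assumed convention; both values in the same units)."""
    return ic50_reference / ic50_test


def estimate_ka(rba, ka_reference):
    """Scale the reference association constant by the RBA."""
    return rba * ka_reference


# Values quoted in the text (molar units).
ic50_testosterone = 15.6e-9   # 15.6 nM
ic50_5597 = 13.6e-6           # 13.6 uM
ka_testosterone = 1.1e9       # M^-1, from [20]

rba_5597 = relative_binding_affinity(ic50_testosterone, ic50_5597)
ka_5597 = estimate_ka(rba_5597, ka_testosterone)
print(f"RBA vs testosterone: {rba_5597:.2e}")
print(f"Estimated Ka: {ka_5597:.2e} M^-1")
```

Under these assumptions the estimate for 5597 comes out at about 1.26 × 10⁶ M⁻¹, of the same order as the calculated Ka discussed below.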
Discussion
The competitive displacement curves in Figure 7 illustrate the range of potencies of various compounds as assessed in the competitive steroid-binding assay. The affinities of all of these compounds are more than two orders of magnitude lower than that of estradiol. However, the potencies of 2623 and 5597 exceed those of other synthetic and natural non-steroidal compounds with endocrine disrupting properties, which typically bind to the SHBG steroid-binding site with association constants in the range of 0.02 × 10⁵ M⁻¹ to 8 × 10⁵ M⁻¹ [18, 36, 37]. To date, the highest association constant for a non-steroidal ligand of SHBG (Ka = 3.2 × 10⁶ M⁻¹) has been reported for (-)-3,4-divanillyltetrahydrofuran, which belongs to the class of natural lignans [18], and the calculated Ka (1.20 × 10⁶ M⁻¹) for compound 5597 is very close to that value. Moreover, the structure of 5597 is much easier to produce synthetically than a lignan, and provides more opportunities for further structural modifications that could increase binding affinity to SHBG. Figure 8a shows a superimposition of the molecular structures of 5597 and (-)-3,4-divanillyltetrahydrofuran. This illustrates not only that the overall shapes of the two molecules are similar, but also that the critical chemical groups required for SHBG binding can be positioned very closely in space. It can also be expected that the non-steroidal compounds we have identified reside within the SHBG steroid-binding pocket in essentially the same way as the physiologically important sex steroid ligands. Figure 8b shows the structure of 5597 docked into the active site of SHBG and superimposed against bound DHT. In this orientation, 5597 reproduces all critical binding features of a sex steroid. Interestingly, two of the non-steroidal ligands of SHBG we have identified (2623 and 2593) are structurally very similar but clearly differ in their binding affinities.
The main structural difference between them is the presence of a methoxy group adjacent to a hydroxyl group on a phenolic ring structure, which seems to be associated with an increase in binding affinity. This is interesting because there is a very similar difference in the RBAs of estradiol and 2-methoxy-estradiol for human SHBG [20], and this suggests that the phenolic ring structures of compounds 2623 and 2593 may reside within the SHBG steroid-binding site in the same orientation as ring A of an estrogen molecule. The structure of one of the most active compounds (5597) we have identified as a ligand of human SHBG is also quite similar to a dicyclohexane derivative (Figure 8c features a superimposition of their structures) that binds with relatively high affinity to the rat testicular ABP, and causes azoospermia when administered to sexually mature rats [38, 39]. This dicyclic derivative does not bind with high affinity to human SHBG [38, 39], and this likely reflects species differences in the topography of the SHBG steroid-binding site, which undoubtedly contribute to the unique steroid-binding properties of SHBGs in different species [20]. The identification of 5597 as a human SHBG ligand indicates that other structurally-related compounds probably exist with even higher affinities for human SHBG, and they could be predicted by applying the computational solutions we have developed. In this context, very good consensus docking scores of below -8.2 were obtained for seven of the eight novel SHBG ligands we have identified. Although the virtual docking approach resulted in numerous false positive predictions of SHBG ligands, three inactive compounds (3838, 1898 and 2099) were correctly rejected by the virtual docking. Although several other compounds (3305, 4944, 5607, 1824, 2835, 5537, 5098, 1235, 4819, 479 and 938) were falsely predicted as good SHBG binders by the QSAR model, QSAR ranking did not produce any false negative predictions.
All seven compounds (9808, 823, 1372, 5605, 1924, 298, 422) that scored below the 0.10 threshold in the QSAR model were selected for in vitro testing due to acceptable docking scores, but they did not demonstrate significant binding activity. The QSAR model assigned very significant network outputs above 0.5 to four of the most active compounds, which produce a >35% reduction of [3H] DHT binding to SHBG in the screening assay (compounds 696, 2593, 2623, 2228). This model also resulted in output values above 0.10 for four other molecules (6767, 8590, 5636, 5597) that were subsequently shown to bind to SHBG. By contrast, it would seem that the ANN tended to over-rank chemical structures having several non-aromatic rings, as all 11 top-ranked substances contain two or more of them. However, this was not unexpected since the ANN was trained on natural steroids (containing several conjoined aliphatic rings) as the strongest SHBG binders. On one hand, this provides several advantages because the condensed polycyclic compounds we have identified (2593, 696, 2228 and 2623) all demonstrated good SHBG binding properties. On the other hand, the ANN did not rank some of the most active substances, such as 5597, 5636, 6767 and 8590, very highly, and this is likely due to the fact that those molecules contain aromatic groups which may resemble non-steroidal structures used as the ‘negative control’ in the ANN model training. Nonetheless, the trained ANN model ranked the corresponding chemical structures highly enough to select them for purchasing and testing. Overall, based on the results of experimental testing, we conclude that QSAR ranking allowed efficient prioritization and selection of the chemicals. At the same time, one should keep in mind that the use of an ANN does not allow the contributions from individual QSAR descriptors to be interpreted.
On the other hand, the developed binary QSAR model is exceptionally fast compared to docking and other ‘in silico’ approaches, and it represents an excellent complementary technique that enhances the power of pharmacophore methods. In future studies it can also be coupled with programs for combinatorial synthesis/library design to carry out rounds of lead optimization. In upcoming studies we will also attempt to develop QSAR solutions based on statistical, rather than machine-learning, algorithms that will provide more insight into the SAR. Nonetheless, it should also be stressed that the ‘in silico’ procedure we have developed in the current work has already allowed a more than 25% recovery rate of novel micromolar inhibitors of DHT binding to SHBG, and this significantly exceeds standard success rates of conventional ‘in silico’ protocols.
Figure 8. Structural alignment of compound 5597 with other SHBG ligands.
8a) Structure of the most active identified non-steroidal SHBG blocker 5597 (green scaffold) flexibly aligned with (-)-3,4-divanillyltetrahydrofuran, the best known non-steroidal SHBG blocker (scaffold coloured by elements).
8b) Structure of 5597 in its SHBG-docked configuration (green) superimposed with the experimentally determined orientation of DHT in the SHBG binding site (red)
8c) Structure of 5597 (green scaffold) flexibly aligned with a dicyclohexyl derivative previously shown to have high binding affinity toward the rat ABP protein (scaffold coloured by elements)
Conclusions
The computational approach we have developed resulted in the identification of 105 prospective compounds from a collection of 23,836 natural substances that were prioritized for in vitro testing by QSAR modeling and virtual docking techniques. This procedure resulted in the identification of eight novel non-steroidal ligands able to displace DHT, the natural ligand with the highest known affinity for human SHBG, at low micromolar concentrations. We also conclude that the combination of rather stringent QSAR ranking with more ‘relaxed’ docking criteria provides the most efficient predictive power. In fact, the results indicate that every fourth compound selected for in vitro testing by the ‘in silico’ pipeline we have implemented proved to be a micromolar inhibitor of the target protein. Moreover, the eight most active non-steroidal SHBG ligands we have identified belong to four distinct molecular scaffolds with several available substitution positions. Hence, there is potential to improve the binding activity of these lead compounds through further chemical modification. It is also anticipated that the adopted ‘in silico’ procedure will undergo further methodological development and enhancement: additional ‘inductive’ QSAR descriptors are under development, and more advanced machine learning and statistical techniques are being tested for QSAR modeling with ‘inductive’ parameters (as the latter will allow the contributions of individual ‘inductive’ descriptors to be interpreted and, thus, better guide lead optimization/synthetic efforts). It is also feasible to enhance the ‘in silico’ pipeline with such sensitive approaches as CoMFA/CoMSIA, which may accelerate further lead optimization. We also plan to expand the developed approach to other areas of therapeutics and expect that such an endeavour may lead to the identification of novel drug leads.
Table 5. Structures of the three most active non-steroidal lead compounds superimposed by the flexible alignment with DHT molecule (second column) and an estrogen molecule (third column) and the corresponding values of the SHBG dissociation constants and RBA parameters.
References
1. Hammond, G.L. Potential functions of plasma steroid-binding protein. Trends Endocrinol Metab. 1995, 6, 298-304.
2. Joseph, D.R. Structure, function, and regulation of androgen-binding protein/sex hormone-binding globulin. Vitamins and Hormones. 1994, 49,197-280.
3. Siiteri, P.K., Murai, J.T., Hammond, G.L., Nisker, J.A., Raymoure, W.J., Kuhn, R.W. The serum transport of steroid hormones. Recent Prog Horm Res. 1982, 38, 457-510.
4. Nakhla, A.M., Rosner, W. Stimulation of prostate cancer growth by androgens and estrogens through the intermediacy of sex hormone-binding globulin. Endocrinology. 1996, 137, 4126-4129.
5. Nisker, J.A., Hammond, G.L., Davidson, B.J., Frumar, A.M., Takaki, N.K., Judd, H.L. Siiteri, P.K. Serum sex hormone-binding globulin capacity and the percentage of free estradiol in postmenopausal women with and without endometrial carcinoma. A new biochemical basis for the association between obesity and endometrial carcinoma. Am J Obstet Gynecol. 1980, 138, 637-642.
6. Hogeveen, K.N., Cousin, P., Pugeat, M., Dewailey, D., Soudan, B., Hammond, G.L. Human sex hormone-binding globulin variants associated with hyperandrogenism and ovarian dysfunction. J Clin Invest. 2002, 109, 973-981.
7. Anderson, D.C. Sex-hormone-binding globulin. Clin Endocrinol. 1974, 3, 69-96.
8. Van Pottelbergh, I., Goemaere, S., Zmierczak, H., Kaufman, J.M. Perturbed sex steroid status in men with idiopathic osteoporosis and their sons. J Clin Endocrinol Metab. 2004, 89, 4949-4953.
9. Rapuri, P.B., Gallagher, J.C., Haynatzki, G. Endogenous levels of serum estradiol and sex hormone binding globulin determine bone mineral density, bone remodeling, the rate of bone loss, and response to treatment with estrogen in elderly women. J Clin Endocrinol Metab. 2004, 89, 4954-4962.
10. Lindstedt, G., Lundberg, P-A., Lapidus, L., Lundgren, L., Björntrop, P. Low sex-hormone-binding globulin concentration as independent risk factor for development of NIDDM. 12-yr follow-up of population study of women in Gothenburg, Sweden. Diabetes. 1991, 40, 123-128.
11. Haffner, S.M., Katz, M.S., Stern, M.P., Dunn, J.F. Association of decreased sex hormone binding globulin and cardiovascular risk factors. Arteriosclerosis. 1989, 9, 136-143.
12. Virtual Screening for Bioactive Molecules (Methods and Principles in Medicinal Chemistry). Böhm, H-J., Schneider, G., Kubinyi, H., Mannhold, R., Timmerman, H. Eds., Wiley, New York, 2000.
13. Klebe, G. Virtual Screening: An Alternative or Complement to High Throughput Screening? Kluwer, Dordrecht, 2000.
14. Guner, O.F. Pharmacophore Perception, Development, and Use in Drug Design (Iul Biotechnology Series). International University Line, La Jolla, 2000.
15. Cherkasov, A. ‘Inductive’ Descriptors. 10 Successful Years in QSAR. Curr Comp Aided Drug Design. 2005, 1, 21-42.
16. Cherkasov, A. Inductive QSAR Descriptors. Distinguishing Compounds with Antibacterial Activity by Artificial Neural Networks. Intern J Mol Sci. 2005, 6, in press.
17. Cherkasov, A., Jankovic, B. Application of ‘Inductive’ QSAR Descriptors for Quantification of Antibacterial Activity of Cationic Polypeptides. Molecules, 2004, 9, 1034-1052.
18. Schottner, M., Gansser, D., Siteller, K.M. Lignans interfering with 5 alpha-dihydrotestosterone binding to human sex hormone-binding globulin. J Nat Products. 1997. 61, 119-121.
19. Newman, D.J., Cragg, G.M., Snader, K.M. Natural products as sources of new drugs over the period 1981-2002. J Nat Products. 2003, 66, 1022-1037.
20. Westphal, U. Steroid-protein Interaction II. Monographs in Endocrinology, Heidelberg: Springer-Verlag, Berlin. 1986.
21. Xu, J., Stevenson, J. Drug-like index: a new approach to measure drug-like compounds and their diversity. J Chem Inf Comp Sci. 2000, 40, 1177-1187.
22. MOE: Molecular Operating Environment; Version 2003.10, Chemical Computing Group Inc., Montreal, Canada, 2004.
23. Catalyst; Version 4.8, Accelrys Inc., San Diego, 2004.
24. Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. J Chem Inf Comp Sci. 2003, 43, 2039-2047.
25. Cherkasov, A.R., Galkin, V.I., Cherkasov, R.A. "Inductive" Electronegativity Scale. 2. "Inductive" Analog of Chemical Hardness. J Mol Struct Theochem. 2000, 497, 115-123.
26. Cherkasov, A.R., Galkin, V.I., Cherkasov, R.A. A New Approach to the Theoretical Estimation of Inductive Constants. J Phys Org Chem. 1998, 11, 437-447.
27. Cherkasov, A.R., Galkin, V.I., Cherkasov, R.A. "Inductive" Electronegativity Scale. J Mol Struct Theochem. 1999, 489, 43-46.
28. Asinex Gold Collection, Asinex Ltd., Moscow, 2004.
29. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. The Protein Data Bank. Nucl Acid Res. 2000, 28, 235-242.
30. Grishkovskaya, I., Avvakumov, G.V., Hammond, G.L., Catalano, M.G., Muller, Y.A. Steroid ligands bind human sex hormone-binding globulin in specific orientations and produce distinct changes in protein conformation. J Biol Chem. 2002, 277, 32086-32093.
31. Avvakumov, G.V., Grishkovskaya, I., Muller, Y.A., Hammond, G.L. Crystal structure of human sex hormone-binding globulin in complex with 2-methoxyestradiol reveals the molecular basis for high affinity interactions with C-2 derivatives of estradiol. J Biol Chem. 2002, 277, 45219-45225.
32. Grishkovskaya, I., Avvakumov, G.V., Hammond, G.L., Muller, Y.A. Resolution of a disordered region at the entrance of the human sex hormone-binding globulin steroid-binding site. J Mol Biol. 2002, 318, 621-626.
33. Hammond, G.L., Avvakumov, G.V., Muller, Y.A. Structure/function analyses of human sex hormone-binding globulin: effects of zinc on steroid-binding specificity. J. Steroid Biochem Mol Biol. 2003, 85, 195-200.
34. Avvakumov, G.V., Muller, Y.A, Hammond, G.L. Steroid-binding specificity of human sex hormone-binding globulin is influenced by occupancy of a zinc-binding site. J Biol Chem. 2000, 275, 25920-25925.
35. Glide; Version 2.7, Schrödinger Inc., San Diego, 2004.
36. Dechaud, H., Ravard, C., Claustrat, F., de la Perriere, A.B., Pugeat, M. Xenoestrogen interaction with human sex hormone-binding globulin (hSHBG). Steroids, 1999, 64, 328-334.
37. Hogert, J., Zacharewski, T.R., Hammond, G.L. Interactions between human plasma sex hormone-binding globulin and xenobiotic ligands. J Steroid Biochem Mol Biol, 2000, 75, 167-176.
38. Rousseau, G.G., Rolin Jacquemyns, C.F., Sirett, D.A.N., Huybrechts, M., De Coen, J.L., Quivy, J.I. Inhibition of steroid-protein interactions by dicyclohexane derivatives. J Steroid Biochem. 1988, 31, 691-697.
39. Rousseau, G.G., Quivy, J.I., Kirchnoff, J., Bui, X-H., Devis, R. Nonsteroidal compounds which bind epididymal androgen-binding protein but not the androgen receptor. Nature. 1980, 284, 458-459.
40. SNNS: Stuttgart Neural Network Simulator; Version 4.0, University of Stuttgart, 1995.
41. Maestro; Schrödinger Inc., San Diego, 2004.
42. Halgren, T.A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization and Performance of MMFF94. J Comp Chem. 1996, 17, 490-519.
43. Hammond, G.L., Lahteenmaki, P.L. A versatile method for the determination of serum cortisol binding globulin and sex hormone binding globulin binding capacities. Clin Chim Acta, 1983, 132, 101-110.
5. Can ‘Bacterial-Metabolite-Likeness’ Model Improve Odds of ‘In silico’ Antibiotic Discovery?
In a series of previous works we reported the development of 3D-sensitive QSAR descriptors called ‘inductive’ and demonstrated their successful application in a number of molecular modeling studies, including quantification of the antibacterial activity of organic compounds [1] and cationic peptides [1,2], computation of partial charges in small molecules [3] and proteins [4], comparative docking analysis [4,5], and ‘in silico’ lead discovery [4,5]. A detailed description of the ‘inductive’ QSAR descriptors and their rationale can be found in a recent review [1]. In summary, all ‘inductive’ QSAR parameters are related to atomic electronegativity (χ), covalent radii (R) and intramolecular distances (r), and can be derived from the formulas for the steric Rs and inductive σ* parameters (equations (1)-(2)), ‘inductive’ electronegativity χ (equation (3)), ‘inductive’ partial charge (equation (4)) and ‘inductive’ analogues of chemical hardness η and softness s (equations (5) and (6) respectively):
where the variables indexed with the subscript j describe the influence of a single atom on a group of atoms G (typically the rest of an N-atomic molecule), while G indices designate group (molecular) quantities. The linear character of equations (1)-(6) makes ‘inductive’ descriptors readily computable and suitable for sizable databases, and positions them as appropriate parameters for large-scale models of ‘drug-likeness’, ‘antibiotic-likeness’, etc. Such binary QSAR classifiers represent an emerging topic of ‘in silico’ research that assists virtual screening studies, combinatorial library design and large-scale data mining [6-13].
A variety of QSAR descriptors as well as statistical and machine-learning techniques have previously been used to solve the drug/non-drug separation problem and to create ‘drug-like’, ‘agrochemical-like’, ‘lead-like’ and ‘natural product-like’ binary classifiers [6-13]. A number of QSAR models have been reported that distinguish between the 249 antibacterials and the general drugs present in the Tomas-Vert dataset [14-16]. Several related ‘antibiotic-likeness’ modeling studies have also been conducted on smaller sets of antimicrobials [18-20].
For the purpose of the current study we assembled an extended molecular set consisting of 525 antimicrobials, 959 general drugs and 1202 drug-like substances and used them to evaluate applicability of ‘inductive’ descriptors for large-scale modeling of ‘antibiotic-likeness’ properties and related QSAR categories.
Results and Discussion
Model (a): Distinguishing Antimicrobials from Drugs.
1484 input patterns corresponding to 30 ‘inductive’ descriptors for the studied antimicrobials and general therapeutics have been used to train a 30-10-1 configured ANN (the structure is shown in Figure 1). As has been previously described, the network outputs have been set to 1.0 for antimicrobial compounds and to 0.0 for all drugs in the training set (containing 70% of all patterns), and 20 independent training runs have been carried out. After each run, the resulting network weights have been used to assess the ANN performance on the testing set consisting of the remaining 30% of the input patterns. The counts of the false/true positive and negative predictions have been estimated using a 0.5 cut-off for the ANN outputs, and the resulting values of Specificity, Sensitivity, Accuracy and the Positive Predictive Value (PPV) have been collected in Table 2. The averaged 30-10-1 ANN outputs for antimicrobials and drugs can also be found in the Supplementary materials file. Table 2 features averaged statistics for the training (70% of the inputs) and testing (30% of the inputs) network runs, as well as accuracy parameters for ANN classification of the entire (100%) set of antimicrobials and drug substances. The estimated values demonstrate that the use of ‘inductive’ QSAR descriptors results in up to 92% accurate prediction of antimicrobial activity among the studied compounds. Such accuracy is similar or superior to the results of several similar ‘antibiotic-likeness’ studies conducted on smaller sets of molecular structures, where the overall accuracy ranged from 78% [16] to 98% [18] depending on the QSAR methodology, the size of the molecular set and the validation techniques used.
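The four statistics reported in Table 2 follow directly from the confusion counts at the chosen cut-off. The snippet below is a minimal sketch of that bookkeeping, not the authors' code: the function names are hypothetical, and treating an output exactly equal to the cut-off as positive is an assumption, since the text only states that a 0.5 cut-off was used.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # antimicrobials predicted as antimicrobials
    fp: int  # non-antimicrobials predicted as antimicrobials
    tn: int  # non-antimicrobials predicted as non-antimicrobials
    fn: int  # antimicrobials predicted as non-antimicrobials


def confusion_at_cutoff(outputs, labels, cutoff=0.5):
    """Count true/false positives/negatives for ANN outputs in [0, 1].

    labels are 1 (antimicrobial) or 0 (negative control).
    Assumption: an output >= cutoff counts as a positive prediction.
    """
    tp = fp = tn = fn = 0
    for out, label in zip(outputs, labels):
        predicted_positive = out >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def classification_stats(c):
    """Sensitivity, Specificity, Accuracy and PPV from confusion counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "ppv": c.tp / (c.tp + c.fp),
    }


# Toy example with made-up outputs (not data from the study)
toy_outputs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_confusion = confusion_at_cutoff(toy_outputs, toy_labels)
toy_stats = classification_stats(toy_confusion)
```

In the study, such statistics would be averaged over the 20 independent runs, separately for the training and the testing portions.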
Model (b): Distinguishing Antimicrobials from Drug-like Substances.
When the 30-10-1 neural network was trained on 525 antimicrobials and 1202 drug-like substances (using the same set of ‘inductive’ descriptors as in the previous example), the resulting separation accuracy reached 96% during the training phase and 92% for the testing runs. The averaged Specificity, Sensitivity, Accuracy and PPV values established with a 0.5 cut-off have also been collected in Table 2. The estimated parameters illustrate that the ANN can separate antimicrobials from drug-like chemicals more effectively than from actual drugs. The less accurate separation of antimicrobials from general drugs may be explained by the fact that some conventional non-antibiotic therapeutics can possess secondary antibacterial activity. Indeed, there have been reports on the antimicrobial potential of numerous types of conventional drugs [30-39] and, therefore, it is possible that some compounds from the ‘negative control’ group used in model (a) can act as antimicrobials.
Model (c): Distinguishing Antimicrobials from all others
The 30-10-1 network trained and tested on 525 antimicrobials and 2161 non-antimicrobial compounds of the ‘negative control’ group (corresponding to the combined set of general drugs and drug-likes) allowed 92-94% accurate separation of the two activity classes. The corresponding values of average accuracy can be found in Table 2 along with all other statistics characterising the performance of the developed QSAR model on the entire set of 2686 substances. These statistical parameters demonstrate that QSAR model (c) performs roughly as an average of models (a) and (b), confirming the overall good ability of ‘inductive’ QSAR parameters to separate antimicrobials, drugs and drug-like compounds.
Model (d): Distinguishing Antimicrobials versus Drugs versus Drug-likes.
Considering that ‘inductive’ descriptors could confidently distinguish antimicrobials from drugs and from drug-like chemicals, we attempted to create a neural network with two output nodes enabling simultaneous recognition of all three activity types. The expectation was that such an ANN configuration would produce more robust predictions and would reduce the number of false positives. As described in the ‘Materials and Methods’ section, we assigned two dependent variables to the 2686 molecules under study. Namely, 525 substances have been assigned activity values [1.0; 1.0], corresponding to their classification as antimicrobials and as drugs; 959 molecules have been associated with [0.0; 1.0] outputs, reflecting the absence of antimicrobial activity and the presence of drug activity; the remaining 1202 drug-like chemicals were considered ‘double inactive’ and have been assigned [0.0; 0.0] values. We have adopted a 30-10-2 configuration for the ANN (more details can be found in Figure 1 and its legend) and trained it in the same manner as described for the previous models (a)-(c). The network outputs have also been interpreted with a 0.5 threshold, and an ANN prediction has been considered true only when both output values were correct. The performance of the 30-10-2 network has also been assessed with Specificity, Sensitivity, Accuracy and PPV values. As can be seen from the data in Table 2, the developed model resulted in ~98% specific classification of the three types of activity, as the number of false positive predictions has been considerably reduced compared to the previously discussed models that utilized the 30-10-1 ANN configuration. The overall prediction accuracy of the network with two output nodes also appeared very high (93% and 91% for the training and testing phases, respectively).
Thus, the results of classification of the three classes of compounds by the developed binary QSAR models demonstrate that a limited set of ‘inductive’ descriptors can adequately capture those structural features of the studied chemicals that are relevant for their antimicrobial and drug behaviours. This adequacy can possibly be explained by the fact that ‘inductive’ QSAR parameters cover a broad range of properties of bound atoms and molecules related to their size, polarizability, electronegativity, and electronic and steric interactions. Thus, despite certain drawbacks, such as the ‘black-box’ nature of ANN solutions, the developed QSAR models demonstrate very high prediction accuracy and can be suggested as effective tools for large-scale QSAR studies and for the discovery of antimicrobial compounds. On the other hand, most of the effective antimicrobial leads identified to date are either derived from or similar to substances naturally involved in bacterial metabolism. Thus, to enable more effective virtual screening for antibiotic candidates, we attempted to develop a QSAR model for ‘Bacterial-Metabolite-Likeness’.
Figure 1. Neural Networks used for distinguishing Antibacterial compounds from a combined set of Drugs and non-Drugs (network A) and for classification of all 3 groups of compounds (network B). The edge colour coding corresponds to the assigned weights (ranging from red, -1, to green, +1); the node colours designate the last passed value, scaled from 0 to 1 (from blue to bright green).
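The two-output coding and the "both outputs must be correct" rule can be sketched as follows. This is an illustrative sketch only: the class labels, function names and the choice of counting an output equal to the threshold as 1.0 are assumptions, while the target vectors and the 0.5 threshold come from the text.

```python
# Target vectors [antimicrobial; drug] as assigned in the text
TARGETS = {
    "antimicrobial": (1.0, 1.0),  # antimicrobial and drug
    "drug": (0.0, 1.0),           # drug without antimicrobial activity
    "drug_like": (0.0, 0.0),      # 'double inactive'
}


def binarize(outputs, threshold=0.5):
    """Turn the two raw ANN outputs into a 0/1 pair.

    Assumption: an output >= threshold is read as 1.0.
    """
    return tuple(1.0 if o >= threshold else 0.0 for o in outputs)


def is_true_prediction(outputs, compound_class, threshold=0.5):
    """A prediction counts as true only if BOTH outputs match the target."""
    return binarize(outputs, threshold) == TARGETS[compound_class]


def fraction_correct(predictions, threshold=0.5):
    """predictions: list of (two_outputs, compound_class) pairs."""
    hits = sum(is_true_prediction(o, c, threshold) for o, c in predictions)
    return hits / len(predictions)


# Toy outputs (made up, not taken from the study)
toy_predictions = [
    ((0.8, 0.9), "antimicrobial"),  # both outputs correct
    ((0.8, 0.3), "antimicrobial"),  # second output wrong
    ((0.2, 0.7), "drug"),           # both outputs correct
    ((0.1, 0.6), "drug_like"),      # second output wrong
]
toy_fraction = fraction_correct(toy_predictions)
```

Requiring agreement on both nodes is stricter than scoring each node separately, which is consistent with the reduction in false positives reported for the 30-10-2 network.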
QSAR model for Bacterial Metabolites.
According to the formerly effective Section 507(a) of the Federal Food, Drug and Cosmetic Act [40,41], antibiotics have been defined as “…any drug intended for use by man containing any quantity of any chemical substance which is produced by a microorganism and has the capacity to inhibit or destroy microorganisms in dilute solution…”. This definition reflected the fact that historically, bacterial metabolites (and their chemically synthesised equivalents) served as the main source of antibiotic leads. Thus, the development of an effective ‘in silico’ tool for assessing the resemblance of a lead to natural bacterial metabolites represents an important task. We employed the QSAR methodology described for models (a)-(d) to develop a binary classifier operating on bacterial metabolites. First of all, we assembled a set of 565 compounds recently isolated from bacteria and characterized by the Analyticon Discovery Company [42]. The names and the SMILES records for most of the studied bacterial metabolites can be found in the Supplementary materials (though some metabolite structures have not been disclosed, as they can only be obtained from Analyticon under a non-disclosure agreement). The previously described 1202 drug-like substances from the Asinex Gold collection have been selected as the negative control for building the ‘Bacterial-Metabolite-Like’ (BML) model. Using SMILES-based comparison and ‘inductive’ descriptor-based clustering, we ensured that the bacterial metabolites dataset does not contain duplicates of the above described 2686 chemical structures. As in the previous cases, we calculated the 30 ‘inductive’ descriptors featured in Table 1 for all metabolites and assigned a dependent value of 1.0 to all active entries and 0.0 to the negative control (drug-like substances). The training of the 30-10-1 network has likewise been carried out using a 70% to 30% split into training and testing sets.
20 independent runs have been conducted, and the separation of active and non-active molecules by the resulting ANN solutions has been assessed using a 0.5 output threshold. The estimated average Accuracy, Specificity, Sensitivity and PPV, reflecting the ability of the developed model to distinguish bacterial metabolites from the drug-likes (derived from the Asinex Gold collection [24]), have been collected in Table 2. The prediction accuracy for the training and the testing runs was in the range of 92-93%, which demonstrates the very good discriminative ability of ‘inductive’ descriptors and characterizes the resulting ANN-based model of ‘Bacterial-Metabolite-Likeness’ (BML) as adequate. The developed BML model has then been applied to a validation set of 2686 molecular structures consisting of conventional antimicrobials, drugs and drug-like compounds. In this case, however, we did not use the original set of 1202 drug-like substances that had previously been used for training the BML model. Instead, we assembled an ‘external’ set of 1202 drug-like structures randomly drawn from the National Cancer Institute (NCI) dataset [43]. To derive drug-like substances from the NCI dataset, we employed the same criteria as described in the previous sections. The non-redundancy of the selected 1202 NCI structures has been ensured through their SMILES records; the structures of the compounds have been optimized with the MMFF94 force field [25], and all 30 inductive parameters previously used in the BML model training have been computed. Thus, the resulting set used for assessing the performance of the BML model included 525 antimicrobial compounds, 959 general drugs and 1202 NCI drug-like chemicals, which can also be found in the ‘Supplementary materials’ file. The objective of applying the BML model to such a dataset was to investigate the overall similarity of the three classes of chemicals (antimicrobials, drugs and drug-likes) to native bacterial metabolites.
The patterns of 30 ‘inductive’ descriptors for each of the 2686 molecular structures have been passed through the pre-trained BML model to produce a single network output that can be considered as the likelihood of the corresponding compound being a metabolite. The generated ANN predictions for the 525 antimicrobials, 959 drugs and 1202 NCI drug-like chemicals (available in the Supplementary materials) have then been classified with a 0.5 cut-off and transformed into the corresponding confusion matrices (false/true positives/negatives) as well as the percent yield (%Y), percent accurate (%A), enrichment factor (E) and goodness of hit list (GH) parameters customary for in silico screening studies:
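The repeated 70%/30% partitioning used for these models can be sketched as below. This is a minimal sketch under stated assumptions: the text does not say whether the split was stratified by class or how the random partitions were drawn, so a plain random shuffle with one seed per run is assumed, and the function names are hypothetical.

```python
import random


def split_70_30(patterns, seed):
    """Randomly split a list of patterns into 70% training and 30% testing.

    Assumption: a simple unstratified shuffle; the original protocol may differ.
    """
    rng = random.Random(seed)
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def repeated_splits(patterns, n_runs=20):
    """One independent 70/30 split per run (20 runs in the text)."""
    return [split_70_30(patterns, seed) for seed in range(n_runs)]


# 565 bacterial metabolites (label 1.0) + 1202 drug-like substances (label 0.0);
# the integer stands in for a compound's descriptor pattern
bml_patterns = [(i, 1.0) for i in range(565)] + [(i, 0.0) for i in range(565, 1767)]
bml_runs = repeated_splits(bml_patterns)
first_train, first_test = bml_runs[0]
```

Each run would then train a fresh network on the training part and score it on the held-out part, with the reported statistics averaged over the runs.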
where Ht is the total number of compounds in the hit list (in our case, ANN outputs exceeding the threshold), Ha is the number of known actives in the hit list (true positives), A is the number of active compounds in the database, and D is the number of compounds in the database. These parameters have been computed using different ANN threshold values and have been collected in Table 3 for the cases when antimicrobials, drugs and drug-likes, respectively, have been considered as the active substances. By doing so we could evaluate the ability of the BML model to recognize these three groups of compounds in a mixed pool of chemicals. Figure 2 features the compositions of the resulting BML hit lists produced with the 0.5, 0.6 and 0.7 output thresholds, where the numbers of identified antimicrobials, drugs and drug-likes are placed into separate circles colour-coded by the number of constituent hits. Panels b)-d) of Figure 2 illustrate that the produced BML hit lists contain up to 56%, 61% and 70% of antimicrobial compounds for the 0.5, 0.6 and 0.7 thresholds respectively. These data also graphically illustrate the fact that the 30-10-1 neural network pre-trained on bacterial metabolites can distinguish antibacterials from other substances with 83% accuracy, while general drugs could only be recognized by the BML model with 59% accuracy (see Table 3 for more details). The application of the ‘Bacterial-Metabolite-Like’ criteria to drug-like substances produced only a few hits, resulting in 35% prediction accuracy. Thus, only 34, 22, 10 and 7 compounds from the set of 1202 NCI drug-like substances have been recognized by the BML model when the 0.5, 0.6, 0.7 and 0.8 ANN thresholds, respectively, have been applied.
To summarize this section, the developed BML model is 2.6 to 5 times more likely to recognize antimicrobial compounds than general drugs, and 18 to 45 times more likely to recognize antimicrobials than mere drug-like substances (when judged by the corresponding enrichment factors). On one hand, these findings clearly demonstrate that there exists a definite similarity between conventional antimicrobial therapeutics and native bacterial metabolites. On the other hand, the results characterise the developed ‘Bacterial-Metabolite-Likeness’ model as a potentially useful additional QSAR tool for in silico antibiotic discovery. Unfortunately, the ANN nature of the developed solutions and the ‘inductive’ QSAR descriptors used do not allow direct identification of easily interpretable factors of intra- and/or intermolecular interactions that distinguish bacterial metabolites from drugs and drug-like substances. We can only speculate that certain molecular features are likely to promote the uptake of chemicals by bacteria, enhance their ability to penetrate bacterial cell walls, or allow the compounds to fit the specific physical-chemical environment of bacterial cells. What is certain, however, is that it is possible to construct effective QSAR models distinguishing bacterial metabolites from other chemical substances, and therefore the current study may lay the foundations for further broad investigation of BML systems. A thorough investigation of bacterial metabolites is thus required in order to derive dependences similar to Lipinski’s rule, or to define special patterns of structural/fragmental features (such as the ratio of hetero-atoms, number of rings and number of chiral centers, among others), as has previously been done to distinguish natural products, drugs and combinatorial chemicals [44].
It is anticipated that identification of such distinguishing factors of ‘bacterial-metabolite-likeness’ will help in developing new antibiotics that more closely resemble native bacterial compounds and therefore may have enhanced potency or cause less pathogen resistance. Such investigations are currently under way.
Interpretation of some False Positive Predictions.
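The hit-list statistics named above can be computed from Ht, Ha, A and D as in the sketch below. The formulas are assumed to be the forms commonly used in virtual-screening work (the Güner-Henry definitions), not taken from this text; in that convention %A is the percentage of all actives that are retrieved. The function name and the numbers in the example are hypothetical.

```python
def screening_metrics(Ha, Ht, A, D):
    """Hit-list quality measures for a virtual screen.

    Ha: known actives in the hit list (true positives)
    Ht: total compounds in the hit list
    A:  active compounds in the database
    D:  total compounds in the database
    Assumed (Güner-Henry style) definitions:
      %Y = 100 * Ha / Ht
      %A = 100 * Ha / A
      E  = (Ha / Ht) / (A / D)
      GH = [Ha * (3A + Ht) / (4 * Ht * A)] * [1 - (Ht - Ha) / (D - A)]
    """
    percent_yield = 100.0 * Ha / Ht
    percent_actives = 100.0 * Ha / A
    enrichment = (Ha / Ht) / (A / D)
    goodness_of_hit = (Ha * (3 * A + Ht)) / (4 * Ht * A) * (1 - (Ht - Ha) / (D - A))
    return {
        "%Y": percent_yield,
        "%A": percent_actives,
        "E": enrichment,
        "GH": goodness_of_hit,
    }


# Made-up example: 100 hits, 50 of them active, 200 actives among 1000 compounds
example_metrics = screening_metrics(Ha=50, Ht=100, A=200, D=1000)
```

An enrichment factor above 1 means the hit list is richer in actives than the database as a whole; comparing E across the three "active" classes is what supports the statement that antimicrobials are recognized preferentially.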
Figure 2. Composition of the studied molecular dataset (a) and distribution of active hits produced by the developed ‘BML-model’ at various ANN thresholds (b-d).
As it has been previously mentioned, some conventional drugs and/or untested chemicals may possess profound but unrecognized antimicrobial activity. On one hand, such possibility may lower accuracy of ‘antibiotic-likeness’ models (as our own results indicate, even highly specific 30-10-2 Neural Network has produced significant ‘antibiotic-like’ predictions for a number of conventional drugs that don’t have any antimicrobial annotation). On another hand, some of these ‘false positive’ predictions may posses non-appreciated antibiotic potential and, thus, may be considered as interesting antimicrobial leads. Moreover, some of the studied conventional therapeutics produced significant scores by both ‘antibiotic-likeness’ and ‘bacterial-metabolite-likeness’ models, what makes these substances even more attractive for further testing. Thus, Table 4 features three conventional therapeutics – Lovastatin, Gentisic acid and Olivomycin A, that all exhibited significant ‘antibiotic-likeness’ and ‘bacterial-metabolite-likeness’ potentials when processes by the developed QSAR models. Moreover, all these three compounds have structurally similar bacterial metabolites present in the BML training set (also featured in Table 4). The structural similarity to bacterial metabolites has been established by the descriptor-based clustering using 30 ‘inductive’ parameters and Tanimoto similarity criteria. In particular, the bacterial metabolite NP-007587 appeared to be Lovastatin’s steric isomer. This observation is not too surprising considering that Lovastatin has originally been discovered as a fungal metabolite isolated from a strain of Aspergillus terreus [45]. Another compound featured in Table 4 - the Gentisic acid (scored 0.7456 as bacterial metabolite-like and 0.6605 as antimicrobial-like) closely resembles a bacterial substance NP-001423 that differs only by one methyl group and a substitution position of hydroxyl radical. 
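A Tanimoto comparison of two descriptor vectors can be sketched as follows. The text does not state which form of the Tanimoto coefficient was applied to the 30 ‘inductive’ parameters, so the continuous (real-valued) form is assumed here, and any scaling of descriptors before comparison is left out; the function name and the example vectors are hypothetical.

```python
def tanimoto_continuous(a, b):
    """Continuous Tanimoto coefficient for two real-valued descriptor vectors.

    T(a, b) = a.b / (|a|^2 + |b|^2 - a.b)
    Assumption: both vectors have the same length and are not both all-zero.
    """
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    return dot / (norm_a + norm_b - dot)


# Toy 2- and 3-component vectors standing in for 30-descriptor patterns
identical = tanimoto_continuous([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = tanimoto_continuous([1.0, 0.0], [0.0, 1.0])
partial = tanimoto_continuous([1.0, 2.0], [2.0, 1.0])
```

Identical vectors give 1.0, so a drug whose descriptor pattern scores close to 1.0 against a metabolite in the BML training set would be flagged as a close analogue.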
Interestingly, Olivomycin A, annotated as ‘antineoplastic, cytostatic agent’ by the Merck Index Database [23] produced very significant 0.997 ANN output as a potential antimicrobial as yielded significant BML-score of 0.7219. This large compound appeared to have a close bacterial metabolite analogue – a compound NP-009248 that is also featured in Table 4. Naturally, we conducted literature search for any evidences of antimicrobial activities of Lovastatin, Gentisic acid and Olivomycin A. It turned out, that antilipemic drug Lovastatin (scoring 0.7091 out of 1.0 as bacterial metabolite and 0.5669 as antimicrobial) does, in fact, exhibit antibiotic potentials against a number of bacteria including Escherichia coli [46], Halobacterium holobiuim [47] and Halobacterium volcanii [48]. Anti-fungal [49-52] and anti-parazite [53-58] activities of Lovastatin have also been well documented with some studies reporting the minimal inhibitory concentration of Lovastatin being as low as 0.25?M [59] The analgesic; antiinflamatory agent Gentisic acid and its close derivatives demonstrated profound antimicrobial activity against a range of human pathogens including Campylobacter jejuni, Escherichia coli, Listeria monocytogenes, and Salmonella enterica [60-61]. Antimicrobial activity of Olivomycin A – a compound isolated from the fungus Actinomyces olivoreticuli has also been well documented in early antibiotic papers [62-64]. Olivomycins have even been annotated as antibiotics in several databases different from the Merck Index, such as ChemIDPlus [21] and the NIAID database [65]. Thus, all three conventional therapeutics having very similar bacterial metabolite analogues do demonstrate certain antimicrobial potential and, in principle, can serve as antimicrobial leads for future drug discovery attempts. 
A small number of other conventional drugs also produced significant ‘antibiotic-likeness’ and ‘bacterial-metabolite-likeness’ predictions, but those molecules did not have close structural analogues among experimentally determined bacterial metabolites. The possible antimicrobial activity of those substances will be investigated further by means of extensive literature and database searches and by experimental verification. What can be stated at present, however, is that the examples of Lovastatin, Olivomycin A and Gentisic acid illustrate a possible application of the developed ‘antibiotic-likeness’ and ‘bacterial-metabolite-likeness’ QSAR models to large-scale screening for novel antimicrobial candidates.
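The descriptor-based similarity search used to pair each drug with a bacterial metabolite analogue can be illustrated with a minimal sketch. It assumes the continuous (vector) form of the Tanimoto coefficient, T(a, b) = a·b / (|a|² + |b|² − a·b), applied to numeric descriptor vectors. The function names, the three-component toy vectors and the compound labels are hypothetical; the descriptor scaling and the clustering procedure actually used in the study are not reproduced here.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    if denom == 0:
        # Both vectors are all zeros: treat them as identical.
        return 1.0
    return dot / denom


def nearest_analogue(query, library):
    """Return (name, similarity) of the library entry most similar to the query."""
    scored = ((name, tanimoto(query, vec)) for name, vec in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical toy data: real vectors would hold the 30 'inductive' descriptors.
metabolites = {
    "metabolite_A": [0.9, 0.1, 0.4],
    "metabolite_B": [0.1, 0.8, 0.7],
}
drug = [0.8, 0.2, 0.5]

name, similarity = nearest_analogue(drug, metabolites)
print(name, round(similarity, 3))
```

In practice the descriptors would normally be brought to a common scale before the coefficient is computed, since descriptors with large numerical ranges would otherwise dominate the dot products.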
Conclusions and Future Directions
The results of the present work demonstrate that the range of atomic, substituent and molecular properties represented by the ‘inductive’ QSAR descriptors allows adequate separation of four groups of chemicals: antimicrobial substances, general drugs, drug-like compounds and bacterial metabolites. Using only 30 ‘inductive’ descriptors with no additional parameters, we were able to achieve up to 97% correct separation of these groups using the Artificial Neural Network approach. Thus, the developed QSAR models can be suggested as a useful ‘in silico’ tool for distinguishing and ranking potential antibiotic leads. It should also be mentioned that our choice of the ANN approach for the current pilot study was not dictated by any particular preference other than the fact that we had previously used ANNs for modeling ‘antibiotic-likeness’ potentials [1]. The accuracy of the developed approach based on the inductive descriptors can therefore presumably be improved further by applying other advanced classification techniques, such as Support Vector Machines or Bayesian Neural Networks. The use of purely statistical techniques in conjunction with the ‘inductive’ QSAR descriptors will also be pursued in subsequent studies, and the role of specific molecular features in antibiotic-, drug- and bacterial-metabolite-likeness will be investigated.
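For reference, the four statistics used to characterize the models (specificity, sensitivity, accuracy and positive predictive value) follow directly from the counts of true and false positives and negatives. The sketch below applies the standard definitions; the function name and the counts are hypothetical and serve only as an illustration, they are not results of this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification statistics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of actives recognized
        "specificity": ratio(tn, tn + fp),  # fraction of inactives rejected
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),          # positive predictive value
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
```

Reporting the positive predictive value alongside accuracy matters for screening applications, where actives are rare and even a small false-positive rate can dominate the list of predicted hits.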
Table 1. 30 inductive QSAR descriptors used in the study.
Table 2. Specificity, Sensitivity, Accuracy and Positive Predictive Values of various QSAR models.
Table 3. Ability of the ‘Bacterial-metabolite-likeness’ model to recognize antimicrobials, drugs and drug-like compounds.
Table 4. Some non-antimicrobial compounds that produced significant Antibiotic-likeness scores and possess close bacterial metabolite analogues.
References
1. Cherkasov, A. ‘Inductive’ Descriptors. 10 Successful Years in QSAR. Current Computer-Aided Drug Design, 2005, 1, 21-42.
2. Cherkasov, A., Jankovic, B. Application of ‘Inductive’ QSAR Descriptors for Quantification of Antibacterial Activity of Cationic Polypeptides. Molecules, 2004, 9, 1034-1052.
3. Cherkasov, A. Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges. J. Chem. Inf. Comp. Sci., 2003, 43, 2039-2047.
4. Cherkasov, A. Electronegativity of Atoms in Proteins. J. Chem. Inf. Model (CICS). 2005, 45, submitted.
5. Cherkasov, A., Shi, Z., Fallahi, M., Hammond, GL. Successful in Silico Discovery of Novel Non-Steroidal Ligands for Human Sex Hormone Binding Globulin. J. Med. Chem., 2005, 48, 3203-3213.
6. Anzali, S., Barnickel, G., Cezanne, B., Krug, M., Filimonov, D., Poroikov, V. Discriminating between Drugs and Nondrugs by Prediction of Activity Spectra for Substances (PASS). J. Med. Chem. 2001, 44, 2432-2437.
7. Byvatov, E., Fechner, U., Sadowski, J., Schneider, G. Comparison of Support Vector Machines and Artificial Neural Network Systems for Drug/Nondrug Classification. J. Chem. Inf. Comp. Sci. 2003, 43, 1882-1889.
8. Zernov, V., Balakin, K.V., Ivaschenko, A.A., Savchuk, N.P., Pletnev, I.V. Drug Discovery Using Support Vector Machines. The Case Studies of Drug-likeness, Agrochemical-likeness, and Enzyme Inhibition Predictions. J. Chem. Inf. Comp. Sci. 2003, 43, 2048-2056.
9. Murcia-Soler, M., Perez-Gimenez, F., Garcia-March, F.J., Salabert-Salvador, M.T., Diaz-Villanueva, W., Castro-Bleda, M.J. Drugs and Nondrugs: An Effective Discrimination with Topological Methods and Artificial Neural Networks. J. Chem. Inf. Comp. Sci. 2003, 43, 1688-1702.
10. Frimurer, T.M., Bywater, R., Naerum, L., Lauritsen, L.N., Brunak, S. Improving the Odds in Discriminating ‘Drug-like’ from ‘Non Drug-like’ Compounds. J. Chem. Inf. Comp. Sci. 2000, 40, 1315-1324.
11. Galvez, J., de Julian-Ortiz, J.V., Garcia-Domenech, R. General Topological Patterns of Known Drugs. J. Mol. Graph. Model. 2001, 20, 84-94.
12. Sadowski, J., Kubinyi, H. A scoring Scheme for Discriminating between Drugs and Nondrugs. J. Med. Chem. 1998, 41, 3325-3329.
13. Ajay, Walters, W.P., Murcko, M.A. Can We Learn to Distinguish between “Drug-like” and “Nondrug-like” Molecules? J. Med. Chem. 1998, 41, 3314-3324.
14. Tomas-Vert, F., Perez-Gimenez, F., Salabert-Salvador, M.T., Garcia-March, F.J., Jaen-Oltra, J. Artificial Neural Networks Applied to the Discrimination of Antibacterial Activity by Topological Methods. J. Molec. Struct. (Theochem). 2000, 504, 249-259.
15. Cronin, M.T.D., Aptula, A.O., Dearden, J.C., Duffy, J.C., Netzeva, T.I., Patel, H., Rowe, P.H., Schultz T.W., Worth, A.P., Voutzoulidis, K., Schuurmann, G. J. Chem. Inf. Comp. Sci. 2002, 42, 869.
16. Murcia-Soler M, Perez-Gimenez F, Garcia-March FJ, Salabert-Salvador MT, Diaz-Villanueva W, Castro-Bleda MJ, Villanueva-Pareja A. Artificial neural networks and linear discriminant analysis: a valuable combination in the selection of new antibacterial compounds. J Chem Inf Comput Sci. 2004, 44:1031-1041.
17. Molina, E., Diaz, H.G., Gonzalez M.P., Rodriguez, E., Uriarte, E. Distinguishing Antibacterial Compounds through a Topological Substructural Approach. J. Chem. Inf. Comp. Sci. 2004, 44, 515-521.
18. Garcia-Domenech, R., de Julian-Ortiz, J.V. Antimicrobial Activity Characterization in a Heterogeneous Group of Compounds. J. Chem. Inf. Comp. Sci. 1998, 38, 445-449.
19. Mishra, R.K., Garcia-Domenech, R., Galvez, J. Getting Discriminant Functions of Antibacterial Activity from Physicochemical and Topological Parameters. J. Chem. Inf. Comp. Sci. 2001, 41, 387-393.
20. Jaen-Oltra, J., Salabert-Salvador, M.T., Garcia-March, F.J., Perez-Gimenez, F., Tomas-Vert, F. Artificial neural network applied to prediction of fluorquinolone antibacterial activity by topological methods. J. Med. Chem. 2000, 43, 1143-1148.
21. ChemIDPlus database: http://chem.sis.nlm.nih.gov/chemidplus/
22. The Journal of Antibiotics database: http://www.nih.go.jp/~jun/NADB/byname.html
23. The Merck Index 13.4 CD-ROM Edition, CambridgeSoft, Cambridge, MA, 2004.
24. Asinex Gold Collection, Asinex Ltd., Moscow, 2004.
25. Halgren, T.A. Merck molecular force field .1. Basis, form, scope, parameterization, and performance of MMFF94. J. Comp. Chem. 1996, 17, 490-519.
26. Molecular Operating Environment (MOE), 2004, Chemical Computing Group Inc., Montreal, Canada.
27. MOE SVL exchange community: http://svl.chemcomp.com/index.php
28. Zupan, J.; Gasteiger, J. Neural Networks in Chemistry and Drug Design, 2nd Ed.; Wiley: New York, 1999.
29. SNNS: Stuttgart Neural Network Simulator; Version 4.0, University of Stuttgart, 1995.
30. Annadurai, S., Basu, S., Ray, S., Dastidar, S.G., Chakrabarty, A.N. Antibacterial activity of the antiinflammatory agent diclofenac sodium. Indian J. Exp. Biol. 1998, 36, 86-90.
31. Dash, S.K., Dastidar, S.G., and Chakrabarty, A.N. Antibacterial property of promazine hydrochloride. Indian J. Exp. Biol. 1977, 15, 324-326.
32. Dastidar, S.G., Chaudhury, A., Annadurai, S., Roy, S., Mookerjee, M., Chakrabarty, A.N. In vitro and in vivo antimicrobial action of fluphenazine. J. Chemother. 1995, 7, 201-206.
33. Dastidar, S.G., Das, S., Mookerjee, M., Chattopadhyay, D., Ray, S., Chakrabarty, A.N. Antibacterial activity of local anaesthetics procaine & lignocaine. Indian J. Med. Res. 1988, 87, 506-508.
34. Dastidar, S.G., Jairaj, J., Mookerjee, M., Chakrabarty, A.N. Studies on antimicrobial effect of the antihistaminic phenothiazine trimeprazine tartrate. Acta Microbiol. Immunol. Hung. 1997, 44, 241-247.
35. Dastidar, S.G., Mondal, U., Niyogi, S., Chakrabarty, A.N. Antibacterial property of methyl-DOPA & development of antibiotic cross-resistances in m-DOPA mutants. Indian J. Med. Res. 1986, 84, 142-147.
36. Kristiansen, J.E. Experiments to illustrate the effect of chlorpromazine on the permeability of the bacterial cell wall. Acta Pathol. Microbiol. Scand. [B], 1979, 87, 317-319.
37. Kristiansen, J.E. The antimicrobial activity of non-antibiotics. Report from a congress on the antimicrobial effect of drugs other than antibiotics on bacteria, viruses, protozoa, and other organisms. APMIS Suppl., 1992, 30, 7-14.
38. Munoz-Bellido, J.L., Munoz-Criado, Garcia-Rodriguez, J.A. Antimicrobial Activity of Psychotropic Drugs. Selective Serotonin Uptake Inhibitors. Int. J. Antimicrob. Agents, 2000, 14, 177-180.
39. Lind, K., Kristiansen, J.E. Effect of some psychotropic Drugs and a Barbiturate on Mycoplasmas. Int. J. Antimicrob. Agents, 2000, 14, 235-238.
40. [no authors] Conforming regulations regarding removal of section 507 of the Federal Food, Drug, and Cosmetic Act; confirmation of effective date. Food and Drug Administration, HHS. Direct final rule; confirmation of effective date. Fed Regist. 1999, 17; 26657 [PMID: 10558515]
41. The Federal Food Drug and Cosmetic Act: http://www.fda.gov/opacom/laws/fdcact/fdctoc.htm
42. Analyticon Discovery Company: www.ac-discovery.com
43. NCI Open Database Compounds: http://cactus.nci.nih.gov/ncidb2/download.html
44. Feher, M., Schmidt, J.M. Property Distributions: Differences between Drugs, Natural Products and Molecules from Combinatorial Chemistry. J. Chem. Inf. Comp. Sci., 2003, 43, 218-227.
45. Alberts, A.W. Lovastatin and simvastatin--inhibitors of HMG CoA reductase and cholesterol biosynthesis. Cardiology, 1990, 77 Suppl 4, 14-21.
46. Zhou, D., White, R.H. Early steps of isoprenoid biosynthesis in Escherichia coli. Biochem J, 1991, 273, 627-634.
47. Cabrera, J.A., Bolds, J., Shields, P.E., Havel, C.M., Watson, J.A. Isoprenoid synthesis in Halobacterium halobium. Modulation of 3-hydroxy-3-methylglutaryl coenzyme a concentration in response to mevalonate availability. J Biol Chem., 1986, 261, 3578-3583.
48. Lam, W.L., Doolittle, W.F. Shuttle vectors for the archaebacterium Halobacterium volcanii. Proc Natl Acad Sci U S A, 1989, 86, 5478-5482.
49. Lorenz, R.T., Parks, L.W. Effects of lovastatin (mevinolin) on sterol levels and on activity of azoles in Saccharomyces cerevisiae. Antimicrob Agents Chemother., 1990, 34, 1660-1665.
50. Sud, I.J., Feingold, D.S. Effect of ketoconazole in combination with other inhibitors of sterol synthesis on fungal growth. Antimicrob Agents Chemother., 1985, 28, 532-534.
51. Bejarano, E.R., Cerda-Olmedo, E. Independence of the carotene and sterol pathways of Phycomyces. FEBS Lett., 1992, 306, 209-212.
52. Engstrom, W., Larsson, O., Sachsenmaier, W. The effects of tunicamycin, mevinolin and mevalonic acid on HMG-CoA reductase activity and nuclear division in the myxomycete Physarum polycephalum. J Cell Sci., 1989, 92, 341-344.
53. Florin-Christensen, M., Florin-Christensen, J., Garin, C., Isola, E., Brenner, R.R., Rasmussen, L. Inhibition of Trypanosoma cruzi growth and sterol biosynthesis by lovastatin. Biochem Biophys Res Commun., 1990, 166, 1441-1445.
54. Urbina, J.A., Lazardi, K., Marchan, E., Visbal, G., Aguirre, T., Piras, M.M., Piras, R., Maldonado, R.A., Payares, G., de Souza, W. Mevinolin (lovastatin) potentiates the antiproliferative effects of ketoconazole and terbinafine against Trypanosoma (Schizotrypanum) cruzi: in vitro and in vivo studies. Antimicrob Agents Chemother., 1993, 37, 580-591.
55. Vandewaa, E.A., Mills, G., Chen, G.Z., Foster, L.A., Bennett, J.L. Physiological role of HMG-CoA reductase in regulating egg production by Schistosoma mansoni. Am J Physiol., 1989, 257, R618-625.
56. Chen, G.Z., Foster, L., Bennett, J.L. Antischistosomal action of mevinolin: evidence that 3-hydroxy-methylglutaryl-coenzyme a reductase activity in Schistosoma mansoni is vital for parasite survival. Naunyn Schmiedebergs Arch Pharmacol., 1990, 342, 477-482.
57. Grellier, P., Valentin, A., Millerioux, V., Schrevel, J., Rigomier, D. 3-Hydroxy-3-methylglutaryl coenzyme A reductase inhibitors lovastatin and simvastatin inhibit in vitro development of Plasmodium falciparum and Babesia divergens in human erythrocytes. Antimicrob Agents Chemother., 1994, 38, 1144-1148.
58. Coppens, I., Courtoy, P.J. Exogenous and endogenous sources of sterols in the culture-adapted procyclic trypomastigotes of Trypanosoma brucei. Mol Biochem Parasitol. 1995, 73, 179-188.
59. Ikeura, R., Murakawa, S., Endo, A. Growth inhibition of yeast by compactin (ML-236B) analogues. J Antibiot (Tokyo), 1988, 41, 1148-1150.
60. Fernandez, M.A., Garcia, M.D., Saenz, M.T. Antibacterial activity of the phenolic acids fractions of Scrophularia frutescens and Scrophularia sambucifolia. J. Ethnopharmacol., 1996, 26, 11-4.
61. Friedman, M., Henika, P.R., Mandrell, R.E. Antibacterial Activities of Phenolic Benzaldehydes and Benzoic Acids against Campylobacter jejuni, Escherichia coli, Listeria monocytogenes, and Salmonella enterica. Journal of Food Protection, 2003, 66, 1811-1821.
62. Terekhova, L.P., Galatenko, O.A., Preobrazhenskaia, T.P. Sensitivity of actinomycetes of the genus Actinomadura to antibiotics. Antibiotiki, 1981, 26, 345-349.
63. Gol'dberg, L.E. Antibacterial and antineoplastic antibiotics. Sov Med., 1977, 10, 115-123.
64. Berlin, Y.A., Esipov, S.E., Kiseleva, O.A., Kolosov, M.N. Olivomycin and related antibiotics X. Isolation and acid degradation of olivomycins A, B, C, and D. Chemistry of Natural Compounds, 1968, 3, 280-285.
65. The NIAID database: http://chemdb.niaid.nih.gov