BASE (Binding Affinity Similarity Explorer)

About the Service

Binding Affinity Similarity Explorer (BASE) is a web-based platform designed to analyze similarity bias in compound-protein binding affinity predictions. BASE provides comprehensive compound-protein binding affinity datasets, split into training and test sets according to similarity, for model training and performance evaluation. Additionally, BASE offers example results showing how the prediction performance of state-of-the-art models changes with similarity.

Publication

BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias

Key Features

Extensive Database Analysis: BASE collects and analyzes data from seven well-known databases, including ChEMBL and BindingDB, to identify similarity biases within compound-protein binding affinity datasets.
Customizable Training Sets: Users can create training sets based on three types of similarity (protein sequence, gene ontology, and integrated similarity) by setting specific similarity thresholds relative to the test set.
Example Results: BASE demonstrates the impact of different similarity settings through the performance of established prediction methods, highlighting how similarity thresholds influence prediction accuracy.

What We Provide

Data Browser: BASE allows users to interactively explore and generate compound-protein binding affinity training and test sets. Selections can be made based on user-defined similarity cutoffs across:

Protein Sequence Similarity
Gene Ontology Similarity
Integrated Similarity
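A minimal sketch of how a single similarity cutoff could be applied when building a training set. It assumes a precomputed protein-protein similarity table with scores in [0, 1] (for example, sequence similarity or a GO semantic similarity) and the rule "drop any candidate pair whose protein is more similar than the cutoff to any test-set protein". The function names, record layout, and filtering rule are hypothetical illustrations, not BASE's actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (compound_id, protein_id, affinity)
Pair = Tuple[str, str, float]


def max_similarity_to_test(protein: str,
                           test_proteins: List[str],
                           sim: Dict[Tuple[str, str], float]) -> float:
    """Highest similarity between `protein` and any test-set protein.

    Assumptions: a protein compared with itself scores 1.0, and protein
    pairs missing from `sim` score 0.0.
    """
    best = 0.0
    for t in test_proteins:
        if t == protein:
            return 1.0
        # The table may store each pair in either order.
        best = max(best, sim.get((protein, t), sim.get((t, protein), 0.0)))
    return best


def filter_training_pairs(candidates: List[Pair],
                          test_pairs: List[Pair],
                          sim: Dict[Tuple[str, str], float],
                          cutoff: float) -> List[Pair]:
    """Keep candidate pairs whose protein is at most `cutoff` similar to
    every protein in the test set (hypothetical rule)."""
    test_proteins = sorted({p for _, p, _ in test_pairs})
    return [pair for pair in candidates
            if max_similarity_to_test(pair[1], test_proteins, sim) <= cutoff]


if __name__ == "__main__":
    test = [("c1", "P1", 7.2)]
    pool = [("c2", "P1", 6.5), ("c3", "P2", 5.9), ("c4", "P3", 8.1)]
    similarity = {("P2", "P1"): 0.85, ("P3", "P1"): 0.30}
    print(filter_training_pairs(pool, test, similarity, cutoff=0.5))
    # prints [('c4', 'P3', 8.1)]
```

Lowering `cutoff` removes more training data, which is the mechanism that lets a user control how much of the test set's similarity leaks into training.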

Running Examples: BASE provides access to regression prediction results and performance analyses using advanced methods like ColdDTA and MMD-DTA.

The data on this platform are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for more details.

Database Statistics

BASE (Binding Affinity Similarity Explorer) has collected interactions between human protein targets and compounds (including drugs) from seven well-known databases. Each collected pair either has a measured binding affinity, specifically a dissociation constant (Kd), or has been verified as a definitive binder through its assay description.

Performance of ColdDTA with Varying Similarity Cutoffs

BASE (Binding Affinity Similarity Explorer) evaluated existing binding affinity prediction methods on training sets built by progressively removing data similar to the test set. This evaluation revealed a significant drop in prediction performance as similarity to the test set decreases (similarity bias).

Note: The test set size is fixed at 80,578 data points.

Performance of MMD-DTA with Varying Similarity Cutoffs

Note: The test set size is fixed at 80,578 data points.

These tables demonstrate the effect of decreasing similarity cutoffs on the performance of ColdDTA and MMD-DTA. As similarity to the test set diminishes (lower cutoff values), performance generally worsens: PCC (Pearson Correlation Coefficient), CI (Concordance Index), Precision (Prec), Recall, and BACC (Balanced Accuracy) tend to decrease, while MSE (Mean Squared Error), for which lower is better, tends to increase. This trend highlights the significant impact of similarity bias on predictive performance.
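To make the six metrics concrete, here is a minimal NumPy sketch. PCC, MSE, and CI follow their standard definitions for affinity regression. Precision, Recall, and BACC require turning affinities into binder/non-binder labels; the threshold used here (`BINDER_THRESHOLD = 7.0`, e.g. on a pKd scale) is a hypothetical placeholder, not the value BASE used to produce the tables.

```python
import numpy as np

BINDER_THRESHOLD = 7.0  # hypothetical cutoff separating binders from non-binders


def concordance_index(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of comparable pairs (y_true[i] > y_true[j]) that the
    predictions rank in the same order; prediction ties count as 0.5.
    Quadratic in the number of samples, so only suitable for small inputs."""
    num, den = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                den += 1
                if y_pred[i] > y_pred[j]:
                    num += 1.0
                elif y_pred[i] == y_pred[j]:
                    num += 0.5
    return num / den if den else float("nan")


def evaluate(y_true, y_pred, threshold: float = BINDER_THRESHOLD) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Regression metrics on the raw affinity values.
    pcc = float(np.corrcoef(y_true, y_pred)[0, 1])
    mse = float(np.mean((y_true - y_pred) ** 2))
    ci = concordance_index(y_true, y_pred)

    # Classification metrics after binarizing at the assumed threshold.
    t = y_true >= threshold
    p = y_pred >= threshold
    tp = int(np.sum(t & p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    tn = int(np.sum(~t & ~p))
    prec = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    bacc = (recall + specificity) / 2  # balanced accuracy

    return {"PCC": pcc, "MSE": mse, "CI": ci,
            "Prec": prec, "Recall": recall, "BACC": bacc}
```

The pairwise CI loop is written for clarity; a test set of 80,578 points would need a sorted or tree-based CI implementation to finish in reasonable time.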

Contact

Principal Investigator: Gwan-Su Yi, gwansuyi@kaist.ac.kr, Room 1008 CMS Building (E16) KAIST

Lab Member: Hyojin Son, hyojin0912@kaist.ac.kr, Room 411 CMS Building (E16) KAIST

Address

291 Daehak-ro Yuseong-gu Daejeon 34141, Republic of Korea
Synergistic Bioinformatics Laboratory
Dept. of Bio and Brain Engineering
Korea Advanced Institute of Science and Technology (KAIST)

References

Publication

1. BASE: Son, H., Lee, S., Kim, J., Park, H., Hwang, M. H., & Yi, G. S. (2024). BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias. BMC Bioinformatics, 25, 340.

Databases

1. PDBbind: Liu, Z., Su, M., Han, L., Liu, J., Yang, Q., Li, Y., & Wang, R. (2017). Forging the basis for developing protein–ligand interaction scoring functions. Accounts of Chemical Research, 50(2), 302-309.

2. BindingDB: Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research, 44(D1), D1045-D1053.

3. ChEMBL: Zdrazil, B., Felix, E., Hunter, F., Manners, E. J., Blackshaw, J., Corbett, S., ... & Leach, A. R. (2024). The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Research, 52(D1), D1180-D1192.

4. IUPHAR: Harding, S. D., Armstrong, J. F., Faccenda, E., Southan, C., Alexander, S. P., Davenport, A. P., ... & Davies, J. A. (2024). The IUPHAR/BPS Guide to PHARMACOLOGY in 2024. Nucleic Acids Research, 52(D1), D1438-D1449.

5. GPCRdb: Pándy-Szekeres, G., Caroli, J., Mamyrbekov, A., Kermani, A. A., Keserű, G. M., Kooistra, A. J., & Gloriam, D. E. (2023). GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Research, 51(D1), D395-D402.

6. GLASS: Chan, W. K., Zhang, H., Yang, J., Brender, J. R., Hur, J., Özgür, A., & Zhang, Y. (2015). GLASS: a comprehensive database for experimentally validated GPCR-ligand associations. Bioinformatics, 31(18), 3035-3042.

7. Davis: Davis, M. I., Hunt, J. P., Herrgard, S., Ciceri, P., Wodicka, L. M., Pallares, G., ... & Zarrinkar, P. P. (2011). Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology, 29(11), 1046-1051.

8. NR-DBIND: Réau, M., Lagarde, N., Zagury, J. F., & Montes, M. (2018). Nuclear receptors database including negative data (NR-DBIND): a database dedicated to nuclear receptors binding data including negative data and pharmacological profile: Miniperspective. Journal of Medicinal Chemistry, 62(6), 2894-2904.

Prediction Models

1. ColdDTA: Fang, K., Zhang, Y., Du, S., & He, J. (2023). ColdDTA: utilizing data augmentation and attention-based feature fusion for compound-protein binding affinity prediction. Computers in Biology and Medicine, 164, 107372.

2. MMD-DTA: Qi, Z., Liu, L., Wei, Y., Zhang, S., & Liao, B. (2023). MMD-DTA: A multi-modal deep learning framework for compound-protein binding affinity and binding region prediction. bioRxiv, 2023-09.