Binding Affinity Similarity Explorer (BASE) is a web-based platform for analyzing similarity bias in compound-protein binding affinity prediction. BASE provides comprehensive compound-protein binding affinity datasets, split by similarity into training and test sets for model training and performance evaluation. It also offers example results showing how the prediction performance of state-of-the-art models changes with similarity.
Data Browser: BASE allows users to interactively explore and generate compound-protein binding affinity training and test sets. Selections can be made based on user-defined similarity cutoffs across:
Running Examples: BASE provides access to regression prediction results and performance analyses using advanced methods like ColdDTA and MMD-DTA.
The data in this platform are licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0). See the LICENSE file for more details.
BASE (Binding Affinity Similarity Explorer) has collected interaction data between human protein targets and compounds (including drugs) from the following seven well-known databases. Each collected pair either has a measured binding affinity, specifically a dissociation constant (Kd), or has been verified as a definite binder from its assay description.
BASE (Binding Affinity Similarity Explorer) evaluated existing binding affinity prediction methods by building training sets from which data similar to the test set were progressively removed. This evaluation revealed a significant drop in prediction performance as similarity to the test set decreased (Similarity Bias).
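The progressive removal of similar data can be sketched as follows. This is a minimal illustration, not BASE's implementation: it assumes each compound is represented by a hypothetical bit-set fingerprint, uses Tanimoto similarity as a stand-in similarity measure, and assumes that any training candidate whose highest similarity to a test compound reaches the cutoff is dropped. The platform's actual similarity measures (which may also involve proteins) and cutoff semantics may differ. The names `filter_training_set` and `progressive_splits` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical representation: a compound fingerprint as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_training_set(
    candidates: Dict[str, Fingerprint],
    test_set: Dict[str, Fingerprint],
    cutoff: float,
) -> List[str]:
    """Keep candidates whose maximum similarity to any test compound is below `cutoff`."""
    kept = []
    for name, fp in candidates.items():
        max_sim = max((tanimoto(fp, t) for t in test_set.values()), default=0.0)
        # Assumed semantics: similarity at or above the cutoff means "too similar".
        if max_sim < cutoff:
            kept.append(name)
    return kept


def progressive_splits(
    candidates: Dict[str, Fingerprint],
    test_set: Dict[str, Fingerprint],
    cutoffs: Iterable[float],
) -> Dict[float, List[str]]:
    """Build one training set per cutoff, from the loosest cutoff to the strictest."""
    return {
        c: filter_training_set(candidates, test_set, c)
        for c in sorted(cutoffs, reverse=True)
    }


if __name__ == "__main__":
    test = {"t1": frozenset({1, 2, 3, 4})}
    candidates = {
        "c1": frozenset({1, 2, 3, 4}),  # identical to t1 (similarity 1.0)
        "c2": frozenset({1, 2, 3, 5}),  # similarity 3/5 = 0.6
        "c3": frozenset({7, 8, 9}),     # similarity 0.0
    }
    for cutoff, train in progressive_splits(candidates, test, [1.0, 0.8, 0.5]).items():
        print(cutoff, train)
    # 1.0 ['c2', 'c3']
    # 0.8 ['c2', 'c3']
    # 0.5 ['c3']
```

Because the test set stays fixed while the cutoff tightens, each successive training set shares less similarity with it, which is the setting in which the performance drop is measured.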
Note: The test set size is fixed at 80,578 data points.
These tables show how decreasing the similarity cutoff affects the performance of ColdDTA and MMD-DTA. As similarity to the test set diminishes (lower cutoff values), performance generally worsens across the metrics PCC (Pearson Correlation Coefficient), MSE (Mean Squared Error), CI (Concordance Index), Precision (Prec), Recall, and BACC (Balanced Accuracy): PCC, CI, Prec, Recall, and BACC tend to fall, while MSE tends to rise. This trend highlights the significant impact of similarity bias on predictive performance.
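For reference, the metrics listed above can be computed roughly as follows. This is a plain-Python sketch, not the evaluation code behind the tables. It assumes affinities are on a scale where larger values mean tighter binding (for example pKd), and the `threshold` used to turn affinities into binder/non-binder labels for Prec, Recall, and BACC is a hypothetical parameter, since this page does not state how that labelling was done.

```python
import math
from typing import Dict, Sequence


def pearson(y: Sequence[float], p: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) between true and predicted affinities."""
    n = len(y)
    my, mp = sum(y) / n, sum(p) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in p))
    return cov / (sy * sp)


def mse(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean squared error between true and predicted affinities."""
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


def concordance_index(y: Sequence[float], p: Sequence[float]) -> float:
    """Share of pairs with y[i] > y[j] that the model orders the same way (ties in p count 0.5)."""
    num, den = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                den += 1
                if p[i] > p[j]:
                    num += 1
                elif p[i] == p[j]:
                    num += 0.5
    return num / den


def classification_metrics(
    y: Sequence[float], p: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Precision, recall, and balanced accuracy after labelling values >= threshold as binders."""
    tp = fp = tn = fn = 0
    for a, b in zip(y, p):
        actual, predicted = a >= threshold, b >= threshold
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"Prec": precision, "Recall": recall, "BACC": (recall + specificity) / 2}


if __name__ == "__main__":
    y_true = [5.0, 6.0, 7.0, 8.0]  # toy affinities, not BASE data
    y_pred = [5.5, 6.5, 6.0, 8.5]
    print("PCC", pearson(y_true, y_pred))
    print("MSE", mse(y_true, y_pred))  # 0.4375
    print("CI", round(concordance_index(y_true, y_pred), 3))  # 0.833
    print(classification_metrics(y_true, y_pred, threshold=7.0))
    # {'Prec': 1.0, 'Recall': 0.5, 'BACC': 0.75}
```

The concordance index here loops over all pairs, so it is quadratic in the number of data points; for a test set of this page's size, a sorting-based implementation would be more practical.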
Principal Investigator: Gwan-Su Yi, gwansuyi@kaist.ac.kr, Room 1008 CMS Building (E16) KAIST
Lab Member: Hyojin Son, hyojin0912@kaist.ac.kr, Room 411 CMS Building (E16) KAIST
291 Daehak-ro Yuseong-gu Daejeon 34141, Republic of Korea
Synergistic Bioinformatics Laboratory
Dept. of Bio and Brain Engineering
Korea Advanced Institute of Science and Technology (KAIST)
1. BASE: Son, H., Lee, S., Kim, J., Park, H., Hwang, M. H., & Yi, G. S. (2024). BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias. BMC Bioinformatics, 25, 340.
1. PDBbind: Liu, Z., Su, M., Han, L., Liu, J., Yang, Q., Li, Y., & Wang, R. (2017). Forging the basis for developing protein–ligand interaction scoring functions. Accounts of Chemical Research, 50(2), 302-309.
2. BindingDB: Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research, 44(D1), D1045-D1053.
3. ChEMBL: Zdrazil, B., Felix, E., Hunter, F., Manners, E. J., Blackshaw, J., Corbett, S., ... & Leach, A. R. (2024). The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Research, 52(D1), D1180-D1192.
4. IUPHAR: Harding, S. D., Armstrong, J. F., Faccenda, E., Southan, C., Alexander, S. P., Davenport, A. P., ... & Davies, J. A. (2024). The IUPHAR/BPS guide to PHARMACOLOGY in 2024. Nucleic Acids Research, 52(D1), D1438-D1449.
5. GPCRdb: Pándy-Szekeres, G., Caroli, J., Mamyrbekov, A., Kermani, A. A., Keserű, G. M., Kooistra, A. J., & Gloriam, D. E. (2023). GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Research, 51(D1), D395-D402.
6. GLASS: Chan, W. K., Zhang, H., Yang, J., Brender, J. R., Hur, J., Özgür, A., & Zhang, Y. (2015). GLASS: a comprehensive database for experimentally validated GPCR-ligand associations. Bioinformatics, 31(18), 3035-3042.
7. Davis: Davis, M. I., Hunt, J. P., Herrgard, S., Ciceri, P., Wodicka, L. M., Pallares, G., ... & Zarrinkar, P. P. (2011). Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology, 29(11), 1046-1051.
8. NR-DBIND: Réau, M., Lagarde, N., Zagury, J. F., & Montes, M. (2018). Nuclear receptors database including negative data (NR-DBIND): a database dedicated to nuclear receptors binding data including negative data and pharmacological profile: Miniperspective. Journal of Medicinal Chemistry, 62(6), 2894-2904.
1. ColdDTA: Fang, K., Zhang, Y., Du, S., & He, J. (2023). ColdDTA: utilizing data augmentation and attention-based feature fusion for compound-protein binding affinity prediction. Computers in Biology and Medicine, 164, 107372.
2. MMD-DTA: Qi, Z., Liu, L., Wei, Y., Zhang, S., & Liao, B. (2023). MMD-DTA: A multi-modal deep learning framework for compound-protein binding affinity and binding region prediction. bioRxiv, 2023-09.