""" Created on Tue Sep 18 22:54:42 2015 @author: CBDD Group, CSU, China """ <1>. Data 1. The targets properties file: The file contains the targets' Uniprot ID, protein name, function, PDB ID, Biogrid ID, DrugBank ID, GuidetoPHARMACOLOGY ID, PharmGKB ID, KEGG ID, DIP ID, STRING ID, MINT ID, IntAct ID, DMDM ID, BRENDA ID, Reactome ID, SignaLink ID, BioCyc ID, GeneID and BindingDB ID. 2. The molecules about targets file: The file contains molecules' BindingDB ID, smi format, the bioactivity value about the targets. 3. The model data file: The file contains the positive part and the negative part. Each part contains 623 files named by their Uniprot ID. <2>. The performance of models 1. The performance of the model calculated with FP2 fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 2. The performance of the model calculated with FP4 fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 3. The performance of the model calculated with ECFP2 fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 4. The performance of the model calculated with ECFP4 fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 5. The performance of the model calculated with ECFP6 fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 6. The performance of the model calculated with MACCS fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. 7. The performance of the model calculated with Daylight fingerprint: The file contains 623 targets' model evaluation inculding Accuracy, AUC, MCC and F1 score. <3>.The models 1. The models calculated with FP2 fingerprint: The file contains 623 targets' models. Each model contains six files inculding one 'pkl' file and five 'npy' files. 
You should put them in the same folder if you want to use the model.
2. The models calculated with the FP4 fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.
3. The models calculated with the ECFP2 fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.
4. The models calculated with the ECFP4 fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.
5. The models calculated with the ECFP6 fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.
6. The models calculated with the MACCS fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.
7. The models calculated with the Daylight fingerprint: The file contains the models for the 623 targets. Each model consists of six files: one 'pkl' file and five 'npy' files. You should put them in the same folder if you want to use the model.

<4>. The scripts
1. Before using these scripts, you should install the required Python packages (pybel, numpy, pydpi, rdkit).
2. Use your molecule (a SMILES string, *.smi) and the target's 'pkl' file to get the prediction.
3. The prediction contains two parts. The first part is the predicted label: 0 for negative or 1 for positive. The second part is a predicted value from 0 to 1 that gives the probability that the molecule is positive.
4.
If you want to predict the interaction between dapagliflozin and sodium/glucose cotransporter 2, you can run the script as in the following example.

Example:

import numpy as np

try:
    import pybel                             # Open Babel 2.x
except ImportError:
    from openbabel import pybel              # Open Babel 3.x

try:
    from sklearn.externals import joblib     # older scikit-learn releases
except ImportError:
    import joblib                            # standalone joblib package

from sklearn.naive_bayes import BernoulliNB  # imported in the original example; not called directly

# The original example also imported pydpi, rdkit, sklearn.cross_validation
# and sklearn.metrics. None of them is used for the FP2 prediction, so they
# are omitted here.

##################################################################
# get smi_des FP2
##################################################################
def get_smi_des_FP2(smi):
    """Return the 1024-bit FP2 fingerprint of a SMILES string as a 0/1 list."""
    mol = pybel.readstring("smi", smi)
    fp = mol.calcfp()                            # FP2 is pybel's default fingerprint
    on_bits = np.array(fp.bits, dtype=int) - 1   # fp.bits is 1-based; shift to 0-based
    list_finger_0_1 = np.zeros(1024)
    list_finger_0_1[on_bits] = 1
    return list(list_finger_0_1)

####################################################################
# Prediction
####################################################################
def predict_FP2(smi, pkl_path):
    """Return (predicted label, probability of the positive class)."""
    des = get_smi_des_FP2(smi)
    x = np.array(des).reshape(1, -1)             # one sample with 1024 features
    clf = joblib.load(pkl_path)
    pred_label = clf.predict(x)
    pred_prob = clf.predict_proba(x)
    return list(pred_label)[0], pred_prob[0, 1]

if __name__ == '__main__':
    # the dapagliflozin SMILES
    smi = 'O.CC(O)CO.CCOC1=CC=C(CC2=CC(=CC=C2Cl)[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)C=C1'
    pkl_path = '/*/P31639.pkl'   # the path of your 'pkl' file
    print(predict_FP2(smi, pkl_path))
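
Section <2> reports Accuracy, AUC, MCC and F1 score for each target's model. As a reminder of what these four measures mean, here is a minimal pure-Python sketch based on their standard definitions. It is not the authors' evaluation code: the function names are hypothetical, and the toy labels and scores in the demo are made up for illustration only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of molecules whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / float(tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, from -1 to 1 (0.0 if undefined)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (made up): true labels, predicted probabilities, predicted labels.
    y_true = [1, 1, 1, 0, 0, 0]
    y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(accuracy(y_true, y_pred), auc(y_true, y_score),
          mcc(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy, MCC and F1 score are computed from the predicted labels (the first part of the prediction), while AUC is computed from the predicted values between 0 and 1 (the second part).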