Table 2 Performance of various machine learning classifiers to predict toxicity. The following classifiers are tested:

From: eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates

| Dataset | Metric | LDA | MLP | RF | ET |
|---|---|---|---|---|---|
| FDA-approved / TOXNET | ACC | 0.745 | 0.744 | **0.760** | 0.756 |
|  | TPR / FPR | 0.723 / 0.232 | 0.679 / 0.180 | 0.733 / 0.218 | 0.719 / 0.186 |
|  | MCC | 0.495 | 0.525 | **0.528** | 0.523 |
| KEGG-Drug / T3DB | ACC | 0.647 | 0.645 | 0.674 | **0.721** |
|  | TPR / FPR | 0.671 / 0.362 | 0.675 / 0.365 | 0.688 / 0.331 | 0.631 / 0.248 |
|  | MCC | 0.272 | 0.273 | 0.316 | **0.353** |
| TCM | Tox-score (mean ± SD) | 0.504 ± 0.013 | 0.537 ± 0.242 | 0.574 ± 0.143 | 0.552 ± 0.122 |
|  | Predicted toxic (%) | 63.9 | 61.8 | 68.5 | 59.7 |

Linear Discriminant Analysis (LDA), Multi-Layer Perceptron (MLP), Random Forest (RF), and Extra Trees (ET). Each model is first trained and evaluated by 5-fold cross-validation on the FDA-approved and TOXNET datasets, and then applied to the KEGG-Drug and T3DB datasets as additional validation against independent data. Performance on the FDA-approved / TOXNET and KEGG-Drug / T3DB datasets is assessed with the accuracy (ACC, Eq. 1), the true positive rate (TPR, Eq. 2), the false positive rate (FPR, Eq. 3), and the Matthews correlation coefficient (MCC, Eq. 4). The highest ACC and MCC values across all models are highlighted in bold. Finally, the trained models are applied to estimate the toxicity of traditional Chinese medicines in the TCM dataset; the mean ± standard deviation of the Tox-score and the percentage of molecules predicted as toxic are reported.
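The four metrics in this table can all be derived from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below is a minimal illustration assuming the standard definitions, ACC = (TP + TN) / (TP + FP + TN + FN), TPR = TP / (TP + FN), FPR = FP / (FP + TN) and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which are assumed here to correspond to Eqs. 1–4. It also assumes that toxic compounds form the positive class (label 1). The helper `summarize_tox_scores` is hypothetical: it reports the mean ± sample standard deviation of a list of Tox-scores and the percentage of molecules at or above an assumed cut-off of 0.5, a threshold that is not stated in this table.

```python
import math
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), assuming label 1 = toxic (positive), 0 = non-toxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """ACC, TPR, FPR and MCC using the standard textbook definitions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # equals 1 - specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC = 0 when any confusion-matrix marginal is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "MCC": mcc}


def summarize_tox_scores(scores, threshold=0.5):
    """Hypothetical helper: mean, sample SD, and % of scores >= threshold.

    The 0.5 cut-off is an assumption for illustration, not taken from the table.
    """
    pct_toxic = 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)
    return mean(scores), stdev(scores), pct_toxic


if __name__ == "__main__":
    # Toy labels: 3 TP, 1 FN, 3 TN, 1 FP.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))
    # {'ACC': 0.75, 'TPR': 0.75, 'FPR': 0.25, 'MCC': 0.5}

    avg, sd, pct = summarize_tox_scores([0.2, 0.6, 0.8, 0.4])
    print(f"Tox-score {avg:.3f} ± {sd:.3f}, {pct:.1f}% predicted toxic")
```

Note that ACC, TPR and FPR each ignore part of the confusion matrix, whereas MCC uses all four counts, which is why it is often preferred when the toxic and non-toxic classes are imbalanced. Returning 0.0 for an undefined MCC is a common convention and not necessarily the one used to produce the values above.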