Table 1. Compound datasets used to evaluate the performance of eToxPred. These non-redundant sets are employed to train and test SAscore, Tox-score, and specific toxicities.

From: eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates

| Dataset | Size | Usage | Description |
|---|---|---|---|
| NuBBE | 1008 | Train/test (SAscore) | Natural products and derivatives from the Brazilian biodiversity |
| UNPD | 81,372 | Train/test (SAscore) | Diverse collection of natural products |
| DUD-E (actives) | 17,499 | Train/test (SAscore) | Mostly synthetic bioactive compounds against 102 protein targets |
| FDA-approved | 1515 | Train/test (SAscore); Train (Tox-score) | FDA-approved drugs from DrugBank |
| KEGG-Drug | 3682 | Test (Tox-score) | Drugs approved in Japan, United States, and Europe |
| TOXNET | 3035 | Train (Tox-score) | Potentially hazardous chemicals |
| T3DB | 1283 | Test (Tox-score) | Collection of pollutants, pesticides, drugs, and food toxins |
| TCM | 5883 | Test (SAscore, Tox-score, unlabeled) | Traditional Chinese medicines |
| CP | 1401 | Train/test (specific toxicity) | Carcinogenic compounds tested in rodents |
| CD | 1571 | Train/test (specific toxicity) | Cardiotoxic compounds tested against the hERG potassium channel |
| ED | 17,059 | Train/test (specific toxicity) | Endocrine-disrupting compounds tested against androgen and estrogen receptors |
| AO | 12,612 | Train/test (specific toxicity) | Toxins from various sources annotated with acute oral toxicity |