Tox21 Leaderboard 🧪
Measuring AI progress in Drug Discovery
| Rank | Type | Model | Organization | Publication | Avg. AUC | Avg. ΔAUC-PR | # Parameters | ROC-AUC: NR-AR | ROC-AUC: NR-AR-LBD | ROC-AUC: NR-AhR | ROC-AUC: NR-Aromatase | ROC-AUC: NR-ER | ROC-AUC: NR-ER-LBD | ROC-AUC: NR-PPAR-gamma | ROC-AUC: SR-ARE | ROC-AUC: SR-ATAD5 | ROC-AUC: SR-HSE | ROC-AUC: SR-MMP | ROC-AUC: SR-p53 | ΔAUC-PR: NR-AR | ΔAUC-PR: NR-AR-LBD | ΔAUC-PR: NR-AhR | ΔAUC-PR: NR-Aromatase | ΔAUC-PR: NR-ER | ΔAUC-PR: NR-ER-LBD | ΔAUC-PR: NR-PPAR-gamma | ΔAUC-PR: SR-ARE | ΔAUC-PR: SR-ATAD5 | ΔAUC-PR: SR-HSE | ΔAUC-PR: SR-MMP | ΔAUC-PR: SR-p53 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | 🔼 | DeepTox | JKU Linz | DeepTox: Toxicity Prediction using Deep Learning | 0.847 | 0.302 | | 0.807 | 0.850 | 0.928 | 0.834 | 0.793 | 0.815 | 0.839 | 0.841 | 0.793 | 0.858 | 0.941 | 0.862 | 0.221 | 0.052 | 0.509 | 0.350 | 0.421 | 0.265 | 0.167 | 0.301 | 0.433 | 0.223 | 0.437 | 0.248 |
| 🥈 | 🔼 | SNN | JKU Linz | Self-Normalizing Neural Networks | 0.844 | 0.261 | 1.9M | 0.852 | 0.918 | 0.897 | 0.789 | 0.809 | 0.814 | 0.838 | 0.784 | 0.813 | 0.828 | 0.937 | 0.849 | 0.236 | 0.098 | 0.446 | 0.189 | 0.317 | 0.225 | 0.171 | 0.242 | 0.219 | 0.323 | 0.448 | 0.223 |
| 🥉 | ⤵️ | CheMeleon | MIT | Descriptor-based Foundation Models for Molecular Property Prediction | 0.838 | 0.294 | 40M | 0.829 | 0.873 | 0.913 | 0.808 | 0.804 | 0.784 | 0.821 | 0.789 | 0.827 | 0.830 | 0.948 | 0.826 | 0.205 | 0.055 | 0.522 | 0.272 | 0.420 | 0.286 | 0.136 | 0.261 | 0.234 | 0.458 | 0.511 | 0.174 |
| 4 | 🔼 | RF | JKU Linz | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.829 | 0.299 | 40.1M | 0.781 | 0.769 | 0.916 | 0.823 | 0.814 | 0.768 | 0.832 | 0.800 | 0.809 | 0.841 | 0.946 | 0.851 | 0.198 | 0.042 | 0.456 | 0.315 | 0.417 | 0.285 | 0.203 | 0.239 | 0.290 | 0.333 | 0.534 | 0.274 |
| 5 | 🔼 | SNN Ensemble | Rasayan Labs Inc. | | 0.827 | 0.291 | 19M | 0.803 | 0.874 | 0.916 | 0.772 | 0.806 | 0.744 | 0.828 | 0.792 | 0.817 | 0.818 | 0.923 | 0.827 | 0.382 | 0.067 | 0.538 | 0.218 | 0.382 | 0.248 | 0.169 | 0.244 | 0.291 | 0.287 | 0.487 | 0.176 |
| 6 | 🔼 | XGBoost | JKU Linz | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.823 | 0.277 | 460.7K | 0.735 | 0.804 | 0.912 | 0.822 | 0.813 | 0.789 | 0.771 | 0.810 | 0.818 | 0.824 | 0.945 | 0.827 | 0.131 | 0.072 | 0.479 | 0.295 | 0.404 | 0.228 | 0.139 | 0.268 | 0.314 | 0.250 | 0.536 | 0.206 |
| 7 | ⤵️ | GROVER | Tencent AI Lab (finetuned by JKU Linz) | Self-Supervised Graph Transformer on Large-Scale Molecular Data | 0.822 | 0.233 | 48.4M | 0.847 | 0.881 | 0.914 | 0.818 | 0.767 | 0.734 | 0.815 | 0.794 | 0.772 | 0.779 | 0.919 | 0.827 | 0.166 | 0.087 | 0.460 | 0.166 | 0.350 | 0.124 | 0.131 | 0.254 | 0.145 | 0.380 | 0.391 | 0.143 |
| 8 | 🔼 | Chemprop | MIT (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.815 | 0.232 | 709K | 0.839 | 0.861 | 0.893 | 0.767 | 0.818 | 0.767 | 0.772 | 0.748 | 0.788 | 0.805 | 0.914 | 0.813 | 0.302 | 0.065 | 0.433 | 0.108 | 0.333 | 0.137 | 0.067 | 0.267 | 0.210 | 0.284 | 0.445 | 0.131 |
| 9 | 🔼 | GIN | MIT & Stanford (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.810 | 0.244 | 154K | 0.808 | 0.882 | 0.890 | 0.773 | 0.771 | 0.778 | 0.740 | 0.756 | 0.787 | 0.774 | 0.930 | 0.836 | 0.281 | 0.097 | 0.399 | 0.142 | 0.381 | 0.277 | 0.072 | 0.213 | 0.240 | 0.261 | 0.416 | 0.144 |
| 10 | ⤵️ | TabPFN | PriorLabs (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.807 | 0.262 | 86.9M | 0.753 | 0.741 | 0.893 | 0.770 | 0.782 | 0.806 | 0.793 | 0.798 | 0.787 | 0.816 | 0.942 | 0.806 | 0.161 | 0.030 | 0.413 | 0.223 | 0.402 | 0.266 | 0.158 | 0.293 | 0.222 | 0.328 | 0.477 | 0.173 |
| 11 | 🔼 | D-MPNN (chemprop) | Independent Researcher | D-MPNN Model for Toxicity Prediction on the Original Tox21 Benchmark | 0.799 | 0.262 | 2.1M | 0.797 | 0.818 | 0.884 | 0.809 | 0.767 | 0.768 | 0.761 | 0.797 | 0.736 | 0.733 | 0.914 | 0.797 | 0.296 | 0.056 | 0.428 | 0.217 | 0.329 | 0.151 | 0.199 | 0.315 | 0.202 | 0.250 | 0.521 | 0.184 |
| 12 | 0️⃣ | GPT-OSS | OpenAI (inference by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.702 | 0.083 | 120B | 0.558 | 0.702 | 0.824 | 0.708 | 0.688 | 0.692 | 0.676 | 0.707 | 0.646 | 0.763 | 0.732 | 0.724 | 0.016 | 0.018 | 0.233 | 0.077 | 0.077 | 0.090 | 0.049 | 0.121 | 0.060 | 0.076 | 0.114 | 0.060 |
Avg. AUC: Mean ROC-AUC across all 12 tasks
Avg. ΔAUC-PR: Mean ΔAUC-PR across all 12 tasks
Rank: Ordered by Avg. AUC, highest first
Type: 0️⃣ Zero-shot | 1️⃣ Few-shot | ⤵️ Pre-trained | 🔼 Models trained from scratch
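
The Avg. AUC and Rank definitions above can be checked against the per-task columns. The following is a minimal sketch, assuming Avg. AUC is the plain unweighted mean of the 12 per-task ROC-AUC values. The per-task numbers are copied from the top three rows of the table; the variable names (`roc_auc`, `avg_auc`, `ranking`) are illustrative and not part of the leaderboard itself.

```python
from statistics import mean

# Per-task ROC-AUC values copied from the table, in column order:
# NR-AR, NR-AR-LBD, NR-AhR, NR-Aromatase, NR-ER, NR-ER-LBD,
# NR-PPAR-gamma, SR-ARE, SR-ATAD5, SR-HSE, SR-MMP, SR-p53
roc_auc = {
    "SNN": [0.852, 0.918, 0.897, 0.789, 0.809, 0.814,
            0.838, 0.784, 0.813, 0.828, 0.937, 0.849],
    "CheMeleon": [0.829, 0.873, 0.913, 0.808, 0.804, 0.784,
                  0.821, 0.789, 0.827, 0.830, 0.948, 0.826],
    "DeepTox": [0.807, 0.850, 0.928, 0.834, 0.793, 0.815,
                0.839, 0.841, 0.793, 0.858, 0.941, 0.862],
}

# Avg. AUC: assumed here to be the unweighted mean over the 12 tasks
avg_auc = {model: mean(scores) for model, scores in roc_auc.items()}

# Rank: models ordered by Avg. AUC, highest first
ranking = sorted(avg_auc, key=avg_auc.get, reverse=True)

for rank, model in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {avg_auc[model]:.4f}")
```

Under this assumption, the recomputed means agree with the Avg. AUC column to within rounding of the published three-decimal values, and the resulting order matches the first three ranks. The same calculation applied to the ΔAUC-PR columns would give Avg. ΔAUC-PR.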