Tox21 Leaderboard 🧪
Measuring AI Progress in Drug Discovery
| Rank | Type | Model | Organization | Publication | Avg. AUC | Avg. ΔAUC-PR | # Parameters | ROC-AUC NR-AR | ROC-AUC NR-AR-LBD | ROC-AUC NR-AhR | ROC-AUC NR-Aromatase | ROC-AUC NR-ER | ROC-AUC NR-ER-LBD | ROC-AUC NR-PPAR-gamma | ROC-AUC SR-ARE | ROC-AUC SR-ATAD5 | ROC-AUC SR-HSE | ROC-AUC SR-MMP | ROC-AUC SR-p53 | ΔAUC-PR NR-AR | ΔAUC-PR NR-AR-LBD | ΔAUC-PR NR-AhR | ΔAUC-PR NR-Aromatase | ΔAUC-PR NR-ER | ΔAUC-PR NR-ER-LBD | ΔAUC-PR NR-PPAR-gamma | ΔAUC-PR SR-ARE | ΔAUC-PR SR-ATAD5 | ΔAUC-PR SR-HSE | ΔAUC-PR SR-MMP | ΔAUC-PR SR-p53 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | 🔼 | DeepTox | JKU Linz | DeepTox: Toxicity Prediction using Deep Learning | 0.846 | | | 0.807 | 0.879 | 0.928 | 0.834 | 0.810 | 0.814 | 0.861 | 0.840 | 0.793 | 0.865 | 0.942 | 0.862 | | | | | | | | | | | | |
| 🥈 | 🔼 | SNN | JKU Linz | Self-Normalizing Neural Networks | 0.844 | 0.261 | 1.9M | 0.852 | 0.918 | 0.897 | 0.789 | 0.809 | 0.814 | 0.838 | 0.784 | 0.813 | 0.828 | 0.937 | 0.849 | 0.236 | 0.098 | 0.446 | 0.189 | 0.317 | 0.225 | 0.171 | 0.242 | 0.219 | 0.323 | 0.448 | 0.223 |
| 🥉 | 🔼 | RF | JKU Linz | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.829 | 0.299 | 40.1M | 0.781 | 0.769 | 0.916 | 0.823 | 0.814 | 0.768 | 0.832 | 0.800 | 0.809 | 0.841 | 0.946 | 0.851 | 0.198 | 0.042 | 0.456 | 0.315 | 0.417 | 0.285 | 0.203 | 0.239 | 0.290 | 0.333 | 0.534 | 0.274 |
| 4 | 🔼 | XGBoost | JKU Linz | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.823 | 0.277 | 460.7K | 0.735 | 0.804 | 0.912 | 0.822 | 0.813 | 0.789 | 0.771 | 0.810 | 0.818 | 0.824 | 0.945 | 0.827 | 0.131 | 0.072 | 0.479 | 0.295 | 0.404 | 0.228 | 0.139 | 0.268 | 0.314 | 0.250 | 0.536 | 0.206 |
| 5 | 🔼 | Chemprop | MIT (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.815 | 0.232 | 709K | 0.839 | 0.861 | 0.893 | 0.767 | 0.818 | 0.767 | 0.772 | 0.748 | 0.788 | 0.805 | 0.914 | 0.813 | 0.302 | 0.065 | 0.433 | 0.108 | 0.333 | 0.137 | 0.067 | 0.267 | 0.210 | 0.284 | 0.445 | 0.131 |
| 6 | 🔼 | GIN | MIT & Stanford (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.810 | 0.244 | 154K | 0.808 | 0.882 | 0.890 | 0.773 | 0.771 | 0.778 | 0.740 | 0.756 | 0.787 | 0.774 | 0.930 | 0.836 | 0.281 | 0.097 | 0.399 | 0.142 | 0.381 | 0.277 | 0.072 | 0.213 | 0.240 | 0.261 | 0.416 | 0.144 |
| 7 | ⤵️ | TabPFN | PriorLabs (trained by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.807 | 0.262 | 86.9M | 0.753 | 0.741 | 0.893 | 0.770 | 0.782 | 0.806 | 0.793 | 0.798 | 0.787 | 0.816 | 0.942 | 0.806 | 0.161 | 0.030 | 0.413 | 0.223 | 0.402 | 0.266 | 0.158 | 0.293 | 0.222 | 0.328 | 0.477 | 0.173 |
| 8 | 0️⃣ | GPT-OSS | OpenAI (inference by JKU Linz) | Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge | 0.703 | 0.088 | 120B | 0.625 | 0.673 | 0.829 | 0.715 | 0.656 | 0.705 | 0.659 | 0.701 | 0.667 | 0.728 | 0.765 | 0.710 | 0.036 | 0.009 | 0.251 | 0.104 | 0.081 | 0.070 | 0.048 | 0.127 | 0.060 | 0.072 | 0.141 | 0.058 |
- **Avg. AUC**: Mean ROC-AUC across all 12 tasks
- **Avg. ΔAUC-PR**: Mean ΔAUC-PR across all 12 tasks
- **Rank**: Determined by Avg. AUC
- **Type**: 0️⃣ Zero-shot | 1️⃣ Few-shot | ⤵️ Pre-trained | 🔼 Trained from scratch
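The aggregate columns can be reproduced from per-task scores. Below is a minimal pure-Python sketch, assuming ΔAUC-PR means average precision minus the positive-class prevalence (the AUC-PR of a random classifier); the exact baseline used by the leaderboard may differ, and the sample data is illustrative only.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic; ties get their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign 1-based average ranks to runs of tied scores
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2
        i = j
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def delta_auc_pr(labels, scores):
    """Average precision (AUC-PR) minus positive prevalence: the lift over
    a random classifier (assumed interpretation of the Δ)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos, tp, ap = sum(labels), 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each positive hit
    return ap / n_pos - n_pos / len(labels)


# Avg. AUC / Avg. ΔAUC-PR are unweighted means over the 12 Tox21 tasks;
# two toy (labels, scores) tasks stand in for the real assay data here.
tasks = [([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
         ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.3])]
avg_auc = sum(roc_auc(y, s) for y, s in tasks) / len(tasks)
avg_delta_pr = sum(delta_auc_pr(y, s) for y, s in tasks) / len(tasks)
```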