Electra Base Discriminator
google/electra-base-discriminator
published Mar 2022 · updated Feb 2024
Electra Base Discriminator is a transformer model pretrained using replaced token detection to distinguish real input tokens from generator-produced fake tokens.
specs
| Task | Token-level discrimination (real vs. fake tokens) / fine-tunable for classification, QA, sequence tagging |
| Architecture | Transformer encoder (ELECTRA base) |
| Pre-training Objective | Replaced token detection |
| License | Apache 2.0 |
about this model
google/electra-base-discriminator is a transformer-based language representation model pre-trained using replaced token detection, a discriminative objective that distinguishes real input tokens from fake tokens generated by a small generator network. This approach, introduced in the ELECTRA paper (ICLR 2020), enables the model to learn efficiently from all input tokens rather than just masked ones.
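The replaced token detection objective described above can be sketched in plain Python. This is a minimal illustration, not the training code behind this checkpoint: the function names `rtd_labels` and `rtd_loss`, the example sentence, and the logits are all assumptions made for the sketch. It shows the two ideas the paragraph relies on: labels come from comparing the original and corrupted sequences, and the loss is averaged over every position rather than only the corrupted ones.

```python
import math


def rtd_labels(original, corrupted):
    """Label each position 1 if the token was replaced, else 0.

    A generator sample that happens to equal the original token counts as real.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


def rtd_loss(logits, labels):
    """Mean binary cross-entropy over every position, not just corrupted ones."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))  # probability that the token is a replacement
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator swapped "cooked" for "ate"
labels = rtd_labels(original, corrupted)

# Hypothetical discriminator logits: positive means "looks replaced".
logits = [-3.0, -2.5, 2.0, -3.0, -2.0]
loss = rtd_loss(logits, labels)
```

Because every position contributes a label, each training sequence yields a signal for all of its tokens, which is the efficiency argument made above.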
Key Strengths
- Compute efficiency: A small ELECTRA model trained on a single GPU for 4 days outperforms GPT, which uses 30× more compute, on the GLUE benchmark. At scale, ELECTRA performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute, and exceeds them when given the same compute budget.
- Versatile fine-tuning: Supports downstream tasks including text classification (GLUE), question answering (SQuAD), and sequence tagging.
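As a starting point for the fine-tuning use case, the sketch below loads the checkpoint with a classification head through the Transformers `Auto*` classes. It assumes `transformers` and `torch` are installed and that the checkpoint can be downloaded; the helper names `load_classifier` and `classify` are hypothetical and exist only for this example. The classification head is newly initialized, so predictions are not meaningful until the model has been fine-tuned on labeled data.

```python
MODEL_ID = "google/electra-base-discriminator"


def load_classifier(num_labels=2):
    """Load the tokenizer and the encoder with a fresh classification head."""
    # Imported lazily so this file can be read and imported without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=num_labels
    )
    return tokenizer, model


def classify(texts, tokenizer, model):
    """Return the argmax class id for each input text."""
    import torch

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()
```

A typical call would be `tokenizer, model = load_classifier()` followed by `classify(["some text"], tokenizer, model)`, with a training loop of your choice in between.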
Benchmark Results
On the SQuAD 2.0 leaderboard, systems built on ELECTRA rank among the top entries. The best single ELECTRA model (“ELECTRA+ATRLP+PV”) reports an Exact Match (EM) of 89.551 and an F1 score of 92.366. Another variant, “Retro-Reader on ELECTRA,” achieves EM 89.562 and F1 92.052. These figures describe fine-tuned leaderboard systems, not this pre-trained checkpoint as released.
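For readers unfamiliar with the two metrics, the sketch below computes EM and token-level F1 for a single prediction and reference, in the style of the SQuAD evaluation script. It is an approximation under stated assumptions: the function names are illustrative, and the official evaluation additionally takes the best score over several reference answers and averages over the dataset, which is omitted here.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # SQuAD 2.0 includes unanswerable questions: an empty answer
    # only matches an empty reference.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score("in Paris France", "Paris")` gives 0.5: one of three predicted tokens overlaps (precision 1/3) and the single reference token is covered (recall 1).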
The model is released under the Apache 2.0 license and has been widely adopted with over 40 million downloads in the past month on Hugging Face, along with 75 fine-tuned variants and 22 Spaces.
best for
- Fine-tuning for text classification (e.g., GLUE tasks)
- Fine-tuning for question answering (e.g., SQuAD 2.0)
- Token-level anomaly detection or real/fake token identification
FAQ
ELECTRA uses replaced token detection, where the model learns to detect which tokens were replaced by a generator.
The card does not state an exact parameter count for the base model. Its single-GPU efficiency claim refers to a small ELECTRA model trained for 4 days, not to this base checkpoint.
The model is released under the Apache 2.0 license.
The model can be fine-tuned for classification, QA, and sequence tagging tasks using the Transformers library.
Once the model is hosted, call the OpenAI-compatible endpoint with your API key, specifying the model as google/electra-base-discriminator.
We're benchmarking and onboarding Electra Base Discriminator as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.