Electra Base Discriminator

google/electra-base-discriminator

published Mar 2022 · updated Feb 2024

Electra Base Discriminator is a transformer model pretrained using replaced token detection to distinguish real input tokens from generator-produced fake tokens.

status

coming soon

API providers

downloads / mo

40.4M

license

apache-2.0

specs

Task	Token-level discrimination (real vs. fake tokens) / fine-tunable for classification, QA, sequence tagging
Architecture	Transformer encoder (ELECTRA base)
Pre-training Objective	Replaced token detection
License	Apache 2.0

about this model

google/electra-base-discriminator is a transformer-based language representation model pre-trained using replaced token detection, a discriminative objective that distinguishes real input tokens from fake tokens generated by a small generator network. This approach, introduced in the ELECTRA paper (ICLR 2020), enables the model to learn efficiently from all input tokens rather than just masked ones.
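The replaced token detection objective described above can be illustrated with a toy sketch. The `corrupt` function is a hypothetical stand-in for the generator (a real generator samples replacements from a masked language model rather than uniformly from a vocabulary), and the function names are illustrative, not part of any library. The point is that every position receives a label, so every position contributes to the loss.

```python
import math
import random


def corrupt(tokens, vocab, replace_prob=0.15, seed=0):
    """Toy stand-in for the generator: swap some tokens for random vocab entries."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    for i in range(len(tokens)):
        if rng.random() < replace_prob:
            corrupted[i] = rng.choice(vocab)
    return corrupted


def rtd_labels(original, corrupted):
    """1 = replaced, 0 = original. A sample equal to the original counts as original."""
    return [int(o != c) for o, c in zip(original, corrupted)]


def rtd_loss(logits, labels):
    """Mean binary cross-entropy over ALL positions (numerically stable form)."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


original = "the chef cooked the meal".split()
corrupted = "the chef ate the meal".split()
labels = rtd_labels(original, corrupted)  # one label per token, not just masked ones
```

With a masked-language-model objective and a 15% masking rate, only about 15% of positions would carry a training signal; here all five positions do.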

Key Strengths

Compute efficiency: A small ELECTRA model trained on a single GPU for 4 days outperforms GPT, which uses 30× more compute, on the GLUE benchmark. At scale, ELECTRA performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute, and exceeds them when given the same compute budget.
Versatile fine-tuning: Supports downstream tasks including text classification (GLUE), question answering (SQuAD), and sequence tagging.

Benchmark Results

On the SQuAD 2.0 leaderboard, ELECTRA-based models achieve strong results. One single-model entry, “ELECTRA+ATRLP+PV,” reports an Exact Match (EM) of 89.551 and an F1 score of 92.366. Another, “Retro-Reader on ELECTRA,” reports EM 89.562 and F1 92.052. Both are leaderboard systems built on ELECTRA, not results for this checkpoint without fine-tuning.
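For readers unfamiliar with the two metrics, here is a minimal sketch of how EM and F1 are computed for a single prediction. It follows the usual SQuAD-style normalization (lowercase, strip punctuation and articles) as an assumption. Leaderboard numbers are these per-question scores averaged over the dataset and scaled to 0-100, taking the best score over multiple reference answers, which this sketch omits.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Unanswerable questions use an empty answer: credit only if both are empty.
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


# A partially correct span scores 0 on EM but earns partial F1 credit.
em = exact_match("the Eiffel Tower in Paris", "Eiffel Tower")
f1 = f1_score("the Eiffel Tower in Paris", "Eiffel Tower")
```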

The model is released under the Apache 2.0 license and has been widely adopted with over 40 million downloads in the past month on Hugging Face, along with 75 fine-tuned variants and 22 Spaces.

best for

·Fine-tuning for text classification (e.g., GLUE tasks)
·Fine-tuning for question answering (e.g., SQuAD 2.0)
·Token-level anomaly detection or real/fake token identification
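
For the real/fake token use case, a minimal sketch with the Hugging Face Transformers library might look like the following. It assumes `torch` and `transformers` are installed and that the checkpoint can be downloaded. The helper names `flags_from_logits` and `detect_replaced_tokens` are illustrative, not part of the library, and the zero threshold (sigmoid probability above 0.5) is an assumption you may want to tune.

```python
def flags_from_logits(logits, threshold=0.0):
    """A logit above the threshold means the token is judged 'replaced'."""
    return [logit > threshold for logit in logits]


def detect_replaced_tokens(sentence, model_name="google/electra-base-discriminator"):
    """Return (token, is_flagged) pairs, including the [CLS] and [SEP] tokens."""
    # Imported lazily so flags_from_logits stays usable without these packages.
    import torch
    from transformers import ElectraForPreTraining, ElectraTokenizerFast

    tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
    model = ElectraForPreTraining.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # logits has shape (batch, sequence_length): one score per token.
        logits = model(**inputs).logits[0].tolist()

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return list(zip(tokens, flags_from_logits(logits)))
```

Calling `detect_replaced_tokens("The quick brown fox fake over the lazy dog")` downloads the weights on first use and returns one flag per wordpiece. Which tokens get flagged depends on the model, so no expected output is shown here.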

FAQ

What pre-training objective does ELECTRA use?

ELECTRA uses replaced token detection, where the model learns to detect which tokens were replaced by a generator.

What is the model size and compute requirement?

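The card does not state an exact parameter count. The single-GPU, 4-day training result cited above refers to a small ELECTRA model, not this base checkpoint. At scale, ELECTRA performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute.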

What license is this model released under?

Apache 2.0 license.

Can I fine-tune this model on my own dataset?

Yes, it supports fine-tuning on classification, QA, and sequence tagging tasks using the Transformers library.
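
A minimal sketch of a classification fine-tuning loop, assuming `torch` and `transformers` are installed. The hyperparameters (learning rate, batch size, epochs) are placeholder assumptions rather than recommended values, and `make_batches` and `fine_tune` are illustrative names, not library functions.

```python
def make_batches(texts, labels, batch_size):
    """Split parallel lists of texts and labels into consecutive batches."""
    return [
        (texts[i:i + batch_size], labels[i:i + batch_size])
        for i in range(0, len(texts), batch_size)
    ]


def fine_tune(texts, labels, num_labels=2, epochs=1, batch_size=8, lr=2e-5,
              model_name="google/electra-base-discriminator"):
    """Fine-tune the encoder with a freshly initialized classification head."""
    # Imported lazily so make_batches stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch_texts, batch_labels in make_batches(texts, labels, batch_size):
            enc = tokenizer(batch_texts, padding=True, truncation=True,
                            return_tensors="pt")
            # Passing labels makes the model return a cross-entropy loss.
            out = model(**enc, labels=torch.tensor(batch_labels))
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer
```

For real workloads you would typically add a validation split, a learning-rate schedule, and GPU placement. This loop only shows the basic shape of the training step.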

How do I use this model via the gigarouter API?

The hosted API is not yet live. Once it is, call the OpenAI-compatible endpoint with your API key, specifying the model as google/electra-base-discriminator.

not yet live

We're benchmarking and onboarding Electra Base Discriminator as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related specialist models

wespeaker-voxceleb-resnet34-LM

6.8M dl/mo

unidepth-v2-vitl14

6.3M dl/mo

stable-diffusion-v1-5-archive

5.8M dl/mo