GPT-2
openai-community/gpt2
published Mar 2022 · updated Feb 2024
GPT-2 is a text-generation model that predicts the next word in a sequence, trained on a large corpus of English data using a causal language modeling objective.
specs
| Task | Text Generation |
| Architecture | Transformer (decoder-only) |
| Parameters | 124M |
| License | Modified MIT |
about this model
Developed by OpenAI and released in February 2019, this is the smallest GPT-2 checkpoint, with 124 million parameters. It was pretrained with a causal language modeling objective on WebText, roughly 40GB of text gathered from 45 million outbound Reddit links, with Wikipedia pages excluded. The training data extends to the end of 2017.
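The 124M figure can be checked against the architecture. The sketch below assumes the published GPT-2 small configuration (12 layers, hidden width 768, feed-forward width four times the hidden width, output projection tied to the token embedding), none of which is stated in this card; the vocabulary and context sizes are the ones listed here. `gpt2_parameter_count` is an illustrative helper, not part of any library.

```python
def gpt2_parameter_count(n_layer=12, d_model=768, vocab_size=50_257, n_ctx=1_024):
    """Count trainable parameters for a GPT-2 style decoder.

    Defaults are the assumed GPT-2 small hyperparameters; the output
    projection is assumed to share weights with the token embedding,
    so it adds nothing to the total.
    """
    # Token embedding plus learned position embedding.
    embeddings = vocab_size * d_model + n_ctx * d_model

    # One LayerNorm holds a scale and a bias vector.
    layer_norm = 2 * d_model

    # Attention: fused query/key/value projection, then output projection.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # Feed-forward: expand to 4 * d_model, then project back.
    d_ff = 4 * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Each block has two LayerNorms, one attention layer, one feed-forward layer.
    block = 2 * layer_norm + attention + mlp

    # A final LayerNorm follows the last block.
    return embeddings + n_layer * block + layer_norm


print(f"{gpt2_parameter_count():,}")  # prints 124,439,808
```

Under these assumptions the token embedding alone accounts for about 38.6 million of the parameters, which is why the total is usually rounded to 124M.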
The model uses byte-level Byte Pair Encoding (BPE) with a vocabulary of 50,257 tokens and accepts input sequences of up to 1024 consecutive tokens. It was trained with a causal mask, so the prediction at each position depends only on that position and the tokens before it, never on later ones. The resulting internal representation of English is best suited to generating text from a prompt.
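The three ideas in this paragraph (a byte-level base alphabet, a 1024-token context, and a causal mask) can be sketched in plain Python. This is an illustration only: the byte values below are not GPT-2 token ids, the helper names are hypothetical, and keeping the most recent tokens when a sequence is too long is one possible policy, not something this card prescribes.

```python
VOCAB_SIZE = 50_257      # from this card
CONTEXT_LENGTH = 1_024   # from this card

# The vocabulary size decomposes as 256 byte-level base tokens,
# 50,000 learned merges, and one end-of-text token.
assert 256 + 50_000 + 1 == VOCAB_SIZE


def utf8_byte_ids(text: str) -> list[int]:
    """Byte-level starting point: any string becomes values in 0..255.

    These are raw UTF-8 bytes, NOT GPT-2 token ids; the real tokenizer
    merges byte sequences into larger units and uses its own id mapping.
    """
    return list(text.encode("utf-8"))


def truncate_to_context(token_ids: list[int], context_length: int = CONTEXT_LENGTH) -> list[int]:
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-context_length:]


def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
# x...
# xx..
# xxx.
# xxxx
```

Because the base alphabet is bytes rather than characters, any UTF-8 input can be encoded without an unknown-token fallback.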
Zero-shot evaluation results
| Model | LAMBADA (PPL) | LAMBADA (ACC) | CBT-CN (ACC) | CBT-NE (ACC) | WikiText2 (PPL) | PTB (PPL) | enwiki8 (BPB) | text8 (BPC) | WikiText103 (PPL) | 1BW (PPL) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-2 (124M) | 35.13 | 45.99 | 87.65 | 83.4 | 29.41 | 65.85 | 1.16 | 1.17 | 37.50 | 75.20 |
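In the table, PPL is perplexity (lower is better), ACC is accuracy in percent, and BPB and BPC are bits per byte and bits per character (lower is better). The sketch below shows the standard definitions of perplexity and bits per byte computed from per-token log-probabilities. It is not the evaluation code behind these numbers, whose exact preprocessing is not described here, and the function names are illustrative.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def bits_per_byte(token_log_probs: list[float], num_bytes: int) -> float:
    """Total negative log-probability, converted to bits, per byte of raw text."""
    total_bits = -sum(token_log_probs) / math.log(2)
    return total_bits / num_bytes


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_quarter = [math.log(0.25)] * 10
print(round(perplexity(uniform_quarter), 6))  # 4.0
```

Read the other way, a perplexity of 35.13 corresponds to a mean per-token loss of `math.log(35.13)` nats.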
Because WebText is unfiltered internet content, the model reflects the biases present in that data. As noted in the original model card, GPT-2 does not distinguish fact from fiction, and all versions should be approached with similar caution regarding biases related to human attributes.
best for
- Generating creative text from a prompt
- Prototyping language model applications
- Educational experiments with causal language models
FAQ
GPT-2 is best for generating coherent English text from a prompt, such as creative writing, autocompletion, or chatbots. It is not fine-tuned for factual accuracy.
GPT-2 (124M parameters) is the smallest version, making it faster and less resource-intensive than GPT-2 Medium (355M), Large (774M), or XL (1.5B).
GPT-2 is released under a Modified MIT license, which permits use, modification, and distribution with attribution.
GPT-2 expects text tokenized with its byte-level BPE tokenizer, which has a vocabulary size of 50,257. Input sequences can be up to 1024 tokens long.
Once GPT-2 is hosted, call the gigarouter OpenAI-compatible endpoint with your API key, sending a prompt in the standard text completions or chat completions format.
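A request in the text completions format might be assembled as follows. This is a sketch under stated assumptions: `BASE_URL` is a placeholder rather than a real gigarouter address, the model identifier is assumed to match the repository name shown at the top of this page, and `build_completion_request` is a hypothetical helper. The function only builds the request; nothing is sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"   # placeholder, NOT a real endpoint
MODEL_ID = "openai-community/gpt2"        # assumed identifier


def build_completion_request(prompt: str, api_key: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text completions request."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        # Prompt tokens plus generated tokens must fit in the 1024-token context.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request("Once upon a time", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

With a real base URL and key, passing the request to `urllib.request.urlopen` would send it. The text completions format is the more natural fit for a base model like GPT-2, which continues a prompt rather than following chat turns.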
We're benchmarking and onboarding GPT-2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
