GPT-2

openai-community/gpt2

published Mar 2022 · updated Feb 2024

GPT-2 is a text-generation model that predicts the next word in a sequence, trained on a large corpus of English data using a causal language modeling objective.

status

coming soon

downloads / mo

13.3M

license

Modified MIT

specs

Task	Text Generation
Architecture	Transformer (decoder-only)
Parameters	124M
License	Modified MIT

about this model

GPT-2 is a text-generation model that predicts the next word in a sequence, pretrained on a large corpus of English text using a causal language modeling objective. Developed by OpenAI in February 2019, this is the smallest version of GPT-2 with 124 million parameters. The model was trained on the WebText dataset, which comprises text from 45 million outbound Reddit links (excluding Wikipedia) and contains approximately 40GB of text. The training data cutoff is the end of 2017.

The model uses a byte-level version of Byte Pair Encoding (BPE) with a vocabulary size of 50,257 and processes input sequences of up to 1024 consecutive tokens. It was trained to predict the next token using a causal mask, so each position can attend only to itself and earlier tokens. In the process it learns an internal representation of English that can be used to extract features for downstream tasks, although it performs best at what it was pretrained for: generating text from a prompt.
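The causal mask and the 1024-token limit can be sketched in plain Python. This is an illustrative sketch, not GPT-2's implementation: `causal_mask` and `fit_context` are hypothetical helper names, and keeping the most recent tokens when the input is too long is an assumed truncation policy, one of several a caller might choose.

```python
from typing import List

N_CTX = 1024  # maximum sequence length for GPT-2, per the specs on this page


def causal_mask(n: int) -> List[List[bool]]:
    """Return an n x n mask where mask[i][j] is True if position i may attend to j.

    A causal mask only allows attention to the same or earlier positions (j <= i),
    so the prediction made at position i never sees tokens that come after i.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def fit_context(token_ids: List[int], n_ctx: int = N_CTX) -> List[int]:
    """Keep only the last n_ctx token ids (assumed left-truncation policy)."""
    return token_ids[max(0, len(token_ids) - n_ctx):]


if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("x" if allowed else "." for allowed in row))
    print(len(fit_context(list(range(1500)))))
```

Running it prints the lower-triangular pattern `x...`, `xx..`, `xxx.`, `xxxx`, followed by `1024`.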

Zero-shot evaluation results

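Model	LAMBADA (PPL)	LAMBADA (ACC, %)	CBT-CN (ACC, %)	CBT-NE (ACC, %)	WikiText2 (PPL)	PTB (PPL)	enwik8 (BPB)	text8 (BPC)	WikiText103 (PPL)	1BW (PPL)
GPT-2 (124M)	35.13	45.99	87.65	83.4	29.41	65.85	1.16	1.17	37.50	75.20

PPL = perplexity and BPB/BPC = bits per byte/character (lower is better); ACC = accuracy (higher is better).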
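The results mix perplexity, accuracy, and bits-per-byte/character metrics. The sketch below shows the standard definitions, computed from per-token natural-log probabilities. It does not reproduce the GPT-2 paper's evaluation pipeline (detokenization, striding over long documents, or exactly how byte and character counts were taken), and the function names are illustrative.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def bits_per_unit(token_logprobs: Sequence[float], n_units: int) -> float:
    """Total negative log-likelihood in bits, divided by the number of
    characters (BPC) or bytes (BPB) in the evaluated text."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    total_nll_nats = -sum(token_logprobs)
    return total_nll_nats / (n_units * math.log(2))


if __name__ == "__main__":
    # A model that gives probability 0.25 to every correct token has perplexity 4.
    print(round(perplexity([math.log(0.25)] * 10), 6))
    # 8 tokens at probability 0.5 spread over 8 characters is 1 bit per character.
    print(round(bits_per_unit([math.log(0.5)] * 8, 8), 6))
```

Because a token can span several characters, BPC and BPB normalize by text length rather than token count, which keeps them comparable across tokenizers.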

Because WebText consists largely of unfiltered internet content, the training data, and therefore the model, reflect the biases present in that content. As noted in OpenAI's original model card, GPT-2 does not distinguish fact from fiction, so it should not be used where generated text must be true, and all versions of GPT-2 should be approached with similar caution in use cases sensitive to biases around human attributes.

best for

·Generating creative text from a prompt
·Prototyping language model applications
·Educational experiments with causal language models

FAQ

What is GPT-2 best used for?

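GPT-2 is best for generating coherent English text from a prompt, for example creative writing, autocompletion, or chatbots. It does not distinguish fact from fiction and has not been fine-tuned for factual accuracy, so its output should not be relied on where correctness matters.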
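Generating text from a prompt amounts to a loop over next-token predictions. The sketch below uses temperature scaling and top-k sampling, two common decoding choices assumed here for illustration. `next_logits` stands in for a real model forward pass and is hypothetical, as are the default parameter values; the loop stops early at id 50256, GPT-2's `<|endoftext|>` token.

```python
import math
import random
from typing import Callable, List, Sequence

EOS_ID = 50256  # <|endoftext|> in GPT-2's 50,257-token vocabulary
N_CTX = 1024    # the model only sees the most recent 1024 tokens


def sample_next(logits: Sequence[float], temperature: float, top_k: int,
                rng: random.Random) -> int:
    """Sample a token id from the top_k highest logits after temperature scaling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax numerators
    return rng.choices(top, weights=weights, k=1)[0]


def generate(next_logits: Callable[[List[int]], Sequence[float]],
             prompt_ids: List[int], max_new_tokens: int,
             temperature: float = 1.0, top_k: int = 50, seed: int = 0) -> List[int]:
    """Autoregressively extend prompt_ids one sampled token at a time."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids[-N_CTX:])  # hypothetical model call on the visible window
        token = sample_next(logits, temperature, top_k, rng)
        ids.append(token)
        if token == EOS_ID:
            break
    return ids


if __name__ == "__main__":
    # Toy stand-in for a model over a 5-token vocabulary: strongly prefers last id + 1.
    def toy_model(ids: List[int]) -> List[float]:
        return [10.0 if i == (ids[-1] + 1) % 5 else 0.0 for i in range(5)]

    print(generate(toy_model, [0], max_new_tokens=4, top_k=1))  # [0, 1, 2, 3, 4]
```

With `top_k=1` the loop reduces to greedy decoding; larger `top_k` and higher `temperature` trade coherence for variety.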

How does GPT-2 compare to larger versions in size and speed?

GPT-2 (124M parameters) is the smallest version, making it faster and less resource-intensive than GPT-2 Medium (355M), Large (774M), or XL (1.5B).
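The 124M figure can be checked from the architecture. This sketch assumes the publicly documented configuration of this checkpoint (12 layers, hidden size 768), which is not stated on this page, together with the vocabulary size and context length above. It counts learned weights and biases and treats the output layer as tied to the token embedding, as GPT-2 does; the function name is illustrative.

```python
def gpt2_param_count(n_layer: int = 12, d: int = 768,
                     vocab: int = 50_257, n_ctx: int = 1024) -> int:
    """Count learned parameters of a GPT-2 style decoder-only transformer."""
    embeddings = vocab * d + n_ctx * d             # token + learned position embeddings
    layer_norm = 2 * d                             # gain and bias
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused Q/K/V projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # expand to 4d, then project back to d
    per_block = 2 * layer_norm + attention + mlp   # two layer norms per block
    final_norm = layer_norm
    # The output projection reuses the token-embedding matrix, so it adds no parameters.
    return embeddings + n_layer * per_block + final_norm


if __name__ == "__main__":
    n = gpt2_param_count()
    print(f"{n:,} parameters")                     # 124,439,808 parameters
    print(f"{n * 4 / 1e6:.0f} MB of float32 weights")
```

At 4 bytes per float32 weight this is about 498 MB for the weights alone, not counting activations or the attention cache; the larger variants change `n_layer` and `d`.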

What is the license for GPT-2?

GPT-2 is released under a Modified MIT license, which permits use, modification, and distribution with attribution.

What input format does GPT-2 expect?

GPT-2 expects text tokenized with its byte-level BPE tokenizer, which has a vocabulary of 50,257 tokens. Input sequences can be at most 1024 tokens long.
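Byte-level BPE starts from the UTF-8 bytes of the text, so any string can be encoded without an unknown token, and then applies learned merges. GPT-2's 50,257-token vocabulary is 256 byte tokens plus 50,000 merges plus one `<|endoftext|>` token. The toy encoder below is a simplified sketch: its merge table is made up for illustration, and it skips GPT-2's regex pre-tokenization and byte-to-Unicode mapping, so its ids do not match the real tokenizer.

```python
from typing import Dict, List, Tuple

GPT2_VOCAB_SIZE = 256 + 50_000 + 1  # byte tokens + learned merges + <|endoftext|> = 50,257


def bpe_encode(text: str, merges: Dict[Tuple[int, int], int]) -> List[int]:
    """Toy byte-level BPE encoder.

    Starts from the UTF-8 bytes of `text` (ids 0-255) and repeatedly merges the
    adjacent pair whose merge was learned earliest (lowest new id in `merges`).
    """
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        pairs = {(ids[i], ids[i + 1]) for i in range(len(ids) - 1)}
        known = [p for p in pairs if p in merges]
        if not known:
            break
        best = min(known, key=lambda p: merges[p])
        merged: List[int] = []
        i = 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                merged.append(merges[best])
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
    return ids


if __name__ == "__main__":
    toy_merges = {(104, 105): 256, (256, 33): 257}  # hypothetical: "h"+"i", then "hi"+"!"
    print(bpe_encode("hi!", toy_merges))  # [257]
    print(bpe_encode("é", toy_merges))    # [195, 169], the two UTF-8 bytes of "é"
```

The second call shows why byte-level BPE never fails on unseen characters: with no applicable merge, text simply falls back to its raw bytes.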

How can I call GPT-2 via the gigarouter API?

Once GPT-2 is live, use the gigarouter OpenAI-compatible endpoint with your API key and send a prompt in the standard chat completions or text completions format.
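Because the hosted endpoint is not live yet, the following is only a sketch of what an OpenAI-compatible text-completions request generally looks like, built with the Python standard library. The base URL is a placeholder, and the model id, the `/completions` path under that base, and the request fields (`model`, `prompt`, `max_tokens`) are assumptions based on common OpenAI-compatible conventions, not confirmed gigarouter details.

```python
import json
import urllib.request

# Placeholders/assumptions: not confirmed gigarouter values.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "openai-community/gpt2"


def build_completion_request(api_key: str, prompt: str, max_tokens: int = 50,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI text-completions format."""
    body = {"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("YOUR_API_KEY", "Once upon a time")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once an endpoint exists, `urllib.request.urlopen(req)` would send the request. A plain text-completions request is shown because GPT-2 is a base model rather than a chat-tuned one.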

not yet live

We're benchmarking and onboarding GPT-2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

opt-125m

13.7M dl/mo

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo

dolphin-2.9.1-yi-1.5-34b

4.6M dl/mo