## Why count tokens before calling the API?
The OpenAI API returns an error if your input exceeds the model's context window. But even before hitting the limit, running up large token counts silently drives up costs. A batch job that processes 50,000 records with a 500-token prompt per record consumes 25 million input tokens — at GPT-4 Turbo pricing that's $250 just for inputs.
Counting tokens beforehand lets you:
- Guard against context window overflow errors in production
- Estimate costs before launching expensive batch jobs
- Trim prompts that are unexpectedly large
- Allocate token budget between system prompt, user message, and expected output
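The batch-job arithmetic above is worth automating before you launch anything expensive. A minimal sketch (the function name is mine, and the $10-per-million rate is an assumption based on the GPT-4 Turbo pricing cited above; check current pricing before relying on it):

```python
# Back-of-napkin input-cost estimate for a batch job.
# The default price is an illustrative assumption, not a live rate.

def estimate_batch_cost(records: int, tokens_per_record: int,
                        price_per_million: float = 10.0) -> float:
    """Return the estimated input cost in dollars."""
    total_tokens = records * tokens_per_record
    return total_tokens / 1_000_000 * price_per_million

# The 50,000-record example from above:
print(estimate_batch_cost(50_000, 500))  # 250.0
```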
## Context window limits by model (2026)
| Model | Context window | Tokenizer |
|---|---|---|
| GPT-4o | 128,000 | o200k_base |
| GPT-4 Turbo | 128,000 | cl100k_base |
| GPT-4 | 8,192 | cl100k_base |
| GPT-3.5-turbo | 16,385 | cl100k_base |
| Claude 3.5 Sonnet | 200,000 | Anthropic custom |
| Gemini 1.5 Pro | 1,000,000 | Google SentencePiece |
Token counts for Claude and Gemini will differ from GPT-4 counts even for identical text, because each uses a different tokenizer. Always count with the model's own tokenizer when targeting those APIs. Token Compare uses cl100k_base, which matches GPT-4, GPT-4 Turbo, GPT-3.5-turbo, and ChatGPT (but not GPT-4o, which uses the newer o200k_base encoding).
## Method 1: Browser tool (fastest)
Token Compare runs entirely in your browser using the gpt-tokenizer JavaScript library, which implements cl100k_base. Paste your text and the token count updates instantly — no API key, no signup, no server round trip.
This is the best option for:
- Quick checks during prompt development
- Comparing two prompt phrasings side by side
- Visualizing exactly where the tokenizer splits words
## Method 2: tiktoken in Python (most accurate for production)
OpenAI's tiktoken library is the reference implementation for counting tokens in Python. It's exact, fast, and handles all the edge cases around chat message formatting.
```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

# Basic text
print(count_tokens("Hello, world!"))  # 4

# System + user message format (adds overhead tokens)
def count_chat_tokens(messages: list, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # every message adds <|start|>role<|end|> overhead
    tokens_per_name = 1     # if a name field is set, it adds 1 token
    total = 0
    for msg in messages:
        total += tokens_per_message
        for key, value in msg.items():
            total += len(enc.encode(value))
            if key == "name":
                total += tokens_per_name
    total += 3  # every reply is primed with <|start|>assistant<|end|>
    return total

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(count_chat_tokens(messages))  # ~24 tokens
```
Every message in the chat format adds 3 overhead tokens for role markers (4 if a `name` field is set), and priming the reply adds 3 more. A 10-message conversation therefore carries roughly 30-40 tokens of overhead on top of the actual content. Always account for this in cost estimates.
## Method 3: gpt-tokenizer in JavaScript/Node.js
For JavaScript environments, gpt-tokenizer is a pure JavaScript implementation of the same cl100k_base encoding. Token Compare is built on this library.
```javascript
import { encode, decode, encodeChat } from 'gpt-tokenizer'

// Count tokens in a string
const tokens = encode('Hello, world!')
console.log(tokens.length) // 4

// Count tokens in a chat conversation
const chatTokens = encodeChat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
], 'gpt-4')
console.log(chatTokens.length) // ~24 tokens
```
This is the right choice for serverless functions, edge workers, or any JavaScript backend that needs token counting without spawning a Python process.
## Method 4: Rough estimation (for back-of-napkin math)
When you need a quick estimate and don't have a tool handy, these rules of thumb work for English text:
- 1 token ≈ 4 characters (including spaces)
- 1 token ≈ 0.75 words (so 100 tokens ≈ 75 words, and 100 words ≈ 133 tokens)
- 1 page of double-spaced text ≈ 300-400 tokens
- 1 paragraph ≈ 60-100 tokens
These are approximations that work for typical English prose. Code, non-English text, and text with many numbers or URLs can differ significantly from these estimates.
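The rules of thumb above are easy to encode when no tokenizer is handy. A rough sketch (the function name and the averaging of the two ratios are my own choices, not an established formula):

```python
def rough_token_estimate(text: str) -> int:
    """Estimate token count for English prose using two rules of
    thumb: ~4 characters per token, and ~0.75 words per token.
    Returns the average of the two estimates, rounded."""
    by_chars = len(text) / 4           # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ~ 0.75 words
    return round((by_chars + by_words) / 2)

print(rough_token_estimate("Hello world"))  # 3
```

For anything that feeds a real cost estimate or a context-window check, prefer tiktoken or gpt-tokenizer; this is only for mental math.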
## What gets counted as tokens in API calls
When you send a chat completion request, the total token count is not just your message text. The API counts:
- System prompt tokens
- All previous messages in the conversation history
- Message format overhead (role markers, separators)
- Function/tool definitions, if you're using tool calling
- The model's response (counted as output tokens)
If you're maintaining a conversation history across multiple turns, the effective token count grows with each turn until you implement some form of context truncation or summarization.
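One simple truncation strategy is to drop the oldest non-system messages until the history fits a budget. A minimal sketch (the function name, the 3-token per-message overhead constant, and the pluggable `count_tokens` callback are mine; in practice you would pass a tiktoken-based counter):

```python
def truncate_history(messages: list, budget: int, count_tokens) -> list:
    """Drop the oldest non-system messages until the conversation fits
    under `budget` tokens. `count_tokens` is any callable returning the
    token count of a string (e.g. one built on tiktoken)."""
    def total(msgs):
        # 3 tokens of per-message role-marker overhead, plus content
        return sum(3 + count_tokens(m["content"]) for m in msgs)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

Pinning the system prompt while trimming the oldest turns keeps the model's instructions intact; summarizing the dropped turns into a single message is a common refinement.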
## Token counting in the OpenAI API response
After a successful API call, the response includes a usage object with the actual token counts used:
```json
{
  "id": "chatcmpl-...",
  "model": "gpt-4-turbo",
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 18,
    "total_tokens": 60
  },
  "choices": [...]
}
```
Log this data in production to track actual token usage over time. The reported `prompt_tokens` should match what tiktoken or gpt-tokenizer computes for the same input, typically exactly, with a 1-2 token discrepancy only in edge cases.
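A minimal logging sketch for that `usage` object (the logger name and helper are mine; the response shape follows the example above):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("token-usage")

def log_usage(response: dict) -> int:
    """Record the token usage reported by the API; return the total."""
    usage = response["usage"]
    log.info("model=%s prompt=%d completion=%d total=%d",
             response["model"], usage["prompt_tokens"],
             usage["completion_tokens"], usage["total_tokens"])
    return usage["total_tokens"]

# Using the example response shown above:
resp = {"model": "gpt-4-turbo",
        "usage": {"prompt_tokens": 42, "completion_tokens": 18,
                  "total_tokens": 60}}
print(log_usage(resp))  # 60
```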