## Why count tokens before calling the API?
The OpenAI API returns an error if your input exceeds the model's context window. But even before hitting the limit, running up large token counts silently drives up costs. A batch job that processes 50,000 records with a 500-token prompt per record consumes 25 million input tokens — at GPT-4 Turbo pricing that's $250 just for inputs.
Counting tokens beforehand lets you:
- Guard against context window overflow errors in production
- Estimate costs before launching expensive batch jobs
- Trim prompts that are unexpectedly large
- Allocate token budget between system prompt, user message, and expected output
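The batch-job arithmetic above is worth automating before you launch anything expensive. A minimal sketch (the function name is mine, and the $10-per-million rate is an assumption based on the GPT-4 Turbo pricing cited above; check current pricing before relying on it):

```python
# Back-of-napkin input-cost estimate for a batch job.
# The default price is an illustrative assumption, not a live rate.

def estimate_batch_cost(records: int, tokens_per_record: int,
                        price_per_million: float = 10.0) -> float:
    """Return the estimated input cost in dollars."""
    total_tokens = records * tokens_per_record
    return total_tokens / 1_000_000 * price_per_million

# The 50,000-record example from above:
print(estimate_batch_cost(50_000, 500))  # 250.0
```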
## Context window limits by model (2026)
| Model | Context window | Tokenizer |
|---|---|---|
| GPT-4o | 128,000 | o200k_base |
| GPT-4 Turbo | 128,000 | cl100k_base |
| GPT-4 | 8,192 | cl100k_base |
| GPT-3.5-turbo | 16,385 | cl100k_base |
| Claude 3.5 Sonnet | 200,000 | Anthropic custom |
| Gemini 1.5 Pro | 1,000,000 | Google SentencePiece |
Token counts for Claude and Gemini will differ from GPT-4 counts even for identical text, because each uses a different tokenizer. Always count with the model's own tokenizer when targeting those APIs. Token Compare uses cl100k_base, which matches GPT-4, GPT-4 Turbo, GPT-3.5-turbo, and ChatGPT (but not GPT-4o, which uses the newer o200k_base encoding).
## Method 1: Browser tool (fastest)
Token Compare runs entirely in your browser using the gpt-tokenizer JavaScript library, which implements cl100k_base. Paste your text and the token count updates instantly — no API key, no signup, no server round trip.
This is the best option for:
- Quick checks during prompt development
- Comparing two prompt phrasings side by side
- Visualizing exactly where the tokenizer splits words
## Method 2: tiktoken in Python (most accurate for production)
OpenAI's tiktoken library is the reference implementation for counting tokens in Python. It's exact, fast, and handles all the edge cases around chat message formatting.
```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

# Basic text
print(count_tokens("Hello, world!"))  # 4

# System + user message format (adds overhead tokens)
def count_chat_tokens(messages: list, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # every message adds <|start|>role<|end|> overhead
    tokens_per_name = 1     # if a name field is set, it adds 1 token
    total = 0
    for msg in messages:
        total += tokens_per_message
        for key, value in msg.items():
            total += len(enc.encode(value))
            if key == "name":
                total += tokens_per_name
    total += 3  # every reply is primed with <|start|>assistant<|end|>
    return total

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(count_chat_tokens(messages))  # ~24 tokens
```
Every message in the chat format adds 3 overhead tokens for role markers (4 if a `name` field is set), and priming the reply adds 3 more. A 10-message conversation therefore carries roughly 30-40 tokens of overhead on top of the actual content. Always account for this in cost estimates.
## Method 3: gpt-tokenizer in JavaScript/Node.js
For JavaScript environments, gpt-tokenizer is a pure JavaScript implementation of the same cl100k_base encoding. Token Compare is built on this library.
```javascript
import { encode, decode, encodeChat } from 'gpt-tokenizer'

// Count tokens in a string
const tokens = encode('Hello, world!')
console.log(tokens.length) // 4

// Count tokens in a chat conversation
const chatTokens = encodeChat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
], 'gpt-4')
console.log(chatTokens.length) // ~24 tokens
```
This is the right choice for serverless functions, edge workers, or any JavaScript backend that needs token counting without spawning a Python process.
## Method 4: Rough estimation (for back-of-napkin math)
When you need a quick estimate and don't have a tool handy, these rules of thumb work for English text:
- 1 token ≈ 4 characters (including spaces)
- 1 token ≈ 0.75 words (so 100 tokens ≈ 75 words, and 100 words ≈ 133 tokens)
- 1 page of double-spaced text ≈ 300-400 tokens
- 1 paragraph ≈ 60-100 tokens
These are approximations that work for typical English prose. Code, non-English text, and text with many numbers or URLs can differ significantly from these estimates.
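The rules of thumb above are easy to encode when no tokenizer is handy. A rough sketch (the function name and the averaging of the two ratios are my own choices, not an established formula):

```python
def rough_token_estimate(text: str) -> int:
    """Estimate token count for English prose using two rules of
    thumb: ~4 characters per token, and ~0.75 words per token.
    Returns the average of the two estimates, rounded."""
    by_chars = len(text) / 4           # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ~ 0.75 words
    return round((by_chars + by_words) / 2)

print(rough_token_estimate("Hello world"))  # 3
```

For anything that feeds a real cost estimate or a context-window check, prefer tiktoken or gpt-tokenizer; this is only for mental math.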
## What gets counted as tokens in API calls
When you send a chat completion request, the total token count is not just your message text. The API counts:
- System prompt tokens
- All previous messages in the conversation history
- Message format overhead (role markers, separators)
- Function/tool definitions, if you're using tool calling
- The model's response (counted as output tokens)
If you're maintaining a conversation history across multiple turns, the effective token count grows with each turn until you implement some form of context truncation or summarization.
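One simple truncation strategy is to drop the oldest non-system messages until the history fits a budget. A minimal sketch (the function name, the 3-token per-message overhead constant, and the pluggable `count_tokens` callback are mine; in practice you would pass a tiktoken-based counter):

```python
def truncate_history(messages: list, budget: int, count_tokens) -> list:
    """Drop the oldest non-system messages until the conversation fits
    under `budget` tokens. `count_tokens` is any callable returning the
    token count of a string (e.g. one built on tiktoken)."""
    def total(msgs):
        # 3 tokens of per-message role-marker overhead, plus content
        return sum(3 + count_tokens(m["content"]) for m in msgs)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

Pinning the system prompt while trimming the oldest turns keeps the model's instructions intact; summarizing the dropped turns into a single message is a common refinement.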
## Token counting in the OpenAI API response
After a successful API call, the response includes a usage object with the actual token counts used:
```json
{
  "id": "chatcmpl-...",
  "model": "gpt-4-turbo",
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 18,
    "total_tokens": 60
  },
  "choices": [...]
}
```
Log this data in production to track actual token usage over time. The reported `prompt_tokens` should match what tiktoken or gpt-tokenizer computes for the same input, typically exactly, with a 1-2 token discrepancy only in edge cases.
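A minimal logging sketch for that `usage` object (the logger name and helper are mine; the response shape follows the example above):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("token-usage")

def log_usage(response: dict) -> int:
    """Record the token usage reported by the API; return the total."""
    usage = response["usage"]
    log.info("model=%s prompt=%d completion=%d total=%d",
             response["model"], usage["prompt_tokens"],
             usage["completion_tokens"], usage["total_tokens"])
    return usage["total_tokens"]

# Using the example response shown above:
resp = {"model": "gpt-4-turbo",
        "usage": {"prompt_tokens": 42, "completion_tokens": 18,
                  "total_tokens": 60}}
print(log_usage(resp))  # 60
```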