Why token reduction matters at scale
For exploratory work or low-volume usage, token count is largely irrelevant. But once you're running thousands or millions of API calls per day, even small improvements compound significantly. A system prompt that uses 800 tokens instead of 1,200 tokens saves 400 tokens per call. At 100,000 calls per day with GPT-4 Turbo pricing, that's 40 million fewer input tokens daily — roughly $400 per day, or $12,000 per month in savings from a single optimization.
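The arithmetic above is easy to sanity-check. A minimal sketch, assuming the illustrative $10 per million input tokens used in the example (check current pricing for your model):

```python
# Dollars saved per day from trimming a system prompt.
# The $10/1M-token rate is illustrative, not a current price quote.
def daily_savings(tokens_saved_per_call: int, calls_per_day: int,
                  price_per_million_tokens: float) -> float:
    tokens_saved = tokens_saved_per_call * calls_per_day
    return tokens_saved / 1_000_000 * price_per_million_tokens

savings = daily_savings(400, 100_000, 10.0)
print(savings)  # 400.0 per day, roughly $12,000 per 30-day month
```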
The techniques below work best on system prompts and long user instructions. Short conversational messages typically can't be compressed much without losing clarity.
For each technique, copy both versions into the two panels of Token Compare. You'll see the exact token savings in real time, color-coded by individual token.
Technique 1: Remove filler words and pleasantries
System prompts often accumulate polite but token-expensive language that adds zero information for the model.
| Before | After | Savings |
|---|---|---|
| Please be helpful, thorough, and accurate in all of your responses. | Be helpful, thorough, and accurate. | ~7 tokens |
| I would like you to act as a professional software engineer. | Act as a professional software engineer. | ~5 tokens |
| Could you please summarize the following text for me? | Summarize: | ~9 tokens |
The model doesn't need "please" to comply. Conversational framing is for human-to-human communication; in a system prompt, direct imperative statements are both more efficient and typically produce better results.
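You can even automate the most mechanical part of this cleanup. A rough sketch, with an illustrative (not exhaustive) phrase list:

```python
import re

# Common pleasantries that add no information for the model.
# This list is a starting point; extend it for your own prompts.
FILLERS = [
    r"\bplease\b",
    r"\bcould you\b",
    r"\bi would like you to\b",
    r"\bfor me\b",
    r"\bkindly\b",
]

def strip_fillers(prompt: str) -> str:
    for pattern in FILLERS:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    prompt = re.sub(r"\s+", " ", prompt)              # collapse leftover gaps
    return re.sub(r"\s+([?.!,:])", r"\1", prompt).strip()  # re-attach punctuation

print(strip_fillers("Could you please summarize the following text for me?"))
# "summarize the following text?"
```

Blind substitution can mangle sentences in edge cases, so review the output rather than piping it straight to production.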
Technique 2: Switch from prose to structured lists
Bullet points and numbered lists carry the same information as prose with fewer connector words ("in addition", "furthermore", "it is also important to note that").
Before (prose, ~42 tokens):
You should always respond in a professional tone.
In addition, you should avoid using jargon unless
the user has demonstrated technical expertise.
Furthermore, keep responses concise and focused
on the user's actual question.
After (list, ~26 tokens):
Rules:
- Professional tone
- Avoid jargon unless user shows technical expertise
- Keep responses concise and on-topic
A 38% token reduction with no loss of instruction quality. The model reads lists reliably and interprets them correctly.
Technique 3: Prefer shorter synonyms
English is full of word pairs where the longer option carries no extra meaning. Swapping even one per sentence adds up quickly across a long prompt.
| Wordy version | Shorter version |
|---|---|
| utilize | use |
| demonstrate | show |
| in order to | to |
| due to the fact that | because |
| at this point in time | now |
| with respect to | about / for |
| it is important to note that | (delete it) |
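These substitutions are mechanical enough to script. A sketch using the table above (phrases are matched longest-first, so "due to the fact that" wins before any shorter fragment could):

```python
import re

# Ordered longest-phrase-first so multi-word replacements apply cleanly.
REPLACEMENTS = {
    "it is important to note that": "",
    "due to the fact that": "because",
    "at this point in time": "now",
    "with respect to": "about",
    "in order to": "to",
    "demonstrate": "show",
    "utilize": "use",
}

def shorten(text: str) -> str:
    for wordy, short in REPLACEMENTS.items():
        text = re.sub(re.escape(wordy), short, text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()

print(shorten("In order to utilize the cache"))  # "to use the cache"
```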
Technique 4: Define abbreviations for repeated concepts
If your system prompt mentions a concept many times, define an abbreviation and use it throughout.
Before:
You are helping users with customer support requests.
When a customer support request involves a billing issue,
escalate the customer support request to the billing team.
For all other customer support requests, attempt to
resolve directly.
After:
You handle customer support requests (CSRs).
Billing CSRs: escalate to billing team.
Other CSRs: resolve directly.
The original is ~45 tokens; the rewrite is ~20 tokens. This technique scales: each substitution saves a couple of tokens, so a long prompt that says "customer support request" fifteen times saves 30 or more tokens from the abbreviation alone.
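A small sketch of the pattern: keep the first occurrence of the phrase (annotated with the abbreviation) and substitute everywhere after it. The helper name is illustrative:

```python
# Define an abbreviation on first use, then substitute it throughout.
def abbreviate(prompt: str, phrase: str, abbr: str) -> str:
    first = prompt.find(phrase)
    if first == -1:
        return prompt
    end = first + len(phrase)
    head = prompt[:end] + f" ({abbr})"      # first use defines the abbreviation
    tail = prompt[end:].replace(phrase, abbr)
    return head + tail

text = ("Handle customer support requests. Escalate billing "
        "customer support requests; resolve other customer support requests.")
print(abbreviate(text, "customer support requests", "CSRs"))
# Handle customer support requests (CSRs). Escalate billing CSRs; resolve other CSRs.
```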
Technique 5: Remove redundant context the model already knows
Many system prompts include instructions that are already the model's default behavior.
- "Always answer in the language the user writes in" — this is already GPT-4's default
- "If you don't know something, say so" — already default behavior
- "Provide accurate and factual information" — already a core objective
- "You can use markdown formatting" — already enabled in API responses
Remove these and you save 10-30 tokens while actually reducing potential instruction conflicts.
Technique 6: Compress example inputs/outputs
Few-shot examples are often the most token-expensive part of a prompt. Compress them by removing explanation and keeping only the essential format.
Before (~35 tokens per example):
Here is an example of the input you will receive and
the output you should produce:
Input: "The user wants to return a product."
Output: { "category": "returns", "priority": "medium" }
After (~18 tokens per example):
Examples:
"The user wants to return a product." → { "category": "returns", "priority": "medium" }
With five examples, this saves ~85 tokens. With ten examples, ~170 tokens.
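If your examples live in code as (input, output) pairs, you can render the compact arrow format directly instead of hand-editing each one. A sketch (the function name and pair structure are assumptions, not a standard API):

```python
import json

# Render few-shot pairs in the compact arrow format shown above.
def format_examples(pairs):
    lines = ["Examples:"]
    for text, label in pairs:
        lines.append(f'"{text}" → {json.dumps(label)}')
    return "\n".join(lines)

pairs = [("The user wants to return a product.",
          {"category": "returns", "priority": "medium"})]
print(format_examples(pairs))
```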
Technique 7: Use JSON shorthand for structured outputs
When asking for JSON output, the field names in your schema specification count as tokens. Use short, clear field names over descriptive ones in the schema definition.
| Verbose schema | Short schema |
|---|---|
| "customerSatisfactionScore" | "score" |
| "productCategoryIdentifier" | "category" |
| "responseGenerationTimestamp" | "ts" |
This only applies to field name instructions in the prompt — the model will use whatever names you specify, short or long.
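If your downstream code prefers descriptive names, you can keep short names in the prompt and expand them after parsing. A sketch; the mapping reuses the example names from the table above:

```python
# Expand short prompt-side field names back to descriptive internal names.
SHORT_TO_LONG = {
    "score": "customerSatisfactionScore",
    "category": "productCategoryIdentifier",
    "ts": "responseGenerationTimestamp",
}

def expand_keys(obj: dict) -> dict:
    return {SHORT_TO_LONG.get(k, k): v for k, v in obj.items()}

print(expand_keys({"score": 4, "category": "returns"}))
# {'customerSatisfactionScore': 4, 'productCategoryIdentifier': 'returns'}
```

This way the token savings live entirely at the API boundary and the rest of your codebase is unaffected.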
Technique 8: Trim whitespace and blank lines
Extra blank lines, trailing spaces, and indentation all count as tokens. In a long system prompt with decorative formatting and section dividers, these can add up to 20-50 tokens.
# Step 1
Do this thing.
# Step 2
Do that thing.
Compresses to:
Step 1: Do this thing.
Step 2: Do that thing.
Reformatting alone can save 10-15% of tokens in prompts that use heavy markdown formatting for organization.
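Whitespace normalization is safe to automate, since it never touches the words themselves. A minimal sketch:

```python
import re

# Normalize prompt whitespace before sending it to the API:
# strip trailing spaces, cap blank-line runs at one, trim the ends.
def tidy(prompt: str) -> str:
    lines = [line.rstrip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

print(tidy("# Step 1  \n\n\n\nDo this thing.\n"))
# "# Step 1\n\nDo this thing."
```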
Technique 9: Move static knowledge to fine-tuning or RAG
If your system prompt includes large amounts of reference material (product documentation, company policies, FAQ text), you're paying for those tokens on every single API call. Two alternatives:
- Fine-tuning: Bake static knowledge into the model weights. The fine-tuned model "knows" your content without it needing to appear in the prompt. Works well for stable, factual information.
- Retrieval-Augmented Generation (RAG): Retrieve only the relevant chunks of information at query time and inject just those chunks into the prompt. Works well for large knowledge bases where only a small portion is relevant to any given query.
These approaches require more engineering effort but can reduce prompt size by 50-80% for knowledge-heavy applications.
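To make the RAG idea concrete, here is a toy retrieval sketch that scores chunks by word overlap with the query and injects only the top matches. Real systems use embedding similarity; the chunks and function name here are purely illustrative:

```python
# Toy retrieval: rank reference chunks by shared words with the query.
# Production RAG uses embeddings, but the prompt-size win is the same.
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

chunks = [
    "Refund policy: refunds are issued within 14 days.",
    "Shipping policy: orders ship within 2 business days.",
    "Warranty policy: hardware is covered for one year.",
]
print(retrieve("How long do refunds take?", chunks, k=1))
```

Instead of carrying all three policies in every prompt, only the refund chunk rides along for a refund question.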
Measuring your savings
After applying any of these techniques, use Token Compare to measure the before/after token counts side by side. The comparison bar shows you the percentage difference at a glance. For system prompts that run at high volume, even a 15% reduction can have significant cost and latency implications over time.
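For a quick script-side check, word count works as a crude proxy for the relative reduction; an exact count requires the model's own tokenizer (e.g. the tiktoken package for OpenAI models):

```python
# Approximate percentage reduction between two prompt versions.
# Word count is a rough proxy; use the model's tokenizer for exact counts.
def compare(before: str, after: str) -> float:
    b, a = len(before.split()), len(after.split())
    return round((b - a) / b * 100, 1)

before = "Please be helpful, thorough, and accurate in all of your responses."
after = "Be helpful, thorough, and accurate."
print(compare(before, after))  # 54.5
```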
What not to cut
Token reduction has diminishing returns — and crossing a line makes outputs worse. Avoid cutting:
- Constraints that prevent the model from doing something harmful or off-topic
- Output format specifications (if you remove these, outputs become unpredictable)
- Examples that demonstrate a non-obvious behavior you actually need
- Context that is genuinely new information the model couldn't know otherwise
Always test prompt changes before deploying. A prompt that is 30% shorter but produces wrong outputs 10% of the time is not actually cheaper when you factor in the cost of retries and user dissatisfaction.