Cost optimization - Zelix Glossary

What it means

The cost of an AI deployment scales with usage. Every model call consumes tokens, both the input you send and the output you get back, and each token is billed at a fraction of a cent, so the totals matter once volume is real. Cost optimisation is the discipline of getting the same output for less spend.
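To make the arithmetic concrete, here is a minimal Python sketch of per-call and monthly cost. It assumes prices are quoted per 1,000 tokens, with input and output tokens priced separately. The rates are placeholders, not any provider's real price list, and `ModelPrice`, `call_cost` and `monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices in SGD per 1,000 tokens (placeholder values)."""
    input_per_1k: float
    output_per_1k: float


def call_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one model call: input and output tokens are both billed."""
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


def monthly_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 calls_per_day: int, days: int = 30) -> float:
    """Per-call cost multiplied out over a month of traffic (30-day month assumed)."""
    return call_cost(price, input_tokens, output_tokens) * calls_per_day * days


if __name__ == "__main__":
    # Placeholder rates for illustration only.
    price = ModelPrice(input_per_1k=0.004, output_per_1k=0.016)
    print(f"per call:  SGD {call_cost(price, 1500, 300):.4f}")
    print(f"per month: SGD {monthly_cost(price, 1500, 300, calls_per_day=1000):.2f}")
```

Both token counts appear in the formula, so trimming the prompt (input) and capping reply length (output) each lower the bill.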

The common levers: route easy steps to cheaper models (small models often handle classification just fine), shorten prompts (you pay for every input token too), cache repeated questions (the same FAQ does not need to hit the model twice), batch where possible (one call carrying ten records is cheaper than ten calls carrying one each, because the shared instructions are sent and paid for only once), and use embeddings to retrieve only the relevant material rather than feeding everything into the prompt.
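The routing, caching and batching levers can be sketched in a few lines of Python. This is a hypothetical illustration, not a production design. The model names, the set of "easy" steps and the token counts are assumptions. The `call_model` stub stands in for whatever API client a real deployment uses, and it only counts calls.

```python
from functools import lru_cache

# Hypothetical model tiers; real names and prices depend on the provider.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"

# Steps assumed simple enough for the cheaper model.
EASY_STEPS = {"classify_intent", "extract_order_id"}

# Counts how many times the (stubbed) model is actually called.
model_calls = {"count": 0}


def route(step: str) -> str:
    """Routing: send easy steps to the small model, the rest to the large one."""
    return SMALL_MODEL if step in EASY_STEPS else LARGE_MODEL


def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real API call; it only counts calls and echoes the prompt."""
    model_calls["count"] += 1
    return f"[{model}] answer to: {prompt}"


def normalise(question: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share a cache entry."""
    return " ".join(question.lower().split())


@lru_cache(maxsize=1024)
def _cached_faq_answer(normalised_question: str) -> str:
    return call_model(route("answer_faq"), normalised_question)


def answer_faq(question: str) -> str:
    """Caching: a repeated FAQ reaches the model once, then is served from memory."""
    return _cached_faq_answer(normalise(question))


def input_tokens(system_tokens: int, record_tokens: int, n_records: int, batched: bool) -> int:
    """Batching: the shared system prompt is paid once per call, not once per record.

    Counts input tokens only; output tokens still grow with the number of records.
    """
    if batched:
        return system_tokens + record_tokens * n_records
    return (system_tokens + record_tokens) * n_records


if __name__ == "__main__":
    print(route("classify_intent"))                    # small-model
    answer_faq("Where is my order?")
    answer_faq("  where is MY order? ")
    print(model_calls["count"])                        # 1: the second question hit the cache
    print(input_tokens(800, 50, 10, batched=True))     # 1300
    print(input_tokens(800, 50, 10, batched=False))    # 8500
```

`lru_cache` keeps entries for the life of the process. A real FAQ cache would probably also need expiry, so that answers refresh when the underlying policy changes.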

Why it matters

An AI deployment that runs at SGD 600 a month at 200 conversations a day can cost SGD 6,000 a month at 2,000 conversations a day if nothing is optimised, because spend grows in step with volume. Cost discipline is what keeps the unit economics healthy as the AI succeeds and the business grows with it.
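Working backwards from the figures in the paragraph above gives the unit cost, which is the number the optimisation levers actually move. This is a minimal sketch. It assumes a 30-day month and a constant cost per conversation at any volume, with no volume discounts and no change in conversation length. The function names are illustrative.

```python
def cost_per_conversation(monthly_cost: float, conversations_per_day: int, days: int = 30) -> float:
    """Unit cost: monthly spend divided by monthly conversation volume."""
    return monthly_cost / (conversations_per_day * days)


def projected_monthly_cost(unit_cost: float, conversations_per_day: int, days: int = 30) -> float:
    """Assumes cost grows linearly: the same unit cost at any volume."""
    return unit_cost * conversations_per_day * days


if __name__ == "__main__":
    unit = cost_per_conversation(600, 200)   # SGD 0.10 per conversation
    print(f"SGD {unit:.2f} per conversation")
    print(f"SGD {projected_monthly_cost(unit, 2000):,.0f} per month at 2,000 a day")
```

Traffic growth multiplies the bill, so the only lever fully under the team's control is the unit cost.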

It is also how you keep the deployment defensible. If finance asks why the AI bill jumped, you want an answer more specific than 'we got more customers'.

Example

An e-commerce AI agent reduces its monthly bill from SGD 2,100 to SGD 820 over four weeks by routing the intent-classification step to a smaller model, caching the top 80 FAQ replies, and trimming the system prompt by 35 percent. Same quality scores on the eval set, less than half the cost.
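The example's figures can be checked, and its "same quality, lower cost" condition written down as an explicit gate. This is a hypothetical sketch. `accept_change` and its `tolerance` parameter are illustrative names, and the scores it compares would come from whatever eval set the team already runs. No particular scores are assumed here.

```python
def percent_saving(before: float, after: float) -> float:
    """Percentage reduction from the old monthly bill to the new one."""
    return 100 * (before - after) / before


def accept_change(baseline_score: float, new_score: float,
                  baseline_cost: float, new_cost: float,
                  tolerance: float = 0.0) -> bool:
    """Keep an optimisation only if eval quality holds (within tolerance) and cost falls."""
    return new_score >= baseline_score - tolerance and new_cost < baseline_cost


if __name__ == "__main__":
    saving = percent_saving(2100, 820)
    print(f"saving: {saving:.1f}%")   # saving: 61.0%
```

With `tolerance=0.0` the gate matches the example's bar of unchanged quality scores. A looser tolerance would trade a small quality drop for savings, which is a business decision rather than a technical one.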
