5 Ways to Improve Your ML System

Behind every polished demo sits a messier truth: most machine-learning systems are held together with duct tape and only barely deliver their results. After a decade in AI – building feature-selection tools at Google, production pipelines at BP, and high-fidelity generative video at Metaphysic – I keep seeing the same low-hanging optimizations. Fix them early and you'll improve your system, prepare it for scale, and reduce operating costs, all before touching the model.
Below are the five quickest wins I reach for when a founder, CTO, or product lead asks, “How do we make this thing both better and cheaper?”
1 Clean data in—smaller data out
Untrimmed datasets increase storage, prolong training, and hide signal in noise. In one Google project we cut the raw feature set by 70 % using a first-principles feature-selection tool; accuracy held steady, while every downstream metric, from GPU hours to notebook load times, dropped proportionally.
Quick check
- Automate schema validation at ingestion.
- Add lightweight “outlier sweeps” before the main ETL.
- Version your datasets the same way you version code; roll back when drift creeps in.
- Strip out fields and records that demonstrably have zero impact on result quality.
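The first two checks above can be sketched in a few lines. This is a minimal, stdlib-only illustration; the field names (`age`, `income`) and the z-score cutoff are hypothetical, and a real pipeline would use a proper validation library.

```python
# Minimal sketch: schema validation at ingestion plus a lightweight
# z-score "outlier sweep" before the main ETL. Names are illustrative.
import statistics

SCHEMA = {"age": int, "income": float}  # hypothetical expected fields

def validate_schema(record: dict) -> bool:
    """Reject records with missing fields or wrong types at ingestion."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in SCHEMA.items()
    )

def outlier_sweep(values: list[float], z_cutoff: float = 3.0) -> list[float]:
    """Drop points more than z_cutoff standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return values
    return [v for v in values if abs(v - mean) / stdev <= z_cutoff]

records = [{"age": 34, "income": 52_000.0}, {"age": "n/a"}]
clean = [r for r in records if validate_schema(r)]
print(len(clean))  # 1: only the well-formed record survives
```

Running both gates before the main ETL keeps the bad rows from ever reaching training storage.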
When the training set is slimmer, every later optimisation compounds. Data cleaning sits at the top of the funnel, so make it your number-one priority.
2 Keep GPUs busy
Nothing makes a cloud bill soar like an idle H100. On a GenAI project we overlapped data pre-fetch, augmentation, and upload, pushing average GPU utilisation from ~48 % to 90 %, a change that shaved 45 % off end-to-end cost.
Quick check: use nvidia-smi (or an interactive wrapper like nvtop) to monitor GPU usage; htop only shows CPU and memory. Ideally utilisation stays near its maximum all the time, instead of blocks of high usage followed by dips.
Also, measure GPU utilisation within each epoch. If an epoch takes 4 seconds and the GPU is busy for only 1, you're losing both time and money.
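The overlap idea itself is simple to sketch with the standard library: a background thread pre-fetches batches into a bounded queue while the main loop consumes them. `load_batch` and `train_step` are stand-ins for real I/O and GPU work, not actual framework calls.

```python
# Toy sketch of overlapping data loading with compute: a producer thread
# prefetches batches while the consumer "trains" on the previous one.
import queue
import threading
import time

def load_batch(i):
    time.sleep(0.01)  # simulate slow I/O / augmentation
    return [i] * 4

def train_step(batch):
    time.sleep(0.01)  # simulate GPU compute
    return sum(batch)

def producer(q, n_batches):
    for i in range(n_batches):
        q.put(load_batch(i))  # prefetch runs while the consumer trains
    q.put(None)               # sentinel: no more data

def run(n_batches=8):
    q = queue.Queue(maxsize=2)  # bounded buffer keeps memory in check
    threading.Thread(target=producer, args=(q, n_batches), daemon=True).start()
    total = 0
    while (batch := q.get()) is not None:
        total += train_step(batch)
    return total

print(run())  # loading and "training" now overlap instead of alternating
```

In real pipelines the same pattern comes for free from framework-level loaders (e.g. multi-worker data loading with prefetching), but the principle is identical: never let the accelerator wait on I/O.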
3 Problem first, model second
LLMs, diffusion models, mixture-of-experts: the field is full of buzzwords. Yet algorithm-selection research keeps confirming the obvious: the best model is the one that matches the problem's constraints, not the one with the biggest headline.
At AI Flow we see founders burn weeks fine-tuning giant models when a gradient-boosted tree plus a well-crafted embedding outperforms them on latency-sensitive tasks. Start with the decision boundary you need (classification? ranking? retrieval-augmented generation?) and work backward to model family, size, and architecture.
Quick check
- Clarify latency, cost, interpretability, and update cadence before reading a single model card.
- Pilot three architectures of different complexity; keep the cheapest one that meets KPIs.
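The "pilot three, keep the cheapest that meets KPIs" rule reduces to a tiny selection function. The candidate numbers below are made up for illustration; in practice they come from your pilot runs.

```python
# Hedged sketch: keep only models that meet the KPIs, then take the
# cheapest. All figures are hypothetical pilot-run results.
candidates = [
    {"name": "gbdt",          "cost": 1,  "latency_ms": 5,   "auc": 0.91},
    {"name": "small_nn",      "cost": 4,  "latency_ms": 20,  "auc": 0.92},
    {"name": "finetuned_llm", "cost": 40, "latency_ms": 300, "auc": 0.93},
]

def pick_model(candidates, min_auc, max_latency_ms):
    """Filter on quality and latency KPIs, then minimise cost."""
    viable = [c for c in candidates
              if c["auc"] >= min_auc and c["latency_ms"] <= max_latency_ms]
    return min(viable, key=lambda c: c["cost"]) if viable else None

best = pick_model(candidates, min_auc=0.90, max_latency_ms=50)
print(best["name"])  # gbdt: the gradient-boosted tree wins on cost
```

Writing the constraints down first, as code, keeps the decision honest when the flashier architecture only wins on the headline metric.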
4 Make observability a first-class citizen
A transparent system is a healthy system. Modern ML observability tools like Weights & Biases track distribution drift, resource spikes, and even token-level LLM traces. Teams using W&B report faster triage and fewer silent failures in production.
In practice we wire in logging from experiment one. By the time a model ships, the dashboard already tells you how it behaves across a range of settings.
Quick check
```python
import wandb

wandb.init(project="credit-scoring")         # start a tracked run
wandb.watch(model, log="all", log_freq=100)  # log gradients and parameters
```
Add alert rules for drift and GPU memory spikes; your ops team will thank you.
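A drift alert does not need to be fancy to be useful. Here is a minimal, stdlib-only sketch of the kind of check you would wire into an alert rule; the thresholds and feature values are illustrative, and production systems typically use richer statistics than a mean-shift test.

```python
# Minimal drift check: alert when a live feature's mean drifts too far
# from the training baseline, measured in baseline standard deviations.
import statistics

def mean_shift_alert(baseline, live, max_sigma=3.0):
    """True when the live mean is more than max_sigma baseline
    standard deviations away from the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return False
    return abs(statistics.mean(live) - mu) / sigma > max_sigma

baseline = [0.50, 0.52, 0.48, 0.51, 0.49]
print(mean_shift_alert(baseline, [0.50, 0.51, 0.49]))  # False: no drift
print(mean_shift_alert(baseline, [0.90, 0.92, 0.91]))  # True: drifted
```

Run a check like this on a schedule per feature, and page only when it fires; that is the difference between catching drift at deploy plus one day and discovering it in a quarterly review.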
5 Invest in the data foundation
Generative AI has resurfaced an old truth: model quality asymptotically approaches data quality.
At a law firm we worked with, we stood up resilient pipelines (millions of daily records, streaming + batch) before a single model hit prod. The result: fewer late-night pages, quicker regulatory audits, and a platform that still scales years later.
Quick check
- Centralise metadata (Snowflake, Pinecone, etc., depending on the specific need).
- Treat data contracts as part of the CI pipeline.
- Budget for continuous quality tests: null ratios, completeness, referential integrity.
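The three quality tests in the last bullet are concrete enough to sketch directly. The tables below are made up; real checks would run as scheduled queries against the warehouse.

```python
# Sketch of continuous quality tests: null ratio, completeness, and
# referential integrity. Table contents are hypothetical.
users  = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
orders = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}]

def null_ratio(rows, field):
    """Fraction of rows where the field is null."""
    return sum(r[field] is None for r in rows) / len(rows)

def completeness(rows, fields):
    """Share of rows where every required field is present and non-null."""
    return sum(all(r.get(f) is not None for f in fields) for r in rows) / len(rows)

def referential_integrity(child, fk, parent, pk):
    """Every foreign key in the child table must exist in the parent."""
    parent_keys = {r[pk] for r in parent}
    return all(r[fk] in parent_keys for r in child)

print(null_ratio(users, "email"))                             # 0.5
print(completeness(users, ["id", "email"]))                   # 0.5
print(referential_integrity(orders, "user_id", users, "id"))  # True
```

Treating these as CI gates on the data, the same way unit tests gate code, is what makes the foundation hold when volumes grow.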
Closing thought
Most organisations don’t need a moon-shot architecture to feel an immediate difference; they need well-processed data, full GPUs, a model that fits, and dashboards that speak early and often.
If you found these principles useful, you can explore more practical deep-tech notes at AI Flow or browse my personal build logs at antonmih.ai. Strong foundations repay themselves. Quietly, compoundingly, long after version 1 ships.