Case study: reducing ML pipeline time by 65% at Google [code available]
tl;dr: The AI Flow CEO helped Google push the boundaries of feature selection in machine learning pipelines, cutting ML pipeline time by 65%. Building on top of Google’s state-of-the-art research, we developed a tool that reduces feature sets by up to 64% while maintaining model performance, significantly cutting training time, inference time, and model size, with the potential to scale across Google’s ecosystem.

The problem
At Google’s scale, even small inefficiencies in machine learning pipelines can lead to massive time and cost implications. A key bottleneck identified was the high dimensionality of input data in ML systems—training times were long, models were resource-intensive, and the iteration cycle was slow. Google was already experimenting with cutting-edge techniques like Sequential Attention for feature selection, but sought to further improve the tradeoff between model performance, feature count, and training time.
They needed an experienced person to explore the problem deeply, improve on the current state of the art (SOTA), and design a robust solution that could be served at scale within Google’s ML workflows.
The approach
- Discovery: we conducted an in-depth review of the existing Sequential Attention implementation, identifying opportunities to optimize it for speed, stability, and scalability. We studied prior experiments and aligned our direction with Google’s goals of faster iteration, high performance, and lower resource consumption.
- Design: we proposed several strategies including ensemble models, intermediate synchronization, and dataset sub-sampling—all aimed at increasing model stability and decreasing runtime without sacrificing accuracy. A research roadmap was designed to test the trade-offs across multiple datasets, tasks, and architectures.
- Implementation: we expanded the open-source Sequential Attention codebase with parallelized ensemble support, dataset-aware subsampling, and SVD-based upper bound estimation. Our architecture allowed different worker models to vote on feature importance in real-time, generating more robust and compact masks.
- Testing: we validated our hypotheses across four datasets (MNIST, Fashion-MNIST, Activity, and Mice Protein). All experiments were run multiple times to ensure statistical significance, and we measured accuracy, training time, and feature count reductions for every run.
- Delivery: we packaged our improvements into a modular, production-ready feature masking tool that could be used internally across Google ML teams. The tool returned optimal feature subsets for a given budget, allowing teams to immediately experiment with reduced-size, high-quality models.
The solution
We delivered a next-gen feature selection platform, designed for scalability, interpretability, and performance—ideal for enterprise ML environments like Google’s.
- Intermediate Sequential Attention: instead of running independent models that vote only at the end (which often leads to redundancy), we introduced an intermediate synchronization mechanism (a minimal sketch follows this list). In this setup:
- A pool of attention-based models is trained in parallel.
- At fixed checkpoints (e.g. every 5 features), they vote collaboratively on the next best features.
- This method preserves the advantages of diversity in ensembling while avoiding the pitfall of redundant feature clusters.
- Sequential Attention models showed lower variance and higher stability, especially when selecting small subsets of features.
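A minimal sketch of this checkpointed voting loop is below. It is illustrative only: the worker scoring function, feature counts, and Borda-style vote aggregation are placeholders rather than the internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 784   # e.g. MNIST pixels
NUM_WORKERS = 4      # attention-based models trained in parallel
BUDGET = 50          # total number of features to select
CHECKPOINT = 5       # features committed per collaborative voting round


def worker_scores(worker_id, selected):
    """Placeholder for one worker's per-feature importance scores.

    In the real setup each worker is an attention-based model that keeps
    training conditioned on the features already selected; here we just
    return random scores and exclude features that were already committed.
    """
    scores = rng.random(NUM_FEATURES)
    scores[selected] = -np.inf   # never re-select a committed feature
    return scores


selected = []
while len(selected) < BUDGET:
    # Every worker proposes scores, conditioned on the shared selection so far.
    all_scores = np.stack([worker_scores(w, selected) for w in range(NUM_WORKERS)])
    # Checkpoint vote: convert scores to per-worker ranks, sum them across
    # workers (Borda-style), and commit the next CHECKPOINT features.
    ranks = all_scores.argsort(axis=1).argsort(axis=1)  # higher rank = more important
    votes = ranks.sum(axis=0)
    votes[selected] = -1                                 # keep committed features out
    next_batch = np.argsort(votes)[::-1][:CHECKPOINT]
    selected.extend(int(f) for f in next_batch)

print(f"Selected {len(selected)} features, first few: {selected[:10]}")
```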
- Ensemble-based Feature Voting with the Random Subspace Method: to increase robustness, we employed ensembles in which each model was trained on a randomly masked subspace of features (see the sketch after this list).
- This increases diversity in learned representations.
- The voting heatmaps generated across ensembles allowed us to identify consistently important features, improving interpretability.
- We also analyzed the problem of redundant features being selected together (e.g., central pixels in MNIST), and added smart subspace masking to counter it.
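A rough sketch of the random-subspace voting, assuming a placeholder importance function in place of the actual trained models; the vote counts play the role of the heatmap described above.

```python
import numpy as np

rng = np.random.default_rng(1)

NUM_FEATURES = 784        # e.g. MNIST pixels
ENSEMBLE_SIZE = 20        # number of ensemble members
SUBSPACE_FRACTION = 0.6   # each member only ever sees 60% of the features
TOP_K = 50                # features each member votes for


def feature_importance(visible):
    """Placeholder for one member's importance over its visible subspace.

    In practice this is an attention-based model trained with the hidden
    features masked out; random importances are used here for illustration.
    """
    scores = np.zeros(NUM_FEATURES)
    scores[visible] = rng.random(visible.size)
    return scores


# Each ensemble member votes for its top-K features inside a random subspace.
votes = np.zeros(NUM_FEATURES)
for _ in range(ENSEMBLE_SIZE):
    visible = rng.choice(NUM_FEATURES,
                         size=int(SUBSPACE_FRACTION * NUM_FEATURES),
                         replace=False)
    scores = feature_importance(visible)
    votes[np.argsort(scores)[::-1][:TOP_K]] += 1

# The vote counts act as a heatmap: features chosen across many different
# subspaces are consistently important (reshape to 28x28 to visualise MNIST).
heatmap = votes.reshape(28, 28)
consistent = np.argsort(votes)[::-1][:TOP_K]
print("Most consistently voted features:", consistent[:10])
```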
- Data Subsampling with Full-Scale Training: we decoupled mask selection from the final model training (a minimal sketch follows this list):
- Masks were computed using as little as 20% of the data.
- Final models were trained using the full dataset but only the selected features.
- This gave us a 3–5x speedup in experimentation cycles while maintaining final model quality.
- This also unlocked new R&D capabilities—like low-cost experiments and hyperparameter tuning over large datasets.
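A minimal sketch of this decoupling, with placeholder `select_mask` and `train_final_model` functions standing in for the actual selection and training steps.

```python
import numpy as np

rng = np.random.default_rng(2)


def select_mask(X, y, budget):
    """Placeholder for the expensive mask-selection step.

    In the real pipeline this runs the ensemble / Sequential Attention flow;
    here we simply rank features by variance for illustration.
    """
    return np.argsort(X.var(axis=0))[::-1][:budget]


def train_final_model(X, y):
    """Placeholder for full-scale training on the reduced feature set."""
    return {"n_samples": X.shape[0], "n_features": X.shape[1]}


# Stand-in for a full dataset.
X = rng.random((10_000, 784))
y = rng.integers(0, 10, size=10_000)

# 1) Mask selection on a 20% subsample: cheap and fast to iterate on.
subsample = rng.choice(X.shape[0], size=X.shape[0] // 5, replace=False)
mask = select_mask(X[subsample], y[subsample], budget=50)

# 2) Final training on the FULL dataset, restricted to the selected features.
model = train_final_model(X[:, mask], y)
print(model)
```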
- SVD-based Feature Budget Estimation: to avoid arbitrary feature count decisions, we implemented an SVD-based heuristic that estimates the effective dimensionality of a dataset (see the sketch after this list):
- This acts as an upper bound for feature subset size.
- For instance, Mice Protein data had 77 features, but SVD showed only ~36 had non-trivial contribution to variance—matching our experimental observations.
- This avoids wasteful computation and helps ML teams tune feature budgets with confidence.
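A small sketch of the idea; the exact internal heuristic may differ, and the 99% explained-variance threshold used here is an assumption.

```python
import numpy as np


def effective_dimensionality(X, variance_threshold=0.99):
    """Estimate how many directions carry non-trivial variance.

    Counts the singular values needed to explain `variance_threshold` of the
    total variance and uses that count as an upper bound on the feature budget.
    """
    Xc = X - X.mean(axis=0)                      # centre the data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, variance_threshold) + 1)


# Example with a synthetic low-rank dataset: 77 columns driven by 36 latent factors.
rng = np.random.default_rng(3)
latent = rng.random((1000, 36))
X = latent @ rng.random((36, 77)) + 0.01 * rng.random((1000, 77))
print("Estimated feature budget upper bound:", effective_dimensionality(X))
```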
- Production-grade Feature Masking Tool: we wrapped the entire flow into a directed-graph pipeline (an illustrative interface sketch follows this list), capable of:
- Accepting any tabular or image dataset.
- Automatically training multiple mask candidates (10%, 20%, …, 50%).
- Returning three outputs: a) Best mask (absolute best accuracy), b) Optimal mask (best quality/feature count tradeoff), and c) Full evaluation report (for downstream teams to use in decision-making).
- Future-ready: Built with plug-and-play support for Sequential Attention, active learning extensions, and integration into existing ML pipelines (e.g. TensorFlow Extended, Vertex AI pipelines).
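To make the three outputs concrete, here is a hypothetical interface sketch; `select_feature_masks`, `MaskReport`, the variance-based selector, and the “within 1% of best accuracy” rule for the optimal mask are all illustrative assumptions, not the tool’s actual API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MaskReport:
    """Illustrative output bundle; names and fields are hypothetical."""
    best_mask: np.ndarray     # a) absolute best accuracy
    optimal_mask: np.ndarray  # b) best quality / feature-count tradeoff
    evaluations: dict         # c) {budget_fraction: accuracy} for the report


def evaluate(X_masked, y):
    """Placeholder for training and scoring a model on the masked features."""
    rng = np.random.default_rng(X_masked.shape[1])
    return 0.90 + 0.05 * rng.random()


def select_feature_masks(X, y, budget_fractions=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Sketch of the tool's outer loop: try several budgets, keep the best."""
    masks, evaluations = {}, {}
    for frac in budget_fractions:
        budget = max(1, int(frac * X.shape[1]))
        masks[frac] = np.argsort(X.var(axis=0))[::-1][:budget]  # placeholder selector
        evaluations[frac] = evaluate(X[:, masks[frac]], y)
    best = max(evaluations, key=evaluations.get)
    # "Optimal" here: smallest budget within 1% of the best accuracy (an assumption).
    optimal = min(f for f, acc in evaluations.items() if acc >= evaluations[best] - 0.01)
    return MaskReport(masks[best], masks[optimal], evaluations)
```

In this sketch, a team would call `select_feature_masks` on its dataset and read the evaluation report to decide whether the best or the optimal mask fits its accuracy and latency constraints.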
This tool is now enabling internal teams at Google to automate feature pruning, reduce ML cost-to-train, and iterate faster on experimentation—all with transparent tradeoffs and reproducible metrics.
The results
- Feature reduction: Reduced features by up to 64% with no loss in accuracy on some datasets.
- Model size cut: up to 90% smaller models, translating to faster training and inference.
- End-to-end time savings: Parallelized models lowered total mask selection time by over 40%.
- High-quality models with 20% of the data: Comparable performance to full-dataset models.
- Open source impact: Codebase released alongside Google Research’s repository to support reproducibility and industry adoption.
If you’ve read this far, let’s set up a call to discuss AI and how it could transform your business. Click here to find out more.
Open source code: we’re happy to share the code with you. Schedule a 15-minute call via this link, and we’ll share the code and the whitepaper.