Wealthsimple Accelerates Machine Learning Model Delivery
How we scaled with NVIDIA’s AI Inference platform
Wealthsimple uses state-of-the-art technology to offer a full suite of simple, sophisticated financial products across managed investing, do-it-yourself trading, cryptocurrency, tax filing, spending, and saving. However, the company faced a common challenge. Without a standardized AI inference platform, engineering teams took several months on average to deploy a new machine learning model into production, which slowed the delivery of ML-enabled investment services to its clients. Wealthsimple's engineers decided to deploy the NVIDIA AI inference platform, and the results followed quickly.
Today, Wealthsimple supports over 30 AI models that have generated over 247 million predictions in the last 12 months (2024/2025). Since implementing the NVIDIA AI inference platform, the platform engineering organization hasn’t encountered a single IT ticket related to AI inferencing, a testament to the platform’s low maintenance and high reliability.
Recently, the company achieved another milestone: model developers deployed their first ML model without any engineering support. With the NVIDIA AI inference platform in place, the engineering organization has transitioned to delivering ML as a service to other teams, so valuable data science resources are no longer diverted from critical projects. Following the change, model delivery time has been cut from months to under 15 minutes, a game-changing advancement for the organization.
Crafting White Glove Experiences with ML-Driven Personalization
ML models play a pivotal role at Wealthsimple, detecting fraud, analyzing suspicious transactions, and optimizing onboarding experiences for new clients. The team also uses recommender engines to enhance customer experiences, ensuring top-notch service for new users. The NVIDIA AI inference platform empowers Wealthsimple to deploy models that predict the division, within a particular financial institution, to which an institutional transfer should be sent. This significantly accelerates transaction processes for clients.
Before adopting the NVIDIA Triton™ Inference Server, part of the NVIDIA AI inference platform, engineers experimented with an alternative framework-native inferencing product and saw 95% uptime. This caused delays of up to several weeks for 5% of clients’ electronic transfers. Triton Inference Server changed the game, delivering 99.999% uptime. The impact is clear: incorrect predictions no longer translate into weeks of delay before clients can access their funds.
Wealthsimple began this journey with model experimentation on CPUs and soon moved to deploying its models on NVIDIA GPUs. This was made possible by Triton Inference Server, which is hardware-agnostic and runs on both GPUs and CPUs. Today, Wealthsimple runs models on NVIDIA A10G GPUs in the AWS cloud, using the NVIDIA GPU-Optimized AWS Machine Image to drive efficiency and innovation.
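To make the CPU-to-GPU move concrete, here is a minimal sketch of how the same Triton model can target either kind of hardware by changing only the instance_group block in its config.pbtxt. The helper name render_instance_group and the chosen counts and GPU ids are hypothetical and for illustration only; this is not Wealthsimple's actual configuration.

```python
from typing import Optional, Sequence


def render_instance_group(kind: str, count: int = 1,
                          gpus: Optional[Sequence[int]] = None) -> str:
    """Render an instance_group block for a Triton config.pbtxt.

    Hypothetical helper: KIND_CPU and KIND_GPU are Triton's instance kinds,
    but the function itself is only an illustration.
    """
    if kind not in ("KIND_CPU", "KIND_GPU"):
        raise ValueError(f"unsupported instance kind: {kind}")
    lines = [
        "instance_group [",
        "  {",
        f"    count: {count}",
        f"    kind: {kind}",
    ]
    # GPU ids only make sense for GPU instances.
    if kind == "KIND_GPU" and gpus:
        ids = ", ".join(str(g) for g in gpus)
        lines.append(f"    gpus: [ {ids} ]")
    lines += ["  }", "]"]
    return "\n".join(lines)


# Experimentation phase: run the model on CPU.
cpu_block = render_instance_group("KIND_CPU")
# Production phase: same model, now placed on GPU 0.
gpu_block = render_instance_group("KIND_GPU", gpus=[0])

if __name__ == "__main__":
    print(cpu_block)
    print(gpu_block)
```

In this sketch the rest of the model configuration (inputs, outputs, backend) would stay the same between the two phases, which is what lets a team start on CPUs and switch to GPUs without repackaging the model.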
...
Repost: NVIDIA case study interview with Mandy Gu, Senior Engineering Manager
Interested in working at Wealthsimple? Check out the open roles on our team today.