
On November 21, 2025, Huawei made a splash in the AI world by announcing a major new technology: Flex:ai, an open-source AI container platform unveiled at Huawei’s “2025 AI Container Application Forum” in Shanghai. With this launch, Huawei promises to change the way AI workloads use hardware: by making chips work smarter rather than simply ramping up their power.
Fundamentally, Flex:ai enables a single physical AI compute card, either a GPU or an NPU, to be sliced into multiple virtual compute units. That means instead of one big task hogging the entire card, many smaller AI tasks can run on it at the same time. Huawei says a card can be split at a granularity as fine as 10% of its capacity.
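To picture what 10% granularity means in practice, here is a minimal sketch; the `ComputeCard` class and `carve` method are invented for illustration and are not Flex:ai’s actual API:

```python
import math

# Illustrative sketch only: Flex:ai's real interfaces are not shown in this
# article, so the names below (ComputeCard, carve) are hypothetical.

STEPS_PER_CARD = 10  # 10% granularity -> a card divides into ten virtual units


class ComputeCard:
    """Models one physical GPU/NPU that can be carved into virtual slices."""

    def __init__(self, name: str):
        self.name = name
        self.free_steps = STEPS_PER_CARD  # whole card initially unallocated

    def carve(self, fraction: float) -> str:
        """Reserve a virtual slice, rounded up to the nearest 10% step."""
        steps = math.ceil(fraction * STEPS_PER_CARD - 1e-9)
        if steps > self.free_steps:
            raise RuntimeError(f"{self.name}: not enough free capacity")
        self.free_steps -= steps
        return f"{self.name}: {steps * 10}% slice"


card = ComputeCard("npu-0")
print(card.carve(0.25))  # a 25% request rounds up to a 30% slice
print(card.carve(0.40))  # 40% slice; 30% of the card remains free
```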
Why does this matter? Because in many AI clusters today, hardware sits idle much of the time. According to Huawei, the industry-average utilization of compute cards is only about 30–40%. With Flex:ai, that number could rise meaningfully: by using resources more intelligently, AI centers can run more work without buying much more hardware.
How Flex:ai Works — Smarter Scheduling, Better Matching
> Huawei has rolled out new OPEN SOURCE software called Flex:ai, which is built on Kubernetes and designed to boost AI chip usage by about 30%. It can split a single GPU or NPU into multiple virtual units and run several AI workloads at the same time, and it includes a scheduler… pic.twitter.com/0eBmINu6Z0
> — Wall St Engine (@wallstengine), November 21, 2025
Flex:ai does more than just slice compute. It uses a multilevel intelligent scheduler to decide which AI tasks go where. When the system spots unused compute power, it gathers that capacity and assigns it to tasks that need it, so AI workloads are matched precisely to the available compute.
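Huawei has not published the scheduler’s internals, but the matching problem it solves resembles classic bin packing. A minimal best-fit sketch, with made-up task and card names, might look like this:

```python
# Hypothetical sketch: Huawei has not published the scheduler's algorithm,
# so this best-fit heuristic only illustrates the matching problem it solves.

def schedule(tasks: dict[str, float], free: dict[str, float]) -> dict[str, str]:
    """Map each task to a card; demands and capacities are fractions of a card."""
    placement = {}
    # Place the largest demands first so they are not squeezed out.
    for task, demand in sorted(tasks.items(), key=lambda kv: -kv[1]):
        # Best fit: the card whose free capacity fits the demand most tightly.
        candidates = [(cap, card) for card, cap in free.items() if cap >= demand]
        if not candidates:
            placement[task] = "<queued>"  # no card can host it right now
            continue
        cap, card = min(candidates)
        free[card] = cap - demand
        placement[task] = card
    return placement


print(schedule(
    tasks={"train-a": 0.7, "infer-b": 0.2, "infer-c": 0.3},
    free={"gpu-0": 0.5, "npu-1": 1.0},
))
# -> {'train-a': 'npu-1', 'infer-c': 'npu-1', 'infer-b': 'gpu-0'}
```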
Another clever part is compute aggregation. Flex:ai can pull together unused capacity from different servers, or “nodes”, in a cluster. These idle units form a shared compute pool, which means no capacity goes completely to waste. According to reports, this pooling helps unlock more of the hardware’s potential.
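As a rough sketch of the pooling idea (the cluster layout, card names, and idle fractions below are invented), summing the idle fraction of every card gives the cluster-wide pool the scheduler can draw from:

```python
# Hypothetical sketch of the "shared compute pool" idea; node and card names
# and the idle fractions below are invented for illustration.

cluster = {
    "node-1": {"gpu-0": 0.4, "gpu-1": 0.1},   # idle fraction of each card
    "node-2": {"npu-0": 0.9},
    "node-3": {"npu-0": 0.0, "npu-1": 0.6},
}


def pooled_capacity(cluster: dict[str, dict[str, float]]) -> float:
    """Total idle compute across every node, in whole-card equivalents."""
    return sum(idle for node in cluster.values() for idle in node.values())


print(f"Shared pool: {pooled_capacity(cluster):.1f} cards' worth of idle compute")
# -> Shared pool: 2.0 cards' worth of idle compute
```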
Also important: Flex:ai is hardware-agnostic, meaning it supports both Huawei’s Ascend NPUs and third-party GPUs, such as Nvidia’s. Whether a company uses Huawei chips, Nvidia cards, or a mix of both, Flex:ai can manage them all.
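One common way to achieve that kind of vendor neutrality is an adapter layer that exposes the same allocation interface for every card. The sketch below only illustrates that pattern; it does not reflect Flex:ai’s actual abstraction, and all class names are hypothetical:

```python
# Hypothetical sketch of hardware-agnostic management: a thin adapter layer
# exposes one slice interface regardless of vendor. Not Flex:ai's real design.

from abc import ABC, abstractmethod


class Accelerator(ABC):
    """Vendor-neutral view of one compute card."""

    @abstractmethod
    def vendor(self) -> str: ...

    @abstractmethod
    def allocate_slice(self, fraction: float) -> str: ...


class AscendNPU(Accelerator):
    def vendor(self) -> str:
        return "Huawei Ascend"

    def allocate_slice(self, fraction: float) -> str:
        return f"ascend slice @ {fraction:.0%}"  # would call the NPU runtime here


class NvidiaGPU(Accelerator):
    def vendor(self) -> str:
        return "Nvidia"

    def allocate_slice(self, fraction: float) -> str:
        return f"cuda slice @ {fraction:.0%}"    # would call the GPU runtime here


# A mixed fleet is managed through one interface:
for card in (AscendNPU(), NvidiaGPU()):
    print(card.vendor(), "->", card.allocate_slice(0.3))
```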
Proven in Real-World Use
Huawei has already tested Flex:ai in real-world scenarios. In collaboration with Ruijin Hospital, Huawei used Flex:ai to train a large pathology model that analyzes medical images to identify cancers. Thanks to Flex:ai, the team could train a massive model with just 16 Ascend 910B cards; without slicing and smart scheduling, they would have needed far more hardware.
In that deployment, Flex:ai helped push the usable compute rate from about 40% to 70%. That boost in efficiency counts for a lot: it means lower costs, less wasted energy, and more work done on the same hardware.
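To put numbers on that jump, here is our own back-of-the-envelope arithmetic based on the figures Huawei reported; it is not an official calculation:

```python
# Our own back-of-the-envelope arithmetic, using the figures Huawei reported
# for the Ruijin Hospital deployment; this is not an official calculation.

cards = 16                    # Ascend 910B cards in the deployment
before, after = 0.40, 0.70    # reported utilization before and after Flex:ai

print(f"Useful compute: {cards * before:.1f} -> {cards * after:.1f} card-equivalents")
print(f"That is a {after / before - 1:.0%} gain with no new hardware")
# Useful compute: 6.4 -> 11.2 card-equivalents
# That is a 75% gain with no new hardware
```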
Why This Is a Big Deal for AI Infrastructure
- Improved Efficiency: Flex:ai targets a long-standing problem in AI: underused hardware. By slicing cards and scheduling intelligently, Huawei is turning today’s “waste” into usable compute.
- Cost Savings: Companies won’t need to buy as many new GPUs or NPUs; they can get more mileage out of what they already have.
- Cross-Vendor Support: Because Flex:ai supports different hardware ecosystems, it avoids vendor lock-in. Users can mix and match Nvidia GPUs and Huawei NPUs without losing efficiency.
- Open Source: Huawei is releasing Flex:ai through its open “Mojing” community, which means developers, researchers, and businesses can access the software, improve it, or build on it.
- Scalable Architecture: With pooling and shared compute, data centers can scale smarter, not just by adding more hardware, but by using what they already have more intelligently.
Challenges to Watch
While the idea is powerful, it’s not without challenges:
- Scheduling Complexity: Splitting compute into many virtual units means the scheduler has to be very smart. If not, there could be delays or inefficiencies.
- Performance Trade-Offs: Very large AI models might still run best on full cards, and some tasks may suffer from the overhead that virtualization adds.
- Security & Isolation: Running multiple workloads on one card could raise isolation risks. Ensuring tasks don’t interfere with each other will be critical.
- Adoption: Even though Flex:ai is open source, real-world use depends on whether cloud companies and data centers adopt it.
Looking Ahead: What’s Next for Flex:ai
Because it’s open-source, we’re likely to see fast adoption. Developers might build new tools around Flex:ai to make resource pooling even smarter. AI startups could benefit immediately: they might not need as much hardware to train or run models.
Large data centers and AI labs may rethink their infrastructure. Instead of building around the biggest, most powerful cards, they could rely on flexible pools of compute. Over time, this could shift how AI infrastructure is designed: from “buy more chips” to “use what we have, better.”
Also, Huawei’s open ModelEngine ecosystem — which includes Flex:ai — could grow. As more tools are added, the platform could become a full stack for AI training, inference, and deployment.
Conclusion
Huawei’s Flex:ai is more than just a clever piece of software. It’s a meaningful step forward in AI infrastructure, especially in a world where compute is expensive, power usage matters, and efficiency is king. By slicing physical hardware into smaller, flexible units, Flex:ai helps unlock unused potential. Its open-source nature and support for mixed hardware make it especially useful for a wide range of users, from big AI labs to startups. If adopted widely and used wisely, this technology could help shape the future of AI compute.

