OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.

It’s the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta’s Reality Labs and Roblox.

Through their work at the forefront of AI and system design, Bastani and Nourai witnessed firsthand how deep system architecture enables continuous, large-scale AI inference. Today, however, AI inference remains locked behind cloud APIs and hosted systems, a barrier for low-latency, private, and cost-efficient edge applications. OpenInfer changes that. It wants to be agnostic to the types of devices at the edge, Bastani said in an interview with GamesBeat.

By enabling large AI models to run directly on devices, from SoCs to the cloud, OpenInfer removes these barriers and delivers inference without compromising performance.

The implication? Imagine a world where your phone anticipates your needs in real time: translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, greater privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.

Building the OpenInfer Engine: AI Agent Inference Engine

OpenInfer’s founders

Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they built Oculus Link together, showcasing their expertise in low-latency, high-performance system design.

Bastani previously served as Director of Architecture at Meta’s Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was Senior Director of Engineering for Engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.

OpenInfer is building the OpenInfer Engine, which the founders call an “AI agent inference engine” designed for unmatched performance and seamless integration.

On the first goal, performance, the initial release of the OpenInfer Engine delivers 2-3x faster inference than Llama.cpp and Ollama on distilled DeepSeek models. The boost comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning, all without requiring modifications to the models.

On the second goal, seamless integration with effortless deployment, the OpenInfer Engine is designed as a drop-in replacement: users switch endpoints simply by updating a URL, and existing agents and frameworks continue to function without any modifications (see the sketch after the quote below).

“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting inference speeds, Behnam and his team are making real-time AI applications more responsive, accelerating development cycles, and enabling powerful models to run efficiently on edge devices. This opens new possibilities for on-device intelligence and expands what’s possible in AI-driven innovation,” said Ernestine Fu Mak, Managing Partner at Brave Capital and an investor in OpenInfer.
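The company has not published its API, but the “just update a URL” claim matches the pattern local inference servers such as Ollama and llama.cpp already follow: exposing an OpenAI-compatible HTTP endpoint so existing client code keeps working. The sketch below assumes the OpenInfer Engine does the same; the base URL, port, and model name are hypothetical placeholders, not documented values.

```python
# Minimal sketch of the drop-in endpoint swap described above.
# Assumption: the OpenInfer Engine serves an OpenAI-compatible API,
# as Ollama and llama.cpp do. URL, port, and model name are placeholders.
from openai import OpenAI

# Before: client pointed at a hosted cloud endpoint.
# client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

# After: the same client, pointed at a local engine (hypothetical address).
client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="unused",                     # local servers typically ignore the key
)

# The rest of the agent code is unchanged.
response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # a distilled DeepSeek model, per the article
    messages=[{"role": "user", "content": "Summarize today's schedule."}],
)
print(response.choices[0].message.content)
```

If the engine speaks a different wire protocol, only the client construction would change; the point of the drop-in claim is that everything below that line stays untouched.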
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models, outperforming industry leaders on edge devices. By designing its inference stack from the ground up, the company is unlocking higher throughput, lower memory usage, and seamless execution on local hardware.

Future roadmap: Seamless AI inference across all devices

OpenInfer’s launch is well-timed, especially in light of recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations like DeepSeek reduce computational requirements for both training and inference, edge-based applications still struggle with performance and efficiency due to limited processing power. Running large AI models on consumer devices demands new inference methods that deliver low-latency, high-throughput performance without relying on cloud infrastructure, creating significant opportunities for companies optimizing AI for local hardware.

“Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear hardware abstraction layer. This challenge makes deploying large models on compute-constrained platforms incredibly difficult, pushing AI workloads back to the cloud—where they become costly, slow, and dependent on network conditions. OpenInfer revolutionizes inference on the edge,” said Gokul Rajaram, an investor in OpenInfer. Rajaram is an angel investor and currently a board member of Coinbase and Pinterest.

In particular, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI inference performance on devices. Enterprises that need on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.

In mobile gaming, OpenInfer’s technology enables ultra-responsive gameplay with real-time adaptive AI. Running inference on the device reduces latency and enables smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and an immersive experience that evolves with every move.

“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” said Bastani. “We aim to establish OpenInfer as the default inference engine across all devices—powering AI in self-driving cars, laptops, mobile devices, robots, and more.”

The $8 million seed round is OpenInfer’s first financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR’s co-founder and former CEO Brendan Iribe, Google DeepMind’s Chief Scientist Jeff Dean, Microsoft Experiences and Devices’ Chief Product Officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.

“The current AI ecosystem is dominated by a few centralized players who control access to inference through cloud APIs and hosted services. At OpenInfer, we are changing that,” said Bastani. “Our name reflects our mission: we are ‘opening’ access to AI inference—giving everyone the ability to run powerful AI models locally, without being locked into expensive cloud services. We believe in a future where AI is accessible, decentralized, and truly in the hands of its users.”