Perplexity splits AI inference between PCs and cloud to cut costs

June 2, 2026

TL;DR

Perplexity AI announced a platform at Computex that dynamically routes AI inference between PCs and cloud servers in real time, acting as an “air-traffic controller” for AI tasks. The chip-agnostic system targets the cost crisis of centralised inference as Perplexity’s revenue hits $500 million.

Perplexity AI has developed a platform that dynamically splits AI workloads between personal computers and cloud servers, deciding in real time which tasks can run locally on a PC’s processor and which need the power of data centre hardware. CEO Aravind Srinivas announced the system at Computex in Taipei on Tuesday, describing it as an “air-traffic controller for AI tasks” designed to reduce the cost of inference, the process of running trained AI models to generate responses.

“You don’t want all your compute centralised in servers and everything running through the largest models,” Srinivas said in a Bloomberg Television interview. “You’re already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user.”

How it works

The system evaluates each AI task and routes it to the most efficient compute layer. Simple operations that modern PC processors can handle, such as summarisation, formatting, or lightweight classification, run locally without touching the cloud. More complex tasks that require large model inference, such as multi-step reasoning or retrieval-augmented generation across large datasets, get routed to cloud servers. The routing decision happens in real time, invisible to the user.
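
The routing rule described above can be sketched as follows. This is a minimal illustration, not Perplexity's implementation: the task labels, the `Task` and `Target` types, and the `route` function are all hypothetical names, and sending unrecognised work to the cloud by default is an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"   # run on the user's PC
    CLOUD = "cloud"   # run on data centre hardware


# Lightweight categories named in the article; the string labels are hypothetical.
LOCAL_KINDS = {"summarisation", "formatting", "lightweight_classification"}


@dataclass(frozen=True)
class Task:
    kind: str
    prompt: str


def route(task: Task) -> Target:
    """Choose a compute layer for a single task."""
    if task.kind in LOCAL_KINDS:
        return Target.LOCAL
    # Heavy work (multi-step reasoning, retrieval-augmented generation) and
    # anything unrecognised goes to the cloud. That default is an assumption.
    return Target.CLOUD


print(route(Task("formatting", "Tidy this table")))            # Target.LOCAL
print(route(Task("multi_step_reasoning", "Plan a migration")))  # Target.CLOUD
```

A real router would presumably score difficulty from the request itself rather than rely on a fixed label, but the fixed table keeps the two-tier idea visible.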

The practical effect is that Perplexity can serve more users at lower cost by offloading a portion of inference work to the billions of PCs already in circulation. As AI inference demand strains data centre capacity and drives utilities to plan $1.4 trillion in grid upgrades, distributing compute to the edge is both an economic and an infrastructure necessity.

Srinivas made the announcement alongside Intel CEO Lip-Bu Tan, whose company leads the market for PC processors and has a commercial interest in making PCs a meaningful AI compute layer. However, Srinivas said the platform is “chip agnostic” and works with Nvidia processors as well. Nvidia highlighted the same edge-inference trend at Computex with its new RTX Spark platform for AI-powered laptops and desktops.

The cost problem

Srinivas’s reference to companies “spending half a billion dollars per month” on AI compute is not hyperbole. OpenAI’s infrastructure costs have been widely reported at that scale, and Anthropic’s projected $10.9 billion in Q2 revenue comes with substantial compute expenses that compress margins. The energy and cost burden of centralised AI inference is one of the defining constraints of the current AI boom.

Perplexity’s approach inverts the assumption that AI inference must happen in the cloud. By treating the PC as a first-class compute node rather than a thin client, the company can reduce its own server costs while potentially delivering faster responses for tasks that run locally. The tradeoff is complexity: the routing system must accurately assess task difficulty in milliseconds, and the quality of local inference depends on the user’s hardware capabilities.

Revenue efficiency

Perplexity’s financial trajectory underscores why cost efficiency matters. Srinivas posted on X in April that the company’s revenue grew fivefold, from $100 million to $500 million, while headcount increased just 34%. That ratio, roughly 15x when the fivefold revenue multiple is divided by the 34% headcount increase (or about 3.7x measured as revenue per employee), reflects both the leverage of AI-native business models and Perplexity’s position as an aggregator that routes queries across multiple AI providers rather than training its own frontier models.
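
For reference, the arithmetic behind those figures can be checked directly. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
revenue_multiple = 500 / 100        # $100 million -> $500 million
headcount_increase = 0.34           # headcount up 34%
headcount_multiple = 1 + headcount_increase

# Revenue multiple divided by the fractional headcount increase:
# this is the calculation that yields the "roughly 15x" figure.
quoted_ratio = revenue_multiple / headcount_increase

# Revenue per employee, after versus before.
revenue_per_employee_multiple = revenue_multiple / headcount_multiple

print(round(quoted_ratio, 1))                   # 14.7
print(round(revenue_per_employee_multiple, 2))  # 3.73
```

The two results answer different questions: the first compares a growth multiple with a percentage increase, while the second measures how much more revenue each employee now supports.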

“Every time any of the AI gets better, our unified system also gets better because we route across all of them,” Srinivas said. The AI-native growth rates that are drawing capital away from traditional SaaS companies are partly enabled by this kind of architectural efficiency, where the product improves as its underlying providers improve, without proportional cost increases.

The hybrid compute platform extends that logic to hardware. If Perplexity can use the compute already sitting on users’ desks to handle a meaningful share of inference work, it reduces marginal cost per query and improves response latency for lightweight tasks. As AI moves deeper into enterprise workflows, the question of who pays for the compute (the cloud provider, the AI company, or the user, through their own hardware) will become a critical competitive variable.
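
As a rough illustration of that marginal-cost argument, the provider's blended cost per query can be modelled as a weighted average of the cloud and local paths. This is a simplified sketch under stated assumptions: the function name and parameters are hypothetical, the provider's cost for a locally served query defaults to zero (the user's hardware and electricity bear it), and the numbers in the usage line are placeholders, not reported figures.

```python
def blended_cost_per_query(local_share: float,
                           cloud_cost: float,
                           provider_local_cost: float = 0.0) -> float:
    """Average provider cost per query when a share of queries runs locally.

    local_share is the fraction of queries served on the user's PC (0 to 1).
    cloud_cost and provider_local_cost are in the same arbitrary cost unit.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return (1.0 - local_share) * cloud_cost + local_share * provider_local_cost


# Placeholder inputs: 25% of queries served locally, cloud cost of 4.0 units.
print(blended_cost_per_query(0.25, 4.0))  # 3.0
```

Under these assumptions the saving scales linearly with the share of queries that can be kept on the PC, which is why the accuracy of the routing decision matters so much.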


