Perplexity AI has introduced a hybrid inference system that automatically routes AI tasks between a user's device and the company's cloud servers [1, 2, 3].
This shift in processing represents a move toward decentralized AI, reducing the reliance on massive server farms for every query. By using the hardware already present in a user's Mac, the company can potentially scale its services while offering users more control over their data.
The feature, referred to as "Local-Cloud Mode," is available globally to users of the Perplexity Personal Computer app on Mac [2, 3]. The system determines which parts of a request can be handled locally by the machine's processor and which require the heavy lifting of the cloud [1, 2].
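Perplexity has not published how Local-Cloud Mode decides where each part of a request runs. As a rough illustration of that kind of decision, the following is a minimal sketch in Python; the `SubTask` fields, the token budget, and the `route` function are all hypothetical and are not taken from the company's software.

```python
from dataclasses import dataclass

# Hypothetical cap on how much work the local machine takes on.
LOCAL_TOKEN_BUDGET = 2_000


@dataclass
class SubTask:
    name: str
    estimated_tokens: int      # rough size of the work (assumed metric)
    needs_web_access: bool     # e.g. live search results
    touches_local_files: bool  # reads data stored on the device


def route(task: SubTask) -> str:
    """Return "local" or "cloud" for one part of a request (illustrative rule only)."""
    # Anything that needs live web data has to leave the device.
    if task.needs_web_access:
        return "cloud"
    # Assumption: work on local files gets a larger on-device budget,
    # so that more of the user's data stays on the machine.
    budget = LOCAL_TOKEN_BUDGET * 2 if task.touches_local_files else LOCAL_TOKEN_BUDGET
    return "local" if task.estimated_tokens <= budget else "cloud"


request = [
    SubTask("summarize open document", 800, False, True),
    SubTask("search recent news", 300, True, False),
    SubTask("draft long report", 6_000, False, False),
]
plan = {task.name: route(task) for task in request}
```

In this sketch the document summary stays on the device, while the news search and the long draft go to the cloud. A real router would likely weigh more signals, such as available memory, battery state, and model capability.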
Perplexity said the hybrid routing improves user privacy by keeping more data on-device [1]. Under this architecture, sensitive information does not always have to travel to a remote server to be processed.
Beyond privacy, the company said the system cuts server-side costs for both the organization and its users [1]. Offloading specific computations to local hardware can reduce the energy and financial overhead associated with cloud-based inference.
The rollout follows the broader release of the Personal Computer app for Mac, which was made available to all users in May 2026 [2]. This integration allows the AI assistant to interact more deeply with the local operating system while maintaining the power of a large language model via the cloud [2, 3].
This development reflects a broader industry trend toward "edge AI," where the goal is to balance the raw power of the cloud with the speed and security of local hardware. By shifting part of the computational burden to the user's device, Perplexity is attempting to address two major hurdles for AI services: the high cost of running inference on servers and growing consumer anxiety over data privacy.





