How do they work?
On-device LLMs use edge inference: the model runs on the device’s own compute rather than on a cloud server. Within the PIN AI ecosystem, models are downloaded, stored, and managed securely through the PIN AI app, ensuring data remains private and under local control. The process includes the following (illustrative sketches follow the list):
- Storage: Compressed models are stored on an SSD/HDD or in a private cloud and accessed through the PIN AI app.
- Local processing: AI computations run directly on the device’s CPU/GPU, keeping sensitive operations local.
- Hybrid model: Local processing is combined with optional cloud resources for complex tasks, under user control.
- Local personalization: Learning and model updates occur on-device during downtime, enabling continuous adaptation while maintaining data privacy.
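
To make the storage and local-processing steps concrete, here is a minimal sketch of loading a compressed (quantized) model from local storage and running inference on the device’s CPU/GPU. It uses the open-source llama-cpp-python library as a stand-in; the model path and parameters are illustrative assumptions, not details of the PIN AI app itself.

```python
# Minimal sketch of on-device inference with a quantized model.
# Assumes llama-cpp-python is installed (pip install llama-cpp-python)
# and a GGUF model file has already been downloaded to local storage.
# The model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/assistant-7b-q4.gguf",  # compressed model on local SSD/HDD
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# Inference runs entirely on-device; the prompt never leaves local hardware.
result = llm("Summarize my unread messages.", max_tokens=256)
print(result["choices"][0]["text"])
```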
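
The hybrid step is essentially a routing decision: run locally by default, and escalate to a cloud endpoint only for tasks beyond the local model’s capacity and only with the user’s consent. A sketch of that logic under those assumptions; the helper names (`run_local`, `run_cloud`) and the complexity threshold are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    # Rough complexity score; in practice this could come from prompt
    # length, tool requirements, or a lightweight local classifier.
    complexity: float

LOCAL_COMPLEXITY_LIMIT = 0.7  # illustrative threshold, not a real setting

def route(task: Task, user_allows_cloud: bool) -> str:
    """Prefer on-device inference; use the cloud only for complex
    tasks and only when the user has explicitly opted in."""
    if task.complexity <= LOCAL_COMPLEXITY_LIMIT or not user_allows_cloud:
        return run_local(task.prompt)   # stays on the device
    return run_cloud(task.prompt)       # explicit, user-approved escalation

def run_local(prompt: str) -> str:
    ...  # e.g., the llama-cpp call from the previous sketch

def run_cloud(prompt: str) -> str:
    ...  # hypothetical call to a remote endpoint
```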
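
Local personalization implies a scheduler that trains only when the device is idle, so updates never compete with the user and data never leaves the device. A hedged sketch of that loop; `is_device_idle` and `apply_local_update` are hypothetical placeholders for platform idle detection and an on-device fine-tuning step (for example, a small adapter update):

```python
import time

def is_device_idle() -> bool:
    """Hypothetical: query the OS for charging and idle state."""
    ...

def apply_local_update(interaction_log: list[str]) -> None:
    """Hypothetical: run a small on-device fine-tuning step using
    recent local interactions. Data and gradients stay on the device."""
    ...

def personalization_loop(interaction_log: list[str], check_every_s: int = 600) -> None:
    # Poll for downtime and train opportunistically.
    while True:
        if is_device_idle() and interaction_log:
            apply_local_update(interaction_log)
            interaction_log.clear()  # raw logs are consumed locally, then discarded
        time.sleep(check_every_s)
```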