The gist

Fara is an open-source, 7-billion-parameter small language model from Microsoft, designed as a Computer Use Agent (CUA). It completes multi-step tasks on websites by visually perceiving the screen and directly controlling the mouse and keyboard. Its compact size makes on-device deployment possible, which aims to reduce latency and improve user privacy by keeping data local. Fara is trained on a synthetic dataset of 145K web trajectories.

What it does

Automates web-based tasks by visually perceiving interfaces and controlling the mouse and keyboard.
Searches for information online and summarizes the results.
Fills out forms, manages online accounts, and books reservations or tickets.
Shops on retail websites and compares prices across different platforms.
Finds job postings or real estate listings based on user criteria.

How it works

A user provides a task through a command-line interface. Fara-7B, a vision-language model, then interprets the visual content of the screen and generates a sequence of mouse clicks and keyboard inputs, which are executed in the browser. The model can be self-hosted locally with vLLM or accessed through cloud services such as Azure Foundry. Running it requires a Python environment and Playwright for browser automation.
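The perceive-then-act cycle described above can be sketched as a short loop. This is a minimal illustration, not Fara's actual code: `Action`, `FakeBrowser`, `scripted_policy`, and `run_agent` are hypothetical names, the action format is an assumption, and the model is replaced by a fixed script so the example runs without a GPU or a browser.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """Hypothetical action format; the real model's output schema may differ."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a browser page: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Tuple] = []

    def screenshot(self) -> bytes:
        # A real browser would return image bytes of the current page.
        return b"fake-png-bytes"

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type(self, text: str) -> None:
        self.log.append(("type", text))


def scripted_policy(task: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for the model (assumption): replays a fixed three-step plan."""
    plan = [Action("click", x=120, y=48), Action("type", text=task), Action("done")]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


def run_agent(
    task: str,
    browser: FakeBrowser,
    policy: Callable[[str, bytes, List[Action]], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the policy for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()            # perceive
        action = policy(task, shot, history)   # decide
        history.append(action)
        if action.kind == "done":              # policy signals completion
            break
        if action.kind == "click":             # act
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent("find a laptop", browser, scripted_policy)
    print([a.kind for a in steps])
    print(browser.log)
```

In a real setup, the fake browser would presumably be replaced by a Playwright page (for example `page.screenshot()`, `page.mouse.click(x, y)`, and `page.keyboard.type(text)`), and the scripted policy by a call to the hosted model. The `max_steps` cap is a design choice in this sketch that keeps a confused policy from looping indefinitely.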

Best for

Fara is best for developers and researchers who need a lightweight, locally deployable agent to automate complex, multi-step tasks on websites, such as data entry, comparison shopping, or testing web application workflows.

Watch out for

Fara is an experimental release. The creators recommend running it in a sandboxed environment and advise against using it with sensitive data or in high-risk domains.