ui-tars
by
bytedance

Description: Pioneering Automated GUI Interaction with Native Agents

View bytedance/ui-tars on GitHub ↗

Summary Information

Updated 15 minutes ago
Added to GitGenius on February 12th, 2026
Created on January 19th, 2025
Open Issues/Pull Requests: 43 (+0)
Number of forks: 700
Total Stargazers: 9,661 (+0)
Total Subscribers: 98 (+0)
Detailed Description

UI-TARS is a cutting-edge project developed by ByteDance, pioneering the automation of GUI (Graphical User Interface) interaction using native agents. The core purpose of UI-TARS is to enable AI agents to effectively interact with and operate within various digital environments, including desktop applications, mobile interfaces, and even games. This is achieved through a multimodal approach, leveraging a powerful vision-language model that allows the agent to "see" the screen, understand the context, and generate appropriate actions.
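The "see the screen, understand the context, generate an action" cycle can be sketched as a minimal perceive-reason-act loop. This is an illustrative stub only: the function names (`capture_screen`, `query_model`, `execute`) and the response format are assumptions for this sketch, not the project's actual API, and the model call is replaced with a canned reply.

```python
# Minimal sketch of the perceive-reason-act loop described above.
# All names and the response format are illustrative assumptions;
# the model call is stubbed with a canned response.

def capture_screen() -> bytes:
    """Stand-in for a real screenshot; returns placeholder image bytes."""
    return b"<screenshot-bytes>"

def query_model(screenshot: bytes, instruction: str) -> str:
    """Stub for the vision-language model; a real deployment would send
    the screenshot and instruction to an inference endpoint."""
    return ("Thought: the search box is near the top.\n"
            "Action: click(start_box='(320,48)')")

def execute(action: str) -> None:
    """Stand-in for dispatching the parsed action to mouse/keyboard."""
    print(f"executing: {action}")

def run_step(instruction: str) -> str:
    """One agent step: screenshot -> model -> extract action -> act."""
    screenshot = capture_screen()
    response = query_model(screenshot, instruction)
    # The model interleaves reasoning ("Thought") with an "Action" line.
    action = next(line for line in response.splitlines()
                  if line.startswith("Action:"))
    execute(action)
    return action

print(run_step("Search for weather"))
```

In a real deployment, `query_model` would call the served model with the screenshot and instruction, and `execute` would drive the OS input layer.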

The repository primarily focuses on UI-TARS-1.5, an open-source multimodal agent built upon a robust vision-language model. This version represents a significant advancement, integrating advanced reasoning capabilities enabled by reinforcement learning. The model reasons through its thoughts before taking action, which significantly enhances its performance and adaptability, particularly through inference-time scaling. The project also highlights the upcoming UI-TARS-2, a major upgrade with enhanced capabilities across GUI, Game, Code, and Tool Use, designed as an "All In One" agent model.

The main features of UI-TARS revolve around its ability to understand and interact with GUIs. It can perform a wide range of actions, including mouse clicks, drags, keyboard inputs, and scrolling. The project provides different prompt templates to cater to various use cases. The `COMPUTER_USE` template is designed for desktop environments, supporting common desktop operations. The `MOBILE_USE` template is tailored for mobile devices or emulators, including mobile-specific actions like `long_press` and `open_app`. Finally, the `GROUNDING` template is designed for lightweight tasks focused solely on action output, useful for evaluating grounding capability.
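The template-selection idea can be sketched as a lookup from environment type to an allowed action space. The template names mirror those above, but the specific action lists and the `build_prompt` helper are illustrative assumptions, not the repository's exact prompt strings.

```python
# Hypothetical sketch of choosing a prompt template by environment.
# Action lists and build_prompt are assumptions for illustration,
# not the repository's exact templates.

COMMON_ACTIONS = ["click", "drag", "hotkey", "type", "scroll"]

TEMPLATES = {
    "COMPUTER_USE": COMMON_ACTIONS + ["wait", "finished"],
    "MOBILE_USE":   COMMON_ACTIONS + ["long_press", "open_app"],
    "GROUNDING":    ["click"],  # grounding focuses solely on action output
}

def build_prompt(mode: str, instruction: str) -> str:
    """Compose a system prompt constraining the agent to one action space."""
    actions = ", ".join(TEMPLATES[mode])
    return (f"You are a GUI agent. Allowed actions: {actions}.\n"
            f"Instruction: {instruction}")

print(build_prompt("MOBILE_USE", "Open the settings app"))
```

Restricting the action space per environment keeps desktop-only operations out of mobile runs and vice versa, which is the practical point of having separate templates.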

The repository provides a quick start guide to help users deploy and utilize the model. This involves deployment and inference, followed by post-processing steps to convert the model's output into executable actions. The guide includes installation instructions and example code for parsing the model's responses and generating the necessary code for GUI interaction. The project also offers a guide for coordinate processing visualization, crucial for understanding how the model interacts with screen elements.
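The post-processing and coordinate steps can be sketched as parsing an "Action:" line out of the model's text and rescaling its coordinates to the actual screen. Both the response format and the 0-1000 normalized coordinate grid are assumptions made for this sketch, not the model's documented output contract.

```python
import re

# Hedged sketch of post-processing: extract an "Action:" line from the
# model's reply and map its coordinates to screen pixels. The response
# format and the 0-1000 normalized grid are illustrative assumptions.

ACTION_RE = re.compile(r"Action:\s*(\w+)\(start_box='\((\d+),(\d+)\)'\)")

def parse_action(response: str, screen_w: int, screen_h: int):
    """Return (action_name, x_px, y_px), mapping model coords to pixels."""
    match = ACTION_RE.search(response)
    if match is None:
        raise ValueError("no parsable action in model response")
    name = match.group(1)
    x, y = int(match.group(2)), int(match.group(3))
    # Assume the model emits coordinates normalized to a 0-1000 grid.
    return name, round(x / 1000 * screen_w), round(y / 1000 * screen_h)

reply = ("Thought: the OK button is centred.\n"
         "Action: click(start_box='(500,500)')")
print(parse_action(reply, 1920, 1080))  # → ('click', 960, 540)
```

Rescaling like this is also what a coordinate-visualization step would rely on: drawing the rescaled point over the screenshot shows whether the model grounded the target correctly.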

UI-TARS demonstrates impressive performance across various benchmarks, including OSWorld, Windows Agent Arena, WebVoyager, Online-Mind2web, Android World, ScreenSpot-V2, ScreenSpotPro, and several Poki games. The performance metrics showcase UI-TARS-1.5's strong reasoning capabilities and notable improvements over prior models. The repository also includes a comparison of different model scales, highlighting the performance gains achieved with the UI-TARS-1.5 model.

While UI-TARS offers significant advancements, the repository also acknowledges its limitations. These include the potential for misuse, the need for substantial computational resources, the possibility of generating inaccurate outputs (hallucination), and the fact that the released UI-TARS-1.5-7B is not specifically optimized for game-based scenarios.

The project is actively being developed, with ongoing research and development efforts. The team is providing early research access to the top-performing UI-TARS-1.5 model to facilitate collaborative research. The ultimate vision for UI-TARS is to evolve into increasingly sophisticated agentic experiences capable of performing real-world actions, empowering platforms to accomplish more complex tasks. The repository also includes a citation for researchers to reference the project in their work.

