(2025-09-09 v4.3)
(2025-09-02 v4.2)
android_env.interfaces.env.EnvironmentTaskManagers to save runtime resources. See our Change log for details.
(2024-12-18 v4.1) Some essential bugs are fixed. See Change log for details.
(2024-07-14 v4.0)
- TEXT action
- dm_env to android_env.interfaces to distinguish successful, failed, and truncated episode ends. Updated episode end events so that their triggering can be controlled more conveniently in the task definition file.
- cache_until field for event slots to correctly trigger an AND node whose sub-events are expected to be triggered simultaneously. Now an activated event can be cached temporarily until another triggered event clears it.
- null_listener for target-less event nodes.
- RemoteSimulator.
- gym to gymnasium.
- VhIoWrapper and TapActionWrapper

See our Change Log for details. The documents will be revised soon. A new tutorial on episode event management is planned.
(2024-04-30 v3.6)
(2023-12-18 v3.5)
- ResponseEvent: regex matching, fuzzy matching, and vector encoding matching.
- VhIoWrapper and TapActionWrapper. Added support for SCROLL and TYPE to TapActionWrapper.
- RemoteSimulator. To reduce network transfer delay, enabled action batching to send and execute a group of actions together, and enabled resizing the image before and after transfer to shrink the transferred data.
- ResponseEvent to annotation tool.

For more details, please see our Change Log and Documents.
OR-type virtual event combining multiple types of event sources. Please see our Change Log and Document.
Please see our Change Log and Document.
ResponseEvent). Now enables the agent to generate responses to the human user and parses episode signals from them. This will enable interaction tasks like question-answering, retrieval, etc. Please see our Change Log, Usage Document, and Task Definition Document.
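The regex and fuzzy matching modes mentioned for ResponseEvent can be illustrated with the Python standard library. This is only a minimal sketch of the two ideas, not Mobile-Env's implementation: the function names and the 0.8 threshold are assumptions, and vector encoding matching is left out because it needs an external sentence encoder.

```python
import re
from difflib import SequenceMatcher


def regex_match(pattern: str, response: str) -> bool:
    """Return True if the pattern occurs anywhere in the agent's response."""
    return re.search(pattern, response) is not None


def fuzzy_match(expected: str, response: str, threshold: float = 0.8) -> bool:
    """Return True if the two strings are similar enough, ignoring case.

    The threshold is a hypothetical value chosen for illustration.
    """
    ratio = SequenceMatcher(None, expected.lower(), response.lower()).ratio()
    return ratio >= threshold


print(regex_match(r"\b\d+ minutes\b", "Bake it for 25 minutes."))
print(fuzzy_match("preheat the oven", "Preheat the ovens"))
```

Regex matching suits answers with a fixed shape, such as a number followed by a unit, while fuzzy matching tolerates small wording differences in free-form answers.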
Mobile-Env is an interaction platform for building evaluation benchmarks for GUI interaction and for evaluating and training GUI agents. Our paper is available at arXiv.
Mobile-Env is developed based on AndroidEnv. The agent takes the screenshot and the view hierarchy (disabled by default because of its long latency) as the observation and performs a touch or types a token as the action to interact with Android apps. Several episode signals, such as step instructions, rewards, or the episode end, are delivered during interaction at certain crucial steps. A crucial step may be opening a target page, scrolling to the correct area, etc., and depends on the specific task definition.
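The idea of delivering episode signals only at crucial steps can be sketched with a toy task. Everything here is hypothetical and independent of Mobile-Env's actual event system: `ToyTask`, its page names, and the reward of 1.0 per crucial step are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyTask:
    """Hypothetical task: crucial pages that must be opened in order."""

    crucial_pages: List[str]
    next_index: int = 0

    def step(self, page: str) -> Tuple[float, Optional[str], bool]:
        """Return (reward, next instruction, episode_end) for the opened page."""
        reached = (
            self.next_index < len(self.crucial_pages)
            and page == self.crucial_pages[self.next_index]
        )
        if not reached:
            # Ordinary step: no signal is emitted.
            return 0.0, None, False
        self.next_index += 1
        done = self.next_index == len(self.crucial_pages)
        instruction = (
            None if done else f"Now open {self.crucial_pages[self.next_index]!r}."
        )
        return 1.0, instruction, done


task = ToyTask(["search results", "article page"])
total = 0.0
done = False
for page in ["home", "search results", "settings", "article page"]:
    reward, instruction, done = task.step(page)
    total += reward
    if done:
        break
print(total, done)  # prints "2.0 True"
```

Only two of the four steps produce a reward, which mirrors how signals are tied to crucial steps rather than to every action.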
The proposed WikiHow task set is available at the Hugging Face Platform.
Mobile-Env is a flexible, adaptable, and easily-extendable platform for InfoUI interaction with the following features:
~~Install from PyPI:~~
#pip install mobile-env-rl
~~or clone the repository and build locally.~~ Clone the repository and build locally (the proto files need to be compiled locally):
git clone https://github.com/X-LANCE/Mobile-Env
cd Mobile-Env
pip install .
~~Several Docker images with well-configured Android AVD are also available.~~
A general-purpose Docker image is provided for convenience. The dependencies of Mobile-Env and the Android command-line tools are already installed in the image. It is ready to use after bind-mounting the Mobile-Env code repository and the directory of AVDs (Android Virtual Devices).
Launch the container:
docker run -it --device /dev/kvm --gpus all -v $ANDROID_AVD_HOME:/root/.android -v $MOBILE_ENV_REPO_DIR:/root/mobile-env -v $TASK_SET_DIR:/root/task-set-dir zdy023/mobile-env:latest
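The shell silently expands an unset variable to an empty string, which would turn a mount into something like `-v :/root/.android`. The sketch below assembles the same command in Python and fails early when one of the three variables is missing. The helper name `build_docker_command` is hypothetical; the image name, flags, and mount targets are copied from the command above.

```python
import shlex
from typing import List, Mapping

# Host-side variables from the command above, mapped to their container paths.
MOUNTS = {
    "ANDROID_AVD_HOME": "/root/.android",
    "MOBILE_ENV_REPO_DIR": "/root/mobile-env",
    "TASK_SET_DIR": "/root/task-set-dir",
}


def build_docker_command(env: Mapping[str, str]) -> List[str]:
    """Build the docker run argument list, rejecting unset variables."""
    missing = [name for name in MOUNTS if not env.get(name)]
    if missing:
        raise ValueError("unset variables: " + ", ".join(missing))
    command = ["docker", "run", "-it", "--device", "/dev/kvm", "--gpus", "all"]
    for name, target in MOUNTS.items():
        command += ["-v", f"{env[name]}:{target}"]
    command.append("zdy023/mobile-env:latest")
    return command


# Example paths are placeholders; pass os.environ to use the real variables.
example_env = {
    "ANDROID_AVD_HOME": "/home/me/.android",
    "MOBILE_ENV_REPO_DIR": "/home/me/Mobile-Env",
    "TASK_SET_DIR": "/home/me/task-set",
}
print(shlex.join(build_docker_command(example_env)))
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids a second round of shell quoting.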
Run in the container:
# Run this command to compile the ProtoBuffer items in the Docker environment
cd mobile-env && pip install -e .
Before loading the Mobile-Env environment, you need to set up an Android Emulator device. Then you can load the environment with some existing task definitions and start your experiments. Detailed guidance is provided in Evaluating and Training Agents on Mobile-Env.
Templates for agents and evaluators are now offered in the android_env.templates module. The evaluation of agents can be launched with a simple invocation of the template evaluator functions. A usage example of the offered evaluator functions is provided under the examples/template_module_example directory. Several example agent implementations based on the base class MobileEnvAgent are offered in android_env.templates.agents.
To add a new environment to Mobile-Env, the environment designer needs to prepare the app package and ensure that it launches and runs on some version of the Android Emulator. If the app requires varying online data, the necessary data should be crawled, dumped, and then replayed for consistent evaluation. In that case, the designer should verify that the certificate unpinning plan works for the package. Adding new tasks requires only task definition files. Detailed instructions can be found in Extending a New Environment (App) or a New Task Based on Mobile-Env.
Several demo task definitions are provided under demos. Three of them are migrated from AndroidEnv:
- classic_2048.m.textproto - Classic 2048 game.
- accessibility_forwarder_clock_set_timer.m.textproto - A simple task requiring the agent to reset a running timer.
- systemui_egg_land_default.m.textproto - Flappy Droid. An open-source implementation of the classic game Flappy Bird.

Another one, openmoneybox.add_billings.textproto, is defined on an open-source billing app, OpenMoneyBox. Details can be found in the task definition files.
We also developed an annotation tool for human demonstrations and a suite of template tools that auto-generate task definitions from templates and combine multiple task definitions into a multi-step task. Details can be found in Miscellaneous Auxiliary Tools.
The data are measured under the configuration below:
| Item | Avg Time | Time Std Dev |
|---|---|---|
| TOUCH action | 410.50 µs | 64.71 µs |
| LIFT action | 412.30 µs | 84.18 µs |
| TEXT action | ~~1.30 s~~ 0.58 s | ~~0.28 s~~ 0.03 s |
| screenshot capturing | 19.94 ms | 21.47 ms |
| invocation of Sentence Transformer(all-MiniLM-L12-v2) | 8.51 ms | 0.17 ms |
| VH capturing | 2.53 s | 1.90 s |
| invocation of EasyOCR | 0.44 s | 0.08 s |
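A rough per-step estimate from these averages shows why view hierarchy capturing dominates the cost. The sketch assumes, purely for illustration, that one step consists of a single TOUCH action plus one screenshot, optionally followed by one VH capture; the real pipeline may overlap or skip some of these operations.

```python
# Average latencies copied from the table above, converted to seconds.
AVG_SECONDS = {
    "TOUCH action": 410.50e-6,
    "screenshot capturing": 19.94e-3,
    "VH capturing": 2.53,
}


def step_cost(with_view_hierarchy: bool) -> float:
    """Estimated seconds per step under the simplifying assumption above."""
    cost = AVG_SECONDS["TOUCH action"] + AVG_SECONDS["screenshot capturing"]
    if with_view_hierarchy:
        cost += AVG_SECONDS["VH capturing"]
    return cost


without_vh = step_cost(False)
with_vh = step_cost(True)
print(f"{without_vh * 1e3:.2f} ms vs {with_vh:.2f} s per step")
# prints "20.35 ms vs 2.55 s per step"
```

Under this assumption, enabling the view hierarchy makes a step more than a hundred times slower on average.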
When only the WikiHow 2.9.6 app is running, the Android emulator occupies 6,031 MiB of virtual memory and 3,444 MiB of resident memory.
This library is developed and maintained by SJTU X-Lance. The corresponding paper is available at https://arxiv.org/abs/2305.08144.
If you find Mobile-Env useful in your research, you can cite the project using the following BibTeX:
@article{DanyangZhang2023_MobileEnv,
title = {{Mobile-Env}: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction},
author = {Danyang Zhang and
Zhennan Shen and
Rui Xie and
Situo Zhang and
Tianbao Xie and
Zihan Zhao and
Siyuan Chen and
Lu Chen and
Hongshen Xu and
Ruisheng Cao and
Kai Yu},
journal = {CoRR},
volume = {abs/2305.08144},
year = {2023},
url = {https://arxiv.org/abs/2305.08144},
eprinttype = {arXiv},
eprint = {2305.08144},
}