AndroidWorld is an environment for building and benchmarking autonomous computer control agents.
It runs on a live Android emulator and contains a highly reproducible benchmark of 116 hand-crafted tasks across 20 apps, which are dynamically instantiated with randomly generated parameters to create millions of unique task variations.
In addition to the built-in tasks, AndroidWorld also supports the popular web benchmark MiniWoB++ (Liu et al.).
Key features of AndroidWorld include:
See demo videos on our website.
Set up the Android Emulator
Launch the Android Emulator from the command line
Launch the emulator from the command line, not from the Android Studio UI, with the -grpc 8554 flag, which is needed for communication with the accessibility forwarding app.
# Typically it's located in ~/Android/Sdk/emulator/emulator or
# ~/Library/Android/sdk/emulator/emulator
EMULATOR_NAME=AndroidWorldAvd # From previous step
~/Library/Android/sdk/emulator/emulator -avd $EMULATOR_NAME -no-snapshot -grpc 8554
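If you prefer to start the emulator from a Python script, the sketch below assembles the same command. find_emulator and build_emulator_command are hypothetical helpers, not part of AndroidWorld; the two search paths are the typical locations from the comments above, and the AVD name and gRPC port default to the values used in the command.

```python
from pathlib import Path

# Typical SDK locations from the comments above (Linux first, then macOS).
# Adjust these if your Android SDK is installed elsewhere.
CANDIDATE_PATHS = (
    "~/Android/Sdk/emulator/emulator",
    "~/Library/Android/sdk/emulator/emulator",
)


def find_emulator(candidates=CANDIDATE_PATHS):
    """Return the first candidate emulator path that exists, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.exists():
            return str(path)
    return None


def build_emulator_command(emulator_path, avd_name="AndroidWorldAvd", grpc_port=8554):
    """Build the argument list for launching the AVD.

    -no-snapshot boots without loading or saving a snapshot, and -grpc opens
    the port used to communicate with the accessibility forwarding app.
    """
    return [
        emulator_path,
        "-avd", avd_name,
        "-no-snapshot",
        "-grpc", str(grpc_port),
    ]
```

To launch, check that find_emulator() did not return None, then pass the resulting list to subprocess.Popen so the emulator keeps running in the background.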
[Optional] It's recommended to use conda, which you can download here.
conda create -n android_world python=3.11.8
conda activate android_world
Install AndroidWorld. Note: Python 3.11 or above is required.
git clone https://github.com/google-research/android_world.git
cd ./android_world
pip install -r requirements.txt
python setup.py install
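Because Python 3.11 or above is required, a setup script can fail fast on an older interpreter. This is a minimal sketch; python_is_supported is a hypothetical helper, not part of AndroidWorld.

```python
import sys

# Minimum version stated in the installation note above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=None):
    """Return True if version_info (default: the running interpreter) is >= 3.11."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Call python_is_supported() inside the activated conda environment, for example at the top of your own setup script, and stop early if it returns False.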
Add model provider API keys as environment variables.
# Add to .bashrc.
export OPENAI_API_KEY=your-key
export GCP_API_KEY=your-key
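Before a long benchmark run, it can help to confirm that the keys are actually visible to Python. In this sketch, missing_api_keys is hypothetical, and treating both variables as required is an assumption; which keys you need depends on the agent you run.

```python
import os

# Variable names from the export lines above. Requiring both is an
# assumption; an agent that only calls one provider needs only its key.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GCP_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]
```

An empty list means every required key is set; otherwise the returned names tell you which export lines to add or re-source.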
Install ffmpeg, if not already installed.
# Linux (Ubuntu/Debian)
# sudo apt update && sudo apt install ffmpeg
# macOS
brew install ffmpeg
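A script can check for ffmpeg in the same spirit. ffmpeg_available uses the standard library's shutil.which; install_hint is a hypothetical helper that returns the matching command from the snippet above (the apt command applies to Ubuntu/Debian only).

```python
import shutil
import sys

# Install commands from the snippet above, keyed by sys.platform prefix.
# The apt command assumes an Ubuntu/Debian system.
INSTALL_HINTS = {
    "linux": "sudo apt update && sudo apt install ffmpeg",
    "darwin": "brew install ffmpeg",
}


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def install_hint(platform=None):
    """Return the install command for platform (default: sys.platform), or None."""
    platform = platform or sys.platform
    for prefix, command in INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return command
    return None
```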
Run the minimal_task_runner.py script to see the basic mechanics of AndroidWorld components. It initializes the environment, sets up a task, and runs the default agent, M3A, on it.
python minimal_task_runner.py --task=ContactsAddContact
If you don't specify a task, a random task will be selected. NOTE: If you want to try open-source apps, i.e., apps not included with Android OS, please run the script below with --perform_emulator_setup.
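The mechanics described above (select a task, falling back to a random one, then let the agent act until it reports completion) can be sketched in plain Python. Everything here is hypothetical: registry stands in for AndroidWorld's task registry as a simple name-to-goal dict, agent_step stands in for the agent's step method, StepResult mimics a result with a done flag, and max_steps plays the role of the per-task step budget.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """Hypothetical stand-in for the result an agent returns each round."""
    done: bool
    data: dict = field(default_factory=dict)


def pick_task(registry, task_name=None, rng=None):
    """Return (name, goal) for the named task, or a random one if none is given."""
    if task_name:
        return task_name, registry[task_name]
    rng = rng or random.Random()
    # Sorting first makes the choice reproducible for a seeded rng.
    name = rng.choice(sorted(registry))
    return name, registry[name]


def run_episode(agent_step, goal, max_steps):
    """Call agent_step(goal) until it reports done or the budget runs out.

    Returns (finished, steps_taken). The done flag only ends the loop; in
    this sketch, whether the task actually succeeded is checked separately.
    """
    for step in range(1, max_steps + 1):
        if agent_step(goal).done:
            return True, step
    return False, max_steps
```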
Note: Task Step Limits Update. As of 11/18/2024, the max_steps/step_budget for each task in AndroidWorld has been updated to approximately 2x the human average completion time. This adjustment gives agents sufficient time to complete tasks while also reducing the overhead of running the benchmark. Here are the per-task updates.
# --tasks is optional: use it to run on just a subset of tasks.
python run.py \
--suite_family=android_world \
--agent_name=t3a_gpt4 \
--perform_emulator_setup \
--tasks=ContactsAddContact,ClockStopWatchRunning
The first time you run this script, you must install the necessary apps and set permissions by specifying --perform_emulator_setup. This is a one-time setup, and it may take several minutes depending on your connection speed.
Above, we specify the optional --tasks flag to run on a subset of tasks. Omit it (or leave it empty) to run on the entire AndroidWorld suite.
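The --tasks behavior described above can be sketched as follows. select_tasks is hypothetical, and whether run.py rejects unknown task names in exactly this way is an assumption.

```python
def select_tasks(all_tasks, tasks_flag=""):
    """Resolve a comma-separated --tasks value against the full suite.

    An empty value selects every task; unknown names raise ValueError.
    """
    if not tasks_flag.strip():
        return list(all_tasks)
    requested = [name.strip() for name in tasks_flag.split(",") if name.strip()]
    unknown = [name for name in requested if name not in all_tasks]
    if unknown:
        raise ValueError(f"Unknown task(s): {', '.join(unknown)}")
    return requested
```

For example, select_tasks(suite, "ContactsAddContact,ClockStopWatchRunning") keeps just those two tasks, while select_tasks(suite) returns the whole suite.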
The n_task_combinations argument specifies how many parameter permutations to use for each task. For example, for an SMS task, it would correspond to different phone number/message combinations for each run.
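To make the SMS example concrete, the sketch below draws n_task_combinations parameter sets for one task from a seeded random generator. The phone number and message pools and the sms_task_params helper are hypothetical; real AndroidWorld tasks generate their own parameters.

```python
import random

# Hypothetical parameter pools for an SMS task.
PHONE_NUMBERS = ["555-0101", "555-0102", "555-0103"]
MESSAGES = ["Running late", "See you at 6", "Call me back"]


def sms_task_params(n_task_combinations, seed=0):
    """Draw n (phone number, message) parameter sets for one SMS task.

    A fixed seed reproduces the same combinations across runs. Draws are
    independent, so two combinations can repeat.
    """
    rng = random.Random(seed)
    return [
        {"phone_number": rng.choice(PHONE_NUMBERS), "message": rng.choice(MESSAGES)}
        for _ in range(n_task_combinations)
    ]
```

Seeding the generator is what keeps randomized instances reproducible: the same seed and count always yield the same list.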
If a run fails part-way through, you can resume it by re-running the script with the --checkpoint_dir flag pointing to the output directory from the original run.
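Conceptually, resuming means skipping the runs whose results already exist in the checkpoint directory. The sketch below illustrates that idea only: the one-JSON-file-per-run layout, the run ID format, and both helper names are assumptions, not AndroidWorld's actual checkpoint format.

```python
from pathlib import Path


def completed_run_ids(checkpoint_dir):
    """Collect IDs of finished runs, assuming one '<run_id>.json' file per run."""
    return {path.stem for path in Path(checkpoint_dir).glob("*.json")}


def remaining_runs(planned, completed):
    """Keep the planned run IDs, in their original order, that are not completed."""
    return [run_id for run_id in planned if run_id not in completed]
```

A resumed run would then iterate only over remaining_runs(planned, completed_run_ids(checkpoint_dir)).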
To run the MiniWoB++ web-based tasks in AndroidWorld, simply set --suite_family=miniwob and --perform_emulator_setup in the command above.
A key advantage of running MiniWoB++ tasks in AndroidWorld is that common input elements are rendered as native, commonly used Android UI widgets rather than as HTML. Thus, agents must learn to use universal widgets such as time and date pickers.
In addition to the agents we provide here, you can also easily create your own agent and run the benchmark with it as follows.
Create an agent class that inherits from EnvironmentInteractingAgent and implement the step method. In the current workflow, the agent tries to complete a task in a for loop. In each round, the step method is called; this is where you implement your agent's logic. A typical approach is to first gather information such as the current screenshot and the UI elements (e.g., buttons and icons) through the AndroidEnv instance within the agent, then select one of the supported actions, execute it through the AndroidEnv, and return an AgentInteractionResult. The done property on AgentInteractionResult should be set to True to indicate that the task is finished.
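The observe, choose, and act cycle above can be sketched as follows. To keep the example runnable without AndroidWorld installed, EnvironmentInteractingAgent, AgentInteractionResult, and FakeEnv are minimal hypothetical stand-ins; the real classes, the arguments passed to step, and the environment's methods for observing the screen and executing actions may differ, so check the base agent class in the repository before adapting it. Passing the task goal to step and the dict-based action format are also assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any


# --- Hypothetical stand-ins so the sketch runs without AndroidWorld. ---
@dataclass
class AgentInteractionResult:
    done: bool  # True tells the runner's loop that the task is finished.
    data: dict = field(default_factory=dict)


class EnvironmentInteractingAgent:
    def __init__(self, env: Any, name: str = ""):
        self.env = env
        self.name = name

    def step(self, goal: str) -> AgentInteractionResult:
        raise NotImplementedError


class FakeEnv:
    """Hypothetical environment exposing UI element texts and recording actions."""

    def __init__(self, ui_elements):
        self.ui_elements = ui_elements
        self.actions = []

    def get_ui_elements(self):
        return list(self.ui_elements)

    def execute_action(self, action: dict):
        self.actions.append(action)


# --- The part you would write yourself. ---
class TapFirstMatchAgent(EnvironmentInteractingAgent):
    """Toy agent: taps the first UI element whose text appears in the goal."""

    def step(self, goal: str) -> AgentInteractionResult:
        elements = self.env.get_ui_elements()  # 1. Observe the screen.
        for index, text in enumerate(elements):  # 2. Choose an action.
            if text and text.lower() in goal.lower():
                action = {"action_type": "click", "index": index}
                self.env.execute_action(action)  # 3. Execute it.
                return AgentInteractionResult(done=True, data={"action": action})
        # Nothing matched: take no action and let the loop call step again.
        return AgentInteractionResult(done=False, data={"action": None})
```

In this toy agent a single tap counts as finishing the task; a real agent would usually keep returning done=False until it has evidence that the goal is met.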
Import your agent in run.py and add it to the _get_agent method, which takes your agent's name and returns an instance of it.
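One way to structure the name-to-agent lookup is a dictionary of constructors. This is a hypothetical sketch: get_agent, AGENT_FACTORIES, and EchoAgent are illustrative names, and the actual _get_agent in run.py may be written differently (for example, as an if/elif chain).

```python
from typing import Any, Callable, Dict


class EchoAgent:
    """Hypothetical placeholder for your agent class."""

    def __init__(self, env: Any, name: str):
        self.env = env
        self.name = name


# Hypothetical registry mapping --agent_name values to agent constructors.
AGENT_FACTORIES: Dict[str, Callable[[Any], Any]] = {
    "echo_agent": lambda env: EchoAgent(env, name="echo_agent"),
}


def get_agent(agent_name: str, env: Any):
    """Return a new agent instance for agent_name, mirroring _get_agent's role."""
    try:
        factory = AGENT_FACTORIES[agent_name]
    except KeyError:
        raise ValueError(f"Unknown agent_name: {agent_name!r}") from None
    return factory(env)
```

With this pattern, registering a new agent is a one-line addition to AGENT_FACTORIES, and the dictionary key is the value you would pass as the agent name.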
Now you can run the benchmark with your new agent using the command above, with the agent_name flag changed to your agent's name.
Please see the guide on adding new tasks to AndroidWorld.
If you use our environment or data, please cite our paper:
@misc{rawles2024androidworlddynamicbenchmarkingenvironment,
title={AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents},
author={Christopher Rawles and Sarah Clinckemaillie and Yifan Chang and Jonathan Waltz and Gabrielle Lau and Marybeth Fair and Alice Li and William Bishop and Wei Li and Folawiyo Campbell-Ajala and Daniel Toyama and Robert Berry and Divya Tyamagundlu and Timothy Lillicrap and Oriana Riva},
year={2024},
eprint={2405.14573},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2405.14573},
}
This is not an officially supported Google product.