The Hybrid Multi-agent Playground (HMP) is an experimental framework designed for Reinforcement Learning (RL) researchers. Unlike other frameworks, which only isolate the TASKs from the framework, HMP also separates the ALGORITHMs from the framework to achieve excellent compatibility.
Any algorithm, from the most straightforward script AI to the most sophisticated RL learner, is abstracted into a module inside `./ALGORITHM/*`.
We also put effort into interfacing all kinds of multi-agent environments, including Gym, SMAC, air combat, etc. These can be found in `./MISSION/*`.
Other frameworks, such as pymarl2 and mappo, can interface with HMP as well: the entire HMP can disguise itself as an RL environment in pymarl2. We make this happen by building a special ALGORITHM module which runs pymarl2 in a subprocess. This work is ongoing; currently, HMP can link to a modified version of pymarl2. The source code of these third-party frameworks is located in `./THIRDPARTY/*`.
Please star the root GitHub project. Your encouragement is extremely important to us as researchers: https://github.com/binary-husky/hmp2g
Archived code used in our AAAI papers: https://github.com/binary-husky/hmp2g/tree/aaai-conc.
This repo is updated frequently.
```sh
# clone this repo
git clone https://github.com/binary-husky/hmp2g.git
```
If any unexpected problem is encountered, please clean the temp files by running:
```sh
# update code version
git checkout master --force && git pull --force && git clean -xfd
```
If you'd like to use the pymarl2 algorithms, please download the submodules first:
```sh
# download submodules
git submodule update --init
git submodule foreach -q --recursive 'branch="$(git config -f $toplevel/.gitmodules submodule.$name.branch)"; git switch $branch'
```
If the problem does not go away, do not hesitate to contact us or open an issue!
```sh
git pull && python main.py -c RESULT/uhmap_hete10vs10/render_result.jsonc
```
To visualize, change the `"render": false` option to `true`, then launch the Unreal Engine client with:
```sh
../UHMP.exe -OpenLevel=Ip:Port
```
(Port is random; see the program's terminal output.)

Redirection to another MARL project: building highly efficient multi-agent environments with Unreal Engine! Please refer to the repo https://github.com/binary-husky/unreal-hmp.
http://cloud.fuqingxu.top:11601/
```sh
git pull && python main.py -c RESULT/test-50+50/test-50+50.jsonc --skip
git pull && python main.py -c RESULT/test-100+100/test-100+100.jsonc --skip
git pull && python main.py -c RESULT/50RL-55opp/test-50RL-55opp.jsonc
```
(Also see https://www.bilibili.com/video/BV1vF411M7N9/)
```sh
git pull && python main.py -c RESULT/test-aii515/test-aii515.jsonc --skip
git pull && python main.py -c RESULT/test-cargo50/test-cargo50.jsonc --skip
```
As a framework built for researchers, HMP aims to optimize the parameter-control experience: a single configuration file is all that is needed. We discard the command-line method of controlling parameters; instead, commented JSON (JSONC) is used for experiment configuration. To run an experiment, just type:
```sh
python main.py --cfg Json_Experiment_Config_File.jsonc
```
Parameters assigned and overridden in the JSON file are NOT passed via init functions layer by layer, as other frameworks usually do; instead, at the start of `main.py`, a special program defined in `UTIL/config_args.py` directly INJECTs the overridden parameters into the desired locations.
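Conceptually, the injection boils down to rewriting class attributes addressed by their dotted path. Below is a minimal sketch of the idea, with a hypothetical `inject_overrides` helper; the real logic in `UTIL/config_args.py` handles more cases, such as chained variables and type checks:

```python
# Minimal sketch of JSONC-override injection (hypothetical helper,
# not the actual UTIL/config_args.py implementation).
import importlib

def inject_overrides(cfg: dict):
    for target, overrides in cfg.items():
        # target looks like "MISSION.dca.collective_assault_parallel_run.py->ScenarioConfig"
        module_path, class_name = target.split('->')
        module = importlib.import_module(module_path.replace('.py', ''))
        cls = getattr(module, class_name)
        for key, value in overrides.items():
            assert hasattr(cls, key), f'unknown parameter: {key}'
            setattr(cls, key, value)  # overwrite the class attribute in place
```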
We give an example to demonstrate how simple it is to add new parameters. Suppose we want to introduce HP into DCA; then an initial HP, let's say `HP_MAX`, needs to be defined as a parameter. Then:

1. Open `MISSION/collective_assault/collective_assault_parallel_run.py`. (You can create a new file if you wish.)
2. In the `ScenarioConfig` class, add a new line writing `HP_MAX=100`. (You can create another class if you wish.)
3. Wherever `HP_MAX` is needed, first `from xxx.collective_assault_parallel_run import ScenarioConfig`, then use the parameter via `init_hp_of_some_agent = ScenarioConfig.HP_MAX`.
4. To override the default `HP_MAX=100` in JSON (e.g., in `./example_dca.jsonc`), you just need to add a line in the field `"MISSION.dca.collective_assault_parallel_run.py->ScenarioConfig"`, for example:

```jsonc
{
    ...... (other fields)
    "MISSION.dca.collective_assault_parallel_run.py->ScenarioConfig": {
        "HP_MAX": 222,           # <------ add this!
        "random_jam_prob": 0.05, # (other config overrides in ScenarioConfig)
        ......
    },
    ...... (other fields)
}
```
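For reference, steps 2 and 3 amount to something like the following sketch (abbreviated; the real `ScenarioConfig` in this file has many more fields):

```python
# In MISSION/collective_assault/collective_assault_parallel_run.py (step 2):
class ScenarioConfig:
    # ... existing parameters ...
    HP_MAX = 100  # the new parameter; JSONC can override it (step 4)

# In any file that needs the parameter (step 3):
# from MISSION.collective_assault.collective_assault_parallel_run import ScenarioConfig
# init_hp_of_some_agent = ScenarioConfig.HP_MAX
```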
{"HP_MAX": 222}
or {"HP_MAX": "222"}
. If the value is a bool, you can write {"Key1":true,"Key2":false}
or {"Key1":"True", "Key2":"False"}
. Both are OK.HP_MAX=100
defines HP_MAX
as Int. If you want a float, please write HP_MAX=100.0
. Overriding an Int with float will trigger an assert error.Our framework can fully support complicated parameter dependency. Some parameters are sometimes just Chained together. Changing one of them can lead to the change of another. E.g., Let the number of parallel envs (num_threads
) be 32, and we test the performance of every test_interval
episode. We wish to have relate them with test_interval
= 8*num_threads
, meaning that a test run is shot every 8 rounds of parallel env executions. This function can be satisfied by defining a Chained var structure:
```python
num_threads = 32  # run N parallel envs
# define test interval
test_interval = 8 * num_threads
# define the chain of test_interval
test_interval_cv = ChainVar(lambda num_threads: 8 * num_threads, chained_with=['num_threads'])
# all done! You need to do nothing else!
```
After this, you can expect the following override (JSON config override) behaviors:

- If neither parameter is overridden in JSON, the defaults hold (`num_threads` = 32, `test_interval` = 8*32).
- If `num_threads` is overridden in JSON, then `test_interval` is forced to change along with it according to `test_interval = 8*num_threads`.
- If `test_interval` itself is overridden in JSON, the chain will not act and the JSON override is obeyed; nothing has higher priority than an explicit JSON override.

For details, please refer to `config.py` and `UTIL/config_args.py`; it is straightforward to understand once you read an example.
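For intuition, a chained variable need not be more than a record of the recompute function and the names of its upstream parameters. A minimal concept sketch (the actual class in `UTIL/config_args.py` may differ in detail):

```python
# Minimal concept sketch of a chained variable (may differ from the
# actual ChainVar in UTIL/config_args.py).
class ChainVar:
    def __init__(self, chain_func, chained_with):
        self.chain_func = chain_func      # how to recompute the dependent value
        self.chained_with = chained_with  # upstream parameter names

# During config loading: if 'num_threads' was overridden by JSON and
# 'test_interval' was not, the loader recomputes the latter via the chain:
# test_interval = test_interval_cv.chain_func(num_threads)
```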
When the experiment starts, the JSON config override will be stored in `RESULT/the-experiment-note-you-defined/experiment.json`. If the experiment later produces surprising results, you can reproduce them consistently using this config backup.
The Task Runner (`task_runner.py`) has only three lines of important code:

```python
# line 1: get actions from each team's ALGORITHM
actions_list, self.info_runner = self.platform_controller.act(self.info_runner)
# line 2: step all parallel environments at once
obs, reward, done, info = self.envs.step(actions_list)
# line 3: prepare obs and reward for the next step
self.info_runner = self.update_runner(done, obs, reward, info)
```
- `self.platform_controller.act`: get actions, block information access between teams (the link to `ALGORITHM`), and handle algorithms' internal state loopback.
- `self.envs.step`: multi-threaded environment step (the link to `MISSION`).
- `self.update_runner`: prepare obs (for decision making) and reward (for driving RL algorithms) for the next step.
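Putting the three calls together, the runner's main loop is conceptually just the following (a simplified sketch, not the literal `task_runner.py`):

```python
# Simplified conceptual loop (the real task_runner.py also handles
# logging, test episodes and env resets).
while not experiment_finished:
    # 1. each team's ALGORITHM picks actions from its own observations
    actions_list, info_runner = platform_controller.act(info_runner)
    # 2. all parallel envs advance one step (MISSION side)
    obs, reward, done, info = envs.step(actions_list)
    # 3. pack obs/reward so the next act() call can consume them
    info_runner = update_runner(done, obs, reward, info)
```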
In general, the HMP task runner can operate in two ways:
VHMAP is a visualization component of HMP.
It is unfortunate that existing RL environments fail to provide a visual interface satisfying the following useful features:
VHMAP is the answer. Features:

Interface functions and operation introduction:
More details of VHMAP can be found in VHMAP README.
We use Docker to solve dependencies: see SetupDocker. This project uses techniques such as shared memory for extreme training efficiency; as a cost, WindowsOS+GPU training is not well supported (pipe IO is used for Windows compatibility). Please do NOT run on Windows if possible (low efficiency); if you have to, also refer to the last part of SetupDocker for the pip requirements list.
Please read setup_docker.md first, and then set up the container using:

```sh
$ docker run -itd --name hmp-$USER \
    --net host \
    --gpus all \
    --shm-size=16G \
    fuqingxu/hmp:latest

# now inside the HMP container
$ su hmp  # (switch the account inside the HMP container, password: hmp)
$ cd ~    # (go to the home directory)
```
```sh
git pull && python main.py -c RESULT/test-50+50/test-50+50.jsonc --skip
git pull && python main.py -c RESULT/test-100+100/test-100+100.jsonc --skip
```
When the testing starts, open the revealed URL for monitoring. The front end is built with JavaScript and Three.js.
```
--------------------------------
JS visualizer online: http://172.18.116.150:aRandomPort
JS visualizer online (localhost): http://localhost:aRandomPort
--------------------------------
```
```sh
git pull && python main.py -c DOCS/examples/dca/example_dca.jsonc
git pull && python main.py -c DOCS/examples/dca/train_old_dca.jsonc
```
Launch with:
```sh
python main.py --cfg xx.json
```
```sh
git pull && python main.py -c RESULT/test-aii515/test-aii515.jsonc --skip
git pull && python main.py -c RESULT/test-cargo50/test-cargo50.jsonc --skip
git pull && python main.py --cfg RESULT/adca-demo/test.json
git pull && python main.py --cfg RESULT/basic-ma-40-demo/test.json
```
Please refer to MISSION README for more details.
1. In `MISSION/env_router.py`, add the path of the environment's init function in `env_init_function_ref`, for example:

```python
env_init_function_ref = {
    "bvr": ("MISSION.bvr_sim.init_env", "make_bvr_env"),
}
# bvr is the final name that HMP recognizes,
# MISSION.bvr_sim.init_env is a py file,
# make_bvr_env is a function
```

2. In `MISSION/env_router.py`, add the path of the environment's configuration in `import_path_ref`, for example:

```python
import_path_ref = {
    "bvr": ("MISSION.bvr_sim.init_env", 'ScenarioConfig'),
}
# bvr is the final name that HMP recognizes,
# MISSION.bvr_sim.init_env is a py file,
# ScenarioConfig is a class
```

3. Write your environment's `ScenarioConfig` class (use `MISSION.bvr_sim.init_env.ScenarioConfig` as a template).
4. Write your environment's init function (use `MISSION.bvr_sim.init_env.make_bvr_env` as a template; see the sketch below).
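As a concrete reference for steps 3 and 4, a minimal new-environment module could look like this (hypothetical `my_env` names and gym-style signatures; follow `MISSION.bvr_sim.init_env` for the authoritative template):

```python
# MISSION/my_env/init_env.py -- minimal sketch with hypothetical names;
# use MISSION.bvr_sim.init_env as the real template.
import numpy as np

class ScenarioConfig:
    # class attributes here can be overridden from the JSONC config
    N_AGENTS = 10
    EPISODE_LENGTH = 100

class MyEnv:
    def __init__(self, rank):
        self.rank = rank  # index of this parallel env
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros((ScenarioConfig.N_AGENTS, 4))  # initial obs

    def step(self, actions):
        self.t += 1
        obs = np.zeros((ScenarioConfig.N_AGENTS, 4))
        reward = np.zeros(ScenarioConfig.N_AGENTS)
        done = self.t >= ScenarioConfig.EPISODE_LENGTH
        return obs, reward, done, {}

def make_my_env(env_name, rank):
    # registered in env_init_function_ref as ("MISSION.my_env.init_env", "make_my_env")
    return MyEnv(rank)
```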
We designed a parallel execution pool based on shared memory, the most efficient inter-process communication method available. This parallel pool is only functional on Linux; therefore, when running on Windows, HMP automatically switches to a backup pool that uses pipes:
- Linux: `UTIL/shm_pool.pyx`
- Windows: `UTIL/win_pool.py`

Both of them are initialized in `main.py`, and they share the same API:

```python
smart_pool = SmartPool(...)
```
Furthermore, pickling and unpickling are slow for large NumPy arrays; we solve this problem by copying raw NumPy memory directly into the shared memory area without pickling, which significantly improves data transfer efficiency.
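The pickle-free transfer technique can be illustrated with Python's standard `multiprocessing.shared_memory` module (an illustration of the idea only, not the actual `shm_pool.pyx` code):

```python
# Illustration of pickle-free NumPy transfer via shared memory
# (the technique behind shm_pool.pyx; not its actual implementation).
import numpy as np
from multiprocessing import shared_memory

# sender: copy raw array bytes into a named shared block
arr = np.random.rand(1024, 1024)
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes, name='hmp_demo')
dst = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
dst[:] = arr[:]  # plain memcpy, no serialization

# receiver: attach to the same block and view it as an array, zero-copy
shm2 = shared_memory.SharedMemory(name='hmp_demo')
view = np.ndarray((1024, 1024), dtype=np.float64, buffer=shm2.buf)

# cleanup
shm2.close(); shm.close(); shm.unlink()
```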
When dealing with a large number of parallel environments (100+), the OS's process scheduling can slow down even shared-memory communication. As a compromise, we designed a folding mechanism that allows a single process to run N=`fold` parallel environments, relieving the burden on the OS. When folding is disabled (the default), `fold=1`. However, note that setting `fold>1` will not increase the parallel FPS of a single experiment (it usually decreases it), but it allows you to run more experiments simultaneously on the server.
If you are interested in something, you may continue to read:

- Handling parallel environments --> `task_runner.py` & `shm_env.py`
- The link between teams and diverse algorithms --> `multi_team.py`
- Adding a new env --> `MISSION.env_router.py`
- Adding an algorithm --> `ALGORITHM.example_foundation.py`
- Configuring by writing py files --> `config.py`
- Configuring by JSONC --> `xx.jsonc`
- Colorful printing --> `colorful.py`
- Auto pip deployer --> `pip_find_missing.py`
- Efficient parallel executing --> `shm_pool.pyx`
- Auto GPU selection --> `auto_gpu.py`
- Logging/plotting bridge --> `mcom.py` & `mcom_rec.py`
- Experiment batch executor --> `mprofile.py`
This repo is updated frequently.
For more information on how to use HMP, please check out the README list below; most of the other READMEs are located in `./DOCS/*`.

- How to change environments (missions): MISSION README
- How to solve dependencies with Docker: SetupDocker
- How to solve dependencies without Docker: SetupNODocker
- How to get Docker with Unreal Engine: SetupUEDocker
- How to use a third-party framework (pymarl2): UsePymarl2
- How to use Unreal-Engine-based HMP: UseUHMP