Code Package to download: here
- Look at the Behaviour Parameters component in ML-Agents (it can be found attached to your roller ball agent from lab 5):
- Which function (in the agent code) does the variable Vector Observation -> Space Size relate to?
- Which function (in the agent code) does the variable Vector Action -> Space Size relate to?
- How do you let the agent know that a decision it made resulted in a behaviour…
    - That you want it to do
    - That you don’t want it to do
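As a starting point for these questions, here is a minimal sketch of how the Behaviour Parameters fields map onto the Agent class. The signatures below are from recent ML-Agents releases (older versions differ slightly), and the class and values are placeholders, not the lab's code:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class SketchAgent : Agent
{
    // Vector Observation -> Space Size must equal the total number of
    // floats added here each step.
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(1.0f);          // 1 float
        sensor.AddObservation(Vector3.zero);  // a Vector3 counts as 3 floats
        // Space Size for this sketch would be 4.
    }

    // Vector Action -> Space Size is the number of actions read here.
    public override void OnActionReceived(ActionBuffers actions)
    {
        float first  = actions.ContinuousActions[0];
        float second = actions.ContinuousActions[1];
        // A continuous action size of 2 matches this sketch.
    }
}
```

For the last question, the `AddReward` / `SetReward` calls on Agent are how you signal approval: a positive reward for behaviour you want, a negative reward for behaviour you don't.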
Please read this next paragraph carefully
In this lab we are revisiting the robot arm that we programmed earlier in the course to hit targets. This time we will use ML-Agents to solve the problem. This lab will be slightly different from the others in that it takes an iterative approach rather than a completion-based one. This means that you should expect to go back over sections of your code multiple times and experiment with different configurations. We recommend that each person in your group participates by trying a different strategy, then discussing together what you think works and what doesn’t. You will not need to train the network for more than 2 minutes before you see progress. If the arm doesn’t look like it’s starting to figure out the problem after that amount of time, you need a new approach.
The three areas of code that you should expect to edit are listed below, and can all be found in the ArmAgent.cs file attached to the robot arm under //TODO:
- This function is where you add inputs to the neural network.
- Note that you will need to update Vector Observation -> Space Size as you change the number of inputs
- What information do you think is needed to solve this problem? What is the simplest way of representing this problem?
- We have given several example inputs, as well as several functions which you may find helpful
- Keep it simple! The more inputs you add, the more time the network will take to train.
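To illustrate "keep it simple": for a reaching task, the target's position relative to the arm is often enough on its own. This is only a sketch of the idea, and the variable names below are placeholders, not the ones in ArmAgent.cs:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ArmObservationSketch : Agent
{
    public Transform target; // placeholder names, not from ArmAgent.cs
    public Transform hand;

    public override void CollectObservations(VectorSensor sensor)
    {
        // A relative position is usually more useful than two absolute
        // positions: it stays meaningful when the training area is
        // duplicated elsewhere in the scene.
        sensor.AddObservation(target.position - hand.position); // 3 floats
        // Vector Observation -> Space Size must be 3 for this sketch.
    }
}
```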
- Train your agent by adding rewards!
- You may find the rewardCloserToZero function helpful. As the value you pass it approaches zero, it returns a higher reward.
- What rewards should you give when it hits things?
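One common pattern is a small shaping reward every step plus a large reward on contact. `AddReward` and `EndEpisode` are the real ML-Agents calls; the body of rewardCloserToZero below is only a guess at the helper's shape, and the tags and scales are assumptions, not the lab's values:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class ArmRewardSketch : Agent
{
    public Transform target; // placeholder, not from ArmAgent.cs

    // Guessed implementation: returns values near 1 as value nears 0.
    float rewardCloserToZero(float value)
    {
        return 1f / (1f + Mathf.Abs(value));
    }

    void RewardEachStep()
    {
        float distance = Vector3.Distance(transform.position, target.position);
        AddReward(rewardCloserToZero(distance) * 0.01f); // small shaping reward
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Target"))
        {
            AddReward(1f);    // behaviour you want
            EndEpisode();
        }
        else if (collision.gameObject.CompareTag("Floor"))
        {
            AddReward(-0.5f); // behaviour you don't want
        }
    }
}
```

Keeping per-step shaping rewards much smaller than the terminal reward is a common choice, so the agent doesn't learn to stall near the target instead of hitting it.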
You can also set Max Step, which will end the episode after the specified number of steps (zero means no limit). This acts like a time limit.
Train your network by running the command
mlagents-learn config/robot_arm.yaml --run-id=yourRunId
You can overwrite a previous run by adding --force
mlagents-learn config/robot_arm.yaml --run-id=yourRunId --force
You can continue a training session after stopping by adding --resume
mlagents-learn config/robot_arm.yaml --run-id=yourRunId --resume
You can speed up training by right-clicking the training area object and choosing Duplicate. Then make sure to drag the duplicated copy to a new spot. Repeat this as many times as you like (or as many as your computer can handle without lagging).
- Note that if you make a change to the behaviour parameters it will only update the one copy you changed.
Make sure to get checked off by your TA once you have trained your agent
Did you know that you can solve this problem using only 2 inputs and 2 outputs? Can you figure out which ones they are? (Note that training it this way will take at least 5–10 minutes.)
- Describe one way you tried to solve the problem that did not work and explain why you think the agent had a difficult time solving it. (1 paragraph max)
- How did the agent end up representing the environment? How was this different from the automated solution you wrote in lab 2 – part 3? (1 paragraph max)
- This question is multiple choice: Are you excited about the robot tournament starting next week?
- Can’t Wait
- I’m SUPER excited
- I’m planning on winning
- (Optional) How can this lab be improved for the future?