[NDC2021] Why did Embark train robots for two years?

“Our goal is to make physical gameplay a viable core mechanic. If we can make the entire game physically believable, we can create emergent gameplay, which lets players shape and create their own experiences within an open sandbox.”

Engineer Tom Solberg of Embark Studios, a subsidiary of Nexon, gave a lecture titled ‘How to Train a Robot’ at the Nexon Developers Conference (NDC) 2021. In it, he explained why Embark Studios came to train robots that operate on physics, how the work proceeded, and what its goals are.

According to Solberg, Embark Studios trains robots in order to implement physical gameplay, with the ultimate goal of enabling emergent gameplay.

For example, suppose a game features a boss monster wreathed in flame. Reasoning as they would in reality, players will look for magic to block the flames and try to avoid the fire spreading on the ground; if there is water nearby, they will expect it to put out a fire on their own body. Because players experience the physical world every day, they bring the same physical intuition into the game.

However, Solberg explained that few games actually make use of this intuition as a core, influential gameplay mechanic in a sandbox setting. Embark therefore decided that physics-based animation was necessary for player immersion, and chose reinforcement learning as the way to implement it.

According to him, Embark Studios has been researching this area for about two years, since June 2019. Two researchers ran the first tests in Unity, and R&D continued to prove the approach also worked in Unreal Engine. By December of that year, the team had built an artificial intelligence (AI) system and gameplay that proved the concept in a working prototype. From there, it took almost a year to bring one robot from the prototype stage to target quality, and another two months to get a new set of robots up to speed; the team is now focused on a second prototype.

Reinforcement learning is used throughout this process. Put simply, reinforcement learning is conditioning, a kind of ‘conditional reflex training’: a behavior is reinforced by providing a reward, such as feeding a dog or cat.

Reinforcement learning consists of four stages: observation, action, reward, and improvement. In the observation stage, the cat inspects the empty bowl, the food bag, and a button; it does not yet know how these objects are connected, but it can interact with them. Step two is action: the cat can try anything, and sometimes the right action is to press the button. Step three is the reward: when the button is pressed, the reward is immediate. In the final improvement stage, the cat learns that pressing the button produces food. Repeating this cycle is reinforcement learning.
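
For readers who want the loop spelled out, below is a minimal Python sketch of that four-stage cycle. The toy environment, the tabular agent, and all the names in it are illustrative assumptions for this article, not Embark’s actual training stack.

```python
import random

# A minimal sketch of the four-stage loop described above:
# observe -> act -> reward -> improve. Purely illustrative.

class ButtonWorld:
    """Toy environment: pressing the button (action 1) yields food."""
    ACTIONS = [0, 1]  # 0 = paw at the empty bowl, 1 = press the button

    def step(self, action):
        return 1.0 if action == 1 else 0.0  # reward stage

class Agent:
    """Tabular agent that reinforces whichever action gets rewarded."""
    def __init__(self, actions, lr=0.1, epsilon=0.2):
        self.values = {a: 0.0 for a in actions}  # observation: known actions
        self.lr, self.epsilon = lr, epsilon

    def act(self):
        # Action stage: mostly exploit the best-known action,
        # but sometimes explore at random.
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def improve(self, action, reward):
        # Improvement stage: nudge the action's value toward the reward.
        self.values[action] += self.lr * (reward - self.values[action])

env, agent = ButtonWorld(), Agent(ButtonWorld.ACTIONS)
for episode in range(500):
    action = agent.act()
    reward = env.step(action)
    agent.improve(action, reward)

print(agent.values)  # the button-press action ends up with the higher value
```

After a few hundred repetitions of the cycle, the button-press action dominates, which is the same dynamic as the cat learning that the button produces food.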

Of course, reinforcement learning does not always go in the intended direction. A dog trained to bark on command may instead learn to bark whenever it wants a reward. In the same way, a robot’s training can drift away from what was intended.
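
This failure mode is commonly called reward misspecification, or ‘reward hacking’. The snippet below is a hypothetical illustration of how it can arise for a legged robot; the state fields and weights are invented for the example and do not come from the talk.

```python
# Hypothetical reward functions for a legged robot, showing how a
# loosely specified reward invites shortcuts. All field names and
# weights here are assumptions for illustration.

def naive_reward(state):
    # Exploitable: lunging and toppling forward also produces forward
    # velocity, so the robot can be "rewarded" for falling over.
    return state["forward_velocity"]

def shaped_reward(state):
    # Upright-posture and energy-cost terms make the shortcut unprofitable.
    upright_bonus = 1.0 if state["torso_height"] > 0.4 else -1.0
    energy_cost = 0.01 * sum(t * t for t in state["joint_torques"])
    return state["forward_velocity"] + upright_bonus - energy_cost
```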

Embark Studios also said that once a robot has been sufficiently trained, success or failure is judged by examining how much reward the robot received and which tasks earned that reward.
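
As a sketch of what such a check might look like, the snippet below aggregates per-task reward across logged training episodes. The log format, task names, and numbers are assumptions made up for this illustration.

```python
# Illustrative episode logs: reward earned per task in each episode.
episode_logs = [
    {"walk": 12.4, "turn": 3.1, "balance": 8.0},
    {"walk": 11.9, "turn": 2.7, "balance": 7.5},
]

# Sum reward per task across episodes, then report the mean, so we can
# see both how much reward was earned and which tasks earned it.
totals = {}
for log in episode_logs:
    for task, reward in log.items():
        totals[task] = totals.get(task, 0.0) + reward

n = len(episode_logs)
for task, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{task}: mean reward {total / n:.2f} per episode")
```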

To illustrate, he showed how a six-legged robot’s walking ability changed over a 48-hour training run, and presented video of the realistic robot movement produced by the team’s earlier research.

There were also difficulties. Tools for reinforcement learning are emerging but still immature, so technical demonstrations and tooling had to be built by hand, which takes a long time; he explained there is still a long way to go. It was also necessary to convince artists not to design robots whose shapes would be impossible in a physical environment. As he put it, the cool robots that appear in Hollywood movies would break down or collapse under their own weight in the real world, and it took a long time to get artists to consider factors such as mass, balance, and gait in their designs.

He pointed to three strengths of this research. The first is that physical design can drive a wider range of motion than hand-authored animation. For example, a robot shaped like a gorilla was trained until it moved more like a real gorilla.

Another advantage is that when the rules of the game are extended, the robot can learn to use the new rules. He showed a winged robot in action, describing it as a bird that learned to fly once wings were added to the physics simulation.

The third is the flip side of the design difficulties mentioned above. With reinforcement learning, he said, most robots learn to walk and move in some way; even ‘failed’ robots can often learn to move. A robot that does not move the way the developers intended looks like a failure on the surface, but he believes that if the aesthetic boundaries are widened rather than treating such results as failures, animations can be created for a wide range of physically possible creatures.

[Reporter Lim Young-taek, Gamejin]
[ⓒ Maeil Business Newspaper & mk.co.kr. Unauthorized reproduction and redistribution prohibited]
