As an example, consider a simple setup of a robot arm and a commodity camera. An operator wearing a VR headset sees through the camera and moves the arm with the VR controllers. The operator receives an instruction ("arrange all items by colour") and carries it out; the recorded episode is saved and added to the training set. Would a big enough training set be able to produce an instruction-following robot arm?
https://arxiv.org/search/?query=llm+robots&searchtype=all&so...