    M5Stack Community

    ElephantRobotics

    @ElephantRobotics

    Enjoy robots world

    92
    Reputation
    61
    Posts
    2742
    Profile views
    2
    Followers
    0
    Following
    Joined Last Online
    Website www.elephantrobotics.com/en/
    Location Shenzhen, China


    Posts made by ElephantRobotics

    • Revealing the Potential of myCobot M5Stack AI Kit Vision Algorithms

      Introduction

      In this article, we will delve deeper into how the machine recognition algorithm of the myCobot 320 AI Kit is implemented. With the continuous development of artificial intelligence technology, the application of robotic arms is becoming increasingly widespread. As a robot that can simulate human arm movements, the robotic arm offers efficiency, precision, flexibility, and safety. In industrial, logistics, medical, agricultural, and other fields, robotic arms have become an essential part of many automated production lines and systems. In scenarios such as automated assembly on factory production lines, cargo handling in warehouse logistics, auxiliary operations in medical surgery, and planting and harvesting in agricultural production, robotic arms play their unique role. This article will focus on the application of robotic arms combined with vision recognition technology in the myCobot 320 AI Kit scenario, and explore the advantages and future development trends of robotic arm vision control technology.

      Product

      myCobot 320 M5Stack

      myCobot 320 is a 6-axis collaborative robot designed for user-independent programming and development. With a motion radius of 350mm, it supports a maximum end load of 1000g with a repeated positioning accuracy of 0.5mm. It provides a fully open software control interface that enables users to quickly control the robotic arm using a variety of mainstream programming languages. The robotic arm uses the M5Stack-Basic as its embedded control board and ATOM as its core control module.
      alt text
      alt text

      myCobot Adaptive gripper

      alt text
      The myCobot adaptive gripper is an end-of-arm actuator used for grasping and transporting objects of various shapes and sizes. It has high flexibility and adaptability and can automatically adjust its gripping force and position based on the shape and size of different objects. It can be combined with machine vision to adjust the gripping force and position of the gripper by obtaining information from vision algorithms. The gripper can handle objects up to 1kg and has a maximum grip distance of 90mm. It is powered by electricity and is very convenient to use. This is the equipment we are using, along with the myCobot 320 AI Kit that we will be using later.

      Vision algorithm

      Vision algorithm is a method of analyzing and understanding images and videos using computer image processing techniques. It mainly includes several aspects such as image preprocessing, feature extraction, object detection, and pose estimation.

      Image preprocessing:
      Image preprocessing is the process of processing the original image to make it more suitable for subsequent analysis and processing. Commonly used algorithms include image denoising algorithms, image enhancement algorithms, and image segmentation algorithms.
      Feature point extraction:
      Feature extraction is the process of extracting key features from the image for further analysis and processing. Common algorithms include SIFT algorithm, SURF algorithm, ORB algorithm, HOG algorithm, LBP algorithm, etc.
      Object detection:
      Object detection is the process of finding a specific object or target in an image. Commonly used algorithms include Haar feature classifier, HOG feature + SVM classifier, Faster R-CNN, YOLO.
      Pose estimation:
      Pose estimation is the process of estimating the pose of an object by identifying its position, angle, and other information. Common algorithms include PnP algorithm, EPnP algorithm, Iterative Closest Point algorithm (ICP), etc.
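
      To make the feature-extraction step concrete, here is a minimal OpenCV sketch using the ORB detector mentioned above. The file name 'sample.jpg' is only a placeholder, not an image from the kit.

      import cv2

      # Load an image (placeholder file name) and detect ORB keypoints
      image = cv2.imread('sample.jpg')
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

      orb = cv2.ORB_create(nfeatures=500)
      keypoints, descriptors = orb.detectAndCompute(gray, None)

      # Draw the detected keypoints for visualization
      result = cv2.drawKeypoints(image, keypoints, None, color=(0, 255, 0))
      cv2.imshow('ORB keypoints', result)
      cv2.waitKey(0)
      cv2.destroyAllWindows()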

      Example

      Color recognition algorithm
      That description is rather abstract, so let us demonstrate it with a practical example. How can we detect the white golf ball in the image below? We will use OpenCV's machine vision library.
      alt text
      Image processing:

      Initially, we must preprocess the image to enable the computer to swiftly locate the target object. This step involves converting the image to grayscale.

      Grayscale image:

      A grayscale image is a method of converting a colored image to a black and white image. It depicts the brightness or gray level of each pixel in the image. In a grayscale image, the value of each pixel represents its brightness, typically ranging from 0 to 255, where 0 represents black and 255 represents white. The intermediate values represent varying degrees of grayness.

      import cv2
      import numpy as np
      
      image = cv2.imread('ball.jpg')
      # turn to gray pic
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      
      cv2.imshow('gray', gray)
      cv2.waitKey(0)
      

      alt text
      Binarization:

      As we can observe, there is a significant color contrast between the golf ball and the background, so we can detect the target object through color detection. Although the golf ball is primarily white, lighting creates some gray shadow areas on it. Therefore, when setting the threshold range, we must include these gray areas as well.

      lower_white = np.array([180, 180, 180])  # Lower limit
      upper_white = np.array([255, 255, 255])  # Upper limit
      
      # find target object
      mask = cv2.inRange(image, lower_white, upper_white)
      contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
      

      This step is called binarization, which separates the target object from the background.

      Contour filtering:

      After binarization, we need to filter the contours by area. If we do not set this filter, we may get the result shown in the image below, where many small regions are selected even though we only want the largest one. By filtering out the small regions, we obtain the desired result.
      alt text

      #filter
      min_area = 100
      filtered_contours = [cnt for cnt in contours if cv2.contourArea(cnt) > min_area]
      
      #draw border
      for cnt in filtered_contours:
          x, y, w, h = cv2.boundingRect(cnt)
          cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 2)
      

      alt text

      import cv2
      import numpy as np
      
      image = cv2.imread('ball.jpg')
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      
      lower_white = np.array([170, 170, 170])   
      upper_white = np.array([255, 255, 255])  
      
      mask = cv2.inRange(image, lower_white, upper_white)
      contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
      min_area = 500
      filtered_contours = [cnt for cnt in contours if cv2.contourArea(cnt) > min_area]
      
      for cnt in filtered_contours:
          x, y, w, h = cv2.boundingRect(cnt)
          cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 2)
      
      
      cv2.imshow('Object Detection', image)
      cv2.waitKey(0)
      cv2.destroyAllWindows()
      

      It is important to note that we are utilizing a robotic arm to grasp the object. Hence, merely detecting the target object is insufficient. We must obtain the coordinate information of the object. To achieve this, we use OpenCV's Aruco markers, which are commonly used 2D barcodes for tasks such as camera calibration, pose estimation, and camera tracking in computer vision. Each Aruco marker has a unique identifier. By detecting and recognizing these markers, we can infer the position of the camera and the relationship between the camera and the markers.
      alt text
      alt text
      The two unique ArUco codes in the picture are used to fix the size of the cropped image and the positions of the ArUco markers, so that the target object's position can be obtained through calculation.
      alt text
      With the Aruco marker's positioning, we can detect the location of the target object. We can then convert the x and y coordinates into world coordinates and provide them to the robotic arm's coordinate system. The robotic arm can then proceed with grasping the object.
      alt text

      # get points of two aruco
       def get_calculate_params(self, img):
              """
              Get the center coordinates of two ArUco codes in the image
              :param img: Image, in color image format.
              :return: If two ArUco codes are detected, returns the coordinates of the centers of the two codes; otherwise returns None.
              """
              # Convert the image to a gray image 
              gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
              # Detect ArUco marker.
              corners, ids, rejectImaPoint = cv2.aruco.detectMarkers(
                  gray, self.aruco_dict, parameters=self.aruco_params
              )
      
              """
              Two Arucos must be present in the picture and in the same order.
              There are two Arucos in the Corners, and each aruco contains the pixels of its four corners.
              Determine the center of the aruco by the four corners of the aruco.
              """
              if len(corners) > 0:
                  if ids is not None:
                      if len(corners) <= 1 or ids[0] == 1:
                          return None
                      x1 = x2 = y1 = y2 = 0
                      point_11, point_21, point_31, point_41 = corners[0][0]
                      x1, y1 = int((point_11[0] + point_21[0] + point_31[0] + point_41[0]) / 4.0), int(
                          (point_11[1] + point_21[1] + point_31[1] + point_41[1]) / 4.0)
                      point_1, point_2, point_3, point_4 = corners[1][0]
                      x2, y2 = int((point_1[0] + point_2[0] + point_3[0] + point_4[0]) / 4.0), int(
                          (point_1[1] + point_2[1] + point_3[1] + point_4[1]) / 4.0)
      
                      return x1, x2, y1, y2
              return None
      
          # set camera clipping parameters  
          def set_cut_params(self, x1, y1, x2, y2):
              self.x1 = int(x1)
              self.y1 = int(y1)
              self.x2 = int(x2)
              self.y2 = int(y2)
      
          # set parameters to calculate the coords between cube and mycobot320
          def set_params(self, c_x, c_y, ratio):
              self.c_x = c_x
              self.c_y = c_y
              self.ratio = 320.0 / ratio
      
          # calculate the coords between cube and mycobot320
          def get_position(self, x, y):
              return ((y - self.c_y) * self.ratio + self.camera_x), ((x - self.c_x) * self.ratio + self.camera_y)
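
      To illustrate how these helper methods might fit together, here is a rough usage sketch. The detector object, the midpoint/scale values passed to set_params, and the example pixel position are assumptions for illustration, not the kit's actual calling code.

      import cv2

      # "detector" stands for an instance of the class that defines the methods above
      cap = cv2.VideoCapture(0)
      _, frame = cap.read()

      params = detector.get_calculate_params(frame)
      if params is not None:
          x1, x2, y1, y2 = params
          # fix the cropping region defined by the two ArUco centers
          detector.set_cut_params(x1, y1, x2, y2)
          # assume the midpoint of the two markers as origin and their pixel distance as the scale
          detector.set_params((x1 + x2) / 2.0, (y1 + y2) / 2.0, abs(x1 - x2))
          # convert a detected pixel position (example values) into robot-arm coordinates
          px, py = 320, 240
          arm_x, arm_y = detector.get_position(px, py)
          # arm_x, arm_y can then be sent to the arm, e.g. with pymycobot's send_coords()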
      

      Summary

      Vision-based control technology for robotic arms is a rapidly developing and widely applied technology. Compared to traditional robotic arm control technology, vision-based control technology boasts advantages such as high efficiency, precision, and flexibility, and can be extensively utilized in industrial production, manufacturing, logistics, and other fields. With the constant evolution of technology such as artificial intelligence and machine learning, vision-based control technology for robotic arms will have even wider application scenarios. In the future, it will be necessary to strengthen technological research and development and innovation, constantly improving the level of technology and application capabilities.

      posted in PROJECTS
      ElephantRobotics
    • Automating Fruit Harvesting and Sorting with 2 Robotic Arms-M5Stack Basic

      Introduction

      As the hottest field in technology, robotics is revolutionizing industries and driving innovation across the globe. To meet the growing demand for skilled professionals in this rapidly evolving landscape, Elephant Robotics has developed a groundbreaking robotics education solution for colleges and universities. This innovative solution combines the simulation of an automated fruit picker with the compound robot of automation of fruit sorting and delivery, offering students a comprehensive learning experience in one of the most sought-after and trending areas of technology.
      In this article, we shall expound in detail upon the scenario of fruit harvesting and sorting robots. We shall commence with an introduction to the kit and its usage scenarios, and then elaborate on the robotic arm's functional implementation.

      Fruit-Picker Kit

      alt text
      The picture depicts the Fruit-Picker Kit, which comprises the following components.
      alt text
      Your curiosity must be piqued as to how this kit operates. Allow me to elucidate the operational process of the kit. Firstly, as you can observe, we have two robotic arms that perform distinct functions. The robotic arm closest to the fruit tree is the fruit-picking robot, henceforth referred to as R1. The arm situated in the middle is the sorting robot, abbreviated as R2. Their nomenclature itself implies their respective tasks. R1 is responsible for plucking the fruits from the tree and placing them on the conveyor belt, while R2 sorts out any substandard fruits from the conveyor belt.
      Step:
      Harvesting: R1 employs a depth camera to identify the fruits on the tree and then locates them. The coordinates of the fruit are sent to R1 for picking.
      Transportation: Through R1's picking, the fruit is placed on the conveyor belt. The conveyor belt transports the fruit to the range recognizable by R2 for determining the quality of the fruit.
      Sorting: The camera above R2 recognizes the fruit in its sight range and uses an algorithm to determine its quality. If it is deemed as good fruit, the fruit is transported by the conveyor belt to the fruit collection area. If it is deemed as bad fruit, the coordinates of the bad fruit are transmitted to R2, which then grabs the target and places it in a specific area.
      The aforementioned process is continuously looped: Harvesting->Transportation->Sorting->Harvesting->Transportation.

      Product

      Robotic Arm - mechArm 270 M5Stack

      This is a compact six-axis robotic arm with a center-symmetrical structure (mimicking an industrial structure) that is controlled by the M5Stack-Basic at its core and assisted by ESP32. The mechArm 270-M5 weighs 1kg and can carry a load of 250g, with a working radius of 270mm. It is designed to be portable, small but highly functional, simple to operate, and capable of working safely in coordination with human beings.
      alt text

      Conveyor belt

      A conveyor belt is a mechanical device used for transporting items, commonly composed of a belt and one or more rollers. It can transport various items such as packages, boxes, food, minerals, building materials, and more. The working principle of the conveyor belt involves placing the item on the moving belt, which then carries it to the target location. Conveyor belts typically consist of a motor, a transmission system, a belt, and a support structure: the motor provides power, and the transmission system transfers that power to the belt, causing it to move.
      alt text
      Currently, on the market, various types of conveyor belts can be customized according to the user's needs, such as the length, width, height of the conveyor belt, and the material of the track.

      3D Depth camera

      Due to the diversity of usage scenarios, ordinary 2D cameras fail to meet our requirements. In our scenario, we use a depth camera. A depth camera is a type of camera that can obtain depth information of a scene. It not only captures the color and brightness information of a scene, but also perceives the distance and depth information between objects. Depth cameras usually use infrared or other light sources for measurement to obtain the depth information of objects and scenes.
      alt text
      The depth camera can capture various types of information such as depth images, color images, infrared images, and point cloud images.
      alt text

      Adaptive gripper - robotic arm end effector.

      The adaptive gripper is an end effector used to grasp, grip, or hold objects. It consists of two movable claws that can be controlled by the robotic arm's control system to adjust their opening and closing degree and speed.

      Project functions

      To begin with, we must prepare the development environment. This scenario is coded in Python, so the following packages need to be installed for use and learning.

      numpy==1.24.3
      opencv-contrib-python==4.6.0.66
      openni==2.3.0
      pymycobot==3.1.2
      PyQt5==5.15.9
      PyQt5-Qt5==5.15.2
      PyQt5-sip==12.12.1
      pyserial==3.5
      

      Let us first delve into the introduction of the machine vision recognition algorithm. The project's functionality can mainly be divided into three parts:

      • Machine vision recognition algorithm, including depth recognition algorithm
      • Robotic arm control and trajectory planning
      • Communication and logical processing among multiple machines

      Machine vision recognition algorithm

      Camera calibration is required before using the depth camera. Here is a tutorial link.
      Camera calibration:
      Camera calibration refers to the process of determining the internal and external parameters of a camera through a series of measurements and calculations. The internal parameters of the camera include focal length, principal point position, and pixel spacing, while the external parameters of the camera include its position and orientation in the world coordinate system. The purpose of camera calibration is to enable the camera to accurately capture and record information about the position, size, and shape of objects in the world coordinate system.
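
      For reference, a minimal OpenCV calibration sketch with a printed chessboard pattern might look like the following. The 9x6 corner count and the calib/*.jpg image folder are assumptions for illustration, not the kit's actual calibration routine (see the tutorial mentioned above for that).

      import glob
      import cv2
      import numpy as np

      # 9x6 inner corners assumed; calibration photos assumed to be in ./calib/
      pattern_size = (9, 6)
      objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
      objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

      obj_points, img_points, gray = [], [], None
      for path in glob.glob('calib/*.jpg'):
          img = cv2.imread(path)
          gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
          found, corners = cv2.findChessboardCorners(gray, pattern_size)
          if found:
              obj_points.append(objp)
              img_points.append(corners)

      # mtx holds the intrinsic matrix (focal length, principal point); dist holds distortion coefficients
      ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
          obj_points, img_points, gray.shape[::-1], None, None)
      print("camera matrix:\n", mtx)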

      Our target objects are fruits, which come in different colors and shapes, ranging from red, orange to yellow. To ensure accurate and safe fruit grasping, it is necessary to gather comprehensive information about the fruits, including their width, thickness, and other characteristics, to perform intelligent grasping.
      alt text
      Let us examine the target fruits. Currently, the most prominent difference between them is their distinct colors. We will select the targets with red and orange hues. We will use the HSV color space to locate the targets. The following code is designed to detect the target fruits.

      class Detector:
          class FetchType(Enum):
              FETCH = False
              FETCH_ALL = True
      
          """
          Detection and identification class
          """
      
          HSV_DIST = {
              # "redA": (np.array([0, 120, 50]), np.array([3, 255, 255])),
              # "redB": (np.array([176, 120, 50]), np.array([179, 255, 255])),
              "redA": (np.array([0, 120, 50]), np.array([3, 255, 255])),
              "redB": (np.array([118, 120, 50]), np.array([179, 255, 255])),
              # "orange": (np.array([10, 120, 120]), np.array([15, 255, 255])),
              "orange": (np.array([8, 150, 150]), np.array([20, 255, 255])),
              "yellow": (np.array([28, 100, 150]), np.array([35, 255, 255])), # old
              # "yellow": (np.array([31, 246, 227]), np.array([35, 255, 255])),   # new
          }
      
          default_hough_params = {
              "method": cv2.HOUGH_GRADIENT_ALT,
              "dp": 1.5,
              "minDist": 20,
              "param2": 0.6,
              "minRadius": 15,
              "maxRadius": 40,
          }
      
          def __init__(self, target):
              self.bucket = TargetBucket()
              self.detect_target = target
              
          def get_target(self):
              return self.detect_target
      
          def set_target(self, target):
              if self.detect_target == target:
                  return
              self.detect_target = target
              if target == "apple":
                  self.bucket = TargetBucket(adj_tolerance=25, expire_time=0.2)
              elif target == "orange":
                  self.bucket = TargetBucket()
              elif target == "pear":
                  self.bucket = TargetBucket(adj_tolerance=35)
      
          def detect(self, rgb_data):
              if self.detect_target == "apple":
                  self.__detect_apple(rgb_data)
              elif self.detect_target == "orange":
                  self.__detect_orange(rgb_data)
              elif self.detect_target == "pear":
                  self.__detect_pear(rgb_data)
      

      alt text
      Color streaming video on the left, depth video on the right

      The first step is to accurately detect the target fruits in order to obtain their coordinates, depth, and other relevant information. We will define the necessary information to be gathered and store it for later use and retrieval.

      class VideoCaptureThread(threading.Thread):
          def __init__(self, detector, detect_type = Detector.FetchType.FETCH_ALL.value):
              threading.Thread.__init__(self)
              self.vp = VideoStreamPipe()               
              self.detector = detector                  
              self.finished = True                      
              self.camera_coord_list = []               
              self.old_real_coord_list = []             
              self.real_coord_list = []                 
              self.new_color_frame = None              
              self.fruit_type = detector.detect_target  
              self.detect_type = detect_type            
              self.rgb_show = None
              self.depth_show = None
      

      Ultimately, our end goal is to obtain the world coordinates of the fruits, which can be transmitted to the robotic arm for executing the grasping action. By converting the depth coordinates into world coordinates, we have already achieved half of our success. Finally, we only need to transform the world coordinates into the coordinate system of the robotic arm to obtain the grasping coordinates of the target fruits.

      # get world coordinate
       def convert_depth_to_world(self, x, y, z):
              fx = 454.367
              fy = 454.367
              cx = 313.847
              cy = 239.89
      
              ratio = float(z / 1000)
              world_x = float((x - cx) * ratio) / fx
              world_x = world_x * 1000
              world_y = float((y - cy) * ratio) / fy
              world_y = world_y * 1000
              world_z = float(z)
              
              return world_x, world_y, world_z
      

      To proceed, we have successfully detected and obtained the graspable coordinates of the target object, which can now be transmitted to the robotic arm. Our next step is to handle the control and trajectory planning of the robotic arm.

      Robotic arm control and trajectory planning

      When it comes to controlling the robotic arm, some may find it difficult to make it move along a desired trajectory. However, there is no need to worry: our mechArm 270 robotic arm is supported by pymycobot, a relatively mature robotic arm control library. With just a few simple lines of code, we can control the robotic arm.
      (Note: pymycobot is a Python library for controlling robotic arm movement. We will be using the latest version, pymycobot==3.1.2.)

      #Introduce two commonly used control methods
      '''
      Send the degrees of all joints to the robot arm.
      angle_list_degrees: a list of degree values (List[float]), length 6
      speed: (int) 0 ~ 100, the robotic arm's speed
      '''
      send_angles(angle_list_degrees, speed)
      send_angles([10,20,30,45,60,0],100)
      '''
      Send the coordinates to the robot arm.
      coords: a list of coordinate values (List[float]), length 6
      speed: (int) 0 ~ 100, the robotic arm's speed
      '''
      send_coords(coords, speed)
      send_coords([125,45,78,-120,77,90],50)
      

      We can transmit the trajectory to the robotic arm in either angle or coordinate format for execution.

      Harvesting Robot Trajectory Planning

      After implementing basic control, our next step is to design the trajectory planning for the robotic arm to grasp the fruits. In the recognition algorithm, we obtained the world coordinates of the fruits. We will process these coordinates and convert them into the coordinate system of the robotic arm for targeted grasping.

      #deal with fruit world coordinates
         def target_coords(self):
              coord = self.ma.get_coords()
              while len(coord) == 0:
                  coord = self.ma.get_coords()
      
              target = self.model_track()
              coord[:3] = target.copy()
              self.end_coords = coord[:3]
      
              if DEBUG == True:
                  print("coord: ", coord)
                  print("self.end_coords: ", self.end_coords)
      
              self.end_coords = coord
      
              return coord
      

      With the target coordinates in hand, we can now plan the trajectory of the robotic arm. During the motion of the robotic arm, we must ensure that it does not collide with any other structures or knock down any fruits.
      alt text
      When planning the trajectory, we can consider the following aspects:
      ● Initial posture
      ● Desired grasping posture
      ● Obstacle avoidance posture
      We should set up various postures according to the specific requirements of the scene.
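
      As a minimal sketch of such a sequence with pymycobot (shown with the MyCobot class; the serial port, joint angles, and target coordinates below are placeholder values, not the ones used in this kit):

      import time
      from pymycobot.mycobot import MyCobot

      mc = MyCobot("/dev/ttyAMA0", 115200)   # placeholder serial port and baud rate

      # Placeholder postures: an initial pose, an obstacle-avoidance transition pose,
      # and a grasping target computed from the fruit's converted coordinates.
      initial_pose = [0, 0, 0, 0, 0, 0]
      avoid_pose = [-30, 20, -45, 10, 0, 0]
      grasp_coords = [180.0, -60.0, 220.0, -175.0, 0.0, -90.0]

      mc.send_angles(initial_pose, 50)
      time.sleep(2)
      mc.send_angles(avoid_pose, 50)        # move clear of the tree structure first
      time.sleep(2)
      mc.send_coords(grasp_coords, 40)      # then approach the fruit's coordinates
      time.sleep(3)
      mc.send_angles(initial_pose, 50)      # return to the initial posture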

      Sorting Robot Trajectory Planning

      The prior subject matter pertained to the harvesting robot; now we shall discuss trajectory planning for the sorting robot. In actuality, the trajectory planning for these two robots is quite similar. The sorting targets are the fruits on the conveyor belt: the depth camera captures the coordinates of any spoiled fruit on the belt, and the sorting robot removes those targets.
      alt text
      The control and trajectory planning for both the harvesting robot and sorting robot have been addressed. Moving on, it is time to shed light on a rather crucial aspect of this set-up. The communication channel between these two robotic arms is of utmost importance. How do they ensure effective communication, avoid entering into a deadlock, and operate in a seamless manner along with the conveyor belt? This is what we shall discuss next.

      Communication and logical processing among multiple machines

      Without a comprehensive and coherent logic, these two robotic arms would have certainly engaged in a clash by now. Let us examine the flowchart of the entire program.
      alt text
      When R2 detects a spoiled fruit, R1's robotic arm is temporarily halted. After R2 completes its sorting task, it transmits a signal to R1, enabling it to resume harvesting operations.

      Socket

      Since communication is imperative, the socket library becomes indispensable. Socket is a standard library frequently employed in network programming; it offers a set of APIs for network communication, enabling easy data transfer between different computers. To address the communication issue between R1 and R2, we have devised a solution in which a server and a client are established.
      alt text
      The following is the relevant code and initialization information of the server establishment.
      Code:

      class TcpServer(threading.Thread):
          def __init__(self, server_address) -> None:
              threading.Thread.__init__(self)
              self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              self.s.bind(server_address)
              print("server Binding succeeded!")
              self.s.listen(1)
              self.connected_obj = None
              
              self.good_fruit_str = "apple"
              self.bad_fruit_str = "orange"
              self.invalid_fruit_str = "none"
      
              self.target = self.invalid_fruit_str
              self.target_copy = self.target
              self.start_move = False
      

      Here is the client.
      Code:

      class TcpClient(threading.Thread):
          def __init__(self, host, port, max_fruit = 8, recv_interval = 0.1, recv_timeout = 30):
              threading.Thread.__init__(self)
      
              self.good_fruit_str = "apple"
              self.bad_fruit_str = "orange"
              self.invalid_fruit_str = "none"
      
              self.host = host
              self.port = port
      
              # Initializing the TCP socket object
              # specifying the usage of the IPv4 address family
              #  designating the use of the TCP stream transmission protocol
              self.client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      
              self.client_socket.connect((self.host, self.port))        
              self.current_extracted = 0
              self.max_fruit = max_fruit
              self.recv_interval = recv_interval
              self.recv_timeout = recv_timeout
      
              self.response_copy = self.invalid_fruit_str
              self.action_ready = True
      

      Now that the server and client have been established, we can enable R1 and R2 to communicate just like we do through text messages.
      R1: "I am currently harvesting fruits."
      R2: "Message received, over."\

      Technical points

      Indeed, throughout this project, effective communication and logical processing between multiple robotic arms are of utmost importance. There are several ways to establish communication between two robotic arms, such as physical communication through serial ports, Ethernet communication through LAN, Bluetooth communication, etc.
      In our project, we are utilizing Ethernet communication using the existing TCP/IP protocol, and implementing it through the socket library in Python. As we all know, laying a strong foundation is essential when building a house; similarly, setting up a framework is crucial when starting a project. Furthermore, understanding the principles of robotic arm control is also imperative. It is necessary to learn how to convert the target object into world coordinates and then transform it into the target coordinates in the robotic arm's reference frame.
      https://www.youtube.com/watch?v=sWCuwXSjNfs

      Summary

      The application of these fruit harvesting and sorting robots is not only beneficial for students to better understand the principles of mechanics and electronic control technology, but also to foster their interest and passion for scientific technology. It provides an opportunity for practical training and competition thinking.
      Learning about robotic arms requires practical operation, and this application provides a practical opportunity that allows students to deepen their understanding and knowledge of robotic arms through actual operation. Moreover, this application can help students learn and master technologies such as robotic arm motion control, visual recognition, and object gripping, thereby helping them better understand the relevant knowledge and skills of robotic arms.
      Furthermore, this application can also help students develop their teamwork, innovative thinking, and competition thinking abilities, laying a solid foundation for their future career development. Through participating in robot competitions, technology exhibitions, and other activities, students can continuously improve their competition level and innovative ability, making them better equipped to adapt to future social development and technological changes.

      posted in PROJECTS
      ElephantRobotics
    • RE: AI Robotic Arm Plays Connect4: Who will Win!

      @ajb2k3 I deeply apologize for the error. I was attempting to find a suitable term to encompass all board games, and had considered using "board games" but found it to be inadequate. I have now revised it to exclusively refer to Connect4.

      posted in PROJECTS
      ElephantRobotics
    • AI Robotic Arm Plays Connect4: Who will Win!

      In the previous article, we delved into the creation of a cerebral apparatus capable of conducting the Connect 4 game. We briefly introduced various gaming algorithms such as the minimax algorithm, Alpha-Beta pruning algorithm, among others. However, the most crucial algorithm nowadays is the neural network algorithm and deep learning. The neural network algorithm allows computers to think like humans by setting unique environments for learning connect4. In this article, we will further explore how to implement the idea of a robotic arm playing connect4. In other words, an AI mechanical arm that can play the game of connect4. If you are interested, I would recommend reading the previous article.

      Introduction

      The following content is mainly divided into four sections for introduction:
      ● Information acquisition: The chessboard information is obtained through the camera to conduct the game.
      ● Information processing: The acquired information is processed to identify the position of the connect4 pieces. Through the gaming algorithm, the next step of the connect4 piece is calculated.
      ● Robotic arm trajectory: The design of how the mechanical arm can grasp the connect4 pieces and the path of placing the connect4 pieces.
      ● Integration of functions: The above three functions are combined to achieve AI robotic arm connect4 playing.
      alt text
      Let's see it together!

      Process

      Information acquisition
      Environment: Python, the latest OpenCV, NumPy
      To begin with, we need to acquire the chessboard information, which includes the chessboard itself and the pieces on it. We utilize the OpenCV method cv2.aruco.ArucoDetector(dictionary, parameters) to detect the ArUco markers, which locates each marker in the image and calculates its position and pose information. This allows us to determine the entire chessboard's position, including the locations of its four corners.
      alt text
      Code: The code uses the cv2.aruco.ArucoDetector(dictionary, parameters) method to determine the position of our board.

      dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
              parameters = cv2.aruco.DetectorParameters()
              detector = cv2.aruco.ArucoDetector(dictionary, parameters)
      
              corners, ids, rejectedCandidates = detector.detectMarkers(bgr_data)
              rvec, tvec, _ = cv2.aruco.estimatePoseSingleMarkers(corners, 0.05, self.mtx, self.dist)
      
              if rvec is None or len(corners) != 4:
                  return None
      
              # debug
              if DEBUG:
                  debug_img = bgr_data.copy()
                  for i in range(rvec.shape[0]):
                      cv2.drawFrameAxes(debug_img, self.mtx, self.dist, rvec[i, :, :, ], tvec[i, :, :, ],
                                        0.03)
                      # Draw a square around the marker.
                      cv2.aruco.drawDetectedMarkers(debug_img, corners)
                  cv2.imshow("debug1", debug_img)
      
              # Sort the detected QR code corner points in the following order: top left, top right, bottom left, bottom right.
              corners = np.mean(corners, axis=2)
              corners = (np.ceil(corners)).astype(int)
              corners = corners.reshape((4, 2))
              cx, cy = (np.mean(corners[:, 0]), np.mean(corners[:, 1]))
              res: list = [None for _ in range(4)]
              for x, y in corners:
                  if x < cx and y < cy:
                      res[0] = (x, y)
                  elif x > cx and y < cy:
                      res[1] = (x, y)
                  elif x < cx and y > cy:
                      res[2] = (x, y)
                  else:
                      res[3] = (x, y)
              res = np.array(res)
      
              ## debug code
              if DEBUG:
                  debug_img = bgr_data.copy()
                  for p in res:
                      cv2.circle(debug_img, p, 3, BGR_GREEN, -1)
                  cv2.imshow("aruco", debug_img)
      
              return res
      
      

      Once the chessboard is determined, we can assign different colors to represent the connect4 pieces. For a clearer distinction, we will use two highly contrasting colors, red and yellow, and label them accordingly.
      alt text
      We will set a logical sequence wherein every time a new connect4 piece is added to the board, the current state of the chessboard data will be passed on to the gaming algorithm to determine the next move.

      Information processing

      As previously mentioned, we have acquired the data of the chessboard. The next step is to pass this information to the gaming algorithm to predict the location of the next connect4 piece.
      Here is the pseudocode for the processing:

      def model_predict(self, state, available_actions):
          # Convert available_actions to a numpy array
          available_actions = np.array(available_actions)
          # Expand the state to meet the input requirements of the ONNX model
          state = np.expand_dims(np.expand_dims(np.array(state, dtype=np.float32), axis=0), axis=0)
          # Build the input of the ONNX model
          ort_inputs = {self.policy_net.get_inputs()[0].name: state}
          # Run the model and get the predicted value for each available location
          r_actions = self.policy_net.run(None, ort_inputs)[0][0, :]
          # Select the optimal drop position according to the predicted values
          state_action_values = np.array(
              [r_actions[action] for action in available_actions])
          argmax_action = np.argmax(state_action_values)
          greedy_action = available_actions[argmax_action]
          return greedy_action
      

      The main logic of this method is to use ONNX models for model prediction and select the optimal connect4 piece location based on the predicted results. Firstly, the available locations, which are stored as available_actions, are converted into numpy arrays. Then, the current game state, state, is expanded to meet the input requirements of the ONNX model. The expanded state is then passed to the ONNX model for prediction, and the predicted results are stored in the r_actions variable. Afterwards, based on the predicted results and available move locations, the predicted value of each available location is calculated. The location with the highest predicted value is selected as the optimal connect4 piece location and returned.

      Robotic Arm trajectory

      Now that we have both the brain (gaming algorithm) and eyes (recognition algorithm), all we need is a hand to execute the actions. We will use the Python library pymycobot to control the mechanical arm. Due to the nature of the chessboard, the connect4 pieces can only be dropped from above. By assigning a coordinate point for each connect4 piece in each column, we can plan the trajectory of the mechanical arm. As the chessboard is relatively clean without any obstructions, we do not need to consider too many path factors.
      Code:

      # Initialize and define several coordinate points
      # Set a list of length 7, one drop position per column
      self.chess_table = [None for _ in range(7)]
      # chess_table[0] through chess_table[6] each hold a joint-angle list
      # [J1, J2, J3, J4, J5, J6] for one of the seven columns

      self.angle_table = {
          "recovery": [0, 0, 0, 0, 0, 0],  # initial position
          "observe": [-2.54, 135.0, -122.95, -32.34, 2.46, -90.35],  # observation position
          "stack-hover-1": [-47.19, -37.96, -58.53, 1.05, -1.93, -1.84],  # position for picking up the pieces
      }
      

      Next, here are some of the pymycobot methods for controlling the robotic arm:

      #Sending the angle to the robotic arm.
      self.send_angles(self.chess_table[n], ARM_SPEED)
      
      #Sending the Cartesian coordinates to the robotic arm.
      self.send_coords(coord_list,ARM_SPEED)
      

      Integration of functions

      Before integrating the function points, we have to sort out the logic between them.
      alt text
      With the flow chart of playing connect4, the next step is to combine the function points together.
      This is the structure file of the program.
      ├── Agent.py The logic of the robotic arm's gameplay.
      ├── ArmCamera.py Camera control.
      ├── ArmInterface.py Robotic Arm control
      ├── Board.py Data structure of the chessboard and related judgments.
      ├── CameraDemo.py Small program for testing camera performance.
      ├── config.py Utilized to define specific configuration details.
      ├── Detection.py Machine vision recognition.
      ├── dqn.pt Neural network model file, used to implement gameplay logic.
      ├── main.py main program.
      ├── StateMachine.py A state machine.
      The specific code is not shown here piece by piece; let's watch a video to see how it performs.
      video address:https://youtu.be/uq35EdptpAk

      Summary

      Theoretically speaking, it is nearly impossible for anyone to win against the machine, because the algorithm can look several moves ahead or more, while an average person can only think two or three moves ahead at most. However, based on the videos, the AI won only one game and lost the other due to a structural issue: the chessboard was supposed to stay in one location but was pushed to another.
      Do you find this project interesting? We will continue to improve the Connect4 game suite and make it available on our website. If you are interested, please follow us for updates.
      Would you like to try implementing other chess games using the mechanical arm, such as chess or Chinese chess? Keep in mind that different chess games use different algorithms. We welcome you to share your thoughts with us in the comments section.

      posted in PROJECTS
      ElephantRobotics
    • Let AI play Connect4: Implementing using DQN neural network with myCobot 280 M5Stack

      Introduction

      Today, we shall acquaint ourselves with the subject of artificial intelligence in chess, whereby we shall employ a mechanical arm as an opponent to engage in a game of chess with you. The study of artificial intelligence in chess can be traced back to the 1950s, when computer scientists began to explore the writing of programs that would enable computers to play chess. The most renowned example of this was Deep Blue, developed by IBM, which in 1997 defeated the reigning world chess champion, Garry Kasparov, with a score of 3.5-2.5.
      alt text
      Artificial intelligence in chess is akin to granting a computer a mode of deliberation that enables it to achieve victory in a match. There are many such modes, most of which stem from superior algorithms. At its core, Deep Blue's algorithm was based on brute-force search: generating every possible move, conducting searches as deeply as possible, constantly evaluating the game state, and attempting to identify the optimal move.
      Now, let me show you how to realize the robot arm to play chess intelligently.

      Connect 4

      The game I shall introduce today is known as Connect4, a strategic board game commonly referred to as "Four in a Row". The objective of Connect4 is to form a horizontal, vertical, or diagonal line of four of one's own pieces within a vertically oriented grid of six rows and seven columns. Two players take turns dropping their pieces from the top of the grid, with each piece falling to the lowest available position in the selected column. Players may drop a piece into any column, provided that column is not already full; each new piece rests on top of the pieces already in it.
      alt text
      As shown in the animation, this is Connect4.

      myCobot 280

      The robotic arm selected for the task is the myCobot 280 M5Stack, a powerful desktop six-axis robot arm that employs the M5Stack-Basic as its control core and supports multiple programming languages. The myCobot 280's six-axis structure grants it high flexibility and precision, enabling it to perform a variety of complex operations and movements. It supports multiple programming languages, including Python, C++, and Java, allowing developers to program and control the mechanical arm according to their specific needs. Its user-friendly interface and detailed user manual facilitate rapid familiarization, while its embedded design ensures a compact size, making it easy to carry and store.
      alt text
      This is the scene we built; we use the myCobot 280 to play Connect4 with us.

      Algorithm for playing chess

      Firstly, we must address the crucial matter of which algorithm to employ for playing chess. In other words, we must provide the mechanical arm with a cognitive brain capable of deep contemplation. Allow me to briefly present to you a few commonly used algorithms for playing chess:

      The Minimax Algorithm:

      This is a classic game algorithm applicable to two-player games. It works by recursively simulating the moves of both the opponent and oneself, evaluating the score of every possible move, and selecting the action with the highest score. The Minimax algorithm can find the best strategy by searching through the tree of possible game positions. It models a zero-sum game: one player chooses the option that maximizes their advantage, while the other selects the option that minimizes the opponent's advantage, so the two players' gains and losses always sum to zero. Let me give a simple Tic-Tac-Toe example to illustrate this.
      alt text
      Max represents us, while Min represents our opponent. We need to assign a score to every possible outcome, known as the utility. This score is evaluated from our perspective (Max): for example, in the figure above, if I win the score is +1, if I lose it is -1, and a tie is 0. We therefore want to maximize this score, while our opponent wants to minimize it. (In the game, this score is called the static value.) Tic-Tac-Toe is a relatively simple game, so it is possible to enumerate all possible outcomes. For most games, however, that is impossible. Depending on the computer's computational power, we may only be able to look ahead 7 or 8 moves, so the score is not as simple as -1, 1, 0; specific evaluation functions assign different scores based on the current position.
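
      To make the idea concrete, here is a compact, depth-limited minimax sketch for a generic two-player game. The helper functions get_moves, apply_move, evaluate, and is_game_over are placeholders for whatever game representation is used; they are not part of our Connect4 code.

      def minimax(state, depth, maximizing):
          """Depth-limited minimax: Max picks the highest static value, Min the lowest."""
          if depth == 0 or is_game_over(state):
              return evaluate(state)          # static value from Max's point of view

          if maximizing:
              best = float("-inf")
              for move in get_moves(state):
                  best = max(best, minimax(apply_move(state, move), depth - 1, False))
              return best
          else:
              best = float("inf")
              for move in get_moves(state):
                  best = min(best, minimax(apply_move(state, move), depth - 1, True))
              return best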

      Alpha-Beta Pruning Algorithm:

      This is an optimization of the Minimax Algorithm. It reduces the number of branches to be searched by pruning, thus speeding up the search process. The Alpha-Beta Pruning Algorithm uses upper and lower bounds (Alpha and Beta values) to determine which branches can be discarded, reducing the depth of the search.
      Neural Networks and Deep Learning:
      The connect4 game algorithm that we designed also uses neural networks and deep learning for gameplay.

      Neural Networks:

      Scientists have always hoped to simulate the human brain and create machines that can think. Why can humans think? Scientists have found that the reason lies in the neural network of the human body. Neural networks are mathematical models that simulate the structure and function of the human brain's nervous system. They process information and learn by simulating the connections and signal transmission between neurons. Neural networks are the foundation of all artificial intelligence.
      alt text
      The basic idea of the neural network algorithm is to pass input data to the input layer of the network, then obtain the output layer's results through a series of computations and transmissions in intermediate layers (hidden layers). The training process adjusts the connection weights to minimize the difference between the actual output and expected output, optimizing the performance of the network.
      Deep Learning:
      Deep learning is a branch of machine learning that focuses on using deep neural networks for learning and reasoning. Deep learning solves complex learning and decision-making problems by constructing deep neural networks, which have multiple intermediate layers (hidden layers). It can be said that deep learning is a learning method that uses neural networks as the core tool. Deep learning not only includes the structure and algorithm of neural networks but also involves training methods, optimization algorithms, and large-scale data processing.

      Project

      The project is mainly divided into two parts, hardware and software:
      alt text
      The most crucial part of this project is information collection, analysis, and processing. As mentioned earlier, we also used relevant knowledge of neural algorithms and deep learning, and the specific neural network used is the DQN neural network.

      DQN Neural Network:

      The DQN neural network was proposed by DeepMind and combines the ideas of deep learning and reinforcement learning. DQN uses a deep neural network to estimate the state-action value function (Q function), enabling optimal decision-making in complex environments. The core idea of DQN is to use a deep neural network as a function approximator to approximate the state-action value function. By taking the current state as input, the neural network outputs the corresponding Q value for each action, that is, predicting the long-term return of that action in the current state. Then, the optimal action is selected and executed based on the Q value.

      Environment Setup:

      Firstly, we need to define the Connect4 game using a two-dimensional array to represent the game board and two types of game pieces, red (R) and yellow (Y). We also need to define the end condition of the game, which is when four game pieces of the same color are connected in a line, the game ends.

      self.bgr_data_grid = [[None for j in range(6)] for i in range(7)]
      
      #Used to display the state of the board
      def debug_display_chess_console(self):
          for y in range(6):
              for x in range(7):
                  cell = self.stable_grid[x][y]
                  if cell == Board.P_RED:
                      print(Board.DISPLAY_R, end="")
                  elif cell == Board.P_YELLOW:
                      print(Board.DISPLAY_Y, end="")
                  else:
                      print(Board.DISPLAY_EMPTY, end="")
              print()
          print()
      

      alt text
      Here is the code that defines whether the game is over:

      def is_game_over(board):
          # Check if there are four consecutive identical pieces in a row.
          for row in board:
              for col in range(len(row) - 3):
                  if row[col] != 0 and row[col] == row[col+1] == row[col+2] == row[col+3]:
                      return True
      
          # Check if there are four consecutive identical pieces in a column.
          for col in range(len(board[0])):
              for row in range(len(board) - 3):
                  if board[row][col] != 0 and board[row][col] == board[row+1][col] == board[row+2][col] == board[row+3][col]:
                      return True
      
          # Examine if there are four consecutive identical pieces in a diagonal line.
          for row in range(len(board) - 3):
              for col in range(len(board[0]) - 3):
                  if board[row][col] != 0 and board[row][col] == board[row+1][col+1] == board[row+2][col+2] == board[row+3][col+3]:
                      return True
      
          for row in range(len(board) - 3):
              for col in range(3, len(board[0])):
                  if board[row][col] != 0 and board[row][col] == board[row+1][col-1] == board[row+2][col-2] == board[row+3][col-3]:
                      return True
      
          # Verify if the game board is filled completely.
          for row in board:
              if 0 in row:
                  return False
      
          return True
      

      Building the DQN Neural Network:
      We need to define the input layer and output layer of the neural network. The dimension of the input layer should match the state representation of the game board, while the dimension of the output layer should match the number of legal actions. In short, the input layer receives the status information of the game board, and the output layer generates the corresponding action selection.
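
      As a rough PyTorch sketch (the layer sizes here are illustrative, not those of our trained model), a Q-network for the 6x7 board with 7 possible columns could be defined like this:

      import torch
      import torch.nn as nn

      class DQN(nn.Module):
          """Maps a 6x7 board state to one Q value per column (7 legal actions)."""
          def __init__(self):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Flatten(),               # 1x6x7 board -> 42 inputs
                  nn.Linear(42, 128),
                  nn.ReLU(),
                  nn.Linear(128, 128),
                  nn.ReLU(),
                  nn.Linear(128, 7),          # one Q value for each column
              )

          def forward(self, x):
              return self.net(x)

      # Example: a batch containing one empty board, shaped (batch, channel, rows, cols)
      q_values = DQN()(torch.zeros(1, 1, 6, 7))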

      Experience Replay Buffer:

      Machines need to learn, so we need to build an experience replay buffer to store the agent's experience. This can be a list or queue used to store information such as the state, action, reward, and next state during the game process.
      Here is the pseudocode for constructing the experience replay buffer:

      import random

      class ReplayBuffer:
          def __init__(self, capacity):
              self.capacity = capacity
              self.buffer = []

          def add_experience(self, experience):
              # Drop the oldest experience once the buffer is full
              if len(self.buffer) >= self.capacity:
                  self.buffer.pop(0)
              self.buffer.append(experience)

          def sample_batch(self, batch_size):
              # Randomly sample a batch of (state, action, reward, next_state, done) tuples
              batch = random.sample(self.buffer, batch_size)
              states, actions, rewards, next_states, dones = zip(*batch)
              return states, actions, rewards, next_states, dones
      

      Decision-making:

      We have defined a strategy class named EpsilonGreedyStrategy, which uses the ε-greedy strategy for action selection and exploration. In the initialization function init(), we specify the exploration rate ε. The select_action() method selects actions based on the Q-value, randomly selects actions with a probability based on the exploration rate or selects the action with the highest Q-value.

      class EpsilonGreedyStrategy:
          def __init__(self, epsilon):
              self.epsilon = epsilon
      
          def select_action(self, q_values):
              if random.random() < self.epsilon:
                  action = random.randint(0, len(q_values) - 1)
              else:
                  action = max(enumerate(q_values), key=lambda x: x[1])[0]
              return action
      

      Training Framework:
      We use the PyTorch framework in Python to construct the training and implement loop training. We regularly evaluate the performance of the agent by playing against the current DQN neural network and pre-trained or other opponents until the preset requirements are met.
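
      A condensed sketch of such a training loop is shown below. It reuses the DQN, ReplayBuffer, and EpsilonGreedyStrategy sketches from this article; the env object with reset()/step(), the hyper-parameters, and the target-network sync interval are illustrative assumptions rather than our exact training code.

      import numpy as np
      import torch
      import torch.nn.functional as F

      policy_net = DQN()                          # Q-network sketched earlier
      target_net = DQN()
      target_net.load_state_dict(policy_net.state_dict())
      optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
      buffer = ReplayBuffer(capacity=10000)       # experience replay buffer from above
      strategy = EpsilonGreedyStrategy(epsilon=0.1)
      gamma, batch_size = 0.99, 64

      for episode in range(1000):
          state = env.reset()                     # placeholder Connect4 environment
          done = False
          while not done:
              q = policy_net(torch.tensor(np.array(state), dtype=torch.float32).view(1, 1, 6, 7))
              action = strategy.select_action(q.squeeze().tolist())
              next_state, reward, done = env.step(action)   # placeholder step() signature
              buffer.add_experience((state, action, reward, next_state, done))
              state = next_state

              if len(buffer.buffer) >= batch_size:
                  states, actions, rewards, next_states, dones = buffer.sample_batch(batch_size)
                  s = torch.tensor(np.array(states), dtype=torch.float32).view(-1, 1, 6, 7)
                  s2 = torch.tensor(np.array(next_states), dtype=torch.float32).view(-1, 1, 6, 7)
                  q_pred = policy_net(s).gather(1, torch.tensor(actions).view(-1, 1)).squeeze(1)
                  q_next = target_net(s2).max(1)[0].detach()
                  q_target = torch.tensor(rewards, dtype=torch.float32) + \
                             gamma * q_next * (1 - torch.tensor(dones, dtype=torch.float32))
                  loss = F.mse_loss(q_pred, q_target)
                  optimizer.zero_grad()
                  loss.backward()
                  optimizer.step()

          if episode % 20 == 0:                   # periodically sync the target network
              target_net.load_state_dict(policy_net.state_dict())
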
      demo

      Summary

      This article has come to a temporary conclusion. We mainly introduced how the DQN neural algorithm is implemented in Connect4. The next article will introduce how the robotic arm executes the optimal solution. The algorithm described in this article is just the tip of the iceberg. If you are interested in game algorithms, you can refer to relevant books for further understanding.
      We are currently in a period of great change, in which artificial intelligence is everywhere: it can not only defeat top players in games but is also present in many other fields. We must seize the opportunity to keep up with this technology-filled 21st century.
      We will soon update the next article. If you are interested, please follow us and leave a message below, which is the best support for us!

      posted in PROJECTS
      ElephantRobotics
    • Object Tracking on myCobot 280 Jetson Nano: A Case Study

      Introduction

      When we saw a video on YouTube of someone using a robotic arm to achieve object tracking, it deeply inspired us. We became very interested in this project and decided to independently develop a similar program.

      alt text

      myCobot 280 M5Stack

      The robotic arm used for the operation is the myCobot280 M5Stack. This is a small 6-axis robotic arm produced by Elephant Robotics, with M5Stack-Basic as the microprocessor, ESP32 as the auxiliary control, and a UR collaborative structure. The myCobot280 M5Stack-Basic has a body weight of 800g, a payload of 250g, a working radius of 280mm, and a compact and portable design. Despite its small size, it is powerful and easy to operate, capable of collaborating with humans and working safely.
      alt text

      Process

      The following image is a flowchart of the project development process.
      alt text

      Capture the target

Before beginning development, we conducted some research and experiments. We used a camera to capture images of objects and used the OpenCV library for recognition. We tried several approaches, but reliable object recognition would have required training a model for the specific target, which would have lengthened development. Ultimately, we decided to use ArUco markers for identification, which allowed us to quickly detect the markers and move on to the next stage of development.
      alt text
      Code:

      def show_video_v2(self):
              # self.robot.init_robot()
              xyz = np.array([0,0,0])
              LIST = []
              num_count = 0
              list_len = 5
              # cmax = [180, 80, 240]
              # cmin = [130, -80, 200]
              cmax = [150, -150, 300]
              cmin = [-150, -250, 200]
      
              while cv2.waitKey(1) < 0:
                  success, img = self.cap.read()
                  if not success:
                      print("It seems that the image cannot be acquired correctly.")
                      break
                # convert the image to grayscale
                  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                  # Detect ArUco marker.
                  corners, ids, rejectImaPoint = cv2.aruco.detectMarkers(
                      gray, self.aruco_dict, parameters=self.aruco_params
                  )
      
                  if len(corners) > 0:
                      if ids is not None:
                        # estimate the pose of the detected ArUco marker
                          ret = cv2.aruco.estimatePoseSingleMarkers(
                              # '''https://stackoverflow.com/questions/53303730/what-is-the-value-for-markerlength-in-aruco-estimateposesinglemarkers'''
                              corners, 0.025, self.camera_matrix, self.dist_coeffs
                          )
                        # rvec: rotation vector, tvec: translation vector
                          (rvec, tvec) = (ret[0], ret[1])
                          
                        (rvec - tvec).any()  # no-op carried over from the OpenCV ArUco sample
                        xyz = tvec[0, 0, :] * 1000  # marker translation, converted from m to mm
                        rpy = rvec[0, 0, :]
      
                          camera = np.array([xyz[0], xyz[1], xyz[2]])
      
                          if num_count > list_len:
                              target = model_track(camera)
                              print("target", target)
      
                              for i in range(3):
                                  if target[i] > cmax[i]:
                                      target[i] = cmax[i]
                                  if target[i] < cmin[i]:
                                      target[i] = cmin[i]
      
                              pose = np.array([-103, 8.9, -164])
                              coord = np.concatenate((target.copy(), pose), axis=0)
      
                              # q1 = math.atan(xyz[0] / xyz[2])*180/np.pi
                              mc.send_coords(coord,50,0)
                              
                              
                              # print('target', coord)
                              num_count = 1
                          else:
                              num_count = num_count + 1
                          
      
                          for i in range(rvec.shape[0]):
                              # draw the aruco on img
                              cv2.aruco.drawDetectedMarkers(img, corners)
                  cv2.imshow("show_video", img)
      

      Hand-eye calibration

In robotics, hand-eye calibration refers to determining the spatial relationship between a camera and the robot, that is, the transformation between the camera coordinate system and the robot (base or end-effector) coordinate system. The process pairs the robot end effector with a camera and determines this transformation by capturing the end effector's position and orientation in the camera's field of view.
      Hand-eye calibration typically involves a series of movements between the robot end effector and the camera to collect enough data to calculate the transformation matrix between them. This transformation matrix describes the position and orientation of the robot end effector relative to the camera, which can be used to control the robot's motion and accurately perform the required tasks.
      In "eye-to-hand" hand-eye calibration, the camera is considered a stationary observer ("eye"), while the robot end effector is considered a moving object in the camera's field of view ("hand"). As the robot end effector moves around the camera, a series of images are collected that contain information about the end effector's position and orientation at different locations and orientations. By analyzing these images, the position and orientation of the robot end effector relative to the camera can be calculated, completing the hand-eye calibration.
      The following is the code for processing the coordinate transformation data.

import numpy as np

# Similarity of the previous frame; used by similarity_change_rate() below.
prev_similarity = None

# The function is used to calculate the average pairwise cosine similarity of camera samples.
def calculate_similarity(camera):
          n = camera.shape[0]
          total_similarity = 0
          for i in range(n):
              for j in range(i+1, n):
                  vector_a = camera[i]
                  vector_b = camera[j]
                  dot_product = np.dot(vector_a, vector_b)
                  norm_a = np.linalg.norm(vector_a)
                  norm_b = np.linalg.norm(vector_b)
                  similarity = dot_product / (norm_a * norm_b)
                  total_similarity += similarity
          return total_similarity/n
      # The function is used to calculate the rate of change in similarity.
      def similarity_change_rate(new_similarity):
          global prev_similarity
          if prev_similarity is None:
              prev_similarity = new_similarity
              return 0
          else:
              change_rate = (new_similarity - prev_similarity) / prev_similarity
              prev_similarity = new_similarity
              return change_rate
      
      #The function is used to convert a rotation matrix to Euler angles.
      def CvtRotationMatrixToEulerAngle(pdtRotationMatrix):
          pdtEulerAngle = np.zeros(3)
      
          pdtEulerAngle[2] = np.arctan2(pdtRotationMatrix[1, 0], pdtRotationMatrix[0, 0])
      
          fCosRoll = np.cos(pdtEulerAngle[2])
          fSinRoll = np.sin(pdtEulerAngle[2])
      
          pdtEulerAngle[1] = np.arctan2(-pdtRotationMatrix[2, 0], (fCosRoll * pdtRotationMatrix[0, 0]) + (fSinRoll * pdtRotationMatrix[1, 0]))
          pdtEulerAngle[0] = np.arctan2((fSinRoll * pdtRotationMatrix[0, 2]) - (fCosRoll * pdtRotationMatrix[1, 2]), (-fSinRoll * pdtRotationMatrix[0, 1]) + (fCosRoll * pdtRotationMatrix[1, 1]))
      
          return pdtEulerAngle
      # The function is used to convert Euler angles to a rotation matrix.
      def CvtEulerAngleToRotationMatrix(ptrEulerAngle):
          ptrSinAngle = np.sin(ptrEulerAngle)
          ptrCosAngle = np.cos(ptrEulerAngle)
      
          ptrRotationMatrix = np.zeros((3, 3))
          ptrRotationMatrix[0, 0] = ptrCosAngle[2] * ptrCosAngle[1]
          ptrRotationMatrix[0, 1] = ptrCosAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] - ptrSinAngle[2] * ptrCosAngle[0]
          ptrRotationMatrix[0, 2] = ptrCosAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] + ptrSinAngle[2] * ptrSinAngle[0]
          ptrRotationMatrix[1, 0] = ptrSinAngle[2] * ptrCosAngle[1]
          ptrRotationMatrix[1, 1] = ptrSinAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] + ptrCosAngle[2] * ptrCosAngle[0]
          ptrRotationMatrix[1, 2] = ptrSinAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] - ptrCosAngle[2] * ptrSinAngle[0]
          ptrRotationMatrix[2, 0] = -ptrSinAngle[1]
          ptrRotationMatrix[2, 1] = ptrCosAngle[1] * ptrSinAngle[0]
          ptrRotationMatrix[2, 2] = ptrCosAngle[1] * ptrCosAngle[0]
      
          return ptrRotationMatrix
      
      

      Robotic Arm Control

      After this step, object detection and control of the mechanical arm follow, which involves converting the coordinates of the recognized object into motion commands for the mechanical arm. The pymycobot library is used to control the mechanical arm.
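For reference, basic pymycobot control looks roughly like the following. The serial port, baud rate, and target pose are placeholders and depend on your controller version; this is a minimal sketch, not the project's exact setup:

from pymycobot.mycobot import MyCobot

# Placeholder serial port and baud rate; adjust for your controller
# (e.g. USB serial on M5Stack-Basic, an onboard port on Pi/Jetson versions).
mc = MyCobot("/dev/ttyUSB0", 115200)

# Read the current end-effector pose: [x, y, z, rx, ry, rz] in mm / degrees.
print(mc.get_coords())

# Move the end effector to a target pose at speed 50, mode 0 (as in the project's call).
mc.send_coords([150.0, -150.0, 250.0, -103.0, 8.9, -164.0], 50, 0)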

      #The function is used for visual tracking and calculating the target position.
      def Visual_tracking280(coord, camera):
    pose_camera = camera[:3]
    angle_camera = camera[3:]
    r = CvtEulerAngleToRotationMatrix(angle_camera)
          # r = np.array([[1, 0, 0],
          #                  [0, 1, 0],
          #                  [0, 0, 1]])
          euler = np.radians(coord[3:])
          R = CvtEulerAngleToRotationMatrix(euler)
          offset = np.array([0, 0, -250])
          Roff = np.array([[1, 0, 0],
                           [0, -1, 0],
                           [0, 0, -1]])
          # Roff = np.array([[1, 0, 0],
          #                  [0, 1, 0],
          #                  [0, 0, 1]])
          vector = pose_camera + offset
          # print("R", R)
          # print("r", r)
      
          move_pos = np.dot(np.dot(R, r), Roff).dot(vector)
          pos = coord[:3] + move_pos
          # angle = np.array(CvtRotationMatrixToEulerAngle(np.dot(np.dot(R, r), Roff))) * 180/np.pi
          angle =  coord[3:]
          target = np.concatenate((pos, angle))
          return target
          
# Calculate the target position in the robot frame from the camera coordinates.
def model_track(camera):
          model_pos = np.array([-camera[0], -camera[2], -camera[1]])
          camera_pos = np.array([-37.5, 416.6, 322.9])
          target_pos = model_pos + camera_pos
          # print("model_pos", model_pos)
          # print("target_pos", target_pos)
          return target_pos
      
      

      Finally, let's summarize the logical relationship of the project.
      alt text
      Let's take a look at how it performs.
      alt text
      It may be noticed that sometimes myCobot does not move. This is because its body was blocking the camera, preventing it from capturing the target object. When moving objects, it is important to ensure that the mechanical arm body does not block the camera.

      This code is applicable to the entire myCobot280 series, including Pi, Jetson Nano, and other versions. The parameters may need to be adjusted based on the specific version being used.

People often compare the Jetson Nano with the Raspberry Pi. I have run this program on both versions of the robotic arm, and the Jetson Nano version is noticeably more responsive than the Raspberry Pi version, owing to its greater computational power; the difference is a delay of roughly one second, visible to the naked eye.

      Summary

      During the debugging process, we found that the tracking effect was not very smooth and responsive. We adjusted the smoothness by controlling the detection cycle, but it was necessary to slowly move the tracked object to achieve better results. There are still some shortcomings, as the body of the mechanical arm may block the camera's field of view when the camera is fixed, making it impossible to proceed with the next tracking step. One solution we thought of is to move the camera to a position where it is not blocked (which would require recalculating the coordinates). If you have any better ideas, please feel free to communicate with us! Thank you for your patience.

      posted in PROJECTS
      ElephantRobotics
    • Deep Machine Vision and Random Grasping with Robotic Arms

      Introduction

      Today, I would like to share with you my experience using the myCobot320 M5 and FS820-E1 depth camera for an unordered object grabbing demonstration. Why did I choose to use a depth camera and robotic arm for this case study?

      The commonly used 2D camera can capture two-dimensional images with pixel values in the horizontal and vertical directions. These cameras are typically used to capture static scenes or moving objects, and cannot provide depth information. In machine vision applications, 2D cameras can be used for tasks such as image classification, object detection, and recognition.

      In contrast, depth cameras can capture depth information, allowing for the acquisition of three-dimensional information about objects. These cameras use various techniques to measure object depth, such as structured light, time-of-flight, and stereo vision. In machine vision applications, 3D cameras can be used for tasks such as point cloud segmentation, object recognition, and 3D reconstruction.

      The information captured by 2D cameras is inadequate for certain special cases, hence the use of depth cameras to obtain more information, such as the length, width, and height of objects.

      Let us begin our topic for today.

      FS820-E1

      Environment building

      Firstly, I need to set up the development environment for the FS820-E1 depth camera using RVS, which is the software used for developing this depth camera. By utilizing the visual operator in RVS, I can quickly construct the node for the grabbing function.
      alt text
      In the resource window located at the top left corner, locate the TyCameraResource operator and add it to the ResourceGroup in the operator graph. In the operator list, search for the TyCameraAccess and trigger operators, and add them to the operator graph. Adjust the operator parameters according to the requirements. Then, click on Run and set the Trigger->true in the property panel to visualize the data.

      TyCameraResource operator

      ● The start and stop operators are used to respectively initiate and terminate the thread of the resource operator. The auto_start option is also used to initiate the resource operator. If it is checked, the resource thread will only automatically start when entering the running state for the first time after opening the RVS software.

      ● The reset option is used to reset the attribute parameters if they need to be changed after opening the resource thread.

      TyCameraAccess operator

● Open the visualization properties for cloud, RGB, and depth, and set cloud_color to -2, which represents true color.
alt text

      myCobot 320-M5Stack

The myCobot 320 is a practical robot designed for user-independent programming and development. The product has a maximum effective arm span of 350mm, a maximum load of 1kg, and a repetitive positioning accuracy of ±0.5mm. It supports development in various programming languages.
      alt text

      Environment building

This project uses Python, so a Python environment needs to be set up and the pymycobot library installed, which enables control of the robotic arm's movement.
pip install pymycobot --upgrade
alt text
PS: A discrete graphics card of at least GTX 1060 (2GB) class is recommended for this project, since the image recognition steps benefit from GPU performance; the better the card, the faster the program runs. A 3060-class card is recommended.

      Random object grasping.

      Next, we will implement the random object grasping of the robotic arm, which can accurately grasp the object in any posture. The following image shows the overall operator graph, which is the unstacking.xml.
      alt text

      Hand-eye calibration

      Using a chessboard pattern for hand-eye calibration.

      Preparation:

      ● Prepare a chessboard, determine the number of rows and columns, and the length of each square (in mm).

      ● Hand-eye calibration can be divided into two types: eye in hand and eye to hand. Depending on the situation, fix the calibration board and the camera. Here we choose eye to hand calibration.
      alt text

      Data recording

      open unstacking_runtime/HandEyeCalibration/HandEyeCalibration.xml
      alt text
      Correctly fill in the number of rows and columns of the calibration board, the unit length of the calibration board cells, and the file path for saving the calibration data in the property panel.
      alt text
      Before starting the calibration process, make sure that the camera can fully recognize the complete chessboard, and during the calibration process, the chessboard must be fixed and cannot be moved. After running the process, you will get 18 sets of data.
      alt text
      alt text

      Calculation of indicator results

If the positional error is within 0.005m (5mm), the result is considered ideal.
      alt text

      Coordinate system conversion

      The following steps aim to transform the coordinate system of the point cloud from the camera RGB lens coordinate system to the robot coordinate system, which involves camera extrinsic parameters and hand-eye calibration results.
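Conceptually, this conversion is just the application of a rigid 4×4 homogeneous transform to every point in the cloud. A minimal numpy sketch (the numbers are made up purely for illustration):

import numpy as np

def transform_points(points_cam, T_cam2robot):
    """Apply a 4x4 homogeneous transform to an (N, 3) point cloud."""
    R = T_cam2robot[:3, :3]
    t = T_cam2robot[:3, 3]
    return points_cam @ R.T + t

# Example with made-up numbers: identity rotation, 100 mm offset along X.
T = np.eye(4)
T[0, 3] = 100.0
pts_robot = transform_points(np.array([[10.0, 20.0, 300.0]]), T)
print(pts_robot)  # [[110.  20. 300.]]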

      Steps:

      1. Right-click in the operator graph and select "Import Group XML here" to import the HandToEye_Depth2Robot.group.xml from the RVSCommonGroup. Besides this file, there is also HandInEye_Depth2Robot.group.xml.

      2. Connect the pose port of the hand-eye calibration data group to the rgb2robot port of the HandToEye_Depth2Robot group.

3. Drag in the LoadCalibFile operator to load the calibration file, connect its finished port to the start port of the HandToEye_Depth2Robot group, connect the extrinsic_pose port to the rgb2depth port, and connect its start port to the finished port of an InitTrigger operator. The specific connections are as follows:
        alt text

      4. Click on the Group, find the rgb2tcp operator, and in the properties panel, paste the hand-eye calibration results in the "pose" property.
        alt text

      5. Through the previous steps, we have obtained the transformation matrices from the RGB camera to the robot coordinate system (rgb2robot) and from the depth camera to the robot coordinate system (depth2robot). Here, we will transform the point cloud from the depth camera coordinate system to the robot coordinate system.

      6. First, drag the Transform operator into the graph and select "PointCloud" as the type. Connect the depth2robot port to the pose input port of this operator, and connect the pointcloud port of the LoadLocalData operator group to the same-named input port of this operator.
        alt text

      AI training

      Acquire training images

Open the file unstacking_runtime/MaskRCNN/ty_ai_savedata.xml (it is a plain XML project file, so it can also be viewed in any text editor such as Notepad or Visual Studio Code). The project it contains is essentially a pipeline for recording RGB images; you just need to set the string parameter of the EmitString operator to the desired save path. Once the path is set, click Capture to record the images. It is recommended to record as many images as possible to make the training data robust.
      alt text
Annotate the training data
At this point we have recorded the RGB images. We recommend using the labelme software for annotation; a way to install labelme is given below.

      ● 1. Install pip according to the official website:

      https://pip.pypa.io/en/stable/installation/

      ● 2. Install PyQt5:

      pip install PyQt5
      ● 3. Install labelme:

      pip install labelme
      Preparation:

      First, determine the task goal and clarify which objects need to be detected during the detection process and which objects do not need to be detected, so as to carry out targeted annotation.

The annotation rules do not need to be overly strict. Rather than labeling the way a human would intuitively, consider whether the annotation scheme you define will be easy to handle in code.

      Process:

      ● Open labelme in the terminal and click "OpenDir" to select the path where our annotations are located (the Emit operator string path in step 3.2.1 for collecting training images).

      ● Click "Create Polygons" to draw a red border around the wooden blocks.
      alt text
      After finishing, a naming dialog will pop up. Please name it "wooden block" for the first time, and select it directly for subsequent boxes of the same type.

      ● When all the boxes in the image have been labeled, click "Save" to save them with the default folder and name. Then select "Next Image" to switch to the next image.

      Train AI model
      Open the file unstacking_runtime/MaskRCNN/ty_ai_train.xml, and adjust the paths for data_directory and classnames_filepath. Then click on the "start_train" button to start the training process.

After the training process is completed, a folder named "train_output" will be generated. In this folder, there will be a model file (ending in .pth) which contains the required weights for the trained model.

      AI reasoning

      1. Drag in an Emit operator, select "pose" for the type attribute, rename it to "Grasp Reference Pose", and input "3.141592654" for the pose_roll. This operator will be used in subsequent operators. Connect the pose port of this operator to the down_pose port of the Compute Grasping Point group.

      2. Double-click to expand the Compute Grasping Point group. The data needs to be pre-trained using the MaskRCNN network. Change the type of the AIDetectGPU operator to MaskRCNN and modify the rest of the configuration file parameters accordingly. Since the AI inference operator needs to be initialized before running, an extra Trigger (type is InitTrigger) needs to be added before the operator.

      3. The AI inference operator will obtain the position region of the target in the 2D image (i.e., the mask image, corresponding to the obj_list port). Next, we need to convert these position regions to 3D point clouds, which is the ProjectMask operator in the Compute Grasping Point group. For the ProjectMask operator, we not only need to input the obj_list obtained by the AI inference operator, but also the 2D image corresponding point cloud, the transformation matrix between the 2D image camera coordinate system and the point cloud coordinate system, and the intrinsic parameters of the camera's RGB lens. Here, the point cloud has been converted to the robot coordinate system, so the transformation matrix from the RGB lens to the robot coordinate system needs to be input. The intrinsic parameters of the camera's RGB lens can be directly read from the camera parameter file. After the operator runs, a list of point clouds for all detected targets will be obtained.
        alt text
        Robotic arm positioning and grasping
Location and identification
        According to the AI inference process, the point cloud list of all detected targets in the robot coordinate system has been obtained. Next, we need to obtain the centroid coordinates of its point cloud.

      4. Double-click to expand the Find Target group in the Calculate Grasp Point group. We need to first select the wooden block, and then sort the wooden blocks based on their Z-axis coordinates, selecting the topmost wooden block. Therefore, the FilterBoxList operator is used here and renamed to "PointCloud Height Sorting", with the following property values adjusted:

      5. Obtain the plane using the FindElement operator, select "Plane" for the type, and obtain a plane suitable for grasping in the point cloud. Adjust the distance_threshold property value of the operator to adjust the selected plane. Open the cloud visualization property to view the selected plane.

      6. Obtain the center point of the plane using the MinimumBoundingBox operator, rename it to "Get Bounding Box", select "ApproxMVBB" for the type attribute to obtain a convenient coordinate center point for the robot to grasp. Here, give the operator a ref_pose, which is connected to "TowardsDownPose" mentioned in 3.3.4 during AI inference, indicating a 180° rotation around the X-axis, so that the Z-axis faces downward for ease of robot grasping. Open the box and box_pose visualization properties in the "GetBoxCube" attribute panel to display the calculated center point of the plane.
        alt text

      7. Adjust the direction of the wooden block using the AdjustBoxNode operator, which selects objects with a length greater than their width and changes the object pose. Here, select a yaw of 90°.
        alt text
        Grasp by the robotic arm
        After completing the above operations, the target point coordinates have been obtained, and it is necessary to establish a TCP connection between the robot and the RVS software and carry out actual grasping.

      8. Write TCP communication code (RobotControl_Elephant.py). The following is a snippet of the code that implements TCP communication between RVS software and the robotic arm.

      #CAPTURE
      print("***get pose***%s"%time.asctime())
      capture_cmd = "GET_POSES \n"
      capture_bytes=bytes(capture_cmd,encoding="utf-8")
      sock_rvs.send(capture_bytes)
      #recv CAPTURE
      data = sock_rvs.recv(socket_buf_len)
      print("---------------------------get data----------------------------")
      print(data)
      print("***data end***%s"%data[-1:])
      
      print("***capture_receive***%s"%time.asctime())
      if int(data[-1:]) == 1:
          print("***received CAPTURE result***\n")
      if int(data[-1:]) == 2:
          print("***All finished!***")
          #P_FLAG = bool(1-P_FLAG)
          #print("change poision")
          continue
          #break
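The snippet above assumes that sock_rvs is an already-connected TCP socket and that socket_buf_len is defined elsewhere in RobotControl_Elephant.py. A minimal sketch of how such a connection might be set up; the host, port, and buffer size are placeholders, not values taken from the project:

import socket

# Hypothetical address of the machine running the RVS HandEyeTCPServer operator.
RVS_HOST = "192.168.1.100"
RVS_PORT = 8888
socket_buf_len = 1024

# Open a TCP connection to RVS; sock_rvs is then used for send()/recv() as above.
sock_rvs = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_rvs.connect((RVS_HOST, RVS_PORT))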
      
      1. Adjust the target point coordinates using the ScalePose operator. Set the type to "Normal" and adjust the scale_rpy property to convert the roll, pitch, and yaw values from radians to degrees.

      2. Finally, connect the finished and pose_list ports of the ScalePose operator to the MirrorOutput port of the outermost operator group and connect it back to the HandEyeTCPServer operator. With this, the project file editing is complete.
        alt text

      Show results

      After completing the above steps, under the unstacking.xml project, click Run, and run the RobotControl_Elephant.py file at the same time. After identifying multiple blocks, select one of the block poses and send it to the robotic arm for gripping.
      alt text

      Summary

Overall, this is just a small part of what a depth camera can do. In the future, we may consider stacking these objects or using other irregular shapes to demonstrate its capabilities; by training the model in advance, we can achieve the desired results. What would you like to see us do with it? Please leave a message below. Your likes and follows are our motivation to keep posting more content!

      posted in PROJECTS
      ElephantRobotics
    • Reinforcement Learning for Gripping Task of myCobot using Isaac Gym

      This article is republished from Tatsushi Matsubayashi [1] of ALBERT Inc.[2]
      Link:https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/
      alt text
      Greetings, I am from the Advanced Technology Department. Following our previous blog post titled "Controlling myCobot using RealSense D455 for Spatial Recognition", I will now introduce an experiment utilizing myCobot. This time, the experiment will be conducted using a simulator rather than a physical machine. When attempting deep reinforcement learning with robots, it can be challenging to prepare large amounts of training data on a physical machine. However, with a simulator, it is easy to collect massive datasets. Nevertheless, simulators may appear daunting to those who are unfamiliar with them. Therefore, we tried using Isaac Gym, developed by Nvidia, which allowed us to achieve everything from creating an experimental environment to conducting reinforcement learning with almost nothing but Python code. In this post, I will introduce the method we used.

      1. Introduction
        1.1 What is Isaac Gym?
        Isaac Gym is a physics simulation environment developed by Nvidia for reinforcement learning. Based on the OpenAI Gym library, the physics calculations are performed on the GPU and the results can be received as Pytorch GPU tensors, enabling fast simulation and learning. Physics simulation is carried out using PhysX, and it also supports soft body simulation using FleX (although some features are limited when using FleX).

      The latest version as of January 2023 is Preview4. While earlier versions had noticeable bugs, versions 3 and onwards have seen improvements and added functionality, making it a very attractive simulation environment. A future release of Omniverse Isaac Gym, integrated with Isaac Sim, is planned. However, Isaac Gym itself is standalone and can be used for experiments in Python. In a previous blog post ("GPU Server Expansion and A6000 Benchmarking"), it was mentioned that research and development using Omniverse Isaac Simulator had begun, but Isaac Gym was prioritized for reinforcement learning simulation. The biggest benefit of integrating Isaac Gym with Omniverse would likely be the ability to use photorealistic visuals for image recognition and high-precision continuous body simulation, such as with ray tracing. It will be exciting to see how this develops in the future.

      PhysX is a physics engine developed by Nvidia, enabling real-time physics calculations on the simulator's GPU. Although the version used by Isaac Gym has not been specified in publicly available arXiv or documentation, it is likely based on PhysX 4, given its release timing and separation from FleX. In Omniverse, PhysX 5 is used and FleX is integrated as well.

      FleX is also a physics engine developed by Nvidia, but it enables the representation of soft bodies and fluids using particle-based simulation, in contrast to PhysX's rigid body simulation.

      1.2 Purpose of this article
      I will tell you how I was able to easily create and learn reinforcement learning tasks using Isaac Gym. As an actual test case, I will introduce an object grasping task created by importing myCobot introduced in the previous article into the simulation environment.

      https://blog.albert2005.co.jp/2022/12/21/realsense-d455_mycobot/

      1.3 Environment
      PC1: Ubuntu 20.04, Python 3.8.10, Nvidia RTX A6000

      PC2: Ubuntu 18.04, Python 3.8.0, Nvidia RTX 3060Ti

      Please note that Nvidia Driver 470 or later is required.

2. Install
      In this chapter, we will install Isaac Gym and IsaacGymEnvs. The recommended environment is Ubuntu 18.04, 20.04, Python 3.6~3.8, and Nvidia Driver==470. Please note that since python_requires<3.9 is described in Isaac Gym's setup.py, it cannot be used as is for versions 3.9 and above. Testing has not been performed on Ubuntu 22.04, but it is probably okay.

      2.1 Isaac Gym
      You can download the Isaac Gym main package for free from Nvidia's developer page. The documentation is saved in HTML format in the "docs" directory of the package (please note that it is not available on the website). After downloading, you can install it with the following command:

$ cd isaacgym/python
$ pip install -e .
However, since PyTorch is pinned to "torch==1.8.0" and "torchvision==0.9.0", you should first install the build that matches your environment from the official PyTorch page if you plan to use your GPU. Docker and Conda virtual environment setup files are also provided; since I use venv to manage my Python virtual environments, I will proceed with pip.

      2.2 IsaacGymEnvs
      IsaacGymEnvs is a Python package for benchmark testing reinforcement learning environments in Isaac Gym. By referring to the implemented tasks, one can easily construct reinforcement learning environments using the reinforcement learning algorithms implemented in rl-games. Even for those who plan to write their own reinforcement learning algorithms, it is recommended to try learning with Isaac Gym using this package. Originally included in Isaac Gym, it was separated in Preview3 and is now publicly available on GitHub.

$ git clone https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.git
$ cd IsaacGymEnvs
$ pip install -e .
      With this, the necessary installation is now complete.

3. Demo
        When you install Isaac Gym and take a look inside the package, you'll see that there are many sample environments available. These are also featured in the documentation, but in this article, we'll highlight some of the samples that are relevant to creating custom reinforcement learning environments in Chapter 4. If you've set up your environment, it's a good idea to try running some of these samples and see what they can do. You might even find that they provide some guidance on how to use the API to accomplish things you're interested in trying out (and if you're still not sure, don't hesitate to read through the documentation).

      3.1. Isaac Gym
      As of Preview4, there are 27 sample environments available.

      ● "1080_balls_of_solitude.py"

      The "1080_balls_of_solitude.py" script generates a pyramid-shaped group of balls that fall down. Running the script without options only allows collisions between balls within the same environment (i.e. within the same pyramid). The "--all_collisions" option enables collisions with balls from other environments, while the "--no_collisions" option disables collisions between objects within the same environment. This script also demonstrates how to configure the arguments of the "create_actor" function to add objects to the environment.

      ● "dof_controls.py"

      This script features an actor that moves in 3D, which is a variation of the well-known Cartpole problem in OpenAI Gym. It demonstrates how to set the control methods for each degree of freedom (DOF) of the robot, which can be either position, velocity, or force. Once set, these control methods cannot be changed during simulation, and the actor can only be controlled through the selected methods. Forgetting to set these control methods can cause the actor to fail to move.

      ● ”franka_nut_bolt_ik_osc.py“

This script showcases Franka Robotics' multi-jointed robot arm Panda picking up a nut and screwing it onto a bolt. The arm is controlled using inverse kinematics (IK). The file name includes "OSC", but OSC control is not implemented in this script; however, the script "franka_cube_ik_osc.py" does include OSC control.

      With the addition of SDF collision in Preview4, high-resolution collision file loading is possible, allowing for precise collision calculations between the nut and bolt grooves (Figure 1). While the initial SDF loading can take some time, subsequent loads are cached and will start quickly.
      alt text
      Figure 1: Simulation of a panda arm driving a nut onto a bolt

      ● interop_torch.py

      This script shows how to use the function get_camera_image_gpu_tensor to directly obtain sensor data from the camera on the GPU. The obtained data can be output as an image file using OpenCV, just like a regular physical camera. When executed, the script creates a directory called interop_images and saves the camera images there. Since simulation data is not exchanged between the GPU and CPU, fast image processing is possible. However, if using a multi-GPU environment, an error may occur. One solution suggested on the forum is to limit the GPU usage with CUDA_VISIBLE_DEVICES=0, but this did not work in the environment used for this script.

      3.2. IsaacGymEnvs
      There are 14 reinforcement learning tasks implemented, and benchmark tests can be performed using the scripts in the tasks directory.

      ● About the configuration file

      A configuration file written in YAML is prepared for each task. Common settings are in the config.yaml in the cfg directory, and settings can be changed without changing the YAML file with command line options using Hydra. The detailed settings for each task environment and PhysX are stored in the cfg/task/ directory, and the algorithm selection and structure are stored in the cfg/train/ directory.

      ● About algorithm implementation

      The reinforcement learning algorithm uses the PPO implementation in Rl-games. Although the docs/rl_examples.md mentions the option to select SAC, it is not currently included in the repository.

      NN layers are typically MLPs, and some models also include LSTM layers as RNN layers. Although CNN layers can also be added, there are no sample models with CNN layers included. In section 5.2, we will discuss our experience adding CNN layers to a model.

      The sample code can be executed in the isaacgymenvs directory where train.py is located.

      ● Cartpole
      python train.py task=Cartpole [options]
      This is the classic cartpole task where the goal is to move the cart in a way that the pole does not fall. By default, the model is trained for 100 epochs, which takes about 2 minutes on a PC2 RTX 3060Ti environment, and only 15 seconds in headless mode (without the viewer). When testing the model with inference, it performs well and the pole remains upright (after 30 epochs of training, the model is sufficiently trained to keep the pole upright). Although it may seem simple, the fact that the model can learn to complete this task successfully is reassuring.

      ● Franka Cube Stack
      python train.py task=FrankaCubeStack [options]
This is a task where a Panda arm is used to stack boxes. The joint movements of the 7-axis arm are learned step by step. The default setting is 10,000 epochs, but the arm movements can be learned in about 1,000 epochs. On a PC1 RTX A6000 environment, 1,000 epochs of training took about 20-30 minutes to complete. Figures 2 and 3 show the before and after states of the arm, where it goes from randomly moving to successfully grabbing and stacking boxes.

      The action space consists of 7 dimensions for the arm joints, while the observation space is a total of 26 dimensions. The reward function is designed to scale differently for actions that involve getting closer to the box, lifting the box, moving the boxes closer to each other, and successfully completing the stacking task.

      It's surprising how easily the arm can learn this level of task. However, it's important to note that the learning assumes a defined world coordinate system and the known positions and orientations of the objects. Therefore, applying this learned behavior to a physical robot may not be as straightforward.

      Breakdown of 26-dimensional observation:

      ● 7 dimensions for the position and orientation of the box being moved

      ● 3 dimensions for the vector from the box being stacked to the box being moved

      ● 7 dimensions for the gripper's grasp position and orientation

      ● 9 dimensions for the arm joints and gripper fingers
      alt text
      Figure 2: FrankaCubeStack before training
      alt text
      Figure 3: FrankaCubeStack after training

      Some common options in train.py are:

      ● headless (default: False): When set to True, the viewer is not launched. This is useful for heavy training or when capturing camera images, as the viewer can slow down the process significantly.

      ● test (default: False): When set to True, the learning mode is turned off, allowing you to run the environment without training. This is useful for environment generation and checking the learning results.

      ● checkpoint (default: ''): Specifies the PyTorch weight file to load. The learning results are saved in runs/<task name>/nn/<task name>.pth, and this option is used for resuming training or testing.

      ● num_envs (default: int): Specifies the number of parallel learning environments. It's important to set an appropriate number to avoid heavy viewers during testing (this option can also be set during training, but changing it can cause errors due to batch size and interference).

      Note that train.py configures horizon_length and minibatch_size, but batch_size = horizon_length * num_actors * num_agents, and batch_size must be divisible by minibatch_size. Additionally, num_actors and num_agents are proportional to num_envs, so changing only num_envs can cause errors.
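As a concrete, purely illustrative example of that divisibility constraint (these are not the defaults of any particular task):

# Illustrative numbers only:
horizon_length = 16
num_envs = 512          # with num_agents = 1, num_actors == num_envs
num_agents = 1
batch_size = horizon_length * num_envs * num_agents   # 8192
minibatch_size = 4096   # OK, because 8192 % 4096 == 0
assert batch_size % minibatch_size == 0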

      Other samples can be tried easily with just the environment, so give them a try for some interesting tests.

      3.3 Viewer Tips
      ● Drawing a collision mesh

      The simulator usually renders an object's visual mesh, but in Isaac Gym's Viewer, you can change it to render the collision mesh instead. To do this, go to the Viewer tab in the menu window and check "Render collision meshes". If an object is behaving strangely, it's a good idea to check if the collision mesh is loaded correctly (sometimes the visual and collision meshes have different orientations, or the mesh may not have loaded correctly or with enough detail in the simulator).
      alt text
      Figure 4: Drawing the collision mesh

      ● Reduce drawing environment

      You can reduce the rendering environment to only one without changing any settings. By checking "Show only selected env" in the Actors menu, as shown in Figure 5, only the selected environment will be displayed. If there are any strange behaviors, you can debug by outputting the environment number and only rendering that environment. This also lightens the rendering load and can increase FPS.
      alt text
      Figure 5: Numbering the drawing environment

      ● Change initial camera position

      The initial camera position and orientation can be set using gymapi's viewer_camera_look_at(viewer, middle_env, cam_pos, cam_target). In task scripts for training, you need to override the set_viewer function to make changes.

4. Original environment and task creation
  It's finally time to create our own task, the main subject of this article.

      4.1. Preparation
Prepare the script and configuration files. The goal is to learn a simple task: using myCobot to pick up and lift a box. Therefore, we will proceed with creating a task named "MycobotPicking". We need three files:

      ● tasks: the main Python script

      ● cfg/task: YAML configuration file for environment and simulation parameters

      ● cfg/train: YAML configuration file for learning algorithms, neural network layers, and parameters.

      We can refer to the "FrankaCubeStack" task mentioned earlier and create these files accordingly. The configuration files are especially important and we can copy and modify them according to our requirements.

As shown in the demo, the task script is loaded from train.py using command-line options. Therefore, we need to add an import statement for the task class to the __init__.py file in the tasks directory, together with the task name used when passing arguments; a sketch of this registration is shown below.
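For orientation, the registration typically looks something like the following. The module path for our task class is our own, and the exact dict name and file layout may differ between IsaacGymEnvs versions:

# isaacgymenvs/tasks/__init__.py  (sketch only)
from isaacgymenvs.tasks.mycobot_picking import MycobotPicking  # our new task class

# Register the task so `python train.py task=MycobotPicking` can resolve it.
isaacgym_task_map = {
    # ... existing benchmark tasks ...
    "MycobotPicking": MycobotPicking,
}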

      4.2. Environment Creation
      The task class is created by inheriting from the VecTask class in the tasks/base directory, and the task has the following structure as shown in Figure 6.
      alt text
      Figure 6: Task configuration. Those with an orange frame do not need to be edited, and those with a green frame are created for each task.

      4.2.1. init processing

      1. Creating Simulator and Environment

      ● create_sim: This function generates an instance of the simulator. The process itself is defined in the parent class, and the properties are set in the config file, such as gravity and step time. Similar to FrankaCubeStack, the function uses the following two functions to generate the ground plane and the actor.

      ● create_ground_plane: This function generates a ground plane by inputting the normal direction of the plane. If you want to create an uneven terrain, you can refer to the terrain_creation example.

      ● create_envs: This function loads and sets properties for the actor file, generates the actor and parallelizes the environment. In this task, we generated myCobot from the URDF and the target object from the create_box API. The URDF of myCobot was based on the one used in the previous experiment with MoveIt, and we added a gripper for picking (details about the gripper are explained in section 5.1).

2. Data Initialization

      ● init_data: This function defines environment variables from the config file and prepares a buffer for data tensors processed by Isaac Gym (PhysX). The necessary data for calculating state and reward is defined as class variables. The API loads the tensor data into the buffer, which is updated every step by calling the corresponding refresh function.

      4.2.2. processing steps

      1. Step Processing:

      The main step function is defined in the parent class and does not need to be modified. However, the following two steps are necessary as abstract methods:

      ● pre_physics_step: Manipulate the actor using the action. The size of the action is defined in the config as [“env”][“numActions”]. For myCobot's 6-axis arm and gripper, we set it to 7 dimensions.

      ● post_physics_step: Calculate the observation and reward. Also check whether to reset the environment. We set it to reset after reaching a maximum of 500 steps or a successful lift.

      The step order is fixed to apply action → physics simulation → calculation of observation and reward to pass data for learning. Even if you only write "pass" here, you can check the environment while starting the viewer.

      ● reset_idx: Returns the environment to its initial state. Of course, the randomness of the initial state is closely related to the generalization of learning. We set myCobot to the initial posture and randomly reset the position of the target object within myCobot's reachable range.

2. State and Reward Calculation:

      ● compute_observation: Update each buffer with the refresh function and put the desired state in the obs_buf. The size of obs_buf is defined in the config as [“env”][“numObservation”].

      ● compute_reward: Calculate the reward. As the gripper approaches the target object's grip position (between the fingers), a reward is obtained, and a larger reward is obtained as the height of the target object increases.
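A minimal sketch of such a distance-plus-lift reward in PyTorch, operating on batched per-environment tensors; the function name, scaling constants, and shaping terms are our own illustrative assumptions, not the project's exact reward:

import torch

def compute_picking_reward(gripper_pos, grasp_target_pos, object_height,
                           dist_scale=1.0, lift_scale=5.0):
    """Sketch of a distance + lift reward.

    gripper_pos, grasp_target_pos: tensors of shape (num_envs, 3)
    object_height: tensor of shape (num_envs,), height gained by the object
    """
    # Dense reward that grows as the gripper approaches the grasp point.
    dist = torch.norm(gripper_pos - grasp_target_pos, dim=-1)
    dist_reward = dist_scale * (1.0 / (1.0 + dist ** 2))
    # Additional reward proportional to how high the object has been lifted.
    lift_reward = lift_scale * torch.clamp(object_height, min=0.0)
    return dist_reward + lift_reward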

      4.3. Execution of training
      Now that the task framework has been created, let's train the model. We can start training the model using the following command:

      python train.py task=MycobotPicking --headless
      After 200 epochs, the initial weights will be saved, and new weights will be saved if the reward improves. However, the task we created may not work perfectly, and the training process may stop progressing quickly. In the following section, I will discuss the adjustments I made to the task to improve its performance.

      4.4. Task Coordination
By testing with the learned weights, we can debug why the training did not go well. We ran the command

      python train.py task=MycobotPicking test=True checkpoint=runs/MycobotPicking/nn/[checkpoint].pth
to test the model. However, we ran into issues with the gripper not moving well. Despite our efforts to resolve the issue, we concluded that URDF does not support closed-loop structures, making it difficult to simulate the gripper's movements accurately. As a result, we decided to control the gripper's closing and lifting with a rule-based approach: we fixed the gripper's fingers to a single link and reduced the action space from 7 to 6 dimensions. When controlling a robot arm in a simulator, it is better to use a gripper without closed loops, such as the one on the Panda arm.

Another issue was that the agent stopped approaching the object at a certain distance and hesitated to touch it, settling for a lower reward. We modified the reward so that it steps up at a threshold distance and is maximized when the agent reaches the target point. We also removed the environment reset on task completion, which had caused the agent to stop learning before reaching the actual goal, and instead set the maximum number of steps to just what is needed to complete the task, which improved the learning speed.
We also found that penalizing failure too harshly made the reinforcement learning agent overly conservative, which gave it an oddly human-like personality and made the learning process more interesting.
Finally, a similar phenomenon appears in the FrankaCabinet benchmark task, where the agent stops learning after pulling the drawer out to a certain distance, even though a higher reward is available for pulling it out fully. We did not fix that issue directly; instead, as above, we removed the environment reset after task completion and adjusted the maximum number of steps so that the task completes smoothly.

      Figure 7: myCobot keeps away from objects

Self-collision of the arm was being ignored. Although the arm reached the desired position, it ended up in postures where self-collision was completely ignored, twisting into a figure-eight. I looked into whether self-collision checking could be enabled via the documentation, but could not get it to work well. In any case, it is unrealistic that all joint angle limits in the provided URDF are set to -3.14 to 3.14, so I adjusted the upper and lower limits of each joint to avoid self-collision. Why the joint angles tended to move to their extreme values is still unknown.

Figure 8: myCobot ignoring self-collision

      The arm doesn't stop exactly where it's supposed to and instead wobbles around it. We wanted the action to approach 0 as it reached the target position, but it was difficult to achieve, and the arm kept vibrating around the target position. We tried penalizing the action and adjusting the reward by setting the target position precisely, but it didn't improve the results. We decided not to worry about this issue as it could be handled by rule-based control during actual operation.

      Although it was not a must-have, we wanted the gripper to be downward facing for better appearance. So, we added a penalty term to the reward function that penalized the gripper angle. Figure 9 shows the learning results before fine-tuning.

Figure 9: myCobot after training, before fine-tuning

      The result of the adjustments mentioned above is shown in Figure 10. If this level of accuracy can be achieved on the actual robot, it should be capable of lifting objects sufficiently.

Figure 10: myCobot after training, after fine-tuning

5. Others
  Here I will cover a few things that did not go well and a few things I would like to try.

      5.1. The story of a homemade URDF gripper that didn't work
The URDF of myCobot was based on the one used in the previous attempt to move the actual robot, but it did not include the gripper. Although there is a gripper model available on the official GitHub page, it only provides a DAE file with a visual representation, as shown in Figure 11(a). To create a URDF that can be used in a simulator, separate 3D models for each joint part are required. Therefore, using Blender, we divided the parts by joint (Figure 11(c)) and created simplified box-shaped parts for collisions, since it is difficult to reproduce complex shapes (Figure 11(b)). Then, we described the structure of the links and joints in the URDF file to complete the model. However, since URDF does not support closed-loop link structures, we removed the collision from one of the links on the base side and closed the chain on the fingertip side. Although this method is rough, we were able to reproduce the movement of the actual robot in the simulator by moving the six joints at the same angle. Figure 11(d) shows a comparison between the completed model and the actual robot (using the provided model, though the details are quite different). However, when we actually tried to move it, as mentioned in section 4.4, it did not work well. The reason was that the joints could not move in a coordinated manner when external forces were applied (it might have been solved if torque control had been properly implemented).
      alt text
      Figure 11: Creating a gripper for myCobot (a) Published gripper model (b) Collision model parts created according to the model (c) Visual model parts disassembled from the gripper model (d) Isaac Gym Comparison of drawing and actual gripper

      5.2. Use image recognition
      In benchmarks and the MycobotPicking task, we use object position and orientation information in the observations, but obtaining this information in real-world tasks is not easy. Therefore, using only 2D camera information and easily obtainable servo joint angle information for reinforcement learning would be more valuable.

      We attempted to replace observations with images and use a CNN layer for learning in the FrankaCubeStack task. However, we only modified the algorithm to accept image input and as expected, the learning did not perform well. There is no framework to add the servo joint angle information as 1-dimensional data to the CNN layer, and using image information directly in the CNN layer increases the computational complexity and limits the parallelization of the environment. Additionally, we would need to tune hyperparameters such as learning rate and clip value, but we did not pursue this as the effect is not promising enough.

      In this test, we only confirmed the method of adding the CNN layer for learning. However, it may be more effective to use transfer learning to encode features for gripper and object recognition from easy-to-use object recognition models such as YOLO or ResNet, and then use the encoded features and joint angles for reinforcement learning, rather than using CNN layers directly with camera images.

      5.3. Using the Trained Model on the Actual Robot
I attempted a Sim2Real experiment using the trained model with the myCobot and RealSense for spatial recognition, as mentioned in the previous article. However, it did not work well. While the reaching motion worked to some extent, the movement became unstable as it approached the object, and it was not possible to move accurately to the grasping position. Possible causes include the fact that the myCobot is not powerful enough to move accurately to the target posture, and the accumulation of small differences because the simulator predicts the next target posture before the current one is reached, whereas the actual robot does not. Regarding the former, the myCobot used in this experiment is an inexpensive educational arm with a payload of 250g, so if you want more accurate motion, you should use a higher-end robot arm, such as those used in reinforcement learning research on picking. Elephant Robotics, the company that makes the myCobot, also sells models with stronger servo motors that can carry up to 1kg, so I would like to try those as well.

6. Summary
        This time, I created a reinforcement learning task using Isaac Gym and actually trained the model. I experienced the design of a robot reinforcement learning problem in a 3D physics simulator and the issues that arise when running the trained model. It was attractive to be able to test the learning environment without having to write the reinforcement learning algorithm from scratch. The availability of benchmark environments makes it easy to compare and verify new learning algorithms, which is a great advantage for researchers and analysts with various professional backgrounds.

      ALBERT has researchers and analysts with various professional backgrounds who are ready to assist with any technical or business-related inquiries. Please feel free to contact us.

      Reference

1. Tatsushi Matsubayashi, 2022.12.21, Isaac Gym で myCobot の把持タスクを強化学習してみました。(Reinforcement learning of a myCobot grasping task in Isaac Gym)

      https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/

2. ALBERT Inc. https://www.albert2005.co.jp/english/
      posted in PROJECTS
      ElephantRobotics
    • RE: Realizing Voice Control for Robotic Arm Movement -M5Stack-base

WOW! The project looks interesting! Do you have any ideas for using the end-effector to perform some motions?

      posted in PROJECTS
      ElephantRobotics
    • RE: Smart Applications of Holography and Robotic Arms myCobot 320 M5Stack-Basic

      @kkkkkk
Hi, you can click the link to learn more about it.

      posted in PROJECTS
      ElephantRobotics