How Neural Networks Power Robots in Starship | by Tanel Pärnamaa | Starship Technologies

Starship has built a fleet of robots that deliver packages on local demand. To do this successfully, the robots need to be safe, polite and fast. But how do you get there with limited computational resources and without expensive sensors like LIDARs? This is the engineering reality you need to face, unless you live in a universe where customers are happy to pay $100 for a delivery.

To begin with, the robots sense the world with radars, multiple cameras and ultrasonic sensors.

However, the challenge is that most of this knowledge is low-level and non-semantic. For example, a robot may sense that an object is ten meters away without knowing the object’s category, which makes it difficult to make safe driving decisions.

Machine learning through neural networks is surprisingly useful in converting this unstructured low-level data into higher-level semantic information.

Starship robots mostly drive on sidewalks and cross streets when they need to. This presents a different set of challenges compared to self-driving cars. Traffic on roads is more structured and predictable: cars move along lanes and don’t change direction very often, whereas people frequently stop abruptly, turn around, may be accompanied by a dog on a leash, and don’t signal their intentions with turn lights.

To perceive the surrounding environment in real time, a central component of the robot is the object detection module – a program that takes in images and returns a list of object bounding boxes.
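As a minimal sketch, the interface of such a module might look like the following. All names here (`Box`, `detect_objects`) are illustrative, not Starship’s actual API, and the detector body is a placeholder:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Box:
    """An axis-aligned bounding box with a class label and a confidence."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str         # e.g. "car", "pedestrian"
    confidence: float  # model's score in [0, 1]

def detect_objects(image) -> List[Box]:
    """Placeholder for the detection model: image in, list of boxes out."""
    # A real implementation would run a neural network here.
    return [Box(10.0, 20.0, 110.0, 220.0, "pedestrian", 0.93)]

boxes = detect_objects(image=None)
```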

That’s fine, but how do you write such a program?

An image is a large three-dimensional array of numbers representing pixel intensities. These values change significantly when the image is taken at night instead of during the day; when the color, scale or position of an object changes; or when the object itself is partially occluded or truncated.
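To make this concrete, here is a small NumPy sketch of what the computer actually sees: a 480×640 RGB image is nearly a million raw numbers, and a simple change in lighting (simulated here by scaling intensities down) changes all of them at once:

```python
import numpy as np

# A 480x640 RGB image: height x width x channels, values 0-255.
rng = np.random.default_rng(0)
day_image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# The "same scene" at night: pixel intensities drop, so the raw numbers
# the model sees change drastically even though the objects are the same.
night_image = (day_image * 0.2).astype(np.uint8)

print(day_image.shape)  # (480, 640, 3)
print(day_image.size)   # 921600 numbers in a single image
```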

Left – what a person sees. Right – what the computer sees.

For some complex problems, teaching is more natural than programming.

In the robot software, we have a set of trainable units, mostly neural networks, where the code is written by the model itself. The program is represented by a set of weights.

Initially, these numbers are random, and the output of the program is random as well. Engineers show the model examples of what they want it to predict and ask the network to do better the next time it sees a similar input. By iteratively adjusting the weights, the optimization algorithm searches for programs that predict bounding boxes more and more accurately.
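The idea can be sketched in a few lines with a single weight. This toy example (plain gradient descent on a one-parameter "program", nothing like Starship’s actual training code) shows how a randomly initialized weight is nudged toward one that predicts well:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "program": a single weight w; the target behavior is y = 3x.
xs = rng.uniform(-1.0, 1.0, size=100)
ys = 3.0 * xs

w = rng.normal()  # initially the weight is random, so predictions are too
initial_error = np.mean((w * xs - ys) ** 2)

lr = 0.5
for _ in range(100):
    # Gradient of the mean squared error with respect to w.
    grad = np.mean(2.0 * (w * xs - ys) * xs)
    w -= lr * grad  # nudge the weight toward better predictions

final_error = np.mean((w * xs - ys) ** 2)
```

The optimizer never sees the rule "y = 3x"; it only sees examples and an error signal, yet it recovers the right weight.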

Evolution of the programs found by the optimization process.

However, one has to think very carefully about the examples used to train the model.

  • Should the model be penalized or rewarded when it detects a car that is visible through a window?
  • What should it do when it detects a picture of a person on a poster?
  • Should a car trailer full of cars be annotated as one entity, or should each of the cars be annotated separately?

These are all examples that actually came up while building the object detection module for our robots.

The neural network detects objects in mirrors and on posters. A bug or a feature?

When teaching a machine, a large amount of data alone is not enough. The data collected must be rich and varied. For example, annotating only uniformly sampled images would yield many pedestrians and cars, but the model would lack the examples of motorcyclists or skaters it needs to detect those categories reliably.

The team must specifically mine hard examples and rare cases, otherwise the model will not progress. Starship operates in several different countries, and the varied weather conditions enrich the set of examples. Many people were surprised that Starship delivery robots kept driving during the snowstorm ‘Emma’ in the UK, while airports and schools remained closed.
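One simple way to spot under-represented categories is to count class frequencies in the annotated data. The labels and the 1% threshold below are made-up illustrations, not Starship’s actual numbers or criteria:

```python
from collections import Counter

# Hypothetical annotation labels from a training set.
annotations = (["car"] * 5000 + ["pedestrian"] * 3000 +
               ["cyclist"] * 200 + ["skater"] * 4)

counts = Counter(annotations)
total = sum(counts.values())

# Flag categories too rare for the model to learn reliably;
# the 1% cutoff is an arbitrary illustration.
rare = [label for label, n in counts.items() if n / total < 0.01]
```

Categories flagged this way are candidates for targeted data collection.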

Robots deliver packages in a variety of weather conditions.

At the same time, annotating data takes time and resources. Ideally, models should be trained and improved with as little data as possible. This is where architecture engineering comes into play. We encode prior knowledge into the architecture and the optimization process to shrink the search space toward programs that are more likely in the real world.

We incorporate prior knowledge into neural network architectures to obtain better models.

In some computer vision applications, such as pixel-wise segmentation, it is useful for the model to know whether the robot is on a sidewalk or at a road crossing. To provide this hint, we encode global image-level clues into the neural network architecture; the model then decides whether to use them without having to learn this from scratch.
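A common way to feed a global clue into a convolutional model is to broadcast it to every spatial position and concatenate it with the per-pixel features. This NumPy sketch illustrates the shape mechanics only; it is an assumption about the general technique, not Starship’s specific architecture:

```python
import numpy as np

def add_global_context(feature_map, global_cue):
    """Broadcast a global image-level cue to every spatial position so
    later layers can use it (or ignore it) without learning it from
    scratch. Shapes: (H, W, C) + (G,) -> (H, W, C + G)."""
    h, w, _ = feature_map.shape
    tiled = np.broadcast_to(global_cue, (h, w, global_cue.shape[0]))
    return np.concatenate([feature_map, tiled], axis=-1)

features = np.zeros((4, 4, 8))  # toy per-pixel feature map
cue = np.array([1.0, 0.0])      # e.g. a hypothetical "on sidewalk" one-hot
out = add_global_context(features, cue)
```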

After this data and architecture engineering, the model may work well. However, deep learning models require significant computing power, and this is a huge challenge for the team, because we can’t take advantage of the most powerful graphics cards on robots that run on batteries and need to be low-cost.

Starship wants the cost of delivery to be low, which means our hardware must be inexpensive. It’s the same reason Starship doesn’t use LIDARs (a detection system similar to radar, but using light from a laser), which would make understanding the world much easier – we don’t want our customers to pay more than they need to for delivery.

State-of-the-art object detection systems published in academic papers run at around 5 frames per second [MaskRCNN], and even real-time object detection papers don’t report rates much above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. What’s more, these figures are reported for a single image; we, however, need a 360-degree understanding (the equivalent of processing roughly 5 single images).

To put this into perspective, Starship’s models run at over 2000 FPS when measured on a consumer-grade GPU, and process a full 360-degree panoramic image in a single pass. This is equivalent to 10,000 FPS when processing 5 single images with batch size 1.
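The equivalence is simple arithmetic: one panorama covers the view of roughly five single camera images, so panorama throughput multiplies accordingly:

```python
# One full 360-degree panorama is roughly equivalent to 5 single
# camera images, so panorama FPS converts to single-image FPS:
panorama_fps = 2000
images_per_panorama = 5

equivalent_single_image_fps = panorama_fps * images_per_panorama
print(equivalent_single_image_fps)  # 10000
```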

Neural networks outperform humans on many visual problems, though they still have bugs of their own. For example, a bounding box may be too wide, the confidence may be too high, or an object may be hallucinated in a place that is actually empty.

Potential problems with the object detection module. How can these be fixed?

Fixing these bugs is challenging.
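A failure mode like "the bounding box is too wide" can be quantified with intersection-over-union (IoU), the standard measure of overlap between a predicted box and the ground truth. A minimal sketch, with made-up box coordinates:

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

ground_truth = (10, 10, 50, 50)
too_wide = (0, 10, 80, 50)  # predicted box is much wider than the object
score = iou(ground_truth, too_wide)  # 0.5 despite full coverage
```

An IoU threshold (0.5 is a common choice in detection benchmarks) then decides whether such a prediction counts as correct.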

Neural networks are often considered black boxes that are difficult to analyze and understand. However, in order to improve the model, engineers need to understand the failure cases and develop a deep intuition for what the model has learned.

The model is represented by a set of weights, and one can visualize what each specific neuron has learned to detect. For example, the first layers of Starship’s network activate on conventional patterns such as horizontal and vertical edges. The next block of layers detects more complex textures, while deeper layers detect car parts and full objects.
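A hand-written example of such an early-layer pattern: the Sobel-like kernel below (chosen for illustration; real learned filters only resemble it) responds strongly exactly where an image has a horizontal edge:

```python
import numpy as np

# A hand-written horizontal-edge filter, the kind of pattern the first
# convolutional layers tend to learn.
edge_filter = np.array([[-1, -1, -1],
                        [ 0,  0,  0],
                        [ 1,  1,  1]], dtype=float)

# An image with one horizontal edge: dark top half, bright bottom half.
img = np.zeros((6, 6))
img[3:, :] = 1.0

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, enough for the demo."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

response = conv2d_valid(img, edge_filter)
# The response peaks along the dark-to-bright transition and is zero
# in the flat regions.
```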

How the neural network used by the robots understands images.

Technical debt takes on a different meaning with machine learning models. Engineers are constantly improving the architectures, the optimization processes and the datasets. As a result, the model becomes more accurate. However, swapping in a better-performing model does not necessarily guarantee better overall behavior of the robot.

There are several components that consume the output of the object detection model, each requiring a different precision and recall, tuned against the existing model. The new model, however, may behave differently in subtle ways. For example, the output probability distribution may be skewed toward larger values or be wider. Even if the model performs better on average, it may be worse for a specific group, such as large cars. To avoid these pitfalls, the team calibrates the probabilities and verifies regressions on multiple stratified datasets.
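Temperature scaling is one standard calibration technique for this kind of problem (the post doesn’t say which method Starship uses, so this is an illustrative sketch): dividing the logit by a temperature softens overconfident probabilities while preserving the ranking of predictions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibrate(logit, temperature):
    """Temperature scaling: soften (T > 1) or sharpen (T < 1) the
    output probabilities without changing the ranking of predictions."""
    return sigmoid(logit / temperature)

raw = sigmoid(4.0)              # overconfident prediction
softened = calibrate(4.0, 2.0)  # same logit, tempered
```

In practice the temperature is fit on a held-out validation set so that predicted confidences match observed accuracy.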

Average performance doesn’t tell you the whole story of the model.

Monitoring trainable software components poses a different set of challenges compared to monitoring conventional software. Little concern is given to inference time or memory usage, as these are mostly constant.

However, dataset shift becomes a primary concern – the distribution of the data used to train the model differs from the distribution of the data the model is currently operating on.

For example, electric scooters may suddenly start driving on sidewalks. If the model was not trained to consider this class, it will have difficulty classifying them correctly. The information derived from the object detection module will then disagree with other sensory information, resulting in requests for assistance from human operators and, therefore, slower deliveries.
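One simple way to monitor for this kind of shift is to compare the class distribution seen at training time against the one observed in deployment, e.g. with total variation distance. The distributions below are hypothetical numbers for illustration only:

```python
def total_variation(p, q):
    """Half the L1 distance between two discrete distributions over the
    same categories; 0 means identical, 1 means completely disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical class frequencies: scooters appear only after deployment.
train_dist = {"car": 0.6, "pedestrian": 0.4}
live_dist = {"car": 0.5, "pedestrian": 0.3, "scooter": 0.2}

drift = total_variation(train_dist, live_dist)
```

An alert on this distance growing over time can flag that the training set no longer matches the world the robots operate in.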

A major concern of practical machine learning – training and test data come from different distributions.

Neural networks enable Starship robots to cross roads safely by avoiding obstacles such as cars, and to drive on sidewalks by understanding all the different directions that people and other obstacles may choose to take.

Starship robots achieve this using inexpensive hardware, which poses many engineering challenges but makes robot delivery a solid reality today. Starship’s robots make real deliveries seven days a week in many cities around the world, and it’s exciting to see how our technology keeps bringing more convenience into people’s lives.


