Koala | AI-powered Vending Machine

Some industries have changed very little over the decades. Take the traditional vending machine, often found at train stations and other busy places.

My first impression of these machines: dirty and full of unhealthy products. But does the user experience have to be so bad?

Let's go: we are building a smart vending machine based on a Raspberry Pi and machine learning. I'll show you the result, which two of us built in just 4 days.

The rise of smart vending machines

Even with increasing digitalization, most vending machines are still from the last century. It's not just the payment limitations that stand out, but also the outdated design and mechanics.

All of us have probably experienced the frustrating situation of a product getting stuck in a machine. Or you have to put your hand through the nasty and sticky slot to get the product out.

Amazon is leading the way with its Amazon Go Stores, and in countries like Japan, smart vending machines are already on the rise. They promise a simple and pleasant user experience. Shopping should feel like taking products out of your own fridge.

Building the first prototype

Our goal was to develop a fully functional prototype for this user experience in just 4 days. We focused on the following 2 points:

Aesthetic design
Great user experience

For a rough overview we used this simplified activity diagram.

After we had discussed the application flow, we defined the following work packages:

The construction of a physical vending machine with a controllable locking mechanism
Assembling the hardware and electronics
Training of an artificial intelligence for product recognition
Development of the backend system to control the electronics and the artificial intelligence application
Development of a web app with communication to the vending machine

Construction with the IKEA cube

So we went to IKEA and bought one of the well-known EKET cubes. With the matching glass door and hinges, the cube provides the basic skeleton for the vending machine.

We implemented the locking mechanism with a simple electromagnetic lock. To be able to query the current door status (open or closed), we positioned a standard push button so that it touches the hinge of the glass door. While the door is closed, the push button stays pressed. When the magnetic lock releases and the door is opened, the hinge moves away and the push button is no longer pressed. The hardware can read this push button signal.
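The door logic described above can be sketched as a small polling loop. This is only a sketch under assumptions: the class name `DoorController`, the timeout, and the polling interval are hypothetical, and the lock and button are passed in as plain callables so the logic can be tried without hardware. On the Raspberry Pi they could wrap a GPIO library such as gpiozero (`Button.is_pressed`, `OutputDevice.on()`/`off()`), but the source does not name the library that was actually used.

```python
import time


class DoorController:
    """Hypothetical helper: unlock the door and wait until it is closed again."""

    def __init__(self, set_lock, button_pressed):
        # set_lock(True) engages the magnetic lock, set_lock(False) releases it.
        self._set_lock = set_lock
        # button_pressed() returns True while the hinge presses the push button.
        self._button_pressed = button_pressed

    def is_closed(self):
        # Door closed <=> push button pressed, as described in the text.
        return self._button_pressed()

    def run_session(self, timeout=30.0, poll=0.05,
                    sleep=time.sleep, clock=time.monotonic):
        """Release the lock, wait for open-then-closed, then lock again.

        Returns True if the door was opened and closed within the timeout.
        """
        self._set_lock(False)
        deadline = clock() + timeout
        opened = False
        completed = False
        while clock() < deadline:
            if not self.is_closed():
                opened = True          # user has pulled the door open
            elif opened:
                completed = True       # door was open and is closed again
                break
            sleep(poll)
        self._set_lock(True)
        return completed
```

Injecting `sleep` and `clock` is a design choice for testability only; on the device the defaults would be used.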

We used a Raspberry Pi 3 to control the electronics. The Raspberry Pi 3 also has enough power to run the machine learning model, and its internet connection lets it communicate with the web app.

To recognize when products are taken out, we installed a wide-angle Raspberry Pi camera on the ceiling of the cube. For optimum detection, it is important that the vending machine is well lit. This avoids shadows and other disturbing artefacts.

Training of the machine learning model

Google AutoML Vision is used for product recognition. It lets us train a machine learning model that classifies images. The model is trained in the Google Cloud with a total of 150 images showing different selections, combinations, numbers, rotations and positions of products in the vending machine.

We trained the model with the following 6 products and labelled each product about 77 times.

Product	Label Count
Bebe Lotion	78
Dornfelder Wine	75
Feelissimo Condoms	78
Seitenbacher Cereal Bar	75
Koala Biscuits	76
Sagrotan Disinfectant Gel	77
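As a quick sanity check on the table, the label counts can be related to the 150 training images. The snippet below is a small arithmetic sketch; it assumes that the 150 images are the complete labelled set and that every label marks one product instance in one image.

```python
# Label counts copied from the table above.
label_counts = {
    "Bebe Lotion": 78,
    "Dornfelder Wine": 75,
    "Feelissimo Condoms": 78,
    "Seitenbacher Cereal Bar": 75,
    "Koala Biscuits": 76,
    "Sagrotan Disinfectant Gel": 77,
}
total_images = 150  # stated in the text

total_labels = sum(label_counts.values())
mean_labels_per_product = total_labels / len(label_counts)
mean_labels_per_image = total_labels / total_images

print(total_labels)                     # 459
print(mean_labels_per_product)          # 76.5
print(round(mean_labels_per_image, 2))  # 3.06
```

Under these assumptions, a training image shows about three labelled products on average, which fits the description of images with different product combinations.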

The trained model is executed directly on the Raspberry Pi using TensorFlow Lite. For this purpose, the model trained in the Google Cloud is exported as a TensorFlow Lite model. TensorFlow Lite is designed for running predictions on embedded devices with limited computing resources and is therefore well suited to running the model on the Raspberry Pi.
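Running the exported model on the device could look roughly like the sketch below. Several things here are assumptions and not taken from the project: the label order in `LABELS`, the score threshold, the function names, and above all the order of the output tensors (boxes, classes, scores, count), which differs between exported models and should be checked with `get_output_details()`. The `tflite_runtime` import is done lazily so the counting logic can be tried without the runtime installed.

```python
from collections import Counter

# Assumption: class ids follow the order of the product table.
LABELS = [
    "Bebe Lotion",
    "Dornfelder Wine",
    "Feelissimo Condoms",
    "Seitenbacher Cereal Bar",
    "Koala Biscuits",
    "Sagrotan Disinfectant Gel",
]


def count_products(class_ids, scores, labels=LABELS, min_score=0.5):
    """Count detections per product, ignoring low-confidence detections."""
    counts = Counter()
    for class_id, score in zip(class_ids, scores):
        if score >= min_score:
            # Detection models typically return class ids as floats.
            counts[labels[int(class_id)]] += 1
    return counts


def detect_products(model_path, image):
    """Run one inference; `image` must already match the model's input
    shape and dtype (see interpreter.get_input_details())."""
    from tflite_runtime.interpreter import Interpreter  # lazy import

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, image)
    interpreter.invoke()
    outputs = [interpreter.get_tensor(detail["index"])
               for detail in interpreter.get_output_details()]
    # Assumption: outputs are ordered boxes, classes, scores, count.
    _boxes, class_ids, scores, _count = outputs
    return count_products(class_ids[0], scores[0])
```

For example, `count_products([4.0, 4.0, 1.0], [0.9, 0.8, 0.3])` counts two Koala Biscuits and drops the low-confidence wine detection.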

Development of the web app

To open the vending machine, the user has to enter the 3-digit code via a web app, which is also located on the vending machine. As soon as the machine is open, the user can simply take out their products and close the vending machine again. The vending machine automatically recognises the removed products in real time, displays them in the shopping cart in the web app and automatically processes the payment.
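The flow above can be illustrated with two small building blocks: checking the 3-digit code and deriving the shopping cart. This is a sketch, not the project's backend. All function names are hypothetical, and computing the cart as the difference between the inventory seen before and after the door was opened is an assumption about how "removed products" are determined.

```python
import hmac
import secrets
from collections import Counter


def new_access_code():
    """Generate a random 3-digit code such as '042'."""
    return f"{secrets.randbelow(1000):03d}"


def code_matches(expected, entered):
    """Compare the codes in constant time."""
    return hmac.compare_digest(expected.encode(), entered.encode())


def removed_products(before, after):
    """Products that were in the machine before but are missing afterwards.

    Both arguments map product name -> count, e.g. as produced by the
    product recognition. Counter subtraction keeps only positive counts.
    """
    return dict(Counter(before) - Counter(after))


before = {"Koala Biscuits": 2, "Dornfelder Wine": 1}
after = {"Koala Biscuits": 1, "Dornfelder Wine": 1}
print(removed_products(before, after))  # {'Koala Biscuits': 1}
```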

Results

The calculated precision and recall of the trained model are very promising. Precision is the share of the model's predictions that are true positives. The higher the precision, the fewer false positives are predicted. For the trained model, the precision is 97.62 %.

Recall is the share of the labelled bounding boxes that the model correctly predicts. The higher the recall, the fewer false negatives occur. The recall for the trained model is 93.18 %.
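The two metrics can be written down in a few lines. The counts in this sketch are hypothetical: the source does not state the real numbers of true positives, false positives and false negatives. They were chosen only because they happen to reproduce the reported percentages.

```python
def precision(tp, fp):
    """Share of predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of labelled objects that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical counts, for illustration only.
tp, fp, fn = 41, 1, 3

print(f"{precision(tp, fp):.2%}")  # 97.62%
print(f"{recall(tp, fn):.2%}")     # 93.18%
```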

Overall, the trained model seems to be able to make reliable predictions. In an early test in a real environment, however, the product recognition still showed some weaknesses despite the excellent metrics: products are occasionally recognised incorrectly. These errors could be corrected with some heuristics that smooth the recognised products in the shopping cart, which significantly reduces incorrect recognitions.
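The source does not describe the heuristic in detail. One plausible way to smooth the cart, shown here purely as an assumption, is to look at several consecutive recognitions and keep, for each product, the count that was seen most often. A detection that flickers in for a single frame is then dropped.

```python
from collections import Counter


def smooth_cart(frames):
    """Combine several recognitions into one cart by majority vote.

    `frames` is a list of dicts mapping product name -> recognised count.
    For every product, the most frequently seen count wins; products whose
    winning count is 0 are left out of the cart.
    """
    products = set()
    for frame in frames:
        products.update(frame)

    cart = {}
    for product in sorted(products):
        per_frame = [frame.get(product, 0) for frame in frames]
        winning_count = Counter(per_frame).most_common(1)[0][0]
        if winning_count > 0:
            cart[product] = winning_count
    return cart


frames = [
    {"Koala Biscuits": 1},
    {"Koala Biscuits": 1, "Dornfelder Wine": 1},  # one-off false positive
    {"Koala Biscuits": 1},
]
print(smooth_cart(frames))  # {'Koala Biscuits': 1}
```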

In our first test with 15 predictions, only one prediction contained false positives, and these were automatically corrected by our shopping cart heuristic. The recognition rate of this test was therefore 93.33 % (14 of 15).

The precondition for our results was that the products in the vending machine were carefully arranged or kept at a distance from each other. As soon as products lie on top of each other or very close together, the detection fails.

What happens next?

The AutoML Vision model we used was sufficient for the proof of concept, but there is still a lot of room to improve the prototype. A detection rate of up to 98 % seems realistic with a customized model developed just for this use case. Such a model would not have to rely solely on features in the image to recognise products.

A model that first detects objects and then identifies them could increase the recognition rate. In addition, a separate model could be trained for each product. New products could then be added to the inventory list without retraining the entire model, which reduces scaling issues.
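The two-stage idea with one model per product could be structured roughly as follows. Everything in this sketch is hypothetical: the detector is assumed to have already cut the image into object crops, and each product has its own scoring function that returns a confidence between 0 and 1. Adding a product then means adding one entry to `scorers`, without touching the others.

```python
def identify(crops, scorers, min_score=0.5):
    """Second stage: assign each detected object to the best-scoring product.

    `crops` are the objects found by a first-stage detector (any type).
    `scorers` maps product name -> function(crop) -> confidence in [0, 1].
    Objects that no product model scores at least `min_score` are skipped.
    """
    found = []
    for crop in crops:
        best_name, best_score = None, min_score
        for name, score_fn in scorers.items():
            score = score_fn(crop)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is not None:
            found.append(best_name)
    return found


# Stub scorers standing in for per-product models (illustration only).
scorers = {
    "Koala Biscuits": lambda crop: 0.9 if crop == "box" else 0.1,
    "Dornfelder Wine": lambda crop: 0.8 if crop == "bottle" else 0.2,
}
print(identify(["box", "bottle", "unknown"], scorers))
# ['Koala Biscuits', 'Dornfelder Wine']
```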

To solve the problem of overlapping products, several cameras could be installed at different angles. Recognition could also be improved by training the model with significantly more images, including images of a very chaotic inventory and images with more artefacts.
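How the results of several cameras would be combined is an open question. One simple option, shown here as an assumption only, is to take the highest count seen by any camera for each product. This rests on the idea that an occluded product is missed by some views rather than counted twice.

```python
def fuse_views(views):
    """Merge per-camera product counts by taking the maximum per product.

    `views` is a list of dicts (one per camera) mapping product -> count.
    """
    fused = {}
    for view in views:
        for product, count in view.items():
            fused[product] = max(fused.get(product, 0), count)
    return fused


top_camera = {"Koala Biscuits": 1}                         # wine hidden from above
side_camera = {"Koala Biscuits": 1, "Dornfelder Wine": 1}
print(fuse_views([top_camera, side_camera]))
# {'Koala Biscuits': 1, 'Dornfelder Wine': 1}
```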