There’s been a lot of talk in retail circles recently about Amazon’s announcement of Amazon Go. For those of you who haven’t heard about it yet, Amazon Go is a grab-and-go store that promises to bring an Uber-like experience to shopping.
As someone focused on retail technology, the question I get asked most often about Amazon Go is, “How does it actually work?” Amazon has been very cagey about it, and all of its communication has been very high-level, with a few buzzwords thrown in for good measure.
So, when I break down the shopping experience, there are four distinct elements to the problem:
- You need to know when a shopper enters the store
- You need to know who each shopper is
- You need to know which items they pick up, and lastly
- You need to know when the shopper exits the store
Broadly, there are two categories of technology available to solve this type of problem: one is video, and the other is wireless, which basically means Bluetooth beacons and RFID.
Let’s take the first category: video. With video, you can apply several computer vision techniques to identify both people and objects, and track their movement in real-time.
In theory, video alone can solve all four elements of the problem above: you can tell that a person entered the store, use face recognition to identify who they are, use object recognition to identify items as they’re being picked up, and tell when a shopper is exiting the store.
However, I don’t believe video and computer vision alone are being used to create the grab-and-go experience. Why is that? Because although computer vision these days is very accurate, it’s not 100 percent accurate. For example, face recognition can get thrown off if you grow a beard or wear shades; item recognition can fail if you pick up an item in a way that blocks a clear view of it; and so on. And the Amazon Go concept is a use case where you pretty much need 100 percent accuracy: no one wants to see items on their receipt they didn’t buy, and I’m sure Amazon doesn’t want to give away items for free.
So how do you solve this? You solve it by adding more sensor inputs, each of which could have its own unique drawbacks just like video, and then combining those various inputs to get the overall accuracy to 100 percent. A good example of sensor fusion is RetailNext’s own Aurora, which combines video, Wi-Fi and Bluetooth in a single device, and leading-edge analytics platforms like RetailNext’s SaaS offering easily integrate RFID technologies from leading companies like Intel and Impinj.
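To see why combining sensors helps, here’s a back-of-the-envelope calculation in Python. The accuracy figures are made up purely for illustration, and the assumption that sensors fail independently is an idealization; real fusion systems are more sophisticated than this.

```python
# Toy illustration of sensor fusion: if sensors fail independently,
# the chance that ALL of them miss a reading shrinks multiplicatively.
# The accuracy numbers below are hypothetical, not real vendor figures.

def fused_accuracy(sensor_accuracies):
    """Probability that at least one sensor gets the reading right,
    assuming each sensor's errors are independent of the others'."""
    p_all_fail = 1.0
    for acc in sensor_accuracies:
        p_all_fail *= (1.0 - acc)
    return 1.0 - p_all_fail

video_only = fused_accuracy([0.98])            # one sensor: 98%
video_plus_rfid = fused_accuracy([0.98, 0.99])  # two sensors: 99.98%
print(video_only, video_plus_rfid)
```

Two sensors that are individually “pretty good” (98% and 99% here) together miss only when both miss at once, which under the independence assumption happens 0.02 × 0.01 = 0.02 percent of the time.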
For example, you can use Bluetooth beacons to know when a shopper enters or exits a store, identify that shopper, as well as track their movement through the store. You can use RFID to tag items and very reliably scan them in bulk as they exit a store. And so on.
So, that’s my guess. I think the items are tagged with inexpensive passive RFID chips and scanned in bulk as the shopper exits the store. The item recognition data from RFID is likely being supplemented by real-time object recognition data from video, so that you have two independent sources of data.
Once you have the items, who’s the shopper? It seems that, at least for the moment, shopper identification is done in a relatively low-tech way: scanning a QR code generated by the smartphone app as you enter the store. Once you “check in” this way, you can be tracked continuously via video, so any items can be associated back to you, either as you shop, via video, or as you exit the store, via RFID. And, of course, you can supplement this data as well using beacons.
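To make that flow concrete, here’s a toy sketch in Python of how a check-in token, video-attributed pickups, and an RFID exit scan might tie together. Every name in it (`Store`, `check_in`, the tag strings, and so on) is my own invention for illustration, not anything Amazon has disclosed.

```python
# Hypothetical sketch of the check-in / association flow described above.

class Store:
    def __init__(self):
        self.sessions = {}  # shopper_id -> set of item tags seen on video

    def check_in(self, qr_token):
        # The QR code from the app identifies the shopper at the entrance;
        # in reality the token would be decoded and validated server-side.
        shopper_id = qr_token
        self.sessions[shopper_id] = set()
        return shopper_id

    def video_pickup(self, shopper_id, item_tag):
        # Object recognition attributes a pickup to a video-tracked shopper.
        self.sessions[shopper_id].add(item_tag)

    def rfid_exit_scan(self, shopper_id, scanned_tags):
        # The bulk RFID scan at the exit cross-checks the video basket,
        # giving two independent readings of the same purchase.
        basket = self.sessions.pop(shopper_id)
        return basket == set(scanned_tags), basket

store = Store()
shopper = store.check_in("app-qr-token-123")
store.video_pickup(shopper, "rfid-tag-001")
store.video_pickup(shopper, "rfid-tag-002")
ok, basket = store.rfid_exit_scan(shopper, ["rfid-tag-001", "rfid-tag-002"])
print(ok)  # True: the video basket and the RFID exit scan agree
```

When the two readings disagree, that’s exactly where a third input like beacons, or a human review, would come in.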
That’s my opinion at the moment. The store opens to the public in Q1 2017, so very soon we’ll all be able to see first-hand exactly how it works. I know a lot of people are certainly looking forward to it.
Join the #retail #smartstore & #inspiringretail conversations on Twitter @RetailNext, as well as at www.facebook.com/retailnext.