With explicit feedback, AI needs less data than you think

We’ve all come to realize that AI and machine learning are the magic sauce that powers large-scale consumer internet properties. Facebook, Amazon and Instacart have huge data sets and huge numbers of users. Common wisdom suggests that this scale is a powerful competitive moat: it enables much better personalization, better recommendations and, ultimately, a better user experience. In this article I will show you that this moat is shallower than it appears, and that alternative approaches to personalization can deliver excellent results without relying on billions of data points.

Most of today’s user data comes from implicit behavior

How do Instagram and TikTok understand what you like and don’t like? Of course there are explicit cues – likes and comments. But the vast majority of your interactions are implicit: scrolling behavior, “read more” clicks and video interactions. Users consume much more content than they produce, so social media platforms rely mainly on those implicit cues to determine what you liked and disliked. Did you unmute that Instagram video and watch it for a whopping 30 seconds? Instagram can deduce that you are interested. Scrolled past to skip? Okay, not so much.

Here’s an important question, though: does Instagram know why you unmuted that cat-on-a-motorcycle video? Of course it doesn’t – it observed the behavior but not the why behind it. You may have seen a familiar face in the first frame and wanted to see more. Or maybe you like motorcycles. Or cats. Or you clicked accidentally. The platform can’t know, because neither the structure of the user experience nor customer expectations leave room to ask. As such, to figure out whether it was the cats, the motorcycles or something completely unrelated, it needs to observe a lot more of your behavior. It will show you motorcycle videos and cat videos separately, and that can boost its confidence a little more.

To add to this problem, the platform doesn’t just detect “cats” and “motorcycles” in this video – there are dozens if not hundreds of features that could explain why you were interested. Without a taxonomy that properly defines the space, the platform has to fall back on a deep learning approach that needs no taxonomy (i.e., no feature definition) but requires orders of magnitude more data.
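To make the ambiguity concrete, here is a toy Bayesian update in Python. It is only a sketch: the four candidate reasons come from the cat-on-a-motorcycle example, but every probability below is a made-up assumption chosen for illustration, not a measurement from any real platform.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of candidate reasons."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Candidate explanations for unmuting the video, equally likely at first.
prior = {"cats": 0.25, "motorcycles": 0.25, "familiar face": 0.25, "accident": 0.25}

# Assumed P(signal | reason). An unmute is almost equally likely under
# every reason, so it barely separates them.
implicit_unmute = {"cats": 0.5, "motorcycles": 0.5, "familiar face": 0.5, "accident": 0.3}

# Assumed P(user answers "I like motorcycles" | reason). A direct answer
# is far more likely under one reason than under the others.
explicit_answer = {"cats": 0.05, "motorcycles": 0.9, "familiar face": 0.05, "accident": 0.05}

after_implicit = posterior(prior, implicit_unmute)
after_explicit = posterior(prior, explicit_answer)

print(round(after_implicit["motorcycles"], 2))  # 0.28
print(round(after_explicit["motorcycles"], 2))  # 0.86
```

Under these assumed numbers, one implicit signal moves the belief in “motorcycles” from 0.25 to about 0.28, while one explicit answer moves it to about 0.86. That gap is why implicit inference needs so many more observations.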

Promoting Human-Computer Interaction

You can see how vulnerable and data-hungry this approach is – all because it’s based on implicit behavioral inference.

Let’s evaluate an alternative approach to understanding user intent with an analogy. Imagine a social interaction where Person A shows the same video to Person B. If Person B just says “that’s great,” can A infer a lot about B’s preferences? Not much. What if A digs in instead with “What did you think?” Much can be inferred from the answer to this question.

How can this interaction be translated to the world of human-computer interactions?

Explicit feedback: Ask the user!

Let’s take a look at rideshare. An important requirement in that business is guaranteeing the quality of the drivers; a driver who creates a bad rider experience must be removed from the system quickly, or else they can be quite damaging to the business. This resulted in a very simple model: Uber asked the user to rate the driver after each ride. A rating below 4.6 got the driver expelled from the Uber system.

And yet hiring and onboarding drivers is an expensive undertaking. With bonuses as high as $1,000 for a new Uber driver, firing drivers for problems they could have easily fixed is pretty inefficient.

In a model based on a one- to five-star rating, a driver is either “basically perfect” or “eventually fired.” This lack of nuance is bad for business. What if a driver commits a highly fixable offense, such as regularly eating in his car, so that the car stinks for a few hours after lunch? If only there were a way for riders to indicate that in their feedback and for the unwitting driver to learn about it…

This is exactly what Uber was aiming for in the second iteration of its feedback system. Whenever a rider rates a trip four stars or lower, they are required to select a reason from a drop-down list. One of those reasons is “car smell.” If just a handful of riders (out of the dozens of rides a driver gives!) provide explicit feedback on car odor, the driver can be notified and solve the problem.

What are the key features of this much more efficient approach?

Defined taxonomy: Uber’s rider experience specialists have defined several dimensions of the rider experience. What are the reasons a rider may be unhappy after a ride? Car smell is one; there are half a dozen others. This precise definition is possible because the problem space is limited and well understood by Uber. These reasons are not relevant to food delivery or YouTube videos. Asking the right questions is key.
Explicitly asking the user for the WHY behind the feedback: Uber can’t guess why you rated the ride with one star – was it because of the peeling paint on the car or because the driver was rude? Unlike Instagram, which would only throw more data at the problem, Uber can’t expose a few dozen customers to a bad driver, so data volume limitations force them to be smart.
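The two features above can be sketched in a few lines of Python. This is a hypothetical illustration, not Uber’s actual system: apart from “car smell,” the reason names are invented, and both the reason-required cutoff (four stars, as described above) and the notification threshold (three reports, standing in for “a handful”) are assumptions.

```python
from collections import Counter
from enum import Enum


class LowRatingReason(Enum):
    # Hypothetical taxonomy; only "car smell" is named in the article.
    CAR_SMELL = "car smell"
    RUDE_DRIVER = "rude driver"
    UNSAFE_DRIVING = "unsafe driving"


# Assumed values for illustration only.
REASON_REQUIRED_AT_OR_BELOW = 4  # stars at or below which a reason is collected
NOTIFY_AFTER_REPORTS = 3         # stand-in for "a handful" of riders


def reasons_to_flag(ratings):
    """Return the reasons reported often enough to notify the driver.

    `ratings` is a list of (stars, reason) tuples; `reason` is None when
    the rider gave a high rating and was not asked for one.
    """
    counts = Counter(
        reason
        for stars, reason in ratings
        if stars <= REASON_REQUIRED_AT_OR_BELOW and reason is not None
    )
    return [reason for reason, n in counts.items() if n >= NOTIFY_AFTER_REPORTS]


# Usage: 24 rides, three of which mention car smell.
ratings = (
    [(5, None)] * 20
    + [(4, LowRatingReason.CAR_SMELL)] * 3
    + [(3, LowRatingReason.RUDE_DRIVER)]
)
flagged = reasons_to_flag(ratings)
print([reason.value for reason in flagged])  # ['car smell']
```

Three labeled reports out of 24 rides are enough to act on here, because each report already names the dimension of the experience that went wrong.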

There are wonderful examples in domains other than rideshare.

Hotels.com asks about your experience shortly after you check in. It’s a simple email survey. Once you click “great” they ask “What did you like?” with options like ‘friendly staff’ and ‘sparkling clean room’.

Hungryroot, the company I work for, asks users about their dietary preferences during sign-up to make healthy eating easy. Do you want to eat more vegetables? Do you like spicy food? Prefer gluten-free? Great, tell us in advance. Your grocery and recipe recommendations are based on what you’ve told us.
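As a rough sketch of how stated preferences can drive recommendations directly, consider the Python below. It is not Hungryroot’s implementation: the `Preferences` fields mirror the three questions quoted above, while the recipe names, the tags and the scoring rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Preferences:
    # Hypothetical fields mirroring the three questions quoted above.
    more_vegetables: bool = False
    likes_spicy: bool = False
    gluten_free: bool = False


@dataclass
class Recipe:
    name: str
    tags: set = field(default_factory=set)


def recommend(recipes, prefs):
    """Drop recipes that break a hard constraint, then rank the rest."""
    # Assumption: gluten-free is a hard filter, the other answers are boosts.
    allowed = [
        r for r in recipes
        if not prefs.gluten_free or "gluten-free" in r.tags
    ]

    def score(recipe):
        points = 0
        if prefs.more_vegetables and "vegetable-forward" in recipe.tags:
            points += 1
        if prefs.likes_spicy and "spicy" in recipe.tags:
            points += 1
        return points

    # sorted() is stable, so ties keep their original order.
    return sorted(allowed, key=score, reverse=True)


recipes = [
    Recipe("Pasta bake", {"spicy"}),
    Recipe("Rice bowl", {"gluten-free"}),
    Recipe("Veggie curry", {"gluten-free", "spicy", "vegetable-forward"}),
]
prefs = Preferences(more_vegetables=True, likes_spicy=True, gluten_free=True)
names = [r.name for r in recommend(recipes, prefs)]
print(names)  # ['Veggie curry', 'Rice bowl']
```

Three answers given once are enough to produce a usable ranking for a brand-new user, with no browsing history at all.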

This approach is much more effective. It requires less data, and the inference driven by each data point can be much stronger. It also doesn’t require creepily observing what the user clicks or scrolls past – the kind of snooping that tech giants have gotten into trouble for.

There is an important trade-off to note here. Implicit feedback mechanisms require no user effort at all; on the other hand, going too far when asking the user for explicit feedback can lead to annoyance. Imagine Uber overdoing it with the follow-up questions: “What exactly was the smell in the car? Did that smell bother you for all or part of the ride? Was it a strong smell?” This goes from helpful and caring to annoying and would definitely backfire. There is certainly a sweet spot to be found.

Moats built on implicit user data are pretty shallow

Don’t be afraid of an incumbent with an implicit data advantage. Build a taxonomy of your space and ask users for explicit feedback. Your users will appreciate it – and so will your bottom line.

Alex Weinstein is the chief digital officer at Hungryroot. Previously, he was Senior Vice President of Growth at Grubhub. Alex has a degree in computer science from UCLA.
