Data is Holding Back AI
I remember grumbling, “Good lord this is a waste of time,” in 1992 while I was working on an AI application for lip-reading.
The grumble escaped my lips because I felt like I was spending half my time feeding data cleanly into the video-processing neural network. Bouncing from a video capture device to a DEC workstation to a Convex supercomputer to a Cray, I felt like I had been thrown into a cauldron of Chinese water torture.
Sitting over my head was a joke happy birthday poster from Arthur C. Clarke's Space Odyssey series featuring HAL 9000. I found it ironic that I was essentially acting like a highly trained monkey while a fictional AI stared down at me, laughing. Over the two years of that AI project, I easily spent 60% of my time just getting the data captured, cleaned, imported and into a place where the training system could use it. AI, as practitioners know, is the purest example of garbage in, garbage out. The worst part is that sometimes you don't realize it until your AI answers "anvil" when you ask it what someone's favorite food is.
Last month, I was having a conversation with the CEO of a well-respected AI startup when I was struck by déjà vu. He said, "I swear, we have spent at least half of our funding on data management." I wondered whether this could actually be the case, so I pushed him, probing with questions on automation, data quality and scaling. His answers all sounded remarkably familiar. Over the next two weeks, I contacted a few other AI startup executives (my criteria were that they had raised at least $10 million in funding and had a product in the market), and their answers were all strikingly similar.