DARPA scientists want to create database of all conversations

4 Mar, 2013 18:40

Your digital footprint could be getting a whole lot bigger: Pentagon scientists are searching for a way to transcribe every real-world conversation that happens into computer-readable files.

Robert Beckhusen of Wired’s Danger Room says it wouldn’t be unlike a real-life Twitter feed or an “email archive for everyday speak.”

“Imagine living in a world where every errant utterance you make is preserved together,” Beckhusen writes in an article this week that explores a Defense Department project undertaken by its Darpa laboratories and now in the hands of a University of Texas computer scientist named Matt Lease.

Lease has received a few hundred thousand dollars from Darpa, the US military’s Defense Advanced Research Projects Agency, to help find a way to take cell phone conversations, boardroom meetings and every minuscule real-world back-and-forth and have them digitized.

The project is being called “Blending Crowdsourcing with Automation for Fast, Cheap and Accurate Analysis of Spontaneous Speech,” and Lease will receive $300,000 in all from the government to work on it after winning a Darpa Young Faculty Award in 2012.

Lease previously worked with Pentagon scientists on another project, Effective Affordable Reusable Speech-to-text, or EARS, which had him trying to find a better way to transcribe dialogue into text. Now, having won Darpa’s respect, he is putting that research to work in hopes of finding a way to turn all real-world conversations into digital transcripts. By strategically crowdsourcing the work, he thinks he might be able to do just that.

“Like other AI [artificial intelligence], it can only go so far, which is based on what the state-of-the-art methodology can do,” Lease tells Wired. “So what was exciting to me is thinking about going back to some of that work and now taking advantage of crowdsourcing and applying that into the mix.”

Lease says he saw both the “need and opportunity to really make conversational speech more accessible, more part of our permanent record instead of being so ephemeral, and really trying to imagine what this world would look like if we really could capture all these conversations and make use of them effectively going forward.”

Wired reports that, in the end, conversations and events could be transcribed and edited through crowdsourcing, then easily shared with friends, family and colleagues. Once digitized, those dialogues could also be searched like any other text. Uploading everything, though, quickly raises concerns. For one, there is the matter of possible privacy violations brought on by the seemingly constant collection of data. Then, of course, there is the matter of what is being done with it.

According to a 2003 memo from the Congressional Research Service, the EARS project that first got Lease involved with the Pentagon was being considered for a rather particular kind of use. That report said that dialogue could be fed into the system from telephone conversations so that “the military, intelligence and law enforcement communities” could “extract clues about the identity of speakers.”

For now, Lease won’t speculate as to why the Pentagon wants him to develop his crowdsourcing project. He agrees, however, that there is an issue with “respecting the privacy rights of multiple people involved.”