Before machines can power truly capable virtual assistants, translation programs and other language-oriented technology (and before we welcome our new robot overlords), they’re going to have to actually learn English.
That’s where Google comes in. The company announced Thursday that it’s releasing new software that can parse written English, working out how the words in each sentence relate to one another, and that the software will be available to anyone for free.
Called Parsey McParseface (yup, that’s a play on the fact that the internet wanted to name a British research vessel Boaty McBoatface), Google’s software is part of a programming toolkit called SyntaxNet, which was also released for free Thursday. The move paves the way for developers to integrate language understanding into more software.
“Our hope is that people will just use this instead of building their own,” Dave Orr, SyntaxNet’s product manager, told the Wall Street Journal. “They don’t have to reinvent the wheel.”
Essentially, Parsey McParseface works by breaking down the structure of a sentence: it labels each word’s part of speech (noun, verb and so on), then determines the role each word plays, such as subject or object, and which other word it depends on. That structure can then be used to answer questions about the sentence.
According to Google, the software can also break down sentences more complicated than a simple “Alice saw Bob.” For example, it can analyze and answer questions about the sentence “Alice, who had been reading about SyntaxNet, saw Bob in the hallway yesterday.”
Google said that after reading this sentence, the software can tell people who Alice saw, when she saw him and what she had been reading about.
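To make the idea concrete, here is a minimal Python sketch of how a dependency parse, a tree in which every word points to the word it depends on, can be queried to answer the three questions above. The tree is hand-annotated rather than produced by Parsey McParseface, the Stanford-style labels (nsubj, dobj, tmod and so on) are an assumption chosen for illustration, and `Token`, `index_of` and `follow` are hypothetical helpers, not part of SyntaxNet.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int   # index of the word this one depends on; -1 marks the root
    label: str  # grammatical relation to that head


# Hand-annotated dependency parse (hypothetical; NOT produced by SyntaxNet),
# using Stanford-style labels such as nsubj (subject) and dobj (direct object).
SENTENCE = [
    Token("Alice", 9, "nsubj"),
    Token(",", 0, "punct"),
    Token("who", 5, "nsubj"),
    Token("had", 5, "aux"),
    Token("been", 5, "aux"),
    Token("reading", 0, "rcmod"),   # relative clause modifying "Alice"
    Token("about", 5, "prep"),
    Token("SyntaxNet", 6, "pobj"),
    Token(",", 0, "punct"),
    Token("saw", -1, "ROOT"),
    Token("Bob", 9, "dobj"),
    Token("in", 9, "prep"),
    Token("the", 13, "det"),
    Token("hallway", 11, "pobj"),
    Token("yesterday", 9, "tmod"),  # temporal modifier
    Token(".", 9, "punct"),
]


def index_of(tokens, word):
    """Position of the first token spelled `word`."""
    return next(i for i, t in enumerate(tokens) if t.word == word)


def follow(tokens, start_word, path):
    """Walk down the tree from `start_word`, following each label in `path`
    in turn, and return the words reached at the end."""
    frontier = [index_of(tokens, start_word)]
    for label in path:
        frontier = [
            i for i, t in enumerate(tokens)
            if t.head in frontier and t.label == label
        ]
    return [tokens[i].word for i in frontier]


print("Who did Alice see?", follow(SENTENCE, "saw", ["dobj"]))             # ['Bob']
print("When?", follow(SENTENCE, "saw", ["tmod"]))                          # ['yesterday']
print("Reading about?", follow(SENTENCE, "reading", ["prep", "pobj"]))     # ['SyntaxNet']
```

Once the structure is known, each question reduces to a short walk along labeled links; the hard part, which the parser handles, is building the correct tree in the first place.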
Parsing gets harder as sentences grow more complicated, though, because there may be more than one way to interpret what’s written. “Alice drove down the street in her car,” for example, most likely means that Alice is driving her car on a street. Yet, however unlikely, it could also mean Alice is driving on a street that is physically located inside her car. The difference comes down to whether the phrase “in her car” attaches to the verb “drove” or to the noun “street.”
Humans will generally be able to figure out that sentence with ease; machines, not so much.
“One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences, say 20 or 30 words in length, to have hundreds, thousands, or even tens of thousands of possible syntactic structures,” Google wrote in a blog post. “A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context.”
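Why do the alternatives pile up so fast? One rough way to see it, offered as an illustration rather than Google’s own analysis, is to count every way of grouping a sentence’s words into a binary tree while ignoring grammar entirely. That count follows the Catalan numbers, which grow roughly fourfold with each added word. The `bracketings` function below is a hypothetical helper: it tries every split point for the top of the tree and multiplies the options on each side.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bracketings(n_words):
    """Number of distinct binary trees whose leaves are n_words words in order."""
    if n_words < 1:
        raise ValueError("need at least one word")
    if n_words == 1:
        return 1
    # Choose how many words go in the left subtree of the top split, then
    # pair every possible left subtree with every possible right subtree.
    return sum(
        bracketings(left) * bracketings(n_words - left)
        for left in range(1, n_words)
    )


for n in (5, 10, 20):
    print(n, "words:", bracketings(n), "binary trees")
# 5 words: 14 binary trees
# 10 words: 4862 binary trees
# 20 words: 1767263190 binary trees
```

A real grammar rules out the vast majority of these trees, so this is only a loose upper bound; the point is that the space of candidates grows combinatorially with sentence length, which is why a parser has to search for the most plausible structure instead of listing every option.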
According to the company, Parsey McParseface correctly identifies about 94 percent of the relationships between individual words when analyzing “well-formed text,” close to the 96 to 97 percent rate at which trained human linguists agree. On sentences taken from the internet, the program is closer to 90 percent accurate.
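Parser accuracy figures are commonly computed per word rather than per sentence. The sketch below computes one such measure, the unlabeled attachment score: the share of words whose predicted head (the word it depends on) matches a reference parse. It is an illustration with hand-annotated heads, not Google’s evaluation code; `attachment_score` and the parser output are hypothetical, and the example assumes a parser that misreads “Alice drove down the street in her car” by attaching “in” to “street” instead of “drove.”

```python
def attachment_score(gold_heads, predicted_heads):
    """Fraction of words whose predicted head matches the reference head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("head lists must cover the same words")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


words = ["Alice", "drove", "down", "the", "street", "in", "her", "car"]
# Hand-annotated heads (index of each word's head, -1 for the root) for the
# reading where Alice is in her car: "in" (index 5) attaches to "drove" (1).
gold = [1, -1, 1, 4, 2, 1, 7, 5]
# Hypothetical parser output that attaches "in" to "street" (4) instead.
predicted = [1, -1, 1, 4, 2, 4, 7, 5]

score = attachment_score(gold, predicted)
print(f"{score:.1%} of words attached correctly")  # 87.5% of words attached correctly
print("Whole sentence correct?", gold == predicted)  # False
```

A single wrong attachment still leaves 87.5 percent of the words correct even though the sentence as a whole is misread, which is one reason per-word and per-sentence accuracy can differ sharply.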
“Our work is still cut out for us,” Google said in its announcement. “We would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts.”