DARPA developing ultimate web search engine to police the internet

12 Feb, 2014 22:35

The Pentagon’s research arm that fosters futuristic technology for the military will soon begin working to surpass the current abilities of commercial web search engines. Yet the agency says little about what comes next once it masters the “deep Web.”

The Defense Advanced Research Projects Agency (DARPA) said its “Memex” project will be able to search the far corners of internet content that modern, mainstream search engines cannot reach, offering DARPA “technological superiority in the area of content indexing and Web search on the Internet.”

DARPA said earlier this month in its solicitation announcement for Memex proposals that the system will initially be used to counter human trafficking, which often thrives in web forums, chat rooms, job postings, hidden services and other websites.

To root out trafficking operations within the invisible corners of the web, commonly referred to as the “deep web,” Memex (a melding of “memory” and “index”) “will address the inherent shortcomings of centralized search by developing technology for domain-specific indexing of Web content and domain-specific search capabilities.”

With Memex, DARPA hopes to enable decentralized, automated, topic-precise searches that can leverage image recognition and natural language technology.

DARPA has asked researchers to develop advanced web-crawler software to reach sites and resources that have sophisticated crawler defenses. Memex operators would then be able to access the indexed domain-relevant content with much greater precision and ease than is currently possible.

Memex, DARPA says, will be first employed against human trafficking, which, “especially for the commercial sex trade, is a line of business with significant Web presence to attract customers and is relevant to many types of military, law enforcement, and intelligence investigations.”

DARPA says that the dark places online where trafficking occurs enable “a growing industry of modern slavery” that can be stopped with Memex capabilities.

“An index curated for the counter trafficking domain, including labor and sex trafficking, along with configurable interfaces for search and analysis will enable a new opportunity for military, law enforcement, legal, and intelligence actions to be taken against trafficking enterprises,” DARPA’s solicitation announcement reads.

Yet while DARPA mentions the usefulness of such technology for law enforcement and investigative purposes regarding human trafficking – basically, crimes few are opposed to stopping – it does not address the myriad other uses Memex would offer the US military, government intelligence operations, or police actions.

Amid the recent disclosures of government spying via the National Security Agency’s operations, the topic of complete surveillance over the entirety of the web is a sore subject. Thus, DARPA says it is “specifically not interested in proposals for the following: attributing anonymous services, deanonymizing or attributing identity to servers or IP addresses, or gaining access to information which is not intended to be publicly available.”

How DARPA would catch traffickers without “deanonymizing” someone, though, the agency does not explain. Nor does it address how far it is willing to go in outing those who hide in the deep web for legitimate reasons, such as journalists, whistleblowers, and activists.

The Memex project takes its name from a 1945 article in The Atlantic titled “As We May Think,” by Dr. Vannevar Bush, head of the White House Office of Scientific Research and Development. Bush envisioned a “device” that could be used for finding and categorizing the world’s information, acting as a supplement for the human brain.

“In a nutshell, Bush wanted to mimic how the human brain thinks, learns, and remembers information,” writes Motherboard. “Which is exactly what artificial intelligence researchers at the DoD and in Silicon Valley are trying to do now, to glean better insights from the unruly army of big data being collected by web giants and the military alike.”

The Memex project is expected to run over the next three years, with proposals due in April.