Thursday, February 9, 2006

U.S. Spies plan massive data sweep of Internet

There's a current story about massive snooping into telephone conversations by the NSA. The NSA, CIA, and other spy agencies are supposed to direct their efforts at targets outside the U.S., but under Bush Administration edict they've been working inside the U.S., in violation of U.S. law. Yes, the President has been breaking the law.

The conduct of the U.S. Administration suggests that this is not a war on Terror, but instead a war on Personal Freedom.

Consider: "US plans massive data sweep: Little-known data-collection system could troll news, blogs, even e-mails. Will it go too far?" (Christian Science Monitor, February 09, 2006)

The article describes a little-known system called Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE), a research and development program within the Department of Homeland Security (DHS), part of its three-year-old "Threat and Vulnerability, Testing and Assessment" portfolio.

The project it describes is very similar to what search engine companies like Google, Yahoo, Technorati, etc. do: scoop up a vast amount of data from the Internet and draw extra information out of it. The technical phrase is "data mining", the practice of taking data collected for one purpose and analyzing it for another. Data mining is widely performed in businesses.

For example, credit card companies perform data mining to detect fraudulent use of credit cards. They might look for your card being used to make an abnormally large purchase, or a purchase made far from your normal area of activity. If they see it, they could give you a phone call saying "we noticed suspicious activity, did you make purchase X on date Y?"
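To make the credit card example concrete, here is a minimal sketch of those two checks. It is my own illustration, not how any real card issuer works: the `Purchase` record, the `suspicious_reasons` function, and the thresholds (5 times the average amount, 500 km from the average location) are all assumptions picked for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from statistics import mean


@dataclass
class Purchase:
    """Hypothetical purchase record: amount plus where it happened."""
    amount: float
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def suspicious_reasons(history, new, amount_factor=5.0, max_km=500.0):
    """Return the reasons (if any) a new purchase looks unusual.

    Both thresholds are made-up values for illustration only.
    """
    reasons = []

    # Check 1: abnormally large compared with the cardholder's usual spending.
    typical_amount = mean(p.amount for p in history)
    if new.amount > amount_factor * typical_amount:
        reasons.append("abnormally large purchase")

    # Check 2: far from the cardholder's normal area of activity,
    # approximated here as the average location of past purchases.
    home_lat = mean(p.lat for p in history)
    home_lon = mean(p.lon for p in history)
    if distance_km(home_lat, home_lon, new.lat, new.lon) > max_km:
        reasons.append("far from normal area of activity")

    return reasons


# A cardholder who normally shops around San Francisco.
history = [
    Purchase(20.0, 37.77, -122.42),
    Purchase(35.0, 37.78, -122.41),
    Purchase(50.0, 37.76, -122.43),
]
```

Calling `suspicious_reasons(history, Purchase(2000.0, 37.77, -122.42))` flags the large amount, while a $30 purchase in New York would be flagged for distance. Note that the original data (purchase records kept for billing) is being put to a second use, which is the whole point of data mining.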

So long as the ADVISE system is collecting publicly available data, is there a problem?

The difference between the government doing this and, e.g., Google doing it is this: the government is looking for "terrorists", and the government has people with guns who are known to use those guns to kill people.

The problem with the government's hunt for terrorists is that they've got a rather loose definition and they make mistakes. For example, in the Extraordinary Rendition stories, one victim was a German tourist whom U.S. agents identified as linked to Al Qaeda. They kidnapped him, flew him to Afghanistan, tortured him for months, eventually realized they had mistakenly identified him, and dropped him off penniless in Albania. As for loose definitions of terrorism, consider the people arrested for "ecoterrorism", whose protests against e.g. logging activities escalated to property damage and whatnot. Sure, committing property damage is illegal and they should be punished, but labeling them as terrorists is going too far.

In other words, I think it's legal to collect data that's publicly available (e.g. published on a web site) and to make secondary uses of it. If it's good enough for Google or Technorati, then it's good enough for the U.S. Government. But there needs to be oversight and measurement to ensure they don't overstep themselves.

For example, what if they made a deal with Google or other search engines to capture some of the private search query data which the search engines have on hand? That is, each time you make a search engine query, the company running your preferred search engine receives your IP address, your web browser, your operating system, etc., along with the query terms. If you've registered with the search engine (e.g. signed into your "mail" account) then the search engine can know exactly who you are.
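As a rough sketch of what one such logged query might look like, and how easily a pile of them turns into a per-person profile, consider the following. The `QueryRecord` fields and the `queries_by_identity` function are hypothetical names of my own; I don't know how any particular search engine actually stores this data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryRecord:
    """Hypothetical record of a single search request."""
    ip_address: str
    user_agent: str                 # reveals the web browser and operating system
    query: str
    account: Optional[str] = None   # set only if the user was signed in


def queries_by_identity(records):
    """Group query terms by the best identifier available for each request.

    A signed-in account identifies the person directly; otherwise the
    IP address is the fallback (a weaker, but still useful, identifier).
    """
    profiles = defaultdict(list)
    for record in records:
        key = record.account if record.account is not None else record.ip_address
        profiles[key].append(record.query)
    return dict(profiles)


# Made-up sample data using reserved documentation IP addresses.
records = [
    QueryRecord("192.0.2.10", "Firefox/1.5 (Windows XP)", "cheap flights", "alice@example.com"),
    QueryRecord("192.0.2.10", "Firefox/1.5 (Windows XP)", "how do nuclear bombs work", "alice@example.com"),
    QueryRecord("203.0.113.7", "Safari/2.0 (Mac OS X)", "terrorist attacks history"),
]
profiles = queries_by_identity(records)
```

Here `profiles["alice@example.com"]` holds both of that account's queries, and the anonymous visitor is still tracked under `"203.0.113.7"`.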

Now, that's private data which the search engine collects. One way they use it is to further tailor your search results based on past queries you've made. But what if they began handing that data over to the government, which the government would then incorporate into this ADVISE system?

Would you get a knock on the door just because you had a hankering to learn about terrorists and did a lot of Google searches about activities done by terrorists? Or because you wanted to see for yourself just how easy (or not) it is to get information on making nuclear bombs?

I want to close by reminding the reader of the Total Information Awareness system. TIA is/was a Department of Defense project that acted as an umbrella over several inter-related projects, some of which would use data mining techniques of the kind described in the CS Monitor article. While a couple of minor TIA projects were shut down, it's clear the bulk of them went forward, and that the intent of the Government for several years has been to create a technologically advanced system that can effectively track every action and look for "dangerous" patterns.

The TIA existed before the September 11, 2001 events which "changed everything". The TIA existed before the Bush administration. This is just an ongoing desire by government agencies to vastly step up their capabilities to spy on everyone.