Big Data and Big Brother: Can Security and Privacy Go Hand In Hand, or Does it Even Matter?

So it seems that all of those conspiracy theorists were right!  The US government IS spying on everyone.  Can you believe it?  They're taking all of your personal info and using it for their own purposes.  Unheard of, right?  The NSA currently has access to personal information, phone information, pictures that you post, what you're thinking, even what you ate for breakfast and what you do in your spare time.  Good thing there are no websites out there that collect that kind of stuff, or else it'd be available to the public too!  Can you imagine what would happen then?  All of your personal information could be sold to the highest bidder so they can target you in ad campaigns and call you in the middle of dinner.  

Wait, someone is frantically gesturing from across the room... he's pointing to his face and then to a book.  Someone must have done something to his coffee.

Anyhoo, the NSA was (is?) collecting user records for a program known as PRISM.  No, PRISM is not an acronym for a criminal group from a James Bond movie, nor is it a reference to the cover of Pink Floyd's Dark Side of the Moon.  PRISM is a clandestine surveillance program lovingly sponsored by the aforementioned spy agency in which three of the largest wireless providers in the US, as well as Google, Facebook, Hotmail, Yahoo, Apple, Skype, Paltalk, AOL, and YouTube are participating (although the extent of that participation is classified and if they told me, they'd probably have to kill me).  In short, the NSA demanded and was granted unfettered access to any and all communications from users of these various providers.  We could get subjective here and talk about privacy and civil liberties and all that fun stuff, but Big Data itself is objective and only cares about the numbers.  So for the purpose of this piece, so do we.

Let's think about this for a moment.  The amount of data being analyzed by the spy agency is absolutely astounding.  According to the CTIA, a non-profit trade association, 2.3 trillion wireless minutes were used in 2012, 2.19 trillion text messages were sent, and there were 301,779 cell sites.  Facebook is almost as populous as India and China, with one billion users.  These users upload hundreds of millions of pictures PER DAY.  That's a lot of pictures of babies, food, and mirror self-shots.  On YouTube, Gangnam Style by Psy alone has over 1.6 billion views.  Now imagine every other video uploaded, plus who viewed them, plus who liked them, plus who disliked them, plus the people who made comments disparaging those who disliked them.  That's a lot.  And how about Gmail?  At the time this article was written, there were 425 million active Gmail users.  I don't know about you, but I have 7,422 emails in my account, not counting sent mail.  I'll be conservative and say each person has an average of 5,000 emails in their account.  That puts us at a ridiculous 2.125 trillion emails.  We can probably round up a bit since there are more users by now, as well as a higher average number of emails per user.  As of July 2011, Hotmail had 360 million users.  700 million people visit Yahoo sites every month.  You get the idea.
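For anyone who wants to check the Gmail math, here is a quick back-of-envelope sketch in Python.  The two inputs are the figures quoted above; the 5,000-per-user average is my own guess rather than a measured number, and the function name is just something made up for illustration.

```python
# Back-of-envelope estimate of stored Gmail messages.
# Both inputs come from the article; the per-user average is an assumption.
GMAIL_ACTIVE_USERS = 425_000_000  # active users quoted in the article
AVG_EMAILS_PER_USER = 5_000       # deliberately conservative guess


def estimate_total_emails(users: int, avg_per_user: int) -> int:
    """Multiply the user count by the assumed average mailbox size."""
    return users * avg_per_user


total = estimate_total_emails(GMAIL_ACTIVE_USERS, AVG_EMAILS_PER_USER)
print(f"{total:,} emails (~{total / 1e12:.3f} trillion)")
# prints "2,125,000,000,000 emails (~2.125 trillion)"
```

Swap in a bigger user count or a bigger average and the total only goes up from there.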

Keep in mind, we're only talking about this particular program.  PRISM was only meant to "fill in the blanks" of other programs.  There are other mechanisms in place collecting phone conversations and other types of communications, and many of those programs have been in place since 9/11.  That's 12 years' worth of semi-structured and unstructured data being collected and processed.  Also keep in mind that most of the world's internet infrastructure is in the good ol' US of A.  This means that not only is American data being collected, but a ton of data coming into the US from other countries is being scrutinized as well.

So how in the Wide Wide World of Sports can the NSA, or anyone else for that matter, process this data and get potentially life-saving information out of it without waiting days for the reports to run?  The short answer is that they are probably using technologies like Hadoop, with data centers the size of a Walmart.
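To give a rough feel for why something like Hadoop helps, here is a toy, single-machine sketch of the map/shuffle/reduce pattern that Hadoop is built around.  This is not Hadoop's actual API and certainly not a description of what the NSA runs; the records, names, and functions below are all hypothetical and exist only to show the shape of the idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical message metadata: (sender, recipient) pairs.
RECORDS: List[Tuple[str, str]] = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "alice"),
    ("alice", "bob"),
]


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit a (sender, 1) pair for every message record."""
    for sender, _recipient in records:
        yield sender, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key to get messages sent per sender."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle(map_phase(RECORDS)))
print(counts)  # prints {'alice': 3, 'bob': 1}
```

In a real cluster the map and reduce steps run in parallel across many machines, each working on its own slice of the data, which is what keeps those reports from taking days.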

Another question, no more or less important: even without this revelation that a Big Brother is indeed watching, should anyone be surprised?  We volunteer our information to other Big Brothers on a daily basis.  So maybe what's upsetting isn't that they have all of our information (although it is somewhat disconcerting that anything we've ever written is searchable).  Maybe it's just that we didn't sign the T's and C's before they took it.