My colleague Bill Wooten
recently wrote about the
value of privacy and shared his thoughts on the book “Privacy
and Big Data” by Terence Craig and Mary E. Ludloff. I also read the book and was both fascinated
and disturbed by how much potentially revealing data is readily available to
consume. As a data professional, I get
excited about combining data in new ways and delivering insight that was previously not available. As a consumer
and user of technology, I find it disturbing to know how my information is being
used.
Take a site like Spokeo.
They are doing some amazing things with publicly available information,
combining and presenting it in ways that no one had previously imagined. Within just a few clicks I can see every
address I’ve lived at since college plotted on a Google map. I can see the demographic profile of each of
those areas, how much I paid for my houses, my age, phone numbers and various
email addresses. Spokeo can even connect
me to other people I am related to and present me with a family tree. Not only that, but they know all of the
social networks I have joined and can display my user name and public posts.
All this information is available just from searching on my name, email address
or phone number (and only one is required).
After getting over the initial shock of seeing all of my personal
data pulled together in this single, easy-to-digest view, I started to wonder
where it all came from and who else might be using it.
Where does it come from?
As it turns out, there are plenty of data marketplaces where
large data sets can be purchased or even downloaded for free. Data.gov provides access to a wide range of publicly
available government data, ranging from census findings to crime statistics. One of the things Spokeo is doing is a simple
exercise of tying address information to census demographic information and
real estate tax data. Taken together, my age,
address, the demographics of the area I live in and how much I paid for my house
become a powerful marketing tool that has real value to a variety of
companies.
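To make that "simple exercise" concrete, here is a minimal sketch of how such a join might look. I have no visibility into how Spokeo actually does it; the records, field names and the `build_profile` helper are all invented for illustration, and I am assuming the data sets share a ZIP code and a street address as join keys.

```python
# All records below are invented for illustration; no real data source is queried.
people = [
    {"name": "A. Example", "age": 41, "address": "12 Oak St", "zip": "00001"},
    {"name": "B. Sample", "age": 29, "address": "9 Elm Ave", "zip": "00002"},
]

# Hypothetical census-style demographics, keyed by ZIP code.
demographics = {
    "00001": {"median_income": 95000, "median_age": 44},
    "00002": {"median_income": 48000, "median_age": 31},
}

# Hypothetical real estate tax records, keyed by street address.
tax_records = {
    "12 Oak St": {"purchase_price": 410000},
    "9 Elm Ave": {"purchase_price": 155000},
}


def build_profile(person, demographics, tax_records):
    """Merge one person's record with area demographics and property data."""
    profile = dict(person)  # copy so the input record is not modified
    # A missing key contributes nothing, which behaves like a left join.
    profile.update(demographics.get(person["zip"], {}))
    profile.update(tax_records.get(person["address"], {}))
    return profile


profiles = [build_profile(p, demographics, tax_records) for p in people]
for profile in profiles:
    print(profile)
```

The point is how little work it takes: once two data sets share a key, a few lines of code turn separate public records into a single profile.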
Infochimps is another
data marketplace where a variety of datasets can be purchased. One of the most interesting data
sets available for purchase is data from the dating site okcupid.com.
All 28 personality questions answered by users are available in the data
set, along with each user’s gender, age, state and metro area. Marry this data set with the data found on
Spokeo and all of a sudden people know more about you than you ever thought possible. If you are wondering what OkCupid is doing
with all this data (beyond selling it), check out this article
for some insight.
Add in Geographic Data
Now that we know everything about you from easily procured
data, let’s take the next logical step and learn about your geographic location
and patterns. We learned from the Spokeo search that you have a Twitter and/or
Flickr account. We can take that information and plug your user IDs into the Creepy application.
Creepy will then plot the location of every tweet and geo-tagged photo you have shared
via these services on a map. As you can
see in the example below, patterns quickly emerge.

Since we already know where you live, now we can see what
time of day you are most likely to post in that location vs. other locations
and can do things like predict when you are most likely to be out of the house.
We also likely know where you work from your LinkedIn profile and will be able
to see when you are tweeting from the office.
If we don’t know where you work, we can make a very educated guess
from the time and location of your tweets.
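The time-of-day analysis described above could be sketched roughly as follows. This is not how Creepy works internally; it is just one way to do it, assuming we already have a list of (timestamp, latitude, longitude) tuples and a known home location. The coordinates, timestamps, the 0.5 km radius and the function names are all made up for the example.

```python
import math
from collections import Counter
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def hours_by_place(posts, home, radius_km=0.5):
    """Count posts per hour of day, split into 'near home' and 'elsewhere'."""
    at_home, away = Counter(), Counter()
    for timestamp, lat, lon in posts:
        hour = datetime.fromisoformat(timestamp).hour
        if haversine_km(lat, lon, home[0], home[1]) <= radius_km:
            at_home[hour] += 1
        else:
            away[hour] += 1
    return at_home, away


# Invented sample data: a home location and a handful of geotagged posts.
HOME = (40.0, -75.0)
POSTS = [
    ("2012-03-05T07:15:00", 40.0001, -75.0002),
    ("2012-03-05T12:30:00", 40.0500, -75.1000),
    ("2012-03-05T21:45:00", 40.0002, -75.0001),
    ("2012-03-06T12:10:00", 40.0501, -75.1001),
    ("2012-03-06T22:05:00", 40.0000, -75.0000),
]

at_home, away = hours_by_place(POSTS, HOME)
# Hours where every observed post came from somewhere other than home.
likely_away_hours = sorted(h for h in away if at_home[h] == 0)
print("Posts at home by hour:", dict(at_home))
print("Hours likely out of the house:", likely_away_hours)
```

With a real account there would be hundreds of posts instead of five, and the hourly pattern would be correspondingly harder to dismiss as coincidence.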
What Else?
Now that we know this data is available for the
taking, just think about how it could be enriched with our personal information
that is stored behind corporate firewalls.
What happens when a bank ties banking information to this data set? If they know I live in an expensive neighborhood,
will they market to me differently? What
about when my health insurer knows I have been traveling abroad to a country
with an outbreak of a serious disease? Will my premiums be affected?
Ever wonder what clients your competitors are doing business with? How about taking that nice company-managed Twitter
list that identifies all their employees’ Twitter handles? Using Creepy, you can map the
tweet locations of each employee, filter out the non-working times and you’ll
see obvious clusters of tweets occurring around certain geographic
locations. It isn’t a stretch to assume
those locations are where people are working.
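The filter-and-cluster step is simple enough to sketch in a few lines. This is only an illustration under my own assumptions: working hours are taken to be weekdays from 9 to 5, "clusters" are found by rounding coordinates to two decimal places (cells of roughly a kilometer), and the sample posts and the `work_hour_clusters` function are invented.

```python
from collections import Counter
from datetime import datetime


def work_hour_clusters(posts, start_hour=9, end_hour=17, min_posts=2):
    """Group weekday, working-hours posts into coarse grid cells.

    Returns (cell, count) pairs for cells with at least min_posts posts,
    most frequent first. A cell is the (lat, lon) pair rounded to 2 decimals.
    """
    cells = Counter()
    for timestamp, lat, lon in posts:
        when = datetime.fromisoformat(timestamp)
        is_weekend = when.weekday() >= 5  # Monday is 0, Saturday is 5
        if is_weekend or not (start_hour <= when.hour < end_hour):
            continue  # filter out the non-working times
        cells[(round(lat, 2), round(lon, 2))] += 1
    return [(cell, n) for cell, n in cells.most_common() if n >= min_posts]


# Invented posts pooled from several hypothetical employees.
POSTS = [
    ("2012-03-05T10:15:00", 40.0512, -75.1003),  # Monday morning
    ("2012-03-05T14:40:00", 40.0508, -75.0998),  # Monday afternoon
    ("2012-03-06T11:05:00", 40.0511, -75.1001),  # Tuesday morning
    ("2012-03-06T20:30:00", 40.0001, -75.0002),  # Tuesday evening, filtered out
    ("2012-03-07T13:20:00", 40.2004, -75.3002),  # Wednesday, isolated post
    ("2012-03-10T12:00:00", 40.0510, -75.1000),  # Saturday, filtered out
]

clusters = work_hour_clusters(POSTS)
for cell, count in clusters:
    print(f"{count} working-hours posts near {cell}")
```

A fixed grid is the crudest possible clustering, but it is enough to show that repeated working-hours posts from the same small area stand out immediately.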
Worst of all, what about a military officer posting a picture
to share with his or her family from a secret location in unfriendly
territory? The geotag information that is
posted with the photo has just revealed the officer’s exact location to the enemy.
Now What?
Knowing how our data is being used is the most important
thing we can do. Once we understand, we can
make informed decisions on what we share, where we share it from and when we choose
to share it. Unfortunately, it isn’t
always clear what happens to our data after we share it, and we can’t assume it will
always be used with the best intentions.