The Age of Big Data: Big Gain or Big Pain?
The age of ‘Big Data’ is now upon us. Big data is everywhere: data about us, data about things, and data about our relationship with things. IBM estimates that 90% of the data generated in our entire history has been created in the last two years, and that we are adding approximately 2.5 quintillion bytes (2.5 billion GB) of data to the heap every day. The magnitude of this data is incomprehensible, and if you’re aware of the environmental implications of running the power-hungry servers required to store and process such volumes of data (if not, check out my post on Cloud Computing), then you’ll no doubt be wondering: do the benefits of big data outweigh the costs?
There are certainly a lot of people out there who are desperate to get their hands on these vast quantities of data. From economists to physicists, computer scientists to sociologists, academics to entrepreneurs, big data has become one of their most valuable resources for knowledge discovery.
What’s the Big Deal?
What makes data ‘Big Data’? Is there a difference between big data and other large data sets? Confusingly, size is not the defining characteristic of big data. Its definition is based on its capacity to be searched and aggregated, and its ability to be cross-referenced against other large data sets in the hunt for patterns and variations. These patterns and variations can help us both to understand what has already happened within a particular context and to use that understanding to make predictions about what might happen in the future.
Humankind has been looking for patterns in data for a long time. The man regarded as the first data analyst was John Graunt, a haberdasher by trade, who lived in London in the 17th century when the bubonic plague was still rife. He searched for patterns in a data set known as the ‘Bills of Mortality’, which consisted of death records spanning over 100 years. He organised and made sense of the information, recording in tabular form all causes of death and the number of people who died from each. By studying the data, he observed patterns from which he was able to glean a range of insights, including evidence that the plague was not transmitted through bodily contact, as was previously thought, and information that allowed him to determine the probabilities of survival to each age, which he recorded in the first life table.
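For the curious, the idea behind a life table can be sketched in a few lines of Python. This is only an illustration: the death counts below are made up (they are not Graunt's figures), and the function name `life_table` is my own. Given the number of deaths recorded in each successive age band, it works out how many of a starting group of 100 people would still be alive at the end of each band.

```python
def life_table(deaths_by_age_band, cohort=100):
    """Return the number of survivors (out of `cohort`) at the start,
    and then at the end of each successive age band.

    deaths_by_age_band: hypothetical counts of deaths in each age band,
    listed from youngest to oldest.
    """
    total_deaths = sum(deaths_by_age_band)
    still_alive = total_deaths          # everyone in the records is alive at birth
    survivors = [cohort]
    for deaths in deaths_by_age_band:
        still_alive -= deaths
        survivors.append(cohort * still_alive / total_deaths)
    return survivors


# Made-up example: 40 deaths in the first band, then 20 in each of the next three
print(life_table([40, 20, 20, 20]))
```

Dividing any entry by the starting cohort gives the estimated probability of surviving to that age, which is the kind of figure a life table records.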
We’ve moved on somewhat from the early pioneering work carried out by John Graunt, where data was organised by hand. Modern data mining uses complex computer algorithms to spot emerging patterns and variations in massive data sets that the human eye would be incapable of detecting. With continued advances in technology and computer science, we are thinking up increasingly resourceful ways of capturing data and ever more inventive ways of mining it. The insights that we are now capable of gaining by mining big data are simply astounding.
Discovering the Secrets of the Universe
The Square Kilometre Array (SKA), involving ten countries, is one of the largest-scale science projects ever launched. The array of radio telescopes will be the biggest data collector ever built and will continuously receive radio waves from stars across deep space, waves that have travelled billions of light years to reach us. Astronomers have been collecting data for centuries; however, advanced data mining tools will now be used to look for patterns and variations in the vast and dynamic SKA data. This will allow them to catalogue the status of each star at each epoch, to understand how the universe has evolved over time and to predict where it is going.
Understanding Disease for Informed Medical Treatment
Medicine is being transformed with the emergence of predictive treatment, personalised to the individual and based on their genome sequence. DNA has been stored on databases for many years now and is used daily to link criminals to the crimes they commit. The process involves comparing the unique DNA sequence of a sample found at a crime scene to the DNA sequences of previous offenders; if a match is found, the perpetrator can be identified. The big data approach has a very different purpose: it aims to collect huge numbers of DNA samples from patients with various diseases and cancers, and to apply advanced data mining techniques to explore the patterns and variations in the genes that emerge. It will give researchers the ability to search for clues in a patient’s DNA if they have a rare, undiagnosed condition. It will allow oncologists to tailor cancer treatments for optimal response rather than relying on the trial-and-error method that’s currently employed. It will provide greater insights into human evolution, and it will be a treasure trove for epidemiologists seeking to better understand disease. Yes, if you have lots of money you can pay a company to analyse your genome sequence, predict a disease common to any gene variation found and make recommendations for preventative treatment. However, the big data approach could make much greater discoveries, and that knowledge could be translated into better treatment for all.
Preserving the Past, Understanding the Present
GDELT (Global Data on Events, Location and Tone) is a comprehensive data set that includes details of every event in recent human history, geotagged by city and updated in real time from news media as new events occur. In its current state it only takes us back to 1979, but when complete it will reach back to the year 1800. It will not only serve as a permanent record for future generations when they look back at our digital content; mining the data can also provide new insights for political and social researchers, or anyone for that matter, as the data set is publicly accessible for free! (I certainly intend to play with it!)
Forecasting and Combating Crime
Policing is being transformed in the US, with patrol routes being determined by daily forecasting that predicts the locations where crimes are likely to occur. Researchers took a data set of the times and locations of 13 million crimes recorded over 80 years and searched for patterns in the data. They found that when a crime occurs, there is a high probability that other crimes will occur nearby and in quick succession. A mathematical model originally developed to predict the aftershocks of earthquakes was adapted for crime, based on the patterns emerging in the crime data. When the model was applied to a number of past crimes, it accurately predicted nearby crimes that did indeed occur. The system is currently being trialled in 150 cities across America.
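To give a feel for how an aftershock-style model behaves, here is a heavily simplified sketch in Python. It is not the researchers' actual model: it only looks at time (not location), and the function name and the parameters `mu`, `alpha` and `beta` are values I have made up for illustration. The idea is that there is a steady background rate of crime, and each past crime adds a temporary boost to the expected rate that fades away as time passes.

```python
import math


def crime_intensity(t, past_event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Expected crime rate at time t under a simple self-exciting model.

    mu    : background rate (hypothetical value)
    alpha : size of the boost each past crime adds (hypothetical value)
    beta  : how quickly that boost decays (hypothetical value)
    """
    boost = sum(
        alpha * math.exp(-beta * (t - event_time))
        for event_time in past_event_times
        if event_time < t               # only crimes that have already happened count
    )
    return mu + boost


# One crime at time 1.0: the expected rate is elevated shortly afterwards
# and drifts back towards the background rate later on.
soon_after = crime_intensity(1.1, [1.0])
much_later = crime_intensity(3.0, [1.0])
print(soon_after > much_later)
```

A real forecasting system would also have to account for where each crime happened and would estimate its parameters from the historical data rather than picking them by hand.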
And Making Money, Of Course
The fastest-growing data set is the one that we are generating ourselves. We subconsciously add to an everlasting trail of data in most things we do: making calls and texts, banking, travelling, swiping our loyalty cards, shopping online; the list goes on. And we even choose to drop extra crumbs (or rather bits) along the way: geotagging our photos on Flickr, updating our Facebook statuses, filling out surveys for the chance to win an iPad (we must love sharing information about ourselves!).
If you collect that data over millions of people you can start to guess what they may be interested in next and it is to this end that businesses have been investing in big data. This investment extends beyond market basket analysis, which led supermarkets to separate essential items that are often bought together to maximise our exposure to other items as we walk through a store (that’s why milk is never close to bread). It extends beyond personalising loyalty offers based on what items you buy and how much you spend. Now, the focus of big data in business is to search for the slightest hint of what we might want to buy before we’ve even realised it ourselves so they can bombard us with personalised, invasive advertising. The sole purpose of these data gathering efforts is to make money and that is exactly the kind of big data that we don’t want or need to see increased.
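As an aside, the market basket analysis mentioned above boils down to counting which items turn up together in the same shopping basket. Here is a minimal sketch in Python, using baskets I have invented for the purpose; the function name `pair_support` is my own, and real retail systems are far more elaborate than this.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return, for each pair of items, the fraction of baskets containing both."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair a fixed order
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in pair_counts.items()}


# Invented example data
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["eggs"],
    ["milk"],
]
print(pair_support(baskets))
```

Pairs with a high score are the ones bought together most often, which is the sort of signal a supermarket might act on when deciding where to place items.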
If all data centres could migrate to operating entirely on clean, renewable energy, wouldn’t advancing the capabilities of big data to learn about the universe, to combat crime, and to help people fight disease and live longer be the right and just thing to do? But when it comes to mere money-making initiatives, isn’t storing and processing such data wasteful and completely unnecessary?
Big data is undoubtedly a powerful tool that brings much hope for addressing the big issues in society, offering new insights into areas as diverse as cancer research, terrorism, and climate change. However, it also brings fear and concern, both for the associated energy use and its environmental impact, and for the prospect of privacy invasion. With big data in its infancy, very little is understood about the ethical implications, but one thing is certain: with the increasing quantity and detail of our transactions being captured by business, the continued lure of social media, and the Internet of Things era on the horizon, we are set to see exponential growth in big data for the foreseeable future.
Do you think Big Data is good or bad for us? Feel free to leave a comment…