What is Big Data?

Big data is a term used for large data sets that include structured, semi-structured, and unstructured data.

Organizations first started storing and analyzing data in the '70s with the help of relational databases. However, the spread of the internet in the early 2000s led to a huge surge in generated data from social media and from video and audio streaming. This data is large in volume and generated at very high speed. Furthermore, it is a mixture of structured and unstructured data. Roger Mougalas coined the term big data in 2005 to describe data sets so large that they can't be processed with traditional analytical tools, such as relational databases.

How is Big Data Different From Traditional Data?

Big data is characterized by three Vs that set it apart from the data of the pre-internet era:

  • Volume: The widespread use of the internet has resulted in huge volumes of low-density data being produced every minute.
  • Velocity: These huge volumes of data are produced very quickly, often as continuous streams.
  • Variety: Data is generated by all sorts of sources, such as video streams, Twitter feeds, text, and audio streams. Much of it is unstructured and comes in many different formats, which must be transformed before the data can be analyzed.

What are the Challenges of Big Data?

To extract insights and trends from big data, you first need to integrate it into a central data repository. However, because big data comes in many different formats, a key challenge is transforming the data into a common format before it is loaded into the repository. ETL (extract, transform, load) tools can help solve this problem.
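To make the ETL idea concrete, here is a minimal sketch using only Python's standard library. Two hypothetical sources (a CSV export and a JSON Lines feed) use different field names; each record is extracted, transformed into one common schema, and loaded into an in-memory SQLite table that stands in for the central repository. The source data, field names, and table layout are invented for illustration and do not reflect any particular ETL product.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs from two sources that use different formats and field names.
CSV_SOURCE = """user,amount,ts
alice,12.50,2024-01-05
bob,7.25,2024-01-06
"""

JSON_SOURCE = """{"username": "carol", "total": "3.10", "date": "2024-01-07"}
{"username": "dave", "total": "9.99", "date": "2024-01-08"}
"""


def extract_csv(text):
    """Extract: parse CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json_lines(text):
    """Extract: parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def transform(record, mapping):
    """Transform: rename source fields to the common schema and coerce types."""
    row = {target: record[source] for target, source in mapping.items()}
    row["amount"] = float(row["amount"])
    return row


def load(conn, rows):
    """Load: insert the normalized rows into one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer TEXT, amount REAL, day TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases (customer, amount, day) VALUES (:customer, :amount, :day)",
        rows,
    )
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for the central repository
    csv_rows = [
        transform(r, {"customer": "user", "amount": "amount", "day": "ts"})
        for r in extract_csv(CSV_SOURCE)
    ]
    json_rows = [
        transform(r, {"customer": "username", "amount": "total", "day": "date"})
        for r in extract_json_lines(JSON_SOURCE)
    ]
    load(conn, csv_rows + json_rows)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT customer, amount, day FROM purchases ORDER BY day"):
        print(row)
```

The key design point is that each source gets its own extract step and field mapping, while the transform and load steps are shared, so adding a new source only means writing a new extractor and mapping.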

The next challenge is analyzing thousands of petabytes of data at scale. Open-source frameworks, such as Apache Hadoop, process distributed data sets in parallel across clusters of computers.
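Hadoop's original processing engine, MapReduce, splits a job into a map step that runs independently on each block of data, a shuffle that groups intermediate results by key, and a reduce step that combines each group. The sketch below imitates that model in a single Python process for the classic word-count example. It is not Hadoop's API: the function names are hypothetical, and a thread pool merely stands in for the cluster's worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group emitted values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for a single key."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Threads stand in for cluster nodes; each maps one chunk independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string plays the role of one block of a distributed file.
    blocks = ["big data needs big clusters", "data moves to compute", "Big ideas"]
    print(word_count(blocks))
```

Because each map call only sees its own chunk, the map step can be spread over as many machines as there are blocks; only the shuffle needs to move data between them.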

How Can I Use Big Data?

Today, organizations have access to a wealth of information on consumer behavior and usage patterns from many different data sources. When analyzed correctly, big data can reveal trends and insights that help enterprises make key business decisions. Some of the problems big data can help solve are:

  • New product development: Using data from social media, user surveys, and social listening, companies can identify promising new product segments. Media companies can also use it to spot emerging artists and trends in music, film, and fashion.
  • Forecasting: With the help of sensors and historical data, manufacturing firms can predict equipment failure and engage in proactive maintenance. This can help them optimize maintenance costs and increase equipment uptime. Big data can also be used for financial forecasting, among other things.
  • Fraud detection: Banks and financial institutions increasingly use big data to identify fraudulent transactions. It is also used by enterprises for bot detection.
  • Machine learning: Big data supplies the training data for machine learning algorithms, which have a host of applications, including image and voice recognition, video surveillance, and traffic prediction.
