Big Data: Analysing the World

Big data is a big idea.

Monumentally big.
1,000,000,000,000,000,000 big.
It’s so big that big data is actually three big ideas in one.

Because when we use the term ‘big data’ we could be talking about the:

  • scale of big data sets
  • discipline of capturing, storing, and analysing them, or
  • technology set that allows this

So, let’s roll up our sleeves and put our arms deep into some big numbers.

But first, a Primer on Big Numbers

Since we are going to need to bandy about some big numbers, here’s a quick primer for those who didn’t learn or cannot recall scientific notation.

Scientific notation starts with the number 10 – because we have ten digits on our hands. We write 10 x 10 as 10². We say 10² is ‘ten to the power of two’.

Likewise, 10³ is 10 x 10 x 10, or ten to the power of three. And so on.

Data is measured in bits and bytes. A bit is a single unit of information: a yes or a no; a 1 or a 0. Outside of some software engineering, we work more often in bytes. 1 byte = 8 bits.

Now we can mix our knowledge of scientific notation and data. 1,000 bytes = 10³ bytes. We call this a kilobyte and write it as 1 kB.

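And we have names for ten to the powers of 3, 6, 9, and so on: kilobyte (10³), megabyte (10⁶), gigabyte (10⁹), terabyte (10¹²), petabyte (10¹⁵), and exabyte (10¹⁸).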
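To see how these names line up with powers of ten, here is a minimal Python sketch. The function name `human_bytes` and the prefix list are my own illustration, and the sketch assumes decimal prefixes throughout (1 kB = 1,000 bytes, as in this primer), not the 1,024-based units that some software reports.

```python
# Decimal prefixes for bytes, one per step of 10^3 (kilo, mega, giga, ...).
SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]


def human_bytes(n_bytes: int) -> str:
    """Format a byte count with decimal prefixes (1 kB = 1,000 bytes)."""
    value = float(n_bytes)
    index = 0
    # Divide by 1,000 until the value fits under the next prefix.
    while value >= 1000 and index < len(SI_PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {SI_PREFIXES[index]}B"


def bytes_to_bits(n_bytes: int) -> int:
    """1 byte = 8 bits."""
    return n_bytes * 8


print(human_bytes(10**3))   # one kilobyte
print(human_bytes(10**15))  # one petabyte
print(human_bytes(10**18))  # one exabyte
```

Each step up the prefix list is another factor of a thousand, which is why the names change at the powers 3, 6, 9, and so on.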

Big Numbers for Big Data

Just how big are these numbers in terms of real things?

Well, there are approximately 10²⁴ atoms in a gram of stuff – water, bread, plastic. That’s 1,000,000,000,000,000,000,000,000.

For smaller numbers, how about 10²¹ stars in the universe, and 10¹¹ stars in our own galaxy?

There are 10¹⁰ people on earth and around 10⁷ in each of our biggest cities. Huge sports stadia hold 10⁵ people, and a pretty standard high school / secondary school will have around 10³ pupils.
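These figures are rounded to the nearest power of ten. As a rough sketch of that rounding, the Python snippet below takes some round counts and reports their order of magnitude. The counts and the helper name `order_of_magnitude` are my own illustrative assumptions, not measurements.

```python
import math


def order_of_magnitude(count: float) -> int:
    """Return the power of ten nearest to count, on a logarithmic scale."""
    return round(math.log10(count))


# Illustrative round figures (assumptions for the example only).
examples = {
    "people on earth": 8_000_000_000,
    "a very large city": 20_000_000,
    "a huge stadium": 90_000,
    "a secondary school": 1_000,
}

for label, count in examples.items():
    print(f"{label}: about 10^{order_of_magnitude(count)}")
```

Rounding on a logarithmic scale is why eight billion people counts as ‘about ten to the power of ten’ even though it is a little short of it.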

Why do we Need Big Data?

First, though, another question…

Why have we got Big Data?

Our human world is now configured around data. And, since the 1980s, the cost of creating, acquiring, and storing data has been dropping rapidly, in real terms.

Where once a computer could cost an office worker a month’s salary, it’s now nearer a week. And the digital cameras a family used to invest in 20 years ago could capture a few hundred images with a couple of million points of colour in each. Now, for the same equivalent cost, a camera can create and store hundreds of thousands of images – each with many millions of points of colour.

And that’s just a simple consumer example. Now think of the millions of traffic and surveillance cameras, the video recorders, the audio recorders, the scientific data loggers and…

Most significant, perhaps, are the vast arrays of data being created, captured, and stored by governments and big business.

So, Why do we Need all this Data?

Governments and businesses can use this data to make decisions and predictions.

By analysing vast arrays of data, they can find answers to questions about:

  • Operational efficiency
  • Customer behaviour
  • New products
  • Causes of failure
  • Fraud and criminality
  • Financial trading
  • Demand for commodities

The list is endless – or maybe just big.

What is Big Data?

I said at the opening that Big Data is really three things. These are:

  1. scale of big data sets – typically in the petabyte range and above
  2. discipline of capturing, storing, and analysing them, or
  3. technology set that allows this

It is the first of these that is most often used, so here is my definition of Big Data:

Big data means the immense data sets that we can analyse to reveal patterns, trends, and correlations, to help us make predictions and therefore decisions, about human behaviour and the systems we have created.

The Gartner Definition of Big Data

But there is one definition that deserves a special mention. Near the start of all this, analyst firm Gartner gave this definition:

Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity.

In so doing, they coined the ‘three Vs’.

The Three Vs of Big Data

The three Vs of variety, volume, and velocity are core to understanding what big data is. More recently, other organisations and analysts have added to them: veracity, variability, and more. All of them together create a vast complexity.

Variety

Data comes in many forms now, such as:

  • text
  • images
  • video
  • sound
  • financial
  • biographical

Volume

The amount of data we collect, measured in bytes and big multiples of bytes. Data can come from many sources, including:

  • financial transactions
  • government records
  • value chain hand-offs
  • social media
  • remote detection and sensing
  • machine-to-machine interactions

Velocity

This is about the speed at which data is created and stored, which is now constant and in real time, across millions or even billions of sources.

Variability

The flow of big streams of data can be highly inconsistent. There are peaks and troughs that can form predictable cycles, and there are seemingly random, almost chaotic, bursts. Think about daily and seasonal patterns, and also political, social, and natural events.

Veracity

This is about the quality and accuracy of data – does it truly reflect the reality it purports to represent?

The Challenges of Big Data

Huge data sets create the possibility of extremely high levels of statistical significance. They make sampling feel like a crude technology: like comparing an ox cart to a jumbo jet. But there are problems. The complexity introduced by the variety, variability, and flaws in veracity makes interpreting the data a challenge. False interpretations abound. And, in simplifying our data sets, how can we be sure which data is important, and which is not?
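To put a number on the point about statistical significance, here is a minimal Python sketch. It assumes a simple one-sample z-test with independent observations and a known standard deviation. The function name `z_score` and the figures used are purely illustrative.

```python
import math


def z_score(observed_diff: float, std_dev: float, n: int) -> float:
    """z-score of an observed mean difference from zero, for n independent samples."""
    # The standard error of a mean shrinks with the square root of n.
    standard_error = std_dev / math.sqrt(n)
    return observed_diff / standard_error


# The same tiny effect (0.1 units, against a spread of 10 units)
# measured on a small sample and on a big-data-sized one.
small_sample = z_score(0.1, 10, 100)
huge_sample = z_score(0.1, 10, 10**8)

print(f"n = 100:         z = {small_sample:.2f}")
print(f"n = 100,000,000: z = {huge_sample:.2f}")
```

With a hundred observations the effect is lost in the noise. With a hundred million it is overwhelmingly significant, even though it is no larger. That is the promise of big data, and also part of the risk: significance alone does not tell us whether a pattern matters.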

There is also a massive cost to gathering and storing such vast amounts of data. It’s not just the financial cost of the technology, infrastructure, and energy. All of that energy manifests as waste heat. Big data creates carbon dioxide, uses precious resources, and dumps heat into our environment. It may be that we rely on big data to solve the world’s most pressing problems. But we must not forget that it also contributes to them.

What is Your Experience of Big Data?

We’d love to hear your experiences, ideas, and questions. Please leave them in the comments below.
