What is 'big data'?
By Eleanor Blackwood
Wed 29 July 2020
What is big data?
‘Big data’ is an umbrella term covering a huge range of data types. Big data needs machines to source and analyse it before a human can understand what it means.
Big data affects our everyday lives much more than most of us realise.
In this article we’ll explain:
- What big data is
- How big data can be stored, analysed and used
- What the future of big data might look like
‘Big’ data typically lives in cloud storage, offshore servers, SQL databases and similar systems. It covers more than 50,000 variables or more than 50,000 people, and is not yet ready to be viewed and analysed.
In comparison, ‘small’ data can be the size of a few spreadsheet pages and, crucially, is ready to be viewed and analysed. ‘Small’ data can be found on a local PC or database.
With online shopping and social media in particular, companies now have access to huge data sets about billions of customers, and use insights generated from these to gain efficiencies and increase profits.
Kenneth Cukier has an insightful TED Talk about this, if you want to learn more!
What are the 5 V’s of big data?
We can classify data as 'big' data using the concept of the ‘5 V’s’: volume, velocity, variety, veracity and value. These are the 5 keys to making big data a huge business.
To give you an idea, let’s use the example of big data in healthcare:
- Volume: the NHS collects a huge amount of data annually
- Velocity: all of this data is generated at a very high speed, e.g. test results
- Variety: many different types of data are collected, e.g. ‘structured data’ like Excel records, ‘semi-structured data’ like log files, and ‘unstructured data’ like x-ray images
- Veracity: this means the accuracy and trustworthiness of the data collected
- Value: all of this data can benefit the NHS by improving disease detection and treatment and by reducing costs.
Big data visualization
Big data visualization is a way to present data visually, such as in a map or graph. It relies on powerful computer systems to ingest raw corporate data and process it to generate visual representations that let people easily understand vast amounts of data in seconds.
How do you store big data?
We can process and store big data with tools like Apache Cassandra, Apache Hadoop and Apache Spark. AWS (Amazon Web Services) also offers a range of services for storage and analytics.
For example, Hadoop uses a ‘distributed file system’ that breaks up big files of data into lots of smaller pieces and stores them on different machines. Hadoop then uses the ‘MapReduce’ technique to process big data: it breaks up each task into lots of little tasks that can be done on different machines at the same time, and then combines the results.
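To make the ‘split, process in parallel, combine’ idea more concrete, here is a minimal sketch of a MapReduce-style word count in plain Python. It does not use Hadoop itself: the function names and the sample lines are made up for illustration, and the ‘machines’ are just chunks of a list processed one after another.

```python
from collections import defaultdict


def split_into_chunks(lines, n_chunks):
    """Split the input into chunks, one per hypothetical machine."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted pairs by word so each word's counts sit together."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: add up the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, n_chunks=3):
    chunks = split_into_chunks(lines, n_chunks)
    # On a real cluster each chunk would be mapped on a different machine.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


lines = ["big data is big", "small data is small", "data is everywhere"]
totals = word_count(lines)
```

The result does not depend on how many chunks the input is split into, which is what lets the work be spread across many machines.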
NoSQL databases are a ‘non-tabular’ way of storing big data. For example, MongoDB is one of the most popular NoSQL systems and stores data in document form. Google’s Bigtable is also a way to store big data: Google uses it to power services like Gmail and Google Maps.
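As a rough illustration of what ‘document form’ means, the sketch below holds one record as a nested document and then flattens it into table-style rows for comparison. It uses only plain Python dictionaries and JSON rather than a real database, and the patient record, field names and `to_rows` helper are all invented for the example.

```python
import json

# A hypothetical patient record stored as a single nested document.
patient_document = {
    "patient_id": "p-001",
    "name": "Example Patient",
    "tests": [
        {"type": "blood", "result": "normal"},
        {"type": "x-ray", "result": "fracture", "image_file": "xray_001.png"},
    ],
}


def to_rows(document):
    """Flatten one document into table-style rows, one row per test."""
    columns = ("patient_id", "name", "type", "result", "image_file")
    rows = []
    for test in document["tests"]:
        row = {
            "patient_id": document["patient_id"],
            "name": document["name"],
            **test,
        }
        # Fields a test does not have become empty (None) cells in the table.
        rows.append(tuple(row.get(column) for column in columns))
    return rows


rows = to_rows(patient_document)

# Documents like this are commonly exchanged as JSON text.
serialized = json.dumps(patient_document)
```

The document keeps each test's own fields together, while the table needs a fixed set of columns and leaves gaps where a field is missing.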
What is big data analytics?
Once we’ve stored and processed big data it's time to do the most important thing: analyse it.
This means using tools like Python, Excel (which can connect to Hadoop), Azure and the Apache tools (Cassandra, Hadoop and Spark) to find correlations and trends in data.
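As a small taste of what ‘finding correlations’ looks like in Python, here is a sketch that computes the Pearson correlation between two lists of numbers. The `pearson` function and the sample figures are illustrative only; real analyses run on far larger data sets and usually rely on dedicated libraries.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: +1 means rising together, -1 means moving in opposite directions."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up figures: hours spent browsing a shop and items bought per customer.
hours_browsing = [1, 2, 3, 4, 5]
items_bought = [2, 4, 5, 4, 5]

r = pearson(hours_browsing, items_bought)
```

A value of `r` close to 1 suggests the two measurements tend to rise together, although a correlation on its own does not show that one causes the other.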
Big data can be used for everything from improving games to disaster management. It was even used during Hurricane Sandy in 2012 to identify where the most damage occurred.
An example of how valuable and insightful data analytics can be was shown as early as 2013. A Cambridge study of 58,000 Facebook users found that ‘individual traits and attributes can be predicted to a high degree of accuracy’ simply from their Facebook likes. For example, liking thunderstorms, science and curly fries were signs that someone was highly intelligent. People who liked ‘Hello Kitty’ tended to be high in ‘openness’, low on ‘emotional stability’ and more likely to have Democratic political views. This kind of data can be hugely helpful for creating data profiles on individuals. You can read more about this in our other blog.
Is big data the same as data science?
No. But data science uses the machine learning algorithms created from big data analytics (see above) to design and develop statistical models that create knowledge from the pile of big data. Essentially, machines can collect, store and ‘analyse’ big data into spreadsheets or documents that a human would understand, but a data scientist applies this data to a business or organisation in a way that significantly helps it.
For example, big data in accounting can help predict shifts in consumer behaviour and economic trends, and identify fraud quickly. This can be used to create strategies for banks and financial institutions.
What are the uses for big data?
The benefit of big data is that it can be used to gain valuable insights and get a bigger picture of any topic. We call the process of turning raw data into useful information ‘data mining’.
Netflix, for example, uses it to build predictive models from consumers' past viewing habits. Your viewing history allows Netflix to choose which scene image you see when a new movie appears on your home page. If you often watch romantic films, a romance scene might appear so you’re more likely to click and watch it.
Amazon uses big data to predict which products you’re likely to buy, when you might buy them and when you might need the products: this has drastically increased both its overall sales and profit margins. Millions of companies similarly use big data to tailor their experience to individual customers, gleaning it from sources like tracking cookies and social media behaviour. The tailored ads you see across social media, especially once you’ve already viewed a product online, are derived from big data and combined with data held about you as a consumer.
Big data in education can also be used to engage and provide opportunities for students. Data such as grades, teacher observations and student actions can help teachers understand where weak areas are and what to focus on.
What does the future of big data look like?
Big data has, like many forms of technology, a huge capacity to both benefit and impinge upon the way we want to live our lives.
While this does have the potential to transform our social and natural world for the better, as shown by the huge strides made in the health industry and in tackling climate change, it is important to also note concerns, particularly with regard to defending our digital rights and privacy. The use of algorithms in advertising, on social media in particular, raises its own set of challenges, as scandals such as Cambridge Analytica have shown us.