Big-Data…

Poojya Puju
3 min read · Sep 17, 2020


🤔 Why does the Big Data problem arise?

Nowadays people are increasingly hooked on social media. They share photos, videos, and much more, so the amount of data grows every second. Have you ever thought about how all that data is stored?

Google searches daily:

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. These figures do not cover every search made worldwide: about 77% of the world's searches are done on Google, and in total over 5 billion searches are made worldwide each day.
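The per-day and per-year numbers follow directly from the per-second rate. A quick sanity check, assuming exactly 40,000 queries per second and a 365-day year (the variable names are just for illustration):

```python
# Back-of-the-envelope check of the search volumes quoted above.
QUERIES_PER_SECOND = 40_000        # lower bound quoted in the text ("over 40,000")
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds in a day
DAYS_PER_YEAR = 365                # assumption: a non-leap year

per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
per_year = per_day * DAYS_PER_YEAR

print(f"{per_day:,} searches per day")    # 3,456,000,000
print(f"{per_year:,} searches per year")  # 1,261,440,000,000
```

At exactly 40,000 queries per second this gives about 3.46 billion searches per day and about 1.26 trillion per year, which is in line with the rounded figures quoted for a rate slightly above 40,000.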

Facebook Data Storage:

Facebook processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour.

Facebook generates 4 petabytes of data per day. One petabyte is a million gigabytes, so that is about 4 million gigabytes every day. All that data is stored in what is known as the Hive, which contains about 300 petabytes of data.

Every 60 seconds, 510,000 comments are posted, 293,000 statuses are updated, 4 million posts are liked, and 136,000 photos are uploaded.

🤔 What is Big Data?

👉Big Data:

Large data sets that can be analyzed to reveal patterns, trends, and associations are called Big Data. The term describes a high volume of data that is both structured and unstructured. The quality of the data is more important than its quantity: the goal of gathering, processing, and analyzing data is to get valuable results out of it. Big data analysis, if done precisely, can lead to better decisions and strategic moves. Analysts now commonly define big data by the three Vs: volume, velocity, and variety.

👉Distributed Storage:

Big Data storage relies on “Distributed Storage.” With distributed storage, instead of storing a large file sequentially, you split it into pieces and scatter those pieces across many disks. In this way, we can achieve higher storage and retrieval speeds.

  • Distributed storage writes data in parallel by striping/splitting gigabytes of data into smaller pieces, so a large file can be stored within seconds. The master node (the NameNode in Hadoop) coordinates the split and decides which DataNodes/slave nodes receive each piece.
  • A distributed storage cluster has N slave nodes, all connected to the master.
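The idea in the list above can be sketched in a few lines of Python. This is only an in-memory toy, not how Hadoop is implemented: the `MasterNode` class, the node names, and the round-robin placement are all assumptions made for illustration, and there is no replication, networking, or real parallelism.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class MasterNode:
    """Hypothetical master: remembers which slave node holds which block."""

    def __init__(self, node_names: List[str]) -> None:
        if not node_names:
            raise ValueError("at least one slave node is required")
        # Each slave node is simulated as a dict: block index -> block bytes.
        self.nodes: Dict[str, Dict[int, bytes]] = {name: {} for name in node_names}
        self.block_map: List[str] = []  # position i = node holding block i

    def store(self, data: bytes, block_size: int) -> None:
        """Split the data and scatter the blocks across the nodes, round-robin."""
        for blocks in self.nodes.values():
            blocks.clear()  # drop anything left over from a previous store
        self.block_map = []
        names = list(self.nodes)
        for index, block in enumerate(split_into_blocks(data, block_size)):
            target = names[index % len(names)]
            self.nodes[target][index] = block
            self.block_map.append(target)

    def read(self) -> bytes:
        """Collect the blocks back from the nodes in their original order."""
        return b"".join(
            self.nodes[node][index] for index, node in enumerate(self.block_map)
        )


master = MasterNode(["node-1", "node-2", "node-3"])
master.store(b"abcdefghij", block_size=4)
print(master.block_map)  # ['node-1', 'node-2', 'node-3']
print(master.read())     # b'abcdefghij'
```

In a real cluster each node would write its piece to its own disk at the same time, which is where the speed-up comes from. The sketch only shows the bookkeeping: split, scatter, and remember where every piece went.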

Distributed storage is a core concept behind many technologies, such as Hadoop, robust hardware, grid computing, etc.

👉Hadoop:

Facebook runs the biggest Hadoop cluster, which spans more than 4,000 machines and stores hundreds of millions of gigabytes of data.

Hadoop gives Facebook a common infrastructure that is efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop powers this social networking platform in many ways. Facebook built its first user-facing application, Facebook Messenger, on the Hadoop database, Apache HBase.

The Hadoop ecosystem includes the Hadoop Distributed File System (HDFS) and several related components, such as Apache Hive, HBase, Oozie, Pig, and ZooKeeper.

That’s all…

Thanks For Reading…🤝
