
Apache Druid — A Brief Overview


This article aims to explain, quickly and concisely, what Druid is and "how to use it".

Let's look at these questions:

  • Do you have a huge amount of event data?
  • Do you need to run low-latency queries on that data?

If your answer to either question is a definite YES, then Druid deserves your attention. Druid is an excellent piece of open-source software for big data and data warehousing from the Apache Software Foundation. It is written in Java and is a column-oriented distributed data store (also called a columnar database).
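To make "column-oriented" a little more concrete, here is a toy Python sketch of ideas the Druid documentation describes for its segments: each column is stored separately, string columns are dictionary-encoded into small integer ids, and every distinct value gets a bitmap of the rows that contain it. This is not Druid's actual segment format (real segments add compression and much more), and the rows, column names, and helper functions are hypothetical. A filter such as `country = 'DE'` only has to read one bitmap and the single metric column it aggregates.

```python
from collections import defaultdict

# Hypothetical event rows, for illustration only.
rows = [
    {"page": "home", "country": "US", "clicks": 3},
    {"page": "cart", "country": "DE", "clicks": 1},
    {"page": "home", "country": "DE", "clicks": 2},
    {"page": "home", "country": "US", "clicks": 5},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)

def dictionary_encode(values):
    """Replace each string with an integer id into a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]

def bitmap_index(dictionary, encoded):
    """For each dictionary entry, build a bitmap (a Python int) of matching rows."""
    bitmaps = [0] * len(dictionary)
    for row_id, value_id in enumerate(encoded):
        bitmaps[value_id] |= 1 << row_id
    return bitmaps

columns = to_columns(rows)
country_dict, country_ids = dictionary_encode(columns["country"])
country_bitmaps = bitmap_index(country_dict, country_ids)

# Filter country = 'DE' via its bitmap, then read only the "clicks" column.
de_bitmap = country_bitmaps[country_dict.index("DE")]
de_clicks = sum(c for i, c in enumerate(columns["clicks"]) if (de_bitmap >> i) & 1)
print(country_dict, country_ids, de_clicks)
```

The "page" column is never touched by this query, which is the main reason columnar layouts suit analytical workloads.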

Can you trust Druid? Yes! Alibaba, Airbnb, Cisco, eBay, and many other well-known companies use it. Druid started at the marketing company Metamarkets, was later backed by a large open-source community, and is now distributed under the Apache License.

Although Druid is not a relational database, it has much in common with one, which makes it quick to understand.

How much data can Druid handle? According to the Netflix tech blog, Netflix is currently loading over 2 million events per second into Druid and querying over 1.5 trillion rows to get detailed insights.

Below are some of the main advantages of Druid:

  • Fast real-time reporting on large volumes of data;
  • Long-term storage using HDFS;
  • High availability;
  • Extremely efficient querying of large data sets;
  • Aggregation and indexing;
  • High data compression;
  • Hive integration and more;
  • Out-of-the-box ingestion from Kafka and other sources;
  • A convenient API and an understandable query language (including SQL);
  • Support for various input formats (from CSV and JSON to Avro);
  • Good, complete documentation;
  • Open source, with all its advantages.
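The SQL support in that list can be used over plain HTTP. The sketch below builds a POST request for Druid's SQL endpoint (`/druid/v2/sql/`) with only the Python standard library. It assumes a local quickstart cluster whose Router listens on `localhost:8888`; the `events` datasource and `country` column are hypothetical, so check the Druid SQL API documentation for your version before relying on it.

```python
import json
import urllib.request

# Assumption: a local Druid quickstart with the Router on port 8888.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"

def build_sql_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Druid's SQL HTTP endpoint."""
    body = json.dumps({"query": query, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        DRUID_SQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_sql(query: str) -> list:
    """Send the query and decode the JSON array of result rows."""
    with urllib.request.urlopen(build_sql_request(query)) as response:
        return json.loads(response.read())

# Hypothetical datasource "events" with a "country" dimension;
# __time is Druid's primary timestamp column.
QUERY = """
SELECT country, COUNT(*) AS events
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY events DESC
"""
```

Against a running cluster, `run_sql(QUERY)` should return one JSON object per result row. Keeping request construction separate from sending makes it easy to inspect or test the request without a server.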

Unlike many traditional systems, Druid can also pre-aggregate data as it arrives. This pre-aggregation step is known as rollup and can lead to significant storage savings.

[Figure: example of data aggregation (rollup) by Druid]
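Rollup can be illustrated with a small Python sketch, assuming hourly time buckets, hypothetical `page` and `country` dimensions, and a hypothetical `clicks` metric. Events that fall into the same hour and share all dimension values collapse into one stored row that keeps a count and a sum. In Druid itself this is configured declaratively in the ingestion spec (for example `rollup` and `queryGranularity` in the `granularitySpec`, plus aggregators in `metricsSpec`) rather than written by hand; the code only mimics the effect.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (timestamp, page, country, clicks).
raw_events = [
    ("2024-01-01T10:02:11", "home", "US", 1),
    ("2024-01-01T10:17:45", "home", "US", 2),
    ("2024-01-01T10:48:03", "cart", "DE", 1),
    ("2024-01-01T11:05:30", "home", "US", 4),
]

def rollup_hourly(events):
    """Truncate timestamps to the hour, then aggregate all events that
    share the same (hour bucket, page, country) key."""
    buckets = defaultdict(lambda: {"count": 0, "clicks": 0})
    for ts, page, country, clicks in events:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        row = buckets[(hour.isoformat(), page, country)]
        row["count"] += 1        # number of raw events folded into this row
        row["clicks"] += clicks  # summed metric
    return dict(buckets)

rolled = rollup_hourly(raw_events)
for key, metrics in sorted(rolled.items()):
    print(key, metrics)
```

Here four raw events become three stored rows; with many repetitive events per time bucket, the reduction grows accordingly.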

What's next? Practice. Read the full description of Druid on the official website and start experimenting.

Good luck! ;)

Makhno Mykhailo.
