FATA #1 / Big Data- Elastic Stack

Nov 20, 2021

[FATA] - From test automation to architecture article series

Elasticsearch is a distributed, real-time search and analytics engine for all types of data.

Elasticsearch is also a distributed document store. Instead of storing information as rows of columnar data, it stores complex data structures that have been serialized as JSON documents.

Used for:

Logging & Log analytics
Complex search
Security analysis
Marketing & Operations
Business analytics

Features:

Distributed: runs on multiple nodes within a cluster and can scale to around 1k nodes, so search performance can scale roughly linearly with the number of nodes.
Highly available and fault-tolerant: multiple copies of data are stored within the cluster, because each index's shards can be replicated to other nodes.
REST API: can be used for CRUD operations.
Schema-less: documents can be indexed without explicitly providing a schema. Lookups are served from an inverted index.
Near real-time operations: newly indexed documents typically become searchable within about a second.
Complementary tooling and plugins: Kibana, Logstash, Beats.
Easy application development: client libraries for Java, Python, PHP, JavaScript, Node.js, Ruby…
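
To make the inverted index idea concrete, here is a toy sketch in plain Python. It is not how Elasticsearch (or Lucene underneath it) is implemented. It only shows the principle of mapping each term to the documents that contain it. The function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by ID.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes Elasticsearch data",
    3: "Logstash ships data",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch data"))  # [2]
```

A lookup touches only the entries for the query terms instead of scanning every document, which is why this structure suits full-text search.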

ELK Stack: Elasticsearch, Logstash, Kibana

Elasticsearch is a search and analytics engine.
Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch.
Kibana lets users visualize Elasticsearch data with charts and graphs.
Beats are lightweight data shippers that send data from machines to Logstash (if you need transformation and parsing) or directly to Elasticsearch.
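
The ingest, transform, and stash flow can be imitated in a few lines of Python. This is only a sketch of the idea, not Logstash itself. The log format, the `app-logs` index name, and the function names are assumptions for the example. The output follows the newline-delimited layout that the Elasticsearch `_bulk` API expects: an action line followed by the document source.

```python
import json
import re

# Assumed log format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse_line(line):
    """Transform step: turn a raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def to_bulk_payload(events, index_name):
    """Output step: build a newline-delimited body in the _bulk API layout."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(event))                               # document source
    return "\n".join(lines) + "\n"  # a bulk body has to end with a newline


raw_lines = [
    "2021-11-20T10:00:00Z INFO service started",
    "not a log line",
    "2021-11-20T10:00:05Z ERROR connection refused",
]
events = [event for event in map(parse_line, raw_lines) if event is not None]
payload = to_bulk_payload(events, "app-logs")
print(payload)
```

The unparseable line is simply dropped here. A real pipeline would more likely tag it and route it somewhere for inspection.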

Cluster and nodes

Node: an instance of Elasticsearch that stores data.

Cluster: a collection of related nodes that share the same cluster.name attribute. Clusters are independent of each other, and cross-cluster searches are uncommon.

Major components

Indices: the largest unit of data in Elasticsearch. An index is a logical partition of documents and can be compared to a database in the world of relational databases.
Documents: JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. In the world of relational databases, a document can be compared to a row in a table. Data in documents is defined with fields made up of keys and values.

Each document is also associated with metadata, the most important items being:

_index — The index where the document is stored

_id — The unique ID which identifies the document in the index
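
As a rough illustration of how `_index` and `_id` identify a document, the sketch below builds the REST paths used for CRUD operations on a single document. It only constructs the requests and does not send them. The `articles` index, the sample fields, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote


def doc_path(index, doc_id):
    """Path of a single document in the REST API: /<index>/_doc/<id>."""
    return f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"


def crud_requests(index, doc_id, source, partial_update):
    """Return (method, path, body) tuples for create, read, update, and delete."""
    path = doc_path(index, doc_id)
    update_path = f"/{quote(index, safe='')}/_update/{quote(doc_id, safe='')}"
    return [
        ("PUT", path, json.dumps(source)),                          # create or replace
        ("GET", path, None),                                        # read
        ("POST", update_path, json.dumps({"doc": partial_update})), # partial update
        ("DELETE", path, None),                                     # delete
    ]


# Hypothetical document: nested JSON rather than a flat row.
article = {"title": "Elastic Stack", "tags": ["search", "logging"], "author": {"name": "Nazar"}}
requests_to_send = crud_requests("articles", "1", article, {"title": "Elastic Stack basics"})
for method, path, body in requests_to_send:
    print(method, path, body or "")
```

A `GET` on that path returns the stored JSON under `_source`, alongside metadata such as `_index` and `_id`.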

Fields: the key-value pairs that make up a document.
Mapping: defines the fields of the documents in an index, including each field's data type (such as keyword and integer) and how the field should be indexed and stored in Elasticsearch.
Shards: the pieces a single index is split into, which is what lets it scale. When you create an index, you can define how many shards you want, and each shard holds part of the index's data.
Replicas: a fail-safe mechanism. Each replica is a copy of one of your index's shards.
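
Here is a small sketch of how mappings, shards, and replicas show up in practice. The first function builds the kind of body you would send when creating an index. The second imitates, in simplified form, how a document is routed to a primary shard by hashing its routing value (the `_id` by default). Elasticsearch uses a Murmur3 hash and a slightly more involved formula, so CRC32 here is only a stand-in, and the field names are invented for the example.

```python
import json
import zlib


def create_index_body(primary_shards, replicas):
    """Body for creating an index: shard settings plus an explicit mapping."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "status": {"type": "keyword"},      # hypothetical field
                "latency_ms": {"type": "integer"},  # hypothetical field
            }
        },
    }


def route_to_shard(doc_id, primary_shards):
    """Simplified routing: hash(routing value) modulo the number of primary shards.

    CRC32 is a stand-in for the Murmur3 hash Elasticsearch actually uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


def total_shard_copies(primary_shards, replicas):
    """Each primary shard has `replicas` copies, so the cluster holds this many shards."""
    return primary_shards * (1 + replicas)


body = create_index_body(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
for doc_id in ("a1", "b2", "c3"):
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
print("shard copies in total:", total_shard_copies(3, 1))
```

Because routing depends on the number of primary shards, that number is fixed when the index is created, while the number of replicas can be changed later.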

Analysis and Analyzers

An analyzer contains three lower-level building blocks: character filters, a tokenizer, and token filters. Text passes through them in that order.
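
The three stages can be imitated with a toy analyzer in plain Python. This is a sketch of the concept only. The HTML-stripping character filter, the word tokenizer, and the stop-word list are simplified stand-ins chosen for the example, not the built-in Elasticsearch components.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "and"}  # assumed, deliberately tiny list


def strip_html(text):
    """Character filter: works on the raw character stream before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: splits the text into individual tokens (here: runs of word characters)."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: normalizes each token to lowercase."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens):
    """Token filter: drops very common words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Run character filter, tokenizer, then token filters, in that order."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The QUICK brown fox is fast</p>"))  # ['quick', 'brown', 'fox', 'fast']
```

The order of the token filters matters: lowercasing runs first so that "The" matches the lowercase stop-word list.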

Manage Data in Elasticsearch

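cat indices (GET _cat/indices): lists the indices in the cluster with their health, status, document count, and size.
cat plugins (GET _cat/plugins): lists the plugins installed on each node.
cat templates (GET _cat/templates): lists the index templates defined in the cluster.
cat health (GET _cat/health): shows a one-line summary of cluster health.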
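
A minimal sketch of how these cat endpoints could be addressed from Python. It only builds the URLs and sends nothing. The base URL `http://localhost:9200` is an assumption about a local test node, and `cat_url` is a helper made up for this example. The `v` and `format` query parameters are part of the cat API.

```python
from urllib.parse import urlencode

# The four cat endpoints mentioned in this article.
CAT_ENDPOINTS = ("indices", "plugins", "templates", "health")


def cat_url(base_url, endpoint, as_json=False):
    """Build a _cat URL; v=true adds column headers, format=json returns JSON instead of text."""
    if endpoint not in CAT_ENDPOINTS:
        raise ValueError(f"unsupported cat endpoint: {endpoint}")
    params = {"format": "json"} if as_json else {"v": "true"}
    return f"{base_url.rstrip('/')}/_cat/{endpoint}?{urlencode(params)}"


# Assumed address of a local node.
BASE_URL = "http://localhost:9200"
for endpoint in CAT_ENDPOINTS:
    print(cat_url(BASE_URL, endpoint))
```

The plain-text output is meant for humans at a terminal. For scripts, the JSON variant is easier to parse.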

Analyze & Query your data

Histogram: a multi-bucket, value-source-based aggregation that can be applied to numeric values or numeric range values extracted from the documents. Each bucket covers a fixed-width interval.
Terms: a multi-bucket, value-source-based aggregation where buckets are built dynamically, one per unique value.
Range: a multi-bucket, value-source-based aggregation that lets the user define a set of ranges, each representing a bucket.
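
To show what these three bucket aggregations compute, here is an in-memory imitation in plain Python. It is a sketch over a handful of made-up documents, not the Elasticsearch implementation. The bucket labels in `range_agg` are simplified, and empty histogram buckets are omitted.

```python
import math
from collections import Counter


def terms_agg(docs, field):
    """One bucket per unique value of the field, with its document count."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


def histogram_agg(docs, field, interval):
    """Fixed-width buckets; a value lands in the bucket floor(value / interval) * interval."""
    buckets = Counter()
    for doc in docs:
        if field in doc:
            buckets[math.floor(doc[field] / interval) * interval] += 1
    return dict(sorted(buckets.items()))


def range_agg(docs, field, ranges):
    """User-defined buckets; 'from' is inclusive, 'to' is exclusive, None means open-ended."""
    result = {}
    for low, high in ranges:
        label = f"{'*' if low is None else low}-{'*' if high is None else high}"
        result[label] = sum(
            1
            for doc in docs
            if field in doc
            and (low is None or doc[field] >= low)
            and (high is None or doc[field] < high)
        )
    return result


# Hypothetical documents.
docs = [
    {"status": "ok", "latency_ms": 40},
    {"status": "ok", "latency_ms": 120},
    {"status": "error", "latency_ms": 510},
    {"status": "ok", "latency_ms": 180},
]
print(terms_agg(docs, "status"))               # {'ok': 3, 'error': 1}
print(histogram_agg(docs, "latency_ms", 100))  # {0: 1, 100: 2, 500: 1}
print(range_agg(docs, "latency_ms", [(None, 100), (100, 500), (500, None)]))
```

The difference between the three is who decides the bucket boundaries: the data (terms), a fixed interval (histogram), or the user (range).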

Metrics aggregations: Cardinality (an approximate count of distinct values) and Percentiles.
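
For reference, a search request that combines the aggregations mentioned in this section could look roughly like the body built below. The aggregation types and their parameters follow the Elasticsearch query DSL. The field names (`status`, `latency_ms`, `user_id`) and the aggregation names are assumptions made up for this sketch.

```python
import json


def build_aggregation_request():
    """Search body that returns only aggregation results (size 0 means no hits)."""
    return {
        "size": 0,
        "aggs": {
            # Bucket aggregations
            "by_status": {"terms": {"field": "status"}},
            "latency_histogram": {"histogram": {"field": "latency_ms", "interval": 100}},
            "latency_ranges": {
                "range": {
                    "field": "latency_ms",
                    "ranges": [{"to": 100}, {"from": 100, "to": 500}, {"from": 500}],
                }
            },
            # Metrics aggregations
            "unique_users": {"cardinality": {"field": "user_id"}},
            "latency_percentiles": {
                "percentiles": {"field": "latency_ms", "percents": [50, 95, 99]}
            },
        },
    }


request_body = build_aggregation_request()
print(json.dumps(request_body, indent=2))
```

A body like this would be sent to the `_search` endpoint of an index. Each key under `aggs` comes back as a named entry in the response.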

Written by Nazar Khimin
