Intro to Elasticsearch

Tags: Elasticsearch
Created: Oct 12, 2022 11:20 PM
Edited: Oct 12, 2022
Description:
I recently watched a brownbag session at my current company and figured it was time to tidy up my notes from my previous company and the current one. These are very basic Elasticsearch notes, in case I need them again one day.
  • Developed in Java
  • Fast reads
  • Good for full-text search

Cluster

  • Contains multiple nodes

Node

master node:

  • Decides which shards are allocated to which nodes

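dedicated master node:

  • A master-eligible node with no other roles, so cluster management is not slowed down by indexing or search load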

    coordinating node:

    • Distributes requests to the relevant nodes and merges the results
    • Every node acts as a coordinating node by default

    data node:

    • Store data
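
As a rough illustration of how these node types are told apart, the sketch below classifies a node from its list of roles. It assumes the `node.roles` setting of recent Elasticsearch versions, where an empty list means a coordinating-only node. The `describe_node` helper is hypothetical and only covers the node types mentioned in these notes.

```python
def describe_node(roles):
    """Classify a node from its `node.roles` list (simplified sketch)."""
    roles = set(roles)
    if not roles:
        # No roles at all: the node only routes requests and merges results.
        return "coordinating-only"
    if roles == {"master"}:
        # Master-eligible and nothing else: a dedicated master node.
        return "dedicated master"
    if "data" in roles and "master" in roles:
        return "master-eligible data node"
    if "data" in roles:
        return "data node"
    return "other"


print(describe_node(["master"]))  # dedicated master
print(describe_node([]))          # coordinating-only
print(describe_node(["data"]))    # data node
```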

    Shard

    • Splits an index's data into pieces that are spread evenly across nodes, much as a load balancer spreads traffic

    Primary Shard

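    • Plays the leader role: writes go to the primary first and are then copied to its replicas
    • Every shard is one Lucene instance
    • The number of primary shards cannot be changed after the index is created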
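
Why the primary shard count is fixed can be seen from how a document is routed to a shard: roughly `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. Below is a minimal sketch of that idea. It assumes `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, and `pick_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number (simplified)."""
    # crc32 is only a stand-in here; Elasticsearch hashes the routing
    # value with Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# With 3 primary shards every ID maps to exactly one of shards 0, 1, 2.
before = {d: pick_shard(d, 3) for d in doc_ids}

# Changing the shard count changes the result of the modulo, so many
# documents would be looked up on a shard that does not hold them.
after = {d: pick_shard(d, 4) for d in doc_ids}
moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because lookups depend on the same modulo, changing the count in place would break lookups for existing documents, so it normally means reindexing into a new index.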

    Replica Shard

    • Plays the follower role: a copy of a primary shard
    • Protects against data loss if the node holding the primary fails
    • The number of replica shards can be changed at any time
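
A toy illustration of primaries and replicas: the sketch spreads primaries round-robin over the nodes and puts each replica on the next node, so a replica never shares a node with its own primary (a rule Elasticsearch does enforce). The round-robin strategy and the `place_shards` helper are assumptions made for illustration; the real allocator weighs many more factors.

```python
def place_shards(num_primaries, nodes):
    """Return {node: [shard labels]} with one replica per primary."""
    if len(nodes) < 2:
        raise ValueError("need at least 2 nodes to place a replica")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        # The replica goes to the next node, never the primary's node.
        replica_node = nodes[(shard + 1) % len(nodes)]
        layout[primary_node].append(f"P{shard}")
        layout[replica_node].append(f"R{shard}")
    return layout


layout = place_shards(3, ["node-1", "node-2", "node-3"])
for node, shards in layout.items():
    print(node, shards)
# node-1 ['P0', 'R2']
# node-2 ['R0', 'P1']
# node-3 ['R1', 'P2']
```

If any single node is lost, every shard still has a copy on one of the remaining two nodes.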

    Shard Allocation

    • Shard rebalancing
    • Low / high / flood-stage disk watermarks
    • Concurrent recovery (from a snapshot) / relocation (from other nodes)
    • Disable deleting indices with wildcards (data safety concern)
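
To make the disk watermarks concrete, here is a minimal sketch of how the three thresholds translate into behaviour. It assumes the default values of the `cluster.routing.allocation.disk.watermark.*` settings (85%, 90%, 95%). The `watermark_stage` function is hypothetical, and its messages are rough summaries rather than the full allocation logic.

```python
# Assumed defaults (percent of disk used) for the cluster settings
# cluster.routing.allocation.disk.watermark.{low,high,flood_stage}.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def watermark_stage(disk_used_pct: float) -> str:
    """Summarise what happens to a node at a given disk usage."""
    if disk_used_pct > FLOOD_STAGE:
        return "flood stage: indices with shards here become read-only"
    if disk_used_pct > HIGH:
        return "high: shards are relocated away from this node"
    if disk_used_pct > LOW:
        return "low: no new shards are allocated to this node"
    return "ok: node accepts new shards"


for pct in (50, 87, 92, 97):
    print(pct, "->", watermark_stage(pct))
```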
    Figure: three nodes in a cluster. Data is split into 3 shards stored evenly across the 3 nodes.

    Index

    • Container to store data

    Inverted index

    • Make full-text search very fast
    • Built during data insertion
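
The idea behind an inverted index can be sketched in a few lines: instead of scanning every document for a term, keep a map from each term to the documents that contain it, and update that map as documents are inserted. This is a simplified model with a naive whitespace analyzer, not how Lucene stores its index; `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "fast full-text search",
    2: "search engines are fast",
    3: "distributed data store",
}
index = build_inverted_index(docs)
print(index["search"])  # [1, 2]
print(index["fast"])    # [1, 2]
print(index["data"])    # [3]
```

A search for a term is then a single dictionary lookup instead of a scan over all documents.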

    Token

    1. Tokenize the text
    2. Apply token filters, such as the edge n-gram token filter
      1. Speeds up prefix matching: searching for the leading slice of a token finds it and returns the entire document
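
The two steps above can be sketched as follows: split the text into tokens, then expand each token into its leading prefixes (edge n-grams) so that a partial word such as `qui` matches directly. The parameter names `min_gram` and `max_gram` mirror those of the Elasticsearch filter, but the functions themselves, the whitespace tokenizer, and the default values are assumptions made for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the prefixes of `token` from min_gram to max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=2, max_gram=5):
    """Tokenize the text, then expand every token into edge n-grams."""
    terms = []
    for token in tokenize(text):
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


print(analyze("Quick fox"))
# ['qu', 'qui', 'quic', 'quick', 'fo', 'fox']
```

Each of these terms ends up in the inverted index, which is why a query for a prefix finds the whole document at the cost of a larger index.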

    Document

    • Like a row in a table (RDBMS)
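
To illustrate the comparison, the sketch below turns a relational-style row into the JSON object that would be stored as a document. The `users` table, its columns, and the values are hypothetical.

```python
import json

# A hypothetical row from a relational `users` table: (id, name, city).
columns = ("id", "name", "city")
row = (42, "Ada", "London")

# The same record as a document: a JSON object whose fields play the
# role of the table's columns.
document = dict(zip(columns, row))
print(json.dumps(document))
# {"id": 42, "name": "Ada", "city": "London"}
```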
