Geeknarrator

computer science

How a search engine works: Elasticsearch (Part 2)

admin
October 11, 2017

Hello! Welcome to part 2 of the “How a search engine works” series. The previous part was all about the basic, low-level concepts behind a search engine. If you have not read it yet, it is worth going through it first. This part focuses on how things work in Elasticsearch and how to get started with it.

What is Elasticsearch? (version 5.6)

It is a highly scalable, open-source, full-text search and analytics engine. Let’s understand what these fancy words actually mean:

  • Highly scalable: Modern systems have many data sources emitting data continuously, and that data comes in many different types. Serving search requests over such a large data set requires a search implementation that scales. In practice this means that as the data grows, you can grow the search system by adding nodes and keep the time taken to serve search requests under control.
  • Open source: The code is open for all. Anyone can contribute features, improvements and bug fixes. Open source projects usually have strong community support, which helps developers get started quickly and resolve the issues they face while developing.
  • Search and analytics: You can search a collection of full-text documents for keywords and also run analytics on the data. Elasticsearch can provide near-real-time search and analytics results on huge volumes of data. More details are coming later in the blog.

Use cases where Elasticsearch can be a good fit:

  • Catalog search for retail websites.
  • Provide autocomplete features.
  • Analysing, aggregating and parsing large volumes of log and event data in production systems that generate logs at a very high rate (the ELK stack).
  • Large applications like Stack Overflow or Quora, where you want to search the answers users have written on specific topics.

The basics:

  • Node: A server that is part of a group known as a cluster. It stores data and takes part in the cluster’s activities, such as indexing and searching. Nodes join a cluster by its name. By default they join a cluster named “elasticsearch”.
  • Cluster: A collection of nodes that together hold all your data and provide indexing and search capabilities over it. A cluster has a name, which nodes use as an identifier when joining it. Giving each environment its own cluster name is important, because it keeps nodes from one environment from joining another environment’s cluster.
  • Index: A collection of documents that are logically related or belong to a similar category. For example, data related to your customers can be grouped into one index, catalog data into another, and so on. You can think of it as something similar to a schema in an RDBMS. It is identified by a name, which must be all lowercase.
  • Type: A type can be imagined as something like a table in an RDBMS: it holds documents that share common fields. It is used to partition the data inside an index based on its structure. For example, for a retail website, “catalog” can be the index that stores catalog data, with a type “products” for product data, a type “price” for pricing data, and so on.
  • Documents: A document is the smallest unit of data that can be indexed, like a row in a table. It is represented in JSON, a well-known standard format for data exchange on the internet.
  • Shards: As mentioned earlier, Elasticsearch is primarily used when you have a huge data set. When you create an index for a subset of your data, the index can grow to terabytes, which may be more than the disk space available on a single node. What do you do in such cases? Shards are the answer. Shards are parts of your index that can be stored on separate nodes while still belonging to one index. When you define an index you can define its number of shards. Each shard acts as a separate index in its own right, but is logically connected to its parent index. This enables horizontal scaling of your content and also lets work be parallelised across shards.
  • Replicas: Now that we are storing data across nodes, there is a risk that if a node dies you lose the data on it, and that’s the last thing you want. To avoid that you can use replication. Replicating data provides fault tolerance and, in turn, high availability. A replica shard is never stored on the same node as the primary shard it copies; it is always placed on some other node in the cluster. So when one node dies, you still have your data. You now have a primary shard and one or more replica shards for the same data.
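To make shards and replicas a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Elasticsearch is implemented. The settings names number_of_shards and number_of_replicas are the ones Elasticsearch accepts when an index is created, and the routing idea (hash of a routing value, by default the document id, modulo the number of primary shards) follows its documentation. Everything else is made up for the example: the function names, the crc32 stand-in for Elasticsearch's own hash function, the round-robin placement and the node names.

```python
import json
import zlib


def create_index_body(primary_shards, replicas_per_primary):
    """Build the JSON settings body for a create-index request (PUT /<index name>)."""
    return json.dumps({
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    })


def pick_shard(routing_value, primary_shards):
    """Choose the primary shard for a document: hash(routing) % number of primary shards.

    Assumption: crc32 is only a deterministic stand-in here. Elasticsearch uses
    its own hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % primary_shards


def place_shards(nodes, primary_shards, replicas_per_primary):
    """Toy round-robin placement that never puts two copies of a shard on one node.

    Hypothetical allocator for illustration only. A real cluster without enough
    nodes leaves the extra replicas unassigned; this sketch raises an error instead.
    """
    if replicas_per_primary >= len(nodes):
        raise ValueError("need more nodes than replicas per primary shard")
    placement = {}
    for shard in range(primary_shards):
        # Copy 0 is the primary; the following copies land on the next nodes in the ring.
        copies = [nodes[(shard + c) % len(nodes)] for c in range(replicas_per_primary + 1)]
        placement[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return placement


if __name__ == "__main__":
    print(create_index_body(5, 1))
    print("document 'product-42' goes to shard", pick_shard("product-42", 5))
    for shard, copies in place_shards(["node-a", "node-b", "node-c"], 5, 1).items():
        print(shard, copies)
```

The point of the sketch is the invariant in place_shards: a primary and its replicas always sit on different nodes, so losing one node never loses the only copy of a shard.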

Installation :

Elasticsearch requires at least Java 8, preferably a recent update release, so make sure you have that installed. You can then install Elasticsearch as follows:

-> Download the tar.
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz

-> Extract:
tar -xvf elasticsearch-5.6.3.tar.gz

-> Change directory to its binary:
cd elasticsearch-5.6.3/bin

-> Start a single-node cluster:
./elasticsearch

-> On macOS it is even simpler if you use Homebrew:
brew install elasticsearch

 

This will create a cluster named “elasticsearch” (the default) with a single node whose name is derived from a randomly generated UUID. If you want to override these names, you can pass additional arguments when starting the node:

./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

 

Access your cluster: 

Now that you have your cluster up and running with a node, you will want to access it. Elasticsearch provides a rich set of RESTful APIs for that. By default it listens on port 9200.

Postman is a great UI tool for calling REST APIs. Download Postman, or use plain curl if that works for you.

Using the _cat API we can check the health of our cluster.

GET /_cat/health?v

In the response you will see your cluster listed with its status, number of nodes, number of shards and so on.
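If you would rather check the health from code than from Postman or curl, the same call can be sketched in Python using only the standard library. Treat this as a rough sketch with assumptions: the base URL http://localhost:9200 assumes a local node on the default port, the function names are made up for this example, and the sample text is only illustrative of the column layout that the ?v flag produces, not output captured from a real cluster.

```python
import urllib.request


def parse_cat_table(text):
    """Turn the whitespace-separated table of a _cat API (called with ?v) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    headers = lines[0].split()  # with ?v, the first line holds the column names
    return [dict(zip(headers, line.split())) for line in lines[1:]]


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call GET /_cat/health?v on a running node and parse the reply.

    Assumption: the node is reachable at base_url without authentication.
    """
    with urllib.request.urlopen(base_url + "/_cat/health?v", timeout=5) as response:
        return parse_cat_table(response.read().decode("utf-8"))


# Illustrative sample only: shaped like a _cat/health?v reply, values are made up.
SAMPLE_HEALTH_OUTPUT = (
    "epoch timestamp cluster status node.total node.data shards pri relo init "
    "unassign pending_tasks max_task_wait_time active_shards_percent\n"
    "1475247709 17:01:49 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%\n"
)

if __name__ == "__main__":
    # Parses the sample so the script runs without a cluster;
    # swap in fetch_cluster_health() once your node is up.
    for row in parse_cat_table(SAMPLE_HEALTH_OUTPUT):
        print(row["cluster"], row["status"], "nodes:", row["node.total"])
```

All values come back as strings, which is enough for a quick check such as looking at the status column.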

With this you have your Elasticsearch cluster up and running. In part 3 of the series we will store our own data in the cluster, index it and run some more complex queries.

I hope you liked the article. Please share your feedback with any corrections, compliments and improvements. Stay tuned for Part-3.

See you.

Cheers,

Kaivalya Apte


© 2023 — Produced by Geeknarrator

All rights Reserved.