Geeknarrator


How does a search engine work? – Elasticsearch (Part 1)

admin
October 2, 2017
Have you ever wondered how a search engine works? If so, this article will help you understand the basics. We will talk about how a search engine is able to answer your queries within milliseconds.

So let's start with the steps involved in making search queries fast:

1) Lexical analysis for indexing

2) Removing stop words

3) Stemming

4) Synonyms

5) Persistence

6) Ranking

Lexical Analysis :

In this step a large body of text is converted into a group of words (also known as tokens). This means your text is reduced to the set of unique words on which the search will be performed. What next?
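To make the tokenizing step concrete, here is a minimal sketch in Python. The function names `tokenize` and `unique_terms` are my own, and splitting on anything that is not a letter or digit is just one simple assumption; real analyzers are far more careful about languages, punctuation and numbers.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def unique_terms(text):
    """Return the distinct tokens: the vocabulary a search would run against."""
    return set(tokenize(text))

tokens = tokenize("The quick brown fox. The lazy dog!")
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'the', 'lazy', 'dog']
print(len(unique_terms("The quick brown fox. The lazy dog!")))
# 6 ("the" is counted once)
```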

Do you remember the index at the back of a book that you love? Think for a while about what it contains. Yes, it has a list of words, each followed by page numbers.

In the search engine world this is known as an inverted index. Basically, you have a reverse mapping from words to page numbers. To understand this clearly, let's take an example.

Suppose you have to count the occurrences of a particular word in a huge number of documents. How would you do that? The easiest approach is to read every page of every document and increment your count each time you find the word. After reading every document you would have your answer.

After some time you have to look up that word again, but wait… Would you scan the documents again? Let's say you do, but what happens when the query comes yet again? You can't read all the documents every time. Phew, this is a very tiring and slow process. After thinking for a while, you come up with an idea: note down the page numbers for the words you encounter, so that for the next query you just refer to the page numbers and jump directly to them.

When you do this for all the words (that you really care about) you will have a document known as the Inverted Index.

So what is the idea? Create an index of all the important words and store it somewhere.

General definition: an inverted index is a mapping between “terms” (the actual content/words/tokens) and “postings” (the documents in which each term appears).
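Here is a small sketch of that mapping in Python, assuming the documents are already in memory as a dictionary of id to text. The name `build_inverted_index` and the sample documents are made up for illustration; this is not how any particular search engine stores its index.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids (postings) containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents.
docs = {
    1: "Search engines build an index",
    2: "An index maps words to pages",
    3: "Engines answer queries quickly",
}
index = build_inverted_index(docs)
print(index["index"])    # [1, 2]
print(index["engines"])  # [1, 3]
```

The documents are scanned once while building the index. After that, answering "which documents contain this word?" is a single dictionary lookup instead of a full re-read.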

Removing stop words: 

Large text documents generally contain a lot of words that are not useful for your search queries. These words are known as stop words, for example a, an, is, the, to. They are not worth indexing, and if you index them your index grows unnecessarily. So a list of stop words needs to be identified and maintained, and it is consulted while indexing. Any word that is a stop word will not be indexed.

Common stop words include articles, prepositions and one-letter words.
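A stop-word filter can be sketched as a simple set lookup. The list below only contains the handful of words mentioned above; a real list would be longer and language specific, and the names `STOP_WORDS` and `remove_stop_words` are my own.

```python
# Illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "is", "the", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are worth indexing."""
    return [token for token in tokens if token not in stop_words]

print(remove_stop_words(["the", "fox", "is", "going", "to", "a", "den"]))
# ['fox', 'going', 'den']
```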

Stemming : 

Stemming generally refers to stripping letters off words to bring them to their root form. For example fishing, fisherman, fisher and fishy all come from the word fish. This important step takes care of removing such redundancy. The idea here is that we cannot and should not store all the grammatical forms of a single word. A user who queries the stem word might mean a different grammatical form of it. More examples: “running”, “runner” -> “run”; “swimming”, “swims” -> “swim”.
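To get a feel for stemming, here is a deliberately naive suffix stripper. The suffix list and the doubled-consonant rule are my own simplifications, chosen only to handle the examples in this section; production systems typically use an established algorithm such as the Porter stemmer instead.

```python
# Toy suffix list, checked in order; an assumption for illustration only.
SUFFIXES = ("ing", "er", "es", "s", "y")

def stem(word):
    """Strip one known suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            root = word[: -len(suffix)]
            # Undo a doubled final consonant: "runn" -> "run", "swimm" -> "swim".
            if root[-1] == root[-2] and root[-1] not in "aeiou":
                root = root[:-1]
            return root
    return word

words = ["fishing", "fisher", "fishy", "running", "runner", "swims"]
print([stem(w) for w in words])
# ['fish', 'fish', 'fish', 'run', 'run', 'swim']
```

Note that this toy version leaves "fisherman" untouched and mangles words like "glass", which is exactly why real stemmers carry many more rules.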

Synonym Database :

A synonym database needs to be constructed so that searches can match words with equivalent meanings. The implementation works the same way as stemming; the only difference is that keywords are mapped to words with a similar meaning rather than to different grammatical forms of the keyword. Example: car -> vehicle, automobile -> vehicle, plane -> vehicle. This lets the user find the actual words by querying their synonyms.
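The synonym mapping can be sketched as a plain dictionary applied while indexing and again while querying. The table below just reuses the example from this section, and the names `SYNONYMS` and `normalize` are hypothetical.

```python
# Illustrative synonym table taken from the example above.
SYNONYMS = {"car": "vehicle", "automobile": "vehicle", "plane": "vehicle"}

def normalize(token, synonyms=SYNONYMS):
    """Replace a token with its canonical synonym, if one is known."""
    return synonyms.get(token, token)

# Hypothetical sample documents.
docs = {1: "red car", 2: "fast plane", 3: "green tree"}

index = {}
for doc_id, text in docs.items():
    for token in text.lower().split():
        index.setdefault(normalize(token), set()).add(doc_id)

print(sorted(index["vehicle"]))                # [1, 2]
print(sorted(index[normalize("automobile")]))  # [1, 2]
```

Because the same normalization runs on the query, searching for "automobile" finds the documents that mention "car" or "plane".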

Persistence:

Persistent indices allow for quick retrieval of previously indexed information.

All the important information about a word is pre-processed as described in the previous steps and stored, which makes information retrieval fast.
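As a rough illustration of persistence, the sketch below writes an index to a JSON file and reads it back. JSON on local disk is only an assumption to keep the example short; real search engines use their own on-disk formats, and the names `save_index` and `load_index` are made up.

```python
import json
import os
import tempfile

def save_index(index, path):
    """Write the term -> postings mapping to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def load_index(path):
    """Read a previously saved index back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

index = {"fish": [1, 3], "search": [2]}
path = os.path.join(tempfile.mkdtemp(), "inverted_index.json")
save_index(index, path)

restored = load_index(path)
print(restored["fish"])  # [1, 3]
```

The expensive pre-processing happens once; later queries only need to load the stored result.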

Relevance Ranking :

Keywords are assigned a relevance ranking, which is calculated from a keyword’s frequency relative to the total number of words in the document.

This ensures that the search results are more relevant to what the user needs.

The relevance rank helps the end user locate the desired information by indicating which results (especially in a large result set) are more likely to yield pertinent information.
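The frequency-based score described above can be sketched like this. It only implements the simple ratio from this section (occurrences of the keyword divided by total words in the document); real engines combine more signals. The function names and sample documents are my own.

```python
def term_frequency(term, tokens):
    """Occurrences of the term divided by the total number of words."""
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens)

def rank(term, docs):
    """Return (doc_id, score) pairs for matching documents, best score first."""
    scores = {
        doc_id: term_frequency(term, text.lower().split())
        for doc_id, text in docs.items()
    }
    hits = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents.
docs = {
    1: "fish swim in the sea",
    2: "fish eat fish",
    3: "birds fly",
}
for doc_id, score in rank("fish", docs):
    print(doc_id, round(score, 2))
# 2 0.67
# 1 0.2
```

Document 2 comes first because "fish" makes up two of its three words, while document 3 does not match at all and is left out.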

After all the above steps, a search engine is able to answer search queries over huge amounts of data with an improved user experience.

The next topic is how Elasticsearch does it. I will leave you with a small introduction; the details will be covered in part 2 of this article.

Elasticsearch is a flexible, powerful, distributed, real-time search and analytics engine. It is easy to set up and provides the following features:

  • Real-time analytics
  • Distributed
  • High availability
  • Full-text search
  • Document oriented (JSON in and JSON out)
  • Schema free (JSON with free schema)
  • RESTful API

 

Cheers,

Kaivalya Apte
