Geeknarrator

Podcast

Latest Episodes

Hey Everyone,

In the 43rd episode I speak with Tim Berglund on Realtime Analytics with Apache Pinot.

Chapters: 

00:00 Introduction 
01:22 What do we mean by analytics and realtime analytics?
05:35 Can we define realtime in millis, seconds or minutes?
08:54 What is the fundamental difference between traditional analytics systems and Apache Pinot?
12:19 Was Kafka one of the reasons Apache Pinot could reach its full potential?
16:50 E-commerce Application example - How do I get my data in?
20:07 How is data stored (structured) on the disk?
23:31 Are joins available in Apache Pinot?
26:07 Joins vs pre-computing at ingestion
27:15 How is historical data ingested into Apache Pinot?
28:14 Types of indexes available in Apache Pinot
35:42 Do indexes cause write amplification? Is that a problem in Apache Pinot?
40:02 Point lookups in Apache Pinot
42:54 Anomaly Detection
45:51 Coming up in Apache Pinot

Links:
StarTree https://startree.ai/
Apache Pinot: https://pinot.apache.org/
Joins in Pinot: https://startree.ai/blog/apache-pinot-native-join-support
Apache Pinot Indexes: https://docs.pinot.apache.org/basics/indexing

Other playlists:
Distributed systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d

Modern Databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Serverless Architecture: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfpX9hT_tJEFb69o0GWlEZS

Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17

I hope you like the episode. Like, share and subscribe to the channel. 

Cheers,
The GeekNarrator

Tim Berglund on Realtime Analytics with Apache Pinot

The Geek Narrator May 28, 2023 12:58 pm

In this video I talk to Philip Fried from Estuary about Batch vs Realtime Stream Processing.
Philip brings a ton of experience in the world of data processing and has shared some of the best practices in implementing these systems. We dive deep into the world of data processing, covering batch and streaming systems, their challenges, tradeoffs and use cases.

Chapters:
00:00 Batch vs Realtime Stream Processing
03:25 What are Batch and Realtime processing?
18:29 How do Batch and Realtime compare in terms of Latency and Throughput?
27:24 Where is the cost saving coming from? Compute, Storage or Network?
31:38 Moving from Batch to Stream processing
37:50 How is Idempotency implemented in Streaming systems?
48:50 How do we approach Schema evolution in Batch and Streaming systems?
57:16 Summary - key points to keep in mind

Do check out Estuary if you deal with a ton of data, don't want to deal with painful operations, infrastructure management, schema migrations, etc., and only want to focus on building highly scalable and resilient applications.

References:
Estuary: https://estuary.dev/
Flow documentation: https://docs.estuary.dev

If you like this video, please hit the like button, share it with your network (anyone who works with a ton of data), and subscribe to the channel.

Feel free to watch related episodes in the playlist: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d

Modern Databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17

Distributed Systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4sd_CAWupznrpBezxT0gEvxB

Cheers, 
The GeekNarrator

Batch vs Realtime Stream Processing - A Deep Dive

The Geek Narrator May 19, 2023 1:57 pm

#Shorts Things to keep in mind as a Software Engineer

The Geek Narrator May 14, 2023 6:00 pm

#Shorts How to say "NO" for Software Developers

The Geek Narrator May 14, 2023 5:36 pm

Distributed Systems and Databases

Modern Databases
