Data Engineering Podcast
Technology
About
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Episodes
- Maximizing GPU Utilization: Heterogeneous Pipelines with Ray and Kubernetes
Robert Nishihara discusses maximizing hardware utilization for AI and data-intensive workloads, focusing on Ray's integration with Kubernetes and PyTorch. The episode covers heterogeneous pipelines, data preparation shifts, and strategies…
- The AI-First Data Engineer: 10–50x Productivity and What Changes Next
In this episode, Gleb Mezhanskiy, CEO of Datafold, explores how agentic AI is reshaping data engineering workflows, potentially leading to 10–50x productivity increases. The discussion covers AI-driven execution, testing, and deployment, a…
- Treat Metering Like Finance: Building Data Platforms for Consumption Economics
Himant Goyal explains how data platform investments support consumption-based business models by enabling accurate metering. The discussion covers the operational and architectural requirements for real-time visibility, including event sch…
- Beyond the PDF: Rowan Cockett on Reproducible, Composable Science
Rowan Cockett discusses making scientific research reproducible and reusable by addressing issues like data integrity, publishing incentives, and PDF-based workflows. He explores open standards, cloud-optimized formats, and initiatives lik…
- Beyond Prompts: Practical Paths to Self‑Improving AI
Raj Shukla, CTO of SymphonyAI, joins the Data Engineering Podcast to discuss the practicalities of building self-improving AI systems for production. The conversation covers agentic systems, feedback loops for learning, the role of intelli…
- Orion at Gravity: Trustworthy AI Analysts for the Enterprise
In this episode of the Data Engineering Podcast, Lucas Thelosen and Drew Gilson, co-founders of Gravity, discuss their vision for agentic analytics in the enterprise, enabled by semantic layers and broader context engineering…
- From Models to Momentum: Uniting Architects and Engineers with ER/Studio
In this episode of the Data Engineering Podcast, Jamie Knowles (Product Director) and Ryan Hirsch (Product Marketing Manager) discuss the importance of enterprise data modeling with ER/Studio. They highlight how clear, shared…
- From Data Models to Mind Models: Designing AI Memory at Scale
In this episode of the Data Engineering Podcast, Vasilije "Vas" Markovich, founder of Cognee, discusses building agentic memory, a crucial aspect of artificial intelligence that enables systems to learn, adapt, and retain kno…
- Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops
In this episode of the Data Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational groundwork required to run LLM-powered applications reliably and cost-effectively. He highlights common blind spots…
- From Legacy to AI-Ready: How MongoDB AMP Accelerates Modernization
In this episode, Shilpa Kolhar, SVP of Product and Engineering at MongoDB, discusses using MongoDB as a unified foundation for AI-driven and agentic applications. She explains how the Application Modernization Platform (AMP) accele…
- Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows
In this episode Tim Sehn, founder and CEO of DoltHub, talks about Dolt - the world’s first version‑controlled SQL database - and why Git‑style semantics belong at the heart of data systems and AI workflows. Tim explains how D…
- Logical First, Physical Second: A Pragmatic Path to Trusted Data
In this episode of the Data Engineering Podcast Jamie Knowles, Product Director for ER/Studio, talks about data architecture and its importance in driving business meaning. He discusses how data architecture should start with…
- Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability
In this episode Jacob Leverich, cofounder and CTO of Observe, talks about applying lakehouse architectures to observability workloads. Jacob discusses Observe’s decision to leverage cloud-native warehousing and open table for…
- Semantic Operators Meet Dataframes: Building Context for Agents with FENIC
In this episode Kostas Pardalis talks about Fenic - an open-source, PySpark-inspired dataframe engine designed to bring LLM-powered semantics into reliable data engineering workflows. Kostas shares why today’s data infrastruc…
- Beyond Dashboards: How Data Teams Earn a Seat at the Table
In this episode Goutham Budati talks about his Data–Perspective–Action framework and how it empowers data teams to become true business partners. Goutham traces his path from automating Excel reports to leading high‑impact data org…
- Unfreezing The Data Lake: The Future-Proof File Format
In this episode PhD researcher Xinyu Zeng talks about F3, the “future-proof file format” designed to address today’s hardware realities and evolving workloads. He digs into the limitations of Parquet and ORC - especially CPU-…
- From Context to Semantics: How Metadata Powers Agentic AI
In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and g…
- From Data Engineering to AI Engineering: Where the Lines Blur
In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data engineering over time. Starting from its origins in the Hadoop and cloud warehouse era,…
- Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics
In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fig…
- Blurring Lines: Data, AI, and the New Playbook for Team Velocity
In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering…
- State, Scale, and Signals: Rethinking Orchestration with Durable Execution
In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first prog…
- The AI Data Paradox: High Trust in Models, Low Trust in Data
In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale A…
- Bridging the AI–Data Gap: Collect, Curate, Serve
In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why…
- Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access
In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems,…
- The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems - and practical strategies for modernization. She unpacks how “legacy” is le…
- Context Engineering as a Discipline: Building Governed AI Analytics
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected wit…
- The Data Model That Captures Your Business: Metric Trees Explained
In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from h…
- From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares…
- From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
In this episode of the AI Engineering Podcast Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data i…
- Duck Lake: Simplifying the Lakehouse Ecosystem
In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake is focused on si…
- Aligning Business and Data: The Essential Role of Data Modeling
In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as…
- From Academia to Industry: Bridging Data Engineering Challenges
In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discus…
- High Performance And Low Overhead Graphs With KuzuDB
In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through co…
- Bridging Data and Decision-Making: AI's Role in Modern Analytics
In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-maki…
- From Bits to Tables: The Evolution of S3 Storage
In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and…
- Revolutionizing Python Notebooks with Marimo
In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enha…
- Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics
In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving data…
- Streamlining Data Pipelines with MCP Servers and Vector Engines
In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data…
- Foundational Data Engineering At Two Sigma
In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the crit…
- Enabling Agents In The Enterprise With A Platform Approach
In this episode of the Data Engineering Podcast Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current ent…
- Dagster's New Era: Modularizing Data Transformation in the Age of AI
In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the ro…
- AI and the Lakehouse: How Starburst is Pioneering New Workflows
In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data enginee…
- Amazon S3: The Backbone of Modern Data Systems
In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to l…
- Scaling Data Operations With Platform Engineering
In this episode of the Data Engineering Podcast Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online t…
- From Data Discovery to AI: The Evolution of Semantic Layers
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and provid…
- Balancing Off-the-Shelf and Custom Solutions in Data Engineering
In this episode of the Data Engineering Podcast Tulika Bhatt, a senior software engineer at Netflix, talks about her experiences with large-scale data processing and the future of data engineering technologies. Tulika shares her jo…
- StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics
In this episode of the Data Engineering Podcast Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical database. Sida discusses the inception of StarRocks, which was forked from Apache Doris i…
- Exploring NATS: A Multi-Paradigm Connectivity Layer for Distributed Applications
In this episode of the Data Engineering Podcast Derek Collison, creator of NATS and CEO of Synadia, talks about the evolution and capabilities of NATS as a multi-paradigm connectivity layer for distributed applications. Derek discu…
- Advanced Lakehouse Management With The LakeKeeper Iceberg REST Catalog
In this episode of the Data Engineering Podcast Viktor Kessler, co-founder of Vakmo, talks about the architectural patterns in the lakehouse enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data w…
- Simplifying Data Pipelines with Durable Execution
In this episode of the Data Engineering Podcast Jeremy Edberg, CEO of DBOS, talks about durable execution and its impact on designing and implementing business logic for data systems. Jeremy explains how DBOS's serverless platform and or…