
What Changed in Data Engineer Job Descriptions Around 2023?

For years, a Data Engineer job description was a known quantity: Python for pipeline code, SQL for transformations, Airflow for orchestration, Spark for batch processing, one cloud (AWS or Azure or GCP), and a warehouse. The role was about moving data reliably from sources to destinations that analysts could query. Machine learning was someone else’s problem downstream.

That description still fits most postings today. But about 4 in 10 active Data Engineer postings now mention some form of AI, and a new vocabulary has appeared in the ones that do: vector databases, retrieval-augmented generation (RAG), LLM-integrated pipelines, AI agents.

We analyzed every active Data Engineer posting on the InterviewStack.io job board as of May 2026, 6,736 listings, to map where that shift is and where it is not. The short version: there are two stories happening at once. One is explicit and visible in posting text. The other is ambient, nearly invisible to job-description scanning, and much larger.

Key Findings

6,736 active Data Engineer postings analyzed across the live job board as of May 2026.

39.5% of postings (2,664 of 6,736) mention some form of AI, including traditional ML.

17.4% explicitly require new-wave generative AI skills such as LLMs, RAG, AI Agents, and vector databases: 1,169 postings.

$18,965 salary premium for US-based roles with new-wave AI requirements: median $136,520 vs. $117,555 for non-AI roles.

Machine Learning leads all AI skills at 30.6% of postings; LLMs (6.7%), AI Agents (6.6%), and RAG…