




### **Role Overview**

We're looking for a **Data Engineer** who's passionate about building scalable, high-performance data solutions that power analytics and business decisions. In this role, you'll design, develop, and optimize data pipelines using **PySpark, Databricks, and Delta Lake**, ensuring data integrity and reliability across large distributed systems.

### **Key Responsibilities**

* Build and maintain **robust, high-performance data pipelines** using **PySpark**, **Databricks**, and **Delta Lake** (an illustrative sketch appears at the end of this posting).
* Develop a strong understanding of **data models, lineage, and business logic**, not just the ETL flow.
* Ensure **data quality, consistency, and accuracy** through validation, profiling, and automated testing.
* Debug, optimize, and **resolve issues across large-scale distributed systems**.
* Collaborate with analysts and business teams to deliver **trustworthy, production-ready datasets**.
* Apply **CI/CD, version control, and testing frameworks** to maintain reliability and scalability.

### **Qualifications**

* **3–5 years** of hands-on experience in data engineering or an equivalent role.
* Strong expertise in **PySpark**, **SQL**, **data modeling**, and **performance tuning**.
* Proven experience with **data quality frameworks**, **unit testing**, and **data validation**.
* Hands-on experience with **Synapse**, **Databricks**, and **Azure/AWS**.
* A curious, analytical mindset: you **understand what the data represents**, not just how it's moved.

### **Nice to Have**

* Exposure to **streaming data**, **CDC**, or **real-time processing**.
* Familiarity with **Power BI** or other BI tools.
* Experience in the **retail or supply-chain domains**.
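**A taste of the work** (illustrative only): below is a minimal sketch of the kind of batch pipeline described in the responsibilities above, using PySpark and Delta Lake on Databricks. Every path, table name, and column in it is a hypothetical example, not part of our actual stack.

```python
# Minimal sketch of a daily batch load on Databricks with PySpark + Delta Lake.
# All paths, table names, and columns below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("orders-daily-load").getOrCreate()

# Read the raw landing zone (hypothetical JSON drop).
raw = spark.read.json("/mnt/landing/orders/")

# Light cleansing plus business logic: drop rows without a key,
# then derive an order_date and a revenue column.
clean = (
    raw
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
)

# Idempotent write: MERGE into the target Delta table keyed on order_id,
# so reruns of the same day do not create duplicate rows.
target = DeltaTable.forName(spark, "analytics.orders")
(
    target.alias("t")
    .merge(clean.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```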
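The data-quality responsibility refers to gates like the following: a small, hypothetical validation step (the rules here are examples, not our real checks) that would run before the MERGE in the sketch above and fail the job loudly rather than load bad data.

```python
# Hypothetical data-quality gate for the pipeline sketched above.
from pyspark.sql import DataFrame, functions as F

def validate_orders(df: DataFrame) -> DataFrame:
    """Raise if basic quality rules are violated; otherwise pass the frame through."""
    checks = {
        "null order_id": df.filter(F.col("order_id").isNull()).count(),
        "duplicate order_id": (
            df.groupBy("order_id").count().filter(F.col("count") > 1).count()
        ),
        "negative revenue": df.filter(F.col("revenue") < 0).count(),
    }
    failed = {name: n for name, n in checks.items() if n > 0}
    if failed:
        raise ValueError(f"Data-quality checks failed: {failed}")
    return df

# Usage in the pipeline sketch: gate the cleansed frame before the Delta MERGE.
# validated = validate_orders(clean)
```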


