Senior Data Engineer (Data Infrastructure)
Indeed
Full-time
Onsite
No experience requirement
No degree requirement
Description

**Summary:** Seeking a Senior Data Engineer to design, develop, and optimize data infrastructure on Databricks, architecting scalable pipelines and ensuring data quality.

**Highlights:**
1. Design and optimize data infrastructure on Databricks using GCP technologies
2. Architect scalable pipelines with Airflow, dbt, Dataflow, and Pub/Sub
3. Enforce data quality standards and implement CI/CD best practices

We are looking for a **Senior Data Engineer** to design, develop, and optimize our data infrastructure on **Databricks**. You will architect scalable pipelines using BigQuery, Google Cloud Storage, Apache Airflow, dbt, Dataflow, and Pub/Sub, ensuring high availability and performance across our ETL/ELT processes. You will use Great Expectations to enforce data quality standards. The role also involves building our Data Mart (Data Mesh) environment and implementing CI/CD best practices. A successful candidate has extensive knowledge of cloud-native data solutions, strong proficiency with ETL/ELT frameworks (including dbt), and a passion for building robust, cost-effective pipelines.

**Key Responsibilities**

**Data Architecture & Strategy**

* Define and implement the overall data architecture on GCP, including data warehousing in BigQuery/Databricks, data lake patterns in Google Cloud Storage, and Data Mart (Data Mesh) solutions.
* Integrate Terraform for Infrastructure as Code to provision and manage cloud resources efficiently.
* Establish both batch and real-time data processing frameworks to ensure reliability, scalability, and cost efficiency.

**Pipeline Development & Orchestration**

* Design, build, and optimize ETL/ELT pipelines using Apache Airflow for workflow orchestration (a sketch of such a DAG follows this section).
* Implement dbt (Data Build Tool) transformations to maintain version-controlled data models in BigQuery, ensuring consistency and reliability across the data pipeline.
* Use Google Dataflow (based on Apache Beam) and Pub/Sub for large-scale streaming/batch data processing and ingestion.
* Automate job scheduling and data transformations to deliver timely insights for analytics, machine learning, and reporting.

**Event-Driven & Microservices Architecture**

* Implement event-driven or asynchronous data workflows between microservices.
* Employ **Docker and Kubernetes (K8s)** for containerization and orchestration, enabling flexible and efficient microservices-based data workflows.
* Implement **CI/CD** pipelines for streamlined development, testing, and deployment of data engineering components.

**Data Quality, Governance & Security**

* Enforce data quality standards using Great Expectations or similar frameworks, defining and validating expectations for critical datasets (see the sketch after the Requirements section).
* Define and uphold metadata management, data lineage, and auditing standards to ensure trustworthy datasets.
* Implement security best practices, including encryption at rest and in transit, Identity and Access Management (IAM), and compliance with GDPR or CCPA where applicable.

**BI & Analytics Enablement**

* Collaborate with Data Science, Analytics, and Product teams to ensure the data infrastructure supports advanced analytics, including machine learning initiatives.
* Maintain Data Mart (Data Mesh) environments that cater to specific business domains, optimizing access and performance for key stakeholders.
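To make the orchestration duties concrete, here is a minimal sketch of the kind of Airflow DAG this role would own: a daily load from a Google Cloud Storage data lake into BigQuery, followed by dbt transformations. The bucket, dataset, and dbt project path are hypothetical placeholders, and the imports assume Airflow 2.x with the Google provider package installed.

```python
# Minimal ELT DAG sketch: GCS -> BigQuery staging load, then dbt models.
# All resource names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="daily_events_elt",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    # Load the day's raw event files from the data lake into a staging table.
    load_raw_events = GCSToBigQueryOperator(
        task_id="load_raw_events",
        bucket="example-data-lake",  # hypothetical bucket
        source_objects=["events/{{ ds }}/*.json"],
        destination_project_dataset_table="analytics.raw_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )

    # Rebuild the version-controlled dbt models that depend on the staging data.
    run_dbt_models = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /opt/dbt --select staging+",
    )

    load_raw_events >> run_dbt_models
```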
**Requirements**

**Experience**

* 5+ years of professional experience in data engineering, with at least 1 year working with mobile data.

**Technical Expertise with the GCP Stack**

* Proven track record building and maintaining **Databricks and BigQuery** environments and Google Cloud Storage-based data lakes.
* Deep knowledge of Apache Airflow for scheduling/orchestration and ETL/ELT design.
* Experience implementing dbt for data transformations, RabbitMQ for event-driven workflows, and Pub/Sub + Dataflow for streaming/batch data pipelines.
* Familiarity with designing and implementing Data Mart (Data Mesh) solutions, as well as using Terraform for IaC.

**Programming & Containerization**

* Strong coding skills in Python, Java, or Scala, plus scripting for automation.
* Experience with Docker and Kubernetes (K8s) for containerizing data-related services.
* Hands-on experience with CI/CD pipelines and DevOps tools (e.g., Terraform, Ansible, Jenkins, GitLab CI) to manage infrastructure and deployments.

**Data Quality & Governance**

* Proficiency in Great Expectations (or similar) to define and enforce data quality standards (a sketch follows below).
* Expertise in designing systems for data lineage, metadata management, and compliance (GDPR, CCPA).
* Strong understanding of OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) systems.

**Communication**

* Excellent communication skills for both technical and non-technical audiences.
* High level of organization, self-motivation, and problem-solving aptitude.

**Preferred Skills**

* Machine Learning (ML) Integration: Familiarity with end-to-end ML workflows and model deployment on GCP (e.g., Vertex AI).
* Advanced Observability: Experience with Prometheus, Grafana, Datadog, or New Relic for system health and performance monitoring.
* Security & Compliance: Advanced knowledge of compliance frameworks such as HIPAA, SOC 2, or relevant regulations.
* Real-Time Data Architectures: Additional proficiency in Kafka, Spark Streaming, or other streaming solutions.
* Certifications: GCP-specific certifications (e.g., Google Professional Data Engineer) are highly desirable.

**Benefits: Why should you join us?**

* Growth and career development
* Work-life balance
* Competitive salary (USD)
* PTO

Job Type: Full-time
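The data-quality requirement above calls for Great Expectations; below is a minimal sketch of defining and enforcing expectations using its classic pandas-based API (newer Great Expectations releases reorganize this workflow around data contexts and expectation suites). The column names and thresholds are hypothetical placeholders, not part of the posting.

```python
# Minimal data-quality check sketch, assuming the classic (pre-1.0)
# Great Expectations pandas API. Column names and bounds are hypothetical.
import great_expectations as ge
import pandas as pd

# Stand-in for a critical dataset pulled from the warehouse.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u2", None],
        "amount": [10.0, 25.5, 9999.0],
    }
)

dataset = ge.from_pandas(events)

# Define expectations for the dataset.
dataset.expect_column_values_to_not_be_null("user_id")
dataset.expect_column_values_to_be_between("amount", min_value=0, max_value=1000)

# Validate, and fail the pipeline step if any expectation is not met.
results = dataset.validate()
if not results.success:
    raise ValueError(f"Data quality checks failed: {results}")
```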

Source: Indeed
Valentina Rodríguez
Indeed · HR
