
AWS C2C requirements
Job Title: AWS Databricks Data Engineer
Location: Los Angeles, CA – Hybrid
Experience: 12 years
Employment Type: Contract
Job Summary:
Must Demonstrate (Critical Skills & Architectural Competencies)
Designing and implementing Databricks-based Lakehouse architectures on AWS
Clear separation of compute vs. serving layers
Ability to design low-latency data/API access strategies (beyond Spark-only patterns)
Strong understanding of caching strategies for performance and cost optimization
Data partitioning, storage optimization, and file layout strategy
Ability to handle multi-terabyte structured or time-series datasets
Skill in probing requirements and identifying what matters architecturally
A player-coach mindset: hands-on engineering + technical leadership
Job Description:
We are seeking a highly skilled AWS Data Engineer with strong expertise in SQL, Python, PySpark, Data Warehousing, and Cloud-based ETL to join our data engineering team. The ideal candidate will design, implement, and optimize large-scale data pipelines, ensuring scalability, reliability, and high performance. This role requires close collaboration with cross-functional teams and business stakeholders to deliver modern, efficient data solutions.
Key Responsibilities
1. Data Pipeline Development
Build and maintain scalable ETL/ELT pipelines using Databricks on AWS.
Leverage PySpark/Spark and SQL to transform and process large, complex datasets.
Integrate data from multiple sources including S3, relational/non-relational databases, and AWS-native services.
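To give a sense of the day-to-day work this responsibility implies, below is a minimal PySpark sketch of an S3-to-Delta ingest step. The bucket path, column names, and table name (s3://example-raw-bucket/..., analytics.bronze_events) are hypothetical placeholders, not systems named in this posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-to-delta-ingest").getOrCreate()

# Read raw JSON events from an S3 prefix (a production pipeline would pin
# an explicit schema rather than inferring one).
raw = spark.read.json("s3://example-raw-bucket/events/")

# Light cleanup: normalize the timestamp and drop malformed rows.
cleaned = (
    raw.withColumn("event_ts", F.to_timestamp("event_time"))
       .dropna(subset=["event_id", "event_ts"])
)

# Land the data as a Delta table, partitioned by date for downstream pruning.
(cleaned.withColumn("event_date", F.to_date("event_ts"))
        .write.format("delta")
        .mode("append")
        .partitionBy("event_date")
        .saveAsTable("analytics.bronze_events"))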
2. Collaboration & Analysis
Partner with downstream teams to prepare data for dashboards, analytics, and BI tools.
Work closely with business stakeholders to understand requirements and deliver tailored, high‑quality data solutions.
3. Performance & Optimization
Optimize Databricks workloads for cost, performance, and efficient compute utilization.
Monitor and troubleshoot pipelines to ensure reliability, accuracy, and SLA adherence.
Apply query optimization, Spark tuning, and shuffle minimization best practices when handling tens of millions of rows.
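As one illustration of the shuffle-minimization practices mentioned above, the sketch below broadcasts a small dimension table so a join against a large fact table uses a broadcast hash join instead of a shuffle-heavy sort-merge join. Table names and the coalesce factor are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("shuffle-tuning-sketch").getOrCreate()

facts = spark.read.table("analytics.bronze_events")   # large fact table
dims = spark.read.table("analytics.dim_customers")    # small lookup table

# broadcast() replicates the small table to every executor, so the join
# avoids shuffling the large side across the cluster.
joined = facts.join(broadcast(dims), on="customer_id", how="left")

# Coalesce before writing to avoid producing thousands of tiny output files.
(joined.coalesce(64)
       .write.format("delta")
       .mode("overwrite")
       .saveAsTable("analytics.silver_enriched_events"))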
4. Governance & Security
Implement and manage data governance, access control, and security policies using Unity Catalog.
Ensure compliance with organizational and regulatory data‑handling standards.
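For context, Unity Catalog access control is typically expressed as SQL GRANT/REVOKE statements; the catalog, schema, table, and group names below are hypothetical placeholders.

# `spark` is the SparkSession predefined in Databricks notebooks and jobs.
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data_engineers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA analytics.silver TO `bi_analysts`")
spark.sql("REVOKE ALL PRIVILEGES ON TABLE analytics.silver.enriched_events FROM `contractors`")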
5. Deployment & DevOps
Use Databricks Asset Bundles for deployment of jobs, notebooks, and configuration across environments.
Maintain effective version control of Databricks artifacts using GitLab or similar tools.
Use CI/CD pipelines to support automated deployments and environment setups.
Technical Skills (Required)
Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse Architecture, table triggers, Workflows, Delta Live Tables pipelines, Databricks Runtime, etc.).
Proven ability to implement robust PySpark solutions.
Hands‑on experience with Databricks Workflows & orchestration.
Solid knowledge of Medallion Architecture (Bronze/Silver/Gold); a brief sketch follows this list.
Significant experience designing or rebuilding batch‑heavy data pipelines.
Strong background in query optimization, performance tuning, and Spark shuffle optimization.
Ability to handle and process tens of millions of records efficiently.
Familiarity with Genie enablement concepts (understanding required; deep experience optional).
Experience with CI/CD, environment setup, and Git-based development workflows.
Solid understanding of AWS cloud, including:
IAM
Networking fundamentals
Storage integration (S3, Glue Catalog, etc.)
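As referenced in the Medallion Architecture bullet above, the sketch below shows the Bronze → Silver → Gold progression in schematic form. All table names and transformations are placeholders, assuming a bronze table like the one landed in the earlier ingest sketch.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: raw, append-only landing of source data (written by the ingest job).
bronze = spark.read.table("analytics.bronze_events")

# Silver: deduplicated, validated, conformed records.
silver = (bronze.dropDuplicates(["event_id"])
                .filter(F.col("event_ts").isNotNull()))
silver.write.format("delta").mode("overwrite").saveAsTable("analytics.silver_events")

# Gold: business-level aggregates ready for BI and dashboards.
gold = silver.groupBy("event_date").agg(F.count("*").alias("event_count"))
gold.write.format("delta").mode("overwrite").saveAsTable("analytics.gold_daily_event_counts")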
Preferred Experience
Experience with Databricks Runtime configurations and advanced features.
Knowledge of streaming frameworks such as Spark Structured Streaming (see the sketch after this list).
Experience developing real-time or near real-time data solutions.
Exposure to GitLab pipelines or similar CI/CD systems.
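A minimal Spark Structured Streaming sketch, as referenced above; the source table, checkpoint path, and target table are hypothetical. The availableNow trigger drains all pending data and then stops, a common pattern for incremental near real-time jobs.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Incrementally read new rows from a Delta table as a stream.
stream = spark.readStream.table("analytics.bronze_events")

# Write downstream with a checkpoint so the job can resume where it left off.
query = (stream.writeStream
               .format("delta")
               .option("checkpointLocation", "s3://example-bucket/checkpoints/silver_stream/")
               .outputMode("append")
               .trigger(availableNow=True)  # process available data, then stop
               .toTable("analytics.silver_events_stream"))
query.awaitTermination()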
Certifications (Optional)
Databricks Certified Data Engineer Associate / Professional
AWS Data Engineer or AWS Solutions Architect certification
To apply for this job, email your details to santhosh.s@sightspectrum.com