🚀 Senior Data Engineer (Databricks & AWS)
📍 Remote (Latin America / Europe) | 🕐 9 AM - 5 PM EST | 💼 Full-time
At CloudGeometry, we partner with industry leaders like AWS, Google, and Databricks to deliver cutting-edge cloud-native solutions. We are looking for a Senior Data Engineer to join our flagship project: a modern Data Platform for the life sciences industry, supporting global leaders like Pfizer, Moderna, and Novartis in developing innovative RNA-based solutions using cloud computing and advanced AI.
If you are an experienced data engineer who thrives in high-impact environments, prefers working free of legacy systems, and wants to play a key technical leadership role in building scalable lakehouse architectures, let’s talk!
🎯 Key Responsibilities
- Pipeline Engineering: Design, develop, and optimize high-performance ETL pipelines within Databricks that deliver analytics-ready data back to operational services.
- Architecture Leadership: Lead technical architecture discussions with engineers, product managers, and data scientists to implement advanced analytics.
- Workflow Optimization: Build, fine-tune, and monitor Databricks workflows to ensure system reliability, performance, and data integrity.
- Data Quality & Security: Collaborate with ML teams to ensure secure, rigorous, and accurate data ingestion across all processing stages.
- Agile Execution: Actively participate in daily Scrum ceremonies within a globally distributed engineering team.
🛠️ Technical Requirements & Stack
1. Core Data Engineering
- Databricks Ecosystem: 2+ years of hands-on experience (Delta tables/Iceberg, Spark jobs, MLflow, Unity Catalog, Model Registry).
- Architecture: Expert-level understanding of modern Lakehouse architectural design principles.
- Languages: Expert-level Python (for data processing/ETL) AND TypeScript / Node.js (for backend services using HapiJS, Zod, and Jest).
2. Cloud Infrastructure & DevOps (AWS)
- Compute & Storage: ECS (Fargate/EC2), Lambda, S3, and Athena.
- Messaging & Orchestration: SQS/SNS and Airflow.
- DevOps & CI/CD: GitHub Actions, CodeBuild, Docker, and repository templates via Cruft.
3. Data Stores & MLOps
- Databases: PostgreSQL (ACID/Migrations), DynamoDB (High-scale Key-Value), and Redis (Caching/Rate limiting).
- Search: OpenSearch / Elasticsearch for full-text search and aggregations.
- GenAI: Practical knowledge of LLMs, agents, function calling, and RAG architectures.
📋 Qualifications
- Experience: 5+ years in software development, with a strong focus on data engineering or analytics.
- Senior Autonomy: Proven ability to challenge decisions, propose architectural improvements, and deliver complex features end-to-end.
- Communication: Exceptional English skills (written and spoken) to articulate complex data ideas to global stakeholders.
- Availability: Must be online from 9 AM to 5 PM EST.
⭐ Nice to Have:
- Professional Databricks or AWS certifications.
- Experience building internal SDKs or developer experience tooling.
- Experience working directly alongside Data Scientists and ML Developers.
🎁 What We Offer (Our Commitment to You)
- 💰 Comprehensive compensation and benefits package.
- ⚡ Zero legacy systems – work exclusively with cutting-edge technologies.
- 📚 Continuous Learning: Extensive training, certifications, hackathons, and Udemy access.
- 🛠️ Premium Tooling: Developer Pro access to Claude Code, Codex, and Antigravity.
- 🌍 Top-tier Culture: A collaborative, supportive environment with global experts.
📩 Ready to build the future of Life Sciences?
Click Apply or send us your resume. Let’s build something massive together!
#DataEngineering #Databricks #AWS #Python #TypeScript #RemoteJobs #Lakehouse