Key Responsibilities:
- Automate data quality and reconciliation checks across varied storage layers, including Snowflake, SQL, and RDF/SPARQL databases
- Test and verify data lineage, governance, and visualization components using Snowflake, data catalogs (e.g., DataHub), ThoughtSpot, and other visualization tools
- Integrate test suites into the core infrastructure, which is orchestrated by Apache Airflow and uses the Iceberg table format, and monitor data pipeline health, alerting, and observability metrics using Prometheus and Grafana Cloud
- Establish AI Evaluation Loops (Evals) and Guardrails: Build rigorous verification protocols (including structural tests, checks, and watchdog agents) to validate AI-generated artifacts, catch false positives, and ensure all automated outputs are secure, reliable, and free from hallucinations
- Integrate automated testing workflows into CI/CD pipelines using GitHub Actions, ensuring continuous stability and quality gates across all deployment environments
- Validate ETL and dbt transformations across Data Lakehouses, rigorously testing data progression through a Medallion Architecture
- Test and automate complex API workflows, validating data payloads across OpenAPI integrations, third-party APIs, GraphQL, and AWS APIs (specifically S3)
Must Haves:
- Data engineering & data testing: dbt, data lakehouse concepts, Medallion architecture
- Databases & storage testing: SQL, Snowflake, AWS S3, Iceberg
- Integrating quality checks into data pipelines: Apache Airflow
- API testing & automation: REST/OpenAPI, GraphQL
- Integrating test automation into CI/CD: GitHub Actions (or similar, such as ArgoCD or GoCD)
- Cloud / infrastructure and observability basics: Kubernetes (K8s), Prometheus, Grafana
Nice to Have:
- Graph databases: RDF / SPARQL
- Data governance & analytics tools: DataHub, ThoughtSpot
- AI/ML testing & MLOps: AI evals, guardrails, RAG, vector databases, AI drift monitoring
- Advanced / emerging data tech: StarRocks, DuckDB
- Regulated environments: GxP, 21 CFR Part 11, HIPAA
- Clinical / domain-specific data standards: CDISC, ODM, FHIR
- AI-native tooling: Cursor, Claude Code, Copilot, QA Wolf
Data Engineer - Quality Assurance
TRANSCENDA
Rio de Janeiro, State of Rio de Janeiro