The Cloud/DevSecOps Engineer owns the automation, reliability, security, and compliance of a next-generation, event-driven agentic platform. This role designs and delivers the CI/CD pipelines, cloud infrastructure, secrets management, observability, and security controls that power LangChain/LangGraph/LLM agents, microservices, and data/AI workflows deployed across AWS and Azure. You'll collaborate closely with engineering, data/AI, and product teams during both development and incident response, championing a culture of security, operational excellence, and continuous improvement.
Responsibilities
- Design, implement, and automate multi-cloud infrastructure provisioning and deployments using Terraform, CloudFormation, Kubernetes/EKS, Docker, and serverless cloud functions.
- Architect and maintain robust CI/CD pipelines (Makefiles, pytest, Dockerfiles, test spies/mocks) supporting modern agentic microservices and asynchronous event-driven workflows.
- Integrate and operationalize agent orchestration flows built on LangChain, LangGraph, LlamaIndex, and Pinecone; build secure, monitorable event brokers (Kafka, AWS EventBridge, Redis Streams) and orchestrated job queues (Celery, AWS Batch).
- Champion security best practices: automate secrets/token/certificate management (Vault, AWS Secrets Manager), enforce fine-grained RBAC and token-based authentication (OAuth 2.0), and oversee private-link connectivity (AWS PrivateLink, Azure Private Link) and cross-cloud access controls.
- Monitor, manage, and remediate cloud and on-prem security incidents, participate in on-call rotations, and support production outage resolution and root cause analysis.
- Implement comprehensive observability: distributed tracing, logging, metrics, and alerting (Prometheus, ELK, OpenTelemetry, Datadog), dashboard visualization, and actionable production feedback loops.
- Collaborate with architects, engineers, and QA to define, document, and maintain event schema contracts, compliance policies, backup/recovery, and SLO/SLA targets.
- Contribute to security audits, compliance reporting, incident and postmortem documentation, and continuous process improvement reviews.
- Lead or participate in sprint planning, backlog grooming, process retrospectives, and cross-team knowledge sharing and onboarding.
Qualifications
- Demonstrated experience in event-driven cloud infrastructure (AWS EKS, Kubernetes, Terraform, Docker, serverless Lambda/Batch, cross-cloud integration).
- Proficiency in building and optimizing CI/CD pipelines for fast, reliable agentic deployments (Makefiles, pytest, Dockerfiles).
- Practical experience implementing security in agent/LLM and microservices environments: Vault, Secrets Manager, token/cert rotation, RBAC, network controls.
- Experience deploying, scaling, and monitoring event brokers (Kafka/EventBridge/Redis Streams) and background worker orchestration platforms (Celery, AWS Batch).
- Deep knowledge of security, compliance, observability, incident response, and SRE best practices.
- Familiarity with LangChain/LangGraph agentic patterns, vector DBs (Pinecone), and event-driven ML/data integrations is highly desirable.
- Excellent communication skills for cross-functional collaboration, agile ceremonies, incident postmortems, documentation, and knowledge transfer.