Director of Infrastructure
RebelMouse
- Pioneered agent-native infrastructure operations: standardized repository instructions, curated knowledge bases, and guarded execution conventions that let AI coding agents (Claude Code, Codex) plan and apply infrastructure changes safely under human review.
- Directed the design and rollout of a next-generation Kubernetes (EKS) platform for high-traffic media workloads, combining Karpenter autoscaling across spot and on-demand capacity, Istio service mesh, and GitOps delivery via ArgoCD to enable zero-downtime blue-green cluster migrations and cost-aware scaling under viral traffic.
- Led SOC 2 Type II compliance from scoping to certification, and architected automated disaster recovery environments for critical services using Terraform, Terragrunt, and Kubernetes.
- Ran a company-wide AWS cost-optimization program that cut spend 12% in six months: statistical analysis of access logs to tier legacy media into S3 Glacier, PostgreSQL storage optimization, Kubernetes workload rightsizing, and eBPF-guided cross-AZ traffic reduction.
- Directed the consolidation of observability onto VictoriaMetrics, OpenTelemetry, ClickHouse, and Grafana with eBPF-based network telemetry, creating a single platform for metrics, logs, traces, alerting, and traffic-cost analysis.
- Established a Linear-first, async-by-default operating model for the DevOps team, with structured intake, explicit priorities, and written decision records, improving cross-timezone collaboration and delivery predictability.
- Set the strategic vision and multi-year infrastructure roadmap, aligned with company objectives and focused on scalability, reliability, and cost efficiency.
- Defined and implemented DevOps/SRE operational policies, role definitions, incident response protocols, and communication guidelines, and established a structured knowledge base.
- Modernized infrastructure delivery on OpenTofu/Terragrunt with reusable modules, lockfile-based stack management, SOPS and External Secrets, and standardized CI/CD for clusters, monitoring, and platform services.
- Managed key vendor relationships, including AWS and Fastly, keeping these partnerships aligned with technical and business objectives.
- skills
- Strategic Planning, Team Management, Budget & Vendor Management, Async Operations
- stack
- AWS, EKS, Karpenter, Istio, ArgoCD, OpenTofu, Terragrunt, VictoriaMetrics, OpenTelemetry, ClickHouse, Grafana, SOPS, Claude Code, Codex