We have partnered with a market-leading enterprise organisation that is undergoing a massive digital and data transformation. Operating at scale, their infrastructure spans both GCP and Azure. They are looking to onboard a Site Reliability Engineer (SRE) who brings a blend of cloud automation and deep database performance expertise to ensure maximum uptime, scalability, and efficiency.
The Role
As an Enterprise SRE, you will not just be monitoring systems; you will be engineering them for resilience. You will collaborate closely with Platform, Data, and Software Engineering teams to design fault-tolerant multi-cloud environments. A significant focus of this role will be leveraging Terraform to treat everything as code, while utilising your strong SQL background to solve complex database reliability and performance challenges at scale.
Key Responsibilities
- Multi-Cloud Infrastructure: Design, provision, and optimise secure enterprise infrastructure across GCP and Azure.
- Advanced Infrastructure as Code (IaC): Standardise, scale, and maintain complex infrastructure blueprints using Terraform across development, staging, and production environments.
- Database Reliability Engineering: Collaborate with data squads to troubleshoot high-load database performance bottlenecks, optimise complex SQL queries, and ensure efficient data scaling and replication.
- Observability & Resilience: Build and refine advanced enterprise monitoring, logging, and alerting systems (e.g., Datadog, Dynatrace, Prometheus, ELK) to proactively identify and mitigate system degradation.
- Incident Response & Evolution: Participate in enterprise-level on-call rotations, championing blameless post-mortems and driving systemic fixes to eliminate recurring technical debt.
Technical Skills & Background Required
- Enterprise Cloud: Proven experience architecting and managing production workloads within GCP and/or Microsoft Azure.
- Automation Mastery: Expert-level proficiency with Terraform (including module design, state management at scale, and integration into enterprise CI/CD pipelines).
- Deep SQL Capabilities: Strong hands-on experience tuning and scaling relational databases (e.g., PostgreSQL, SQL Server, MySQL). You should understand query execution plans, indexing strategies, and database internals.
- Modern Architecture: Strong familiarity with containerisation (Docker) and orchestration (Kubernetes / GKE / AKS).
- Scripting & Tooling: Proficient in automating tasks using languages like Python, Go, or Bash.
This is initially a 6-month contract role with a view to extend. The role is ideally based in Sydney, though Melbourne may be considered for the right candidate. Apply now if interested.