
Senior DevOps Engineer / SRE Lead
GPS Hardware Australia
Posted 5 days ago
Location: Sydney, Australia (WFH with occasional office days)
Type: Full-time
About Us
GPS Hardware Australia is an Australian-founded telematics and fleet operations platform delivering hardware and software as a service (HSaaS). We connect vehicles, assets, equipment and field staff to the cloud for real-time visibility, safer driving, lower operating costs, and streamlined compliance.
Our platform ingests tens of thousands of IoT payloads every second. That means routing, compliance, and real-time driver safety alerts for thousands of vehicles — all while juggling complex computations like engine hours, odometer tracking, route optimisation, and live dashboards for clients.
But the reality? We’ve had too many outages. RabbitMQ crashes have taken the system down. RDS overloads have caused cascading failures. When one bottleneck is fixed, another appears: CPU spikes choke cron jobs, queues back up, webhooks fail, live dashboards go dark. It’s a house of cards, and we’re ready to change that.
That’s why we’re hiring a Senior DevOps Engineer / SRE Lead: someone with deep AWS experience, proven skill at scaling real-time platforms, and the leadership to guide our dev team like a mini-CTO.
The Role
This is not a “keep the lights on” job. This is a fix-it, lead-it, and scale-it role.
You’ll take ownership of our AWS infrastructure and bring order to complexity. You’ll design for resilience, performance, and cost-effectiveness. You’ll work alongside our PHP/Symfony developers, Angular frontend engineers, and mobile app team to make sure what’s shipped works under real-world load.
And importantly, you’ll lead: mentoring, setting standards, running post-mortems, and giving the team the confidence that the platform won’t collapse the next time queues spike.
We need someone with the gravitas to say, “This is how it needs to be done,” and the skill to back it up.
Key Responsibilities
Infrastructure & Reliability
Own and optimise AWS (EC2, RDS/Postgres, Elasticsearch, ECS/EKS, S3, CloudWatch, Route 53, IAM, WAF/Shield, KMS).
Harden RabbitMQ with HA clustering, quorum queues, DLQs, back-pressure handling, and autoscaled consumers.
Tune and scale RDS: partitioning, read replicas, hot/cold tables, PgBouncer connection pooling, query/index optimisation.
Manage Elasticsearch lifecycle: shard strategy, ILM (hot-warm-cold), slow-log tuning, snapshots.
Design resilient, self-healing architecture that survives traffic surges and node failures.
Observability & Operations
Implement unified monitoring (Prometheus/Grafana, ELK, CloudWatch).
Define SLOs/SLIs for ingest, queues, webhooks, and dashboards; manage error budgets.
Build actionable alerting, golden signals dashboards, and well-documented runbooks.
Lead incident response and post-mortems that create lasting fixes.
Automation & Delivery
Manage IaC with Terraform (and Terragrunt where useful).
Improve CI/CD pipelines (GitLab/Jenkins/Drone), enabling blue/green and canary deployments.
Automate backups, restores, failover testing, and disaster recovery with tested RTO/RPO.
Embed security by design: IAM least-privilege, secrets in KMS/Secrets Manager, TLS everywhere.
Collaboration & Leadership
Work closely with developers (PHP/Symfony, Angular, iOS/Android) to ensure new features won’t destabilise infra.
Perform infra-aware code reviews (e.g., job worker efficiency, DB query impact, webhook idempotency).
Coach engineers on reliability-first thinking; introduce pragmatic SRE practices (on-call rotations, error budgets, post-mortems).
Act as a technical authority and trusted partner to leadership — effectively a “mini-CTO” for infrastructure.
Required Skills & Experience
5+ years in DevOps/SRE roles with deep AWS expertise.
Hands-on with EC2, ECS/EKS, RDS/Postgres, Elasticsearch, CloudWatch, IAM, VPC, Route 53, WAF/Shield, and KMS.
Strong experience with Terraform / IaC at scale.
Messaging systems at scale (RabbitMQ a must; Kafka/SQS a bonus).
High-throughput RDBMS management (partitioning, replicas, query optimisation).
Elasticsearch production operations and tuning.
Linux and networking fundamentals.
Proven CI/CD pipeline automation with Docker/Kubernetes.
Able to read and review PHP/Symfony and TypeScript/Angular code for infra impact.
Strong communicator with leadership mentality: confident, accountable, able to mentor and influence.
Nice to Have
Experience with IoT/telematics or fleet-scale data systems.
Familiarity with real-time Socket.IO/WebSocket workloads (used by our iOS and Angular teams).
AWS cost-optimisation and FinOps experience.
Background in compliance-heavy environments (HVNL, CoR, HACCP, FTC/FBT reporting).
What Success Looks Like
30 days: Infra fully mapped, SLOs defined, top reliability gaps plugged, first incident runbook created.
60 days: RabbitMQ HA + DLQs in place, DR restore tested, dashboards live with golden signals.
90 days: Blue/green deploy path established, RDS and Elasticsearch tuned, mean-time-to-recover materially reduced, and a humane on-call rotation in place.
Why Join Us?
Solve real engineering challenges: high-volume, real-time, business-critical workloads.
Lead the way in stabilising and scaling one of Australia’s fastest-growing telematics platforms.
Flexibility to work from home, with occasional in-office collaboration in Sydney.
Join a passionate team of senior engineers (Symfony/PHP, Angular, iOS/Android) who need an infrastructure leader.