Site Reliability Engineer

Essential:

  • Has been involved in cloud initiatives, contributed to SRE or Platform Engineering groups and helped deliver key infrastructure for core initiatives.
  • Expert working knowledge and understanding of on-prem and cloud platforms, particularly AWS, and the ability to implement and optimize cloud architectures.
  • Familiarity with container orchestration platforms, specifically Kubernetes (k8s), and experience running large-scale clusters at enterprise level.
  • Experience with event-driven systems.
  • Can communicate clearly, present, and evangelize to both technical and non-technical audiences.
  • Experience with AWS best practices, including the AWS Well-Architected Framework.
  • Experience with GitOps and using either ArgoCD or Flux to improve developer experience.
  • Experience creating Terraform/OpenTofu modules that enable product engineering teams to deploy applications autonomously whilst maintaining high standards.
  • Ability to automate provisioning, scaling, and maintenance of on-prem resources using tools like Ansible, Terraform, Packer, etc.
  • Experience with GitOps using tools like ArgoCD and GitLab CI to enhance developer experience, alongside developing secure and cost-effective CI/CD pipelines.
  • Good experience with monitoring tools and providing the right level of observability and monitoring for product engineering teams.
  • Demonstrated cost awareness and experience optimizing cloud costs.
  • Ability to collaborate and work effectively as part of a team, providing mentoring to junior team members.
  • Must be very comfortable working, communicating with, and regularly presenting to highly technical software engineers, cloud engineers, product/business leaders, as well as executive leadership.
  • A community builder who is always working within the community to find bright spots and bring that knowledge back to the larger community.
  • Passionate about cloud, understanding problems and delivering solutions.
  • Committed to continuous learning, understanding how it applies to the business, and communicating it to educate and enable the technical organization, product groups, etc.
  • Good analytical and problem-solving skills, with the ability to make informed and timely decisions.

Responsibilities:

The Platform Engineer supports the implementation of core infrastructure and security services in the client’s chosen cloud platforms, delivering highly available and scalable services, producing automation solutions for time-intensive processes, and upholding security and compliance objectives in all aspects of the secure cloud ecosystem.

  • Supports automation initiatives for cloud infrastructure provisioning and configuration management, self-service, infrastructure-as-code services, auto-scaling initiatives, and DevOps deployments
  • Supports the architecture and implementation of infrastructure-as-code and policy-as-code objectives under supervision
  • Acts as a technical resource to other colleagues/engineers and provides mentorship.
  • Supports the maintenance of platform components and tools according to the DevOps model – monitoring availability, latency and overall system health, and troubleshooting and resolving issues promptly under supervision
  • Works with quality assurance and technical writers to complete development cycles
  • Supports the building, testing, and integration of complex interfaces between different systems, working with the team on complex integrations
  • Provides guidance, monitoring and observability, improving developer experience and enabling teams to become autonomous.
  • Troubleshoots customer environments and assists with escalations
  • Supports efforts to remediate security gaps to help strengthen security posture
Company
Infoplus Technologies UK Limited
Location
City of London, Greater London, UK
Posted