Proven track record of successfully managing complex technical products, preferably in software development, IT operations, or cloud infrastructure environments. Experience working with modern telemetry tools (e.g., Prometheus, Grafana, Datadog, New Relic, etc.) and automation platforms (e.g., Ansible, Terraform, Jenkins, etc.). Strong understanding of Agile and Scrum methodologies. Strong analytical, problem-solving, and communication skills. Ability to translate … you to bring your true self to work so you can help enrich our diverse workforce. You will be part of a collaborative and creative culture where we encourage new ideas and are committed to sustainability across our global business. You will experience the critical role we have in helping to re-engineer the financial ecosystem to support and
tools; JIRA, Confluence; * Experience in monitoring/reporting tools such as Splunk, Grafana/Prometheus, etc. * Experience in Agile practices * Working knowledge of environment monitoring tools such as GCO, New Relic, Prometheus, Grafana. * Collaboration Skills: Proactive can-do attitude; A creative approach towards solving technical problems; Able to work efficiently with colleagues in multiple locations; Willing to collaborate across domains, for
different platforms and types of software, from software engineers and DevOps all the way to the C-suite. Responsibilities Solve customers' technical problems by adopting the platform and integrating new data and existing integrations Understand customers' technical requirements and business goals to consistently create new artifacts and deliver value Lead the onboarding process, from new integrations, creation … with Azure and GCP Background knowledge and hands-on practice in Observability, specifically experience working with one or more of the following tools: Kibana, OpenSearch, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFx, Instana, Splunk, Honeycomb, Jaeger Hands-on experience with Infrastructure as Code (Terraform/Ansible) Hands-on experience in technical integrations (OpenTelemetry/fluentd/
Other duties as needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.) Extensive experience with cloud
languages such as Python, Bash, TypeScript, and PowerShell. Familiarity with DevOps & Site Reliability Engineering (SRE) principles, practices, and tools. Hands-on experience with monitoring and logging solutions (e.g., New Relic, Coralogix, AWS CloudWatch, Azure Monitor). Strong problem-solving, stakeholder management, written, and verbal communication skills. Proven experience managing multiple projects and topics simultaneously. Work … Elsevier is a global information analytics business that helps institutions and professionals advance healthcare, open science and improve performance for the benefit of humanity. We help researchers make new discoveries, collaborate with their colleagues, and give them the knowledge they need to find funding. We help governments and universities evaluate and improve their research strategies. We help doctors
Site Reliability Engineer roles 3+ years' experience with an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.) Extensive experience with cloud
on large-scale production systems, delivering highly impactful products that make a difference to our millions of users. As an MLOps Engineering Manager at Trainline you will Build a new team of MLOps Engineers working alongside ML Engineers, Data Engineers, Software Engineers, Data Scientists and Product Managers Define MLOps processes and steer tooling and infrastructure choices across the technology … like MLflow and Airflow, or on common problems such as model and API monitoring, data drift and validation, autoscaling, access permissions Have previously worked with monitoring tools such as New Relic or Grafana Understand the use of feature stores and related data technologies for operational machine learning products Are proficient with Python and have Spark knowledge. Have leadership …/DS libraries (scikit-learn, NumPy, pandas, LightGBM, LangChain/LangGraph, TensorFlow, etc.) PySpark AWS cloud infrastructure: EMR, ECS, ECR, Athena, etc. MLOps: Terraform, Docker, Spacelift, Airflow, MLflow Monitoring: New Relic CI/CD: Jenkins, GitHub Actions More information: Enjoy fantastic perks like private healthcare & dental insurance, a generous work from abroad policy, 2-for-1 share purchase
Java, Swift, and Xamarin for our mobile apps. Responsibilities Be a core member of the team that designs, architects, develops, reviews and tests our key applications Design new application features and integrations in collaboration with team members to deliver complex changes Design and implement scalable and resilient cloud solutions with security and disaster recovery in mind Help … or equivalent): EC2, S3, CloudFront, Elastic Beanstalk, DynamoDB Basic networking knowledge and troubleshooting Experience with any of the following tools and technologies: Atlassian Jira, GitHub, Azure DevOps, Aha!, New Relic, Sumo Logic
should demonstrate that you have: The ambition for creating and nurturing a culture of DevOps across a Technology organisation; working with existing infrastructure and Software Engineering teams to define new practices and evolve ways of working. Experience with the various forms of Cloud infrastructure, hosting and services e.g. IaaS, PaaS, Serverless computing and all-in-one cloud-based solutions. … knowledge of containerisation and orchestration tools such as Docker and Kubernetes, ideally running on Azure (AKS). Experience integrating and configuring various logging, monitoring, and alerting tools (e.g. Splunk, New Relic) that provide operational insight into the health of live applications and systems. Experience in applying a range of cloud security tools and techniques (e.g. threat modelling, vulnerability
Proven expertise and experience with database technologies, including NoSQL databases like MongoDB and RDBMSs such as Postgres and MySQL Exposure to Docker, Kubernetes, AWS, Helm, Terraform, Vault, Grafana, ELK Stack, New Relic Relevant experience in the maintenance of data APIs and data lake architectures, including experience with Apache Iceberg, Trino/Presto, ClickHouse, Snowflake, BigQuery. Master's degree in
and container orchestration. Support multi-tenancy and environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience by enabling actionable monitoring and alerting. Drive cloud cost visibility … in building and operating developer platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire, and
new monitoring queries to drive our alerting, or coordinating across multiple teams to manage the response to an incident. Our technology stack: AWS (including ECS and RDS), OpenTelemetry, New Relic, Python, Postgres, Liquibase, Angular, Docker Who you are: Four or more years of professional experience in a customer-facing technical support or engineering role Excellent verbal and written communication skills, with … internal and client-facing platforms Coordinating any response required to issues with the platform, taking ultimate responsibility for seeing incidents through to resolution Assisting with the technical onboarding of new B2B clients, helping them get up and running with our API Working to improve our ability to effectively support our platform, including improving our monitoring and alerting capabilities Innovating
we expect you to be a key contributor in promoting this mindset. What can you expect from working with us? Contribute to our technological direction - We have lots of new systems to design and build along with existing platforms to maintain and operate, so there are plenty of opportunities for you to get involved. We need your help to … push the boundaries of quality, increase our test coverage, automation and quality awareness, and try new things. Agile, cross-functional working - We work in autonomous teams consisting of Product Owner, UI/UX Designers, QA, and Front and Back End Engineers. Depending on the undertaking, we also embed or collaborate with others from across the business such as Infrastructure … Driven Development (BDD) to define and capture acceptance criteria Excellent analytical thinking and problem-solving skills Great communication and test coordination skills Debugging and analysis of issues (we use New Relic and AWS CloudWatch) Visual difference testing (we use Percy) API testing (we use Postman) Front end testing using JavaScript (we use Cypress) Cross-browser/device testing (we use BrowserStack
areas include: Video: Continuing the Mimir rollout and addressing an extensive list of feature requests. Print: Tackling a significant challenge: simplifying our print publishing processes and technology by leveraging new automation tools. Your Role We're looking for a passionate Mid-Level Software/DevOps Engineer to join our team with the prospect of leading the development, management, and … tools across MySQL/MariaDB along with SQL queries and procedures. The role will include utilisation of AWS Cloud technologies and infrastructure, including monitoring tools such as AWS CloudWatch, New Relic and Zabbix. It will encompass AWS services and technologies particularly related to hosting and scaling applications, such as EC2, S3, Lambda, and IAM, and knowledge of concepts … as required. Systems Maintenance and Optimisation: Develop, configure and/or maintain various systems, in-house integrations and applications within the print ecosystem to facilitate the publishing workflow (inc. new features and workflows, as required). Deployment of configuration and software changes to all environments (inc. documentation of releases for users and technical resources). Perform capacity planning, and
is a plus Knowledge of Redis and log queries is a plus Experience in automation/AI would be an advantage Experience administering multiple monitoring systems such as Datadog, New Relic, Kubernetes, Grafana and Elastic Cloud Experience with Cloud Computing, AWS, Microservices Architecture, Unix and Linux Systems Life @ Empowered to think big. Try new opportunities while working with a talented … ambitious and supportive team. Transformational and proactive working environment. Elevate employees to find thoughtful and innovative solutions. Growth from within. We help you develop new skill sets that shape your personal and professional growth. Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one
in implementing changes while following ITIL change management processes. Understanding of basic security principles and best practices for securing infrastructure. Optional but advantageous technical skills: Proficient in using observability tools (New Relic and ThousandEyes), BI platform and data visualisation tools (such as Tableau and Power BI) and technology tools (Jira, Confluence). System Administration: Proficiency in Linux/Unix and Windows
+ Video: Continuing the Mimir rollout and addressing an extensive list of feature requests. + Print: Tackling a significant challenge: simplifying our print publishing processes and technology by leveraging new automation tools. Your Role We're looking for a passionate Mid-Level Software/DevOps Engineer to join our team with the prospect of leading the development, management, and optimisation of our … tools across MySQL/MariaDB along with SQL queries and procedures. The role will include utilisation of AWS Cloud technologies and infrastructure, including monitoring tools such as AWS CloudWatch, New Relic and Zabbix. It will encompass AWS services and technologies particularly related to hosting and scaling applications, such as EC2, S3, Lambda, and IAM, and knowledge of concepts … required. Systems Maintenance and Optimisation: + Develop, configure and/or maintain various systems, in-house integrations and applications within the print ecosystem to facilitate the publishing workflow (inc. new features and workflows, as required). + Deployment of configuration and software changes to all environments (inc. documentation of releases for users and technical resources). + Perform capacity
City Of Westminster, London, United Kingdom Hybrid / WFH Options
Track24 Limited
maintain security best practices. Containerisation & Orchestration: Deploy and manage containerised applications using Docker and other orchestration tools. Observability & Monitoring: Provision and maintain observability platforms such as DataDog, Splunk, or New Relic to gain monitoring and performance insights. Incident Management: Establish and oversee monitoring and incident management processes to ensure system reliability. Site Reliability Engineering (SRE): Perform SRE duties
Are you a passionate Software Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If you thrive in fast … Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for
Carlo As businesses increasingly rely on data + AI for competitive advantage, reliability has become a non-negotiable. Named a CB Insights AI100 company and described by Forbes as the "New Relic for data", Monte Carlo supports some of the world's most prestigious companies, including Fox, Roche, Honeywell, and Credit Karma, to deliver trustworthy data + AI at scale.