Observability Job Vacancies

201 to 225 of 451 Observability Jobs

Senior Site Reliability Engineer

Addison, Texas, United States
INSPYR Solutions
unwarranted access to corporate data. Review outstanding issues daily to ensure that troubleshooting and resolutions are current. Collaborate cross-functionally with application engineering, QA, and infrastructure teams to ensure observability and reliability. Perform tool evaluation and selection in support of observability and automation. Qualifications: Education Level: Bachelor's Degree. Preferred experience includes AWS or Azure certifications. 7+ years of total … or closely related roles. At least 3 years of direct experience with AWS and/or Azure, including infrastructure provisioning, automation, and monitoring. Experience implementing, managing, and using observability tools, data visualization, and application monitoring platforms such as Dynatrace, AWS CloudWatch, Azure Monitor, Grafana, Prometheus, or Datadog. Familiarity with error budgets and their role in balancing reliability and innovation.
Employment Type: Permanent
Salary: USD 150,000 Annual

Senior DevOps Platform Engineer

London, United Kingdom
CDW LLC
including Salesforce-specific pipelines. Build and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, Datadog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. … Infrastructure as Code with Terraform and Ansible, along with scripting in PowerShell, Python, or Bash. Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards. Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, Datadog, or Dynatrace. Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments. Exposure to AI/
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer with Security Clearance

Washington, Washington DC, United States
Hybrid / WFH Options
ClearanceJobs
and cloud security best practices.
• Proficiency in Kubernetes, Docker, and container orchestration.
• Knowledge of Linux system administration and scripting (Python, Bash).
• Experience with logging, monitoring, and observability tools in a cloud-native environment.
• Strong troubleshooting, problem-solving, and automation mindset.
Responsibilities/Impact as an SRE:
• AWS GovCloud Operations: Manage and optimize cloud-based infrastructure in AWS GovCloud, ensuring … FedRAMP compliance and high availability.
• Reliability & Performance: Monitor and enhance system performance, scalability, and reliability through observability tools, automation, and best practices.
• Security & Compliance: Implement and maintain security controls aligned with FedRAMP, NIST 800-53, and other federal cybersecurity standards.
• Infrastructure as Code (IaC): Develop and manage infrastructure automation using Terraform and Ansible.
• CI/CD & Automation: Enhance DevSecOps pipelines
Employment Type: Permanent
Salary: USD 260,000 Annual

Senior Developer

Warrington, Cheshire, United Kingdom
Hybrid / WFH Options
ECS Resource Group Ltd
transition from project-based delivery to product-focused development. Embed disciplined code promotion processes and improve CI/CD practices. Drive improvements in code quality and maintainability. Enhance application observability, including logging and monitoring. Provide technical guidance and advocate for development best practices. Essential Skills: Strong knowledge of JavaScript & TypeScript. Experience with Next.js and Node.js. Familiarity with Git
Employment Type: Contract
Rate: £495 - £500/day inside IR35

AI/ML Engineer

London, United Kingdom
Hiring Group
secure handling of sensitive operational data and compliance with relevant standards. Developed and maintained robust APIs for system integration. Drove operational excellence and continuous improvement. Implemented and managed monitoring, observability, and troubleshooting tools for deployed systems. Designed and handled containerised applications (e.g., Docker, Kubernetes). Qualifications: Bachelor's degree in Computer Science, Engineering, or a related technical field. Relevant experience as
Employment Type: Permanent
Salary: £50,000 - £100,000 per annum

Founding Full Stack Engineer

Nationwide, United Kingdom
Hybrid / WFH Options
W Talent
PostgreSQL. Architect scalable document processing pipelines for large datasets. Build AI-native user experiences and intelligent agent workflows using state-of-the-art LLMs. Improve system performance, stability, and observability. Deploy to Azure using infrastructure-as-code (Bicep) and CI/CD via GitHub Actions. Collaborate directly with users to deeply understand workflows and pain points. Influence engineering best practices
Employment Type: Permanent
Salary: £80,000 - £130,000 per annum

Data Platform Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Rise Technical Recruitment Limited
data is delivered on time and without failure. The ideal candidate will have strong experience working with streaming and batch data systems, a solid understanding of monitoring and observability, and hands-on experience working with AWS, Apache Flink, Kafka, and Python. This is a fantastic opportunity to step into an SRE role focused on data reliability in a modern
Employment Type: Permanent, Work From Home
Salary: £90,000

Graduate Platform Engineer

London, United Kingdom
BAE Systems (New)
technologies: logical reasoning, scripting ability, and security concepts (light); Infrastructure as Code (Terraform); AWS infrastructure (VPC, EC2, IAM); Linux tooling and system administration; CI/CD pipelines from an infrastructure perspective; observability, logging, and monitoring; GitOps and container orchestration (K8s). Benefits: As well as a competitive pension scheme, BAE Systems also offers employee share plans, an extensive range of flexible discounted health, wellbeing & lifestyle
Employment Type: Permanent
Salary: GBP Annual

Snowflake Centre of Excellence Lead

London, United Kingdom
Hybrid / WFH Options
Kubrick
colleagues and clients across the Snowflake ecosystem. Experience in designing and delivering business solutions on other modern data platforms (e.g. Databricks, Azure, AWS or GCP native stacks). Experience with platform observability and CI/CD for data platforms. Hands-on experience with modern data engineering tools such as dbt, Fivetran, Matillion or Airflow. History of supporting pre-sales activities in a product or
Employment Type: Permanent
Salary: GBP Annual

Software Engineering Manager: Java, React/React Native & AWS

London, United Kingdom
Hybrid / WFH Options
IG KnowHow
a high-performing engineering team, splitting time between coding and people management. Drive delivery of new crypto product features end-to-end, from design to production. Ensure code quality, observability, scalability, and security are embedded in every release. Foster a collaborative, growth-focused team culture with clear goals and high accountability. Coordinate closely with Product, Design, and cross-functional teams
Employment Type: Permanent
Salary: GBP Annual
Posted:

Senior Linux Systems Engineer (Kernel)

New York, United States
Bloomberg
back to the open-source community; it is a rewarding experience you can explore with us. We'll expect you to: build and evolve eBPF-based tools to enhance observability of the network and other operating system layers; improve Bloomberg's internal Linux kernel regression testing framework; contribute to upstream Linux kernel development and enhancement requests; investigate and resolve complex
Employment Type: Permanent
Salary: USD Annual

Staff Software Engineer

London, United Kingdom
Optimizely
will: Design and evolve the architecture of highly scalable, reliable, and secure distributed systems. Drive technical excellence across the engineering organization by setting standards for code quality, system design, observability, and operational best practices. Collaborate closely with Product, UX, and Application Engineering teams to deliver impactful features while ensuring architectural soundness and scalability. Mentor and guide senior and mid-level
Employment Type: Permanent
Salary: GBP Annual

Senior Arista Network Engineer with Security Clearance

San Diego, California, United States
ClearanceJobs
will be used to enhance network resiliency, automation, and operational agility. Key Technologies:
• Arista Cloud Networking with Spine & Leaf Architecture
• Extensible Operating System (EOS)
• CloudVision for centralized management and observability
• NetDL for unified telemetry
• AVA (AI/ML threat detection) for proactive security and operations
• Zero Trust security architecture for robust access control and network segmentation
• Integration with NXP SD
Employment Type: Permanent
Salary: USD 200,000 Annual

Senior Staff, Software Engineer

Ireland
Hybrid / WFH Options
Fanatics Inc
require both strategic foresight and technical precision. Set engineering standards by developing modular, performant, and maintainable code that leads by example. Own the full product lifecycle, including design, deployment, observability, and long-term maintenance, ensuring platform reliability at scale. Collaborate cross-functionally with Product, Quant, and Engineering leadership to align technical execution with business goals. Apply advanced software design methodologies
Employment Type: Permanent
Salary: EUR 125,000 - 150,000 Annual

UK Backend Engineer

United Kingdom
Hybrid / WFH Options
Hagerty Insurance Agency
apply technologies such as: Languages: C#; Interservice Communication: REST APIs, message queues; Hosting & Infrastructure: AWS, containers, Terraform, CI/CD (Azure DevOps); Security: OAuth 2, encryption, secure design patterns; Observability: logging, metrics, tracing, alerting. Able to clearly communicate your thoughts and actively incorporate feedback from others. Clear and thoughtful communicator who values shared success, ongoing feedback, and continuous learning. Other
Employment Type: Permanent
Salary: GBP Annual

Solution Architect

London, United Kingdom
Komodor
AI SRE assistant. Kubernetes promises agility, elasticity, reliability and high availability, but it also introduces complexity, high operational overhead, and cost overruns due to overprovisioning of workloads. Traditional observability only surfaces the "what"; Komodor goes further by delivering the "why", the "where", and the "how", providing a full platform to detect, investigate and remediate issues while optimizing workloads. By combining our
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer

London, United Kingdom
Hybrid / WFH Options
Ebury
Contribute to the design and implementation of new systems and services, meeting reliability and scalability standards. Develop and maintain infrastructure and application monitoring, incident management, and troubleshooting procedures. Utilize observability tools to gain insights into system performance and health, guiding improvement decisions. Design and implement automation tools and processes to boost efficiency and minimize downtime. Participate in on-call rotation
Employment Type: Permanent
Salary: GBP Annual

AWS DevOps Engineer

City of London, London, England, United Kingdom
Revybe IT Recruitment Ltd
and help shape how platform engineering is done as the team continues to scale. Tech stack: AWS (core services: EC2, RDS, S3, IAM, etc.); configuration management: Ansible; monitoring and observability: Grafana, Prometheus; Kubernetes (building and managing production clusters); Terraform (IaC provisioning); Python or Java (scripting, automation); GitHub Actions (CI/CD pipelines). What They’re Looking For: Experience in AWS … cloud infrastructure (ideally in a regulated or high-traffic environment). Previous experience working with monitoring and observability tools. Hands-on Kubernetes know-how, specifically with EKS. Solid IaC experience with Terraform. Experience with containerisation (Docker, Helm) and CI/CD (GitHub Actions or similar). Solid scripting/automation experience with Python or Java. A good communicator who enjoys working collaboratively
Employment Type: Full-Time
Salary: £55,000 - £75,000 per annum

Site Reliability Engineer Manager

Manchester, Lancashire, England, United Kingdom
FDM Group
contributor to the stability, performance, and scalability of services, supporting the organisation's digital transformation and long-term technology vision. You’ll work actively with container platforms, VMware infrastructure, and observability tooling, ensuring their services are resilient and efficient. You’ll also lead and participate in post-mortems, drive automation, and continuously improve the platform through engineering-led solutions. This role … of platform technologies, including VMware infrastructure, container platforms and orchestration (e.g., Kubernetes, OpenShift), databases, and applications. Manage environments and support CI/CD pipelines using Infrastructure as Code. Improve observability using tools such as Dynatrace, ensuring proactive monitoring and alerting. Lead and contribute to post-mortems to identify and implement long-term fixes aligned with the organisation's long-term objectives. Troubleshoot … Code and CI/CD. Experience with container platforms and orchestration such as Docker, Kubernetes and OpenShift. Hands-on experience with VMware technologies in a production environment. Familiarity with observability platforms such as Dynatrace, and experience with either Linux or Windows operating systems. Proven ability to troubleshoot across a broad range of platform technologies. A mindset focused on continuous improvement
Employment Type: Contractor
Rate: £50,000 - £70,000 per annum

Data Architect

United Kingdom
Hybrid / WFH Options
WebLife Labs
Operations and DevOps: Implement Infrastructure as Code using tools like Terraform and AWS CloudFormation to automate provisioning and scaling of data platforms. Enhance platform reliability and performance by applying observability practices including monitoring, logging, and alerting with appropriate tooling. Design and manage CI/CD pipelines for data applications, ensuring automated testing, version control, smooth deployments, and rollback strategies. Define … with standards including GDPR, SOC 2, and ISO frameworks. Implement data partitioning strategies, tenant isolation protocols, and cost-efficient scaling mechanisms for multi-tenant environments. Design and support SaaS observability practices covering SLA monitoring, usage metering, and compliance adherence. Collaboration and Leadership: Collaborate with Data Analysts, Data Scientists, AI Engineers, and Business Stakeholders to translate product requirements into scalable cloud … practices for data applications, including Git-based workflows and containerization/orchestration using Docker, ECS, GKE, or Kubernetes. Platform and Architecture Skills: Solid understanding of platform engineering concepts including observability, monitoring tools (CloudWatch, Prometheus, Grafana), and automated scaling. Proven experience designing and supporting multi-tenant SaaS data platforms with strategies for data partitioning, tenant isolation, and cost management. Exposure to
Employment Type: Permanent
Salary: GBP Annual

Manager II, Software Engineering

Dublin, Ireland
Kaseya Limited
data science teams to deliver AI-enhanced features and intelligent automation. Guide the integration of AI/ML into both engineering workflows and customer-facing capabilities. Establish and evolve observability practices including structured logging, distributed tracing, and real-time alerting. Promote a culture of automation across testing, deployment, infrastructure, and compliance. Partner with QA and DevOps to implement shift-left … CI/CD pipelines, and infrastructure as code (IaC). Demonstrated experience with AI/ML technologies and their practical application in product development or engineering efficiency. Familiarity with observability stacks and SRE practices. Proficiency in TDD, BDD, and integrating quality gates into the development lifecycle. Extensive experience with multi-tenant SaaS architectures and managing performance at scale. Experience with … on multiple concurrent initiatives. Ability to balance technical depth with strategic thinking and business alignment. Tools: Development & Deployment: GitHub, Docker, Kubernetes; AI/ML: Azure AI, OpenAI, and similar; Observability: Dynatrace, New Relic, Grafana, or similar; QA & Testing: Selenium, Playwright, Postman, Cucumber, or similar; Automation & IaC: Terraform, Ansible, Bicep, or similar; Incident Management: PagerDuty, Opsgenie, or similar; Security & Compliance: Snyk
Employment Type: Permanent
Salary: EUR 150,000 - 200,000 Annual

Need Lead NodeJS & API

Dallas, Texas, United States
AETG Services PVT LTD
retry logic, circuit breakers, and rate-limiting to ensure the APIs can withstand transient failures. Use techniques such as load balancing, failover mechanisms, and distributed architectures to improve fault tolerance. Monitoring & Observability: Set up and maintain real-time monitoring and alerting using tools like Prometheus, Grafana, ELK stack, Datadog, or New Relic. Ensure comprehensive logging, tracing, and metrics collection (e.g., through OpenTelemetry, Jaeger, or … role. Strong expertise in designing and building RESTful APIs using Node.js. Experience in building highly available, fault-tolerant systems that can handle production-level traffic. Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog, New Relic). Experience with resilience patterns such as circuit breakers, retry logic, and rate limiting. Deep understanding of API security best practices (OAuth2, JWT, API
Employment Type: Any
Salary: USD Annual

Senior Software Engineer - Public Cloud Managed Compute

New York, United States
Bloomberg
and CTO Ref # Description & Requirements Our Team: The Public Cloud Engineering organization provides a suite of services to facilitate Bloomberg's usage of public cloud. From security to observability, networking, access management, and compute, our organization provides the foundational building blocks on which Bloomberg's solutions on public cloud are built. Within this organization, our team provides … Kubernetes clusters, for deploying containerized workloads; utilities to deploy and manage virtual machines and Kubernetes clusters on public cloud; integrations with other aspects of the public cloud lifecycle, such as observability, security, and access management. What's in it for you: You will be part of a team that is building the foundation to support a multi-cloud environment for public
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineer SRE with Security Clearance

Hampton, Virginia, United States
ALTA IT Services
optimize and maintain Kubernetes environments and CI/CD pipelines. Develop and refine automation scripts to enhance system reliability, including automated recovery and self-healing capabilities. Build and maintain observability frameworks, integrating metrics, logging, and tracing tools for proactive issue identification. Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience. A minimum of … stack, or Datadog). Experience with Elastic will be highly beneficial in this position. Hands-on experience with incident response, including designing and improving incident management processes. Expertise in observability practices, including metrics, logs, traces, and understanding of distributed tracing tools (e.g., OpenTelemetry). Strong problem-solving skills with a focus on building resilient, fault-tolerant systems. Excellent communication skills
Employment Type: Permanent
Salary: USD Annual

Squad Engineering Manager

London, South East, England, United Kingdom
Harvey Nash
the future team through recruitment and onboarding. Required Skills: We're primarily using AWS, utilising Lambda, ECS, SQS, and API Gateway, among others. Our database engine is MongoDB and our observability platform is Datadog. Our application is written in TypeScript/Node.js and our infrastructure is defined in Terraform. Experience working with JavaScript/TypeScript, but also open to other languages
Employment Type: Full-Time
Salary: £115,000 per annum

Observability salary percentiles
10th percentile: £57,500
25th percentile: £67,500
Median: £80,000
75th percentile: £100,000
90th percentile: £130,000