Monitor, maintain, and troubleshoot distributed containerized services using Docker and Docker Swarm. Respond to and resolve incidents, working to minimize downtime and ensure high system availability. Investigate system performance, log anomalies, and service issues, escalating when appropriate. Collaborate with DevOps and software engineering teams to implement improvements and automation. Maintain thorough documentation of system configurations, processes, and known issues. … RDS. Solid understanding of Linux server environments, command-line operations, and scripting. Experience in supporting real-time or mission-critical systems (security, IoT, or similar sectors). Familiarity with log aggregation, monitoring, and alerting tools (e.g., ELK, Prometheus, Grafana). Good understanding of networking, VPNs, load balancing, DNS, and firewalls. Comfortable with Git and CI/CD workflows.
different tech stacks, web or mobile. You've previously worked with monitoring systems for availability, performance, or security, and with stress and performance testing using observability patterns: Distributed Tracing/OpenTracing, Log Aggregation, Audit Logging, Exception Tracking, Health Check API, Application Metrics, Self-Healing/Multi-Cloud. You have an understanding of security concerns, threats and approaches for dealing with …