Strong skills in scripting languages (e.g., Python, Bash) to automate repetitive tasks and knowledge of configuration management tools (e.g., Ansible, Puppet, Chef). Expertise in setting up and maintaining monitoring systems (e.g., Prometheus, Grafana). Some other highly valued skills may include: Experience with cloud platforms (e.g., AWS, Azure, Google Cloud). Knowledge of containerization and orchestration tools (e.g. … practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts … to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience. Monitoring and optimisation of system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning. Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure …
services. SQL Optimization. Strong skills in scripting languages (e.g., Python, Bash) to automate repetitive tasks and knowledge of configuration management tools (e.g., Ansible, Puppet, Chef). Expertise in setting up and maintaining monitoring systems (e.g., Prometheus, Grafana). Some other highly valued skills may include: Experience with cloud platforms (e.g., AWS, Azure, Google Cloud). Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes … best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts to automate …
practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts … to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience. Monitoring and optimisation of system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning. Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure …
In-memory caching technologies: Redis, GridGain, Apache Ignite. Programming languages: Java, Python, Go Lang. Container orchestration/Cloud platform: RedHat Openshift/AWS/Azure. DevOps tools: Ansible, Chef, Kubernetes, GitLab. SRE logging & monitoring tools: ELK stack, Grafana, Prometheus, Open Telemetry. Other highly valued skills include: Strong understanding of Agile application development methodology. Strong knowledge of API development/principles. Collaborating with the development teams to … best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts to automate …
best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts to automate …