The OpsWorks Co. team is seeking an experienced Senior Site Reliability Engineer to join a large, fast-growing trading project originating in Britain. The position requires the ability to work independently and to collaborate with other team members to assess business risks, define requirements, and create and execute a roadmap.
This is a full-time, long-term, remote position.
You will fit perfectly in this role if you have the following qualifications:
Experience with and understanding of programming languages (C#, SQL-92)
Demonstrable experience with and understanding of Python, PowerShell, and Bash
Strong experience of working with analysis and monitoring tools such as Grafana and Splunk
Ability to trace network-level issues and understanding of DNS
Experience with cloud technologies such as AWS or Microsoft Azure (beyond basic server maintenance)
Proficiency in determining the cause of application errors and ability to read and diagnose error logs
Excellent written and oral communication skills
Your responsibilities will include:
Take responsibility for managing and implementing monitoring systems
Analyse the functionality of bespoke software applications and work closely with product, IT operations, and development teams
Establish the root cause of application errors and resolve high-priority issues together with development teams
Be a part of the team responsible for code deployments and updates
Consult with software development teams and internal users/clients to improve platform performance
Document and log application performance issues in the company incident management system for analytics and metrics
Work with the Jira-based helpdesk system to manage issues through their lifecycle
Manage and implement monitoring systems
Debug and diagnose software issues/bugs on the platforms and work with development/product teams to resolve them and funnel them into development