Site Reliability Engineering Manager - 09302021

Site Reliability Team · Toronto, Ontario
Department: Site Reliability Team
Employment Type: Full-Time
Minimum Experience: Manager/Supervisor

Job Summary: 

As Site Reliability Engineering Manager at Acerta, you participate in all stages of the product's development and operation, with a focus on the CI/CD portions of the software development and deployment workflow. You also participate in the full software life cycle of new tools, frameworks, systems, and components that maintain our technical infrastructure through automation. 

Working closely with the Product, Software Development, Marketing, and Customer Success teams, the SRE leader contributes to everything from the architecture and design of services and components to user experience and system performance. You are responsible for evaluating, and ensuring adherence to, the quality, consistency, and robustness of the code that you and your peers develop. This includes evaluating code against the quality guidelines set out in our code style guides and recommendations. You also automate build and test processes and deploy based on quality standards and test/build results. You are responsible for ensuring that we follow industry standards for cloud availability, performance, observability, scalability, quality, and cost effectiveness. You are also responsible for ensuring that we adhere to and comply with Acerta’s information technology controls and policies. 


Key Responsibilities: 

  • Prepare for and participate in roadmap planning, sprint planning and reviews, retrospectives, and daily stand-ups
  • Evangelize and defend the product vision with the development team on a day-to-day basis to ensure customer satisfaction and acceptance
  • Assist Product Management and Software Engineering in driving the Agile software development process and helping the team deliver software that meets business requirements
  • Work with the CTO and the wider team to define the appropriate product architecture on Azure and AWS based on the current system and future business goals
  • Build and maintain highly available development, test, and production environments
  • Design, build, configure, and execute continuous integration and deployment pipelines
  • Create and enhance application and infrastructure monitoring and observability
  • Develop and implement cost optimization strategies
  • Lead the Azure/AWS public cloud build (connectivity, networking, security, containerization, monitoring) and assist in its architecture, design, and implementation
  • Own the infrastructure-as-code initiative and deploy repeatable software runtime environments
  • Manage storage, compute efficiency, and optimization activities, including evaluating compute sizing, storage solutions, and other services (network services, automation, and load balancing)
  • Assist with application integration and troubleshooting across this infrastructure for a complex application environment, including managing dependencies on services, platforms, and other applications within the cloud infrastructure
  • Improve DevOps process automation and tooling to implement standards and boundaries in a way that empowers our application development teams to self-serve their infrastructure and deployment needs
  • Manage the microservices-centric deployment and operation of the environments
  • Write scripts and automation using Python/JavaScript/Java/Bash
  • Configure and manage systems such as MySQL, PostgreSQL, MongoDB, Kafka, and Kubernetes
  • Provide production support

Experience:

  • 5–8 years of IT work experience, with a strong understanding of SRE/DevOps and service management principles 


Key Competencies: 

  • Ability to work in a fast-paced, action-oriented Agile environment 
  • Production experience, with deep knowledge of Azure cloud services 
  • Experienced in deployment and developer workflow using Docker and Kubernetes 
  • Deployment, logging, monitoring, security and automatic failover experience with container orchestration platforms on Azure or AWS 
  • In-depth knowledge of security best-practices, policy, access management and cryptography 
  • Experience with microservices architecture and service mesh 
  • Hands-on expertise with configuration management and infrastructure deployment tools such as Terraform 
  • Collaborative 
  • Customer-focused 
  • Detail-oriented, with excellent analytical skills 
  • Experience with machine learning software products, especially for time-series and operations data 
  • Flexibility to adjust to changing priorities, requirements, and schedules 
  • Open-minded and pragmatic 
  • Resilient 
  • Resource planning 
  • Strong oral and written communication 
  • Team player 

 

Technical Leadership: 

  • Lead, coach, and mentor team members and junior engineers, including task management and technical planning
  • Work with clients and manage long-term, ongoing relationships with key customers
  • Work with cross-functional and geographically distributed teams 
  • Define and implement performance improvement strategies 
  • Train and mentor development teams in leading technologies 
