Site Reliability Engineer

Gloat

Software Engineering
Israel
Posted on Thursday, May 30, 2024

  • R&D
  • Israel
  • Senior

Description

About the company:

Gloat puts people and companies in motion. Our Agile Workforce Operating System is helping the world's most renowned enterprises become dynamic organizations, future-fit for any eventuality, and poised for continuous growth and innovation in today's ever-changing economic climate.

We deliver AI-powered intelligence, infrastructure, and applications that enable organizations to effectively tackle change with agility, unlock capacity and productivity, and reduce workforce risk. Today we support industry leaders around the world including HSBC, Spotify, Nestle, Standard Chartered Bank, Schneider Electric, and many more.

Life at Gloat:

Gloat is a revolutionary startup with a global workforce. We have offices in Tel Aviv, New York City and London and work with customers around the globe. We value collaboration, innovative thinking, and curiosity and we’re looking for bright, driven, and passionate people to grow with us. If you care about empowering businesses and people to reach their potential, you’re in for a fun ride.

We’re looking for an experienced, highly motivated Site Reliability Engineer to apply SRE methodologies and technologies to build highly scalable, highly available production environments.

As an SRE at Gloat, you will be part of a growing DevOps group. You will have the freedom to explore and implement the newest technologies. You will be responsible for implementing monitoring and alerting infrastructure and defining the right metrics for a highly available production environment. You will learn new things every day and be constantly challenged. The work is never boring; it is exciting, motivating, and stimulating. Our team has the honor and responsibility of supporting some of the biggest enterprise clients in the world.

Responsibilities

  • Design and implement reliable, highly available and scalable production infrastructure.
  • Seek out new technologies and take them from POC through implementation.
  • Ensure high uptime and reliability of the production environment.
  • Perform root cause analysis for complex failures and offer modern solutions and tools.
  • Analyze performance and stability issues.
  • Work closely with DevOps, R&D, product, and support to define cross-organizational processes.
  • Design, develop, and drive troubleshooting and mitigation tools as part of a self-healing agenda.
  • Educate engineers on how to approach and debug production issues across services and levels of the stack.

Requirements

  • 2+ years of SRE experience
  • Solid knowledge of Kubernetes
  • Proven monitoring and alerting experience (ELK, Grafana, Prometheus, etc.)
  • Experience implementing services on a major cloud platform (AWS, Azure, GCP, etc.)
  • Experience with a programming language (Python, Java, Go, Ruby, etc.)
  • Scripting and automation skills (Bash, Python, etc.)
  • Networking skills
  • Experience with IaC tools such as Terraform

Advantages:

  • Experience in multi-cloud environments
  • Microservice architecture implementation experience
  • Experience with SaaS production infrastructure

At Gloat, we believe that building the most important company in the history of human capital begins with having a diverse and inclusive workforce ourselves. This means that we look for individuals who can bring unique strengths, perspectives, skills, and backgrounds to our existing teams. Gloat is proud to be an Equal Opportunity Employer and does not and will not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.