Posted on: April 12, 2024

Site Reliability Engineer

Job Description

Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better.

We have built the fastest-growing open-source library of pre-trained models in the world. With more than 1 million models and 320K+ stars on GitHub, HF technology is used in production by over 15,000 companies, including leading AI organizations such as Google, Elastic, Salesforce, Grammarly, and NASA.

About the role:

We are looking for a Site Reliability Engineer responsible for maintaining and scaling our product infrastructure. The ideal candidate will have experience maintaining large infrastructure for AI workflows and strong experience supporting teams to create best practices for reliability and scalability.

Responsibilities:

  • Design, develop, deploy, and maintain reliable and scalable infrastructure.
  • Manage large Kubernetes clusters.
  • Measure and optimize system performance.
  • Patch infrastructure to avoid vulnerabilities.
  • Keep important, revenue-critical systems up and running despite outages and configuration errors.
  • Provide primary operational support and engineering for multiple teams.

Qualifications:

  • 7+ years of experience in a Site Reliability Engineer or Infrastructure Engineer role.
  • Strong knowledge of cloud providers such as AWS and GCP, infrastructure-as-code frameworks, and observability tools.
  • Strong communication, collaboration, and documentation skills.
  • Experience with Linux, Git, containers, networking, and command-line tools.
  • Ability to collaborate and communicate asynchronously.

About you:

If you are a passionate Site Reliability Engineer with a keen interest in AI, and you thrive in a challenging and innovative setting, we would love to hear from you. Join our team and contribute to the advancement of AI technologies while working alongside talented professionals in a collaborative and stimulating environment.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported, regardless of who they are or where they come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging itself to grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We offer health, dental, and vision benefits for employees and their dependents. We also offer parental leave and flexible paid time off.

We support our employees wherever they are. While we have office spaces in NYC and Paris, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We want our teammates to be shareholders. All employees have company equity as part of their compensation package. If we succeed in becoming a category-defining platform in machine learning and artificial intelligence, everyone enjoys the upside.

We support the community. We believe major scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.
