LLM Ops & Evaluation
sintra.ai
Quick summary: We're looking for someone to own the quality of our AI employees - prompts, evaluations, datasets, and model selection. You'll build the systems that determine whether our AI helpers actually help. This is a foundational role: you'll design our evaluation framework, be our primary technical contact with LLM providers, and be responsible for improving LLM outputs across the app.
This role is for someone who has shipped LLM-powered products at scale, thinks in systems, and wants to define how AI quality works at a fast-scaling company.
Why join Sintra?
We build AI employees for small businesses. Real helpers with personalities, not faceless chatbots. They handle the work that keeps owners up at night - answering customer emails, posting on social media, analyzing sales data. For business owners who've always worked alone, we're giving them their first team.
50,000+ businesses use Sintra because, for the first time, someone made AI actually useful for them. While Silicon Valley builds for tech companies, we're building for the florist who needs help with Instagram, the contractor drowning in invoices, the restaurant owner who can't keep up with reviews.
The timing matters. LLMs just got good enough to actually do the work - not just talk about it. We're at the moment where this becomes real infrastructure for millions of businesses.
We raised $17M in seed funding. Team of 50, based in Vilnius, shipping daily. We move fast, take ownership of what we build, and live by one principle - work is play.
Who we're looking for
- 2+ years of hands-on experience with LLMs in production
- Has built evaluation systems, not just written prompts
- Strong technical skill set - you'll automate evals and work closely with engineering
- Systems thinker who can maintain quality across hundreds of use cases, each multiplied by per-user customization
- Clear communicator - you'll be our point person with the technical teams at leading AI labs
What you'll do
- Own end-to-end quality of all AI outputs across the app
- Design and build our evaluation framework - automated tests, human review loops, quality scoring
- Create, version, and optimize prompts for every use case
- Build and maintain test datasets that catch regressions before users do
- Hire and lead a team of prompt engineers and eval specialists as we scale
Our hiring process
- Fill in the application form. If we see a fit, we'll reach out for an intro call.
- Complete a take-home task that mirrors real work you'd do here. Be prepared to explain what you did and why.
- Join us for a tech call. Meet the team, see if we're right for each other.
- Get an offer if it's a mutual fit.
We understand that good people have options; that's why we move super fast. Life's too short for drawn-out hiring processes.
What we offer
- Compensation & Equity
Top-of-market salary in Vilnius plus meaningful equity, so that you own a part of what you build. Salary range for this role: €5,000-8,000/month depending on expertise and experience.
- Seamless Relocation
Relocation bonus and support to make your move to Vilnius smooth.

