LLM Ops & Evaluations
sintra.ai
Location: Vilnius
Employment Type: Full time
Location Type: Hybrid
Department: Engineering
Quick summary: We're looking for someone to own the quality of our AI employees - prompts, evaluations, datasets, and model selection. You'll build the systems that determine whether our AI helpers actually help. This is a foundational role: you'll design our evaluation framework, be our primary technical contact with LLM providers, and be responsible for improving LLM outputs across the app.
This role is for someone who's shipped LLM-powered products at scale, thinks in systems, and wants to define how AI quality works at a company scaling fast.
Why join Sintra?
We build AI employees for small businesses. Real helpers with personalities, not faceless chatbots. They handle the work that keeps owners up at night - answering customer emails, posting on social media, analyzing sales data. For business owners who've always worked alone, we're giving them their first team.
50,000+ businesses use Sintra because for the first time someone made AI actually useful for them. While Silicon Valley builds for tech companies, we're building for the florist who needs help with Instagram, the contractor drowning in invoices, the restaurant owner who can't keep up with reviews.
The timing matters. LLMs just got good enough to actually do the work - not just talk about it. We're at the moment where this becomes real infrastructure for millions of businesses.
We raised $17M in seed funding. Team of 50, based in Vilnius, shipping daily. We move fast, take ownership of what we build, and live by one principle - work is play.
Who we're looking for
2+ years of hands-on experience with LLMs in production
Has built evaluation systems, not just written prompts
Strong technical skill set - you'll automate evals and work closely with engineering
Systems thinker who can handle hundreds of use cases, each multiplied by per-user customization
Clear communicator: you'll be our point person with the technical teams at leading AI labs
What you'll do
Own end-to-end quality of all AI outputs across the app
Design and build our evaluation framework - automated tests, human review loops, quality scoring
Create, version, and optimize prompts for every use case
Build and maintain test datasets that catch regressions before users do
Hire and lead a team of prompt engineers and eval specialists as we scale
Our hiring process
Fill in the application form. If we see a fit, we'll reach out for an intro call.
Complete a take-home task that mirrors real work you'd do here. Be prepared to explain what you did and why.
Join us for a tech call. Meet the team, see if we're right for each other.
Get an offer if it's a mutual fit.
We understand that good people have options, which is why we move super fast. Life's too short for drawn-out hiring processes.
What we offer
Compensation & Equity: Top-of-market salary in Vilnius plus meaningful equity, so that you own a part of what you build. Salary range for this role: €5,000-8,000/month, depending on expertise and experience.
Seamless Relocation: Relocation bonus and support to make your move to Vilnius smooth.
By submitting your application you confirm that you have read and understood our Privacy Notice for Job Candidates.

