The Context: Porters is building the operating system for autonomous banking, and process accuracy is our moat. We process tens of thousands of documents per month, deriving instructions for banks to act on. We are looking for an AI Systems Engineer to continuously improve our AI accuracy and move ideas from the “lab” into production in our high-volume pipeline. Your goal is 99.99% accuracy on very complex document types.

The Mission: You will lead the systematic R&D required to tune LLMs for high-stakes document types (e.g., insolvency notices, seizure orders). The work ranges from improving our prompting, through rethinking our AI document extraction approach, to fine-tuning models. In short, you are building the engineering framework that allows us to deploy models successfully at scale.

This job is in person (Zurich, Switzerland) or remote (CET +/-3 hours).

Your responsibilities

Own the Evaluation Workbench: Manage the full prompt evaluation lifecycle. You will define test batches, manage model execution, and implement rigorous performance scoring to improve our current production engine.
Develop Robust Metrics: Select and create custom quantitative metrics (Evals) tailored to banking benchmarks. You will move us beyond generic "accuracy" to granular measurements of extraction quality, hallucination rates, and specific field-level precision.
Systematic Prompt Optimization (Closed-Loop): Conduct in-depth analysis of evaluation results to identify gaps. You will iteratively refine system prompts, few-shot examples, and Chain-of-Thought (CoT) logic to maximize performance against our metrics.
Pioneer Automation: Contribute to the design of a closed-loop feedback system. Your long-term goal is to build the architecture that autonomously updates prompts or model parameters based on evaluation outcomes.
Document Insights: Clearly document evaluation methodologies and prompt versions, providing stakeholders with actionable data on the trade-offs between cost, latency, and accuracy.

Your profile

Academic Background: Bachelor’s, Master’s or PhD in Data Science, Computational Linguistics, Artificial Intelligence, Computer Science, or a closely related quantitative field.
Technical Proficiency: Strong proficiency in Python is mandatory, including experience with data manipulation libraries like Pandas and NumPy, and experience building evaluation pipelines using modern frameworks (Hugging Face, OpenAI SDKs, LangChain).
Evaluation & Metrics: Demonstrated understanding of LLM evaluation techniques. You know the difference between human-in-the-loop and automated evals (LLM-as-a-Judge) and can design metrics that objectively measure "quality."
Prompt Engineering: Deep understanding of prompt engineering principles (few-shot, chain-of-thought) and how to apply them systematically rather than ad hoc.
Analytical Aptitude: Proven ability to approach unstructured problems with a systematic, data-driven methodology. You are capable of translating complex evaluation results into concrete, code-based refinements.

Porters is an equal opportunity employer.

This job is no longer accepting applications

Career Opportunities at
Earlybird Portfolio Companies

Senior AI Engineer
