Senior Machine Learning Engineer, RAG
Klue
👋 Klue Engineering is hiring!
We're looking for a Senior Machine Learning Engineer to join our ML team in Toronto, focusing on building and optimizing state-of-the-art RAG (Retrieval-Augmented Generation) systems. You'll be joining us at an exciting time as we reinvent our RAG systems, making this an excellent opportunity for someone with strong ML and IR fundamentals who wants to dive deep into practical LLM applications.
💡 FAQ
Q: Klue who?
A: Klue is a VC-backed, capital-efficient, growing SaaS company. Tiger Global and Salesforce Ventures led our US$62M Series B in the fall of 2021. We’re creating the category of competitive enablement: helping companies understand their market and outmaneuver their competition. We benefit from having an experienced leadership team working alongside several hundred risk-taking builders who elevate every day.
We’ve been named one of Canada’s Most Admired Corporate Cultures by Waterstone HC, we’re a Deloitte Technology Fast 50 & Fast 500 winner, and we received both the Startup of the Year and Tech Culture of the Year awards at the Technology Impact Awards.
Q: What are the responsibilities, and how will I spend my time?
A: In this role, you'll focus on optimizing our RAG systems with scientific rigor and reproducible results. You'll measure and improve retrieval systems across the spectrum from BM25 to semantic search, using comprehensive evaluation metrics including Recall@K and Precision@K. A key challenge will be developing optimal chunking and enrichment strategies for diverse data sources including news articles, website changes, documents, CRM entries, call recordings and internal communications. You'll explore how different data types and formats impact retrieval performance and develop strategies to maintain high relevance across all sources.
Beyond RAG and retrieval, you'll work on prompt engineering to effectively utilize the retrieved context. This includes developing zero-shot and few-shot prompts with structured inputs/outputs, and implementing tight iteration loops with the right evaluation metrics.
You'll also work on training and fine-tuning smaller, more efficient models that can match the performance of much larger LLMs at a fraction of the cost. This includes creating labeled datasets (sometimes using prompts), conducting careful hyperparameter optimization, and building automated training pipelines. You'll also deploy and monitor these models in production, optimize their latency, and implement comprehensive offline/online metrics to track their performance.
Throughout all this work, you'll apply your deep understanding of the latest breakthroughs in the field to connect new research advances to practical improvements in our systems. Working closely with backend engineers, you'll help build scalable, production-ready systems that turn cutting-edge ML experiments into reliable business value.
Q: What experience are we looking for?
Master's or PhD in Machine Learning, NLP, or a related field
2+ years building and optimizing retrieval systems
2+ years training/fine-tuning transformer models
Strong foundation in evaluating RAG systems - both retrieval and generation
Deep understanding of retrieval metrics and their trade-offs
Strong grasp of embedding models, semantic similarity techniques, and clustering similar content
Knowledge of query augmentation and content enrichment strategies
Expertise in automated LLM evaluation, including LLM-as-judge methodologies
Skilled at prompt engineering - including zero-shot, few-shot, and chain-of-thought
Experience deploying models to production and monitoring the health of both the system and its predictions
Knowledge of ML infrastructure, model serving, and observability best practices
Proven ability to balance scientific rigor with driving business impact
Track record of staying current with ML research and breakthrough papers
Q: What makes you thrive at Klue?
A: We're looking for builders who:
Take ownership and run with ambiguous problems
Jump into new areas and rapidly learn what's needed to deliver solutions
Bring scientific rigor while maintaining a pragmatic delivery focus
See unclear requirements as an opportunity to shape the solution
Q: What technologies do we use?
LLM platforms: OpenAI, Anthropic, open-source models
ML frameworks: PyTorch, Transformers, spaCy
Search/Vector DBs: Elasticsearch, Pinecone, PostgreSQL
MLOps tools: Weights & Biases, MLflow, Langfuse
Infrastructure: Docker, Kubernetes, GCP
Development: Python, Git, CI/CD
How We Work at Klue:
Hybrid. Best of both worlds (remote & in-office)
Our main Canadian hubs are in Vancouver and Toronto. Ideally, this role would be located in Toronto.
You and your team will be in office at least 2 days per week.