Cambridge spin-out Trismik raises £2.2M to revolutionize AI evaluation
Sep 25, 2025 | By Kailee Rainse

Cambridge University spin-out Trismik has emerged from stealth with a £2.2 million Pre-Seed round to tackle AI evaluation using a science-based method inspired by human IQ testing. As traditional benchmarks like MMLU and GSM8K reach saturation, with many leading models scoring over 90%, Trismik offers a fresh approach to measuring AI capabilities.
SUMMARY
- Trismik’s platform adjusts evaluation difficulty in real time based on model responses, much like human aptitude tests that tailor questions to estimate intelligence.
- The team applies Item Response Theory and Computerised Adaptive Testing, core techniques from psychometrics, to evaluate large language models (LLMs). They say this method provides faster, more scalable insights into what a model can actually do; a minimal sketch of the underlying idea follows this list.
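For readers unfamiliar with the psychometric machinery, Item Response Theory models the probability that a test-taker of a given ability answers a given item correctly. The sketch below shows the textbook two-parameter logistic (2PL) form of that model; it is a generic illustration with made-up item parameters, not Trismik's actual model.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) IRT model: probability that a model with
    latent ability `theta` answers an item of discrimination `a` and
    difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items: an easy one (b = -1) and a hard one (b = 2),
# evaluated for a mid-ability model (theta = 0)
print(round(p_correct(0.0, 1.0, -1.0), 2))  # 0.73 -> likely answered correctly
print(round(p_correct(0.0, 1.0, 2.0), 2))   # 0.12 -> likely answered wrongly
```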
According to Professor Nigel Collier, an NLP researcher at Cambridge and Trismik’s Chief Scientific Officer, if we want to trust AI, our methods have to be as rigorous as our ideas:
"Benchmark saturation is creating problems in every domain, from general knowledge, to reasoning, math, and coding.
Scientists, researchers and technical teams face mounting pressure as evaluation is exploding in importance and has become essential for tying AI to trust.We need an evaluation framework that scales and can support this."
Trismik’s platform adjusts evaluation difficulty in real-time based on model responses, much like human aptitude tests that tailor questions to estimate intelligence. This approach allows the system to achieve nearly the same accuracy rankings as traditional tests while using far fewer questions.
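To make that mechanism concrete, here is a minimal, self-contained simulation of Computerised Adaptive Testing: after each answer the ability estimate is updated, and the next question asked is the one most informative at that estimate. The item bank, parameters, and 20-item budget are all invented for illustration; this is the textbook CAT loop, not Trismik's implementation.

```python
import math
import random

def p_correct(theta, a, b):
    # 2PL response model: probability of a correct answer given ability theta,
    # item discrimination a, and item difficulty b
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    # For the 2PL model, item information at theta is a^2 * p * (1 - p)
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, grid):
    # Crude maximum-likelihood ability estimate via grid search
    def loglik(theta):
        total = 0.0
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=loglik)

random.seed(0)                       # reproducible toy run
bank = [(random.uniform(0.8, 2.0), random.uniform(-3.0, 3.0)) for _ in range(200)]
true_theta = 1.2                     # latent "ability" the test tries to recover
grid = [x / 10.0 for x in range(-40, 41)]

theta_hat, responses, asked = 0.0, [], set()
for _ in range(20):                  # 20 adaptive items vs. a 200-item bank
    # Ask the unasked item that is most informative at the current estimate
    i = max((j for j in range(len(bank)) if j not in asked),
            key=lambda j: fisher_info(theta_hat, *bank[j]))
    asked.add(i)
    a, b = bank[i]
    correct = random.random() < p_correct(true_theta, a, b)  # simulated answer
    responses.append(((a, b), correct))
    theta_hat = estimate_theta(responses, grid)

print(f"estimated ability {theta_hat:.2f} after 20 items (true value {true_theta})")
```

Because each question targets the current estimate, the loop converges on an ability score with a fraction of the items a fixed-form test would need, which is exactly the property the article describes.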
Early results are promising: adaptive tests matched conventional evaluation rankings with Spearman correlations over 0.96, using only 8.5% of test items. The company says this could reduce evaluation costs by up to 95%, a significant benefit for teams spending six-figure sums on GPU compute just to assess models.
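The comparison behind that figure is straightforward to reproduce in spirit: rank the same set of models by their full-benchmark scores and by their adaptive-test scores, then compute the Spearman rank correlation between the two orderings. The toy check below uses invented scores and SciPy's standard spearmanr; the 0.96 figure is Trismik's reported result, not something this sketch reproduces.

```python
# Toy version of the ranking comparison: the scores are made up for
# illustration; only the method (Spearman rank correlation) is real.
from scipy.stats import spearmanr

full_test = {"model_a": 0.91, "model_b": 0.78, "model_c": 0.85, "model_d": 0.66}
adaptive  = {"model_a": 0.89, "model_b": 0.80, "model_c": 0.84, "model_d": 0.62}

models = sorted(full_test)
rho, _ = spearmanr([full_test[m] for m in models], [adaptive[m] for m in models])
print(f"Spearman rank correlation: {rho:.2f}")  # 1.00 here; Trismik reports >0.96
```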
This scientific method is built on Professor Collier’s decades of research. With over 200 publications in NLP and AI, Collier now focuses on making AI systems measurable, explainable, and trustworthy. He teamed up in 2023 with CEO Rebekka Mikkola, a serial founder with enterprise AI sales experience, through a Cambridge Enterprise-backed design partnership with a major UK telco. Later, Marco Basaldella, a former Amazon scientist and TEDx speaker, joined as CTO.
With new regulations like the EU AI Act and sector-specific compliance frameworks on the horizon, the need for precise, transparent evaluation is growing. At the same time, rapid AI development pressures teams to deliver models that are safe, aligned, and effective. Traditional benchmarks often fall short—they don’t reflect proprietary data or domain-specific tasks and remain static, unable to adapt as models evolve.
Trismik’s recent £2.2 million Pre-Seed funding round was led by Twinpath Ventures, with participation from Cambridge Enterprise Ventures, Parkwalk Advisors, Fund F, Vento Ventures, and the angel network Ventures Together.
“The AI evaluation market is at an inflection point. Every AI team we speak with is drowning in evaluation overhead; it has become the hidden bottleneck preventing teams from shipping faster and with confidence,” said John Spindler of lead investor Twinpath Ventures.
“Trismik's approach is compelling because it applies proven scientific methods from a completely different domain to solve this problem. When you can reduce evaluation time by two orders of magnitude while actually increasing measurement precision, you fundamentally change what's possible in AI development cycles.”
Trismik is now launching its LLM evaluation platform for AI developers. The platform currently supports both traditional and adaptive testing on datasets covering areas like factual accuracy, alignment, reasoning, safety, and domain knowledge, with a simple interface for quick testing.
The company plans to expand the platform into a full environment for LLM experimentation, adding features like fine-tuning, prompt engineering, compliance tracking, and performance visualization.
“Trismik exemplifies Cambridge’s continued contribution to global AI development, with the team combining world-class academic credentials and practical industry experience that has given them the unique authority to define how AI capabilities should be measured,” said Dr Christine Martin, Head of Ventures at Cambridge Enterprise.
“By solving a pivotal challenge in AI adoption, Trismik is positioned to drive trust at scale — we’re excited to support their journey to market.”
The funding will be used to launch Trismik’s adaptive AI evaluation platform, which aims to replace slow and costly benchmarking with faster, more accurate assessments.
Early access to the platform is available on Trismik’s website, and the adaptive approach has already been tested on seven models and five benchmark datasets. The team plans to share more technical results and case studies later this year. Enterprise users will start onboarding at the end of 2025, with the full enterprise solution expected to launch in early 2026.
About Trismik
Trismik is a scientific platform for testing large language models (LLMs). Its adaptive testing lets AI developers evaluate models in seconds instead of days. The platform combines scientific accuracy with fast results, using clear visualizations and easy-to-use dashboards. Trismik is a spin-out from Cambridge University, working to shape the future of LLM evaluation. The team has experience from Cambridge, Salesforce, and Amazon, with over 38 years of research, 200+ published papers, and more than 12,000 citations.