Austria’s MOSTLY AI Launches $100K Prize for Synthetic Data Innovation

Jun 9, 2025 | By Kailee Rainse

Austrian startup MOSTLY AI has launched a $100,000 prize challenge inviting participants to create the best synthetic data set from a real data set.

The competition is open to everyone and will be judged on data anonymisation, accuracy, usability, and efficiency. The aim is to encourage fresh innovation in the field of synthetic data. The winning code will be open-sourced for public use.

MOSTLY AI’s platform creates high-quality synthetic data that closely mimics real data without exposing sensitive information. It is known for its accuracy and is used in advanced AI and machine learning applications.

The platform helps organisations safely use sensitive data while solving the limitations of traditional anonymisation methods.

MOSTLY AI is one of Austria’s top-funded startups, having raised $25 million in 2022. Its global clients include Citibank, the U.S. Department of Homeland Security, and Erste Group.

The company recently open-sourced its core technology to support broader innovation and understanding in the field.

I also spoke with Alexandra Ebert, MOSTLY AI’s Chief AI & Data Democratisation Officer, to learn more about this initiative.

According to Ebert, the company wanted to do something bold: "something that hasn't really been done in the past 20 years, at least not at this scale."

"The last time something similar happened was the Netflix Prize, which offered a $1 million reward. While we're not Netflix (yet!), the idea is similar: to spark innovation using synthetic data."

As concerns around data privacy grow, both big companies and startups are turning to synthetic data to train their AI models. For example, Nvidia recently acquired a synthetic data startup for $320 million. Governments are also starting to take notice—synthetic data is even mentioned in the UK Government’s AI Opportunities Action Plan.

According to Ebert, synthetic data has huge potential—not only for businesses but also for society as a whole.

"It can help accelerate healthcare research, climate insights, and open up innovation for startups and smaller players by giving them access to granular, relevant, privacy-safe data.

"The goal is to inspire many more competitions in the future, where synthetic data can play a central role in making meaningful datasets more accessible. It's a push away from the unrealistic 'toy datasets' we see on platforms like Kaggle, toward something much closer to real-world complexity and value."

The competition uses real-world data that is publicly available, yet not widely known—making it more realistic than typical Kaggle datasets while still being accessible.

According to Ebert, "We've lightly masked the datasets by replacing some column names with fun placeholders like 'cat' and 'dolphin' to prevent reverse engineering."

The competition has gained strong interest from students and early-career computer science professionals, especially from regions like the Global South.

While the $100,000 prize may not appeal to senior data scientists at companies like Meta or AWS, it’s a major opportunity for rising talent around the world.

Ebert explained: "We only have two key eligibility rules: participants must have a GitHub account created before the competition launch (to avoid people gaming the system with multiple accounts), and their submissions must meet minimum privacy and accuracy thresholds to be considered for the leaderboard."

In addition to privacy and accuracy, the top five submissions in each challenge will also be judged on creativity, ease of use, and how well they can be applied to different scenarios.

Ebert added: "We're not just looking for solutions that overfit the dataset; we want ideas that could be useful across domains and inspire broader applications of synthetic data."

About MOSTLY AI

Founded in 2017, MOSTLY AI builds technology to generate high-quality, privacy-safe synthetic data at scale. Its open-source Synthetic Data SDK and enterprise platform help organisations safely share, access, and gain valuable insights from their data.
