The Power of Product Testing with Synthetic Data

Humanizing AI series, part two

Colin Ho, Ph.D.
Dr. Nikolai Reynolds
At Ipsos, we champion the unique blend of Human Intelligence (HI) and Artificial Intelligence (AI) to propel innovation and deliver impactful, human-centric insights for our clients.

Our Human Intelligence stems from our expertise in prompt engineering, data science, and our unique, high-quality data sets, which embed creativity, curiosity, ethics, and rigor into our AI solutions, powered by our Ipsos Facto Gen AI platform. Our clients benefit from insights that are safer, faster, and grounded in the human context.

#IpsosHiAi

Synthetic data is about to change the world. From fast-tracking drug development in healthcare, to simulating fraudulent transactions in financial services, to fueling autonomous vehicle tests in the automotive sector, it is already demonstrating its value across various business contexts.

At Ipsos, we believe synthetic data opens brand new possibilities for market research, particularly in the field of product testing. However, many businesses remain uncertain about the quality of synthetic data, or how to evaluate it. This paper aims to bridge these gaps.

As the world's largest and leading product testing adviser, Ipsos has been at the forefront of leveraging cutting-edge technologies to accelerate innovation and growth for businesses around the world. The following pages present Ipsos' insights on testing products with synthetic data, providing readers with:

• Recommendations for generating and evaluating high-quality synthetic data sets

• Specific applications in product testing for consumer goods and services

Today, we find different types of synthetic data in the market research industry, each with its own strengths and weaknesses. In this paper, we focus on data augmentation, i.e., enhancing datasets with synthetic data.
• Data augmentation: Enhancing datasets with synthetic data to create a more comprehensive sample, while maintaining statistical integrity

• Data imputation and fusion: Filling in missing data points using existing information

• Gen AI agents and persona bots: Tailored digital assistants that mimic consumer segments, offering insights from synthesized responses

• Full synthetic data: Utilizing entirely artificial samples made up of synthetic respondents

Generating and evaluating synthetic data
We start with a general overview of what is needed to generate high-quality synthetic data and how to evaluate its quality. As the effects and nuances of synthetic data can only truly be understood when considered in specific application areas, we investigate how synthetic data can be applied to product testing, specifically. This paper covers only the generation of synthetic numerical data, the format used most by quantitative researchers. It does not cover the application of synthetic images, video, data imputation, or synthetic personas, all of which also fall under the broad umbrella of synthetic data1.

As explained in the Ipsos Views paper Synthetic Data: From Hype to Reality1, synthetic data is artificial data that is generated from a model that is trained to mimic the statistical properties and patterns of real-world data.

We use data to make better business decisions in the real world, and synthetic data can be employed in many ways to support decision-making. As such, while synthetic data does not correspond to real events or people, it still needs to mimic the statistical properties and patterns of real-world data. This raises two fundamental questions:

01 What is required to generate synthetic data that closely mimics real data?

02 How can synthetic data be evaluated for its resemblance to real-world data?

Before an AI can generate synthetic data that mirrors real-world data, it needs to be trained on real-world data. As discussed in Ipsos' first paper in the Humanizing AI series, AIs are simply algorithms; they have no intelligence of their own until they are trained. It is through learning from training data that AIs gain the intelligence we associate with them. This is the most critical point to remember from this paper: if an AI has not been trained on real-world data that is relevant to your business, it will not be able to generate synthetic data that shares the same properties as real-world data. It's as simple as that!

The evaluation process is straightforward as well. Synthetic numerical data should, at minimum, mirror real-world data on common statistical measures, such as means, data distributions, variances, and relationships between variables (e.g., correlations). A direct comparison between synthetic and human data on these common metrics will provide us with a sense of how well a synthetic data set approximates human data. The closer synthetic data is to human data, the less risk we assume when using it, but there is always some risk because synthetic data can never perfectly mimic real data in every aspect. We should use synthetic data, therefore, only when we are willing to accept some risk.
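To make this evaluation concrete, here is a minimal sketch in Python of such a comparison. It is an illustration only, not Ipsos' production tooling; the file names and question columns (e.g., overall_liking) are hypothetical.

```python
import pandas as pd
from scipy import stats

def compare_datasets(human: pd.DataFrame, synthetic: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Compare human and synthetic survey data on basic statistical measures.

    Assumes both data frames hold the same numeric rating-scale columns
    (e.g., overall liking plus product attributes); column names are illustrative.
    """
    rows = []
    for col in columns:
        # Means and variances: the first sanity check for resemblance.
        mean_diff = abs(human[col].mean() - synthetic[col].mean())
        var_diff = abs(human[col].var() - synthetic[col].var())

        # Distribution: two-sample Kolmogorov-Smirnov test, question by question.
        ks = stats.ks_2samp(human[col], synthetic[col])

        rows.append({"question": col,
                     "mean_diff": mean_diff,
                     "variance_diff": var_diff,
                     "ks_statistic": ks.statistic,
                     "ks_p_value": ks.pvalue})

    # Relationships between variables: gap between the two correlation matrices.
    corr_gap = (human[columns].corr() - synthetic[columns].corr()).abs().mean().mean()
    print(f"Average absolute gap between correlation matrices: {corr_gap:.3f}")
    return pd.DataFrame(rows)

# Example usage with hypothetical files and column names.
# human_df = pd.read_csv("human_responses.csv")
# synthetic_df = pd.read_csv("synthetic_responses.csv")
# report = compare_datasets(human_df, synthetic_df, ["overall_liking", "taste", "texture"])
# print(report)
```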
Generating synthetic data using LLMs

Approaches to generating synthetic data can be divided into two categories: LLMs (Large Language Models) and non-LLMs, differentiated by their text-based and numerical-based nature, respectively. Off-the-shelf, or public, LLMs, which are pre-trained on extensive datasets such as websites, online books, and social media posts, can generate quality synthetic data in subject areas included in their training.

However, off-the-shelf LLMs have limitations in producing realistic synthetic data (see Figure 1)2,3. First, their training data is limited in coverage – many topics are too mundane or private to be found online. Second, LLMs are often biased toward Western, English-speaking countries, due to the predominance of such data in their training sets. Studies have shown, for example, that cultural values generated by LLMs align more closely with the Anglosphere and Protestant Europe than with those of other countries4. Third, information can quickly become outdated.

Therefore, to generate high-quality synthetic data using LLMs, it is crucial to train them on updated, country-specific, real-world data relevant to the subject of interest. This process requires access to recent, pertinent, and specialized data, statistical and data science expertise, and a substantial investment of time and effort to ensure the synthetic data accurately reflects real-world statistical properties and patterns5.

Generating synthetic data using non-LLM approaches

Long before LLMs stole the spotlight, data scientists were using Deep Learning (DL) algorithms6 to generate synthetic numerical data7. DL algorithms, including the types used in LLMs, are potent tools for the generation of synthetic data, each possessing unique advantages.

LLMs are particularly effective at generating human-like text data. They can provide detailed and contextually rich textual data, making them highly valuable in applications such as content creation, language translation, and chatbots8.

Non-LLM DL offers significant advantages for generating synthetic numerical data. DL algorithms are particularly effective at producing synthetic numerical data that closely mirrors the statistical properties of real-world datasets. While pre-trained LLMs like ChatGPT are designed for natural language tasks, DL models can be trained specifically for numerical data synthesis, enabling customization for domain-specific or market-relevant applications. At Ipsos, we have a strong history of leveraging DL techniques for data synthesis. In the following section, we detail the results of applying DL models to generate synthetic data for product testing, analyzed on a product-by-product basis.
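As an illustration of the non-LLM route, the sketch below trains a very small Generative Adversarial Network (in the spirit of reference 7) on respondent-level rating data and then samples synthetic rows from it. This is not Ipsos' model; the architecture, feature count, scaling, and training settings are assumptions made for the example.

```python
import torch
import torch.nn as nn

LATENT_DIM = 16   # size of the random noise vector fed to the generator
N_FEATURES = 5    # e.g., overall liking plus four attribute ratings (hypothetical)

# Generator: maps random noise to one synthetic respondent row, scaled to [-1, 1].
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_FEATURES), nn.Tanh(),
)

# Discriminator: estimates the probability that a row comes from real respondents.
discriminator = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def train_gan(real_rows: torch.Tensor, epochs: int = 2000) -> None:
    """Train on respondent-level rows already min-max scaled to [-1, 1]."""
    loss_fn = nn.BCELoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    n = real_rows.size(0)

    for _ in range(epochs):
        # 1) Update the discriminator: real rows labelled 1, generated rows 0.
        fake_rows = generator(torch.randn(n, LATENT_DIM)).detach()
        d_loss = (loss_fn(discriminator(real_rows), torch.ones(n, 1)) +
                  loss_fn(discriminator(fake_rows), torch.zeros(n, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # 2) Update the generator: try to make the discriminator call fakes real.
        g_loss = loss_fn(discriminator(generator(torch.randn(n, LATENT_DIM))),
                         torch.ones(n, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

def sample_synthetic(n_rows: int) -> torch.Tensor:
    """Draw synthetic respondent rows; rescale back to the rating scale afterwards."""
    with torch.no_grad():
        return generator(torch.randn(n_rows, LATENT_DIM))
```

Whatever synthesizer is used, its output should still be checked against the human data with comparisons like the one sketched earlier, before any business decision relies on it.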
We ran trials with several human sample sizes (e.g., 75, 100) but, for brevity, will share only the findings from our trials with 50 humans.

The data from these 50 humans was used to train a DL algorithm to generate synthetic data. We did not use an off-the-shelf LLM because the public data that LLMs have been pre-trained on does not include people's multi-sensory experiences of products in the given category (e.g., prototypes, new formulations) and does not provide robust numerical data at the respondent level. In addition, we did not consider weighting the data as an alternative to DL, because weighting does not help in creating additional sample sizes for subgroups and is often not accepted in product testing.

In our experiments, we validated results by comparing findings from an all-human dataset with those from a dataset augmented with synthetic data (see Figure 1)10. As a reminder, in our approach we are not just replicating or copying and pasting existing respondent data.

In general, we found that the two datasets were remarkably similar in terms of:

• The relative performances of products (e.g., rankings, statistical significance)

• Data distribution (e.g., the distribution of people's responses across the answer options on individual questions)

• The relationships between variables in the data (i.e., correlations between overall liking and product attributes)

Most importantly, the two datasets showed differences in variances but led to the same business decision in every dataset we tested (see Figure 2).
Figure 2: Comparison of the all-human and augmented datasets on data distribution and correlations. Source: Ipsos
A key benefit of product testing with synthetic data is the ability to augment data for hard-to-reach populations. Once augmented, differences that were previously not statistically significant may become significant due to the boost in sample size. In one of our tests, for example, we generated synthetic responses to augment people who used a particular brand of product. When seeking feedback on products from a specific target group, we must set recruitment quotas that align with the business objectives.

In our all-human sample, we had about 100 brand users per product. In the all-human dataset, there were some differences between the three products tested, but the differences did not reach statistical significance. Once augmented with 100 synthetic brand users per product, the differences between products became statistically significant due to the increase in sample size (see Figure 3).

Figure 3: Human brand users augmented with synthetic brand users – 50 Humans (Brand users) vs. 100 Humans (Brand users) + 150 Synthetic (Brand users). Source: Ipsos
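The mechanism behind Figure 3 can be illustrated with a simple significance test on hypothetical liking scores: the same mean difference that misses significance with the human sample alone becomes easier to detect once each cell is enlarged with synthetic respondents. All numbers below are invented for illustration, and the synthetic rows are idealized draws rather than output from a trained model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical 9-point liking scores for two products among brand users.
# The underlying difference (0.4 points) is modest, so it is hard to detect at n=100.
product_a_human = rng.normal(6.6, 1.8, size=100)
product_b_human = rng.normal(6.2, 1.8, size=100)

t, p = stats.ttest_ind(product_a_human, product_b_human)
print(f"Humans only (n=100 per product):        p = {p:.3f}")

# Pretend a trained synthesizer produced 100 additional brand users per product
# with the same underlying properties (an idealized assumption).
product_a_synth = rng.normal(6.6, 1.8, size=100)
product_b_synth = rng.normal(6.2, 1.8, size=100)

product_a_aug = np.concatenate([product_a_human, product_a_synth])
product_b_aug = np.concatenate([product_b_human, product_b_synth])

t_aug, p_aug = stats.ttest_ind(product_a_aug, product_b_aug)
print(f"Humans + synthetic (n=200 per product): p = {p_aug:.3f}")

# With the larger augmented samples the standard error shrinks, so the same
# underlying difference is more likely to reach statistical significance.
```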
Some caution is warranted

We have presented a promising picture of synthetic data. As previously mentioned, synthetic data alone comes with an inherent trade-off, as it can never fully match the accuracy of real-world data. The ability of a part-human, part-synthetic dataset to replicate the findings of an all-human dataset depends on several factors.
Overall, we do not believe synthetic data should completely replace humans – at least not in product testing. In the 1997 movie "Good Will Hunting", the late actor Robin Williams portrayed a professor who mentors a young genius, played by Matt Damon. The prodigy holds a vast amount of knowledge, due to his superhuman ability to absorb information from books. In one scene, the professor counsels the prodigy on the distinction between book knowledge and real-world experience. The professor says, "But I'll bet you can't tell me what it smells like in the Sistine Chapel. You've never actually stood there and looked up at that beautiful ceiling." True knowledge comes from living life, not from books, pictures, videos or any other representations of the real world.

Like the prodigy in the movie, an AI can be fed all the knowledge in the world that exists, but it will never be able to experience the world like a human being. There is something unique and beautiful about being human: being able to feel the warmth of the sun on our face, enjoy the melody or beat of music, or behold a beautiful sunset – experiences that AI will never fully be able to replicate, no matter how advanced it becomes. How humans react to products, or to life in general, is not captured solely in the brain as factual or semantic knowledge; our bodies and sensory experiences play a significant role, too.

Key takeaways

01 Synthetic data will never be human. AI alone can never echo our product experiences, which combine the five senses, emotions, expectations, and context. Therefore, our goal is to augment human input with synthetic data, not replace it.

02 Accuracy hinges on the training data. The value of synthetic data is not binary (good or bad); the accuracy of synthetic data depends on many factors, including the differences in the data we are trying to replicate and the representativeness of the real-world data we are training an AI to learn from. The use of synthetic data should be strategic, considering the associated risks and benefits.

03 When accurate, it can power product testing. Synthetic data can boost market research agility, making it ideal for resource-intensive areas like product testing – reducing costs and saving time, with additional benefits for detailed sub-group analyses.
References

1 Guidi, M., Hubert, B., Sava, C., Timpone, R. Synthetic Data: From Hype to Reality – a guide to responsible adoption. Ipsos Views.

2 Illic, M., Bangia, A., Legg, J. (2024). Conversations with AI Part V: Is there depth and empathy with AI twins? Ipsos Views.
3 Moore, C., Stronge, C., Bhudiya, M. (2024). Judgment Day: The Machines Have Arrived – But how good are they at answering choice experiments? Sawtooth Conference.

4 Tao, Y., Viberg, O., Baker, R. S., Kizilcec, R. F. (2024). Cultural bias and cultural alignment of large language models. PNAS Nexus, Vol. 3, No. 9.

5 Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J., Rytting, C., Wingate, D. (2022). Out of One, Many: Using Language Models to Simulate Human Samples. arXiv.

6 AI-based Deep Learning is a way for computers to learn by analyzing large amounts of data and finding patterns, much like how humans learn from experience. It uses neural networks inspired by the human brain to recognize information and make decisions.

7 Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014). Generative Adversarial Networks. arXiv.

8 Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.

9 Reynolds, N., Zach, J., Cho, J., Ho, C. (2021). Towards More Agile and Efficient Product Testing: Opportunities and limitations for smaller sample sizes. Ipsos POV.

10 We also compared the results from the all-synthetic data to the all-human data;