A well-designed survey can provide valuable information about a person’s attitudes, thoughts, evaluations, and behavioral intentions (or past behaviors). However, many of us have taken surveys that were inefficient, irrelevant, or confusing. We may have wondered: “how do these questions relate to my life?” or “why am I answering these questions?”
Bad surveys have real consequences. Survey participants may give inaccurate data if (1) they are forced to complete a survey that they have no vested interest in completing or (2) they are too exhausted to think through their survey responses. If a person is not required to complete a survey, they may stop partway and provide incomplete information. Bad surveys can also lead stakeholders to obtain inaccurate or biased data and, in turn, to draw inaccurate conclusions and make flawed strategic plans. Stated differently, bad survey design can bias data and leave the findings unable to serve their original purpose (i.e., the data do not allow the researcher to answer the research questions that led to administering the survey in the first place).
What follows are considerations regarding how I think about survey research. I discuss what an ideal survey could look like, concerns that I have when thinking through survey design, and steps that I use to assess construct and structural validity. I end with the steps that I take to create my own survey measure or assess the quality of an already-existing survey.
What would an ideal survey look like?
People use surveys for many reasons with many desired outcomes. Survey developers often must balance precision with reachability (see Clifton, 2020, for a discussion about differentiating between concerns about reliability and validity). To me, an ideal survey is one that best accomplishes the researchers’ and stakeholders’ goals. It is also one where both the survey items and the survey platform are well designed (refer to best practices in survey design by de Leeuw, Hox, & Dillman, 2012).
That may seem intuitive, but many surveys provide only weak data regarding the original research question or intention. The survey may be too long, may not measure what its developers think it measures (e.g., lacking content validity), or may be designed in a way that leads participants to lose interest or leaves them unable to complete the survey. Here are the questions that I ask myself when developing a survey:
- What is the goal of the survey?
- How short can the survey be to accomplish the survey goal(s)?
- How likely is it that the survey will provide information related to the survey goal(s)?
- Is the survey designed to best measure the construct(s) of interest in the population of interest?
- Do participants have enough vested interest (Crano & Prislin, 1995; Johnson et al., 2014) or motivation to complete the survey and provide accurate data?
When should I use something that already exists? When should I create something new?
This question comes from two competing concerns:
- Specific attitudes (Ajzen & Fishbein, 2005) or specific identities (Tajfel & Turner, 1986; see also Hogg, 2018) relate to specific behaviors.
- Measures of general constructs should perform similarly across samples.
I would create a specific measure when:
- I am measuring specific attitudes or specific behaviors.
- I am trying to identify or discover something new (e.g., what Clifton, 2020, would call discovery of a concept).
- A stakeholder wants a new or modified measure for a relevant construct or need.
I would use or modify a general or preexisting measure when:
- An agreed-upon measure exists, especially if it fits the stakeholders’ needs.
- A previously developed measure has been used in similar circumstances. This measure should have adequate structural validity, and stakeholders should not have a motivation to create a new measure.
- A framework exists for what should be included in a measure of a construct (e.g., horizontal and vertical evaluations of the self and others, see Koch et al., 2021; or social identification, see Cameron, 2004, or Leach et al., 2008).
Like many aspects of social and behavioral science research, the challenge is to identify the tradeoffs and to decide how to structure the study design and methodology.
How can I offer evidence that I am measuring what I think I am measuring?
Within any specific study, I will assess a measure’s structural validity through confirmatory factor analyses. I usually use a set of indicators (David Kenny and Rex Kline discuss them in much more detail):
- Chi-square and associated probability value (although with large samples this test can flag poor model fit even when the misfit is trivial).
- A goodness of fit measure (e.g., Comparative Fit Index), where values closer to 1 are better.
- One or more badness of fit measures (e.g., RMSEA or SRMR), where values closer to 0 are better.
- One or more comparative fit measures (e.g., AIC, BIC, or similar), where the lower value among competing models is better.
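As a rough illustration of how two of these indicators relate to the chi-square statistic, here is a minimal sketch using commonly cited formulas for CFI and RMSEA. The function names and the numbers are hypothetical, and software packages differ in details (for example, whether RMSEA uses N or N - 1), so treat this as a way to build intuition rather than a replacement for CFA software output.

```python
import math


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index: values closer to 1 suggest better fit."""
    # Estimated misfit beyond the degrees of freedom, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_baseline


def rmsea(chi2_model, df_model, n):
    """RMSEA: values closer to 0 suggest better fit.

    Assumption: uses n - 1 in the denominator; some software uses n.
    """
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical values, as if copied from CFA output (not real data).
example_cfi = cfi(chi2_model=85.0, df_model=40, chi2_baseline=900.0, df_baseline=55)
example_rmsea = rmsea(chi2_model=85.0, df_model=40, n=300)
print(f"CFI = {example_cfi:.3f}, RMSEA = {example_rmsea:.3f}")
```

Both indices are built from the gap between the chi-square value and its degrees of freedom, which is why a model can have a significant chi-square test in a large sample and still show acceptable CFI and RMSEA values.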
Across studies, I look for whether a scale is valid and reliable across measurements or samples. I would likely use a confirmatory factor analysis to investigate whether my data fit the factor structure reported in prior research. Reported internal consistency measures (e.g., Cronbach’s alpha or similar, Xiao & Hau, 2023) can serve as a rough proxy when they are available, but I would really want to run my own CFA when possible.
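For readers who want to see what an internal consistency estimate involves, here is a minimal sketch of Cronbach’s alpha using the standard formula (the number of items, the item variances, and the variance of the summed score). The function name and the response data are made up for illustration, and the sketch assumes complete data with all items scored in the same direction.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes no missing values and that reverse-keyed items are already recoded.
    """
    k = len(responses[0])  # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 participants answering 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A high alpha only says that the items covary strongly. It does not show that they load on a single factor, which is one reason I would still want a CFA.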
What would I do if I were making my own survey?
When making my own survey, I would usually follow four general steps. Other authors may further organize or clarify the steps (see Boateng et al., 2018 for a thorough example), but this is how I group the steps for survey development.
- I identify a construct that needs a survey or scale through literature reviews, discussions with experts, and interviews or focus groups,
- I create potential items and test an initial factor structure through exploratory or confirmatory factor analyses,
- I establish structural validity through confirmatory factor analyses and assess concurrent or discriminant validity through correlations, and
- I attempt to show predictive validity through a research study (e.g., experiment, quasi-experiment, or longitudinal analysis).
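The correlation check in the third step can be sketched as follows. This is only an illustration under stated assumptions: the scores are invented, the variable names are hypothetical, and in practice I would use a statistics package and report confidence intervals. The idea is that scores on a new scale should correlate strongly with an established measure of a related construct and weakly with a measure of an unrelated construct.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scale scores for five participants (not real data).
new_scale = [1, 2, 3, 4, 5]
established_measure = [1.5, 2.5, 2.9, 4.2, 4.8]  # related construct
unrelated_measure = [2, 5, 3, 1, 4]              # construct expected to be unrelated

r_concurrent = pearson_r(new_scale, established_measure)
r_discriminant = pearson_r(new_scale, unrelated_measure)
print(f"concurrent r = {r_concurrent:.2f}, discriminant r = {r_discriminant:.2f}")
```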
These steps allow me to assess whether something is needed, test out potential items to best measure a construct, and then show that the construct predicts something novel or better than prior measures or related constructs.
What would I do if I were thinking about using someone else’s scale?
Researchers often want to use someone else’s measure to assess a construct of interest. In these cases, I would follow these steps to assess whether a scale should be used in my research.
First, I would check for validity within the survey measure. I would want to make sure that the survey measure has face validity (i.e., it looks like it measures what it says it measures), structural validity (i.e., survey items generally measure a hypothesized latent construct), and construct validity (i.e., the survey items measure the construct of interest and not something else).
If stakeholders or their colleagues previously used the survey measure, I would see how well it performed in similar samples. If not, I would see whether a pilot test could help assess how well a measure could perform in the intended population.
Conclusion
This is an overview of how I think about survey research today. Developing better surveys can help organizations obtain better data while helping participants provide more accurate information. At the same time, seemingly minor design decisions can have major implications for how data are collected and analyzed, as well as for the interpretations that can be drawn from those analyses. No survey may be perfect, but having better surveys can lead to better findings, implications, and decisions based on survey data.
