The internet is becoming an increasingly common tool for survey research, particularly among “hidden” or vulnerable populations, such as men who have sex with men (MSM). Web-based research has many advantages for participants and researchers, but fraud can present a significant threat to data integrity. Investigators at the University of Kentucky College of Public Health undertook an analysis to evaluate fraud detection strategies in a Web-based survey of young MSM, and to describe new protocols to improve fraud detection in Web-based survey research. The results of their research appear in JMIR Public Health and Surveillance. Ms. April Ballard is the first author. Mr. Trey Cardwell is a co-author, along with corresponding author Dr. April Young, associate professor of epidemiology. Ms. Ballard, who received her MPH in environmental health and epidemiology from the University of Kentucky College of Public Health, is currently a PhD student at Emory University Rollins School of Public Health.
The study involved a cross-sectional Web-based survey that examined individual- and network-level risk factors for HIV transmission and substance use among young MSM residing in 15 counties in Central Kentucky. Study staff evaluated each survey entry that was at least 50 percent complete, looking for fraud using an algorithm involving eight criteria based on a combination of geolocation data, survey data, and personal information. They classified entries as “fraudulent”, “potentially fraudulent”, or “valid”, and performed descriptive analyses to describe each fraud detection criterion among entries.
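To make the idea of a multi-criterion algorithm concrete, the following is a minimal Python sketch of how entries might be sorted into the three categories. It is not the study's actual algorithm: it uses only the five indicators named in this article rather than all eight criteria, and the `Entry` fields, the rule that a verified location waives the geolocation flag, and the `fraud_threshold` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical fields; the study's actual variables are not given here.
    geolocation_in_area: bool
    location_verified: bool      # e.g. payment mailed to an in-area address
    phone_valid: bool
    names_match: bool
    email_unusual: bool
    matches_previous_entry: bool


def count_flags(entry: Entry) -> int:
    """Count how many illustrative fraud indicators an entry triggers."""
    flags = [
        # Assumption: an out-of-area geolocation is not counted when the
        # participant's location was verified some other way.
        (not entry.geolocation_in_area) and (not entry.location_verified),
        not entry.phone_valid,
        not entry.names_match,
        entry.email_unusual,
        entry.matches_previous_entry,
    ]
    return sum(flags)


def classify(entry: Entry, fraud_threshold: int = 2) -> str:
    """Map a flag count to a category; the threshold is an assumed value."""
    flags = count_flags(entry)
    if flags == 0:
        return "valid"
    if flags >= fraud_threshold:
        return "fraudulent"
    return "potentially fraudulent"
```

In this sketch no single indicator is enough to label an entry fraudulent; one flag only marks the entry for closer review.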
Of the 414 survey entries, the final categorization resulted in 119 (or 28.7 percent) entries identified as “fraud”, 42 (or 10.1 percent) as “potential fraud”, and 253 (or 61.1 percent) as “valid”. Geolocation outside of the study area was the most frequently violated criterion. However, 33.3 percent (82 out of 246) of the entries that had ineligible geolocations belonged to participants who were in eligible locations (as verified by their request to mail payment to an address within the study area, or participation at a local event). The second most frequently violated criterion was an invalid phone number (94 out of 414 entries, or 22.7 percent), followed by mismatching names within an entry (43 out of 414, or 10.4 percent) and unusual email addresses (37 out of 414, or 8.9 percent). Fewer than 5 percent (18 out of 414) of the entries had some combination of personal information items matching that of a previous entry.
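The reported shares can be recomputed from the counts given above. The short Python sketch below does so; the variable and function names are illustrative only, and the counts are taken directly from this article.

```python
TOTAL_ENTRIES = 414

# Final categorization counts as reported in the article.
category_counts = {"fraud": 119, "potential fraud": 42, "valid": 253}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


category_percents = {
    name: percent(count, TOTAL_ENTRIES)
    for name, count in category_counts.items()
}

# Share of out-of-area geolocations that belonged to verified eligible participants.
verified_share = percent(82, 246)

for name, value in category_percents.items():
    print(f"{name}: {value} percent")
print(f"ineligible geolocation but verified eligible: {verified_share} percent")
```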
The authors suggest that researchers conducting Web-based surveys of MSM should be vigilant about the potential for fraud. Specifically, researchers should have a fraud detection algorithm in place prior to data collection, and should not rely on Internet Protocol (IP) address or geolocation alone, but should instead use a combination of indicators.