Published in: BMC Medical Research Methodology 1/2003

Open Access 01-12-2003 | Research article

How does correlation structure differ between real and fabricated data-sets?

Authors: Noori Akhtar-Danesh, Mahshid Dehghan-Kooshkghazi

Abstract

Background

Misconduct in medical research has been the subject of many papers in recent years. Among the different types of misconduct, data fabrication might be considered one of the most severe. It has been argued that correlation coefficients in fabricated data-sets are usually greater than those found in real data-sets. We aim to study the differences between real and fabricated data-sets in terms of the association between two variables.

Method

Three examples are presented in which outcomes from made up (fabricated) data-sets are compared with the results from three real data-sets and with appropriate simulated data-sets. The data-sets were made up by faculty members in three universities. The first two examples are devoted to the correlation structure between continuous variables in two different settings: first, when there is a high correlation coefficient between the variables, and second, when the variables are not correlated. In the third example, the differences between the real data-set and the fabricated data-sets are studied using the independent t-test for comparison between two means.
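The two statistics named above, the correlation coefficient and the independent t-test, can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names (`pearson_r`, `pooled_t`, `simulate_pair`), the sample size, the seed, and the bivariate-normal simulation are all assumptions chosen for illustration, and the t statistic shown is the classical equal-variance (pooled) form.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def pooled_t(a, b):
    """Independent two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate_pair(n, rho, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    return x, y


if __name__ == "__main__":
    rng = random.Random(2003)           # arbitrary seed, for repeatability
    x, y = simulate_pair(40, 0.0, rng)  # hypothetical "uncorrelated" setting
    print("simulated r:", round(pearson_r(x, y), 3))
    print("t statistic:", round(pooled_t(x, y), 3))
```

A made up data-set could then be passed through the same two functions and its values set beside those from the real and simulated data-sets.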

Results

In general, higher correlation coefficients are seen in the made up data-sets than in the real data-sets. This occurs even when the participants are aware that the correlation coefficient for the corresponding real data-set is zero. The findings from the third example, a comparison between the means of two groups, show that many people tend to make up data with smaller or no differences between groups, even when they know how and to what extent the groups differ.

Conclusion

This study indicates that a high correlation coefficient can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70. However, the same rule may not apply when inspecting differences between group means, as we observed smaller differences between groups in the made up data-sets than in the real data-set. We also showed that inspecting the scatter-plot of two variables can be a useful tool for uncovering fabricated data.
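The "more than 40% above 0.70" summary is a simple proportion over the participants' correlation coefficients. The sketch below shows one way such a proportion might be computed. The function name `share_above` and the list of coefficients are hypothetical, not values from the study. Only the 0.70 cut-off comes from the text.

```python
def share_above(r_values, threshold=0.70):
    """Fraction of correlation coefficients strictly greater than threshold."""
    if not r_values:
        raise ValueError("need at least one correlation coefficient")
    flagged = sum(1 for r in r_values if r > threshold)
    return flagged / len(r_values)


if __name__ == "__main__":
    # Hypothetical coefficients, one per made up data-set (not study data).
    example_r = [0.12, 0.81, 0.75, 0.30, 0.92, -0.05, 0.66, 0.71]
    print("share above 0.70:", share_above(example_r))
```

A large share taken alone is only a prompt for closer inspection, for example of the scatter-plots, since a genuinely strong association also produces large coefficients.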
Metadata
Title
How does correlation structure differ between real and fabricated data-sets?
Authors
Noori Akhtar-Danesh
Mahshid Dehghan-Kooshkghazi
Publication date
01-12-2003
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2003
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-3-18