Sources of Error in Statistics

Diogo Marques

2 years ago

Let’s say you have 50 people in your classroom and ask all of them one by one if they like chocolate or not.

Suppose 40 people say yes, so 40/50 = 80% of the class likes chocolate, and 20% doesn’t.

What about the whole college campus, which has a total of 20,000 people? Can you infer that 80% also like chocolate?

This is where inferential statistics comes in, and with it the introduction of errors. The chocolate survey is only a toy example, but it serves as a reference point for the idea.
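To make the classroom numbers concrete, here is a minimal sketch of how the 80% figure and a rough margin of uncertainty around it could be computed. It uses the normal-approximation (Wald) interval, and it assumes the 50 people are a simple random sample of the campus, which a single classroom usually is not. The function name `proportion_with_interval` is made up for this illustration.

```python
import math

def proportion_with_interval(successes, n, z=1.96):
    """Return (p_hat, lower, upper) using the normal-approximation interval.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    # Standard error of a sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range
    lower = max(0.0, p_hat - z * se)
    upper = min(1.0, p_hat + z * se)
    return p_hat, lower, upper

# Classroom example: 40 "yes" answers out of 50 people
p_hat, lower, upper = proportion_with_interval(40, 50)
print(f"sample proportion: {p_hat:.2f}")
print(f"approx. 95% interval: {lower:.3f} to {upper:.3f}")
```

Under those assumptions the interval runs from roughly 69% to 91%, so even in the best case "80% of the campus" is only an estimate with a fairly wide band around it.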

The whole point of inference is to work within the resources you actually have. Asking 20,000 people might not be that bad, since it is a yes-no question. But what if the researcher gets all technical and wants to know 100 things about the chocolate, like texture, different flavors, size, shape, different types of almonds, temperature, the hours of the day people eat it, and so on?

I think you get the picture. This can start becoming increasingly complicated, especially if we have an even larger population, like 200,000 people, or even larger like a whole country.
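As a rough illustration of why sampling helps here, the sketch below estimates how many people would need to be asked a yes-no question to reach a given margin of error. It uses the standard sample-size formula for a proportion with a finite population correction, and it assumes simple random sampling, a 5% margin, 95% confidence, and the worst-case proportion of 0.5. The function name `required_sample_size` and the population figures beyond those mentioned in the text are illustrative choices.

```python
import math

def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion within +/- margin.

    Assumes simple random sampling. p=0.5 is the most conservative choice.
    """
    # Sample size for an effectively infinite population
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Campus, a larger population, and a country-sized population (illustrative)
for population in (20_000, 200_000, 20_000_000):
    print(f"population {population:>10,}: sample of {required_sample_size(population)}")
```

Under these assumptions the required sample stays in the high 300s whether the population is 20,000 or 20 million, which is why a well-drawn sample is so much cheaper than asking everyone.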

So, statisticians know that errors are introduced when conducting their research. In this article, I list all the ones I could find.

Note that there are some general ones and others that are more specific to each field, from Genetics to Economics and Finance and many other fields.

Here’s the list:

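Sampling error: Sampling error is the difference between a sample statistic and the true population value that arises because only part of the population is observed. It is present even in a properly random sample, and it tends to be larger when the sample size is small or when the sample contains outliers. A sample that is not randomly selected adds bias on top of this (see selection bias).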
Measurement error: Measurement error occurs when the instruments used to collect data are not precise or are not used correctly. This can happen if the instruments are not calibrated properly, if the instructions for using the instruments are not clear, or if the data collectors are not trained properly.
Nonresponse error: Nonresponse error occurs when some members of the sample do not respond to the survey or do not provide complete information. This can happen if the survey design is not effective, if the survey is not well-timed, or if the survey is not culturally appropriate.
Selection bias: Selection bias occurs when the data collection process is not random, and certain groups are more or less likely to be included in the sample. This can happen if the list the sample is drawn from leaves out part of the population, or if the sample is self-selected.
Confounding: Confounding occurs when a third variable influences both the supposed cause and the outcome, so that the two appear related for reasons other than a direct causal effect. This can happen if the study design is not well controlled, for example when there is no randomization, or if the data is not collected in a way that allows for accurate causal inference.
Observer bias: Observer bias occurs when the data collectors or researchers have a preconceived notion about the outcome of the study and this affects the way they collect or analyze the data.
Data entry errors: Data entry errors occur when the data is entered into a database incorrectly.
Data analysis errors: Data analysis errors occur when the data is analyzed in a way that is not appropriate for the research question or when the data is interpreted incorrectly.
Model misspecification: Model misspecification occurs when the model that is used to analyze the data does not correctly reflect the underlying processes that generated the data.
Publication bias: Publication bias refers to the tendency for researchers to publish only studies that have statistically significant findings, while studies that have null or inconclusive results are less likely to be published.
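Endogeneity: Endogeneity occurs when an explanatory variable is correlated with the error term in the model, for example because of omitted variables, measurement error, or reverse causality. This biases the estimates and makes it difficult to identify the causal relationship between variables.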
Model risk: Model risk occurs when the model used to make predictions or decisions is not accurate, leading to suboptimal or incorrect decisions.
Data availability: Data availability errors occur when data is not available or is difficult to obtain, leading to incomplete or inaccurate analyses.
Data quality: Data quality errors occur when data is of poor quality, leading to inaccurate or unreliable analyses.
Data timeliness: Data timeliness errors occur when data is not available in a timely manner, making it difficult to make timely decisions.
Liquidity risk: Liquidity risk occurs when an institution or market participant is unable to meet its financial obligations because it cannot readily liquidate its assets.
Credit risk: Credit risk occurs when a borrower defaults on a loan or is unlikely to be able to repay the loan.
Operational risk: Operational risk occurs when an institution or market participant is exposed to the risk of loss due to inadequate or failed internal processes, human errors, systems, or external events.
Market risk: Market risk occurs when an institution or market participant is exposed to the risk of loss due to changes in the value of its assets or liabilities.
Political risk: Political risk occurs when an institution or market participant is exposed to the risk of loss due to changes in government policies or instability in a country.
Natural disaster risk: Natural disaster risk occurs when an institution or market participant is exposed to the risk of loss due to natural disasters such as floods, earthquakes, or hurricanes.
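The difference between sampling error and selection bias, the first and fourth items in the list above, can be sketched with a small simulation. Everything in it is hypothetical: the campus of 20,000 people, the 70% true share of chocolate fans, and the response probabilities are made-up numbers chosen only to show the pattern.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical campus: 20,000 people, 70% of whom like chocolate (made-up figure).
population = [1] * 14_000 + [0] * 6_000
true_share = sum(population) / len(population)

def random_sample_share(n):
    """Share of 'yes' answers in a simple random sample of size n."""
    return sum(random.sample(population, n)) / n

def self_selected_share(n, p_yes=0.9, p_no=0.3):
    """Share of 'yes' among n respondents when chocolate fans answer more often.

    p_yes and p_no are assumed response probabilities, chosen for illustration.
    """
    answers = []
    while len(answers) < n:
        person = random.choice(population)
        respond_prob = p_yes if person == 1 else p_no
        if random.random() < respond_prob:
            answers.append(person)
    return sum(answers) / n

repeats = 200
small = [random_sample_share(50) for _ in range(repeats)]
large = [random_sample_share(500) for _ in range(repeats)]
biased = [self_selected_share(500) for _ in range(repeats)]

avg_random = statistics.mean(large)
avg_biased = statistics.mean(biased)
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"true share:                     {true_share:.3f}")
print(f"random samples, n=50:  spread = {spread_small:.3f}")
print(f"random samples, n=500: spread = {spread_large:.3f}, mean = {avg_random:.3f}")
print(f"self-selected, n=500:  mean = {avg_biased:.3f}")
```

In this setup the random samples scatter around the true share, and the scatter shrinks as the sample grows. That scatter is sampling error. The self-selected samples settle on a value well above the true share no matter how many people answer, which is the signature of selection bias: a larger sample does not fix it.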

The following are specific to the field of behavioral economics:
Experimenter demand effect: The experimenter demand effect occurs when participants in a study change their behavior because they believe that’s what the experimenter wants them to do.
Self-selection bias: Self-selection bias occurs when participants in a study self-select themselves into the study based on certain characteristics, leading to a sample that is not representative of the population.
Social desirability bias: Social desirability bias occurs when participants in a study change their answers to questions to conform to social norms, leading to inaccurate results.
Anchoring and adjustment: Anchoring and adjustment occurs when people use an initial value as a reference point and do not adjust their estimates far enough away from it, leading to inaccurate results.
Confirmation bias: Confirmation bias occurs when people look for information that confirms their existing beliefs, and ignore information that contradicts their beliefs, leading to inaccurate results.
Self-serving bias: Self-serving bias occurs when people attribute their successes to their own abilities, but attribute their failures to external factors, leading to inaccurate results.
Illusory superiority: Illusory superiority occurs when people overestimate their own abilities or the abilities of their group, leading to inaccurate results.
Overconfidence: Overconfidence occurs when people are too confident in their own abilities or the abilities of their group, leading to inaccurate results.
Hindsight bias: Hindsight bias occurs when people believe that an outcome was more predictable than it actually was, after they know what the outcome is, leading to inaccurate results.
Emotion-based biases: Emotion-based biases occur when people make decisions based on their emotions, rather than objective information, leading to inaccurate results.
Representativeness bias: Representativeness bias occurs when people make judgments about the probability of an event based on how similar it is to a prototype, rather than on base rates, leading to inaccurate results.

The following are specific to conducting an interview:
Confirmation bias: Confirmation bias occurs when interviewers look for information that confirms their existing beliefs, and ignore information that contradicts their beliefs.
Self-fulfilling prophecy: Self-fulfilling prophecy occurs when interviewers’ expectations of the candidate shape how they treat the candidate, which in turn influences the candidate’s performance in a way that confirms those expectations.
Halo effect: Halo effect occurs when interviewers’ overall positive impression of the candidate influences their perception of the candidate’s specific traits.
Horns effect: Horns effect occurs when interviewers’ negative impression of the candidate influences their perception of the candidate’s specific traits.
Social desirability bias: Social desirability bias occurs when candidates give the answers they believe the interviewer wants to hear, rather than their honest ones, so the interviewer’s judgment rests on inaccurate information.
Recency bias: Recency bias occurs when interviewers give more weight to the most recent information they received about the candidate, disregarding other information they received earlier.
Primacy bias: Primacy bias occurs when interviewers give more weight to the first information they received about the candidate, disregarding other information they received later.
Stereotyping: Stereotyping occurs when interviewers make judgments about the candidate based on their stereotypes of the candidate’s group, rather than on the candidate’s individual characteristics.
Interviewer bias: Interviewer bias, in the sense used here (often called similarity or affinity bias), occurs when interviewers are more likely to rate a candidate positively if the candidate shares certain characteristics with the interviewer.
Leniency or harshness bias: Leniency bias occurs when interviewers rate candidates more positively, and harshness bias occurs when interviewers rate candidates more negatively, than they deserve based on their performance.

The following list is specific to Clinical Psychologists:
Therapist expectancy effect: The therapist expectancy effect occurs when the therapist’s expectations of the patient’s outcome influence the patient’s outcome.
Observer bias: Observer bias occurs when the therapist’s judgments of the patient’s behavior or symptoms are influenced by their own preconceptions or biases.
Diagnostic overshadowing: Diagnostic overshadowing occurs when the therapist’s attention is focused on the patient’s diagnosis, rather than the patient’s individual characteristics, leading to inaccurate assessment and treatment.
Confirmation bias: Confirmation bias occurs when the therapist looks for information that confirms their existing beliefs about the patient, and ignores information that contradicts their beliefs.
Stereotyping: Stereotyping occurs when the therapist makes judgments about the patient based on their stereotypes of the patient’s group, rather than on the patient’s individual characteristics.
Treatment bias: Treatment bias occurs when the therapist is more likely to recommend a certain treatment for the patient based on the therapist’s own preferences or beliefs, rather than on the patient’s individual needs.
Self-disclosure bias: Self-disclosure bias occurs when the therapist shares personal information with the patient in a way that is not appropriate or relevant to the therapy session.
Self-esteem bias: Self-esteem bias occurs when the therapist’s own self-esteem influences the therapist’s perceptions of the patient’s self-esteem.
Projection bias: Projection bias occurs when the therapist attributes their own characteristics, feelings or motivations to the patient.
Burnout bias: Burnout bias occurs when the therapist’s own burnout or stress affects the therapist’s ability to provide effective treatment.