The structure of our knowledge makes that knowledge amenable to gathering and sharing by various methods. But a good tool for managing this information would go a long way, and being experts in fitting tools to purposes, we will likely conclude that the tool that lets us track and share our knowledge is a robust but elegant content management system (CMS). Such a system would support a narrow range of utterances:
- "This is a tool we are exploring/testing/have tested for this or that learning purpose." (Tool, phase, and purpose.)
- "This is information we have found about a specific tool from a specific research method: interview, lit review, observation, etc." (Data, research method, tool, and purpose.)
- "Here is a reference to a specific tool we plan to explore, along with possible purposes."
Such a system would work as a kind of dashboard. It would also allow us to track our work internally and simultaneously publish whatever elements we are ready to share. Moving knowledge from system to system is inefficient, however, and when support comes first, sharing our knowledge will always take a back seat. Therefore the knowledge management system and the knowledge sharing system would ideally be one and the same.
Indeed, if efficiency is a paramount concern (as it should be), and the relevant resources are far-flung, then it would be easier to point to them than to gather them together — a bibliography, not an encyclopedia. Given that tools like delicious.com, Tumblr, Zotero, Instapaper, and Evernote, to name a few, point to information elsewhere or gather disparate resources in a single location, it's also possible that the CMS is overkill for ed tech knowledge tracking.
5. Ed tech verifies hypotheses of a few definite types.
In ed tech, our hypotheses take a few typical forms.

Our work on the first level is verification: we assure ourselves that Tool T really has the functions claimed. Hypotheses start out basic, and the evaluative criteria grow in complexity. There is no point testing the functions of a tool that does not run on the platforms you need it to run on: first determine which operating systems are supported, then identify the required functions and test them on those systems.
Our work on the next two levels is (1) discovery and (2) verification and evaluation. We find out how well the tool works, verify existing claims, and evaluate the facts we gather based on our local framework of needs and standards.
6. Well-formed ed tech recommendations are sourced, supported, and qualified.
Recommendations will be more compelling when they carry with them the sources of our authority: our research methods and evidence. Recommendations cannot be unqualified, however; there are always caveats.
Here it seems wise to label the tools we recommend in terms of the level of support faculty and students can expect. Three levels seem crucial.
7. Piloting is a kind of testing and evaluation based on strategic criteria.
A pilot is one kind of testing. But piloting typically only happens when a decision is to be made about the relative value of buying or supporting a specific tool for a specific purpose. When we decide to recommend and support a tool, even given all the proper caveats, such a recommendation comes with a cost: even if no money is spent, time and therefore money are used up. When such expenditures, whether in labor or dollars, rise above a certain threshold, a special kind of evaluation is needed. The criteria of such an evaluation are largely strategic:
The hypothesis involved has a characteristic form: "Tool T meets our standards for recommendation and support." The form is trivial; the devil is in the standards.
One possible set of standards follows; bigger and smaller schools will have different values when counting impact.
| Dimension | Depth of Impact | Breadth of Impact | Level of Innovation | Alignment with Goals | Attention Likely |
| --- | --- | --- | --- | --- | --- |
| Strong | Impacts over 1,000 students in a single academic year. | Benefits an entire school or program. | Represents a quantum leap for us and puts us at the top of our peer institutions. | Aligns with at least two strategic goals at three different levels: the profession, the university, our unit, the relevant schools and departments. | Likely to attract positive attention. |
| Moderate | Impacts over 100 and under 1,000 students in a single academic year. | Benefits several professors or a department. | Represents an incremental advance or brings us up to the level of our peers. | Aligns with at least two strategic goals at two different levels: the profession, the university, our unit, the relevant schools and departments. | Likely to attract mixed attention. |
| Weak | Impacts under 100 students in a single academic year. | Benefits one professor. | Represents the status quo for our institution. | Does not clearly align with any of the relevant strategic goals: the profession, the university, our unit, the relevant schools and departments. | Likely to attract negative attention. |
To what extent breadth of impact, say, outweighs potential negative attention is something to decide in practice. Having a clear set of standards absolves no one from making judgment calls. After enough evaluations have been completed using an instrument like this one, it should be possible to set the borderlines more clearly and even to weight the factors so that, for instance, insufficient documentation is a deal-breaker and possible negative attention is merely a nuisance — or vice-versa.
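To make the rubric concrete, here is a minimal Python sketch of a weighted scoring function. The level scores and the weights are purely illustrative assumptions; as noted above, the actual weighting is something to settle in local practice after enough evaluations.

```python
# Sketch of the evaluation rubric: Strong=3, Moderate=2, Weak=1.
RUBRIC_LEVELS = {"strong": 3, "moderate": 2, "weak": 1}

# Hypothetical weights -- the article deliberately leaves weighting open.
WEIGHTS = {
    "depth_of_impact": 1.0,
    "breadth_of_impact": 1.0,
    "level_of_innovation": 1.0,
    "alignment_with_goals": 1.5,
    "attention_likely": 0.5,
}

def score_tool(ratings: dict) -> float:
    """Weighted rubric score; ratings map dimension name -> level name."""
    return sum(WEIGHTS[dim] * RUBRIC_LEVELS[level.lower()]
               for dim, level in ratings.items())

# One hypothetical tool evaluation:
example = {
    "depth_of_impact": "moderate",
    "breadth_of_impact": "strong",
    "level_of_innovation": "weak",
    "alignment_with_goals": "moderate",
    "attention_likely": "moderate",
}
print(score_tool(example))  # 10.0 under these illustrative weights
```

A real instrument would also encode deal-breakers (a weight that vetoes rather than averages), which a plain weighted sum cannot express.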
The verification of these hypotheses can only come through practice. To the extent that any part of them cannot be verified, that part needs to be thrown away and the hypothesis adjusted accordingly. The preceding is not just a framework and hypotheses: as research, it's a call to a community to share and discuss results, evidence, and methods. Although our work is always local, case-based reasoning will suggest analogies even for those whose work seems on the surface far-flung. Research is future-oriented and remains forever open. I offer this framework in that spirit.
My thanks go to my supervisor at Yale University, Edward Kairiss. He asked me to reflect on what a “pilot” was, and when I found that I needed to step back and get a wider view, not only did he not balk, he encouraged me. What is written here would not exist without his encouragement and support. Additional thanks go to David Hirsch, whose organizational work at Yale provided a model of what the reflective practitioner can accomplish.
© 2015 Edward R. O'Neill. The text of this EDUCAUSE Review article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 license.
A research study starts with a question, which researchers worldwide turn into research hypotheses. The effectiveness of a study depends on a well-developed hypothesis, and examples of research hypotheses can guide researchers in writing effective ones.
In this blog, we’ll learn what a research hypothesis is, why it’s important in research, and the different types used in science. We’ll also guide you through creating your research hypothesis and discussing ways to test and evaluate it.
A hypothesis is an idea you put forward so that it can be checked. A research hypothesis is a statement that poses a question and predicts what might happen.
It’s really important in the scientific method and is used in experiments to figure things out. Essentially, it’s an educated guess about how things are connected in the research.
A research hypothesis usually identifies the independent variable (the thing being changed or studied) and the dependent variable (the result being measured or observed). It helps plan how to gather and analyze data to see whether there is evidence to support or refute the expected connection between these variables.
Hypotheses are really important in research. They help design studies, allow for practical testing, and add to our scientific knowledge. Their main role is to organize research projects, making them purposeful, focused, and valuable to the scientific community. Let’s look at some key reasons why they matter:
A hypothesis plays a pivotal role in the scientific method by providing a basis for testing existing theories. For example, a hypothesis might test the predictive power of a psychological theory on human behavior.
It serves as a launching pad for investigation activities, which offers researchers a clear starting point. A research hypothesis can explore the relationship between exercise and stress reduction.
A well-formulated hypothesis guides the entire research process. It ensures that the study remains focused and purposeful. For instance, a hypothesis about the impact of social media on interpersonal relationships provides clear guidance for a study.
In some cases, a hypothesis can suggest new theories or modifications to existing ones. For example, a hypothesis testing the effectiveness of a new drug might prompt a reconsideration of current medical theories.
A hypothesis clarifies the data requirements for a study, ensuring that researchers collect the necessary information. For example, a hypothesis might call for demographic data in order to analyze the influence of age on a particular phenomenon.
Hypotheses are instrumental in explaining complex social phenomena. For instance, a hypothesis might explore the relationship between economic factors and crime rates in a given community.
Hypotheses establish clear relationships between phenomena, paving the way for empirical testing. An example could be a hypothesis exploring the correlation between sleep patterns and academic performance.
A hypothesis guides researchers in selecting the most appropriate analysis techniques for their data. For example, a hypothesis focusing on the effectiveness of a teaching method may lead to the choice of statistical analyses best suited for educational research.
A hypothesis is a specific idea that you can test in a study. It often comes from looking at past research and theories. A good hypothesis usually starts with a research question that you can explore through background research. For it to be effective, consider these key characteristics:
When you use these characteristics as a checklist, it can help you create a good research hypothesis. It’ll guide improving and strengthening the hypothesis, identifying any weaknesses, and making necessary changes. Crafting a hypothesis with these features helps you conduct a thorough and insightful research study.
The research hypothesis comes in various types, each serving a specific purpose in guiding the scientific investigation. Knowing the differences will make it easier for you to create your own hypothesis. Here’s an overview of the common types:
The null hypothesis states that there is no connection between the two variables under consideration, or that two groups are unrelated. As discussed earlier, a hypothesis is an unproven assumption lacking sufficient supporting data. It is the statement researchers aim to disprove; it is testable, verifiable, and can be rejected.
For example, if you’re studying the relationship between Project A and Project B, assuming both projects are of equal standard is your null hypothesis. It needs to be specific for your study.
The alternative hypothesis is basically another option to the null hypothesis. It involves looking for a significant change or alternative that could lead you to reject the null hypothesis. It’s a different idea compared to the null hypothesis.
When you state a null hypothesis, you are asserting that there is no effect or no relationship between the variables. The alternative hypothesis asserts the opposite.

For instance, if your null hypothesis is "the coin is fair," the alternative hypothesis is "the coin is not fair."
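The null-versus-alternative logic can be sketched with an exact binomial test on a hypothetical coin-flip experiment, using only Python's standard library (the flip counts are invented for illustration):

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test: probability, under H0 that the
    success rate is p0, of any outcome at least as unlikely as k/n."""
    probs = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    return sum(pr for pr in probs if pr <= probs[k] + 1e-12)

# H0 (null): the coin is fair. H1 (alternative): it is not.
p_value = binom_two_sided_p(58, 100)   # observed: 58 heads in 100 flips
print(round(p_value, 3))               # about 0.13: fail to reject H0
```

Even a run of 58 heads is not strong enough evidence against a fair coin at the conventional 0.05 level, which is exactly the asymmetry between the two hypotheses: the null stands until the data are surprising enough.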
The directional hypothesis predicts the direction of the relationship between independent and dependent variables. They specify whether the effect will be positive or negative.
For example: if you increase your study hours, your exam scores will increase. This hypothesis suggests that as the independent variable (study hours) increases, the dependent variable (exam scores) increases as well.
The non-directional hypothesis predicts the existence of a relationship between variables but does not specify the direction of the effect. It suggests that there will be a significant difference or relationship, but it does not predict the nature of that difference.
For example: there will be a notable difference in test scores between students who receive the educational intervention and those who do not, but the hypothesis does not predict which group will score higher.
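The directional/non-directional distinction maps onto one-tailed versus two-tailed tests. A minimal sketch with hypothetical summary statistics (all numbers invented), using Python's standard library:

```python
from statistics import NormalDist

# Hypothetical summary data: intervention vs control exam scores.
mean_t, sd_t, n_t = 78.0, 10.0, 100   # treatment group
mean_c, sd_c, n_c = 74.0, 10.0, 100   # control group

se = (sd_t**2 / n_t + sd_c**2 / n_c) ** 0.5   # standard error of the difference
z = (mean_t - mean_c) / se

# Directional hypothesis -> one-tailed test (predicts treatment scores HIGHER).
p_one = 1 - NormalDist().cdf(z)
# Non-directional hypothesis -> two-tailed test (predicts only a DIFFERENCE).
p_two = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p_one, 4), round(p_two, 4))
```

The two-tailed p-value is twice the one-tailed one here, which is why a directional hypothesis should be stated before seeing the data, not after.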
A simple hypothesis predicts a relationship between one dependent variable and one independent variable without specifying the nature of that relationship. It’s simple and usually used when we don’t know much about how the two things are connected.
For example, if you adopt effective study habits, you will achieve higher exam scores than those with poor study habits.
A complex hypothesis is an idea that specifies a relationship between multiple independent and dependent variables. It is a more detailed idea than a simple hypothesis.
While a simple hypothesis suggests a straightforward cause-and-effect relationship between two things, a complex hypothesis involves many factors and how they are connected to each other.
For example, when you increase your study time, you tend to achieve higher exam scores. The connection between your study time and exam performance is affected by various factors, including the quality of your sleep, your motivation levels, and the effectiveness of your study techniques.
If you sleep well, stay highly motivated, and use effective study strategies, you may observe a more robust positive correlation between the time you spend studying and your exam scores, unlike those who may lack these factors.
An associative hypothesis proposes a connection between two things without saying that one causes the other. Basically, it suggests that when one thing changes, the other changes too, but it doesn’t claim that one thing is causing the change in the other.
For example, you will likely notice higher exam scores when you increase your study time. You can recognize an association between your study time and exam scores in this scenario.
Your hypothesis acknowledges a relationship between the two variables—your study time and exam scores—without asserting that increased study time directly causes higher exam scores. You need to consider that other factors, like motivation or learning style, could affect the observed association.
A causal hypothesis proposes a cause-and-effect relationship between two variables. It suggests that changes in one variable directly cause changes in another variable.
For example, when you increase your study time, you experience higher exam scores. This hypothesis suggests a direct cause-and-effect relationship, indicating that the more time you spend studying, the higher your exam scores. It assumes that changes in your study time directly influence changes in your exam performance.
An empirical hypothesis is a statement based on things we can see and measure. It comes from direct observation or experiments and can be tested with real-world evidence. When an experiment bears the hypothesis out, the evidence shows it is more than a wild guess, which makes the statement more reliable.
For example, if you increase the dosage of a certain medication, you might observe a quicker recovery time for patients. Imagine you’re in charge of a clinical trial. In this trial, patients are given varying dosages of the medication, and you measure and compare their recovery times. This allows you to directly see the effects of different dosages on how fast patients recover.
This way, you can create a research hypothesis: “Increasing the dosage of a certain medication will lead to a faster recovery time for patients.”
A statistical hypothesis is a statement or assumption about a population parameter that is the subject of an investigation. It serves as the basis for statistical analysis and testing. It is often tested using statistical methods to draw inferences about the larger population.
In a hypothesis test, statistical evidence is collected to either reject the null hypothesis in favor of the alternative hypothesis or fail to reject the null hypothesis due to insufficient evidence.
For example, let’s say you’re testing a new medicine. Your hypothesis could be that the medicine doesn’t really help patients get better. So, you collect data and use statistics to see if your guess is right or if the medicine actually makes a difference.
If the data strongly shows that the medicine does help, you say your guess was wrong, and the medicine does make a difference. But if the proof isn’t strong enough, you can stick with your original guess because you didn’t get enough evidence to change your mind.
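The medicine example can be sketched as a two-sample test on hypothetical summary data. This uses a large-sample z-test as a simplification of the usual t-test, and every number below is invented for illustration:

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional significance level

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided z-test for a difference in means (large samples assumed)."""
    se = (sd1**2 / n1 + sd2**2 / n2) ** 0.5
    z = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical trial: mean recovery time in days (lower is better).
z, p = two_sample_z(8.1, 2.0, 200,    # higher-dose group
                    9.0, 2.1, 200)    # standard-dose group
decision = "reject H0" if p < ALPHA else "fail to reject H0"
print(decision)
```

With these (invented) numbers the evidence is strong enough to reject the null hypothesis that the medicine makes no difference; with weaker data the same code would print "fail to reject H0," matching the "stick with your original guess" case above.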
Step 1: Identify your research problem or topic.

Define the area of interest or the problem you want to investigate. Make sure it's clear and well-defined.

Start by asking a question about your chosen topic. Consider the limitations of your research and create a straightforward problem related to your topic. Once you've done that, you can develop and test a hypothesis with evidence.

Step 2: Conduct a literature review.

Review existing literature related to your research problem. This will help you understand the current state of knowledge in the field, identify gaps, and build a foundation for your hypothesis.

Step 3: Formulate your research question.

Based on your literature review, create a specific and concise research question that addresses your identified problem. Your research question should be clear, focused, and relevant to your field of study.

Step 4: Identify your variables.

Determine the key variables involved in your research question. Variables are the factors or phenomena that you will study and manipulate to test your hypothesis.

Step 5: State your null hypothesis.

The null hypothesis is a statement that there is no significant difference or effect. It serves as a baseline for comparison with the alternative hypothesis.

Step 6: Select your research methods.

Choose research methods that align with your study objectives, such as experiments, surveys, or observational studies. The selected methods enable you to test your research hypothesis effectively.
Creating a research hypothesis usually takes more than one try. Expect to make changes as you collect data. It’s normal to test and say no to a few hypotheses before you find the right answer to your research question.
Testing hypotheses is a really important part of research. It’s like the practical side of things. Here, real-world evidence will help you determine how different things are connected. Let’s explore the main steps in hypothesis testing:
Before testing, clearly articulate your research hypothesis. This involves framing both a null hypothesis, suggesting no significant effect or relationship, and an alternative hypothesis, proposing the expected outcome.
Plan how you will gather information in a way that fits your study. Make sure your data collection method matches the things you’re studying.
Whether through surveys, observations, or experiments, this step demands precision and adherence to the established methodology. The quality of data collected directly influences the credibility of study outcomes.
Choose a statistical test that aligns with the nature of your data and the hypotheses being tested. Whether it’s a t-test, chi-square test, ANOVA, or regression analysis, selecting the right statistical tool is paramount for accurate and reliable results.
Following the statistical analysis, evaluate the results in the context of your null hypothesis. You need to decide if you should reject your null hypothesis or not.
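Putting the steps together, here is a sketch of one such analysis: a Pearson chi-square test on a hypothetical 2×2 table of counts. The data are invented, and the p-value uses the fact that a chi-square statistic with one degree of freedom is a squared standard normal:

```python
from statistics import NormalDist

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the 2x2 table
    [[a, b], [c, d]]. With df = 1, chi2 is a squared standard normal,
    so the p-value can be taken from the normal distribution."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p

# Hypothetical counts: pass/fail by whether students got the intervention.
#              passed  failed
# intervention   45      15
# control        30      30
chi2, p = chi2_2x2(45, 15, 30, 30)
print(round(chi2, 2), round(p, 4))  # chi2 = 8.0; p < 0.01, so reject H0
```

The test choice follows the data: counts in categories call for chi-square, whereas the earlier means-based examples call for t- or z-tests.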
When discussing what you found in your research, be clear and organized. Say whether your idea was supported or not, and talk about what your results mean. Also, mention any limits to your study and suggest ideas for future research.
QuestionPro is a survey and research platform that provides tools for creating, distributing, and analyzing surveys. It plays a useful role in the research process, especially in the initial stages of hypothesis development.
A research hypothesis is like a guide for researchers in science: a well-thought-out idea that can be rigorously tested. It is crucial across fields such as medicine, the social sciences, and the natural sciences. The research hypothesis links theories to real-world evidence and gives researchers a clear path to explore and make discoveries.
QuestionPro Research Suite is a helpful tool for researchers. It makes creating surveys, collecting data, and analyzing information easy. It supports all kinds of research, from exploring new ideas to forming hypotheses. With a focus on using data, it helps researchers do their best work.
Are you interested in learning more about QuestionPro Research Suite? Take advantage of QuestionPro’s free trial to get an initial look at its capabilities and realize the full potential of your research efforts.
Nature of technology; Technology; Technological innovation; Technological evolution; Technological change; Technological progress; Technological advances
Hypothesis refers to a supposition put forward in a provisional manner and in need of further epistemic and empirical support. Technology analysis explains the relationships underlying the source, evolution, and diffusion of technology for technological, economic and social change. Technology analysis considers technology as a complex system that evolves with incremental and radical innovations to satisfy needs, achieve goals, and/or solve problems of adopters to take advantage of important opportunities or to cope with consequential environmental threats.
Technology plays an important role in the competitive advantage of firms and nations, and in industrial and economic change in society (Arthur 2009; Hosler 1994; Sahal 1981). Technology can be defined as a complex system, composed of more than one entity or...
Ahmad S (1966) On the theory of induced innovation. Econ J 76:344–57
Arthur WB (1989) Competing technologies, increasing returns, and lock-in by historical events. Econ J 99:116–131
Arthur WB (1994) Increasing returns and path dependence in the economy. University of Michigan Press, Ann Arbor
Arthur BW (2009) The nature of technology. What it is and how it evolves. Allen Lane–Penguin Books, London
Binswanger HP (1974) A cost function approach to the measurement of elasticities of factor demand and elasticities of substitution. Am J Agric Econ 56:377–386
Binswanger HP, Ruttan VW (1978) Induced innovation: technology, institutions and development. Johns Hopkins University Press, Baltimore
Chamberlin TC (1897) The method of multiple working hypotheses. J Geol 5(8):837–848
Coccia M (2005a) Metrics to measure the technology transfer absorption: analysis of the relationship between institutes and adopters in northern Italy. Int J Technol Transf Commer 4(4):462–486. https://doi.org/10.1504/IJTTC.2005.006699
Coccia M (2005b) Measuring intensity of technological change: the seismic approach. Technol Forecast Soc Chang 72(2):117–144. https://doi.org/10.1016/j.techfore.2004.01.004
Coccia M (2005c) A taxonomy of public research bodies: a systemic approach. Prometheus 23(1):63–82. https://doi.org/10.1080/0810902042000331322
Coccia M (2006a) Analysis and classification of public research institutes, world review of science. Technol Sustain Dev 3(1):1–16. https://doi.org/10.1504/WRSTSD.2006.008759
Coccia M (2006b) Classifications of innovations: survey and future directions. Working Paper Ceris del Consiglio Nazionale delle Ricerche, Ceris-Cnr Working Paper, vol 8, no 2 – ISSN (Print): 1591-0709. Available at arXiv Open access e-prints: http://arxiv.org/abs/1705.08955
Coccia M (2010) Democratization is the driving force for technological and economic change. Technol Forecast Soc Chang 77(2):248–264. https://doi.org/10.1016/j.techfore.2009.06.007
Coccia M (2014a) Driving forces of technological change: the relation between population growth and technological innovation-analysis of the optimal interaction across countries. Technol Forecast Soc Chang 82(2):52–65. https://doi.org/10.1016/j.techfore.2013.06.001
Coccia M (2014b) Path-breaking target therapies for lung cancer and a far-sighted health policy to support clinical and cost effectiveness. Health Policy Technol 1(3):74–82. https://doi.org/10.1016/j.hlpt.2013.09.007
Coccia M (2014c) Steel market and global trends of leading geo-economic players. Int J Trade Global Markets 7(1):36–52. http://doi.org/10.1504/IJTGM.2014.058714
Coccia M (2015a) General sources of general purpose technologies in complex societies: theory of global leadership-driven innovation, warfare and human development. Technol Soc 42:199–226. https://doi.org/10.1016/j.techsoc.2015.05.008
Coccia M (2015b) Technological paradigms and trajectories as determinants of the R&D corporate change in drug discovery industry. Int J Knowl Learn 10(1):29–43. https://doi.org/10.1504/IJKL.2015.071052
Coccia M (2015c) Spatial relation between geo-climate zones and technological outputs to explain the evolution of technology. Int J Transit Innovat Sys 4(1–2):5–21. http://doi.org/10.1504/IJTIS.2015.074642
Coccia M (2016a) Radical innovations as drivers of breakthroughs: characteristics and properties of the management of technology leading to superior organizational performance in the discovery process of R&D labs. Tech Anal Strat Manag 28(4):381–395. https://doi.org/10.1080/09537325.2015.1095287
Coccia M (2016b) The relation between price setting in markets and asymmetries of systems of measurement of goods. J Econ Asymmetr 14(Part B):168–178. https://doi.org/10.1016/j.jeca.2016.06.001
Coccia M (2016c) Problem-driven innovations in drug discovery: co-evolution of the patterns of radical innovation with the evolution of problems. Health Policy Technol 5(2):143–155. https://doi.org/10.1016/j.hlpt.2016.02.003
Coccia M (2017a) The fishbone diagram to identify, systematize and analyze the sources of general purpose technologies. J Adm Soc Sci 4(4):291–303. https://doi.org/10.1453/jsas.v4i4.1518
Coccia M (2017b) The source and nature of general purpose technologies for supporting next K-waves: global leadership and the case study of the U.S. Navy’s Mobile User Objective System. Technol Forecast Soc Chang 116:331–339. https://doi.org/10.1016/j.techfore.2016.05.019
Coccia M (2017c) Varieties of capitalism’s theory of innovation and a conceptual integration with leadership-oriented executives: the relation between typologies of executive, technological and socioeconomic performances. Int J Pub Se Perform Manage 3(2):148–168. https://doi.org/10.1504/IJPSPM.2017.084672
Coccia M (2017d) Disruptive firms and industrial change. J Econ Soc Thought 4(4):437–450. http://doi.org/10.1453/jest.v4i4.1511
Coccia M (2017e) Sources of disruptive technologies for industrial change. L’industria–rivista di economia e politica industriale 38(1):97–120
Coccia M (2018a) The origins of the economics of Innovation. J Econ Soc Thought 5(1):9–28, http://doi.org/10.1453/jest.v5i1.1574
Coccia M (2018b) Theorem of not independence of any technological innovation. J Econ Bibliogr 5(1):29–35. https://doi.org/10.1453/jeb.v5i1.1578
Coccia M (2019a) Comparative theories of the evolution of technology. In: Farazmand A (ed) Global Encyclopedia of public administration, public policy, and governance. Springer, Cham. https://doi.org/10.1007/978-3-319-31816-5_3841-1
Coccia M (2019b) The theory of technological parasitism for the measurement of the evolution of technology and technological forecasting, Technol Forecast Soc Chang. https://doi.org/10.1016/j.techfore.2018.12.012
Coccia M (2019c) Killer technologies: the destructive creation in the technical change. ArXiv.org e-Print archive, Cornell University, USA. Permanent arXiv available at http://arxiv.org/abs/1907.12406
Coccia M (2019d) A theory of classification and evolution of technologies within a generalized Darwinism. Tech Anal Strat Manag 31(5):517–531. https://doi.org/10.1080/09537325.2018.1523385
Coccia M (2020) Deep learning technology for improving cancer care in society: new directions in cancer imaging driven by artificial intelligence. Technol Soc 60:1–11. https://doi.org/10.1016/j.techsoc.2019.101198
Coccia M, Benati I (2018) Comparative models of inquiry. In: Farazmand A (ed) Global encyclopedia of public administration, public policy, and governance. Springer International Publishing AG, part of Springer Nature. https://doi.org/10.1007/978-3-319-31816-5_1199-1
Coccia M, Wang L (2015) Path-breaking directions of nanotechnology-based chemotherapy and molecular cancer therapy. Technol Forecast Soc Chang 94:155–169. https://doi.org/10.1016/j.techfore.2014.09.007
Coccia M, Watts J (2020) A theory of the evolution of technology: technological parasitism and the implications for innovation management. J Eng Technol Manage 55:101552. https://doi.org/10.1016/j.jengtecman.2019.11.003
David PA (1985) Clio and the economics of QWERTY. Am Econ Rev 76:332–337
Research methods for education with technology: four concerns, examples, and recommendations.
The success of education with technology research stems in part from the field drawing upon theories and methods from multiple disciplines. However, drawing upon multiple disciplines has drawbacks: the methodological expertise of a discipline is not always applied when researchers conduct studies outside their research training. The focus here is on research using methods drawn largely from psychology, for example, evaluating the impact of different systems on how students perform. The methodological concerns discussed are: low power; not using multilevel modeling; dichotomization; and inaccurate reporting of numeric statistics. Examples are drawn from a recent set of proceedings. Recommendations, which are applicable throughout the social sciences, are made for each of these concerns.
Spending on EdTech is around 19 billion dollars per year ( Koba, 2015 ). Research on using computer technology within education began soon after microcomputers began appearing in universities (e.g., Suppes, 1966 ). Given the amount of accumulated wisdom in the field and the amount of investment, it is a concern that the current EdTech landscape has been likened to the Wild West ( Reingold, 2015 ), with schools buying systems without convincing evidence of their efficacy. There are many issues that researchers in the field can address to better serve schools ( Wright, 2018 ). One issue is what to call the field. I have been using the phrase Education with Technology (EwT) for research on education and educational outcomes when using technology. I use EdTech to refer to the technology companies that sell technology aimed specifically at the education market.
There is some excellent research examining the effectiveness of technology for learning. For example, decades of high-quality research by Anderson and colleagues (e.g., Anderson et al., 1985 ; Ritter et al., 2007 ) on the Cognitive Tutor has shown the successful application of cognitive science to education software (see https://www.carnegielearning.com/ ). Two important aspects of this success story are: (1) the applications developed alongside the theory (ACT-R) that Anderson had developed for cognition, and (2) the successful application to the classroom took decades of rigorous research. The focus of this paper is to improve the quality of existing research in order to allow more progress to occur.
Four concerns about research methods were chosen. These were picked both because examples of each were found and because recommendations exist for improvement that can be easily accommodated. Many other topics, covering different design and analytic methods (e.g., robust methods, visualizations), could also have been included, but four seems a good number so that each receives sufficient attention. The four concerns are:
1. Power analysis (the sample size can be too low to have an adequate likelihood of producing meaningful results);
2. Multilevel modeling (the units are not independent, which is assumed for traditional statistical tests, and this usually means that the p -values are lower than they should be);
3. Dichotomizing (continuous variables are turned into dichotomous variables at arbitrary points, like the median, thereby losing information);
4. Inaccurate statistical reporting (sometimes because of typos, sometimes because of reading the wrong output, the reported statistics are incorrect).
The field of EwT was chosen for three reasons. First, it offers valuable potential for education, though the impact has failed to live up to the potential/hype (see Cuban, 2001 ; Reingold, 2015 ). There are several possible reasons for this (e.g., Wright, 2018 ), one of which is that the methods and statistical procedures used in empirical studies leave room for improvement. Second, it is an area in which I have been working. Third, as a multidisciplinary field, different researchers bring different expertise. It may be that a research team does not have someone trained in psychology and social science research methods (e.g., Breakwell et al., 2020 ). As someone who is trained in these procedures, I hope to bring my skills to this field.
Some examples will be used both to show that these issues arise and to illustrate the problems. It is important to stress that in any field it is possible to find illustrations of different concerns. Papers from the 2017 Artificial Intelligence in Education (AIED) conference in Wuhan, China, were examined. This conference is a showcase for mostly academic researchers developing and evaluating new procedures and technologies. The papers are published in five categories: papers, posters, doctoral, industry, and tutorials. Only the papers and posters are examined here: the doctoral papers often sought advice on how to conduct the planned research; the industry papers often described a product or were a case study using a product; and the tutorials gave accounts of what their audiences would learn.
According to their website, only 36 of the 121 papers submitted for oral presentations were accepted as oral presentations. Thirty-seven of these were accepted as posters (and 7 of 17 papers submitted for posters were accepted). Of the 138 total submissions, 80 were accepted as a paper or a poster (58% acceptance rate). There were 36 papers and 37 posters in the proceedings, so not all accepted posters appeared in the proceedings. The main difference between oral presentations and posters for the proceedings is that authors of oral presentations were allowed 12 pages of text and authors of posters were allowed only four pages. In many cases it was difficult to know what methods and statistical techniques were used, particularly for the posters, presumably because the length restrictions forced the authors to make difficult choices about what to include.
Reflecting the multidisciplinarity of the field, the papers differed in their approaches. Some papers were primarily focused on statistical procedures to classify student responses and behaviors. Others were demonstrations of software. The focus here is on research that used methods common to what Cronbach (1957) called the experimental and correlational psychologies. Of the 63 full papers and posters, 43 (68%) involved collecting new data from participants/students not simply to show the software could be used. Some of these were described as “user studies” and some as “pilot studies.” It is important to stress that while examples will be shown to illustrate concerns, some aspects of these studies were good and overall the conference papers are high-quality. For example, those evaluating the effectiveness of an intervention tended to use pre- and post-intervention measures and compare those in the intervention condition with a control condition.
The methods—both the design of the study and the statistical procedures—were examined for concerns that a reviewer might raise. Four concerns are discussed here and recommendations are made. These were chosen both by how much they may affect the conclusions and how easily they can be addressed. While these comments are critical, the purpose of the paper is to be constructive for the field. Only a couple of examples are shown for each concern. These were picked because of how well they illustrate the concern. Before doing this, some background on hypothesis testing is worth providing. Some statistical knowledge about this procedure is assumed in this discussion. At the end of each section specific readings are recommended.
Education with Technology research is not done in isolation. While the theme of this paper is to look at how EwT researchers deal with some issues, there is a crisis within the sciences more broadly that requires discussion. The crisis is due to the realization that a substantial proportion (perhaps most) of the published research does not replicate (Ioannidis, 2005; Open Science Collaboration, 2015). This occurs even in the top scientific journals (Camerer et al., 2018). This has led to many suggestions for changing how science is done (e.g., Munafò et al., 2017); for discussion see the papers in Lilienfeld and Waldman (2017) and a recent report by Randall and Welser (2018). Unfortunately, the traditional methods that have been shown to produce results less likely to replicate are also the ones that can make researchers' CVs look better (Smaldino and McElreath, 2016).
One aspect that many are critical of is the use and often mis-use of hypothesis testing. It is worth briefly describing what this is. In broad terms, a scientist has a set of data, assumes some model H for the data, and calculates the distribution of different characteristics for plausible samples assuming this model is true. Suppose some characteristics of the observed data are far away from the distribution of plausible samples. This would be very rare if your assumed model were correct. “It follows that if the hypothesis H be true, what we actually observed would be a miracle. We don't believe in miracles nowadays and therefore we do not believe in H being true” ( Neyman, 1952 , p. 43). There are some problems with this approach. If we only react when the data would require a miracle to have occurred if H is true, scientific findings would accumulate too slowly. Instead, for most research situations a threshold below miracle is needed to allow evidence to accumulate, but then it is necessary to accept that sometimes errors occur because of this lower threshold. Neyman ( 1952 , p. 55) called this “an error of the first kind ” ( emphasis in original). What is important here is that the possibility of error is not only recognized, but quantified.
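This quantified error rate can be seen directly by simulation. The following is a minimal sketch (not from the paper) using base R: when the null hypothesis is true, about 5% of tests at the conventional threshold will nonetheless come out "significant."

```r
set.seed(42)

# Two groups drawn from the same distribution, so the null is true.
# Run many t-tests and see how often p < .05: the Type 1 error rate.
pvals <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(pvals < 0.05)   # close to the nominal 0.05
```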
Hypothesis testing is usually done by testing what is called the null hypothesis, often denoted H 0 . This is usually a point hypothesis stating that there is no effect of the independent variable, no difference between groups, or no association. As a single point, it can never be exactly true. This creates a conceptual problem: the procedure assumes a hypothesis that is always false ( Cohen, 1994 ).
The conditional probability of observing data at least this extreme, given that the null hypothesis is true, is usually called the p -value or sometimes just p . Calculating the p -value for different problems can be complex. Traditionally most researchers have accepted a 5% chance of making a Type 1 error when the null hypothesis is true. This is called the α (alpha) level, and if the observed conditional probability is less than this, researchers have adopted the unfortunate tradition of saying the result is "significant." Unfortunate because finding p < 5% does not mean the effect is "significant" in the everyday English sense of the word. If comparing the scores for two groups of students, finding a "significant" effect in a sample only provides information that the direction of the true effect in the population is likely the same as observed in the sample. Recently there has been a move to use different α levels. In some branches of physics it is set much lower for discoveries (see Lyons, 2013 ) because the cost of falsely announcing a discovery is so high that it is worth claiming one only when the data would have had to arise by almost a "miracle" if the null hypothesis were true. Some social scientists think it is appropriate to have a threshold lower than 5%, though not this low ( Benjamin et al., 2018 ), but others have pointed out problems with this proposal (e.g., Amrhein and Greenland, 2018 ; McShane et al., 2019 ). For current purposes 5% will be assumed because it remains the most used threshold.
There are other problems with the hypothesis testing approach and scientific practices in general. Alternatives have been put forward (e.g., more visualization, pre-registering research, Bayesian models), but each alternative has limitations and can be mis-used. The remainder of this paper will not address these broader issues.
The report by the Open Science Collaboration (2015) , while focusing on psychology research, discusses topics relevant to those applicable to the EwT studies considered. Cohen (1994) presents a good discussion of what null hypothesis significance testing is and is not.
The hypothesis testing framework explicitly recognizes the possibility of erroneously rejecting the null hypothesis. This has been the focus of much discussion because it can lead to publications that are accepted in journals but do not replicate. Another problem is when research fails to detect an effect when the true effect is large enough to be of interest. This is a problem because it often limits further investigation. This is called a Type 2 error: "failure to reject H 0 when, in fact, it is incorrect, is called the error of the second kind" ( Neyman, 1942 , p. 303). As with Type 1 errors, the probability of a Type 2 error can be quantified. Researchers specify the Minimum Effect that they design their study to Detect (MED). The probability of a Type 2 error is then the conditional probability of failing to find a significant effect given this MED, and it is often denoted with the Greek letter β (beta). The statistical concept of power is 1 − β, and convention is that it should usually be at least 80%. However, if it is relatively inexpensive to recruit participants, or if your PhD/job prospects require that you detect an effect if it is as large as the MED, it would be wise to set your power higher, for example 95% (the default for the popular power package G * Power, Faul et al., 2007 , 2009 ).
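As a sketch of how these quantities interact, base R's power.t.test function can convert an assumed MED into a required sample size. The MED of d = 0.5 below is purely illustrative, not drawn from any of the studies discussed.

```r
# Required sample size per group for a two-sample t-test,
# illustrative MED of d = 0.5 (with sd = 1), alpha = 5%.
n80 <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n
n95 <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.95)$n
ceiling(c(n80, n95))   # roughly 64 and 105 per group
```

Raising the desired power from 80% to 95% increases the required sample size by more than half, which is why the choice of power, like the choice of MED, should be justified.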
Over the past 50 years several surveys of different literatures have shown that many studies have too few participants to be able to detect the effects of interest with a high likelihood (e.g., Sedlmeier and Gigerenzer, 1989 ). The problem of having too few participants exists in many fields. Button et al. (2013) , for example, found that about 30% of the neuroscience studies they examined had power below 11%. This means that these studies had only about a one-in-nine chance of observing a significant effect for an effect size of interest. It is important to reinforce that low power is a problem in many disciplines, not just EwT.
Conventional power analysis allows researchers to calculate a rough guide to how many participants to have in their study to give them a good chance of having meaningful results. Many journals and grant awarding bodies encourage (some require) power analysis to be reported. The specifics of power analysis are tightly associated with hypothesis testing, which is controversial as noted above, but the general notion that the planned sample size should be sufficient to have a high likelihood of yielding meaningful information is undisputed. If researchers stop using hypothesis testing, they will still need something like power analysis in order to plan their studies and to determine a rule for when to stop collecting data.
Tables (e.g., Cohen, 1992 ) and computer packages (e.g., Faul et al., 2007 , 2009 ) are available to estimate the sample size needed to have adequate power for many common designs. Simulation methods can be used for more complex designs not covered by the tables and packages (e.g., Browne et al., 2009 ; Green and MacLeod, 2016 ).
Deciding the minimum effect size for your study to detect (the MED) is difficult. For many education examples a small improvement in student performance, if applied throughout their schooling, can have great consequences. For example, Chetty et al. (2011) estimated that a shift upwards of 1 percentile in test scores during kindergarten is associated with approximately an extra $130 per annum in income when the student is 25–27 years old. Multiplied across a lifetime this becomes a substantial amount. Researchers would like to detect the most minuscule of effects, but that would require enormous samples, and the cost would not be justified in most situations. It is worth contrasting this message with the so-called "two sigma problem." Bloom (1984) discussed how good one-on-one tutoring could improve student performance a large amount: two sigma (two standard deviations), or from the 50th percentile to the 98th percentile. He urged researchers to look for interventions or sets of interventions that could produce shifts of this magnitude. Many in the EwT arena talk about this as a goal, but for product development to progress, research must be able to identify much smaller shifts.
The choice of MED is sometimes influenced by the observed effects from similar studies. If you expect the effect to be X, and use this in your power calculations, then if your power is 80% this means that you have about a 4 in 5 chance of detecting the effect if your estimate of the expected effect is fairly accurate. However, if you are confident that your true effect size is X, then there is no reason for the study. It is usually better to describe the MED in relation to what you want to be able to detect rather than in relation to the expected effect.
To allow shared understanding when people discuss effect sizes, many people adopt Cohen's (1992) descriptions of small, medium, and large effects. While people have argued against using these without considering the research context ( Lipsey et al., 2012 ), given their widespread use they allow people to converse about effect sizes across designs and areas.
Two examples were chosen to show the importance of considering how many participants are likely to complete the study and what the minimum effect size to detect (MED) should be. Overall, across all 43 studies, the sample sizes ranged from fewer than 10 to 100.
Arroyo et al. (2017) compared collaboration and no-collaboration groups. They collected pre- and post-intervention scores and the plan was to compare some measure of improvement between the groups. Originally there were 52 students in the collaboration group and 57 in the no-collaboration group. If they were using G * Power ( Faul et al., 2007 , 2009 ) and wanted a significance level of 5% and power of 80%, it appears the MED that they were trying to detect was d = 0.54. This MED is approximately the value Cohen describes as a medium effect, and it might be reasonable depending on their goals. However, only 47 students completed the post-test. Assuming 24 and 23 of these students were in the two groups, respectively, the power is now only 44%. They were more likely to fail to detect an effect of this size than to detect one.
Another example where the sample size decreased is Al-Shanfari et al.'s (2017) study of self-regulated learning. They compared three groups that varied in the visualizations used within the software (their Table 1). One hundred and ten students were asked to participate. This is approximately the sample size G * Power suggests for a one-way ANOVA with α = 5%, power of 80%, and a MED of f = 0.3, which is between Cohen's medium and large effects. The problem is that some students did not agree to participate and others did not complete the tasks. This left few students: "9 students remained in the baseline group, 9 students in the combined group and 7 in the expandable model group" (p. 20). Assuming the same α and MED, the power is now about 22%. Even if the authors had found a significant effect, with power this low the likelihood is fairly high that the direction of the effect could be wrong ( Gelman and Carlin, 2014 ).
Were these MEDs reasonable? The choice will vary by research project and can be difficult. As noted above, in educational research any manipulation that raises student outcomes even a minute amount, if applied over multiple years of school, can produce large outcomes. Further, a lot of research compares an existing system to one with some slight adaptation, so the expected effect is likely to be small; if the adaptation is shown to have even a slight advantage it may be worth implementing. If Arroyo et al. (2017) and Al-Shanfari et al. (2017) had planned to design their studies to detect what Cohen (1992) calls small effects ( d = 0.2 and f = 0.1), the suggested total sample sizes would have been n = 788 and n = 969. To yield 80% power to detect a 1 percentile shift, which Chetty et al. (2011) noted could be of great value, would require more than 10,000 students in each group.
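These suggested sample sizes can be reproduced in R. The sketch below assumes base R's power.t.test for the two-group case and the contributed pwr package's pwr.anova.test for the one-way ANOVA; neither is the procedure the original authors used.

```r
library(pwr)   # contributed package implementing Cohen-style power analysis

# Two-sample t-test, small effect d = 0.2, 80% power: about 394 per group
n_t <- power.t.test(delta = 0.2, sig.level = 0.05, power = 0.80)$n

# One-way ANOVA, 3 groups, small effect f = 0.1, 80% power: about 323 per group
n_a <- pwr.anova.test(k = 3, f = 0.1, sig.level = 0.05, power = 0.80)$n

c(total_t = 2 * ceiling(n_t), total_anova = 3 * ceiling(n_a))   # 788 and 969
```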
1a. Report how you choose your sample size (as well as other characteristics of your sample). This often means reporting a power analysis. Try to have at least the number of participants suggested by the power analysis and justify the MED you used. The expected drop out rate should be factored into these calculations.
1b. If it is not feasible to get the suggested number of participants,
- Do not just do the study anyway. The power analysis shows that there is a low likelihood of finding meaningful results, so your time and your participants' time could be better spent. And do not simply change the MED to fit your power analysis.
- Use more reliable measurements or a more powerful design (e.g., using covariates can increase power, but be careful, see for example, Meehl, 1970 ; Wright, 2019 ).
- Combine your efforts with other researchers. This is one of Munafò et al.'s (2017) recommendations and they give the example of The Many Lab ( https://osf.io/89vqh/ ). In some areas (e.g., high-energy particle physics) there are often dozens of authors on a paper. The “authors” are often differentiated by listing a few as co-speakers for the paper, and/or having some listed as “contributors” rather than “authors.”
- Change your research question. Often this means focusing your attention on one aspect of a broad topic.
- Apply for a grant that allows a large study to be conducted.
Caveat: Power analyses are not always appropriate. Power analysis is used to suggest a sample size. If you are just trying to show that your software can be used, then you do not need a large sample.
Cohen (1992) provides a brief primer for doing power analysis for many common research designs. Baguley (1994) and Lenth (2001) provide more critical perspectives of how power analysis is used.
Two common situations where multilevel modeling is used in education research are when students are nested within classrooms and when each student has data for several measurements. In the first situation the data for the students are said to be nested within the classrooms, and in the second the measurements are nested within the students. The problem for traditional statistical methods is that data within the same higher-level unit tend to be more similar to each other than to those in other units. The data are not independent, which is an assumption of most traditional statistical procedures. Educational statisticians and educational datasets have been instrumental in the development of ways to analyze data in these situations (e.g., Aitkin et al., 1981 ; Aitkin and Longford, 1986 ; Goldstein, 2011 ). The approach is also popular in other fields, for example within ecology (e.g., Bolker et al., 2009 ), geography (e.g., Jones, 1991 ), medicine (e.g., Goldstein et al., 2002 ), psychology (e.g., Wright, 1998 ), and elsewhere. The statistical models have several names that can convey subtle differences (e.g., mixed models, hierarchical models, random coefficient models). Here the phrase "multilevel models" is used.
Suppose you are interested in predicting reading scores, from hours spent on educational reading software, for 1,000 students in a school district equally divided among 10 schools. Both reading scores and hours spent likely vary among schools. If you ran the traditional regression:

reading_i = β_0 + β_1 hours_i + e_i     (1)
It is assumed that the e_i are independent of each other, but they are not. There are a few approaches to this; the multilevel approach assumes each of the 10 schools has a different intercept centered around a grand intercept, β_0 in Equation (1). The method assumes these intercepts are normally distributed and estimates the mean and standard deviation of this distribution. Letting the schools be indexed by j , the multilevel equation is:

reading_ij = β_0 + u_j + β_1 hours_ij + e_ij     (2)
where u_j denotes school j 's deviation from the grand intercept. Most of the main statistical packages have multilevel procedures.
The R statistics environment ( R Core Team, 2019 ) will be used for this and subsequent examples. It was chosen for its functionality (there are over ten thousand packages written for R) and because it is free, and therefore available to all readers. It can be downloaded from: https://cran.r-project.org/ . Here the package lme4 ( Bates et al., 2015 ) will be used. To fit the model in Equation (2) with a multilevel linear model you enter:
lmer(reading ~ hours + (1|school))
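For readers who want a runnable version, here is a self-contained sketch that simulates data matching the running example and fits the model; the school effects, slope, and residual variance are invented for illustration, not estimates from any real dataset.

```r
library(lme4)

set.seed(1)
school <- factor(rep(1:10, each = 100))    # 1,000 students in 10 schools
u      <- rnorm(10, mean = 0, sd = 5)      # school deviations (the u_j)
hours  <- rpois(1000, lambda = 20)         # hours on the reading software
reading <- 50 + u[school] + 0.8 * hours + rnorm(1000, sd = 10)

m <- lmer(reading ~ hours + (1 | school))
summary(m)   # reports the hours slope and the school-level variance
```

The summary shows both the fixed effect for hours and an estimate of the between-school standard deviation, the two quantities Equation (2) separates.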
The two examples were picked to illustrate the two main ways that education data are often multilevel. The first is when the students are nested within classrooms and this is one of the first applications of multilevel modeling to education data (e.g., Aitkin and Longford, 1986 ). The second is where the students have several measurements. The measurements can be conceptualized as nested within the individual. These are often called repeated measures or longitudinal designs.
The textbook education example for multilevel modeling is where students are nested within classes. Li et al. (2017) used this design with 293 students nested within 18 classrooms. They compared student performance on inquiry and estimation skills using a linear regression. Inference from this model assumes that the data are independent of each other. It may be that the students in the different classrooms behave differently on these skills and that the teachers in these classrooms teach these skills differently; in fact, both are highly likely. Not taking this variation into account is more likely to produce significant results than if appropriate analyses were done. Therefore, readers should be cautious with any reported p -values and the reported precision of any estimates.
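The consequence of ignoring this clustering can be illustrated with a small simulation (invented numbers, not Li et al.'s data): a class-level predictor with no true effect, analyzed with a naive student-level regression, comes out "significant" far more often than the nominal 5%.

```r
set.seed(2)

sim_naive_p <- function() {
  class <- rep(1:18, each = 16)              # 288 students in 18 classes
  u     <- rnorm(18, sd = 1)[class]          # classroom effects
  x     <- rnorm(18)[class]                  # class-level predictor, no true effect
  y     <- u + rnorm(length(class))          # outcome ignores x entirely
  summary(lm(y ~ x))$coefficients["x", 4]    # naive p-value, clustering ignored
}

pvals <- replicate(2000, sim_naive_p())
mean(pvals < 0.05)   # far above the nominal 0.05
```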
Another common application of multilevel modeling is where each student provides multiple data points, as in Price et al.'s (2017) study of why students ask for a hint and how they use hints. Their data set had 68 students requesting 642 hints. Hints are nested within students. Students were also nested within classes, and hints within assignments (and hints were sometimes clustered together), but the focus here is just on hints being nested within students. The authors state that "the number of hints requested by student varied widely" (p. 316), so they were aware that there was student-level variation in hint frequency. There was likely also variation among students in why they requested hints and how they used them. One interest was whether the student did what the hint suggested: a binary variable. A generalized linear multilevel model could be used to predict which students, and in which situations, are likely to follow hints. Instead Price et al. relied mostly on descriptive statistics plus a couple of inferential statistics using hints as the unit of analysis, thereby ignoring the non-independence of their data. Thus, their standard errors and p -values should not be trusted. For example, they examined whether the time spent looking at a hint predicted whether the hint was followed, without considering that this will likely vary by student. Following a hint is a binary variable, and often a logistic regression is used for this. The lme4 package has a function for generalized linear multilevel regressions called glmer . Here is a model that they could have considered.
glmer(followHint ~ time + (1|student), family = "binomial")
While treating time either as a linear predictor of the probability of following a hint, or as linear with the logit of the probability, is probably unwise, a curved relationship (e.g., a b-spline) could be estimated and plotted within the multilevel modeling framework. In R, the splines package provides a function, bs, for b-splines:
glmer(followHint ~ bs(time) + (1|student), family = "binomial")
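As a runnable sketch of this model: the simulation below keeps only the 68 students and 642 hints from the study's description, and every other number (viewing times, effect sizes, student variation) is invented.

```r
library(lme4)
library(splines)   # bs() for the b-spline basis

set.seed(3)
student_id <- sample(1:68, 642, replace = TRUE)   # 642 hints from 68 students
student    <- factor(student_id)
time <- rexp(642, rate = 1/10)                    # seconds viewing the hint
v    <- rnorm(68, sd = 1)                         # student-level variation
p    <- plogis(-0.5 + 0.05 * time + v[student_id])
followHint <- rbinom(642, size = 1, prob = p)

m <- glmer(followHint ~ bs(time) + (1 | student), family = "binomial")
summary(m)   # spline terms for time plus a student-level variance
```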
2a. When the data are clustered in some way so that information about one item in a cluster provides information about others in the cluster, the data are not independent. This is an assumption of traditional statistical tests. The resulting p -values will usually be too low, but sometimes they will be too high, and sometimes the effects will be in the opposite direction. Alternatives should be considered. If the non-independence is ignored there should be justification and readers should be cautious about the uncertainty estimates (including p -values) of the results.
2b. There are alternatives to multilevel modeling. Some latent variable (including item response theory [IRT]) and Bayesian approaches can take into account individual variation and are sometimes nearly equivalent. In some disciplines it is common to estimate separate values for each school or student, what is sometimes called the fixed effect approach. There are arguments against this approach (e.g., Bell and Jones, 2015 ), but sometimes estimation problems with multilevel models mean the fixed effect is preferred ( Wright, 2017 ).
2c. When the data have a multilevel structure, multilevel modeling (or some other way to take into account the non-independence of the data) should be used. There are many resources available at http://www.bristol.ac.uk/cmm/learning/ to learn more about these procedures. Several multilevel packages are reviewed at http://www.bristol.ac.uk/cmm/learning/mmsoftware/ . Many free packages are available in R and these are discussed at: http://bbolker.github.io/mixedmodels-misc/MixedModels.html .
Goldstein (2011) provides detailed mathematical coverage of multilevel modeling. Hox (2010) provides a detailed textbook that is less mathematical. Field and Wright (2011) is an applied introduction.
Numerous authors have criticized researchers for splitting continuous measures into a small number of categories at arbitrary cut-points (e.g., Cohen, 1983 ; MacCallum et al., 2002 ). Sometimes the cut-scores are chosen at particular points for good reasons (e.g., the boiling and freezing points for water, the passing score on a teenager's driving test to predict parent anxiety), but even in these situations some information is lost and these particular breakpoints could be accounted for by allowing discontinuities in the models used for the data.
Consider the following example. Figure 1A shows the proportions of positive ratings for rigor and collaboration for New York City schools in 2014–2015. The two variables are not dichotomized and there is a clear positive relationship. Other aspects of the data are also apparent, like the increased variance for lower values (proportions tend to have larger variance near 50% than at the extremes) and the non-linearity related to 1.0 being the highest possible proportion. Non-linearity is important to examine. Figures 1B–D show that information is lost when dichotomizing either variable. In Figure 1B the x-variable (rigorous) has been dichotomized by splitting the variable at the median, a procedure called a median split. The median is 0.86. Therefore, this procedure treats 0.70 and 0.85 as the same, and 0.87 and 0.99 as the same, but assumes there is some leap in rigor between 0.85 and 0.87. In Figure 1C the y-variable, collaboration, has been dichotomized. Here information about how collaborative a school is (beyond just whether it is in the top 50% of schools) is lost. In Figure 1D both variables have been dichotomized. The researcher might conduct a 2 × 2 χ², but would not be able to detect any additional interesting patterns in the data.
Figure 1 . Data from New York City schools on the relationship between collaborative and rigorous ratings. (A) Shows the original variables. (B–D) Dichotomize the variables thereby presenting less information. A slight random “jitter” has been added to the values of dichotomized variables so that it can be seen when multiple schools have similar values.
The examples were chosen to illustrate two issues with dichotomization. The first was chosen because it uses a common, but much criticized, procedure called a median split. The choice of the example was also based on the authors providing enough data so that samples could be created that are consistent with the dichotomized data but lead to different conclusions if not dichotomized. The second example involves the authors using a complex method to dichotomize the data. This was chosen to stress that using a complex procedure does not prevent the loss of information.
Perez et al. (2017) allocated students either to a guided or to an unguided learning condition, and then focused on those 74 students who performed less well on a pre-test. They transformed the post-test score using a median split (they do not say how values at the median are classified, but here it is assumed the "high" group is at or above the median). Table 1 shows the results. Using a 2 × 2 χ² with Yates' correction the result is χ²(1) = 0.21, p = 0.65, with an odds ratio of 1.37 (the null is 1.00) and a 95% confidence interval from 0.50 to 3.81 (found using the odds.ratio function in the questionr package, Barnier et al., 2017). While the condition variable is a truly dichotomous variable—participants were either in the guided condition or not—the post-test scores vary. Dichotomizing the variable loses information about how much above or below the median the scores were.
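An odds ratio and its Wald confidence interval can be computed directly from any 2 × 2 table of counts. A minimal sketch follows (the counts are hypothetical, not the Table 1 values, and R's odds.ratio function may use a different interval method):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Wald 95% CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(20, 17, 18, 19)  # hypothetical cell counts
```

A confidence interval that spans 1.00, as in the Perez et al. result, is consistent with no association in the dichotomized table.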
Table 1 . The cross-tabulation table for the Perez et al. (2017) data.
It is likely that Perez et al. (2017) were interested in whether their manipulation affected post-test scores. If they had analyzed their data taking into account the information lost by dichotomizing, they might have detected a statistically significant difference. Suppose their post-test scores were based on responses to 10 items. The data in Sample 1 in Table 2 are consistent with the dichotomized values in Table 1. Perez et al. might have conducted a Wilcoxon rank sum test, calculated here using the defaults of R's wilcox.test function. A t-test leads to similar results, but readers might question the distributional assumptions of the t-test for these data. The result for Sample 1 is W = 472.5 with a p-value of 0.02, with the guided condition performing better. The researchers could have concluded an advantage for this approach.
Table 2 . Possible datasets for the data in Table 1 .
However, Sample 2 of Table 2 is also consistent with the dichotomized values. It has W = 893.5, p = 0.02, but this finding is in the opposite direction, with the guided condition doing worse. Perez et al.'s (2017) data might be like Sample 1, Sample 2, or neither of these.
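The statistic underlying both comparisons can be sketched with a small rank-sum implementation (R's wilcox.test reports, as W, the Mann-Whitney U for the first sample; this version uses midranks for ties and omits the p-value calculation):

```python
def rank_sum_w(x, y):
    """W as reported by R's wilcox.test: the rank sum of x in the pooled
    sample, minus its minimum possible value n_x(n_x + 1)/2; midranks
    are used for tied values."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of positions i+1 .. j
        i = j
    n_x = len(x)
    return sum(ranks[v] for v in x) - n_x * (n_x + 1) / 2
```

Two samples with identical dichotomized 2 × 2 tables can still produce W values on opposite sides of the null expectation n_x·n_y/2, which is how Samples 1 and 2 lead to opposite conclusions.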
The study by Li et al. (2017, their Table 2) was mentioned earlier because multilevel modeling could have been used, but their use of dichotomization is also noteworthy. They recorded the number of inquiry skills and explanation skills each student used, and conducted some preliminary statistics. They dichotomized both variables (as in Figure 1D). Rather than using a median split on the total scores, they ran a K = 2 means cluster analysis on the individual items. The authors label the clusters high and low. If evidence had been presented that people really were in two relatively homogeneous groups (using, for example, taxometric methods, Waller and Meehl, 1998), then this could have been appropriate, but if the constructs are dimensions, information is lost. They then tested whether these dichotomized variables are associated and found Pearson χ²(1) = 6.18, p = 0.01. Interestingly, they also calculated Pearson's correlation using the continuous measures (r = 0.53, p < 0.001). It is unclear why both were done and, in relation to significance testing, it is inappropriate to test the same hypothesis multiple times, particularly when one of the tests is inappropriate.
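The clustering step can be illustrated with a one-dimensional K = 2 means sketch (illustrative only; Li et al. clustered the individual items rather than total scores):

```python
def two_means_split(values, iters=100):
    """K = 2 means on one-dimensional scores: returns 1 for the 'high'
    cluster and 0 for the 'low' one."""
    centers = [min(values), max(values)]
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # bool indexes the tuple: True (1) means closer to the high center
            groups[abs(v - centers[1]) < abs(v - centers[0])].append(v)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [int(abs(v - centers[1]) < abs(v - centers[0])) for v in values]
```

However the split is produced, values within a cluster are treated as identical, so the same information-loss concern that applies to a median split applies here.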
There are reasons to dichotomize. If the continuous variable gives rise to a dichotomy, and that dichotomy is of interest, then dichotomizing can be useful. Sometimes it is useful to include a dummy variable for whether a person partakes in some behavior (e.g., having a computer at home; being an illegal drug user) as well as the amount of that behavior (e.g., hours at home using the computer; frequency of drug use). The concern here is when dichotomization (or splitting into more than two categories) is done without any substantive reason and where the cut-off points are not based on substantive reasons. Sometimes continuous variables are split into categories so that particular plots (e.g., barplots) and types of analyses (e.g., ANOVA, χ² tests) can be used instead of scatter plots and regression.
3a. If you believe a continuous variable or set of continuous variables may be based on a small number of categorical constructs, use appropriate methods (e.g., taxometric methods, Waller and Meehl, 1998 ) to justify this.
3b. Consider non-linear and discontinuous models. Dummy variables can be included, along with quantitative variables, in regression models if you believe there are certain discontinuities in relationships.
3c. Do not dichotomize a variable just to allow you to use a statistical or graphical procedure if there are appropriate and available procedures for the non-dichotomized variables.
MacCallum et al. (2002) provide a detailed and readable discussion of why dichotomization should usually be avoided.
Humans, including myself, make typing mistakes.
There are several reasons why people distrust scientific results. The easiest of these to address is errors in the numbers reported in tables and statistical reports. These types of errors will always be part of any literature, but it is important to lessen their likelihood. The following examples were chosen to show different types of errors.
Pezzullo et al. ( 2017 , p. 306) report the following F statistics:
F (1, 115) = 2.4579, p = 0.0375 “significant main effect”
F (1, 115) = 2.9512, p = 0.0154 “significant interaction”
The p -values associated with these F statistics should be 0.12 and 0.09, respectively. The authors have turned non-significant findings into significant ones. There is no reason to think that this was a deliberate fabrication. If the authors had wanted to create significant effects where there were none, and they wanted to conceal this act, they could have changed the F -values too.
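Such p-values can be checked by recomputing them from the F statistic and its degrees of freedom. A minimal Monte Carlo sketch follows (an exact F-distribution routine, such as R's pf, would normally be used instead):

```python
import random

def f_pvalue_mc(f_obs, df1, df2, n=200_000, seed=1):
    """Monte Carlo upper-tail p-value for an F(df1, df2) statistic, using
    F = (X1/df1)/(X2/df2) for independent chi-squares X1 and X2, and the
    fact that chi-square(k) is Gamma(k/2, scale 2)."""
    rng = random.Random(seed)
    hits = sum(
        (rng.gammavariate(df1 / 2, 2) / df1)
        / (rng.gammavariate(df2 / 2, 2) / df2) >= f_obs
        for _ in range(n)
    )
    return hits / n

# F(1, 115) = 2.4579 gives a p-value near 0.12, not the reported 0.0375
```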
Some errors can be found with software like the freeware statcheck ( Nuijten et al., 2016 ). It reads statistical text and tries to determine if the statistic and p -value match. If in R (with the statcheck package loaded) you write:
statcheck("F(1,115) = 2.4579, p = .0375")
it tells you that there may be some errors in the expression. The software was created to allow entire texts to be analyzed, parsing out the statistical material. Nuijten and colleagues used this to analyze data from several American Psychological Association (APA) journals. They found that about 10% of reported p -values were incorrect. The package does not catch all errors, so it should not be the only check applied to a manuscript before submission (an analogy would be using a spellchecker but not proofreading).
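The parsing step that statcheck performs can be sketched with a regular expression (this toy version handles only one fixed F-report format; statcheck itself covers many result types):

```python
import re

def parse_f_report(text):
    """Extract (df1, df2, F, p) from an APA-style F report such as
    'F(1,115) = 2.4579, p = .0375'. Returns None if no match is found."""
    m = re.search(
        r"F\s*\((\d+)\s*,\s*(\d+)\)\s*=\s*([\d.]+)\s*,\s*p\s*[=<>]\s*([\d.]+)",
        text,
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4))
```

Once the numbers are extracted, the p-value can be recomputed from F and the degrees of freedom and compared with the reported value.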
Another example is from Talandron et al. ( 2017 , p. 377). They were interested in the incubation effect where waiting to solve a problem after failure can help to produce the correct response. One of their key findings was “the average number of attempts prior to post-incubation of all IE-True (M = 32, SD = 21) was significantly lower than those of IE-False (M = 46, SD = 22) [ t (169) = 1.97, two-tailed p < 0.01].” The true p -value for t (169) = 1.97 is 0.05.
The errors by Pezzullo et al. and Talandron et al. were relatively easy to identify. Other errors can be more difficult to notice. Sjödén et al. ( 2017 , p. 353) analyzed data of 163 students playing 3,983 games. They compared the number of games played by each student with the student's average goodness rating and found “Pearson r = 0.146; p = 0.000.” The p associated with r = 0.146 with n = 163 is, two-tailed, 0.06. The likely source of the error is that the wrong n has been used either when looking up the p -value manually or these student-level variables were repeated for each game the student played in the data file and the authors took the numbers from the statistics package without noticing this problem.
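Errors like this can be caught by converting r back to its t statistic, since t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. A quick sketch:

```python
import math

def r_to_t(r, n):
    """t statistic (df = n - 2) for testing that Pearson's r is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# With the correct n = 163 students, t falls below the roughly 1.96 cutoff
# (so two-tailed p > .05); with the inflated n = 3,983 games it does not.
```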
It is important to check the degrees of freedom carefully, because errant degrees of freedom may mean the wrong statistic is being reported. For example, Kumar (2017, p. 531) compared student performance before and after some changes were made to the software that he was examining. He reports no significant main effect between these two groups. He then repeated the analyses including a covariate: the number of puzzles solved during the task. He reports that the main effect is now significant: F (2,169) = 3.19, p = 0.044. The 2 in the numerator degrees of freedom is odd. There are only two groups, so there should be only 1 numerator degree of freedom if this is a test of the difference between the model with the covariate and the model with the covariate plus the single grouping variable that distinguishes the two groups. If it is a typo and should be 1, then the F and/or the p is wrong. From the description it appears that the covariate also has only one degree of freedom. Because some statistics software produces the F value for the entire model as well as its components, it could be that Kumar took the statistic from the wrong part of the output. He argues that the covariate should be associated with the outcome, so it would not be surprising that the covariate plus the group difference together were statistically significant.
4a. While packages like statcheck ( Nuijten et al., 2016 ) can catch some errors, they will not catch all of them. As the software evolves, more (but not all) errors will be caught. This might have the negative effect of people relying on it too much (like not learning to spell because of the ubiquity of spellcheckers). Given that only some errors will be caught, it is important not to treat this as checking all numeric output. There will always be the chance of some typographical errors, but it is worth using modern technology to catch some of them.
4b. Procedures exist to include, in your word processing document, the statistical code that reads the data and creates the numeric output (and plots) directly. An example is the package knitr ( Xie, 2015 ). It allows you to write your paper in LaTeX and have chunks of R (and many other languages; typing
names(knitr::knit_engines$get())
in R currently (Nov. 20, 2019) lists 41 languages, including Stata, SAS, JavaScript, and Python) embedded within it. An author could write “The p -value was \Sexpr{t.test(DV~IV)$p.value}” in LaTeX and the p -value would appear in the document. This has the additional advantage that if an error in the data file is discovered and fixed, the tables, plots, and any statistics embedded in the text are automatically corrected.
4c. While the responsibility for checking numbers and words lies primarily with the authors, the reviewing process for conferences and journals could identify some of these errors and allow the authors to correct them. Some journals already do this. For example, the Association for Psychological Science (APS) uses statcheck before manuscripts are sent for review and requires that authors submit a statcheck report with their final submission ( https://www.psychologicalscience.org/publications/psychological_science/ps-submissions#STATCHK ). It may be worthwhile to have statistical and methods reviews of submissions, as is done in some medical journals. Some of the issues are discussed in Altman (1998) . If there are not enough statistics reviewers, other reviewers could be given guidelines for when to direct a submission to a statistics/methods reviewer. Example guidelines are in Greenwood and Freeman (2015) .
The statcheck webpage ( https://mbnuijten.com/statcheck/ ) has links to sources showing how to use it. The web page for knitr ( https://yihui.name/knitr/ ) will also provide more up-to-date information about that package than print sources. For advice to journal and conference referees and editors, see Greenwood and Freeman (2015) .
The crisis in behavioral science has led to several guidelines for how to avoid some of the pitfalls (e.g., Munafò et al., 2017 ). These include teaching more fundamentals and ethical issues in statistics and methods courses, pre-registering research design/analytic methods, using alternatives to hypothesis testing, and more transparent methods for disseminating research findings. These are issues within the current crisis in science. Stark and Saltelli (2018) discuss an underlying cause of why bad science abounds: cargo cult statistics. This phrase is taken from Feynman's (1974) famous commencement address “Cargo Cult Science,” which itself draws on Worsley (1957) . Stark and Saltelli define the statistical variety as “the ritualistic miming of statistics rather than conscientious practice” ( Stark and Saltelli, 2018 , p. 40). They describe how this miming is often the most effective way to get papers published (have a paper superficially look like other published papers), and having many publications is necessary for career development in modern academia. It is important to focus both on the broad issues, like how research organizations reward output, and on the specific issues that have created cargo cult statistics. The focus here is on how to address the more specific issues.
The area examined was the field of Education with Technology (EwT), focusing on studies that might fit content-wise within applied psychology. EwT was chosen because of its importance for society. Its inter-disciplinarity means many of those conducting research had their formal research training outside the disciplines that tend to conduct studies on human participants. The hope is that this paper provides some helpful guidance.
Four issues were chosen in part because they can be addressed by researchers relatively easily: power analysis, multilevel modeling, dichotomization, and errors when reporting numeric statistics. Other issues could have been included (e.g., using better visualizations, using more robust methods), and studies from many fields also show these (and other) concerns.
A small number of underlying themes relate both to the issues raised in this paper for EwT and to the crisis in science more generally.
1. Don't get excited by a p -value.
2. Don't assume that because a paper is published it is replicable, and certainly don't assume it is the end of the story. The evidence reported in papers contributes to the story.
3. Empirical science, done well, is difficult and time-consuming. Time taken planning research is usually well spent.
4. The goals of science are different than the goals of many scientists and are not perfectly aligned with the structures put in place to reward scientists.
The author confirms being the sole contributor of this work and has approved it for publication.
DW is the Dunn Family Foundation Endowed Chair of Educational Assessment, and as such receives part of his salary from the foundation.
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
1. ^ http://www.springer.com/cda/content/document/cda_downloaddocument/9783319614243-p1.pdf?SGWID=0-0-45-1609692-p180928554 (accessed May 5, 2018).
2. ^ Power analyses (Concern #1) can also be conducted in R. There are functions in base R, including power.t.test, and specialized packages for more involved designs, including PoweR ( Lafaye de Micheaux and Tran, 2016 ) and pwr ( Champely, 2018 ).
3. ^ From https://data.cityofnewyork.us/Education/2014-2015-School-Quality-Reports-Results-For-High-/vrfr-9k4d .
Aitkin, M., Anderson, D. A., and Hinde, J. P. (1981). Statistical modelling of data on teaching styles (with discussion). J. R. Stat. Soc. Ser. A 144, 419–461. doi: 10.2307/2981826
Aitkin, M., and Longford, N. (1986). Statistical modelling issues in school effectiveness studies. J. R. Stat. Soc. Ser. A 149, 1–43. doi: 10.2307/2981882
Al-Shanfari, L., Epp, C. D., and Baber, C. (2017). “Evaluating the effect of uncertainty visualization in open learner models on students' metacognitive skills,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 15–27. doi: 10.1007/978-3-319-61425-0_2
Altman, D. G. (1998). Statistical reviewing for medical journals. Stat. Med. 17, 2661–2674. doi: 10.1002/(SICI)1097-0258(19981215)17:23<2661::AID-SIM33>3.0.CO;2-B
Amrhein, V., and Greenland, S. (2018). Remove, rather than redefine, statistical significance. Nat. Hum. Behav. 2:4. doi: 10.1038/s41562-017-0224-0
Anderson, J. R., Boyle, C. F., and Reiser, B. J. (1985). Intelligent tutoring systems. Science 228, 456–462. doi: 10.1126/science.228.4698.456
Arroyo, I., Wixon, N., Allessio, D., Woolf, B., Muldner, K., and Burleson, W. (2017). “Collaboration improves student interest in online tutoring,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 28–39. doi: 10.1007/978-3-319-61425-0_3
Baguley, T. (2004). Understanding statistical power in the context of applied research. Appl. Ergon. 35, 73–80. doi: 10.1016/j.apergo.2004.01.002
Barnier, J., François, B., and Larmarange, J. (2017). Questionr: Functions to Make Surveys Processing Easier. R Package Version 0.6.2 . Available online at: https://CRAN.R-project.org/package=questionr
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. doi: 10.18637/jss.v067.i01
Bell, A. J. D., and Jones, K. (2015). Explaining fixed effects: random effects modelling of time-series, cross-sectional and panel data. Polit. Sci. Res. Method. 3, 133–153. doi: 10.1017/psrm.2014.7
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., et al. (2018). Redefine statistical significance. Nat. Hum. Behav. 2, 6–10. doi: 10.1038/s41562-017-0189-z
Bloom, B. S. (1984). The 2 sigma problem: the search for methods of group instruction as effective as one-to-one tutoring. Edu. Res. 13, 4–16. doi: 10.3102/0013189X013006004
Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., et al. (2009). Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evol. 24, 127–135. doi: 10.1016/j.tree.2008.10.008
Breakwell, G. M., Wright, D. B., and Barnett, J. (2020). Research Methods in Psychology. 5th Edn . London: Sage Publications.
Browne, W. J., Golalizadeh Lahi, M., and Parker, R. M. A. (2009). A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package . University of Bristol.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376. doi: 10.1038/nrn3475
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T-H., Huber, J., and Johannesson, M (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2, 637–644. doi: 10.1038/s41562-018-0399-z
Champely, S. (2018). Pwr: Basic Functions for Power Analysis. R Package Version 1.2-2 . Available online at: https://CRAN.R-project.org/package=pwr
Chetty, R., Friedman, J., Hilger, N., Saez, E., Schanzenbach, D., and Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Q. J. Econ. 126, 1593–1660. doi: 10.1093/qje/qjr041
Cohen, J. (1983). The cost of dichotomization. Appl. Psychol. Meas. 7, 249–253. doi: 10.1177/014662168300700301
Cohen, J. (1992). A power primer. Psychol. Bull. 112, 155–159. doi: 10.1037/0033-2909.112.1.155
Cohen, J. (1994). The earth is round ( p < 0.05). Am. Psychol. 49, 997–1003. doi: 10.1037/0003-066X.49.12.997
Cronbach, L. J. (1957). The two disciplines of scientific psychology. Am. Psychol. 12, 671–684. doi: 10.1037/h0043943
Cuban, L. (2001). Oversold and Underused: Computers in the Classroom . Cambridge, MA: Harvard University Press.
Faul, F., Erdfelder, E., Buchner, A., and Lang, A.-G. (2009). Statistical power analyses using G * Power 3.1: tests for correlation and regression analyses. Behav. Res. Methods , 41, 1149–1160. doi: 10.3758/BRM.41.4.1149
Faul, F., Erdfelder, E., Lang, A.-G., and Buchner, A. (2007). G * Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods , 39, 175–191. doi: 10.3758/BF03193146
Feynman, R. P. (1974). Cargo cult science. Eng. Sci. 37, 10–13.
Field, A. P., and Wright, D. B. (2011). A primer on using multilevel models in clinical and experimental psychopathology research. J. Exp. Psychopathol. 2, 271–293. doi: 10.5127/jep.013711
Gelman, A., and Carlin, J. (2014). Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspect. Psychol. Sci. 9, 641–651. doi: 10.1177/1745691614551642
Goldstein, H. (2011). Multilevel Statistical Models. 4th Edn . Chichester: Wiley. doi: 10.1002/9780470973394
Goldstein, H., Browne, W. J., and Rasbash, J. (2002). Multilevel modelling of medical data. Stat. Med. 21, 3291–3315. doi: 10.1002/sim.1264
Green, P., and MacLeod, C. J. (2016). simr: An R package for power analysis of generalized linear mixed models by simulation. Methods Ecol. Evol. 7, 493–498. doi: 10.1111/2041-210X.12504
Greenwood, D. C., and Freeman, J. V. (2015). How to spot a statistical problem: advice for a non-statistical reviewer. BMC Med. 13:270. doi: 10.1186/s12916-015-0510-5
Hox, J. J. (2010). Multilevel Analysis. Techniques and Applications. 2nd Edn . New York, NY: Routledge. doi: 10.4324/9780203852279
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Med. 2:e124. doi: 10.1371/journal.pmed.0020124
Jones, K. (1991). Multi-Level Models for Geographical Research . Norwich, UK: Environmental Publications.
Koba, M. (2015). Education Tech Funding Soars–But Is It Working in the Classroom? Fortune . Available online at: http://fortune.com/2015/04/28/education-tech-funding-soars-but-is-it-working-in-the-classroom/
Kumar, A. N. (2017). “The effect of providing motivational support in Parsons puzzle tutors,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 528–531. doi: 10.1007/978-3-319-61425-0_56
Lafaye de Micheaux, P., and Tran, V. A. (2016). PoweR: a reproducible research tool to ease Monte Carlo power simulation studies for goodness-of-fit tests in R. J. Stat. Softw. 69, 1–42. doi: 10.18637/jss.v069.i03
Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. Am. Stat. 55, 187–193. doi: 10.1198/000313001317098149
Li, H., Gobert, J., and Dickler, R. (2017). “Dusting off the messy middle: Assessing students' inquiry skills through doing and writing,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 175–187. doi: 10.1007/978-3-319-61425-0_15
Lilienfeld, S. O., and Waldman, I. D. (Eds.). (2017). Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions . New York, NY: Wiley. doi: 10.1002/9781119095910
Lipsey, M., Puzio, K., Yun, C., Hebert, M. A., Roberts, M., Anthony, K. S., et al. (2012). Translating the Statistical Representation of the Effects of Education Interventions Into More Readily Interpretable Forms. National Center for Education Statistics (NCSER 20133000) . Washington, DC: IES. Available online at: https://ies.ed.gov/ncser/pubs/20133000/pdf/20133000.pdf
Lyons, L. (2013). Discovering the Significance of 5σ . Available online at: https://arxiv.org/pdf/1310.1284
MacCallum, R. C., Zhang, S., Preacher, K. J., and Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychol. Methods 7, 19–40. doi: 10.1037//1082-989X.7.1.19
McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019). Abandon statistical significance. Am. Stat. 73, 235–245. doi: 10.1080/00031305.2018.1527253
Meehl, P. E. (1970). “Nuisance variables and the ex post facto design,” in Minnesota Studies in the Philosophy of Science: Vol IV. ANALYSIS of Theories and Methods of Physics and Psychology , eds M. Radner and S. Winokur (Minneapolis, MN: University of Minnesota Press), 373–402.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., et al. (2017). A manifesto for reproducible science. Nat. Hum. Behav. 1:0021. doi: 10.1038/s41562-016-0021
Neyman, J. (1942). Basic ideas and some recent results of the theory of testing statistical hypotheses. J. R. Stat. Soc. 105, 292–327. doi: 10.2307/2980436
Neyman, J. (1952). Lecture and Conferences on Mathematical Statistics and Probability. 2nd Edn . Washington, DC: US Department of Agriculture.
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., and Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behav. Res. Methods 48, 1205–1226. doi: 10.3758/s13428-015-0664-2
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 349:943. doi: 10.1126/science.aac4716
Perez, S., Massey-Allard, J., Butler, D., Ives, J., Bonn, D., Yee, N., et al. (2017). “Identifying productive inquiry in virtual labs using sequence mining,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 287–298. doi: 10.1007/978-3-319-61425-0_24
Pezzullo, L. G., Wiggins, J. B., Frankosky, M. H., Min, W., Boyer, K. E., Mott, B. W., et al. (2017). “Thanks Alisha, Keep in Touch: gender effects and engagement with virtual learning companions,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 299–310. doi: 10.1007/978-3-319-61425-0_25
Price, T. W., Zhi, R., and Barnes, T. (2017). “Hint generation under uncertainty: the effect of hint quality on help-seeking behavior,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 311–322. doi: 10.1007/978-3-319-61425-0_26
R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing . Available online at: https://www.R-project.org/
Randall, D., and Welser, C. (2018). The Irreproducibility Crisis of Modern Science. Causes, Consequences, and the Road to Reform. National Association of Scholars . Available online at: https://www.nas.org/reports/the-irreproducibility-crisis-of-modern-science
Reingold, J. (2015). Why Ed Tech is Currently ‘the Wild Wild West’. Fortune . Available online at: http://fortune.com/2015/11/04/ed-tech-at-fortune-globalforum-2015
Ritter, S., Anderson, J. R., Koedinger, K. R., and Corbett, A. (2007). Cognitive tutor: applied research in mathematics education. Psychonom. Bull. Rev. 14, 249–255. doi: 10.3758/BF03194060
Sedlmeier, P., and Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychol. Bull. 105, 309–316. doi: 10.1037//0033-2909.105.2.309
Sjödén, B., Lind, M., and Silvervarg, A. (2017). “Can a teachable agent influence how students respond to competition in an educational game?,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 347–358. doi: 10.1007/978-3-319-61425-0_29
Smaldino, P. E., and McElreath, R. (2016). The natural selection of bad science. R. Soc. Open Sci. 3:160384. doi: 10.1098/rsos.160384
Stark, P. B., and Saltelli, A. (2018). Cargo-cult statistics and scientific crisis. Significance 15, 40–43. doi: 10.1111/j.1740-9713.2018.01174.x
Suppes, P. (1966). The uses of computers in education. Sci. Am. 215, 206–220. doi: 10.1038/scientificamerican0966-206
Talandron, M. M. P., Rodrigo, M. M. T., and Beck, J. E. (2017). “Modeling the incubation effect among students playing an educational game for physics,” in Artificial Intelligence in Education , eds E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay (Gewerbestrasse: Springer), 371–380. doi: 10.1007/978-3-319-61425-0_31
Waller, N. G., and Meehl, P. E. (1998). Multivariate Taxometric Procedures: Distinguishing Types From Continua . Thousand Oaks, CA: Sage Publications.
Worsley, P. M. (1957). The Trumpet Shall Sound: A Study of ‘Cargo Cults’ in Melanesia . New York, NY: Schocken Books.
Wright, D. B. (1998). Modelling clustered data in autobiographical memory research: the multilevel approach. Appl. Cognit. Psychol. 12, 339–357. doi: 10.1002/(SICI)1099-0720(199808)12:4<339::AID-ACP571>3.0.CO;2-D
Wright, D. B. (2017). Some limits using random slope models to measure student and school growth. Front. Educ. 2:58. doi: 10.3389/feduc.2017.00058
Wright, D. B. (2018). A framework for research on education with technology. Front. Educ. 3:21. doi: 10.3389/feduc.2018.00021
Wright, D. B. (2019). Allocation to groups: examples of Lord's paradox. Br. J. Educ. Psychol . doi: 10.1111/bjep.12300. [Epub ahead of print].
Xie, Y. (2015). Dynamic Documents With R and knitr. 2nd Edn . Boca Raton, FL: Chapman and Hall/CRC.
Keywords: EdTech, statistical methods, crisis in science, power, multilevel modeling, dichotomization
Citation: Wright DB (2019) Research Methods for Education With Technology: Four Concerns, Examples, and Recommendations. Front. Educ. 4:147. doi: 10.3389/feduc.2019.00147
Received: 01 September 2019; Accepted: 27 November 2019; Published: 10 December 2019.
Copyright © 2019 Wright. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daniel B. Wright, daniel.wright@unlv.edu ; dbrookswr@gmail.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Hypothesis Definition, Format, Examples, and Tips
A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.
Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."
A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.
In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method typically proceeds from an initial question, through background research, to forming a hypothesis, testing it in an experiment, analyzing the results, and reporting the findings.
The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. At this point, researchers then begin to develop a testable hypothesis.
Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.
In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.
Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.
In many cases, researchers may find that the results of an experiment do not support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.
In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."
In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."
So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself whether your question is focused, whether the relationship it proposes can actually be tested, and whether the variables involved can be manipulated and measured.
Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read . Many authors will suggest questions that still need to be explored.
To form a hypothesis, start with a research question, gather background information, and then state your prediction in a clear, testable form.
In the scientific method , falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.
Students sometimes confuse falsifiability with the idea that a claim is false, which is not the case. Falsifiability means that if a claim were false, it would be possible to demonstrate that it is false.
One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.
A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.
Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.
For example, a researcher might operationally define the variable " test anxiety " as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.
These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.
One of the basic principles of any type of scientific research is that the results must be replicable.
Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.
Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.
To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.
The hypothesis you use will depend on what you are investigating and hoping to find. The main types you might use include the null hypothesis, the alternative hypothesis, and directional and nondirectional hypotheses.
A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable .
The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."
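As a toy illustration (not from the source; the function name and example variables are invented), the "If…then" template can be expressed as a small Python helper:

```python
# Hypothetical helper: fills in the "If {IV change} then {DV change}" template.
def format_hypothesis(independent_change: str, dependent_effect: str) -> str:
    return f"If {independent_change}, then we will observe {dependent_effect}."

print(format_hypothesis(
    "participants study with spaced repetition",
    "higher scores on a delayed recall test",
))
```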
Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.
Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.
Once a researcher has collected data using descriptive methods, a correlational study can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.
Experimental methods are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).
Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually cause another to change.
The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.
By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."
A scientific hypothesis is an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world. The two primary features of a scientific hypothesis are falsifiability and testability, which are reflected in an "If…then" statement summarizing the idea and in the ability to be supported or refuted through observation and experimentation. The notion of the scientific hypothesis as both falsifiable and testable was advanced in the mid-20th century by the Austrian-born British philosopher Karl Popper.
The formulation and testing of a hypothesis is part of the scientific method , the approach scientists use when attempting to understand and test ideas about natural phenomena. The generation of a hypothesis frequently is described as a creative process and is based on existing scientific knowledge, intuition , or experience. Therefore, although scientific hypotheses commonly are described as educated guesses, they actually are more informed than a guess. In addition, scientists generally strive to develop simple hypotheses, since these are easier to test relative to hypotheses that involve many different variables and potential outcomes. Such complex hypotheses may be developed as scientific models ( see scientific modeling ).
Depending on the results of scientific evaluation, a hypothesis typically is either rejected as false or accepted as true. However, because a hypothesis inherently is falsifiable, even hypotheses supported by scientific evidence and accepted as true are susceptible to rejection later, when new evidence has become available. In some instances, rather than rejecting a hypothesis because it has been falsified by new evidence, scientists simply adapt the existing idea to accommodate the new information. In this sense a hypothesis is never incorrect but only incomplete.
The investigation of scientific hypotheses is an important component in the development of scientific theory . Hence, hypotheses differ fundamentally from theories; whereas the former is a specific tentative explanation and serves as the main tool by which scientists gather data, the latter is a broad general explanation that incorporates data from many different scientific investigations undertaken to explore hypotheses.
Countless hypotheses have been developed and tested throughout the history of science . Several examples include the idea that living organisms develop from nonliving matter, which formed the basis of spontaneous generation , a hypothesis that ultimately was disproved (first in 1668, with the experiments of Italian physician Francesco Redi , and later in 1859, with the experiments of French chemist and microbiologist Louis Pasteur ); the concept proposed in the late 19th century that microorganisms cause certain diseases (now known as germ theory ); and the notion that oceanic crust forms along submarine mountain zones and spreads laterally away from them ( seafloor spreading hypothesis ).
Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.
There are five main steps in hypothesis testing:
1. State your null and alternate hypothesis.
2. Collect data.
3. Perform a statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present your findings.
Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.
After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.
The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.
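To make the distinction concrete, here is a minimal sketch using only the Python standard library; the scores are invented for illustration, and a pooled two-sample t statistic is used as the test statistic:

```python
from statistics import mean, variance

# H0: no difference in mean test score between the groups.
# Ha: sleep-deprived participants score differently.
rested = [82, 79, 88, 91, 85, 84, 90, 87]
deprived = [74, 70, 78, 72, 69, 75, 71, 77]

def t_statistic(a, b):
    # Pooled (equal-variance) two-sample t statistic.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

print(round(t_statistic(rested, deprived), 2))
```

A large t statistic like this one would correspond to a small p-value, which would lead us to reject the null hypothesis of no difference.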
For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.
There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).
If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.
Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.
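A quick sketch of that comparison, with invented scores for three hypothetical groups (the group names and values are illustrative only):

```python
from statistics import mean, pvariance

# Invented scores for three hypothetical teaching methods.
groups = {
    "lecture": [70, 72, 68, 71],
    "seminar": [80, 83, 79, 82],
    "online":  [90, 88, 91, 89],
}

grand = mean(s for g in groups.values() for s in g)

# Between-group variance: spread of the group means around the grand mean.
between = mean((mean(g) - grand) ** 2 for g in groups.values())
# Within-group variance: average spread of scores inside each group.
within = mean(pvariance(g) for g in groups.values())

print(between > within)  # clearly separated groups -> a test would show a low p-value
```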
Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .
Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.
In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.
In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).
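The decision rule itself is simple to sketch; the function name is invented, and the cutoffs follow the conventions described above:

```python
# Decision rule: compare the test's p-value to a predetermined
# significance level (alpha). Values below are illustrative.
def decide(p_value: float, alpha: float = 0.05) -> str:
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))              # below the conventional 0.05 threshold
print(decide(0.03, alpha=0.01))  # the stricter 0.01 threshold reverses the call
```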
The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .
In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.
In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.
However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (for example, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.
If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”
These are superficial differences; you can see that they mean the same thing.
You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.
If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.
A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).
Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.
Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/
A hypothesis is a statement or proposition that is made for the purpose of testing through empirical research. It represents an educated guess or prediction that can be tested through observation and experimentation. A hypothesis is often formulated using a logical construct of "if-then" statements, allowing researchers to set up experiments to determine its validity. It serves as the foundation of a scientific inquiry, providing a clear focus and direction for the study. In essence, a hypothesis is a provisional answer to a research question , which is then subjected to rigorous testing to determine its accuracy.
In this blog post, we'll explore 100 different hypothesis examples, showing you how these simple statements set the stage for discovery in various academic fields. From the mysteries of chemical reactions to the complexities of human behavior, hypotheses are used to kickstart research in numerous disciplines. Whether you're new to the world of academia or just curious about how ideas are tested, these examples will offer insight into the fundamental role hypotheses play in learning and exploration.
In the exploration of various academic disciplines, hypotheses play a crucial role as foundational statements that guide research and inquiry. From understanding complex biological processes to navigating the nuances of human behavior in sociology, hypotheses serve as testable predictions that shape the direction of scientific investigation. The examples provided across the fields of medicine, computer science, sociology, and education illustrate the diverse applications and importance of hypotheses in shaping our understanding of the world. Whether improving medical treatments, enhancing technological systems, fostering social equality, or elevating educational practices, hypotheses remain central to scientific progress and societal advancement. By formulating clear and measurable hypotheses, researchers can continue to unravel complex phenomena, contribute to their fields, and ultimately enrich human knowledge and well-being.
Patricia Farrugia, Michael G. DeGroote School of Medicine; Division of Orthopaedic Surgery; Departments of Surgery and Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ont.
There is an increasing familiarity with the principles of evidence-based medicine in the surgical community. As surgeons become more aware of the hierarchy of evidence, grades of recommendations and the principles of critical appraisal, they develop an increasing familiarity with research design. Surgeons and clinicians are looking more and more to the literature and clinical trials to guide their practice; as such, it is becoming a responsibility of the clinical research community to attempt to answer questions that are not only well thought out but also clinically relevant. The development of the research question, including a supportive hypothesis and objectives, is a necessary key step in producing clinically relevant results to be used in evidence-based practice. A well-defined and specific research question is more likely to help guide us in making decisions about study design and population and subsequently what data will be collected and analyzed. 1
In this article, we discuss important considerations in the development of a research question and hypothesis and in defining objectives for research. By the end of this article, the reader will be able to appreciate the significance of constructing a good research question and developing hypotheses and research objectives for the successful design of a research study. The following article is divided into 3 sections: research question, research hypothesis and research objectives.
Interest in a particular topic usually begins the research process, but it is the familiarity with the subject that helps define an appropriate research question for a study. 1 Questions then arise out of a perceived knowledge deficit within a subject area or field of study. 2 Indeed, Haynes suggests that it is important to know “where the boundary between current knowledge and ignorance lies.” 1 The challenge in developing an appropriate research question is in determining which clinical uncertainties could or should be studied and also rationalizing the need for their investigation.
Increasing one’s knowledge about the subject of interest can be accomplished in many ways. Appropriate methods include systematically searching the literature, in-depth interviews and focus groups with patients (and proxies) and interviews with experts in the field. In addition, awareness of current trends and technological advances can assist with the development of research questions. 2 It is imperative to understand what has been studied about a topic to date in order to further the knowledge that has been previously gathered on a topic. Indeed, some granting institutions (e.g., the Canadian Institutes of Health Research) encourage applicants to conduct a systematic review of the available evidence if a recent review does not already exist and preferably a pilot or feasibility study before applying for a grant for a full trial.
In-depth knowledge about a subject may generate a number of questions. It then becomes necessary to ask whether these questions can be answered through one study or if more than one study is needed. 1 Additional research questions can be developed, but several basic principles should be taken into consideration. 1 All questions, primary and secondary, should be developed at the beginning and planning stages of a study. Any additional questions should never compromise the primary question because it is the primary research question that forms the basis of the hypothesis and study objectives. It must be kept in mind that within the scope of one study, the presence of a number of research questions will affect and potentially increase the complexity of both the study design and subsequent statistical analyses, not to mention the actual feasibility of answering every question. 1 A sensible strategy is to establish a single primary research question around which to focus the study plan. 3 In a study, the primary research question should be clearly stated at the end of the introduction of the grant proposal, and it usually specifies the population to be studied, the intervention to be implemented and other circumstantial factors. 4
Hulley and colleagues 2 have suggested the use of the FINER criteria in the development of a good research question ( Box 1 ). The FINER criteria highlight useful points that may increase the chances of developing a successful research project. A good research question should specify the population of interest, be of interest to the scientific community and potentially to the public, have clinical relevance and further current knowledge in the field (and of course be compliant with the standards of ethical boards and national research standards).
Box 1: The FINER criteria for a good research question
- Feasible
- Interesting
- Novel
- Ethical
- Relevant
Adapted with permission from Wolters Kluwer Health. 2
Whereas the FINER criteria outline the important aspects of the question in general, a useful format to use in the development of a specific research question is the PICO format — consider the population (P) of interest, the intervention (I) being studied, the comparison (C) group (or to what is the intervention being compared) and the outcome of interest (O). 3 , 5 , 6 Often timing (T) is added to PICO ( Box 2 ) — that is, “Over what time frame will the study take place?” 1 The PICOT approach helps generate a question that aids in constructing the framework of the study and subsequently in protocol development by alluding to the inclusion and exclusion criteria and identifying the groups of patients to be included. Knowing the specific population of interest, intervention (and comparator) and outcome of interest may also help the researcher identify an appropriate outcome measurement tool. 7 The more defined the population of interest, and thus the more stringent the inclusion and exclusion criteria, the greater the effect on the interpretation and subsequent applicability and generalizability of the research findings. 1 , 2 A restricted study population (and exclusion criteria) may limit bias and increase the internal validity of the study; however, this approach will limit external validity of the study and, thus, the generalizability of the findings to the practical clinical setting. Conversely, a broadly defined study population and inclusion criteria may be representative of practical clinical practice but may increase bias and reduce the internal validity of the study.
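As an illustrative sketch (the class, its method, and the example content are invented, not part of any standard library), the PICOT elements can be captured in a small structure that assembles the question, here using the article's hip-arthroplasty example:

```python
from dataclasses import dataclass

# Hedged sketch: field names follow the PICOT mnemonic described above.
@dataclass
class Picot:
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def question(self) -> str:
        return (f"In {self.population}, does {self.intervention}, "
                f"compared with {self.comparison}, improve {self.outcome} "
                f"over {self.time}?")

q = Picot(
    population="adults undergoing total hip arthroplasty",
    intervention="computer-assisted acetabular component insertion",
    comparison="free-hand placement",
    outcome="functional outcome",
    time="12 months",  # illustrative time frame, not from the source
)
print(q.question())
```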
Box 2: The PICOT criteria
- Population (patients)
- Intervention (for intervention studies only)
- Comparison group
- Outcome of interest
- Time
A poorly devised research question may affect the choice of study design, potentially lead to futile situations and, thus, hamper the chance of determining anything of clinical significance, which will then affect the potential for publication. Without devoting appropriate resources to developing the research question, the quality of the study and subsequent results may be compromised. During the initial stages of any research study, it is therefore imperative to formulate a research question that is both clinically relevant and answerable.
The primary research question should be driven by the hypothesis rather than the data. 1 , 2 That is, the research question and hypothesis should be developed before the start of the study. This sounds intuitive; however, if we take, for example, a database of information, it is potentially possible to perform multiple statistical comparisons of groups within the database to find a statistically significant association. This could then lead one to work backward from the data and develop the “question.” This is counterintuitive to the process because the question is asked specifically to then find the answer, thus collecting data along the way (i.e., in a prospective manner). Multiple statistical testing of associations from data previously collected could potentially lead to spuriously positive findings of association through chance alone. 2 Therefore, a good hypothesis must be based on a good research question at the start of a trial and, indeed, drive data collection for the study.
The research or clinical hypothesis is developed from the research question and then the main elements of the study — sampling strategy, intervention (if applicable), comparison and outcome variables — are summarized in a form that establishes the basis for testing, statistical and ultimately clinical significance. 3 For example, in a research study comparing computer-assisted acetabular component insertion versus freehand acetabular component placement in patients in need of total hip arthroplasty, the experimental group would be computer-assisted insertion and the control/conventional group would be free-hand placement. The investigative team would first state a research hypothesis. This could be expressed as a single outcome (e.g., computer-assisted acetabular component placement leads to improved functional outcome) or potentially as a complex/composite outcome; that is, more than one outcome (e.g., computer-assisted acetabular component placement leads to both improved radiographic cup placement and improved functional outcome).
However, when formally testing statistical significance, the hypothesis should be stated as a “null” hypothesis. 2 The purpose of hypothesis testing is to make an inference about the population of interest on the basis of a random sample taken from that population. The null hypothesis for the preceding research hypothesis then would be that there is no difference in mean functional outcome between the computer-assisted insertion and free-hand placement techniques. After forming the null hypothesis, the researchers would form an alternate hypothesis stating the nature of the difference, if it should appear. The alternate hypothesis would be that there is a difference in mean functional outcome between these techniques. At the end of the study, the null hypothesis is then tested statistically. If the findings of the study are not statistically significant (i.e., there is no difference in functional outcome between the groups in a statistical sense), we cannot reject the null hypothesis, whereas if the findings were significant, we can reject the null hypothesis and accept the alternate hypothesis (i.e., there is a difference in mean functional outcome between the study groups), errors in testing notwithstanding. In other words, hypothesis testing confirms or refutes the statement that the observed findings did not occur by chance alone but rather occurred because there was a true difference in outcomes between these surgical procedures. The concept of statistical hypothesis testing is complex, and the details are beyond the scope of this article.
Another important concept inherent in hypothesis testing is whether the hypotheses will be 1-sided or 2-sided. A 2-sided hypothesis states that there is a difference between the experimental group and the control group, but it does not specify in advance the expected direction of the difference. For example, we asked whether there is an improvement in outcomes with computer-assisted surgery or whether the outcomes are worse with computer-assisted surgery. We presented a 2-sided test in the above example because we did not specify the direction of the difference. A 1-sided hypothesis states a specific direction (e.g., there is an improvement in outcomes with computer-assisted surgery). A 2-sided hypothesis should be used unless there is a good justification for using a 1-sided hypothesis. As Bland and Altman 8 stated, “One-sided hypothesis testing should never be used as a device to make a conventionally nonsignificant difference significant.”
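A small numeric sketch (invented z value, normal approximation) shows why this warning matters: for a symmetric test statistic, the 1-sided p-value is half the 2-sided one, so switching sides after the fact can turn a nonsignificant result "significant":

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 1.75  # illustrative test statistic favoring the experimental group
two_sided_p = 2 * (1 - normal_cdf(abs(z)))  # "is there any difference?"
one_sided_p = 1 - normal_cdf(z)             # "is there an improvement?"

print(round(two_sided_p, 3), round(one_sided_p, 3))
```

Here the 2-sided p-value exceeds 0.05 while the 1-sided one falls below it, which is exactly the situation Bland and Altman caution against exploiting.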
The research hypothesis should be stated at the beginning of the study to guide the objectives for research. Whereas the investigators may state the hypothesis as being 1-sided (there is an improvement with treatment), the study and investigators must adhere to the concept of clinical equipoise. According to this principle, a clinical (or surgical) trial is ethical only if the expert community is uncertain about the relative therapeutic merits of the experimental and control groups being evaluated. 9 It means there must exist an honest and professional disagreement among expert clinicians about the preferred treatment. 9
A good research question supports the design of the research hypothesis, which in turn influences the type of research design chosen for the study. Acting on the principles of appropriate hypothesis development, the study can then confidently proceed to the development of the research objective.
The primary objective should be coupled with the hypothesis of the study. Study objectives define the specific aims of the study and should be clearly stated in the introduction of the research protocol. 7 From our previous example and using the investigative hypothesis that there is a difference in functional outcomes between computer-assisted acetabular component placement and free-hand placement, the primary objective can be stated as follows: this study will compare the functional outcomes of computer-assisted acetabular component insertion versus free-hand placement in patients undergoing total hip arthroplasty. Note that the study objective is an active statement about how the study is going to answer the specific research question. Objectives can (and often do) state exactly which outcome measures are going to be used within their statements. They are important because they not only help guide the development of the protocol and design of the study but also play a role in sample size calculations and determining the power of the study. 7 These concepts will be discussed in other articles in this series.
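The role the objective plays in sample size calculations can be sketched with the standard formula for comparing two means, n = 2((z_{1−α/2} + z_{1−β})σ/δ)², per group. The planning values below (a 5-point difference on a functional outcome score with a standard deviation of 10) are hypothetical:

```python
# Sketch of the standard per-group sample-size formula for comparing
# two means with a two-sided test.
# NOTE: delta and sigma below are hypothetical planning values.
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detect a 5-point difference in a functional outcome score (SD = 10)
print(n_per_group(delta=5, sigma=10))  # 63 patients per group
```

The outcome measure named in the objective determines δ and σ, so a vague objective makes the sample size calculation impossible to justify.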
From the surgeon’s point of view, it is important for the study objectives to be focused on outcomes that are important to patients and clinically relevant. For example, the most methodologically sound randomized controlled trial comparing 2 techniques of distal radial fixation would have little or no clinical impact if the primary objective was to determine the effect of treatment A as compared to treatment B on intraoperative fluoroscopy time. However, if the objective was to determine the effect of treatment A as compared to treatment B on patient functional outcome at 1 year, this would have a much more significant impact on clinical decision-making. Moreover, more meaningful surgeon–patient discussions could ensue, incorporating patient values and preferences with the results from this study. 6 , 7 It is the precise objective and what the investigator is trying to measure that is of clinical relevance in the practical setting.
The following is an example from the literature about the relation between the research question, hypothesis and study objectives:
Study: Warden SJ, Metcalf BR, Kiss ZS, et al. Low-intensity pulsed ultrasound for chronic patellar tendinopathy: a randomized, double-blind, placebo-controlled trial. Rheumatology 2008;47:467–71.
Research question: How does low-intensity pulsed ultrasound (LIPUS) compare with a placebo device in managing the symptoms of skeletally mature patients with patellar tendinopathy?
Research hypothesis: Pain levels are reduced in patients who receive daily active-LIPUS (treatment) for 12 weeks compared with individuals who receive inactive-LIPUS (placebo).
Objective: To investigate the clinical efficacy of LIPUS in the management of patellar tendinopathy symptoms.
The development of the research question is the most important aspect of a research project. A research project can fail if the objectives and hypothesis are poorly focused and underdeveloped. Useful tips for surgical researchers are provided in Box 3 . Designing and developing an appropriate and relevant research question, hypothesis and objectives can be a difficult task. The critical appraisal of the research question used in a study is vital to the application of the findings to clinical practice. Focusing resources, time and dedication to these 3 very important tasks will help to guide a successful research project, influence interpretation of the results and affect future publication efforts.
FINER = feasible, interesting, novel, ethical, relevant; PICOT = population (patients), intervention (for intervention studies only), comparison group, outcome of interest, time.
Competing interests: No funding was received in preparation of this paper. Dr. Bhandari was funded, in part, by a Canada Research Chair, McMaster University.