The more ambitious you are, the more difficulty you can expect and the longer it will take you to complete your research. And the more time you take, the less likely you are to complete your work. Constructing hypotheses is the stage where you limit your research focus by selecting a few interesting relationships to examine.
There are a few reasons [you can likely think of others]:
Thus, it is better to begin with a small number of hypotheses that lend themselves to fairly straightforward testing.
An assumption is a reasonable explanation which will not be tested and which cannot be supported or rejected on the basis of available evidence.
A hypothesis is an assumption which is being tested. It should be a likely or reasonable explanation.
Often, the hypothesis transforms the problem statement question into a testable situation [a functional relationship]. The problem statement might be "what is the effect of bibliographic instruction on student GPA?" Note that two variables are mentioned: BI and GPA. The hypothesis links a dependent variable [the variable being studied] and an independent variable [the likely cause of change in the dependent variable]. Here, we hypothesize that there is a relationship between bibliographic instruction and student grade point average. The student GPA would be the dependent or result variable while BI would be the independent or causal variable. A testing relationship reflects your expectations of this relationship. For example, student GPA is a function of BI so that students who have completed bibliographic instruction will have a significantly higher GPA than those who did not.
Examples of de facto relationships that could easily be turned into a hypothesis:
An alternative variable is an additional or competing independent variable. You may test more than one independent variable with an additional hypothesis.
An antecedent variable is one that comes before the dependent variable and may need to be controlled. For example, parent wealth or college entrance test scores might be antecedent variables. We would control by ensuring that students with high test scores and students with low test scores, for example, are found in both the treatment group [those that receive BI] and the control group [those that do not]. In a study of librarian salary, we would likely need to control for seniority, degrees, type of position, professional activity and the like to ensure that we are comparing apples with apples and not with oranges.
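One way to make sure each level of an antecedent variable appears in both groups is to assign cases to treatment and control within each level separately. The following is a minimal sketch of that idea, not a prescribed procedure: the function name, the `stratum_of` helper, and the student records are all invented for illustration.

```python
import random

def stratified_assignment(cases, stratum_of, seed=0):
    """Split cases into treatment and control within each stratum.

    cases      -- list of case records (any type)
    stratum_of -- hypothetical helper: maps a case to its stratum label
    seed       -- fixed seed so the illustration is repeatable
    """
    rng = random.Random(seed)

    # Group the cases by the antecedent variable (the stratum).
    strata = {}
    for case in cases:
        strata.setdefault(stratum_of(case), []).append(case)

    treatment, control = [], []
    for members in strata.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)          # random order within the stratum
        half = len(members) // 2
        treatment.extend(members[:half])
        control.extend(members[half:])
    return treatment, control

# Invented data: (student id, entrance test band).
students = [(i, "high" if i % 2 == 0 else "low") for i in range(12)]
treatment, control = stratified_assignment(students, stratum_of=lambda s: s[1])
```

Because the split happens inside each test-score band, both the treatment group and the control group end up containing high scorers and low scorers.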
A generalization is a tested hypothesis which has been supported. When combined so that they have broader applicability, generalizations cumulate into theories. Note that a theory is supported by evidence. If a theory does not work in the real world, it is not a theory.
The process begins with the dependent variable, which by convention is labeled Y. The dependent variable is the one that you wish to change or that results from some action. Examples might include:
All variables may be dichotomous [two values only] or multi-valued [with more than two values]. For example, the variable "sex" typically has two values, "m" and "f." The variable "salary" has several values.
Independent variables are the variables likely to be responsible for change in the dependent variable [sometimes called the likely causal variables]. They stand by themselves, thus are independent, and by convention are labeled X.
Being as imaginative as possible, identify as many different independent variables as possible that might be responsible for change in your dependent variable. These variables may be found in the literature, by analysis of existing data, by observation, and by creative reflection.
Beware of starting with only a "favored" variable. The point of hypothesis testing is to discover and test relationships and not just confirm an existing belief.
Each independent-dependent variable relationship is a hypothesis. You need to have enough hypotheses to accommodate the most likely relationships or explanations, but not too many.
Variable relationships are normally constructed according to a standard model: "variable a is a function of variable b so that...." The dependent variable comes first as in "mutilation is a function of photo duplication cost so that as cost decreases mutilation also decreases."
Functional hypotheses may be directional or non-directional. Directional hypotheses specify the particular direction of the expected change and require a one-tailed test. Here is an example, "use of foreign language material is a function of time so that as time increases foreign language use will decrease." Non-directional hypotheses do not specify a direction and require a two-tailed test. Here is an example, "use of foreign language material is a function of time so that as time increases foreign language use will change."
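The practical difference between the two kinds of test shows up in how a p-value is computed from the same test statistic. The sketch below assumes a test statistic z that is approximately standard normal when there is no relationship; the function name and the value -1.8 are invented for illustration, and a real study would choose a statistic suited to its data.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed value tests the directional hypothesis "use will
    decrease", so only the lower tail counts as evidence.
    """
    std_normal = NormalDist()                       # mean 0, std dev 1
    one_tailed = std_normal.cdf(z)                  # lower tail only
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))   # both tails
    return one_tailed, two_tailed

# Assumed statistic: a decrease of 1.8 standard errors.
one_tailed, two_tailed = p_values(-1.8)
```

With this assumed statistic, the one-tailed p-value falls below the conventional 0.05 level while the two-tailed p-value does not, which is why the choice between a directional and a non-directional hypothesis should be made before looking at the data.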
In the sciences, null hypotheses are frequently used. The null hypothesis posits no relationship, and the test then determines whether the evidence contradicts it. Here is an example, "the use of foreign language material has not changed over time." This is a conservative approach and works well with statistical techniques, many of which are aimed at measuring the likelihood that a difference found is truly significant [significant means generalizable to a larger population]. If the null hypothesis is rejected, the way is open for an alternative hypothesis.
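One simple way to see the logic of a null hypothesis is an exact permutation test, a technique chosen here only for illustration and not named in the text above. If the period truly makes no difference, the group labels are arbitrary, so we can ask how many relabelings of the pooled data give a difference at least as large as the one observed. The function name and the monthly use counts below are invented.

```python
from itertools import combinations

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    total = sum(pooled)
    n_a, n_b = len(group_a), len(group_b)

    def mean_difference(sum_a):
        # Mean of the first group minus mean of the remaining cases.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(sum(group_a)))
    extreme = 0
    splits = 0
    # Enumerate every way of choosing which cases are labeled "group a".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        splits += 1
        if abs(mean_difference(sum_a)) >= observed:
            extreme += 1
    return extreme / splits

# Invented monthly counts of foreign language items used.
early_period = [12, 15, 14, 13, 16]
late_period = [8, 9, 7, 10, 11]
p_value = exact_permutation_p(early_period, late_period)
```

A small p-value means that few relabelings produce a gap as large as the observed one, which counts as evidence against the null hypothesis of no change. Full enumeration is only practical for small samples like this one.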
Normally, you will filter the number of potential hypotheses to those most likely to offer a persuasive explanation.
The one or two hypotheses selected should be testable, and each hypothesis should clearly indicate what sort of data is needed to support or reject it.
Cross-tabulation is often quite useful in visualizing the relationships to be tested, especially with nominal data. Cross-tabulation consists of rows and columns for variables and values. For example, column 2 might be for the variable "sex" and the values that might be placed in a cell would be "1" for female and "2" for male. Each row would represent a separate case. Nominal data is data that has no arithmetic properties, as in the example above. We cannot, or at least should not, add the 1s and 2s to generate a mean.
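A cross-tabulation of two nominal variables amounts to counting how many cases fall into each combination of values. The sketch below shows this with a handful of invented cases; the variable name "received_bi" and the function name are assumptions made for the example, while the sex codes follow the coding described above.

```python
from collections import Counter

def cross_tabulate(cases, row_var, col_var):
    """Count cases for each combination of two nominal variables."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    row_values = sorted({case[row_var] for case in cases})
    col_values = sorted({case[col_var] for case in cases})
    # Nested dict: table[row value][column value] -> number of cases.
    return {
        r: {c: counts[(r, c)] for c in col_values}
        for r in row_values
    }

# Invented cases, one record per case. Sex is coded 1 = female, 2 = male.
# The codes are labels only; they are counted, never added or averaged.
cases = [
    {"sex": 1, "received_bi": "yes"},
    {"sex": 1, "received_bi": "yes"},
    {"sex": 1, "received_bi": "no"},
    {"sex": 2, "received_bi": "yes"},
    {"sex": 2, "received_bi": "no"},
    {"sex": 2, "received_bi": "no"},
]
table = cross_tabulate(cases, "sex", "received_bi")
```

Each cell of the resulting table holds a count of cases, which is the only arithmetic that nominal codes can properly support.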
Do not confuse causation with association. Most problems have multiple causes and most dependent variables are affected by a variety of independent variables. In order to "prove" that variable B is solely responsible for change in variable A, you need to demonstrate that B happens before A, that every occurrence of B is connected with an occurrence of A, and that no other variable is responsible.
