Title: | Case Classification and Selection Based on Regression Results |
---|---|
Description: | Researchers doing a mixed-methods analysis (nested analysis as developed by Lieberman (2005) <doi:10.1017/S0003055405051762>) can use the package to classify and select cases based on the results of a linear regression. Cases can be designated as typical, deviant, extreme, or pathway cases, and different case selection strategies can be used to choose a case belonging to one of these types. |
Authors: | Ingo Rohlfing [aut, cre] |
Maintainer: | Ingo Rohlfing <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-23 03:54:23 UTC |
Source: | https://github.com/ingorohlfing/mmrcaseselection |
Extremeness of a case is calculated as the difference between the case's value on the independent variable and that variable's mean value.
extreme_on_x(lmobject = NULL, ind_var = NULL)
lmobject |
Object generated with |
ind_var |
Independent variable for which extremeness values should be calculated. Has to be entered as a character. |
Calculating the absolute value of the difference between the cases' values and the variable's mean value is proposed by Seawright, Jason (2016): The Case for Selecting Cases That Are Deviant or Extreme on the Independent Variable. Sociological Methods & Research 45 (3): 493-525. (https://doi.org/10.1177/0049124116643556)
A dataframe with
- all variables in the linear model,
- absolute extremeness (absolute value of difference between variable score and mean value of variable),
- extremeness (difference between variable score and mean value of variable), which can be useful when the direction of extremeness is relevant.
The rows are sorted in decreasing order of absolute extremeness.
df <- lm(mpg ~ disp + wt, data = mtcars)
extreme_on_x(df, "wt")
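The extremeness calculation described above can be reproduced in base R. This is a minimal sketch, not the package's implementation; the column names `extremeness` and `abs_extremeness` are illustrative assumptions and may differ from the names returned by extreme_on_x().

```r
# Base-R sketch of extremeness on an independent variable (here: wt).
fit <- lm(mpg ~ disp + wt, data = mtcars)
dat <- model.frame(fit)                        # all variables in the linear model

# Signed extremeness: deviation of each case from the variable's mean.
dat$extremeness <- dat$wt - mean(dat$wt)
# Absolute extremeness: the size of the deviation, ignoring its direction.
dat$abs_extremeness <- abs(dat$extremeness)

# Sort cases from most to least extreme.
dat <- dat[order(dat$abs_extremeness, decreasing = TRUE), ]
head(dat, 3)
```

The signed column is kept alongside the absolute one so that cases far above the mean can be told apart from cases far below it.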
Extremeness of a case is calculated as the difference between the case's value on the dependent variable and that variable's mean value.
extreme_on_y(lmobject)
lmobject |
Object generated with |
Calculating the absolute value of the difference between the cases' values and the variable's mean value is proposed by Seawright, Jason (2016): The Case for Selecting Cases That Are Deviant or Extreme on the Independent Variable. Sociological Methods & Research 45 (3): 493-525. (https://doi.org/10.1177/0049124116643556)
A dataframe with
- all variables in the linear model,
- absolute extremeness (absolute value of difference between variable score and mean value of variable),
- extremeness (difference between variable score and mean value of variable), which can be useful when the direction of extremeness is relevant.
The rows are sorted in decreasing order of absolute extremeness.
df <- lm(mpg ~ disp + wt, data = mtcars)
extreme_on_y(df)
Identification of the most deviant case (= worst predicted case), based on regression estimates.
most_deviant(lmobject)
lmobject |
Object generated with |
Proposed by Seawright, Jason and John Gerring (2008): Case Selection Techniques in Case Study Research: A Menu of Qualitative and Quantitative Options. Political Research Quarterly 61 (2): 294-308. (https://journals.sagepub.com/doi/pdf/10.1177/1065912907313077)
The most deviant case, that is, the case with the largest absolute residual of all cases.
df <- lm(mpg ~ disp + wt, data = mtcars)
most_deviant(df)
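The selection rule can be checked by hand with base R. The sketch below is an illustration under the assumption that "residual" means observed minus fitted outcome, as returned by residuals(); the column name `residual` is hypothetical and the code is not taken from the package.

```r
# Base-R sketch: find the worst predicted case by absolute residual.
fit <- lm(mpg ~ disp + wt, data = mtcars)
res <- residuals(fit)                          # observed minus fitted outcome

dat <- cbind(model.frame(fit), residual = res)

# The most deviant case has the largest absolute residual.
deviant <- dat[which.max(abs(dat$residual)), ]
deviant
```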
The case with the largest negative difference between the observed value and the predicted value on the outcome. Depending on the research question, there might be a specific interest in the case for which the model performs worst by predicting a value that is larger than the observed one.
most_overpredicted(lmobject)
lmobject |
Object generated with |
The most overpredicted case, that is, the case with the most negative residual.
df <- lm(mpg ~ disp + wt, data = mtcars)
most_overpredicted(df)
The most typical case (= best predicted case) based on regression estimates.
most_typical(lmobject)
lmobject |
Object generated with |
Proposed by Seawright, Jason and John Gerring (2008): Case Selection Techniques in Case Study Research: A Menu of Qualitative and Quantitative Options. Political Research Quarterly 61 (2): 294-308. (https://journals.sagepub.com/doi/pdf/10.1177/1065912907313077)
The most typical case, that is, the case with the smallest absolute residual of all cases.
df <- lm(mpg ~ disp + wt, data = mtcars)
most_typical(df)
The case with the largest positive difference between the observed value and the predicted value on the outcome. Depending on the research question, there might be a specific interest in the case for which the model performs worst by predicting a value that is smaller than the observed one.
most_underpredicted(lmobject)
lmobject |
Object generated with |
The most underpredicted case, that is, the case with the most positive residual.
df <- lm(mpg ~ disp + wt, data = mtcars)
most_underpredicted(df)
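The sign convention behind over- and underprediction is easy to mix up, so a small base-R sketch may help. It assumes the residual is defined as observed minus predicted outcome, as in the descriptions above; the object names are illustrative and the code is not the package source.

```r
# Base-R sketch: signed residuals separate over- from underprediction.
fit <- lm(mpg ~ disp + wt, data = mtcars)
observed <- model.response(model.frame(fit))
res <- observed - fitted(fit)                  # equivalent to residuals(fit)

# Most negative residual: the prediction is too high (overpredicted).
overpredicted <- names(which.min(res))
# Most positive residual: the prediction is too low (underpredicted).
underpredicted <- names(which.max(res))

c(overpredicted = overpredicted, underpredicted = underpredicted)
```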
Calculation of pathway values, defined as the difference between the residuals of the full model and the residuals of a reduced model that lacks the pathway variable. The larger the difference, the more a case qualifies as a pathway case suitable for the analysis of mechanisms.
pathway(full_model, reduced_model)
full_model |
Full model including covariate of interest (= pathway variable) |
reduced_model |
Reduced model excluding covariate of interest |
The difference between the absolute residuals of the full and reduced model follows the approach developed by Weller and Barnes (2014): Finding Pathways: Mixed-Method Research for Studying Causal Mechanisms. Cambridge: Cambridge University Press. (https://doi.org/10.1017/CBO9781139644501)
The calculation of the absolute difference between the full-model and reduced-model residuals, provided that a case's reduced-model residual is larger than its full-model residual, follows the proposal by Gerring (2007): Is There a (Viable) Crucial-Case Method? Comparative Political Studies 40 (3): 231-253. (https://journals.sagepub.com/doi/10.1177/0010414006290784)
A dataframe with
- all full model variables,
- full model residuals (full_resid),
- reduced model residuals (reduced_resid),
- pathway values following Weller/Barnes (pathway_wb),
- pathway values following Gerring (pathway_gvalue),
- variable showing whether Gerring's criterion for a pathway case is met (pathway_gstatus)
df_full <- lm(mpg ~ disp + wt, data = mtcars)
df_reduced <- lm(mpg ~ wt, data = mtcars)
pathway(df_full, df_reduced)
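The two pathway measures can be sketched in base R following the verbal definitions above. Several points are assumptions rather than documented behavior: the direction of the Weller/Barnes difference (reduced minus full, so that positive values indicate a better fit once the pathway variable is added), the comparison of absolute residuals in Gerring's criterion, and the "yes"/"no" coding of the status column. The output of pathway() may differ in these details.

```r
# Base-R sketch of pathway values from a full and a reduced model.
full    <- lm(mpg ~ disp + wt, data = mtcars)  # includes pathway variable disp
reduced <- lm(mpg ~ wt, data = mtcars)         # excludes pathway variable disp

full_resid    <- residuals(full)
reduced_resid <- residuals(reduced)

# Weller/Barnes: difference between absolute residuals.
# Assumption: reduced minus full, so positive = full model fits the case better.
pathway_wb <- abs(reduced_resid) - abs(full_resid)

# Gerring: absolute difference between the residuals, defined only if the
# reduced-model residual is larger (in absolute terms) than the full-model one.
criterion_met  <- abs(reduced_resid) > abs(full_resid)
pathway_gvalue <- ifelse(criterion_met, abs(reduced_resid - full_resid), NA)

out <- data.frame(model.frame(full), full_resid, reduced_resid,
                  pathway_wb, pathway_gvalue,
                  pathway_gstatus = ifelse(criterion_met, "yes", "no"))
head(out[order(out$pathway_wb, decreasing = TRUE), ], 3)
```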
Plot of residuals against pathway variable
pathway_xvr(full_model, reduced_model, pathway_type)
full_model |
Full model including covariate of interest (= pathway variable) |
reduced_model |
Reduced model excluding covariate of interest |
pathway_type |
Type of pathway values. |
A plot of the chosen type of pathway values against the pathway variable, created with ggplot2.
df_full <- lm(mpg ~ disp + wt, data = mtcars)
df_reduced <- lm(mpg ~ wt, data = mtcars)
pathway_xvr(df_full, df_reduced, pathway_type = "pathway_wb")
Cases are designated as typical (= well predicted) or deviant (= badly predicted) based on the prediction interval. The x% prediction interval is the range that is expected to contain x% of outcome values in repeated samples. For example, a 95% prediction interval ranging from 0 to 5 conveys that 95% of future outcome values are expected to fall between 0 and 5. If the observed outcome lies inside the prediction interval, the case is classified (or designated) as typical; otherwise, it is classified as deviant.
predint(lmobject, piwidth = 0.95)
lmobject |
Object generated with |
piwidth |
Width of the prediction interval (default is 0.95). |
Proposed by Rohlfing, Ingo and Peter Starke (2013): Building on Solid Ground: Robust Case Selection in Multi-Method Research. *Swiss Political Science Review* 19 (4): 492-512. (https://doi.org/10.1111/spsr.12052)
A dataframe with the observed outcome, fitted outcome, upper and lower bounds of the prediction interval, and classification of cases as typical or deviant.
df <- lm(mpg ~ disp + wt, data = mtcars)
predint(df, piwidth = 0.9)
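The classification rule can be illustrated with predict() from base R, which returns prediction intervals for lm objects. This is a hedged sketch rather than the package's code: the column names are hypothetical, and treating a case that lies exactly on a bound as typical is an assumption.

```r
# Base-R sketch: classify cases by a 90% prediction interval.
fit <- lm(mpg ~ disp + wt, data = mtcars)

# Columns of the result: fit (fitted value), lwr and upr (interval bounds).
bounds <- predict(fit, newdata = mtcars, interval = "prediction", level = 0.9)

observed <- mtcars$mpg
inside <- observed >= bounds[, "lwr"] & observed <= bounds[, "upr"]

out <- data.frame(observed,
                  fitted = bounds[, "fit"],
                  lower  = bounds[, "lwr"],
                  upper  = bounds[, "upr"],
                  status = ifelse(inside, "typical", "deviant"))
table(out$status)
```

A narrower interval (a smaller `level`) designates more cases as deviant, so the choice of width directly shapes the pool of candidate cases.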
Presented in Rohlfing, Ingo and Peter Starke (2013): Building on Solid Ground: Robust Case Selection in Multi-Method Research. Swiss Political Science Review 19 (4): 492-512. (https://doi.org/10.1111/spsr.12052)
predint_plot(pred_df)
pred_df |
A dataframe created with |
A plot of the observed outcome against the fitted outcome with prediction intervals and case classifications. Created with ggplot2.
df <- lm(mpg ~ disp + wt, data = mtcars)
predint_status <- predint(df, piwidth = 0.9)
predint_plot(predint_status)
The share of the standard deviation of the residuals is used to designate cases as typical or deviant.
residstd(lmobject, stdshare = 1)
lmobject |
Object generated with |
stdshare |
Share of standard deviation of residuals distinguishing between typical and deviant cases (default is 1). |
Proposed by Lieberman, Evan S. (2005): Nested Analysis as a Mixed-Method Strategy for Comparative Research. American Political Science Review 99 (3): 435-452. https://doi.org/10.1017/S0003055405051762.
A dataframe with the observed outcome, fitted outcome, residual standard deviation and classification of cases as typical or deviant.
df <- lm(mpg ~ disp + wt, data = mtcars)
residstd(df, stdshare = 1)
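A base-R sketch of this classification is shown below. It assumes that the threshold is `stdshare` times the sample standard deviation of the residuals, computed with sd(), and that a case is typical when its absolute residual does not exceed the threshold. The package may instead use the residual standard error or a different boundary rule, and the column names here are illustrative.

```r
# Base-R sketch: classify cases by a share of the residual standard deviation.
fit <- lm(mpg ~ disp + wt, data = mtcars)
stdshare <- 1

res <- residuals(fit)
resid_sd <- sd(res)                            # assumption: sample SD of residuals
threshold <- stdshare * resid_sd

out <- data.frame(observed = model.response(model.frame(fit)),
                  fitted   = fitted(fit),
                  resid_sd = resid_sd,
                  status   = ifelse(abs(res) <= threshold, "typical", "deviant"))
table(out$status)
```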
Plot of typical and deviant cases based on residuals' standard deviation
residstd_plot(resid_df)
resid_df |
A dataframe created with |
A plot of the observed outcome against the fitted outcome with interval and case classifications. Created with ggplot2.
df <- lm(mpg ~ disp + wt, data = mtcars)
residstd_status <- residstd(df, stdshare = 1)
residstd_plot(residstd_status)