---
title: "Minimization of partitions"
author: "Ingo Rohlfing"
date: "`r format(Sys.Date())`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Minimization of partitions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
library(QCAcluster)
library(knitr) # nicer html tables
```

Two functions allow empirical researchers to partition clustered data on one or two dimensions and to derive solutions for the pooled data and for each partition.

- `partition_min()` is available for producing *conservative* or *parsimonious* models;  
- `partition_min_inter()` should be used for *intermediate* models. For programming purposes, we opted for a separate function for the intermediate solution.

## Panel data: Minimization of cross sections and time series
We first illustrate how one can decompose panel data on two dimensions. In a *between-unit* perspective, the panel is partitioned into multiple cross sections with the `time` argument that specifies the cross section ID. In a *within-unit* perspective, the data is decomposed into multiple time series with the `units` argument that specifies the unit (or time series) ID. The arguments of the functions are:

- `n_cut`: Frequency threshold for pooled data
- `incl_cut`: Inclusion threshold (a.k.a. consistency threshold) for pooled data
- `solution` (only for `partition_min()`): Either `C` for conservative solution (a.k.a. complex solution) or `P` for parsimonious solution
- `BE_cons` and `WI_cons`: Inclusion thresholds for the cross sections (`BE_cons`) and the time series (`WI_cons`). The length of each numeric vector should equal the number of cross sections and the number of time series, respectively.
- `BE_ncut` and `WI_ncut`: Frequency thresholds for the cross sections (`BE_ncut`) and the time series (`WI_ncut`). The length of each numeric vector should equal the number of cross sections and the number of time series, respectively.
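
The length requirement for the partition-specific thresholds can be checked before calling either function. The following sketch uses a small, made-up panel (the data frame `panel` and its columns are hypothetical and not part of the package) and derives the vector lengths from the number of distinct years and countries. It does not show how the thresholds are matched to individual partitions; the help page of `partition_min()` should be consulted for that.

```r
# Hypothetical toy panel: 3 countries observed in 4 years
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(2001:2004, times = 3)
)

# One cross section per year, one time series per country
n_cross_sections <- length(unique(panel$year))
n_time_series    <- length(unique(panel$country))

# Use the same threshold for every partition of a dimension
BE_cons <- rep(0.8, n_cross_sections)
BE_ncut <- rep(1, n_cross_sections)
WI_cons <- rep(0.8, n_time_series)
WI_ncut <- rep(1, n_time_series)
```

Deriving the lengths from the data in this way avoids hard-coding numbers that would have to be updated when the panel changes.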

### Conservative and parsimonious solution
We first illustrate the parsimonious solution with the dataset from [Thiem (2011)](https://doi.org/10.1017/S1755773910000251).
```{r}
# load data (see data description for details)
data(Thiem2011)
# partition data into time series (within-unit) and cross sections (between-unit)
Thiem_pars <- partition_min(
  dataset = Thiem2011,
  units = "country", time = "year",
  cond = c("fedismfs", "homogtyfs", "powdifffs", "comptvnsfs", 
           "pubsupfs", "ecodpcefs"),
  out = "memberfs",
  n_cut = 6, incl_cut = 0.8,
  solution = "P", # parsimonious solution
  BE_cons = c(0.9, 0.8, 0.7, 0.8, 0.85, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8),
  BE_ncut = rep(1, 11),
  WI_cons = c(0.75, 0.8, 0.9, 0.8, 0.85, rep(0.75, 10)),
  WI_ncut = rep(1, 15))
kable(Thiem_pars)
```

The output of `partition_min()` is a data frame that summarizes the solutions for the pooled data and for the partitions, together with the consistency and coverage values of each solution. The column `model` indicates whether there is model ambiguity for the pooled data or for individual partitions, *if* a model can be derived from the data in the first place.

There are different reasons why one might not be able to derive a partition-specific solution:

- All rows could be consistent.
- All rows could be inconsistent.
- There is no variation across the cases of a partition, so all cases belong to the same truth table row.

When one of these reasons applies, it is listed in the column `solution`.
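
The third reason can be illustrated outside the package. The sketch below uses made-up fuzzy-set data (the data frame `toy` and the conditions `A` and `B` are hypothetical) and assigns each case to a truth table row by checking whether its membership exceeds 0.5. It then counts the distinct rows per cross section. This is only an illustration of the idea and not the check that `partition_min()` performs internally.

```r
# Hypothetical fuzzy-set data: two cross sections with three cases each
toy <- data.frame(
  year = c(2001, 2001, 2001, 2002, 2002, 2002),
  A    = c(0.9, 0.8, 0.7, 0.9, 0.2, 0.6),
  B    = c(0.1, 0.3, 0.2, 0.1, 0.8, 0.4)
)
conds <- c("A", "B")

# Truth table row of each case: 1 if membership is above 0.5, else 0
case_rows <- apply(toy[conds] > 0.5, 1,
                   function(x) paste(as.integer(x), collapse = ""))

# Number of distinct truth table rows per cross section
n_rows <- tapply(case_rows, toy$year, function(x) length(unique(x)))
print(n_rows)
```

In this toy example, all cases of 2001 fall into the same row, so that cross section offers no variation to minimize, whereas the cases of 2002 spread over two rows.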

### Intermediate solution
The intermediate solution is derived with `partition_min_inter()`. The only argument that is new compared to `partition_min()` is `intermediate`, which specifies the *directional expectations*. The data structure for [Schwarz (2016)](https://doi.org/10.1080/07036337.2016.1203309) is an unbalanced panel with eight countries, ten years and 74 observations in total. We assume that one is only interested in the between-unit dimension and wants to derive one solution per cross section. For this reason, the argument for the within-unit dimension (`units`) is not specified.
```{r, error = TRUE}
# load data (see data description for details)
data(Schwarz2016)
# partition data into cross sections
Schwarz_inter <- partition_min_inter(
  Schwarz2016, 
  time = "year", 
  cond = c("poltrans", "ecotrans", "reform", "conflict", "attention"), 
  out = "enlarge", 
  n_cut = 1, incl_cut = 0.8, 
  WI_cons = rep(0.8, 8), BE_cons = c(0.75, 0.75, 0.75, 0.75, 0.75,
                                     0.8, 0.8, 0.8, 0.8, 0.8),
  WI_ncut = rep(1, 8), BE_ncut = rep(1, 10),
  intermediate = c("1", "1", "1", "1", "1"))
kable(Schwarz_inter)
```
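
In the call above, `intermediate` has five entries, one for each of the five conditions in `cond`. The following sketch lines up the two vectors so that the correspondence can be inspected before running the analysis. It assumes that expectations are matched to conditions by position, which is inferred from the equal lengths in the call and not confirmed here. The object `expectations` is a helper introduced only for this illustration.

```r
# Conditions as passed to `cond` in the call above
cond <- c("poltrans", "ecotrans", "reform", "conflict", "attention")

# Same directional expectation for every condition, as in the call above
intermediate <- rep("1", length(cond))

# Assumption: the i-th expectation belongs to the i-th condition
expectations <- data.frame(condition = cond, expectation = intermediate)
print(expectations)
```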

## Multilevel data
Clustered data can be partitioned on a single dimension if there is only one dimension, as in multilevel data where lower-level units are nested in higher-level units. The analysis is then similar to the partition of panel data along one dimension. We use the dataset by [Grauvogel and von Soest (2014)](https://doi.org/10.1017/S1755773910000251) to illustrate the analysis of multilevel data. The study analyzes the effect of sanctions on authoritarian regimes. The data distinguish between the source of the sanction (`Sender`) and the target country (`Target`). All sanctions have been imposed by the EU, UN or US, which means that target countries are nested in three different senders. We partition the data on the dimension of senders to see how solutions differ across senders.

```{r, error = TRUE}
# load data (see data description for details)
data(Grauvogel2014)
# partition data by sender (higher-level unit)
GS_pars <- partition_min(
  dataset = Grauvogel2014,
  units = "Sender",
  cond = c("Comprehensiveness", "Linkage", "Vulnerability",
           "Repression", "Claims"),
  out = "Persistence",
  n_cut = 1, incl_cut = 0.75,
  solution = "P",
  BE_cons = rep(0.75, 3),
  BE_ncut = rep(1, 3))
kable(GS_pars)
```

## Other packages used in this vignette
 Yihui Xie (2021): *knitr: A General-Purpose Package for Dynamic Report Generation in
 R.* R package version 1.33.

 Yihui Xie (2015): *Dynamic Documents with R and knitr.* 2nd edition. Chapman and
 Hall/CRC. ISBN 978-1498716963

 Yihui Xie (2014): *knitr: A Comprehensive Tool for Reproducible Research in R.* In
 Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing
 Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595