---
title: "Cluster analysis in OpenBudget.eu"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cluster analysis in OpenBudget.eu}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

`Cluster.OBeu` is used on the [OpenBudgets.eu](http://openbudgets.eu/tools/) data mining platform, through the [OpenCPU integration of R and JavaScript](https://www.opencpu.org/), to estimate and return the parameters needed for cluster analysis visualizations of budget or expenditure datasets of municipalities across Europe.

This vignette shows how `Cluster.OBeu` is used (in R and in the OpenCPU environment) with [OpenBudgets.eu](http://openbudgets.eu) datasets that follow the [OpenBudgets.eu data model](https://github.com/openbudgets/data-model). Detailed documentation of the OpenBudgets.eu data model can be found [here](http://openbudgets.eu/assets/deliverables/D1.4.pdf).

Both the input and the returned object are in JSON format.

First, load the library:
```{r load, warning=FALSE, include=TRUE}
# load Cluster.OBeu
library(Cluster.OBeu)
```

# Cluster analysis on OpenBudgets.eu platform

`open_spending.cl` is designed to estimate and return the clustering model measures of [OpenBudgets.eu](http://openbudgets.eu/) datasets.

The available clustering algorithms are hierarchical and k-means from base R, pam, clara and fuzzy from the [cluster package](https://CRAN.R-project.org/package=cluster), and model-based algorithms from the [mclust package](https://CRAN.R-project.org/package=mclust). The function can also be used to find the appropriate clustering algorithm and/or the appropriate number of clusters for the input data, according to the internal and stability measures from the [clValid package](https://CRAN.R-project.org/package=clValid).
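
To make the idea of comparing cluster numbers concrete, here is a minimal sketch in base R only. It does not use `Cluster.OBeu` or `clValid`: the toy data are simulated, and the total within-cluster sum of squares stands in for the internal and stability measures mentioned above, so treat it as an illustration of the idea, not of the package's own procedure.

```r
# Toy data: two well-separated groups of points (simulated, not OpenBudgets.eu data)
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Fit k-means for several candidate numbers of clusters and record the
# total within-cluster sum of squares of each fit
candidates <- 1:5
wss <- sapply(candidates, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# A hierarchical alternative on the same data, cut into two groups
hc_groups <- cutree(hclust(dist(toy, method = "euclidean")), k = 2)

print(round(wss, 2))
print(table(hc_groups))
```

A sharp drop in `wss` followed by a plateau suggests a reasonable number of clusters; dedicated validation measures such as those in clValid give a more principled comparison.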

The input data must be a JSON link that follows the [OpenBudgets.eu data model](https://github.com/openbudgets/data-model). The user should define the parameters `dimensions`, `measured.dimensions` and `amounts`, which form the dimensions of the dataset. `open_spending.cl` then uses the `cl.analysis` function to estimate and return JSON data described with the [OpenBudgets.eu data model](https://github.com/openbudgets/data-model).

+--------------------+-------------------------------------------------------------+
| Input              | Description                                                 |
+====================+=============================================================+
| json_data          | The json string, URL or file from Open Spending API         |
+--------------------+-------------------------------------------------------------+
| dimensions         | The dimensions of the input data                            |
+--------------------+-------------------------------------------------------------+
| amounts            | The amounts of the input data                               |
+--------------------+-------------------------------------------------------------+
| measured.dimensions| The dimensions to which amount/numeric variables correspond |
+--------------------+-------------------------------------------------------------+
| cl.aggregate       | Aggregate function of the input data                        |
+--------------------+-------------------------------------------------------------+
| cl.method          | Clustering algorithm                                        |
+--------------------+-------------------------------------------------------------+
| cl.num             | Number of clusters                                          |
+--------------------+-------------------------------------------------------------+
| cl.dist            | Distance metric                                             |
+--------------------+-------------------------------------------------------------+

Table: `open_spending.cl` input 
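
The following base R sketch illustrates one way the three parameters `dimensions`, `measured.dimensions` and `amounts` could turn long-format records into a numeric matrix suitable for clustering. The records are invented for illustration, and the pivoting shown here (with `sum` playing the role of `cl.aggregate`) is an assumption about the general approach, not a copy of the package internals.

```r
# Hypothetical long-format records, shaped like an aggregate query result
records <- data.frame(
  economicClassification.prefLabel = c("Taxes", "Taxes", "Fees", "Fees", "Transfers"),
  fundingClassification.prefLabel  = c("Own funds", "State", "Own funds", "State", "State"),
  amount.sum = c(120, 30, 45, 5, 80)
)

# Parameter values as they would be passed to open_spending.cl
dimensions          <- "economicClassification.prefLabel"
measured.dimensions <- "fundingClassification.prefLabel"
amounts             <- "amount.sum"

# One row per value of `dimensions`, one column per value of
# `measured.dimensions`, cells aggregated with sum
wide <- tapply(
  records[[amounts]],
  list(records[[dimensions]], records[[measured.dimensions]]),
  FUN = sum
)

# Combinations that do not occur in the data are treated as zero amounts
wide[is.na(wide)] <- 0

print(wide)
```

Each row of `wide` is then one observation to be clustered, described by its amounts across the measured dimension.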


The following table gives a short description of the `open_spending.cl` return components:

| Component      | Description                        |
| :------------: | :--------------------------------: |
| cluster.method | Label of the clustering algorithm  |
| raw.data       | Input data                         |
| data.pca       | Principal components               |
| modelparam     | Clustering model specifications    |
| compare        | Clustering measures                |

Table: `open_spending.cl` return
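
As a rough illustration of why principal components appear next to the clustering results, the sketch below projects simulated data onto its first two principal components and attaches k-means cluster labels, which is a common way to prepare a two-dimensional cluster plot. It uses base R only; whether `Cluster.OBeu` computes `data.pca` with `prcomp` is an assumption, and the exact structure of each returned component is defined by the package.

```r
# Simulated numeric data with three variables (not OpenBudgets.eu data)
set.seed(1)
toy <- matrix(rnorm(60), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))

# Principal components of the centred and scaled data
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Scores on the first two components give 2-D coordinates for each row
scores <- pca$x[, 1:2]

# Cluster labels from k-means on the same data
groups <- kmeans(toy, centers = 2, nstart = 10)$cluster

# Coordinates plus cluster membership, ready for a scatter plot
plot_data <- data.frame(scores, cluster = factor(groups))
print(head(plot_data))
```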

# Examples

The dataset in the following example is used on the [OpenBudgets.eu platform](http://openbudgets.eu/) and concerns the income of Aragon in 2007.

## In R environment

The input of the `open_spending.cl` function is a JSON link to data described with the [OpenBudgets.eu data model](https://github.com/openbudgets/data-model).

```{r data}
aragon_income = "http://apps.openbudgets.eu/api/3/cubes/aragon-2007-income__3209b/aggregate?drilldown=fundingClassification.prefLabel%7CeconomicClassification.prefLabel&aggregates=amount.sum"
```

```{r open_spending, eval=FALSE, message=FALSE, warning=FALSE, include=TRUE}
results = open_spending.cl(
  json_data = aragon_income,
  dimensions = "economicClassification.prefLabel",
  amounts = "amount.sum",
  measured.dimensions = "fundingClassification.prefLabel",
  cl.method = "kmeans"
)
# Pretty output using prettify of jsonlite library
jsonlite::prettify(results)
```

## In OpenCPU environment

### Select library and function

1. Go to: yourserver/ocpu/test

2. Copy and paste the following function path into the endpoint field:
```{r, eval=FALSE, include=TRUE}
../library/Cluster.OBeu/R/open_spending.cl
# library/ {name of the library} /R/ {function}
```

3. **Select Method**: **`Post`**

### Add parameters 

Click **add parameters** every time you want to add a new parameter and its value.

4. Define the input data:

    - **Param Name**: **`json_data`**
    - **Param Value** (*URL* of json data): **`"https://apps.openbudgets.eu/api/3/cubes/aragon-2007-income__3209b/aggregate?drilldown=fundingClassification.prefLabel%7CeconomicClassification.prefLabel&aggregates=amount.sum"`** 
    (or any other json URL with the data)

5. Define the *dimensions* parameter:

    - **Param Name**: **`dimensions`**
    - **Param Value**: **`"economicClassification.prefLabel"`**


6. Define the *amounts* parameter:

    - **Param Name**: **`amounts`**
    - **Param Value**: **`"amount.sum"`**

7. Define the *measured dimension* parameter:

    - **Param Name**: **`measured.dimensions`**
    - **Param Value**: **`"fundingClassification.prefLabel"`**


Further parameters can be added in the same way to change the defaults of `cl.method`, `cl.num` and `cl.dist`; see the Cluster.OBeu *reference manual* for further details.


8. Ready! Click on **Ajax request**!

## Results

9. Copy the **/ocpu/tmp/{this_id_number}/R/.val** path (second entry on the right panel).

10. Finally, paste **`yourserver/ocpu/tmp/{this_id_number}/R/.val`** into a new browser tab.
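
The steps above can also be sketched in R, which shows what the test page assembles behind the scenes. The code below only builds the request pieces and does not send anything: `https://yourserver` is a placeholder, `result_url` is a hypothetical helper name, and the form encoding is an assumption about how an HTTP client would submit the parameters. The string values keep their surrounding double quotes, as in the steps above, so that they are passed as character strings.

```r
# Placeholder host: replace with the address of your own OpenCPU server
server <- "https://yourserver"

# Endpoint of the function, following /ocpu/library/{package}/R/{function}
endpoint <- paste0(server, "/ocpu/library/Cluster.OBeu/R/open_spending.cl")

# Parameter values as typed in the test page; string values keep their quotes
params <- c(
  json_data = '"http://apps.openbudgets.eu/api/3/cubes/aragon-2007-income__3209b/aggregate?drilldown=fundingClassification.prefLabel%7CeconomicClassification.prefLabel&aggregates=amount.sum"',
  dimensions = '"economicClassification.prefLabel"',
  amounts = '"amount.sum"',
  measured.dimensions = '"fundingClassification.prefLabel"'
)

# Form-encode the values; repeated = TRUE also encodes the "%" in "%7C",
# which URLencode would otherwise leave untouched as "already encoded"
encoded <- vapply(params, utils::URLencode, character(1),
                  reserved = TRUE, repeated = TRUE)
body <- paste0(names(params), "=", encoded, collapse = "&")

# Hypothetical helper: URL of the returned value for a given session id
result_url <- function(server, session_id) {
  paste0(server, "/ocpu/tmp/", session_id, "/R/.val")
}

print(endpoint)
print(result_url(server, "{this_id_number}"))
```

Any HTTP client that can POST form data to `endpoint` with `body` should produce a session id that can then be passed to `result_url`.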


# Further Details

+ [HTTP in OpenCPU](https://www.opencpu.org/api.html)
+ [OpenCPU Help](https://www.opencpu.org/help.html)
+ [OpenCPU JavaScript Client](https://www.opencpu.org/jslib.html)
+ [OpenCPU on CRAN](https://cran.r-project.org/package=opencpu)

# GitHub
+ [OpenCPU package *development version*](https://github.com/opencpu/opencpu)
+ [Cluster.OBeu package *development version*](https://github.com/okgreece/Cluster.OBeu)