--- title: "How to perform a kfino outlier detection" author: "B. Cloez & I. Sanchez" date: "`r format(Sys.time(), '%B %d, %Y')`" output: html_document: toc: yes toc_float: true number_sections: true vignette: > %\VignetteIndexEntry{How to perform a kfino outlier detection} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(kfino) library(dplyr) library(ggplot2) ``` # Objectives This vignette describes how to use the **kfino** algorithm on time courses in order to detect impulse noised outliers and predict the parameter of interest. **Kalman filter with impulse noised outliers** (kfino) is a robust sequential algorithm allowing to filter data with a large number of outliers. This algorithm is based on simple latent linear Gaussian processes as in the Kalman Filter method and is devoted to detect impulse-noised outliers. These are data points that differ significantly from other observations. `ML` (Maximization Likelihood) and `EM` (Expectation-Maximization algorithm) algorithms were implemented in `kfino`. The method is described in full details in the following arxiv preprint: https://arxiv.org/abs/2208.00961. To test the **kfino** algorithm, we enclosed real data sets into the **kfino** package. Those data sets were created for the publication describing the mobile and automated walk-over-weighing system: https://doi.org/10.1016/j.compag.2018.08.022 To test the feasibility of using an automated weighing prototype suitable for a range of contrasting sheep farming systems, the authors automatically recorded the weight of 15 sheep grazing outdoor in spring. The **kfino** package has 4 data sets available of automated weighing: * `spring1`: contains the weighing data of one animal grazing outdoor in spring (203 data points recorded) * `merinos1`: contains the weighing data of one merinos lamb (397 data points recorded) * `merinos2 `: contains the weighing data of one merinos lamb difficult to model without an appropriate method like **kfino** (345 data points recorded) * `lambs`: contains the weighing data of four merinos lambs # Description of the `spring1` dataset We start by using the `spring1` data set: ```{r} data(spring1) # Dimension of this dataset dim(spring1) head(spring1) ``` The range weight of this animal is between 30 and 75 kg and must be given in `param`, a list of initial parameters to include in the `kfino_fit()` function call. The user can either perform an outlier detection (and prediction) given initial parameters or on optimized initial parameters (on m0, mm and pp). `param` list is composed of: * m0 = (optional) the initial weight, NULL if the user wants to optimize it, * mm = (optional) the target weight, NULL if the user wants to optimize it, * pp = (optional) the probability to be correctly weighed, NULL if the user wants to optimize it, * aa = the rate of weight change, default 0.001, * expertMin = the minimal weight expected by the user, * expertMax = the maximal weight expected by the user, * sigma2_m0 = the variance of m0, default 1, * sigma2_mm = the variance of mm, related to the unit of Tvar, default 0.05, * sigma2_pp = the variance of pp, related to the unit of Yvar, default 5, * K = a constant value in the outlier function (trapezium), by default K=5, * seqp = a vector, sequence of pp probability to be correctly weighted. default seq(0.5,0.7,0.1) # Kfino algorithm on the `spring1` dataset ## Parameters (m0, mm and pp) not optimized If the user chooses to not optimize the initial parameters, all the list must be completed according to expert knowledge of the data set. Here, the user supposes that the initial weight is around 41 and the target one around 45. ```{r,error=TRUE} # --- Without Optimisation on parameters param2<-list(m0=41, mm=45, pp=0.5, aa=0.001, expertMin=30, expertMax=75, sigma2_m0=1, sigma2_mm=0.05, sigma2_pp=5, K=2, seqp=seq(0.5,0.7,0.1)) resu2<-kfino_fit(datain=spring1, Tvar="dateNum",Yvar="Poids", param=param2, doOptim=FALSE, verbose=TRUE) ``` resu2 is a list of 3 elements: * **detectOutlier**: The whole input data set with the detected outliers flagged and the prediction of the analyzed variable. the following columns are joined to the columns present in the input data set: - prediction: the parameter of interest - Yvar - predicted - label_pred: the probability of the value being well predicted - lwr: lower bound of the confidence interval of the predicted value - upr: upper bound of the confidence interval of the predicted value - flag: flag of the value (OK value, KO value (outlier), OOR value (out of range values defined by the user in `kfino_fit`) * **PredictionOK**: A subset of `detectOutlier` data set with the predictions of the analyzed variable on possible values (OK and KO values) * **kfino.results**: kfino results (a list of vectors, prediction, probability to be an outlier , likelihood, confidence interval of the prediction and the flag of the data) on input parameters that were optimized if the user chooses this option ```{r} # structure of detectOutlier data set str(resu2$detectOutlier) # head of PredictionOK data set head(resu2$PredictionOK) # structure of kfino.results list str(resu2$kfino.results) ``` Using the `kfino_plot()`function allows the user to visualize the results: ```{r} # flags are qualitative kfino_plot(resuin=resu2,typeG="quali", Tvar="Day",Yvar="Poids",Ident="IDE") # flags are quantitative kfino_plot(resuin=resu2,typeG="quanti", Tvar="Day",Yvar="Poids",Ident="IDE") ``` ## Parameters (m0, mm and pp) optimized The user can use either (Maximization Likelihood) `ML` or (Expectation-Maximization algorithm) `EM` method. If the user chooses to optimize the initial parameters, m0, mm and pp must be set to NULL. ```{r} # --- With Optimisation on parameters param1<-list(m0=NULL, mm=NULL, pp=NULL, aa=0.001, expertMin=30, expertMax=75, sigma2_m0=1, sigma2_mm=0.05, sigma2_pp=5, K=2, seqp=seq(0.5,0.7,0.1)) ``` ### Maximized Likelihood (ML) method ```{r,error=TRUE} resu1<-kfino_fit(datain=spring1, Tvar="dateNum",Yvar="Poids", param=param1, doOptim=TRUE, method="ML", verbose=TRUE) # flags are qualitative kfino_plot(resuin=resu1,typeG="quali", Tvar="Day",Yvar="Poids",Ident="IDE") # flags are quantitative kfino_plot(resuin=resu1,typeG="quanti", Tvar="Day",Yvar="Poids",Ident="IDE") ``` Prediction of the weight on the cleaned dataset: ```{r,error=TRUE} kfino_plot(resuin=resu1,typeG="prediction", Tvar="Day",Yvar="Poids",Ident="IDE") ``` ### Expectation-Maximization (EM) method ```{r,error=TRUE} resu1b<-kfino_fit(datain=spring1, Tvar="dateNum",Yvar="Poids", param=param1, doOptim=TRUE, method="EM", verbose=TRUE) # flags are qualitative kfino_plot(resuin=resu1b,typeG="quali", Tvar="Day",Yvar="Poids",Ident="IDE") # flags are quantitative kfino_plot(resuin=resu1b,typeG="quanti", Tvar="Day",Yvar="Poids",Ident="IDE") kfino_plot(resuin=resu1b,typeG="prediction", Tvar="Day",Yvar="Poids",Ident="IDE") ``` # Description of the `merinos1` dataset The user can test the **kfino** method using another data set given in the package. Here, we test with the `merinos1` data set on a ewe lamb. For this animal,the range weight is between 10 and 45 kg and must be given in the initial parameters of the `kfino_fit()`function. ```{r} data(merinos1) # Dimension of this dataset dim(merinos1) head(merinos1) ``` # Kfino algorithm on the `merinos1` dataset ## Parameters (m0, mm and pp) optimized ```{r,error=TRUE} # --- With Optimisation on parameters param3<-list(m0=NULL, mm=NULL, pp=NULL, aa=0.001, expertMin=10, expertMax=45, sigma2_m0=1, sigma2_mm=0.05, sigma2_pp=5, K=2, seqp=seq(0.5,0.7,0.1)) resu3<-kfino_fit(datain=merinos1, Tvar="dateNum",Yvar="Poids", doOptim=TRUE,param=param3, verbose=TRUE) # flags are qualitative kfino_plot(resuin=resu3,typeG="quali", Tvar="Day",Yvar="Poids",Ident="IDE") # flags are quantitative kfino_plot(resuin=resu3,typeG="quanti", Tvar="Day",Yvar="Poids",Ident="IDE") ``` Prediction of the weight on the cleaned dataset: ```{r,error=TRUE} kfino_plot(resuin=resu3,typeG="prediction", Tvar="Day",Yvar="Poids",Ident="IDE") ``` # References 1. E.González-García *et. al.* (2018) A mobile and automated walk-over-weighing system for a close and remote monitoring of liveweight in sheep. vol 153: 226-238. https://doi.org/10.1016/j.compag.2018.08.022 2. González García, Eliel, 2021, Individual liveweight of Mérinos d'Arles ewelambs, measured with a Walk-over-Weighing (WoW) system under Mediterranean grazing conditions, https://doi.org/10.15454/IXSHF7, Recherche Data Gouv, V5, UNF:6:q4HEDt0n8nzxYRxc+9KK8g==[fileUNF] # Session informations ```{r session,echo=FALSE,message=FALSE, warning=FALSE} sessionInfo() ```