Documentation ¶
Overview ¶
Package model provides models for item rating and ranking.
There are two kinds of models: rating models and ranking models. Although a rating model could be used for ranking, its performance is not guaranteed and may not even be meaningful, and vice versa.
- Item rating models include: Random, Baseline, SVD(optimizer=Regression), SVD++, NMF, KNN, SlopeOne, CoClustering
- Item ranking models include: ItemPop, WRMF, SVD(optimizer=BPR)
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BPR ¶
type BPR struct {
Base
// Model parameters
UserFactor [][]float64 // p_u
ItemFactor [][]float64 // q_i
// Fallback model
UserRatings []*base.MarginalSubSet
ItemPop *ItemPop
// contains filtered or unexported fields
}
BPR (Bayesian Personalized Ranking) is a pairwise learning algorithm for matrix factorization models with implicit feedback. The pairwise ranking between item i and item j for user u is estimated by:
p(i >_u j) = \sigma( p_u^T (q_i - q_j) )
Hyper-parameters:
Reg - The regularization parameter of the cost function that is optimized. Default is 0.01.
Lr - The learning rate of SGD. Default is 0.05.
NFactors - The number of latent factors. Default is 10.
NEpochs - The number of iterations of the SGD procedure. Default is 100.
InitMean - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.001.
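For illustration, a minimal standalone sketch of the pairwise preference estimate above; bprPairwiseProb is a hypothetical helper, not part of this package:
package sketch

import "math"

// bprPairwiseProb estimates p(i >_u j) = sigma(p_u^T (q_i - q_j)) from a
// user factor and the factors of two items.
func bprPairwiseProb(pu, qi, qj []float64) float64 {
	x := 0.0
	for k := range pu {
		x += pu[k] * (qi[k] - qj[k])
	}
	return 1.0 / (1.0 + math.Exp(-x)) // logistic sigmoid
}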
func (*BPR) Fit ¶
func (bpr *BPR) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the BPR model.
type Base ¶
type Base struct {
Params base.Params // Hyper-parameters
UserIndexer *base.Indexer // Users' ID set
ItemIndexer *base.Indexer // Items' ID set
// contains filtered or unexported fields
}
The Base model must be embedded by every recommendation model. Hyper-parameters, ID sets, the random number generator and fitting options are managed by the Base model.
func (*Base) Fit ¶
func (model *Base) Fit(trainSet core.DataSet, options *base.RuntimeOptions)
Fit has not been implemented.
func (*Base) Init ¶
func (model *Base) Init(trainSet core.DataSetInterface)
Init the Base model. The method must be called at the beginning of Fit.
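A minimal sketch of this convention, assuming the github.com/zhenghaoz/gorse import paths; MyModel is hypothetical and only illustrates embedding Base and calling Init at the start of Fit:
package mymodel

import (
	"github.com/zhenghaoz/gorse/base"
	"github.com/zhenghaoz/gorse/core"
	"github.com/zhenghaoz/gorse/model"
)

// MyModel is a hypothetical custom model that embeds model.Base.
type MyModel struct {
	model.Base
}

// Fit initializes the embedded Base before any model-specific training.
func (m *MyModel) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions) {
	m.Init(trainSet)
	// ... model-specific training on trainSet ...
}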
type BaseLine ¶
type BaseLine struct {
Base
UserBias []float64 // b_u
ItemBias []float64 // b_i
GlobalBias float64 // mu
// contains filtered or unexported fields
}
BaseLine predicts the rating for given user and item by
\hat{r}_{ui} = b_{ui} = μ + b_u + b_i
If user u is unknown, then the bias b_u is assumed to be zero. The same applies for item i with b_i. Hyper-parameters:
Reg - The regularization parameter of the cost function that is
optimized. Default is 0.02.
Lr - The learning rate of SGD. Default is 0.005.
NEpochs - The number of iterations of the SGD procedure. Default is 20.
RandomState - The random seed. Default is 0.
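A minimal sketch of the prediction rule above; baselinePredict is a hypothetical helper in which negative indices stand for unknown users or items:
package sketch

// baselinePredict computes mu + b_u + b_i; an unknown user or item
// contributes a zero bias.
func baselinePredict(mu float64, userBias, itemBias []float64, u, i int) float64 {
	r := mu
	if u >= 0 && u < len(userBias) {
		r += userBias[u]
	}
	if i >= 0 && i < len(itemBias) {
		r += itemBias[i]
	}
	return r
}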
func NewBaseLine ¶
func NewBaseLine(params base.Params) *BaseLine
NewBaseLine creates a baseline model.
func (*BaseLine) Fit ¶
func (baseLine *BaseLine) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the BaseLine model.
type CoClustering ¶
type CoClustering struct {
Base
GlobalMean float64 // A^{global}
UserMeans []float64 // A^{R}
ItemMeans []float64 // A^{C}
UserClusters []int // ρ(i)
ItemClusters []int // γ(j)
UserClusterMeans []float64 // A^{RC}
ItemClusterMeans []float64 // A^{CC}
CoClusterMeans [][]float64 // A^{COC}
// contains filtered or unexported fields
}
CoClustering [5] is a novel collaborative filtering approach based on a weighted co-clustering algorithm that involves simultaneous clustering of users and items.
Let U={u_i}^m_{i=1} be the set of users such that |U|=m and P={p_j}^n_{j=1} be the set of items such that |P|=n. Let A be the m x n ratings matrix such that A_{ij} is the rating of the user u_i for the item p_j. The approximate matrix \hat{A}_{ij} is given by
\hat{A}_{ij} = A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h)
where g=ρ(i), h=γ(j) and A^R_i, A^C_j are the average ratings of user u_i and item p_j, and A^{COC}_{gh}, A^{RC}_g and A^{CC}_h are the average ratings of the corresponding co-cluster, user-cluster and item-cluster respectively.
Hyper-parameters:
NEpochs - The number of iterations of the optimization procedure. Default is 20.
NUserClusters - The number of user clusters. Default is 3.
NItemClusters - The number of item clusters. Default is 3.
RandomState - The random seed. Default is 0.
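A minimal sketch of the approximation \hat{A}_{ij} above, given precomputed averages; all arguments are illustrative, not the struct's fields:
package sketch

// coClusterPredict computes A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h)
// for user i and item j, where g and h are the user's and item's clusters.
func coClusterPredict(userClusters, itemClusters []int,
	userMeans, itemMeans, userClusterMeans, itemClusterMeans []float64,
	coClusterMeans [][]float64, i, j int) float64 {
	g, h := userClusters[i], itemClusters[j]
	return coClusterMeans[g][h] +
		(userMeans[i] - userClusterMeans[g]) +
		(itemMeans[j] - itemClusterMeans[h])
}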
func NewCoClustering ¶
func NewCoClustering(params base.Params) *CoClustering
NewCoClustering creates a CoClustering model.
func (*CoClustering) Fit ¶
func (coc *CoClustering) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the CoClustering model.
func (*CoClustering) Predict ¶
func (coc *CoClustering) Predict(userId, itemId string) float64
Predict by the CoClustering model.
func (*CoClustering) SetParams ¶
func (coc *CoClustering) SetParams(params base.Params)
SetParams sets hyper-parameters for the CoClustering model.
type FM ¶
type FM struct {
Base
UserFeatures []*base.SparseVector
ItemFeatures []*base.SparseVector
// Model parameters
GlobalBias float64 // w_0
Bias []float64 // w_i
Factors [][]float64 // v_i
// Fallback model
UserRatings []*base.MarginalSubSet
ItemPop *ItemPop
// contains filtered or unexported fields
}
FM is an implementation of the factorization machine [12]. The prediction is given by
\hat y(x) = w_0 + \sum^n_{i=1} w_i x_i + \sum^n_{i=1} \sum^n_{j=i+1} <v_i, v_j>x_i x_j
Hyper-parameters:
Reg - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr - The learning rate of SGD. Default is 0.005.
NFactors - The number of latent factors. Default is 100.
NEpochs - The number of iterations of the SGD procedure. Default is 20.
InitMean - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
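A minimal sketch of the prediction equation above for a dense feature vector; fmPredict and its arguments are illustrative, not the struct's fields:
package sketch

// fmPredict evaluates w_0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
func fmPredict(w0 float64, w []float64, v [][]float64, x []float64) float64 {
	y := w0
	for i := range x {
		y += w[i] * x[i]
	}
	for i := 0; i < len(x); i++ {
		for j := i + 1; j < len(x); j++ {
			dot := 0.0
			for k := range v[i] {
				dot += v[i][k] * v[j][k]
			}
			y += dot * x[i] * x[j]
		}
	}
	return y
}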
func (*FM) Fit ¶
func (fm *FM) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the factorization machine.
type ItemPop ¶
ItemPop recommends items by their popularity. The popularity of an item is defined as its occurrence frequency in the training data set.
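A minimal sketch of this popularity definition; the feedback representation is illustrative:
package sketch

// popularity counts how many times each item occurs in the training
// feedback, which is exactly the popularity used for ranking.
func popularity(feedbackItems []string) map[string]int {
	counts := make(map[string]int)
	for _, item := range feedbackItems {
		counts[item]++
	}
	return counts
}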
func (*ItemPop) Fit ¶
func (pop *ItemPop) Fit(set core.DataSetInterface, options *base.RuntimeOptions)
Fit the ItemPop model.
type KNN ¶
type KNN struct {
Base
GlobalMean float64
SimMatrix [][]float64
LeftRatings []*base.MarginalSubSet
RightRatings []*base.MarginalSubSet
UserRatings []*base.MarginalSubSet
LeftMean []float64 // Centered KNN: user (item) Mean
StdDev []float64 // KNN with Z Score: user (item) standard deviation
Bias []float64 // KNN Baseline: Bias
// contains filtered or unexported fields
}
KNN for collaborative filtering. Hyper-parameters:
Type - The type of KNN ('Basic', 'Centered', 'ZScore', 'Baseline').
Default is 'Basic'.
Similarity - The similarity function. Default is MSD.
UserBased - User based or item based? Default is true.
K - The maximum k neighborhoods to predict the rating. Default is 40.
MinK - The minimum k neighborhoods to predict the rating. Default is 1.
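A simplified sketch of the 'Basic' user-based prediction: keep the K most similar neighbors who rated the item, return their similarity-weighted average rating, and fall back to the global mean when fewer than MinK neighbors are available. The neighbor type and knnPredict are hypothetical, not the struct's fields:
package sketch

import "sort"

// neighbor pairs a similarity with the neighbor's rating of the target item.
type neighbor struct {
	sim, rating float64
}

// knnPredict keeps the k most similar neighbors and returns the
// similarity-weighted average of their ratings.
func knnPredict(neighbors []neighbor, k, minK int, globalMean float64) float64 {
	sort.Slice(neighbors, func(i, j int) bool { return neighbors[i].sim > neighbors[j].sim })
	if len(neighbors) > k {
		neighbors = neighbors[:k]
	}
	if len(neighbors) < minK {
		return globalMean
	}
	weightSum, ratingSum := 0.0, 0.0
	for _, n := range neighbors {
		weightSum += n.sim
		ratingSum += n.sim * n.rating
	}
	if weightSum == 0 {
		return globalMean
	}
	return ratingSum / weightSum
}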
func (*KNN) Fit ¶
func (knn *KNN) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the KNN model.
type KNNImplicit ¶
type KNNImplicit struct {
Base
Matrix [][]float64
Users []*base.MarginalSubSet
}
KNNImplicit is the KNN model for implicit feedback.
func NewKNNImplicit ¶
func NewKNNImplicit(params base.Params) *KNNImplicit
NewKNNImplicit creates a KNN model for implicit feedback.
func (*KNNImplicit) Fit ¶
func (knn *KNNImplicit) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the KNN model.
func (*KNNImplicit) Predict ¶
func (knn *KNNImplicit) Predict(userId, itemId string) float64
Predict by the KNN model.
type NMF ¶
type NMF struct {
Base
GlobalMean float64 // the global mean of ratings
UserFactor [][]float64 // p_u
ItemFactor [][]float64 // q_i
// contains filtered or unexported fields
}
NMF [3] is matrix factorization with non-negative latent factors. The non-negativity constraint is critically important because it ensures good representativeness of the learnt model. Hyper-parameters:
Reg - The regularization parameter of the cost function that is
optimized. Default is 0.06.
NFactors - The number of latent factors. Default is 15.
NEpochs - The number of iterations of the SGD procedure. Default is 50.
InitLow - The lower bound of initial random latent factor. Default is 0.
InitHigh - The upper bound of initial random latent factor. Default is 1.
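A minimal sketch of how InitLow and InitHigh bound the random initialization; initNonNegative is illustrative, not the package's initializer:
package sketch

import "math/rand"

// initNonNegative fills an n x k factor matrix with values drawn
// uniformly from [low, high]; keeping low >= 0 preserves the
// non-negativity that NMF relies on.
func initNonNegative(n, k int, low, high float64, rng *rand.Rand) [][]float64 {
	factors := make([][]float64, n)
	for i := range factors {
		factors[i] = make([]float64, k)
		for j := range factors[i] {
			factors[i][j] = low + rng.Float64()*(high-low)
		}
	}
	return factors
}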
func (*NMF) Fit ¶
func (nmf *NMF) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the NMF model.
type SVD ¶
type SVD struct {
Base
// Model parameters
UserFactor [][]float64 // p_u
ItemFactor [][]float64 // q_i
UserBias []float64 // b_u
ItemBias []float64 // b_i
GlobalMean float64 // mu
// Fallback model
UserRatings []*base.MarginalSubSet
ItemPop *ItemPop
// contains filtered or unexported fields
}
SVD is the matrix factorization algorithm popularized by Simon Funk during the Netflix Prize. The prediction \hat{r}_{ui} is set as:
\hat{r}_{ui} = μ + b_u + b_i + q_i^Tp_u
If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i and q_i. Hyper-parameters:
UseBias - Whether to use bias terms in the SVD model. Default is true.
Reg - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr - The learning rate of SGD. Default is 0.005.
NFactors - The number of latent factors. Default is 100.
NEpochs - The number of iterations of the SGD procedure. Default is 20.
InitMean - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
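A minimal sketch of the prediction rule above; svdPredict is a hypothetical helper in which nil factors and zero biases stand for unknown users or items:
package sketch

// svdPredict computes mu + b_u + b_i + q_i^T p_u, dropping the factor
// term when either side is unknown.
func svdPredict(mu, userBias, itemBias float64, userFactor, itemFactor []float64) float64 {
	r := mu + userBias + itemBias
	if userFactor != nil && itemFactor != nil {
		for k := range userFactor {
			r += userFactor[k] * itemFactor[k]
		}
	}
	return r
}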
func (*SVD) Fit ¶
func (svd *SVD) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SVD model.
type SVDpp ¶
type SVDpp struct {
Base
TrainSet core.DataSetInterface
UserFactor [][]float64 // p_u
ItemFactor [][]float64 // q_i
ImplFactor [][]float64 // y_i
UserBias []float64 // b_u
ItemBias []float64 // b_i
GlobalMean float64 // mu
// contains filtered or unexported fields
}
SVDpp (SVD++) [10] is an extension of SVD taking into account implicit interactions. The predicted \hat{r}_{ui} is:
\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)
Where the y_j terms are a new set of item factors that capture implicit interactions. Here, an implicit rating describes the fact that a user u rated an item j, regardless of the rating value. If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i, q_i and y_i. Hyper-parameters:
Reg - The regularization parameter of the cost function that is
optimized. Default is 0.02.
Lr - The learning rate of SGD. Default is 0.007.
NFactors - The number of latent factors. Default is 20.
NEpochs - The number of iterations of the SGD procedure. Default is 20.
InitMean - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
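A minimal sketch of the prediction rule above; svdppPredict and its arguments are illustrative, with ratedItems playing the role of I_u:
package sketch

import "math"

// svdppPredict computes mu + b_u + b_i + q_i^T (p_u + |I_u|^{-1/2} * sum_{j in I_u} y_j).
func svdppPredict(mu, bu, bi float64, pu, qi []float64, y [][]float64, ratedItems []int) float64 {
	augmented := make([]float64, len(pu))
	copy(augmented, pu)
	if n := len(ratedItems); n > 0 {
		norm := 1.0 / math.Sqrt(float64(n))
		for _, j := range ratedItems {
			for f := range augmented {
				augmented[f] += norm * y[j][f]
			}
		}
	}
	r := mu + bu + bi
	for f := range augmented {
		r += qi[f] * augmented[f]
	}
	return r
}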
func (*SVDpp) Fit ¶
func (svd *SVDpp) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SVD++ model.
type SlopeOne ¶
type SlopeOne struct {
Base
GlobalMean float64 // Mean of ratings in training set
UserRatings []*base.MarginalSubSet // Ratings by each user
UserMeans []float64 // Mean of each user's ratings
Dev [][]float64 // Deviations
}
SlopeOne [4] predicts ratings with predictors of the form f(x) = x + b, precomputing the average difference between the ratings of one item and another for users who rated both.
First, deviations between pairs of items are computed. Given a training set χ, and any two items j and i with ratings u_j and u_i respectively in some user evaluation u (annotated as u∈S_{j,i}(χ)), the average deviation of item i with respect to item j is computed by:
dev_{j,i} = \sum_{u∈S_{j,i}(χ)} \frac{u_j - u_i}{card(S_{j,i}(χ))}
The computation on deviations could be parallelized.
In the predicting stage, given that dev_{j,i} + u_i is a prediction for u_j given u_i, a reasonable predictor might be the average of all such predictions:
P(u)_j = \frac{1}{card(R_j)} \sum_{i∈R_j}(dev_{j,i} + u_i)
where R_j = {i | i ∈ S(u), i \ne j, card(S_{j,i}(χ)) > 0} is the set of all relevant items, and S(u) is the subset of items that have been rated in the user evaluation u.
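A minimal sketch of both stages; the rating representation (one map per user, keyed by item index) and the helper names are illustrative:
package sketch

// slopeOneDev computes dev_{j,i}, the average difference u_j - u_i over
// all users who rated both items, and the number of such users.
func slopeOneDev(ratings []map[int]float64, j, i int) (dev float64, count int) {
	for _, u := range ratings {
		if rj, okj := u[j]; okj {
			if ri, oki := u[i]; oki {
				dev += rj - ri
				count++
			}
		}
	}
	if count > 0 {
		dev /= float64(count)
	}
	return dev, count
}

// slopeOnePredict averages dev_{j,i} + u_i over the relevant items i the
// user has rated, following P(u)_j above.
func slopeOnePredict(ratings []map[int]float64, userRatings map[int]float64, j int) float64 {
	sum, n := 0.0, 0
	for i, ui := range userRatings {
		if i == j {
			continue
		}
		if dev, count := slopeOneDev(ratings, j, i); count > 0 {
			sum += dev + ui
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}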
func NewSlopOne ¶
func NewSlopOne(params base.Params) *SlopeOne
NewSlopOne creates a SlopeOne model.
func (*SlopeOne) Fit ¶
func (so *SlopeOne) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SlopeOne model.
type WRMF ¶
type WRMF struct {
Base
// Model parameters
UserFactor *mat.Dense // p_u
ItemFactor *mat.Dense // q_i
// Fallback model
UserRatings []*base.MarginalSubSet
ItemPop *ItemPop
// contains filtered or unexported fields
}
WRMF [7] is Weighted Regularized Matrix Factorization, which exploits unique properties of implicit feedback datasets. It treats the data as an indication of positive and negative preference associated with vastly varying confidence levels, which leads to a factor model especially tailored for implicit feedback recommenders. The authors also proposed a scalable optimization procedure that scales linearly with the data size. Hyper-parameters:
NFactors - The number of latent factors. Default is 10.
NEpochs - The number of training epochs. Default is 50.
InitMean - The mean of initial latent factors. Default is 0.
InitStdDev - The standard deviation of initial latent factors. Default is 0.1.
Reg - The strength of regularization.
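A minimal sketch of how a (user, item) pair can be scored from the factor matrices, using gonum's mat package as suggested by the *mat.Dense fields; wrmfScore is a hypothetical helper, not the package's Predict:
package sketch

import "gonum.org/v1/gonum/mat"

// wrmfScore returns the inner product p_u^T q_i of the u-th row of the
// user factor matrix and the i-th row of the item factor matrix.
func wrmfScore(userFactor, itemFactor *mat.Dense, u, i int) float64 {
	return mat.Dot(userFactor.RowView(u), itemFactor.RowView(i))
}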
func (*WRMF) Fit ¶
func (mf *WRMF) Fit(set core.DataSetInterface, options *base.RuntimeOptions)
Fit the WRMF model.