Finding rowwise maximum values in sparse matrices using RcppArmadillo

Danny Morris

2019/07/01

This post demonstrates a fast and efficient stategy for finding the maximum value and position of the maximum value for each row in a sparse matrix using RcppArmadillo. This basic function can be used in the context of similarity search where the rows i and columns j are entities in a square matrix and the values represent the similarity between the ith and jth entity. The values in the matrix can be obtained using any means. The position of the rowwise maximum value for the ith row corresponds to the jth entity that is most similar to the ith entity. I’ve successfully used this approach to search large databases for highly similar customer records, indicating potential duplication.

library(Matrix)
library(Rcpp)
library(RcppArmadillo)
library(magrittr)

Sample data

i <- c(1,3:8) 
j <- c(2,9,6:10)
x <- 7 * (1:7)

sparse_matrix <- sparseMatrix(i, j, x = x) 

sparse_matrix
## 8 x 10 sparse Matrix of class "dgCMatrix"
##                              
## [1,] . 7 . . .  .  .  .  .  .
## [2,] . . . . .  .  .  .  .  .
## [3,] . . . . .  .  .  . 14  .
## [4,] . . . . . 21  .  .  .  .
## [5,] . . . . .  . 28  .  .  .
## [6,] . . . . .  .  . 35  .  .
## [7,] . . . . .  .  .  . 42  .
## [8,] . . . . .  .  .  .  . 49

Inline Rcpp functions

rowwise_max <- Rcpp::cppFunction(
  "arma::sp_mat sp_row_max(arma::sp_mat X) {
   return arma::max(X, 1);
  }", depends= "RcppArmadillo"
)

rowwise_max_idx <- Rcpp::cppFunction(
  "arma::urowvec sp_row_max_id(arma::sp_mat X) {
   return arma::index_max(X.t(), 0);
  }", depends= "RcppArmadillo"
)
rowwise_max(sparse_matrix) %>% as.vector()

rowwise_max_idx(sparse_matrix) %>% as.vector()

External C++ script

rowwise_max.cpp

# include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
arma::urowvec rowwise_max2(arma::sp_mat X) {
  return arma::index_max(X, 1);
}

// [[Rcpp::export]]
arma::sp_mat rowwise_max_idx2(arma::sp_mat X) {
  return arma::max(X.t(), 0);
}
rowwise_max2(sparse_matrix) %>% as.vector()
rowwise_max_idx2(sparse_matrix) %>% as.vector()