Notes—Dense Vector and Sparse Vector

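Spark's linear algebra package (spark.ml.linalg, and the older RDD-based spark.mllib.linalg that the code in these notes imports) has two kinds of local vector: DenseVector and SparseVector. Both implement the Vector trait, and both are created through the Vectors factory object.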

1. Differences between the two
DenseVector: a value array

def:
Vectors.dense(values: Array[Double])
(Every element is listed explicitly, zeros included.)

SparseVector: an index array and a value array

def:
Vectors.sparse(size: Int, indices: Array[Int], values: Array[Double])
(Stores the total number of elements, plus the index and the value of each nonzero element.)
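To make the two storage layouts concrete, here is a minimal plain-Scala sketch with no Spark dependency. It expands a (size, indices, values) triple into the full value array a dense vector would hold, and goes back the other way. The names SparseLayoutSketch, expandSparse and compress are hypothetical helpers for illustration only; they are not part of Spark, and Spark's own classes may store and validate the data differently.

```scala
object SparseLayoutSketch {
  // Hypothetical helper: rebuild the full value array from a sparse triple.
  // Positions that are not listed in `indices` stay at 0.0.
  def expandSparse(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val dense = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }

  // Hypothetical helper: the reverse direction, keeping only the nonzero entries.
  def compress(dense: Array[Double]): (Int, Array[Int], Array[Double]) = {
    val nonzero = dense.zipWithIndex.filter { case (v, _) => v != 0.0 }
    (dense.length, nonzero.map(_._2), nonzero.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val dense = expandSparse(3, Array(0, 2), Array(1.0, 3.0))
    println(dense.mkString("(", ", ", ")"))

    val (size, indices, values) = compress(dense)
    println(s"size=$size indices=${indices.mkString(",")} values=${values.mkString(",")}")
  }
}
```

The sparse form pays off when most entries are zero: it stores two short arrays instead of one long one. For a vector with few zeros, the dense form is smaller and simpler.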

import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0).
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)

// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values
// corresponding to nonzero entries.
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Create a sparse vector (1.0, 0.0, 3.0) by specifying its nonzero entries.
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))
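Whichever way a vector was built, it can be read through the same Vector interface and converted to the other representation. The following is a small sketch, assuming spark-mllib is on the classpath; no SparkContext is needed for local vectors. The object name VectorInspectSketch is made up for this example.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object VectorInspectSketch {
  def main(args: Array[String]): Unit = {
    val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
    val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    // Both representations expose the same logical size and element values.
    println(dv.size)
    println(sv(1)) // element at index 1, which is not stored explicitly
    println(sv.toArray.mkString(", "))

    // numNonzeros counts the entries whose value is not zero.
    println(dv.numNonzeros)

    // Convert between the two representations.
    val asSparse = dv.toSparse
    val asDense = sv.toDense
    println(asSparse.indices.mkString(", "))
    println(asDense.values.mkString(", "))
  }
}
```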

2. LabeledPoint: a point with a class label

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Create a labeled point with a positive label and a dense feature vector.
// In other words, this dense feature vector has label 1.
val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))

// Create a labeled point with a negative label and a sparse feature vector.
// In other words, this sparse feature vector has label 0.
val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
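A LabeledPoint simply pairs a Double label with a feature vector, so both parts can be read back directly. A brief sketch, again assuming spark-mllib is on the classpath; LabeledPointSketch is a made-up name for this example.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object LabeledPointSketch {
  def main(args: Array[String]): Unit = {
    val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))

    // The label and the feature vector are plain fields of the case class.
    println(neg.label)
    println(neg.features.size)
    println(neg.features.toArray.mkString(", "))

    // Because LabeledPoint is a case class, it can also be destructured.
    val LabeledPoint(label, features) = neg
    println(s"label=$label, nonzeros=${features.numNonzeros}")
  }
}
```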