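Multi-dimensional visualisation in R (showing multiple dimensions in a 2D ggplot chart)

Reposted from http://www.edvancer.in/create-a-multi-dimensional-visualisation-in-r/

The gist: starting from a 2D plot, use different symbols, colours, sizes and so on to represent additional dimensions.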

The aim of any visualisation is to gain insight into the data, and that insight should by no means be limited to two factors at a time, because in real life any process involves multiple factors. The challenge is that a traditional scatter plot can be scaled to at most 3 dimensions; beyond that, it becomes impossible to add more axes. But I don’t agree that the inability to add more axes restricts the number of dimensions you can show in a scatter plot. Contrary to general belief, visualisation on a 2D plane is not restricted to just two dimensions.

Let me give you a simple example using the “mtcars” data set in R. Those who are not really interested in programming can ignore the code. The mtcars data set records mileage (miles per gallon) along with ten other design and performance attributes for 32 cars. Here is a quick look at the data.

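library(knitr)  # provides kable()
kable(head(mtcars, 3))

|              |  mpg| cyl| disp|  hp| drat|    wt|  qsec| vs| am| gear| carb|
|:-------------|----:|---:|----:|---:|----:|-----:|-----:|--:|--:|----:|----:|
|Mazda RX4     | 21.0|   6|  160| 110| 3.90| 2.620| 16.46|  0|  1|    4|    4|
|Mazda RX4 Wag | 21.0|   6|  160| 110| 3.90| 2.875| 17.02|  0|  1|    4|    4|
|Datsun 710    | 22.8|   4|  108|  93| 3.85| 2.320| 18.61|  1|  1|    4|    1|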
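
Several mtcars columns are numeric codes rather than labels (per the data set's documentation, am is 0 for automatic and 1 for manual). As a minimal sketch, assuming we want the readable names used in this post (Mileage, Weight, Transmission, Gears, Cylinders) as actual columns, a labelled copy could be built as follows. The name `cars_named` is hypothetical and does not come from the original post.

```r
# Build a labelled copy of mtcars; `cars_named` is a hypothetical name.
cars_named <- data.frame(
  Mileage      = mtcars$mpg,                 # miles per (US) gallon
  Weight       = mtcars$wt,                  # weight in 1000 lbs
  Transmission = factor(mtcars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual")),
  Gears        = factor(mtcars$gear),        # number of forward gears
  Cylinders    = factor(mtcars$cyl),         # number of cylinders
  row.names    = rownames(mtcars)
)

# Inspect the resulting column types
str(cars_named)
```

Turning the coded columns into factors matters for plotting: ggplot2 treats a numeric column as continuous, while a factor gets a discrete legend with one entry per level.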

We’ll start with a simple 2D plot depicting how mileage varies with the weight of the vehicle.

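library(ggplot2)
# mtcars names these columns mpg (miles per gallon) and wt (weight in 1000 lbs)
p <- ggplot(mtcars, aes(y = mpg, x = wt))
p + geom_point(size = 4) + labs(y = "Mileage (mpg)", x = "Weight (1000 lbs)")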

[Figure: scatter plot of Mileage against Weight]

As is apparent from the plot, Mileage goes down as Weight increases. Now let’s add another dimension and see how the type of transmission relates to mileage.

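# am is coded 0 = automatic, 1 = manual; a labelled factor gives a discrete colour legend
p <- ggplot(mtcars, aes(y = mpg, x = wt, color = factor(am, labels = c("Automatic", "Manual"))))
p + geom_point(size = 4) + labs(y = "Mileage (mpg)", x = "Weight (1000 lbs)", color = "Transmission")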

[Figure: Mileage against Weight, coloured by Transmission]

We can see that most of the cars with manual transmission tend to have higher mileage (mtcars codes am as 0 = automatic, 1 = manual). One thing to note here is that most of the heavy cars have automatic transmission, so weight might be the real underlying reason why manual cars show higher mileage.
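
The suspicion that weight may be the real driver behind the mileage gap between transmission types can be checked numerically. The following is a minimal sketch using base R only; the variable name `by_trans` is an assumption made for illustration.

```r
# Mean mileage and mean weight per transmission type
# (am is coded 0 = automatic, 1 = manual in mtcars)
by_trans <- aggregate(cbind(mpg, wt) ~ am, data = mtcars, FUN = mean)
by_trans$am <- factor(by_trans$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
print(by_trans)
```

The table puts average mileage and average weight side by side for each transmission type, so you can see whether the higher-mileage group is also the lighter one.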

OK, now let’s add one more dimension to find out how the number of gears changes across these vehicles.

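# Mapping size to a discrete variable works, though ggplot2 warns that it is not advised
p <- ggplot(mtcars, aes(y = mpg, x = wt, color = factor(am, labels = c("Automatic", "Manual")), size = factor(gear)))
p + geom_point() + scale_size_discrete(range = c(4, 6)) +
  labs(y = "Mileage (mpg)", x = "Weight (1000 lbs)", color = "Transmission", size = "Gears")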

[Figure: Mileage against Weight, coloured by Transmission, sized by Gears]

You can see that the number of gears shows no clear-cut relationship with mileage in this plot, since each gear count appears across a wide range of mileage; the same goes for weight. But a curious thing to observe is that cars with manual transmission have more gears than cars with automatic transmission: in this data set automatics have only 3 or 4 gears, while manuals have 4 or 5. In fact, there seems to be a limit on the number of gears in the automatic transmission cars.
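
The relationship between transmission type and number of gears is easier to verify in a cross-tabulation than by reading point sizes off a plot. A minimal sketch in base R; the names `trans` and `gear_tab` are hypothetical.

```r
# Count cars for every combination of transmission type and number of gears
trans <- factor(mtcars$am, levels = c(0, 1),
                labels = c("Automatic", "Manual"))
gear_tab <- table(Transmission = trans, Gears = mtcars$gear)
print(gear_tab)
```

A zero in a cell means that combination never occurs in the data, which is exactly the kind of "limit" the plot hints at.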

Let’s add one more dimension depicting the number of cylinders in the engine.

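p <- ggplot(mtcars, aes(y = mpg, x = wt, color = factor(am, labels = c("Automatic", "Manual")),
                        size = factor(gear), shape = factor(cyl)))
p + geom_point() + scale_size_discrete(range = c(4, 6)) +
  labs(y = "Mileage (mpg)", x = "Weight (1000 lbs)", color = "Transmission", size = "Gears", shape = "Cylinders")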

[Figure: Mileage against Weight, coloured by Transmission, sized by Gears, shaped by Cylinders]

You can see that the number of cylinders certainly seems to be related to mileage. Low-mileage, heavy cars tend to have 8 cylinders, whereas high-mileage, light cars tend to have 4 cylinders.
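
The pattern across cylinder counts can also be summarised in numbers. This is a small sketch in base R, not part of the original post; `by_cyl` is a hypothetical name.

```r
# Mean mileage and mean weight for each cylinder count (4, 6, 8)
by_cyl <- aggregate(cbind(mpg, wt) ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)
```

Reading down the rows shows how average mileage and average weight move as the cylinder count increases.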

If you have noticed, we now have 5 dimensions in a 2D plot. The lesson here is that visualising multiple factors (dimensions) is not really about making an n-dimensional plot, which is impossible: it is not feasible to add more axes to your plot. What we can do, however, is give more features to our “points”, which is exactly what we have done here. The 3 additional dimensions were introduced by adding features like shape, size and colour to our points.

That’s what I wanted to convey: let your imagination (and Hadley Wickham, the author of ggplot2) take you beyond those dimensionality constraints! Happy plotting in R!