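I need to draw a bar chart showing counts and a line chart showing a rate in a single plot. I can do each of them separately, but when I put them together, the scale of my first layer (i.e. geom_bar) is overlapped by the second layer (i.e. geom_line).
Can I move the axis of the geom_line to the right?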
Current answer
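A common use case for dual y-axes is a climatogram showing monthly temperature and precipitation. Here is a simple solution, generalized from Megatron's solution, that lets you set the lower limit of the variables to something other than zero:
Example data:
library(ggplot2)
library(tibble)
climate <- tibble(
  Month = 1:12,
  Temp = c(-4,-4,0,5,11,15,16,15,11,6,1,-3),
  Precip = c(49,36,47,41,53,65,81,89,90,84,73,55)
)
Set the following two values close to the limits of the data (you can play around with them to adjust the position of the graphs; the axes will still be correct):
ylim.prim <- c(0, 180)   # in this example, precipitation
ylim.sec <- c(-4, 18)    # in this example, temperature
The following makes the necessary calculations based on these limits and produces the plot itself:
b <- diff(ylim.prim)/diff(ylim.sec)   # slope: primary-axis units per secondary-axis unit
a <- ylim.prim[1] - b*ylim.sec[1]     # intercept: maps ylim.sec[1] onto ylim.prim[1]
ggplot(climate, aes(Month, Precip)) +
  geom_col() +
  geom_line(aes(y = a + Temp*b), color = "red") +
  scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
  scale_x_continuous("Month", breaks = 1:12) +
  ggtitle("Climatogram for Oslo (1961-1990)")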
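As a quick sanity check on the arithmetic, here is a minimal sketch in base R that reuses the same ylim.prim and ylim.sec. The helper names to_primary and to_secondary are made up for illustration; they only wrap the a + b*x mapping and its inverse.

```r
# Limits copied from the answer above
ylim.prim <- c(0, 180)   # precipitation (left axis)
ylim.sec  <- c(-4, 18)   # temperature (right axis)

b <- diff(ylim.prim) / diff(ylim.sec)
a <- ylim.prim[1] - b * ylim.sec[1]

# Hypothetical helpers: the forward mapping used in geom_line(),
# and the inverse used in sec_axis()
to_primary   <- function(temp) a + temp * b
to_secondary <- function(y) (y - a) / b

# The secondary limits should land on the primary limits
endpoints <- to_primary(ylim.sec)
print(endpoints)

# Converting back should recover the original temperatures
round_trip <- to_secondary(to_primary(c(-4, 0, 16)))
print(round_trip)
```

If the endpoints do not match ylim.prim (up to floating-point error), the line and the right-hand axis would be out of step with each other.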
If you want to make it clear that the red line corresponds to the right-hand y-axis, you can add a theme() call to the code:
ggplot(climate, aes(Month, Precip)) +
  geom_col() +
  geom_line(aes(y = a + Temp*b), color = "red") +
  scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
  scale_x_continuous("Month", breaks = 1:12) +
  theme(axis.line.y.right = element_line(color = "red"),
        axis.ticks.y.right = element_line(color = "red"),
        axis.text.y.right = element_text(color = "red"),
        axis.title.y.right = element_text(color = "red")
  ) +
  ggtitle("Climatogram for Oslo (1961-1990)")
This colors the right-hand axis red to match the line.
Other answers
I acknowledge and agree with Hadley (and others) that separate y scales are "fundamentally flawed". That said, I often wish ggplot2 had this feature, particularly when the data is in wide format and I want to quickly visualise or check it (i.e. for personal use only).
While the tidyverse library makes it fairly easy to convert the data to long format (so that facet_grid() will work), the process is still not trivial, as shown below:
library(tidyverse)
df.wide %>%
  # Select only the columns you need for the plot.
  select(date, column1, column2, column3) %>%
  # Add a row id so that each original row stays identifiable after reshaping.
  mutate(id = row_number()) %>%
  # The `gather()` function converts to long format.
  # The `type` column will contain the three column names (column1, column2, column3),
  # and the `value` column will contain the respective values.
  # All the while we retain the `id` and `date` columns.
  gather(type, value, -id, -date) %>%
  # Create the plot according to your specifications.
  ggplot(aes(x = date, y = value)) +
  geom_line() +
  # Create a panel for each `type` (i.e. column1, column2, column3).
  # If the types have different scales, you can use the `scales = "free"` option.
  facet_grid(type~., scales = "free")
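For what it's worth, tidyr 1.0.0 and later provide pivot_longer() as the successor to gather(). A minimal sketch of the same reshaping follows, assuming a made-up df.wide with a date column and three numeric columns; the data is invented purely for illustration.

```r
library(tidyr)
library(ggplot2)

# Invented example data; substitute your own wide data frame
df.wide <- data.frame(
  date    = as.Date("2020-01-01") + 0:4,
  column1 = c(1, 2, 3, 4, 5),
  column2 = c(100, 80, 120, 90, 110),
  column3 = c(0.1, 0.3, 0.2, 0.5, 0.4)
)

# One row per (date, type) pair; no id column is needed here
df.long <- pivot_longer(
  df.wide,
  cols      = c(column1, column2, column3),
  names_to  = "type",
  values_to = "value"
)

# One panel per original column, each with its own y scale
p <- ggplot(df.long, aes(x = date, y = value)) +
  geom_line() +
  facet_grid(type ~ ., scales = "free_y")
print(p)
```

Each series gets its own panel and its own y scale, which sidesteps the dual-axis question at the cost of a taller figure.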
As of ggplot2 2.2.0, you can add a secondary axis like this (taken from the ggplot2 2.2.0 announcement):
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    "mpg (US)",
    sec.axis = sec_axis(~ . * 1.20, name = "mpg (UK)")
  )
I found the answer based on ylim.prim and ylim.sec the most helpful, but there were some edge cases it did not seem to handle correctly, in particular negative values and the case where the limits have a distance of 0 (which can happen if we take the limits from the max/min of the data). Testing seems to indicate that the version below works consistently.
I use the following code. Here I assume we have [x1,x2] and want to transform it to [y1,y2]. My approach is to transform [x1,x2] to [0,1] (a simple enough transformation), and then [0,1] to [y1,y2].
library(ggplot2)
library(tibble)
climate <- tibble(
  Month = 1:12,
  Temp = c(-4,-4,0,5,11,15,16,15,11,6,1,-3),
  Precip = c(49,36,47,41,53,65,81,89,90,84,73,55)
)
# Set the limits of each axis manually:
ylim.prim <- c(0, 180)   # in this example, precipitation
ylim.sec <- c(-4, 18)    # in this example, temperature
# If all values are the same, the range has zero width and the transformation breaks,
# so widen that range by 1 on each side before computing the slope.
if (diff(ylim.sec) == 0) {
  ylim.sec <- c(ylim.sec[1]-1, ylim.sec[2]+1)
}
if (diff(ylim.prim) == 0) {
  ylim.prim <- c(ylim.prim[1]-1, ylim.prim[2]+1)
}
b <- diff(ylim.sec)/diff(ylim.prim)   # secondary-axis units per primary-axis unit
ggplot(climate, aes(Month, Precip)) +
  geom_col() +
  geom_line(aes(y = ylim.prim[1]+(Temp-ylim.sec[1])/b), color = "red") +
  scale_y_continuous("Precipitation", sec.axis = sec_axis(~((.-ylim.prim[1]) *b + ylim.sec[1]), name = "Temperature"), limits = ylim.prim) +
  scale_x_continuous("Month", breaks = 1:12) +
  ggtitle("Climatogram for Oslo (1961-1990)")
The key parts here are that we transform the secondary y-axis with ~((.-ylim.prim[1]) *b + ylim.sec[1]) and then apply the inverse to the actual values: y = ylim.prim[1]+(Temp-ylim.sec[1])/b. We should also make sure that limits = ylim.prim.
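The two-step mapping can also be sketched as a small base R helper. The name rescale_range is hypothetical and not a ggplot2 function; it maps [x1,x2] to [0,1] and then to [y1,y2], and swapping the two ranges gives the inverse, provided neither range has zero width.

```r
# Hypothetical helper: map values from the range `from` onto the range `to`
rescale_range <- function(x, from, to) {
  # A zero-width source range would divide by zero, so widen it first
  if (diff(from) == 0) from <- from + c(-1, 1)
  unit <- (x - from[1]) / diff(from)   # [x1, x2] -> [0, 1]
  to[1] + unit * diff(to)              # [0, 1]   -> [y1, y2]
}

ylim.prim <- c(0, 180)   # precipitation
ylim.sec  <- c(-4, 18)   # temperature

temp <- c(-4, 7, 18)
on_primary <- rescale_range(temp, from = ylim.sec, to = ylim.prim)
print(on_primary)

# Swapping the ranges gives the inverse, which is what sec_axis() needs
back <- rescale_range(on_primary, from = ylim.prim, to = ylim.sec)
print(back)
```

In a plot, the first call would go inside aes(y = ...) and the swapped call inside sec_axis().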
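Hadley's answer references Stephen Few's report "Dual-Scaled Axes in Graphs: Are They Ever the Best Solution?".
I don't know what "counts" and "rate" mean in the OP, but a quick search turns up Counts and Rates, so I took some data on accidents in North American mountaineering [1]: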
Years<-c("1998","1999","2000","2001","2002","2003","2004")
Persons.Involved<-c(281,248,301,276,295,231,311)
Fatalities<-c(20,17,24,16,34,18,35)
rate=100*Fatalities/Persons.Involved
df<-data.frame(Years=Years,Persons.Involved=Persons.Involved,Fatalities=Fatalities,rate=rate)
print(df,row.names = FALSE)
 Years Persons.Involved Fatalities      rate
  1998              281         20  7.117438
  1999              248         17  6.854839
  2000              301         24  7.973422
  2001              276         16  5.797101
  2002              295         34 11.525424
  2003              231         18  7.792208
  2004              311         35 11.254019
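Then I tried to draw the graph as Few suggests on page 7 of the report mentioned above (and, as the OP requested, with the counts as a bar chart and the rate as a line chart):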
The other less obvious solution, which works only for time series, is to convert all sets of values to a common quantitative scale by displaying percentage differences between each value and a reference (or index) value. For instance, select a particular point in time, such as the first interval that appears in the graph, and express each subsequent value as the percentage difference between it and the initial value. This is done by dividing the value at each point in time by the value for the initial point in time and then multiplying it by 100 to convert the rate to a percentage, as illustrated below.
library(ggplot2)
# Index both series to their first value (1998 = 100)
df2 <- df
df2$Persons.Involved <- 100*df$Persons.Involved/df$Persons.Involved[1]
df2$rate <- 100*df$rate/df$rate[1]
plot(ggplot(df2) +
       geom_bar(aes(x = Years, weight = Persons.Involved)) +
       geom_line(aes(x = Years, y = rate, group = 1)) +
       theme(text = element_text(size = 30))
)
That gives the result I was aiming for, but I don't like it very much, and I can't easily add a legend to it...
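One possible way to get a legend, sketched here on the assumption that the indexed df2 is what you want to plot, is to map constant strings to fill and colour inside aes() and let ggplot2 build the guides. The label strings and colours below are arbitrary choices.

```r
library(ggplot2)

# Same data as above, rebuilt so that this sketch runs on its own
df <- data.frame(
  Years            = c("1998", "1999", "2000", "2001", "2002", "2003", "2004"),
  Persons.Involved = c(281, 248, 301, 276, 295, 231, 311),
  Fatalities       = c(20, 17, 24, 16, 34, 18, 35)
)
df$rate <- 100 * df$Fatalities / df$Persons.Involved

# Index both series to their 1998 value (1998 = 100)
df2 <- df
df2$Persons.Involved <- 100 * df$Persons.Involved / df$Persons.Involved[1]
df2$rate             <- 100 * df$rate / df$rate[1]

# Constant strings inside aes() create one legend entry per layer
p <- ggplot(df2, aes(x = Years)) +
  geom_col(aes(y = Persons.Involved, fill = "Persons involved")) +
  geom_line(aes(y = rate, group = 1, colour = "Fatality rate")) +
  scale_fill_manual(name = NULL, values = c("Persons involved" = "grey60")) +
  scale_colour_manual(name = NULL, values = c("Fatality rate" = "red")) +
  ylab("Index (1998 = 100)")
print(p)
```

Because both series share the single indexed scale, no secondary axis is needed and the legend alone tells the layers apart.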
[1] Williamson, Jed, et al. Accidents in North American Mountaineering 2005. The Mountaineers Books, 2005.