在R中,mean()和median()是标准函数,它们执行您所期望的功能。Mode()告诉您对象的内部存储模式,而不是参数中出现次数最多的值。但是是否存在一个标准库函数来实现向量(或列表)的统计模式?


当前回答

对Ken Williams的回答做了一个小修改,增加了可选的params na。Rm和return_multiple。

与依赖names()的答案不同,此答案在返回值中维护x的数据类型。

stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
  if(na.rm){
    x <- na.omit(x)
  }
  ux <- unique(x)
  freq <- tabulate(match(x, ux))
  mode_loc <- if(return_multiple) which(freq==max(freq)) else which.max(freq)
  return(ux[mode_loc])
}

要显示它与可选参数一起工作并维护数据类型:

foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)

str(stat_mode(foo)) # int [1:3] 2 4 NA
str(stat_mode(bar)) # chr [1:3] "mouse" "cat" NA
str(stat_mode(bar, na.rm=T)) # chr [1:2] "mouse" "cat"
str(stat_mode(bar, return_mult=F, na.rm=T)) # chr "mouse"

感谢@Frank的简化。

其他回答

下面是一个查找模式的函数:

mode <- function(x) {
  unique_val <- unique(x)
  counts <- vector()
  for (i in 1:length(unique_val)) {
    counts[i] <- length(which(x==unique_val[i]))
  }
  position <- c(which(counts==max(counts)))
  if (mean(counts)==max(counts)) 
    mode_x <- 'Mode does not exist'
  else 
    mode_x <- unique_val[position]
  return(mode_x)
}

另一个可能的解决方案:

Mode <- function(x) {
    if (is.numeric(x)) {
        x_table <- table(x)
        return(as.numeric(names(x_table)[which.max(x_table)]))
    }
}

用法:

set.seed(100)
v <- sample(x = 1:100, size = 1000000, replace = TRUE)
system.time(Mode(v))

输出:

   user  system elapsed 
   0.32    0.00    0.31 

计算模式大多是在有因素变量的情况下才可以使用

labels(table(HouseVotes84$V1)[as.numeric(labels(max(table(HouseVotes84$V1))))])

HouseVotes84是在“mlbench”包中可用的数据集。

它会给出最大标签值。它更容易由内置函数本身使用,而无需编写函数。

模式并不是在所有情况下都有用。所以函数应该处理这种情况。试试下面的函数。

Mode <- function(v) {
  # checking unique numbers in the input
  uniqv <- unique(v)
  # frquency of most occured value in the input data
  m1 <- max(tabulate(match(v, uniqv)))
  n <- length(tabulate(match(v, uniqv)))
  # if all elements are same
  same_val_check <- all(diff(v) == 0)
  if(same_val_check == F){
    # frquency of second most occured value in the input data
    m2 <- sort(tabulate(match(v, uniqv)),partial=n-1)[n-1]
    if (m1 != m2) {
      # Returning the most repeated value
      mode <- uniqv[which.max(tabulate(match(v, uniqv)))]
    } else{
      mode <- "Two or more values have same frequency. So mode can't be calculated."
    }
  } else {
    # if all elements are same
    mode <- unique(v)
  }
  return(mode)
}

输出,

x1 <- c(1,2,3,3,3,4,5)
Mode(x1)
# [1] 3

x2 <- c(1,2,3,4,5)
Mode(x2)
# [1] "Two or more varibles have same frequency. So mode can't be calculated."

x3 <- c(1,1,2,3,3,4,5)
Mode(x3)
# [1] "Two or more values have same frequency. So mode can't be calculated."

这里有另一个解决方案:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])