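How do I determine a file's encoding on OS X?

I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them.

Running cat my_file.tex shows the characters properly in Terminal. Running ls -al shows something I've never seen before: an "@" next to the file listing: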

-rw-r--r--@  1 me      users      2021 Feb 11 18:05 my_file.tex

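(And, yes, I'm using \usepackage[utf8]{inputenc} in the LaTeX.)

I've found iconv, but that doesn't seem to be able to tell me what the encoding is. It will only convert once I've figured it out.

Current answer

I implemented the bash script below, and it works for me.

It first tries to run iconv from the encoding returned by file --mime-encoding to utf-8.

If that fails, it goes through all encodings and shows the diff between the original and the re-encoded file. It skips encodings that produce a large diff output ("large" as defined by the MAX_DIFF_LINES variable or the second input argument), since those are most likely the wrong encoding.

If "bad things" happen as a result of using this script, don't blame me. There's an rm -f in there, so there be monsters. I tried to prevent adverse effects by using it only on files with a random suffix, but I'm not making any promises.

Tested on Darwin 15.6.0.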

#!/bin/bash

if [[ $# -lt 1 ]]
then
  echo "ERROR: need one input argument: the file whose encoding is to be detected." >&2
  exit 3
fi

if [ ! -e "$1" ]
then
  echo "ERROR: cannot find file '$1'" >&2
  exit 3
fi

# Optional second argument: the largest diff (in lines) worth showing
if [[ $# -ge 2 ]]
then
  MAX_DIFF_LINES=$2
else
  MAX_DIFF_LINES=10
fi


# Try the easy way: ask file(1) for its guess (-b omits the file name)
ENCOD=$(file -b --mime-encoding "$1")
# Check that iconv can actually decode the file with this encoding
if iconv -f "$ENCOD" -t utf-8 "$1" &> /dev/null
then
  echo "$ENCOD"
  exit 0
fi

# Hard way: the user visually checks the difference between the original and re-encoded files
for i in $(iconv -l | awk '{print $1}')
do
  SINK="$1.$i.$RANDOM"
  if iconv -f "$i" -t utf-8 "$1" 2> /dev/null > "$SINK"
  then
    DIFF=$(diff "$1" "$SINK")
    # wc -l is left unquoted so the padding that macOS adds is stripped
    if [ -n "$DIFF" ] && [ $(echo "$DIFF" | wc -l) -le "$MAX_DIFF_LINES" ]
    then
      echo "===== $i ====="
      echo "$DIFF"
      echo "Does that make sense [N/y]"
      # read takes the variable name, not its value
      read -r ANSWER
      if [ "$ANSWER" == "y" ] || [ "$ANSWER" == "Y" ]
      then
        # Clean up the re-encoded file before reporting the match
        rm -f "$SINK"
        echo "$i"
        exit 0
      fi
    fi
  fi
  # Clean up the re-encoded file
  rm -f "$SINK"
done

echo "None of the encodings worked. You're stuck."
exit 3

2017-06-09 19:58:27

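Other answers

A brute-force way to check the encoding is simply to inspect the file in a hex editor or similar (or write a program to check) and look at the binary data. The UTF-8 format is fairly easy to recognize: all ASCII characters are single bytes with values below 128 (0x80), and multi-byte sequences follow the pattern shown in the wiki article.

If you can find a simpler way to get a program to verify the encoding for you, that's obviously a shortcut, but if all else fails, this will do the trick.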
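
One way to let a program do that byte-level check is to ask iconv to convert the file from UTF-8 to UTF-8: it should exit with an error as soon as it meets a byte sequence that isn't valid UTF-8. This is only a sketch, and the function name is_valid_utf8 is made up for illustration. Note that a pure ASCII file also passes, since ASCII is a subset of UTF-8.

```shell
#!/bin/bash

# Hypothetical helper: succeeds (exit status 0) if the file decodes
# cleanly as UTF-8, fails otherwise. The converted text is thrown away;
# only iconv's exit status matters here.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Used as: is_valid_utf8 my_file.tex && echo "looks like UTF-8". To look at the raw bytes yourself, hexdump -C my_file.tex | head prints them in hex next to the printable characters.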

2009-02-11 23:38:32

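The @ means that the file has extended attributes. xattr {filename} shows which attributes it has, and xattr -l {filename} shows the attribute values too (which can sometimes be large; try e.g. xattr /System/Library/Fonts/HelveLTMM to see an old-style font that exists in the resource fork).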
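
Some macOS editors (TextEdit, for example) record the encoding they saved with in an extended attribute named com.apple.TextEncoding, so the attributes can occasionally answer the encoding question directly. The sketch below assumes the value has the form "utf-8;134217984" (an encoding name, a semicolon, then a numeric code); whether TextMate writes this attribute is not something I've verified, and the function name is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: given the value of com.apple.TextEncoding,
# print only the encoding name (everything before the first semicolon).
encoding_from_xattr_value() {
  printf '%s\n' "${1%%;*}"
}

# On macOS (the xattr tool is assumed to be available):
#   value=$(xattr -p com.apple.TextEncoding my_file.tex 2>/dev/null) \
#     && encoding_from_xattr_value "$value"
```

If the attribute is missing, xattr -p fails and nothing is printed, which only means the editor didn't record an encoding, not that the file has none.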

2009-02-12 06:38:08

Using the -I (that's a capital i) option on the file command seems to show the file encoding.

file -I {filename}
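
If only the charset is wanted, the long option --mime-encoding (the one the script in the current answer uses) prints just that part, and -b drops the file name from the output. As far as I know, the short -I spelling is specific to the macOS build of file, while the long option is spelled the same on Linux. A small sketch, with a made-up function name:

```shell
#!/bin/bash

# Hypothetical helper: print only the charset that file(1) guesses,
# e.g. "utf-8" or "us-ascii". This is a guess based on the file's
# content, not a guarantee.
guess_encoding() {
  file -b --mime-encoding "$1"
}
```

Used as: guess_encoding my_file.tex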

2010-03-17 09:47:52

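Typing file myfile.tex in a terminal can sometimes tell you the encoding and type of a file, using a series of algorithms and magic numbers. It's fairly useful, but don't rely on it to provide concrete or reliable information.

A Localizable.strings file (found in localized Mac OS X applications) is typically reported as a UTF-16 C source file.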

2009-03-08 09:50:05
