我正在寻找一种简单的方法来获得mime类型,其中文件扩展名是不正确的或没有给出,类似于这个问题只有在. net。


当前回答

@Steve Morgan和@Richard Gourlay,这是一个很好的解决方案,谢谢你们。一个小缺点是,当文件中的字节数为255或以下时,mime类型有时会产生“application/octet-stream”,这对于期望产生“text/plain”的文件来说有点不准确。对于这种情况,我更新了你原来的方法如下:

如果文件中的字节数小于或等于255,并且推导出的mime类型是"application/octet-stream",那么创建一个新的字节数组,该数组由重复n次的原始文件字节组成,直到字节总数为>= 256。然后重新检查新字节数组上的mime-type。

修改方法:

Imports System.Runtime.InteropServices

<DllImport("urlmon.dll", CharSet:=CharSet.Auto)> _
Private Shared Function FindMimeFromData(pBC As System.UInt32, <MarshalAs(UnmanagedType.LPStr)> pwzUrl As System.String, <MarshalAs(UnmanagedType.LPArray)> pBuffer As Byte(), cbSize As System.UInt32, <MarshalAs(UnmanagedType.LPStr)> pwzMimeProposed As System.String, dwMimeFlags As System.UInt32, _
ByRef ppwzMimeOut As System.UInt32, dwReserverd As System.UInt32) As System.UInt32
End Function
Private Function GetMimeType(ByVal f As FileInfo) As String
    'See http://stackoverflow.com/questions/58510/using-net-how-can-you-find-the-mime-type-of-a-file-based-on-the-file-signature
    Dim returnValue As String = ""
    Dim fileStream As FileStream = Nothing
    Dim fileStreamLength As Long = 0
    Dim fileStreamIsLessThanBByteSize As Boolean = False

    Const byteSize As Integer = 255
    Const bbyteSize As Integer = byteSize + 1

    Const ambiguousMimeType As String = "application/octet-stream"
    Const unknownMimeType As String = "unknown/unknown"

    Dim buffer As Byte() = New Byte(byteSize) {}
    Dim fnGetMimeTypeValue As New Func(Of Byte(), Integer, String)(
        Function(_buffer As Byte(), _bbyteSize As Integer) As String
            Dim _returnValue As String = ""
            Dim mimeType As UInt32 = 0
            FindMimeFromData(0, Nothing, _buffer, _bbyteSize, Nothing, 0, mimeType, 0)
            Dim mimeTypePtr As IntPtr = New IntPtr(mimeType)
            _returnValue = Marshal.PtrToStringUni(mimeTypePtr)
            Marshal.FreeCoTaskMem(mimeTypePtr)
            Return _returnValue
        End Function)

    If (f.Exists()) Then
        Try
            fileStream = New FileStream(f.FullName(), FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            fileStreamLength = fileStream.Length()

            If (fileStreamLength >= bbyteSize) Then
                fileStream.Read(buffer, 0, bbyteSize)
            Else
                fileStreamIsLessThanBByteSize = True
                fileStream.Read(buffer, 0, CInt(fileStreamLength))
            End If

            returnValue = fnGetMimeTypeValue(buffer, bbyteSize)

            If (returnValue.Equals(ambiguousMimeType, StringComparison.OrdinalIgnoreCase) AndAlso fileStreamIsLessThanBByteSize AndAlso fileStreamLength > 0) Then
                'Duplicate the stream content until the stream length is >= bbyteSize to get a more deterministic mime type analysis.
                Dim currentBuffer As Byte() = buffer.Take(fileStreamLength).ToArray()
                Dim repeatCount As Integer = Math.Floor((bbyteSize / fileStreamLength) + 1)
                Dim bBufferList As List(Of Byte) = New List(Of Byte)
                While (repeatCount > 0)
                    bBufferList.AddRange(currentBuffer)
                    repeatCount -= 1
                End While
                Dim bbuffer As Byte() = bBufferList.Take(bbyteSize).ToArray()
                returnValue = fnGetMimeTypeValue(bbuffer, bbyteSize)
            End If
        Catch ex As Exception
            returnValue = unknownMimeType
        Finally
            If (fileStream IsNot Nothing) Then fileStream.Close()
        End Try
    End If
    Return returnValue
End Function

其他回答

如果你想要托管你的ASP. mimetype,来自Nuget的guessmimetype将是最终的解决方案。NET解决方案在非windows环境。

文件扩展名映射非常不安全。如果攻击者上传无效的扩展名,映射字典将允许可执行文件在.jpg文件中分发。 因此,始终使用内容嗅探库来了解真正的内容类型。

 public  static string MimeTypeFrom(byte[] dataBytes, string fileName)
 {
        var contentType = HeyRed.Mime.MimeGuesser.GuessMimeType(dataBytes);
        if (string.IsNullOrEmpty(contentType))
        {
            return HeyRed.Mime.MimeTypesMap.GetMimeType(fileName);
        }
  return contentType;

如果有人对此感兴趣,他们可以将优秀的perl模块File::Type移植到。net。在代码中是一组文件头魔术数字查找每个文件类型或正则表达式匹配。

这里有一个。net文件类型检测库http://filetypedetective.codeplex.com/,但它目前只检测少量的文件。

我写了一个mime类型的验证器。请与您分享。

private readonly Dictionary<string, byte[]> _mimeTypes = new Dictionary<string, byte[]>
    {
        {"image/jpeg", new byte[] {255, 216, 255}},
        {"image/jpg", new byte[] {255, 216, 255}},
        {"image/pjpeg", new byte[] {255, 216, 255}},
        {"image/apng", new byte[] {137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82}},
        {"image/png", new byte[] {137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82}},
        {"image/bmp", new byte[] {66, 77}},
        {"image/gif", new byte[] {71, 73, 70, 56}},
    };

private bool ValidateMimeType(byte[] file, string contentType)
    {
        var imageType = _mimeTypes.SingleOrDefault(x => x.Key.Equals(contentType));

        return file.Take(imageType.Value.Length).SequenceEqual(imageType.Value);
    }

您也可以查看注册表。

    using System.IO;
    using Microsoft.Win32;

    string GetMimeType(FileInfo fileInfo)
    {
        string mimeType = "application/unknown";

        RegistryKey regKey = Registry.ClassesRoot.OpenSubKey(
            fileInfo.Extension.ToLower()
            );

        if(regKey != null)
        {
            object contentType = regKey.GetValue("Content Type");

            if(contentType != null)
                mimeType = contentType.ToString();
        }

        return mimeType;
    }

无论如何,您都必须利用mime数据库——无论它们是从扩展还是魔术数字映射的,都有点微不足道——windows注册表就是这样一个地方。 对于独立于平台的解决方案,必须将此DB与代码一起发布(或作为独立的库)。

在Urlmon.dll中,有一个名为FindMimeFromData的函数。

来自文档

MIME类型检测或“数据嗅探”是指从二进制数据中确定适当的MIME类型的过程。最终结果取决于服务器提供的MIME类型头、文件扩展名和/或数据本身的组合。通常,只有前256字节的数据是重要的。

因此,从文件中读取第一个(最多)256字节,并将其传递给FindMimeFromData。