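What are the main performance differences between the varchar and nvarchar SQL Server data types?

I'm working on a database for a small web app at my school using SQL Server 2005. I see a couple of schools of thought on the issue of varchar vs nvarchar:

1. Use varchar unless you deal with a lot of internationalized data, then use nvarchar.
2. Just use nvarchar for everything.

I'm beginning to see the merits of view 2. I know that nvarchar takes up twice as much space, but that isn't necessarily a huge deal since this is only going to store data for a few hundred students. To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar. Or is there something I'm missing?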

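Current answer

Why, in all this discussion, has there been no mention of UTF-8? Being able to store the full Unicode character span does not mean that one always has to allocate two bytes per character (or "code point", to use the Unicode term). All of ASCII is UTF-8. Does SQL Server check, for VARCHAR() fields, that the text is strict ASCII (i.e. top bit of each byte zero)? I would hope not.

If you want to store Unicode and want compatibility with older ASCII-only applications, I would think using VARCHAR() and UTF-8 would be the magic bullet: it only uses more space when it needs to.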
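The space argument can be checked with a few lines of Python. This is only a sketch of the encodings themselves, not of SQL Server: UTF-16LE stands in for the two-byte encoding NVARCHAR uses, and `byte_sizes` and the sample strings are made up for illustration.

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16LE.
# UTF-16LE is used as a stand-in for NVARCHAR's two-byte encoding.

def byte_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample values: plain ASCII, one accented letter, two CJK characters.
samples = ["student", "Zo\u00eb", "\u5b66\u751f"]

for text in samples:
    utf8_len, utf16_len = byte_sizes(text)
    print(f"{text!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Pure ASCII text needs half the bytes in UTF-8, while the two CJK characters need three bytes each in UTF-8 against two each in UTF-16, so UTF-8 is not smaller for every kind of text.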

For those of you unfamiliar with UTF-8, might I recommend a primer.

2009-12-10 00:10:00

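Other answers

If you are using NVARCHAR just because a system stored procedure requires it (the most frequent case being, inexplicably, sp_executesql) and your dynamic SQL is very long, you would be better off, from a performance perspective, doing all string manipulation (concatenation, replacement, etc.) in VARCHAR, then converting the end result to NVARCHAR and feeding it into the proc parameter. So no, do not always use NVARCHAR!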

2017-04-12 18:17:18

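I hesitate to add yet another answer here as there are already quite a few, but a few points need to be made that have either not been made or not been made clearly.

First: Do not always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach. It is no better than saying "never use cursors", since cursors are sometimes the most efficient means of solving a particular problem, and the common work-around of doing a WHILE loop will almost always be slower than a properly done cursor.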

The only time you should use the term "always" is when advising to "always do what is best for the situation". Granted that is often difficult to determine, especially when trying to balance short-term gains in development time (manager: "we need this feature -- that you didn't know about until just now -- a week ago!") with long-term maintenance costs (manager who initially pressured team to complete a 3-month project in a 3-week sprint: "why are we having these performance problems? How could we have possibly done X which has no flexibility? We can't afford a sprint or two to fix this. What can we get done in a week so we can get back to our priority items? And we definitely need to spend more time in design so this doesn't keep happening!").

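Second: @gbn's answer touches on some very important points to consider when making certain data modeling decisions when the path isn't 100% clear. But there is even more to consider: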

- size of transaction log files
- time it takes to replicate (if using replication)
- time it takes to ETL (if ETLing)
- time it takes to ship logs to a remote system and restore (if using Log Shipping)
- size of backups
- length of time it takes to complete the backup
- length of time it takes to do a restore (this might be important some day ;-)
- size needed for tempdb
- performance of triggers (for inserted and deleted tables that are stored in tempdb)
- performance of row versioning (if using SNAPSHOT ISOLATION, since the version store is in tempdb)
- ability to get new disk space when the CFO says that they just spent $1 million on a SAN last year and so they will not authorize another $250k for additional storage
- length of time it takes to do INSERT and UPDATE operations
- length of time it takes to do index maintenance
- etc, etc, etc.

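Wasting space has a huge cascade effect on the entire system. I wrote an article going into explicit detail on this topic: Disk Is Cheap! ORLY? (free registration required; sorry, I don't control that policy).

Third: While some answers are incorrectly focusing on the "this is a small app" aspect, and some are correctly suggesting to "use what is appropriate", none of the answers have provided real guidance to the O.P. An important detail mentioned in the question is that this is a web page for their school. Great! So we can suggest that: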

- Fields for Student and/or Faculty names should probably be NVARCHAR since, over time, it is only getting more likely that names from other cultures will be showing up in those places.
- But for street address and city names? The purpose of the app was not stated (it would have been helpful), but assuming the address records, if any, pertain to just a particular geographical region (i.e. a single language / culture), then use VARCHAR with the appropriate Code Page (which is determined from the Collation of the field).
- If storing State and/or Country ISO codes (no need to store INT / TINYINT since ISO codes are fixed length, human readable, and well, standard :) use CHAR(2) for two letter codes and CHAR(3) if using 3 letter codes. And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing postal codes (i.e. zip codes), use VARCHAR since it is an international standard to never use any letter outside of A-Z. And yes, still use VARCHAR even if only storing US zip codes, and not INT, since zip codes are not numbers, they are strings, and some of them have a leading "0". And consider using a binary Collation such as Latin1_General_100_BIN2.
- If storing email addresses and/or URLs, use NVARCHAR since both of those can now contain Unicode characters.
- and so on....
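The point about leading zeros is easy to demonstrate outside the database. A minimal Python sketch, using "02134" (a Massachusetts ZIP code) as an illustrative value:

```python
# A ZIP code whose leading zero is significant.
zip_text = "02134"

# Round-tripping through an integer silently drops the leading zero,
# which is what happens when the value is stored in a numeric column.
as_int = int(zip_text)
round_tripped = str(as_int)

print(as_int)                     # 2134
print(round_tripped == zip_text)  # False
```

Keeping the value as a string avoids having to re-pad it every time it is displayed or compared.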

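Fourth: Now that you have NVARCHAR data taking up twice as much space as it needs to for data that fits nicely in VARCHAR ("fits nicely" = doesn't turn into "?"), and somehow, as if by magic, the application did grow and there are now millions of records in at least one of these fields, where most rows are standard ASCII but some contain Unicode characters so you have to keep NVARCHAR, consider the following: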

- If you are using SQL Server 2008 - 2016 RTM and are on Enterprise Edition, OR if using SQL Server 2016 SP1 (which made Data Compression available in all editions) or newer, then you can enable Data Compression. Data Compression can (but won't "always") compress Unicode data in NCHAR and NVARCHAR fields. The determining factors are:
  - NCHAR(1 - 4000) and NVARCHAR(1 - 4000) use the Standard Compression Scheme for Unicode, but only starting in SQL Server 2008 R2, AND only for IN ROW data, not OVERFLOW! This appears to be better than the regular ROW / PAGE compression algorithm.
  - NVARCHAR(MAX) and XML (and I guess also VARBINARY(MAX), TEXT, and NTEXT) data that is IN ROW (not off row in LOB or OVERFLOW pages) can at least be PAGE compressed, but not ROW compressed. Of course, PAGE compression depends on the size of the in-row value: I tested with VARCHAR(MAX) and saw that 6000 character/byte rows would not compress, but 4000 character/byte rows did.
  - Any OFF ROW data, LOB or OVERFLOW = No Compression For You!
- If using SQL Server 2005, or 2008 - 2016 RTM and not on Enterprise Edition, you can have two fields: one VARCHAR and one NVARCHAR. For example, let's say you are storing URLs which are mostly all base ASCII characters (values 0 - 127) and hence fit into VARCHAR, but sometimes have Unicode characters. Your schema can include the following 3 fields:

      ...
      URLa VARCHAR(2048) NULL,
      URLu NVARCHAR(2048) NULL,
      -- Computed column: whichever of the two fields is populated, as NVARCHAR
      URL AS (ISNULL(CONVERT(NVARCHAR(2048), [URLa]), [URLu])),
      -- Exactly one of the two fields must be NOT NULL
      CONSTRAINT [CK_TableName_OneUrlMax] CHECK (
          ([URLa] IS NOT NULL OR [URLu] IS NOT NULL)
          AND ([URLa] IS NULL OR [URLu] IS NULL))
      );

  In this model you only SELECT from the [URL] computed column. For inserting and updating, you determine which field to use by seeing if converting alters the incoming value, which has to be of NVARCHAR type:

      -- CASE is used instead of IIF because IIF requires SQL Server 2012 or newer
      INSERT INTO TableName (..., URLa, URLu)
      VALUES (...,
          -- value survives conversion to VARCHAR: store it in URLa
          CASE WHEN CONVERT(VARCHAR(2048), @URL) = @URL THEN @URL ELSE NULL END,
          -- value is altered by the conversion: store it in URLu
          CASE WHEN CONVERT(VARCHAR(2048), @URL) <> @URL THEN @URL ELSE NULL END
      );

- You can GZIP incoming values into VARBINARY(MAX) and then unzip on the way out:
  - For SQL Server 2005 - 2014: you can use SQLCLR. SQL# (a SQLCLR library that I wrote) comes with Util_GZip and Util_GUnzip in the Free version.
  - For SQL Server 2016 and newer: you can use the built-in COMPRESS and DECOMPRESS functions, which are also GZip.
- If using SQL Server 2017 or newer, you can look into making the table a Clustered Columnstore Index.
- While this is not a viable option yet, SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes. There are currently too many bugs with it for it to be used, but if they are fixed, then this is an option for some scenarios. Please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?", for a detailed analysis of this new feature.
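The "does converting alter the value" test from the two-field approach can be sketched in Python for experimenting outside the database. This is an approximation under stated assumptions: Python's cp1252 codec stands in for a Latin1 collation's code page, and SQL Server's own conversion may map some characters differently. `fits_code_page` and `route_url` are hypothetical helper names.

```python
def fits_code_page(value, code_page="cp1252"):
    """Return True if the value survives a round trip through the code page.

    Characters the code page cannot represent are replaced with "?",
    so the round-tripped text then differs from the original.
    """
    round_tripped = value.encode(code_page, errors="replace").decode(code_page)
    return round_tripped == value


def route_url(url):
    """Return (url_a, url_u) with exactly one element populated.

    url_a mirrors the VARCHAR field and url_u the NVARCHAR field.
    """
    if fits_code_page(url):
        return url, None
    return None, url


print(route_url("https://example.com/index.html"))
print(route_url("https://example.com/\u5b66\u6821"))
```

The first URL is pure ASCII and lands in the first slot; the second contains two CJK characters that cp1252 cannot represent, so it lands in the second.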

2015-09-30 16:41:48

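There will be exceptional instances when you will want to deliberately restrict the data type to ensure it doesn't contain characters from a certain set. For example, I had a scenario where I needed to store domain names in a database. Internationalisation for domain names wasn't reliable at the time, so it was better to restrict the input at the base level and help avoid any potential issues.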

2015-09-04 04:10:49

For your application, nvarchar is fine because the database size is small. Saying "always use nvarchar" is a vast oversimplification. If you're not required to store things like Kanji or other crazy characters, use VARCHAR, it'll use a lot less space. My predecessor at my current job designed something using NVARCHAR when it wasn't needed. We recently switched it to VARCHAR and saved 15 GB on just that table (it was highly written to). Furthermore, if you then have an index on that table and you want to include that column or make a composite index, you've just made your index file size larger.
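To get a feel for when the difference matters, here is a rough back-of-the-envelope estimate in Python. It assumes a single-byte code page for VARCHAR, two bytes per character for NVARCHAR, and two bytes of per-value overhead for a variable-length column. It ignores row, page, and index overhead, and the row counts and average lengths are made-up figures.

```python
VARIABLE_LENGTH_OVERHEAD = 2  # bytes per variable-length value (assumed)


def column_bytes(rows, avg_chars, bytes_per_char):
    """Estimate the raw storage of one variable-length string column."""
    return rows * (avg_chars * bytes_per_char + VARIABLE_LENGTH_OVERHEAD)


# Hypothetical workloads: a school app vs. a large, heavily written table.
for label, rows, avg_chars in [("school app", 500, 30),
                               ("large table", 100_000_000, 75)]:
    as_varchar = column_bytes(rows, avg_chars, 1)
    as_nvarchar = column_bytes(rows, avg_chars, 2)
    print(f"{label}: VARCHAR {as_varchar:,} B, NVARCHAR {as_nvarchar:,} B, "
          f"difference {as_nvarchar - as_varchar:,} B")
```

For a few hundred rows the difference is a few kilobytes; at a hundred million rows it runs to gigabytes for a single column, before counting any index that includes it.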

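Be thoughtful when making the decision; in SQL development and data definition there seems to rarely be a "default answer" (other than avoiding cursors at all costs, of course).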

2009-01-30 00:46:20

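Disk space is not the issue... but memory and performance will be. Double the page reads, double the index size, strange LIKE and = constant behaviour, etc.

Do you need to store Chinese etc. script? Yes or no...

And from MS BOL: "Storage and Performance Effects of Unicode"

Edit:

A recent SO question highlighting how bad nvarchar performance can be...

SQL Server uses high CPU when searching inside nvarchar strings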

2008-10-13 19:33:23
