为什么SELECT *是不好的做法?如果您添加了想要的新列,这难道不意味着需要更改的代码更少吗?
我知道SELECT COUNT(*)在某些db上是一个性能问题,但是如果您真的想要每个列呢?
为什么SELECT *是不好的做法?如果您添加了想要的新列,这难道不意味着需要更改的代码更少吗?
我知道SELECT COUNT(*)在某些db上是一个性能问题,但是如果您真的想要每个列呢?
当前回答
If you name the columns in a SELECT statement, they will be returned in the order specified, and may thus safely be referenced by numerical index. If you use "SELECT *", you may end up receiving the columns in arbitrary sequence, and thus can only safely use the columns by name. Unless you know in advance what you'll be wanting to do with any new column that gets added to the database, the most probable correct action is to ignore it. If you're going to be ignoring any new columns that get added to the database, there is no benefit whatsoever to retrieving them.
其他回答
这里有一个重要的区别,我认为大多数答案都忽略了。
SELECT *不是问题。返回SELECT *的结果是问题所在。
举个例子,在我看来:
WITH data_from_several_tables AS (
SELECT * FROM table1_2020
UNION ALL
SELECT * FROM table1_2021
...
)
SELECT id, name, ...
FROM data_from_several_tables
WHERE ...
GROUP BY ...
...
这避免了大多数答案中提到的使用SELECT *的所有“问题”:
读取的数据比预期的多?现代数据库中的优化器会意识到实际上并不需要所有列 源表的列顺序会影响输出吗?我们仍然选择和 显式返回数据。 消费者不能看到他们从SQL?您所操作的列在代码中是显式的。 索引可能不被使用?同样,现代优化器应该处理这个问题,就像我们没有选择*一样
这里有一个可读性/可重构性的优势——不需要重复很长的列列表或其他常见的查询子句(如过滤器)。如果在使用SELECT *和SELECT <columns>(在绝大多数情况下-显然总是在关键情况下配置运行代码)时,查询计划有任何不同,我会感到惊讶。
If you name the columns in a SELECT statement, they will be returned in the order specified, and may thus safely be referenced by numerical index. If you use "SELECT *", you may end up receiving the columns in arbitrary sequence, and thus can only safely use the columns by name. Unless you know in advance what you'll be wanting to do with any new column that gets added to the database, the most probable correct action is to ignore it. If you're going to be ignoring any new columns that get added to the database, there is no benefit whatsoever to retrieving them.
使用列名进行选择提高了数据库引擎从索引访问数据的可能性,而不是查询表数据。
当数据库模式发生变化时,SELECT *使您的系统暴露在意想不到的性能和功能变化中,因为您要将任何新列添加到表中,即使您的代码还没有准备好使用或显示这些新数据。
在很多情况下,SELECT *会在应用程序的运行时导致错误,而不是在设计时。它隐藏了应用程序中列更改或坏引用的信息。
有三个主要原因:
Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access. Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason. Binding Problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions SQL Server) when underlying table structures change -- the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
但这对SELECT *来说也不全是坏事。我在以下用例中大量使用它:
Ad-hoc queries. When trying to debug something, especially off a narrow table I might not be familiar with, SELECT * is often my best friend. It helps me just see what's going on without having to do a boatload of research as to what the underlying column names are. This gets to be a bigger "plus" the longer the column names get. When * means "a row". In the following use cases, SELECT * is just fine, and rumors that it's a performance killer are just urban legends which may have had some validity many years ago, but don't now: SELECT COUNT(*) FROM table; in this case, * means "count the rows". If you were to use a column name instead of * , it would count the rows where that column's value was not null. COUNT(*), to me, really drives home the concept that you're counting rows, and you avoid strange edge-cases caused by NULLs being eliminated from your aggregates. Same goes with this type of query: SELECT a.ID FROM TableA a WHERE EXISTS ( SELECT * FROM TableB b WHERE b.ID = a.B_ID); in any database worth its salt, * just means "a row". It doesn't matter what you put in the subquery. Some people use b's ID in the SELECT list, or they'll use the number 1, but IMO those conventions are pretty much nonsensical. What you mean is "count the row", and that's what * signifies. Most query optimizers out there are smart enough to know this. (Though to be honest, I only know this to be true with SQL Server and Oracle.)