SQL Prompt Tutorial: Why Is SELECT * (BP005) Bad in Production Code? (Part 2)

Translation | Tutorial | Editor: 莫成敏 | 2019-10-28 16:30:45.737 | 255 views

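Summary: SQL Prompt is a practical SQL code-completion tool. If SQL Prompt warns you about using an asterisk, or "star" (*), in a SELECT statement, consider replacing it with an explicit column list. Doing so prevents unnecessary network load and query performance problems, and avoids problems when inserting into a table if the column order changes. This article is the second half of the tutorial.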


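SQL Prompt looks up database object names, syntax, and code snippets automatically and offers suitable code choices as you type. Automatic script formatting keeps code simple and readable, which is especially useful when a developer is unfamiliar with a script. SQL Prompt works as soon as it is installed and can greatly improve coding efficiency. You can also customize it to work the way you want.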


This article covers the second half of the tutorial: the rest of "Why is SELECT * bad in production code?" (continuing from part one), followed by "SELECT * in applications".

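Misinterpretation

With SELECT *, you cannot be sure that your code will always return the same columns in the same order, which means it is not resilient to database refactoring. Upstream modifications to the table source can change the order or the number of columns. If you use INSERT INTO…SELECT * to transfer data, the best outcome you can hope for is an error, because the consequences of assigning data to the wrong target column can be horrifying.

I'll demonstrate how dangerous this can be if you use it in production code and then need to do some database refactoring. Here, we make a mistake while copying sensitive information. It is very easy to do, and it could lead to financial irregularities without triggering any error. If you are of a nervous disposition, look away now.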

/* we create a table just for our testing */
  CREATE TABLE dbo.ExchangeRates -- let's pretend we have this data
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  /* we now steal data for it from AdventureWorks next-door */
  INSERT INTO dbo.ExchangeRates
  SELECT CurrencyRate.CurrencyRateDate, CurrencyRate.AverageRate,
      CurrencyRate.EndOfDayRate, Currency.Name AS FromCurrency,
      CountryRegion.Name AS FromRegion, CurrencyTo.Name AS ToCurrency,
      CountryRegionTo.Name AS ToRegion
      FROM Adventureworks2016.Sales.CurrencyRate
        INNER JOIN Adventureworks2016.Sales.Currency
          ON CurrencyRate.FromCurrencyCode = Currency.CurrencyCode
        INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency
          ON Currency.CurrencyCode = CountryRegionCurrency.CurrencyCode
        INNER JOIN Adventureworks2016.Person.CountryRegion
          ON CountryRegionCurrency.CountryRegionCode = CountryRegion.CountryRegionCode
        INNER JOIN Adventureworks2016.Sales.Currency AS CurrencyTo
          ON CurrencyRate.ToCurrencyCode = CurrencyTo.CurrencyCode
        INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency AS CountryRegionCurrencyTo
          ON CurrencyTo.CurrencyCode = CountryRegionCurrencyTo.CurrencyCode
        INNER JOIN Adventureworks2016.Person.CountryRegion AS CountryRegionTo
          ON CountryRegionCurrencyTo.CountryRegionCode = CountryRegionTo.CountryRegionCode;
  GO
  /* so we start our test by creating a view to show exchange rates from Ecuador */
  CREATE VIEW dbo.EquadorExhangeRates
  AS
  SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
         ExchangeRates.EndOfDayRate, ExchangeRates.FromCurrency,
         ExchangeRates.FromRegion, ExchangeRates.ToCurrency, ExchangeRates.ToRegion
    FROM dbo.ExchangeRates
    WHERE ExchangeRates.FromRegion = 'Ecuador';
  go
  /* now we just fill a table variable with the first ten rows from the view and display them */
  DECLARE  @MyUsefulExchangeRates TABLE
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  INSERT INTO @MyUsefulExchangeRates (
    CurrencyRateDate, AverageRate, EndOfDayRate,
    FromCurrency, FromRegion,ToCurrency, ToRegion)
    SELECT * --this isn't good at all
      FROM dbo.EquadorExhangeRates;
  --display the first ten rows from the table to see what we have
  SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
    UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
    FROM @MyUsefulExchangeRates AS UER
    ORDER BY UER.CurrencyRateDate DESC;
  GO
  /* end of first part. Now someone decides to alter the view */
  alter VIEW dbo.EquadorExhangeRates
  AS
  SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
    ExchangeRates.EndOfDayRate, ExchangeRates.ToCurrency, ExchangeRates.ToRegion, ExchangeRates.FromCurrency,
    ExchangeRates.FromRegion
    FROM dbo.ExchangeRates
    WHERE ExchangeRates.FromRegion = 'Ecuador';
  GO
  /* we repeat the routine to extract the first ten rows exactly as before */
  DECLARE  @MyUsefulExchangeRates TABLE
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  INSERT INTO @MyUsefulExchangeRates(
    CurrencyRateDate, AverageRate, EndOfDayRate,
    FromCurrency, FromRegion,ToCurrency, ToRegion)
    SELECT * --bad, bad, bad
      FROM dbo.EquadorExhangeRates;
  --check that the data is the same. It isn't is it? No sir!
  SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
    UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
    FROM @MyUsefulExchangeRates AS UER
    ORDER BY UER.CurrencyRateDate DESC;
  GO
  /* now just tidy up and tear down */
  DROP VIEW dbo.EquadorExhangeRates
  DROP TABLE  dbo.ExchangeRates

Here are the "before" and "after" results…

[Figure: the first ten rows returned before and after the view was altered]

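As you can see, by switching the "to" and "from" columns we have "unintentionally" corrupted the data. An explicit column list can look redundant in your code, but it runs no slower, and if anything faster, than an asterisk that stands for all the columns and assumes they arrive in a particular order.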
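The same failure can be reproduced in a few lines without SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the table, view, and function names (`rates`, `rates_view`, `target`, `copy_with_star`, `copy_with_names`) are hypothetical stand-ins for the AdventureWorks objects above, and SQLite stands in for SQL Server only to keep the example self-contained.

```python
import sqlite3

# In-memory database; all object names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (from_currency TEXT, to_currency TEXT);
    INSERT INTO rates VALUES ('US Dollar', 'Euro');
    CREATE VIEW rates_view AS
        SELECT from_currency, to_currency FROM rates;
    CREATE TABLE target (from_currency TEXT, to_currency TEXT);
""")


def copy_with_star(conn):
    """Refill target via SELECT *, which maps columns purely by position."""
    conn.execute("DELETE FROM target")
    conn.execute(
        "INSERT INTO target (from_currency, to_currency) "
        "SELECT * FROM rates_view"
    )
    return conn.execute(
        "SELECT from_currency, to_currency FROM target"
    ).fetchall()


def copy_with_names(conn):
    """Refill target with an explicit column list on both sides."""
    conn.execute("DELETE FROM target")
    conn.execute(
        "INSERT INTO target (from_currency, to_currency) "
        "SELECT from_currency, to_currency FROM rates_view"
    )
    return conn.execute(
        "SELECT from_currency, to_currency FROM target"
    ).fetchall()


before_star = copy_with_star(conn)

# Someone "refactors" the view and swaps the two columns.
conn.executescript("""
    DROP VIEW rates_view;
    CREATE VIEW rates_view AS
        SELECT to_currency, from_currency FROM rates;
""")

after_star = copy_with_star(conn)    # silently swapped, no error raised
after_names = copy_with_names(conn)  # still correct

print(before_star, after_star, after_names)
```

Neither copy raises an error after the view changes; only the version that names its columns keeps the "from" and "to" values where they belong.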

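Binding problems

When we use SELECT * with a number of joined tables, we can, and probably will, end up with duplicate column names. Here is a simple query against AdventureWorks: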

SELECT *
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);

This code lists the duplicated column names:

DECLARE @SourceCode NVARCHAR(4000)=' 
  SELECT *
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);
  --'
  SELECT Count(*) AS Duplicates, name
    FROM sys.dm_exec_describe_first_result_set(@SourceCode, NULL, 1)
    GROUP BY name
    HAVING Count(*) > 1
    ORDER BY Count(*) DESC;


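This causes problems for any application that tries to make sense of such a result by selecting columns by name. If you try to create a temporary table from the result, using SELECT…INTO, it will fail.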

SELECT * INTO MyTempTable
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);
  Msg 2705, Level 16, State 3, Line 19
  Column names in each table must be unique. Column name 'BusinessEntityID' in table 'MyTempTable' is specified more than once.

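Again, this means that your SELECT * code is fragile. If someone changes a column name in one of the tables, it can create a duplicate column in a SELECT * INTO somewhere else, and you are left scratching your head, wondering why a routine that worked fine has suddenly crashed.

There is one place where SELECT * has a special meaning and cannot be replaced. That is when you convert a result to JSON and need the joined tables to be embedded as objects.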
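A check for duplicate names can also be made from the client side. The sketch below uses Python's sqlite3 module rather than `sys.dm_exec_describe_first_result_set`, with two small hypothetical tables (`employee`, `person`) in place of the AdventureWorks ones, and reads the column names that the driver reports for a query through `cursor.description`. The helper name `duplicate_columns` is an assumption made for this illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical, cut-down stand-ins for the joined AdventureWorks tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        business_entity_id INTEGER, job_title TEXT, modified_date TEXT);
    CREATE TABLE person (
        business_entity_id INTEGER, last_name TEXT, modified_date TEXT);
""")


def duplicate_columns(conn, query):
    """Return the sorted column names that occur more than once in a result."""
    cursor = conn.execute(query)
    # description is populated for SELECT statements even with no rows.
    names = [column[0] for column in cursor.description]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


star_query = """
    SELECT *
      FROM employee AS e
        INNER JOIN person AS p
          ON p.business_entity_id = e.business_entity_id
"""

explicit_query = """
    SELECT e.business_entity_id, e.job_title, p.last_name,
           e.modified_date AS employee_modified_date,
           p.modified_date AS person_modified_date
      FROM employee AS e
        INNER JOIN person AS p
          ON p.business_entity_id = e.business_entity_id
"""

star_duplicates = duplicate_columns(conn, star_query)
explicit_duplicates = duplicate_columns(conn, explicit_query)
print(star_duplicates, explicit_duplicates)
```

With an explicit list, each ambiguous column has to be chosen once or given an alias, so the result can be consumed by name without surprises.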

SELECT * 
    FROM HumanResources.Employee AS employee
      INNER JOIN Person.Person AS person
        ON person.BusinessEntityID = employee.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS history
        ON employee.BusinessEntityID = history.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON  history.DepartmentID = d.DepartmentID
    WHERE ( history.EndDate IS NULL) FOR JSON AUTO

This gives you the following (I show only the first document in the array):

[{"BusinessEntityID": 1,"NationalIDNumber": "295847284","LoginID": "adventure-works\\ken0","JobTitle": "Chief Executive Officer","BirthDate": "1969-01-29","MaritalStatus": "S","Gender": "M","HireDate": "2009-01-14","SalariedFlag": true, "VacationHours": 99, "SickLeaveHours": 69, "CurrentFlag": true, "rowguid": "F01251E5-96A3-448D-981E-0F99D789110D","ModifiedDate": "2014-06-30T00:00:00",
        "person": [{
            "BusinessEntityID": 1, "PersonType": "EM","NameStyle": false, "FirstName": "Ken","MiddleName": "J","LastName": "Sánchez","EmailPromotion": 0, "Demographics": "<IndividualSurvey xmlns=\"http:\/\/schemas.microsoft.com\/sqlserver\/2004\/07\/adventure-works\/IndividualSurvey\"><TotalPurchaseYTD>0<\/TotalPurchaseYTD><\/IndividualSurvey>","rowguid": "92C4279F-1207-48A3-8448-4636514EB7E2","ModifiedDate": "2009-01-07T00:00:00",
            "history": [{
                "BusinessEntityID": 1, "DepartmentID": 16, "ShiftID": 1, "StartDate": "2009-01-14","ModifiedDate": "2009-01-13T00:00:00",
                "d": [{
                    "DepartmentID": 16, "Name": "Executive","GroupName": "Executive General and Administration","ModifiedDate": "2008-04-30T00:00:00"
                }]
            }]
        }]
    }]

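There is no conflict here, because each ModifiedDate column is encapsulated in the object that represents its source table.

The corresponding XML looks like this: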

<employee BusinessEntityID="1" NationalIDNumber="295847284" LoginID="adventure-works\ken0" 
          JobTitle="Chief Executive Officer" BirthDate="1969-01-29" MaritalStatus="S" Gender="M" HireDate="2009-01-14" SalariedFlag="1" VacationHours="99"
          SickLeaveHours="69" CurrentFlag="1" rowguid="F01251E5-96A3-448D-981E-0F99D789110D" ModifiedDate="2014-06-30T00:00:00">
 <person BusinessEntityID="1" PersonType="EM" NameStyle="0" FirstName="Ken" MiddleName="J" LastName="Sánchez" EmailPromotion="0" 
         rowguid="92C4279F-1207-48A3-8448-4636514EB7E2" ModifiedDate="2009-01-07T00:00:00">
     <Demographics>
         <IndividualSurvey
             xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
             <TotalPurchaseYTD>0</TotalPurchaseYTD>
         </IndividualSurvey>
     </Demographics>
     <history BusinessEntityID="1" DepartmentID="16" ShiftID="1" StartDate="2009-01-14" ModifiedDate="2009-01-13T00:00:00">
         <d DepartmentID="16" Name="Executive" GroupName="Executive General and Administration" ModifiedDate="2008-04-30T00:00:00"/>
     </history>
  </person>
</employee>

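Maintainability

When you lay out code, specifying the columns not only avoids errors in assigning values to the right columns or variables, it also makes the code more readable. As far as you can, if only for the sake of your future self, or of the poor soul who will one day have to maintain your code, spell out the names of the columns involved. Sure, the code looks a little more cumbersome, but if a fairy appeared on your shoulder and told you that your code would be clearer and more reliable if you typed it twice, wouldn't you do it?

SELECT * in applications

Sometimes you will see long-running queries that request all columns and that originate from an application, often one that uses LINQ. Usually this isn't deliberate: the developer has made the mistake of not specifying the columns, and an innocent-looking LINQ query is translated into SELECT * or into a column list that includes every column. If the WHERE clause is too general, or is left out altogether, the consequences are compounded, because the network is usually the slowest component and all that unnecessary data piles up on it.

For example, using AdventureWorks and LINQPad, you can do this in LINQ: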

Persons.OrderBy (p => p.BusinessEntityID).Take (100)

…which LINQ translates into the query that is actually executed. You can see that it selects every column…

SELECT TOP (100) [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo], [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
  FROM [Person].[Person] AS [t0]
  ORDER BY [t0].[BusinessEntityID]

Likewise, this expression

from row in Persons select row

…fetches every column of every row in the entire table.

SELECT [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo], [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
  FROM [Person].[Person] AS [t0]

By contrast, this…

from row in Persons.Where(i => i.LastName == "Bradley") select row.FirstName+" "+row.LastName

…translates into the more sensible:

-- Region Parameters
  DECLARE @p0 NVarChar(1000) = 'Bradley'
  DECLARE @p1 NVarChar(1000) = ' '
  -- EndRegion
  SELECT ([t0].[FirstName] + @p1) + [t0].[LastName] AS [value]
  FROM [Person].[Person] AS [t0]
  WHERE [t0].[LastName] = @p0
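The gap between the "fetch everything" and the "filter at the source" styles shows up in any client language, not just LINQ. The sketch below is an illustration in Python with the built-in sqlite3 module; the `person` table, its sample rows, and the padded `demographics` column are invented for the example and only loosely mimic Person.Person. It counts how many values each approach pulls across the connection to produce the same answer.

```python
import sqlite3

# Hypothetical sample data; the wide demographics column mimics a bulky field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, demographics TEXT)"
)
people = [("Ann", "Bradley"), ("Bob", "Smith"), ("Cy", "Bradley"), ("Di", "Jones")]
conn.executemany(
    "INSERT INTO person (first_name, last_name, demographics) VALUES (?, ?, ?)",
    [(first, last, "x" * 1000) for first, last in people],
)

# Style 1: fetch every column of every row, then filter in the client.
# Note that row[1] and row[2] also depend on the column order of SELECT *.
all_rows = conn.execute("SELECT * FROM person ORDER BY id").fetchall()
client_side = [f"{row[1]} {row[2]}" for row in all_rows if row[2] == "Bradley"]
values_fetched_client = sum(len(row) for row in all_rows)

# Style 2: let the database filter and project; only the answer travels.
server_rows = conn.execute(
    "SELECT first_name || ' ' || last_name "
    "FROM person WHERE last_name = ? ORDER BY id",
    ("Bradley",),
).fetchall()
server_side = [row[0] for row in server_rows]
values_fetched_server = sum(len(row) for row in server_rows)

print(client_side, values_fetched_client)
print(server_side, values_fetched_server)
```

Both styles return the same two names, but the first one transfers every value in the table, including the bulky column that is never used.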

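Conclusion

The general code smell is requesting more data than you need. It is almost always better, and faster, to let the data source do the filtering for you. Using SELECT *, although perfectly legitimate in some circumstances, is often a sign of this more general problem. For developers who are proficient in C# or VB but not in SQL, it is tempting to download entire rows, or even entire tables, and do the filtering on more familiar territory. The extra network load and latency should be enough on their own to discourage the practice, but they are often mistaken for "a slow database". A long column list, often one that lists all the columns, is almost as harmful as SELECT *, although SELECT * carries additional risk during any refactoring.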

That concludes this tutorial. More articles will follow, and you can also download the free trial of SQL Prompt to evaluate it.