Five Tips for Optimizing SQL
1. Learn How to Create Indexes Properly
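Learning how to index properly is the best thing you can do to improve the performance of your SQL queries. In many situations, an index lets the database reach the data faster. Database novices often find indexes mysterious or difficult: they either index nothing or try to index everything. Neither approach is right. With no indexes at all, your queries are likely to be slow. If you index everything, your updates and insert triggers will be inefficient, since every write must also maintain each index.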
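The effect of a single well-chosen index can be sketched as follows. This is a minimal illustration, not a recipe: it uses Python's built-in sqlite3 module only so that it runs as-is, and the users table, its columns, and the index name idx_users_email are assumptions made up for the example. The SQL statements are the part that matters.

```python
import sqlite3

# Hypothetical schema for illustration; table, column, and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO users (nickname, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

QUERY = "SELECT nickname FROM users WHERE email = 'user42@example.com'"


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Without an index on email, SQLite has to scan the whole table.
plan_without_index = plan(QUERY)

# Index only the column the query filters on, not every column.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index in place, SQLite can search it instead of scanning.
plan_with_index = plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan varies between SQLite versions, and other databases report plans in their own formats, but the change from a scan to an index search is the thing to look for.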
2. Only Retrieve the Data You Really Need
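The most common way to retrieve the columns you need is to use *, even though some of those columns may not be ones you really need. If the table is small, retrieving the extra columns makes little difference. For larger data sets, however, naming the columns explicitly can save a lot of query time. Keep in mind, though, that many popular ORMs do not make it easy to build a query that retrieves only a subset of a table's columns.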
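As a small sketch of the difference, assume a hypothetical users table with a large biography column that the page being rendered never shows. Python's sqlite3 module is used here only to make the example runnable; all names are illustrative.

```python
import sqlite3

# Hypothetical table for illustration; the biography column stands in for
# any large column that the caller does not actually need.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, nickname TEXT, email TEXT, biography TEXT)"
)
conn.execute(
    "INSERT INTO users (nickname, email, biography) VALUES (?, ?, ?)",
    ("alice", "alice@example.com", "x" * 10_000),
)

# SELECT * also fetches the large biography column.
all_columns = conn.execute("SELECT * FROM users").fetchone()

# Naming the columns returns only what will actually be displayed.
needed_columns = conn.execute("SELECT id, nickname FROM users").fetchone()

print(len(all_columns), len(needed_columns))
```

With one row the saving is negligible; it grows with the number of rows and the size of the columns left out.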
Similarly, if you only need a limited number of rows, you should use the LIMIT clause (or your database's equivalent). For instance, if you only want to display the first 10 records out of 50,000 on your website, it is advisable to tell the database so. This way, the database can stop the search after finding 10 rows rather than scan the whole table.
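The 10-out-of-50,000 case can be sketched like this. The records table and its columns are assumptions for the example, and Python's sqlite3 module is used only so the sketch runs without a database server.

```python
import sqlite3

# Hypothetical table holding 50,000 rows, as in the scenario described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records (title) VALUES (?)",
    [(f"record {i}",) for i in range(1, 50_001)],
)

# Ask for the first 10 rows only; ORDER BY makes "first" well defined.
first_page = conn.execute(
    "SELECT id, title FROM records ORDER BY id LIMIT 10"
).fetchall()

total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(len(first_page), "of", total)
```

Ordering by the primary key lets the database read rows in order and stop early. Ordering by an unindexed column would still require examining every row before the limit is applied.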
The LIMIT clause is available in MySQL and PostgreSQL, but other databases have ways to achieve a similar effect (for example, TOP in SQL Server, or FETCH FIRST in standard SQL).
The general idea is this: always consider whether you need all the rows and columns an SQL statement returns. If you don't, there is room for improvement.
3. Avoid Functions on the Left-Hand Side of the Operator
Functions are a handy way to perform complex tasks, and they can be used both in the SELECT clause and in the WHERE clause. Nevertheless, applying them to columns in a WHERE clause may result in major performance issues. Consider, for example, a query on the users table whose WHERE clause passes the appointment_date column to DATEDIFF and compares the result with a constant.
Even if there is an index on the appointment_date column of the users table, such a query still needs a full table scan. This is because the DATEDIFF function is applied to the appointment_date column. The output of the function is evaluated at run time, so the server has to visit every row in the table to retrieve the necessary data. To enhance performance, rewrite the condition so that appointment_date stands alone on one side of the comparison and the date arithmetic is applied to the constant on the other side.
With no function wrapped around the column in the WHERE clause, the system can use an index to seek the data more efficiently.
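The rewrite can be sketched as follows. Because the sketch uses Python's built-in sqlite3 module to stay runnable, and SQLite has no DATEDIFF, the julianday function stands in for it here; that substitution, the sample rows, and the index name are all assumptions for illustration. The principle is the same: keep the indexed column bare.

```python
import sqlite3

# Hypothetical users table with an index on appointment_date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, nickname TEXT, appointment_date TEXT)"
)
conn.execute("CREATE INDEX idx_users_appointment ON users (appointment_date)")
conn.executemany(
    "INSERT INTO users (nickname, appointment_date) VALUES (?, ?)",
    [
        ("ann", "2015-04-15"),
        ("bob", "2015-05-01"),
        ("cat", "2015-05-02"),
        ("dan", "2015-05-20"),
    ],
)

# Function applied to the indexed column: the index cannot be used.
WITH_FUNCTION = (
    "SELECT nickname FROM users "
    "WHERE julianday('2015-06-01') - julianday(appointment_date) > 30"
)

# Same condition with the bare column on one side and the date
# arithmetic moved to the constant on the other side.
BARE_COLUMN = (
    "SELECT nickname FROM users "
    "WHERE appointment_date < date('2015-06-01', '-30 days')"
)


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


rows_with_function = sorted(conn.execute(WITH_FUNCTION).fetchall())
rows_bare_column = sorted(conn.execute(BARE_COLUMN).fetchall())
plan_with_function = plan(WITH_FUNCTION)
plan_bare_column = plan(BARE_COLUMN)

print(plan_with_function)
print(plan_bare_column)
```

Both queries return the same users; only the second lets the planner search the index. The string comparison works here because the dates are stored as ISO 8601 text, which sorts chronologically.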
4. Consider Getting Rid of Correlated Subqueries
A correlated subquery is a subquery that depends on the outer query: it uses data obtained from the outer query in its WHERE clause. Suppose you want to list all users who have made a donation. You could retrieve the data with a subquery that checks, for each user, whether a matching donation exists.
In that case, the subquery may run once for each row of the main query, which can be inefficient. Instead, we can join the users to their donations with an INNER JOIN.
If there are millions of users in the database, the statement with the correlated subquery will most likely be less efficient than the INNER JOIN because it needs to run millions of times. But if you were looking for donations made by a single user, the correlated subquery might not be a bad idea. As a rule of thumb, if you are looking for many or most of the rows, try to avoid correlated subqueries. Keep in mind, however, that correlated subqueries may be unavoidable in some cases.
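The two formulations can be compared side by side. This is a sketch under assumptions: the users and donations tables, their columns, and the sample rows are invented for the example, and Python's sqlite3 module is used only to make it runnable.

```python
import sqlite3

# Hypothetical schema: each donation row points at the user who made it.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE donations (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id),
        amount INTEGER
    );
    INSERT INTO users (id, nickname) VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO donations (user_id, amount) VALUES (1, 10), (1, 25), (3, 5);
    """
)

# Correlated subquery: the inner query refers to u.id from the outer query.
CORRELATED = """
    SELECT u.id, u.nickname
    FROM users AS u
    WHERE EXISTS (SELECT 1 FROM donations AS d WHERE d.user_id = u.id)
    ORDER BY u.id
"""

# Join: DISTINCT is needed because a user with several donations
# would otherwise appear once per donation.
JOINED = """
    SELECT DISTINCT u.id, u.nickname
    FROM users AS u
    INNER JOIN donations AS d ON d.user_id = u.id
    ORDER BY u.id
"""

donors_correlated = conn.execute(CORRELATED).fetchall()
donors_joined = conn.execute(JOINED).fetchall()

print(donors_correlated)
print(donors_joined)
```

Both statements list the same donors. Which one is faster depends on the database and the data, so it is worth checking the execution plan rather than assuming.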
5. Avoid Wildcard Characters at the Beginning of a LIKE Pattern
Whenever possible, avoid LIKE patterns that begin with a wildcard.
A % wildcard at the beginning of the LIKE pattern prevents the database from using a suitable index, if one exists. Since the system doesn't know how the value in the name column begins, it has to perform a full table scan anyway. In many cases, this slows down query execution. If the query can be rewritten so that the pattern starts with literal characters and the wildcard appears only at the end, performance may improve. You should always consider whether a wildcard character at the beginning is really essential.
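The difference shows up in the query plan. The sketch below uses Python's sqlite3 module to stay runnable, so some details are SQLite-specific assumptions: the name column is declared COLLATE NOCASE because SQLite's default LIKE is case-insensitive and can only use an index with a matching collation. The table, the sample rows, and the pattern 'bar' are invented for the example.

```python
import sqlite3

# Hypothetical users table with an index on the name column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE, email TEXT)"
)
conn.execute("CREATE INDEX idx_users_name ON users (name)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("Barbara", "b@example.com"),
        ("Rebar", "r@example.com"),
        ("Zoe", "z@example.com"),
    ],
)

# Leading wildcard: the beginning of the value is unknown, so no index seek.
LEADING = "SELECT name, email FROM users WHERE name LIKE '%bar%'"

# Wildcard only at the end: the known prefix can be looked up in the index.
PREFIX = "SELECT name, email FROM users WHERE name LIKE 'bar%'"


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


leading_rows = sorted(conn.execute(LEADING).fetchall())
prefix_rows = sorted(conn.execute(PREFIX).fetchall())
plan_leading = plan(LEADING)
plan_prefix = plan(PREFIX)

print(plan_leading)
print(plan_prefix)
```

Note that the two patterns are not equivalent: the prefix pattern no longer matches "Rebar". The rewrite is only an option when a prefix match is what the application actually needs.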
Tip: Read the Execution Plan
The performance of your SQL queries depends on multiple factors, including your database model, the indexes available, and the kind of information you wish to retrieve. The best way to keep track of what is happening with your queries is to analyse the execution plan produced by the optimizer. You can use it to experiment and find the best solution for your statements.
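As a starting point, here is a minimal sketch of asking a database for its plan. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; MySQL and PostgreSQL offer an EXPLAIN statement with different output, so treat the format below as SQLite-specific. The schema and the helper name explain are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema used only to have something to explain.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE donations (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id),
        amount INTEGER
    );
    CREATE INDEX idx_donations_user_id ON donations (user_id);
    """
)


def explain(sql):
    """Return the human-readable steps of SQLite's plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the textual description of the step.
    return [row[3] for row in rows]


steps = explain(
    "SELECT u.nickname, d.amount "
    "FROM users AS u "
    "INNER JOIN donations AS d ON d.user_id = u.id "
    "WHERE u.id = 1"
)

for step in steps:
    print(step)
```

Running the same helper before and after a change, such as adding an index or rewriting a condition, shows whether the optimizer actually took advantage of it.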