二分搜索 - Binary Search
二分搜索是一種在有序數組中尋找目標值的經典方法,也就是說使用前提是『有序數組』。非常簡單的題中『有序』特征非常明顯,但更多時候可能需要我們自己去構造『有序數組』。下面我們從最基本的二分搜索開始逐步深入。
一、lower/upper bound
定義 lower bound 為在給定升序數組中大於等於目標值的最小索引,upper bound 則為小於等於目標值的最大索引,下面上代碼和測試用例。
import java.util.*; public class Main { public static void main(String[] args) {int[] nums = new int[]{1,2,2,3,4,6,6,6,13,18}; System.out.println(lowerBound(nums, 6)); // 5 System.out.println(upperBound(nums, 6)); // 7 System.out.println(lowerBound(nums, 7)); // 8 System.out.println(upperBound(nums, 7)); // 7 } /* * nums[index] >= target, min(index)*/ public static int lowerBound(int[] nums, int target) { if (nums == null || nums.length == 0) return -1; int lb = -1, ub = nums.length; while (lb + 1 < ub) { int mid = lb + (ub - lb) / 2; if (nums[mid] < target) { lb = mid; }else { ub = mid; } } return lb + 1; } /* * nums[index] <= target, max(index) */ public static int upperBound(int[] nums, int target) { if (nums == null || nums.length == 0) return -1; int lb = -1, ub = nums.length; while (lb + 1 < ub) { int mid = lb + (ub - lb) / 2; if (nums[mid] > target) { ub = mid; } else { lb = mid; } } return ub - 1; } }
源碼分析
以lowerBound
的實現為例,以上二分搜索的模板有幾個非常優雅的實現:
while
循環中lb + 1 < ub
, 而不是等號,因為取等號可能會引起死循環。初始化lb < ub
時,最後循環退出時一定有lb + 1 == ub
.mid = lb + (ub - lb) / 2
, 可有效防止兩數相加後溢出。lb
和ub
的初始化,初始化為數組的兩端以外,這種初始化方式比起0
和nums.length - 1
有不少優點,詳述如下。
如果遇到有問插入索引的位置時,可以分三種典型情況:
- 目標值在數組範圍之內,最後返回值一定是
lb + 1
- 目標值比數組最小值還小,此時
lb
一直為-1
, 故最後返回lb + 1
也沒錯,也可以將-1
理解為數組前一個更小的值 - 目標值大於等於數組最後一個值,由於循環退出條件為
lb + 1 == ub
, 那麽循環退出時一定有lb = A.length - 1
, 應該返回lb + 1
綜上所述,返回lb + 1
是非常優雅的實現。其實以上三種情況都可以統一為一種方式來理解,即索引-1
對應於數組前方一個非常小的數,索引ub
即對應數組後方一個非常大的數,那麽要插入的數就一定在lb
和ub
之間了。
有時復雜的邊界條件處理可以通過『補項』這種優雅的方式巧妙處理。
關於lb 和 ub 的初始化,由於mid = lb + (ub - lb) / 2
, 且有lb + 1 < ub
,故 mid 還是有可能為ub - 1
或者lb + 1
的,在需要訪問mid + 1
或者mid - 1
處索引的元素時可能會越界。這時候就需要將初始化方式改為lb = 0, ub = A.length - 1
了,最後再加一個關於lb, ub
處索引元素的判斷即可。如 Search for a Range 和 Find Peak Element. 尤其是 Find Peak Element 中 lb 和 ub 的初始值如果初始化為-1和數組長度會帶來一些麻煩。
二、最優解
除了在有序數組中尋找目標值這種非常直接的二分搜索外,我們還可以利用二分搜索求最優解(最大值/最小值),通常這種題中只是隱含了『有序數組』,需要我們自己構造。
用數學語言來描述就是『求滿足某條件 C(x) 的最小/大的 x』,以求最小值為例,對於任意滿足條件的 xxx, 如果所有的 x≤x′≤UBx \leq x^\prime \leq UBx≤x′≤UB 對於 C(x′)C(x^\prime)C(x′) 都為真(其中 UB
可能為無窮大,也可能為滿足條件的最大的解,如果不滿足此條件就不能保證二分搜索的正確性),那麽我們就能使用二分搜索進行求解,其中初始化時下界lb
初始化為不滿足條件的值LB
, 上界初始化為滿足條件的上界UB
. 隨後在while
循環內部每次取中,滿足條件就取ub = mid
, 否則lb = mid
, 那麽最後ub
就是要求的最小值。求最大值時類似,只不過處理的是lb
.
以 POJ No.1064 為例。
Cable master Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 66707 Accepted: 13737 Description Inhabitants of the Wonderland have decided to hold a regional programming contest. The Judging Committee has volunteered and has promised to organize the most honest contest ever. It was decided to connect computers for the contestants using a "star" topology - i.e. connect them all to a single central hub. To organize a truly honest contest, the Head of the Judging Committee has decreed to place all contestants evenly around the hub on an equal distance from it. To buy network cables, the Judging Committee has contacted a local network solutions provider with a request to sell for them a specified number of cables with equal lengths. The Judging Committee wants the cables to be as long as possible to sit contestants as far from each other as possible. The Cable Master of the company was assigned to the task. He knows the length of each cable in the stock up to a centimeter,and he can cut them with a centimeter precision being told the length of the pieces he must cut. However, this time, the length is not known and the Cable Master is completely puzzled. You are to help the Cable Master, by writing a program that will determine the maximal possible length of a cable piece that can be cut from the cables in the stock, to get the specified number of pieces. Input The first line of the input file contains two integer numb ers N and K, separated by a space. N (1 = N = 10000) is the number of cables in the stock, and K (1 = K = 10000) is the number of requested pieces. The first line is followed by N lines with one number per line, that specify the length of each cable in the stock in meters. All cables are at least 1 meter and at most 100 kilometers in length. All lengths in the input file are written with a centimeter precision, with exactly two digits after a decimal point. Output Write to the output file the maximal length (in meters) of the pieces that Cable Master may cut from the cables in the stock to get the requested number of pieces. The number must be written with a centimeter precision, with exactly two digits after a decimal point. If it is not possible to cut the requested number of pieces each one being at least one centimeter long, then the output file must contain the single number "0.00" (without quotes). Sample Input 4 11 8.02 7.43 4.57 5.39 Sample Output 2.00
Problem
有 N 條繩子,它們的長度分別為 Li. 如果從它們中切割出 K 條長度相同的繩子的話,這 K 條繩子每條最長能有多長?答案保留到小數點後兩位。
輸入
N = 4, L = {8.02, 7.43, 4.57, 5.39}, K = 11
輸出
2.00
題解
這道題看似是一個最優化問題,我們來嘗試下使用模板二的思想求解,**令 C(x)C(x)C(x) 為『可以得到 KKK 條長度為 xxx 的繩子』。**根據題意,我們可以將上述條件進一步細化為:C(x)=∑i(floor(Li/x))≥KC(x) = \sum_i(floor(L_i / x)) \geq KC(x)=i∑(floor(Li/x))≥K
我們現在來分析下可行解的上下界。由於答案保留小數點後兩位,顯然繩子長度一定大於0,大於0的小數點後保留兩位的最小值為0.01
, 顯然如果問題最後有解,0.01
一定是可行解中最小的,且這個解可以分割出的繩子條數是最多的。一般在 OJ 上不同變量都是會給出範圍限制,那麽我們將上界初始化為最大範圍 + 0.01
, 它一定在可行解之外(也可以遍歷一遍數組取數組最大值,但其實二分後復雜度相差不大)。使用二分搜索後最後返回lb
即可。
import java.io.*; import java.util.*; public class Main { public static void main(String[] args) { Scanner in = new Scanner(System.in); int n = in.nextInt(); int k = in.nextInt(); double[] nums = new double[n]; for (int i = 0; i < n; i++) { nums[i] = in.nextDouble(); } System.out.printf("%.2f\n", Math.floor(solve(nums, k) * 100) / 100); } public static double solve(double[] nums, int K) { double lb = 0.00, ub = 10e5 + 0.01; // while (lb + 0.001 < ub) { for (int i = 0; i < 100; i++) { double mid = lb + (ub - lb) / 2; if (C(nums, mid, K)) { lb = mid; } else { ub = mid; } } return lb; } public static boolean C(double[] nums, double seg, int k) { int count = 0; for (double num : nums) { count += Math.floor(num / seg); } return count >= k; } }
源碼分析
方法C
只做一件事,給定數組nums
, 判斷是否能切割出K
條長度均為seg
的繩子。while
循環中使用lb + 0.001 < ub
, 不能使用0.01
, 因為計算mid
時有均值的計算,對於double
型數值否則會有較大誤差。
三、二分搜索的 while
結束條件判定
對於整型我們通常使用lb + 1 < ub
, 但對於double
型數據來說會有些精度上的丟失,使得結束條件不是那麽好確定。像上題中采用的方法是題目中使用的精度除10。但有時候這種精度可能還是不夠,如果結束條件lb + EPS < ub
中使用的 EPS 過小時 double 型數據精度有可能不夠從而導致死循環的產生!這時候我們將while
循環體替換為for (int i = 0; i < 100; i++)
, 100 次循環後可以達到 10?3010^{-30}10?30 精度範圍,一般都沒問題。
二分搜索 - Binary Search