10月 212014

The most hard part in testing is to write test cases, which is time-consuming and error-prone. Fortunately, besides Python built-in modules such as

`doctest`

, `unittest`

, there are quite a few third-party packages that could help with automated testing. My favorite one is pytest, which enjoys proven record and syntax sugar. ### Step 1: test-driven development

For example, there is a coding challenge on Leetcode:

Find Minimum in Rotated Sorted ArraySuppose a sorted array is rotated at some pivot unknown to you beforehand.

(i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2).

Find the minimum element.

You may assume no duplicate exists in the array.

The straightforward way to find a minimal element in an array(or list in Python) is sequential searching, which goes through every element and has a time complexity of

However, this question provides a rotated sorted array, which suggests a binary search and reduces the complexity from

`O(N)`

. If the array is sorted, then the minimal one is the first element that only costs `O(1)`

.However, this question provides a rotated sorted array, which suggests a binary search and reduces the complexity from

`O(N)`

to `O(logN)`

.As usual, write the test cases first. The great thing for pytest is that it significantly simplies the effort to code the test cases: in this example, I only use 3 lines to generate 101 test cases to cover all conditions from 0 to 99 and also include an null test.

Next step is to code the function. It is easy to transplant the iterative approach of binary search to this question. If the pointer is between a sorted segment, then return the most left element as minimal. Otherwise, adjust the right boundary and the left boundary.

`# test1.py`

import pytest

# Prepare 101 test cases

array = list(range(100))

_testdata = [[array[i: ] + array[ :i], 0] for i in range(100)]

_testdata += [pytest.mark.empty(([], None))]

# Code the initial binary search function

def findMinPrev(num):

lo, hi = 0, len(num) - 1

while lo <= hi:

if num[lo] <= num[hi]:

return num[lo]

mid = (lo + hi) / 2

if num[mid] < num[hi]:

hi = mid - 1

else:

lo = mid + 1

@pytest.mark.parametrize('input, expected', _testdata)

def test_findMinPrev(input, expected):

assert findMinPrev(input) == expected

After running the

`py.test -v test1.py`

command, part of the results shows below. 65 tests passed and 36 failed; the failed cases return the much bigger values that suggests out of boundary during loops, and the selection of the boudaries may be too aggresive. `test1.py:20: AssertionError`

_________________________ test_findMinPrev[input98-0] _________________________

input = [98, 99, 0, 1, 2, 3, ...], expected = 0

@pytest.mark.parametrize('input, expected', _testdata)

def test_findMinPrev(input, expected):

> assert findMinPrev(input) == expected

E assert 98 == 0

E + where 98 = findMinPrev([98, 99, 0, 1, 2, 3, ...])

test1.py:20: AssertionError

==================== 36 failed, 65 passed in 0.72 seconds =====================

Now I adjust the right boundary slightly and finally come up with a solution that passes all the tests.

`def findMin(num):`

lo, hi = 0, len(num) - 1

while lo <= hi:

if num[lo] <= num[hi]:

return num[lo]

mid = (lo + hi) / 2

if num[mid] < num[hi]:

hi = mid

else:

lo = mid + 1

### Step 2: performance profiling

Besides the right solution, I am also interested in if the binary search method has indeed improved the performance. This step I choose line_profiler given its line-by-line ability of profiling. I take the most basic one (the sequential search) as benchmark, and also include the method that applies the

`min`

function since a few functions similar to it in Pyhton implement vectorizaiton to speed up. The test case is a rotated sorted array with 10 million elements. `# test2.py`

from line_profiler import LineProfiler

from sys import maxint

@profile

def findMinRaw(num):

"""Sequential searching"""

if not num:

return

min_val = maxint

for x in num:

if x < min_val:

min_val = x

return min_val

@profile

def findMinLst(num):

"""Searching by list comprehension"""

if not num:

return

return min(num)

@profile

def findMin(num):

""""Binary search"""

lo, hi = 0, len(num) - 1

while lo <= hi:

if num[lo] <= num[hi]:

return num[lo]

mid = (lo + hi) / 2

if num[mid] < num[hi]:

hi = mid

else:

lo = mid + 1

# Prepare a rotated array

array = list(range(10000000))

_testdata = array[56780: ] + array[ :56780]

# Test the three functions

findMinRaw(_testdata)

findMinLst(_testdata)

findMin(_testdata)

After running

`kernprof -l -v test2.py`

, I have the output as below. The sequential search has hit the loops 10000001 times and costs almost 14 seconds. The `min`

function encapsulate all details inside and uses 0.5 seconds which is 28 times faster. On the contrary, the binary search method only takes 20 loops to find the minimal value and spends just 0.0001 seconds. As a result, while dealing with large number, an improved algorithm can really save time. `Total time: 13.8512 s`

File: test2.py

Function: findMinRaw at line 4

Line # Hits Time Per Hit % Time Line Contents

==============================================================

4 @profile

5 def findMinRaw(num):

6 1 13 13.0 0.0 if not num:

7 return

8 1 3 3.0 0.0 min_val = maxint

9 10000001 16031900 1.6 47.5 for x in num:

10 10000000 17707821 1.8 52.5 if x < min_val:

11 2 5 2.5 0.0 min_val = x

12 1 3 3.0 0.0 return min_val

Total time: 0.510298 s

File: test2.py

Function: findMinLst at line 15

Line # Hits Time Per Hit % Time Line Contents

==============================================================

15 @profile

16 def findMinLst(num):

17 1 4 4.0 0.0 if not num:

18 return

19 1 1243016 1243016.0 100.0 return min(num)

Total time: 0.000101812 s

File: test2.py

Function: findMin at line 22

Line # Hits Time Per Hit % Time Line Contents

==============================================================

22 @profile

23 def findMin(num):

24 1 15 15.0 6.0 lo, hi = 0, len(num) - 1

25 20 40 2.0 16.1 while lo <= hi:

26 20 48 2.4 19.4 if num[lo] <= num[hi]:

27 1 2 2.0 0.8 return num[lo]

28 19 54 2.8 21.8 mid = (lo + hi) / 2

29 19 50 2.6 20.2 if num[mid] < num[hi]:

30 5 10 2.0 4.0 hi = mid

31 else:

32 14 29 2.1 11.7 lo = mid + 1