Using the numpy.where function to replace for loops with if-else statements
Originally published in January 2018.
While I was taking the Deep Learning course, Andrew Ng mentioned in a programming assignment that it was possible to vectorise for loops that contain if-else statements. So, naturally, I went looking for a solution. Most answers on Stack Overflow didn’t really help, and the explanation in the documentation was way beyond my basic understanding at the time (though I get it well now), so I had to spend an additional 2 hours playing with a few examples to grasp the concept. I’ll share some of the examples I played with, and hopefully you’ll be able to use the np.where function intuitively the next time you write code.
The goal is to replace this kind of for loop, used together with an if-else:
import numpy as np
a = np.random.rand(1, 10)  # create a random (1,10) array
result = np.zeros(10, dtype=int)  # create an integer array of zeros to hold the result
result = result.reshape(1, 10)  # reshape result to make sure it's a (1,10) array and not (10,)
print("a: \n" + str(a))
for i in range(10):
    if a[0, i] > 0.5:
        result[0, i] = 1
    else:
        result[0, i] = 0
print("result: \n" + str(result))
With something as simple as this:
import numpy as np
a = (np.random.rand(1,10)) #create a random (1,10) array
print ("a: \n" + str(a))
result = (np.where(a>0.5,1,0))
print ("result: \n" +str(result))
What np.where does is first create a new array with the same shape as its first parameter. The first parameter is the condition, here a>0.5, which checks each entry of the array we were looping through. The second parameter is the value placed in the new array where the condition is true, and the third parameter is the value placed in the new array where the condition is false.
This is a sample of what both code snippets would output.
a:
[[ 0.76456104 0.75182292 0.11825977 0.97635566 0.3220488 0.44064402
0.08142387 0.65055208 0.88407891 0.40323535]]
result:
[[1 1 0 1 0 0 0 1 1 0]]
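To see the two steps separately, here is a small sketch with a fixed array. The values are made up for illustration and are not taken from the run above. The comparison a > 0.5 produces a boolean array first, and np.where then picks 1 or 0 for each entry.

```python
import numpy as np

# Hypothetical fixed values, chosen only so the result is reproducible
a = np.array([[0.9, 0.2, 0.7, 0.5]])

mask = a > 0.5                 # boolean array with the same shape as a
result = np.where(mask, 1, 0)  # 1 where mask is True, 0 everywhere else

print("mask: \n" + str(mask))
print("result: \n" + str(result))
```

Note that the last entry, 0.5, is not strictly greater than 0.5, so it ends up as 0.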
Take a look at another example that would help clear some more fog from the documentation.
import numpy as np
array = [[True, False], [True, True]] # (2,2) array of boolean values
array_cond_true = [[1, 2], [3, 4]] # (2,2) array of where to pick values if array at position is true
array_cond_false = [[9, 8], [7, 6]] # (2,2) array of where to pick values if array at position is false
print("result: \n" + str(np.where(array,
                                  array_cond_true,
                                  array_cond_false)))
The output
result:
[[1 8]
[3 4]]
When this example is run, np.where creates a 2x2 array (i.e. the size of the array in the first param). It then goes through the entries of the array in the first param: at whatever position the entry is true, it looks over to the second param and gets the value of the array at that position; if it’s false, it looks over to the third param and gets the value at that position. It then outputs the new array. This second example is actually from the documentation, just modified a bit to be simpler.
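The second and third params don’t both have to be arrays. As a small sketch (the array values here are made up), you can pass the array itself as the second param and a single number as the third, which keeps the entries above 0.5 and replaces the rest with zero.

```python
import numpy as np

# Hypothetical values for illustration
a = np.array([[0.9, 0.2, 0.7, 0.4]])

# Where a > 0.5 take the value from a, otherwise use the scalar 0.0
kept = np.where(a > 0.5, a, 0.0)

print("kept: \n" + str(kept))
```

The scalar is treated as if it were an array of the same shape filled with that one value.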
What about performance?
I know you may be asking about performance and which approach is faster. If you will be using this when programming your AI algorithms, this should of course be important to you, so let’s go over that. I recreated the first example to use 1,000,000 entries, which should be enough to show a meaningful time difference.
import numpy as np
import time
a = np.random.rand(1, 1000000)  # create a random (1,1000000) array
result1 = np.zeros((1, 1000000), dtype=int)  # array of zeros to hold the result (np.int no longer exists in newer NumPy versions)
#print ("a: \n" + str(a))
tic = time.time()
for i in range(1000000):
    if a[0, i] > 0.5:
        result1[0, i] = 1
    else:
        result1[0, i] = 0
#print ("result: \n" +str(result))
toc = time.time()
print("time passed for result1: " + str(toc - tic) + "s")  # time.time() returns seconds
tic = time.time()
result2 = np.where(a > 0.5, 1, 0)
#print ("result2: \n" +str(result2))
toc = time.time()
print("time passed for result2: " + str(toc - tic) + "s")
assert (result1 == result2).all()  # .all() must be called, otherwise the assert always passes
Output
time passed for result1: 1.2310867309570312s
time passed for result2: 0.014905214309692383s
The time saved with np.where is pretty huge compared to using for loops: it was roughly 80 times faster in this run. You can now head over to the documentation and read more examples with ease.
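If you want to repeat the comparison yourself, here is a minimal sketch that uses time.perf_counter, which is better suited to measuring short intervals than time.time. The helper names loop_threshold and where_threshold and the smaller array size are my own choices for illustration, and the timings will differ on your machine.

```python
import time
import numpy as np

def loop_threshold(a):
    """For loop with if-else: 1 where a > 0.5, else 0 (a has shape (1, n))."""
    result = np.zeros(a.shape, dtype=int)
    for i in range(a.shape[1]):
        if a[0, i] > 0.5:
            result[0, i] = 1
        else:
            result[0, i] = 0
    return result

def where_threshold(a):
    """Same result as loop_threshold, computed with np.where."""
    return np.where(a > 0.5, 1, 0)

a = np.random.rand(1, 100000)  # smaller than 1,000,000 so the loop finishes quickly

tic = time.perf_counter()
result_loop = loop_threshold(a)
loop_seconds = time.perf_counter() - tic

tic = time.perf_counter()
result_where = where_threshold(a)
where_seconds = time.perf_counter() - tic

# Both approaches must agree on every entry
assert (result_loop == result_where).all()

print("for loop: " + str(loop_seconds) + "s")
print("np.where: " + str(where_seconds) + "s")
```

Wrapping each approach in a function makes it easy to time them again with a different array size.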
Good luck!!!