NumPy 陣列資料型態

在〈Matplotlib 散佈圖〉中，收集每個座標點時，並沒有使用 NumPy 的風格，而是直接使用迴圈：

n = 128

xs = []
ys = []
for y in range(n):
    for x in range(n):
        if x & y == 0:
            xs.append(x)
            ys.append(y)

若要改為使用 NumPy 實現，需要重新看待資料，實際上，以上的範例在做的動作，是將 range(n) 範圍內的數字過濾，分別收集，只不過同時寫在迴圈裡，若將之分開：

n = 128

xs = []
for y in range(n):
    for x in range(n):
        if x & y == 0:
            xs.append(x)

ys = []
for y in range(n):
    for x in range(n):
        if x & y == 0:
            ys.append(x)

其實也不必寫成巢狀迴圈：

n = 128

xs = []
for elem in range(n ** 2):
    x = elem // n
    y = elem % n
    if x & y == 0:
        xs.append(x)

ys = []
for elem in range(n):
    x = elem // n
    y = elem % n
    if x & y == 0:
        ys.append(x)

也就是說，可以先處理 range(n ** 2)，

n = 128

nums = []
for num in range(n ** 2):
    x = num // n
    y = num % n
    if x & y == 0:
        nums.append(num)

xs = [num // n for num in nums]
ys = [num % n for num in nums]

如果要改為 NumPy，必須要能對 np.arange(n ** 2) 進行過濾，在〈NumPy 陣列索引〉看過，NumPy 的陣列可以指定布林陣列來進行過濾，因此首先是將 np.arange(n ** 2) 轉換為布林陣列，因此先定義出轉換函式並將之向量化：

def quotientAndRemainderZero(elem, n):
    quotient = elem // n
    remainder = elem % n
    return quotient & remainder == 0

quotientAndRemainderZero = np.frompyfunc(quotientAndRemainderZero, 2, 1)

然而，如果撰寫以下的程式碼試圖進行過濾：

n = 128
nums = np.arange(n ** 2)
nums = nums[quotientAndRemainderZero(nums, n)]

執行後會出現以下的錯誤：

IndexError: arrays used as indices must be of integer (or boolean) type

嗯？quotientAndRemainderZero(nums, n) 不就是布林陣列嗎？嗯…形態上不是！

雖然 Python 本身是動態定型語言，然而 NumPy 為了效能，陣列的元素會定義型態，如果不指定，NumPy 會試著自動判別，例如：

>>> import numpy as np
>>> a1 = np.array([1, 2, 3])
>>> a1.dtype
dtype('int32')
>>> a2 = np.array([1.1, 2.1, 3.1])
>>> a2.dtype
dtype('float64')
>>> a3 = np.array([True, False])
>>> a3.dtype
dtype('bool')
>>>

資料型態是為了底層在運算時能更有效率，透過 array 建立陣列時，若必要也可以自行指定 dtype。

方才的問題在於，frompyfunc 文件談到，其向量化後的函式，執行後傳回的陣列，元素型態會是 PyObject，這是因為事先無法預測 frompyfunc 被指定的函式執行後會傳回什麼型態。

解決的方式之一是，明確將型態轉換為 bool：

n = 128
nums = np.arange(n ** 2)
nums = nums[quotientAndRemainderZero(nums, n).astype(np.bool)]

或者是透過 where，這個函式的作用其實是三元運算子的概念，例如：

>>> a = np.array([10, 20, 30])
>>> np.where(a > 10, a, a * 10)
array([100,  20,  30])
>>> np.where(a > 10)
(array([1, 2], dtype=int64),)
>>>

where 的第一個參數接受布林陣列，如果元素是 True，選擇第二個參數指定陣列對應位置的值，否則選擇第三個參數指定陣列對應位置的值，如果只指定第一個參數，傳回 True 元素的索引。

因此底下的指定，就會有過濾元素的效果：

n = 128
nums = np.arange(n ** 2)
nums = nums[np.where(quotientAndRemainderZero(nums, n))]

整合以上的觀念，就可以使用底下的程式碼來畫出謝爾賓斯基三角形：

import numpy as np
import matplotlib.pyplot as plt

def scatter_plot(title, plotsize, axislim, markersize, xs, ys):
    plt.title(title) 
    plt.gcf().set_size_inches(plotsize)
    plt.xlim(axislim)
    plt.ylim(axislim)
    plt.xlabel('x')           
    plt.ylabel('y')   
    plt.scatter(xs, ys, marker = 's', s = markersize ** 2)
    plt.show()

def sierpinski(n):
    def quotientAndRemainderZero(elem, n):
        quotient = elem // n
        remainder = elem % n
        return quotient & remainder == 0

    quotientAndRemainderZero = np.frompyfunc(quotientAndRemainderZero, 2, 1)

    nums = np.arange(n ** 2)
    nums = nums[np.where(quotientAndRemainderZero(nums, n))]
    return (nums % n, nums // n)


n = 128
x, y = sierpinski(n)

plotwidth = 6
axislim = (-0.5, n - 0.5)
plotsize = (plotwidth, plotwidth)
PTS_PER_INCH = 72
plotwidth_pts = PTS_PER_INCH * plotwidth
marksize = 0.775 * plotwidth_pts / n

scatter_plot(
    'Sierpinski triangle',
    plotsize,
    axislim,
    marksize,
    x, 
    y
)