Building Polymorphism in Python with @classmethod

May 19, 2023, 8:03 p.m.
Programming Languages

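When a group of objects call one another and work together, the generality of the functions that construct them becomes a problem: every time a new kind of object appears, the object-creating functions have to be reimplemented for it. Python's @classmethod decorator is a handy tool for solving this.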

Using MapReduce as an Example

The data source class (InputData)

class InputData(object):
    def read(self):
        raise NotImplementedError

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        # Use a context manager so the file is closed after reading
        with open(self.path, 'r', encoding='utf-8') as f:
            return f.read()

  • PathInputData implements the read method for a file path
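To illustrate why the shared read interface matters, here is a minimal sketch of a second subclass. StringInputData and count_lines are hypothetical names that do not appear in the original code; the subclass simply returns text already held in memory, so callers can treat it exactly like PathInputData.

```python
class InputData(object):
    def read(self):
        raise NotImplementedError


class StringInputData(InputData):
    # Hypothetical subclass (not in the original): wraps text held in memory.
    def __init__(self, text):
        super().__init__()
        self.text = text

    def read(self):
        return self.text


def count_lines(input_data):
    # Works with any InputData subclass because it only relies on read().
    return input_data.read().count('\n')


sample = StringInputData('first\nsecond\n')
line_total = count_lines(sample)
```

Any code written against read() keeps working when a new data source is added, which is the polymorphism the rest of the post builds on.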

The Worker class (the object that implements MapReduce)

class Worker(object):
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        # Fold another worker's result into this one; errors are left to
        # propagate instead of being printed and silently ignored
        self.result += other.result
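  • LineCountWorker provides the concrete implementation of:
    • map: how to process the data from an InputData object
    • reduce: how to combine another worker's result into its own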
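The map and reduce steps can be tried by hand on two workers. This is only a sketch: StringInputData is a hypothetical in-memory stand-in for an InputData subclass, and the worker is a condensed copy of LineCountWorker so the example runs on its own.

```python
class StringInputData(object):
    # Hypothetical in-memory stand-in for an InputData subclass.
    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text


class LineCountWorker(object):
    # Condensed copy of the worker, repeated so the sketch is self-contained.
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        self.result = self.input_data.read().count('\n')

    def reduce(self, other):
        self.result += other.result


first = LineCountWorker(StringInputData('a\nb\n'))
second = LineCountWorker(StringInputData('c\n'))

# map: each worker processes its own input independently.
first.map()
second.map()

# reduce: fold the second worker's result into the first.
first.reduce(second)
total = first.result
```

Note that reduce only makes sense after map has run on both workers; before that, result is still None.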

Functions that build the InputData and Worker objects and orchestrate the MapReduce flow

import os
from threading import Thread

def generate_input(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]

    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    first, rest = workers[0], workers[1:]

    for worker in rest:
        first.reduce(worker)

    return first.result

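  • generate_input: builds a series of PathInputData objects from the files in data_dir
  • create_workers: creates a Worker for each item in that InputData list
  • execute: runs the map step of every Worker in the list on its own thread, then reduces the results into one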

The MapReduce driver function

def map_reduce(data_dir):
    inputs = generate_input(data_dir)
    workers = create_workers(inputs)
    return execute(workers)
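To see the whole pipeline run, the pieces can be put together and pointed at a throwaway directory. This is a minimal sketch under some assumptions: the classes are condensed copies without their base classes, and the temporary directory and file names are made up for the example.

```python
import os
import tempfile
from threading import Thread


class PathInputData(object):
    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path, 'r', encoding='utf-8') as f:
            return f.read()


class LineCountWorker(object):
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        self.result = self.input_data.read().count('\n')

    def reduce(self, other):
        self.result += other.result


def generate_input(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))


def create_workers(input_list):
    return [LineCountWorker(input_data) for input_data in input_list]


def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result


def map_reduce(data_dir):
    return execute(create_workers(generate_input(data_dir)))


# Write three small files (1, 2 and 3 lines) into a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for i in range(1, 4):
        path = os.path.join(tmp_dir, f'{i}.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('line\n' * i)
    total_lines = map_reduce(tmp_dir)
```

The three files hold six lines in total, so total_lines ends up as 6.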

The MapReduce driver function is not generic enough

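As the final code shows, generate_input and create_workers are hard-wired to two classes, PathInputData and LineCountWorker. Supporting any other InputData or Worker subclass would mean rewriting generate_input, create_workers, and map_reduce to match.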

A small change to the InputData class

class GenericInputData(object):
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

class PathInputData(GenericInputData):
    # ...
    def read(self):
        # ...
        pass

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))
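  • The earlier generate_input function moves into the InputData class hierarchy as a class method
  • GenericInputData's generate_inputs takes a config parameter; how that config is used is left to each subclass. In PathInputData, for example, config supplies the directory of files to read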
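Because each subclass decides what config means, a different data source can read entirely different keys. The following is a sketch only: StringListInputData and the 'texts' key are hypothetical and chosen just for this illustration.

```python
class GenericInputData(object):
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class StringListInputData(GenericInputData):
    # Hypothetical subclass: serves text held in memory instead of files.
    def __init__(self, text):
        super().__init__()
        self.text = text

    def read(self):
        return self.text

    @classmethod
    def generate_inputs(cls, config):
        # 'texts' is an assumed config key; cls(...) builds whichever
        # class the method was called on, including further subclasses.
        for text in config['texts']:
            yield cls(text)


config = {'texts': ['a\n', 'b\nc\n']}
inputs = list(StringListInputData.generate_inputs(config))
contents = [item.read() for item in inputs]
```

The caller never names a constructor directly; it only calls generate_inputs on whatever class it was handed.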

And the Worker class

class GenericWorker(object):
    # ...
    # def map ...
    # ...

    # def reduce ...

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

class LineCountWorker(GenericWorker):
    # ...

The improved MapReduce function

def map_reduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)
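  • map_reduce no longer needs to know which classes worker_class and input_class are (previously they were hard-coded in the original generate_input and create_workers)
  • From now on, we only need to keep implementing new subclasses of GenericInputData and GenericWorker
  • The execute step and the rest of the MapReduce flow no longer need to change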
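Put together, the refactored design can be exercised by swapping the worker class while leaving everything else alone. This is a sketch under stated assumptions: CharCountWorker is a hypothetical second worker added only to demonstrate the swap, and the temporary directory and file contents are made up.

```python
import os
import tempfile
from threading import Thread


class GenericInputData(object):
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class PathInputData(GenericInputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path, 'r', encoding='utf-8') as f:
            return f.read()

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))


class GenericWorker(object):
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        return [cls(data) for data in input_class.generate_inputs(config)]


class LineCountWorker(GenericWorker):
    def map(self):
        self.result = self.input_data.read().count('\n')

    def reduce(self, other):
        self.result += other.result


class CharCountWorker(GenericWorker):
    # Hypothetical second worker, not in the original post.
    def map(self):
        self.result = len(self.input_data.read())

    def reduce(self, other):
        self.result += other.result


def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
    return first.result


def map_reduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)


with tempfile.TemporaryDirectory() as tmp_dir:
    for i in range(1, 4):
        path = os.path.join(tmp_dir, f'{i}.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('ab\n' * i)
    config = {'data_dir': tmp_dir}
    line_total = map_reduce(LineCountWorker, PathInputData, config)
    char_total = map_reduce(CharCountWorker, PathInputData, config)
```

Adding CharCountWorker required no change to execute or map_reduce, which is the point of moving construction into class methods.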

Summary

Whenever you need to refactor how new objects are constructed, you can move that job to the base class and decorate these factory methods with @classmethod.

Tags:

Object-Oriented
Python