Slide 12
import torch
from transformers import Gemma2ForSequenceClassification

# Load one full copy of the model onto each GPU
model_0 = Gemma2ForSequenceClassification.from_pretrained(
    cfg.gemma_dir,
    device_map=torch.device('cuda:0'),
    use_cache=False,
)
model_1 = Gemma2ForSequenceClassification.from_pretrained(
    cfg.gemma_dir,
    device_map=torch.device('cuda:1'),
    use_cache=False,
)
…
from concurrent.futures import ThreadPoolExecutor

# Sort by length, then interleave rows so each GPU receives a
# similar mix of long and short inputs (balanced workload)
data = data.sort_values("length", ascending=False)
sub_1 = data.iloc[0::2].copy()
sub_2 = data.iloc[1::2].copy()

with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(inference, (sub_1, sub_2), (model_0, model_1), (device_0, device_1))
Data Parallelism - load the model onto each GPU and process in parallel
Load the model on GPU 0
Load the model on GPU 1
Split the data into two halves and
process them in parallel, one per GPU
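The split-and-dispatch pattern above can be sketched without GPUs. This is a minimal illustration, assuming a dummy `inference` function and toy `length` values (not from the original slide); it shows how sorting by length and interleaving with `iloc[0::2]` / `iloc[1::2]` distributes long and short rows across the two workers.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Dummy stand-in for the real model inference: it just tags each
# row with the device it was processed on, so the pattern is visible.
def inference(df, model, device):
    out = df.copy()
    out["device"] = device
    return out

# Toy data: sort by length descending, then interleave rows so each
# worker gets an alternating mix of long and short inputs.
data = pd.DataFrame({"length": [512, 64, 256, 128, 1024, 32]})
data = data.sort_values("length", ascending=False)

sub_1 = data.iloc[0::2].copy()  # rows 0, 2, 4 of the sorted frame
sub_2 = data.iloc[1::2].copy()  # rows 1, 3, 5

with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(
        executor.map(inference, (sub_1, sub_2), ("model_0", "model_1"), ("cuda:0", "cuda:1"))
    )

merged = pd.concat(results)
```

With real models the two threads release the GIL while each GPU runs its forward passes, so the two halves genuinely overlap in time.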