Slide 13
Reference: an example computed using the CPU only
Training data
oniak3@AkiranoiMac py % python3 trainsample1.py
load_dataset('yelp_review_full'):
AutoTokenizer.from_pretrained('bert-base-cased'):
dataset.map(tokenize_function, batched=True):
Map: 100%|██████████████████████████| 650000/650000 [02:27<00:00, 4395.82 examples/s]
Map: 100%|████████████████████████████| 50000/50000 [00:11<00:00, 4389.09 examples/s]
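The `tokenize_function` passed to `dataset.map` is not shown in the log, so the following is a hedged sketch of the contract `batched=True` imposes: the function receives a dict of columns (each a list covering one batch) and must return a dict of equal-length lists. A toy stand-in tokenizer is used here so the sketch runs without downloading `bert-base-cased`; the real script presumably calls the `AutoTokenizer` loaded above.

```python
def toy_tokenize(text, max_len=8):
    # Stand-in for the real tokenizer call: whitespace split,
    # truncate to max_len, then pad with 0 (hypothetical scheme).
    ids = [hash(w) % 1000 + 1 for w in text.split()][:max_len]
    mask = [1] * len(ids)
    ids += [0] * (max_len - len(ids))
    mask += [0] * (max_len - len(mask))
    return ids, mask

def tokenize_function(batch):
    # With batched=True, `batch` is a dict of columns,
    # e.g. {"text": [...], "label": [...]}.
    out = [toy_tokenize(t) for t in batch["text"]]
    return {
        "input_ids": [ids for ids, _ in out],
        "attention_mask": [m for _, m in out],
    }
```

The returned columns are merged into the dataset, which is why the log reports the map pass over all 650,000 train and 50,000 test examples.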
tokenized_datasets['train'].select(range(1000)):
tokenized_datasets['test'].select(range(1000)):
AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=5):
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized:
['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid
deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
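The fork warning above is harmless here (the library disables parallelism itself), but it can be silenced the way the message suggests, by setting the environment variable before the tokenizer is used:

```python
import os

# Must be set before `tokenizers` does any parallel work,
# so place it at the top of the script.
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # or "true" to keep parallelism
```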
TrainingArguments(output_dir='test_trainer', evaluation_strategy='epoch'):
evaluate.load('accuracy'):
Trainer(model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics):
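The `compute_metrics` callback wired into the `Trainer` above is not shown in the log; a minimal sketch follows. The log loads `evaluate.load('accuracy')`, but NumPy stands in here so the sketch runs without the `evaluate` package. The `Trainer` calls it with an `EvalPrediction` whose predictions are logits of shape `(n, num_labels)`.

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Turn per-class logits into predicted class ids.
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float(np.mean(predictions == np.asarray(labels)))}
```

This is the metric reported as `eval_accuracy` in the per-epoch dicts below.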
trainer.train():
{'eval_loss': 1.4513660669326782, 'eval_accuracy': 0.399, 'eval_runtime': 928.2999, 'eval_samples_per_second': 1.077,
'eval_steps_per_second': 0.135, 'epoch': 1.0}
{'eval_loss': 1.0377055406570435, 'eval_accuracy': 0.55, 'eval_runtime': 925.9615, 'eval_samples_per_second': 1.08,
'eval_steps_per_second': 0.135, 'epoch': 2.0}
79%|██████████████████████████████████▉ | 298/375 [2:30:31<31:06, 24.24s/it]
{'eval_loss': 1.0231441259384155, 'eval_accuracy': 0.592, 'eval_runtime': 922.4306, 'eval_samples_per_second': 1.084,
'eval_steps_per_second': 0.136, 'epoch': 3.0}
{'train_runtime': 11808.8493, 'train_samples_per_second': 0.254, 'train_steps_per_second': 0.032, 'train_loss': 1.072725830078125, 'epoch':
3.0}
100%|████████████████████████████████████████████| 375/375 [3:16:48<00:00, 31.49s/it]
oniak3@AkiranoiMac py %
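As a consistency check on the log (assuming the Trainer defaults of batch size 8 and 3 epochs, since the script does not override them): 1,000 training samples give 125 steps per epoch, hence the 375 total steps in the progress bar, and the reported `train_runtime` works out to the 31.49 s/it shown.

```python
# Arithmetic check against the figures printed in the log above.
samples, batch_size, epochs = 1000, 8, 3   # assumed Trainer defaults
steps_per_epoch = samples // batch_size    # 125
total_steps = steps_per_epoch * epochs     # 375, matches the progress bar
seconds_per_step = 11808.8493 / total_steps  # train_runtime from the log
print(total_steps, round(seconds_per_step, 2))
```

Roughly 3.3 hours for 1,000 samples over 3 epochs illustrates why this slide is labeled a CPU-only reference run.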