Slide 1

Slide 1 text

How to Mass-Produce AI Models Tailored to Each Dataset (Python only)
Kim Taeyoung (김태영), CEO, AI Factory (인공지능팩토리)

Slide 2

Slide 2 text

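Problem definition
● Build AI-based anomaly detection models for 3,000 servers!
● The input format and label format are identical across servers
● The data distribution differs for each of the 3,000 servers (although some are similar)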

Slide 3

Slide 3 text

Approaches
● Build one global model trained on the datasets of all 3,000 servers
● Build a separate anomaly detection model for each server, i.e., 3,000 AI models

Slide 4

Slide 4 text

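Approaches
● Build one global model trained on the datasets of all 3,000 servers
  ○ Pro: a single model can run inference for every server
  ○ Con: when patterns and data distributions differ from server to server, a high-performing global model is hard to build
● Build a separate anomaly detection model for each server, i.e., 3,000 AI models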

Slide 5

Slide 5 text

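Approaches
● Build one global model trained on the datasets of all 3,000 servers
  ○ Pro: a single model can run inference for every server
  ○ Con: when patterns and data distributions differ from server to server, a high-performing global model is hard to build
● Build a separate anomaly detection model for each server, i.e., 3,000 AI models
  ○ Pro: each server gets a model personalized to it
  ○ Con: every server needs its own dataset construction and model development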

Slide 6

Slide 6 text

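Approaches
● Build one global model trained on the datasets of all 3,000 servers
  ○ Pro: a single model can run inference for every server
  ○ Con: when patterns and data distributions differ from server to server, a high-performing global model is hard to build
● Build a separate anomaly detection model for each server, i.e., 3,000 AI models
  ○ Pro: each server gets a model personalized to it
  ○ Con: every server needs its own dataset construction and model development
>> Let's do it with MLOps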

Slide 7

Slide 7 text

Concept
● traingen: worker, wo
● testgen: worker, wo
● predictgen: worker, wo
● train: worker, wo
● test: worker, wo
● predict: worker, wo
● train workflow, test workflow, predict workflow
Today's talk!

Slide 8

Slide 8 text

Dataset composition

Slide 9

Slide 9 text

Dataset composition
MNIST, Fashion-MNIST

Slide 10

Slide 10 text

Dataset composition
MNIST, Fashion-MNIST, mixNIST

Slide 11

Slide 11 text

Dataset assets
● asset
  ○ train
  ○ test
  ○ real
  ○ arch
  ○ model
  ○ score

Slide 12

Slide 12 text

ID scheme
● id
  ○ task
  ○ train
  ○ test
  ○ arch
  ○ model
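
A minimal sketch of how such IDs could be produced and checked. The three-letter prefixes (TSK, TRD, TST, ARC, MDL) and the six-digit zero-padded number are read off the file names used later in the deck; the helper names `make_id` and `parse_id` are hypothetical.

```python
import re

# Prefix per ID kind, inferred from file names such as TRD000001_x.npy and
# MDL000001.h5 in the deck; the slide itself lists only the kinds.
PREFIXES = {"task": "TSK", "train": "TRD", "test": "TST", "arch": "ARC", "model": "MDL"}

ID_PATTERN = re.compile(r"^([A-Z]{3})(\d{6})$")


def make_id(kind: str, number: int) -> str:
    """Format an ID such as TRD000001 from an ID kind and a sequence number."""
    if not 1 <= number <= 999_999:
        raise ValueError(f"sequence number out of range: {number}")
    return f"{PREFIXES[kind]}{number:06d}"


def parse_id(text: str) -> tuple[str, int]:
    """Split an ID into (prefix, number); raise ValueError if it is malformed."""
    match = ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid ID: {text!r}")
    return match.group(1), int(match.group(2))
```

Fixed-width IDs keep file names sortable and make them easy to split on underscores, which the work order names rely on.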

Slide 13

Slide 13 text

Work orders (workorder) and workers (worker)
● wo organization
  ○ wait
  ○ running
  ○ success
  ○ fail
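
One way to read the four entries above is as states that a work order file passes through. The sketch below assumes each state is a directory and that a work order moves wait → running → success or fail; the directory layout, the allowed transitions, and the function names are assumptions, not something the slide states.

```python
from pathlib import Path

STATES = ("wait", "running", "success", "fail")

# Allowed moves between states (assumed; the slide only lists the four names).
TRANSITIONS = {"wait": {"running"}, "running": {"success", "fail"}}


def init_queue(root: Path) -> None:
    """Create one directory per state under root."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def move_workorder(root: Path, name: str, src: str, dst: str) -> Path:
    """Move a work order file from one state directory to another."""
    if dst not in TRANSITIONS.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    target = root / dst / name
    (root / src / name).rename(target)
    return target
```

Keeping the state in the file's location means the queue can be inspected with nothing more than a directory listing.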

Slide 14

Slide 14 text

Work orders (workorder) and workers (worker)
● wo - worker
  ○ traingen: traingen_worker.py
  ○ testgen: testgen_worker.py
  ○ predictgen: predictgen_worker.py
  ○ train: train_worker.py
  ○ test: test_worker.py
  ○ predict: predict_worker.py
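
The pairing above can be held in a plain lookup table. The sketch below only builds the command line for a given work order; passing the work order file name as the single argument is an assumption, and `worker_command` is a hypothetical helper.

```python
import sys

# Work order type -> worker script, as listed on the slide.
WORKERS = {
    "traingen": "traingen_worker.py",
    "testgen": "testgen_worker.py",
    "predictgen": "predictgen_worker.py",
    "train": "train_worker.py",
    "test": "test_worker.py",
    "predict": "predict_worker.py",
}


def worker_command(workorder_name: str) -> list[str]:
    """Return the command that would run the worker for this work order."""
    wo_type = workorder_name.split("_", 1)[0]
    if wo_type not in WORKERS:
        raise ValueError(f"unknown work order type: {wo_type!r}")
    # Assumption: the worker receives the work order file name as its argument.
    return [sys.executable, WORKERS[wo_type], workorder_name]
```

The returned list can be handed to `subprocess.run` as is.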

Slide 15

Slide 15 text

● traingen dataset job
  ○ Work order: traingen_TSK000001_TRD000001.wo
  ○ Scripts: datagen_train.py, traingen_worker.py, traingen.py
  ○ Output files: TRD000001_x.npy, TRD000001_y.npy
  ○ ? (arg)
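
A small sketch of how a traingen work order name could be turned into the two output paths. It assumes the name has the form traingen_<task>_<train>.wo as in the example above; the `data_dir` default and the function name are hypothetical.

```python
from pathlib import Path


def traingen_outputs(workorder: str, data_dir: str = "data") -> tuple[str, Path, Path]:
    """Return (task_id, x_path, y_path) for a traingen work order file name."""
    name = Path(workorder).name
    if not name.endswith(".wo"):
        raise ValueError(f"not a work order file: {name!r}")
    parts = name[: -len(".wo")].split("_")
    if len(parts) != 3 or parts[0] != "traingen":
        raise ValueError(f"not a traingen work order: {name!r}")
    _, task_id, train_id = parts
    base = Path(data_dir)  # hypothetical location for generated datasets
    return task_id, base / f"{train_id}_x.npy", base / f"{train_id}_y.npy"
```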

Slide 16

Slide 16 text

Generation 1: let's build a single model
● Model development process
  ○ train: train dataset + architecture train → model
  ○ test: test dataset + architecture test + model → score
  ○ predict: in + architecture predict + model → out

Slide 19

Slide 19 text

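Generation 1: let's build a single model
● Model development process
  ○ train: TRD000001_x.npy, TRD000001_y.npy + ARC000001_train.py → MDL000001.h5
  ○ test: TST000001_x.npy, TST000001_y.npy + ARC000001_test.py + MDL000001.h5 → SCR000001.txt
  ○ predict: PRD000001_x.npy + ARC000001_predict.py + MDL000001.h5 → PRD000001_yhat.npy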

Slide 20

Slide 20 text

Generation 1: let's build a single model
● Architecture (asset)
  ○ ARC000001_train.py
  ○ ARC000001_test.py
  ○ ARC000001_predict.py

Slide 21

Slide 21 text

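Generation 1: let's build a single model
● Architecture (asset)
  ○ ARC000001_train.py

    # Imports are not shown on the slide; the ones below are assumed.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, models, utils

    # train_x_fn, train_y_fn, and model_fn are file paths supplied by the caller.
    x_train = np.load(train_x_fn)
    y_train = np.load(train_y_fn)

    # Flatten each 28x28 image to 784 features and scale pixels to [0, 1].
    # -1 replaces the hard-coded 60000 so any number of samples works.
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    y_train = utils.to_categorical(y_train, num_classes=10)

    model = models.Sequential()
    model.add(layers.Dense(64, input_dim=28*28, activation='relu'))
    model.add(layers.Dense(32, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Save only the checkpoint with the lowest validation loss.
    callbacks = [
        keras.callbacks.ModelCheckpoint(model_fn, save_best_only=True, monitor="val_loss")
    ]

    hist = model.fit(x_train, y_train, validation_split=0.2, epochs=2,
                     batch_size=32, callbacks=callbacks)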

Slide 22

Slide 22 text

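Generation 1: let's build a single model
● Architecture (asset)
  ○ ARC000001_test.py

    # Imports are not shown on the slide; the ones below are assumed.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import utils

    # test_x_fn, test_y_fn, model_fn, and score_fn are file paths supplied by the caller.
    x_test = np.load(test_x_fn)
    y_test = np.load(test_y_fn)

    # -1 replaces the hard-coded 10000 so any number of samples works.
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
    y_test = utils.to_categorical(y_test, num_classes=10)

    model = keras.models.load_model(model_fn)
    test_loss, test_acc = model.evaluate(x_test, y_test)

    print("Test accuracy", test_acc)
    print("Test loss", test_loss)

    # The score asset holds the test accuracy as plain text.
    with open(score_fn, 'wt') as f:
        f.write(str(test_acc))

    print("output: {0} {1}".format(score_fn, test_acc))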

Slide 23

Slide 23 text

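Generation 1: let's build a single model
● Architecture (asset)
  ○ ARC000001_predict.py

    # Imports are not shown on the slide; the ones below are assumed.
    import numpy as np
    from tensorflow import keras

    # predict_fn, model_fn, and out_fn are file paths supplied by the caller.
    x_predict = np.load(predict_fn)
    print('x_predict shape : ' + str(x_predict.shape))

    # -1 replaces the hard-coded 100 so any number of samples works.
    x_predict = x_predict.reshape(-1, 784).astype('float32') / 255.0

    # model
    model = keras.models.load_model(model_fn)
    y_predict = model.predict(x_predict)

    # report: the class probabilities are written as text; the printed
    # summary shows the most likely class per sample.
    y_predict_class = np.argmax(y_predict, axis=1)
    np.savetxt(out_fn, y_predict)

    print("output: {0} {1}".format(out_fn, y_predict_class))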

Slide 24

Slide 24 text

Work orders (workorder) and workers (worker)
● wo - worker
  ○ traingen: traingen_worker.py
  ○ testgen: testgen_worker.py
  ○ predictgen: predictgen_worker.py
  ○ train: train_worker.py
  ○ test: test_worker.py
  ○ predict: predict_worker.py

Slide 25

Slide 25 text

Train/test/predict jobs
● train
  ○ Work order: train_TSK000001_ARC000001_TRD000001_MDL000001
  ○ Scripts: train_record.py, train_worker.py, train.py
  ○ Assets: TRD000001_x.npy, TRD000001_y.npy + ARC000001_train.py → MDL000001.h5
  ○ Task record: TSK000001_model.csv

Slide 26

Slide 26 text

Train/test/predict jobs
● test
  ○ Work order: test_TSK000001_ARC000001_MDL000001_TST000001_SCR000001.wo
  ○ Scripts: test_record.py, test_worker.py, test.py
  ○ Assets: TST000001_x.npy, TST000001_y.npy + ARC000001_test.py + MDL000001.h5 → SCR000001.txt
  ○ Task record: TSK000001_model.csv

Slide 27

Slide 27 text

Train/test/predict jobs
● predict
  ○ Work order: predict_TSK000001_ARC000001_MDL000001_PRD000001.wo
  ○ Scripts: predict_record.py, predict_worker.py, predict.py
  ○ Assets: PRD000001_x.npy + ARC000001_predict.py + MDL000001.h5 → PRD000001_yhat.npy
  ○ Task record: TSK000001_predict.csv
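
The three work order names follow one pattern: the job type, then a fixed sequence of IDs joined by underscores. A sketch of a parser and builder for that pattern follows; the field order per job type is taken from the example names in this deck, while the `FIELDS` table and both function names are hypothetical.

```python
# ID fields per job type, in the order they appear in the work order name.
FIELDS = {
    "train": ("task", "arch", "train", "model"),
    "test": ("task", "arch", "model", "test", "score"),
    "predict": ("task", "arch", "model", "predict"),
}


def parse_workorder(name: str) -> dict[str, str]:
    """Split a work order name into its job type and named IDs."""
    stem = name[: -len(".wo")] if name.endswith(".wo") else name
    wo_type, *ids = stem.split("_")
    if wo_type not in FIELDS:
        raise ValueError(f"unknown work order type: {wo_type!r}")
    fields = FIELDS[wo_type]
    if len(ids) != len(fields):
        raise ValueError(f"expected {len(fields)} IDs, got {len(ids)}: {name!r}")
    return {"type": wo_type, **dict(zip(fields, ids))}


def build_workorder(wo_type: str, **ids: str) -> str:
    """Assemble a work order file name from a job type and its IDs."""
    return "_".join([wo_type, *(ids[field] for field in FIELDS[wo_type])]) + ".wo"
```

Because every input and output asset is named in the work order, a worker needs no other configuration to find its files.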

Slide 28

Slide 28 text

Let's put together a scenario
● traingen: worker, wo
● testgen: worker, wo
● predictgen: worker, wo
● train: worker, wo
● test: worker, wo
● predict: worker, wo
● train workflow, test workflow, predict workflow

Slide 29

Slide 29 text

Dataset composition
MNIST, Fashion-MNIST, mixNIST
● Prepare 100 mixNIST datasets

Slide 30

Slide 30 text

Scenario
● 100 tasks
● Generate 100 test sets >> testgen_wo
  ○ testgen_TSK000001_TST000001.wo
  ○ ...
  ○ testgen_TSK000100_TST000100.wo
● Evaluate all 100 tasks with a single model >> test_wo
  ○ test_TSK000001_ARC000001_MDL000001_TST000001_SCR000001.wo
  ○ ...
  ○ test_TSK000100_ARC000001_MDL000001_TST000100_SCR000100.wo
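
Generating the two batches of work order names is a short loop. The sketch below reproduces the names listed above; the function name and its parameters are hypothetical.

```python
def scenario_workorders(n_tasks: int = 100, arch: int = 1, model: int = 1):
    """Return (testgen names, test names) for tasks 1..n_tasks."""
    testgen = [f"testgen_TSK{i:06d}_TST{i:06d}.wo" for i in range(1, n_tasks + 1)]
    test = [
        f"test_TSK{i:06d}_ARC{arch:06d}_MDL{model:06d}_TST{i:06d}_SCR{i:06d}.wo"
        for i in range(1, n_tasks + 1)
    ]
    return testgen, test
```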

Slide 31

Slide 31 text

Scenario
● Step 1: check the MDL000001 scores for 10 of the 100 tasks >> 60% or higher (3/10)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131
  ○ TSK000004_score 0.580299973487854
  ○ TSK000005_score 0.29809999465942383
  ○ TSK000006_score 0.3885999917984009
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766
  ○ TSK000009_score 0.5809999704360962
  ○ TSK000010_score 0.5033000111579895
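
The 3/10 figure follows from applying a 0.6 accuracy threshold to the scores above. A short check, with the scores copied from this slide; the variable names are hypothetical.

```python
THRESHOLD = 0.6  # "60% or higher" on the slide

# MDL000001 test accuracy per task, copied from the slide.
scores = {
    "TSK000001": 0.9731000065803528,
    "TSK000002": 0.7840999960899353,
    "TSK000003": 0.2980000078678131,
    "TSK000004": 0.580299973487854,
    "TSK000005": 0.29809999465942383,
    "TSK000006": 0.3885999917984009,
    "TSK000007": 0.6876000165939331,
    "TSK000008": 0.5837000012397766,
    "TSK000009": 0.5809999704360962,
    "TSK000010": 0.5033000111579895,
}

passing = sorted(task for task, score in scores.items() if score >= THRESHOLD)
failing = sorted(task for task, score in scores.items() if score < THRESHOLD)
print(f"{len(passing)}/{len(scores)} tasks pass:", passing)
```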

Slide 32

Slide 32 text

Scenario
● Step 2: retrain on one of the failed tasks
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131
  ○ TSK000004_score 0.580299973487854
  ○ TSK000005_score 0.29809999465942383
  ○ TSK000006_score 0.3885999917984009
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766
  ○ TSK000009_score 0.5809999704360962
  ○ TSK000010_score 0.5033000111579895

Slide 33

Slide 33 text

Scenario
● Step 2: retrain on one of the failed tasks (TSK000003)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131
    ■ traingen_TSK000003_TRD000003.wo
    ■ train_TSK000003_ARC000001_TRD000003_MDL000002.wo
      ● MDL000002.h5 is created

Slide 34

Slide 34 text

Scenario
● Step 3: evaluate the failed tasks with the new model >> 60% or higher (6/10)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353

Slide 35

Slide 35 text

Scenario
● Step 4: retrain on one of the failed tasks (TSK000004)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874
    ■ traingen_TSK000004_TRD000004.wo
    ■ train_TSK000004_ARC000001_TRD000004_MDL000003.wo
      ● MDL000003.h5 is created

Slide 36

Slide 36 text

Scenario
● Step 5: evaluate the failed tasks with the new model >> 60% or higher (7/10)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333 0.1986999958753585
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924 0.394899994134902
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353

Slide 37

Slide 37 text

Scenario
● Step 6: retrain on one of the failed tasks (TSK000005)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433
    ■ traingen_TSK000005_TRD000005.wo
    ■ train_TSK000005_ARC000001_TRD000005_MDL000004.wo
      ● MDL000004.h5 is created

Slide 38

Slide 38 text

Scenario
● Step 7: evaluate the failed tasks with the new model >> 60% or higher (9/10)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433 0.9332000017166138
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333 0.1986999958753585 0.6723999977111816
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924 0.394899994134902 0.09929999709129333
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353

Slide 39

Slide 39 text

Scenario
● Step 8: retrain on one of the failed tasks (TSK000008)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433 0.9332000017166138
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333 0.1986999958753585 0.6723999977111816
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924 0.394899994134902 0.09929999709129333
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353
    ■ traingen_TSK000008_TRD000008.wo
    ■ train_TSK000008_ARC000001_TRD000008_MDL000005.wo
      ● MDL000005.h5 is created

Slide 40

Slide 40 text

Scenario
● Step 9: evaluate the failed tasks with the new model >> 60% or higher (10/10)
  ○ TSK000001_score 0.9731000065803528
  ○ TSK000002_score 0.7840999960899353
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433 0.9332000017166138
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333 0.1986999958753585 0.6723999977111816
  ○ TSK000007_score 0.6876000165939331
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924 0.394899994134902 0.09929999709129333 0.916700005531311
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353
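
The nine steps amount to a loop: evaluate, pick a failing task, train a model on it, then re-evaluate only the tasks that still fail. The sketch below does no training; it replays the scores recorded on these slides (rounded to four decimals) to show how the pass count grows. Choosing the lowest-numbered failing task as the next retraining target matches the choices in the deck but is an assumption about the policy, and all names here are hypothetical.

```python
THRESHOLD = 0.6

# RECORDED[model][task]: accuracy of MDL00000<model> on TSK0000<task>,
# rounded from the slides. Later models were only run on tasks still failing.
RECORDED = {
    1: {1: 0.9731, 2: 0.7841, 3: 0.2980, 4: 0.5803, 5: 0.2981,
        6: 0.3886, 7: 0.6876, 8: 0.5837, 9: 0.5810, 10: 0.5033},
    2: {3: 0.9094, 4: 0.4695, 5: 0.5546, 6: 0.4889, 8: 0.2641, 9: 0.6718, 10: 0.7216},
    3: {4: 0.9517, 5: 0.4552, 6: 0.1987, 8: 0.3949},
    4: {5: 0.9332, 6: 0.6724, 8: 0.0993},
    5: {8: 0.9167},
}


def replay(recorded, threshold=THRESHOLD):
    """Return (pass count after each model, next retraining target per round)."""
    best = {}  # best score seen so far for each task
    passing_counts, retrain_targets = [], []
    for model in sorted(recorded):
        for task, score in recorded[model].items():
            best[task] = max(best.get(task, 0.0), score)
        failing = sorted(t for t, s in best.items() if s < threshold)
        passing_counts.append(len(best) - len(failing))
        if failing:
            # Assumed policy: retrain on the lowest-numbered failing task.
            retrain_targets.append(failing[0])
    return passing_counts, retrain_targets


counts, targets = replay(RECORDED)
print("passing after each model:", counts)
print("tasks chosen for retraining:", targets)
```

A model trained for one task often lifts other tasks with a similar distribution over the threshold, which is why five models are enough for ten tasks here.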

Slide 41

Slide 41 text

Scenario
● Summary
  ○ Test sets generated: 10 (100%)
  ○ Training sets generated: 5 (50%)
  ○ Training runs: 5 (50%)
  ○ Models: 5 (50%)
  ○ Architectures: 1 (10%)

Slide 42

Slide 42 text

Second scenario: a new architecture!
● ARC000001
  ○ model = models.Sequential()
  ○ model.add(layers.Dense(64, input_dim=28*28, activation='relu'))
  ○ model.add(layers.Dense(32, activation='relu'))
  ○ model.add(layers.Dense(10, activation='softmax'))
● ARC000002
  ○ model = models.Sequential()
  ○ model.add(layers.Conv2D(32, (5, 5), padding='valid', input_shape=(28, 28, 1), activation='relu'))
  ○ model.add(layers.MaxPooling2D(pool_size=(2, 2)))
  ○ model.add(layers.Conv2D(64, (3, 3), activation="relu"))
  ○ model.add(layers.MaxPooling2D(pool_size=(2, 2)))
  ○ model.add(layers.Flatten())
  ○ model.add(layers.Dropout(0.5))
  ○ model.add(layers.Dense(10, activation="softmax"))
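
To compare the two architectures without running Keras, the trainable parameter counts can be worked out by hand. The sketch below assumes stride-1 convolutions with 'valid' padding and non-overlapping 2x2 pooling, which are the Keras defaults for the layers as written; the helper names are hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 2D convolution layer."""
    return kh * kw * c_in * c_out + c_out


def conv_out(size: int, kernel: int) -> int:
    """Output side length for 'valid' padding and stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int) -> int:
    """Output side length for non-overlapping max pooling."""
    return size // pool


def arc1_params() -> int:
    # 784 -> 64 -> 32 -> 10 fully connected
    return dense_params(784, 64) + dense_params(64, 32) + dense_params(32, 10)


def arc2_params() -> int:
    total = conv2d_params(5, 5, 1, 32)         # 28x28x1 -> 24x24x32
    side = pool_out(conv_out(28, 5), 2)        # -> 12x12x32
    total += conv2d_params(3, 3, 32, 64)       # -> 10x10x64
    side = pool_out(conv_out(side, 3), 2)      # -> 5x5x64
    total += dense_params(side * side * 64, 10)  # Flatten gives 1600 features
    return total


print("ARC000001 parameters:", arc1_params())
print("ARC000002 parameters:", arc2_params())
```

Note that ARC000002 expects input shaped (28, 28, 1), so its train, test, and predict scripts cannot reuse the flattening to 784 features used for ARC000001.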

Slide 44

Slide 44 text

Second scenario: a new architecture!
● train_TSK000001_ARC000001_TRD000001_MDL000001.wo
● train_TSK000003_ARC000001_TRD000003_MDL000002.wo
● train_TSK000004_ARC000001_TRD000004_MDL000003.wo
● train_TSK000005_ARC000001_TRD000005_MDL000004.wo
● train_TSK000008_ARC000001_TRD000008_MDL000005.wo
● train_TSK000001_ARC000002_TRD000001_MDL000006.wo

Slide 45

Slide 45 text

Second scenario: a new architecture!
● Step 9: evaluate the failed tasks with the new model >> 60% or higher (10/10)
  ○ TSK000001_score 0.9731000065803528 0.9897000193595886
  ○ TSK000002_score 0.7840999960899353 0.795799970626831
  ○ TSK000003_score 0.2980000078678131 0.9093999862670898 0.31189998984336853
  ○ TSK000004_score 0.580299973487854 0.46950000524520874 0.95169997215271 0.5945000052452087
  ○ TSK000005_score 0.29809999465942383 0.5546000003814697 0.4551999866962433 0.9332000017166138 0.31310001015663147
  ○ TSK000006_score 0.3885999917984009 0.48890000581741333 0.1986999958753585 0.6723999977111816 0.40799999237060547
  ○ TSK000007_score 0.6876000165939331 0.7035999894142151
  ○ TSK000008_score 0.5837000012397766 0.26409998536109924 0.394899994134902 0.09929999709129333 0.916700005531311 0.5935999751091003
  ○ TSK000009_score 0.5809999704360962 0.6718000173568726 0.5943999886512756
  ○ TSK000010_score 0.5033000111579895 0.7215999960899353 0.5135999917984009

Slide 46

Slide 46 text

Keras Tuner

Slide 47

Slide 47 text

Keras Tuner

Slide 48

Slide 48 text

AutoKeras

Slide 49

Slide 49 text

AutoKeras

Slide 50

Slide 50 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1

Slide 51

Slide 51 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3

Slide 52

Slide 52 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2

Slide 53

Slide 53 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 predict 4 out 4 predict 5 out 5 predict 6 out 6

Slide 54

Slide 54 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 arch 2 model 3 score 3

Slide 55

Slide 55 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 arch 2 model 3 task 2 test 2 score 3

Slide 56

Slide 56 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 arch 2 model 3 task 2 test 2 score 3 task 3 test 3 score 4

Slide 57

Slide 57 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 arch 2 model 3 task 2 test 2 score 3 task 3 train 3 model 4 test 3 score 4

Slide 58

Slide 58 text

task 1 arch 1 train 1 model 1 test 1 score 1 predict 1 out 1 predict 2 out 2 predict 3 out 3 train 2 model 2 score 2 arch 2 model 3 task 2 test 2 score 3 task 3 arch 3 train 3 model 4 test 3 score 4

Slide 59

Slide 59 text

Handle the various scenarios in the workflow
● traingen: worker, wo
● testgen: worker, wo
● predictgen: worker, wo
● train: worker, wo
● test: worker, wo
● predict: worker, wo
● train workflow, test workflow, predict workflow

Slide 60

Slide 60 text

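Summary
● When inference data arrives
● When the test set is updated
● When the training set is updated
● When a new architecture is added
>>
● Just add a work order!
● Define the work order management policy as a workflow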
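
One way to picture "just add a work order" is a table from event to the work order types a workflow would enqueue. The event names and the exact chains below are assumptions chosen to match the scenarios in this deck, not a policy the slides define.

```python
# Event -> work order types to enqueue, in order (assumed policy).
WORKFLOW_POLICY = {
    "inference_data": ("predictgen", "predict"),
    "test_set_update": ("testgen", "test"),
    "train_set_update": ("traingen", "train", "test"),
    "new_architecture": ("train", "test"),
}


def workorders_for(event: str) -> tuple[str, ...]:
    """Return the work order types a workflow would add for an event."""
    try:
        return WORKFLOW_POLICY[event]
    except KeyError:
        raise ValueError(f"no workflow policy for event: {event!r}") from None
```

Changing how the system reacts to an event then means editing this table, while the workers themselves stay untouched.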