Slide 1

Machine Learning and Vision.framework
Shota Nakagami / @shtnkgm
2017/8/29

Slide 2

What this talk covers
— The basics of Vision.framework
— An overview of machine learning
— A sample app that uses Vision to classify camera images

Slide 3

What is Vision.framework?
— A framework introduced in iOS 11 that provides image recognition APIs
— An abstraction layer over Core ML, the machine learning framework also introduced in iOS 11

Slide 4

The machine learning stack

Slide 5

What is a neural network?
— One family of machine learning techniques
— A mathematical model of the neural circuitry of the human brain
— Abbreviated NN; variants include DNN (Deep Neural Network), RNN (Recurrent Neural Network), and CNN (Convolutional Neural Network)

Slide 6

What Vision can recognize

Slide 7

What Vision can recognize (1)
— Face Detection and Recognition
— Barcode Detection
— Image Alignment Analysis
— Text Detection
— Horizon Detection
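For a taste of these built-in requests, here is a minimal sketch of face rectangle detection on a CGImage. No machine learning model needs to be supplied; the function name detectFaces is just for illustration.

import Vision

func detectFaces(in image: CGImage) {
    // Built-in request: no .mlmodel required
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1), origin at the bottom-left
            print("Face at \(face.boundingBox), confidence \(face.confidence)")
        }
    }
    try? VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
}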

Slide 8

What Vision can recognize (2): features that require supplying a machine learning model
— Object Detection and Tracking
— Machine Learning Image Analysis
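Object tracking, for example, takes an observation produced by an earlier detection request and follows it across frames. A small sketch under that assumption, where pixelBuffer is the next camera frame:

import Vision

let sequenceHandler = VNSequenceRequestHandler()

func track(_ observation: VNDetectedObjectObservation, in pixelBuffer: CVPixelBuffer) {
    let request = VNTrackObjectRequest(detectedObjectObservation: observation) { request, _ in
        guard let updated = request.results?.first as? VNDetectedObjectObservation else { return }
        print("Object moved to \(updated.boundingBox)")
    }
    // A sequence handler keeps state between frames, unlike VNImageRequestHandler
    try? sequenceHandler.perform([request], on: pixelBuffer)
}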

Slide 9

Building a sample app that classifies camera images

Slide 10

Sample app overview
— Uses Vision's Machine Learning Image Analysis feature
— Classifies what the camera sees and outputs the name of the object

Slide 11

The flow of image recognition with machine learning
1. Collect image data for training (gather the teaching material)
2. Build a model from the training data with a machine learning algorithm
   (A model is the logic that produces answers. Classification: is this image a dog or a cat? Regression: numeric prediction, e.g. tomorrow's stock price)
3. Use the trained model to classify unseen images (inference)

Slide 12

Skipping model creation here
— Collecting and cleaning training data is fairly hard work
— It takes a reasonably powerful machine and a lot of compute time
— It requires machine learning expertise

Slide 13

Preparing a model
For simplicity, we use a pre-trained model.
Apple distributes them on its site (.mlmodel format):
https://developer.apple.com/machine-learning/

Slide 14

Available models
Models differ in the kinds of images they are good at and in file size (5 MB to 553.5 MB).
— MobileNets
— SqueezeNet
— Places205-GoogLeNet
— ResNet50
— Inception v3
— VGG16

Slide 15

This talk uses ResNet50
— Classifies images into 1,000 categories such as trees, animals, food, vehicles, and people
— 102.6 MB in size
— MIT license

Slide 16

Adding the model to the project

Slide 17

Drag and drop it into Xcode

Slide 18

A model class is generated automatically
Xcode automatically creates a model class named <model name>.swift.
Example: Resnet50.swift (excerpt shown on the slide)
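The slide's excerpt is an image, so as a rough guide, here is a hedged sketch of calling the generated class directly (without Vision). It assumes the Resnet50 model from Apple's download page: prediction(image:), classLabel, and classLabelProbs reflect that model's declared inputs and outputs, and the exact generated code varies by Xcode version.

// pixelBuffer: a 224x224 CVPixelBuffer matching the model's input format
if let output = try? Resnet50().prediction(image: pixelBuffer) {
    print(output.classLabel)       // best label, e.g. "tabby, tabby cat"
    print(output.classLabelProbs)  // [String: Double] with a probability per label
}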

Slide 19

Capturing camera frames

Slide 20

private func startCapture() {
    let captureSession = AVCaptureSession()
    captureSession.sessionPreset = AVCaptureSessionPresetPhoto

    // Configure the input
    let captureDevice = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeVideo)
    guard let input = try? AVCaptureDeviceInput(device: captureDevice) else { return }
    guard captureSession.canAddInput(input) else { return }
    captureSession.addInput(input)

    // Configure the output
    let output = AVCaptureVideoDataOutput()
    output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "VideoQueue"))
    guard captureSession.canAddOutput(output) else { return }
    captureSession.addOutput(output)

    // Configure the preview layer
    guard let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession) else { return }
    previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill
    previewLayer.frame = view.bounds
    view.layer.insertSublayer(previewLayer, at: 0)

    // Start capturing
    captureSession.startRunning()
}
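One detail the slide omits: since iOS 10 the app must declare NSCameraUsageDescription in Info.plist, and it is worth confirming camera authorization before starting the session. A small sketch using the same Swift 3 era API as above:

// Ask for camera permission, then start capturing on success
AVCaptureDevice.requestAccess(forMediaType: AVMediaTypeVideo) { granted in
    guard granted else { return }
    DispatchQueue.main.async {
        self.startCapture()
    }
}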

Slide 21

The delegate method called for each captured frame

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        // Convert the CMSampleBuffer into a CVPixelBuffer
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // The Vision.framework work (the image recognition part) goes here
    }
}

Slide 22

The image recognition code

Slide 23

The main Vision classes used
— VNCoreMLModel
— VNCoreMLRequest
— VNImageRequestHandler
— VNObservation

Slide 24

VNCoreMLModel
— A container class that lets Vision work with a Core ML model

Slide 25

VNCoreMLRequest
— A class for requesting image recognition from Core ML
— The kind of result depends on the model's output format:
— image → class (a classification result)
— image → feature values
— image → image

Slide 26

VNImageRequestHandler
— A class that runs one or more recognition requests (VNCoreMLRequest) against a single image
— The input image format is chosen at initialization:
— CVPixelBuffer
— CIImage
— CGImage
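The three formats map to three initializers; a quick sketch, where pixelBuffer, ciImage, cgImage, and request are assumed to already exist:

// One handler per image; pick the initializer matching your image type
let fromPixelBuffer = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
let fromCIImage = VNImageRequestHandler(ciImage: ciImage, options: [:])
let fromCGImage = VNImageRequestHandler(cgImage: cgImage, options: [:])
try? fromPixelBuffer.perform([request])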

Slide 27

VNObservation
— The abstract class for image recognition results
— One of its subclasses is returned as the result
— Has a confidence property expressing how certain the recognition is (VNConfidence is a type alias for Float)

Slide 28

VNObservation subclasses
— VNClassificationObservation: carries the class name in its identifier property
— VNCoreMLFeatureValueObservation: carries feature data in its featureValue property
— VNPixelBufferObservation: carries image data in its pixelBuffer property
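Since the concrete subclass depends on the model, one option is to branch on the type. A minimal sketch over a request's results:

// request.results is [Any]? in iOS 11, so cast each element
for case let observation as VNObservation in request.results ?? [] {
    switch observation {
    case let classification as VNClassificationObservation:
        print("\(classification.identifier): \(classification.confidence)")
    case let feature as VNCoreMLFeatureValueObservation:
        print(feature.featureValue)
    case let pixels as VNPixelBufferObservation:
        print(pixels.pixelBuffer)
    default:
        break
    }
}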

Slide 29

To sum up:
— VNCoreMLModel (the embedded model)
— VNCoreMLRequest (the image recognition request)
— VNImageRequestHandler (executes the request)
— VNObservation (the recognition result)

Slide 30

The concrete implementation

Slide 31

Initializing the model class

// Initialize the Core ML model class for use with Vision
guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }

Slide 32

Creating the image recognition request

// Create the recognition request (arguments: the model and a completion handler)
let request = VNCoreMLRequest(model: model) { [weak self] (request: VNRequest, error: Error?) in
    guard let results = request.results as? [VNClassificationObservation] else { return }

    // Show the top three results with their confidence.
    // identifier can hold several comma-separated names, so take only the first one.
    let displayText = results.prefix(3)
        .flatMap { "\(Int($0.confidence * 100))% \($0.identifier.components(separatedBy: ", ")[0])" }
        .joined(separator: "\n")

    DispatchQueue.main.async {
        self?.textView.text = displayText
    }
}

Slide 33

Executing the image recognition request

// Run the recognition request against the CVPixelBuffer
try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
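Note that perform(_:) runs synchronously and can throw. In this app the capture delegate already runs on a background queue, but if you call it elsewhere (say, on a still image) you may want explicit error handling off the main thread; a hedged sketch:

DispatchQueue.global(qos: .userInitiated).async {
    do {
        try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}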

Slide 34

The finished recognition code

guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }

let request = VNCoreMLRequest(model: model) { [weak self] (request: VNRequest, error: Error?) in
    guard let results = request.results as? [VNClassificationObservation] else { return }

    let displayText = results.prefix(3)
        .flatMap { "\(Int($0.confidence * 100))% \($0.identifier.components(separatedBy: ", ")[0])" }
        .joined(separator: "\n")

    DispatchQueue.main.async {
        self?.textView.text = displayText
    }
}

try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
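One caveat: as written, the model and request are recreated on every camera frame. A sketch of hoisting them into a property so the per-frame work is only running the handler (the property and function names are illustrative):

// Built once, lazily on first use
lazy var classificationRequest: VNCoreMLRequest? = {
    guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return nil }
    return VNCoreMLRequest(model: model) { [weak self] request, _ in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        // ...format and display the results as above...
    }
}()

// Called once per frame from the capture delegate
func classify(pixelBuffer: CVPixelBuffer) {
    guard let request = classificationRequest else { return }
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}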

Slide 35

Demo video

Slide 36

No content

Slide 37

What's "tabby"?

Slide 38

tabby = a tabby cat!
A tabby is a cat with tiger-like stripes. Besides plain stripes, tabby coats are quite varied: broken stripes, spotted patterns, swirled blotches, finely broken striping, and so on. (Quoted from Wikipedia)

Slide 39

Summary

Slide 40

— With a pre-trained model, the implementation itself is easy!
— It even tells you what kind of cat it is
— Being able to build your own models would broaden the possibilities even further

Slide 41

What I originally wanted to build
— An automatic hashtag-suggestion app for Instagram
— But the caption API for submitting hashtags had already been discontinued ＼(^o^)／

Slide 42

Sample code
The sample code shown in this talk is available here:
https://github.com/shtnkgm/VisionFrameworkSample
(Note: mind the NDA when publishing screenshots)

Slide 43

The end

Slide 44

References (1)
— Build more intelligent apps with machine learning. / Apple
— Vision / Apple Developer Documentation
— [WWDC2017] Trying out text detection with Vision.framework [iOS 11]
— Building a Momoiro Clover Z face recognition app with Keras + iOS 11 Core ML + Vision Framework
— [Core ML] Creating an .mlmodel file / Fenrir

Slide 45

References (2)
— [iOS 11] Trying image classification with Core ML (without Vision.framework) #WWDC2017
— Place recognition with Places205-GoogLeNet / fabo.io
— I spoke about "iOS and Deep Learning" at the iOSDC reject conference
— [iOS 10][Neural networks] Understanding BNNS, added to Accelerate, through OSS: the XOR edition