
OCR Survey by VIVEN Inc

VIVEN, Inc.
January 21, 2023


▼ Reference links
・Service | 株式会社 微分 (VIVEN, Inc.)
 https://www.viven.co.jp/ja/service/

・Mission | 株式会社 微分 (VIVEN, Inc.)
 https://www.viven.co.jp/ja/company/

If you would like to hear more, please contact us by email or via the social media accounts below.

Email, Twitter, Facebook, Instagram, LinkedIn

[Company website]
https://www.viven.inc


Transcript

  1. OCR Survey
    Tapas Dutta, Deep Learning Engineer

  2. TextScanner: Reading Characters in Order for Robust Scene Text Recognition

    ■ Summary
    Current OCR technologies miss genuine characters or produce spurious new ones. This work generates pixel-wise maps for character class, position, and order in parallel, with an RNN for context modelling.

    ■ Related Works
    Cheng (2017) used character class and localization labels to adjust the attention positions. Bai (2018) used a novel loss function to improve the attention decoder. Lyu (2018) and Liao (2019) used segmentation for OCR, which is not effective for languages with closely spaced characters.

    ■ Proposed Methodology
    A CNN architecture is used for feature extraction. The extracted features are fed to class and geometry branches. The class branch consists of two stacked convolutions followed by soft normalization; its output, the character segmentation maps, has dimension h*w*c (c = number of characters + background). The localization map is produced by applying a sigmoid activation to the input features. For the order segmentation map, a small U-Net architecture is used, with GRU layers in the middle; after upsampling, two convolution layers generate feature maps of size h*w*N (N is the sequence length). The order maps, in which the k-th character is indicated by the k-th feature map, are generated by multiplying the order segmentation and character localization maps. The classification scores are obtained by multiplying the character segmentation maps with the order maps.

    ■ Result
    Several datasets were used to validate the effectiveness of the model: IIIT (50, 1K, 0 lexicons), SVT (50, 0 lexicons), IC13, IC15, SVTP, CT, achieving 99.8, 99.5, 95.7, 99.4, 92.7, 94.9, 83.5, 84.8, and 91.6% accuracy respectively.

    ■ Next must-read paper
    "TextScanner: Reading characters in order for robust scene text recognition"
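
    As a rough illustration of the map arithmetic above (a minimal NumPy sketch; all array names are hypothetical, and shapes follow the h*w*c / h*w*N convention in the text):

    import numpy as np

    h, w, c, n = 32, 128, 37, 25         # feature size, classes (36 characters + background), max sequence length
    char_seg  = np.random.rand(h, w, c)  # character segmentation maps (class branch)
    loc_map   = np.random.rand(h, w)     # character localization map (sigmoid output)
    order_seg = np.random.rand(h, w, n)  # order segmentation maps (U-Net branch)

    # Order maps: the k-th map highlights where the k-th character sits.
    order_maps = order_seg * loc_map[..., None]               # (h, w, n)

    # Classification score for the k-th character: a spatially weighted sum
    # of the character segmentation maps, weighted by the k-th order map.
    scores = np.einsum('hwn,hwc->nc', order_maps, char_seg)   # (n, c)
    pred = scores.argmax(axis=1)                              # per-position character class
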
  3. Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory

    ■ Summary
    OCR for Persian needs to address the particularities of the language, such as right-to-left text and the interpretation of semicolons, dots, obliques, etc. Increasing the number of layers, the number of filters, or the kernel size for the LSTM did not improve results. BiLSTM improved results compared to LSTM. Increasing the dimension of the extracted vector improved results for both BiLSTM and LSTM, and increasing the number of BiLSTM layers improved performance.

    ■ Related Works
    Khosravi (2006) achieved 99.02% and 98.8% accuracy using an improved gradient and a gradient histogram, respectively, on the HODA dataset. Alizadehashraf (2017) achieved 97% accuracy using a small CNN architecture. Bonyani (2021) compared the performance of standard CNN architectures (DenseNet, ResNet, VGG) for recognizing Persian text. LeCun (1998) used LeNet, whose weights were optimized using firefly, ant colony, chimp, and particle swarm optimization techniques, with chimp optimization performing best. Smith (2007) used a CNN followed by an LSTM, achieving an accuracy of 93%.

    ■ Proposed Methodology
    The proposed algorithm consists of three modules for segmentation, feature extraction, and recognition. For segmentation, the image heights are normalized and a sliding-window algorithm is used. For feature extraction, a small CNN architecture of one convolution and one max-pooling layer is used. The recognition module uses four BiLSTM layers, the first two with tanh activation and the other two with sigmoid activation, trained with the connectionist temporal classification (CTC) loss.

    ■ Result
    Twenty pages of English and Persian text from different, randomly chosen books were typed in MS Office, and the images were height-normalized. The evaluation counts are AllWrds (number of words in the text), InsWrds (wrongly inserted words), DelWrds (wrongly deleted words), and SubWrds (wrongly substituted words):

    Correctness = 100 * (AllWrds - (DelWrds + SubWrds)) / AllWrds
    Accuracy = 100 * (AllWrds - (InsWrds + DelWrds + SubWrds)) / AllWrds

    The Tesseract model achieved 91.73% on Persian-only text and 73.56% accuracy when trained on the dataset and tested on a set containing both Persian and English text, compared to Bina's 96% on both test sets. Tesseract and Bina achieved 71% and 91% correctness on the test sets containing Persian and English text.

    ■ Next must-read paper
    Smith, R. "An Overview of the Tesseract OCR Engine". In Proceedings of the Ninth International Conference on Document Analysis and Recognition.
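
    As a quick illustration of the two word-level metrics above (a minimal sketch; the function and variable names are my own):

    def correctness(all_wrds: int, del_wrds: int, sub_wrds: int) -> float:
        # Correctness ignores wrongly inserted words.
        return 100 * (all_wrds - (del_wrds + sub_wrds)) / all_wrds

    def accuracy(all_wrds: int, ins_wrds: int, del_wrds: int, sub_wrds: int) -> float:
        # Accuracy also penalizes wrongly inserted words.
        return 100 * (all_wrds - (ins_wrds + del_wrds + sub_wrds)) / all_wrds

    print(correctness(1000, 20, 30))   # 95.0
    print(accuracy(1000, 10, 20, 30))  # 94.0
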
  4. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

    ■ Summary
    Existing OCR algorithms require a CNN architecture for feature extraction from the image, sequence modelling layers for text generation, and a language model to improve performance. This work includes all of these steps in a single end-to-end trainable model. Extensive experiments on different combinations of encoders and decoders validate the superiority of using BEiT as the encoder and RoBERTa as the decoder, and ablation studies validate the effectiveness of the different strategies used.

    ■ Related Works
    Diaz (2021) and Vaswani (2017) incorporated transformers into CNN architectures, observing significant performance improvements. Bao (2021) used self-supervised image-pretrained transformers to replace CNN architectures.

    ■ Proposed Methodology
    The image is divided into P*P patches, which are flattened, and a linear layer changes the dimension to a predefined number. The encoder is an image transformer, with DeiT (Touvron 2021) and ViT (Dosovitskiy 2021) used for encoder initialization. The "[CLS]" token represents the entire image. A text transformer is used as the decoder; it is initialized with a RoBERTa model, so the output is wordpieces instead of characters. The model is first trained on hundreds of millions of synthetic printed text-line images; these weights are used to initialize a second-stage pretraining on task-specific synthetic and real-world datasets. Augmentations such as Gaussian blur, image erosion, rotation, image dilation, downscaling, and underlining are used to help the model's generalization.

    ■ Result
    The SROIE dataset is used to evaluate the model's precision, recall, and F1-score, where the model achieved 95.76, 95.91, and 95.84 respectively.

    ■ Next must-read paper
    "Scene Text Recognition with Permuted Autoregressive Sequence Models"
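
    A minimal PyTorch sketch of the patch-embedding step described above (the patch size, image size, and embedding dimension are placeholder values, not TrOCR's exact configuration):

    import torch
    import torch.nn as nn

    P, D = 16, 768                          # patch size and embedding dimension
    img = torch.randn(1, 3, 384, 384)       # one resized RGB text-line image

    # Split into P x P patches, flatten each patch, project to D dimensions.
    patches = img.unfold(2, P, P).unfold(3, P, P)                  # (1, 3, 24, 24, P, P)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * P * P)
    tokens = nn.Linear(3 * P * P, D)(patches)                      # (1, 576, D)
    # `tokens`, plus a "[CLS]" token and positional embeddings, feed the encoder.
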
  5. An End-to-End Khmer Optical Character Recognition using Sequence-to-Sequence with Attention

    ■ Summary
    The work proposes an encoder-decoder architecture with GRUs and an attention mechanism, evaluated on Khmer text in different fonts. Tesseract produced characters like @, #, and / that are not present in the texts. The proposed model outputs repeated characters before reaching the EOS character, a side effect of the encoder-decoder architecture. Khmer characters have similar structures, which causes errors in both models.

    ■ Related Works
    Ilya (2014) employed an RNN-based encoder-decoder for English-to-French machine translation. Dzmitry (2014) modified the previous work to include an attention mechanism and improved its performance. Devendra (2015) employed an LSTM-based encoder-decoder for English text recognition. Farisa (2021) trained standard CNN architectures (ResNet, DenseNet) with LSTM and GRU layers end-to-end using the connectionist temporal classification (CTC) loss.

    ■ Proposed Methodology
    The encoder consists of a convolution, batch normalization, and ReLU activation followed by three residual modules. There are two types of residual blocks (pictured on the slide): the first module contains two copies of res block 0, while the second and third contain res block 0 followed by res block 1. This is followed by 2D average-pooling and 2D dropout layers. An intermediate output of shape h*w*c is reshaped to w*(h*c), denoted O_reshaped, and processed as H, h = EncoderGRU(O_reshaped). Here h is passed through dropout, a linear layer, and tanh and used as the decoder's initial hidden state, while H is the input to the decoder. The context vector is a weighted average of the encoder hidden states; the weights are a softmax over scores computed from the encoder's hidden states and the decoder's previous hidden state:

    e_ij = v^T tanh(W s_{i-1} + U h_j)

    The one-hot encoded previous decoder output and the attention context vector are concatenated and passed through a GRU layer, together with the decoder's previous hidden state, to compute the current hidden state. The current hidden state, context vector, and one-hot encoded previous output are concatenated and passed through a linear layer for next-character prediction.

    ■ Result
    Text2image is used to generate 92,213 texts in different Khmer fonts and sizes; the test set contains 3,000 images. The proposed model achieved a character error rate (the ratio of unrecognized to total characters) of 1%, compared to Tesseract's 3% on this dataset:

    CER = (S + I + D) / N
    WER = (count of incorrect samples) / (count of samples)

    ■ Next must-read paper
    "Khmer OCR fine tune engine for Unicode and legacy fonts using Tesseract 4.0 with Deep Neural Network"
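
    A small sketch of the CER formula above, counting substitutions, insertions, and deletions with a standard edit-distance DP (my own helper, not code from the paper):

    def cer(ref: str, hyp: str) -> float:
        # Levenshtein distance yields S + I + D in a single number.
        m, n = len(ref), len(hyp)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n] / max(1, m)                     # (S + I + D) / N

    print(cer("ABCDE", "ABXDE"))  # 0.2
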
  6. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

    ■ Summary
    Texts in the real world are often curved or stylized, so ASTER is equipped with a rectification module that rectifies the input image (using a thin-plate spline) and a recognition module that uses a sequence-to-sequence model with attention to predict characters from the rectified image.

    ■ Related Works
    Wang (2012) used two separate CNN modules to localize and recognize texts. Jaderberg (2014) used one CNN for both localization and recognition. Su and Lu (2014, 2017) used RNNs for sequence prediction. He (2016) and Shi (2017) used a combination of CNN and RNN for text prediction. Wang (2017) employed a gated recurrent CNN for text recognition. Yang (2017) employed a character detection model optimized with an alignment loss for character localization. End-to-end text recognition models (Jaderberg 2016, Weinman 2014) use text proposals followed by a word recognizer. Busta (2017) combined an FCN detector with connectionist temporal classification (CTC) for recognition.

    ■ Proposed Methodology
    The algorithm contains two parts, for rectification and recognition. Rectification is done with a Thin-Plate Spline (TPS) transformation and contains three parts: a localization network, a grid generator, and a sampler. The localization network predicts K control-point coordinates from the original image using a CNN with an FC layer. Given a pixel location on the rectified image, the grid generator computes the corresponding pixel location on the original image I via the transformation

    T = [C' 0^(2x3)] ΔC^(-1),  where ΔC = [ 1_(Kx1)  C^T  ℂ ; 0  0  1_(1xK) ; 0  0  C ]

    Here ℂ is a square matrix such that ℂ_ij = F(||C_i - C_j||) and F(r) = r^2 log(r). Differentiable image sampling is used to clip pixel coordinates so they stay within the image and to interpolate neighbouring pixels in a differentiable manner. The recognition module consists of an encoder-decoder architecture with a ConvNet as encoder and a BLSTM with attention as decoder. The attention weights are calculated from the encoder outputs and the previous hidden state as

    e_ij = w^T tanh(W s_{i-1} + V h_j + b)

    After a softmax, the glimpse vector g is the weighted sum of the encoder outputs using the attention weights. g is then concatenated with the one-hot encoded previous output and passed through a recurrent unit; that output is fed to a linear layer with softmax to predict the current character.

    ■ Result
    Multiple datasets are used to evaluate the model's performance: IIIT5K (0), SVT (0), IC03 (0), IC13, SVTP, CUTE, obtaining 93.4, 93.6, 94.5, 91.8, 76.1, 78.5, and 79.5 respectively.

    ■ Next must-read paper
    "Focusing Attention: Towards Accurate Text Recognition in Natural Images"
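
    A rough NumPy sketch of the TPS parameter computation, following the block structure as reconstructed above (the exact block ordering in the paper may differ; function and variable names are my own):

    import numpy as np

    def tps_params(C, C_prime, eps=1e-9):
        """C, C_prime: (K, 2) control points on the input / rectified image."""
        K = C.shape[0]
        # Radial basis: C_ij = F(||C_i - C_j||) with F(r) = r^2 * log(r).
        r = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
        rbf = np.where(r > 0, r**2 * np.log(r + eps), 0.0)
        # Delta_C, (K+3) x (K+3): row blocks [1 | C | rbf], [0 | 0 | 1^T], [0 | 0 | C^T].
        delta = np.zeros((K + 3, K + 3))
        delta[:K, 0] = 1.0
        delta[:K, 1:3] = C
        delta[:K, 3:] = rbf
        delta[K, 3:] = 1.0
        delta[K + 1:, 3:] = C.T
        # T = [C' 0^(2x3)] Delta_C^(-1): the TPS coefficients used by the grid generator.
        rhs = np.concatenate([C_prime.T, np.zeros((2, 3))], axis=1)  # (2, K+3)
        return rhs @ np.linalg.inv(delta)                            # (2, K+3)
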
  7. Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

    ■ Summary
    Integrates the connectionist temporal classification (CTC) loss with the focal loss to help the model on unbalanced language datasets. Empirically, for both synthetic and real images, performance can be improved using alpha = 0.25 and gamma = 0.5.

    ■ Related Works
    A. Graves (2008) was the first to combine the CTC loss with an RNN for text recognition. A. Ul-Hasan (2013) used a BiLSTM with the CTC loss for Urdu text recognition. M. Busta (2017) combined recognition and detection in an end-to-end model. J. Ba (2014) used reinforcement learning to concentrate on the part of the image useful for prediction. C.-Y. Lee (2016) used an RNN with attention for optical character recognition. M. Jaderberg used a spatial transformer for spatial manipulation of data within the module, combined with the focal loss, for text recognition.

    ■ Proposed Methodology
    With ResNet as the backbone, the algorithm extracts feature maps from the last convolution layer, which are cut into multiple slices, each containing information about a small area of the image. This is followed by a BiLSTM and a fully connected layer with softmax for the final output:

    p(π | x) = Π_{t=1..T} y^t_{π_t}

    Here y^t_{π_t} is the probability of observing element π_t (from the set of all possible characters plus the blank character) at slice t, for T total slices. The CTC loss is then based on

    p(l | y) = Σ_{π : B(π) = l} p(π | y)

    i.e., the sum of the probabilities of all paths that collapse to the label l. For hyperparameters alpha and gamma, the focal loss is

    FL(p_t) = -α_t (1 - p_t)^γ log(p_t)

    Here alpha is used to overcome data imbalance and gamma helps the model focus more on hard samples. The CTC loss can thus be modified as

    FL_CTC(l | y) = -α (1 - p(l | y))^γ log p(l | y)

    so the model focuses more on hard samples.

    ■ Result
    A synthetic dataset was generated from MNIST by concatenating five images, drawn from two character groups '0-9, a-h' and 'i-z': one version with 1M and 100K images (10:1 imbalance), another with 1M and 10K (100:1 imbalance), each with a 10K test set; plus a Chinese OCR dataset of 3.6M training and 5K test images. The highest accuracies obtained were 62.8%, 72.4%, and 76.4% respectively.
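
    A small PyTorch sketch of the focal-CTC combination described above (my own wiring around the standard `nn.CTCLoss`; alpha and gamma follow the values in the summary):

    import torch
    import torch.nn as nn

    ctc = nn.CTCLoss(blank=0, reduction='none', zero_infinity=True)
    alpha, gamma = 0.25, 0.5

    def focal_ctc(log_probs, targets, input_lens, target_lens):
        # Per-sample CTC negative log-likelihood: -log p(l|y).
        nll = ctc(log_probs, targets, input_lens, target_lens)
        p = torch.exp(-nll)                              # recover p(l|y)
        return (alpha * (1 - p) ** gamma * nll).mean()   # focal reweighting

    # log_probs: (T, batch, classes) log-softmax outputs of the BiLSTM + FC head.
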
  8. PP-OCR: A Practical Ultra Lightweight OCR System

    ■ Summary
    This work proposes a lightweight model for text recognition. Various strategies to improve the model's performance or reduce its parameter count are also discussed, and ablation studies verify the effectiveness of each strategy.

    ■ Proposed Methodology
    The system uses three modules: text detection, which outputs bounding boxes for text; a direction classifier, which is necessary when the bounding box reverses the text; and text recognition. For detection, the light backbone MobileNetV3_large_x0.5 is used. Empirically, removing the squeeze-and-excitation blocks from the model caused no loss of accuracy while reducing the number of parameters and the inference time. A Feature Pyramid Network (FPN) is used in the head to detect small text using high-resolution feature maps. A cosine learning-rate schedule provides a large learning rate (LR) at the beginning and a small LR at later stages; since a large initial LR can cause instability, LR warmup is used. An FPGM pruner dynamically calculates a compression ratio for each layer and removes similar filters, improving inference efficiency. MobileNetV3_small_x0.35 is used as the backbone for the direction classifier at an input resolution of 48*192. Augmentations such as rotation, Gaussian blur, perspective distortion, and motion blur (Base Data Augmentation, BDA), along with random augmentation, improve the model's generalization. Modified PACT quantization with an L2 regularization coefficient of 1e-3 is used. For the recognizer, MobileNetV3_small_x0.35 is used as the backbone, pretrained on synthetic images, with modified strides to preserve horizontal and vertical information, together with BDA and TIA (Luo 2020) augmentation. A fully connected layer of dimension 48 is used as the head, along with L2 regularization. Cosine LR with warmup is used for training, and PACT quantization is applied to every layer except the LSTM layers.

    ■ Result
    Multiple synthetic as well as public datasets such as LSVT, RCTW-17, MTWI 2018, and CASIA-10K are combined for training and validation of the different modules. Using all the strategies mentioned above, the model achieved an accuracy of 69%.
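
    A generic sketch of the warmup-plus-cosine LR schedule described above (not PP-OCR's exact implementation; the base LR and step counts are placeholders):

    import math

    def lr_at(step, total_steps, base_lr=0.001, warmup_steps=500):
        # Linear warmup to base_lr, then cosine decay towards zero.
        if step < warmup_steps:
            return base_lr * step / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

    # e.g. lrs = [lr_at(s, 10000) for s in range(10000)]
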
  9. On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

    ■ Summary
    Current OCR technologies are unable to recognize rotated, curved, vertically aligned, or arbitrarily shaped texts. This work uses a self-attention mechanism to tackle these challenges.

    ■ Related Works
    Cheng (2018) employed a selection module to select features in four directions by projecting an intermediate feature map. Yang (2017) used an attention module requiring extensive character-level supervision. Hui (2019) employed attention but is biased towards horizontal texts due to height pooling and RNN layers. Fenfen (2018) used a 1D transformer for recognition. Pengyuan (2019) employed self-attention in the decoder for text recognition.

    ■ Proposed Methodology
    A shallow CNN module is used to suppress background information while reducing the computational cost of subsequent layers. The output is passed through self-attention blocks with a novel 2D positional embedding. This can be formulated as

    att-out_hw = Σ_{h'w'} softmax(rel_{h'w' → hw}) V_{h'w'}

    V_{h'w'} is calculated by multiplying the shallow CNN's feature maps with trainable weights, and the attention weights rel_{h'w' → hw} are calculated as

    rel_{h'w' → hw} ∝ (e_hw + p_hw) W_q W_k^T (e_{h'w'} + p_{h'w'})^T

    Here e_{h'w'} and p_{h'w'} represent the extracted feature maps and the positional embedding, respectively. Further, p_{h'w'} can be calculated as

    p_{h'w'} = α(E) p_h'^sinu + β(E) p_w'^sinu

    where p_h^sinu and p_w^sinu are sinusoidal positional encodings along height and width, while α(E) and β(E) are calculated as

    α(E) = sigmoid(max(0, g(E) W_h^1) W_h^2),  β(E) = sigmoid(max(0, g(E) W_w^1) W_w^2)

    for E representing the CNN-extracted features. To capture short-term dependencies, the 1x1 convolutions are replaced with 3x3 convolutions, forming the locality-aware feedforward layer.

    ■ Result
    Datasets with horizontal as well as arbitrarily aligned texts were used to validate the performance of the model: IC13 (94.1%), IC03 (96.7%), SVT (91.3%), IIIT5K (92.8%), IC15 (79%), SVTP (86.5%), CT80 (87.8%).

    ■ Next must-read paper
    "ASTER: An Attentional Scene Text Recognizer with Flexible Rectification"
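
    A rough PyTorch sketch of the adaptive 2D positional encoding above (a minimal reconstruction; the sinusoidal tables are stubbed with fixed random tensors, and all names and layer sizes are my own):

    import torch
    import torch.nn as nn

    class Adaptive2DPosEnc(nn.Module):
        # p_hw = alpha(E) * sinu_h + beta(E) * sinu_w, with alpha/beta gated on pooled features.
        def __init__(self, d, h, w):
            super().__init__()
            self.sinu_h = nn.Parameter(torch.randn(h, 1, d), requires_grad=False)  # stand-in for the sinusoidal table
            self.sinu_w = nn.Parameter(torch.randn(1, w, d), requires_grad=False)
            self.alpha = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d), nn.Sigmoid())
            self.beta  = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d), nn.Sigmoid())

        def forward(self, E):                     # E: (b, h, w, d) shallow-CNN features
            g = E.mean(dim=(1, 2))                # global average pooling g(E), (b, d)
            a = self.alpha(g)[:, None, None, :]   # (b, 1, 1, d)
            b = self.beta(g)[:, None, None, :]
            return E + a * self.sinu_h + b * self.sinu_w
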
  10. Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

    ■ Summary
    This work attempts to overcome the shortcomings of RNNs such as their time dependency and, most importantly, their one-way transmission of context, which greatly limits a model's effectiveness in learning semantic information.

    ■ Related Works
    Baoguang (2016) combined a CNN and an RNN with the connectionist temporal classification (CTC) loss for recognition. Minghui (2019) formulated the problem as a pixel-level classification task. Chen (2016) extracted the visual features in 1D and used the semantic information of the last time step for recognition. Mingkun (2019) used a rectification network based on local features to improve performance. Zhanzhan (2018) extracted features along four directions and used a filter gate to calculate the contribution of each. Zbigniew (2017) encoded spatial coordinates on 2D feature maps to increase the sequential information extracted.

    ■ Proposed Methodology
    ResNet50 is used as the backbone, with a feature pyramid extracting features from the 3rd, 4th, and 5th residual blocks. The extracted features are passed to a transformer along with positional embeddings. A novel parallel visual attention module computes weights as

    e_{t,ij} = W_e^T tanh(W_o f_o(O_t) + W_v v_ij)

    for transformer-extracted features v, with O representing the character reading order (1 ... N-1) and f_o the embedding function. After a softmax, a weighted sum with v gives the attention outputs. When an RNN is used to calculate the vector at time step t, the information of step t-1 is needed, which causes a bottleneck in the model's semantic reasoning. This work therefore uses approximated information from step t-1 to calculate the vectors at step t. The output of the attention, g, is used to predict the target character (FC with softmax), optimized with cross-entropy (CE). The most likely character is passed through an embedding layer to calculate an approximate embedding. The extracted features are passed through several transformer units to output the global context s, also optimized with a CE loss. The features g and s are dynamically allocated importance as

    z_t = sigmoid(W_z [g_t, s_t]),  f_t = z_t * g_t + (1 - z_t) * s_t

    The entire model is optimized with Loss = α_e L_e + α_r L_r + α_f L_f.

    ■ Result
    Datasets such as IC13, IC15, IIIT5K, SVT, SVTP, CUTE, TRW-T, and TRW-L are used for evaluation, achieving 95.5, 82.7, 94.8, 91.5, 85.1, 87.8, 85.5, and 84.3 respectively.

    ■ Conclusion
    The model could be combined with the CTC loss to improve performance.

    ■ Next must-read paper
    "An end-to-end trainable neural network for spotting text with arbitrary shapes"
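
    A minimal PyTorch sketch of the gated fusion of the visual feature g and semantic context s above (class and variable names are my own):

    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        # z_t = sigmoid(W_z [g_t, s_t]);  f_t = z_t * g_t + (1 - z_t) * s_t
        def __init__(self, d):
            super().__init__()
            self.w_z = nn.Linear(2 * d, d)

        def forward(self, g, s):   # g: visual features, s: semantic context, both (b, n, d)
            z = torch.sigmoid(self.w_z(torch.cat([g, s], dim=-1)))
            return z * g + (1 - z) * s
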
  11. Multi-Lingual Optical Character Recognition System Using the Reinforcement Learning of Character Segmenter

    ■ Summary
    The performance of OCR systems decreases on texts containing multiple languages. To tackle this problem, the work uses a segmenter trained by reinforcement learning, plus a switcher and recognizers trained in a supervised manner.

    ■ Related Works
    Zheng (2016) formulated character segmentation as binary segmentation. Chernyshova (2020) proposed a word-image segmentation model using dynamic programming to select the most probable boundaries in images. B. Shi (2017) used convolution layers for feature extraction, LSTM layers to predict character classes, and the CTC loss to ignore the repeated characters produced by multiple slices. D. Kumar (2015) used an encoder-decoder architecture with attention, scanning the image horizontally and then decoding the feature vector using attention.

    ■ Proposed Methodology
    The segmenter partitions a word image into n sub-images, the switcher assigns a recognizer to each sub-image, and the recognizer assigns a label (the architecture of the word recognizers R is shown on the slide). First, L and R are trained with the cross-entropy loss. The trained modules are then used in conjunction with the segmenter S in reinforcement learning. S outputs the partition map as a probability vector, which is post-processed into a binary map; the processing involves non-maximum suppression, clipping probabilities at 0.99, and thresholding. The action is thus based on the model's output, and the reward is based on the distance from the ground truth:

    r(X, a) = 1 - d(Y(X, a), Y) / max(1, N_Y)

    For input X and action a, Y(X, a) is the processed partition map and Y is the ground truth. The denominator provides length normalization, N_Y being the total number of characters in Y.

    ■ Result
    Texts in various languages are used for evaluation, including Chinese, English, and Korean, as well as mixed texts (Chinese with English, Chinese with Korean, English with Korean, and Chinese with English and Korean), achieving 94.74, 77.01, 97.07, 87.23, 97.1, 87.46, and 90.87 respectively.

    ■ Next must-read paper
    "Tesseract Blends Old and New OCR Technology"
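
    A rough NumPy sketch of the partition-map post-processing described above (1D non-maximum suppression, clipping at 0.99, thresholding; the window size and threshold are my own placeholders):

    import numpy as np

    def binarize_partition(probs, window=5, thresh=0.5):
        """probs: (w,) per-column probability of a character boundary."""
        probs = np.clip(probs, 0.0, 0.99)   # clip probabilities at 0.99
        out = np.zeros_like(probs)
        half = window // 2
        for i, p in enumerate(probs):
            lo, hi = max(0, i - half), min(len(probs), i + half + 1)
            # Keep a column only if it is the local maximum and above threshold.
            if p >= thresh and p == probs[lo:hi].max():
                out[i] = 1.0
        return out   # binary partition map: 1 marks a segmentation boundary

    print(binarize_partition(np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.7, 0.95, 0.2])))
    # [0. 0. 1. 0. 0. 0. 1. 0.]
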
  12. Company Overview

    Company name: 株式会社 微分 (VIVEN, Inc.)
    Representative: Shintaro Yoshida (吉⽥ 慎太郎)
    Address: JustCo Shinjuku, JR Shinjuku Miraina Tower 18F, 4-1-16 Shinjuku, Shinjuku-ku, Tokyo
    Founded: October 2020
    Capital: JPY 7,000,000 (as of October 2022)
    Employees: 20 (all employment types included)
    Business: Development of "School DX" software for educational institutions; web application development; R&D in image recognition and natural language processing
  13. Copyright © 2022 VIVEN Inc. All Rights Reserved