MPAI: data coding standards for AI

Video: https://video.linux.it/w/52PVMQRWmKCmbMRQjdLeXf?start=26m47&stop=59m48

Artificial intelligence has made processes previously handled by Data Processing (DP) more efficient, but its use has often been ad hoc and opaque, with the risk of bias in the data and untraceable decisions, especially in critical sectors such as autonomous vehicles. Data Processing standards have fostered technological development, but comparable standards for AI are still lacking.

To fill this gap, MPAI (Moving Picture, Audio, and Data Coding by Artificial Intelligence) has taken on the mission of developing AI-based data coding standards, producing Technical Specifications that break complex applications down into AI Modules (AIMs) with well-defined functions and interfaces. These AIMs can be combined into AI Workflows (AIWs) to build applications whose operation is transparent and traceable.

MPAI is committed to providing a standardised, rigorous and fast path for these technology standards, potentially completable in less than a year, encouraging a competitive market of standardised AI components.

Ing. Leonardo Chiariglione: President of MPAI, co-founder of MPEG and promoter of the ISO standardisation of the MPEG standards

Python Torino

January 17, 2024

Transcript

  1. MPAI: data coding standards for AI

    Leonardo Chiariglione, MPAI. Viaggio al centro dell’IA, Turin, 2023/01/16. Slides dated 24-Jan-24.
  2. Two ages

    • AI’s great strides: former DP processes implemented more efficiently.
    • AI performs complex functions, but the result depends on training data sets.
    • AI is used in an ad hoc way, as IT was in its early days.
    • AI’s impact: potentially devastating social impacts caused by information services. Lack of explainability is unacceptable, e.g., in autonomous vehicles.
    • Great examples of data processing (DP) coding standards.
    • Virtually no comparable examples for AI.
    • Why? Maybe because there is no right environment…
  3. MPAI: Moving Picture, Audio, and Data Coding By Artificial Intelligence

    What it is: an international, unaffiliated, non-profit SDO.
    What it does: develops AI-based data coding standards.
    How it does it: with clear Intellectual Property Rights licensing frameworks.
  4. Breaking monolithic applications into components

    • Components: have known functions and interfaces; implementable with AI or DP technologies.
    • Implement applications as AI Workflows (AIWs) with known functions & interfaces, composed of interconnected AI Modules (AIMs) with known functions & interfaces.
    [Figure: an example AI Module (AIM), Speech analysis, with the labels Text, Emotion and Speech; an AI Workflow (AIW) of interconnected AIMs with Inputs, Outputs and AIM Storage.]
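The decomposition described on this slide can be illustrated with a minimal Python sketch. It is not taken from any MPAI specification or reference software: the `AIM` and `AIW` classes, their fields, and the two toy modules are hypothetical names chosen only to show modules with declared inputs and outputs chained into a workflow whose steps are recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AIM:
    """Hypothetical AI Module: a function with declared inputs and outputs."""
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class AIW:
    """Hypothetical AI Workflow: AIMs run in order over a shared data store."""
    name: str
    aims: List[AIM]
    trace: List[Tuple[str, List[str]]] = field(default_factory=list)

    def execute(self, data: Dict[str, object]) -> Dict[str, object]:
        store = dict(data)
        self.trace = []
        for aim in self.aims:
            missing = [key for key in aim.inputs if key not in store]
            if missing:
                raise KeyError(f"{aim.name} is missing inputs: {missing}")
            # Each AIM only sees the inputs it declared.
            result = aim.run({key: store[key] for key in aim.inputs})
            unexpected = set(result) - set(aim.outputs)
            if unexpected:
                raise ValueError(f"{aim.name} produced undeclared outputs: {unexpected}")
            store.update(result)
            # Record which module produced which data, for traceability.
            self.trace.append((aim.name, sorted(result)))
        return store


# Toy stand-ins for real modules (assumptions, not real recognisers).
def recognise(d: Dict[str, object]) -> Dict[str, object]:
    return {"text": str(d["speech"]).lower()}


def detect_emotion(d: Dict[str, object]) -> Dict[str, object]:
    return {"emotion": "happy" if "!" in str(d["text"]) else "neutral"}


workflow = AIW(
    "toy_speech_analysis",
    [
        AIM("SpeechRecognition", ["speech"], ["text"], recognise),
        AIM("EmotionDetection", ["text"], ["emotion"], detect_emotion),
    ],
)
result = workflow.execute({"speech": "Hello MPAI!"})
print(result["text"], result["emotion"], workflow.trace)
```

Because every module declares its interface, either toy function could be swapped for an AI-based or a DP-based implementation without touching the rest of the workflow, which is the point the slide makes about components.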
  5. Rigorous standard development process

    [Figure: process diagram with Stages 0 to 7. Stage labels: Interest Collection, Use Cases, Functional Requirements, Commercial Requirements, Call for Technologies, Standard Development, Community Comments. Participant labels: All, All Members, Principal Members. Other labels: Proposal, Technical Specification, Reference Software, Conformance Testing, Performance Assessment.]
  6. 3. Accessible & timely available standards

    Before initiating a standard, Active Principal Members develop & adopt its Framework Licence (FWL), a licence without values ($, %, dates, etc.) declaring that the eventual licence will be issued:
    1. Not after products are on the market.
    2. At a price comparable with similar standard technologies.
    During the development, any Member making a contribution declares it will make its licence available according to the FWL.
    After the development, Members holding IP in the standard select the preferred patent pool administrator.
    Read the MPAI Patent Policy.
  7. The MPAI organisation

    [Figure: organisation chart. General Assembly; Board of Directors; Secretariat; Standing Committees: Requirements Standing Committee, Membership & Nominating, Finance & Audit, IPR Support, Industry & Standards, Communication; groups labelled MMM, EEV, EVC, SPG, CAV; Development Committees: MMC-DC, GME-DC, AIF-DC, NNW-DC, AIH-DC, CAE-DC, CUI-DC, PAF-DC, XRV-DC.]
  8. Governance of the MPAI Ecosystem (MPAI-GME)

    The rules governing the MPAI Ecosystem: 1. MPAI; 2. Implementers; 3. MPAI Store; 4. Performance Assessors; 5. Users.
    • MPAI Store, established in Scotland as a Company Limited by Guarantee, is tasked with:
      • Managing the “Implementer ID” Registration Authority.
      • Verifying security.
      • Testing conformance.
      • Making implementations available for download.
      • Receiving and classifying user experience scores.
    • Performance Assessors are appointed by MPAI but are independent of it.
  9. The MPAI ecosystem

    [Figure: ecosystem diagram. MPAI: Standards (TS-RS-CT-PA), with the labels Technical Specification, Conformance Testing and Performance Assessment. Implementer: ✓ Develops Implementations. Performance Assessor: ✓ Assesses Performance. Store: ✓ Verifies security, ✓ Tests Conformance, ✓ Checks Performance. End User: ✓ Downloads Implementation, returns an Experience Score.]
  10. AI Framework (MPAI-AIF) V2.0

    • Standard AI Framework enabling dynamic configuration, initialisation, and control of mixed AI and data processing workflows.
    • Hierarchical structure: AI Framework (AIF) - AI Workflows (AIW) - AI Modules (AIM).
    • AIWs may be proprietary or standard, i.e., with standard functions and interfaces, with an explicit computing workflow.
    • Developers can offer “better” AIMs compared to other implementations.
    • AIWs can execute AIMs implemented in hardware, software, or hybrid hardware/software.
  11. Reference Model

    [Figure: reference model. User Agent; Controller (Non-secure/Secure); AI Modules (AIMs) with Inputs and Outputs; Global Storage; AIM/AIW Storage (non-secure); AIM/AIW Storage (secure); MPAI Store; Access; Communication; SAL; Attestation Service; Encryption Service; Communication Service.]
  12. Context-based Audio Enhancement (MPAI-CAE V2.0)

    • Improves the user experience for audio applications (teleconferencing, post-production, restoration) in different contexts (in the home, in the studio, etc.).
    • Technical Specification includes:
      • 4 Use cases: Emotion Enhanced Speech, Audio Recording Preservation, Speech Restoration System, and Enhanced Audioconference Experience (and Reference Software).
      • 1 Composite AIM: Audio Scene Description and Data types.
    • Reference software.
    • Conformance Testing.
  13. Audio Recording Preservation (CAE-ARP)

    [Figure: reference model. AI Modules: Audio Analyser, Video Analyser, Tape Irregularity Classifier, Tape Audio Restoration, Packager. Data labels: Preservation Audio File, Preservation Audio-Visual File, Audio File, Irreg. File, Irreg. Images, Editing List, Restored Audio Files, Preservation Master Files, Access Copy Files. Framework: Controller, Communication, Global Storage, MPAI Store, User Agent.]
  14. Audio Scene Description (CAE-ASD)

    [Figure: reference model. AI Modules: Audio Analysis Transform, Audio Source Localisation, Audio Separation and Enhancement, Audio Synthesis Transform, Audio Description Multiplexing. Data labels: Microphone Array Geometry, Multichannel Audio, Transform Multichannel Audio, Audio Spatial Attributes, Transform Enhanced Audio, Audio Objects, Audio Scene Geometry, Audio Scene Descriptors.]
  15. Connected Autonomous Vehicle – Architecture (MPAI-CAV) V1.0

    Specification of the Architecture of Connected Autonomous Vehicles (CAV) based on a Reference Model comprising:
    1. A CAV broken down into Subsystems.
    2. Subsystems broken down into Components.
    Specification of:

                  Functions   I/O Data   Topology
    Subsystems    X           X          X
    Components    X           X
  16. MPAI-CAV – Architecture V1.0

    [Figure: the four subsystems. Human-CAV Interaction (HCI), Environment Sensing Subsystem (ESS), Autonomous Motion Subsystem (AMS), Motion Actuation Subsystem (MAS).]
  17. Autonomous Motion Subsystem (CAV-AMS)

    [Figure: reference model. AI Modules: Environment Representation Fusion, Route Planner, Path Planner, Motion Planner, Obstacle Avoider, Command Issuer, Decision Recorder. Data labels: Basic Environment Representation, Environment Representation, FER, Route, Path, Trajectory, Command, Response, Alert, Road State, HCI-AMS Command, AMS-HCI Response. Connected entities: Environment Sensing Subsystem, Human-CAV Interaction, Motion Actuation Subsystem, Remote AMSs. Framework: User Agent, Controller, Global Storage, Access, Communication, Attestation Service, Encryption Service, Communication Service, MPAI Store.]
  18. Compression and Understanding of Industrial Data (MPAI-CUI)

    The «AI-based Compression and Understanding of Industrial Data» standard.
    Company Performance Prediction use case: predicts the performance of a company in a given time horizon based on:
    • Financial risks
    • Vertical risks (i.e., seismic and cyber)
    • Governance data.
    Performance is expressed by:
    • Default probability
    • Business discontinuity probability
    • Organisational Model Index
  19. Company Performance Prediction (CUI-CPP)

    [Figure: reference model. AI Modules: Governance Assessment, Financial Assessment, Risk Matrix Generation, Prediction, Perturbation. Data labels: Governance, Financial Statement, Risk Assessment, Governance Features, Financial Features, Risk Matrix, Prediction Horizon, Organisational Model Index, Default Probability, Business Discontinuity Probability. Framework: Controller, Communication, Global Storage, MPAI Store, User Agent.]
  20. Multimodal Conversation (MPAI-MMC) V2

    • Goal: technologies that enable more human-like human-machine conversation, emulating human-human conversation in completeness and intensity.
    • Improve the machine’s “conversational” capabilities through better understanding of a human message and generation of a pertinent response.
    • Goal achieved by specifying, inter alia, Personal Status, a new data type that represents the “internal status” of a conversing human expressed with text, speech, face, and gesture.
    • Personal Status can be used by the machine to represent its own internal status as if it were a human.
    • A human can be replaced by a machine (machine-to-machine).
  21. Human-CAV Interaction (MMC-HCI)

    [Figure: reference model. AI Modules: Audio Scene Description, Visual Scene Description, Speech Recognition, Speaker Recognition, Face Recognition, Spatial Object Identification, Audio-Visual Alignment, Language Understanding, Personal Status Extraction, Dialogue Processing, Personal Status Display. Data labels: Audio (Indoor), Audio (Outdoor), Visual (Indoor), Visual (Outdoor), LiDAR (Indoor), Input Text, Input Speech, Speech Object, Face Object, Physical Object, Speaker ID, Face ID, Physical Object ID, ParticipantIDs, Audio Scene Geometry, Visual Scene Geometry, Face Descriptors, Body Descriptors, Recognised Text, Refined Text, Meaning, Input Personal Status, Machine Text, Machine Personal Status, Machine Portable Avatar, HCI-AMS Command, AMS-HCI Response, Inter HCI Information. Connected entities: Environment Sensing Subsystem, Autonomous Motion Subsystem, Remote HCIs. Framework: Controller, User Agent, Global Storage, Access, Communication, Attestation Service, Encryption Service, Communication Service, MPAI Store.]
  22. Virtual Meeting Secretary

    [Figure: reference model. AI Modules: Portable Avatar Demultiplexing, Speech Recognition, Language Understanding, Personal Status Extraction, Summarisation, Dialogue Processing, Personal Status Display. Data labels: Portable Avatars, Avatar ID, Input Speech, Input Text, Face Descriptors, Body Descriptors, Recognised Text, Refined Text, Meaning, Input Personal Status, Summary, Edited Summary, VS Text, VS Personal Status, VS Avatar Model, VS Portable Avatar. Framework: User Agent, Controller, Global Storage, MPAI Store, Access, Communication, Attestation Service, Encryption Service, Communication Service.]
  23. MPAI Metaverse Model (MPAI-MMM) – Architecture

    Technical Specification: MPAI Metaverse Model (MPAI-MMM) – Architecture specifies:
    1. Terms and Definitions
    2. Operation Model
    3. Functional Requirements of Processes, Actions, Items, and Data Types
    4. Functional Profiles
    to enable Interoperability of two or more metaverse instances (M-Instances) if they:
    1. Rely on the same Operation Model, and
    2. Use:
      1. The same Profile specified by MPAI-MMM – Architecture, and
      2. Either the same Technologies, or
      3. Independent Technologies while accessing Conversion Services that losslessly transform Data of M-InstanceA to Data of M-InstanceB.
    • Note: Full Interoperability may not be achieved if the M-Instances implement different Profiles.
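The interoperability conditions listed on this slide can be read as a simple predicate. The sketch below is only an illustration of that reading: `MInstance`, its fields, and `can_interoperate` are hypothetical names, not part of MPAI-MMM, and treating different Profiles as a plain "no" simplifies the slide's note that full interoperability "may not be achieved" in that case.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MInstance:
    """Hypothetical description of a metaverse instance (M-Instance)."""
    name: str
    operation_model: str
    profile: str
    technologies: FrozenSet[str]


def can_interoperate(a: MInstance, b: MInstance,
                     has_lossless_conversion: bool = False) -> bool:
    """Apply the slide's conditions to two M-Instances.

    has_lossless_conversion is an assumed flag meaning that Conversion
    Services exist that losslessly transform Data between a and b.
    """
    # 1. Both instances must rely on the same Operation Model.
    if a.operation_model != b.operation_model:
        return False
    # 2.1 Both must use the same Profile (simplification, see lead-in).
    if a.profile != b.profile:
        return False
    # 2.2 Same Technologies, or 2.3 independent ones plus lossless conversion.
    return a.technologies == b.technologies or has_lossless_conversion


a = MInstance("A", "om-1", "baseline", frozenset({"tech-x"}))
b = MInstance("B", "om-1", "baseline", frozenset({"tech-x"}))
c = MInstance("C", "om-1", "baseline", frozenset({"tech-y"}))
d = MInstance("D", "om-1", "other-profile", frozenset({"tech-x"}))

print(can_interoperate(a, b))
print(can_interoperate(a, c), can_interoperate(a, c, has_lossless_conversion=True))
print(can_interoperate(a, d))
```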
  24. Inter-Process Communication: Human-Device-User

    [Figure: a Human in the Universe connects through a Device to a User in the Metaverse. Labels: Human, Device, User, Persona, Process, App, M-Instance, Process1, Process2, Request-Action, Response-Action.]
  25. Neural Network Watermarking (MPAI-NNW)

    Specifies methodologies to evaluate the following aspects of a neural network watermarking technology:
    • The impact of watermarking a neural network on the network itself or on its inferences.
    • The ability of a neural network watermarking detector/decoder to detect/decode a payload.
    • The computational cost of injecting or detecting a watermark, or of decoding a payload, in the neural network.
  26. MPAI-NNW in MMC-MQA: Proof of authenticity of Question/Answer AIM inference

    [Figure: reference model. AI Modules: Speech Recognition, Visual Scene Description, Visual Object Identification, Language Understanding, Question Analysis, Question Answering (NNW authentication), Speech Synthesis (Text), NNW decoder. Data labels: Input Selection, Input Text, Input Speech, Input Video, Recognised Text, Refined Text, Meaning, Intention, Physical Object, Instance Identifier, Machine Text, Machine Speech, QA AIM proof of authenticity. Framework: Controller, User Agent.]
  27. Portable Avatar Format (MPAI-PAF)

    • Portable Avatar and related Data Formats enabling a receiver to decode and render an Avatar in a Virtual Environment as intended by the sender.
    • Personal Status Display Composite AI Module allowing the conversion of a Text and a Personal Status to a Portable Avatar.
    • AI Workflows and AI Modules composing the Avatar-Based Videoconference Use Case, also using Data Types from other MPAI Technical Specifications.
    [Figure: Portable Avatar Multiplexing producing a Portable Avatar from: Avatar ID, Time, Avatar Model, Visual Environment, Spatial Attitude, Body Descriptors, Face Descriptors, Speech, Speech Type, Text, Language Preference, Personal Status.]
  28. Personal Status Display (PAF-PSD)

    [Figure: reference model. AI Modules: Speech Synthesis (PS), Face Description, Body Description, Portable Avatar Multiplexing. Inputs: Machine Text, Personal Status (PS-Speech, PS-Face, PS-Gesture), Avatar Model, Avatar ID. Intermediate data: Machine Speech, Machine Face Descriptors, Machine Body Descriptors. Output: Portable Avatar.]
  29. Avatar-Based Videoconference (PAF-ABV)

    [Figure: system diagram. Entities: Transmitting Client, Server, Virtual Secretary, Receiving Client. Data labels: Language Preferences, Avatar Model, Speech and Text, Visual, Speech Object, Face Object, Environment Model, Spatial Attitudes, Portable Avatar, Portable Avatars, Portable Avatar (VS), Summary, Point of View, Output Audio & Text, Output Visual.]
  30. AI for Health (MPAI-AIH)

    1. Architecture of an AI Health Platform composed of:
      1. End User Front Ends tasked with:
        1. Acquiring and AI-based processing of Health Data.
        2. Licensing Health Data to the Back End.
      2. A Back End tasked with:
        1. Collecting and AI-based processing of Health Data.
        2. Providing access to and enabling Third-Party Users, through Licences, to process Health Data.
        3. Collecting and redistributing AI Models updated via federated learning techniques.
      3. A Blockchain tasked with managing Licences as Smart Contracts.
    2. The format of the Licences
    3. The API
    4. Taxonomy of AIH Processing functions
    5. The Data Formats
  31. Platform Back-End

    [Figure: back-end architecture. Services: API, Data Storage & Access Services, Auditing Services, Authentication & Access Control Services, Blockchain and Distributed Ledger Technologies (B&DLT), De-Identification & Anonymization Services, Licensing & Governance Services, Global Secure Data Vault, AIH Federated Learning System. External parties: Third-Party Health-related Entities, other sources of health-related data. Annotations: access to multiple services from smartphone (data storage, permissions, licenses, etc.); record of transactions on the B&DLT that can be used for auditing purposes; access to data (de-identified and anonymised) and processing of data through AI to extract knowledge. AI Framework: User Agent, AI Modules (AIMs), Controller, Global Storage, MPAI Store, Access, Communication, AIM/AIW Storage (non-secure), AIM/AIW Storage (secure). Trusted Services: Attestation Service, Encryption Service, AIM Storage Service, Communication Service, AIM Security Engine, AIM Model Service.]
  32. Platform Front-End

    [Figure: front-end architecture. Sources: Specific Biometric Sensors (Smartbands, Smartwatches, …), Biometric Signals, End-user input data, other installed health-related apps. AI-Health App: Secure Data Vault, Interaction w/ Backend, Anomaly Detection & Risks Alert. AI Framework: User Agent, AI Modules (AIMs), Controller, Global Storage, MPAI Store, Access, Communication, Attestation Service, Encryption Service, Communication Service, AIM/AIW Storage (non-secure), AIM/AIW Storage (secure).]
  33. Object and Scene Description (MPAI-OSD)

    • Data Formats able to describe uni- and multimodal Objects and Scenes, and their localisation in space, for uniform use across MPAI Technical Specifications:
      • Audio Environment
      • Visual Environment
      • Coordinates, Angles, and Objects
      • Spatial Attitude and Point of View
      • Audio Scene Descriptors
      • Visual Scene Descriptors
      • Audio-Visual Scene Descriptors
  34. Audio-Visual Scene Description

    Variable name             Comment
    Timestamp type            0: Absolute Time; 1: Relative Time
    Timestamp value           In seconds
    Space type                0: Global Space (from 0,0,0); 1: Relative Space
    Audio Environment         MPAI-CAE
    Visual Environment        MPAI-OSD
    Coordinate system         0: Cartesian; 1: Spherical
    Coordinate value          In meters
    # human AV Objects        Integer
    Spatial Attitude1         MPAI-OSD
    Audio ObjectID1           string
    Body DescriptorsID1       string
    Face DescriptorsID1       string
    …
    # non-human AV Objects    Integer
    Spatial AttitudeA         MPAI-OSD
    Audio ObjectIDA           string
    Visual ObjectA            string
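As a rough illustration of how the fields in this table might map onto a data structure, the Python sketch below mirrors the variable names and codes listed on the slide. It is an assumption-laden sketch, not the MPAI-OSD format: the class and field names are hypothetical, and the Environment and Spatial Attitude types, which the slide attributes to MPAI-CAE and MPAI-OSD, are left as opaque dictionaries.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import IntEnum
from typing import Dict, List, Tuple


class TimestampType(IntEnum):
    ABSOLUTE = 0
    RELATIVE = 1


class SpaceType(IntEnum):
    GLOBAL = 0      # measured from (0, 0, 0)
    RELATIVE = 1


class CoordinateSystem(IntEnum):
    CARTESIAN = 0
    SPHERICAL = 1


@dataclass
class HumanAVObject:
    spatial_attitude: Dict[str, float]   # defined by MPAI-OSD; opaque here
    audio_object_id: str
    body_descriptors_id: str
    face_descriptors_id: str


@dataclass
class NonHumanAVObject:
    spatial_attitude: Dict[str, float]   # defined by MPAI-OSD; opaque here
    audio_object_id: str
    visual_object: str


@dataclass
class AVSceneDescription:
    timestamp_type: TimestampType
    timestamp_value: float                        # seconds
    space_type: SpaceType
    coordinate_system: CoordinateSystem
    coordinate_value: Tuple[float, float, float]  # metres
    audio_environment: Dict[str, object] = field(default_factory=dict)   # MPAI-CAE
    visual_environment: Dict[str, object] = field(default_factory=dict)  # MPAI-OSD
    human_objects: List[HumanAVObject] = field(default_factory=list)
    non_human_objects: List[NonHumanAVObject] = field(default_factory=list)

    @property
    def human_count(self) -> int:
        """The '# human AV Objects' integer from the table."""
        return len(self.human_objects)

    @property
    def non_human_count(self) -> int:
        """The '# non-human AV Objects' integer from the table."""
        return len(self.non_human_objects)


scene = AVSceneDescription(
    timestamp_type=TimestampType.ABSOLUTE,
    timestamp_value=12.5,
    space_type=SpaceType.GLOBAL,
    coordinate_system=CoordinateSystem.CARTESIAN,
    coordinate_value=(0.0, 0.0, 0.0),
    human_objects=[HumanAVObject({}, "audio-1", "body-1", "face-1")],
)
encoded = json.dumps(asdict(scene))
print(encoded)
```

Deriving the two object counts from the lists, rather than storing them as separate integers, is a design choice of this sketch that keeps the counts and the lists from disagreeing.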
  35. Human-Machine Communication (MPAI-HMC)

    • Enables a speaking human, either in a real space or represented as a Digitised Human in a Virtual Space, to communicate with a Machine displayed as a speaking humanoid in the real space or represented as a speaking Virtual Human in a Virtual Space.
    [Figure: communication scenarios. Labels: real human1, Machine1, Machine2, Digitised Human, Virtual Human, AV Scene.]
  36. High-level Reference Model

    [Figure: reference model. AI Modules: Audio-Visual Frontend, AV Scene Integration & Description, Entity Context Understanding, Entity Dialogue Processing, Personal Status Display, Audio-Visual Scene Rendering. Data labels: Input Selector, Input Visual, Input Audio, Input Text, Portable Avatar, AV Scene Descriptors, AV Scene Geometry, Audio Instance ID, Visual Instance ID, Entity ID, Personal Status, Meaning, Refined Text, Translated Text, Machine Personal Status, Machine Text, Machine Avatar ID, Portable Avatar (HMC), Visual, Text, Audio. Framework: Controller, Global Storage, MPAI Store, Access, Communication, Attestation Service, Encryption Service, Communication Service, User Agent.]
  37. Guidelines for mitigation of data loss and cheating in online multiplayer gaming (MPAI-SPG)

    • Guidelines on the design and use of neural networks for the purpose of creating reliable and accurate prediction systems to predict absent or malicious players’ control data in an authoritative server context.
    [Figure: Server-based Predictive Multiplayer Gaming. Online Game Server: Behaviour Engine, Physics Engine, Rules Engine, Game State Engine. AI counterparts: Behaviour Engine-AI, Physics Engine-AI, Rules Engine-AI, Game State Engine-AI. Clients: Client 1, Client 2, Client N. Data labels: GS, GSp, CD, GM, GM’, GMp, GM*. Framework: Controller, Communication, Global Storage, MPAI Store.]
  38. Extended Reality Venues (MPAI-XRV) – Live Theatrical Stage Performance

    Specifies functions and interfaces of:
    • The AI Workflow implementing the Live Theatrical Stage Performance
    • AI Modules
    to automate live multisensory immersive stage performances which ordinarily require extensive on-site show control staff to operate.
  39. Reference Model

    [Figure: reference model. AI Modules: Participants Description, Performance Description, Participants Status Interpretation, Performance Status Interpretation, Operator Command Interpretation, Action Generation, Real Environment Experience Generation, Virtual Environment Experience Generation. Inputs: Biometric Data, Lidar, MoCap Data, Sensor Data, Venue Data, Input AV, Volumetric, Controller, App Data, Lighting/FX, Audio/VJ/DJ, Show Control, Script, Skeleton/Mesh, RE Venue Specification, VE Venue Specification. Data labels: Participants Descriptors, Scene Descriptors, Participants Status, Cue Point, Interpreted Operator Controls, Action Descriptors, Virtual Env. Descriptors. Outputs: FX, Lighting, Output A/V. Framework: Controller, Communication, Global Storage, MPAI Store, User Agent.]
  40. AI-based End-to-End Video Coding (MPAI-EEV)

    • Compressed video representation that exploits AI-based end-to-end data coding technologies, i.e., without being constrained by how data coding has traditionally been applied in the video coding context.
  41. AI-Enhanced Video Coding (MPAI-EVC)

    • Compressed video representation by enhancing or replacing compression tools (intra prediction, super resolution, and in-loop filter) of an existing video codec (MPEG-5 EVC) with AI-based tools.

    Resolution   NN-Intra only   NN-SR only   Combined
    SD2HD        -6.92%          -16.5%       24.57%
    HD24K        -6.43%          -7.22%       -11.40%

    Comparison scheme: HD upscaled with bicubic vs. HD enhanced with NN-Intra and upscaled with NN-SR.
    [Figure: HD-resolution native anchor, bicubic upscaling, 4K-resolution bicubic-enhanced, compared with HD-resolution NN-Intra enhanced, NN-SR, 4K-resolution SR-enhanced.]
  42. We look forward to working with you in MPAI on our and your projects!

    https://mpai.community/how-to-join/
    Join MPAI. Share the fun. Build the future.