InnerSource Summit 2024: Thoughts on AI and InnerSource

Open source AI is being defined as I write this. What would that mean for InnerSource initiatives? Would the data used to train the foundational models itself be open source (internally, at least), and what are the consequences of doing that? This talk aims to expand the conversation the OSI is having about Open Source AI, but with an InnerSource slant.

Harish Pillay

November 19, 2024

Transcript

  1. Thoughts on AI and Inner Source

    Straits Interactive Pte Ltd. Harish Pillay. ONLINE, November 20-21, 2024. Harish Pillay 9V1HP: Fellow IES, Fellow SCS, Life Member IEEE. @harishpillay@floss.social • [email protected]. Slides on speakerdeck.com
  2. Chesterton Fence

    There exists in such a case a certain institution or law; let us say, for the sake of simplicity, there’s a fence/gate erected across a road. The more modern type of reformer goes up to it and says: ‘I don’t see the use of this; let us clear it away.’ To which the more intelligent type of reformer will do well to answer: ‘If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.’ - G. K. Chesterton
  3. What insights can we glean?

    - Don’t tear down that fence!
    - We risk unforeseen problems
  4. What insights can we glean?

    - Don’t tear down that fence!
    - We risk unforeseen problems
    - A display of intellectual humility
  5. What insights can we glean?

    - Don’t tear down that fence!
    - We risk unforeseen problems
    - A display of intellectual humility
    - Sparking needed progress (?)
  7. So, how does that help with InnerSource?

    Most of us are familiar with Open Source and the enormous value it brings to society
  8. So, how does that help with InnerSource?

    Most of us are familiar with Open Source and the enormous value it brings to society … and we want to bring benefits (a useful subset of it*) to our organizations via InnerSource

    * https://www.youtube.com/watch?v=rRRzuV44x_k, 2020 InnerSource Summit APAC
  10. Applying AI to InnerSource Environments

    Two broad categories of AI:

    - Discriminative (discAI)
    - Generative (genAI)
  11. Applying AI to InnerSource Environments

    | Feature | DiscAI | GenAI |
    | --- | --- | --- |
    | Data generation | Classifies existing data; labelling | Creates new data |
    | Training | Often uses supervised learning | Often uses unsupervised or self-supervised learning |
    | Goal | Learns the decision boundary between categories | Learns the underlying data distribution |
    | Applications | Image and speech recognition, spam filtering | Creating new designs, generating new content |
    | Advantages | Simpler and faster to train with smaller datasets | Flexible; handles complex data distributions |
    | Analogy | Sorting laundry (shirts, pants, socks, etc.) | Designing a completely new shirt |
    | Training complexity | Often less resource-intensive | Requires more data and compute resources |
    | Interpretability | More interpretable, depending on the model used | Often harder to interpret because of the probabilistic and complex generation process |
  12. Applying AI to InnerSource Environments

    - All AI-related efforts need access to data
    - Provenance of data is the baseline, but in any InnerSource effort this will have to be watched carefully
  13. Applying AI to InnerSource Environments

    - All AI-related efforts need access to data
    - Provenance of data is the baseline, but in any InnerSource effort this will have to be watched carefully
    - Access to data: is it yours? Was the model trained on YOUR data? Do you KNOW where the data resides and came from?
  14. A word about Data

    I’ll lean on the Open Data* definition, which in turn uses the Open Definition: ‘Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.’

    * https://opendatahandbook.org/guide/en/what-is-open-data/
  15. For AI to be deployed in an InnerSource context, does it need to comply with open data?
  16. For AI to be deployed in an InnerSource context, does it need to comply with open data? OSI’s AI Definition v1.0?
  17. Conjecture

    If your organization is deploying AI systems for use within your InnerSource initiatives, NOT knowing the training data provenance is not a good idea