AI has already changed software development

Radamés Roriz

April 09, 2025

Transcript

  1. AI has already changed software development, by active_genie. https://roriz.dev
    TROPICAL ON RAILS 2025. Radamés Roriz, Senior Software Engineer @ Knowbe4
  2. Rada: Passionate about solving real problems and building products that make a positive impact on people's lives. Obsessed with AI: talking about it, blogging about it, and now building open source.
    Radamés Roriz, Senior Software Engineer @ Knowbe4. https://roriz.dev https://github.com/roriz https://www.linkedin.com/in/radames-roriz/
  3. Rada vision 👀
    - To scale, GenAI needs to solve the smallest possible problem
    - The best place for GenAI is in engineers' hands
    - Using GenAI for extraction is underrated
  4. Common traits
    01. No proprietary AI models
    02. Heavily dependent on LLMs
    03. All have a large engineering team
  5. Expect to be a cop
    - Tons of rules
    - Clear and straightforward
    - Auto-fixed rules
    - Clear measure
  6. 18 rules
    - Some tasks are best specified as a sequence of steps
    - The targeted output length can be specified in terms of the count of words
    - Sometimes we get better results when we explicitly instruct the model to reason
    - In some cases, providing examples may be easier
    18 relative rules; 18+7 relative rules
  7. Battle: Products by use
    - gpt-4o-mini: Glamour Noir Dress
    - gemini-2.0-flash-lite: Glamour Noir Dress
    - deepseek-chat: Glamour Noir Dress
    - claude-3-5-haiku: Glamour Noir Dress
  8. Battle: Benchmark
    Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.

    Model                      Tests pass/fail (total)  Precision (%)  Tokens  Avg. duration (s)
    claude-3-5-haiku-20241022  10/0 (10)                100            12723   3.78
    deepseek-chat              1/9 (10)                 10             3933    6.92
    gemini-2.0-flash-lite      9/1 (10)                 90             13674   8.06
    gpt-4o-mini                9/1 (10)                 90             8343    4.69
  9. Scoring: Benchmark
    Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.

    Model                      Tests pass/fail (total)  Precision (%)  Tokens  Avg. duration (s)
    claude-3-5-haiku-20241022  9/4 (13)                 69             18492   7.73
    deepseek-chat              9/4 (13)                 69             10584   9.38
    gemini-2.0-flash-lite      8/5 (13)                 61             11009   4.89
    gpt-4o-mini                8/5 (13)                 61             9959    2.38
  10. Ranking: Who is the strongest character in the MCU? (without restrictions)
  11. Ranking: Scoring
    75 60 55 45 50 45 70 35 52 53 80 75 48 55 65 45 55 50 71 52 25 45 45 45 45 45 32 55 35 56 42 35 65 55 45 61 70 55 50 50 81 55 63 70 70 55 45 80 60 60 55
  12. Ranking: Sorted
    75 60 55 45 50 45 70 35 52 53 80 75 48 55 65 45 55 50 71 52 25 45 45 45 45 45 32 55 35 56 42 35 65 55 45 61 70 55 50 50 81 55 63 70 70 55 45 80 60 60 55
  13. Ranking: Scoring variation
    75 45 45 70 35 80 75 48 45 50 71 25 45 45 45 45 45 32 35 42 35 45 50 50 81 70 70 45 80 60 55 50 52 53 55 65 55 52 55 56 65 55 61 70 55 55 63 55 60 60 55
  14. Ranking: Scoring variation formula
    Coefficient of Variation = Standard Deviation / Mean
  15. Ranking: Elo ranking
    1025 1010 1005 1000 1020 1002 1003 1030 1025 1005 1015 1005 1021 1002 1005 1006 1015 1005 1011 1020 1005 1000 1031 1005 1013 1020 1020 1005 1030 1010 1010 1005
  16. Ranking: Round 1 (groups labeled Defenders and Relegations)
    1025 1010 1005 1000 1020 1003 1030 1025 1005 1015 1005 1021 1002 1005 1006 1015 1005 1011 1020 1005 1000 1031 1005 1013 1020 1020 1005 1030 1010 1010 1005
  17. Ranking: Round 1 - Updated (groups labeled Defenders and Relegations)
    1071 1004 1046 1021 1024 1030 991 1016 991 949 1051 1025 963 989 977 1057 992 965 1057 991
    1025 1020 1030 1025 1021 1015 1020 1031 1020 1020 1030
  18. Ranking: Round 1 - Elimination
    1071 1004 1046 1021 1024 1030 991 1016 991 949 1051 1025 963 989 977 1057 992 965 1057 991
    1025 1020 1030 1025 1021 1015 1020 1031 1020 1020 1030
  19. Ranking: Round 1 - Rebalance
    1081 1071 1004 1046 1076 1021 1086 1081 1024 1030 1077 1016 1071 1051 1025 1076 1087 1057 1076 1076 1086 1057
  20. Ranking: Round 2 - Elimination
    1094 1108 971 1005 1100 989 1124 1155 989 991 1116 980 1083 1014 988 1097 1156 1047 1089 1100 1155 1078
  21. Ranking: The strongest

    Rank  Name    FFA score (win/lose)  Elo score  Initial score
    1     Plan A  33 (11/0)             1094       75
    2     Plan B  27 (8/3)              1155       75
    3     Plan C  25 (7/4)              1156       81
    4     Plan D  25 (7/4)              1116       71
    5     Plan F  25 (7/4)              1089       70
    6     Plan G  23 (6/5)              1100       70
    7     Plan A  23 (6/5)              1097       70
    8     Plan B  19 (4/7)              1124       80
    9     Plan C  19 (4/7)              1100       70
    10    Plan D  17 (3/8)              1083       65
    11    Plan F  15 (2/9)              1155       80
    12    Plan G  13 (1/10)             1108       60
  22. Ranking: Use case
    AI co-scientist: a virtual scientific collaborator to help scientists generate novel hypotheses and research proposals
  23. Benchmark
    Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.

    Module          Model                      Tests pass/fail (total)  Precision (%)  Tokens  Avg. duration (s)
    battle          claude-3-5-haiku-20241022  10/0 (10)                100            12723   3.78
    battle          deepseek-chat              1/9 (10)                 10             3933    6.92
    battle          gemini-2.0-flash-lite      9/1 (10)                 90             13674   8.06
    battle          gpt-4o-mini                9/1 (10)                 90             8343    4.69
    data_extractor  claude-3-5-haiku-20241022  23/0 (23)                100            29718   3.69
    data_extractor  deepseek-chat              23/0 (23)                100            17618   9.97
    data_extractor  gemini-2.0-flash-lite      20/3 (23)                86             16930   3.16
    data_extractor  gpt-4o-mini                23/0 (23)                100            13244   1.83
    ranking         claude-3-5-haiku-20241022  2/0 (2)                  100            272503  44.33
    ranking         deepseek-chat              1/1 (2)                  50             313242  1872.90
    ranking         gemini-2.0-flash-lite      2/0 (2)                  100            133440  202.95
    ranking         gpt-4o-mini                0/2 (2)                  0              436665  929.49
    scoring         claude-3-5-haiku-20241022  9/4 (13)                 69             18492   7.73
    scoring         deepseek-chat              9/4 (13)                 69             10584   9.38
    scoring         gemini-2.0-flash-lite      8/5 (13)                 61             11009   4.89
    scoring         gpt-4o-mini                8/5 (13)                 61             9959    2.38
  24. Future: We do not talk about
    - Image OCR
    - guardrails
    - sensitive data protection
    - RAG
    - finetune out of the box
    - memory between calls
    - MCP endpoint
    - auto rate limit out
  25. LLMs are not intelligent; they are just tools. They mimic existing knowledge, while we build the future.
    https://roriz.dev https://github.com/Roriz/active_genie https://www.linkedin.com/in/radames-roriz/
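
Item 14 of the transcript names the coefficient of variation (standard deviation over mean) as the measure of scoring variation. A minimal Python sketch of that ratio follows. The function name, the sample inputs, and the choice of the population standard deviation are assumptions for illustration; the slides do not say which variant is used or how the result feeds into the ranking.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return standard deviation divided by mean for a list of scores.

    Assumption: the population standard deviation (pstdev) is used;
    the slides do not say whether the sample variant is intended.
    """
    avg = mean(scores)
    if avg == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(scores) / avg


# Hypothetical inputs: identical scores have no variation,
# widely spread scores have a larger ratio.
tight = [50, 50, 50, 50]
spread = [25, 45, 65, 85]

print(coefficient_of_variation(tight))
print(round(coefficient_of_variation(spread), 3))
```

Because the ratio is unitless, it stays comparable if the scoring scale changes, which may be why it is preferred here over the raw standard deviation.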
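
Items 15 to 20 show Elo ratings changing across rounds. The slides do not give the update rule, so the sketch below uses the textbook Elo formula with a K-factor of 32. Both the K-factor and the function names are assumptions, and the output is not meant to reproduce the numbers on the slides.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(winner, loser, k=32):
    """Return (new_winner_rating, new_loser_rating) after one match.

    Assumption: k=32 is a common default, not a value taken from the talk.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two entries with equal ratings: the winner takes half of K.
print(update_elo(1000, 1000))

# A slightly higher-rated winner gains a bit less than half of K.
new_winner, new_loser = update_elo(1025, 1020)
print(round(new_winner, 2), round(new_loser, 2))
```

The update is zero-sum: whatever the winner gains, the loser gives up, so the total rating in the pool stays constant within a round.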
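
The FFA scores in item 21 are all consistent with 3 points per win plus 1 point per loss, for example 33 (11/0) and 27 (8/3). That rule is inferred from the table and is not stated in the slides. Reading FFA as free-for-all is also an assumption: 12 entries each playing the other 11 once matches the win/lose totals. A short sketch under those assumptions:

```python
from itertools import combinations

# Assumption: point values inferred from the table, not stated in the talk.
WIN_POINTS = 3
LOSS_POINTS = 1


def ffa_score(wins, losses):
    """Score for one entry after a round-robin of pairwise battles."""
    return WIN_POINTS * wins + LOSS_POINTS * losses


def round_robin_pairs(players):
    """Every unordered pair of players, each meeting exactly once."""
    return list(combinations(players, 2))


# Win/lose records copied from the table in item 21.
records = [(11, 0), (8, 3), (7, 4), (6, 5), (4, 7), (3, 8), (2, 9), (1, 10)]
print([ffa_score(w, l) for w, l in records])

# Twelve entries produce 66 pairwise battles.
print(len(round_robin_pairs(range(12))))
```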
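
The Precision column in the benchmark tables (items 8, 9, and 23) matches passed tests divided by total tests, as a percentage with the fraction dropped: 20/3 (23) is reported as 86, not 87. The truncation is inferred from the reported figures. The sketch below is a hypothetical helper and is not code from active_genie.

```python
def precision_percent(passed, failed):
    """Percentage of passing tests, truncated to a whole number.

    Assumption: truncation (not rounding) is inferred from the tables,
    where 20 of 23 passing tests is reported as 86.
    """
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return 100 * passed // total


# Pass/fail pairs copied from the benchmark tables.
for passed, failed in [(10, 0), (1, 9), (20, 3), (9, 4), (8, 5)]:
    print(f"{passed}/{failed} -> {precision_percent(passed, failed)}")
```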