AI has already changed software development

AI has already changed software development by active_genie https://roriz.dev
TROPICAL ON RAILS 2025 Radamés Roriz, Senior Software Engineer @ KnowBe4

2 Rada
Passionate about solving real problems and building products that make a positive impact on people's lives. Obsessed with AI: talking about it, blogging about it, and now building open source.
Radamés Roriz, Senior Software Engineer @ KnowBe4
https://roriz.dev
https://github.com/roriz
https://www.linkedin.com/in/radames-roriz/

3 Rada vision 👀
- To scale, GenAI needs to solve the smallest possible problem
- The best place for GenAI is in engineers' hands
- Using GenAI for extraction is underrated

01 Market

5 Cursor AI
Valuation: $2.6 billion
Founded: end of 2022
Core know-how: fine-tuned model
https://www.nytimes.com/2023/03/23/technology/chatbot-characterai-chatgpt-valuation.html

6 Character.AI
Valuation: $1 billion
Founded: November 2021
Core know-how: RAG
https://www.nytimes.com/2023/03/23/technology/chatbot-characterai-chatgpt-valuation.html

7 Perplexity AI
Valuation: $9 billion
Founded: August 2022
Core know-how: prompting
https://www.cnbc.com/2024/11/05/perplexity-ai-nears-500-million-funding-round-at-9-billion-valuation.html

8 Common traits
01 No proprietary AI models
02 Heavily dependent on LLMs
03 All have large engineering teams

Last update: March 19, 2023
02 How to scale AI

10 How to scale an LLM feature?
http://platform.openai.com/docs/guides/optimizing-llm-accuracy

11 How the real world looks

12 Expect to be a cop
1 Tons of rules
2 Clear and straightforward
3 Auto-fixed rules
4 Clear measure

13 18 rules
1 Some tasks are best specified as a sequence of steps
2 The targeted output length can be specified as a word count
3 Sometimes we get better results when we explicitly instruct the model to reason
4 In some cases, providing examples may be easier
18 relative rules
18+7 relative rules

14 I.K.O prompting
Intention (I), Knowhow (K), Output (O)

15 I.K.O + ActiveGenie
[Diagram: I, K, and O blocks with ActiveGenie]

04 ActiveGenie

ActiveGenie is the lodash for Generative AI
https://github.com/Roriz/active_genie

18 Battle

19 Battle Intention
[Diagram: I.K.O blocks]

20 Battle Output
[Diagram: I.K.O blocks]

21 Battle Products by use

22 Battle Products by use
- gpt-4o-mini: Glamour Noir Dress
- gemini-2.0-flash-lite: Glamour Noir Dress
- deepseek-chat: Glamour Noir Dress
- claude-3-5-haiku: Glamour Noir Dress

23 Battle By difficulty in gamification

24 Battle By difficulty
- gpt-4o-mini: race condition
- gemini-2.0-flash-lite: race condition
- deepseek-chat: race condition
- claude-3-5-haiku: race condition

25 Battle Benchmark

26 Battle Benchmark
Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.
Model | Tests: passed/failed (total) | Precision (%) | Tokens | Avg. duration (s)
claude-3-5-haiku-20241022 | 10/0 (10) | 100 | 12723 | 3.78
deepseek-chat | 1/9 (10) | 10 | 3933 | 6.92
gemini-2.0-flash-lite | 9/1 (10) | 90 | 13674 | 8.06
gpt-4o-mini | 9/1 (10) | 90 | 8343 | 4.69

27 Scoring

28 Scoring Intention
[Diagram: I.K.O blocks]

29 Scoring Output
[Diagram: I.K.O blocks]

30 Scoring Content Quality

31 Scoring Content Quality
- gpt-4o-mini: 92
- gemini-2.0-flash-lite: 91
- deepseek-chat: 94
- claude-3-5-haiku: 89

32 Scoring Prioritization

33 Scoring Prioritization
- gpt-4o-mini: 88
- gemini-2.0-flash-lite: 86
- deepseek-chat: 89
- claude-3-5-haiku: 86

34 Scoring Benchmark

35 Scoring Benchmark
Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.
Model | Tests: passed/failed (total) | Precision (%) | Tokens | Avg. duration (s)
claude-3-5-haiku-20241022 | 9/4 (13) | 69 | 18492 | 7.73
deepseek-chat | 9/4 (13) | 69 | 10584 | 9.38
gemini-2.0-flash-lite | 8/5 (13) | 61 | 11009 | 4.89
gpt-4o-mini | 8/5 (13) | 61 | 9959 | 2.38

04 AI + Engineer = Ranking

37 How to scale battles
How many players | How many battles
10 | 45
100 | 4950
1000 | 499500
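
The battle counts on this slide match a full round-robin, where every player faces every other player exactly once, which gives n(n-1)/2 battles. A minimal Python sketch of that arithmetic follows; the function name `battles_needed` is hypothetical and not part of ActiveGenie.

```python
from math import comb


def battles_needed(players: int) -> int:
    """Number of one-on-one battles if every player meets every other once."""
    # Choosing 2 players out of n is n * (n - 1) / 2.
    return comb(players, 2)


if __name__ == "__main__":
    for players in (10, 100, 1000):
        print(players, battles_needed(players))
```

The quadratic growth is why comparing everything against everything stops being practical as the number of players rises.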

38 Ranking

39 Ranking

40 Ranking
Who is the strongest character in the MCU? (without restrictions)

41 Ranking Scoring
[Figure: grid of players with their scores]

Ranking Sorted
[Figure: grid of players with their scores, sorted]

Ranking scoring variation
[Figure: grid of players with their scores]

44 Ranking scoring variation formula
$$\text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Mean}}$$
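
The coefficient of variation named on this slide is the standard deviation divided by the mean. A minimal Python sketch is below. It assumes the population standard deviation (`statistics.pstdev`); the talk does not say whether the population or the sample version is used, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores: list[float]) -> float:
    """Standard deviation divided by mean (population std is an assumption)."""
    average = mean(scores)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(scores) / average


if __name__ == "__main__":
    # Identical scores have no spread; scores far apart have a larger ratio.
    print(coefficient_of_variation([50, 50, 50]))
    print(coefficient_of_variation([40, 60]))
```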

Ranking Elo ranking
[Figure: grid of players with Elo ratings, ranging from 1000 to 1031]

Ranking Round 1
[Figure: Elo rating grid split into Defenders and Relegations]

47 Ranking Elo ranking formula
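
The formula on this slide did not survive in the transcript. For reference, here is a minimal Python sketch of the textbook Elo update. The K-factor of 32 and the function names are assumptions, and the exact variant used in the talk may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(winner: float, loser: float, k: float = 32) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one battle.

    k=32 is an assumed K-factor, not a value taken from the talk.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    # Points gained by the winner are exactly the points lost by the loser.
    return winner + delta, loser - delta


if __name__ == "__main__":
    print(update_elo(1000, 1000))
    print(update_elo(1030, 1002))
```

An upset (a low-rated winner) moves both ratings more than an expected result does, which is what lets a few battles reorder a crowded field.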

Ranking Round 1 - Updated
[Figure: updated Elo ratings for Defenders and Relegations]

Ranking Round 1 - Elimination
[Figure: Elo rating grid after elimination]

Ranking Round 1 - Rebalance
[Figure: Elo rating grid after rebalance]

Ranking Round 2 - Elimination
[Figure: Elo rating grid after the second elimination]

52 Ranking Free For All

53 Ranking The strongest
Rank | Name | FFA score (win/lose) | Elo score | Initial score
1 | Plan A | 33 (11/0) | 1094 | 75
2 | Plan B | 27 (8/3) | 1155 | 75
3 | Plan C | 25 (7/4) | 1156 | 81
4 | Plan D | 25 (7/4) | 1116 | 71
5 | Plan F | 25 (7/4) | 1089 | 70
6 | Plan G | 23 (6/5) | 1100 | 70
7 | Plan A | 23 (6/5) | 1097 | 70
8 | Plan B | 19 (4/7) | 1124 | 80
9 | Plan C | 19 (4/7) | 1100 | 70
10 | Plan D | 17 (3/8) | 1083 | 65
11 | Plan F | 15 (2/9) | 1155 | 80
12 | Plan G | 13 (1/10) | 1108 | 60
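
Every FFA score on this slide is consistent with 3 points per win plus 1 point per loss. That rule is inferred from the numbers and is not stated in the talk. The Python sketch below checks the inference against each distinct win/lose pair shown; `ffa_score` is a hypothetical name.

```python
def ffa_score(wins: int, losses: int, win_points: int = 3, loss_points: int = 1) -> int:
    """Free-for-all score; the 3/1 weighting is inferred from the slide's numbers."""
    return wins * win_points + losses * loss_points


# (wins, losses, score shown on the slide)
SLIDE_ROWS = [
    (11, 0, 33), (8, 3, 27), (7, 4, 25), (6, 5, 23),
    (4, 7, 19), (3, 8, 17), (2, 9, 15), (1, 10, 13),
]

if __name__ == "__main__":
    for wins, losses, shown in SLIDE_ROWS:
        print(wins, losses, ffa_score(wins, losses), shown)
```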

54 Ranking Use Case
AI co-scientist: a virtual scientific collaborator that helps scientists generate novel hypotheses and research proposals

05 ActiveGenie Future

56 Data Extraction

57 Data Extraction Intention
[Diagram: I.K.O blocks]

58 Data Extraction Output
[Diagram: I.K.O blocks]

59 Data Extraction Extract from unstructured

60 Data Extraction Extract from unstructured
- gpt-4o-mini: marketing
- gemini-2.0-flash-lite: marketing
- deepseek-chat: marketing
- claude-3-5-haiku: marketing

61 Benchmark
Long-term stability: the benchmark ensures that every change and every new feature is measured, creating a clear path to discover whether it has a negative or positive impact.
Module | Model | Tests: passed/failed (total) | Precision (%) | Tokens | Avg. duration (s)
battle | claude-3-5-haiku-20241022 | 10/0 (10) | 100 | 12723 | 3.78
battle | deepseek-chat | 1/9 (10) | 10 | 3933 | 6.92
battle | gemini-2.0-flash-lite | 9/1 (10) | 90 | 13674 | 8.06
battle | gpt-4o-mini | 9/1 (10) | 90 | 8343 | 4.69
data_extractor | claude-3-5-haiku-20241022 | 23/0 (23) | 100 | 29718 | 3.69
data_extractor | deepseek-chat | 23/0 (23) | 100 | 17618 | 9.97
data_extractor | gemini-2.0-flash-lite | 20/3 (23) | 86 | 16930 | 3.16
data_extractor | gpt-4o-mini | 23/0 (23) | 100 | 13244 | 1.83
ranking | claude-3-5-haiku-20241022 | 2/0 (2) | 100 | 272503 | 44.33
ranking | deepseek-chat | 1/1 (2) | 50 | 313242 | 1872.90
ranking | gemini-2.0-flash-lite | 2/0 (2) | 100 | 133440 | 202.95
ranking | gpt-4o-mini | 0/2 (2) | 0 | 436665 | 929.49
scoring | claude-3-5-haiku-20241022 | 9/4 (13) | 69 | 18492 | 7.73
scoring | deepseek-chat | 9/4 (13) | 69 | 10584 | 9.38
scoring | gemini-2.0-flash-lite | 8/5 (13) | 61 | 11009 | 4.89
scoring | gpt-4o-mini | 8/5 (13) | 61 | 9959 | 2.38
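
The Precision column in the benchmark figures above appears to be the share of passed tests as a whole-number percentage, rounded down (20 passed out of 23 is shown as 86, not 87). The Python sketch below reproduces that reading; the truncation rule is an inference from the numbers and `precision_percent` is a hypothetical name.

```python
def precision_percent(passed: int, failed: int) -> int:
    """Passed tests as a truncated whole-number percentage (truncation is inferred)."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    # Integer division drops the fractional part instead of rounding.
    return passed * 100 // total


if __name__ == "__main__":
    for passed, failed in [(10, 0), (1, 9), (9, 4), (8, 5), (20, 3), (1, 1), (0, 2)]:
        print(f"{passed}/{failed}", precision_percent(passed, failed))
```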

62 Future
What we did not talk about:
- Image OCR
- guardrails
- sensitive data protection
- RAG
- fine-tune out of the box
- memory between calls
- MCP endpoint
- auto rate limit out

LLMs are not intelligent; they are just tools. They mimic existing knowledge, while we build the future.
https://roriz.dev
https://github.com/Roriz/active_genie
https://www.linkedin.com/in/radames-roriz/
