Slide 3
Project
Solution
The first step in building this project was to find and label GitHub
source code repositories that apply each of the mapped
architecture patterns.
GitHub provides a search mechanism for finding candidate repositories
by tag. This is the most time-consuming task; online LLMs can help
make the search faster.
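As a rough illustration of the tag-based search, the sketch below builds a query URL for GitHub's repository search endpoint using the `topic:` and `language:` qualifiers. The topic name `layered-architecture`, the helper name `build_search_url`, and the sort settings are assumptions chosen for illustration; they are not the project's actual search terms.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic, language="java", per_page=30):
    """Build a repository-search URL filtered by topic and language.

    `topic` is a hypothetical tag such as "layered-architecture".
    """
    query = f"topic:{topic} language:{language}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The topic below is an assumed example, not one taken from the project.
    print(build_search_url("layered-architecture"))
```

The resulting URL can be fetched with any HTTP client. The candidates it returns still need manual review before a repository is labeled with an architecture style.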
The dataset generated from the labeled repositories follows the structure
below. It identifies the repository, the architecture style used, and a list of
key files with their contents, which serve as the input for generating the
embeddings.
[
  {
    "repo": "spring-petclinic",
    "architecture": "layered",
    "files": [
      {
        "file_path": "src/main/java/org/springframework/OwnerController.java",
        "content": "package org.springframework...\npublic class OwnerController {...}"
      },
      ...
    ]
  }
]
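A minimal sketch of how one entry with this structure could be assembled from a locally cloned repository. It assumes that every `.java` file counts as a key file and caps the number of files. The function name `build_record` and the `max_files` limit are hypothetical; the slide does not say how key files were actually chosen.

```python
import json
from pathlib import Path


def build_record(repo_dir, architecture, max_files=50):
    """Return one dataset entry for a locally cloned repository.

    Assumption: every .java file is treated as a "key file".
    """
    root = Path(repo_dir)
    files = []
    for path in sorted(root.rglob("*.java"))[:max_files]:
        files.append({
            # Path relative to the repository root, with forward slashes.
            "file_path": path.relative_to(root).as_posix(),
            "content": path.read_text(encoding="utf-8", errors="replace"),
        })
    return {"repo": root.name, "architecture": architecture, "files": files}


def dump_dataset(records, out_path):
    """Write the list of entries as a JSON array, matching the structure above."""
    Path(out_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
```

For example, `dump_dataset([build_record("spring-petclinic", "layered")], "dataset.json")` would produce a file with one labeled entry, provided the repository has been cloned into the current directory.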
Some pre-processing was applied to normalize the data. Assuming that
most enterprise Java projects follow a class naming convention built on a
few keywords (model, controller, service, and repository), we grouped
code contents into these naming-convention categories.
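The grouping step can be sketched as a keyword match on each file path. This is one possible reading of the pre-processing, not the project's exact rule. The case-insensitive match on the whole path, the first-match-wins order, and the fallback bucket `other` are all assumptions.

```python
from collections import defaultdict

# Keywords taken from the naming convention described above.
CATEGORIES = ("controller", "service", "repository", "model")


def categorize(file_path):
    """Return the first category keyword found in the path, else 'other'."""
    lowered = file_path.lower()
    for keyword in CATEGORIES:
        if keyword in lowered:
            return keyword
    return "other"  # assumed fallback for files that match no keyword


def group_by_category(files):
    """Group file contents by category for one dataset entry's "files" list."""
    groups = defaultdict(list)
    for entry in files:
        groups[categorize(entry["file_path"])].append(entry["content"])
    return dict(groups)
```

Matching on the whole path also picks up package names such as `model/Owner.java`, at the cost of occasional false positives when a keyword appears in an unrelated directory name.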
DATASET SELECTION