is a 10-digit number that classifies every traded product, locally and globally • Determines tariffs: 0% vs. 8% can mean millions in costs for imported goods • Required by law: every import and export needs the correct classification • 11,000+ possible codes in Korea's system
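The code hierarchy and the cost of a 0% vs. 8% mistake can be made concrete with a small sketch. It assumes the usual HS layout (2-digit chapter, 4-digit heading, 6-digit internationally shared subheading, extended to 10 digits in Korea) and a simple ad valorem duty (customs value × rate, ignoring VAT and other charges). The function names, the sample code, and the 100,000,000 KRW shipment value are illustrative only, not a customs ruling.

```python
# Sketch: split a 10-digit Korean HS code into hierarchy levels and
# estimate the duty gap between two ad valorem tariff rates.

def split_hs_code(code: str) -> dict[str, str]:
    # Keep only digits so inputs like "8471.30-0000" are accepted.
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {code!r}")
    return {
        "chapter": digits[:2],     # 2-digit chapter
        "heading": digits[:4],     # 4-digit heading
        "subheading": digits[:6],  # 6-digit subheading shared internationally
        "national": digits,        # full 10-digit national code
    }


def duty_gap(customs_value: float, rate_a: float, rate_b: float) -> float:
    # Assumption: duty = customs value x ad valorem rate; VAT and other
    # charges are ignored.
    return customs_value * abs(rate_a - rate_b)


parts = split_hs_code("8471.30-0000")   # illustrative code, not a ruling
gap = duty_gap(100_000_000, 0.08, 0.0)  # 8% vs 0% on a 100M KRW shipment
```

On that assumed shipment the 8-percentage-point gap alone is 8,000,000 KRW, which is how a single misclassification turns into the "millions in costs" the slide mentions.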
on the legal framework • Robustness for new products and inventions: • Product case databases cannot keep up with new products released on the market • The UNIPASS product case dataset covers only about 30% of all possible HS codes • No need to fine-tune the LLM on the domain or on product cases, saving cost and time • Can explain its decision-making, unlike a black-box deep learning approach
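The coverage argument can be expressed as a simple set computation: the share of the code space that has at least one precedent case, and the codes a case-lookup approach cannot answer from precedent. This is a minimal sketch; the function name is hypothetical, and the synthetic code sets are sized to mirror the slide's figures (11,000+ codes, about 30% coverage), not real UNIPASS data.

```python
def case_coverage(all_codes: set[str], case_codes: set[str]) -> tuple[float, set[str]]:
    # Codes with at least one precedent case in the dataset.
    covered = all_codes & case_codes
    # Codes a pure case-lookup approach has no precedent for.
    uncovered = all_codes - case_codes
    ratio = len(covered) / len(all_codes) if all_codes else 0.0
    return ratio, uncovered


# Synthetic stand-ins sized like the slide's figures, not real UNIPASS data.
all_codes = {f"{i:010d}" for i in range(11_000)}
case_codes = {f"{i:010d}" for i in range(3_300)}
ratio, uncovered = case_coverage(all_codes, case_codes)
# ratio -> 0.3, len(uncovered) -> 7700
```

Reasoning over the legal framework instead of precedents is what lets the system still answer for the roughly 70% of codes in `uncovered`.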
Legal document: • 2,000+ page law document • ~3 million tokens of legal text • Uses retrieval tools: • Retrieval-Augmented Generation (RAG) tool over a vector store for semantic search • SQLite database for keyword search over the legal framework
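The slide's figures also explain why the law text is reached through retrieval tools rather than placed in the prompt. The arithmetic below uses the 3 million tokens and 2,000 pages from the slide; the 128,000-token context window is an assumption for illustration, not a property of the model the project used.

```python
import math


def context_budget(total_tokens: int, pages: int, context_window: int) -> tuple[float, int]:
    # Average density of the legal text.
    tokens_per_page = total_tokens / pages
    # How many full context windows the whole document would fill.
    windows_needed = math.ceil(total_tokens / context_window)
    return tokens_per_page, windows_needed


# 3M tokens and 2,000 pages come from the slide; the 128k window is assumed.
per_page, windows = context_budget(3_000_000, 2_000, 128_000)
# per_page -> 1500.0, windows -> 24
```

At about 1,500 tokens per page, even an assumed 128k-token window holds only around 85 pages, so each agent step has to retrieve the few relevant passages instead of reading the whole law.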
vector store: FAISS (Facebook AI Similarity Search) • Korean text PDFs converted to structured Markdown with https://github.com/datalab-to/marker • tried about 5 options • https://github.com/microsoft/markitdown also produced a good Markdown structure • LangChain for chunking based on Markdown header content • SQLite with the FTS5 extension for full-text search
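The header-based chunking and FTS5 keyword search can be sketched with the standard library alone. The splitter is a simplified stand-in for LangChain's header-based splitter (not its code), the table and column names (`law`, `section`, `body`) are hypothetical, and the sample text is a toy English excerpt. It assumes the Python build's SQLite includes FTS5, which most builds do.

```python
import re
import sqlite3

HEADER = re.compile(r"^(#{1,6})\s+(.*)")


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown at '#' headers into (header path, body) chunks.

    Simplified stand-in for header-based splitting; not LangChain's code.
    """
    chunks: list[tuple[str, str]] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((" > ".join(path), text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headers at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


def build_index(chunks: list[tuple[str, str]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Requires an SQLite build with the FTS5 extension.
    conn.execute("CREATE VIRTUAL TABLE law USING fts5(section, body)")
    conn.executemany("INSERT INTO law (section, body) VALUES (?, ?)", chunks)
    return conn


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    # FTS5 exposes a hidden 'rank' column (BM25 by default) for ordering.
    sql = "SELECT section, body FROM law WHERE law MATCH ? ORDER BY rank LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


doc = """# Chapter 84
Nuclear reactors, boilers, machinery.
## Heading 84.71
Automatic data processing machines and units thereof.
# Chapter 85
Electrical machinery and equipment.
"""
conn = build_index(split_by_headers(doc))
hits = keyword_search(conn, "automatic")
```

Keeping the header path with each chunk preserves where a passage sits in the law's hierarchy, which fixed-size chunks lose. For Korean text, the default `unicode61` tokenizer matches whole space-separated tokens, so a word with an attached particle will not match its bare form; the `trigram` tokenizer available since SQLite 3.34 may be worth evaluating for substring matching.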
multiple evaluations • Chunking and vector stores are not the right tools for a hierarchical document structure • Limited number of open-source LLMs that are fast, capable, and good at Korean • Transparency and monitoring: • Hard to debug • If the HSense Manager makes a wrong decision, all members work hard in the wrong direction until the end • Need a system to backtrack decisions and see the alternatives
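One way to make a manager's choices reversible is to log every decision together with the options it rejected. The sketch below is hypothetical (it is not part of HSense or Agno): the class names, step names, and the example codes and rationales are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    step: str                # hypothetical step name, e.g. "chapter"
    chosen: str              # option the manager picked
    alternatives: list[str]  # options considered but rejected
    rationale: str           # short justification, e.g. a cited legal note


@dataclass
class DecisionLog:
    entries: list[Decision] = field(default_factory=list)

    def record(self, step: str, chosen: str, alternatives: list[str], rationale: str) -> None:
        self.entries.append(Decision(step, chosen, list(alternatives), rationale))

    def backtrack(self, step: str) -> Decision:
        """Discard `step` and every later decision; return the discarded record
        so the caller can retry with one of its alternatives."""
        for i, decision in enumerate(self.entries):
            if decision.step == step:
                del self.entries[i:]
                return decision
        raise KeyError(step)


log = DecisionLog()
log.record("chapter", "84", ["85", "90"], "machine, not an electrical apparatus")
log.record("heading", "8471", ["8473"], "complete unit, not a part")
retry = log.backtrack("chapter")
```

Because later decisions depend on earlier ones, backtracking drops everything after the reverted step rather than only that step.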
slower due to LLM processing time • Practical solutions: • Overnight batch processing: run multiple agent teams on the products of 100 catalogs • Quality-first approach: accuracy matters more than real-time speed • Market context: • Existing slow systems prove their value and are in demand: • AI research and report writers (Perplexity, Gemini Deep Research): 20+ minutes per report • OpenAI reasoning pro models: PhD-level problem solving takes time • Companies in specific industries have a limited set of product categories • The domain and search space can be narrowed down
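An overnight batch can run several agent teams concurrently while capping how many run at once. This is a minimal `asyncio` sketch under stated assumptions: `classify_catalog` is a hypothetical stand-in for one team run (a real one would call the LLM team), and the cap of 4 parallel teams is arbitrary.

```python
import asyncio

stats = {"active": 0, "peak": 0}


async def classify_catalog(catalog_id: int) -> str:
    # Hypothetical stand-in for one agent-team run over one catalog.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(0.01)  # placeholder for slow LLM calls
        return f"catalog-{catalog_id}: done"
    finally:
        stats["active"] -= 1


async def run_batch(catalog_ids, max_parallel_teams: int = 4) -> list:
    # Cap concurrent teams, e.g. to respect API rate limits or GPU memory.
    limit = asyncio.Semaphore(max_parallel_teams)

    async def worker(catalog_id: int):
        async with limit:
            return await classify_catalog(catalog_id)

    # return_exceptions=True: one failed catalog does not abort the whole night.
    return await asyncio.gather(*(worker(c) for c in catalog_ids), return_exceptions=True)


results = asyncio.run(run_batch(range(100)))
```

`asyncio.gather` returns results in input order, so each entry lines up with its catalog; failed catalogs appear as exception objects that can be retried the next night.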
object initialization • Add memory to keep sessions and conversations • Can be used with SQLite or PostgreSQL for persistence • Set response_model to a pydantic schema for structured output • Logging options that show detailed tool calls and intermediate steps • Multi-modal support for images out of the box if the model is multi-modal
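The point of a structured-output schema is that malformed model output is rejected before it reaches the next agent. The slide uses a pydantic schema as `response_model`; to stay dependency-free, this sketch swaps in a standard-library dataclass with the same kind of checks. The field names (`hs_code`, `confidence`, `reasoning`) and the sample response are hypothetical.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HSClassification:
    hs_code: str       # expected: 10-digit national code
    confidence: float  # expected: 0.0 to 1.0
    reasoning: str     # explanation citing the legal basis

    def __post_init__(self) -> None:
        # Reject malformed model output instead of passing it downstream.
        if not re.fullmatch(r"\d{10}", self.hs_code):
            raise ValueError(f"hs_code must be 10 digits, got {self.hs_code!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_response(raw: str) -> HSClassification:
    # Unknown or missing keys raise TypeError from the dataclass constructor.
    return HSClassification(**json.loads(raw))


result = parse_response(
    '{"hs_code": "8471300000", "confidence": 0.9, "reasoning": "portable computer"}'
)
```

With pydantic, the same constraints would be declared as field validators on a model class, and the framework would handle parsing the model's JSON.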
• coordinate • route • The Team itself also acts as an agent with its own prompt • Team coordination, decision steps, and iterations are all hidden behind the Agno framework
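Since Agno hides the coordination logic, a framework-free sketch can show what "route" and "coordinate" mean conceptually. Plain functions stand in for the member agents and for the leader's own LLM calls (`pick`, `synthesize`); all names are hypothetical, and this is not Agno's implementation or API. As a simplification, coordinate mode here asks every member, whereas a real leader would delegate selectively.

```python
from typing import Callable

Agent = Callable[[str], str]  # a member agent: task in, answer out


def route(task: str, members: dict[str, Agent], pick: Callable[[str], str]) -> str:
    # Route: the leader picks ONE member and returns that member's answer.
    return members[pick(task)](task)


def coordinate(
    task: str,
    members: dict[str, Agent],
    synthesize: Callable[[str, dict[str, str]], str],
) -> str:
    # Coordinate: the leader delegates to members, then writes its own answer
    # from their outputs. Simplification: every member is asked here.
    outputs = {name: agent(task) for name, agent in members.items()}
    return synthesize(task, outputs)


# Hypothetical members; real ones would be LLM agents with their own prompts.
members: dict[str, Agent] = {
    "legal": lambda task: f"legal notes for {task}",
    "cases": lambda task: f"precedents for {task}",
}
routed = route("laptop", members, pick=lambda task: "legal")
coordinated = coordinate(
    "laptop", members, synthesize=lambda task, outs: " | ".join(outs.values())
)
```

In both modes the leader's choices (`pick`, `synthesize`) are exactly the decisions that are hard to inspect when they happen inside the framework.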