Chapter 2: Introduction to LangChain
    separator="。",
    chunk_size=10,
    chunk_overlap=2,
    length_function=len,
    is_separator_regex=False,
)

texts = text_splitter.split_text(text)
print(texts)
▲ Output
['吾輩は猫である', '名前はまだない']
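CharacterTextSplitter first splits the input text at each separator,
then joins the resulting pieces back together so that no chunk exceeds
the number of characters given by chunk_size. Because Program 2.8 sets
the maximum size of a chunk (the unit into which a document is split)
to 10, "吾輩は猫である" and "名前はまだない" are output as separate
chunks.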
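The split-then-join behavior described above can be sketched in plain Python. This is a minimal illustration, not LangChain's actual implementation: the function name split_and_merge is hypothetical, chunk_overlap is ignored, and the value of text is an assumption inferred from the output shown in this section.

```python
def split_and_merge(text, separator, chunk_size):
    """Split on `separator`, then greedily re-join pieces up to `chunk_size`.

    Hypothetical helper for illustration only; it ignores chunk_overlap.
    """
    # Step 1: split at the separator and drop empty pieces.
    pieces = [p for p in text.split(separator) if p]

    chunks = []
    current = ""
    for piece in pieces:
        # Step 2: try to append the next piece, re-inserting the separator.
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            # The joined string would exceed chunk_size: close the chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Assumed input, inferred from the output in this section.
text = "吾輩は猫である。名前はまだない。"
print(split_and_merge(text, "。", 10))
```

With chunk_size set to 10, the two 7-character sentences cannot be joined, so the sketch returns them as two separate chunks.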
To check how the characters are counted, Program 2.9 varies chunk_size
and observes how the output changes.
▲ Program 2.9: CharacterTextSplitter ②
# Assumes `text` and the CharacterTextSplitter import from Program 2.8.
for i in range(12, 17):
    text_splitter = CharacterTextSplitter(
        separator="。",
        chunk_size=i,
        chunk_overlap=2,
        length_function=len,
        is_separator_regex=False,
    )
    texts = text_splitter.split_text(text)
    print(i, texts)
▲ Output
12 ['吾輩は猫である', '名前はまだない']
13 ['吾輩は猫である', '名前はまだない']
14 ['吾輩は猫である', '名前はまだない']
15 ['吾輩は猫である。名前はまだない']
16 ['吾輩は猫である。名前はまだない']
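One reading consistent with this output is that lengths are measured in characters via len, and that the separator counts toward the length when two pieces are joined. The short check below only does that arithmetic on the two sentences from the output; it does not call LangChain, and the variable names are illustrative.

```python
first = "吾輩は猫である"
second = "名前はまだない"
separator = "。"

# Joining the two pieces re-inserts one separator character between them.
joined = first + separator + second

# Each sentence is 7 characters; the joined string is 7 + 1 + 7 = 15.
print(len(first), len(second), len(joined))
```

Under this reading, the joined string fits in a single chunk only once chunk_size reaches 15, which matches the point at which the output switches from two chunks to one.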