Slide 11
main.py - ① Reading the PDF
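import pandas as pd
import tabula  # tabula-py (requires Java)


def check_columns(df, previous_df):
    # True when both tables have exactly the same set of column names
    difference1 = set(df.columns) - set(previous_df.columns)
    difference2 = set(previous_df.columns) - set(df.columns)
    return len(difference1) == 0 and len(difference2) == 0


def get_data(pdf_path):
    previous_df = pd.DataFrame()
    # lattice=True parses tables drawn with ruling lines; pages="all" reads every page
    dfs = tabula.read_pdf(pdf_path, lattice=True, pages="all")
    for df in dfs:  # Combine tables that span multiple pages
        if check_columns(df, previous_df):
            # Same columns: append this page's rows to the running table
            df = pd.concat([previous_df, df], ignore_index=True)
        previous_df = df
    return previous_df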
Reads the PDF and returns a DataFrame object
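The merge loop in `get_data` can be tried without a PDF or Java. The sketch below is a minimal illustration, assuming tabula has already returned one DataFrame per page; the helper name `merge_page_tables` and the sample columns are hypothetical. It keeps the slide's logic: consecutive tables whose column names match are concatenated, and a table with different columns replaces the accumulated result.

```python
import pandas as pd


def merge_page_tables(dfs):
    """Hypothetical helper: the loop from get_data, minus the PDF reading."""
    previous_df = pd.DataFrame()
    for df in dfs:
        # Same column names as the accumulated table: treat it as a continuation page
        if set(df.columns) == set(previous_df.columns):
            df = pd.concat([previous_df, df], ignore_index=True)
        # Otherwise the current table replaces whatever was accumulated so far
        previous_df = df
    return previous_df


# Two pages of one table, as tabula might return them (made-up sample data)
page1 = pd.DataFrame({"name": ["A", "B"], "value": [1, 2]})
page2 = pd.DataFrame({"name": ["C", "D"], "value": [3, 4]})

merged = merge_page_tables([page1, page2])
print(merged)
```

Because a table with different columns replaces the accumulated one, the loop as written returns only the last run of matching pages; a PDF holding several distinct tables would need to collect the runs in a list instead.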
Compares the column names of a table that spans multiple pages to decide whether the pieces belong to the same table (called from get_data)
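Since `check_columns` compares column names as sets, it ignores column order. A minimal check of that behaviour, with made-up column names, is sketched below; the one-line set comparison is an equivalent rewrite of the two set differences, and `Index.equals` is shown only as an order-sensitive alternative in case a stricter test is wanted.

```python
import pandas as pd


def check_columns(df, previous_df):
    # Equivalent to both set differences on the slide being empty
    return set(df.columns) == set(previous_df.columns)


a = pd.DataFrame(columns=["date", "amount"])
b = pd.DataFrame(columns=["amount", "date"])  # same names, different order
c = pd.DataFrame(columns=["date", "total"])

print(check_columns(a, b))          # True: order is ignored
print(check_columns(a, c))          # False: "amount" vs "total"
print(a.columns.equals(b.columns))  # False: Index.equals also compares order
```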