benchmark/parse.rb
# 284km/rcsv
Calculating -------------------------------------
unquoted 166.007 (±12.7%) i/s - 810.000 in 5.007349s
quoted 146.088 (±24.6%) i/s - 656.000 in 5.009174s
include col_sep 131.046 (±28.2%) i/s - 580.000 in 5.001424s
include row_sep 138.830 (±18.7%) i/s - 666.000 in 5.054874s
encode utf-8 100.167 (±26.0%) i/s - 448.000 in 5.576945s
encode sjis 137.429 (±18.2%) i/s - 660.000 in 5.028713s
=========================================================
# ruby/csv
Calculating -------------------------------------
unquoted 37.546 (±21.3%) i/s - 177.000 in 5.066859s
quoted 16.773 (±23.8%) i/s - 78.000 in 5.026788s
include col_sep 8.316 (±24.0%) i/s - 39.000 in 5.113550s
include row_sep 1.842 (±54.3%) i/s - 9.000 in 5.422059s
encode utf-8 26.126 (±15.3%) i/s - 126.000 in 5.055306s
encode sjis 29.573 (±16.9%) i/s - 142.000 in 5.028898s
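The listing above only shows the output of benchmark/parse.rb, not the script itself. As a rough sketch of how an iterations-per-second (i/s) figure like these can be measured, the following uses only the Ruby standard library. The input data, the `iterations_per_second` helper, and the 0.5 second window are all assumptions for illustration; they are not taken from the actual benchmark script.

```ruby
require "csv"

# Hypothetical inputs: 1,000 identical rows each (not the talk's real data).
UNQUOTED = ("aaa,bbb,ccc\r\n" * 1_000).freeze
QUOTED   = ("\"aaa\",\"bbb\",\"ccc\"\r\n" * 1_000).freeze

# Hypothetical helper: runs the block repeatedly for about `seconds`
# and returns how many iterations completed per second.
def iterations_per_second(seconds = 0.5)
  clock = Process::CLOCK_MONOTONIC
  started = Process.clock_gettime(clock)
  count = 0
  loop do
    yield
    count += 1
    elapsed = Process.clock_gettime(clock) - started
    return count / elapsed if elapsed >= seconds
  end
end

results = {
  "unquoted" => iterations_per_second { CSV.parse(UNQUOTED) },
  "quoted"   => iterations_per_second { CSV.parse(QUOTED) },
}

results.each do |label, ips|
  puts format("%-10s %10.3f i/s", label, ips)
end
```

A short window like this gives noisy numbers; a longer window with a warm-up phase would be needed for figures comparable to the ones shown.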
Slide 3
Slide 3 text
Yesterday's benchmark.
There are still various problems;
development is ongoing.
Slide 4
Slide 4 text
Yesterday's benchmark.
There are still various problems;
development is ongoing.
About 3 times faster
Slide 5
Slide 5 text
284km/rcsv
forked from arp/rcsv
It uses the Ruby binding of libcsv
with FFI.
I made the interface as close to ruby/csv
as possible.
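One way to see how close a parser's interface and behavior are to ruby/csv is to run the same inputs through both and compare the results. The sketch below does that with the standard library only. The `incompatible_inputs` helper and the naive split-based stand-in parser are hypothetical and are not part of rcsv; a real candidate would be plugged in where `naive` is.

```ruby
require "csv"

# Hypothetical helper: returns the inputs on which `candidate`
# produces a different result than ruby/csv's CSV.parse.
def incompatible_inputs(candidate, inputs)
  inputs.reject { |input| candidate.call(input) == CSV.parse(input) }
end

inputs = [
  "aaa,bbb,ccc\r\nzzz,yyy,xxx\r\n",      # unquoted fields
  "\"aaa\",\"b\"\"bb\",\"ccc\"\r\n",     # escaped double quote
  "\"aaa\",\"b\r\nbb\",\"ccc\"\r\n",     # line break inside a field
]

# Stand-in candidate: a naive splitter that ignores quoting entirely.
naive = ->(text) { text.split("\r\n").map { |line| line.split(",") } }

failures = incompatible_inputs(naive, inputs)
puts "#{failures.size} of #{inputs.size} inputs differ from ruby/csv"
```

The naive splitter agrees with ruby/csv only on the unquoted input, which is the kind of gap such a comparison is meant to expose.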
Slide 6
Slide 6 text
# Motivation
# Concern
Slide 7
Slide 7 text
# Motivation
- CSV is often used
- Sometimes I handle large CSV files
Slide 8
Slide 8 text
# Concern
- oj (A fast JSON parser and Object
marshaller as a Ruby gem.)
- Demand (are there few effective use cases?)
- Will the performance gain be too small
for the cost?
=> Hmm, let's do it.
Slide 9
Slide 9 text
# CSV
RFC 4180
Slide 10
Slide 10 text
# CSV
1. Each record is located on a
separate line, delimited by a line
break (CRLF).
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
Slide 11
Slide 11 text
# CSV
2. The last record in the file may
or may not have an ending line
break.
aaa,bbb,ccc CRLF
zzz,yyy,xxx
Slide 12
Slide 12 text
# CSV
3. There may be an optional header line
appearing as the first line of the file.
This header should contain the same
number of fields as the records.
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
Slide 13
Slide 13 text
# CSV
4. … Each line should contain the same
number of fields. Spaces are
considered part of a field and should not
be ignored. The last field in the record
must not be followed by a comma.
aaa,bbb,ccc
Slide 14
Slide 14 text
# CSV
5. Each field may or may not be enclosed
in double quotes. If fields are not
enclosed with double quotes, then
double quotes may not appear inside the
fields.
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
Slide 15
Slide 15 text
# CSV
6. Fields containing line breaks
(CRLF), double quotes, and commas
should be enclosed in double-quotes.
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
Slide 16
Slide 16 text
# CSV
7. If double-quotes are used to
enclose fields, then a double-quote
appearing inside a field must be
escaped by preceding it with
another double quote.
"aaa","b""bb","ccc"
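The seven rules above can be tried out with ruby/csv itself. The following is a minimal sketch using the standard `CSV.parse` and `CSV.parse_line` methods; the sample strings follow the slides, except that the header names `field_a`, `field_b`, `field_c` are made up here so the columns can be told apart.

```ruby
require "csv"

# Rules 1 and 2: records end with CRLF; the final line break is optional.
plain_rows = CSV.parse("aaa,bbb,ccc\r\nzzz,yyy,xxx")
# => [["aaa", "bbb", "ccc"], ["zzz", "yyy", "xxx"]]

# Rule 3: an optional header line (header names are hypothetical).
table = CSV.parse("field_a,field_b,field_c\r\naaa,bbb,ccc\r\n", headers: true)
first_record = table.first.to_h
# => {"field_a"=>"aaa", "field_b"=>"bbb", "field_c"=>"ccc"}

# Rule 4: spaces are part of the field and are not stripped.
spaced = CSV.parse_line("aaa, bbb ,ccc")
# => ["aaa", " bbb ", "ccc"]

# Rules 5 to 7: quoted fields may contain CRLF and doubled double quotes.
quoted_rows = CSV.parse("\"aaa\",\"b\r\nbb\",\"c\"\"cc\"\r\nzzz,yyy,xxx")
# => [["aaa", "b\r\nbb", "c\"cc"], ["zzz", "yyy", "xxx"]]
```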
# ruby/csv
- 2007-12-24 23:41 jeg2 o * lib/csv.rb, test/csv/test_csv.rb: Removed in preparation for
The code disappears entirely; this appears to be preparation for bringing in FasterCSV.
- 2007-12-25 02:46 jeg2 o * lib/csv.rb: Import the FasterCSV source as the new CSV class.
FasterCSV comes in.
Slide 23
Slide 23 text
# ruby/csv
2008-09-21 00:39 jeg2 o * lib/csv/csv.rb: Reworked CSV's parser and generator to be m17
A lot changed substantially; m17n support and similar work appears to have been done in this period.
In 2009 and 2010, many of the commits are related to encoding.
2012-11-14 02:53 zzak o * lib/csv.rb (init_comments): Document private method #init_comm
2012-09-19 22:07 zzak o * lib/csv.rb (Object#CSV, Array#to_csv, String#parse_csv): Exa
One notable point in 2012 is that zzak wrote documentation for CSV.
Slide 24
Slide 24 text
# ruby/csv
2017-04-24 17:38 SHIBATA Hiroshi o Enabled travis
2017-04-24 17:37 SHIBATA Hiroshi o Enabled tests used by test suite of ruby core
2017-04-24 17:25 SHIBATA Hiroshi o Update basically configuration for gemspec
2017-04-24 17:16 SHIBATA Hiroshi o Update BSDL license.
2017-04-24 17:15 SHIBATA Hiroshi o Update repository name
2017-04-24 17:15 SHIBATA Hiroshi o Removed needless skelton files
2017-04-24 15:43 SHIBATA Hiroshi o overrided boilerplate by bundle init cmath
ruby/csv was born thanks to Shibata-san.
Slide 25
Slide 25 text
# ruby/csv
2018 (recent)
Sutou-san became the maintainer.
Code cleanup appears to be making progress.
2018-03-06 09:34 Kenta Murata o Describe our attitude to RuboCop
### NOTE: About RuboCop
We don't use RuboCop because we can manage our coding style by ourselves.
We want to accept small fluctuations in our coding style because we use Ruby.
Please do not submit issues and PRs that aim to introduce RuboCop in this repository.
# rcsv
arp/rcsv
A fast libcsv-based CSV parser for Ruby
It covers fewer features than ruby/csv,
but considerably more than fastest_csv.
Slide 29
Slide 29 text
## How did I start
- If I have a very fast parser, will I win?
- Is there still room for improvement?
- Would fastest-csv become practical if it
had a full set of features?
Slide 31
Slide 31 text
## How did I start
- It is indeed fast
- It is difficult to make it flexible
(for example, supporting optional parts
of the specification).
Slide 32
Slide 32 text
## How did I start
What I thought next:
- Write a part of ruby/csv in C
- A homegrown ("OreOre") CSV implementation
- Can I make good use of libcsv?
Slide 33
Slide 33 text
## How did I start
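- I judged that a libcsv-based implementation
with the ruby/csv interface is a realistic goal
worth aiming for.
- There is rcsv (a libcsv-based CSV parser).
- Match its interface to ruby/csv. In other words,
the goal is to reach a state where it passes
the ruby/csv tests.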