Data doesn’t grow in tables
Dealing with large sets of documents
Slide 2
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs.”
– An investigative reporter
Slide 3
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
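
To make the last term concrete, here is a minimal sketch of a regular-expression keyword search of the kind the reporter describes. It assumes the documents are already available as plain text (for scanned pages, that would mean after OCR). The function name `find_keywords`, the sample text, and the keywords are hypothetical and chosen only for illustration.

```python
import re


def find_keywords(text, keywords):
    """Return {keyword: [line numbers]} for case-insensitive whole-word matches."""
    hits = {}
    for kw in keywords:
        # re.escape keeps characters such as "." or "+" in a keyword literal;
        # \b restricts the match to whole words.
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)
        ]
        if lines:
            hits[kw] = lines
    return hits


# Hypothetical sample text standing in for one extracted document.
sample = "Payment sent to Acme Corp.\nNo match here.\nACME invoice attached."
print(find_keywords(sample, ["acme", "invoice", "bribe"]))
# → {'acme': [1, 3], 'invoice': [3]}
```

Recording line numbers per keyword is one simple way to start tagging documents: a document with any hit could be labelled with the matching keyword.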