"In the broadest sense of
the word, the definition of
research includes any
gathering of data,
information and facts for
the advancement of
knowledge."
Slide 4
Slide 4 text
"Research is a process of
steps used to collect and
analyze information to
increase our understanding
of a topic or issue"
Slide 5
Slide 5 text
Data is essential for research
Slide 6
Slide 6 text
Where do we get
data from?
Einstein got his data from his own
experiments and from other
people's experiments
Information exchange took weeks if not months
Slide 7
Slide 7 text
Today we have
the internet!
Information exchange takes milliseconds
Works much better than anything
Einstein had
When our information is centralized
by context, we can more easily find
what we’re looking for
Slide 16
Slide 16 text
We already have websites that
centralize this information
Slide 17
Slide 17 text
And allow us to find data that Google
couldn’t
Slide 18
Slide 18 text
BUT THERE’S
ROOM FOR
IMPROVEMENT
Slide 19
Slide 19 text
How is this data currently
being centralized?
Slide 20
Slide 20 text
Each center sends us their data in
the form of Excel or Access
files, via FTP or email
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
THIS IS
AN
ENTIRELY
MANUAL
PROCESS
Slide 23
Slide 23 text
Is this sustainable?
Slide 24
Slide 24 text
Is this sustainable?
This process needs to be automated
Slide 25
Slide 25 text
What are the advantages
of automating the data
exchange process?
• no human interference
• fewer communication hassles
• fewer human errors
• more accurate data
• more data
Slide 26
Slide 26 text
How do we automate?
Centers no longer have to send us
anything. We get it directly from
their website
Slide 27
Slide 27 text
There’s no secret:
Google, hotel sites, flight search engines,
and many others do this.
It is called web scraping
Slide 28
Slide 28 text
How does it work?
Slide 29
Slide 29 text
We automatically navigate to the
centers' websites and fetch the
information that we need
Slide 30
Slide 30 text
We automatically navigate to the
centers' websites and fetch the
information that we need
This is done by little scripts
called spiders or
web crawlers
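The fetch-and-extract step these slides describe can be sketched in Python. This is a minimal illustration, not the actual prototype: the HTML snippet, the field names (accession, species), and the values are hypothetical stand-ins for a center's data page, and a real spider would first download the page over HTTP (e.g. with urllib.request) instead of using an inline string.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped into rows."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = []        # cells of the row being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Hypothetical page from a center's website; the table layout and
# column names are made up for illustration.
page = """
<table>
  <tr><th>Accession</th><th>Species</th></tr>
  <tr><td>PI 123456</td><td>Zea mays</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)
# → [['Accession', 'Species'], ['PI 123456', 'Zea mays']]
```

Once the rows are structured like this, they can be loaded into a database with no human in the loop, which is the point of the automation.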
Slide 31
Slide 31 text
What? Spiders?
Slide 32
Slide 32 text
“A Web crawler (or
spider) is a computer
program that browses the
World Wide Web in a
methodical, automated
manner or in an orderly
fashion.”
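The quoted definition (methodical, automated browsing) can be sketched as a breadth-first walk over a site's links. Here the site is an in-memory dict mapping each page to its outgoing links, with made-up page names; a real crawler would fetch each URL over HTTP and parse the links out of the HTML.

```python
from collections import deque

def crawl(site, start):
    """Breadth-first crawl: visit each page exactly once, following its links."""
    visited = []
    seen = {start}
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for link in site.get(url, []):
            if link not in seen:       # skip pages already queued or visited
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical center website: page -> outgoing links.
site = {
    "/": ["/passport", "/characterization"],
    "/passport": ["/passport/page2"],
    "/characterization": ["/"],        # back-link; must not loop forever
    "/passport/page2": [],
}

print(crawl(site, "/"))
# → ['/', '/passport', '/characterization', '/passport/page2']
```

The `seen` set is what makes the crawl "orderly": back-links and cross-links never cause a page to be fetched twice.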
Slide 33
Slide 33 text
No content
Slide 34
Slide 34 text
This process allows us to reach more
centers and gather more data
Slide 35
Slide 35 text
The main
requirement:
each center must have a website
that displays their information
Without a website we wouldn’t be
able to automate this exchange
Slide 36
Slide 36 text
Working prototype
http://seeds.iriscouch.com/
Slide 37
Slide 37 text
Working prototype
http://seeds.iriscouch.com/
PASSPORT
DATA
Slide 38
Slide 38 text
Working prototype
http://seeds.iriscouch.com/
PASSPORT
DATA
CHARACTERIZATION
Slide 39
Slide 39 text
Working prototype
http://seeds.iriscouch.com/
PASSPORT
DATA
CHARACTERIZATION
OTHER...
Slide 40
Slide 40 text
RECAP
Slide 41
Slide 41 text
RECAP
Automation of the data exchange process
is the only sustainable solution
Slide 42
Slide 42 text
RECAP
Automation of the data exchange process
is the only sustainable solution
With new technologies, web scraping has
become a very reliable system
Slide 43
Slide 43 text
RECAP
Automation of the data exchange process
is the only sustainable solution
With new technologies, web scraping has
become a very reliable system
The process is modular and will allow us to
plug in systems such as GRIN-Global