generated data has grown exponentially: "We now produce every other day the same quantity of information humanity has produced until 2003." Eric Schmidt, Google's ex-CEO, @Techonomy 2010
of data INNOVATE: Data can be used as a raw material for innovation, allowing teams and individuals to create change on a large scale.
data to empower individuals and leverage their potential to innovate
• Adaptability: make organizations more responsive by foreseeing change and creating a culture of flexibility
Making complex data more easily understandable and usable. The Billion Dollar Gram, David McCandless, 2009 http://www.informationisbeautiful.net/2009/the-billion-dollar-gram/
• Growing demand for data: journalists, citizens, analysts, ...
• Government-driven initiatives: Open Data, transparency
• Data as a resource: ecosystem of producers and consumers
Infochimps is very thorough on data meta-information: tags, categories, sources and license.
Datamarket provides access to datasets as well as visualizations and tools to explore them.
Datamarket takes it further by providing a tool to explore the data and visualize it in various ways.
ManyEyes puts the emphasis on the visualization of data, but also offers access to the data catalogue.
Visualizations are contributed by the users and created using ManyEyes' visualization tools.
BuzzData is a social directory of people and organizations sharing their datasets. It is loosely modeled after the GitHub social code-sharing site.
The emphasis is put on social and collaborative features: follow, clone and comment.
Data can be viewed as a spreadsheet, but not really explored or directly manipulated (and that's OK!)
[Diagram labels: spreadsheet, social features, tags/categories, source, license, visualization, pivot table, comments, profile page, follow/clone]
Make it easy to find the data (all in one place) and have detailed information about it (source, license, quality).
Make it easy to interact with the data and have a feel of its usefulness.
Allow people to form groups of interest, curate and enhance the data through discussions and sharing.
[Diagram labels: catalogue, datasets, showcase, datavizs, interactive tool, dataset, dataviz]
Offers centralized access to the raw material.
Gives a feel of the data and allows people to evaluate its usefulness.
Allows people to act and contribute content to the platform.
This is a positive feedback loop, a desirable property for a data platform.
is turned into a commodity
• Easy to find (centralized)
• Easy to get (accessible)
• Up-to-date (fresh)
Limiting access to data greatly reduces the opportunities of reaping the rewards of collecting the data.
social innovation
• Groups of interest (self-organization)
• Lightweight collaboration (cross-pollination)
• Showcase best content (emergence)
This is a good recipe for sustainable change.
[Timeline diagram, with axis labels 90s and 00s]
RELATIONAL DATABASES, MICRO-COMPUTERS: Early days of collecting data. Data was simple, table-like, isolated, hard to acquire and slow to process.
CLIENT-SERVER APPLICATION, GRAPHICAL USER INTERFACE: Early days of business intelligence. Data was larger, related, inter-connected. Tools made it easier to process and analyze.
APPLICATIONS, OPEN STANDARDS: Everybody is producing, collecting and processing data. Data is everywhere, available in connected, semi-structured formats. Many tools exist to process and analyze it on a large scale.
80s: A standard hardware and software platform allowed more companies to jump in.
90s: Standard data formats and easier access allowed for a bigger market and new needs.
PROPRIETARY HARDWARE / OPEN HARDWARE: The PC democratized personal computing by using a standard, semi-open hardware platform.
PROPRIETARY FORMATS / OPEN FORMATS: As more and more software was created, there was a need to exchange information between programs (but some companies made it hard for competitors).
PROPRIETARY SOFTWARE / OPEN SOFTWARE: Software is more and more seen as a service rather than a product. Open software allows sharing and better composability of software.
PROPRIETARY DATA / OPEN DATA: Data is a precursor of information, and sharing it multiplies its potential for change and insight.
gives organizations more freedom and independence: Big technology companies fought this idea for a long time, seeing openness as a threat to their business. In 2010, we thankfully have many examples showing that openness is good for both innovation and business!
do organizations make decisions about their data infrastructures? DISCLAIMER: The following is a caricature of what happens in some organizations, and is inspired by real facts.
[Diagram labels: business intelligence product, PDF report, Excel spreadsheets]
Most systems operate as a black box, creating artificial barriers of access to the data.
People will often work outside the system as it is not well integrated within their workflow.
LICENSES, TRAINING: You've spent money on using this specific product. Changing would mean starting all over again.
CONSULTANTS: Lack of technology ownership forces you to outsource customization.
The point here is not that these costs are unnecessary, but that they can be important barriers to change and innovation, and that there are other ways to invest in technology.
Access to data is artificially complex, and most people will give up early. Those willing to share insights won't be able to do so.
ecosystem Benefits: data is used efficiently, and breeds new data, leading to better understanding and insights.
In an economy of knowledge, the society of information. This is especially true for industrialized countries, whose growth is mostly based on services and technology.
Data is the raw material that can be used to get a better understanding and make better decisions. Data is often expensive to collect but free to share and duplicate. It is sadly often left dormant and unused.
Tools are designed to be flexible and to be used to build new things that fit your specific needs.
Products indirectly shape and structure your organization, in a very arbitrary way.
Interoperability: It should be easy to extract, process and inject data, whatever the tool or technology. This reduces the barriers to entry.
Ownership: Data is a key resource, and so is the information system. Open Source allows organizations to customize and keep ownership of the technology that powers their information system.
Trial and error: Technology changes fast. Be prepared for failures, and be able to rebound and iterate quickly. Don't aim for perfection the first time you try.
Everything is a work-in-progress: Start with what you have, whatever the quality, and let things improve organically, following users' demand. A common problem with open data is that public servants are often reluctant to publish data that they don't consider to be of perfect quality.
• Innovation by individuals & small teams: Innovation and change can hardly happen in a big, monolithic organization. Social web applications allow for horizontal collaboration that won't disrupt your organization.
• Creating an ecosystem: You are about to start an ecosystem. It will need attention to grow from something small into a self-sustaining system that creates improvements, sustainable change and innovation. This means starting a community and kickstarting the dynamics using a small team of community managers.
• Centralized data catalogue: Data should be easy to find and easy to share. Be sure to add meta-information such as sources, license and tags. Only limit access when security or privacy concerns are involved.
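As an illustration of the catalogue idea above, here is a minimal sketch of an entry carrying its meta-information, assuming a simple in-memory catalogue. The `DatasetEntry` fields, the `find_by_tag` helper and the sample entries are all hypothetical, chosen for the example rather than taken from any existing platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One dataset in the catalogue, with its meta-information (hypothetical schema)."""
    title: str
    source: str    # who produced or published the data
    license: str   # terms under which the data can be reused
    tags: set[str] = field(default_factory=set)
    restricted: bool = False  # True only when security or privacy is involved


def find_by_tag(catalogue: list[DatasetEntry], tag: str) -> list[DatasetEntry]:
    """Return the openly accessible entries carrying the given tag (case-insensitive)."""
    wanted = tag.lower()
    return [
        entry for entry in catalogue
        if not entry.restricted and wanted in {t.lower() for t in entry.tags}
    ]


# Made-up sample entries, for illustration only.
catalogue = [
    DatasetEntry("Road traffic counts", "Transport department", "CC-BY", {"transport", "roads"}),
    DatasetEntry("Bus schedules", "Transit agency", "CC0", {"Transport"}),
    DatasetEntry("Patient records", "Health agency", "restricted", {"health"}, restricted=True),
]

transport_titles = [entry.title for entry in find_by_tag(catalogue, "transport")]
print(transport_titles)
```

Keeping source and license next to the data itself is what lets people judge whether a dataset can be reused, and the `restricted` flag keeps access limits the exception rather than the default.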
use
• Data exploration and visualization: Provide ways to easily extract, filter and visualize the data (within the platform or by integrating with external tools), so that the data can be integrated within people's workflows. Provide Web APIs for easy mash-up creation.
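The extract and filter steps mentioned above can be sketched with nothing more than Python's standard `csv` module. This is only an illustration under stated assumptions: the sample data is made up, and `extract`, `filter_rows` and `export` are hypothetical helper names, not the API of any platform discussed here.

```python
from __future__ import annotations

import csv
import io

# Made-up sample data standing in for a dataset downloaded from the catalogue.
RAW_CSV = """city,year,population
Springfield,2009,30000
Springfield,2010,31000
Shelbyville,2010,25000
"""


def extract(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows: list[dict[str, str]], **criteria: str) -> list[dict[str, str]]:
    """Keep the rows whose columns match every given criterion."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


def export(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Write the selected columns back to CSV so that other tools can consume them."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows_2010 = filter_rows(extract(RAW_CSV), year="2010")
result = export(rows_2010, ["city", "population"])
print(result)
```

Exporting back to a plain, open format is the design choice that matters: the filtered data can then flow into a spreadsheet, a charting tool or a mash-up without going through the platform again.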
• Social features for collaboration: Let people self-organize in cross-domain groups of interest. Comments on datasets and visualizations will provide incentives for updates and give opportunities for new ideas.
understand: Data is complex and mysterious, and it takes time and experimentation to decipher it. Making it easy and rewarding to experiment with data is the best way to ensure you get benefits from it.
incentives: Comments, discussions and new content are powerful drivers for people to get involved. Sharing and bi-directional communication are key elements of success.
make change happen: Some innovations might need organizational support to be put into practice. Showcase and promote interesting datasets and visualizations to set examples and grow a culture of data, and provide support so that projects become real.
Open Data, Open Source: There are very few reasons why things should be kept behind closed doors, especially for a public service organization. Being open is mutually beneficial to your organization and to its environment.
structural It will shape your organization and define what you can or cannot do. Own your technology and shape it so that it will help shape your organization in a positive way.
of a community Sustaining collaboration through community dynamics implies that you define new roles that will help integrate data as a key element of your organization's culture.
tools: From charting applications to visualization toolkits
• Dynamics of Open Data: From making data accessible to the public to creating a mutually beneficial ecosystem
• New database technologies
You are welcome to contact me directly for more information! [email protected]