Slide 1

Slide 1 text

TESTABLE and REUSABLE DATA PROCESSING
FLORIAN ECKERSTORFER
https://florian.ec

Slide 2

Slide 2 text

I DEVELOPED a LIBRARY to PROCESS DATA

Slide 3

Slide 3 text

Plum
A data processing pipeline for PHP.³

Slide 4

Slide 4 text

“Data processing is the collection and manipulation of items of data to produce meaningful information.”
–CARL FRENCH¹

Slide 5

Slide 5 text

INFORMATION is DATA with MEANING

Slide 6

Slide 6 text

MEANING DEPENDS on CONTEXT

Slide 7

Slide 7 text

CURRENT DATE and TIME

CONTEXT                DATE
Newspaper              Tuesday 10 February 2015 21.42 GMT
MySQL                  2015-02-10 21:42:00
PHP date() function    1423600920
ISO 8601               2015-02-10T21:42:00Z
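As a minimal sketch of the contexts listed above, the same instant can be rendered in each representation with PHP's built-in `DateTimeImmutable`. The sketch assumes the wall-clock time 21:42 is UTC. Under that assumption the Unix timestamp comes out as 1423604520; the slide's 1423600920 corresponds to 21:42 in a UTC+1 zone.

```php
<?php
// Assumption: the 21:42 wall-clock time is interpreted as UTC.
$instant = new DateTimeImmutable('2015-02-10 21:42:00', new DateTimeZone('UTC'));

// One instant, four context-specific representations.
$representations = [
    'Newspaper' => $instant->format('l j F Y H.i') . ' GMT',
    'MySQL'     => $instant->format('Y-m-d H:i:s'),
    'Unix'      => $instant->getTimestamp(),
    'ISO 8601'  => $instant->format('Y-m-d\TH:i:s\Z'),
];

foreach ($representations as $context => $value) {
    echo str_pad($context, 10), $value, PHP_EOL;
}
```

The data is identical in every row; only the context decides which representation carries meaning.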

Slide 8

Slide 8 text

DATA PROCESSING INFORMATION

Slide 9

Slide 9 text

PROCESSING DATA
FILTER
CONVERSION
MAPPING
NORMALIZATION
GROUPING
SORTING
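The operations named above can be sketched with plain PHP array functions. The sample records and field names are made up for illustration and do not come from the talk.

```php
<?php
// Hypothetical sample records, made up for illustration.
$records = [
    ['name' => ' florian ', 'age' => '28', 'city' => 'Vienna'],
    ['name' => 'Anna',      'age' => '17', 'city' => 'Graz'],
    ['name' => 'Max',       'age' => '35', 'city' => 'Vienna'],
];

// Normalization and conversion: tidy names, cast age strings to integers.
$records = array_map(function (array $r) {
    $r['name'] = ucfirst(trim($r['name']));
    $r['age']  = (int) $r['age'];
    return $r;
}, $records);

// Filter: keep adults only (array_values re-indexes the result).
$adults = array_values(array_filter($records, function (array $r) {
    return $r['age'] >= 18;
}));

// Sorting: oldest first.
usort($adults, function (array $a, array $b) {
    return $b['age'] <=> $a['age'];
});

// Grouping and mapping: city => list of names.
$byCity = [];
foreach ($adults as $r) {
    $byCity[$r['city']][] = $r['name'];
}

print_r($byCity);
```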

Slide 10

Slide 10 text

The whole point of this:

Slide 11

Slide 11 text

Data processing is basically all a programmer ever does.

Slide 12

Slide 12 text

IMPORTING and EXPORTING DATA

Slide 13

Slide 13 text

INPUT → PROCESSING → OUTPUT
INPUT: CSV, XML, JSON, Database,…
OUTPUT: CSV, XML, JSON, Database,…

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

ddeboer / data-import² READER WORKFLOW WRITER FILTER CONVERTER List of records WRITER

Slide 16

Slide 16 text

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer;
use Ddeboer\DataImport\Filter\CallbackFilter;
use Ddeboer\DataImport\ItemConverter\CallbackConverter;

$reader = new Reader\...;
$workflow = new Workflow($reader, $logger);
$result = $workflow
    ->addWriter(new Writer\...())
    ->addFilter(new CallbackFilter(...))
    ->addItemConverter(new CallbackConverter(...))
    ->setSkipItemOnFailure(true)
    ->process()
;

Slide 17

Slide 17 text

READER

class ArrayReader extends \ArrayIterator implements CountableReaderInterface
{
}

Slide 18

Slide 18 text

RECORD

[
    'name'     => 'Florian',
    'age'      => 28,
    'birthday' => DateTime object(…)
]

Slide 19

Slide 19 text

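FILTER

use Ddeboer\DataImport\Filter\ValidatorFilter;
// Assumed import: Assert aliases the Symfony Validator constraints namespace.
use Symfony\Component\Validator\Constraints as Assert;

$filter = new ValidatorFilter($validator);
$filter->add('email', new Assert\Email());
$filter->add('sku', new Assert\NotBlank());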

Slide 20

Slide 20 text

CUSTOM FILTER

interface FilterInterface
{
    public function filter(array $item);
    public function getPriority();
}
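A custom filter could look roughly like the following sketch. The interface is copied locally from the slide so the example runs without the data-import package. `MinimumAgeFilter`, the `age` key, and the meaning of the return values (true keeps the item) are assumptions made for illustration.

```php
<?php
// Local copy of the interface shown on the slide, so the sketch runs
// without the data-import package installed.
interface FilterInterface
{
    public function filter(array $item);
    public function getPriority();
}

// Hypothetical filter: keeps items whose 'age' is at least $minimumAge.
class MinimumAgeFilter implements FilterInterface
{
    private $minimumAge;

    public function __construct($minimumAge)
    {
        $this->minimumAge = $minimumAge;
    }

    // Assumption: returning true keeps the item, false drops it.
    public function filter(array $item)
    {
        return isset($item['age']) && $item['age'] >= $this->minimumAge;
    }

    // Assumption: the priority decides the order among several filters.
    public function getPriority()
    {
        return 0;
    }
}

$filter = new MinimumAgeFilter(18);
var_dump($filter->filter(['name' => 'Florian', 'age' => 28]));
var_dump($filter->filter(['name' => 'Anna', 'age' => 17]));
```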

Slide 21

Slide 21 text

CONVERT

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter('d/m/Y H:i:s', 'd-M-Y');
$workflow->addValueConverter('birthday', $converter);

Slide 22

Slide 22 text

CUSTOM CONVERTER

interface ItemConverterInterface
{
    public function convert($input);
}

interface ValueConverterInterface
{
    public function convert($input);
}
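As a sketch of a custom value converter: the interface is redeclared locally from the slide, and `NormalizeEmailConverter` is a hypothetical class invented for this example.

```php
<?php
// Local copy of the interface from the slide (stand-in for the library's).
interface ValueConverterInterface
{
    public function convert($input);
}

// Hypothetical converter: normalizes an e-mail address string.
class NormalizeEmailConverter implements ValueConverterInterface
{
    public function convert($input)
    {
        return strtolower(trim($input));
    }
}

$converter = new NormalizeEmailConverter();
echo $converter->convert('  Florian@Example.COM '), PHP_EOL;
```

A value converter only sees a single field, while an item converter receives the whole record. That is why the two interfaces can share the same method signature.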

Slide 23

Slide 23 text

WRITER

interface WriterInterface
{
    public function prepare();
    public function writeItem(array $item);
    public function finish();
}
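The three methods suggest a simple life cycle, sketched below with an in-memory writer. `ArrayWriter` here is a hypothetical class written for this example. The call order (one `prepare()`, one `writeItem()` per record, one `finish()`) is an assumption about how a workflow would drive it.

```php
<?php
// Local copy of the interface from the slide.
interface WriterInterface
{
    public function prepare();
    public function writeItem(array $item);
    public function finish();
}

// Hypothetical writer that buffers items and publishes them on finish().
class ArrayWriter implements WriterInterface
{
    private $buffer = [];
    private $items = [];

    public function prepare()
    {
        $this->buffer = []; // start with a clean buffer
    }

    public function writeItem(array $item)
    {
        $this->buffer[] = $item;
    }

    public function finish()
    {
        $this->items = $this->buffer; // make the result visible
    }

    public function getItems()
    {
        return $this->items;
    }
}

// Assumed call order: prepare once, write each item, finish once.
$writer = new ArrayWriter();
$writer->prepare();
foreach ([['name' => 'Florian'], ['name' => 'Anna']] as $item) {
    $writer->writeItem($item);
}
$writer->finish();
```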

Slide 24

Slide 24 text

SMALL COMPONENTS
Reader
Writer
Filter
Item Converter
Value Converter

Slide 25

Slide 25 text

EACH COMPONENT
Easily testable
Reusable with other workflows
Reusable without workflow
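A small sketch of what "easily testable" and "reusable without workflow" can mean in practice. `NotBlankFilter` is a hypothetical class invented for this example; no workflow class is involved.

```php
<?php
// Hypothetical stand-alone filter; it depends on nothing but its input.
class NotBlankFilter
{
    private $key;

    public function __construct($key)
    {
        $this->key = $key;
    }

    public function filter(array $item)
    {
        return isset($item[$this->key])
            && trim((string) $item[$this->key]) !== '';
    }
}

$filter = new NotBlankFilter('sku');

// Testable: plain assertions on single items, no fixtures required.
assert($filter->filter(['sku' => 'A-1']) === true);
assert($filter->filter(['sku' => '  ']) === false);

// Reusable without a workflow: plug the method into array_filter().
$items = [['sku' => 'A-1'], ['sku' => ''], ['sku' => 'B-2']];
$valid = array_values(array_filter($items, [$filter, 'filter']));
```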

Slide 26

Slide 26 text

So what's the problem? Why did you create your own data processing library?

Slide 27

Slide 27 text

DATA TYPE of RECORD

data-import    Plum
Array          Array
               String
               Object
               Integer

Slide 28

Slide 28 text

$files = [
    './file1.md',
    './file2.md',
    './file3.md'
];

class ReadFileConverter implements ConverterInterface
{
    public function convert($item)
    {
        return ['file' => $item, 'content' => file_get_contents($item)];
    }
}

// Records after conversion:
$files = [
    ['file' => './file1.md', 'content' => 'Hello *World*!'],
    ['file' => './file2.md', 'content' => '**Foobar**'],
    ['file' => './file3.md', 'content' => '![](img1.png)']
];
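The converter above turns a string record into an array record. The following sketch runs it against a temporary file so it does not depend on `./file1.md` existing. The locally declared `ConverterInterface` is a stand-in for the interface named on the slide.

```php
<?php
// Local stand-in for the converter interface named on the slide.
interface ConverterInterface
{
    public function convert($item);
}

class ReadFileConverter implements ConverterInterface
{
    public function convert($item)
    {
        return ['file' => $item, 'content' => file_get_contents($item)];
    }
}

// Create a throwaway Markdown file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'plum');
file_put_contents($path, 'Hello *World*!');

$converter = new ReadFileConverter();
$record = $converter->convert($path); // string in, array out

unlink($path); // clean up the temporary file
```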

Slide 29

Slide 29 text

class FinderReader implements ReaderInterface
{
    private $finder;

    public function __construct(Symfony\Component\Finder\Finder $finder)
    {
        $this->finder = $finder;
    }

    public function getIterator()
    {
        return $this->finder->getIterator();
    }

    public function count()
    {
        return $this->finder->count();
    }
}

Slide 30

Slide 30 text

ORDER of PROCESSING
data-import: 1. Filter  2. Item Converter  3. Value Converter  4. Filter  5. Writer
Plum

Slide 31

Slide 31 text

interface FilterInterface
{
    public function filter($item);
}

interface WriterInterface
{
    public function writeItem($item);
}

interface ConverterInterface
{
    public function convert($item);
}

Slide 32

Slide 32 text

interface FilterInterface extends PipelineInterface
{
    public function filter($item);
}

interface WriterInterface extends PipelineInterface
{
    public function writeItem($item);
}

interface ConverterInterface extends PipelineInterface
{
    public function convert($item);
}

Slide 33

Slide 33 text

$workflow = new Workflow();
$workflow->addFilter($filter)
    ->addConverter($dateConverter)
    ->addWriter($csvWriter)
    ->addConverter($textEncodingConverter)
    ->addWriter($xmlWriter);
$workflow->process($reader);
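To illustrate why the order of the calls matters, here is a hypothetical miniature pipeline. It is not Plum's code: it uses plain callables instead of the interfaces, and `MiniWorkflow` is a name invented for this sketch. Each step runs in the order it was added, so the first writer sees the data before the converter runs and the second writer sees it afterwards.

```php
<?php
// Not Plum's implementation: a hypothetical miniature pipeline.
class MiniWorkflow
{
    private $steps = [];

    public function addFilter(callable $filter)
    {
        $this->steps[] = ['filter', $filter];
        return $this;
    }

    public function addConverter(callable $converter)
    {
        $this->steps[] = ['convert', $converter];
        return $this;
    }

    public function addWriter(callable $writer)
    {
        $this->steps[] = ['write', $writer];
        return $this;
    }

    public function process($reader)
    {
        foreach ($reader as $item) {
            foreach ($this->steps as [$type, $step]) {
                if ($type === 'filter') {
                    if (!$step($item)) {
                        continue 2; // drop the item, skip remaining steps
                    }
                } elseif ($type === 'convert') {
                    $item = $step($item);
                } else {
                    $step($item);
                }
            }
        }
    }
}

$raw = [];
$upper = [];

$workflow = new MiniWorkflow();
$workflow
    ->addFilter(function ($item) { return $item !== ''; })
    ->addWriter(function ($item) use (&$raw) { $raw[] = $item; })
    ->addConverter('strtoupper')
    ->addWriter(function ($item) use (&$upper) { $upper[] = $item; });

$workflow->process(['plum', '', 'php']);
```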

Slide 34

Slide 34 text

ORDER of PROCESSING
data-import: 1. Filter  2. Item Converter  3. Value Converter  4. Filter  5. Writer
Plum: Filter, Converter, Writer

Slide 35

Slide 35 text

ORDER of PROCESSING
data-import: 1. Filter  2. Item Converter  3. Value Converter  4. Filter  5. Writer
Plum: Filter, Converter, Filter, Writer

Slide 36

Slide 36 text

ORDER of PROCESSING
data-import: 1. Filter  2. Item Converter  3. Value Converter  4. Filter  5. Writer
Plum: Filter, Writer, Filter, Writer

Slide 37

Slide 37 text

ORDER of PROCESSING
data-import: 1. Filter  2. Item Converter  3. Value Converter  4. Filter  5. Writer
Plum: Converter, Writer, Filter, Converter, Writer

Slide 38

Slide 38 text

Plum A data processing pipeline for PHP.

Slide 39

Slide 39 text

+ Arbitrary data type of record
+ Arbitrary order of processing
+ Works with Dependency Injection
− No value converters

Slide 40

Slide 40 text

WORKFLOW CONCATENATION
Reuse workflows
Useful with Dependency Injection

$concatenator = new WorkflowConcatenator();
$workflow1->addWriter($concatenator);
$workflow1->process($reader);
$workflow2->process($concatenator);
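The snippet above implies that the concatenator acts as a writer for the first workflow and as a reader for the second. A minimal sketch of that dual role follows. `MiniConcatenator` is a hypothetical class and not Plum's implementation; plain loops stand in for the two workflows.

```php
<?php
// Hypothetical sketch: an object that can be written to like a writer
// and iterated like a reader.
class MiniConcatenator implements IteratorAggregate
{
    private $items = [];

    public function writeItem($item)
    {
        $this->items[] = $item;
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->items);
    }
}

$concatenator = new MiniConcatenator();

// Stand-in for $workflow1: trims each item, writes to the concatenator.
foreach ([' plum ', ' php '] as $item) {
    $concatenator->writeItem(trim($item));
}

// Stand-in for $workflow2: reads from the concatenator, converts again.
$result = [];
foreach ($concatenator as $item) {
    $result[] = strtoupper($item);
}
```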

Slide 41

Slide 41 text

CURRENT STATUS Workflow + Interfaces = Design pattern implemented

Slide 42

Slide 42 text

IN PROGRESS
Modular system
plumphp / plum
plumphp / plum-json
plumphp / plum-csv
plumphp / plum-date

Slide 43

Slide 43 text

ROADMAP
Workflow Splitter
Value Converters
Logging

Slide 44

Slide 44 text

WORKFLOW SPLITTER
INPUT: [['x'=>42],['x'=>69]]
SPLITTER: x>=50 / x<50
OUTPUT 1: [['x'=>42]]
OUTPUT 2: [['x'=>69]]
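Since the splitter is a roadmap item, the following is only a sketch of the idea. `splitItems()` is a hypothetical helper, not part of Plum, that routes each item to one of two outputs depending on a predicate.

```php
<?php
// Hypothetical helper: returns [items matching the predicate, the rest].
function splitItems(iterable $input, callable $predicate): array
{
    $matching = [];
    $rest = [];
    foreach ($input as $item) {
        if ($predicate($item)) {
            $matching[] = $item;
        } else {
            $rest[] = $item;
        }
    }
    return [$matching, $rest];
}

$input = [['x' => 42], ['x' => 69]];

// Items with x >= 50 go to one output, all others to the second.
[$high, $low] = splitItems($input, function (array $item) {
    return $item['x'] >= 50;
});
```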

Slide 45

Slide 45 text

VALUE CONVERTER

$converter = new UppercaseConverter('[name]');
$converter->convert(['name' => 'florian']);

Symfony Property Access Component

$converter = new UppercaseConverter('name');
$converter = new UppercaseConverter('children[0].name');
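A rough sketch of the first call above, under a strong simplification: the planned design refers to the Symfony Property Access Component, whereas this stand-in only understands a single array key written as `[key]`. The class body and the extra `age` field are assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the planned converter. It handles only a
// single "[key]" path, not object properties or nested paths.
class UppercaseConverter
{
    private $key;

    public function __construct($path)
    {
        $this->key = trim($path, '[]'); // "[name]" becomes "name"
    }

    public function convert($item)
    {
        if (isset($item[$this->key])) {
            $item[$this->key] = strtoupper($item[$this->key]);
        }
        return $item; // other fields are left untouched
    }
}

$converter = new UppercaseConverter('[name]');
$result = $converter->convert(['name' => 'florian', 'age' => 28]);
```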

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

You can contribute on GitHub
Feedback
Bugs
Pull Requests

Slide 48

Slide 48 text

DO YOU HAVE ANY QUESTIONS? or FEEDBACK?

Slide 49

Slide 49 text

[1] French, Carl (1996). Data Processing and Information Technology (10th ed.).
[2] David de Boer. data-import. https://github.com/ddeboer/data-import
[3] Florian Eckerstorfer. plum. https://github.com/plumphp/plum