Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

My First Heading

link1 link2

text

/html/body/a[1]/@href

Slide 17

Slide 17 text

/axis::node-test[predicate]/axis::node-test[predicate]/axis::node-test[predicate]
/locationstep/locationstep/locationstep

Slide 18

Slide 18 text

/html/body/a[1]/@href
/child::html/child::body/child::a[1]/attribute::href
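
A minimal sketch of how the location path above could be evaluated with Python's standard-library ElementTree. The sample markup is an assumption reconstructed from the slide text (the href values in particular are illustrative). ElementTree implements only a subset of XPath, so the final attribute step is replaced by a .get() call.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the href values are assumed for illustration.
SAMPLE_PAGE = """<html>
  <body>
    <h1>My First Heading</h1>
    <a href="link1.html">link1</a>
    <a href="link2.html">link2</a>
    <p>text</p>
  </body>
</html>"""


def first_link_href(markup):
    """Emulate /html/body/a[1]/@href on well-formed markup."""
    root = ET.fromstring(markup)           # the root element is <html>
    first_link = root.find("body/a[1]")    # child::body/child::a[1]
    if first_link is None:
        return None
    return first_link.get("href")          # attribute::href


print(first_link_href(SAMPLE_PAGE))
```

Real-world HTML is often not well-formed XML, which is why a tolerant parser such as lxml is the more usual choice for scraping.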

Slide 19

Slide 19 text

▪ example content
▪ example content
▪ //tag-name/@attribute

Slide 20

Slide 20 text

→ contains(str1, str2)
→ starts-with(str1, str2)

/html/body/a[1]/@href
//a[1]/@href
//a[contains(@href, "link1.html")]
//a[starts-with(@href, "link1")]
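
The two string functions can be illustrated without an XPath engine. The sketch below emulates the contains() and starts-with() predicates in plain Python, because the standard-library ElementTree does not support XPath functions. The sample markup and the helper name links_where are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-link page used only for illustration.
SAMPLE_PAGE = ('<html><body>'
               '<a href="link1.html">link1</a>'
               '<a href="link2.html">link2</a>'
               '</body></html>')


def links_where(markup, test):
    """Return the href of every <a> element whose href satisfies test(href)."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.iter("a")
            if a.get("href") is not None and test(a.get("href"))]


# Equivalent in spirit to //a[contains(@href, "link1.html")]/@href
contains_match = links_where(SAMPLE_PAGE, lambda href: "link1.html" in href)

# Equivalent in spirit to //a[starts-with(@href, "link1")]/@href
prefix_match = links_where(SAMPLE_PAGE, lambda href: href.startswith("link1"))

print(contains_match, prefix_match)
```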

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

▪ //*[@id="menu-item-5015"]/a
▪ //ul[@id="menu-primary-items"]/li/a

Slide 23

Slide 23 text

=IMPORTXML(url, xpath_query)

Slide 24

Slide 24 text

=XPathOnUrl(url, xpath, attribute, xmlHTTPSettings, mode)

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

from lxml import html
import requests

# Read one URL per line and append each page's menu link texts to results.txt.
with open("urls.txt", "r") as urls, open("results.txt", "a+") as results_file:
    for item in urls:
        url = item.rstrip("\n")
        page = requests.get(url)
        tree = html.fromstring(page.content)
        # text() returns a list with one string per matching <a> element.
        text = tree.xpath('//ul[@id="menu-primary-items"]/li/a/text()')
        results_file.write("%s,%s\n" % (url, text))
        print("SCRAPING " + url)
        print(text, "\n")
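
The script above writes the list returned by xpath() straight into the file, so each output line contains a Python list literal. One possible alternative, sketched here on the assumption that one CSV row per extracted text is wanted, uses the csv module. The helper name write_rows and the sample values are hypothetical.

```python
import csv
import io


def write_rows(url, texts, out_file):
    """Write one CSV row per extracted text, prefixed by its source URL."""
    writer = csv.writer(out_file, lineterminator="\n")
    for text in texts:
        writer.writerow([url, text.strip()])


# Illustrative values standing in for a scraped URL and its xpath() result.
buffer = io.StringIO()
write_rows("https://example.com/", ["Home", " About us "], buffer)
print(buffer.getvalue())
```

The csv module also quotes values that contain commas, which plain string formatting does not.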

Slide 29

Slide 29 text

//title
//meta[@name="description"]/@content
//link[@hreflang="it-IT"]/@href
//link[@hreflang]/@href
//link[@rel="canonical"]/@href
//meta[@name="robots"]/@content
//h1
//url/loc/text()
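
As a rough illustration of these expressions, the sketch below pulls the same SEO fields out of a small, well-formed sample page with the standard-library ElementTree, which supports the attribute predicates used here. The markup, its values, and the function name extract_seo_fields are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; every value here is made up for illustration.
SAMPLE_HTML = """<html><head>
<title>Example title</title>
<meta name="description" content="Example description"/>
<meta name="robots" content="index,follow"/>
<link rel="canonical" href="https://example.com/"/>
<link rel="alternate" hreflang="it-IT" href="https://example.com/it/"/>
</head><body><h1>Example heading</h1></body></html>"""


def extract_seo_fields(markup):
    """Collect title, meta tags, canonical, hreflang links and h1."""
    root = ET.fromstring(markup)

    def attr(path, name):
        node = root.find(path)
        return None if node is None else node.get(name)

    return {
        "title": root.findtext(".//title"),
        "description": attr(".//meta[@name='description']", "content"),
        "robots": attr(".//meta[@name='robots']", "content"),
        "canonical": attr(".//link[@rel='canonical']", "href"),
        # Every <link> that carries an hreflang attribute.
        "hreflang": {link.get("hreflang"): link.get("href")
                     for link in root.findall(".//link[@hreflang]")},
        "h1": root.findtext(".//h1"),
    }


fields = extract_seo_fields(SAMPLE_HTML)
print(fields)
```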

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXX&start-date=2019-01-01&end-date=2019-03-31&metrics=ga:sessions&filters=ga:country==Italy&access_token=XXXX

https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:XXXX
&start-date=2019-01-01
&end-date=2019-03-31
&metrics=ga:sessions
&filters=ga:country==Italy
&access_token=XXXX

Slide 33

Slide 33 text

https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:XXXX
&start-date=2019-01-01
&end-date=2019-03-31
&metrics=ga:sessions
&filters=ga:country==Italy

Slide 34

Slide 34 text

https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:XXXX
&start-date=2019-01-01
&end-date=2019-03-31
&metrics=ga:sessions
&filters=ga:country==Italy

Slide 35

Slide 35 text

https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:XXXX
&start-date=2019-01-01
&end-date=2019-03-31
&metrics=ga:sessions
&filters=ga:country==Italy

Slide 36

Slide 36 text

https://www.googleapis.com/analytics/v3/data/ga
?ids=ga:XXXX
&start-date=2019-01-01
&end-date=2019-03-31
&metrics=ga:sessions
&filters=ga:country==Italy

Slide 37

Slide 37 text

ga:name operator expression
ga:country == Italy
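
The query URL shown on the preceding slides can be assembled from its parts. This is a minimal sketch: the function name build_ga_url is hypothetical, the access token is left out on purpose, and keeping ':' and '=' unescaped is a choice made only so that the result matches the form shown on the slides.

```python
from urllib.parse import urlencode

GA_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_url(view_id, start_date, end_date, metrics, filters=None):
    """Assemble a Core Reporting API v3 query URL from its parameters."""
    params = {
        "ids": "ga:" + view_id,
        "start-date": start_date,
        "end-date": end_date,
        "metrics": metrics,
    }
    if filters:
        # Filter syntax: ga:name operator expression, e.g. ga:country==Italy
        params["filters"] = filters
    # safe=":=" leaves ':' and '=' readable, as on the slides.
    return GA_ENDPOINT + "?" + urlencode(params, safe=":=")


url = build_ga_url("XXXX", "2019-01-01", "2019-03-31",
                   "ga:sessions", "ga:country==Italy")
print(url)
```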

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

Slide 46

Slide 46 text

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

Slide 47

Slide 47 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

Slide 48

Slide 48 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

Slide 49

Slide 49 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

Slide 50

Slide 50 text

"dimension": string, "operator": string, "expression": string
"dimension": country, "operator": equals, "expression": ITA

Slide 51

Slide 51 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

Slide 52

Slide 52 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query

Slide 53

Slide 53 text

{
  "startDate": "2019-01-01",
  "endDate": "2019-03-31",
  "dimensions": ["query"],
  "dimensionFilterGroups": [
    {
      "filters": [
        {"dimension": "country", "operator": "equals", "expression": "ITA"}
      ]
    }
  ],
  "aggregationType": "auto",
  "rowLimit": 25000,
  "startRow": 0
}

https://www.googleapis.com/webmasters/v3/sites/XXXX/searchAnalytics/query
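
The rowLimit and startRow fields suggest paging through results in blocks. The sketch below shows one way this could work, assuming a hypothetical run_query callable that sends the body to the endpoint and returns the decoded JSON response. Both function names are illustrative and no real API call is made.

```python
def build_query_body(start_date, end_date, country, row_limit=25000, start_row=0):
    """Build the request body shown on the slide as a Python dict."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query"],
        "dimensionFilterGroups": [
            {"filters": [
                {"dimension": "country", "operator": "equals", "expression": country}
            ]}
        ],
        "aggregationType": "auto",
        "rowLimit": row_limit,
        "startRow": start_row,
    }


def fetch_all_rows(run_query, row_limit=25000):
    """Advance startRow by rowLimit until a page comes back short.

    run_query is a hypothetical callable: body dict in, response dict out.
    """
    rows = []
    start_row = 0
    while True:
        body = build_query_body("2019-01-01", "2019-03-31", "ITA",
                                row_limit, start_row)
        page = run_query(body).get("rows", [])
        rows.extend(page)
        if len(page) < row_limit:
            return rows
        start_row += row_limit
```

A short page (fewer rows than rowLimit) is treated as the last one, so a result set that is an exact multiple of rowLimit costs one extra, empty request.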

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

...
rowLimit = 25000

retrieve_search_queries = webmasters_service.searchanalytics().query(
    siteUrl='ENTER-YOURS-HERE',
    body={
        "startDate": "2019-01-01",
        "endDate": "2019-03-31",
        "dimensions": ["query"],
        "dimensionFilterGroups": [
            {
                "filters": [
                    {
                        "dimension": "country",
                        "operator": "equals",
                        "expression": "ITA"
                    }
                ]
            }
        ],
        "aggregationType": "auto",
        "rowLimit": rowLimit
    }
).execute()

# Iterate over the rows actually returned; there may be fewer than rowLimit.
with open("results.txt", "a+") as results_file:
    for row in retrieve_search_queries.get('rows', []):
        keys = row['keys']
        impressions = row['impressions']
        clicks = row['clicks']
        ctr = row['ctr']
        position = row['position']
        line = "%s|%s|%s|%s|%s\n" % (keys, impressions, clicks, ctr, position)
        print(line)
        results_file.write(line)
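
Each row's keys field is a list with one entry per requested dimension, so formatting it with %s writes a list literal such as ['example query']. A small sketch of an alternative formatter follows. The function name format_row and the sample row are made up for illustration.

```python
def format_row(row):
    """Turn one response row into a pipe-delimited line without list brackets."""
    query = ",".join(row["keys"])  # one key per requested dimension
    return "%s|%s|%s|%s|%s" % (
        query, row["impressions"], row["clicks"], row["ctr"], row["position"])


# Hypothetical row shaped like the entries in the 'rows' list.
sample_row = {"keys": ["example query"], "impressions": 120,
              "clicks": 6, "ctr": 0.05, "position": 3.2}
print(format_row(sample_row))
```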

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

https://adwords.google.com/api/adwords/cm/v201809/CampaignService
... ...

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

...
def main(client, item, ad_group_id=None):
    # Initialize appropriate service.
    targeting_idea_service = client.GetService(
        'TargetingIdeaService', version='v201809')

    # Construct selector object and retrieve related keywords.
    selector = {
        'ideaType': 'KEYWORD',
        'requestType': 'STATS'
    }
    selector['requestedAttributeTypes'] = [
        'KEYWORD_TEXT', 'SEARCH_VOLUME']
    offset = 0
    selector['paging'] = {
        'startIndex': str(offset),
        'numberResults': str(PAGE_SIZE)
    }
    selector['searchParameters'] = [{
        'xsi_type': 'RelatedToQuerySearchParameter',
        'queries': item
    }]

    # Language setting (optional).
    selector['searchParameters'].append({
        # The ID can be found in the documentation:
        # https://developers.google.com/adwords/api/docs/appendix/languagecodes
        'xsi_type': 'LanguageSearchParameter',
        'languages': [{'id': '1004'}]
    })

    # Location setting (optional).
    selector['searchParameters'].append({
        # The ID can be found in the documentation:
        # https://developers.google.com/adwords/api/docs/appendix/geotargeting
        'xsi_type': 'LocationSearchParameter',
        'locations': [{'id': '2380'}]
    })

    # Network search parameter (optional)
    selector['searchParameters'].append({
        'xsi_type': 'NetworkSearchParameter',
        'networkSetting': {
            'targetGoogleSearch': True,
            'targetSearchNetwork': False,
            'targetContentNetwork': False,
            'targetPartnerSearchNetwork': False
        }
    })

Slide 83

Slide 83 text

...
        # Display results.
        if 'entries' in page:
            for result in page['entries']:
                attributes = {}
                for attribute in result['data']:
                    attributes[attribute['key']] = getattr(
                        attribute['value'], 'value', '0')
                results_file.write('%s|%s|%s\n' % (
                    item, attributes['KEYWORD_TEXT'], attributes['SEARCH_VOLUME']))
                print('%s|%s|%s' % (
                    item, attributes['KEYWORD_TEXT'], attributes['SEARCH_VOLUME']))
            print()
        else:
            print('No related keywords were found.')
        offset += PAGE_SIZE
        selector['paging']['startIndex'] = str(offset)
        more_pages = offset < int(page['totalNumEntries'])


if __name__ == '__main__':
    # Initialize client object.
    adwords_client = adwords.AdWordsClient.LoadFromStorage("ABSOLUTE-PATH-TO-googleads.yaml")
    adwords_client.SetClientCustomerId('ENTER-YOURS-HERE')
    kwds = open("kwds.txt", "r")
    #reload(sys)
    #sys.setdefaultencoding('utf-8')
    for line in kwds:
        item = line.strip()
        results_file = open("results.txt", "a+")
        main(adwords_client, item, int(AD_GROUP_ID) if AD_GROUP_ID.isdigit() else None)
        print(datetime.datetime.now())
        results_file.close()
        sleep(2)
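
The paging logic in the loop above (advance startIndex by the page size until it reaches totalNumEntries) can be isolated as a generator. This is only a sketch: get_page is a hypothetical callable standing in for the service request, and iter_entries is an illustrative name.

```python
def iter_entries(get_page, page_size):
    """Yield every entry, requesting pages until totalNumEntries is reached.

    get_page(start_index, page_size) is a hypothetical callable returning a
    dict with 'totalNumEntries' and, when results exist, 'entries'.
    """
    offset = 0
    more_pages = True
    while more_pages:
        page = get_page(offset, page_size)
        for entry in page.get("entries", []):
            yield entry
        offset += page_size
        more_pages = offset < int(page["totalNumEntries"])


# Stand-in for the remote service: five fake entries served in pages.
_fake_entries = ["kw%d" % i for i in range(5)]


def fake_get_page(start_index, page_size):
    return {"totalNumEntries": len(_fake_entries),
            "entries": _fake_entries[start_index:start_index + page_size]}


print(list(iter_entries(fake_get_page, 2)))
```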

Slide 84

Slide 84 text

...
    # Construct selector object and retrieve related keywords.
    selector = {
        'ideaType': 'KEYWORD',
        'requestType': 'IDEAS'
    }
    selector['requestedAttributeTypes'] = [
        'KEYWORD_TEXT', 'SEARCH_VOLUME']
    offset = 0
    selector['paging'] = {
        'startIndex': str(offset),
        'numberResults': 10
    }
...

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

No content

Slide 87

Slide 87 text

from lxml import html
import requests

# Read one results-page URL per line and save the result links found on it.
with open("urls.txt", "r") as urls, open("results.txt", "w") as results_file:
    for item in urls:
        url = item.rstrip("\n")
        page = requests.get(url)
        tree = html.fromstring(page.content)
        # @href returns a list with one link per matching result heading.
        text = tree.xpath('//h3[@class="r"]/a/@href')
        results_file.write("%s,%s\n" % (url, text))
        print("SCRAPING " + url)
        print(text, "\n")

Slide 88

Slide 88 text

https://www.google.[com]/search?q=site:[domain]&start=[#page]&...
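
A list of such URLs for urls.txt could be generated rather than typed by hand. The sketch below assumes that start is a result offset, that each page holds ten results, and that pages are numbered from zero. None of this is stated on the slide, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_site_query_url(domain, page, tld="com", results_per_page=10):
    """Build a site: query URL for the given zero-based results page."""
    params = {
        "q": "site:" + domain,
        # Assumption: 'start' is the offset of the first result on the page.
        "start": page * results_per_page,
    }
    return "https://www.google.%s/search?%s" % (tld, urlencode(params, safe=":"))


page_urls = [build_site_query_url("example.com", n) for n in range(3)]
print("\n".join(page_urls))
```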

Slide 89

Slide 89 text

/url?q=http://www.simpleagency.it/&sa=U&ved=0ahUKEwizuOnv1YTiAhU9GLkGHQUZAe8QFggUMAA&usg=AOvVaw2SLUR7xqI7OaMms1_bXQ3h
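
The scraped href wraps the real destination in its q parameter. A minimal sketch of unwrapping it with the standard library follows. The function name extract_target_url is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def extract_target_url(href):
    """Return the value of the 'q' parameter of a /url?q=... redirect link."""
    query = parse_qs(urlsplit(href).query)
    targets = query.get("q")
    return targets[0] if targets else None


wrapped = ("/url?q=http://www.simpleagency.it/&sa=U"
           "&ved=0ahUKEwizuOnv1YTiAhU9GLkGHQUZAe8QFggUMAA"
           "&usg=AOvVaw2SLUR7xqI7OaMms1_bXQ3h")
print(extract_target_url(wrapped))
```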

Slide 90

Slide 90 text

...
#download and store new html file
os.rename('/home/giancampo/diff-html/new_html.html', '/home/giancampo/diff-html/old_html.html')
url = 'YOUR-HOMEPAGE-URL'
response = urllib2.urlopen(url)
webContent = response.read()
f = open('/home/giancampo/diff-html/new_html.html', 'w')
f.write(webContent)
f.close()

#convert html to txt files
html1 = open('/home/giancampo/diff-html/old_html.html').read()
html2 = open('/home/giancampo/diff-html/new_html.html').read()
old_file = html2text.html2text(html1)
new_file = html2text.html2text(html2)

#write text into txt files
old_text = open('/home/giancampo/diff-html/old_text.txt', 'w')
new_text = open('/home/giancampo/diff-html/new_text.txt', 'w')
old_text.write(old_file)
new_text.write(new_file)
old_text.close()
new_text.close()
...

Slide 91

Slide 91 text

...
#send an email if the script has found differences
if filecmp.cmp('/home/giancampo/diff-html/old_text.txt', '/home/giancampo/diff-html/new_text.txt') == True:
    print 'no emails sent'
else:
    gmail_user = 'YOUR-GMAIL-ADDRESS'
    gmail_password = 'YOUR-GMAIL-PASSWORD'
    sent_from = gmail_user
    to = ['gianluca.campo@optimize.it']
    subject = 'Changes in the homepage!'
    body = _diff
    email_text = '''From: %s\nTo: %s\nSubject: %s\n\n%s''' % (sent_from, ', '.join(to), subject, body)
    server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
    server.ehlo()
    server.login(gmail_user, gmail_password)
    server.sendmail(sent_from, to, email_text)
    server.close()
    print 'Email sent!'

#files closing
diff_file.close()
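
The excerpt uses a _diff value as the email body, but the step that produces it is elided. One way such a diff could be computed, sketched with the standard-library difflib and written for Python 3, is shown below. This is an assumption about the elided step, not the author's code, and text_diff is a hypothetical name.

```python
import difflib


def text_diff(old_text, new_text):
    """Return a unified diff of two text versions, or '' if they are equal."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old_text.txt", tofile="new_text.txt", lineterm="")
    return "\n".join(diff_lines)


# Illustrative page texts standing in for the html2text output.
old_version = "Welcome\nOpening hours: 9-17"
new_version = "Welcome\nOpening hours: 9-18"
print(text_diff(old_version, new_version))
```

An empty result means the two versions match, so the same value can serve both as the change check and as the email body.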

Slide 92

Slide 92 text

No content

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

No content

Slide 95

Slide 95 text

No content

Slide 96

Slide 96 text

No content

Slide 97

Slide 97 text

No content

Slide 98

Slide 98 text

No content

Slide 99

Slide 99 text

No content

Slide 100

Slide 100 text

No content

Slide 101

Slide 101 text

No content

Slide 102

Slide 102 text

No content

Slide 103

Slide 103 text

No content

Slide 104

Slide 104 text

No content

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

No content

Slide 107

Slide 107 text

No content

Slide 108

Slide 108 text

No content

Slide 109

Slide 109 text

No content

Slide 110

Slide 110 text

...
if __name__ == '__main__':
    # Initialize client object.
    adwords_client = adwords.AdWordsClient.LoadFromStorage("C:\\Users\\gianl\\AppData\\Local\\Programs\\Python\\Python37\\_i miei script\\adwords-api\\googleads.yaml")
    adwords_client.SetClientCustomerId('ENTER-YOURS-HERE')
    kwds = open("kwds.txt", "r")
    reload(sys)
    sys.setdefaultencoding('utf-8')
    for line in kwds:
        item = line.strip()
        results_file = open("results.txt", "a+")
        main(adwords_client, item, int(AD_GROUP_ID) if AD_GROUP_ID.isdigit() else None)
        print(datetime.datetime.now())
        results_file.close()
        sleep(2)