acquisition, transformation & interpretation of raw data into meaningful insights for profitable or competitive business purposes.
BUSINESS INTELLIGENCE
about paying too much for ‘not so great’ services. As a data consultant, you need to deliver a performance measure that ranks hotels and supports recommendations to customers. Likes are useful, but they are only a binary signal. Customer ratings are useful, but they are an abstract number on a scale of 1-10.
K-Means Clustering
1. Scrape Lagos hotel reviews from hotels.ng.
2. Use Natural Language Processing to extract review polarity & subjectivity.
3. Create a matrix of review words across all reviews (converting text into a quantitative measurement).
4. Use K-Means to cluster reviews.
5. Aggregate polarity, subjectivity & cluster scores to score hotel performance.
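Step 5 could be sketched as follows. This is only an illustration, not the method used in the analysis: the slides do not say how the scores are combined, so the per-hotel means, the "dominant cluster" and the ranking by mean polarity are all assumptions. The hotel names and values are made-up placeholders, and polarity and subjectivity are assumed to lie in [-1, 1] and [0, 1].

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical per-review records: (hotel, polarity, subjectivity, cluster).
# Names and numbers are placeholders, not results from hotels.ng.
reviews = [
    ("Hotel A", 0.6, 0.7, 2),
    ("Hotel A", 0.2, 0.5, 2),
    ("Hotel B", -0.4, 0.9, 0),
    ("Hotel B", 0.1, 0.3, 1),
    ("Hotel B", -0.3, 0.6, 0),
]

def summarize_by_hotel(records):
    """Group review-level scores by hotel and summarize each group."""
    grouped = defaultdict(list)
    for hotel, polarity, subjectivity, cluster in records:
        grouped[hotel].append((polarity, subjectivity, cluster))
    summary = {}
    for hotel, rows in grouped.items():
        summary[hotel] = {
            "mean_polarity": mean(r[0] for r in rows),
            "mean_subjectivity": mean(r[1] for r in rows),
            # Cluster that most of the hotel's reviews fall into.
            "dominant_cluster": Counter(r[2] for r in rows).most_common(1)[0][0],
            "reviews": len(rows),
        }
    return summary

summary = summarize_by_hotel(reviews)
# Assumed ranking rule: highest mean polarity first.
ranking = sorted(summary, key=lambda h: summary[h]["mean_polarity"], reverse=True)
```

Any other weighting of the three signals would slot into the `sorted` key in the same way.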
cluster, but not so much that they become singular. K is the number of groups to classify hotels into. How do we know which K to use? Visualize the within-group differences for candidate values of K, up to at least 30-50 clusters. Here, K is fine anywhere between 10 and 23 clusters. We can assign new hotels to a cluster by feeding their reviews into this matrix. The matrix can learn new words from new reviews, and hotels can be re-classified as it is updated.
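A minimal sketch of how K might be compared and how a new observation might be assigned to a cluster. It uses a plain-Python version of Lloyd's k-means algorithm and takes the within-cluster sum of squares as the "within-group difference"; both are assumptions, since the slides do not name the tool or the exact measure. The 2-D points are toy stand-ins for rows of the normalized review matrix.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

def within_cluster_ss(points, centroids, labels):
    """Total squared distance from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[label])
               for p, label in zip(points, labels))

# Toy data: two well-separated groups.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]]

# Compare candidate values of K; plot these to look for the "elbow".
wcss_by_k = {}
for k in range(1, 5):
    centroids, labels = k_means(points, k)
    wcss_by_k[k] = within_cluster_ss(points, centroids, labels)

# Assign a new observation to the nearest existing centroid.
centroids2, labels2 = k_means(points, 2)
new_cluster = nearest([5.1, 5.0], centroids2)
```

On real review data the loop would run over a much wider range of K, as the slide suggests, and the curve of `wcss_by_k` would be inspected for the point where adding clusters stops helping much.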
matrix with the number of times each word across all documents occurs in each document. So: N = number of text files (in this case, reviews) and M = number of individual words across all documents.
Normalize matrix: [x - mean(x)] / standard deviation(x)
Cluster documents based on word frequency.
DOCUMENT-TERM MATRIX
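The construction above can be sketched in plain Python. The tokenizer (lowercased runs of letters), the use of the population standard deviation, and mapping zero-variance columns to 0.0 are assumptions made for illustration; the three sample reviews are invented.

```python
import re
from statistics import mean, pstdev

def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in documents]
    for i, tokens in enumerate(tokenized):
        for word in tokens:
            matrix[i][index[word]] += 1
    return vocabulary, matrix

def normalize_columns(matrix):
    """Z-score each column: (x - mean(x)) / standard deviation(x)."""
    normalized_columns = []
    for column in zip(*matrix):
        mu, sigma = mean(column), pstdev(column)
        # A word with the same count in every document has zero spread;
        # map it to 0.0 instead of dividing by zero (an assumption).
        normalized_columns.append(
            [(x - mu) / sigma if sigma else 0.0 for x in column]
        )
    return [list(row) for row in zip(*normalized_columns)]

# Invented sample reviews: N = 3 documents.
reviews = [
    "clean room friendly staff",
    "dirty room rude staff",
    "friendly staff great breakfast",
]
vocabulary, counts = document_term_matrix(reviews)  # counts is N x M
normalized = normalize_columns(counts)
```

Each row of `normalized` is one review expressed as M numbers, which is the form a clustering step such as K-Means expects.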
& Apartments
2. Travel House Lekki
3. The Belaggio Corporate Suites
4. Piccadilly Suites
5. Ikoyi Fairview Apartments
6. Precinct Comfort
7. Victoria Continental Hotel
8. Lakeem Suites
9. Hotel Ibis Royale
10. Signature Suits
TOP 10 HOTELS BY USER LIKES
1. Glonik Hotels
2. Beni Gold Hotel & Apartments
3. Unilag Guest House
4. Regent Luxury Suites
5. Wazobia Plaza
6. Intercontinental Lagos Hotel
7. Hotel Ibis Royale
8. Sheraton Hotel Lagos
9. Eko Hotels
10. Victoria Continental Hotel
Deeper analysis, e.g. adding time and location variables, can identify what customers review well or badly, and when. This can be integrated with employee data to identify ‘bad luck’ employees and the best-performing employees.