2017 - Virginia Tam - Building the Yelp API

PyBay
August 12, 2017

Description
Ever wanted to burn down your old code and start all over again? With the Yelp public API (known as the Yelp Fusion API), that’s exactly what we did. In this talk, Virginia Tam, a software engineer at Yelp, will talk about the challenges the team faced building the newest version of the Fusion API and what we learned in the process.

Abstract
This talk will be broken into four parts: the history of Yelp’s public API, how we built the newest version, the challenges we faced, and the current state of the Fusion API.

The Yelp Fusion API lets developers outside Yelp look up business information and search for businesses. The previous version of this API (known as the v2 API) was first built in 2010. At the time of writing, the v2 API has well over 100,000 registered API keys from independent developers and partner companies.
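The sketch below shows what a business search against the newer Fusion (v3) API might look like from the client's side. It assumes the `https://api.yelp.com/v3/businesses/search` endpoint, the `term`, `location`, and `limit` query parameters, and a bearer token sent in the `Authorization` header. The token is a placeholder, and the helper name `build_search_request` is hypothetical. The code only builds the request and never sends it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Fusion (v3) business search endpoint.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(token: str, term: str, location: str, limit: int = 1) -> Request:
    """Build (but do not send) a v3 business search request.

    build_search_request is a hypothetical helper used for illustration.
    """
    query = urlencode({"term": term, "location": location, "limit": limit})
    # A single bearer token replaces the per-request oauth_* parameters of v2.
    return Request(f"{SEARCH_URL}?{query}", headers={"Authorization": f"Bearer {token}"})


req = build_search_request("PLACEHOLDER_TOKEN", "food", "San Francisco")
print(req.full_url)
print(req.get_header("Authorization"))
```

Compare this with the v2 request on slide 2 of the transcript, which has to carry six `oauth_*` parameters on every call.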

In 2016, the API was in serious need of improvement. Authentication used OAuth 1.0, which does not require an SSL/TLS connection. To compensate, it relies on a tedious handshake of tokens and client secrets between client and server, and every request must be signed. We wanted to move to OAuth 2.0, which relies on SSL/TLS, but that change would have made authentication backwards incompatible for every existing user.
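To make the signing burden concrete, here is a sketch of the per-request HMAC-SHA1 signature that an OAuth 1.0 client computes, following RFC 5849, section 3.4. It reuses the placeholder values from the v2 request on slide 2 of the transcript. The consumer and token secrets and the function name are hypothetical, and optional details such as `oauth_version` are omitted. An OAuth 2.0 client over TLS does none of this; it just sends a bearer token.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value) -> str:
    # RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ stay unescaped.
    return quote(str(value), safe="")


def oauth1_hmac_sha1_signature(method, base_url, params, consumer_secret, token_secret):
    """Return (signature base string, HMAC-SHA1 signature) per RFC 5849 section 3.4."""
    # 1. Encode every parameter (query + oauth_*, excluding oauth_signature),
    #    then sort by encoded name and, for equal names, by encoded value.
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Base string: METHOD & encoded base URL & encoded parameter string.
    base_string = "&".join([method.upper(), _enc(base_url), _enc(param_string)])
    # 3. Signing key: encoded consumer secret and encoded token secret joined by '&'.
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base_string, base64.b64encode(digest).decode()


# Placeholder values taken from the v2 example request; secrets are made up.
params = {
    "term": "food",
    "location": "San Francisco",
    "limit": 1,
    "oauth_consumer_key": "12345",
    "oauth_token": "abcdefg",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1502324141",
    "oauth_nonce": "some_random_chars",
}
base_string, signature = oauth1_hmac_sha1_signature(
    "GET", "https://api.yelp.com/v2/search", params, "consumer-secret", "token-secret"
)
print(signature)
```

The client must get the timestamp, nonce, encoding, and parameter ordering exactly right on every call, or the server rejects the signature.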

There was a strong desire to expose richer Yelp data through new endpoints. However, the API code lived in a large monolithic codebase shared with several unrelated features and was tangled up in legacy code, so development would have been slow. The existing API's bulky design also made new data difficult to add, and often impossible to add without breaking backwards compatibility.
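One common way to add data without breaking existing clients is to give each API version its own response formatter, so that a new field appears only in the newer version. The sketch below illustrates that general idea; it is not Yelp's code. The formatter names, the `FORMATTERS` registry, and the `price` field are all hypothetical.

```python
from typing import Any, Callable, Dict

Business = Dict[str, Any]  # hypothetical internal business record


def format_v2(biz: Business) -> Dict[str, Any]:
    # Frozen contract: existing v2 clients rely on exactly these keys.
    return {"id": biz["id"], "name": biz["name"], "rating": biz["rating"]}


def format_v3(biz: Business) -> Dict[str, Any]:
    # Independent newer contract: free to add fields without changing v2 output.
    return {"id": biz["id"], "name": biz["name"], "rating": biz["rating"], "price": biz.get("price")}


FORMATTERS: Dict[str, Callable[[Business], Dict[str, Any]]] = {"v2": format_v2, "v3": format_v3}


def render(version: str, biz: Business) -> Dict[str, Any]:
    """Pick the response formatter for the requested API version."""
    return FORMATTERS[version](biz)


biz = {"id": "jeremys-haus-of-nomz", "name": "Jeremy's Haus of Nomz", "rating": 2.5, "price": "$$"}
print(render("v2", biz))
print(render("v3", biz))
```

Compare this with slide 5 of the transcript, where mobile and public API code share one `api_format_v1` helper, so any change to it affects both consumers.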

We decided to treat these shortcomings as an opportunity to rebuild the API from the ground up, using the up-to-date tools and infrastructure available at Yelp. A lot had changed since 2010. We now have a sophisticated microservices framework that lets us develop independently of the monolithic codebase. We also have newer tools that make endpoints easy to create, give us flexibility in choosing a datastore, and let us configure ecosystem factors such as memory allotment and library versions. Starting from scratch also let us design things the right way and consolidate common logic that had previously been duplicated and scattered throughout the codebase.
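Slide 13 of the transcript shows a Public API Service in front of Authorization, Biz Info, Photos, and Reviews services. The sketch below models that shape as a handler that calls backend services concurrently and merges their responses. It is a hypothetical illustration using `asyncio.gather` with stub services. The service functions, their payloads, and the in-process calls are invented; Yelp's actual services and transport are not shown here.

```python
import asyncio
from typing import Any, Dict


# Hypothetical stand-ins for backend microservices.
async def biz_info_service(biz_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": biz_id, "name": "Yelp Inc."}


async def photos_service(biz_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"photos": [f"https://example.com/{biz_id}/1.jpg"]}


async def reviews_service(biz_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"review_count": 3}


async def get_business(biz_id: str) -> Dict[str, Any]:
    """Public API handler: fan out to backend services concurrently, then merge."""
    info, photos, reviews = await asyncio.gather(
        biz_info_service(biz_id),
        photos_service(biz_id),
        reviews_service(biz_id),
    )
    return {**info, **photos, **reviews}


result = asyncio.run(get_business("12345"))
print(result)
```

Running the calls concurrently keeps latency near the slowest backend call rather than the sum of all calls. Slide 17 notes the trade-off: parallelization means more instances, and more cost.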

Today, thousands of API keys are registered for the new Fusion API, including keys held by 12 partner companies that have incorporated Yelp data into their applications. We still face ongoing challenges, such as setting the timeline for shutting down the old API and helping partners migrate to the new one. In the future, we plan to expand the Fusion API to provide even more data, and to improve the developer experience with new endpoints that give developers insight into their API usage.

Bio
Virginia Tam is a software engineer who spent the beginning of her career in networking and device-level software development and eventually made the lateral move to backend web development. She is currently on the Partnerships team at Yelp where she works with external companies and organizations and builds the infrastructure to support syndicating out Yelp data, ingesting partner data, and building APIs for partners. Virginia also enjoys engaging with the developer community and is a former member of the Santa Clara Valley Society of Women Engineers where she organized several workshops to introduce middle and high school aged girls to STEM careers.

Transcript

  1. Public API v2 - Querying for Yelp Data [History of Public API]
    • Search for businesses
    • Retrieve additional business data
    • Also Yelp Deals, gift certificates search, phone number search, etc.
  2. Public API v2 - Querying for Yelp Data [History of Public API]
    Request:
        https://api.yelp.com/v2/search?term=food&location=San+Francisco&limit=1&oauth_consumer_key=12345&oauth_token=abcdefg&oauth_signature_method=HMAC-SHA1&oauth_signature=johnhancock&oauth_timestamp=1502324141&oauth_nonce=some_random_chars
    Response:
        {
          "businesses": [
            {
              "categories": ["Local Flavor"],
              "display_phone": "+1-415-908-3801",
              "id": "jeremys-haus-of-nomz",
              "image_url": "http://yelp.com/food.jpg",
              "is_claimed": true,
              "is_closed": false,
              "location": {
                "address": ["140 New Montgomery St"],
                "country_code": "US",
                "cross_streets": "Natoma St & Minna St",
                "postal_code": "94105",
                "state_code": "CA"
              },
              "name": "Jeremy’s Haus of Nomz",
              "phone": "4159083801",
              "rating": 2.5
            }
          ],
          "total": 2316
        }
  3. 2010 Throwback Technologies [History of Public API]
    • Built in monolithic code base
    • OAuth 1.0
    • MySQL for rate limiting (25k hits per day)
  4. Why Rebuild It?
    • Code entanglement
    “Hey, this public API logic is also being used by the mobile app…”
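  5. Why Rebuild It?
        # Mobile app search
        from util import api_format_v1

        class MobileSearch(object):
            """Retrieve search results for mobile app."""

            def get(self, businesses, mobile_things):  # illustrative method name
                # do mobile-specific things
                return api_format_v1(businesses, mobile_things)

        # Public API search: depends on the same shared formatter
        from util import api_format_v1

        class PublicAPISearch(object):
            """Retrieve search results for Public API."""

            def get(self, businesses):  # illustrative method name
                # do Public API-specific things
                return api_format_v1(businesses)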
  6. Why Rebuild It?
    • Code entanglement
    • Difficult to build new endpoints quickly
    “Hey, this public API logic is also being used by the mobile app…”
  7. OAuth 1.0 is Cumbersome [Why Rebuild It?]
    • No SSL requirement
    • Each request needs:
      ◦ Consumer key
      ◦ Token
      ◦ Signature method
      ◦ Signature
      ◦ Timestamp
      ◦ Nonce
  8. Tabulation at Every Data Center [Why Rebuild It?]
        import datetime
        from collections import defaultdict

        current_counts_dict = defaultdict(int)  # avoids KeyError on a key's first hit
        time_last_logged = datetime.datetime.now()

        def process_access_count(consumer_key):
            global current_counts_dict, time_last_logged
            # add count to local dictionary
            current_counts_dict[consumer_key] += 1
            now = datetime.datetime.now()
            if now - time_last_logged > datetime.timedelta(minutes=5):
                # write to master DB (submit_access_counts_to_master is defined elsewhere)
                submit_access_counts_to_master(current_counts_dict)
                # clear in-memory tabulation
                current_counts_dict = defaultdict(int)
                time_last_logged = now
  9. Wrong Tools, Complicated Logic [Why Rebuild It?]
    • Batched access-count writes every 5 minutes
    • Complicated logic for multiple data centers
  10. Microservices to the Rescue! [Let’s Rebuild It!]
    Yelp PaaSTA - An open, distributed platform as a service
  11. We Can Run Python 3!!!!!!! [Let’s Rebuild It!]
    • Dockerized containers == whatever Python we want!
    • Memory savings!
    • Unicode all the strings!
    … and so much more!
  12. Clear Separation of Roles [Let’s Rebuild It!]
    • Public API Service doesn’t have a frontend component
    • Authorization Service contains all client info
    • Documentation Service == client management without database
  13. GET https://api.yelp.com/v3/businesses/12345
    access_token: xxxxxxxxx
    Response: {"id": 12345, "name": "Yelp Inc.", "address1": "140 New Montgomery St.", "city": "San Francisco", "state": "CA", "zip_code": "94105", "country": "US"}
    Diagram: Public API Service, Authorization Service, Biz Info Service, Photos Service, Reviews Service
  14. The Right Tool for the Job
    • Distributed nodes eliminate using async task queue
    • Async Incremental Counter eliminates batched writes
  15. Beta Testing: Catching Things You Didn’t [Let’s Rebuild It!]
    • Private Beta with Partners
    • Public Beta through GitHub
  16. Keep What You Need, Maintain Parity [Challenges & Discoveries]
    • Are these endpoints still getting traffic?
    • Is there a better replacement feature?
    Yelp Deals Endpoint, Gift Cards Endpoint, Developer Console
  17. Parallelization == More Instances == $$$$
    Diagram: request and response flow between User, Public API, Photos, Reviews, Biz Info, and Geolocation
  18. Performance Parity Doesn’t Break the Bank
    Diagram: request and response flow between User, Public API, Photos, Reviews, Biz Info, and Geolocation
  19. Turns Out, Other Companies are Busy Too [Challenges & Discoveries]
    • Partners have their own roadmaps and priorities
    • External devs are quick!
  20. Communicate, Communicate, Communicate! [Challenges & Discoveries]
    • No longer creating new API keys
    • Official discontinue date: June 30, 2018
    • Current challenge: migrating large partners
  21. Fusion: Harder, Better, Faster, Stronger [Takeaways]
    • Spin up new endpoints very quickly
    • Work on different parts independently
    • Microservice niceties! (logging, monitoring, service discovery, deployment, autoscaling, etc.)
  22. Beta Programs are Awesome!! [Takeaways]
    • We now have an established Public API Beta program!
    • GraphQL API and new endpoints released!
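Slides 3, 8, 9, and 14 trace how rate limiting evolved. It started as a 25k-hits-per-day limit backed by MySQL, moved to per-data-center tallies flushed every five minutes, and ended with an async incremental counter. The sketch below shows the basic counting model: a per-key daily counter that is incremented on every request instead of in batches. It is a hypothetical, single-process illustration. The class name is invented, and the in-memory dict stands in for whatever shared counter store the real service uses.

```python
import datetime
from collections import defaultdict
from typing import Dict, Optional, Tuple

DAILY_LIMIT = 25_000  # the v2 limit mentioned on slide 3


class DailyRateLimiter:
    """Hypothetical fixed-window (per UTC day) rate limiter keyed by API key.

    The dict stands in for a shared counter store visible to every data center.
    """

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        self.counts: Dict[Tuple[str, datetime.date], int] = defaultdict(int)

    def allow(self, api_key: str, now: Optional[datetime.datetime] = None) -> bool:
        if now is None:
            now = datetime.datetime.now(datetime.timezone.utc)
        window = (api_key, now.date())
        # Increment first, then compare: this mirrors an atomic
        # "increment and return new value" call on a shared store.
        self.counts[window] += 1
        return self.counts[window] <= self.limit


limiter = DailyRateLimiter()
print(limiter.allow("12345"))  # True: first request of the day
```

Counting each request as it arrives removes the five-minute flush and the per-data-center bookkeeping. With a real shared store, the increment must be a single atomic operation so that concurrent requests are not lost.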