A Closer Look at Our Article Search API

James Boehmer explains how to use The New York Times Article Search API v2.0. This talk was presented at the TechCrunch Disrupt NY Hackathon 2013.

Transcript

  1. Getting an API Key

    First things first, you need a key to use NYT APIs!
    1. Go to developer.nytimes.com/apps/register (log in)
    2. Choose the API you want to use
    3. Agree to the evil terms of service!!!
  2. Article Search API v2

    Let's say you want to find an article in the archives. You'll want to use the new Article Search API.
    Use api.nytimes.com/svc/search/v2/articlesearch.json
    Built-in help docs at /svc/search/v2/help.json
    Developer Network docs at developer.nytimes.com
    Don't forget the api-key parameter!
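As a minimal sketch of what such a request URL might look like, the snippet below appends the api-key parameter to the endpoint named on the slide. The `https://` scheme, the helper name `build_search_url`, and the placeholder key are assumptions made for illustration; only the endpoint path and the `api-key` and `q` parameter names come from the talk.

```python
from urllib.parse import urlencode

# Endpoint path from the slide; the https:// scheme is an assumption.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(api_key, **params):
    """Return an Article Search request URL with the api-key appended."""
    query = dict(params)
    query["api-key"] = api_key  # the slide stresses this is required
    return BASE_URL + "?" + urlencode(query)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("YOUR_API_KEY", q='"Pulitzer Prize"')
print(url)
```

`urlencode` percent-encodes the double quotes and turns the space into `+`, so the phrase query survives the trip intact.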
  3. Article Search API v2: Query (q) parameter

    The q parameter searches the body, headline, and byline for relevant results.
    q=Pulitzer Prize (~18,750 hits)
    q="Pulitzer Prize" (~17,770 hits)
  4. Article Search API v2: Highlight (hl) parameter

    All results get returned with a headline and snippet. Use the hl parameter to highlight the query term.
    q="Pulitzer Prize" & hl=true
    {
      web_url: "http://select.nytimes.com/gst/abstract.html?res=9E06E3...",
      snippet: "condemning the <strong>Pulitzer Prize</strong> award to...",
      headline: { main: "The <strong>Pulitzer Prize</strong>." }
    }
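If the highlighted snippet is headed somewhere that cannot render HTML (a log line, a plain-text email), the `<strong>` markers probably need to come off again. A small sketch, assuming the highlight tags are always plain `<strong>` and `</strong>` as in the slide's example; the helper name `strip_highlight` is made up for illustration.

```python
import re


def strip_highlight(text):
    """Remove the <strong>...</strong> highlight markers, keeping the words."""
    return re.sub(r"</?strong>", "", text)


# Snippet text copied from the slide's example response.
snippet = "condemning the <strong>Pulitzer Prize</strong> award to..."
print(strip_highlight(snippet))
```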
  5. Article Search API v2: Begin/End Date parameters

    Filter your search results by publication date.
    The begin_date and end_date parameters are inclusive filters for limiting the search corpus by publication date.
    q="Pulitzer Prize" & begin_date=20130101 (~230 hits)
  6. Article Search API v2: Begin/End Date parameters (cont)

    The begin_date and end_date parameters can be used together or alone, implying an open-ended filter.
    q="Pulitzer Prize" & begin_date=20130101 (~230 hits)
    q="Pulitzer Prize" & end_date=20121231 (~17500 hits)
    q=pulitzer "pentagon papers" & begin_date=19720101 & end_date=19721231 (~10 hits)
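The date values on these slides are in YYYYMMDD form. A short sketch of producing them from Python `date` objects follows; the helper name `date_params` is hypothetical, and leaving a bound out simply omits that parameter, which matches the open-ended behaviour the slide describes.

```python
from datetime import date
from urllib.parse import urlencode


def date_params(begin=None, end=None):
    """Build begin_date/end_date values in the YYYYMMDD form used on the slides."""
    params = {}
    if begin is not None:
        params["begin_date"] = begin.strftime("%Y%m%d")
    if end is not None:
        params["end_date"] = end.strftime("%Y%m%d")
    return params


# The third query from the slide: both bounds set, covering all of 1972.
params = {"q": 'pulitzer "pentagon papers"'}
params.update(date_params(date(1972, 1, 1), date(1972, 12, 31)))
query_string = urlencode(params)
print(query_string)
```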
  7. Article Search API v2: Sort parameter

    The sort parameter sorts the results by publication date, forcibly overriding relevance scores.
    Relevance is still calculated for the query term, but only for inclusion in the result set.
    Documents with no publication date (e.g. references and lists) are returned last.
    q="Pulitzer Prize" & sort=newest
  8. Article Search API v2: Filter Query (fq) parameter

    Use standard Lucene syntax to create a custom filter.
    Similar to the date parameters, the filter query also limits the corpus before searching for the query term.
    The fields available for filtering behave in various ways based on how they are analyzed at index time.
    q="Pulitzer Prize" & sort=newest & fq=source:"The New York Times"
  9. Article Search API v2: Filter Query (fq) fields

    Field                     Behavior
    body                      multiple tokens
    body.search               left-edge n-grams
    creative_works            single token
    creative_works.contains   multiple tokens
    day_of_week               single token
    document_type             case sensitive exact match
    glocations                single token
    glocations.contains       multiple tokens
    headline                  multiple tokens
    headline.search           left-edge n-grams
    kicker                    single token
    kicker.contains           multiple tokens
    news_desk                 single token
    news_desk.contains        multiple tokens
    organizations             single token
    organizations.contains    multiple tokens
    persons                   single token
    persons.contains          multiple tokens
    pub_date                  timestamp (YYYY-MM-DD)
    pub_year                  integer
    secpg                     multiple tokens
  10. Article Search API v2: Filter Query (fq) fields (cont)

    Various fields can be combined in a complex way to narrow down exactly what you want.
    The default boolean between values in parentheses is OR.
    Explicit booleans (AND, OR) must always be UPPER CASE.

    Field                       Behavior
    source                      single token
    source.contains             multiple tokens
    subject                     single token
    subject.contains            multiple tokens
    section_name                single token
    section_name.contains       multiple tokens
    type_of_material            single token
    type_of_material.contains   multiple tokens
    web_url                     case sensitive single token
    word_count                  integer
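The two rules above (values in parentheses are ORed by default, explicit booleans are upper case) could be wrapped in a couple of small helpers. This is only a sketch: `field_filter` and `all_of` are hypothetical names, and the code does not escape double quotes inside values.

```python
def field_filter(field, *values):
    """Return field:"value", or field:("a" "b") when several values are given.

    Values grouped in parentheses are ORed by default, per the slide.
    """
    quoted = " ".join('"%s"' % v for v in values)
    if len(values) == 1:
        return "%s:%s" % (field, quoted)
    return "%s:(%s)" % (field, quoted)


def all_of(*clauses):
    """Join clauses with an explicit, upper-case AND."""
    return " AND ".join(clauses)


fq = all_of(
    field_filter("source", "The New York Times"),
    field_filter("document_type", "article", "blogpost"),
)
print(fq)
```

The resulting string would be passed as the value of the fq parameter, URL-encoded like any other query value.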
  11. Article Search API v2: Type parameter

    Filter by document_type using the type parameter.
    Multiple document types can be comma-separated.
    q="Pulitzer Prize" & sort=newest & type=blogpost,multimedia
  12. Article Search API v2: More about filter-like parameters

    The type, begin_date and end_date parameters are API conveniences. They are functionally equivalent to filter queries, joined by a logical AND.
    type=blogpost,multimedia
    ...is the same as...
    fq=document_type:("blogpost" "multimedia")
    ...which is the same as...
    fq=document_type:"blogpost" OR document_type:"multimedia"
  13. Article Search API v2: More about filter-like parameters (cont)

    begin_date=20130101
    ...is the same as...
    fq=pub_date:[2013-01-01 TO *]

    type=article,blogpost & begin_date=20120101 & end_date=20121231
    ...is the same as...
    fq=document_type:("article" "blogpost") AND pub_date:[2012-01-01 TO 2012-12-31]
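The equivalence can be spelled out as a small translation function. This is a sketch rather than how the API does it internally: `convenience_to_fq` and `iso` are hypothetical names, and writing an open lower bound as `*` is an assumption based on common Lucene range syntax (the slides only show `*` as an upper bound). An end_date of 20121231 maps to an upper bound of 2012-12-31.

```python
def iso(yyyymmdd):
    """Convert 20130101 style dates to the 2013-01-01 form used by pub_date."""
    return "%s-%s-%s" % (yyyymmdd[:4], yyyymmdd[4:6], yyyymmdd[6:8])


def convenience_to_fq(doc_types=None, begin_date=None, end_date=None):
    """Express type/begin_date/end_date as one filter query string."""
    clauses = []
    if doc_types:
        values = " ".join('"%s"' % t for t in doc_types.split(","))
        clauses.append("document_type:(%s)" % values)
    if begin_date or end_date:
        lower = iso(begin_date) if begin_date else "*"  # open lower bound: assumption
        upper = iso(end_date) if end_date else "*"
        clauses.append("pub_date:[%s TO %s]" % (lower, upper))
    return " AND ".join(clauses)  # the conveniences are joined by a logical AND


print(convenience_to_fq(begin_date="20130101"))
print(convenience_to_fq("article,blogpost", "20120101", "20121231"))
```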
  14. Article Search API v2: Page parameter

    Paginate through 10 results at a time using the page parameter.
    Page numbers start with zero (i.e. page 12 is offset 120).
    response.meta.hits / 10 (rounded up) tells you how many pages there are in total.
    q="Pulitzer Prize" & sort=newest & fq=source:"The New York Times" & page=12
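The page arithmetic is simple but easy to get off by one, since pages are zero-based. A sketch with hypothetical helper names, assuming the fixed page size of 10 stated on the slide:

```python
import math

RESULTS_PER_PAGE = 10  # fixed page size, per the slide


def page_offset(page):
    """Zero-based page number -> index of the first result on that page."""
    return page * RESULTS_PER_PAGE


def page_count(total_hits):
    """Number of pages needed for total_hits (response.meta.hits) results."""
    return math.ceil(total_hits / RESULTS_PER_PAGE)


def last_page(total_hits):
    """Highest valid zero-based page number (0 when there are no hits)."""
    return max(page_count(total_hits) - 1, 0)


print(page_offset(12))    # page 12 starts at offset 120
print(page_count(17770))  # 1777 pages, numbered 0 through 1776
```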
  15. Article Search API v2: Facet Field parameter

    A facet is an aggregate count for a field, relative to a query term.
    q="Pulitzer Prize" & facet_field=section_name,day_of_week
    The response.facets object will give you the top five section names and days of the week, with (and ranked by) counts.
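Reading the counts back out might look like the sketch below. The slide only says that a response.facets object exists; the nesting used here (a "terms" list of term/count pairs under each field name) and every number in the sample payload are assumptions for illustration, so check them against a real response or the help docs before relying on them.

```python
# Hypothetical payload: the nesting under response.facets and every count
# below are illustrative assumptions, not real API output.
sample = {
    "response": {
        "meta": {"hits": 100},
        "facets": {
            "section_name": {
                "terms": [
                    {"term": "Arts", "count": 40},
                    {"term": "Books", "count": 25},
                ]
            }
        },
    }
}


def facet_field_param(*fields):
    """Join field names into the comma-separated facet_field value."""
    return ",".join(fields)


def facet_counts(payload, field):
    """Return [(term, count), ...] for one facet field, or [] if it is absent."""
    facet = payload.get("response", {}).get("facets", {}).get(field, {})
    return [(t["term"], t["count"]) for t in facet.get("terms", [])]


print(facet_field_param("section_name", "day_of_week"))
print(facet_counts(sample, "section_name"))
```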
  16. Article Search API v2: More on facets

    What are facets useful for?
    When constructing a front end search application, we can present the user with a list of available filters. Intelligently aiding navigation for the user is always a plus!
    We can make search better by coupling the most popular search terms with their top facets, and ranking results higher by keyword.
    We can visualize the importance of subjects over time by reporting on facets over a moving window.
    Presently only low-cardinality fields are available for faceting because of performance concerns. These include source, section_name, document_type, type_of_material and day_of_week.
  17. Article Search API v2: Facet Filter parameter

    By default, facets are aggregated only for the query term. You can also include the filter query in the facet calculation.
    q="Pulitzer Prize" & facet_field=section_name,day_of_week & fq=section_name:"Arts"
      section_name "New York and Region" ~907, etc.
    q="Pulitzer Prize" & facet_field=section_name,day_of_week & fq=section_name:"Arts" & facet_filter=true
      section_name "Arts" === response.meta.hits
    This concept is called adaptive facets, and is useful for sub-navigation of filtered queries.
  18. The New York Times Article Search API

    James Boehmer
    Don't forget to check out the Times Developer Network: developer.nytimes.com
    And our very own Open Blog: open.blogs.nytimes.com