A Closer Look at Our Article Search API

The New York Times Article Search API
James Boehmer
@jamesboehmer github.com/jamesboehmer

Getting an API Key

First things first, you need a key to use NYT APIs!

1. Go to developer.nytimes.com/apps/register (log in)
2. Choose the API you want to use
3. Agree to the evil terms of service!!!

Article Search API v2

Let's say you want to find an article in the archives. You'll want to use the new Article Search API.

Use api.nytimes.com/svc/search/v2/articlesearch.json
Built-in help docs at /svc/search/v2/help.json
Developer Network docs at developer.nytimes.com

Don't forget the api-key parameter!
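A minimal sketch of how a request URL for this endpoint might be assembled. The https scheme, the helper name build_search_url, and the placeholder key YOUR_API_KEY are assumptions for illustration; only the host, path, and the api-key parameter name come from the slide.

```python
from urllib.parse import urlencode

# Endpoint from the slide; the https:// scheme is an assumption.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(api_key, **params):
    """Return a search URL that always carries the api-key parameter.

    build_search_url is a hypothetical helper, not part of any NYT library.
    """
    query = dict(params)
    # The parameter name is hyphenated, so it cannot be a Python keyword argument.
    query["api-key"] = api_key
    return BASE_URL + "?" + urlencode(query)


# YOUR_API_KEY is a placeholder; substitute the key from your own registration.
url = build_search_url("YOUR_API_KEY", q='"Pulitzer Prize"')
print(url)
```

urlencode percent-encodes the double quotes and turns the space into a plus sign, so the phrase query survives the trip intact.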

Article Search API v2: Query (q) parameter

The q parameter searches the body, headline, and byline for relevant results.

    q=Pulitzer Prize      (~18,750 hits)
    q="Pulitzer Prize"    (~17,770 hits)

Article Search API v2: Highlight (hl) parameter

All results get returned with a headline and snippet. Use the hl parameter to highlight the query term.

    q="Pulitzer Prize" & hl=true

    {
      web_url: "http://select.nytimes.com/gst/abstract.html?res=9E06E3...",
      snippet: "condemning the <strong>Pulitzer Prize</strong> award to...",
      headline: {
        main: "The <strong>Pulitzer Prize</strong>."
      }
    }
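If a client needs the highlighted terms, or wants the plain text back, the strong tags can be handled with a small regular expression. This is a sketch under the assumption that highlights are always simple, non-nested strong tags like the ones in the sample above; the function names are hypothetical.

```python
import re

# Matches one highlighted run; assumes simple, non-nested <strong> tags.
STRONG_TAG = re.compile(r"<strong>(.*?)</strong>", re.DOTALL)


def highlighted_terms(text):
    """Return the list of strings wrapped in <strong> tags."""
    return STRONG_TAG.findall(text)


def strip_highlight(text):
    """Return the text with the <strong> wrappers removed."""
    return STRONG_TAG.sub(r"\1", text)


# Sample document copied from the slide.
doc = {
    "snippet": "condemning the <strong>Pulitzer Prize</strong> award to...",
    "headline": {"main": "The <strong>Pulitzer Prize</strong>."},
}

print(highlighted_terms(doc["snippet"]))
print(strip_highlight(doc["headline"]["main"]))
```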

Article Search API v2: Begin/End Date parameters

Filter your search results by publication date. The begin_date and end_date parameters are inclusive filters for limiting the search corpus by publication date.

    q="Pulitzer Prize" & begin_date=20130101    (~230 hits)

Article Search API v2: Begin/End Date parameters (cont)

The begin_date and end_date parameters can be used together or alone, implying an open-ended filter.

    q="Pulitzer Prize" & begin_date=20130101    (~230 hits)
    q="Pulitzer Prize" & end_date=20121231      (~17,500 hits)
    q=pulitzer "pentagon papers" & begin_date=19720101 & end_date=19721231    (~10 hits)
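The date values in these examples are eight-digit YYYYMMDD strings. A small helper, sketched here with the hypothetical name date_params, can produce them from Python date objects and leave out whichever bound is open-ended.

```python
from datetime import date


def date_params(begin=None, end=None):
    """Build begin_date/end_date parameters as YYYYMMDD strings.

    A bound left as None is omitted, which leaves that side open-ended.
    """
    params = {}
    if begin is not None:
        params["begin_date"] = begin.strftime("%Y%m%d")
    if end is not None:
        params["end_date"] = end.strftime("%Y%m%d")
    return params


# Everything published in 2013 or later.
print(date_params(begin=date(2013, 1, 1)))

# Only articles published during 1972.
print(date_params(begin=date(1972, 1, 1), end=date(1972, 12, 31)))
```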

Article Search API v2: Sort parameter

The sort parameter sorts the results by publication date, forcibly overriding relevance scores.

- Relevance is still calculated for the query term, but only for inclusion in the result set
- Documents with no publication date (e.g. references and lists) are returned last

    q="Pulitzer Prize" & sort=newest

Article Search API v2: Filter Query (fq) parameter

Use standard Lucene syntax to create a custom filter.

- Similar to the date parameters, the filter query also limits the corpus before searching for the query term
- The fields available for filtering behave in various ways based on how they are analyzed at index time

    q="Pulitzer Prize" & sort=newest & fq=source:"The New York Times"

Article Search API v2: Filter Query (fq) fields

    Field                      Behavior
    body                       multiple tokens
    body.search                left-edge n-grams
    creative_works             single token
    creative_works.contains    multiple tokens
    day_of_week                single token
    document_type              case sensitive exact match
    glocations                 single token
    glocations.contains        multiple tokens
    headline                   multiple tokens
    headline.search            left-edge n-grams
    kicker                     single token
    kicker.contains            multiple tokens
    news_desk                  single token
    news_desk.contains         multiple tokens
    organizations              single token
    organizations.contains     multiple tokens
    persons                    single token
    persons.contains           multiple tokens
    pub_date                   timestamp (YYYY-MM-DD)
    pub_year                   integer
    secpg                      multiple tokens

Article Search API v2: Filter Query (fq) fields (cont)

    Field                        Behavior
    source                       single token
    source.contains              multiple tokens
    subject                      single token
    subject.contains             multiple tokens
    section_name                 single token
    section_name.contains        multiple tokens
    type_of_material             single token
    type_of_material.contains    multiple tokens
    web_url                      case sensitive single token
    word_count                   integer

Various fields can be combined in a complex way to narrow down exactly what you want.

- The default boolean between values in parentheses is OR
- Explicit booleans (AND, OR) must always be UPPER CASE
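The two rules about booleans can be sketched as a pair of string builders: values inside one set of parentheses rely on the implicit OR, and separate field clauses are joined with an upper-case AND. The helper names any_of and all_of are hypothetical, and the sketch assumes values contain no double quotes, since it does no escaping.

```python
def any_of(field, *values):
    """Match any of the values on one field, e.g. field:("a" "b").

    Values inside the parentheses are combined with the default OR.
    Assumes values contain no double quotes (no escaping is done).
    """
    if not values:
        raise ValueError("at least one value is required")
    quoted = " ".join('"{}"'.format(v) for v in values)
    if len(values) == 1:
        return "{}:{}".format(field, quoted)
    return "{}:({})".format(field, quoted)


def all_of(*clauses):
    """Join clauses with an explicit AND, which must be upper case."""
    return " AND ".join(clauses)


fq = all_of(
    any_of("source", "The New York Times"),
    any_of("section_name", "Arts", "Books"),
)
print(fq)
```

The section names used here are illustrative values only; check the facet output of a real query for the names the index actually holds.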

Article Search API v2: Type parameter

Filter by document_type using the type parameter. Multiple document types can be comma-separated.

    q="Pulitzer Prize" & sort=newest & type=blogpost,multimedia

Article Search API v2: More about filter-like parameters

The type, begin_date and end_date parameters are API conveniences. They are functionally equivalent to filter queries, joined by a logical AND.

    type=blogpost,multimedia
...is the same as...
    fq=document_type:("blogpost" "multimedia")
...which is the same as...
    fq=document_type:"blogpost" OR document_type:"multimedia"

Article Search API v2: More about filter-like parameters (cont)

    begin_date=20130101
...is the same as...
    fq=pub_date:[20130101 TO *]

    type=article,blogpost & begin_date=20120101 & end_date=20121231
...is the same as...
    fq=document_type:("article" "blogpost") AND pub_date:[20120101 TO 20121231]
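The translation from the convenience parameters to a single filter query can be sketched as follows. The function name convenience_to_fq is hypothetical, and using * for a missing lower bound is an assumption that mirrors the open upper bound shown for begin_date.

```python
def convenience_to_fq(doc_types=(), begin_date=None, end_date=None):
    """Express type/begin_date/end_date as one fq string.

    Dates are YYYYMMDD strings; a missing bound becomes * (open-ended).
    The separate clauses are joined by a logical AND.
    """
    clauses = []
    if doc_types:
        quoted = " ".join('"{}"'.format(t) for t in doc_types)
        clauses.append("document_type:({})".format(quoted))
    if begin_date or end_date:
        clauses.append(
            "pub_date:[{} TO {}]".format(begin_date or "*", end_date or "*")
        )
    return " AND ".join(clauses)


print(convenience_to_fq(begin_date="20130101"))
print(convenience_to_fq(["article", "blogpost"], "20120101", "20121231"))
```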

Article Search API v2: Page parameter

Paginate through 10 results at a time using the page parameter.

- Page numbers start with zero (i.e. page 12 is offset 120)
- response.meta.hits / 10, rounded up, tells you how many pages there are in total

    q="Pulitzer Prize" & sort=newest & fq=source:"The New York Times" & page=12
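The page arithmetic is short enough to write out. This sketch assumes the fixed page size of 10 described on the slide; the function names are hypothetical.

```python
PAGE_SIZE = 10  # results per page, as stated on the slide


def page_offset(page):
    """Return the index of the first result on a zero-based page."""
    return page * PAGE_SIZE


def total_pages(hits):
    """Return how many pages are needed for the given hit count.

    Integer ceiling division, so a partial last page still counts.
    """
    return (hits + PAGE_SIZE - 1) // PAGE_SIZE


print(page_offset(12))
print(total_pages(230))
```

The last valid page number is total_pages(hits) - 1, because numbering starts at zero.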

Article Search API v2: Facet Field parameter

A facet is an aggregate count for a field, relative to a query term.

    q="Pulitzer Prize" & facet_field=section_name,day_of_week

The response.facets object will give you the top five section names and days of the week, with (and ranked by) counts.

Article Search API v2: More on facets

What are facets useful for?

- When constructing a front end search application, we can present the user with a list of available filters. Intelligently aiding navigation for the user is always a plus!
- We can make search better by coupling the most popular search terms with their top facets, and ranking results higher by keyword
- We can visualize the importance of subjects over time by reporting on facets over a moving window

Presently only low-cardinality fields are available for faceting because of performance concerns. These include source, section_name, document_type, type_of_material and day_of_week.
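As a sketch of the front-end use case, facet counts can be turned into a list of filter links, each carrying the fq value that would apply it. The shape of the facets object used here (a "terms" list of term/count entries per field) is an assumption, as are the sample counts; check a real response before relying on it.

```python
def facet_filters(facets, field):
    """Return (label, fq) pairs for one faceted field.

    Assumes facets[field]["terms"] is a list of {"term", "count"} dicts.
    """
    terms = facets.get(field, {}).get("terms", [])
    return [
        (
            "{} ({})".format(entry["term"], entry["count"]),
            '{}:"{}"'.format(field, entry["term"]),
        )
        for entry in terms
    ]


# Made-up sample in the assumed shape; the counts are not real data.
sample_facets = {
    "section_name": {
        "terms": [
            {"term": "Arts", "count": 120},
            {"term": "Books", "count": 80},
        ]
    }
}

for label, fq in facet_filters(sample_facets, "section_name"):
    print(label, "->", fq)
```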

Article Search API v2: Facet Filter parameter

By default, facets are aggregated only for the query term.

    q="Pulitzer Prize" & facet_field=section_name,day_of_week & fq=section_name:"Arts"

    section_name "New York and Region" ~907, etc

You can also include the filter query in the facet calculation.

    q="Pulitzer Prize" & facet_field=section_name,day_of_week & fq=section_name:"Arts" & facet_filter=true

    section_name "Arts" === response.meta.hits

This concept is called adaptive facets, and is useful for sub-navigation of filtered queries.

The New York Times Article Search API

Don't forget to check out the Times Developer Network: developer.nytimes.com
And our very own Open Blog: open.blogs.nytimes.com

Thank you!
