Implementing LINQ-for-objects in Python

The Language Integrated Query (LINQ) feature of Microsoft's .NET Framework provides a DSL for expressing queries over arbitrary data sources. LINQ over one kind of data source, simple in-memory collections, has proven especially popular with .NET programmers, many of whom now eschew imperative constructs in favour of the declarative, functional style afforded by LINQ-to-objects.

Python already has strong support for lazily transforming collections with its generator syntax, which, although concise in simple cases, can be unwieldy for more complex queries.
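
A minimal sketch of that generator syntax follows. Two chained generator expressions form a lazy pipeline, and nothing is computed until the result is consumed. The word list is made-up sample data.

```python
# Made-up sample data for illustration.
words = ["zero", "one", "two", "three", "four", "five", "six"]

# Each generator expression wraps the previous one; no work happens yet.
short = (w for w in words if len(w) <= 4)
upper = (w.upper() for w in short)

# Items flow through the whole pipeline only when it is consumed.
result = list(upper)
print(result)  # ['ZERO', 'ONE', 'TWO', 'FOUR', 'FIVE', 'SIX']
```

Sorting, grouping or joining means stepping out of this syntax into sorted(), itertools or explicit loops. That is where longer queries start to become awkward.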

At this meetup, I'll present "asq", an implementation of LINQ-to-objects for Python which has been well received and met with some success. I'll introduce LINQ, explain the implementation of asq and demonstrate its use. I'll also outline some ideas for the library's future direction.

Robert Smallshire

October 11, 2012

Transcript

  1. Oslo Python Meetup #OsloPython
    Implementing LINQ in Python
    Robert Smallshire @robsmallshire
    Wednesday, 24 October, 12
  2. Outline
    1. Introducing LINQ: What is LINQ and how does it work?
    2. Does Python need LINQ? Querying collections in Python
    3. Asq! A LINQ-to-objects implementation for Python
    4. Next steps: Forthcoming capabilities in Asq
  3. int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var numsPlusOne = from n in numbers select n + 1;
    Console.WriteLine("Numbers + 1:");
    foreach (var i in numsPlusOne) { Console.WriteLine(i); }
    Output: Numbers + 1: 6 5 2 4 10 9 7 8 3 1
  4. List<Product> products = GetProductList();
    var productNames = from p in products select p.ProductName;
    Console.WriteLine("Product Names:");
    foreach (var productName in productNames) { Console.WriteLine(productName); }
    Output: Product Names: Chai Chang Aniseed Syrup Chef Anton's Cajun Seasoning Chef Anton's Gumbo Mix Grandma's Boysenberry Spread Uncle Bob's Organic Dried Pears Northwoods Cranberry Sauce Mishi Kobe Niku Ikura Queso Cabrales Queso Manchego La Pastora Konbu Tofu Genen Shouyu Pavlova Alice Mutton Carnarvon Tigers Teatime Chocolate Biscuits Sir Rodney's Marmalade Sir Rodney's Scones Nord-Ost Matjeshering Gorgonzola Telino Mascarpone Fabioli Geitost
  5. int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var lowNums = from n in numbers where n < 5 select n;
    Console.WriteLine("Numbers < 5:");
    foreach (var x in lowNums) { Console.WriteLine(x); }
    Output: Numbers < 5: 4 1 3 2 0
  6. int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
    var textNums = from n in numbers select strings[n];
    Console.WriteLine("Number strings:");
    foreach (var s in textNums) { Console.WriteLine(s); }
    Output: Number strings: five four one three nine eight six seven two zero
  7. List<Product> products = GetProductList();
    var sortedProducts = from p in products orderby p.Category, p.UnitPrice descending select p;
    ObjectDumper.Write(sortedProducts);
    Output:
    ProductID=43 ProductName=Ipoh Coffee Category=Beverages UnitPrice=46.0000 UnitsInStock=17
    ProductID=2 ProductName=Chang Category=Beverages UnitPrice=19.0000 UnitsInStock=17
    ProductID=1 ProductName=Chai Category=Beverages UnitPrice=18.0000 UnitsInStock=39
    ProductID=35 ProductName=Steeleye Stout Category=Beverages UnitPrice=18.0000 UnitsInStock=20
    ProductID=39 ProductName=Chartreuse verte Category=Beverages UnitPrice=18.0000 UnitsInStock=69
    ProductID=70 ProductName=Outback Lager Category=Beverages UnitPrice=15.0000 UnitsInStock=15
    ProductID=34 ProductName=Sasquatch Ale Category=Beverages UnitPrice=14.0000 UnitsInStock=111
    ProductID=63 ProductName=Vegie-spread Category=Condiments UnitPrice=43.9000 UnitsInStock=24
    ProductID=8 ProductName=Northwoods Cranberry Sauce Category=Condiments UnitPrice=40.0000 UnitsInStock=6
    ProductID=6 ProductName=Grandma's Boysenberry Spread Category=Condiments UnitPrice=25.0000 UnitsInStock=120
    ProductID=4 ProductName=Chef Anton's Cajun Seasoning Category=Condiments UnitPrice=22.0000 UnitsInStock=53
    ProductID=5 ProductName=Chef Anton's Gumbo Mix Category=Condiments UnitPrice=21.3500 UnitsInStock=0
    ProductID=65 ProductName=Louisiana Fiery Hot Pepper Sauce Category=Condiments UnitPrice=21.0500 UnitsInStock=76
    ProductID=44 ProductName=Gula Malacca Category=Condiments UnitPrice=19.4500 UnitsInStock=27
    ProductID=66 ProductName=Louisiana Hot Spiced Okra Category=Condiments UnitPrice=17.0000 UnitsInStock=4
    ProductID=15 ProductName=Genen Shouyu Category=Condiments UnitPrice=15.5000 UnitsInStock=39
  8. string[] categories = { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
    List<Product> products = GetProductList();
    var q = from c in categories
            join p in products on c equals p.Category into ps
            from p in ps
            select new { Category = c, ProductName = p.ProductName };
    foreach (var v in q) { Console.WriteLine(v.ProductName + ": " + v.Category); }
    Output: Chai: Beverages Chang: Beverages Sasquatch Ale: Beverages Steeleye Stout: Beverages Chartreuse verte: Beverages Ipoh Coffee: Beverages Laughing Lumberjack Lager: Beverages Outback Lager: Beverages Aniseed Syrup: Condiments Chef Anton's Cajun Seasoning: Condiments Chef Anton's Gumbo Mix: Condiments Grandma's Boysenberry Spread: Condiments Northwoods Cranberry Sauce: Condiments Genen Shouyu: Condiments Gula Malacca: Condiments Vegie-spread: Condiments Louisiana Fiery Hot Pepper Sauce: Condiments Louisiana Hot Spiced Okra: Condiments Original Frankfurter: Condiments Queso Cabrales: Dairy Products Queso Manchego La Pastora: Dairy Products Gorgonzola Telino: Dairy Products Mascarpone Fabioli: Dairy Products Geitost: Dairy Products Raclette Courdavault: Dairy Products Camembert Pierrot: Dairy Products Gudbrandsdalsost: Dairy Products Flotemysost: Dairy Products Mozzarella di Giovanni: Dairy Products Ikura: Seafood Konbu: Seafood Carnarvon Tigers: Seafood Nord-Ost Matjeshering: Seafood Inlagd Sill: Seafood Gravad lax: Seafood Boston Crab Meat: Seafood Jack's New England Clam Chowder: Seafood Rogede sild: Seafood Spegesild: Seafood Escargots de Bourgogne: Seafood
  9. Q IN L
  10. LINQ: Language INtegrated Query, an Embedded Domain Specific Language
  11. Language: an Embedded Domain Specific Language
  12. The C# compiler's syntax rules translate a LINQ query expression into chained query operator method calls (a fluent interface), which the C# compiler backend then compiles to .NET bytecode:
    var longWords = from s in words where s.Length > 8 select s[2];
    IEnumerable<char> longWords = words.Where(s => s.Length > 8).Select(s => s[2]);
  13. LINQ Query Operators: over fifty operators for expressing queries
    Operator: Description
    Aggregate: Performs a custom method over a sequence
    Average: Computes the average of a sequence of numeric values
    Count: Returns the number of items in a sequence as an int
    LongCount: Returns the number of items in a sequence as a long
    Min: Finds the minimum of a sequence of numbers
    Max: Finds the maximum of a sequence of numbers
    Sum: Sums the numbers in a sequence
    Concat: Concatenates two sequences into one sequence
    Cast: Casts elements in a sequence to a given type
    OfType: Filters the elements of a sequence by a given type
    ToArray: Returns an Array from a sequence
    ToDictionary: Returns a Dictionary from a sequence
    ToList: Returns a List from a sequence
    ToLookup: Returns a Lookup from a sequence
    ToSequence: Returns an IEnumerable sequence
    DefaultIfEmpty: Creates a default element for an empty sequence
    ElementAt: Returns the element at a given index in a sequence
    ElementAtOrDefault: Returns the element at a given index in a sequence, or a default value
    First: Returns the first element of a sequence
    FirstOrDefault: Returns the first element of a sequence, or a default value if no element is found
    Last: Returns the last element of a sequence
    LastOrDefault: Returns the last element of a sequence, or a default value if no element is found
    Single: Returns the single element of a sequence
    SingleOrDefault: Returns the single element of a sequence, or a default value if no element is found
    SequenceEqual: Compares two sequences to see if they are equivalent
    Empty: Generates an empty sequence
    Range: Generates a sequence given a range
    Repeat: Generates a sequence by repeating an item a given number of times
    GroupBy: Groups items in a sequence by a given grouping
    GroupJoin: Performs a grouped join on two sequences
    Join: Performs an inner join on two sequences
    OrderBy: Orders a sequence by value(s) in ascending order
    OrderByDescending: Orders a sequence by value(s) in descending order
    ThenBy: Orders an already-ordered sequence in ascending order
    ThenByDescending: Orders an already-ordered sequence in descending order
    Reverse: Reverses the order of the items in a sequence
    Skip: Returns a sequence that skips a given number of items
    SkipWhile: Returns a sequence that skips items while they meet a condition, then yields the rest
    Take: Returns a sequence that takes a given number of items
    TakeWhile: Returns a sequence that takes items while they meet a condition
    Select: Creates a projection of parts of a sequence
    SelectMany: Creates a one-to-many projection of parts of a sequence
    All: Determines if all items in a sequence meet a condition
    Any: Determines if any items in a sequence meet a condition
    Contains: Determines if a sequence contains a given item
    Where: Filters the items in a sequence
    Distinct: Returns a sequence without duplicate items
    Except: Returns a sequence representing the difference between two sequences
    Intersect: Returns a sequence representing the intersection of two sequences
    Union: Returns a sequence representing the union of two sequences
  14. Two execution paths:
    .NET bytecode: program execution by the .NET virtual machine (CLR)
    LINQ Expression Tree: the expression tree is compiled or interpreted by a LINQ provider to get the result
  15. © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media
  16. Outline (repeated from slide 2)
  17. numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    nums_plus_one = [n + 1 for n in numbers]  # or (n + 1 for n in numbers) for a lazy generator expression
    print("Numbers + 1:")
    for i in nums_plus_one:
        print(i)
    Output: Numbers + 1: 6 5 2 4 10 9 7 8 3 1
  18. products = get_product_list()
    product_names = (p.product_name for p in products)
    print("Product Names:")
    for product_name in product_names:
        print(product_name)
    Output: same as slide 4.
  19. numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    low_nums = (n for n in numbers if n < 5)
    print("Numbers < 5:")
    for x in low_nums:
        print(x)
    Output: Numbers < 5: 4 1 3 2 0
  20. numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    strings = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
    text_nums = (strings[n] for n in numbers)
    print("Number strings:")
    for s in text_nums:
        print(s)
    Output: Number strings: five four one three nine eight six seven two zero
  21. words = ["cherry", "apple", "blueberry"]
    sorted_words = sorted(words, key=len)
    print("Sorted by length:")
    for w in sorted_words:
        print(w)
    Output: Sorted by length: apple cherry blueberry
  22. from operator import attrgetter
    products = GetProductList()
    # First sort by *secondary* key
    sorted_by_price = sorted(products, key=attrgetter('UnitPrice'), reverse=True)
    # Second sort by *primary* key
    sorted_products = sorted(sorted_by_price, key=attrgetter('Category'))
    print(sorted_products)
    Output: same products, in the same order, as slide 7.
  23. from collections import namedtuple
    categories = ["Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood"]
    products = GetProductList()
    CategorizedProduct = namedtuple('CategorizedProduct', ['name', 'category'])
    join_result = []
    for c in categories:
        for p in products:
            if p.Category == c:
                join_result.append(CategorizedProduct(name=p.ProductName, category=c))
    for v in join_result:
        print("{name}: {category}".format(name=v.name, category=v.category))
    Output: same as slide 8.
  24. Outline (repeated from slide 2)
  25. A Python implementation of LINQ to objects and Parallel LINQ to objects.
  26. Asq objectives
    ‣ Express complex queries using a so-called ‘fluent’ interface
    ‣ Support Python 2 and Python 3, including PyPy, Jython and IronPython
    ‣ Equivalent capabilities to .NET LINQ: support all LINQ query operators
    ‣ Extensible: allow clients to add new operators
    ‣ Reliable: usable in production environments
    >>> from asq.initiators import query
    >>> words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
    >>> query(words).order_by(len).then_by().take(5).select(str.upper).to_list()
    ['ONE', 'SIX', 'TEN', 'TWO', 'FIVE']
  27. (No text on this slide.)
  28. Introducing Asq queries
    The fluent interface must be bootstrapped with a query initiator. The asq.initiators.query initiator accepts any iterable.
    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from asq.initiators import query
    >>> numbers = [5, 4, 1, 3, 9]
    >>> query(numbers)
    Queryable(<list [5, 4, 1, 3, 9]>)
    >>> query(numbers).select(lambda n: n + 1)
    Queryable(<map [6, 5, 2, 4, 10]>)
    >>> query(numbers).select(lambda n: n + 1).to_list()
    [6, 5, 2, 4, 10]
    >>> list(query(numbers).select(lambda n: n + 1))
    [6, 5, 2, 4, 10]
  29. Load /usr/share/dict/words
    Read the system dictionary and use Asq to strip the whitespace from each line.
    >>> words_file = open('/usr/share/dict/words', 'r')
    >>> lines = words_file.readlines()
    >>> words_file.close()
    >>> lines
    ['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', ...]
    >>> from asq.initiators import query
    >>> query(lines)
    Queryable(<list ['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', ...]>)
    >>> query(lines).select(lambda line: line.strip())
    Queryable(<map ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]>)
    >>> words = query(lines).select(lambda line: line.strip()).to_list()
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]
  30. Files in Python are iterable
    Query the file directly, and replace the lambda by passing the str.strip method to select().
    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from asq.initiators import query
    >>> with open('/usr/share/dict/words', 'r') as words_file:
    ...     words = query(words_file).select(str.strip).to_list()
    ...
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]
    >>> query(words).count()
    235886
    >>> query(words).order_by(len).last()
    'thyroparathyroidectomize'
    >>> query(words).skip_while(lambda s: len(s) < 5).take(3).to_list()
    ['aalii', 'aam', 'Aani']
  31. group_by collates related items
    The Grouping type is a collection of objects sharing a key. Here we group all words by length.
    >>> query(words).group_by(len).to_list()
    [<Grouping(key=1, items=['A', 'a', 'B', 'b', 'C', 'c', ...])>,
     <Grouping(key=2, items=['aa', 'Ab', 'ad', 'ae', 'Ah', 'ah', ...])>,
     <Grouping(key=3, items=['aal', 'aam', 'aba', 'abb', 'Abe', 'Abo', ...])>,
     <Grouping(key=5, items=['aalii', 'Aaron', 'abaca', 'aback', 'abaff', 'abaft', ...])>,
     <Grouping(key=4, items=['Aani', 'Aaru', 'abac', 'abas', 'Abba', 'Abby', ...])>,
     ...
     <Grouping(key=24, items=['formaldehydesulphoxylate', 'pathologicopsychological', 'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize'])>]
  32. Creating ad hoc objects
    To pass multiple values down the query chain, it can be useful to group them in a Record type with fields named in the initializer call. Use asq.record.new to create Records conveniently.
    >>> from asq.record import new
    >>> query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).to_list()
    [Record(length=1, frequency=52), Record(length=2, frequency=160), Record(length=3, frequency=1420), Record(length=5, frequency=10230), Record(length=4, frequency=5272), Record(length=8, frequency=29989), Record(length=7, frequency=23869), Record(length=9, frequency=32403), Record(length=6, frequency=17706), Record(length=11, frequency=26013), Record(length=10, frequency=30878), Record(length=12, frequency=20462), Record(length=14, frequency=9765), Record(length=16, frequency=3377), Record(length=15, frequency=5925), Record(length=20, frequency=198), Record(length=19, frequency=428), Record(length=17, frequency=1813), Record(length=13, frequency=14939), Record(length=18, frequency=842), Record(length=21, frequency=82), Record(length=22, frequency=41), Record(length=23, frequency=17), Record(length=24, frequency=5)]
  33. Towards a histogram
    Convert each Record into a string consisting of the right-justified length followed by a bar of stars, one star per thousand words (rounded up).
    >>> import math
    >>> query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_list()
    [' 1 *\n', ' 2 *\n', ' 3 **\n', ' 5 ***********\n', ' 4 ******\n', ' 8 ******************************\n', ' 7 ************************\n', ' 9 *********************************\n', ' 6 ******************\n', '11 ***************************\n', '10 *******************************\n', '12 *********************\n', '14 **********\n', '16 ****\n', '15 ******\n', '20 *\n', '19 *\n', '17 **\n', '13 ***************\n', '18 *\n', '21 *\n', '22 *\n', '23 *\n', '24 *\n']
  34. Printing a histogram
    Order the records by length and use the to_str() query operator to concatenate the elements into a single string. Pass the result to print().
    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).order_by(lambda r: r.length).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_str())
     1 *
     2 *
     3 **
     4 ******
     5 ***********
     6 ******************
     7 ************************
     8 ******************************
     9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
  35. Creating selector functions
    Use asq.selectors.a_() to create an attribute selector rather than writing out a lambda longhand.
    >>> from asq.selectors import a_
    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).order_by(a_('length')).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')).to_str())
    (Prints the same histogram as slide 34.)
  36. Improved query formatting
    Use the backslash line-continuation character for improved readability.
    >>> print( query(words) \
    ...     .group_by(len) \
    ...     .select(lambda g: new(length=g.key, frequency=len(g))) \
    ...     .order_by(a_('length')) \
    ...     .select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')) \
    ...     .to_str() )
    (Prints the same histogram as slide 34.)
  37. Multi-key sorts
    Compose arbitrarily complex sorts.
    >>> print( query(words) \
    ...     .group_by(len) \
    ...     .select(lambda g: new(length=g.key, frequency=len(g))) \
    ...     .order_by_descending(a_('frequency')) \
    ...     .then_by(a_('length')) \
    ...     .select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')) \
    ...     .to_str())
     9 *********************************
    10 *******************************
     8 ******************************
    11 ***************************
     7 ************************
    12 *********************
     6 ******************
    13 ***************
     5 ***********
    14 **********
     4 ******
    15 ******
    16 ****
     3 **
    17 **
     1 *
     2 *
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
  38. Predicates
    Asq includes many predicates as more literate alternatives to lambdas.
    >>> query(words).where(lambda w: 'ox' in w).count()
    1351
    >>> from asq.predicates import *
    >>> query(words).where(contains_('ox')).count()
    1351
  39. Predicate combinators
    Predicate combinators such as xor_ can be used to combine other selectors or predicates.
    >>> query(words) \
    ...     .where(xor_(m_('startswith', 'ch'), m_('endswith', 'ing')))
    <Queryable(<filter(['abhorring', 'abiding', 'abounding', 'absorbing', 'abutting', 'accommodating', ...])>)>
  40. Asq includes predicate operators
    Predicate query operators include contains(), all() and any().
    >>> query(words) \
    ...     .where(xor_(m_('startswith', 'ch'), m_('endswith', 'ing'))) \
    ...     .contains('changing')
    False
  41. Short queries with optional args
    Most operators accept optional selectors or predicates.
    >>> query(words) \
    ...     .order_by(len) \
    ...     .skip_while(lambda w: len(w) < 8) \
    ...     .first()
    'aardvark'
    >>> query(words) \
    ...     .order_by(len) \
    ...     .first(lambda w: len(w) == 8)
    'aardvark'
  42. Laziness
    Queries are lazily evaluated, doing as little work as necessary.
    >>> query(words).first(contains_('EI'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "./asq/queryables.py", line 1496, in first
        return self._first() if predicate is None else self._first_predicate(predicate)
      File "./asq/queryables.py", line 1508, in _first_predicate
        raise ValueError("No elements matching predicate in call to first()")
    ValueError: No elements matching predicate in call to first()
    >>> query(words).select(str.upper).first(contains_('EI'))
    'ABEIGH'
  43. Logging
    Asq includes logging operators that can be incorporated into the query chain, used here to show laziness.
    >>> import logging
    >>> import sys
    >>> clog = logging.getLogger("clog")
    >>> clog.setLevel(logging.DEBUG)
    >>> clog.addHandler(logging.StreamHandler(sys.stdout))
    >>>
    >>> query(words) .log(clog, label='source') \
    ...     .select(str.upper) .log(clog, label='to-upper') \
    ...     .first(contains_('EI'))
    to-upper : BEGIN (DEFERRED)
    source : BEGIN (DEFERRED)
    source : [0] yields 'A'
    to-upper : [0] yields 'A'
    source : [1] yields 'a'
    to-upper : [1] yields 'A'
    source : [2] yields 'aa'
    to-upper : [2] yields 'AA'
    source : [3] yields 'aal'
    to-upper : [3] yields 'AAL'
    ...
    source : [167] yields 'abecedary'
    to-upper : [167] yields 'ABECEDARY'
    source : [168] yields 'abed'
    to-upper : [168] yields 'ABED'
    source : [169] yields 'abeigh'
    to-upper : [169] yields 'ABEIGH'
    'ABEIGH'
  44. Identity selectors can be omitted
    In most places where you would use the identity selector, lambda x: x, it can be omitted, which can also trigger optimizations.
    >>> query(words).order_by(len).then_by(lambda w: w).to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi', ... 'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein', 'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological', 'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']
    >>> query(words).order_by(len).then_by().to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi', ... 'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein', 'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological', 'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']
  45. Extending Asq
    Asq includes tools for adding new query operators.
    >>> from asq.extension import extend
    >>> from asq.queryables import Queryable
    >>> @extend(Queryable)
    ... def separate_with(self, separator):
    ...     def generator():
    ...         i = iter(self)
    ...         try:
    ...             yield next(i)
    ...         except StopIteration:
    ...             return
    ...         for item in i:
    ...             yield separator
    ...             yield item
    ...     return self._create(generator())
    ...
    >>> query(words).separate_with('**').take(10).to_list()
    ['A', '**', 'a', '**', 'aa', '**', 'aal', '**', 'aalii', '**']
  46. Outline (repeated from slide 2)
  47. © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media
  48. © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media
  49. Making Asq faster and scalable: parallel improvements
    ‣ Support parallel back ends
      • Currently uses the multiprocessing module
      • Could use the threading module (Jython, IronPython)
      • Considering an OpenCL backend for numeric arrays
    ‣ API changes required for parallel execution
      • Some query operators have order-dependent results requiring a different API, e.g. aggregate()
    ‣ Provide parallel implementations of more operators
      • Only a handful of operators have parallel implementations
  50. Thanks! http://asq.googlecode.com @robsmallshire
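
Some rows of the operator table in slide 13 are easier to pin down by example, especially SkipWhile and TakeWhile. The sketch below maps a few of those LINQ operators onto Python's standard library. The pairings are illustrative equivalents, not the way asq implements them.

```python
import itertools
from functools import reduce

numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]

# SkipWhile: drop leading items while the predicate holds, then yield the rest.
skip_while = list(itertools.dropwhile(lambda n: n > 3, numbers))
# TakeWhile: yield leading items while the predicate holds, then stop.
take_while = list(itertools.takewhile(lambda n: n > 3, numbers))
# Skip(2) followed by Take(3): positional slicing of any iterable.
skip_take = list(itertools.islice(numbers, 2, 5))
# Aggregate: a left fold over the sequence with a seed value.
total = reduce(lambda acc, n: acc + n, numbers, 0)
# All / Any: short-circuiting tests over the sequence.
all_small = all(n < 10 for n in numbers)
any_negative = any(n < 0 for n in numbers)

print(skip_while)   # [1, 3, 9, 8, 6, 7, 2, 0]
print(take_while)   # [5, 4]
print(skip_take)    # [1, 3, 9]
print(total, all_small, any_negative)  # 45 True False
```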
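
Slide 14 contrasts compiled bytecode with an expression tree that a LINQ provider inspects. Python lambdas do not expose such a tree. As a rough analogy only, the standard ast module can parse an expression's source text into a tree that a "provider" could inspect or compile. This sketch illustrates the concept; it is not how LINQ or asq work.

```python
import ast

# Parse the predicate's source into an expression tree instead of a function.
tree = ast.parse("n < 5", mode="eval")
comparison = tree.body  # an ast.Compare node

# A "provider" can inspect the tree's structure...
print(type(comparison).__name__)          # Compare
print(comparison.left.id)                 # n
print(type(comparison.ops[0]).__name__)   # Lt
print(comparison.comparators[0].value)    # 5

# ...or compile it and evaluate it against in-memory data.
code = compile(tree, filename="<query>", mode="eval")
numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
low_nums = [n for n in numbers if eval(code, {}, {"n": n})]
print(low_nums)  # [4, 1, 3, 2, 0]
```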
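
Slide 22 relies on the stability of sorted() by sorting on the secondary key first. When the descending key is numeric, a single sort with a tuple key is an alternative: negating the price gives descending order. The Product type and these five rows are hypothetical stand-ins for GetProductList(); the names and prices are copied from slide 7.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the objects returned by GetProductList().
Product = namedtuple("Product", ["ProductName", "Category", "UnitPrice"])

products = [
    Product("Chai", "Beverages", 18.00),
    Product("Vegie-spread", "Condiments", 43.90),
    Product("Ipoh Coffee", "Beverages", 46.00),
    Product("Genen Shouyu", "Condiments", 15.50),
    Product("Chang", "Beverages", 19.00),
]

# Two stable sorts, as on slide 22: secondary key first, then primary key.
two_pass = sorted(
    sorted(products, key=attrgetter("UnitPrice"), reverse=True),
    key=attrgetter("Category"),
)

# One sort with a tuple key: Category ascending, UnitPrice descending.
one_pass = sorted(products, key=lambda p: (p.Category, -p.UnitPrice))

for p in one_pass:
    print(p.Category, p.ProductName, p.UnitPrice)
```

The negation trick only works for keys that can be negated. For descending strings, the two-pass stable sort from the slide is the general approach.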
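
The nested loops in slide 23 compare every category with every product. A common alternative for an equi-join is to index the inner sequence by key once and then probe that index for each outer item. This is similar in spirit to a lookup-based join, but the hash_join helper and the sample rows below are hypothetical illustrations.

```python
from collections import defaultdict, namedtuple

Product = namedtuple("Product", ["ProductName", "Category"])
CategorizedProduct = namedtuple("CategorizedProduct", ["name", "category"])

# Hypothetical sample data.
categories = ["Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood"]
products = [
    Product("Chai", "Beverages"),
    Product("Aniseed Syrup", "Condiments"),
    Product("Ikura", "Seafood"),
    Product("Chang", "Beverages"),
    Product("Geitost", "Dairy Products"),
]


def hash_join(outer, inner, outer_key, inner_key, result):
    """Inner equi-join: index `inner` by key once, then probe it per outer item."""
    index = defaultdict(list)
    for item in inner:
        index[inner_key(item)].append(item)
    for o in outer:
        for i in index.get(outer_key(o), []):
            yield result(o, i)


join_result = list(hash_join(
    categories, products,
    outer_key=lambda c: c,
    inner_key=lambda p: p.Category,
    result=lambda c, p: CategorizedProduct(name=p.ProductName, category=c),
))
for v in join_result:
    print("{name}: {category}".format(name=v.name, category=v.category))
```

Output order follows the outer sequence (categories), and within a category it follows the original product order. This matches the nested-loop version on the slide while doing one pass over each input.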
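
Slide 28 shows query() wrapping any iterable in a Queryable whose methods return further Queryables. Below is a minimal sketch of that fluent pattern built on Python's lazy map() and filter(). MiniQuery and this query() function are hypothetical names; they are not asq's actual implementation.

```python
class MiniQuery:
    """Hypothetical fluent wrapper around any iterable (not asq's real class)."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def select(self, selector):
        # Lazy projection: map() does no work until iterated.
        return MiniQuery(map(selector, self))

    def where(self, predicate):
        # Lazy filtering: filter() does no work until iterated.
        return MiniQuery(filter(predicate, self))

    def to_list(self):
        # Terminal operator: forces evaluation of the whole chain.
        return list(self)


def query(iterable):
    """Hypothetical initiator playing the role of asq.initiators.query."""
    return MiniQuery(iterable)


numbers = [5, 4, 1, 3, 9]
print(query(numbers).select(lambda n: n + 1).to_list())                          # [6, 5, 2, 4, 10]
print(query(numbers).where(lambda n: n < 5).select(lambda n: n * 10).to_list())  # [40, 10, 30]
```

Because map() and filter() return single-use iterators, a derived MiniQuery can be consumed only once. A query over the original list can be rebuilt and run as often as needed.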
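
In slide 31 the groups appear in order of first occurrence: length 5 comes before length 4 because 'aalii' precedes 'Aani'. itertools.groupby only groups consecutive runs, so a plain-Python sketch of the same collation uses a dict instead. A dict preserves insertion order in Python 3.7 and later. The group_by helper is a hypothetical illustration, and the sample is the word order shown on slide 30.

```python
import itertools


def group_by(iterable, key):
    """Collate items sharing a key, keeping keys in order of first appearance."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


# The first seven words in the dictionary order shown on slide 30.
words = ["A", "a", "aa", "aal", "aalii", "aam", "Aani"]
collated = group_by(words, len)
print(collated)
# [(1, ['A', 'a']), (2, ['aa']), (3, ['aal', 'aam']), (5, ['aalii']), (4, ['Aani'])]

# itertools.groupby, by contrast, starts a new group whenever the key changes.
runs = [(k, list(g)) for k, g in itertools.groupby(words, len)]
print(runs)
# [(1, ['A', 'a']), (2, ['aa']), (3, ['aal']), (5, ['aalii']), (3, ['aam']), (4, ['Aani'])]
```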
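
Slide 32's asq.record.new builds ad hoc records with named fields. Without asq, types.SimpleNamespace from the standard library plays a similar role. The sketch uses the first five (length, frequency) pairs from that slide; SimpleNamespace is a stand-in here, not what asq uses.

```python
from types import SimpleNamespace

# (length, frequency) pairs copied from slide 32.
counts = [(1, 52), (2, 160), (3, 1420), (5, 10230), (4, 5272)]

# Build one ad hoc record per pair, with fields named at construction time.
records = [SimpleNamespace(length=k, frequency=n) for k, n in counts]

# Fields are then accessed by name further down the pipeline.
by_length = sorted(records, key=lambda r: r.length)
print([r.length for r in by_length])      # [1, 2, 3, 4, 5]
print([r.frequency for r in by_length])   # [52, 160, 1420, 5272, 10230]
```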
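
The histogram built on slides 33 to 35 can also be sketched with collections.Counter. The format string and the rounding rule (one star per per_star words, rounded up with math.ceil, as on slide 33) follow the slides. The histogram helper and the tiny word sample are hypothetical; per_star=1 is used so each word contributes one star.

```python
import math
from collections import Counter


def histogram(words, per_star=1000):
    """Return one line per word length: right-justified length, then a bar of stars."""
    frequencies = Counter(len(w) for w in words)
    return "".join(
        "{length:>2} {bar}\n".format(length=length, bar=math.ceil(n / per_star) * "*")
        for length, n in sorted(frequencies.items())
    )


# Tiny made-up sample.
sample = ["A", "a", "aa", "aal", "aam", "aalii", "Aani"]
print(histogram(sample, per_star=1), end="")
```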
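
Slides 37 and 44 chain order_by_descending() and then_by(), and slide 44 notes that then_by() can omit the identity selector. One way to sketch such composable sorts is to record each (key, descending) pair. When the result is consumed, stable sorts are then applied from the last key back to the first. OrderedSketch is a hypothetical name, and this is not asq's implementation.

```python
def identity(x):
    """Default selector used when no key is given."""
    return x


class OrderedSketch:
    """Hypothetical composable multi-key sort; not asq's actual implementation."""

    def __init__(self, iterable, keys):
        self._iterable = iterable
        self._keys = keys  # (key_function, descending) pairs, primary key first

    @classmethod
    def order_by(cls, iterable, key=identity):
        return cls(iterable, [(key, False)])

    @classmethod
    def order_by_descending(cls, iterable, key=identity):
        return cls(iterable, [(key, True)])

    def then_by(self, key=identity):
        # Omitting the key falls back to the identity selector.
        return OrderedSketch(self._iterable, self._keys + [(key, False)])

    def then_by_descending(self, key=identity):
        return OrderedSketch(self._iterable, self._keys + [(key, True)])

    def to_list(self):
        items = list(self._iterable)
        # Stable sorts, applied from the least significant key to the most significant.
        for key, descending in reversed(self._keys):
            items.sort(key=key, reverse=descending)
        return items


words = ["six", "one", "three", "two", "ten", "four", "five"]
print(OrderedSketch.order_by(words, len).then_by().to_list())
# ['one', 'six', 'ten', 'two', 'five', 'four', 'three']
print(OrderedSketch.order_by_descending(words, len).then_by().to_list())
# ['three', 'five', 'four', 'one', 'six', 'ten', 'two']
```

This relies on list.sort() being stable even with reverse=True. Each later (more significant) sort therefore keeps the order established by the earlier ones among equal keys.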
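
Slides 38 to 40 use predicate and selector factories such as contains_(), m_() and xor_(). Their real definitions live in asq. The sketch below shows one plausible way such factories can be written as closures. It uses the hypothetical names contains_p, method_p and xor_p so as not to imply they match asq's code.

```python
def contains_p(substring):
    """Hypothetical predicate factory: true when the item contains `substring`."""
    return lambda item: substring in item


def method_p(name, *args):
    """Hypothetical selector factory: call method `name` on the item with `args`."""
    return lambda item: getattr(item, name)(*args)


def xor_p(p, q):
    """Hypothetical combinator: true when exactly one of `p` and `q` holds."""
    return lambda item: bool(p(item)) != bool(q(item))


# Made-up sample words.
words = ["chair", "changing", "abiding", "cheering", "ox", "boxing"]

starts_xor_ends = xor_p(method_p("startswith", "ch"), method_p("endswith", "ing"))
picked = [w for w in words if starts_xor_ends(w)]
ox_count = sum(1 for w in words if contains_p("ox")(w))

print(picked)    # ['chair', 'abiding', 'boxing']
print(ox_count)  # 2
```

As on slide 40, 'changing' both starts with "ch" and ends with "ing", so the exclusive-or rejects it.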
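
Slide 42's point is that first() pulls only as many items as it needs. The sketch counts how many items a lazy pipeline actually draws from its source. The counting wrapper is an illustration, not an asq feature, and next() over a generator expression stands in for first(). The short word list is sample data.

```python
def counted(iterable, counter):
    """Yield items unchanged while recording how many were pulled."""
    for item in iterable:
        counter["pulled"] += 1
        yield item


# Sample words in dictionary order.
words = ["A", "a", "aa", "abed", "abeigh", "abele", "abelian"]
counter = {"pulled": 0}

upper = (w.upper() for w in counted(words, counter))
first_match = next(w for w in upper if "EI" in w)

print(first_match)        # ABEIGH
print(counter["pulled"])  # 5 of the 7 words
```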
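
The interleaved log on slide 43 arises because each stage pulls one item at a time from the stage before it. The sketch below reproduces the same interleaving with plain generators. traced() is a hypothetical helper that appends to a list; it is not asq's log() operator and does not use the logging module.

```python
def traced(iterable, label, log):
    """Hypothetical tracing stage: pass items through unchanged, recording each one."""
    log.append("{} : BEGIN".format(label))  # runs on the first pull, not at creation
    for index, item in enumerate(iterable):
        log.append("{} : [{}] yields {!r}".format(label, index, item))
        yield item


log = []
words = ["A", "a", "abeigh"]  # made-up sample
source = traced(words, "source", log)
upper = traced((w.upper() for w in source), "to-upper", log)
result = next(w for w in upper if "EI" in w)

print("\n".join(log))
print(result)  # ABEIGH
```

As on the slide, the outer stage begins first, then pulls from the inner one. Each source item is logged immediately before its uppercase counterpart.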
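
Slide 45 adds an operator with asq.extension.extend. The sketch below shows one way a class-extending decorator of that kind could work: it attaches the function to the class with setattr(). extend_class and Seq are hypothetical names, and asq's own decorator may differ. The separate_with body is the one from the slide.

```python
def extend_class(cls):
    """Hypothetical decorator factory: attach the decorated function to `cls`."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Seq:
    """Hypothetical minimal queryable wrapper."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def to_list(self):
        return list(self)


@extend_class(Seq)
def separate_with(self, separator):
    """Yield the items with `separator` between each adjacent pair."""
    def generator():
        i = iter(self)
        try:
            yield next(i)
        except StopIteration:
            return
        for item in i:
            yield separator
            yield item
    return Seq(generator())


print(Seq(["A", "a", "aa"]).separate_with("**").to_list())
# ['A', '**', 'a', '**', 'aa']
```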
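
Slide 49 notes that order-dependent operators such as aggregate() need a different API when run in parallel. The sketch illustrates why, using concurrent.futures.ThreadPoolExecutor; the slide says asq currently uses multiprocessing, so threads are an assumption here. An element-wise select can be split across workers because Executor.map returns results in input order. A fold with a non-associative function, however, gives a different answer once the input is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

numbers = list(range(1, 9))

# Element-wise projection: safe to parallelise; Executor.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(lambda n: n * n, numbers))
print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64]

# Sequential left fold with subtraction (non-associative): 1 - 2 - 3 - ... - 8
sequential = reduce(lambda a, b: a - b, numbers)

# The same fold over two partitions, then combined: the partitioning changes the answer.
halves = [numbers[:4], numbers[4:]]
partials = [reduce(lambda a, b: a - b, part) for part in halves]
partitioned = reduce(lambda a, b: a - b, partials)

print(sequential, partitioned)  # -34 8
```

In CPython, threads will not speed up CPU-bound pure-Python work because of the global interpreter lock. The threading option on the slide is aimed at Jython and IronPython, which do not have a GIL.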