Slide 1

Slide 1 text

Implementing LINQ in Python
Robert Smallshire (@robsmallshire)
Oslo Python Meetup #OsloPython, Wednesday, 24 October 2012

Slide 2

Slide 2 text

1. Introducing LINQ: What is LINQ and how does it work?
2. Does Python need LINQ? Querying collections in Python
3. Asq! A LINQ-to-objects implementation for Python
4. Next steps: Forthcoming capabilities in Asq

Slide 3

Slide 3 text

    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var numsPlusOne = from n in numbers select n + 1;
    Console.WriteLine("Numbers + 1:");
    foreach (var i in numsPlusOne) { Console.WriteLine(i); }

Output:

    Numbers + 1:
    6
    5
    2
    4
    10
    9
    7
    8
    3
    1

Slide 4

Slide 4 text

    List<Product> products = GetProductList();
    var productNames = from p in products select p.ProductName;
    Console.WriteLine("Product Names:");
    foreach (var productName in productNames) { Console.WriteLine(productName); }

Output:

    Product Names:
    Chai
    Chang
    Aniseed Syrup
    Chef Anton's Cajun Seasoning
    Chef Anton's Gumbo Mix
    Grandma's Boysenberry Spread
    Uncle Bob's Organic Dried Pears
    Northwoods Cranberry Sauce
    Mishi Kobe Niku
    Ikura
    Queso Cabrales
    Queso Manchego La Pastora
    Konbu
    Tofu
    Genen Shouyu
    Pavlova
    Alice Mutton
    Carnarvon Tigers
    Teatime Chocolate Biscuits
    Sir Rodney's Marmalade
    Sir Rodney's Scones
    Nord-Ost Matjeshering
    Gorgonzola Telino
    Mascarpone Fabioli
    Geitost

Slide 5

Slide 5 text

    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var lowNums = from n in numbers where n < 5 select n;
    Console.WriteLine("Numbers < 5:");
    foreach (var x in lowNums) { Console.WriteLine(x); }

Output:

    Numbers < 5:
    4
    1
    3
    2
    0

Slide 6

Slide 6 text

    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
    var textNums = from n in numbers select strings[n];
    Console.WriteLine("Number strings:");
    foreach (var s in textNums) { Console.WriteLine(s); }

Output:

    Number strings:
    five
    four
    one
    three
    nine
    eight
    six
    seven
    two
    zero

Slide 7

Slide 7 text

    List<Product> products = GetProductList();
    var sortedProducts =
        from p in products
        orderby p.Category, p.UnitPrice descending
        select p;
    ObjectDumper.Write(sortedProducts);

Output:

    ProductID=43 ProductName=Ipoh Coffee Category=Beverages UnitPrice=46.0000 UnitsInStock=17
    ProductID=2 ProductName=Chang Category=Beverages UnitPrice=19.0000 UnitsInStock=17
    ProductID=1 ProductName=Chai Category=Beverages UnitPrice=18.0000 UnitsInStock=39
    ProductID=35 ProductName=Steeleye Stout Category=Beverages UnitPrice=18.0000 UnitsInStock=20
    ProductID=39 ProductName=Chartreuse verte Category=Beverages UnitPrice=18.0000 UnitsInStock=69
    ProductID=70 ProductName=Outback Lager Category=Beverages UnitPrice=15.0000 UnitsInStock=15
    ProductID=34 ProductName=Sasquatch Ale Category=Beverages UnitPrice=14.0000 UnitsInStock=111
    ProductID=63 ProductName=Vegie-spread Category=Condiments UnitPrice=43.9000 UnitsInStock=24
    ProductID=8 ProductName=Northwoods Cranberry Sauce Category=Condiments UnitPrice=40.0000 UnitsInStock=6
    ProductID=6 ProductName=Grandma's Boysenberry Spread Category=Condiments UnitPrice=25.0000 UnitsInStock=120
    ProductID=4 ProductName=Chef Anton's Cajun Seasoning Category=Condiments UnitPrice=22.0000 UnitsInStock=53
    ProductID=5 ProductName=Chef Anton's Gumbo Mix Category=Condiments UnitPrice=21.3500 UnitsInStock=0
    ProductID=65 ProductName=Louisiana Fiery Hot Pepper Sauce Category=Condiments UnitPrice=21.0500 UnitsInStock=76
    ProductID=44 ProductName=Gula Malacca Category=Condiments UnitPrice=19.4500 UnitsInStock=27
    ProductID=66 ProductName=Louisiana Hot Spiced Okra Category=Condiments UnitPrice=17.0000 UnitsInStock=4
    ProductID=15 ProductName=Genen Shouyu Category=Condiments UnitPrice=15.5000 UnitsInStock=39

Slide 8

Slide 8 text

    string[] categories = { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
    List<Product> products = GetProductList();
    var q =
        from c in categories
        join p in products on c equals p.Category into ps
        from p in ps
        select new { Category = c, ProductName = p.ProductName };
    foreach (var v in q) { Console.WriteLine(v.ProductName + ": " + v.Category); }

Output:

    Chai: Beverages
    Chang: Beverages
    Sasquatch Ale: Beverages
    Steeleye Stout: Beverages
    Chartreuse verte: Beverages
    Ipoh Coffee: Beverages
    Laughing Lumberjack Lager: Beverages
    Outback Lager: Beverages
    Aniseed Syrup: Condiments
    Chef Anton's Cajun Seasoning: Condiments
    Chef Anton's Gumbo Mix: Condiments
    Grandma's Boysenberry Spread: Condiments
    Northwoods Cranberry Sauce: Condiments
    Genen Shouyu: Condiments
    Gula Malacca: Condiments
    Vegie-spread: Condiments
    Louisiana Fiery Hot Pepper Sauce: Condiments
    Louisiana Hot Spiced Okra: Condiments
    Original Frankfurter: Condiments
    Queso Cabrales: Dairy Products
    Queso Manchego La Pastora: Dairy Products
    Gorgonzola Telino: Dairy Products
    Mascarpone Fabioli: Dairy Products
    Geitost: Dairy Products
    Raclette Courdavault: Dairy Products
    Camembert Pierrot: Dairy Products
    Gudbrandsdalsost: Dairy Products
    Flotemysost: Dairy Products
    Mozzarella di Giovanni: Dairy Products
    Ikura: Seafood
    Konbu: Seafood
    Carnarvon Tigers: Seafood
    Nord-Ost Matjeshering: Seafood
    Inlagd Sill: Seafood
    Gravad lax: Seafood
    Boston Crab Meat: Seafood
    Jack's New England Clam Chowder: Seafood
    Rogede sild: Seafood
    Spegesild: Seafood
    Escargots de Bourgogne: Seafood

Slide 9

Slide 9 text

LINQ

Slide 10

Slide 10 text

LINQ: Language INtegrated Query, an Embedded Domain Specific Language

Slide 11

Slide 11 text

Language: an Embedded Domain Specific Language

Slide 12

Slide 12 text

A LINQ query expression:

    var longWords = from s in words where s.Length > 8 select s[2];

is translated, according to the C# compiler's syntax rules, into chained query operator method calls (a fluent interface):

    IQueryable<char> longWords = words.Where(s => s.Length > 8).Select(s => s[2]);

which the C# compiler backend then compiles to .NET bytecode.
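The translation relies on each query operator returning a new object that supports the next operator in the chain. As a rough Python analogue, here is a minimal sketch of such a fluent wrapper; the `Fluent` class and its methods are hypothetical illustrations, not the implementation used by .NET or by Asq.

```python
class Fluent:
    """Hypothetical minimal fluent wrapper over any iterable."""

    def __init__(self, iterable):
        self._iterable = iterable

    def where(self, predicate):
        # Wrap a lazy filtering generator so further operators can be chained
        return Fluent(item for item in self._iterable if predicate(item))

    def select(self, selector):
        # Wrap a lazy projecting generator so further operators can be chained
        return Fluent(selector(item) for item in self._iterable)

    def __iter__(self):
        return iter(self._iterable)

    def to_list(self):
        # Force evaluation of the whole chain
        return list(self._iterable)


words = ["apple", "blueberry", "strawberry", "fig"]
long_words = Fluent(words).where(lambda s: len(s) > 8).select(lambda s: s[2])
print(long_words.to_list())  # ['u', 'r']
```

Nothing is computed until `to_list()` (or iteration) pulls items through the chain, which mirrors LINQ's deferred execution.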

Slide 13

Slide 13 text

LINQ Query Operators

Over fifty operators for expressing queries:

Aggregate: Applies a custom accumulator function over a sequence
Average: Computes the average of a sequence of numeric values
Count: Returns the number of items in a sequence as an int
LongCount: Returns the number of items in a sequence as a long
Min: Finds the minimum value in a sequence of numbers
Max: Finds the maximum value in a sequence of numbers
Sum: Sums the numbers in a sequence
Concat: Concatenates two sequences into one sequence
Cast: Casts the elements of a sequence to a given type
OfType: Filters the elements of a sequence, keeping only those of a given type
ToArray: Returns an array from a sequence
ToDictionary: Returns a Dictionary from a sequence
ToList: Returns a List from a sequence
ToLookup: Returns a Lookup from a sequence
AsEnumerable: Returns the sequence typed as an IEnumerable
DefaultIfEmpty: Creates a default element for an empty sequence
ElementAt: Returns the element at a given index in a sequence
ElementAtOrDefault: Returns the element at a given index in a sequence, or a default value if the index is out of range
First: Returns the first element of a sequence
FirstOrDefault: Returns the first element of a sequence, or a default value if no element is found
Last: Returns the last element of a sequence
LastOrDefault: Returns the last element of a sequence, or a default value if no element is found
Single: Returns the only element of a sequence
SingleOrDefault: Returns the only element of a sequence, or a default value if no element is found
SequenceEqual: Compares two sequences to see if they are equivalent
Empty: Generates an empty sequence
Range: Generates a sequence of integers within a given range
Repeat: Generates a sequence by repeating an item a given number of times
GroupBy: Groups the items in a sequence by a given key
GroupJoin: Performs a grouped join on two sequences
Join: Performs an inner join on two sequences
OrderBy: Orders a sequence by value(s) in ascending order
OrderByDescending: Orders a sequence by value(s) in descending order
ThenBy: Applies a further ascending ordering to an already-ordered sequence
ThenByDescending: Applies a further descending ordering to an already-ordered sequence
Reverse: Reverses the order of the items in a sequence
Skip: Returns a sequence that skips a given number of items
SkipWhile: Returns a sequence that skips items while they meet a condition, then yields all remaining items
Take: Returns a sequence that takes a given number of items
TakeWhile: Returns a sequence that takes items while they meet a condition
Select: Creates a projection of parts of a sequence
SelectMany: Creates a one-to-many projection of parts of a sequence, flattened into a single sequence
All: Determines whether all items in a sequence meet a condition
Any: Determines whether any item in a sequence meets a condition
Contains: Determines whether a sequence contains a given item
Where: Filters the items in a sequence
Distinct: Returns a sequence without duplicate items
Except: Returns a sequence representing the difference between two sequences
Intersect: Returns a sequence representing the intersection of two sequences
Union: Returns a sequence representing the union of two sequences
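Several of these operators have close counterparts in Python's standard library. The pairing below is an informal sketch of our own, not an official correspondence; note in particular that SkipWhile drops elements only while the condition holds, then yields everything that follows, exactly like `itertools.dropwhile`.

```python
from itertools import chain, dropwhile, islice, takewhile

numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]

skip_3 = list(islice(numbers, 3, None))                 # Skip(3)
take_3 = list(islice(numbers, 3))                       # Take(3)
skip_while = list(dropwhile(lambda n: n > 2, numbers))  # SkipWhile(n => n > 2)
take_while = list(takewhile(lambda n: n > 2, numbers))  # TakeWhile(n => n > 2)
concat = list(chain(numbers[:2], numbers[-2:]))         # Concat
any_big = any(n > 8 for n in numbers)                   # Any(n => n > 8)
all_small = all(n < 10 for n in numbers)                # All(n => n < 10)
distinct = list(dict.fromkeys([1, 2, 1, 3, 2]))         # Distinct, keeping first occurrences

print(skip_3)      # [3, 9, 8, 6, 7, 2, 0]
print(take_3)      # [5, 4, 1]
print(skip_while)  # [1, 3, 9, 8, 6, 7, 2, 0]
print(take_while)  # [5, 4]
print(concat)      # [5, 4, 2, 0]
print(distinct)    # [1, 2, 3]
```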

Slide 14

Slide 14 text

The .NET bytecode is executed by the .NET virtual machine (CLR), which builds a LINQ expression tree. The LINQ provider then compiles or interprets the expression tree to obtain the result.
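The important point is that the query becomes data rather than opaque code, so a provider can either evaluate it in memory or translate it into another language such as SQL. The sketch below illustrates that idea in Python; the node classes (`Attr`, `Const`, `GreaterThan`) and the `to_sql` translation are hypothetical and far simpler than .NET's real expression trees.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class Attr:
    name: str

    def evaluate(self, row):
        return getattr(row, self.name)

    def to_sql(self):
        return self.name


@dataclass
class Const:
    value: Any

    def evaluate(self, row):
        return self.value

    def to_sql(self):
        return repr(self.value)


@dataclass
class GreaterThan:
    left: Any
    right: Any

    def evaluate(self, row):
        # An "interpreting" provider: evaluate the tree against an in-memory object
        return self.left.evaluate(row) > self.right.evaluate(row)

    def to_sql(self):
        # A "translating" provider: render the same tree as a SQL fragment
        return f"{self.left.to_sql()} > {self.right.to_sql()}"


condition = GreaterThan(Attr("UnitPrice"), Const(20))
products = [
    SimpleNamespace(ProductName="Chai", UnitPrice=18.0),
    SimpleNamespace(ProductName="Ipoh Coffee", UnitPrice=46.0),
]
print([p.ProductName for p in products if condition.evaluate(p)])  # ['Ipoh Coffee']
print("SELECT * FROM Products WHERE " + condition.to_sql())
```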

Slide 15

Slide 15 text

(Figure) © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

Slide 16

Slide 16 text

Part 2: Does Python need LINQ? Querying collections in Python

Slide 17

Slide 17 text

    numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    nums_plus_one = [n + 1 for n in numbers]
    print("Numbers + 1:")
    for i in nums_plus_one:
        print(i)

Output:

    Numbers + 1:
    6
    5
    2
    4
    10
    9
    7
    8
    3
    1

Slide 18

Slide 18 text

    products = get_product_list()
    product_names = (p.product_name for p in products)
    print("Product Names:")
    for product_name in product_names:
        print(product_name)

Output:

    Product Names:
    Chai
    Chang
    Aniseed Syrup
    Chef Anton's Cajun Seasoning
    Chef Anton's Gumbo Mix
    Grandma's Boysenberry Spread
    Uncle Bob's Organic Dried Pears
    Northwoods Cranberry Sauce
    Mishi Kobe Niku
    Ikura
    Queso Cabrales
    Queso Manchego La Pastora
    Konbu
    Tofu
    Genen Shouyu
    Pavlova
    Alice Mutton
    Carnarvon Tigers
    Teatime Chocolate Biscuits
    Sir Rodney's Marmalade
    Sir Rodney's Scones
    Nord-Ost Matjeshering
    Gorgonzola Telino
    Mascarpone Fabioli
    Geitost

Slide 19

Slide 19 text

    numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    low_nums = (n for n in numbers if n < 5)
    print("Numbers < 5:")
    for x in low_nums:
        print(x)

Output:

    Numbers < 5:
    4
    1
    3
    2
    0
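One difference from C# is worth knowing: a deferred C# query such as lowNums can be enumerated repeatedly, re-running the query each time, whereas a Python generator expression is exhausted after a single pass. A short demonstration, where the function `low_nums_query` is just an illustrative name:

```python
numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
low_nums = (n for n in numbers if n < 5)

first_pass = list(low_nums)
second_pass = list(low_nums)  # the generator is already exhausted

print(first_pass)   # [4, 1, 3, 2, 0]
print(second_pass)  # []


def low_nums_query():
    """Return a fresh generator each call, giving a re-runnable query."""
    return (n for n in numbers if n < 5)


print(list(low_nums_query()) == list(low_nums_query()))  # True
```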

Slide 20

Slide 20 text

    numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
    strings = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
    text_nums = (strings[n] for n in numbers)
    print("Number strings:")
    for s in text_nums:
        print(s)

Output:

    Number strings:
    five
    four
    one
    three
    nine
    eight
    six
    seven
    two
    zero

Slide 21

Slide 21 text

    words = ["cherry", "apple", "blueberry"]
    sorted_words = sorted(words, key=len)
    print("Sorted by length:")
    for w in sorted_words:
        print(w)

Output:

    Sorted by length:
    apple
    cherry
    blueberry

Slide 22

Slide 22 text

    from operator import attrgetter

    products = GetProductList()
    # First sort by the *secondary* key (sorted() is stable)
    sorted_by_price = sorted(products, key=attrgetter('UnitPrice'), reverse=True)
    # Then sort by the *primary* key
    sorted_products = sorted(sorted_by_price, key=attrgetter('Category'))
    for p in sorted_products:
        print(p)

Output:

    ProductID=43 ProductName=Ipoh Coffee Category=Beverages UnitPrice=46.0000 UnitsInStock=17
    ProductID=2 ProductName=Chang Category=Beverages UnitPrice=19.0000 UnitsInStock=17
    ProductID=1 ProductName=Chai Category=Beverages UnitPrice=18.0000 UnitsInStock=39
    ProductID=35 ProductName=Steeleye Stout Category=Beverages UnitPrice=18.0000 UnitsInStock=20
    ProductID=39 ProductName=Chartreuse verte Category=Beverages UnitPrice=18.0000 UnitsInStock=69
    ProductID=70 ProductName=Outback Lager Category=Beverages UnitPrice=15.0000 UnitsInStock=15
    ProductID=34 ProductName=Sasquatch Ale Category=Beverages UnitPrice=14.0000 UnitsInStock=111
    ProductID=63 ProductName=Vegie-spread Category=Condiments UnitPrice=43.9000 UnitsInStock=24
    ProductID=8 ProductName=Northwoods Cranberry Sauce Category=Condiments UnitPrice=40.0000 UnitsInStock=6
    ProductID=6 ProductName=Grandma's Boysenberry Spread Category=Condiments UnitPrice=25.0000 UnitsInStock=120
    ProductID=4 ProductName=Chef Anton's Cajun Seasoning Category=Condiments UnitPrice=22.0000 UnitsInStock=53
    ProductID=5 ProductName=Chef Anton's Gumbo Mix Category=Condiments UnitPrice=21.3500 UnitsInStock=0
    ProductID=65 ProductName=Louisiana Fiery Hot Pepper Sauce Category=Condiments UnitPrice=21.0500 UnitsInStock=76
    ProductID=44 ProductName=Gula Malacca Category=Condiments UnitPrice=19.4500 UnitsInStock=27
    ProductID=66 ProductName=Louisiana Hot Spiced Okra Category=Condiments UnitPrice=17.0000 UnitsInStock=4
    ProductID=15 ProductName=Genen Shouyu Category=Condiments UnitPrice=15.5000 UnitsInStock=39
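The two-pass approach works because `sorted()` is stable. When the secondary key is numeric, a single sort with a composite key is an alternative. The sketch below uses a small hypothetical `Product` namedtuple standing in for `GetProductList()`, and negates the price to get descending order within each category.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the products returned by GetProductList()
Product = namedtuple("Product", ["ProductName", "Category", "UnitPrice"])
products = [
    Product("Chai", "Beverages", 18.0),
    Product("Vegie-spread", "Condiments", 43.9),
    Product("Ipoh Coffee", "Beverages", 46.0),
    Product("Genen Shouyu", "Condiments", 15.5),
    Product("Chang", "Beverages", 19.0),
]

# Two stable sorts: secondary key first, then primary key
by_price = sorted(products, key=attrgetter("UnitPrice"), reverse=True)
two_pass = sorted(by_price, key=attrgetter("Category"))

# One sort with a composite key; negating the price gives descending order
one_pass = sorted(products, key=lambda p: (p.Category, -p.UnitPrice))

for p in one_pass:
    print(p.Category, p.UnitPrice, p.ProductName)
```

The negation trick only works for numeric keys; for descending string keys, the two-pass stable sort remains the simpler option.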

Slide 23

Slide 23 text

    from collections import namedtuple

    categories = ["Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood"]
    products = GetProductList()
    CategorizedProduct = namedtuple('CategorizedProduct', ['name', 'category'])

    join_result = []
    for c in categories:
        for p in products:
            if p.Category == c:
                join_result.append(CategorizedProduct(name=p.ProductName, category=c))

    for v in join_result:
        print("{name}: {category}".format(name=v.name, category=v.category))

Output:

    Chai: Beverages
    Chang: Beverages
    Sasquatch Ale: Beverages
    Steeleye Stout: Beverages
    Chartreuse verte: Beverages
    Ipoh Coffee: Beverages
    Laughing Lumberjack Lager: Beverages
    Outback Lager: Beverages
    Aniseed Syrup: Condiments
    Chef Anton's Cajun Seasoning: Condiments
    Chef Anton's Gumbo Mix: Condiments
    Grandma's Boysenberry Spread: Condiments
    Northwoods Cranberry Sauce: Condiments
    Genen Shouyu: Condiments
    Gula Malacca: Condiments
    Vegie-spread: Condiments
    Louisiana Fiery Hot Pepper Sauce: Condiments
    Louisiana Hot Spiced Okra: Condiments
    Original Frankfurter: Condiments
    Queso Cabrales: Dairy Products
    Queso Manchego La Pastora: Dairy Products
    Gorgonzola Telino: Dairy Products
    Mascarpone Fabioli: Dairy Products
    Geitost: Dairy Products
    Raclette Courdavault: Dairy Products
    Camembert Pierrot: Dairy Products
    Gudbrandsdalsost: Dairy Products
    Flotemysost: Dairy Products
    Mozzarella di Giovanni: Dairy Products
    Ikura: Seafood
    Konbu: Seafood
    Carnarvon Tigers: Seafood
    Nord-Ost Matjeshering: Seafood
    Inlagd Sill: Seafood
    Gravad lax: Seafood
    Boston Crab Meat: Seafood
    Jack's New England Clam Chowder: Seafood
    Rogede sild: Seafood
    Spegesild: Seafood
    Escargots de Bourgogne: Seafood
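The nested loops compare every category with every product. A common alternative, sketched below with a hypothetical sample of products, first indexes the products by category in a dictionary so that each category's matches are found with a single lookup; grouping first and then flattening is essentially what the C# `join ... into ps from p in ps` form expresses.

```python
from collections import defaultdict, namedtuple

Product = namedtuple("Product", ["ProductName", "Category"])
CategorizedProduct = namedtuple("CategorizedProduct", ["name", "category"])

categories = ["Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood"]
# Hypothetical sample standing in for GetProductList()
products = [
    Product("Chai", "Beverages"),
    Product("Aniseed Syrup", "Condiments"),
    Product("Ikura", "Seafood"),
    Product("Chang", "Beverages"),
    Product("Tofu", "Produce"),
]

# Build phase: index products by category, preserving product order in each group
products_by_category = defaultdict(list)
for p in products:
    products_by_category[p.Category].append(p)

# Probe phase: one dictionary lookup per category; unmatched products are dropped
join_result = [
    CategorizedProduct(name=p.ProductName, category=c)
    for c in categories
    for p in products_by_category.get(c, [])
]

for v in join_result:
    print("{name}: {category}".format(name=v.name, category=v.category))
```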

Slide 24

Slide 24 text

Part 3: Asq! A LINQ-to-objects implementation for Python

Slide 25

Slide 25 text

A Python implementation of LINQ to objects and Parallel LINQ to objects.

Slide 26

Slide 26 text

Asq objectives

‣ Express complex queries using a so-called 'fluent' interface
‣ Support Python 2 and Python 3, including PyPy, Jython and IronPython
‣ Equivalent capabilities to .NET LINQ: support all LINQ query operators
‣ Extensible: allow clients to add new operators
‣ Reliable: usable in production environments

    >>> from asq.initiators import query
    >>> words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
    >>> query(words).order_by(len).then_by().take(5).select(str.upper).to_list()
    ['ONE', 'SIX', 'TEN', 'TWO', 'FIVE']
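For comparison, the same result can be reached with built-ins alone, where a composite `(length, word)` key plays the role of `order_by(len).then_by()`. This is only an equivalence check, not a description of how Asq works internally; the fluent chain reads left to right, whereas the nested built-in calls read inside out.

```python
words = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

# Sort by length, then alphabetically; take five; upper-case each
result = [w.upper() for w in sorted(words, key=lambda w: (len(w), w))[:5]]
print(result)  # ['ONE', 'SIX', 'TEN', 'TWO', 'FIVE']
```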

Slide 27

Slide 27 text


Slide 28

Slide 28 text

Introducing Asq queries

The fluent interface must be bootstrapped with a query initiator; the asq.initiators.query initiator accepts any iterable.

    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from asq.initiators import query
    >>> numbers = [5, 4, 1, 3, 9]
    >>> query(numbers)
    Queryable()
    >>> query(numbers).select(lambda n: n + 1)
    Queryable()
    >>> query(numbers).select(lambda n: n + 1).to_list()
    [6, 5, 2, 4, 10]
    >>> list(query(numbers).select(lambda n: n + 1))
    [6, 5, 2, 4, 10]

Slide 29

Slide 29 text

Load /usr/share/dict/words

Read the system dictionary and use Asq to strip the whitespace from each line.

    >>> words_file = open('/usr/share/dict/words', 'r')
    >>> lines = words_file.readlines()
    >>> words_file.close()
    >>> lines
    ['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', ...]
    >>> from asq.initiators import query
    >>> query(lines)
    Queryable()
    >>> query(lines).select(lambda line: line.strip())
    Queryable()
    >>> words = query(lines).select(lambda line: line.strip()).to_list()
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]

Slide 30

Slide 30 text

Files in Python are iterable

Query the file directly, and replace the lambda by passing the unbound str.strip method to select().

    >>> from asq.initiators import query
    >>> with open('/usr/share/dict/words', 'r') as words_file:
    ...     words = query(words_file).select(str.strip).to_list()
    ...
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]
    >>> query(words).count()
    235886
    >>> query(words).order_by(len).last()
    'thyroparathyroidectomize'
    >>> query(words).skip_while(lambda s: len(s) < 5).take(3).to_list()
    ['aalii', 'aam', 'Aani']

Note that skip_while() stops skipping at the first word of five or more letters, so the shorter 'aam' that follows it is still included.

Slide 31

Slide 31 text

group_by collates related items

The Grouping type is a collection of objects sharing a key. Here we group all words by length.

    >>> query(words).group_by(len).to_list()
    [, , , , , ... ]
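A rough standard-library equivalent of group_by(len) collects items into a dictionary keyed by the selector result. The `group_by` helper below is hypothetical, written for illustration; it keeps keys in order of first appearance. By contrast, `itertools.groupby` only merges adjacent items with equal keys, so its input normally has to be sorted by the key first.

```python
from itertools import groupby


def group_by(iterable, key):
    """Hypothetical helper: group items by key, keys in first-appearance order."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups


words = ["A", "a", "aa", "aal", "aalii", "aam", "Aani"]
print(group_by(words, len))
# {1: ['A', 'a'], 2: ['aa'], 3: ['aal', 'aam'], 5: ['aalii'], 4: ['Aani']}

# itertools.groupby splits the two 3-letter words because they are not adjacent
print([(k, list(g)) for k, g in groupby(words, key=len)])
# [(1, ['A', 'a']), (2, ['aa']), (3, ['aal']), (5, ['aalii']), (3, ['aam']), (4, ['Aani'])]
```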

Slide 32

Slide 32 text

Creating ad hoc objects

To pass multiple values down the query chain, it can be useful to group them in a Record type whose fields are named in the initializer call. Use asq.record.new to create Records conveniently.

    >>> from asq.record import new
    >>> query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).to_list()
    [Record(length=1, frequency=52), Record(length=2, frequency=160),
     Record(length=3, frequency=1420), Record(length=5, frequency=10230),
     Record(length=4, frequency=5272), Record(length=8, frequency=29989),
     Record(length=7, frequency=23869), Record(length=9, frequency=32403),
     Record(length=6, frequency=17706), Record(length=11, frequency=26013),
     Record(length=10, frequency=30878), Record(length=12, frequency=20462),
     Record(length=14, frequency=9765), Record(length=16, frequency=3377),
     Record(length=15, frequency=5925), Record(length=20, frequency=198),
     Record(length=19, frequency=428), Record(length=17, frequency=1813),
     Record(length=13, frequency=14939), Record(length=18, frequency=842),
     Record(length=21, frequency=82), Record(length=22, frequency=41),
     Record(length=23, frequency=17), Record(length=24, frequency=5)]

Slide 33

Slide 33 text

Towards a histogram

Convert each Record into a string consisting of the right-justified length followed by one star per thousand words, rounded up.

    >>> import math
    >>> query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_list()
    [' 1 *\n', ' 2 *\n', ' 3 **\n', ' 5 ***********\n', ' 4 ******\n',
     ' 8 ******************************\n', ' 7 ************************\n',
     ' 9 *********************************\n', ' 6 ******************\n',
     '11 ***************************\n', '10 *******************************\n',
     '12 *********************\n', '14 **********\n', '16 ****\n', '15 ******\n',
     '20 *\n', '19 *\n', '17 **\n', '13 ***************\n', '18 *\n',
     '21 *\n', '22 *\n', '23 *\n', '24 *\n']

Slide 34

Slide 34 text

Printing a histogram

Order the records by length and use the to_str() query operator to concatenate the elements into a single string. Pass the result to print().

    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).order_by(lambda r: r.length).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_str())
     1 *
     2 *
     3 **
     4 ******
     5 ***********
     6 ******************
     7 ************************
     8 ******************************
     9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
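The same histogram can be produced with `collections.Counter`, using `types.SimpleNamespace` as a standard-library analogue of the ad hoc records made by asq.record.new. The `histogram` function and the sample word list are hypothetical; with only a handful of words, the example scales one star per word rather than per thousand.

```python
import math
from collections import Counter
from types import SimpleNamespace


def histogram(words, per_star=1000):
    """Return a text histogram of word lengths, one line per length."""
    counts = Counter(len(w) for w in words)
    # Ad hoc records with named fields, ordered by length
    records = [SimpleNamespace(length=k, frequency=v) for k, v in counts.items()]
    records.sort(key=lambda r: r.length)
    return "".join(
        "{length:>2} {bar}\n".format(
            length=r.length, bar=math.ceil(r.frequency / per_star) * "*")
        for r in records
    )


sample = ["a", "an", "ox", "cat", "dog", "emu", "bird", "fish"]
print(histogram(sample, per_star=1), end="")
```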

Slide 35

Slide 35 text

Creating selector functions

Use asq.selectors.a_() to create an attribute selector rather than writing out a lambda longhand.

    >>> from asq.selectors import a_
    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key, frequency=len(g))).order_by(a_('length')).select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')).to_str())
     1 *
     2 *
     3 **
     4 ******
     5 ***********
     6 ******************
     7 ************************
     8 ******************************
     9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *

Slide 36

Slide 36 text

Improved query formatting

Split the query over several lines for readability, using the backslash line-continuation character. (Inside the parentheses of the print() call Python also continues lines implicitly, so here the backslashes are optional; outside brackets they are required.)

    >>> print(query(words) \
    ...       .group_by(len) \
    ...       .select(lambda g: new(length=g.key, frequency=len(g))) \
    ...       .order_by(a_('length')) \
    ...       .select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')) \
    ...       .to_str())
     1 *
     2 *
     3 **
     4 ******
     5 ***********
     6 ******************
     7 ************************
     8 ******************************
     9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *

Slide 37

Slide 37 text

Multi-key sorts

Compose arbitrarily complex sorts. Here the records are ordered by descending frequency in thousands (the bar length), with ties broken by word length.

    >>> print(query(words) \
    ...       .group_by(len) \
    ...       .select(lambda g: new(length=g.key, frequency=len(g))) \
    ...       .order_by_descending(lambda r: r.frequency // 1000) \
    ...       .then_by(a_('length')) \
    ...       .select(lambda r: "{length:>2} {bar}\n".format(length=r.length, bar=(1 + r.frequency//1000) * '*')) \
    ...       .to_str())
     9 *********************************
    10 *******************************
     8 ******************************
    11 ***************************
     7 ************************
    12 *********************
     6 ******************
    13 ***************
     5 ***********
    14 **********
     4 ******
    15 ******
    16 ****
     3 **
    17 **
     1 *
     2 *
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *

Slide 38

Slide 38 text

Predicates

Asq includes many predicates as more literate alternatives to lambdas.

    >>> query(words).where(lambda w: 'ox' in w).count()
    1351
    >>> from asq.predicates import *
    >>> query(words).where(contains_('ox')).count()
    1351

Slide 39

Slide 39 text

Predicate combinators

Predicate combinators such as xor_ can be used to combine other selectors or predicates.

    >>> query(words) \
    ...     .where(xor_(m_('startswith', 'ch'), m_('endswith', 'ing')))
    )>
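Predicate factories and combinators like these are ordinary higher-order functions. The sketch below is a hypothetical reimplementation for illustration only: `contains` and `xor` are our own names, not Asq's source, and the standard library's `operator.methodcaller` plays a role similar to m_.

```python
from operator import methodcaller


def contains(substring):
    """Hypothetical predicate factory: true if substring occurs in the item."""
    return lambda item: substring in item


def xor(p, q):
    """Hypothetical combinator: true when exactly one of two predicates holds."""
    return lambda item: bool(p(item)) != bool(q(item))


starts_ch = methodcaller("startswith", "ch")
ends_ing = methodcaller("endswith", "ing")

words = ["changing", "cheese", "running", "box", "ox"]
print([w for w in words if xor(starts_ch, ends_ing)(w)])  # ['cheese', 'running']
print([w for w in words if contains("ox")(w)])            # ['box', 'ox']
```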

Slide 40

Slide 40 text

Asq includes predicate operators

Predicate query operators include contains(), all() and any().

    >>> query(words) \
    ...     .where(xor_(m_('startswith', 'ch'), m_('endswith', 'ing'))) \
    ...     .contains('changing')
    False

'changing' both starts with 'ch' and ends with 'ing', so xor_ excludes it.

Slide 41

Slide 41 text

Short queries with optional args

Most operators accept optional selectors or predicates, so the second query below is a shorter way of writing the first.

    >>> query(words) \
    ...     .order_by(len) \
    ...     .skip_while(lambda w: len(w) < 8) \
    ...     .first()
    'aardvark'
    >>> query(words) \
    ...     .order_by(len) \
    ...     .first(lambda w: len(w) == 8)
    'aardvark'
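In plain Python, the same "first matching element" query is commonly written with `next()` over a generator expression. Unlike Asq's first(), which raises ValueError when nothing matches, `next()` raises StopIteration unless a default is supplied. The word list here is a small sample for illustration.

```python
words = ["A", "a", "aa", "aal", "aalii", "aam", "aardvark", "aardwolf"]

# Stable sort by length, then take the first 8-letter word
by_length = sorted(words, key=len)
print(next(w for w in by_length if len(w) == 8))           # aardvark

# Supplying a default avoids StopIteration when nothing matches
print(next((w for w in by_length if len(w) == 30), None))  # None
```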

Slide 42

Slide 42 text

Laziness

Queries are lazily evaluated, doing as little work as necessary. No word in the dictionary contains 'EI', so the first query must examine every word before failing; after upper-casing, the second query stops at the first match.

    >>> query(words).first(contains_('EI'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "./asq/queryables.py", line 1496, in first
        return self._first() if predicate is None else self._first_predicate(predicate)
      File "./asq/queryables.py", line 1508, in _first_predicate
        raise ValueError("No elements matching predicate in call to first()")
    ValueError: No elements matching predicate in call to first()
    >>> query(words).select(str.upper).first(contains_('EI'))
    'ABEIGH'
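Python generators share this pull-based laziness. The sketch below uses a hypothetical `counted` wrapper to show that finding the first match pulls only as many source items as needed.

```python
def counted(iterable, counter):
    """Yield items unchanged while recording how many were pulled."""
    for item in iterable:
        counter["pulled"] += 1
        yield item


words = ["A", "a", "aa", "aal", "abed", "abeigh", "abele", "zebra"]
counter = {"pulled": 0}

upper = (w.upper() for w in counted(words, counter))
match = next(w for w in upper if "EI" in w)

print(match)              # ABEIGH
print(counter["pulled"])  # 6: 'abele' and 'zebra' were never read
```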

Slide 43

Slide 43 text

Logging

Asq includes logging operators which can be incorporated into the query chain, used here to show laziness.

    >>> import logging
    >>> import sys
    >>> clog = logging.getLogger("clog")
    >>> clog.setLevel(logging.DEBUG)
    >>> clog.addHandler(logging.StreamHandler(sys.stdout))
    >>> query(words).log(clog, label='source') \
    ...     .select(str.upper).log(clog, label='to-upper') \
    ...     .first(contains_('EI'))
    to-upper : BEGIN (DEFERRED)
    source : BEGIN (DEFERRED)
    source : [0] yields 'A'
    to-upper : [0] yields 'A'
    source : [1] yields 'a'
    to-upper : [1] yields 'A'
    source : [2] yields 'aa'
    to-upper : [2] yields 'AA'
    source : [3] yields 'aal'
    to-upper : [3] yields 'AAL'
    ...
    source : [167] yields 'abecedary'
    to-upper : [167] yields 'ABECEDARY'
    source : [168] yields 'abed'
    to-upper : [168] yields 'ABED'
    source : [169] yields 'abeigh'
    to-upper : [169] yields 'ABEIGH'
    'ABEIGH'
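A logging stage of this kind can be imitated with a generator that logs each item as it passes through. The `logged` helper below is a hypothetical sketch, not Asq's log() implementation; chaining two of them shows items flowing one at a time through both stages, interleaved as in the output above.

```python
import logging
import sys


def logged(iterable, logger, label):
    """Hypothetical pass-through generator that logs each item it yields."""
    for index, item in enumerate(iterable):
        logger.debug("%s : [%d] yields %r", label, index, item)
        yield item


log = logging.getLogger("tap")
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(sys.stdout))

words = ["A", "a", "abeigh", "abele"]
source = logged(words, log, "source")
upper = logged((w.upper() for w in source), log, "to-upper")
print(next(w for w in upper if "EI" in w))  # logs three items per stage, then ABEIGH
```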

Slide 44

Slide 44 text

Identity selectors can be omitted

Wherever the identity selector lambda x: x would be needed, it can usually be omitted; leaving it out can also allow Asq to optimize the query.

    >>> query(words).order_by(len).then_by(lambda w: w).to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi', ... 'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein', 'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological', 'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']
    >>> query(words).order_by(len).then_by().to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi', ... 'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein', 'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological', 'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']

Slide 45

Slide 45 text

Extending Asq

Asq includes tools for adding new query operators.

    >>> from asq.extension import extend
    >>> from asq.queryables import Queryable
    >>> @extend(Queryable)
    ... def separate_with(self, separator):
    ...     def generator():
    ...         i = iter(self)
    ...         try:
    ...             yield next(i)
    ...         except StopIteration:
    ...             return
    ...         for item in i:
    ...             yield separator
    ...             yield item
    ...     return self._create(generator())
    ...
    >>> query(words).separate_with('**').take(10).to_list()
    ['A', '**', 'a', '**', 'aa', '**', 'aal', '**', 'aalii', '**']
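Conceptually, an extension decorator of this kind only needs to attach the decorated function to the class as a method. A minimal hypothetical sketch of that mechanism follows, applied to a toy `Sequence` class; asq.extension.extend itself may differ in detail, so treat this purely as an illustration.

```python
def extend(cls):
    """Hypothetical decorator factory: attach the decorated function to cls."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Sequence:
    """Toy stand-in for a queryable wrapper."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def to_list(self):
        return list(self._iterable)


@extend(Sequence)
def separate_with(self, separator):
    def generator():
        it = iter(self)
        try:
            yield next(it)
        except StopIteration:
            return  # empty source: yield nothing
        for item in it:
            yield separator
            yield item
    return Sequence(generator())


print(Sequence(["a", "b", "c"]).separate_with("**").to_list())
# ['a', '**', 'b', '**', 'c']
```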

Slide 46

Slide 46 text

Part 4: Next steps: Forthcoming capabilities in Asq

Slide 47

Slide 47 text

(Figure) © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

Slide 48

Slide 48 text

(Figure) © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

Slide 49

Slide 49 text

Parallel improvements: making Asq faster and scalable

‣ Support parallel back ends
  • Currently uses the multiprocessing module
  • Could use the threading module (Jython and IronPython, which have no GIL)
  • Considering an OpenCL backend for numeric arrays
‣ API changes required for parallel
  • Some query operators have order-dependent results requiring a different API, e.g. aggregate()
‣ Provide parallel implementations of more operators
  • Only a handful of operators have parallel implementations
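The aggregate() problem can be seen in a small experiment: splitting the input into chunks, reducing each chunk concurrently and then combining the partial results only reproduces the sequential answer when the operation is associative. The sketch uses threads via `concurrent.futures` purely for illustration; the `chunked_aggregate` function and its chunking scheme are our own assumptions, not Asq's parallel design.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add, sub


def chunked_aggregate(func, items, chunks=2):
    """Hypothetical parallel aggregate: reduce chunks concurrently, then combine."""
    if not items:
        raise ValueError("chunked_aggregate() requires a non-empty sequence")
    size = -(-len(items) // chunks)  # ceiling division
    parts = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        # pool.map returns partial results in chunk order
        partials = list(pool.map(lambda part: reduce(func, part), parts))
    return reduce(func, partials)


numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
print(reduce(add, numbers), chunked_aggregate(add, numbers))  # 45 45
print(reduce(sub, numbers), chunked_aggregate(sub, numbers))  # -35 -5
```

Addition is associative, so chunking is safe; subtraction is not, so the chunked result differs from the sequential one.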

Slide 50

Slide 50 text

Thanks! http://asq.googlecode.com @robsmallshire