Implementing LINQ-for-objects in Python

The Language Integrated Query (LINQ) feature of Microsoft's .NET Framework provides a DSL for expressing queries over arbitrary data sources. Using LINQ with one type of data source in particular, simple in-memory collections, has proven popular with .NET programmers, to the extent that many .NET developers now eschew imperative constructs in favour of the declarative, functional style afforded by LINQ-to-objects.

Python already has strong support for lazily transforming collections with its generator syntax, which, although concise in simple cases, can be unwieldy for more complex queries.

At this meetup, I'll present "asq", an implementation of LINQ-to-objects for Python which has been well received and met with some success. I'll introduce LINQ, explain the implementation of asq and demonstrate its use. I'll also outline some ideas for future directions for the library.

Robert Smallshire

October 11, 2012

Transcript

  1. Oslo Python Meetup #OsloPython
    Implementing LINQ in Python
    Robert Smallshire
    @robsmallshire
    1
    Wednesday, 24 October, 12

  2. 2
    1. Introducing LINQ: What is LINQ and how does it work?
    2. Does Python need LINQ? Querying collections in Python
    3. Asq! A LINQ-to-objects implementation for Python
    4. Next steps: Forthcoming capabilities in Asq

  3. 3
    int[] numbers = { 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 };
    var numsPlusOne =
    from n in numbers
    select n + 1;
    Console.WriteLine("Numbers + 1:");
    foreach (var i in numsPlusOne)
    {
    Console.WriteLine(i);
    }
    Numbers + 1:
    6
    5
    2
    4
    10
    9
    7
    8
    3
    1

  4. 4
    List<Product> products = GetProductList();
    var productNames =
    from p in products
    select p.ProductName;
    Console.WriteLine("Product Names:");
    foreach (var productName in productNames)
    {
    Console.WriteLine(productName);
    }
    Product Names:
    Chai
    Chang
    Aniseed Syrup
    Chef Anton's Cajun Seasoning
    Chef Anton's Gumbo Mix
    Grandma's Boysenberry Spread
    Uncle Bob's Organic Dried Pears
    Northwoods Cranberry Sauce
    Mishi Kobe Niku
    Ikura
    Queso Cabrales
    Queso Manchego La Pastora
    Konbu
    Tofu
    Genen Shouyu
    Pavlova
    Alice Mutton
    Carnarvon Tigers
    Teatime Chocolate Biscuits
    Sir Rodney's Marmalade
    Sir Rodney's Scones
    Nord-Ost Matjeshering
    Gorgonzola Telino
    Mascarpone Fabioli
    Geitost

  5. 5
    int[] numbers = { 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 };
    var lowNums =
    from n in numbers
    where n < 5
    select n;
    Console.WriteLine("Numbers < 5:");
    foreach (var x in lowNums)
    {
    Console.WriteLine(x);
    }
    Numbers < 5:
    4
    1
    3
    2
    0

  6. 6
    int[] numbers = { 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 };
    string[] strings = { "zero", "one", "two",
    "three", "four", "five", "six", "seven",
    "eight", "nine" };
    var textNums =
    from n in numbers
    select strings[n];
    Console.WriteLine("Number strings:");
    foreach (var s in textNums)
    {
    Console.WriteLine(s);
    }
    Number strings:
    five
    four
    one
    three
    nine
    eight
    six
    seven
    two
    zero

  7. 7
    List<Product> products = GetProductList();
    var sortedProducts =
    from p in products
    orderby p.Category, p.UnitPrice descending
    select p;
    ObjectDumper.Write(sortedProducts);
    ProductID=43 ProductName=Ipoh Coffee Category=Beverages UnitPrice=46.0000 UnitsInStock=17
    ProductID=2 ProductName=Chang Category=Beverages UnitPrice=19.0000 UnitsInStock=17
    ProductID=1 ProductName=Chai Category=Beverages UnitPrice=18.0000 UnitsInStock=39
    ProductID=35 ProductName=Steeleye Stout Category=Beverages UnitPrice=18.0000 UnitsInStock=20
    ProductID=39 ProductName=Chartreuse verte Category=Beverages UnitPrice=18.0000 UnitsInStock=69
    ProductID=70 ProductName=Outback Lager Category=Beverages UnitPrice=15.0000 UnitsInStock=15
    ProductID=34 ProductName=Sasquatch Ale Category=Beverages UnitPrice=14.0000 UnitsInStock=111
    ProductID=63 ProductName=Vegie-spread Category=Condiments UnitPrice=43.9000 UnitsInStock=24
    ProductID=8 ProductName=Northwoods Cranberry Sauce Category=Condiments UnitPrice=40.0000 UnitsInStock=6
    ProductID=6 ProductName=Grandma's Boysenberry Spread Category=Condiments UnitPrice=25.0000 UnitsInStock=120
    ProductID=4 ProductName=Chef Anton's Cajun Seasoning Category=Condiments UnitPrice=22.0000 UnitsInStock=53
    ProductID=5 ProductName=Chef Anton's Gumbo Mix Category=Condiments UnitPrice=21.3500 UnitsInStock=0
    ProductID=65 ProductName=Louisiana Fiery Hot Pepper Sauce Category=Condiments UnitPrice=21.0500 UnitsInStock=76
    ProductID=44 ProductName=Gula Malacca Category=Condiments UnitPrice=19.4500 UnitsInStock=27
    ProductID=66 ProductName=Louisiana Hot Spiced Okra Category=Condiments UnitPrice=17.0000 UnitsInStock=4
    ProductID=15 ProductName=Genen Shouyu Category=Condiments UnitPrice=15.5000 UnitsInStock=39

  8. 8
    string[] categories = { "Beverages", "Condiments",
    "Vegetables", "Dairy Products", "Seafood" };
    List<Product> products = GetProductList();
    var q =
    from c in categories
    join p in products on c equals p.Category into ps
    from p in ps
    select new { Category = c,
    ProductName = p.ProductName };
    foreach (var v in q)
    {
    Console.WriteLine(v.ProductName + ": "
    + v.Category);
    }
    Chai: Beverages
    Chang: Beverages
    Sasquatch Ale: Beverages
    Steeleye Stout: Beverages
    Chartreuse verte: Beverages
    Ipoh Coffee: Beverages
    Laughing Lumberjack Lager: Beverages
    Outback Lager: Beverages
    Aniseed Syrup: Condiments
    Chef Anton's Cajun Seasoning: Condiments
    Chef Anton's Gumbo Mix: Condiments
    Grandma's Boysenberry Spread: Condiments
    Northwoods Cranberry Sauce: Condiments
    Genen Shouyu: Condiments
    Gula Malacca: Condiments
    Vegie-spread: Condiments
    Louisiana Fiery Hot Pepper Sauce: Condiments
    Louisiana Hot Spiced Okra: Condiments
    Original Frankfurter: Condiments
    Queso Cabrales: Dairy Products
    Queso Manchego La Pastora: Dairy Products
    Gorgonzola Telino: Dairy Products
    Mascarpone Fabioli: Dairy Products
    Geitost: Dairy Products
    Raclette Courdavault: Dairy Products
    Camembert Pierrot: Dairy Products
    Gudbrandsdalsost: Dairy Products
    Flotemysost: Dairy Products
    Mozzarella di Giovanni: Dairy Products
    Ikura: Seafood
    Konbu: Seafood
    Carnarvon Tigers: Seafood
    Nord-Ost Matjeshering: Seafood
    Inlagd Sill: Seafood
    Gravad lax: Seafood
    Boston Crab Meat: Seafood
    Jack's New England Clam Chowder: Seafood
    Rogede sild: Seafood
    Spegesild: Seafood
    Escargots de Bourgogne: Seafood

  9. LINQ

  10. Language INtegrated Query
    Embedded Domain Specific

  11. Embedded Domain Specific Language

  12. 12
    var longWords =
    from s in words
    where s.Length > 8
    select s[2];
    IQueryable longWords =
    words.Where(s => s.Length > 8).Select(s => s[2]);
    LINQ query expression
      ↓ C# compiler syntax rules
    Chained query operator method calls (fluent interface)
      ↓ C# compiler backend
    .NET bytecode

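The point of slide 12 is that a query expression is compiled into a chain of ordinary operator method calls. A minimal Python sketch of that fluent style, assuming a hypothetical `Fluent` class (an illustration, not asq's implementation), in which each operator wraps the previous stage in a lazy generator:

```python
class Fluent:
    """Hypothetical minimal fluent wrapper over any iterable (illustration only)."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def where(self, predicate):
        # Lazily keep only the items for which predicate(item) is true
        return Fluent(item for item in self if predicate(item))

    def select(self, selector):
        # Lazily project each item through selector
        return Fluent(selector(item) for item in self)

    def to_list(self):
        # Force evaluation of the whole chain
        return list(self)


words = ["strawberry", "fig", "blackcurrant", "kiwi"]
# Same shape as: words.Where(s => s.Length > 8).Select(s => s[2])
long_words = Fluent(words).where(lambda s: len(s) > 8).select(lambda s: s[2])
result = long_words.to_list()
print(result)   # ['r', 'a']
```

Because each stage wraps a generator, a chain built this way can be iterated only once; a fuller implementation would have to decide whether re-iterating a query should re-run it.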
  13. LINQ Query Operators
    13
    Operator            Description
    Aggregate           Performs a custom method over a sequence
    Average             Computes the average of a sequence of numeric values
    Count               Returns the number of items in a sequence as an int
    LongCount           Returns the number of items in a sequence as a long
    Min                 Finds the minimum value in a sequence of numbers
    Max                 Finds the maximum value in a sequence of numbers
    Sum                 Sums the numbers in a sequence
    Concat              Concatenates two sequences into one sequence
    Cast                Casts elements in a sequence to a given type
    OfType              Filters a sequence to the elements of a given type
    ToArray             Returns an Array from a sequence
    ToDictionary        Returns a Dictionary from a sequence
    ToList              Returns a List from a sequence
    ToLookup            Returns a Lookup from a sequence
    ToSequence          Returns an IEnumerable sequence
    DefaultIfEmpty      Creates a default element for an empty sequence
    ElementAt           Returns the element at a given index in a sequence
    ElementAtOrDefault  Returns the element at a given index in a sequence or a default value
    First               Returns the first element of a sequence
    FirstOrDefault      Returns the first element of a sequence or a default value if no element is found
    Last                Returns the last element of a sequence
    LastOrDefault       Returns the last element of a sequence or a default value if no element is found
    Single              Returns the single element of a sequence
    SingleOrDefault     Returns the single element of a sequence or a default value if no element is found
    SequenceEqual       Compares two sequences to see if they are equivalent
    Empty               Generates an empty sequence
    Range               Generates a sequence of integers over a given range
    Repeat              Generates a sequence by repeating an item a given number of times
    GroupBy             Groups items in a sequence by a given key
    GroupJoin           Performs a grouped join on two sequences
    Join                Performs an inner join on two sequences
    OrderBy             Orders a sequence by value(s) in ascending order
    OrderByDescending   Orders a sequence by value(s) in descending order
    ThenBy              Orders an already-ordered sequence by a further key in ascending order
    ThenByDescending    Orders an already-ordered sequence by a further key in descending order
    Reverse             Reverses the order of the items in a sequence
    Skip                Returns a sequence that skips a given number of items
    SkipWhile           Returns a sequence that skips leading items while they meet a condition
    Take                Returns a sequence that takes a given number of items
    TakeWhile           Returns a sequence that takes leading items while they meet a condition
    Select              Creates a projection of parts of a sequence
    SelectMany          Creates a one-to-many projection of parts of a sequence
    All                 Determines if all items in a sequence meet a condition
    Any                 Determines if any items in a sequence meet a condition
    Contains            Determines if a sequence contains a given item
    Where               Filters the items in a sequence
    Distinct            Returns a sequence without duplicate items
    Except              Returns a sequence representing the difference between two sequences
    Intersect           Returns a sequence representing the intersection of two sequences
    Union               Returns a sequence representing the union of two sequences
    Over fifty operators for expressing queries

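Several of these operators have rough counterparts in Python's `itertools`, though the semantics are not always identical. This sketch (sample data assumed) contrasts `SkipWhile`/`TakeWhile`, which only look at a leading run of items, with `Where`, and shows that `itertools.groupby` only groups consecutive items, whereas LINQ's `GroupBy` collects every item sharing a key:

```python
from itertools import dropwhile, takewhile, groupby

numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]

# SkipWhile: drop items only while the condition holds, then yield everything else
skipped = list(dropwhile(lambda n: n > 2, numbers))   # [1, 3, 9, 8, 6, 7, 2, 0]
# TakeWhile: yield items only while the condition holds, then stop
taken = list(takewhile(lambda n: n > 2, numbers))     # [5, 4]
# Where: test the condition against every item
filtered = [n for n in numbers if n > 2]              # [5, 4, 3, 9, 8, 6, 7]

words = ["one", "two", "three", "four", "six"]
# itertools.groupby: groups only *consecutive* items with equal keys
runs = [(k, list(g)) for k, g in groupby(words, key=len)]
print(runs)     # [(3, ['one', 'two']), (5, ['three']), (4, ['four']), (3, ['six'])]

# GroupBy-style grouping: all items per key, keys in order of first appearance
groups = {}
for w in words:
    groups.setdefault(len(w), []).append(w)
print(groups)   # {3: ['one', 'two', 'six'], 5: ['three'], 4: ['four']}
```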
  14. 14
    .NET bytecode
      ↓ Program execution by the .NET virtual machine (CLR)
    LINQ Expression Tree
      ↓ LINQ provider: the Expression Tree is compiled/interpreted to get the result

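Python has no direct equivalent of LINQ expression trees, but the standard `ast` module offers a loose analogy: an expression can be parsed into a tree of nodes, then either translated by some other component or compiled and executed. A sketch assuming Python 3.8 or later, where `to_sql` is a hypothetical toy "provider" that handles only names, constants and addition:

```python
import ast

# Parse an expression into a tree of nodes instead of executing it
tree = ast.parse("n + 1", mode="eval")
print(type(tree.body).__name__)        # BinOp


def to_sql(node):
    # Hypothetical toy "provider": translates names, constants and addition to SQL-like text
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return "({} + {})".format(to_sql(node.left), to_sql(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(ast.dump(node))


print(to_sql(tree.body))               # (n + 1)

# Alternatively, compile the same tree and evaluate it in-process
code = compile(tree, filename="<query>", mode="eval")
print(eval(code, {}, {"n": 41}))       # 42
```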
  15. 15
    © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

  16. 16
    1. Introducing LINQ: What is LINQ and how does it work?
    2. Does Python need LINQ? Querying collections in Python
    3. Asq! A LINQ-to-objects implementation for Python
    4. Next steps: Forthcoming capabilities in Asq

  17. 17
    numbers = [ 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 ]
    nums_plus_one = [ n + 1 for n in numbers ]
    print("Numbers + 1:")
    for i in nums_plus_one:
        print(i)
    Numbers + 1:
    6
    5
    2
    4
    10
    9
    7
    8
    3
    1
    ( )

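Swapping the square brackets of a list comprehension for parentheses turns it into a generator expression, which computes items on demand rather than all at once; this is the closest built-in analogue of LINQ's deferred execution. A small sketch that records which items have actually been processed (`plus_one` and `evaluated` are illustrative names):

```python
numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]
evaluated = []


def plus_one(n):
    evaluated.append(n)    # record each item as it is actually processed
    return n + 1


eager = [plus_one(n) for n in numbers]   # list comprehension: all ten calls run now
print(len(evaluated))                    # 10

evaluated.clear()
lazy = (plus_one(n) for n in numbers)    # generator expression: nothing runs yet
print(len(evaluated))                    # 0
print(next(lazy))                        # 6
print(len(evaluated))                    # 1
```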
  18. 18
    products = get_product_list()
    product_names = (p.product_name for p in products)
    print("Product Names:")
    for product_name in product_names:
        print(product_name)
    Product Names:
    Chai
    Chang
    Aniseed Syrup
    Chef Anton's Cajun Seasoning
    Chef Anton's Gumbo Mix
    Grandma's Boysenberry Spread
    Uncle Bob's Organic Dried Pears
    Northwoods Cranberry Sauce
    Mishi Kobe Niku
    Ikura
    Queso Cabrales
    Queso Manchego La Pastora
    Konbu
    Tofu
    Genen Shouyu
    Pavlova
    Alice Mutton
    Carnarvon Tigers
    Teatime Chocolate Biscuits
    Sir Rodney's Marmalade
    Sir Rodney's Scones
    Nord-Ost Matjeshering
    Gorgonzola Telino
    Mascarpone Fabioli
    Geitost

  19. 19
    numbers = [ 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 ]
    low_nums = (n for n in numbers if n < 5)
    print("Numbers < 5:")
    for x in low_nums:
        print(x)
    Numbers < 5:
    4
    1
    3
    2
    0

  20. 20
    numbers = [ 5, 4, 1, 3, 9,
    8, 6, 7, 2, 0 ]
    strings = [ "zero", "one", "two",
    "three", "four", "five", "six", "seven",
    "eight", "nine" ]
    text_nums = (strings[n] for n in numbers)
    print("Number strings:")
    for s in text_nums:
        print(s)
    Number strings:
    five
    four
    one
    three
    nine
    eight
    six
    seven
    two
    zero

  21. 21
    words = [ "cherry", "apple", "blueberry" ]
    sorted_words = sorted(words, key=len)
    print("Sorted by length:")
    for w in sorted_words:
        print(w)
    Sorted by length:
    apple
    cherry
    blueberry

  22. 22
    from operator import attrgetter
    products = GetProductList()
    # First sort by *secondary* key (sorted() is stable, so this order
    # is kept among products with equal primary keys)
    sorted_by_price = sorted(products, key=attrgetter('UnitPrice'),
                             reverse=True)
    # Second sort by *primary* key
    sorted_products = sorted(sorted_by_price, key=attrgetter('Category'))
    for p in sorted_products:
        print(p)
    ProductID=43 ProductName=Ipoh Coffee Category=Beverages UnitPrice=46.0000 UnitsInStock=17
    ProductID=2 ProductName=Chang Category=Beverages UnitPrice=19.0000 UnitsInStock=17
    ProductID=1 ProductName=Chai Category=Beverages UnitPrice=18.0000 UnitsInStock=39
    ProductID=35 ProductName=Steeleye Stout Category=Beverages UnitPrice=18.0000 UnitsInStock=20
    ProductID=39 ProductName=Chartreuse verte Category=Beverages UnitPrice=18.0000 UnitsInStock=69
    ProductID=70 ProductName=Outback Lager Category=Beverages UnitPrice=15.0000 UnitsInStock=15
    ProductID=34 ProductName=Sasquatch Ale Category=Beverages UnitPrice=14.0000 UnitsInStock=111
    ProductID=63 ProductName=Vegie-spread Category=Condiments UnitPrice=43.9000 UnitsInStock=24
    ProductID=8 ProductName=Northwoods Cranberry Sauce Category=Condiments UnitPrice=40.0000 UnitsInStock=6
    ProductID=6 ProductName=Grandma's Boysenberry Spread Category=Condiments UnitPrice=25.0000 UnitsInStock=120
    ProductID=4 ProductName=Chef Anton's Cajun Seasoning Category=Condiments UnitPrice=22.0000 UnitsInStock=53
    ProductID=5 ProductName=Chef Anton's Gumbo Mix Category=Condiments UnitPrice=21.3500 UnitsInStock=0
    ProductID=65 ProductName=Louisiana Fiery Hot Pepper Sauce Category=Condiments UnitPrice=21.0500 UnitsInStock=76
    ProductID=44 ProductName=Gula Malacca Category=Condiments UnitPrice=19.4500 UnitsInStock=27
    ProductID=66 ProductName=Louisiana Hot Spiced Okra Category=Condiments UnitPrice=17.0000 UnitsInStock=4
    ProductID=15 ProductName=Genen Shouyu Category=Condiments UnitPrice=15.5000 UnitsInStock=39

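The two-pass approach above works because Python's sort is guaranteed to be stable. A sketch comparing it with a single sort on a tuple key, using a hypothetical `Product` namedtuple and a handful of made-up rows in place of `GetProductList()`:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical product type and sample rows standing in for GetProductList()
Product = namedtuple("Product", ["ProductName", "Category", "UnitPrice"])
products = [
    Product("Chai", "Beverages", 18.0),
    Product("Vegie-spread", "Condiments", 43.9),
    Product("Ipoh Coffee", "Beverages", 46.0),
    Product("Genen Shouyu", "Condiments", 15.5),
    Product("Chang", "Beverages", 19.0),
]

# Two stable sorts: secondary key (price, descending) first, then primary key
by_price = sorted(products, key=attrgetter("UnitPrice"), reverse=True)
two_pass = sorted(by_price, key=attrgetter("Category"))

# One sort on a tuple key; negating the numeric price gives descending order
one_pass = sorted(products, key=lambda p: (p.Category, -p.UnitPrice))

print([p.ProductName for p in two_pass])
# ['Ipoh Coffee', 'Chang', 'Chai', 'Vegie-spread', 'Genen Shouyu']
print(one_pass == two_pass)   # True
```

Negating the key only works for numeric fields; for a descending order on, say, strings, the two-pass stable sort remains the simpler option.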
  23. 23
    from collections import namedtuple
    categories = [ "Beverages", "Condiments",
                   "Vegetables", "Dairy Products", "Seafood" ]
    products = GetProductList()
    CategorizedProduct = namedtuple('CategorizedProduct',
                                    ['name', 'category'])
    join_result = []
    for c in categories:
        for p in products:
            if p.Category == c:
                join_result.append(
                    CategorizedProduct(name=p.ProductName,
                                       category=c))
    for v in join_result:
        print("{name}: {category}".format(name=v.name,
                                          category=v.category))
    Chai: Beverages
    Chang: Beverages
    Sasquatch Ale: Beverages
    Steeleye Stout: Beverages
    Chartreuse verte: Beverages
    Ipoh Coffee: Beverages
    Laughing Lumberjack Lager: Beverages
    Outback Lager: Beverages
    Aniseed Syrup: Condiments
    Chef Anton's Cajun Seasoning: Condiments
    Chef Anton's Gumbo Mix: Condiments
    Grandma's Boysenberry Spread: Condiments
    Northwoods Cranberry Sauce: Condiments
    Genen Shouyu: Condiments
    Gula Malacca: Condiments
    Vegie-spread: Condiments
    Louisiana Fiery Hot Pepper Sauce: Condiments
    Louisiana Hot Spiced Okra: Condiments
    Original Frankfurter: Condiments
    Queso Cabrales: Dairy Products
    Queso Manchego La Pastora: Dairy Products
    Gorgonzola Telino: Dairy Products
    Mascarpone Fabioli: Dairy Products
    Geitost: Dairy Products
    Raclette Courdavault: Dairy Products
    Camembert Pierrot: Dairy Products
    Gudbrandsdalsost: Dairy Products
    Flotemysost: Dairy Products
    Mozzarella di Giovanni: Dairy Products
    Ikura: Seafood
    Konbu: Seafood
    Carnarvon Tigers: Seafood
    Nord-Ost Matjeshering: Seafood
    Inlagd Sill: Seafood
    Gravad lax: Seafood
    Boston Crab Meat: Seafood
    Jack's New England Clam Chowder: Seafood
    Rogede sild: Seafood
    Spegesild: Seafood
    Escargots de Bourgogne: Seafood

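The nested loops above compare every category with every product. A sketch of a lookup-based alternative, which builds a dictionary from join key to products in one pass and then probes it once per category; `Product` and the sample rows are hypothetical stand-ins for `GetProductList()`:

```python
from collections import defaultdict, namedtuple

Product = namedtuple("Product", ["ProductName", "Category"])
CategorizedProduct = namedtuple("CategorizedProduct", ["name", "category"])

categories = ["Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood"]
# Hypothetical sample rows standing in for GetProductList()
products = [
    Product("Ikura", "Seafood"),
    Product("Chai", "Beverages"),
    Product("Geitost", "Dairy Products"),
    Product("Chang", "Beverages"),
]

# Build a lookup from join key to matching products in one pass over products
products_by_category = defaultdict(list)
for p in products:
    products_by_category[p.Category].append(p)

# Probe the lookup once per category: an inner join ordered by the outer sequence
join_result = [
    CategorizedProduct(name=p.ProductName, category=c)
    for c in categories
    for p in products_by_category.get(c, [])
]
for v in join_result:
    print("{name}: {category}".format(name=v.name, category=v.category))
```

The output keeps the order of the outer `categories` sequence, as in the LINQ version, but the work grows roughly with len(categories) + len(products) rather than with their product.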
  24. 24
    1. Introducing LINQ: What is LINQ and how does it work?
    2. Does Python need LINQ? Querying collections in Python
    3. Asq! A LINQ-to-objects implementation for Python
    4. Next steps: Forthcoming capabilities in Asq

  25. 25
    A Python implementation of LINQ to
    objects and Parallel LINQ to objects.

  26. Objectives
    Asq
    ‣ Express complex queries
    using a so-called ‘fluent’ interface
    ‣ Support Python 2 and Python 3
    including PyPy, Jython and IronPython
    ‣ Equivalent capabilities to .NET LINQ
    support all LINQ query operators
    ‣ Extensible
    allow clients to add new operators
    ‣ Reliable
    usable in production environments
    26
    >>> from asq.initiators import query
    >>> words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
    >>> query(words).order_by(len).then_by().take(5).select(str.upper).to_list()
    ['ONE', 'SIX', 'TEN', 'TWO', 'FIVE']

  27. 27

  28. The fluent interface must be bootstrapped with a query initiator;
    the asq.initiators.query initiator accepts any iterable.
    28
    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from asq.initiators import query
    >>> numbers = [5, 4, 1, 3, 9]
    >>> query(numbers)
    Queryable()
    >>> query(numbers).select(lambda n: n + 1)
    Queryable()
    >>> query(numbers).select(lambda n: n + 1).to_list()
    [6, 5, 2, 4, 10]
    >>> list(query(numbers).select(lambda n: n + 1))
    [6, 5, 2, 4, 10]
    Introducing Asq queries

  29. Read the system dictionary and use Asq to strip the
    whitespace from each line
    29
    >>> words_file = open('/usr/share/dict/words', 'r')
    >>> lines = words_file.readlines()
    >>> words_file.close()
    >>> lines
    ['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', ...]
    >>> from asq.initiators import query
    >>> query(lines)
    Queryable()
    >>> query(lines).select(lambda line: line.strip())
    Queryable()
    >>> words = query(lines).select(lambda line: line.strip()).to_list()
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]
    Load /usr/share/dict/words

  30. query the file directly, and replace the lambda by passing
    the unbound str.strip method to select()
    30
    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from asq.initiators import query
    >>> with open('/usr/share/dict/words', 'r') as words_file:
    ...     words = query(words_file).select(str.strip).to_list()
    ...
    >>> words
    ['A', 'a', 'aa', 'aal', 'aalii', 'aam', ...]
    >>> query(words).count()
    235886
    >>> query(words).order_by(len).last()
    'thyroparathyroidectomize'
    >>> query(words).skip_while(lambda s: len(s) < 5).take(3).to_list()
    ['aalii', 'aam', 'Aani']
    Files in Python are iterable

  31. The Grouping type is a collection of objects sharing a key.
    Here we group all words by length.
    31
    >>> query(words).group_by(len).to_list()
    [,
    ,
    'Abo', ...])>, 'aback', 'abaff', 'abaft', ...])>, 'Aaru', 'abac', 'abas', 'Abba', 'Abby', ...])>,
    ...
    'pathologicopsychological', 'scientificophilosophical',
    'tetraiodophenolphthalein', 'thyroparathyroidectomize'])>]
    group_by collates related items

  32. To pass multiple values down the query chain, it can be
    useful to group them in a Record type with fields named in
    the initializer call. Use asq.record.new to conveniently
    create Records.
    32
    >>> from asq.record import new
    >>> query(words).group_by(len).select(lambda g: new(length=g.key,
    frequency=len(g))).to_list()
    [Record(length=1, frequency=52), Record(length=2, frequency=160),
    Record(length=3, frequency=1420), Record(length=5, frequency=10230),
    Record(length=4, frequency=5272), Record(length=8, frequency=29989),
    Record(length=7, frequency=23869), Record(length=9, frequency=32403),
    Record(length=6, frequency=17706), Record(length=11, frequency=26013),
    Record(length=10, frequency=30878), Record(length=12,
    frequency=20462), Record(length=14, frequency=9765), Record(length=16,
    frequency=3377), Record(length=15, frequency=5925), Record(length=20,
    frequency=198), Record(length=19, frequency=428), Record(length=17,
    frequency=1813), Record(length=13, frequency=14939), Record(length=18,
    frequency=842), Record(length=21, frequency=82), Record(length=22,
    frequency=41), Record(length=23, frequency=17), Record(length=24,
    frequency=5)]
    Creating ad hoc objects

  33. Convert each Record into a string consisting of the right-justified
    length followed by a bar of stars, one star per thousand words (rounded up).
    33
    >>> import math
    >>> query(words).group_by(len).select(lambda g: new(length=g.key,
    frequency=len(g))).select(lambda r: "{length:>2} {bar}\n".format(
    length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_list()
    [' 1 *\n', ' 2 *\n', ' 3 **\n', ' 5 ***********\n', ' 4 ******\n',
    ' 8 ******************************\n', ' 7 ************************\n',
    ' 9 *********************************\n', ' 6 ******************\n',
    '11 ***************************\n', '10 *******************************\n',
    '12 *********************\n', '14 **********\n', '16 ****\n', '15 ******\n',
    '20 *\n', '19 *\n', '17 **\n', '13 ***************\n', '18 *\n', '21 *\n',
    '22 *\n', '23 *\n', '24 *\n']
    Towards a histogram

  34. Order the records by length
    and use the to_str() query
    operator to concatenate the
    elements into a single string.
    Pass the result to print().
    34
    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key,
    frequency=len(g))).order_by(lambda r: r.length).select(lambda r: "{length:>2} {bar}\n".format(
    length=r.length, bar=math.ceil(r.frequency/1000) * '*')).to_str())
    1 *
    2 *
    3 **
    4 ******
    5 ***********
    6 ******************
    7 ************************
    8 ******************************
    9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
    Printing a histogram

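For comparison, the same length histogram can be sketched with the standard library alone, using `collections.Counter` in place of `group_by(len)`. `length_histogram` and its `per_star` parameter are hypothetical names; each bar gets one star per `per_star` words, rounded up with `math.ceil` as in the query above:

```python
import math
from collections import Counter


def length_histogram(words, per_star=1000):
    # Count how many words have each length (like group_by(len) plus len(g))
    frequencies = Counter(len(w) for w in words)
    # Order by length, then render one right-justified row per length
    return "".join(
        "{length:>2} {bar}\n".format(length=length,
                                     bar=math.ceil(count / per_star) * "*")
        for length, count in sorted(frequencies.items())
    )


print(length_histogram(["a", "to", "be", "tea", "cake", "cakes"], per_star=1), end="")
```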
  35. Use asq.selectors.a_() to
    create an attribute selector
    rather than writing out a
    lambda longhand.
    35
    >>> from asq.selectors import a_
    >>> print(query(words).group_by(len).select(lambda g: new(length=g.key,
    frequency=len(g))).order_by(a_('length')).select(lambda r: "{length:>2} {bar}\n".format(
    length=r.length, bar=(1 + r.frequency//1000) * '*')).to_str())
    1 *
    2 *
    3 **
    4 ******
    5 ***********
    6 ******************
    7 ************************
    8 ******************************
    9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
    Creating selector functions

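`a_('length')` appears to play much the same role as `operator.attrgetter` from the standard library, which builds a reusable function that fetches a named attribute. A sketch with hypothetical records shaped like the `Record` values above, built here with `types.SimpleNamespace`:

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical records shaped like Record(length=..., frequency=...)
records = [
    SimpleNamespace(length=3, frequency=1420),
    SimpleNamespace(length=1, frequency=52),
    SimpleNamespace(length=2, frequency=160),
]

get_length = attrgetter("length")        # equivalent to lambda r: r.length
by_length = sorted(records, key=get_length)
print([r.length for r in by_length])     # [1, 2, 3]

# With several names, attrgetter returns a tuple, handy as a compound sort key
print(attrgetter("length", "frequency")(records[0]))   # (3, 1420)
```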
  36. Use the backslash line continuation character
    for improved readability
    36
    >>> print( query(words) \
    .group_by(len) \
    .select(lambda g: new(length=g.key, frequency=len(g))) \
    .order_by(a_('length')) \
    .select(lambda r: "{length:>2} {bar}\n".format(length=r.length,
    bar=(1 + r.frequency//1000) * '*')) \
    .to_str() )
    1 *
    2 *
    3 **
    4 ******
    5 ***********
    6 ******************
    7 ************************
    8 ******************************
    9 *********************************
    10 *******************************
    11 ***************************
    12 *********************
    13 ***************
    14 **********
    15 ******
    16 ****
    17 **
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
    Improved query formatting

  37. Compose arbitrarily complex sorts
    37
    >>> print( query(words) \
    .group_by(len) \
    .select(lambda g: new(length=g.key, frequency=len(g))) \
    .order_by_descending(a_('frequency')) \
    .then_by(a_('length')) \
    .select(lambda r: "{length:>2} {bar}\n".format(length=r.length,
    bar=(1 + r.frequency//1000) * '*')) \
    .to_str())
    9 *********************************
    10 *******************************
    8 ******************************
    11 ***************************
    7 ************************
    12 *********************
    6 ******************
    13 ***************
    5 ***********
    14 **********
    4 ******
    15 ******
    16 ****
    3 **
    17 **
    1 *
    2 *
    18 *
    19 *
    20 *
    21 *
    22 *
    23 *
    24 *
    Multi-key sorts

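A secondary key such as `then_by` only has an effect where the primary keys tie. A sketch of the same idea with a single tuple key, ordering by bar width descending and then by length ascending; the `frequencies` subset is copied from the Record output on slide 32, and `bar_width` is a hypothetical helper using the `1 + frequency // 1000` formula from these slides:

```python
# Subset of the (length, frequency) pairs from the Record output on slide 32
frequencies = {1: 52, 2: 160, 3: 1420, 4: 5272, 15: 5925, 17: 1813}


def bar_width(frequency):
    # Hypothetical helper: same bar width formula as the histogram slides
    return 1 + frequency // 1000


# Primary key: bar width, descending (negated); secondary key: length, ascending
rows = sorted(frequencies.items(), key=lambda item: (-bar_width(item[1]), item[0]))
print([length for length, _ in rows])    # [4, 15, 3, 17, 1, 2]
```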
  38. Asq includes many predicates as more literate
    alternatives to lambdas
    38
    >>> query(words).where(lambda w: 'ox' in w).count()
    1351
    >>>
    >>> from asq.predicates import *
    >>> query(words).where(contains_('ox')).count()
    1351
    Predicates

  39. Predicate combinators such as xor_ can be used to
    combine other selectors or predicates.
    39
    >>> query(words) \
    .where(xor_(m_('startswith', 'ch'),
    m_('endswith', 'ing')))
    'abounding', 'absorbing', 'abutting',
    'accommodating', ...])>)>
    Predicate combinators

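Predicates and combinators like these are essentially functions that build other functions. A sketch with hypothetical helpers `make_contains` and `make_xor` (not asq's own definitions), using the standard `operator.methodcaller` for the method-calling selectors and a made-up word list:

```python
from operator import methodcaller


def make_contains(substring):
    # Hypothetical helper: a predicate testing whether an item contains substring
    return lambda item: substring in item


def make_xor(p, q):
    # Hypothetical combinator: true when exactly one of p and q holds
    return lambda item: bool(p(item)) != bool(q(item))


starts_with_ch = methodcaller("startswith", "ch")   # calls item.startswith("ch")
ends_with_ing = methodcaller("endswith", "ing")     # calls item.endswith("ing")
pred = make_xor(starts_with_ch, ends_with_ing)

words = ["chat", "changing", "abounding", "apple", "box"]
print([w for w in words if pred(w)])                     # ['chat', 'abounding']
print(sum(1 for w in words if make_contains("ox")(w)))   # 1
```

As in the asq query on the next slide, "changing" is excluded because it satisfies both predicates.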
  40. Predicate query operators such as contains(), all() or any()
    40
    >>> query(words) \
    .where(xor_(m_('startswith', 'ch'),
    m_('endswith', 'ing'))) \
    .contains('changing')
    False
    Asq includes predicate operators

  41. Most operators accept optional selectors or predicates
    41
    >>> query(words) \
    .order_by(len) \
    .skip_while(lambda w: len(w) < 8) \
    .first()
    'aardvark'
    >>> query(words) \
    .order_by(len) \
    .first(lambda w: len(w) == 8)
    'aardvark'
    Short queries with optional args

  42. Queries are lazily evaluated, doing as
    little work as necessary
    42
    Laziness
    >>> query(words).first(contains_('EI'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "./asq/queryables.py", line 1496, in first
        return self._first() if predicate is None else self._first_predicate(predicate)
      File "./asq/queryables.py", line 1508, in _first_predicate
        raise ValueError("No elements matching predicate in call to first()")
    ValueError: No elements matching predicate in call to first()
    >>> query(words).select(str.upper).first(contains_('EI'))
    'ABEIGH'

  43. Asq includes logging operators which can be incorporated
    into the query chain, used here to show laziness.
    43
    Logging
    >>> import logging
    >>> import sys
    >>> clog = logging.getLogger("clog")
    >>> clog.setLevel(logging.DEBUG)
    >>> clog.addHandler(logging.StreamHandler(sys.stdout))
    >>>
    >>> query(words) .log(clog, label='source') \
    ... .select(str.upper) .log(clog, label='to-upper') \
    ... .first(contains_('EI'))
    to-upper : BEGIN (DEFERRED)
    source : BEGIN (DEFERRED)
    source : [0] yields 'A'
    to-upper : [0] yields 'A'
    source : [1] yields 'a'
    to-upper : [1] yields 'A'
    source : [2] yields 'aa'
    to-upper : [2] yields 'AA'
    source : [3] yields 'aal'
    to-upper : [3] yields 'AAL'
    ...
    source : [167] yields 'abecedary'
    to-upper : [167] yields 'ABECEDARY'
    source : [168] yields 'abed'
    to-upper : [168] yields 'ABED'
    source : [169] yields 'abeigh'
    to-upper : [169] yields 'ABEIGH'
    'ABEIGH'

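The same laziness can be seen with plain generators. A sketch assuming a hypothetical `logged` wrapper that logs each item as it passes through, and a short made-up word list; only the items needed to find the first match are ever pulled from the source:

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("pipeline")


def logged(iterable, label):
    # Hypothetical wrapper: pass items through unchanged, logging each one as it is pulled
    for index, item in enumerate(iterable):
        log.debug("%s : [%d] yields %r", label, index, item)
        yield item


words = ["A", "a", "abed", "abeigh", "abel", "abele"]   # hypothetical sample
source = logged(words, "source")
upper = logged((w.upper() for w in source), "to-upper")

# Pulls items one at a time and stops at the first match
first_match = next(w for w in upper if "EI" in w)
print(first_match)    # ABEIGH
```

Here "abel" and "abele" are never read by the pipeline, just as the asq query above stops at "abeigh".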
  44. Wherever you would otherwise need the identity selector,
    lambda x: x, it can be omitted, which can also enable optimizations.
    44
    Identity selectors can be omitted
    >>> query(words).order_by(len).then_by(lambda w: w).to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
    'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
    'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi',
    ...
    'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein',
    'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological',
    'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']
    >>> query(words).order_by(len).then_by().to_list()
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
    'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
    'z', 'Ab', 'Ah', 'Al', 'Ao', 'As', 'Ay', 'Bu', 'Ed', 'Em', 'Fo', 'Ga', 'Ge', 'Gi',
    ...
    'pseudolamellibranchiate', 'scientificogeographical', 'thymolsulphonephthalein',
    'transubstantiationalist', 'formaldehydesulphoxylate', 'pathologicopsychological',
    'scientificophilosophical', 'tetraiodophenolphthalein', 'thyroparathyroidectomize']

  45. Asq includes tools for adding new query operators
    45
    Extending Asq
    >>> from asq.extension import extend
    >>> from asq.queryables import Queryable
    >>> @extend(Queryable)
    ... def separate_with(self, separator):
    ...     def generator():
    ...         i = iter(self)
    ...         try:
    ...             yield next(i)
    ...         except StopIteration:
    ...             return
    ...         for item in i:
    ...             yield separator
    ...             yield item
    ...     return self._create(generator())
    ...
    >>> query(words).separate_with('**').take(10).to_list()
    ['A', '**', 'a', '**', 'aa', '**', 'aal', '**',
    'aalii', '**']

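In principle, an extension decorator only needs to attach a function to a class as a new method. A minimal sketch of that idea, where `extend` and `Sequence` are hypothetical stand-ins rather than `asq.extension.extend` and asq's `Queryable`:

```python
def extend(cls):
    # Hypothetical decorator factory: attach the decorated function to cls as a method
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Sequence:
    # Hypothetical minimal stand-in for a queryable type
    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def to_list(self):
        return list(self)


@extend(Sequence)
def separate_with(self, separator):
    def generator():
        it = iter(self)
        try:
            yield next(it)          # first item, with no separator before it
        except StopIteration:
            return                  # empty source: yield nothing
        for item in it:
            yield separator
            yield item
    return Sequence(generator())


print(Sequence(["A", "a", "aa"]).separate_with("**").to_list())
# ['A', '**', 'a', '**', 'aa']
```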
  46. 46
    1. Introducing LINQ: What is LINQ and how does it work?
    2. Does Python need LINQ? Querying collections in Python
    3. Asq! A LINQ-to-objects implementation for Python
    4. Next steps: Forthcoming capabilities in Asq

  47. 47
    © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

  48. 48
    © 2007-2012, Joe Albahari, Ben Albahari and O'Reilly Media

  49. Making Asq faster and scalable
    49
    Parallel improvements
    ‣ Support parallel back ends
    • Currently uses multiprocessing module
    • Could use threading module (Jython, IronPython)
    • Considering OpenCL backend for numeric arrays
    ‣ API changes required for parallel
    • some query operators have order-dependent results,
    requiring a different API, e.g. aggregate()
    ‣ Provide parallel implementations of more operators
    • only a handful of operators have parallel implementations

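Why order matters for an operator like aggregate(): a fold with a non-associative function gives different answers depending on how the sequence is split between workers. A sketch that simulates a two-chunk parallel fold sequentially with `functools.reduce` (no actual parallelism is involved):

```python
from functools import reduce
from operator import sub

numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0]

# Sequential left fold: ((5 - 4) - 1) - 3 ... and so on
sequential = reduce(sub, numbers)

# Simulated parallel fold: fold each chunk separately, then fold the partial results
chunks = [numbers[:5], numbers[5:]]
partials = [reduce(sub, chunk) for chunk in chunks]
chunked = reduce(sub, partials)
print(sequential, chunked)               # -35 -5

# Addition is associative, so chunking does not change the total
total_sequential = sum(numbers)
total_chunked = sum(sum(chunk) for chunk in chunks)
print(total_sequential, total_chunked)   # 45 45
```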
  50. 50
    Thanks!
    http://asq.googlecode.com
    @robsmallshire