When Python meets GraphQL - Managing contributor identities

Bitergia
February 01, 2020

SortingHat is an open source Python tool that helps to manage the different contributor identities within an open source project. Under the hood SortingHat relies on a relational database, which can be queried via SQL, command line or directly via its Python interface. However, these ways of interacting with SortingHat hinder its integration with external tools, web interfaces and new web technologies (e.g., Django, REST services). To overcome these obstacles, we have evolved SortingHat's architecture using a GraphQL model based on the Graphene-Django implementation.

This talk describes our experience in migrating to GraphQL, from adapting the SortingHat functionalities to refactoring the unit tests. We also comment on lessons learned and on the advantages and drawbacks of this new approach.

SortingHat is one of the core tools of GrimoireLab, an open-source software analytics platform that is part of the CHAOSS project (Community Health Analytics Open Source Software), under the umbrella of the Linux Foundation.

Transcript

  1. When Python meets GraphQL
    Managing contributor identities in your open-source project
    FOSDEM 2020 Python DevRoom
    share this slide! @mghfdez

  2. About me
    My name is Miguel-Ángel Fernández
    Working at Bitergia, part of the Engineering team
    Software developer...
    … also involved in stuff related to data and metrics

  4. How can I measure my project?
    How many contributors do we have?
    How many companies are contributing to my project?

  5. It’s all about identities
    Tom Riddle
    Affiliated to Slytherin, Hogwarts
    Photo credit: juliooliveiraa

  6. It’s all about identities
    Lord Voldemort
    Working as a freelance (dark) wizard
    Photo credit: James Seattle

  7. Wait… they are the same person!
    Photo credits: juliooliveiraa, James Seattle

  8. A little bit more complex
    Manrique López
    Jose Manrique López de la Fuente
    Manrique López
    jsmanrique
    [email protected]
    [email protected]
    [email protected]
    jsmanrique
    [email protected]
    02/2005 - 12/2010 CTIC
    01/2010 - 12/2012 Andago
    01/2013 - 06/2013 TapQuo
    07/2013 - 12/2015 freelance (ASOLIF, CENATIC)
    07/2013 - now Bitergia

  9. Who is who?
    Project manager

  10. SortingHat: Wizardry on Software Project Members
    “For I'm the famous Sorting Hat.
    (...)
    So put me on and you will know
    Which house you should be in...”

  11. Merge identities!
    Affiliate this person!
    Complete the profile!
    Lord Voldemort
    Tom Riddle
    Name: Tom
    Gender: Male
    Email: [email protected]
    Photo credit: James Seattle
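
The three operations named on this slide (merge, affiliate, complete the profile) can be pictured with a small in-memory sketch. This is not SortingHat's data model or API: the class `UniqueIdentity` and the functions `merge` and `enroll` are hypothetical names chosen for illustration, and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class UniqueIdentity:
    """One real person, possibly known under several identities (hypothetical model)."""
    uuid: str
    identities: set = field(default_factory=set)     # names, usernames, emails
    profile: dict = field(default_factory=dict)      # curated name, gender, email...
    enrollments: list = field(default_factory=list)  # organizations the person belongs to


def merge(target, other):
    """Fold `other` into `target`; values already in the target profile win."""
    target.identities |= other.identities
    target.enrollments += [e for e in other.enrollments if e not in target.enrollments]
    for key, value in other.profile.items():
        target.profile.setdefault(key, value)
    return target


def enroll(person, organization):
    """Affiliate a person to an organization, ignoring duplicates."""
    if organization not in person.enrollments:
        person.enrollments.append(organization)


tom = UniqueIdentity("a1", {"Tom Riddle"}, {"name": "Tom"})
voldemort = UniqueIdentity("b2", {"Lord Voldemort"}, {"gender": "Male"})

person = merge(tom, voldemort)   # merge identities
enroll(person, "Slytherin")      # affiliate this person
```

After the merge, a single record carries both names, the combined profile and the affiliation, which is what makes per-person and per-organization metrics possible.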

  12. Boosting SH integration
    Main idea: building a robust API
    Easy to integrate with external apps
    Flexible, easy to adapt
    Ensure consistency
    Hatstall
    Python module


  13. GraphQL is...
    … A query language, transport-agnostic but typically served over HTTP.
    … A specification for client-server communication: it doesn’t dictate which language to use, how the data should be stored or which clients to support.
    … Based on graph theory: nodes, edges and connections.

  14. REST vs GraphQL
    /unique_identities//identities
    /unique_identities//profile
    /unique_identities//enrollments
    /organizations//domains

    query {
      unique_identities(uuid: "") {
        identities {
          uid
        }
        profile {
          email
          gender
        }
        enrollments {
          organization
          end_date
        }
        domains {
          domain_name
        }
      }
    }
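
To make the contrast concrete, here is a minimal sketch of how a Python client might prepare both kinds of calls using only the standard library. The endpoint URL, the sample UUID and the `String` variable type are assumptions; the field names are copied from the slide. Nothing is sent over the network, the request object is only built.

```python
import json
import urllib.request

# Hypothetical endpoint and identifier, used only for illustration.
API_URL = "http://localhost:8000/api/"
UUID = "a1b2c3"

# REST style: one URL (and one round trip) per resource shown in the view.
rest_urls = [
    f"{API_URL}unique_identities/{UUID}/{resource}"
    for resource in ("identities", "profile", "enrollments")
]

# GraphQL style: one document asks for all the fields at once.
QUERY = """
query ($uuid: String) {
  unique_identities(uuid: $uuid) {
    identities { uid }
    profile { email gender }
    enrollments { organization end_date }
  }
}
"""


def build_graphql_request(url, query, variables):
    """Build (without sending) the POST request that carries a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(API_URL, QUERY, {"uuid": UUID})
payload = json.loads(request.data)
```

Passing the UUID as a variable, rather than formatting it into the query string, keeps the query text constant and avoids quoting problems.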

  15. Comparing approaches: REST
    Convention between server and client
    Overfetching / Underfetching
    API Documentation is not tied to development
    Multiple requests per view

  16. Comparing approaches: GraphQL
    Strongly typed language
    The client defines what it receives
    The server only sends what is needed
    One single request per view
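
The idea that the client defines what it receives can be illustrated without any GraphQL library. The sketch below is a plain-Python stand-in, not how Graphene resolves queries: `select` is a hypothetical helper, and the record and its values are invented sample data.

```python
def select(record, selection):
    """Return only the requested fields of a nested dict.

    `selection` maps a field name to True (leaf) or to a nested selection.
    """
    result = {}
    for name, sub in selection.items():
        value = record[name]
        if sub is True:
            result[name] = value
        elif isinstance(value, list):
            result[name] = [select(item, sub) for item in value]
        else:
            result[name] = select(value, sub)
    return result


# What the server knows about one person (made-up data).
full_record = {
    "uuid": "a1b2c3",
    "profile": {"name": "Tom", "email": "tom@example.com", "gender": "Male"},
    "enrollments": [
        {"organization": "Slytherin", "start_date": "1938-09-01", "end_date": "1945-06-30"},
    ],
}

# What this particular view needs.
wanted = {"profile": {"email": True}, "enrollments": {"organization": True}}

trimmed = select(full_record, wanted)
```

With a fixed REST endpoint the client would receive the whole of `full_record` (overfetching) or would need extra calls for nested data (underfetching); here the response has exactly the shape of the request.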

  17. Summarizing ...

  18. Implementation process
    Define data model & schema
    Up next...
    Support paginated results
    Authentication
    Implement basic queries & mutations

  19. Implementation: Graphene-Django
    Graphene-Django is built on top of Graphene. It provides some additional abstractions that help to add GraphQL functionality to your Django project.
    Picture credit: Snippedia

  20. Schema
    Mutations
    Types
    Queries
    GraphQL Schema

  21. Schema.py
    CRUD operations
    Models
    Resolvers
    GraphQL Schema: Graphene-Django

  22. It is already a graph
    Lord Voldemort
    Profile
    Identities
    Affiliations
    Name: Tom
    Gender: Male
    Email: [email protected]
    Tom Riddle
    slytherin.edu
    UUID


  23. (Basic) Recipe for building queries
    # models.py
    from django.db.models import CharField

    class Organization(EntityBase):
        name = CharField(max_length=MAX_SIZE)

        class Meta:
            db_table = 'organizations'
            unique_together = ('name',)

        def __str__(self):
            return self.name

    # schema.py
    import graphene
    from graphene_django import DjangoObjectType

    # GraphQL type generated from the Django model
    class OrganizationType(DjangoObjectType):
        class Meta:
            model = Organization

    class SortingHatQuery:
        organizations = graphene.List(OrganizationType)

        # Resolver for the 'organizations' field
        def resolve_organizations(self, info, **kwargs):
            return Organization.objects.order_by('name')

  24. Documentation is already updated!

  25. (Basic) Recipe for building mutations
    # schema.py
    import graphene

    class AddOrganization(graphene.Mutation):
        # Input arguments accepted by the mutation
        class Arguments:
            name = graphene.String()

        # Output field returned to the client
        organization = graphene.Field(lambda: OrganizationType)

        def mutate(self, info, name):
            org = add_organization(name)
            return AddOrganization(organization=org)

    class SortingHatMutation(graphene.ObjectType):
        add_organization = AddOrganization.Field()

  26. (Basic) Recipe for building mutations
    # db.py
    def add_organization(name):
        validate_field('name', name)
        organization = Organization(name=name)
        try:
            organization.save()
        except django.db.utils.IntegrityError as exc:
            _handle_integrity_error(Organization, exc)
        return organization

    # api.py
    # add_organization_db: presumably db.py's add_organization, imported under an alias
    @django.db.transaction.atomic
    def add_organization(name):
        try:
            org = add_organization_db(name=name)
        except ValueError as e:
            raise InvalidValueError(msg=str(e))
        except AlreadyExistsError as exc:
            raise exc
        return org

  27. Documentation is already updated… again!

  28. About pagination
    identities(first:2 offset:2)
    identities(first:2 after:$uuid)
    identities(first:2 after:$uuidCursor)
    How are we getting the cursor?
    It is a property of the connection, not of the object.
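
The difference between offset-based and cursor-based pagination can be sketched in a few lines of plain Python. The list contents, the `pos:N` cursor format and all function names are assumptions made for this example; real servers choose their own opaque cursor encoding.

```python
import base64

ITEMS = ["id-a", "id-b", "id-c", "id-d", "id-e"]  # made-up sample data


def page_by_offset(items, first, offset=0):
    """identities(first: N, offset: M): skip M items, return the next N."""
    return items[offset:offset + first]


def encode_cursor(position):
    """Opaque cursor: here simply the base64-encoded position in the list."""
    return base64.b64encode(f"pos:{position}".encode()).decode()


def decode_cursor(cursor):
    """Recover the list position stored in a cursor."""
    return int(base64.b64decode(cursor).decode().split(":", 1)[1])


def page_by_cursor(items, first, after=None):
    """identities(first: N, after: CURSOR): return edges, each with its own cursor."""
    start = 0 if after is None else decode_cursor(after) + 1
    return [
        {"cursor": encode_cursor(pos), "node": item}
        for pos, item in enumerate(items[start:start + first], start=start)
    ]


first_page = page_by_cursor(ITEMS, first=2)
second_page = page_by_cursor(ITEMS, first=2, after=first_page[-1]["cursor"])
```

Note that the cursor lives next to each node in the returned edge, not inside the node itself, which matches the remark that it is a property of the connection rather than of the object.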

  29. Edges and connections
    Information that is specific to the edge, rather than to one of the objects.
    There are specifications like Relay
    Friend A
    Friend B
    Friendship
    time

  30. Implementing pagination
    We are taking our own approach without reinventing the wheel
    It is a hybrid approach based on offsets and limits, using Django Paginator objects
    Also benefiting from edges & connections

  31. Query Result

  33.
    import graphene
    from django.core.paginator import Paginator

    class AbstractPaginatedType(graphene.ObjectType):
        @classmethod
        def create_paginated_result(cls, query, page=1,
                                    page_size=DEFAULT_SIZE):
            # Django objects
            paginator = Paginator(query, page_size)
            result = paginator.page(page)

            # Query results
            entities = result.object_list

            # Pagination info
            page_info = PaginationType(
                page=result.number,
                page_size=page_size,
                num_pages=paginator.num_pages,
                has_next=result.has_next(),
                has_prev=result.has_previous(),
                start_index=result.start_index(),
                end_index=result.end_index(),
                total_results=len(query)
            )

            return cls(entities=entities, page_info=page_info)
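
For readers without a Django project at hand, the pagination fields above can be approximated in plain Python. This is a sketch that roughly mirrors what the slide reads from Django's `Paginator`, not a replacement for it; the `paginate` function and the default page size of 10 are assumptions.

```python
import math


def paginate(items, page=1, page_size=10):
    """Slice `items` and compute page metadata (1-based pages and indexes)."""
    total = len(items)
    # An empty list still yields one (empty) page.
    num_pages = max(1, math.ceil(total / page_size))
    if not 1 <= page <= num_pages:
        raise ValueError(f"page {page} is out of range 1..{num_pages}")

    start = (page - 1) * page_size
    entities = items[start:start + page_size]

    page_info = {
        "page": page,
        "page_size": page_size,
        "num_pages": num_pages,
        "has_next": page < num_pages,
        "has_prev": page > 1,
        "start_index": start + 1 if entities else 0,
        "end_index": start + len(entities),
        "total_results": total,
    }
    return {"entities": entities, "page_info": page_info}


result = paginate(list("abcdefg"), page=2, page_size=3)
```

Returning the entities together with the page metadata lets a client render "page 2 of 3" and decide whether to request another page without issuing a separate count query.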

  34. Returning paginated results
    class OrganizationPaginatedType(AbstractPaginatedType):
        entities = graphene.List(OrganizationType)
        page_info = graphene.Field(PaginationType)

    class SortingHatQuery:
        def resolve_organizations(...):
            (...)
            return OrganizationPaginatedType.create_paginated_result(
                query, page, page_size=page_size)

  35. Authenticated queries
    It is based on JSON Web Tokens (JWT)
    An existing user must generate a token, which has to be included in the Authorization header of the HTTP request
    This token is generated using a mutation defined by the graphene-jwt module
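
To show what such a token looks like, here is a minimal standard-library sketch of creating and verifying an HS256-signed JWT. It is not the code path used by the JWT module mentioned on the slide: the secret, the payload and the `JWT` scheme prefix in the header are assumptions, and expiry checks are left out for brevity.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"not-a-real-secret"  # hypothetical signing key


def _b64url(data):
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(payload, secret=SECRET):
    """Create an HS256-signed JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token, secret=SECRET):
    """Return the payload if the signature matches, else raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), signature):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(signing_input.split(".")[1]))


token = make_token({"username": "test"})

# The scheme prefix depends on the server configuration; "JWT" is assumed here.
headers = {"Authorization": f"JWT {token}"}
```

In practice the server issues the token (here, through a GraphQL mutation) and the client only stores it and sends it back in the header; the verification half of the sketch runs on the server side.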

  36. Testing authentication
    Use an application capable of setting headers on HTTP requests
    Insomnia app
    Heads-up!
    Configuring the Django CSRF token properly was not trivial

  37. Testing authentication
    import graphene.test
    from django.contrib.auth import get_user_model
    from django.test import RequestFactory

    def setUp(self):
        # Build a fake request and attach a user to it;
        # it is passed to the schema as the GraphQL context
        self.user = get_user_model().objects.create(username='test')
        self.context_value = RequestFactory().get(GRAPHQL_ENDPOINT)
        self.context_value.user = self.user

    def test_add_organization(self):
        client = graphene.test.Client(schema)
        executed = client.execute(self.SH_ADD_ORG, context_value=self.context_value)

  38. Bonus: filtering
    class OrganizationFilterType(graphene.InputObjectType):
        name = graphene.String(required=False)

    class SortingHatQuery:
        organizations = graphene.Field(
            OrganizationPaginatedType,
            page_size=graphene.Int(),
            page=graphene.Int(),
            filters=OrganizationFilterType(required=False)
        )

        def resolve_organizations(...):
            # Modified resolver
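
The slide leaves the modified resolver out. As a rough idea of what applying an optional filter involves, here is a plain-Python sketch over a list of names. It is not SortingHat's resolver: `filter_organizations` is a hypothetical function, and the case-insensitive substring match is an assumption about how a `name` filter might behave.

```python
def filter_organizations(organizations, filters=None):
    """Apply the optional filters; with no filters, return everything."""
    filters = filters or {}
    name = filters.get("name")
    if name is None:
        return list(organizations)
    # Case-insensitive substring match: an assumption, not SortingHat's rule.
    return [org for org in organizations if name.lower() in org.lower()]


ORGS = ["Bitergia", "Hogwarts", "Linux Foundation"]  # sample data

everything = filter_organizations(ORGS)
matching = filter_organizations(ORGS, {"name": "hog"})
```

Because the filter argument is optional, existing queries that do not pass `filters` keep returning the full (paginated) list.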

  39. (some) Future work
    Implementing a command line & web Client
    Limiting nested queries
    Feedback is welcome!

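
One simple way to think about limiting nested queries is to measure how deeply the selection sets are nested before executing anything. The sketch below counts braces in the query text; it is only an approximation (it ignores braces inside string arguments and does not expand fragments), and both `query_depth` and the `MAX_DEPTH` value are assumptions rather than anything planned for SortingHat.

```python
MAX_DEPTH = 5  # hypothetical limit


def query_depth(query):
    """Return the maximum nesting level of `{ ... }` blocks in a query string."""
    depth = max_depth = 0
    for char in query:
        if char == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif char == "}":
            depth -= 1
    return max_depth


def is_allowed(query, limit=MAX_DEPTH):
    """Reject queries nested more deeply than `limit`."""
    return query_depth(query) <= limit


shallow = "query { organizations { name } }"
deep = "query { a { b { c { d { e { f } } } } } }"
```

A production implementation would more likely walk the parsed query document instead of the raw text, but the principle is the same: reject the request before any resolver runs.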

  40. GrimoireLab architecture

  41. Let’s go for some questions
    Software Developer @ Bitergia
    FLOSS enthusiast & Data nerd
    Contributing to the CHAOSS-GrimoireLab project
    Twitter @mghfdez
    Email [email protected]
    GitHub mafesan