Parsing a distribution name is sometimes hard

LT at PerlCon 2019

Kenichi Ishigaki

August 09, 2019

Transcript

  1. Usually it's easy

    I/IS/ISHIGAKI/Module-CPANTS-Analyse-1.01.tar.gz
    S/SK/SKAJI/Perl6/App-Mi6-0.0.2.tar.gz

    • The blue part (I/IS/ISHIGAKI, S/SK/SKAJI) is the author's directory based on their ID
    • The purple part (Perl6) is a subdirectory under the author's dir
    • The red part (Module-CPANTS-Analyse, App-Mi6) is the name of the distribution
    • The orange part (1.01, 0.0.2) is the version of the distribution
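
To make the four parts concrete, here is a minimal sketch in core Perl that splits the two "easy" paths above. The helper name `split_easy_path` and both regexes are assumptions made for this illustration; they are not taken from CPAN::DistnameInfo or Parse::Distname, and they only cover file names that end in a dash, a plain numeric version, and a known extension.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CPAN module): split an author-relative
# path into the parts named on the slide. Handles only the easy case.
sub split_easy_path {
    my ($path) = @_;

    # author's directory (with the ID inside it), optional subdirectory, file name
    my ($author_dir, $cpanid, $subdir, $file) =
        $path =~ m{\A([A-Z]/[A-Z]{2}/([A-Z][A-Z0-9-]*))/(?:(.+)/)?([^/]+)\z}
        or return;

    # distribution name, dash, numeric version, extension
    my ($dist, $version) =
        $file =~ m{\A(.+)-([0-9][0-9._]*)\.(?:tar\.gz|tgz|zip)\z}
        or return;

    return {
        author_dir => $author_dir,
        cpanid     => $cpanid,
        subdir     => $subdir,    # undef when the file sits directly in the author's dir
        dist       => $dist,
        version    => $version,
    };
}

for my $path (qw(
    I/IS/ISHIGAKI/Module-CPANTS-Analyse-1.01.tar.gz
    S/SK/SKAJI/Perl6/App-Mi6-0.0.2.tar.gz
)) {
    my $p = split_easy_path($path) or next;
    printf "%s | %s | %s | %s\n",
        $p->{author_dir}, $p->{subdir} // '(none)', $p->{dist}, $p->{version};
}
```

Anything outside this shape makes the sketch return nothing, and those are the cases the rest of the talk is about.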
  2. CPAN::DistnameInfo

    I've been using a patched version for CPANTS, but I didn't want to repeat that for CPAN::Groonga.
  3. CPAN::DistnameInfo

    I was going to ping the gang, but I thought twice: let's test it with BackPAN first.
  4. CPAN::DistnameInfo says...

    my $path = "E/ER/ERWANMAS/v0.10.zip";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "ERWANMAS",
      "dist" : "v",
      "distvname" : "v0.10",
      "extension" : "zip",
      "filename" : "v0.10.zip",
      "maturity" : "released",
      "pathname" : "E/ER/ERWANMAS/v0.10.zip",
      "version" : "0.10"
    }
  5. Or ...

    my $path = "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "SONNY",
      "dist" : "DBIx-Class-InflateColumn",
      "distvname" : "DBIx-Class-InflateColumn-S3",
      "extension" : "tar.gz",
      "filename" : "DBIx-Class-InflateColumn-S3.tar.gz",
      "maturity" : "released",
      "pathname" : "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz",
      "version" : "S3"
    }

    But really?
  6. More delicate cases

    my $path = "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "HARPREET",
      "dist" : "XMS-MotifSetv",
      "distvname" : "XMS-MotifSetv1.0",
      "extension" : "tar.gz",
      "filename" : "XMS-MotifSetv1.0.tar.gz",
      "maturity" : "released",
      "pathname" : "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz",
      "version" : "1.0"
    }
  7. More delicate cases

    my $path = "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "MPERRY",
      "dist" : "Config-INI-Reader",
      "distvname" : "Config-INI-Reader-Encrypted2",
      "extension" : "tar.gz",
      "filename" : "Config-INI-Reader-Encrypted2.tar.gz",
      "maturity" : "released",
      "pathname" : "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz",
      "version" : "Encrypted2"
    }
  8. More delicate cases

    my $path = "C/CA/CAFFIEND/font_ft2_0.1.0.tgz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "CAFFIEND",
      "dist" : "font_ft",
      "distvname" : "font_ft2_0.1.0",
      "extension" : "tgz",
      "filename" : "font_ft2_0.1.0.tgz",
      "maturity" : "released",
      "pathname" : "C/CA/CAFFIEND/font_ft2_0.1.0.tgz",
      "version" : "2_0.1.0"
    }
  9. Why does this happen?

    • CPAN::DistnameInfo looks for a distribution name and a version at the same time (using a regex)
    • But it might be better to look for a version first, then treat the rest as the name
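
The "version first, then the rest is the name" idea can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions: the function name `parse_version_first`, the list of extensions, and the two version patterns are made up for this example and are not the actual rules of Parse::Distname or CPAN::DistnameInfo.

```perl
use strict;
use warnings;

# Hypothetical sketch of a version-first parse (not any module's real code).
# Returns (name, version); version is undef when nothing version-like is found.
sub parse_version_first {
    my ($file) = @_;

    # Drop a known archive extension to get the stem.
    (my $stem = $file) =~ s/\.(?:tar\.gz|tgz|zip)\z//;

    # Assumed shape of a version number: digits separated by dots.
    my $num = qr/[0-9]+(?:\.[0-9]+)*/;

    # Step 1a: a "v"-prefixed version at the end, even if glued to the name.
    if ($stem =~ /\A(.*?)[-_]?(v$num)\z/) {
        return ($1, $2);
    }
    # Step 1b: a plain numeric version, only when a separator precedes it.
    if ($stem =~ /\A(?:(.*)[-_])?($num)\z/) {
        return ($1 // '', $2);
    }
    # Step 2: no version found, so the whole stem is the name.
    return ($stem, undef);
}

for my $file (qw(
    v0.10.zip
    XMS-MotifSetv1.0.tar.gz
    font_ft2_0.1.0.tgz
    DBIx-Class-InflateColumn-S3.tar.gz
)) {
    my ($name, $version) = parse_version_first($file);
    printf "%-36s name='%s' version=%s\n", $file, $name, $version // 'undef';
}
```

Because the version is pinned to the end of the stem before the name is decided, a trailing word such as S3 is never mistaken for a version. The sketch deliberately leaves digits glued to a name alone (Encrypted2, MatrixSSL3), so its answers for those names are not guaranteed to match what either module returns.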
  10. Parse::Distname

    https://metacpan.org/release/Parse-Distname

    So I wrote a new module as a PoC, instead of applying a breaking change to the existing code.
  11. Let's see

    my $path = "E/ER/ERWANMAS/v0.10.zip";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid" : "ERWANMAS",
    -   "dist" : "v",
    +   "dist" : "",
        "distvname" : "v0.10",
        "extension" : "zip",
        "filename" : "v0.10.zip",
        "maturity" : "released",
        "pathname" : "E/ER/ERWANMAS/v0.10.zip",
    -   "version" : "0.10"
    +   "version" : "v0.10"
      }
  12. Let's see

    my $path = "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid" : "SONNY",
    -   "dist" : "DBIx-Class-InflateColumn",
    +   "dist" : "DBIx-Class-InflateColumn-S3",
        "distvname" : "DBIx-Class-InflateColumn-S3",
        "extension" : "tar.gz",
        "filename" : "DBIx-Class-InflateColumn-S3.tar.gz",
        "maturity" : "released",
        "pathname" : "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz",
    -   "version" : "S3"
    +   "version" : null
      }
  13. Let's see

    my $path = "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid" : "HARPREET",
    -   "dist" : "XMS-MotifSetv",
    +   "dist" : "XMS-MotifSet",
        "distvname" : "XMS-MotifSetv1.0",
        "extension" : "tar.gz",
        "filename" : "XMS-MotifSetv1.0.tar.gz",
        "maturity" : "released",
        "pathname" : "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz",
    -   "version" : "1.0"
    +   "version" : "v1.0"
      }
  14. Let's see

    my $path = "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid": "MPERRY",
    -   "dist": "Config-INI-Reader",
    +   "dist": "Config-INI-Reader-Encrypted",
        "distvname": "Config-INI-Reader-Encrypted2",
        "extension": "tar.gz",
        "filename": "Config-INI-Reader-Encrypted2.tar.gz",
        "maturity": "released",
        "pathname": "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz",
    -   "version": "Encrypted2"
    +   "version": "2"
      }
  15. Let's see

    my $path = "C/CA/CAFFIEND/font_ft2_0.1.0.tgz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid": "CAFFIEND",
    -   "dist": "font_ft",
    +   "dist": "font_ft2",
        "distvname": "font_ft2_0.1.0",
        "extension": "tgz",
        "filename": "font_ft2_0.1.0.tgz",
        "maturity": "released",
        "pathname": "C/CA/CAFFIEND/font_ft2_0.1.0.tgz",
    -   "version": "2_0.1.0"
    +   "version": "0.1.0"
      }
  16. Fixed 200+ cases

    • Out of 330000+ BackPAN distributions
    • Most cases are ancient, or accidental, and often removed already
    • See https://github.com/charsbar/Parse-Distname/blob/master/xt/walk_through.t for details
    • Parse::Distname also contains a few patches for CPAN::DistnameInfo
  17. May not be perfect yet

    my $path = "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid" : "CDRAKE",
    -   "dist" : "Crypt",
    +   "dist" : "Crypt-MatrixSSL",
        "distvname" : "Crypt-MatrixSSL3",
        "extension" : "tar.gz",
        "filename" : "Crypt-MatrixSSL3.tar.gz",
        "maturity" : "released",
        "pathname" : "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz",
    -   "version" : "MatrixSSL3"
    +   "version" : "3"
      }

    Looks better, but...
  18. Fixed this morning (0.04)

    my $path = "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

      {
        "cpanid" : "CDRAKE",
    -   "dist" : "Crypt",
    +   "dist" : "Crypt-MatrixSSL3",
        "distvname" : "Crypt-MatrixSSL3",
        "extension" : "tar.gz",
        "filename" : "Crypt-MatrixSSL3.tar.gz",
        "maturity" : "released",
        "pathname" : "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz",
    -   "version" : "MatrixSSL3"
    +   "version" : null
      }

    ... by making it an exception
  19. Dogfooding

    • I have started using this for CPANTS and CPAN::Groonga
    • If everything goes well...?
  20. Caveats for migration

    • Distribution name may become empty (and your database may complain about this)
    • Internal hash keys are changed
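
One possible way to cope with the empty-name caveat is to pick a fallback before the value reaches a NOT NULL or non-empty column. The sketch below is an assumption, not something the talk prescribes: the helper `dist_key` is hypothetical, and it only relies on the `dist` and `filename` keys shown in the JSON output above.

```perl
use strict;
use warnings;

# Hypothetical guard (not part of Parse::Distname): return a non-empty key
# for a parsed result, falling back to the file name without its extension.
sub dist_key {
    my ($info) = @_;    # hashref with "dist" and "filename" keys

    my $dist = $info->{dist};
    return $dist if defined $dist && length $dist;

    # Empty or missing name (e.g. v0.10.zip): use the file stem instead.
    (my $stem = $info->{filename} // '') =~ s/\.(?:tar\.gz|tgz|zip)\z//;
    return $stem;
}

print dist_key({ dist => '', filename => 'v0.10.zip' }), "\n";
```

Whether a fallback like this or a nullable column is the better fit depends on how the database uses the name.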