Upgrade to Pro — share decks privately, control downloads, hide ads and more …

openpyxl

 openpyxl

Talk given during an afpyro-be even @ ULB in Brussels

Eric Gazoni

May 10, 2013
Tweet

More Decks by Eric Gazoni

Other Decks in Programming

Transcript

  1. About me Hello, I’m Eric @ericgazoni Work for Adimian (France)

    Scientific applications for large companies
  2. The origins: Excel 1987 Excel 2.0 for Windows 1990 Excel

    3.0 1992 Excel 4.0 1993 Excel 5.0 (Office 4.2 & 4.3, also a 32-bit version for Windows NT only on the x86, PowerPC, Alpha, and MIPS architectures) This version of Excel includes a DOOM-like game as an Easter Egg. 1995 Excel for Windows 95 (version 7.0) included in Office 95 1997 Excel 97 (version 8.0) included in Office 97 (for x86 and Alpha). This version of Excel includes a flight simulator as an Easter Egg. 1999 Excel 2000 (version 9.0) included in Office 2000 2001 Excel 2002 (version 10) included in Office XP 2003 Office Excel 2003 (version 11) included in Office 2003 2007 Office Excel 2007 (version 12) included in Office 2007 2010 Excel 2010 (version 14) included in Office 2010 2013 Excel 2013 (version 15) included in Office 2013
  3. The origins: COM Component Object Model Communication protocol for compatible

    products Cross platform Weird server-client interface All COM errors raise the same Exception() Efficient
  4. The origins: API design sensible defaults object oriented close to

    the internal Excel API signature carefully weight API change decisions
  5. Defective by design Microsoft submitted initial material to Ecma International

    Technical Committee TC45, where it was standardized to become ECMA-376, approved in December 2006 Office Open XML (also informally known as OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents.
  6. Defective by design The ISO standardization of Office Open XML

    was controversial and embittered, with much discussion both about the specification and about the standardization process. According to InfoWorld: OOXML was opposed by many on grounds it was unneeded, as software makers could use OpenDocument Format (ODF), a less complicated office software format that was already an international standard. — InfoWorld
  7. «Open format» VBA is still stored as a proprietary binary

    blob Supports internationalization, as long as you’re using US English Workbook/worksheet protection can be removed by editing an XML file attribute Every XML file produced by Excel is 1 (long) line
  8. Number formats 164 built-in number formats («general», «long date», ...)

    can be found on pages 1767-1768 of the specs defines codes from 0 to 49 (and with gaps)
  9. Number formats Dates are numbers, in Julian calendar, knowing that

    Excel has an historical bug on year 1900 if you’re on a Mac, where you have to consider it as a leap year (yes, really) Date detection relies mostly on finding ‘y’, ‘m’ or ‘d’ in the number format string of a given cell 
 (yes, really)
  10. Number formats def to_julian(self, year, month, day, hours=0, minutes=0, seconds=0):

    """Convert from Python date to Excel JD.""" # explicitly disallow bad years # Excel 2000 treats JD=0 as 1/0/1900 (buggy, disallow) # Excel 2000 treats JD=2958466 as a bad date (Y10K bug!) if year < 1900 or year > 10000: msg = 'Year not supported by Excel: %s' % year raise ValueError(msg) if self.excel_base_date == CALENDAR_WINDOWS_1900: # Fudge factor for the erroneous fact that the year 1900 is # treated as a Leap Year in MS Excel. This affects every date # following 28th February 1900 if year == 1900 and month <= 2: excel_1900_leap_year = False else: excel_1900_leap_year = True excel_base_date = 2415020 elif self.excel_base_date == CALENDAR_MAC_1904: excel_base_date = 2416481 excel_1900_leap_year = False else: raise NotImplementedError('base date supported.')
  11. Worksheets names LibreOffice Writer put the worksheet names in xl/

    docProps/core.xml (mandatory as described in the specs) and in xl/workbook.xml (optional) GnuCalc will only write in xl/docProps/core.xml MS Excel only puts it in xl/workbook.xml and will only read from there Implementation: read from everywhere
  12. PHPExcel Open source by @Mark_Baker PHP is already a high-level

    language Just make it more «Pythonic» But Python allows for much more...
  13. Why improve ? Work with scientists, face performance issues for

    the rest of your life If it works fine with 100, let’s try 1.000 When you make it work with 1.000, try 10.000
  14. vs Excel hey, it’s just XML right ? unlimited number

    of rows and columns anyone ? (be careful, might not open further in plain Excel)
  15. VBA there is no way known to honest men to

    get into this VBA blob you still need a valid Excel client to interact with it
  16. Formulas openpyxl does not re-implement a formula engine when you

    enter a formula, the result section is left blank when opened in Excel, unresolved formulas are calculated automatically
  17. Styles support styles have to be carefully reverse-engineered high number

    of possible combinations specs don’t help much some styles are not preserved in a round trip