
Multicore COBOL: three approaches

Multicore World 2013
February 20, 2013


Transcript

  1. Multicore COBOL: three approaches
     Richard A. O’Keefe (ok@cs.otago.ac.nz)
     Friday, 15 February 13
  2. WHY (for cynics)
     • COBOL programs handle money.
     • Lots of money.
     • Organisations that have lots of COBOL are likely to have money.
     • We’d like some of that.
     • Hmm, do they know about multicore?
     • It’s not in the standard...
     • Spoiler alert: beware microprocessor-centrism!
  3. WHY (for idealists)
     • If organisations can switch over to faster machines using less power, it’s good for all of us.
     • Nothing makes for good theory like contact with the real world. Concurrency for early-adopters and space-cadets is one thing; concurrency for payrolls has to be usable by people who care more about the job than the tools and should help to reduce error rates.
     • There may be lessons for other languages.
     • Spoiler alert: there are such lessons.
  4. Instead of the language
     • Why not just convert COBOL systems to a concurrent programming language?
     • C (pthreads, Windows threads, C11 threads)
     • C++ (ditto, Intel TBB)
     • Ada (concurrent since 1979)
     • Java (born concurrent)
     • C# (everyone runs Windows on their mainframes, right?)
     • Erlang
     • Cloud Haskell
     • something?
  5. Arithmetic
     • COBOL does fixed point arithmetic with up to 31 decimal digits natively.
     • COBOL compilers know this and generate good code.
     • C11 and C++11 support the new decimal floating point standard.
     • IBM have hardware support; x86(-64) do not. Intel’s library is software.
     • Decimal float doesn’t quite do the job anyway.
     • From the previous slide, only Ada can support COBOL data at all well.
     • Java has BigDecimal, but speedy it isn’t.
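The decimal-arithmetic point can be demonstrated in any language with a software decimal library; a minimal sketch in Python, with its `decimal` module standing in for COBOL’s native fixed point (values are illustrative):

```python
from decimal import Decimal, getcontext

# Binary floating point cannot represent most decimal fractions exactly;
# the classic symptom, unacceptable where money is involved:
assert 0.1 + 0.2 != 0.3

# Decimal fixed point, which COBOL does natively, behaves as expected.
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

# A software decimal library can match COBOL's 31-digit capacity,
# though not the speed of compiled or hardware-assisted decimal code.
getcontext().prec = 31
total = Decimal("9999999999999999999999999.99") + Decimal("0.01")
assert str(total) == "10000000000000000000000000.00"
```

This is the sense in which Java’s BigDecimal is correct but slow: every operation goes through a software library, where a COBOL compiler emits dedicated decimal instructions.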
  6. Conversion is troublesome
     • Example: http://www.semdesigns.com/Products/Services/COBOLMigration.html?Home=LegacyMigration
     • Converts COBOL fixed point to C# decimal
       – more memory
       – software, not hardware
       – loses overflow checking
     • Converts COBOL nested records to class instances with pointers to nested class instances
       – Adds dynamic allocation overheads
       – Adds indirection cost to all references
       – Can’t read or write a record as one memory block
       – Unkind to caches
     • And this is a good COBOL->C# translator
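The record-layout point can be sketched with Python’s `ctypes`, which can express a COBOL-style nested record as one contiguous block; the field names and sizes here are hypothetical:

```python
import ctypes

# A COBOL nested record (hypothetical layout) maps to one contiguous
# block of memory: nesting costs no pointers and no indirection.
class Name(ctypes.Structure):
    _fields_ = [("first", ctypes.c_char * 10),
                ("last",  ctypes.c_char * 15)]

class Employee(ctypes.Structure):
    _fields_ = [("emp_id", ctypes.c_char * 6),
                ("name",   Name),                # nested record, stored inline
                ("salary", ctypes.c_char * 9)]   # e.g. PIC 9(7)V99, zoned

# The whole record is one memory block, so it can be read or written
# in a single I/O operation and is friendly to caches.
assert ctypes.sizeof(Employee) == 6 + (10 + 15) + 9

e = Employee(b"000042", Name(b"ADA", b"LOVELACE"), b"012345678")
buf = bytes(e)          # serialise the whole record as one block
assert len(buf) == 40
```

A translation to class instances replaces that single block with three heap objects linked by pointers, which is exactly the overhead the slide lists.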
  7. Even if it works...
     • With the very best systems for converting COBOL to Java or C# you get readable code.
     • Small examples get a lot bigger.
     • But your COBOL programmers can’t read it.
     • And your Java or C# programmers are confused by an architecture that does things the COBOL way, not the Java or C# way.
     • And if the original code wasn’t concurrent, the result won’t be concurrent either. So you still have to rework the code somehow to get concurrency.
  8. A COBOL program

  9. A COBOL program in Java

  10. Three approaches
     • BELOW the language
     • INSIDE the language
     • ABOVE the language
  11. Below the language: low-hanging fruit for vendors
     • Don’t change the user-visible language
     • If you don’t have multithread SORT/MERGE, you are just not trying.
     • Embedded SQL statements should use a multithreaded SQL engine; DB engines like Firebird can be linked into an application, but that’s only safe if the code is safe (C isn’t, COBOL is)
     • Loop vectorisation can use SIMD operations, sometimes.
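A multithread SORT/MERGE follows a standard shape: sort the runs in parallel workers, then do a sequential n-way merge. A minimal sketch in Python (the function name and data are illustrative; in CPython the GIL limits the parallel gain, but a COBOL runtime written in C would get real parallelism from the same structure):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort_merge(runs, workers=4):
    """Sort each run in its own worker, then n-way merge the results.
    This is all below the language: the user program just said SORT."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(sorted, runs))
    return list(heapq.merge(*sorted_runs))

runs = [[5, 3, 9], [8, 1], [7, 2, 6]]
assert parallel_sort_merge(runs) == [1, 2, 3, 5, 6, 7, 8, 9]
```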
  12. Below the language: I/O
     • COBOL programs do lots of I/O
     • on complex encoded records
     • sequential I/O can be overlapped with computation
     • and so can decoding/encoding
     • in order to get parallel I/O, you have to have more than one disc.
     • given n discs we can read or write n records at the same time.
     • RAID can do n blocks at a time
     • a support library understanding COBOL file structure is needed
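The overlap of sequential I/O with decoding can be sketched as a bounded producer–consumer pair; a minimal Python illustration (record contents and the `decode` function are hypothetical stand-ins for COBOL record decoding):

```python
import queue
import threading

# Bounded queue: classic double buffering between reader and decoder.
raw = queue.Queue(maxsize=2)

def reader(records):
    """Stands in for the I/O thread streaming records off the disc."""
    for rec in records:
        raw.put(rec)
    raw.put(None)              # end-of-file marker

def decode(rec):
    """Stands in for decoding a complex encoded COBOL record."""
    return rec.decode("ascii").split(",")

records = [b"000042,LOVELACE", b"000043,HOPPER"]
threading.Thread(target=reader, args=(records,)).start()

decoded = []
while (rec := raw.get()) is not None:
    decoded.append(decode(rec))   # overlaps with the reader thread

assert decoded == [["000042", "LOVELACE"], ["000043", "HOPPER"]]
```

None of this touches the user-visible language: the runtime’s READ can hide the queue behind the same sequential interface.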
  13. Below the language: problems
     • Parallelism, yes; concurrency, no.
     • Won’t scale to thousands of processors.
  14. Within the language
     • The obvious thing to do: add complexity to the language
     • But not too much. I’ve seen COBOL 2002.
     • Can support parallelism or concurrency.
     • Pushes the restructuring work to the programmer.
     • Vectorised operations like Fortran 90’s are straightforward language design.
     • A POSIX thread binding is straightforward.
     • C11 and C++11 support for atomic operations is pretty much useless because COBOL arithmetic is different.
  15. Within language: 1985 > 2002
     • COBOL ’85 makes a better substrate for extensions than COBOL ’02.
     • It has the virtues of its limitations:
       – No pointers
       – No dynamic allocation
       – No recursion in standard language
       – Much easier for a compiler to do good data flow analysis
       – Multiple programs can run safely in the same address space
       – Containers could be added at language level
     • COBOL ’02 adds pointers, classes, lions, tigers, and bears
  16. Above the language
     • Imagine a system with thousands of concurrent users
     • Users fill out forms on smart terminals
     • Each form posted creates a new thread
     • Handler component is loaded if necessary
     • Handler validates request, gets data from files and databases, updates files and databases, and returns results to user.
     • Threads can start other threads. Components can call components.
     • Interactions are transactions.
     • The system runs as one OS process
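The thread-per-transaction shape above can be sketched in a few lines; a minimal Python illustration, with a hypothetical `handler` playing the role of one isolated COBOL component (all names and form fields are made up for the sketch):

```python
from concurrent.futures import ThreadPoolExecutor

def handler(form):
    """One component: validate the request, compute, return a result.
    It shares no mutable state with other instances, so instances
    running in parallel cannot interfere with one another."""
    if "amount" not in form:
        return {"status": "rejected"}
    return {"status": "ok", "amount": form["amount"] * 2}

# Each posted form becomes a task; all run inside one OS process.
forms = [{"amount": 10}, {"amount": 25}, {"typo": True}]

with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(pool.map(handler, forms))

assert [o["status"] for o in outcomes] == ["ok", "ok", "rejected"]
```

The available concurrency here comes from the number of posted forms, not the number of cores, which is the property the later slides rely on.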
  17. That’s a web server, right?
     • No, it’s CICS.
     • And it’s 43 years old.
     • And components are written in COBOL.
     • Or rather, restricted COBOL with embedded EXEC CICS statements, which are translated to RTS calls.
     • Each component is a separate COBOL program.
     • A sequential one.
     • Communicating with others through queues, files, and databases.
     • Semantics = processes, implementation = threads.
  18. What’s so great?
     • Available concurrency always depended on transactions, not on number of processor cores.
     • Going multicore needed no user program recompilation.
     • New framework code was obviously needed.
     • Programming a single component whose instances are isolated from others is easier than programming a whole system.
     • It’s basically the actor model for COBOL.
     • Or think of J2EE, but (in principle) simpler and safer.
  19. What’s not so great
     • Anything for which shared memory would be good.
     • Maintaining a system made of hundreds if not thousands of small components requires good tool support. (But have you looked at Java? Same tool: Eclipse!)
     • If you introduce recursion or dynamic containers you lose the “there is enough memory” guarantee.
     • You need the framework. (But CICS is not alone.)
     • Major work porting to/from this approach.
  20. Why does it work?
     • Running thousands of threads in one address space works because of the things COBOL ’85 can’t do.
     • Using COBOL ’85 constructs, it is impossible for one thread to write or read data belonging to another.
     • If there can be locks, there can be deadlocks, but components can be killed and restarted without taking down the whole system
  21. Other languages
     • Below the language needs major compiler and library work. If user code is “glue” code and most runtime goes into e.g. image processing, this can pay off.
     • Within the language needs compiler work and creates a new language that needs new training materials, new debugger support, &c.
     • Above the language needs less compiler work (both data and code must be addressed off base registers, not just code) and may be easier for programmers to learn & use