Intelligence is not enough: The humanity of engineering

Bryan Cantrill
October 06, 2023

Presentation that I gave at Monktoberfest 2023. Video at https://www.youtube.com/watch?v=bQfJi7rjuEk

Transcript

  1. OXIDE “Serious”?
    • This tweet used the word “serious” three times, mainly to deride others
    • Not clear what “serious” means in the context of an argument that equates a computer program with nuclear weapons?
    • Or one that accuses anyone who disagrees with this assessment of “just vibes”?
    • Or one that puts the risk of human extinction at the (metaphorical!) hands of a computer program at 5%, with zero methodology?
    • So, a serious question: why treat this seriously at all?
  2. Reasons to treat this seriously
    • Fear of technology isn’t new – and isn’t always poorly founded!
    • New technologies often have unintended consequences and externalities that merit consideration and discussion
    • But among those who believe in AI-based extinction risk, the fear itself is alarming – in part because of the actions that it would justify
    • The “AI pause” – if implemented – would be brazenly authoritarian
    • The accompanying rhetoric is often disturbingly violent
  3. Concrete extinction risk
    • Most AGI-based extinction risk fears – when made concrete – hinge on:
      ◦ A computer program getting ahold of nuclear weapons
      ◦ A computer program making a novel bioweapon
      ◦ A computer program developing novel molecular nanotechnology
    • We are going to leave aside nuclear weapons, as indisputably serious people have been thinking about them since the dawn of the atomic age
    • But the latter two have something important in common…
  4. Superintelligent engineering?
    • Whether stated explicitly or not, when we talk about the fear of a superintelligent AI actively killing not just some humans but all of them, we are talking about AI making weapons
    • Let us leave aside many questions about such scenarios (e.g., AI’s alignment, motivation, or means of production – and human adaptability, countermeasures, and resilience), and focus on one pillar…
    • It depends on AI applying the constraints of physical and mathematical reality to make new stuff – which is to say, engineering
  5. Engineering and intelligence
    • If our very existence is threatened by a superintelligence engaged in engineering, it prompts an important question…
    • Is engineering an act of intelligence alone?
    • I can’t speak to building novel bioweapons or the significant challenges in reviving otherwise moribund molecular nanotechnology…
    • …but we do have a bunch of recent experience building something big and new that is surely simpler than these domains
  6. Building a computer
    • In case it needs to be said: building a new computer + new network switch + high-speed backplane + all software from the lowest levels of firmware to the highest levels of the control plane is hard and complicated
    • It is still, however, engineering, not science
    • Engineering is the act of learning from failure: even when building anew, there will be many occasions when the system does not, in fact, work!
    • It is worth exploring a tiny fraction of the failures that we endured in building, as they are instructive as to the nature of engineering…
  7. Failure to bring CPU out of reset
    • Despite following the documented power sequencing to the CPU (AMD Milan), it was refusing to come out of reset, simply reinitiating the power-on sequence after 1.25 seconds of inactivity
    • The natural assumption was that power was marginal – but the power looked good (and making it extraordinary didn’t change anything)
    • We went down any number of blind alleys, performing directed experiments with respect to non-connected pins that shouldn’t make any difference
    • These experiments weren’t easy!
  8. Failure to bring CPU out of reset
    • After several weeks of debugging, we discovered that our voltage regulator had a firmware bug: it adjusted voltage as requested by the CPU via SVI2 – but never sent a completion (VOTF Complete)
    • The CPU had no way of knowing that the power was in fact correct
    • AMD’s tool for verifying power (SDLE) did not check for this packet
    • Corrected regulator firmware resulted in the CPU coming out of reset!
  9. Failure to bring NIC out of reset
    • We could not get the Chelsio NIC to come out of reset
    • Extensive validation did not reveal any signal that was out of spec
    • Attempting to take a working add-in card (AIC) and destroy it revealed that one of the pinstrap resistors (to select the clock source) was incorrectly specified
    • We had a 1K ohm pull-down resistor, but this was in fact too weak – and a 499 ohm resistor was required to overcome an internal pull-up
    • Reworking with the correct resistor resulted in the NIC correctly starting!
  10. NIC transiently failing to train all PCIe lanes
    • We have our own platform enablement layer (i.e., no BIOS); we are responsible for initializing devices at the lowest layer
    • With disconcerting frequency, some of the Chelsio NIC links did not train correctly on some of their lanes at boot
    • Decoding the Link Training and Status State Machine (LTSSM) on the CPU allowed us to better understand where it was failing, but not why
    • Discovered that a second PERST resulted in correct training – and moreover that legacy firmware also performs this second PERST!
  11. Failure to connect to U.2 NVMe drives
    • In a revision of our PCIe-to-U.2 passthrough card (Sharkfin), we had I2C connectivity – but no PCIe connectivity whatsoever
    • A previous version of this card had worked, but little had changed in the schematic and the layout – why were the new ones broken?!
    • Physical inspection revealed that one of the parts was simply wrong!
    • The wrong reel of parts had been loaded into a pick-and-place machine, and an inverter had been laid down instead of an AND gate (!)
    • Reworked ~1200 cards in ~96 hours!
  12. Random data corruption on software install
    • When installing OS boot images, sporadic (!) corruption was seen
    • Adding checksums to these images revealed that corruption was rampant (!!)
    • The microprocessor was speculatively loading through a stowaway mapping from early boot, which was allocating in the TLB
    • If an application address conflicted with the address of the stowaway mapping, the kernel would incorrectly copy data from the wire to the wrong location
    • Eliminating the stowaway mapping eliminated the corruption – but highlighted divergent perspectives on the side effects of speculative loads
  13. What do these have in common?
    • Each posed an existential risk for the artifact: without solving it, we wouldn’t have something that’s impaired – we would have nothing
    • Each revealed an emergent property, often at an interface boundary
    • The breakthrough was often something that “shouldn’t” have worked
    • Intelligence alone does not solve problems like these
    • In all cases, we summoned other elements of our character: our resilience, our teamwork, our rigor, our optimism, our curiosity
  14. Values in engineering
    • These extra-intelligence values are so important to us that we have codified them – and use them very explicitly as a lens for hiring
    • To be clear, we are certainly seeking capable, intelligent people – but that intelligence is useless without these shared (human!) values
    • We may be more explicit about it than others, but many engineering teams are also implicitly hiring for shared values
    • Indeed: it is comical to think of an engineering team hiring based only on the results of a test – or any other linear measure of intelligence!
  15. The humanity in engineering
    • The humanity necessary to understand and resolve failure – so essential in designing and building – is hidden in the final artifact
    • This is the soul in Tracy Kidder’s The Soul of a New Machine – and the perspiration in Edison’s proverbial 99% perspiration
    • Computer programs lack this humanity: they do not have willpower, desire, or drive – let alone the deeper human qualities required
    • This doesn’t mean that AI can’t be useful to engineers, merely that it cannot engineer autonomously
  16. So, should we worry about AI?
    • Extinction risk due to AGI is de minimis – but we must not falsely dichotomize AI into posing existential risk or no risk whatsoever!
    • The risk that AI does pose may feel mundane – it lies much more in how AI will be abused (deliberately or accidentally) by existing structures
    • AI ethics is exceedingly important, especially when AI is being used to inform decisions that affect people’s lives!
    • By acknowledging that AI is and will be an important tool, we can move beyond fear to focus on enforcing existing regulatory regimes
  17. Further wells to fall down
    • Richard Smalley/K. Eric Drexler debate on molecular nanotechnology
    • Lex Fridman interview with Marc Andreessen
    • Logan Bartlett interview with Eliezer Yudkowsky
    • Oxide and Friends podcast, especially Okay Doomer, Tales from the Bringup Lab, and More Tales from the Bringup Lab
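The pinstrap fix on slide 9 (swapping a 1K ohm pull-down for a 499 ohm one) rests on a bit of arithmetic that is easy to get backwards: the external pull-down and the NIC's internal pull-up form a voltage divider, and a smaller pull-down resistance pulls harder toward ground. What follows is a minimal sketch of that arithmetic. The slide gives neither the internal pull-up's value, the supply rail, nor the input threshold, so the 3.3 V rail, 2 kΩ internal pull-up, and 0.8 V logic-low threshold below are hypothetical, chosen only to illustrate how 1K can be too weak where 499 ohms is not. They are not the Chelsio part's actual characteristics.

```python
# HYPOTHETICAL values for illustration only: the slide does not give the
# NIC's internal pull-up strength, its supply rail, or its input thresholds.
VDD = 3.3          # volts, assumed supply rail for the strap pin
R_PULLUP = 2000.0  # ohms, assumed internal pull-up inside the NIC
V_IL_MAX = 0.8     # volts, assumed highest voltage still read as logic low


def strap_voltage(r_pulldown: float, r_pullup: float = R_PULLUP, vdd: float = VDD) -> float:
    """Voltage at a strap pin where an internal pull-up fights an external pull-down.

    The two resistors form a divider from VDD to ground; the pin sits at the midpoint.
    """
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def reads_low(r_pulldown: float) -> bool:
    """True if the divider pulls the pin to or below the assumed logic-low threshold."""
    return strap_voltage(r_pulldown) <= V_IL_MAX


def max_pulldown(r_pullup: float = R_PULLUP, vdd: float = VDD, v_il: float = V_IL_MAX) -> float:
    """Largest pull-down that still reads low: solve vdd * R / (r_pullup + R) = v_il for R."""
    return r_pullup * v_il / (vdd - v_il)


if __name__ == "__main__":
    for r in (1000.0, 499.0):
        print(f"{r:6.0f} ohm pull-down -> {strap_voltage(r):.2f} V, reads low: {reads_low(r)}")
    print(f"largest pull-down that still reads low: {max_pulldown():.0f} ohm")
```

Under these assumed values, 1K leaves the pin at about 1.10 V, above the assumed 0.8 V threshold, while 499 ohms brings it to about 0.66 V; anything above roughly 640 ohms would fail. With the real part's pull-up and thresholds the numbers shift, but the shape of the failure is the same: a pull-down that is too large leaves the strap pin in the wrong state, or in the undefined region between the low and high thresholds.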