Intelligence is not enough: The humanity of engineering

Bryan Cantrill
Oxide Computer Company

It always starts with a tweet…

It always starts with a tweet being trolled…

“Serious”?
• This tweet used the word “serious” three times, mainly to deride others
• Not clear what “serious” means in the context of an argument that equates a computer program with nuclear weapons?
• Or accuses anyone who disagrees with this assessment of “just vibes”?
• Or one that puts the risk of human extinction at the (metaphorical!) hands of a computer program at 5% with zero methodology?
• So, a serious question: why treat this seriously at all?

Reasons to treat this seriously
• Fear of technology isn’t new – and isn’t always poorly founded!
• New technologies often have unintended consequences and externalities that merit consideration and discussion
• But in those who believe in AI-based extinction risk, the fear itself is alarming – in part because of the actions that it would justify
• The “AI pause” – if implemented – would be brazenly authoritarian
• The accompanying rhetoric is often disturbingly violent

Concrete extinction risk
• Most AGI-based extinction risk fears – when made concrete – hinge on:
  ◦ A computer program getting ahold of nuclear weapons
  ◦ A computer program making a novel bioweapon
  ◦ A computer program developing novel molecular nanotechnology
• We are going to leave aside nuclear weapons, as indisputably serious people have been thinking about them since the dawn of the atomic age
• But the latter two have something important in common…

Superintelligent engineering?
• Whether stated explicitly or not, when we talk about the fear of a superintelligent AI actively killing not just some humans but all of them, we are talking about AI making weapons
• Let us leave aside many questions about such scenarios (e.g., AI’s alignment, motivation, or means of production – and human adaptability, countermeasures, and resilience), and focus on one pillar…
• It depends on AI applying the constraints of physical and mathematical reality to make new stuff – which is to say, engineering

Engineering and intelligence
• If our very existence is threatened by a superintelligence engaged in engineering, it prompts an important question…
• Is engineering an act of intelligence alone?
• I can’t speak to building novel bioweapons or the significant challenges in reviving otherwise moribund molecular nanotechnology…
• …but we do have a bunch of recent experience building something big and new that is surely simpler than these domains

What we built!

Building a computer
• In case it needs to be said: building a new computer + new network switch + high-speed backplane + all software from lowest levels of firmware to highest levels of control plane is hard and complicated
• It is still, however, engineering, not science
• Engineering is the act of learning from failure: even when building anew, there will be many occasions when the system does not, in fact, work!
• It is worth exploring a tiny fraction of the failures that we endured in building, as they are instructive as to the nature of engineering…

Failure to bring CPU out of reset
• Despite following the documented power sequencing to the CPU (AMD Milan), it was refusing to come out of reset, simply reinitiating the power-on sequence after 1.25 seconds of inactivity
• Natural assumption was that power was marginal – but the power looked good (and making it extraordinary didn’t change anything)
• Went down any number of blind alleys, performing directed experiments with respect to non-connected pins that shouldn’t make any difference
• These experiments weren’t easy!

Failure to bring CPU out of reset

Failure to bring CPU out of reset
• After several weeks of debugging, we discovered that our voltage regulator had a firmware bug: it adjusted voltage as requested by the CPU via SVI2 – but never sent a completion (VOTF Complete)
• The CPU had no way of knowing that the power was in fact correct
• AMD’s tool for verifying power (SDLE) did not check for this packet
• Corrected regulator firmware resulted in the CPU coming out of reset!
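
The failure mode above can be illustrated with a toy model: the voltage is correct, yet the requester never hears a completion and so restarts. This is only a sketch under stated assumptions. The `Regulator` class, `try_power_on` function, and attempt limit are hypothetical names invented for illustration, and nothing here models the real SVI2 wire protocol; only the 1.25 second figure comes from the slides.

```python
from dataclasses import dataclass
from typing import Tuple

# From the slides: the power-on sequence restarted after 1.25 s of inactivity.
RESET_TIMEOUT_S = 1.25


@dataclass
class Regulator:
    """Hypothetical stand-in for a voltage regulator (not a real SVI2 model)."""
    sends_completion: bool
    voltage: float = 0.0

    def request_voltage(self, target: float) -> bool:
        """Apply the requested voltage; report whether a completion was sent."""
        self.voltage = target  # the rail really does reach the target
        return self.sends_completion


def try_power_on(reg: Regulator, target: float, max_attempts: int = 3) -> Tuple[bool, int]:
    """Return (came_out_of_reset, attempts_used).

    The requester cannot observe the rail directly in this model; it only
    proceeds when a completion arrives. Otherwise it would wait out
    RESET_TIMEOUT_S and restart the sequence (the wait is not simulated).
    """
    for attempt in range(1, max_attempts + 1):
        if reg.request_voltage(target):
            return True, attempt
    return False, max_attempts


buggy = Regulator(sends_completion=False)
fixed = Regulator(sends_completion=True)
print(try_power_on(buggy, 1.1), "rail at", buggy.voltage)
print(try_power_on(fixed, 1.1), "rail at", fixed.voltage)
```

In this model a probe on the rail would show the buggy regulator delivering exactly the requested voltage, which is consistent with the observation that the power "looked good" while the CPU kept restarting.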

Failure to bring NIC out of reset
• We could not get the Chelsio NIC to come out of reset
• Extensive validation did not reveal any signal that was out of spec
• Attempting to take a working add-in card (AIC) and destroy it revealed that one of the pinstrap resistors (to select the clock source) was incorrectly specified
• We had a 1K ohm pull-down resistor, but this was in fact too weak – and a 499 ohm resistor was required to overcome an internal pull-up
• Reworking with the correct resistor resulted in the NIC correctly starting!
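
Why a weaker pull-down can lose to an internal pull-up is ordinary voltage-divider arithmetic, sketched below. Only the two pull-down values (1K and 499 ohm) come from the slide. The supply rail, the internal pull-up resistance, and the logic-low threshold are assumptions chosen purely for illustration; the real part's values are not given in the source.

```python
# Hypothetical values: NOT from the source, chosen only to illustrate the effect.
V_SUPPLY = 3.3       # assumed I/O rail, volts
R_PULL_UP = 2000.0   # assumed internal pull-up, ohms
V_IL_MAX = 0.8       # assumed highest voltage still read as logic low, volts


def strap_voltage(r_pull_down: float) -> float:
    """Voltage at the strap pin for a divider of internal pull-up over external pull-down."""
    return V_SUPPLY * r_pull_down / (R_PULL_UP + r_pull_down)


def reads_low(r_pull_down: float) -> bool:
    """True if the pin voltage falls below the assumed logic-low threshold."""
    return strap_voltage(r_pull_down) < V_IL_MAX


# The two pull-down values named on the slide.
for r in (1000.0, 499.0):
    print(f"{r:6.0f} ohm pull-down: {strap_voltage(r):.2f} V, reads low: {reads_low(r)}")
```

Under these assumed values the 1K resistor leaves the pin above the threshold while the 499 ohm resistor pulls it below, so the strap is read differently even though every individual signal can look plausible on a scope.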

NIC transiently failing to train all PCIe lanes
• We have our own platform enablement layer (i.e., no BIOS); we are responsible for initializing devices at the lowest layer
• With disconcerting frequency, some number of Chelsio NIC links did not train correctly for some of their lanes on boot
• Decoding the Link Training and Status State Machine (LTSSM) on the CPU allowed us to better understand where it was failing, but not why
• Discovered that a second PERST resulted in correct training – and moreover that this second PERST is present on legacy firmware!

Failure to connect to U.2 NVMe drives
• In a revision of our PCIe-to-U.2 passthrough card (Sharkfin), we had I2C connectivity – but no PCIe connectivity whatsoever
• A previous version of this card had worked, but little had changed in the schematic and the layout – why were the new ones broken?!
• Physical inspection revealed that one of the parts was simply wrong!
• The wrong reel of parts had been loaded into a pick-and-place machine, and an inverter had been laid down instead of an AND gate (!)
• Reworked ~1200 cards in ~96 hours!

Random data corruption on software install
• When installing OS boot images, sporadic (!) corruption was seen
• Adding checksums to these images revealed corruption was rampant (!!)
• Microprocessor was speculatively loading through a stowaway mapping from early boot, which was allocating in the TLB
• If application address conflicted with address of stowaway mapping, kernel would incorrectly copy data from the wire to the wrong location
• Eliminating stowaway mapping eliminated the corruption – but highlighted divergent perspectives on side-effects of speculative loads
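
The step of adding checksums to images can be sketched roughly as follows. The slides do not say which checksum was used or how it was attached to the image; SHA-256 via Python's standard `hashlib` and the function names below are assumptions for illustration only.

```python
import hashlib


def image_digest(image: bytes) -> str:
    """Digest computed by the sender before the image goes on the wire."""
    return hashlib.sha256(image).hexdigest()


def verify_image(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving side and compare."""
    return hashlib.sha256(received).hexdigest() == expected_digest


# Stand-in for a boot image; real images would be read from storage.
image = bytes(range(256)) * 16
expected = image_digest(image)

# Simulate a single corrupted byte somewhere in the middle of the image.
corrupted = bytearray(image)
corrupted[1234] ^= 0xFF

print("intact image verifies:   ", verify_image(image, expected))
print("corrupted image verifies:", verify_image(bytes(corrupted), expected))
```

A check like this catches corruption that does not happen to break the install visibly, which is one way a problem that looked sporadic can turn out to be rampant once every image is verified.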

What do these have in common?
• Each posed an existential risk for the artifact: without solving them, we wouldn’t have something that’s impaired – we would have nothing
• Each revealed an emergent property, often at an interface boundary
• The breakthrough was often something that “shouldn’t” have worked
• Intelligence alone does not solve problems like this
• In all cases, we summoned other elements of our character: our resilience, our teamwork, our rigor, our optimism, our curiosity

Values in engineering
• These extra-intelligence values are so important to us that we have codified them – and use them very explicitly as a lens for hiring
• To be clear, we are certainly seeking capable, intelligent people – but that intelligence is useless without these shared (human!) values
• We may be more explicit about it than others, but many engineering teams are also implicitly hiring for shared values
• Viz.: It is comical to think of an engineering team hiring based only on the results of a test – or any other linear measure of intelligence!

The humanity in engineering
• The humanity necessary to understand and resolve failure – so essential in designing and building – is hidden in the final artifact
• This is the soul in Tracy Kidder’s Soul of a New Machine – and the perspiration in Edison’s proverbial 99% perspiration
• Computer programs lack this humanity: they do not have willpower, desire, or drive – let alone the deeper human qualities required
• Which doesn’t mean that AI can’t be useful to engineers, merely that it cannot engineer autonomously

So, should we worry about AI?
• Extinction risk due to AGI is de minimis – but we must not falsely dichotomize AI into posing existential risk or no risk whatsoever!
• The risk that AI does pose may feel mundane – but it is much more about how it will be abused (deliberately or accidentally) by existing structures
• AI ethics is exceedingly important, especially when it is being used to inform decisions that affect people’s lives!
• By acknowledging that AI is and will be an important tool, we can move beyond fear to focus on enforcing existing regulatory regimes

Further information (wells to fall down)
• Richard Smalley/K. Eric Drexler debate on molecular nanotechnology
• Lex Fridman interview with Marc Andreessen
• Logan Bartlett interview with Eliezer Yudkowsky
• Oxide and Friends podcast, especially Okay Doomer, Tales From the Bringup Lab and More Tales from the Bringup Lab
