Slide 1

FPGAs and Open-Source Hardware - An Intro Lightning Talk
Matt P. Dziubinski
Meeting C++ 2016
@matt_dz
Department of Mathematical Sciences, Aalborg University
CREATES (Center for Research in Econometric Analysis of Time Series)

Slide 2

FPGAs

Slide 3

40 Years of Microprocessor Trend Data
https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/

Slide 4

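Instruction Level Parallelism & Loop Unrolling - Code I
#include <cstddef>               // std::size_t
#include <cstdlib>               // std::atoll
#include <iostream>              // std::cout
#include <vector>                // std::vector
#include <boost/timer/timer.hpp> // boost::timer::auto_cpu_timer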

Slide 5

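Instruction Level Parallelism & Loop Unrolling - Code II
using T = double;

// one accumulator: every addition depends on the result of the previous one
T sum_1(const std::vector<T> & input)
{
    T sum = 0.0;
    for (std::size_t i = 0, n = input.size(); i != n; ++i)
        sum += input[i];
    return sum;
}

// two accumulators: two independent dependency chains
T sum_2(const std::vector<T> & input)
{
    T sum1 = 0.0, sum2 = 0.0;
    const std::size_t n = input.size();
    const std::size_t n_even = n - n % 2; // the unrolled loop consumes pairs only
    for (std::size_t i = 0; i != n_even; i += 2)
    {
        sum1 += input[i];
        sum2 += input[i + 1];
    }
    if (n != n_even)
        sum1 += input[n - 1]; // odd size: add the last element
    return sum1 + sum2;
}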
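The same idea generalizes to more accumulators. Below is a minimal sketch that assumes a hypothetical helper `sum_k` (not from the talk), where the number of independent accumulators K is a template parameter. Whether each accumulator actually stays in its own register is up to the optimizer, so measure before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generalization of sum_2: K independent accumulators, so up
// to K additions can be in flight instead of one serial dependency chain.
template <std::size_t K, typename T>
T sum_k(const std::vector<T> & input)
{
    static_assert(K > 0, "need at least one accumulator");
    std::array<T, K> acc{};                      // all partial sums start at zero
    const std::size_t n = input.size();
    const std::size_t n_main = n - n % K;        // largest multiple of K <= n
    for (std::size_t i = 0; i != n_main; i += K)
        for (std::size_t k = 0; k != K; ++k)     // fixed trip count, easy to unroll
            acc[k] += input[i + k];
    T sum = T(0);
    for (std::size_t i = n_main; i != n; ++i)    // leftover tail elements
        sum += input[i];
    for (const T & a : acc)                      // combine the partial sums
        sum += a;
    return sum;
}
```

For example, `sum_k<4>(a)` uses four accumulators. Reassociating floating-point additions can change the rounding of the result. This is also why compilers typically do not apply this transformation on their own unless told to relax floating-point semantics (e.g., with `-ffast-math`).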

Slide 6

Instruction Level Parallelism & Loop Unrolling - Code III
int main(int argc, char * argv[])
{
    const std::size_t n = (argc > 1) ? std::atoll(argv[1]) : 10000000;
    const std::size_t f = (argc > 2) ? std::atoll(argv[2]) : 1;
    std::cout << "n = " << n << '\n'; // iterations count
    std::cout << "f = " << f << '\n'; // unroll factor
    const std::vector<T> a(n, T(1));
    boost::timer::auto_cpu_timer timer; // reports elapsed times when it goes out of scope
    const T sum = (f == 1) ? sum_1(a) : (f == 2) ? sum_2(a) : 0;
    std::cout << sum << '\n';
}

Slide 7

Instruction Level Parallelism & Loop Unrolling - Results
$ make vector_sums CXXFLAGS="-std=c++14 -O2 -march=native" LDLIBS=-lboost_timer
$ ./vector_sums 1000000000 1
n = 1000000000
f = 1
1e+09
 0.841269s wall, 0.840000s user + 0.010000s system = 0.850000s CPU (101.0%)
$ ./vector_sums 1000000000 2
n = 1000000000
f = 2
1e+09
 0.466293s wall, 0.460000s user + 0.000000s system = 0.460000s CPU (98.7%)

Slide 8

Microarchitecture
Intel® 64 and IA-32 Architectures Optimization Reference Manual
https://www-ssl.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

Slide 9

IACA Results - sum_1
$ iaca -64 -arch IVB -graph ./vector_sums_1i
Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - ./vector_sums_1i
Binary Format - 64Bit
Architecture  - IVB
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 3.00 Cycles       Throughput Bottleneck: InterIteration

Port Binding In Cycles Per Iteration:
-------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |
-------------------------------------------------------------------------
| Cycles | 1.0    0.0  | 1.0  | 1.0    1.0  | 1.0    1.0  | 0.0  | 1.0  |
-------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |    | mov rdx, qword ptr [rdi]
|   2    |           | 1.0 |           | 1.0   1.0 |     |     | CP | vaddsd xmm0, xmm0, qword ptr [rdx+rax*8]
|   1    | 1.0       |     |           |           |     |     |    | add rax, 0x1
|   1    |           |     |           |           |     | 1.0 |    | cmp rax, rcx
|   0F   |           |     |           |           |     |     |    | jnz 0xffffffffffffffe7
Total Num Of Uops: 5

Slide 10

IACA Results - sum_2
$ iaca -64 -arch IVB -graph ./vector_sums_2i
Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - ./vector_sums_2i
Binary Format - 64Bit
Architecture  - IVB
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 6.00 Cycles       Throughput Bottleneck: InterIteration

Port Binding In Cycles Per Iteration:
-------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |
-------------------------------------------------------------------------
| Cycles | 1.5    0.0  | 3.0  | 1.5    1.5  | 1.5    1.5  | 0.0  | 1.5  |
-------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |    | mov rcx, qword ptr [rdi]
|   2    |           | 1.0 | 0.5   0.5 | 0.5   0.5 |     |     | CP | vaddsd xmm0, xmm0, qword ptr [rcx+rax*8]
|   1    | 1.0       |     |           |           |     |     |    | add rax, 0x2
|   2    |           | 1.0 | 0.5   0.5 | 0.5   0.5 |     |     |    | vaddsd xmm1, xmm1, qword ptr [rcx+rdx*1]
|   1    | 0.5       |     |           |           |     | 0.5 |    | add rdx, 0x10
|   1    |           |     |           |           |     | 1.0 |    | cmp rax, rsi
|   0F   |           |     |           |           |     |     |    | jnz 0xffffffffffffffde
|   1    |           | 1.0 |           |           |     |     | CP | vaddsd xmm0, xmm0, xmm1
Total Num Of Uops: 9

Slide 11

CPUs: General Purpose, Fixed Functionality Hardware
Intel® 64 and IA-32 Architectures Optimization Reference Manual
https://www-ssl.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

Slide 12

CPUs: General Purpose, Fixed Functionality Hardware
General-purpose flexibility is not always needed, and it comes at a cost.
What if we need custom pipelines with more (or custom) functional units?

Slide 13

FPGAs: Custom Purpose, Reconfigurable Hardware
We'll make our own functional units! With look-up tables (LUTs) and flip-flops!
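A look-up table is just a tiny memory addressed by its inputs. Configuring the FPGA means filling in the truth table. Here is a minimal C++ sketch of a 4-input LUT. The `Lut4` and `make_lut4` names and the input-to-bit ordering are assumptions for illustration, not an iCE40 or IceStorm API.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a 4-input look-up table. The 16-bit `init` value is the truth
// table: bit i holds the output for the input combination whose binary
// encoding is i (in3 is the most significant bit; this ordering is assumed).
struct Lut4 {
    std::uint16_t init;

    bool operator()(bool in0, bool in1, bool in2, bool in3) const {
        const unsigned index = (unsigned(in3) << 3) | (unsigned(in2) << 2)
                             | (unsigned(in1) << 1) | unsigned(in0);
        return (init >> index) & 1u;
    }
};

// Enumerate all 16 input combinations of a Boolean function to get its
// truth table. Mapping logic onto a LUT amounts to producing such a table.
template <typename F>
Lut4 make_lut4(F f) {
    std::uint16_t init = 0;
    for (unsigned i = 0; i != 16; ++i)
        if (f(i & 1u, (i >> 1) & 1u, (i >> 2) & 1u, (i >> 3) & 1u))
            init |= std::uint16_t(1u << i);
    return Lut4{init};
}
```

For example, an inverter on `in0` becomes the table `0x5555`, and a 4-input AND becomes `0x8000`. The hardware is the same in both cases; only the configuration bits differ.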

Slide 14

The world’s first FPGA: Xilinx XC2064
P. Alfke, I. Bolsens, B. Carter, M. Santarini, and S. Trimberger, “It’s an FPGA!,” IEEE Solid-State Circuits Mag., vol. 3, no. 4, 2011, pp. 15-20.

Slide 15

Reconfigurable Computing - Trends
Lesley Shannon, Veronica Cojocaru, Cong Nguyen Dao, and Philip H.W. Leong. "Trends in reconfigurable computing: Applications and architectures." In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015.

Slide 16

Reconfigurable Computing - Progression
Ahmed et al., “A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform,” IEEE Micro, Mar-Apr 2016.

Slide 17

Reconfigurable Computing - Timeline
Russell Tessier, Kenneth Pocek, and André DeHon. "Reconfigurable Computing Architectures." In Proc. of the IEEE (Special Issue on Reconfigurable Systems), Volume 103, Number 3, 2015.

Slide 18

Hardware Description Language (HDL)
• Hardware description language != software programming language
• Hardware description is quite different from software programming
• Think digital circuit design: the design entry method just happens to use a text format rather than a schematic drawn in a CAD tool
• Synthesizable constructs (design) vs. non-synthesizable constructs / testbenches (verification)
• Gotchas: http://www.sutherland-hdl.com/papers.html
• Verilog (IEEE 1364), SystemVerilog (IEEE 1800), VHDL (IEEE 1076)
• Toolchain support and degree of standards compliance: extremely diverse...

Slide 19

Full Adder: Netlist & Processor vs. Reconfigurable Architecture
A. DeHon, “Fundamental Underpinnings of Reconfigurable Computing Architectures,” Proceedings of the IEEE, March 2015.
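A full adder has two outputs, each a Boolean function of three inputs. On a reconfigurable architecture, each output can be one small truth table. A processor would instead execute instructions that compute the same functions. The following C++ sketch shows the truth-table view. The masks are derived by enumerating the eight input combinations. The names are illustrative, and chaining the adders into `ripple_add` is an added example of a carry chain.

```cpp
#include <cassert>
#include <cstdint>

// Bit i of each mask is the output for index i = (cin << 2) | (b << 1) | a.
constexpr std::uint8_t kSumMask   = 0x96; // a ^ b ^ cin: set at i = 1, 2, 4, 7
constexpr std::uint8_t kCarryMask = 0xE8; // majority(a, b, cin): set at i = 3, 5, 6, 7

// Evaluate a 3-input truth table.
constexpr bool lut3(std::uint8_t mask, bool a, bool b, bool cin) {
    return (mask >> ((unsigned(cin) << 2) | (unsigned(b) << 1) | unsigned(a))) & 1u;
}

struct FullAdderOut { bool sum; bool carry; };

constexpr FullAdderOut full_adder(bool a, bool b, bool cin) {
    return { lut3(kSumMask, a, b, cin), lut3(kCarryMask, a, b, cin) };
}

// Ripple-carry addition of two width-bit numbers by chaining full adders:
// each stage's carry-out feeds the next stage's carry-in.
std::uint32_t ripple_add(std::uint32_t x, std::uint32_t y, unsigned width) {
    assert(width <= 32);
    std::uint32_t result = 0;
    bool carry = false;
    for (unsigned i = 0; i != width; ++i) {
        const FullAdderOut fa = full_adder((x >> i) & 1u, (y >> i) & 1u, carry);
        result |= std::uint32_t(fa.sum) << i;
        carry = fa.carry;
    }
    return result; // the final carry-out is dropped, i.e. the sum is mod 2^width
}
```

For example, `ripple_add(13, 29, 8)` gives 42, while `ripple_add(255, 1, 8)` wraps around to 0.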

Slide 20

SystemVerilog
• combinational logic
  • outputs: purely a function of the current inputs
  • assign: continuous assignment statement
  • always_comb procedure
  • blocking assignment statement: =
• sequential logic
  • outputs: also depend on memory (state, e.g., previous inputs)
  • always_ff procedure
  • nonblocking assignment statement: <=
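One way to build intuition for the distinction is to model it in C++. All names below are hypothetical, and this is only a mental model, not how simulators are implemented. Combinational logic behaves like a pure function. Sequential logic behaves like an object whose state changes only on a clock edge.

```cpp
#include <cassert>

// Combinational logic (assign / always_comb): the output is a pure
// function of the current inputs; there is no state.
bool inverter(bool a) { return !a; }

// Sequential logic (always_ff): a D flip-flop holds its value and only
// updates on a rising clock edge, like `always_ff @(posedge clk) q <= d;`.
class DFlipFlop {
public:
    bool q() const { return q_; }
    void posedge(bool d) { q_ = d; }   // capture d at the clock edge
private:
    bool q_ = false;                   // assumed initial value 0
};

// Combining the two: feeding the flip-flop's inverted output back to its
// input gives a toggle flip-flop whose output flips on every clock edge.
bool toggle_after(int edges) {
    assert(edges >= 0);
    DFlipFlop ff;
    for (int i = 0; i != edges; ++i)
        ff.posedge(inverter(ff.q()));  // d = ~q, computed before the edge
    return ff.q();
}
```

For example, `toggle_after(1)` is `true` and `toggle_after(4)` is `false`.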

Slide 21

Inverter: Combinational Logic
Continuous Assignment
module inv(input logic a, output logic q);
    assign q = ~a;
endmodule

Slide 22

Inverter: Combinational Logic
Procedural Blocking Assignment
module inv(input logic a, output logic q);
    always_comb begin
        q = ~a;
    end
endmodule
https://bradpierce.wordpress.com/2009/12/04/sv-always_comb-safer-than-verilog-assign/

Slide 23

Inverter: Design & Testbench
inv.sv (Design)
module inv(input logic a, output logic q);
    always_comb q = ~a;
endmodule

inv_tb.sv (Testbench)
module inv_tb();
    logic data_in;
    logic data_out;

    inv not_dut(.a(data_in), .q(data_out));

    initial begin
        $dumpfile("inv_tb.vcd");
        $dumpvars(0, inv_tb);
        #1 data_in = 0;
        #1 data_in = 1;
        #2 $finish;
    end
endmodule

Slide 24

Design Flow
"Open-source Hardware: Opportunities and Challenges", Gagan Gupta, Tony Nowatzki, Vinay Gangadhar and Karthikeyan Sankaralingam (pre-print: https://arxiv.org/abs/1606.01980). To appear in IEEE Computer.

Slide 25

The IceStorm flow
• Fully open source Verilog-to-bitstream flow for iCE40 FPGAs
  • Yosys: Verilog synthesis suite, formal verification
  • Arachne-pnr: place-and-route tool for iCE40
  • IceStorm: tools & docs for the iCE40 bitstream
    • including IcePack/IceUnpack, IceBox (icebox_explain), IceTime, IceProg
  • http://www.clifford.at/icestorm/
• iCE40 FPGAs
  • Lattice iCEstick (HX1K-TQ144)
    • http://www.latticesemi.com/icestick
    • https://octopart.com/search?q=icestick
  • iCE40-HX8K Breakout Board (HX8K-CT256)

Slide 26

Open source projects
• GTKWave: waveform viewer
• http://wiki.gedaproject.org/geda:faq#what_is_the_geda_suite

Slide 27

iCE40 HX1K: Block Diagram
iCE40 LP/HX Family Data Sheet
http://www.latticesemi.com/~/media/LatticeSemi/Documents/DataSheets/iCE/iCE40LPHXFamilyDataSheet.pdf

Slide 28

iCE40 HX1K: Programmable Logic Block (PLB)
iCE40 LP/HX Family Data Sheet
http://www.latticesemi.com/~/media/LatticeSemi/Documents/DataSheets/iCE/iCE40LPHXFamilyDataSheet.pdf

Slide 29

Inverter Simulation Waveform (Pre-Synthesis)
# pre-synthesis simulation
iverilog -g2012 -o inv_pre inv.sv inv_tb.sv
./inv_pre
cp inv_tb.vcd inv_pre_tb.vcd
gtkwave inv_pre_tb.vcd

Slide 30

Inverter Simulation Waveform (Post-Synthesis)
# post-synthesis simulation
yosys -p 'synth_ice40 -top inv -blif inv.blif' inv.sv
yosys -o inv_syn.v inv.blif
iverilog -g2012 -o inv_post inv_tb.sv inv_syn.v \
    `yosys-config --datdir/ice40/cells_sim.v`
./inv_post
cp inv_tb.vcd inv_post_tb.vcd
gtkwave inv_post_tb.vcd

Slide 31

Inverter Logic Diagram (Pre-Synthesis)
read_verilog -sv inv.sv
show -format png -prefix ./inv_diagram_pre inv

Slide 32

Inverter Logic Diagram (Post-Synthesis)
read_verilog inv_syn.v
show -format png -prefix ./inv_diagram_post inv

Slide 33

Hello Meeting C++ (iCE40-HX1K)
iCEstick iCE40HX1K TQ144
module hi(input logic clk,
          output logic LED1, output logic LED2, output logic LED3,
          output logic LED4, output logic LED5);
    logic [22:0] counter = 0;
    logic period_passed;
    logic control = 0;
    logic hello;
    logic meetingcpp;

    always_ff @(posedge clk) begin
        counter <= counter + 1;
        period_passed <= (counter == 0);
        if (period_passed)
            control <= !control;
    end

    assign hello = control;
    assign meetingcpp = ~control;

    assign LED1 = hello;
    assign LED2 = hello;
    assign LED3 = meetingcpp;
    assign LED4 = meetingcpp;
    assign LED5 = meetingcpp;
endmodule
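A cycle-by-cycle C++ model makes the timing of this design concrete. This is a minimal sketch under two assumptions. First, all registers start at 0, since the slide gives no initializer for `period_passed`. Second, the board clock is 12 MHz. With those assumptions, `control` toggles once every 2^23 cycles, roughly every 0.7 s. The `Hi` class, its `width` parameter, and `toggle_interval_s` are illustrative names, not part of the design.

```cpp
#include <cassert>
#include <cstdint>

// Cycle-level sketch of the `hi` module. Nonblocking assignments (<=) mean
// every right-hand side reads the register values from before the clock
// edge, so all next values are computed first and then committed together.
// `width` is a hypothetical parameter (the slide hardcodes 23 bits) so that
// small counters can be simulated quickly.
class Hi {
public:
    explicit Hi(unsigned width)
        : mask_(static_cast<std::uint32_t>((std::uint64_t(1) << width) - 1))
    {
        assert(width >= 1 && width <= 32);
    }

    void posedge()
    {
        const std::uint32_t next_counter = (counter_ + 1) & mask_;
        const bool next_period_passed = (counter_ == 0);
        const bool next_control = period_passed_ ? !control_ : control_;
        counter_ = next_counter;
        period_passed_ = next_period_passed;
        control_ = next_control;
    }

    bool hello() const { return control_; }       // drives LED1, LED2
    bool meetingcpp() const { return !control_; } // drives LED3..LED5

private:
    std::uint32_t mask_;
    std::uint32_t counter_ = 0;   // `= 0` initializer in the slide
    bool period_passed_ = false;  // no initializer in the slide; 0 assumed
    bool control_ = false;        // `= 0` initializer in the slide
};

// Time between LED toggles: control flips once per counter wrap-around,
// i.e. every 2^width clock cycles. The 12 MHz default is an assumption
// about the board's oscillator.
double toggle_interval_s(unsigned width, double clk_hz = 12.0e6)
{
    return double(std::uint64_t(1) << width) / clk_hz;
}
```

With a 4-bit counter, the first toggle happens on the 2nd clock edge and the next one on the 18th. Both the extra cycle of delay and the 16-cycle interval come from `period_passed` being a registered signal.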

Slide 34

Hello Meeting C++ (iCE40-HX8K)
iCE40-HX8K Breakout Board iCE40HX8K CT256
module hi(input logic clk,
          output logic LED1, output logic LED2, output logic LED3,
          output logic LED4, output logic LED5, output logic LED6,
          output logic LED7, output logic LED8);
    logic [22:0] counter = 0;
    logic period_passed;
    logic control = 0;
    logic hello;
    logic meetingcpp;

    always_ff @(posedge clk) begin
        counter <= counter + 1;
        period_passed <= (counter == 0);
        if (period_passed)
            control <= !control;
    end

    assign hello = control;
    assign meetingcpp = ~control;

    assign LED1 = hello;
    assign LED2 = hello;
    assign LED3 = meetingcpp;
    assign LED4 = meetingcpp;
    assign LED5 = hello;
    assign LED6 = meetingcpp;
    assign LED7 = hello;
    assign LED8 = meetingcpp;
endmodule

Slide 35

PCF (Physical Constraints File) (iCE40-HX1K)
# iCEstick iCE40HX1K TQ144 clk & LED pins
set_io clk 21
set_io LED1 99
set_io LED2 98
set_io LED3 97
set_io LED4 96
set_io LED5 95
Source: Datasheet / Pinout Diagram (next slide)

Slide 36

iCEstick iCE40HX1K TQ144 Pinout Diagram
https://github.com/Obijuan/open-fpga-verilog-tutorial/blob/master/tutorial/doc/images/icestick_pinout.png

Slide 37

PCF (Physical Constraints File) (iCE40-HX8K)
# iCE40-HX8K Breakout Board iCE40HX8K CT256 clk & LED pins
set_io clk J3
set_io LED1 B5
set_io LED2 B4
set_io LED3 A2
set_io LED4 A1
set_io LED5 C5
set_io LED6 C4
set_io LED7 B3
set_io LED8 C3

Slide 38

iCEstick iCE40HX1K TQ144 Makefile
hi.asc: hi.sv hi.pcf
	yosys -q -p "synth_ice40 -blif hi.blif" hi.sv
	arachne-pnr -d 1k -p hi.pcf hi.blif -o hi.asc

hi.bin: hi.asc
	icebox_explain hi.asc > hi.ex
	icepack hi.asc hi.bin

timing.txt: hi.asc
	icetime -tmd hx1k hi.asc > timing.txt

configure: hi.bin
	iceprog hi.bin

clean:
	rm -f hi.blif hi.asc hi.ex hi.bin timing.txt

.PHONY: clean configure

Slide 39

iCE40-HX8K Breakout Board iCE40HX8K CT256 Makefile
hi.asc: hi.sv hi.pcf
	yosys -q -p "synth_ice40 -blif hi.blif" hi.sv
	arachne-pnr -d 8k -p hi.pcf hi.blif -o hi.asc

hi.bin: hi.asc
	icebox_explain hi.asc > hi.ex
	icepack hi.asc hi.bin

timing.txt: hi.asc
	icetime -tmd hx8k hi.asc > timing.txt

configure: hi.bin
	iceprog hi.bin

clean:
	rm -f hi.blif hi.asc hi.ex hi.bin timing.txt

.PHONY: clean configure

Slide 40

IceTime: Timing Reports
// Reading input .asc file..
// Reading 1k chipdb file..
// Creating timing netlist..

icetime topological timing analysis report
==========================================
. . .
Total number of logic levels: 23
Total path delay: 5.55 ns (180.20 MHz)
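The reported frequency is the reciprocal of the critical path delay, f_max = 1 / t_path. The sketch below shows that arithmetic. The function names are illustrative, and the 12 MHz clock used in the usage note is an assumption about the board.

```cpp
#include <cassert>

// Maximum clock frequency implied by a critical path delay.
// Delays in nanoseconds, frequencies in MHz: 1000 / ns = MHz.
double fmax_mhz(double path_delay_ns)
{
    assert(path_delay_ns > 0.0);
    return 1000.0 / path_delay_ns;
}

// Slack of the path at a given clock frequency: clock period minus path
// delay. Positive slack means the path fits within one clock period.
double slack_ns(double path_delay_ns, double clk_mhz)
{
    assert(clk_mhz > 0.0);
    return 1000.0 / clk_mhz - path_delay_ns;
}
```

For example, `fmax_mhz(5.55)` is about 180.2 MHz. `slack_ns(5.55, 12.0)` is about 77.8 ns, so at an assumed 12 MHz clock (an 83.3 ns period) this path has plenty of margin.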

Slide 41

Open-Source Hardware

Slide 42

RISC-V
RISC-V: The Free and Open RISC Instruction Set Architecture
https://riscv.org/
https://riscv.org/2016/10/5th-risc-v-workshop-agenda/

Slide 43

PicoRV32 - A Size-Optimized RISC-V CPU
• PicoRV32 - A Size-Optimized RISC-V CPU
  • https://github.com/cliffordwolf/picorv32
  • https://github.com/cliffordwolf/picorv32/tree/master/scripts/icestorm
• Running a RISC-V core on an IcoBoard
  • http://pramode.in/2016/10/23/running-riscv-on-an-icoboard/

Slide 44

ZPU Softcore
Running ZPU Softcore on Lattice iCE40
https://sigalrm.blogspot.com/2014/04/running-zpu-softcore-on-lattice-ice40.html

Slide 45

RISC-V Open Source Projects
• BOOM: The Berkeley Out-of-Order RISC-V Processor
  • https://github.com/ucb-bar/riscv-boom
  • https://twitter.com/boom_cpu
  • https://ccelio.github.io/riscv-boom-doc/
• lowRISC - creating a fully open-sourced, Linux-capable, RISC-V-based SoC: http://www.lowrisc.org/
  • https://twitter.com/lowRISC
  • https://github.com/lowRISC/lowrisc-chip/
• mriscv: A 32-bit Microcontroller featuring a RISC-V core
  • https://github.com/onchipuis/mriscv
  • https://twitter.com/onchipUIS
  • http://www.onchipuis.io/risc-v
• PULPino - 32-bit RISC-V microcontroller core: http://www.pulp-platform.org/
  • https://twitter.com/pulp_platform
  • https://github.com/pulp-platform/pulpino

Slide 46

RISC-V Startup
SiFive - customized silicon based on the free and open RISC-V instruction set architecture
https://www.sifive.com/

Slide 47

Open Source Processors
Jonathan Balkind, Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Alexey Lavrov, Mohammad Shahrad, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, and David Wentzlaff. 2016. OpenPiton: An Open Source Manycore Research Framework. SIGPLAN Not. 51, 4 (March 2016), 217-232.

Slide 48

Open Source FPGA Projects
• FPGA Webserver: https://github.com/hamsternz/FPGA_Webserver
• J2 core: a cleanroom reimplementation of the SH-2 ISA with extensions: http://j-core.org/
  • "Building a CPU from Scratch: jcore Design Walkthrough": http://j-core.org/talks/
• NetFPGA: http://netfpga.org/
  • https://github.com/NetFPGA/netfpga
  • https://github.com/NetFPGA/NetFPGA-public/wiki
• Nyuzi Processor: GPGPU processor, SystemVerilog FPGA implementation: https://github.com/jbush001/NyuziProcessor
• TPU: Designing a CPU in VHDL
  • http://labs.domipheus.com/blog/tpu-series-quick-links/
  • GitHub repository with VHDL sources, ISE project, assembler and ISA: https://github.com/Domipheus/TPU

Slide 49

FuseSoC
FuseSoC - package manager and a set of build tools for HDL (Hardware Description Language) code for FPGA/ASIC development
https://github.com/olofk/fusesoc
Olof Kindgren, https://twitter.com/olofkindgren

Slide 50

OpenCores
OpenCores: Open Source Hardware Community
http://opencores.org/

Slide 51

FOSSi Foundation & LibreCores
FOSSi: The Free and Open Source Silicon Foundation
http://fossi-foundation.org/
https://twitter.com/FossiFoundation
LibreCores: Free and Open Source Digital Hardware
open source hardware community and directory
https://www.librecores.org/
https://twitter.com/librecores

Slide 52

Resources
18-643 Reconfigurable Logic: Technology, Architecture and Applications
http://users.ece.cmu.edu/~jhoe/doku/doku.php?id=18-643_reconfigurable_logic
http://www.clifford.at/icestorm/
http://www.clifford.at/icestorm/bitdocs-1k/
http://www.edaplayground.com/
http://fpgacpu.ca/fpga/
OH! Open Hardware for Chip Designers
Silicon proven Verilog library for IC and FPGA designers
https://github.com/parallella/oh
https://github.com/parallella/oh#design-guide
https://github.com/parallella/oh#coding-guide

Slide 53

FPGA History
P. Alfke, I. Bolsens, B. Carter, M. Santarini, and S. Trimberger, “It’s an FPGA!,” IEEE Solid-State Circuits Mag., vol. 3, no. 4, 2011, pp. 15-20.
http://ieeexplore.ieee.org/document/6069771/
Computer History Museum: Oral History of Bill Carter, designer of the first FPGA. Interviewed by Steve Trimberger on 2015-07-13
https://www.youtube.com/watch?v=1oG-3XWLgog
"Xilinx and the Birth of the Fabless Semiconductor Industry" by Steve Leibson. Chapter of "Fabless: the Transformation of the Semiconductor Industry" by Daniel Nenni and Paul McLellan
https://forums.xilinx.com/xlnx/attachments/xlnx/Xcell/200/1/Fabless%20Book%20Chapter%20FINAL.pdf

Slide 54

Verilog
• EDA Playground Verilog Tutorials - https://www.youtube.com/playlist?list=PLScWdLzHpkAfbPhzz1NKHDv2clv1SgsMo
• HDLBits: Verilog Practice - http://verilog.stuffedcow.net/
• http://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga-part-i/
• http://hackaday.com/2015/07/28/open-source-fpga-toolchain-builds-cpu/
• https://github.com/Obijuan/open-fpga-verilog-tutorial/wiki/Chapter-0%3A-you-are-leaving-the-privative-sector
• Lattice HDL Coding Guidelines - http://www.latticesemi.com/~/media/LatticeSemi/Documents/UserManuals/EI/HDLcodingguidelines.pdf?document_id=48203
• Quick Reference for Verilog HDL - https://github.com/parallella/oh/blob/master/docs/verilog_reference.md

Slide 55

Conferences & Communities
ORCONF: An open source digital design conference
http://orconf.org/
OSHUG: Open Source Hardware User Group
http://oshug.org/
RISC-V Workshops
https://riscv.org/workshops/

Slide 56

News & Research
• http://clifford.at/icestorm/ - http://clifford.at/yosys/ - https://twitter.com/oe1cxw
• http://fpga.org/ - https://twitter.com/jangray
• http://fpgacpu.ca/ - https://twitter.com/elaforest
• http://fpgalanguages.com/ - https://twitter.com/fpga_languages
• http://fpgawars.github.io/ - https://github.com/FPGAwars/
• https://twitter.com/fpganotes
• https://www.fpgarelated.com/ - https://twitter.com/FPGARelated
• http://icoboard.org/ - https://twitter.com/ico_TC
• http://nachiket.github.io/ - https://twitter.com/nachiketkapre
• https://www.parallella.org/ - https://twitter.com/adapteva
• http://zedboard.org/content/microzed-chronicles - https://twitter.com/ATaylorCEngFIET

Slide 57

Extra Slides

Slide 58

Place & Route & Timing (Source #1 & #2)
// . . . Version #1 . . .
always_ff @(posedge clk) begin
    counter <= counter + 1;
    period_passed <= (counter == 0);
    if (period_passed)
        control <= ~control;
end

always_comb begin
    hello = control;
    meetingcpp = ~control;
    LED1 = hello;
    LED2 = hello;
    LED3 = meetingcpp;
    LED4 = meetingcpp;
    LED5 = meetingcpp;
end
endmodule

// . . . Version #2 . . .
always_ff @(posedge clk) begin
    counter <= counter + 1;
    period_passed <= ~|counter; // reduction NOR: 1 when all bits of counter are 0
    if (period_passed)
        control <= ~control;
end

always_comb begin
    hello = control;
    meetingcpp = ~control;
    LED1 = hello;
    LED2 = hello;
    LED3 = meetingcpp;
    LED4 = meetingcpp;
    LED5 = meetingcpp;
end
endmodule
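Both versions compute the same Boolean function of `counter`. Only the synthesized netlist differs, and that presumably explains the different PLB counts and critical path delays in the following reports. The C++ sketch below checks the logical equivalence exhaustively for a given width. The function names are illustrative. The check also works for the full 23 bits, just more slowly.

```cpp
#include <cassert>
#include <cstdint>

// Version #1, `counter == 0`: compare the whole value against zero.
bool is_zero(std::uint32_t counter) { return counter == 0; }

// Version #2, `~|counter`: OR-reduce all bits, then invert (reduction NOR).
bool reduction_nor(std::uint32_t counter, unsigned width)
{
    bool any_set = false;
    for (unsigned i = 0; i != width; ++i)
        any_set = any_set || ((counter >> i) & 1u);
    return !any_set;
}

// Exhaustively compare both formulations for every width-bit value.
bool equivalent_for_width(unsigned width)
{
    assert(width >= 1 && width <= 24);   // keep the exhaustive loop small
    const std::uint32_t limit = std::uint32_t(1) << width;
    for (std::uint32_t c = 0; c != limit; ++c)
        if (is_zero(c) != reduction_nor(c, width))
            return false;
    return true;
}
```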

Slide 59

Place & Route & Timing (P&R Version #1)
place...
  initial wire length = 517
  at iteration #50: temp = 11.1576, wire length = 179
  at iteration #100: temp = 4.43194, wire length = 93
  at iteration #150: temp = 0.917684, wire length = 60
  final wire length = 50

After placement:
PIOs       4 / 96
PLBs      10 / 160
BRAMs      0 / 16

  place time 0.05s

route...
  pass 1, 0 shared.

After routing:
span_4    13 / 6944
span_12    4 / 1440

  route time 0.04s

Slide 60

Place & Route & Timing (P&R Version #2)
place...
  initial wire length = 537
  at iteration #50: temp = 10.5997, wire length = 191
  at iteration #100: temp = 4.21034, wire length = 119
  at iteration #150: temp = 0.917684, wire length = 54
  final wire length = 45

After placement:
PIOs       4 / 96
PLBs       9 / 160
BRAMs      0 / 16

  place time 0.06s

route...
  pass 1, 0 shared.

After routing:
span_4    13 / 6944
span_12    1 / 1440

  route time 0.04s

Slide 61

Place & Route & Timing (Timing Report Version #1)
Report for critical path:
-------------------------

        lc40_11_7_2 (LogicCell40) [clk] -> lcout: 0.640 ns
     0.640 ns net_21394 (counter[0])
        odrv_11_7_21394_11136 (Odrv12) I -> O: 0.540 ns
        t106 (LocalMux) I -> O: 0.330 ns
        inmux_8_7_17380_17407 (InMux) I -> O: 0.260 ns
        lc40_8_7_0 (LogicCell40) in1 -> carryout: 0.260 ns
     2.029 ns t2
        lc40_8_7_1 (LogicCell40) carryin -> carryout: 0.126 ns
 . . .
     5.549 ns net_15475 (counter[22])

Resolvable net names on path:
     0.640 ns ..  1.769 ns counter[0]
     2.155 ns ..  2.155 ns $auto$alumacc.cc:470:replace_alu$13.C[2]
 . . .
     5.073 ns ..  5.332 ns $auto$alumacc.cc:470:replace_alu$13.C[22]
                  lcout -> counter[22]

Total number of logic levels: 23
Total path delay: 5.55 ns (180.20 MHz)

Slide 62

Place & Route & Timing (Timing Report Version #2)
Report for critical path:
-------------------------

        lc40_5_8_5 (LogicCell40) [clk] -> lcout: 0.640 ns
     0.640 ns net_9025 (counter[0])
        t42 (LocalMux) I -> O: 0.330 ns
        inmux_6_8_13283_13312 (InMux) I -> O: 0.260 ns
        lc40_6_8_0 (LogicCell40) in1 -> carryout: 0.260 ns
     1.489 ns t2
        lc40_6_8_1 (LogicCell40) carryin -> carryout: 0.126 ns
 . . .
     5.009 ns net_11380 (counter[22])

Resolvable net names on path:
     0.640 ns ..  1.229 ns counter[0]
     1.615 ns ..  1.615 ns $auto$alumacc.cc:470:replace_alu$14.C[2]
 . . .
     4.532 ns ..  4.792 ns $auto$alumacc.cc:470:replace_alu$14.C[22]
                  lcout -> counter[22]

Total number of logic levels: 23
Total path delay: 5.01 ns (199.63 MHz)

Slide 63

Select Assign: Else & Priority (#1 & #2)
// Version #1 (priority)
module sel_assign(
    input logic s1, s2, d1, d2,
    output logic o1, o2
);
    always_comb begin
        o1 = 1'b0;
        o2 = 1'b0;
        if (s1)
            o1 = d1;
        else if (s2)
            o2 = d2;
    end
endmodule

// Version #2 (no priority)
module sel_assign(
    input logic s1, s2, d1, d2,
    output logic o1, o2
);
    always_comb begin
        o1 = 1'b0;
        o2 = 1'b0;
        if (s1)
            o1 = d1;
        if (s2)
            o2 = d2;
    end
endmodule
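The difference between the two versions is easiest to see by enumerating all 16 input combinations. The C++ sketch below mirrors both `always_comb` blocks. The names are illustrative. The versions disagree only when `s1` and `s2` are both 1 and `d2` is 1. In that case, the `else` in Version #1 keeps `o2` at 0.

```cpp
#include <cassert>

struct SelOut { bool o1; bool o2; };

// Version #1: `else if` gives s1 priority; o2 is driven only when s1 is 0.
SelOut sel_priority(bool s1, bool s2, bool d1, bool d2)
{
    SelOut o{false, false};   // default assignments, as in the always_comb block
    if (s1)
        o.o1 = d1;
    else if (s2)
        o.o2 = d2;
    return o;
}

// Version #2: two independent ifs, no priority between s1 and s2.
SelOut sel_no_priority(bool s1, bool s2, bool d1, bool d2)
{
    SelOut o{false, false};
    if (s1)
        o.o1 = d1;
    if (s2)
        o.o2 = d2;
    return o;
}

// Count the input combinations on which the two versions disagree.
int count_mismatches()
{
    int mismatches = 0;
    for (unsigned i = 0; i != 16; ++i) {
        const bool s1 = i & 1u, s2 = (i >> 1) & 1u, d1 = (i >> 2) & 1u, d2 = (i >> 3) & 1u;
        const SelOut a = sel_priority(s1, s2, d1, d2);
        const SelOut b = sel_no_priority(s1, s2, d1, d2);
        if (a.o1 != b.o1 || a.o2 != b.o2)
            ++mismatches;
    }
    return mismatches;
}
```

`count_mismatches()` returns 2: `s1 = s2 = d2 = 1`, with `d1` either 0 or 1.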

Slide 64

Select Assign: Else & Priority (Logic Diagram #1)
Note: inferred priority of s1 (o2 also depends on s1)

Slide 65

Select Assign: Else & Priority (Logic Diagram #2)
Note: no priority of s1 inferred (o2 does not depend on s1)

Slide 66

Synchronizer (#1 & #2)
// Version #1 (correct)
module sync(
    input logic clk,
    input logic d,
    output logic q);
    logic n;
    always_ff @(posedge clk) begin
        n <= d;
        q <= n;
    end
endmodule
• always_ff & nonblocking assignment statement

// Version #2 (incorrect)
module sync(
    input logic clk,
    input logic d,
    output logic q);
    logic n;
    always_ff @(posedge clk) begin
        n = d;
        q = n;
    end
endmodule
• issue: always_ff & blocking assignment statement
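Here is a C++ analogy of the two versions. The struct names and the zero initial values are assumptions. Nonblocking assignment behaves like computing all next values first and committing them together. Blocking assignment behaves like ordinary sequential C++ statements, so `q` picks up the value just written to `n`.

```cpp
#include <cassert>

// Version #1 (nonblocking <=): both right-hand sides use the values from
// before the clock edge, so n and q form two flip-flops in series.
struct SyncNonblocking {
    bool n = false, q = false;
    void posedge(bool d)
    {
        const bool next_n = d;
        const bool next_q = n;   // old value of n
        n = next_n;
        q = next_q;
    }
};

// Version #2 (blocking =): statements execute in order, so q reads the
// value of n that was just written; q follows d after a single edge.
struct SyncBlocking {
    bool n = false, q = false;
    void posedge(bool d)
    {
        n = d;
        q = n;                   // new value of n, i.e. d
    }
};
```

After one edge with `d = 1`, `SyncBlocking::q` is already 1, while `SyncNonblocking::q` becomes 1 only after the second edge. The blocking version effectively has a single stage, consistent with Yosys removing `n`.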

Slide 67

Synchronizer (Logic Diagram #1)
Note: synchronization via the intermediate n: two flip-flops in series

Slide 68

"Synchronizer" (Logic Diagram #2)
Note: the useless intermediate n is optimized out
Yosys:
2.7.3. Executing OPT_CLEAN pass (remove unused cells and wires).
Finding unused cells or wires in module \sync..
  removing unused `$dff' cell `$procdff$3'.
  removing unused non-port wire \n.
  removed 1 unused temporary wires.

Slide 69

Inverter: Self-Checking Testbench
module inv_tb();
    logic data_in;
    logic data_out;
    logic data_out_expected;
    logic errors = 0;

    inv not_dut(.a(data_in), .q(data_out));

    initial begin
        $dumpfile("inv_tb.vcd");
        $dumpvars(0, inv_tb);

        data_out_expected = 1;
        data_in = 0;
        #1;
        $write("data_in: %b, ", data_in);
        $display("data_out: %b", data_out);
        if (data_out !== data_out_expected) begin
            errors = 1;
            $display("Error: input = %b, output = %b (expected: %b)",
                     data_in, data_out, data_out_expected);
        end

        data_out_expected = 0;
        data_in = 1;
        #1;
        $write("data_in: %b, ", data_in);
        $display("data_out: %b", data_out);
        if (data_out !== data_out_expected) begin
            errors = 1;
            $display("Error: input = %b, output = %b (expected: %b)",
                     data_in, data_out, data_out_expected);
        end

        $display("Tests: %s", errors ? "FAILED!" : "passed.");
        $finish;
    end
endmodule

Slide 70

Inverter: Self-Checking Testbench: Output
iverilog -g2012 -o inv_pre inv.sv inv_tb.sv
./inv_pre
gtkwave inv_tb.vcd &

VCD info: dumpfile inv_tb.vcd opened for output.
data_in: 0, data_out: 1
data_in: 1, data_out: 0
Tests: passed.

Slide 71

Inverter: Formal Verification with Yosys-SMTBMC
Reference: http://www.clifford.at/papers/2016/yosys-smtbmc/

Correct design:
// inv.sv
module inv(input logic a, output logic q);
    always_comb q = ~a;
endmodule

Faulty design:
// inv.sv
module inv(input logic a, output logic q);
    always_comb q = a; // not an inverter!
endmodule

Slide 72

Inverter: Formal Verification with Yosys-SMTBMC
ABV (Assertion Based Verification) & SVA (SystemVerilog Assertions)
// inv_tb.sv
module inv_tb(input logic data_in);
    logic data_out;

    inv inv_uut(.a(data_in), .q(data_out));

    always_comb begin
        if (!$initstate) begin
            // negation property (parenthesized: == binds tighter than ^)
            assert(data_out == (data_in ^ 1));
            // other examples
            assert(data_out ^ data_in);
            if (!data_in) assert(data_out);
            if (data_out) assert(!data_in);
            assert(data_in != 0 || data_out != 0);
            assert(data_in == 0 || data_out == 0);
        end
    end
endmodule

Slide 73

Inverter: Formal Verification with Yosys-SMTBMC
Correct design:
$ yosys -ql inv.yslog \
    -p 'read_verilog -sv inv.sv' \
    -p 'read_verilog -formal -sv inv_tb.sv' \
    -p 'prep -top inv_tb -nordff' \
    -p 'write_smt2 inv.smt2'
$ yosys-smtbmc inv.smt2
## 0 0:00:00 Solver: z3
## 0 0:00:00 Checking asserts in step 0..
## 0 0:00:00 Checking asserts in step 1..
. . .
## 0 0:00:00 Checking asserts in step 18..
## 0 0:00:00 Checking asserts in step 19..
## 0 0:00:00 Status: PASSED

Slide 74

Inverter: Formal Verification with Yosys-SMTBMC
Faulty design:
$ yosys -ql inv.yslog \
    -p 'read_verilog -sv inv.sv' \
    -p 'read_verilog -formal -sv inv_tb.sv' \
    -p 'prep -top inv_tb -nordff' \
    -p 'write_smt2 inv.smt2'
$ yosys-smtbmc inv.smt2
## 0 0:00:00 Solver: z3
## 0 0:00:00 Checking asserts in step 0..
## 0 0:00:00 Checking asserts in step 1..
## 0 0:00:00 BMC failed!
## 0 0:00:00 Assert failed in inv_tb: inv_tb.sv:15
## 0 0:00:00 Assert failed in inv_tb: inv_tb.sv:17
## 0 0:00:00 Assert failed in inv_tb: inv_tb.sv:19
## 0 0:00:00 Assert failed in inv_tb: inv_tb.sv:12
## 0 0:00:00 Status: FAILED (!)
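Why four failing assertions? The C++ sketch below (names are illustrative) evaluates the six properties from inv_tb.sv for a given input and output. For the faulty buffer design, exactly four of the six fail for either input value, so the count does not depend on which counterexample the solver picked. Note the parentheses around `data_in ^ true`: in C++, as in SystemVerilog, `==` binds tighter than `^`.

```cpp
#include <cassert>

// Count how many of the six inv_tb.sv assertions fail for one evaluation.
int failing_asserts(bool data_in, bool data_out)
{
    int failures = 0;
    if (!(data_out == (data_in ^ true))) ++failures;   // negation property
    if (!(data_out ^ data_in)) ++failures;             // assert(data_out ^ data_in)
    if (!data_in && !data_out) ++failures;             // if (!data_in) assert(data_out)
    if (data_out && data_in) ++failures;               // if (data_out) assert(!data_in)
    if (!(data_in != 0 || data_out != 0)) ++failures;
    if (!(data_in == 0 || data_out == 0)) ++failures;
    return failures;
}

bool inv_correct(bool a) { return !a; }  // always_comb q = ~a;
bool inv_faulty(bool a) { return a; }    // always_comb q = a;
```

For example, `failing_asserts(false, inv_faulty(false))` and `failing_asserts(true, inv_faulty(true))` both return 4, while the correct inverter returns 0 for both inputs.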

Slide 75

Slides & References
https://speakerdeck.com/mattpd
https://github.com/MattPD/cpplinks/blob/master/comparch.fpga.md

Slide 76

Thank You!
Questions?