The Autumn of Moore’s Law: Scaling Up Computer Performance, 2011-2020

In 2010 and 2011 I gave this survey talk, on the prospects for continued exponential scaling of computer performance, at the Singularity University Graduate Studies Program in Mountain View, CA.

It is in three parts: prospects for continued transistor scaling; the transition to parallel computer architecture; and the challenges of writing mainstream software for parallel computers.

Hello again, world

It has been about nine years since my last blog post at FPGA CPU News. How’s that for taking a break?

Back then I returned to Microsoft as a performance architect on the .NET Common Language Runtime. (Example.) Around 2004 it became clear that clock frequency scaling was at its asymptotic end and that future performance scaling would increasingly come from parallel computing. I spent the next five years working to get Microsoft’s client software stack, and in particular its developer platform and tools, ready for mainstream multi-core, manycore, and heterogeneous platforms. My mission was “to provide loveable parallel programming models, tools, and infrastructure that enable any developer to write robust software that scales up on new hardware”. I led a product incubation on transactional memory, and in 2007-09 I helped define and build Microsoft’s Parallel Computing Platform strategy, team, and software, some of which shipped in Visual Studio 2010.

I have missed blogging. I microblog on Twitter, but it does not afford the space to elaborate on a topic.

The main theme of this blog is implementing parallel computers in FPGAs, but I will also use this space to sound off on other matters of interest to me.

For starters I am going to bring forward the archived FPGA CPU News content, bit by bit. Unfortunately the old site was just a big sed script so there is no good automated solution. I will fix linkrot where I can. Otherwise dead links will get dead-url’d and struck out. These articles should follow in reverse chronological order.

For the time being, the old archived site is at and this site will be at When I finish importing the archived content, I will remove the old site (both will point here).

Thank you for visiting.

Wednesday, February 5, 2003

Ron Wilson, EE Times: Avoidance proposed as solution to 90-nm problems.  Very interesting.

“The notion that RTL must be a description of the wiring, not simply an expression of the logic, recurred during the panel. It has also been voiced frequently by design teams (not represented on the panel) that are working with 130-nm designs. …”

“The notion of the predesigned, configurable platform is beginning to get serious notice at 90 nm.”

Monday, January 20, 2003

Happy new year (belated).

Embrace change. Anthony Cataldo, EE Times: Altera to spin new FPGA for 90-nm production

Altera: Cyclone Devices … Shipping Ahead of Schedule.

“With only 15 months from conception to shipment, the development of the Cyclone device family is the fastest in Altera’s history.”

Altera: … Delivery of First Stratix GX Devices. Now sampling.

Impressive.  Congratulations.  Execute, execute, execute.

Xilinx: Enables Gibson Guitar’s Best of Show Award. I saw this at CES.  A guitar with an ethernet jack.

“Gibson will offer MaGIC, an acronym for Media-accelerated Global Information Carrier, in every Gibson guitar within the next 12-18 months. …”

“MaGIC uses state-of-the-art technology to provide up to 32 channels of 32-bit bi-directional high-fidelity audio with sample rates up to 192 kHz. Data and control can be transported 30 to 30,000 times faster than MIDI.”
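For a sense of scale, here is a back-of-the-envelope calculation of the raw audio payload those quoted figures imply. It is only a sketch: it assumes all 32 channels run at the full 32-bit width and the full 192 kHz rate at the same time, and it ignores any framing or protocol overhead, none of which the press release spells out. The MIDI figure is the standard MIDI 1.0 serial rate of 31.25 kbaud.

```python
# Figures quoted in the press release above.
CHANNELS = 32
BITS_PER_SAMPLE = 32
SAMPLE_RATE_HZ = 192_000

# Standard MIDI 1.0 serial line rate (31.25 kbaud).
MIDI_BITS_PER_SECOND = 31_250

# Assumption: every channel at maximum width and sample rate simultaneously,
# counting audio payload only (no framing or control overhead).
payload_bits_per_second = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ
ratio_to_midi = payload_bits_per_second / MIDI_BITS_PER_SECOND

print(f"{payload_bits_per_second / 1e6:.3f} Mbit/s")  # 196.608 Mbit/s
print(f"{ratio_to_midi:.0f}x the MIDI line rate")      # 6291x the MIDI line rate
```

Under those assumptions the peak audio payload is a little under 200 Mbit/s, and the ratio to MIDI falls inside the quoted “30 to 30,000 times” range.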

Tom Hawkins of Launchbird Design Systems, Inc. announces Confluence 0.1.

“Confluence is a simple, yet amazingly powerful hardware design language. Its flexibility and high level of expression reduces code size and complexity of a design when compared with either Verilog or VHDL. Confluence also enforces clean RTL preventing common errors and bad design practices often introduced in traditional HDL coding.”

“And unlike C based approaches, design engineers love Confluence because it still feels like coding in HDL. The language is implicitly parallel and very structural. …”

“Confluence runs on Linux x86.”

OK, but please let us know when you run on the volume platform. Does Confluence employ OCaml?  Interesting if so. So far the details are sketchy, but welcome: the more, the merrier.

Today’s schedule of the SDRForum Symposium on Use of Reconfigurable Logic in Software Defined Radios.

Saturday, December 28, 2002

FPGA-FAQ has a nice fresh list of FPGA boards.

Peter Clarke, Semiconductor Business News: Former UK defense unit offers floating-point unit for FPGAs. For MicroBlaze and the Virtex-II Pro’s PowerPC(s). QinetiQ [Quixilica].

‘“We’re already seeing applications in image and signal processing systems, control, and support of legacy hardware, where the combination of an FPGA with an embedded microprocessor core and the FPU can provide the functionality and performance of an entire DSP subsystem,” said Bill Smith, manager of QinetiQ’s real-time systems laboratory, in a statement.’

I’ve been to Malvern several times, lovely place.

Wednesday, December 18, 2002

Free Xilinx PicoBlaze Microcontroller Expands Support to Virtex-II Series FPGAs and CoolRunner-II CPLDs. PicoBlaze User Resources.

Earlier coverage.

Regarding PicoBlaze for CPLDs such as CoolRunner-II: lacking any on-chip block RAM for instruction memory, the PB for CR2 requires that you provide an external 16-bit-wide instruction RAM.  This may prove prohibitive in board area and cost.  You can reduce the requirement to 8-bit external memory using a few more macrocells, of course, but in my opinion this application is a better fit for a device with embedded block memory (e.g. Spartan-IIE, etc.).

This does illustrate the utility and value of a modest amount of embedded RAM and/or FLASH in these larger CPLDs — an idea whose time has come.

Monday, December 16, 2002

Xilinx: 90nm Process Technology Drives Down Costs.

IBM: IBM and Xilinx prepare for production of first 90nm chips on 300mm wafers.


Anthony Cataldo, EE Times: IBM, Xilinx tape out first 90-nm FPGAs.

Therese Poletti, San Jose Mercury News: IBM-Xilinx new chip moves to production.

John Blau, IDG News Service: IBM, UMC ready first 90-nanometer chips.


Tuesday, December 3, 2002

Xilinx: Tarari adopts Xilinx Technology for Reconfigurable Content Processor Solutions.

“Tarari content processors are hardware and software-based subsystem building blocks (silicon, boards, etc.) that snap into servers, appliances and network devices, allowing for the first time the inspection of application layer content at network speeds…”


Here, March: Applications of racks full of FPGA multiprocessors:

“I suppose my pet hand-wavy application for these concept chip-MPs is lexing and parsing XML and filtering that (and/or parse table construction for same). Let me set the stage for you.”

“Imagine a future in which “web services” are ubiquitous — the internet has evolved into a true distributed operating system, a cloud offering services to several billion connected devices. Imagine that the current leading transport candidate for internet RPC, namely SOAP — (Simple Object Access Protocol, e.g. XML encoded RPC arguments and return values, on an HTTP transport, with interfaces described in WSDL (itself based upon XML Schema)) — imagine SOAP indeed becomes the standard internet RPC. That’s a ton of XML flying around. You will want your routers and firewalls, etc. of the future to filter, classify, route, etc. that XML at wire speed. That’s a ton of ASCII lexing, parsing, and filtering. It’s trivially parallelizable — every second a thousand or a million separate HTTP sessions flash past your ports — and therefore potentially a nice application for a rack full of FPGAs, most FPGAs implementing a 100-way parsing and classification multiprocessor.”
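To make the “trivially parallelizable” point concrete, here is a small software sketch of the idea, not a description of any FPGA design. Each message is parsed and classified independently of every other, so the work can be handed to any number of workers. The `classify`, `classify_all`, and `envelope` helpers, the `urn:example` namespace, and the operation names are all hypothetical, made up for illustration; the envelope namespace is the SOAP 1.1 one. Python threads are used only to show the independence of the sessions, not to claim a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def classify(message: str) -> str:
    """Return the local name of the first element in the SOAP Body, or 'reject'."""
    try:
        root = ET.fromstring(message)
    except ET.ParseError:
        return "reject"  # not well-formed XML
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        return "reject"  # no Body, or an empty one
    # ElementTree writes qualified tags as '{namespace}local'; keep the local part.
    return body[0].tag.rsplit("}", 1)[-1]


def classify_all(messages, workers=4):
    """Classify each message independently; no state is shared between them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify, messages))


def envelope(operation: str) -> str:
    """Build a minimal SOAP envelope (hypothetical test data)."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_ENV}">'
        f'<soap:Body><{operation} xmlns="urn:example"/></soap:Body>'
        f"</soap:Envelope>"
    )


messages = [envelope("GetQuote"), "<not-xml", envelope("PlaceOrder")]
print(classify_all(messages))  # ['GetQuote', 'reject', 'PlaceOrder']
```

Because `classify` touches nothing but its own input, the same structure maps onto many small processors each handling its own session, which is the property the quoted passage relies on.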