Disclaimer

  • The postings on this site are my own and don’t necessarily represent Microsoft's positions, strategies or opinions.

    Multicore Architecture

    June 14, 2009

    Microsoft Extreme Computing Group (XCG)

    Here is a small item that may be of interest: the creation of the Extreme Computing Group (XCG) at Microsoft. XCG was formed in June 2009 with the goal of developing radical new approaches to ultrascale and high-performance computing hardware and software. The group's research activities include work in computer security, cryptography, operating system design, parallel programming models, cloud software, data center architectures, specialty hardware accelerators and quantum computing.

    June 09, 2009

    HPC: Making a Small Fortune

    N.B. I also write for the Communications of the ACM (CACM). The following essay recently appeared on the CACM blog.

    There is an old joke in the high-performance computing community that begins with a question, "How do you make a small fortune in high-performance computing?" There are several variations on the joke, but they all end with the same punch line, "Start with a large fortune and ship at least one generation of product. You will be left with a small fortune." Forty years of experience, with companies large and small, has confirmed the sad truth of this statement.

    As we all know, the computing industry is extremely competitive, and new trends and technologies have repeatedly had a transformative effect. One need look no further than the regular inductees to the Dead Supercomputing Society to see the devastating effects of the ongoing attack of the killer micros on the market for custom high-performance computing system designs. The microprocessor performance increases of the past thirty years, driven by decreasing feature sizes, higher clock rates and greater architectural complexity, have repeatedly dashed the hopes of many high-performance computing entrepreneurs.

    The market lesson is that one false step inevitably leads to failure, particularly for startup companies struggling to establish a new niche in the face of commodity economics. It has never been truer than in today's economy, where potential buyers are retrenching and evaluating each purchase with a discriminating and sometimes jaundiced eye. Recently, the high-performance computing industry lost several established companies to merger and acquisition, due to weak market positions. We have also seen startup companies fail due to missteps and financial pressures.

    This reminds me of another old analogy, which compares building computer hardware and software to playing pinball – one's reward for playing well is the opportunity to keep playing via free games. The punishment for not playing well is equally clear; one must continue to insert quarters into the machine. Venture capitalists know this well, as they evaluate the pinball skills of those pitching business plans.

    Without doubt, we need a new generation of high-performance computing systems, from consumer devices to exascale platforms, to drive innovation, improve health care, manage critical infrastructure and ensure safety and defense. The question is whether the rise of multicore and manycore chips and explicit parallelism in the commodity microprocessor and GPU markets will finally change a few of the rules of the pinball game, via a combination of consumer economics pressures and technological need, the latter due to clock frequency and power limitations.

    I believe we are at an inflection point, where new approaches must both survive and flourish if we are to continue to deliver higher performance in effective and reasonable ways. It is worth remembering that Andy Grove's famous comment, "Only the paranoid survive," is but the trailing phrase in a larger, more perspicuous comment, "Success breeds complacency. Complacency breeds failure. Only the paranoid survive."

    We cannot be complacent about the future, especially now. We must continue to innovate, even if – especially if – that means adding quarters to the innovation machine.

    April 19, 2009

    Escaping from Flatland

    In 1884 (no, that's not a typographical error), Edwin Abbott wrote a satire about Victorian England and its social hierarchy, in the guise of a mathematical story about life in a two-dimensional world, whence the whimsical title, Flatland: A Romance of Many Dimensions. If one can look beyond Abbott's misogyny to the crux of the mathematical story, it is an illuminating introduction to geometry in one, two and three dimensions, with some generalizing hints about higher-dimensional geometries. (You can read a copy at the Internet Archive or purchase a hard copy.)

    The story is told from the perspective of a square, a resident of Flatland, who receives a visit from a sphere, an inhabitant of a three-dimensional world – Spaceland. In Flatland, of course, the sphere is perceived only as a circle whose diameter varies with the sphere's position relative to the plane. The sphere preaches the existence of higher dimensions, but the Flatlander leaders attempt to suppress all such information.

    Denizens of Waferland

    In fine recursive fashion, the Flatland story is itself a metaphor for our tenacious embrace of our two-dimensional world of semiconductors, particularly as it relates to memory technologies. We are deeply entangled in the angst, ennui, despair and perhaps even the clinical depression related to our encounter with the limits of instruction level parallelism (ILP), sequential execution semantics and the microprocessor power wall.

    As a community, we have grudgingly and guardedly recognized the need for multicore processors. (See Three Views on Multicore and Manycore: Able Was I Ere I Saw Elba.) However, we are still clinging tenaciously to our dual in-line memory module (DIMM), two-dimensional packaging and double data rate (DDR) memory designs. We need a visitor from the third dimension, preaching the gospel of chip stacking to the denizens of chip waferland.

    Chip Stacking: Beyond DDR

    We are approaching scaling limits for our pin-based interfaces. Each generation, we have dropped voltages, increased clock rates and doubled the number of words transferred. DDR memory first operated at 2.5V and up to 400 Mb/s, dropped to 1.8V and increased to 800 Mb/s for DDR2, and is at 1.5V and 1600 Mb/s for DDR3. We can see DDR4 on the horizon, perhaps in 2012 at ~1V.
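
As a rough illustration of the scaling just described, the sketch below converts the per-pin data rates quoted above into peak module bandwidth. The 64-bit data bus width is an assumption (it is the usual width of a non-ECC DIMM, but it is not stated in the text), and the names are purely illustrative.

```python
# Per-pin data rates (Mb/s) and supply voltages quoted in the text.
GENERATIONS = {
    "DDR": {"volts": 2.5, "mbps_per_pin": 400},
    "DDR2": {"volts": 1.8, "mbps_per_pin": 800},
    "DDR3": {"volts": 1.5, "mbps_per_pin": 1600},
}

# Assumption: a standard 64-bit (non-ECC) DIMM data bus.
BUS_WIDTH_BITS = 64


def peak_module_bandwidth_mb_per_s(mbps_per_pin, bus_width_bits=BUS_WIDTH_BITS):
    """Peak module bandwidth in MB/s: per-pin rate times bus width, in bytes."""
    return mbps_per_pin * bus_width_bits // 8


for name, spec in GENERATIONS.items():
    bandwidth = peak_module_bandwidth_mb_per_s(spec["mbps_per_pin"])
    print(f"{name}: {spec['volts']} V, {bandwidth} MB/s peak per module")
```

Under that assumption the bus width stays fixed while the per-pin rate doubles each generation, so all of the gain has to come through the same set of pins.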

    It is time – long past time – for us to move to the third dimension and stack our chips. With chip (die) stacking, we need not be constrained by connections to the perimeters of our chips, but can exploit connectivity across a larger fraction of their area. IBM, Intel, Samsung and others are exploring variations of this idea, as this smattering of press releases and articles illustrates.

    With lower-power multicore designs, through-silicon vias (TSVs), and wafer thinning for heat dissipation, we can crack the memory wall that has plagued us for so long. More to the point, those of us in the big iron/fast iron camp could learn a few things from our compatriots in the embedded systems world, where innovative packaging is a fundamental market driver.

    Make no mistake; this will not be easy, as it requires new approaches to via fabrication, as well as changes to our ecosystem of chipsets and interface standardization processes. We may not have tachyon-based hyperspatial communication, but surely we can escape from Flatland.

    April 11, 2009

    When Petascale Is Just Too Slow

    N.B. I also write for the Communications of the ACM (CACM). The following essay recently appeared on the CACM blog.

    It seems as if it were just yesterday when I was at NCSA and we deployed a one-teraflop Linux cluster as a national resource. We were as excited as proud parents by the configuration: 512 dual-processor nodes (1 GHz Intel Pentium III processors), a Myrinet interconnect and (gasp) a stunning 5 terabytes of RAID storage. It achieved a then-astonishing 594 gigaflops on the High-Performance Linpack (HPL) benchmark, and it was ranked 41st on the Top500 list.
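
For a back-of-the-envelope check of those figures, the sketch below compares the cluster's theoretical peak with its measured HPL result. The rate of one double-precision floating-point operation per clock cycle per processor is an assumption not stated in the text, and the names are illustrative.

```python
# Configuration and benchmark figures quoted in the text.
NODES = 512
PROCESSORS_PER_NODE = 2
CLOCK_GHZ = 1.0
HPL_GFLOPS = 594.0

# Assumption: each processor retires one double-precision flop per cycle.
FLOPS_PER_CYCLE = 1


def peak_gflops(nodes, processors_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in gigaflops: processors x clock rate x flops per cycle."""
    return nodes * processors_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved on the benchmark."""
    return measured_gflops / peak


peak = peak_gflops(NODES, PROCESSORS_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = hpl_efficiency(HPL_GFLOPS, peak)
print(f"Peak: {peak:.0f} gigaflops, HPL efficiency: {efficiency:.0%}")
```

Under that assumption the peak works out to 1,024 gigaflops, consistent with the "one teraflop" label, and the HPL run reaches roughly 58% of it.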

    The world has changed since then. We hit the microprocessor power (and clock rate) wall, birthing the multicore era; vector processing returned incognito, renamed as graphics processing units (GPUs); terabyte disks are available for a pittance at your favorite consumer electronics store; and the top-ranked system on the Top500 list broke the petaflop barrier last year, built from a combination of multicore processors and gaming engines. The last is interesting for several reasons, both sociological and technological.

    Petascale Retrospective

    On the sociological front, I remember participating in the first petascale workshop at Caltech in the 1990s. Seymour Cray, Burton Smith and others were debating future petascale hardware and architectures, a second group debated device technologies, a third discussed application futures, and a final group of us were down the hall debating future software architectures. (I distinctly remember talking to Seymour about his "parity is for farmers" comment regarding memory ECC.) All this was prelude to an extended series of architecture, system software, programming models, algorithms and applications workshops that spanned several years and multiple retreats.

    By the way, you can read the original report here; it is fascinating to look back. Paul Messina, Thomas Sterling and others deserve our thanks for launching the seminal activity.

    At the time, most of us were convinced that achieving petascale performance within a decade would require some new architectural approaches and custom designs, along with radically new system software and programming tools. We were wrong, or at least so it superficially seems. We broke the petascale barrier in 2008 using commodity x86 microprocessors and GPUs, InfiniBand interconnects, minimally modified Linux and the same message-based programming model we have been using for the past twenty years.

    However, as peak system performance has risen, the number of users has declined. Programming massively parallel systems is not easy, and even terascale computing is not routine. Horst Simon explained this with an interesting analogy, which I have taken the liberty of elaborating slightly. The ascent of Mt. Everest by Edmund Hillary and Tenzing Norgay in 1953 was heroic. Today, amateurs still die each year attempting to replicate the feat. We may have scaled Mt. Petascale, but we are far from making it a pleasant or even routine weekend hike.

    This raises the real question: were we wrong in believing different hardware and software approaches were needed to make petascale computing a reality? I think we were absolutely right that new approaches were needed. However, our recommendations for a new research and development agenda were not realized. At least in part, I believe this is because we have been loath to mount the integrated research and development needed to change our current hardware/software ecosystem and procurement models.

    Exascale Futures

    I recently participated in the International Exascale Software Project Workshop (IESP), the first in a series of meetings designed to explore organizational and technical approaches to exascale system design and construction. The workshop built on several earlier meetings and studies, including the DARPA exascale hardware study and the forthcoming exascale software study (in which I participated), as well as the DOE exascale applications study. Complementary analyses are underway in the European Union and in Asia.

    Evolution or revolution? It's the persistent question. Can we build reliable exascale systems from extrapolations of current technology, or will new approaches be required? There is no definitive answer, as almost any approach might be made to work at some level with enough heroic effort. The bigger question is what design would enable the most breakthrough scientific research in a reliable and cost-effective way.

    My personal opinion is that we need to rethink some of our dearly held beliefs and take a different approach. The degree of parallelism required at exascale, even with future manycore designs, will challenge even our most heroic application developers, and the number of components will raise new reliability and resilience challenges. Then there are interesting questions about manycore memory bandwidth, achievable system bisection bandwidth and I/O capability and capacity. There are just a few programmability issues as well!

    I believe it is time for us to move from our deus ex machina model of explicitly managed resources to a fully distributed, asynchronous model that embraces component failure as a standard occurrence. To draw a biological analogy, we must reason about systemic, organism health and behavior rather than cellular signaling and death, and not allow cell death (component failure) to trigger organism death (system failure). Such a shift in world view has profound implications for how we structure the future of international high-performance computing research, academic-government-industrial collaborations and system procurements.
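
To make the idea concrete, here is a minimal sketch of treating component failure as a routine event rather than a fatal one. Everything in it is hypothetical: the worker model, the `ComponentFailure` exception and the function names are invented for illustration, and the failures are simulated rather than detected on real hardware.

```python
import asyncio


class ComponentFailure(Exception):
    """Raised when a simulated worker dies while running a task."""


async def run_on_worker(worker_id, task_id, failed_workers):
    """Simulate remote execution; workers in failed_workers always fail."""
    await asyncio.sleep(0)  # yield to the event loop, as a real remote call would
    if worker_id in failed_workers:
        raise ComponentFailure(f"worker {worker_id} failed on task {task_id}")
    return task_id * task_id  # stand-in for real work


async def run_with_failover(task_id, workers, failed_workers):
    """Try workers in rotated order; a failure moves the task elsewhere."""
    start = task_id % len(workers)
    for worker_id in workers[start:] + workers[:start]:
        try:
            return await run_on_worker(worker_id, task_id, failed_workers)
        except ComponentFailure:
            continue  # treated as routine: reschedule on the next worker
    raise RuntimeError(f"no healthy worker left for task {task_id}")


async def run_job(task_ids, workers, failed_workers):
    """Run all tasks concurrently; results come back in task order."""
    return await asyncio.gather(
        *(run_with_failover(t, workers, failed_workers) for t in task_ids)
    )


if __name__ == "__main__":
    # Worker 0 is permanently dead, yet the job as a whole still completes.
    print(asyncio.run(run_job(range(5), workers=[0, 1, 2], failed_workers={0})))
```

In this toy model the death of one worker (a "cell") only causes its tasks to be rescheduled; the job (the "organism") fails only when no healthy worker remains.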

    October 07, 2008

    Three Views on Multicore

     

    Andrew Chien, Dave Patterson and I have each written articles on the challenges and opportunities inherent in multicore hardware and software for the Computing Community Consortium (CCC) blog. My recent article, on the challenge of software, is now posted. In the article, I argued that we must re-envision parallel computing and a new generation of applications that explicitly exploit the scale and heterogeneity of multicore. You can read the article on the CCC blog or below.

    Multicore: It's The Software

    For over thirty years, we have watched the great cycle of innovation defined by the commodity hardware/software ecosystem – faster processors enable software with new features and capabilities that in turn require faster processors, which beget new software. The great wheel has turned, but it turns no more, as power constraints and device physics now limit the performance achievable with single microprocessors.

    Multicore chips – those with multiple, lower power processors per chip – are now the norm. Moreover, current multicore chips (those with 4-8 cores/chip) are but the beginning. We can expect hundreds of cores per chip in the future, with diverse functionality (graphics, packet protocol processing, DSP, cryptography and other features).

    The software research challenge is clear – developing effective programming abstractions and tools that hide the diversity of multicore chips and features while exploiting their performance for important applications. Hence, we need a vibrant community of researchers exploring diverse approaches to parallel programming – languages, libraries, compilers, tools – and their applicability to multiple application domains.

    Microsoft researchers are investigating all of these approaches, from coordination languages for robots and distributed systems to mobile phones to desktops and data center clouds. To engage the academic community, Microsoft funds multicore research projects at many sites, and we have partnered with Intel to fund the Universal Parallel Computing Research Centers (UPCRCs) at the University of California at Berkeley and the University of Illinois at Urbana-Champaign.

    As Richard Hamming famously noted, "The purpose of computing is insight, not numbers." In that spirit, I believe our research challenge is to break free from the limitations of the desktop metaphor and exploit the ever greater performance of multicore chips to create new human-computer interaction metaphors that are more natural and intuitive. This will require new approaches to parallel computing education and increased collaboration with researchers in application domains.

    As an example, consider one possible future – "spatial computing" – where real-time vision and speech processing, coupled with knowledge bases, distributed sensors and responsive objects, enhance human activities in contextually relevant ways while remaining otherwise unobtrusive. Such an infosphere would adapt to its user's needs and behavior and move seamlessly across home, work and play.

    Multicore brings enormously interesting intellectual challenges and the opportunity to rethink much of how we approach computing. Let's embrace the opportunity!