Disclaimer

  • The postings on this site are my own and don’t necessarily represent Microsoft's positions, strategies or opinions.

    Data Centers

    June 14, 2009

    Microsoft Extreme Computing Group (XCG)

    Here is a small item of possible interest: the creation of the Extreme Computing Group (XCG) at Microsoft. XCG was formed in June 2009 with the goal of developing radical new approaches to ultrascale and high-performance computing hardware and software. The group's research activities include work in computer security, cryptography, operating system design, parallel programming models, cloud software, data center architectures, specialty hardware accelerators and quantum computing.

    May 31, 2009

    Nothing Left But the Smile …

    Long ago, we in HPC recognized the inevitability of the Attack of the Killer Micros, and with few exceptions, all of today's HPC systems are based on some variant of commodity microprocessors and their commodity cousins, GPUs. A similar revolution swept the secondary storage market. After all, the I in RAID stands for inexpensive, a synonym for commodity.

    Like the Cheshire Cat in Alice's Adventures In Wonderland, it seems increasingly clear that in the high-performance, low latency interconnection network space, we will be left with nothing but the smile. Simply put, the last remaining non-commodity component of HPC clusters – the high-performance interconnect – is in grave danger, due to the global economic downturn and the price-performance pressures of commodity Ethernet.

    Remember Coaxial Cables and BNC?

    The old geezers among us, including yours truly, remember when Ethernet meant coaxial cables and CSMA/CD at 10 megabits/second. (Here's a shout out to my colleague Chuck Thacker, one of the co-inventors of Ethernet.) Since then, we have seen many generations of advances in transport fabrics, switches and routers and speeds.

    One gigabit Ethernet is now a commodity item, contained on almost all PC motherboards; 10 gigabit Ethernet is the high bandwidth standard; and 40 gigabit and 100 gigabit Ethernet standards are under development by the IEEE. InfiniBand is one of the last major competitors to 10 gigabit Ethernet, and even the InfiniBand vendors are increasingly offering Ethernet compatibility (i.e., so-called converged fabrics).

    Cloud Data Centers and HPC

    I invite you to ponder your response to the following question. What is the difference between a megascale data center and a petascale computing system? Increasingly, the answer is "not much, but a few elements of the software stack."

    Of course, this need not be the answer, but it likely will be unless we change our research and development strategies and also our procurement models. Ironically, megascale data centers could also benefit from lower latency, higher bandwidth communication fabrics with scalable bisection bandwidth. It is time we bring these two together, for the HPC community could leverage the economics of megascale data centers and their needs.

    March 12, 2009

    Scientific Clouds: Blowin’ in the Wind

    N.B. I recently responded to some questions from John West (HPCWire) regarding the Microsoft Cloud Computing Futures (CCF) research project. In that Q&A, I also commented on the relevance of cloud computing to computational science. What follows is an augmented subset of the Q&A, but focused on just the relevance of clouds to technical computing.

    Cirrus, stratus, altostratus, cumulus: they are the scientific names of the common clouds. They drift across the sky, reflecting the changing wind and weather. A new front is blowing into computational science, and cloud computing will soon advance scientific and engineering discovery.

    That is one of the reasons I am excited about cloud services. I believe we are at a technological transition point, just as profound as that engendered by the "attack of the killer micros." This is true whether you are enamored of Microsoft's Azure, Amazon's AWS or Google's Apps.

    Learning from History

    Let's step back and gain some perspective, starting with the "Branscomb pyramid" ("From Desktop to TeraFlop: Exploiting the U.S. Lead in High Performance Computing," Lewis Branscomb et al.) and the diverse types of technical computing that now exist. We tend to focus on the apex of the computing pyramid, now exemplified by petascale systems intended to support only a handful of applications and users. However, most science is conducted at lower levels of the pyramid, using desktop computers, laboratory clusters and university-scale computing infrastructure. By analogy, it's exciting to talk about international hypersonic transport, but most people care more about efficiently and painlessly commuting to work each day.

    Over the past decade, we successfully leveraged commodity hardware to create large clusters. What was nearly heretical when we first deployed clusters at NCSA is now commonplace. However, this scaling has not been without cost. Cluster programming remains difficult at scale, we have turned a generation of researchers into parallel programmers and system administrators, institutions are struggling with rising demands for machine space, power and cooling, and duplicated facilities make sharing expertise and data difficult. We are heavily focused on computing at a time when data analysis now dominates much of science and engineering. Like many of you, I contributed to this state of affairs, and I feel some responsibility to help us find a new path.

    Hype and Reality

    Let's separate the hype from the reality. Clouds won't magically restore your 401(k) retirement fund, cure halitosis or even help you drop twenty pounds before your upcoming high school reunion. Like all new technologies, however, they challenge some conventional computing wisdom and change some of our operating assumptions.

    Personal computing was a non sequitur when computers filled rooms. Internet search was nonsensical when there were only a handful of research web sites. Social networking services depend on inexpensive, ubiquitous broadband access and mobile devices. Hosted cloud services and software are now possible given the confluence of inexpensive but powerful multicore processors, high-capacity storage, broadband networks and the economies of scale that consolidation in cloud data centers make possible.

    Five Reasons Clouds Matter

    First, the economies of scale from mega-data center provisioning mean capital and operating costs can be lower. When buying servers in 500,000 unit lots and designing facilities at scale, the provider does have some financial and technical leverage. This would allow universities, laboratories and federal agencies to devote a larger fraction of precious funding to research rather than infrastructure. Remember Dan's computational science corollary: it's the science, not the infrastructure, that matters.

    Second, truly large-scale data analysis, particularly multidisciplinary data fusion, can become routine. In the scientific community, we have worked hard to build workflows for access to distributed data. Consolidation and co-location enable new approaches, and we tend to forget that cloud data centers have many, many petabytes of disk storage. It really is possible to query multiple petabytes of data using intuitive, easy-to-use desktop tools – the business community does it all the time. Jim Gray proved the power of database tools on several scientific data analysis projects, including the SkyServer.

    Third, clouds facilitate time-space tradeoffs. It is just as cost-effective to run 100,000 individual jobs simultaneously as sequentially (e.g., for a parameter study), something that our batch queuing strategies strongly discourage on high-performance computing systems. In geek terms, the area is the same, whether one uses tall, skinny rectangles (lots of resources for a small interval) or short, wide rectangles (a few resources for a long interval). The elasticity of clouds, a consequence of multiplexing many users and workloads, means that the resources are always available without waiting.
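
    The rectangle picture can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: every job is taken to run for the same time (one hour here, a made-up figure), jobs run in waves of at most `workers` at once, and the function names are hypothetical.

```python
import math


def wall_clock_hours(num_jobs, hours_per_job, workers):
    """Elapsed time when identical jobs run in waves of `workers` at a time."""
    waves = math.ceil(num_jobs / workers)
    return waves * hours_per_job


def resource_hours(num_jobs, hours_per_job):
    """Total resource consumption (the rectangle's area), independent of width."""
    return num_jobs * hours_per_job


if __name__ == "__main__":
    jobs = 100_000          # the parameter study size from the text
    hours_per_job = 1.0     # assumed run time per job, for illustration only
    for workers in (1, 100, 100_000):
        print(
            f"{workers:>7} workers: "
            f"{wall_clock_hours(jobs, hours_per_job, workers):>9.1f} h elapsed, "
            f"{resource_hours(jobs, hours_per_job):>9.1f} resource-hours"
        )
```

    The resource-hours column never changes; only the elapsed time does, which is the tradeoff a batch queue tends to hide and an elastic service exposes.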

    Fourth, I also believe that the cloud will offer HPC services at increasing scale, beginning with that typified by today's laboratory clusters. This is already happening, and as I/O device virtualization continues to improve, communication latencies will decrease and tightly coupled computations will be attractive at ever larger scale.

    Finally, clouds can provide seamless extension of familiar desktop tools and interfaces, allowing computing and analysis to scale within the same environment that researchers use every day. We can leverage consumer software, just as we have leveraged consumer hardware. There is no reason our computational science tools and our "every day" tools need be different.

    Shameless Microsoft Plug

    To this point, I've written about clouds in a vendor-neutral sense. With a nod to the company name now on my paycheck, if you haven't already, I encourage you to take a look at Windows Azure and its cloud computing and storage services. In addition to rich web services, there is both open source and Visual Studio programming support. In a future post, I will describe an Azure example application for computational science. Here endeth the marketing pitch.

    Insight, Not Infrastructure

    As Richard Hamming famously noted, "The purpose of computing is insight, not numbers." Dan's computational science corollary is simple, "The purpose of computational science infrastructure is scientific discovery, not big iron bragging rights." It's time to focus on what matters and embrace the future. Our graduate students and post-docs will thank us.

    October 28, 2008

    Beyond The Azure Blue

    From the first day I arrived at Microsoft, my academic colleagues have been asking me about Microsoft's strategy for cloud computing and when (or if) there would be public announcements. Those questions rose to a crescendo as academic groups prepared responses to the NSF eXtreme Digital (XD) TeraGrid solicitation. All I could say was that we were working on a plan, and it would become clear soon.

    I don't normally pitch Microsoft products in the blog, preferring to discuss science policy, technology research and development and global competitiveness. However, something big just happened at Microsoft, something I think will affect all of us. Moreover, as I write this, the Pacific Northwest sky is clear and azure blue, and that doesn't happen often this time of year. An omen, perhaps?

    Microsoft Azure Cloud Services

    At our Professional Developers Conference (PDC), Microsoft announced Azure, our cloud computing platform, with on-demand compute and storage to host, scale and manage Internet or cloud applications. The press release has additional business perspective and a link to the presentation. Azure is one element of the vision Ray Ozzie (See "Mind to Mind: Building Innovation") described in his 2005 Internet Services Disruption memorandum.

    The simplest description of Azure is that the initial release allows you to develop hosted Windows applications using .NET Services, though future releases will support unmanaged code and open source tools as well (Eclipse, Ruby, PHP, and Python). Within Azure, a fabric controller manages application instances and access to storage via SQL Data Services (SDS), and it hosts applications atop virtualized multicore hardware. Finally, Microsoft's Live Services offerings will be layered atop the Azure framework.

    You can read the white paper for details on the Azure design and usage approach. In addition, the software development kit (SDK) is available for download. In addition to the Azure SDK itself, there are SDKs for Visual Studio, .NET and SDS Services. Finally, there are Java and Ruby SDKs for .NET Services as well. This is a Community Technology Preview (CTP), meaning Microsoft welcomes feedback on these early capabilities and will continue to expand the capabilities of Azure over the coming months.

    Science and Technology Implications

    Earlier in the year, I wrote on both my blog and in HPCWire ("Dan's Cloudy Crystal Ball") about the possibility of outsourcing research computing services and infrastructure to the cloud. I noted then that the explosive growth of computing as an enabler of scientific discovery had strained university capabilities and Federal research budgets. Given our current economic crisis, university operating budgets and Federal research expenditures will be under even greater strain and there will be increased scrutiny on the need for each investment.

    In a world of (at best) modest research budget increases, we must ask hard questions about the best use of limited funds. Cloud computing offers a potential mechanism to increase the efficiency of current research, ensure continuity of critical data and enable new kinds of research not now feasible.

    In this model, researchers focus on the higher levels of the software stack -- applications and innovation, not low-level infrastructure. University and Federal research agency administrators, in turn, procure services from the providers based on capabilities and pricing. Finally, the cloud service providers deliver economies of scale and capabilities driven by a large market base and energy efficient infrastructure. Remember, computing infrastructure exists to enable discovery, not as monuments to technological prowess.

    In addition to efficiency, the scalability of cloud services and infrastructure opens new research possibilities. Not only is it possible to federate multidisciplinary research data at far larger scales than possible in a university environment (think tens to hundreds of petabytes of low latency storage), we can escape the pernicious cycle of transitory research infrastructure.

    How often have we created data repositories as part of research projects, only to find few mechanisms to ensure their long-term sustainability and access by the broader research community? How often have we faced a miasma of distributed data sources with unknown provenance and non-compatible metadata, each supported pro bono on a best effort basis? (See my recent comments on digital document preservation.) Instead, imagine multidisciplinary data fusion and mining, where students can pose queries against integrated but diverse data sources using robust tools.

    Finally, by leveraging "pay as you go" models, we can trade time and scale on a continuous basis. Imagine applying 50,000 processors for one hour at the same cost as 50 processors for one thousand hours. In the cloud, the integral under the curve is the same and the costs are comparable, but the research effects are qualitatively different.
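
    The arithmetic behind that comparison is easy to check. The Python sketch below assumes a flat price per processor-hour with no volume discounts or minimum charges; the rate of 10 cents is a made-up figure used only for illustration, and the function names are hypothetical.

```python
def processor_hours(processors, hours):
    """Area under the curve: processors multiplied by elapsed hours."""
    return processors * hours


def cost_in_cents(processors, hours, cents_per_processor_hour):
    """Pay-as-you-go cost under an assumed flat per-processor-hour rate."""
    return processor_hours(processors, hours) * cents_per_processor_hour


if __name__ == "__main__":
    rate = 10  # hypothetical price in cents per processor-hour

    burst = cost_in_cents(50_000, 1, rate)    # wide and short
    steady = cost_in_cents(50, 1_000, rate)   # narrow and long

    print(f"burst : {processor_hours(50_000, 1)} processor-hours, {burst} cents")
    print(f"steady: {processor_hours(50, 1_000)} processor-hours, {steady} cents")
```

    Both configurations consume 50,000 processor-hours, so a flat rate prices them identically; the difference is that one returns an answer in an hour and the other in roughly six weeks.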

    The Standard Questions

    New approaches to computing always raise a standard set of questions, and cloud services and data storage are no exception.

    • Is it reliable and will my data persist?
    • Is it safe, private and secure?
    • Will I be captured and become captive?
    • What does it cost and what if I can't continue paying?

    We tend to forget that there are complementary issues about local infrastructure because we have already internalized and accepted the implications and risks. Moreover, local failures are rarely publicized.

    • What happens if my disks crash?
    • What if I can't pay for backups or maintenance or physical plant or …?
    • What if my network is penetrated?

    These are the standard cost/benefit/risk tradeoffs. One must make them based on statistics, economics and practical constraints. Remember that we debated the same issues when we shifted research computing from vendor-backed HPC designs to predominantly commodity components.

    Let's Reason Together

    I welcome discussion of how we can exploit cloud services and infrastructure effectively – all cloud infrastructure, not just Microsoft's Azure. To do this, the cloud service providers, hardware vendors, universities and Federal government must work together to outline an agenda, conduct experiments at scale and speak with a united voice on the opportunities.

    It's a sunny day, but my head is in the clouds.

    October 15, 2008

    Preserving the Past: Educating the Future

    A recent front page article in the New York Times, entitled In the Digital Age, Federal Files Slip into Oblivion, really caught my attention. The article described a problem with which I am painfully and intimately familiar, namely the struggle to preserve the electronic record of government processes and deliberations. Quoting from the article,

    Many federal officials admit to a haphazard approach to preserving e-mail and other electronic records of their work. Indeed, many say they are unsure what materials they are supposed to preserve.

    This confusion is causing alarm among historians, archivists, librarians, Congressional investigators and watchdog groups that want to trace the decision-making process and hold federal officials accountable. With the imminent change in administrations, the concern about lost records has become more acute.

    Even with an army of government clerks, there is a limit to how many pieces of paper the federal government could produce. However, the explosive growth of digital communications and document preparation has far outstripped the processes and technology available to the Library of Congress and the National Archives and Records Administration (NARA). The problem is not just the volume of digital data; it is also the diversity of electronic formats and the myriad physical devices on which the data is stored.

    Imagine receiving a truck filled with PC disk drives and being expected to identify, curate and manage the data contained on them. Sound daunting and farfetched? It isn't. This is precisely what the Clinton White House delivered to the National Archives for preservation, though it included a mere 32 million e-mail messages. (Remember that the White House did not have Internet access until DARPA and Randy Katz wired it in the 1990s.)

    Given the growth of electronic communication since the early 1990s, the Bush administration will undoubtedly have generated hundreds of millions of e-mail messages that must be preserved, along with a plethora of electronic documents in a dizzying array of file formats. In addition to the standard challenges of document identification, extraction and preservation, the Archives of course must deal with national security and classification issues, further exacerbating the challenge.

    I have seen this struggle first hand, as a member of the Advisory Committee for the Electronic Records Archive (ACERA), the digital document preservation project of the National Archives. The National Archives are building a web accessible, indexed repository that will eventually host at least a portion of the torrent of digital data pouring from the federal government. It is an arduous and difficult journey, with more work ahead.