Writing the previous piece in this review series, I found it quite difficult to enforce the discipline of not straying into either languages or applications, simply because they are so closely linked to operating software. In a sense they are different facets of the same thing, a link which becomes much closer in the multicore than in the uniprocessor world, particularly since languages for parallel computing tend to have constructs that reflect the underlying hardware, whether explicitly or implicitly.
Applications are the drivers of all IT. Computing is after all a tool to enable us to solve problems. Languages are a vehicle to achieve an aim. The problems that we can address through them have become increasingly sophisticated and more demanding of compute power.
The driver for this revolution has been the vast increase in the amount of data available to process, whether from data sensors (understood in a very broad sense), or from derived data ("information is data in context"), much of which ultimately comes from sensors. In the general purpose market, for example, there have been two big generators: data derived from markets and image data. Fox Talbot, Niépce and the rest would be amazed at what we have done with cameras and where we have managed to put them!
In High Performance Computing, you have only to think of the tidal wave of data from sources like Hubble, the LHC at CERN or the design experiments carried out by countless industrial research centres.
In forty years the financial markets have gone from providing little data (at the rate of maybe a few thousand transactions per day, distributed over a few markets) to providing a slew of data with rates measured in billions of megabytes per second from thousands of sources, often highly specialised. It just is not clear to anyone at this point either where the next torrent of data will come from or where the next killer app is.
There is no reason to suppose that the amount of data that we have available will ever decrease. Even if the flood of data available were to cease today, it would be a very long time before we could exhaust all the opportunities that it offers. That alone would ensure that data will be the great force behind computing for the next decade.
Applications will change, or perhaps a better word is "evolve", rapidly over the next decade and there will undoubtedly be new classes. It seems unlikely, however, that for the remainder of this year much will appear that is dramatically new; but it is always possible in this market to be wrong. To some extent the pace of application development and the appearance of new MCPs are proceeding hand-in-glove, although there is a great deal of prototyping going on for next generation systems.
For any application the degree to which parallelism can be exploited depends to a large extent on the quality and type of tools available to the user and/or programmer. Tools are, and will remain, the principal barrier to uptake of MCPs. While it is relatively easy to parallelise applications that will run on smaller numbers of cores, for larger numbers it becomes a much more complex task. For very high counts it becomes all but impossible either with present tools or by hand, except for certain quasi-pathological cases such as linear pipeline farms and the like.
Ultimately parallelisation requires new mindsets. There is a vast body of theoretical understanding in the literature and some limited use in real-world applications. However, that expertise won't cover all cases and so there is a lot of opportunity for those developing tools. Patterson et al* at Berkeley believed originally that there were only seven fundamental algorithms in building parallel software (the "Seven Dwarfs", later extended to eight (Disney to Gell-Mann?), now thirteen or more depending on who you believe, and now known as "motifs"). The idea is that if we can understand the fundamental approaches, then we can optimise hardware design by optimising for motifs. Future software systems will then be optimised, so the argument goes, by building from a series of templates. A layer underneath will then be hardware-aware and handle the details of hardware optimisation and resource allocation.
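To make the template idea a little more concrete, here is a minimal sketch in Python. It is my own illustration rather than anything taken from the Berkeley report: the names map_reduce_template and pick_worker_count are hypothetical, and the map-then-reduce shape is chosen only because it is easy to show. The caller supplies the two functions; the layer underneath decides how many workers to use.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pick_worker_count():
    """Hypothetical 'hardware-aware' layer: size the pool from the core count."""
    return os.cpu_count() or 1


def map_reduce_template(items, map_fn, reduce_fn, initial):
    """Hypothetical template: the caller supplies only map_fn and reduce_fn."""
    workers = pick_worker_count()
    # Threads keep this sketch self-contained. A real system might pick
    # processes, a GPU or a cluster here without the caller changing anything.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, items))  # results come back in input order
    return reduce(reduce_fn, mapped, initial)


def square(x):
    return x * x


def add(a, b):
    return a + b


# Sum of the squares of 1..10, written without any mention of cores.
total = map_reduce_template(range(1, 11), square, add, 0)
print(total)  # 385
```

The point of the design is that nothing in the calling code mentions cores or threads, so the resource allocation can change underneath without the application being rewritten.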
The language issue is a different kettle of fish, and the subject of a very live debate. Leaving aside academic and experimental languages, of which there are probably many thousands, there seems to be little realistic sight of a change in the main supported languages this year. This is simply because languages that are used to implement real-world applications have to be robust. Most new languages are far from robust and many need a lot of support and development before they are at production standard. There is a tendency to forget that there is a long way between lab and product that cannot be short-circuited.
Concertant has run surveys of what languages people expect to run in the future and these show little or no change in the usage profile. It is still the same C and C++ combination, with languages such as Java possibly diminishing somewhat. Languages such as Haskell do show growth, but even they have been around for years. OK, most have been, or will have been, extended to cope with parallelism after a fashion, but nonetheless it is still this core group that people reckon they will be using. Even someone with the clout of Google hasn't been able to make its "Go" language, well, go! Perhaps the sole exception is Ericsson's Erlang, which is gaining traction in some sectors of the embedded market.
I am afraid that I am a great unbeliever as far as languages bursting onto the scene goes. When they do make it to seeing the commercial light of day, it has generally taken a good while for them to gain sufficient traction to enable them to be used to implement applications. Even then, some that do disappear for plain commercial reasons. On the other hand there are a few around that **might** cut it in a few years.
Predictions for 2011
Sorry to be boring, but new applications and languages will evolve rather than explode fully formed and fully accepted onto the market. Doubtless, somewhere in there are one or two that may grow a constituency, but it will take time. At present it seems that they would be most likely to appear from the data services sector, if from anywhere in particular.
Over the next ten years
As to the longer term, there will be much more to be seen in the way of applications as people start to aggregate more data and mine it to produce new kinds of information. It is likely that at least some of these will be radically different from the sorts of applications with which we are familiar, although they may start out in a similar vein: think of The WELL and Facebook. The more data there is, the more diverse the ways of viewing it and the broader the range of application opportunities it affords.
We tend to think of datamining in terms of conventional databases, relational or otherwise, but in future we will need to look for novel approaches to enable us to search and interpret the ever-growing mountain of data that will become accessible to us. Formal styles of classification can be both a boon and a hindrance. Informal approaches hold out the possibility of new search techniques, but not without side-effects. MCPs will undoubtedly facilitate these strategies.
Many people are looking at new ways of viewing data and with these come new ways of conceiving of the inter-relations between objects and thus of new types and styles of classification. Datamining is essentially about correlation and we have come to understand that correlation needn't be formal or enduring to be significant.
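As a small illustration of a correlation that is significant without being enduring, the sketch below computes an ordinary Pearson coefficient over the whole of two series and then over sliding windows. The data are made up for the purpose and the function names are my own; it is a toy, not a datamining technique in its own right.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def windowed_correlations(xs, ys, width):
    """Correlation within each sliding window of the given width."""
    return [
        pearson(xs[i:i + width], ys[i:i + width])
        for i in range(len(xs) - width + 1)
    ]


# Made-up series: ys tracks xs for the first four points, then drifts away.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 5, 3, 4, 1]

overall = pearson(xs, ys)
windows = windowed_correlations(xs, ys, 4)

print("whole series:", round(overall, 3))
print("per window:  ", [round(r, 3) for r in windows])
```

Taken as a whole the two series look only weakly related, yet the first window shows them moving in lockstep. Each window can be scored independently of the others, which is the kind of work that spreads naturally across cores.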
Look for novel datamining technologies and new models of data
The big driver in technology for many years has been the video image and the advent of the digital camera. Image search algorithms can be extremely compute intensive and are becoming increasingly complex.
You may want to look for a picture of Auntie Mabel when she was a teenager, in a stack of thousands if not millions of images held either locally or in some cloud somewhere, given only her present image aged forty-something as reference. It is possible to a limited extent already, but not on your average domestic system. Such algorithms are only available to a select few, simply because the techniques and power aren't widely available, yet...
New applications of datamining
With new datamining technologies come new types of application. Already new DM techniques are being applied in areas such as medicine and chemistry in order to better understand the effects of drugs. These are often, but not exclusively, based upon new representations of data combined with dynamic modelling. With new ways of visualising data have come new ways of seeing the inter-relations, or potential relationships.
Applications are changing; languages will do so, albeit slowly. We can't know what is coming, but we can be fairly sure that MCPs will be among the key enablers.
*Asanovic, K., Bodik, R., et al., "The Landscape of Parallel Computing Research: A View from Berkeley", Technical Report UCB/EECS-2006-183, available at http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
Tuesday, 15 February 2011
