Apparently, the entire computing industry is stumped by the multi-core problem, specifically scaling single-threaded code to run across multiple cores:
> Both AMD and Intel have said they will ship processors using a mix of X86 and graphics cores as early as next year, with core counts quickly rising to eight or more per chip. But software developers are still stuck with a mainly serial programming model that cannot easily take advantage of the new hardware.
>
> Thus, there’s little doubt the computer industry needs a new parallel-programming model to support these multicore processors. But just what that model will be, and when and how it will arrive, are still up in the air.
One such model already exists – MapReduce. More recently, a paper was presented on running MapReduce on multi-core systems:
> This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system.
>
> We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes.
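To make the model concrete, here is a minimal word-count sketch of the MapReduce idea on a multi-core machine, using Python's `multiprocessing.Pool` for the parallel map phase. The function names and the in-memory shuffle are illustrative only; this is not the Phoenix API, which is a C library with its own runtime.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_words(chunk):
    # Map phase: emit (word, 1) pairs for one chunk of input.
    return [(word, 1) for word in chunk.split()]

def count_words(chunks):
    # Run the map phase in parallel, one task per input chunk.
    with Pool() as pool:
        mapped = pool.map(map_words, chunks)
    # Shuffle: group intermediate pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for word, n in pairs:
            groups[word].append(n)
    # Reduce phase: sum the counts for each word.
    return {word: sum(ns) for word, ns in groups.items()}

if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the end"]
    print(count_words(chunks))
```

The programmer writes only the map and reduce functions; the runtime owns partitioning and scheduling, which is exactly what makes the model attractive for multi-core.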
There are various other models here as well. You can run a producer/consumer model, which works very well.

Spinn3r runs on a task/queue model, which is essentially producer/consumer. Our CPU-bound execution works VERY well across multiple cores. We have about 80 cores right now and our code executes across them in parallel without complication.
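A task/queue (producer/consumer) setup can be sketched like this with Python's `multiprocessing`; the queue layout and the squaring "work" are placeholders for illustration, not Spinn3r's actual code.

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Each worker process pulls tasks until it sees the None sentinel.
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)  # stand-in for real CPU-bound work

def run(items, num_workers=4):
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    # Drain results first (blocks until all arrive), then join.
    out = sorted(results.get() for _ in items)
    for w in workers:
        w.join()
    return out

if __name__ == "__main__":
    print(run([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Because the workers are separate processes pulling from a shared queue, adding cores just means adding workers; no task ever touches another task's state.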
Here’s the problem: they’re trying to shove everything into the von Neumann architecture.
> The von Neumann architecture is a computer design model that uses a processing unit and a single separate storage structure to hold both instructions and data. It is named after mathematician and early computer scientist John von Neumann. Such a computer implements a universal Turing machine, and the common “referential model” of specifying sequential architectures, in contrast with parallel architectures.
This just plain won’t work: it rests on several of the fallacies of distributed computing, including:
* Transport cost is zero.
* Latency is zero.
* Bandwidth is infinite.
Assuming that multiple cores can transparently be treated as one processor is false. Latency is not zero: there are L1 and L2 caches to consider, and cache coherency traffic between cores is also a problem.
Instead, why not add a ninth fallacy of distributed computing:

> A physical machine with multiple cores and multiple disks can be treated as one single node.
Instead of treating an 8-core, 32GB box as one single node, treat it as 8 boxes with 4GB of memory per core. Your IO subsystem should either use SSDs (which are somewhat immune to disk-seeking problems) or 8 separate HDDs.
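One way to sketch that layout: one worker process per core, each owning its own data directory (ideally its own physical disk) so nothing is shared between "nodes" on the same box. The paths, the `partition_for` mapping, and the core count here are assumptions for illustration, not a real configuration.

```python
import os
from multiprocessing import Process

NUM_CORES = 8  # assumed: one worker per core on an 8-core box

def partition_for(core_id):
    # Hypothetical mapping: one data directory per core, ideally
    # each on its own physical disk so workers never share a spindle.
    return f"/data/disk{core_id}"

def node(core_id):
    # Each "node" owns its partition outright: no shared memory,
    # no lock contention, no cache-line ping-pong between cores.
    data_dir = partition_for(core_id)
    print(f"node {core_id} (pid {os.getpid()}) owns {data_dir}")

if __name__ == "__main__":
    nodes = [Process(target=node, args=(i,)) for i in range(NUM_CORES)]
    for p in nodes:
        p.start()
    for p in nodes:
        p.join()
```

Each process then looks exactly like a small machine on the network, and the usual distributed-computing discipline (explicit messaging, explicit partitions) applies to cores just as it does to boxes.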