Revision | 7fadd002c1bb9f841f5bf3fa467c17710c9ff9e7 (tree) |
---|---|
Date | 2022-06-28 03:38:43 |
Author | Albert Mietus < albert AT mietus DOT nl > |
Committer | Albert Mietus < albert AT mietus DOT nl > |
ASIS. Split BusyCores in Use & Analyse -- keep blog-length limited; more text needs to be added
@@ -60,45 +60,7 @@ | ||
60 | 60 | When the number of cores rises this does not scale; more and more cores become idle. Now, your code has to use both |
61 | 61 | concurrency_ and parallelisme_, but also handle Critical-Sections_, Semaphores_ (and friends) to synchronise tasks. |
62 | 62 | |BR| |
63 | -There is more below the horizon then just “Threads_”! | |
64 | - | |
65 | -Some concepts | |
66 | -============= | |
67 | - | |
68 | -Before we dive into the needs for Castle, lets define --shortly-- the available, theoretical concepts. Routinely, we add | |
69 | -wikipedia links for a deep-dive. | |
70 | - | |
71 | -.. include:: BusyCores-sidebar-concurrency.irst | |
72 | - | |
73 | -Concurrency | |
74 | ------------ | |
75 | -Concurrency_ is the ability to “compute” multiple things at the same time, instead of doing them one after the other. It requires another mindset, but isn’t that complicated. | |
76 | -A typical example is a loop: suppose we have a sequence of numbers and we like to compute the square of each one. Most developers will loop over those numbers, get one number, calculate the square, store it in another list, and continue with the next element. It works, but we have also instructed the computer to do it in sequence — especially when the task is bit more complicated, the compiler does know whether the ‘next task’ depends on the current one, and can’t optimise it. | |
77 | - | |
78 | -A better plan is to tell the compiler about the tasks; most are independently: square a number. There is also one that has to be run at the end: combine the results into a new list. And one is bit funny: distribute the sequence-elements over the “square-tasks” — clearly, one has to start with this one, but it can be concurrent with many others too. | |
79 | - | |
80 | - | |
81 | -Parallelisme | |
82 | ------------- | |
83 | -Parallelisme_ is about executing multiple tasks (apparently) at the same time. We will focus running multiple | |
84 | -concurrent task (of the same program) on as many cores as possible. And when we assume we have a thousand cores we need | |
85 | -(at least) a thousand independent tasks — at any moment— to gain maximal speed up. This is not trivial! | |
86 | -|BR| | |
87 | -It’s not only about doing a thousand things at the same time (that is not to complicated, for a computer), but also — probably: mostly — about finishing a thousand times faster… | |
88 | - | |
89 | -With many cores, multiple program-steps can be executed at the same time: from changing the same variable, acces the | |
90 | -same memory, or compete for new memory. And when solving that, we introduce new hazards: like deadlocks_ and even | |
91 | -livelocks_. | |
92 | - | |
93 | -Locking | |
94 | - | |
95 | - | |
96 | - | |
97 | -Distributed | |
98 | ------------ | |
99 | -A special form of parallelisme is Distributed-Computing_: compute on many computers. Many experts consider this | |
100 | -as an independent field of expertise; still --as Multi-Core_ is basically “many computers on a chips”-- its there is an | |
101 | -analogue [#DistributedDiff]_, and we should the know-how that is available there to design out “best ever language”. | |
63 | +.. There is more below the horizon than just “Threads_”! | |
102 | 64 | |
103 | 65 | |
104 | 66 | Threading |
@@ -121,25 +83,14 @@ | ||
121 | 83 | |
122 | 84 | .. rubric:: Footnotes |
123 | 85 | |
124 | -.. [#DistributedDiff] | |
125 | - There a two (main) differences between Distributed-Computing_ and Multi-Core_. Firstly, all “CPUs” in | |
126 | - Distributed-Computing_ are active, independent and asynchronous. There is no option to share a “core” (as | |
127 | - commonly/occasionally done in Multi-process/Threaded programming); nor is there “shared memory” (one can only send | |
128 | - messages over a network). | |
129 | - |BR| | |
130 | - Secondly, collaboration with (network based) messages is a few orders slower then (shared) memory communication. This | |
131 | - makes it harder to speed-up; the delay of messaging shouldn't be bigger as the acceleration do doing thing in | |
132 | - parallel. | |
133 | - |BR| | |
134 | - But that condition does apply to Multi-Core_ too. Although the (timing) numbers do differ. | |
86 | +.. [#FN] | |
87 | + Footnote | |
88 | + | |
135 | 89 | |
136 | 90 | .. _Multi-Core: https://en.wikipedia.org/wiki/Multi-core_processor |
137 | 91 | .. _Concurrency: https://en.wikipedia.org/wiki/Concurrency_(computer_science) |
138 | 92 | .. _Parallelisme: https://en.wikipedia.org/wiki/Parallel_computing |
139 | -.. _Distributed-Computing: https://en.wikipedia.org/wiki/Distributed_computing | |
140 | 93 | .. _Critical-Sections: https://en.wikipedia.org/wiki/Critical_section |
141 | 94 | .. _Semaphores: https://en.wikipedia.org/wiki/Semaphore_(programming) |
142 | 95 | .. _Threads: https://en.wikipedia.org/wiki/Thread_(computing) |
143 | 96 | .. _Heisenbugs: https://en.wikipedia.org/wiki/Heisenbug |
144 | -.. _deadlocks: https://en.wikipedia.org/wiki/Deadlock | |
145 | -.. _livelocks: https://en.wikipedia.org/wiki/Deadlock#Livelock |
@@ -1,35 +0,0 @@ | ||
1 | -.. -*- rst -*- | |
2 | - included in `6.BusyCores.rst` | |
3 | - | |
4 | -.. sidebar:: | |
5 | - | |
6 | - .. tabs:: | |
7 | - | |
8 | - .. tab:: Ordered | |
9 | - | |
10 | - Here, the programmer has (unwittingly) defined a sequential order. | |
11 | - | |
12 | - .. code-block:: python | |
13 | - | |
14 | - L2 = [] | |
15 | - for n in L1: | |
16 | - L2.append(power(n)) | |
17 | - | |
18 | - .. note:: As ``power()`` could have side-effects, the compiler **must** keep the defined order! | |
19 | - | |
20 | - .. tab:: Concurrent | |
21 | - | |
22 | - Now, without a specified order, the same functionality has become concurrent. | |
23 | - | |
24 | - .. code-block:: python | |
25 | - | |
26 | - L2 = [power(n) for n in L1] | |
27 | - | |
28 | - .. note:: | |
29 | - | |
30 | - Although (current) python-compilers will run it sequentially, it is *allowed* to distribute it; even when | |
31 | - ``power()`` has side-effects! | |
32 | - |BR| | |
33 | - As long as *python* put the results in the correct order in list ``L2`` **any order** is allowed. “Out of | |
34 | - order” side-effects are allowed by this code. | |
35 | - |
@@ -0,0 +1,93 @@ | ||
1 | +.. include:: /std/localtoc.irst | |
2 | + | |
3 | +.. _MC-concepts: | |
4 | + | |
5 | +======================= | |
6 | +Concepts for Many Cores | |
7 | +======================= | |
8 | + | |
9 | +.. post:: | |
10 | + :category: Castle DesignStudy | |
11 | + :tags: Castle, Concurrency | |
12 | + | |
13 | + Effectively benefiting from thousands of cores, as I announced in :ref:`BusyCores`, is not easy. Many languages put it | |
14 | + on the shoulders of the developer: usually by referring to pthreads_. | |
15 | + |BR| | |
16 | + But there is more below the horizon than just “Threads_”! | |
17 | + | |
18 | + Let's discover some concepts that can help to design proper support, and unburden the typical (modern, embedded) | |
19 | + developer. | |
20 | + | |
21 | +---------- | |
22 | + | |
23 | +CUT & PASTE | |
24 | + | |
25 | +---------- | |
26 | + | |
27 | +Some concepts | |
28 | +============= | |
29 | + | |
30 | +Before we dive into the needs of Castle, let's briefly define the available, theoretical concepts. Routinely, we add | |
31 | +Wikipedia links for a deep-dive. | |
32 | + | |
33 | +.. include:: BusyCores-sidebar-concurrency.irst | |
34 | + | |
35 | +Concurrency | |
36 | +----------- | |
37 | +Concurrency_ is the ability to “compute” multiple things at the same time, instead of doing them one after the other. It requires another mindset, but isn’t that complicated. | |
38 | +A typical example is a loop: suppose we have a sequence of numbers and we would like to compute the square of each one. Most developers will loop over those numbers, get one number, calculate the square, store it in another list, and continue with the next element. It works, but we have also instructed the computer to do it in sequence — especially when the task is a bit more complicated, the compiler doesn't know whether the ‘next task’ depends on the current one, and can't optimise it. | |
39 | + | |
40 | +A better plan is to tell the compiler about the tasks; most are independent: square a number. There is also one that has to be run at the end: combine the results into a new list. And one is a bit funny: distribute the sequence-elements over the “square-tasks” — clearly, one has to start with this one, but it can be concurrent with many others too. | |
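This split into independent tasks can be sketched in Python, in the same style as the sidebar example. A minimal sketch: the names ``square`` and ``numbers`` are illustrative, not from the original post; ``Executor.map`` does the “distribute” step and still delivers the results in input order.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # The independent task: each call may run concurrently with the others.
    return n * n

numbers = [1, 2, 3, 4, 5]

# map() distributes the elements over the workers (the "funny" first task);
# collecting into a list is the one task that must run at the end.
with ThreadPoolExecutor() as pool:
    squares = list(pool.map(square, numbers))

print(squares)  # [1, 4, 9, 16, 25] -- results arrive in input order
```

Whether the calls really overlap is up to the runtime; the point is that the programmer no longer prescribes a sequential order.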
41 | + | |
42 | + | |
43 | +Parallelisme | |
44 | +------------ | |
45 | +Parallelisme_ is about executing multiple tasks (apparently) at the same time. We will focus on running multiple | |
46 | +concurrent tasks (of the same program) on as many cores as possible. And when we assume we have a thousand cores, we need | |
47 | +(at least) a thousand independent tasks — at any moment — to gain maximal speed-up. This is not trivial! | |
48 | +|BR| | |
49 | +It's not only about doing a thousand things at the same time (that is not too complicated, for a computer), but also — probably: mostly — about finishing a thousand times faster… | |
50 | + | |
51 | +With many cores, multiple program-steps can be executed at the same time, and they can interfere: changing the same | |
52 | +variable, accessing the same memory, or competing for new memory. And when solving that, we introduce new hazards, | |
53 | +like deadlocks_ and even livelocks_. | |
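The “changing the same variable” hazard can be shown with a small sketch: two threads increment one shared counter without any synchronisation, so the read-modify-write steps can interleave and updates can be lost. (In current CPython the GIL makes the loss rare; treat this as an illustration of the hazard, not a benchmark.)

```python
import threading

counter = 0  # shared variable, updated by both threads

def work():
    global counter
    for _ in range(50_000):
        # read-modify-write: another thread may run between the read
        # and the write, so one of the two increments can be lost
        counter += 1

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# counter may end up *below* 100_000 -- the classic lost-update race
print(counter)
```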
54 | + | |
55 | +Locking | |
56 | +------- | |
57 | +The classic remedy for these hazards is locking: guard each Critical-Section_ with a lock (Semaphores_ and friends), so only one task at a time can touch the shared data. It works, but locks serialise exactly the code we wanted to run in parallel, and careless use brings back deadlocks_ and livelocks_. | |
58 | + | |
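A minimal sketch of locking, using the shared-counter scenario above: the lock turns the read-modify-write into a critical section, so no update can be lost; the price is that the increments now run one after the other.

```python
import threading

counter = 0
lock = threading.Lock()  # guards the critical section below

def work():
    global counter
    for _ in range(50_000):
        with lock:       # only one task at a time may enter
            counter += 1

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 100000 -- the lock makes every update visible
```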
59 | +Distributed | |
60 | +----------- | |
61 | +A special form of parallelism is Distributed-Computing_: computing on many computers. Many experts consider this | |
62 | +an independent field of expertise; still --as Multi-Core_ is basically “many computers on a chip”-- there is an | |
63 | +analogue [#DistributedDiff]_, and we should use the know-how that is available there to design our “best ever language”. | |
64 | + | |
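The distributed style of collaboration is message passing: no shared variables, only messages. A minimal sketch, with threads and queues standing in for networked nodes (the names ``inbox``/``outbox`` and the square-task are illustrative, not from the original post):

```python
import queue
import threading

# The "nodes" share no state; they only exchange messages via queues.
inbox = queue.Queue()    # messages sent to the worker node
outbox = queue.Queue()   # replies sent back to the coordinator

def node():
    while True:
        msg = inbox.get()
        if msg is None:           # agreed-upon stop message
            break
        outbox.put(msg * msg)     # reply with the computed square

worker = threading.Thread(target=node)
worker.start()
for n in [1, 2, 3]:
    inbox.put(n)
inbox.put(None)                   # tell the node to shut down
worker.join()

results = [outbox.get() for _ in range(3)]
print(results)  # [1, 4, 9]
```

In a real distributed system the queues would be network channels, and the messaging delay becomes the dominant cost, as the footnote below discusses.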
65 | + | |
66 | +-------- | |
67 | + | |
68 | +END | |
69 | + | |
70 | +---------- | |
71 | + | |
72 | +.. rubric:: Footnotes | |
73 | + | |
74 | +.. [#DistributedDiff] | |
75 | + There are two (main) differences between Distributed-Computing_ and Multi-Core_. Firstly, all “CPUs” in | |
76 | + Distributed-Computing_ are active, independent and asynchronous. There is no option to share a “core” (as | |
77 | + commonly/occasionally done in multi-process/threaded programming); nor is there “shared memory” (one can only send | |
78 | + messages over a network). | |
79 | + |BR| | |
80 | + Secondly, collaboration with (network-based) messages is a few orders of magnitude slower than (shared) memory | |
81 | + communication. This makes it harder to speed up; the delay of messaging shouldn't be bigger than the acceleration | |
82 | + gained by doing things in parallel. | |
83 | + |BR| | |
84 | + But that condition applies to Multi-Core_ too, although the (timing) numbers do differ. | |
85 | + | |
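The break-even condition in the footnote can be made concrete with a rough, purely illustrative model (all names and numbers are assumptions, not from the original post): work that takes ``T`` seconds sequentially is split over ``p`` workers, at a messaging overhead of ``c`` seconds.

```python
def parallel_time(T, p, c):
    """Runtime when work T is split over p workers, plus messaging overhead c."""
    return T / p + c

def worthwhile(T, p, c):
    # Parallelising pays off only when the messaging delay is smaller
    # than the time saved by running in parallel.
    return parallel_time(T, p, c) < T

print(worthwhile(1.0, 1000, 0.1))    # big task: 0.001 + 0.1 < 1.0
print(worthwhile(0.001, 1000, 0.1))  # tiny task: overhead dwarfs the work
```

With shared memory ``c`` is nanoseconds, over a network it is milliseconds; the condition is the same, only the numbers differ.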
86 | +.. _pthreads: https://en.wikipedia.org/wiki/Pthreads | |
87 | +.. _Threads: https://en.wikipedia.org/wiki/Thread_(computing) | |
88 | +.. _Multi-Core: https://en.wikipedia.org/wiki/Multi-core_processor | |
89 | + | |
90 | +.. _deadlocks: https://en.wikipedia.org/wiki/Deadlock | |
91 | +.. _livelocks: https://en.wikipedia.org/wiki/Deadlock#Livelock | |
92 | +.. _Critical-Sections: https://en.wikipedia.org/wiki/Critical_section | |
93 | +.. _Distributed-Computing: https://en.wikipedia.org/wiki/Distributed_computing |
@@ -0,0 +1,35 @@ | ||
1 | +.. -*- rst -*- | |
2 | + included in `6.BusyCores.rst` | |
3 | + | |
4 | +.. sidebar:: | |
5 | + | |
6 | + .. tabs:: | |
7 | + | |
8 | + .. tab:: Ordered | |
9 | + | |
10 | + Here, the programmer has (unwittingly) defined a sequential order. | |
11 | + | |
12 | + .. code-block:: python | |
13 | + | |
14 | + L2 = [] | |
15 | + for n in L1: | |
16 | + L2.append(power(n)) | |
17 | + | |
18 | + .. note:: As ``power()`` could have side-effects, the compiler **must** keep the defined order! | |
19 | + | |
20 | + .. tab:: Concurrent | |
21 | + | |
22 | + Now, without a specified order, the same functionality has become concurrent. | |
23 | + | |
24 | + .. code-block:: python | |
25 | + | |
26 | + L2 = [power(n) for n in L1] | |
27 | + | |
28 | + .. note:: | |
29 | + | |
30 | + Although (current) python-compilers will run it sequentially, it is *allowed* to distribute it; even when | |
31 | + ``power()`` has side-effects! | |
32 | + |BR| | |
33 | + As long as *python* puts the results in the correct order into list ``L2``, **any order** of execution is | |
34 | + allowed. “Out of order” side-effects are permitted by this code. | |
35 | + |
@@ -53,3 +53,6 @@ | ||
53 | 53 | wc: |
54 | 54 | @echo "lines words file" |
55 | 55 | @wc -lw `find CCastle/ -iname \*rst`|sort -r |
56 | + | |
57 | +sidebar: | |
58 | + @grep "include::" `find CCastle/ -type f -name \*.rst` /dev/null | grep sidebar| sort| sed 's/:../:\t\t ../' |