Revision | cae120b77758e57d7fcd232b54132d0ef4723919 (tree) |
---|---|
Zeit | 2023-04-09 06:21:49 |
Autor | Albert Mietus < albert AT mietus DOT nl > |
Commiter | Albert Mietus < albert AT mietus DOT nl > |
Heisenbug ready for publication
@@ -2,17 +2,17 @@ | ||
2 | 2 | |
3 | 3 | .. _Castle-Heisenbug: |
4 | 4 | |
5 | -======================= | |
6 | -Heisenbugs (start/DRAF) | |
7 | -======================= | |
5 | +========== | |
6 | +Heisenbugs | |
7 | +========== | |
8 | 8 | |
9 | -.. post:: | |
9 | +.. post:: 2023/04/08 | |
10 | 10 | :category: CastleBlogs, Castle DesignStudy |
11 | 11 | :tags: Castle, DRAFT |
12 | 12 | |
13 | - In Castle, one can dynamically connect components, and send “events” over those connections. Typically this is done as | |
14 | - an action on an incoming message (see: :ref:`CCC-Actors`). Depending on ‘:ref:`TheMachinery`’, those events can be | |
15 | - queued and this combination *can result* in a **beautiful Heisenbug**. | |
13 | + In Castle, one can dynamically connect components and send “events” over those connections. Typically this is done as | |
14 | + an action on an incoming message (see: :ref:`CCC-Actors`). And, depending on ‘:ref:`TheMachinery`’ those events can be | |
15 | + queued. It is this combination that *can result* in a **beautiful Heisenbug**. | |
16 | 16 | |
17 | 17 | First, let’s explain the Heisenbug, before we give an example. Then we analyze it, show how to improve the code, and |
18 | 18 | finally formulate a *requirement* to prevent & detect this kind of bug in Castle. |
@@ -20,56 +20,58 @@ | ||
20 | 20 | What is a Heisenbug? |
21 | 21 | ==================== |
22 | 22 | |
23 | -The `heisenbug <https://en.wikipedia.org/wiki/Heisenbug>`__ is named to the theoretical physics *Werner Heisenberg*, who | |
23 | +The `heisenbug <https://en.wikipedia.org/wiki/Heisenbug>`__ is named after the theoretical physics *Werner Heisenberg*, who | |
24 | 24 | described the *“observer effect”*: when you look closely, the behavior changes. The same can happen to software |
25 | 25 | (bugs). The behavior apparently changes when you study -or slightly adjust- that code. Often this is due to (small) |
26 | -changes in timing; possibly even in generated code. Therefore old (old-fashioned), sequential code on slow CPUs is less | |
26 | +changes in timing; possibly even in generated code. Therefore old (old-fashioned) sequential code on slow CPUs is less | |
27 | 27 | likely to have heisenbugs than concurrent code on fast multi-core systems. It’s also common in threaded programs. |
28 | 28 | |
29 | 29 | .. include:: ./Heisenbug-sidebar-Sequence.irst |
30 | 30 | |
31 | -The sieve goes wrong | |
31 | +The Sieve goes wrong | |
32 | 32 | ==================== |
33 | 33 | |
34 | -My standard example, :ref:`Castle-TheSieve`, suffered from this issue. The initial version did work for years, | |
34 | +My standard example, :ref:`Castle-TheSieve`, suffered from this issue. The initial version did work for years | |
35 | 35 | but failed horribly when another “machinery” was used. After studying this, the bug is simple and easy to fix. |
36 | 36 | |
37 | -There are two related timing issues, that (probably only together) result in the Heisenbug. First, we introduce them one | |
38 | -by one and then show how the combination may fail. | |
37 | +There are two related timing issues that together result in the Heisenbug. First, we introduce them, and then | |
38 | +show how the combination may fail. | |
39 | 39 | |
40 | 40 | |
41 | 41 | Event-order |
42 | 42 | ----------- |
43 | 43 | |
44 | -Conceptually, the `Generator` sends (events with) integers to `Sieve(2)`, which may be forwarded to `Sieve(3)`, then to | |
45 | -`Sieve(5)`, etc. As shown in the **Conceptual sidebar**, we probably like to assume that each integer is fully sieved | |
46 | -before the next *’int’* *starts*. This is the classic “sequential view”, we are used to. | |
44 | +Conceptually, the `Generator` sends (events with) integers to `Sieve(2)`, which may forwarded them to `Sieve(3)`, then | |
45 | +to `Sieve(5)`, etc. As shown in the **Conceptual sidebar**, we probably like to assume that each integer is sieved | |
46 | +before the next *’int’* *starts*: the classic “sequential view” we are used to. | |
47 | 47 | |
48 | 48 | However, that isn’t how it works. In Castle, the order of events on a connection is defined (*one by one, |
49 | -sequential*). And given the code, the integer sent by a `Sieve` comes always later than the incoming one. That is all we | |
49 | +sequential*). And given the code, the integer sent by a `Sieve` is always later than the incoming one. That is all we | |
50 | 50 | may assume. |
51 | 51 | |BR| |
52 | 52 | The timing of unrelated events on multiple connections is not defined. That order may depend on :ref:`TheMachinery` and |
53 | -many other factors. Do not as a developer, assume any order --as I did! | |
53 | +many other factors. Do not, as a developer, assume any order --as I did! | |
54 | 54 | |
55 | 55 | As shown in the **One-by-One sidebar** diagram, this can result that the Generator is outputting all events first. Next, |
56 | 56 | Sieve(2) filters out the even integers, then Sieve(3) processes all its input, then Sieve(5), etc. |
57 | 57 | |BR| |
58 | -Although we aren’t using concurrency, and it needs huge buffers -- especially when finding big primes-- it does | |
58 | +Although we aren’t using concurrency, and it needs tremendous buffers when finding big primes, it does | |
59 | 59 | conceptually work. And so, it is an allowed execution [#ButImprove]_. |
60 | 60 | |
61 | 61 | |
62 | 62 | Reconnecting |
63 | 63 | ------------ |
64 | 64 | |
65 | -The chain of `Sieve`\s will grow as we find more primes. When an *int* isn’t filtered-out and so reaches the `Finder` a | |
65 | +The chain of `Sieve`\s will grow as we find more primes. Whenever an *int* isn’t filtered-out and reaches the `Finder` a | |
66 | 66 | *new prime* is found. Then, a new Sieve element is created and inserted into the chain. |
67 | 67 | |BR| |
68 | -This is done by the Main component (which is signaled by the Finder) [#orVariant]_. | |
68 | +Building the chain is done by the Main component (which is signaled by the Finder) [#orVariant]_. | |
69 | 69 | |
70 | -Therefore, `Main` remembers the ``last_sieve`` and reconnects its output to the newly creates `Sieve`. And temporally | |
71 | -connects that new-Sieve’s output to the Finder. For every newly found prime, this repeats. | |
70 | +Therefore, `Main` remembers the ``last_sieve`` and reconnects that output to the newly creates `Sieve`. | |
71 | +Which output is temporally connected to the `Finder` | |
72 | 72 | |BR| |
73 | +For every newly found prime, this is repeated. | |
74 | + | |
73 | 75 | This detail is shown in the **With Details** sidebar diagram; where the `Finder` and `Main` and all its messages |
74 | 76 | are shown too. |
75 | 77 |
@@ -112,39 +114,56 @@ | ||
112 | 114 | How to improve? |
113 | 115 | =============== |
114 | 116 | |
115 | -Finding this heisenbug triggered an investigation to improve both Castle and ‘:ref:`Castle-TheSieve`’. Where our goals | |
116 | -is not (just) to improve the sieve code, but all programs. And give the programmer options to prevent heisenbugs. | |
117 | +Finding this heisenbug triggered an investigation to improve Castle as well as ‘:ref:`Castle-TheSieve`’. Where our goal | |
118 | +is not just to improve the sieve code, but all programs. And give the programmer options to prevent heisenbugs. | |
119 | + | |
120 | +As we will see fixing ‘:ref:`the sieve <Castle-TheSieve>` is simple: use the **SLowStart** (base) protocol. It changes | |
121 | +only three lines! | |
122 | +|BR| | |
123 | +But we start with a new requirement, for a new tool in the :ref:`Castle Workshop <Workshop-Design>`. | |
124 | + | |
125 | + | |
126 | +Simulation | |
127 | +---------- | |
128 | + | |
129 | +As we will see below the `SlowStart` protocol will *remove* our Heisenbug. But it does not abolish the Heisenbug! | |
130 | + | |
131 | +The feature does by no means prevent a developer from using other solutions, with a comparable flaw. Besides, it’s | |
132 | +hopeless to use testing to prove the absence of a Heisenbug. As we have seen above, all variants of all possible | |
133 | +concurrent execution orders should be correct. Where we have very limited control over which variant is used. | |
134 | + | |
135 | +This led to a new requirement: :need:`U_Tools_EventOrder.` To assist the developer the Castle Workshop will come | |
136 | +with tools to detect such bugs. See :ref:`simulation` for more on this. | |
137 | + | |
117 | 138 | |
118 | 139 | .. include:: ./sieve-protocols-sidebar.irst |
119 | - | |
120 | 140 | SlowStart |
121 | 141 | --------- |
122 | 142 | |
123 | - | |
124 | 143 | Castle (now) comes [#KipEi]_ with the parametric *base* protocol :ref:`SlowStart <doc-SlowStart>`, which is based on |
125 | -`TCP Slow start <https://en.wikipedia.org/wiki/TCP_congestion_control#Slow_start>`__ and contains a queue-model | |
126 | -[#ModelOnly]_ to controll the speed off an event-connection. As the name suggests, initially the connections with be | |
144 | +`TCP Slow start <https://en.wikipedia.org/wiki/TCP_congestion_control#Slow_start>`__ and contains a queueing model | |
145 | +[#ModelOnly]_ to control the speed of an (event) connection. As the name suggests, initially the connections with be | |
127 | 146 | slow. Roughly, the parameter set the maximal number of unhandled events on a connection. |
128 | 147 | |BR| |
129 | 148 | The (improved) version of :ref:`Castle-TheSieve` uses a SlowStart of **1**. And `Main` will remove (or increase) that |
130 | 149 | limit after reconnecting. |
131 | 150 | |
132 | -Initially, the `Generator` is only *allowed* to sent one event, which is received by the `Finder`. Then `Main` will create | |
151 | +Initially, the `Generator` is only *allowed* to send one event, which is received by the `Finder`. Then, `Main` will create | |
133 | 152 | the first Sieve (`Sieve(2)`), reconnects the Generator to that component, and increases the “speed” of the |
134 | -connection. As the connection **`Generator`->`Sieve(2)`** is stable, the is no need to limit the “queue”. | |
153 | +connection. As the connection **Generator->Sieve(2)** is stable, the is no need to limit the “queue”. | |
135 | 154 | |
136 | 155 | The `Generator` acts as before: it sends (events with) integers over its output. But now, the SimpleSieve protocol can |
137 | -slow down the `Generator` --no code changes are needed for this. This may happen when the second event is sent, before | |
138 | -it is received (or “handeld”) by the Finder, and the limit is set to **1**. | |
156 | +slow down the `Generator` --no code changes are needed for this. This may happen when the second event is sent before | |
157 | +it is received (or “handled”) by the Finder, and the limit is set to **1**. | |
139 | 158 | |BR| |
140 | -As this limit is removed when a Sieve-component is inserted to the chain, only the start is slow... | |
159 | +As this limit is removed when a Sieve-component is inserted into the chain, only the start is slow... | |
141 | 160 | |
142 | 161 | The same happens to every Sieve: initially (when it is connected to the Finder) there is a limit of 1 event in the |
143 | 162 | queue. But when that connection is reconnected --and the potential Heisenbug is gone-- the limit is removed. |
144 | 163 | |
145 | 164 | .. tip:: Unlimited or better? |
146 | 165 | |
147 | - In this blog we remove the limit of the ``SlowStart protocol`` completely, for simplicity. Than, the Heisenbug is | |
166 | + In this blog, we remove the limit of the ``SlowStart protocol`` completely, for simplicity. Then the Heisenbug is | |
148 | 167 | solved. |
149 | 168 | |
150 | 169 | That is not the only option. |
@@ -152,18 +171,13 @@ | ||
152 | 171 | Given the `Generator` is a simple loop it can produce many integers fast. And so cause huge piles of queued |
153 | 172 | events. That can be done better, by the same concept: a maximal queue size. (again: just model). |
154 | 173 | |
155 | - It’s up to de developer to optimize this. Some prever the maximum (queue length) equally to the number of | |
156 | - Sieve-components, other related it to the available core. Some use static values, other will adjust it over the | |
157 | - run-time of the appplication. | |
174 | + It’s up to de developer to optimize this. Some prefer the maximum queue length equal to the number of | |
175 | + Sieve-components, others relate the maximum queue length to the number of available cores. Some use static values, | |
176 | + others will adjust them over the run-time of the application. | |
158 | 177 | |BR| |
159 | - That is all possible, with a few extra lines in the `Main` component. But also the Sieve component can set this | |
160 | - limit, both for incoming-ports as for outgoing port. | |
161 | - | |
178 | + That is all possible with a few lines in the `Main` component. But also the Sieve component can set this | |
179 | + limit, both for incoming-ports as well for outgoing ports. | |
162 | 180 | |
163 | -Simulation | |
164 | ----------- | |
165 | - | |
166 | -XXX | |
167 | 181 | |
168 | 182 | |
169 | 183 | ----- |
@@ -178,18 +192,18 @@ | ||
178 | 192 | The Heisenbug will not (trivially) disappear when switching between hose variants |
179 | 193 | |
180 | 194 | .. [#ButImprove] |
181 | - Still, as language-designer, we need to give the programmer more options to hint to a more optimals implementation. | |
195 | + Still, as language-designer, we need to give the programmer more options to hint at a more optimal implementation. | |
182 | 196 | |
183 | 197 | .. [#KipEi] |
184 | - This *SlowStart* base-protocol is part of Castle; see :ref:`Protocol-SlowStart`. But the need for it --follows like | |
185 | - this blog-- follows from the discovery of this :ref:`Castle-Heisenbug` in :ref:`Castle-TheSieve`. | |
198 | + This *SlowStart* base protocol is part of Castle; see :ref:`Protocol-SlowStart`. But the need for it --follows like | |
199 | + this blog-- follows from the discovery of these :ref:`Castle-Heisenbug` in :ref:`Castle-TheSieve`. | |
186 | 200 | |
187 | 201 | |
188 | 202 | .. [#ModelOnly] |
189 | - Remember, this queue exist as a *model* **only** (like everything in Castle-code)! | |
203 | + Remember, this queue exists as a *model* **only** (like everything in Castle-code)! | |
190 | 204 | |BR| |
191 | - Depending on ‘:ref:`TheMachinery`, there may no need to implement the queue (e.g.with DirectCall) at all; or the may | |
192 | - only be a queue-lenght and -maximum, or .. | |
205 | + Depending on ‘:ref:`TheMachinery`, there may be no need to implement the queue (e.g.with DirectCall) at all; or they may | |
206 | + only be a queue-length and -maximum, or ... | |
193 | 207 | |
194 | 208 | |
195 | 209 |
@@ -3,7 +3,7 @@ | ||
3 | 3 | .. _Castle-TheSieve: |
4 | 4 | |
5 | 5 | ============================ |
6 | -The sieve demo (start/DRAFT) | |
6 | +The Sieve demo (start/DRAFT) | |
7 | 7 | ============================ |
8 | 8 | |
9 | 9 | .. post:: |
@@ -39,6 +39,7 @@ | ||
39 | 39 | .. tab:: Demo of use |
40 | 40 | |
41 | 41 | .. code-block:: ReasonML |
42 | + :emphasize-lines: 6,10 | |
42 | 43 | |
43 | 44 | SimpleSieve.try(newPrime) on self.generator.found |
44 | 45 | { |