Ticket #45767

"Double free or corruption" on assign_continent_flood()

Eröffnet am: 2022-10-05 05:04 Letztes Update: 2022-10-20 06:24

Auswertung:
Verantwortlicher:
Typ:
Status:
Geschlossen
Komponente:
Meilenstein:
Priorität:
5 - Mittel
Schweregrad:
5 - Mittel
Lösung:
Gefixt
Datei:
1

Details

Got this in a S3_1 autogame. After several reproducing attempts got it again (maybe one has to configure with '--enable-testmatic', as that was the latest change in my reproducing attempts before it succeeded)

glibc reports "double free or corruption" from tile_list_remove() called when continent numbers are reassigned from check_terrain_change()

Ticket-Verlauf (3/10 Historien)

2022-10-05 05:04 Aktualisiert von: cazfi
  • New Ticket ""Double free or corruption" on assign_continent_flood()" created
2022-10-05 05:12 Aktualisiert von: cazfi
Kommentar

Well addign_continent_flood() removes tile from a list it's currently iterating, and that's not _safe iterator (I don't think one exist for this). That's the case on all branches, and is likely very old bug - somehow it just has never caused failures before this particular autogame. With this information it's hard to estimate if this is critical issue wrt 3.0.4 release. Seems not to be a regression, but it's possible that I've just encountered the first crash that is coming to be a trend e.g. because some new dependency library version.

2022-10-05 05:18 Aktualisiert von: cazfi
  • Meilenstein Update from (Keine) to 3.0.4 (closed)
  • Komponente Update from (Keine) to Server
Kommentar

Reply To cazfi

it's possible that I've just encountered the first crash that is coming to be a trend e.g. because some new dependency library version.

Well, the very glibc itself had been updated just before that autogame run. So it's likely that this crash will be common as the new glibc version gets widely used -> 3.0.4 should have a fix.

2022-10-05 05:54 Aktualisiert von: cazfi
Kommentar

Reply To cazfi

Well assign_continent_flood() removes tile from a list it's currently iterating, and that's not _safe iterator (I don't think one exist for this).

There's something more to this - I reworked that part, but the crash remains effectively the same (still no way to reproduce consistently, but now with a high percentage of the runs of that autogame)

2022-10-05 07:09 Aktualisiert von: cazfi
Kommentar

From the save autogame (and likely from the very situation) valgrind reports invalid read on dai_data_phase_begin() handling of workers/continent statistics, in call chain beginning from the tile change. Likely this AI stats update is happening when the terrain class has already changed, but continent numbers have not been updated to reflect that.

2022-10-05 07:28 Aktualisiert von: cazfi
Kommentar

Seems that to do all this properly requires quite big changes (and most importantly a lot of testing, which is going to take time.) So, opened an emergency fix ticket #45768 for a temporary solution to enable 3.0.4 release.

2022-10-11 09:45 Aktualisiert von: cazfi
Kommentar

Reply To cazfi

Well addign_continent_flood() removes tile from a list it's currently iterating, and that's not _safe iterator (I don't think one exist for this). That's the case on all branches, and is likely very old bug - somehow it just has never caused failures before this particular autogame. With this information it's hard to estimate if this is critical issue wrt 3.0.4 release. Seems not to be a regression, but it's possible that I've just encountered the first crash that is coming to be a trend e.g. because some new dependency library version.

As this was not the cause of this particular ticket, filed a new ticket about it -> #45825

(Edited, 2022-10-11 09:45 Aktualisiert von: cazfi)
2022-10-12 02:12 Aktualisiert von: cazfi
  • Verantwortlicher Update from (Keine) to cazfi
  • Lösung Update from Keine to Accepted
Kommentar

Attached patch is a bit simpler than what I had in my mind earlier, but this should be fine for the current codebase. Let's move to the over-engineered version only once there's need for it.

2022-10-20 06:24 Aktualisiert von: cazfi
  • Status Update from Offen to Geschlossen
  • Lösung Update from Accepted to Gefixt

Bearbeiten

Please login to add comment to this ticket » Anmelden