[Pgbigm-hackers] pg_gin_pending_cleanup function


Masahiko Sawada sawad****@gmail*****
Tue Sep 22 00:55:47 JST 2015


On Fri, Sep 18, 2015 at 1:00 AM, Fujii Masao <masao****@gmail*****> wrote:
> On Thu, Aug 27, 2015 at 1:44 AM, Fujii Masao <masao****@gmail*****> wrote:
>> On Wed, Aug 26, 2015 at 11:48 PM, Masahiko Sawada <sawad****@gmail*****> wrote:
>>> On Wed, Aug 26, 2015 at 11:06 AM, Fujii Masao <masao****@gmail*****> wrote:
>>>> Hi,
>>>>
>>>> Attached patch implements the pg_gin_pending_cleanup function which cleans up
>>>> the pending list of the specified GIN index by moving tuples in it to the main
>>>> GIN data structure in bulk. Then this function returns the number of pages in
>>>> the pending list cleaned up. I'd like to add this function into the master.
>>>>
>>>> Even without this function, we can clean up the pending list by using VACUUM.
>>>> However, since VACUUM needs to do not only the pending list cleanup but also
>>>> other various jobs, it usually takes a long time and its performance impact is
>>>> likely to be big. So I think that pg_gin_pending_cleanup function is useful
>>>> because we can clean up the list more quickly and avoid such big performance
>>>> impact by using the function.
>>>
>>> +1.
>>> It will be a really useful function for maintaining GIN indexes.
>>> I applied this patch to HEAD cleanly, and compiled without warning.
>>> It looks good to me.
>>
>> Thanks for reviewing the patch! Applied the patch to the master.
>
> On second thought, the current version of pg_gin_pending_cleanup might not be
> sufficient for real-world use because it moves the tuples from the pending
> list into the GIN index main structure but doesn't mark the removed pages as
> free in the FSM. So even if pg_gin_pending_cleanup is called many times, the
> garbage pages from the pending list will never be freed and reused later.
> This causes the GIN index to keep bloating unexpectedly :(
>
> For that problem, I think that we should provide not only tuple-moving but also
> mark-as-free functionalities.

+1.

> One question here is: how should we provide those
> functionalities? There are basically three options.
>
> #1. Provide two separate functions, (1) tuple-move and (2) mark-as-free.
>       The demerit of this option is that a user needs to call both functions
>       when he or she wants to move tuples from pending list and mark removed
>       pages as free in FSM.
>
> #2. Provide three separate functions,
>       (1) tuple-move, (2) mark-as-free and (3) tuple-move + mark-as-free
>       But we might want to avoid providing three functions here...
>
> #3. Provide one function and let the user specify the operation that they
>       want to perform as an argument. For example, if a user specifies "free"
>       as argument, the function does only mark-as-free operation. If "both" is
>       specified, both tuple-move and mark-as-free are performed. Of course,
>       the argument value "move" makes the function perform tuple-move.
>       Maybe the default should be "both".
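
For reference, option #3 could look roughly like the sketch below. It is
written in Python only to show the shape of the interface, not in the C that
the extension would actually use. The names gin_pending_cleanup, move_tuples
and mark_free and the dictionary layout are all hypothetical, and the "free
page" counter is just a stand-in for the FSM.

```python
# Hypothetical sketch of option #3: a single entry point whose argument
# selects the operation. Nothing here is real pg_bigm or PostgreSQL code.

VALID_MODES = ("move", "free", "both")


def move_tuples(index):
    """Move all pending tuples into the main structure.

    Returns the number of pending-list pages that were emptied.
    """
    pages = len(index["pending_pages"])
    for page in index["pending_pages"]:
        index["main"].extend(page)
    index["emptied_pages"] += pages
    index["pending_pages"] = []
    return pages


def mark_free(index):
    """Record the emptied pending-list pages as free (stand-in for the FSM).

    Returns the number of pages newly marked as free.
    """
    freed = index["emptied_pages"]
    index["free_pages"] += freed
    index["emptied_pages"] = 0
    return freed


def gin_pending_cleanup(index, mode="both"):
    """Run tuple-move, mark-as-free, or both, depending on mode."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %r" % (VALID_MODES,))
    moved = move_tuples(index) if mode in ("move", "both") else 0
    freed = mark_free(index) if mode in ("free", "both") else 0
    return moved, freed
```

With this shape, calling with "move" and later with "free" gives the same end
state as a single call with "both".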

I think that a function that just moves tuples (i.e. function (1)) would
be useful for testing GIN and pg_bigm on 9.4 or before.
And function (3) will certainly be helpful in production environments.
But I'm not sure that a function that just marks pages as free in the FSM
(i.e. function (2)) would be useful for anything.
Also, #3 seems to be overkill.

So IMO, we should add functions (1) and (3).
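
To illustrate why (3) matters in production, here is a toy model in Python
(not the extension's C code). All names are hypothetical, and it only counts
pending-list pages, ignoring growth of the main GIN structure. It assumes a
new pending page reuses a page recorded as free when one exists and otherwise
extends the index.

```python
# Toy page-accounting model; ToyGinIndex, cleanup_move and
# cleanup_move_and_free are illustrative names, not real functions.

class ToyGinIndex:
    def __init__(self):
        self.total_pages = 0  # index size in pages (pending-list pages only)
        self.pending = []     # page numbers currently in the pending list
        self.dead = []        # pages removed from the list, not yet marked free
        self.fsm = []         # pages recorded as free and available for reuse

    def add_pending_page(self):
        """Allocate one pending-list page, reusing a free page if possible."""
        if self.fsm:
            page = self.fsm.pop()
        else:
            page = self.total_pages
            self.total_pages += 1
        self.pending.append(page)


def cleanup_move(index):
    """Function (1): move tuples only; emptied pages are left unrecorded."""
    cleaned = len(index.pending)
    index.dead.extend(index.pending)
    index.pending = []
    return cleaned


def cleanup_move_and_free(index):
    """Function (3): move tuples, then record the emptied pages as free."""
    cleaned = cleanup_move(index)
    index.fsm.extend(index.dead)
    index.dead = []
    return cleaned


def simulate(cleanup, rounds=5, pages_per_round=3):
    """Fill the pending list and clean it up repeatedly; return index size."""
    index = ToyGinIndex()
    for _ in range(rounds):
        for _ in range(pages_per_round):
            index.add_pending_page()
        cleanup(index)
    return index.total_pages
```

In this model, simulate(cleanup_move) ends at 15 pages while
simulate(cleanup_move_and_free) stays at 3, which is the bloat described
above.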

Regards,

--
Masahiko Sawada



