PostgreSQL: the most sophisticated database in the world.

The French-speaking PostgreSQL planet

Monday 19 September 2016


PostgreSQL Weekly News - 18 September 2016

The 10th annual Prague PostgreSQL Developer Day 2017 (P2D2 2017) is a two-day event to be held on 15-16 February 2017 in Prague, Czech Republic. Website in Czech:

The call for speakers for PGConf US 2017 runs until 15 November 2016, 23:59 EST. Notifications will be sent on 2 December. The event schedule will be published on 2 January 2017:

PostgreSQL Product News

PostgreSQL-related job offers for September

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under a CC BY-NC-SA licence. The original version can be found at:

Submit news and announcements by Sunday at 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied Patches

Kevin Grittner pushed:

Simon Riggs pushed:

Peter Eisentraut pushed:

  • pg_basebackup: Clean created directories on failure. Like initdb, clean up created data and xlog directories, unless the new -n/--noclean option is specified. Tablespace directories are not cleaned up, but a message is written about that. Reviewed-by: Masahiko Sawada <>
  • Add overflow checks to money type input function. The money type input function did not have any overflow checks at all. There were some regression tests that purported to check for overflow, but they actually checked for the overflow behavior of the int8 type before casting to money. Remove those unnecessary checks and add some that actually check the money input function. Reviewed-by: Fabien COELHO <>
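The overflow check described in the last item comes down to detecting wraparound while accumulating decimal digits into a 64-bit value. A minimal sketch of that pattern, with illustrative names (this is not PostgreSQL's actual money_in code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accumulate one decimal digit into *acc, reporting overflow instead of
 * silently wrapping.  This mirrors the shape of check a numeric input
 * function needs; it is not PostgreSQL's actual implementation. */
static bool
accumulate_digit(int64_t *acc, int digit)
{
    if (*acc > (INT64_MAX - digit) / 10)
        return false;           /* *acc * 10 + digit would overflow */
    *acc = *acc * 10 + digit;
    return true;
}
```

The trick is to test before multiplying: once the accumulator exceeds (INT64_MAX - digit) / 10, the next step would wrap, so the input function can raise an out-of-range error instead.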

Tom Lane pushed:

  • Docs: assorted minor cleanups. Standardize on "user_name" for a field name in related examples in ddl.sgml; before we had variously "user_name", "username", and "user". The last is flat wrong because it conflicts with a reserved word. Be consistent about entry capitalization in a table in func.sgml. Fix a typo in pgtrgm.sgml. Back-patch to 9.6 and 9.5 as relevant. Alexander Law
  • Fix executor/README to reflect disallowing SRFs in UPDATE. The parenthetical comment here is obsoleted by commit a4c35ea1c. Noted by Andres Freund.
  • Improve parser's and planner's handling of set-returning functions. Teach the parser to reject misplaced set-returning functions during parse analysis using p_expr_kind, in much the same way as we do for aggregates and window functions (cf commit eaccfded9). While this isn't complete (it misses nesting-based restrictions), it's much better than the previous error reporting for such cases, and it allows elimination of assorted ad-hoc expression_returns_set() error checks. We could add nesting checks later if it seems important to catch all cases at parse time. There is one case the parser will now throw error for although previous versions allowed it, which is SRFs in the tlist of an UPDATE. That never behaved sensibly (since it's ill-defined which generated row should be used to perform the update) and it's hard to see why it should not be treated as an error. It's a release-note-worthy change though. Also, add a new Query field hasTargetSRFs reporting whether there are any SRFs in the targetlist (including GROUP BY/ORDER BY expressions). The parser can now set that basically for free during parse analysis, and we can use it in a number of places to avoid expression_returns_set searches. (There will be more such checks soon.) In some places, this allows decontorting the logic since it's no longer expensive to check for SRFs in the tlist --- so I made the checks parallel to the handling of hasAggs/hasWindowFuncs wherever it seemed appropriate. catversion bump because adding a Query field changes stored rules. Andres Freund and Tom Lane Discussion: <>
  • Be pickier about converting between Name and Datum. We were misapplying NameGetDatum() to plain C strings in some places. This worked, because it was just a pointer cast anyway, but it's a type cheat in some sense. Use CStringGetDatum instead, and modify the NameGetDatum macro so it won't compile if applied to something that's not a pointer to NameData. This should result in no changes to generated code, but it is logically cleaner. Mark Dilger, tweaked a bit by me Discussion: <>
  • Tweak targetlist-SRF tests. Add a test case showing that we don't support SRFs in window-function arguments. Remove a duplicate test case for SRFs in aggregate arguments.
  • Tweak targetlist-SRF tests some more. Seems like it would be good to have a test case documenting the existing behavior for non-top-level SRFs.
  • Make min_parallel_relation_size's default value platform-independent. The documentation states that the default value is 8MB, but this was only true at BLCKSZ = 8kB, because the default was hard-coded as 1024. Make the code match the docs by computing the default as 8MB/BLCKSZ. Oversight in commit 75be66464, noted pursuant to a gripe from Peter E. Discussion: <>
  • Add debugging aid "bmsToString(Bitmapset *bms)". This function has no direct callers at present, but it's convenient for manual use in a debugger, rather than having to inspect memory and do bit-counting in your head. In passing, get rid of useless outBitmapset() wrapper around _outBitmapset(); let's just export the function that does the work. Likewise for outToken(). Ashutosh Bapat, tweaked a bit by me Discussion: <>
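A debugging aid like bmsToString() just renders the set's members in a printable form so they can be read in a debugger. Here is a simplified sketch over a plain 64-bit mask (PostgreSQL's Bitmapset is a variable-length array of words and the real function goes through the outfuncs machinery; the names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render the set bits of a 64-bit mask as "(b 1 3)", in the spirit of the
 * bmsToString() debugging aid.  PostgreSQL's Bitmapset is a variable-length
 * word array, but the formatting idea is the same. */
static void
mask_to_string(uint64_t mask, char *buf, size_t buflen)
{
    size_t off = (size_t) snprintf(buf, buflen, "(b");

    for (int bit = 0; bit < 64; bit++)
        if (mask & (UINT64_C(1) << bit))
            off += (size_t) snprintf(buf + off, buflen - off, " %d", bit);
    snprintf(buf + off, buflen - off, ")");
}
```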

Andres Freund pushed:

  • Add more tests for targetlist SRFs. We're considering changing the implementation of targetlist SRFs considerably, and a lot of the current behaviour isn't tested in our regression tests. Thus it seems useful to increase coverage to avoid accidental behaviour changes. It's quite possible that some of the plans here will require adjustments to avoid falling afoul of ordering differences (e.g. hashed group bys). The buildfarm will tell us. Reviewed-By: Tom Lane Discussion: <>
  • Address portability issues in bfe16d1a5 test output.
  • Remove user_relns() SRF from regression tests. The output of the function changes whenever previous (or, as in this case, concurrent) tests leave a table in place. That causes unneeded churn. This should fix failures due to the tests added bfe16d1a5, like on lapwing, caused by the tsrf test running concurrently with misc. Those could also have been addressed by using temp tables, but that test has annoyed me before. Discussion: <>

Robert Haas pushed:

Heikki Linnakangas pushed:

  • Fix and clarify comments on replacement selection. These were modified by the patch to only use replacement selection for the first run in an external sort.
  • Support OpenSSL 1.1.0. Changes needed to build at all: - Check for SSL_new in configure, now that SSL_library_init is a macro. - Do not access struct members directly. This includes some new code in pgcrypto, to use the resource owner mechanism to ensure that we don't leak OpenSSL handles, now that we can't embed them in other structs anymore. - RAND_SSLeay() -> RAND_OpenSSL() Changes that were needed to silence deprecation warnings, but were not strictly necessary: - RAND_pseudo_bytes() -> RAND_bytes(). - SSL_library_init() and OpenSSL_config() -> OPENSSL_init_ssl() - ASN1_STRING_data() -> ASN1_STRING_get0_data() - DH_generate_parameters() -> DH_generate_parameters_ex() - Locking callbacks are not needed with OpenSSL 1.1.0 anymore. (Good riddance!) Also change references to SSLEAY_VERSION_NUMBER with OPENSSL_VERSION_NUMBER, for the sake of consistency. OPENSSL_VERSION_NUMBER has existed since time immemorial. Fix SSL test suite to work with OpenSSL 1.1.0. CA certificates must have the "CA:true" basic constraint extension now, or OpenSSL will refuse them. Regenerate the test certificates with that. The "openssl" binary, used to generate the certificates, is also now more picky, and throws an error if an X509 extension is specified in "req_extensions", but that section is empty. Backpatch to all supported branches, per popular demand. In back-branches, we still support OpenSSL 0.9.7 and above. OpenSSL 0.9.6 should still work too, but I didn't test it. In master, we only support 0.9.8 and above. Patch by Andreas Karlsson, with additional changes by me. Discussion: <>
  • Fix building with LibreSSL. LibreSSL defines OPENSSL_VERSION_NUMBER to claim that it is version 2.0.0, but it doesn't have the functions added in OpenSSL 1.1.0. Add autoconf checks for the individual functions we need, and stop relying on OPENSSL_VERSION_NUMBER. Backport to 9.5 and 9.6, like the patch that broke this. In the back-branches, there are still a few OPENSSL_VERSION_NUMBER checks left, to check for OpenSSL 0.9.8 or 0.9.7. I left them as they were - LibreSSL has all those functions, so they work as intended. Per buildfarm member curculio. Discussion: <>
  • Fix ecpg -? option on Windows, add -V alias for --version. This makes the -? and -V options work consistently with other binaries. --help and --version are now only recognized as the first option, i.e. "ecpg --foobar --help" no longer prints the help, but that's consistent with most of our other binaries, too. Backpatch to all supported versions. Haribabu Kommi Discussion: <>
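The "do not access struct members directly" item in the OpenSSL 1.1.0 entry above is the heart of that port: 1.1.0 made types like SSL and BIO opaque, so callers must go through accessor functions. A generic sketch of the opaque-handle pattern in plain C (the Handle type and its accessors are purely illustrative, not OpenSSL's API):

```c
#include <assert.h>
#include <stdlib.h>

/* A library handle treated as opaque: callers only ever hold a pointer and
 * use accessors, so the library may change the layout freely.  This is the
 * pattern OpenSSL 1.1.0 enforces for SSL, BIO and friends. */
typedef struct Handle Handle;

struct Handle { int version; }; /* in a real library: hidden in a .c file */

static Handle *
handle_new(int version)
{
    Handle *h = malloc(sizeof(Handle));
    if (h != NULL)
        h->version = version;
    return h;
}

static int handle_get_version(const Handle *h) { return h->version; }
static void handle_free(Handle *h) { free(h); }
```

In a real library the struct definition lives in a private source file, so client code physically cannot reach h->version and the layout can change without breaking the ABI.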

Pending Patches

Michaël Paquier sent in a patch to add a pgstat_report_activity() call to walsender.

Craig Ringer and Pavel Stěhule traded patches to add xmltable().

Dagfinn Ilmari Mannsåker sent in a patch to add psql tab completion for the recently-added ALTER TYPE … RENAME VALUE.

Corey Huinker sent in two more revisions of a patch to let file_fdw access COPY FROM PROGRAM.

Craig Ringer sent in two revisions of a patch to install the Perl TAP tests, add install rules for the isolation tester, and note that src/test/Makefile is not called from src/Makefile.

Michaël Paquier sent in a patch to move the fsync routines of initdb into src/common, issue fsync more carefully in pg_receivexlog and pg_basebackup [-X] stream, add a --no-sync option to pg_basebackup, and switch pg_basebackup commands in to use --no-sync.

Etsuro Fujita sent in another revision of a patch to push more FULL JOINs to the PostgreSQL FDW.

Heikki Linnakangas sent in two revisions of a patch to change the way pre-reading in external sort's merge phase works.

Alexander Korotkov sent in another revision of a patch to implement partial sorting.

Mithun Cy sent in another revision of a patch to cache hash index meta page.

Haribabu Kommi sent in a patch to replace most of the gettimeofday function calls with clock_gettime, except the timeofday() (user-callable) and GetCurrentTimestamp functions.

Andrew Borodin sent in two more revisions of a patch to implement penalty functions for GiST in the cube contrib extension.

Michaël Paquier sent in a patch to make use of the new --noclean option.

KaiGai Kohei sent in another revision of a patch to implement PassDownLimitBound for ForeignScan/CustomScan.

Amit Kapila sent in two more revisions of a patch to implement WAL for hash indexes.

Amit Kapila sent in two more revisions of a patch to implement concurrent hash indexes.

Fabien COELHO sent in another revision of a patch to enable pgbench to store results into variables.

Anastasia Lubennikova sent in another revision of a patch to add covering + unique indexes.

Andrew Borodin sent in a patch to change the interpretation of NaN returned from the GiST penalty function from "best fit" to "worst fit."
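The proposal matters because every ordered comparison involving NaN is false in C, so a naive "keep the smallest penalty" loop can treat a NaN penalty as either unbeatable or invisible, depending on how the comparison is written. A sketch of explicitly reading NaN as "worst fit" (an illustrative comparator, not the GiST code):

```c
#include <assert.h>
#include <math.h>

/* Return 1 if penalty 'a' is strictly better (smaller) than 'b', treating
 * NaN as the worst possible fit.  Without the explicit isnan() tests,
 * "a < b" is false whenever either side is NaN, so a NaN-producing penalty
 * function silently wins or loses every comparison. */
static int
penalty_better(float a, float b)
{
    if (isnan(a))
        return 0;               /* NaN is never better than anything */
    if (isnan(b))
        return 1;               /* any real penalty beats NaN */
    return a < b;
}
```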

Tom Lane sent in a PoC patch to put srfs in separate result nodes.

Pavan Deolasee and Claudio Freire traded patches to allow VACUUM to use over 1GB of work_mem.
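The 1GB ceiling exists because VACUUM keeps its dead tuple pointers in a single palloc'd array, and palloc caps allocations at 1GB. One way around it, roughly the direction these patches explore, is to chain fixed-size segments instead of growing one array. A toy sketch with a tiny, hypothetical segment size:

```c
#include <assert.h>
#include <stdlib.h>

#define SEG_CAPACITY 4          /* tiny for illustration; think "1GB worth" */

typedef struct TidSeg
{
    int     ntids;
    int     tids[SEG_CAPACITY]; /* stand-in for ItemPointerData entries */
    struct TidSeg *next;
} TidSeg;

/* Append one dead-tuple marker, opening a fresh segment when the current
 * one is full, instead of repalloc'ing a single ever-larger array that
 * would hit the allocation cap. */
static TidSeg *
tidseg_add(TidSeg *head, int tid)
{
    if (head == NULL || head->ntids == SEG_CAPACITY)
    {
        TidSeg *seg = calloc(1, sizeof(TidSeg));
        seg->next = head;
        head = seg;
    }
    head->tids[head->ntids++] = tid;
    return head;
}
```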

Thomas Munro sent in another revision of a patch to use kqueue on platforms where it helps.

Craig Ringer sent in another revision of a patch to install the Perl TAP tests, add install rules for isolation tester, and add txid_status(bigint).

Kuntal Ghosh sent in another revision of a patch to add a WAL consistency check facility.

Marco Pfatschbacher sent in a patch to keep one postmaster monitoring pipe per process.

Amit Langote sent in another revision of a patch to implement declarative partitioning.

Rajkumar Raghuwanshi sent in another revision of a patch to enable piecewise joins of partitioned tables.

Daniel Vérité sent in a patch to create hooks into psql variables to return a boolean indicating whether a change is allowed.

Jeevan Chalke sent in another revision of a patch to enable pushing aggregates to a foreign server.

Masahiko Sawada sent in another revision of a patch to enable quorum commit for multiple synchronous replication.

Stas Kelvich sent in another revision of a patch to speed up 2PC transactions.

Julien Rouhaud sent in another revision of a patch to rename the max_worker_processes GUC to max_parallel_workers.

Kyotaro HORIGUCHI sent in another revision of a patch to refactor psql's tab completion and use that refactoring to implement IF (NOT) EXISTS completion.

MauMau sent in a patch to fix the omission of pg_recvlogical from the Windows installation.

Yury Zhuravlev sent in another revision of a patch to CMake-ify PostgreSQL.

by N Bougain on Monday 19 September 2016 at 22:04

PostgreSQL Weekly News - 11 September 2016

The Korean users' group will hold its first PGDay on 15 October in Seoul:

PGDay Austin 2016 will take place on 12 November 2016. The submission deadline is 21 September 2016 at midnight CST. Details and submission form:

PGDay.IT 2016 will take place in Prato on 13 December 2016:

PostgreSQL Product News

PostgreSQL-related job offers for September

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under a CC BY-NC-SA licence. The original version can be found at:

Submit news and announcements by Sunday at 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied Patches

Simon Riggs pushed:

Tom Lane pushed:

  • Relax transactional restrictions on ALTER TYPE ... ADD VALUE. To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Andrew Dunstan and Tom Lane Discussion: <>
  • Make locale-dependent regex character classes work for large char codes. Previously, we failed to recognize Unicode characters above U+7FF as being members of locale-dependent character classes such as [[:alpha:]]. (Actually, the same problem occurs for large pg_wchar values in any multibyte encoding, but UTF8 is the only case people have actually complained about.) It's impractical to get Spencer's original code to handle character classes or ranges containing many thousands of characters, because it insists on considering each member character individually at regex compile time, whether or not the character will ever be of interest at run time. To fix, choose a cutoff point MAX_SIMPLE_CHR below which we process characters individually as before, and deal with entire ranges or classes as single entities above that. We can actually make things cheaper than before for chars below the cutoff, because the color map can now be a simple linear array for those chars, rather than the multilevel tree structure Spencer designed. It's more expensive than before for chars above the cutoff, because we must do a binary search in a list of high chars and char ranges used in the regex pattern, plus call iswalpha() and friends for each locale-dependent character class used in the pattern. However, multibyte encodings are normally designed to give smaller codes to popular characters, so that we can expect that the slow path will be taken relatively infrequently. In any case, the speed penalty appears minor except when we have to apply iswalpha() etc. to high character codes at runtime --- and the previous coding gave wrong answers for those cases, so whether it was faster is moot. Tom Lane, reviewed by Heikki Linnakangas Discussion: <>
  • Cosmetic code cleanup in commands/extension.c. Some of the comments added by the CREATE EXTENSION CASCADE patch were a bit sloppy, and I didn't care for redeclaring the same local variable inside a nested block either. No functional changes.
  • Repair whitespace in initdb message. What used to be four spaces somehow turned into a tab and a couple of spaces in commit a00c58314, no doubt from overhelpful emacs autoindent. Noted by Peter Eisentraut.
  • Teach appendShellString() to not quote strings containing "-". Brain fade in commit a00c58314: I was thinking that a string starting with "-" could be taken as a switch depending on command line syntax. That's true, but having appendShellString() quote it will not help, so we may as well not do so. Per complaint from Peter Eisentraut.
  • Guard against possible memory allocation botch in batchmemtuples(). Negative availMemLessRefund would be problematic. It's not entirely clear whether the case can be hit in the code as it stands, but this seems like good future-proofing in any case. While we're at it, insist that the value be not merely positive but not tiny, so as to avoid doing a lot of repalloc work for little gain. Peter Geoghegan Discussion: <>
  • Doc: small improvements for documentation about VACUUM freezing. Mostly, explain how row xmin's used to be replaced by FrozenTransactionId and no longer are. Do a little copy-editing on the side. Per discussion with Egor Rogov. Back-patch to 9.4 where the behavioral change occurred. Discussion: <>
  • Add a HINT for client-vs-server COPY failure cases. Users often get confused between COPY and \copy and try to use client-side paths with COPY. The server then cannot find the file (if remote), or sees a permissions problem (if local), or some variant of that. Emit a hint about this in the most common cases. In future we might want to expand the set of errnos for which the hint gets printed, but be conservative for now. Craig Ringer, reviewed by Christoph Berg and Tom Lane Discussion: <>
  • Doc: minor documentation improvements about extensions. Document the formerly-undocumented behavior that schema and comment control-file entries for an extension are honored only during initial installation, whereas other properties are also honored during updates. While at it, do some copy-editing on the recently-added docs for CREATE EXTENSION ... CASCADE, use links for some formerly vague cross references, and make a couple other minor improvements. Back-patch to 9.6 where CASCADE was added. The other parts of this could go further back, but they're probably not important enough to bother.
  • Support renaming an existing value of an enum type. Not much to be said about this patch: it does what it says on the tin. In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists to clarify what it actually does. In the discussion of this patch we considered supporting other similar options, such as IF EXISTS on the type as a whole or IF NOT EXISTS on the target name. This patch doesn't actually add any such feature, but it might happen later. Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli Discussion: <>
  • Don't print database's tablespace in pg_dump -C --no-tablespaces output. If the database has a non-default tablespace, we emitted a TABLESPACE clause in the CREATE DATABASE command emitted by -C, even if --no-tablespaces was also specified. This seems wrong, and it's inconsistent with what pg_dumpall does, so change it. Per bug #14315 from Danylo Hlynskyi. Back-patch to 9.5. The bug is much older, but it'd be a more invasive change before 9.5 because dumpDatabase() hasn't got an easy way to get to the outputNoTablespaces flag. Doesn't seem worth the work given the lack of previous complaints. Report: <>
  • Allow pg_dump to dump non-extension members of an extension-owned schema. Previously, if a schema was created by an extension, a normal pg_dump run (not --binary-upgrade) would summarily skip every object in that schema. In a case where an extension creates a schema and then users create other objects within that schema, this does the wrong thing: we want pg_dump to skip the schema but still create the non-extension-owned objects. There's no easy way to fix this pre-9.6, because in earlier versions the "dump" status for a schema is just a bool and there's no way to distinguish "dump me" from "dump my members". However, as of 9.6 we do have enough state to represent that, so this is a simple correction of the logic in selectDumpableNamespace. In passing, make some cosmetic fixes in nearby code. Martín Marqués, reviewed by Michael Paquier Discussion: <>
  • Avoid reporting "cache lookup failed" for some user-reachable cases. We have a not-terribly-thoroughly-enforced-yet project policy that internal errors with SQLSTATE XX000 (ie, plain elog) should not be triggerable from SQL. record_in, domain_in, and PL validator functions all failed to meet this standard, because they threw plain elog("cache lookup failed for XXX") errors on bad OIDs, and those are all invokable from SQL. For record_in, the best fix is to upgrade typcache.c (lookup_type_cache) to throw a user-facing error for this case. That seems consistent because it was more than halfway there already, having user-facing errors for shell types and non-composite types. Having done that, tweak domain_in to rely on the typcache to throw an appropriate error. (This costs little because InitDomainConstraintRef would fetch the typcache entry anyway.) For the PL validator functions, we already have a single choke point at CheckFunctionValidatorAccess, so just fix its error to be user-facing. Dilip Kumar, reviewed by Haribabu Kommi Discussion: <>
  • In PageIndexTupleDelete, don't assume stored item lengths are MAXALIGNed. PageAddItem stores the item length as-is. It MAXALIGN's the amount of space actually allocated for each tuple, but not the stored length. PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact are all on board with this and MAXALIGN item lengths after fetching them. But PageIndexTupleDelete expects the stored length to be a MAXALIGN multiple already. This accidentally works for existing index AMs because they all maxalign their tuple sizes internally; but we don't do that for heap tuples, and it shouldn't be a requirement for index tuples either. So, sync PageIndexTupleDelete with the rest of bufpage.c by having it maxalign the item size after fetching. Also add a check that pd_special is maxaligned, to ensure that the test "(offset + size) > phdr->pd_special" is still doing the right thing. (If offset and pd_special are aligned, it doesn't matter whether size is.) Again, this is in sync with the rest of the routines here, except for PageAddItem which doesn't test because it doesn't actually do anything for which pd_special alignment matters. This shouldn't have any immediate functional impact; it just adds the flexibility to use PageIndexTupleDelete on index tuples with non-aligned lengths. Discussion: <>
  • Invent PageIndexTupleOverwrite, and teach BRIN and GiST to use it. PageIndexTupleOverwrite performs approximately the same function as PageIndexTupleDelete (or PageIndexDeleteNoCompact) followed by PageAddItem targeting the same item pointer offset. But in the case where the new tuple is the same size as the old, it avoids shuffling other data around on the page, because the new tuple is placed where the old one was rather than being appended to the end of the page. This has been shown to provide a substantial speedup for some GiST use-cases. Also, this change allows some API simplifications: we can get rid of the rather klugy and error-prone PAI_ALLOW_FAR_OFFSET flag for PageAddItemExtended, since that was used only to cover a corner case for BRIN that's better expressed by using PageIndexTupleOverwrite. Note that this patch causes a rather subtle WAL incompatibility: the physical page content change represented by certain WAL records is now different than it was before, because while the tuples have the same itempointer line numbers, the tuples themselves are in different places. I have not bumped the WAL version number because I think it doesn't matter unless you are trying to do bitwise comparisons of original and replayed pages, and in any case we're early in a devel cycle and there will probably be more WAL changes before v10 gets out the door. There is probably room to make use of PageIndexTupleOverwrite in SP-GiST and GIN too, but that is left for a future patch. Andrey Borodin, reviewed by Anastasia Lubennikova, whacked around a bit by me Discussion: <>
  • Convert PageAddItem into a macro to save a few cycles. Nowadays this is just a backwards-compatibility wrapper around PageAddItemExtended, so let's avoid the extra level of function call. In addition, because pretty much all callers are passing constants for the two bool arguments, compilers will be able to constant-fold the conversion to a flags bitmask. Discussion: <>
  • Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple. The full generality of deleting an arbitrary number of tuples is no longer needed, so let's save some code and cycles by replacing the original coding with an implementation based on PageIndexTupleDelete. We can always get back the old code from git if we need it again for new callers (though I don't care for its willingness to mess with line pointers it wasn't told to mess with). Discussion: <>
  • Fix miserable coding in pg_stat_get_activity(). Commit dd1a3bccc replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit f91feba87 then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.
  • Improve unreachability recognition in elog() macro. Some experimentation with an older version of gcc showed that it is able to determine whether "if (elevel_ >= ERROR)" is compile-time constant if elevel_ is declared "const", but otherwise not so much. We had accounted for that in ereport() but were too miserly with braces to make it so in elog(). I don't know how many currently-interesting compilers have the same quirk, but in case it will save some code space, let's make sure that elog() is on the same footing as ereport() for this purpose. Back-patch to 9.3 where we introduced pg_unreachable() calls into elog/ereport.
  • Fix and simplify MSVC build's handling of xml/xslt/uuid dependencies. The MSVC build scripts mistakenly believed that the xml option requires the xslt option, when actually the dependency is the other way around; and they believed that libxml requires libiconv, which is not necessarily so, so we shouldn't enforce it here. Fix the option cross-checking logic. Also, since AddProject already takes care of adding libxml and libxslt include and library dependencies to every project, there's no need for the custom code that did that in mkvcbuild. While at it, let's handle the similar dependencies for uuid in a similar fashion. Given the lack of field complaints about these overly strict build dependency requirements, there seems no need for a back-patch. Michael Paquier Discussion: <>
  • Allow CREATE EXTENSION to follow extension update paths. Previously, to update an extension you had to produce both a version-update script and a new base installation script. It's become more and more obvious that that's tedious, duplicative, and error-prone. This patch attempts to improve matters by allowing the new base installation script to be omitted. CREATE EXTENSION will install a requested version if it can find a base script and a chain of update scripts that will get there. As in the existing update logic, shorter chains are preferred if there's more than one possibility, with an arbitrary tie-break rule for chains of equal length. Also adjust the pg_available_extension_versions view to show such versions as installable. While at it, refactor the code so that CASCADE processing works for extensions requested during ApplyExtensionUpdates(). Without this, addition of a new requirement in an updated extension would require creating a new base script, even if there was no other reason to do that. (It would be easy at this point to add a CASCADE option to ALTER EXTENSION UPDATE, to allow the same thing to happen during a manually-commanded version update, but I have not done that here.) Tom Lane, reviewed by Andres Freund Discussion: <>
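Preferring the shortest chain of update scripts, as the CREATE EXTENSION entry above describes, is a shortest-path search over a graph whose nodes are versions and whose edges are available update scripts. A toy breadth-first-search sketch (a hypothetical version graph, not the pg_extension machinery):

```c
#include <assert.h>
#include <string.h>

#define NVER 4                  /* toy version graph: nodes 0..3 */

/* Return the minimum number of update scripts needed to get from version
 * 'from' to version 'to' (edge[i][j] nonzero means a script i->j exists),
 * or -1 if no chain exists.  Plain breadth-first search. */
static int
shortest_chain(const int edge[NVER][NVER], int from, int to)
{
    int     dist[NVER];
    int     queue[NVER];
    int     head = 0, tail = 0;

    memset(dist, -1, sizeof dist);      /* all-0xFF bytes == -1 for ints */
    dist[from] = 0;
    queue[tail++] = from;
    while (head < tail)
    {
        int     v = queue[head++];

        for (int w = 0; w < NVER; w++)
            if (edge[v][w] && dist[w] < 0)
            {
                dist[w] = dist[v] + 1;
                queue[tail++] = w;
            }
    }
    return dist[to];
}
```

With edges 0→1, 1→2, 1→3 and 2→3, the two-script chain 0→1→3 wins over the three-script 0→1→2→3, mirroring the "shorter chains are preferred" rule.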

Bruce Momjian pushed:

Álvaro Herrera pushed:

  • Have "make coverage" recurse into contrib as well
  • Fix two src/test/modules Makefiles. commit_ts and test_pg_dump were declaring targets before including the PGXS stanza, which meant that the "all" target, customarily defined as the first (and therefore default) target, was not the default anymore. Fix that by moving those target definitions to after PGXS. commit_ts was initially good, but I broke it in commit 9def031bd2; test_pg_dump was born broken, probably copying from commit_ts' mistake. In passing, fix a comment mistake in test_pg_dump/Makefile. Backpatch to 9.6. Noted by Tom Lane.
  • Fix locking a tuple updated by an aborted (sub)transaction. When heap_lock_tuple decides to follow the update chain, it tried to also lock any version of the tuple that was created by an update that was subsequently rolled back. This is pointless, since for all intents and purposes that tuple exists no more; and moreover it causes misbehavior, as reported independently by Marko Tiikkaja and Marti Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return the tuples, and assertion-enabled builds crash. Fix by having heap_lock_updated_tuple test the xmin and return success immediately if the tuple was created by an aborted transaction. The condition where tuples become invisible occurs when an updated tuple chain is followed by heap_lock_updated_tuple, which reports the problem as HeapTupleSelfUpdated to its caller heap_lock_tuple, which in turn propagates that code outwards possibly leading the calling code (ExecLockRows) to believe that the tuple exists no longer. Backpatch to 9.3. Only on 9.5 and newer this leads to a visible failure, because of commit 27846f02c176; before that, heap_lock_tuple skips the whole dance when the tuple is already locked by the same transaction, because of the ancient HeapTupleSatisfiesUpdate behavior. Still, the buggy condition may also exist in more convoluted scenarios involving concurrent transactions, so it seems safer to fix the bug in the old branches too. Discussion:

Peter Eisentraut pushed:

  • Add location field to DefElem. Add a location field to the DefElem struct, used to parse many utility commands. Update various error messages to supply error position information. To propagate the error position information in a more systematic way, create a ParseState in standard_ProcessUtility() and pass that to interested functions implementing the utility commands. This seems better than passing the query string and then reassembling a parse state ad hoc, which violates the encapsulation of the ParseState type. Reviewed-by: Pavel Stehule <>
  • Make better use of existing enums in plpgsql. plpgsql.h defines a number of enums, but most of the code passes them around as ints. Update structs and function prototypes to take enum types instead. This clarifies the struct definitions in plpgsql.h in particular. Reviewed-by: Pavel Stehule <>

Noah Misch pushed:

Andres Freund pushed:

  • Fix mdtruncate() to close fd.c handle of deleted segments. mdtruncate() forgot to FileClose() a segment's mdfd_vfd when deleting it. That led to a fd.c handle to a truncated file being kept open until backend exit. The issue appears to have been introduced way back in 1a5c450f3024ac5; before that the handle was closed inside FileUnlink(). The impact of this bug is limited: only VACUUM and ON COMMIT TRUNCATE for temporary tables truncate files in place (i.e. TRUNCATE itself is not affected), and the relation has to be bigger than 1GB. The consequences of a leaked fd.c handle aren't severe either. Discussion: <> Backpatch: all supported releases
  • Improve scalability of md.c for large relations. So far md.c used a linked list of segments. That proved to be a problem when processing large relations, because every smgr.c/md.c level access to a page incurred walking through a linked list of all preceding segments. Thus making accessing pages O(#segments). Replace the linked list of segments hanging off SMgrRelationData with an array of opened segments. That allows O(1) access to individual segments, if they've previously been opened. Discussion: <> Reviewed-By: Peter Geoghegan, Tom Lane (in an older version)
  • Faster PageIsVerified() for the all zeroes case. That's primarily useful for testing very large relations, using sparse files. Discussion: <> Reviewed-By: Peter Geoghegan
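
The md.c scalability change above replaces a linked list of segment handles with an array. A toy Python sketch (invented names, not PostgreSQL code) of why that changes the access cost:

```python
class SegNode:
    """A node in the old-style linked list of 1GB segment handles."""
    def __init__(self, segno, nxt=None):
        self.segno, self.next = segno, nxt

def find_segment_list(head, segno):
    # Old scheme: reaching segment N walks every preceding node, so each
    # page access deep into a large relation costs O(#segments).
    node = head
    while node.segno != segno:
        node = node.next
    return node

def find_segment_array(segments, segno):
    # New scheme: previously opened segments sit in an array, so any of
    # them is reachable in O(1) by segment number.
    return segments[segno]

segments = [SegNode(i) for i in range(5)]
head = None
for node in reversed(segments):          # build the list 0 -> 1 -> ... -> 4
    node.next, head = head, node

assert find_segment_list(head, 4) is find_segment_array(segments, 4)
```

Both lookups return the same handle; only the cost differs, which is exactly what the commit is about.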

Heikki Linnakangas pushed:

  • Implement binary heap replace-top operation in a smarter way. In external sort's merge phase, we maintain a binary heap holding the next tuple from each input tape. On each step, the topmost tuple is returned, and replaced with the next tuple from the same tape. We were doing the replacement by deleting the top node in one operation, and inserting the next tuple after that. However, you can do a "replace-top" operation more efficiently, in one "sift-up". A deletion will always walk the heap from top to bottom, but in a replacement, we can stop as soon as we find the right place for the new tuple. This is particularly helpful if the tapes are not in completely random order, so that the next tuple from a tape is likely to land near the top of the heap. Peter Geoghegan, reviewed by Claudio Freire, with some editing by me. Discussion: <>
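
Python's heapq module ships the same optimization as heapreplace(). A small sketch contrasting the two strategies the commit describes (illustrative only, not the tuplesort code):

```python
import heapq

# A merge step returns the heap's smallest item and replaces it with the
# next tuple from the same input tape.
def merge_step_naive(heap, next_item):
    top = heapq.heappop(heap)        # delete-top: walks the heap top to bottom
    heapq.heappush(heap, next_item)  # then a separate insertion pass
    return top

def merge_step_replace(heap, next_item):
    # Combined replace-top: place the new item at the root and sift it only
    # as far as needed; cheap when the item belongs near the top.
    return heapq.heapreplace(heap, next_item)

h1 = [3, 5, 7, 9]
heapq.heapify(h1)
h2 = list(h1)
assert merge_step_naive(h1, 4) == merge_step_replace(h2, 4) == 3
assert sorted(h1) == sorted(h2) == [4, 5, 7, 9]
```

Both produce the same heap contents; replace-top simply does less sifting per step, which matters when the merge performs millions of such steps.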

Pending patches

Michaël Paquier and Christian Ullrich traded patches to enable parallel builds with MSVC.

Haribabu Kommi sent in another revision of a patch to add a pg_hba_file_settings system view.

Craig Ringer sent in a patch to send catalog_xmin in hot standby feedback protocol, make walsender respect catalog_xmin in hot standby feedback messages, allow GetOldestXmin(...) to optionally disregard the catalog_xmin, and send catalog_xmin separately in hot_standby_feedback messages.

Heikki Linnakangas sent in a patch to fix compilation with OpenSSL 1.1 and silence deprecation warnings with OpenSSL 1.1.

Abhijit Menon-Sen sent in another revision of a patch to convert recovery.conf settings to GUCs.

KaiGai Kohei sent in another revision of a patch to implement PassDownLimitBound for ForeignScan/CustomScan.

Michaël Paquier sent in another revision of a patch to speed up 2PC transactions.

Mithun Cy sent in another revision of a patch to cache hash index meta pages.

Craig Ringer and Vladimir Gordiychuk traded patches to respect client-initiated CopyDone in walsender and allow streaming to be interrupted during ReorderBufferCommit.

Claudio Freire, Masahiko Sawada, and Simon Riggs traded patches to enable using over 1GB of work_mem in VACUUM.

Tom Lane sent in two revisions of a patch to let CREATE EXTENSION follow update chains.

Simon Riggs sent in another revision of a patch to add a "patient" mode to LOCK TABLE.

Daniel Vérité sent in another revision of a patch to add batch/pipelining support for libpq.

Michaël Paquier and Christian Ullrich traded patches to do some cleanups in pgwin32_putenv, fix the load race in pgwin32_putenv() (and open the unload race), and pin any DLL as soon as seen when looking for _putenv on Windows.

Vinayak Pokale sent in a patch to add pg_fdw_xact_resolver().

Gerdan Rezende dos Santos sent in another revision of a patch to add an option to pg_dumpall to exclude tables from the dump.

Kyotaro HORIGUCHI sent in a patch to fix an illegal SJIS mapping.

Etsuro Fujita sent in another revision of a patch to push more UPDATEs and DELETEs to the PostgreSQL FDW.

Ashutosh Bapat and Amit Langote traded patches to add declarative table partitioning.

Kyotaro HORIGUCHI sent in another revision of a patch to add IF NOT EXISTS support to psql's tab completion.

David Steele sent in two more revisions of a patch to exclude additional directories in pg_basebackup.

Pavan Deolasee sent in a patch to override compile time log levels of specific messages/modules.

Heikki Linnakangas sent in four revisions of a patch to remove tuplesort merge pre-reading.

Fabien COELHO sent in another revision of a patch to pgbench to permit storing select results into variables.

Peter Geoghegan sent in another revision of a patch to add amcheck, a B-Tree integrity checking tool.

Jim Nasby sent in another revision of a patch to fix a mistaken comment in nodeCtescan.c.

Vik Fearing sent in two more revisions of a patch to add long options for pg_ctl waiting.

Kyotaro HORIGUCHI sent in a PoC patch to use a radix tree to encode characters of Shift JIS.

Michaël Paquier sent in another revision of a patch to forbid the use of LF and CR characters in database and role names.

Ildar Musin sent in another revision of a patch to enable index-only scans for some expressions.

Amit Kapila sent in another revision of a patch to enable WAL logging for hash indexes.

Michaël Paquier sent in three revisions of a patch to remove a useless dependency assumption libxml2 -> libxslt in MSVC scripts.

Kuntal Ghosh sent in two more revisions of a patch to add a WAL consistency check facility.

Masahiko Sawada sent in a patch to make lazy_scan_heap output how many all_frozen pages are skipped and add cheat_vacuum_table_size.

Rahila Syed sent in another revision of a patch to get psql to error out when someone tries to set autocommit at the wrong spot.

Ashutosh Bapat sent in another revision of a patch to enable pushing aggregates to a foreign server.

Masahiko Sawada sent in a patch to add a --disable-page-skipping option to the vacuumdb command.

Peter Eisentraut sent in another revision of a patch to add overflow checks to the money type input function.

Victor Wagner sent in another revision of a patch to implement failover on libpq connect level.

Stephen Frost sent in another revision of a patch to add support for restrictive RLS policies.

Petr Jelinek sent in another revision of a patch to implement logical replication.

Adam Brightwell sent in a patch to fix a mismatch between RLS and COPY.

Andrew Borodin and Mikhail Bakhterev traded patches to improve the GiST penalty functions.

Peter Eisentraut sent in another revision of a patch to add CREATE SEQUENCE AS <data type> clause.

Ashutosh Bapat sent in a patch to support partition-wise joins for partitioned tables.

Etsuro Fujita sent in another revision of a patch to push down more full joins in postgres_fdw.

Pavel Stehule sent in two more revisions of a patch to add xmltable().

Kevin Grittner sent in another revision of a patch to add trigger transition tables.

Thomas Munro sent in another revision of a patch to use kqeue when available.

Dmitry Dolgov sent in a patch to add generic type subscripting.

Heikki Linnakangas sent in a patch to allow temp file access tracing.

Vitaly Burovoy sent in another revision of a patch to add tab completion for CREATE DATABASE ... TEMPLATE ... to psql.

Amit Kapila sent in a patch to add some simple tests for pg_stat_statements.

Mattia sent in a patch to allow to_date() and to_timestamp() to accept localized month names.

Corey Huinker sent in another revision of a patch to let file_fdw access COPY FROM PROGRAM.

Peter Eisentraut sent in another revision of a patch to add a pg_sequences view.

Peter Geoghegan sent in another revision of a patch to add parallel tuplesort for parallel B-tree index creation.

by N Bougain on Monday 19 September 2016 at 21:46

Wednesday 7 September 2016


PostgreSQL Weekly News - 4 September 2016

PostgreSQL 9.6 Release Candidate 1 is available. Get testing!

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA licence. The original version can be found at:

Submit your articles or announcements by Sunday 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied patches

Tom Lane pushed:

  • Make another editorial pass over the 9.6 release notes. I think they're pretty much release-quality now.
  • Fix stray reference to the old script. Per Tomas Vondra.
  • Make AllocSetContextCreate throw an error for bad context-size parameters. The previous behavior was to silently change them to something valid. That obscured the bugs fixed in commit ea268cdc9, and generally seems less useful than complaining. Unlike the previous commit, though, we'll do this in HEAD only --- it's a bit too late to be possibly breaking third-party code in 9.6. Discussion: <>
  • Fix initdb misbehavior when user mis-enters superuser password. While testing simple_prompt() revisions, I happened to notice that current initdb behaves rather badly when --pwprompt is specified and the user miskeys the second password. It complains about the mismatch, does "rm -rf" on the data directory, and exits. The problem is that since commit c4a8812cf, there's a standalone backend sitting waiting for commands at that point. It gets unhappy about its datadir having gone away, and spews a PANIC message at the user, which is not nice. (And the shell then adds to the mess with meaningless bleating about a core dump...) We don't really want that sort of thing to happen unless there's an internal failure in initdb, which this surely is not. The best fix seems to be to move the collection of the password earlier, so that it's done essentially as part of argument collection, rather than at the rather ad-hoc time it was done before. Back-patch to 9.6 where the problem was introduced.
  • Simplify correct use of simple_prompt(). The previous API for this function had it returning a malloc'd string. That meant that callers had to check for NULL return, which few of them were doing, and it also meant that callers had to remember to free() the string later, which required extra logic in most cases. Instead, make simple_prompt() write into a buffer supplied by the caller. Anywhere that the maximum required input length is reasonably small, which is almost all of the callers, we can just use a local or static array as the buffer instead of dealing with malloc/free. A fair number of callers used "pointer == NULL" as a proxy for "haven't requested the password yet". Maintaining the same behavior requires adding a separate boolean flag for that, which adds back some of the complexity we save by removing free()s. Nonetheless, this nets out at a small reduction in overall code size, and considerably less code than we would have had if we'd added the missing NULL-return checks everywhere they were needed. In passing, clean up the API comment for simple_prompt() and get rid of a very-unnecessary malloc/free in its Windows code path. This is nominally a bug fix, but it does not seem worth back-patching, because the actual risk of an OOM failure in any of these places seems pretty tiny, and all of them are client-side not server-side anyway. This patch is by me, but it owes a great deal to Michael Paquier who identified the problem and drafted a patch for fixing it the other way. Discussion: <>
  • Fix a bunch of places that called malloc and friends with no NULL check. Where possible, use palloc or pg_malloc instead; otherwise, insert explicit NULL checks. Generally speaking, these are places where an actual OOM is quite unlikely, either because they're in client programs that don't allocate all that much, or they're very early in process startup so that we'd likely have had a fork() failure instead. Hence, no back-patch, even though this is nominally a bug fix. Michael Paquier, with some adjustments by me Discussion: <>
  • Prevent starting a standalone backend with standby_mode on. This can't really work because standby_mode expects there to be more WAL arriving, which there will not ever be because there's no WAL receiver process to fetch it. Moreover, if standby_mode is on then hot standby might also be turned on, causing even more strangeness because that expects read-only sessions to be executing in parallel. Bernd Helmle reported a case where btree_xlog_delete_get_latestRemovedXid got confused, but rather than band-aiding individual problems it seems best to prevent getting anywhere near this state in the first place. Back-patch to all supported branches. In passing, also fix some omissions of errcodes in other ereport's in readRecoveryCommandFile(). Michael Paquier (errcode hacking by me) Discussion: <00F0B2CEF6D0CEF8A90119D4@eje.credativ.lan>
  • Remove no-longer-useful SSL-specific Port.count field. Since we removed SSL renegotiation, there's no longer any reason to keep track of the amount of data transferred over the link. Daniel Gustafsson Discussion: <>
  • Try to fix portability issue in enum renumbering (again). The hack embodied in commit 4ba61a487 no longer works after today's change to allow DatumGetFloat4/Float4GetDatum to be inlined (commit 14cca1bf8). Probably what's happening is that the faulty compilers are deciding that the now-inlined assignment is a no-op and so they're not required to round to float4 width. We had a bunch of similar issues earlier this year in the degree-based trig functions, and eventually settled on using volatile intermediate variables as the least ugly method of forcing recalcitrant compilers to do what the C standard says (cf commit 82311bcdd). Let's see if that method works here. Discussion: <>
  • Improve memory management for PL/Tcl functions. Formerly, the memory used to represent a PL/Tcl function was allocated with malloc() or in TopMemoryContext, and we'd leak it all if the function got redefined during the session. Instead, create a per-function context and keep everything in or under that context. Add a reference-counting mechanism (like the one plpgsql has long had) so that we can safely clean up an old function definition, either immediately if it's not being executed or at the end of the outermost execution. Currently, we only detect that a cached function is obsolete when we next attempt to call that function. So this covers the updated-definition case but leaves cruft around after DROP FUNCTION. It's not clear whether it's worth installing a syscache invalidation callback to watch for drops; none of the other PLs do, so for now we won't do it here either. Michael Paquier and Tom Lane Discussion: <>
  • Improve memory management for PL/Perl functions. Unlike PL/Tcl, PL/Perl at least made an attempt to clean up after itself when a function gets redefined. But it was still using TopMemoryContext for the fn_mcxt of argument/result I/O functions, resulting in the potential for memory leaks depending on what those functions did, and the retail alloc/free logic was pretty bulky as well. Fix things to use a per-function memory context like the other PLs now do. Tweak a couple of places where things were being done in a not-very-safe order (on the principle that a memory leak is better than leaving global state inconsistent after an error). Also make some minor cosmetic adjustments, mostly in field names, to make the code look similar to the way PL/Tcl does now wherever it's essentially the same logic. Michael Paquier and Tom Lane Discussion: <>
  • Change API of ShmemAlloc() so it throws error rather than returning NULL. A majority of callers seem to have believed that this was the API spec already, because they omitted any check for a NULL result, and hence would crash on an out-of-shared-memory failure. The original proposal was to just add such error checks everywhere, but that does nothing to prevent similar omissions in future. Instead, let's make ShmemAlloc() throw the error (so we can remove the caller-side checks that do exist), and introduce a new function ShmemAllocNoError() that has the previous behavior of returning NULL, for the small number of callers that need that and are prepared to do the right thing. This also lets us remove the rather wishy-washy behavior of printing a WARNING for out-of-shmem, which never made much sense: either the caller has a strategy for dealing with that, or it doesn't. It's not ShmemAlloc's business to decide whether a warning is appropriate. The v10 release notes will need to call this out as a significant source-code change. It's likely that it will be a bug fix for extension callers too, but if not, they'll need to change to using ShmemAllocNoError(). This is nominally a bug fix, but the odds that it's fixing any live bug are actually rather small, because in general the requests being made by the unchecked callers were already accounted for in determining the overall shmem size, so really they ought not fail. Between that and the possible impact on extensions, no back-patch. Discussion: <>
  • Don't require dynamic timezone abbreviations to match underlying time zone. Previously, we threw an error if a dynamic timezone abbreviation did not match any abbreviation recorded in the referenced IANA time zone entry. That seemed like a good consistency check at the time, but it turns out that a number of the abbreviations in the IANA database are things that Olson and crew made up out of whole cloth. Their current policy is to remove such names in favor of using simple numeric offsets. Perhaps unsurprisingly, a lot of these made-up abbreviations have varied in meaning over time, which meant that our commit b2cbced9e and later changes made them into dynamic abbreviations. So with newer IANA database versions that don't mention these abbreviations at all, we fail, as reported in bug #14307 from Neil Anderson. It's worse than just a few unused-in-the-wild abbreviations not working, because the pg_timezone_abbrevs view stops working altogether (since its underlying function tries to compute the whole view result in one call). We considered deleting these abbreviations from our abbreviations list, but the problem with that is that we can't stay ahead of possible future IANA changes. Instead, let's leave the abbreviations list alone, and treat any "orphaned" dynamic abbreviation as just meaning the referenced time zone. It will behave a bit differently than it used to, in that you can't any longer override the zone's standard vs. daylight rule by using the "wrong" abbreviation of a pair, but that's better than failing entirely. (Also, this solution can be interpreted as adding a small new feature, which is that any abbreviation a user wants can be defined as referencing a time zone name.) Back-patch to all supported branches, since this problem affects all of them when using tzdata 2016f or newer. Report: <> Discussion: <>
  • Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware. computeLeafRecompressWALData() tried to produce a uint16 WAL log field by memcpy'ing the first two bytes of an int-sized variable. That accidentally works on little-endian hardware, but not at all on big-endian. Replay then thinks it's looking at an ADDITEMS action with zero entries, and reads the first two bytes of the first TID therein as the next segno/action, typically leading to "unexpected GIN leaf action" errors during replay. Even if replay failed to crash, the resulting GIN index page would surely be incorrect. To fix, just declare the variable as uint16 instead. Per bug #14295 from Spencer Thomason (much thanks to Spencer for turning his problem into a self-contained test case). This likely also explains a previous report of the same symptom from Bernd Helmle. Back-patch to 9.4 where the problem was introduced (by commit 14d02f0bb). Discussion: <> Possible-Report: <>
  • Fix multiple bugs in numeric_poly_deserialize(). These were evidently introduced by yesterday's commit 9cca11c91, which perhaps needs more review than it got. Per report from Andreas Seltenreich and additional examination of nearby code. Report: <>
  • Improve readability of the output of psql's \timing command. In addition to the existing decimal-milliseconds output value, display the same value in mm:ss.fff format if it exceeds one second. Tack on hours and even days fields if the interval is large enough. This avoids needing mental arithmetic to convert the values into customary time units. Corey Huinker, reviewed by Gerdan Santos; bikeshedding by many Discussion: <>
  • Remove useless pg_strdup() operations. split_to_stringlist() doesn't modify its first argument nor expect it to remain valid after exit, so there's no need to duplicate the optarg string at the call sites. Per Coverity. (This has been wrong all along, but commit 052cc223d changed the useless calls from "strdup" to "pg_strdup", which apparently made Coverity think it's a new bug. It's not, but it's also not worth back-patching.)
  • Update release notes to mention need for ALTER EXTENSION UPDATE. Maybe we ought to make pg_upgrade do this for you, but it won't happen in 9.6, so call out the need for it as a migration consideration.
  • Remove vestigial references to "zic" in favor of "IANA database". Commit b2cbced9e instituted a policy of referring to the timezone database as the "IANA timezone database" in our user-facing documentation. Propagate that wording into a couple of places that were still using "zic" to refer to the database, which is definitely not right (zic is the compilation tool, not the data). Back-patch, not because this is very important in itself, but because we routinely cherry-pick updates to the tznames files and I don't want to risk future merge failures.
  • Add regression test coverage for non-default timezone abbreviation sets. After further reflection about the mess cleaned up in commit 39b691f25, I decided the main bit of test coverage that was still missing was to check that the non-default abbreviation-set files we supply are usable. Add that. Back-patch to supported branches, just because it seems like a good idea to keep this all in sync.
  • Remove duplicate code from ReorderBufferCleanupTXN(). Andres is apparently the only hacker who thinks this code is better as-is. I (tgl) follow some of his logic, but the fact that it's setting off warnings from static code analyzers seems like a sufficient reason to put the complexity into a comment rather than the code. Aleksander Alekseev Discussion: <20160404190345.54d84ee8@fujitsu>
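
The ShmemAlloc() change in the list above follows a general API pattern: make the common entry point throw, and give the rare caller that can cope an explicit *NoError variant. A hedged Python analogue (all names invented for illustration):

```python
class OutOfSharedMemory(Exception):
    pass

_SHMEM_FREE = [1024]   # stand-in for a fixed-size shared memory pool

def shmem_alloc_no_error(size):
    """Old behavior: return None on exhaustion, for the few callers
    prepared to handle that themselves."""
    if size > _SHMEM_FREE[0]:
        return None
    _SHMEM_FREE[0] -= size
    return bytearray(size)

def shmem_alloc(size):
    """New behavior: throw on exhaustion, so ordinary callers need no
    error check and cannot crash on an unchecked None/NULL."""
    chunk = shmem_alloc_no_error(size)
    if chunk is None:
        raise OutOfSharedMemory(f"out of shared memory (requested {size} bytes)")
    return chunk

chunk = shmem_alloc(512)   # succeeds while the pool has room
```

The point of the design is that forgetting an error check now fails loudly at the allocation site instead of crashing later on a bad pointer.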
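
The GIN WAL fix above hinges on a classic endianness trap: copying the first two bytes of an int-sized variable yields the low-order 16 bits only on little-endian machines. A small demonstration with Python's struct module:

```python
import struct

nitems = 2   # an int-sized variable holding a small count

# "memcpy" the first two bytes of the int, as the buggy code did:
le_bytes = struct.pack("<i", nitems)[:2]   # little-endian layout
be_bytes = struct.pack(">i", nitems)[:2]   # big-endian layout

# Reinterpret those two bytes as the uint16 the WAL record format expects:
assert struct.unpack("<H", le_bytes)[0] == 2   # accidentally correct
assert struct.unpack(">H", be_bytes)[0] == 0   # replay sees zero entries
```

On big-endian hardware the copied bytes are the high-order half, which is zero for any small count, matching the "ADDITEMS action with zero entries" misreading described in the commit.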
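
The improved \timing output described above can be sketched as a formatting function. This is an illustrative approximation of the behavior the commit message describes, not psql's actual code:

```python
def format_timing(ms):
    """Show the raw decimal-milliseconds value, plus an mm:ss.fff rendering
    once it exceeds one second, tacking on hours and days as needed."""
    if ms < 1000:
        return f"{ms:.3f} ms"
    seconds, ms_part = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    hms = f"{minutes:02d}:{seconds:02d}.{ms_part:03d}"
    if days:
        return f"{ms:.3f} ms ({days} d {hours:02d}:{hms})"
    if hours:
        return f"{ms:.3f} ms ({hours:02d}:{hms})"
    return f"{ms:.3f} ms ({hms})"

print(format_timing(61500))   # 61500.000 ms (01:01.500)
```

The appeal is exactly what the commit says: no mental arithmetic is needed to read a multi-minute runtime.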

Fujii Masao pushed:

  • Fix pg_xlogdump so that it handles cross-page XLP_FIRST_IS_CONTRECORD record. Previously pg_xlogdump failed to dump the contents of the WAL file if the file starts with a continuation WAL record which spans more than one page. Since pg_xlogdump assumed that the continuation record always fits on a page, it could not find the valid WAL record to start reading from in that case. This patch changes pg_xlogdump so that it can handle a continuation WAL record which crosses a page boundary and find the valid record to start reading from. Back-patch to 9.3 where pg_xlogdump was introduced. Author: Pavan Deolasee Reviewed-By: Michael Paquier and Craig Ringer Discussion:
  • Fix typos in comments.

Simon Riggs pushed:

Heikki Linnakangas pushed:

  • Remove support for OpenSSL versions older than 0.9.8. OpenSSL officially only supports 1.0.1 and newer. Some OS distributions still provide patches for 0.9.8, but anything older than that is not interesting anymore. Let's simplify things by removing compatibility code. Andreas Karlsson, with small changes by me.
  • Use static inline functions for float <-> Datum conversions. Now that we are OK with using static inline functions, we can use them to avoid function call overhead of pass-by-val versions of Float4GetDatum, DatumGetFloat8, and Float8GetDatum. Those functions are only a few CPU instructions long, but they could not be written into macros previously, because we need a local union variable for the conversion. I kept the pass-by-ref versions as regular functions. They are very simple too, but they call palloc() anyway, so shaving a few instructions from the function call doesn't seem so important there. Discussion: <>
  • Support multiple iterators in the Red-Black Tree implementation. While we don't need multiple iterators at the moment, the interface is nicer and less dangerous this way. Aleksander Alekseev, with some changes by me.
  • Speed up SUM calculation in numeric aggregates. This introduces a numeric sum accumulator, which performs better than repeatedly calling add_var(). The performance comes from using wider digits and delaying carry propagation, tallying positive and negative values separately, and avoiding a round of palloc/pfree on every value. This speeds up SUM(), as well as other standard aggregates like AVG() and STDDEV() that also calculate a sum internally. Reviewed-by: Andrey Borodin Discussion: <>
  • Move code shared between libpq and backend from backend/libpq/ to common/. When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <>
  • Clarify the new Red-Black post-order traversal code a bit. Coverity complained about the for(;;) loop, because it never actually iterated. It was used just to be able to use "break" to exit it early. I agree with Coverity, that's a bit confusing, so refactor the code to use if-else instead. While we're at it, use a local variable to hold the "current" node. That's shorter and clearer than referring to "iter->last_visited" all the time.
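
The numeric sum accumulator above combines three tricks: wider digits, delayed carry propagation, and separate tallies for positive and negative inputs. A toy Python model of the idea (not the committed C code; base and digit layout invented for illustration):

```python
class NumericSumAccumulator:
    """Accumulate values given as base-10**8 "digits", most significant
    first, tallying positive and negative inputs separately and
    propagating carries only periodically rather than on every add."""
    BASE = 10 ** 8

    def __init__(self, ndigits=4):
        self.pos = [0] * ndigits   # tallies for positive inputs
        self.neg = [0] * ndigits   # tallies for negative inputs
        self.unpropagated = 0

    def add(self, digits, negative=False):
        tallies = self.neg if negative else self.pos
        for i, d in enumerate(digits):
            tallies[i] += d
        self.unpropagated += 1
        if self.unpropagated >= 1000:   # delayed carry propagation
            self._propagate()

    def _propagate(self):
        for tallies in (self.pos, self.neg):
            carry = 0
            for i in range(len(tallies) - 1, -1, -1):
                carry, tallies[i] = divmod(tallies[i] + carry, self.BASE)
        self.unpropagated = 0

    def result(self):
        self._propagate()
        def value(tallies):
            total = 0
            for d in tallies:
                total = total * self.BASE + d
            return total
        return value(self.pos) - value(self.neg)

# summing 5 + 7 - 3, each value given as base-10**8 digits
acc = NumericSumAccumulator()
acc.add([0, 0, 0, 5])
acc.add([0, 0, 0, 7])
acc.add([0, 0, 0, 3], negative=True)
assert acc.result() == 9
```

Because each bucket is far wider than one digit's worth of data, thousands of additions fit before any carry handling is needed, which is where the speedup for SUM(), AVG(), and STDDEV() comes from.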

Álvaro Herrera pushed:

Robert Haas pushed:

Kevin Grittner pushed:

Pending patches

Vik Fearing sent in a patch to fix some misbehaviors by using autovacuum better.

Haribabu Kommi sent in a patch to ensure that ecpg -? option works on Windows.

Craig Ringer sent in two revisions of a patch to report server_version_num alongside server_version in startup packet.

Andrew Borodin sent in three revisions of a patch to improve the performance of GiST penalty functions.

Vik Fearing sent in a patch to update sample configuration files to more modern settings.

KaiGai Kohei sent in a patch to add an optional callback to support a special optimization when a ForeignScan/CustomScan is located under the Limit node in plan tree.

Craig Ringer sent in six more revisions of a patch to add txid_status(bigint).

Michaël Paquier sent in two more revisions of a patch to rename pg_xlog and pg_clog to things that look less removable.

Tom Lane sent in a patch to fix some corner cases that cube_in rejects.

Michaël Paquier sent in another revision of a patch to fix a case that crashed pg_basebackup.

Andreas Karlsson sent in a patch to allow compiling with OpenSSL 1.1, remove OpenSSL 1.1 deprecation warnings, and remove px_get_pseudo_random_bytes.

Martín Marqués sent in another revision of a patch to fix the fact that extensions' tables are restored twice: once when the extension is restored, and again when the tables themselves get picked up.

Michaël Paquier sent in another revision of a patch to fix pg_receivexlog.

Kyotaro HORIGUCHI sent in another revision of a patch to start on asynchronous and vectorized execution.

Kyotaro HORIGUCHI sent in a patch to fix some comments in GatherPath.single_copy.

Fujii Masao sent in three revisions of a patch to fix a GIN bug.

Maksim Milyutin sent in a patch to enable extraction of the state of a query execution from an external session via signals.

Peter Eisentraut sent in a patch to upgrade SERIALs to the new format.

Heikki Linnakangas sent in two revisions of a patch to reflect his conclusion that pinning a buffer in TupleTableSlot is unnecessary.

Jesper Pedersen sent in a patch to add hash index support in pageinspect.

Peter Eisentraut sent in another revision of a patch to add IDENTITY columns.

Mithun Cy sent in a patch to ensure that ecpg passes make.

Jeevan Chalke sent in a patch to push aggregates to a foreign server under some circumstances.

Peter Eisentraut sent in a patch to allow the use of ICU for sorting and other locale things.

Peter Eisentraut sent in a patch to add CREATE SEQUENCE AS <data type> clause.

Peter Eisentraut sent in a patch to add autonomous transactions.

Heikki Linnakangas sent in a patch to use static inline functions for Float <-> Datum conversions.

Ivan Kartyshov sent in a patch to make an async slave wait for an LSN to be replayed.

Peter Eisentraut sent in a PoC patch to support C++ builds of PostgreSQL.

Heikki Linnakangas and Andres Freund traded patches to optimize aggregates.

Pavel Stehule sent in a patch to add a \setfileref to psql.

Kevin Grittner sent in another revision of a patch to add trigger transition tables.

Craig Ringer sent in another revision of a patch to enable logical decoding to follow timelines.

Craig Ringer sent in another revision of a patch to add an optional --endpos LSN argument to pg_recvlogical.

Simon Riggs and Abhijit Menon-Sen traded patches to change the recovery.conf API.

Pavan Deolasee sent in three revisions of a patch to add WARM, a technique to reduce write amplification when an indexed column of a table is updated.

Michael Banck sent in a patch to add -N (exclude schema(s)) to pg_restore.

Ivan Kartyshov sent in a patch to make pg_buffercache cheaper with large shmem.

Magnus Hagander sent in two revisions of a patch to make pg_basebackup stream xlog to tar.

Marko Tiikkaja sent in another revision of a patch to add an INSERT ... SET syntax.

Petr Jelinek and Erik Rijkers traded patches to add logical replication.

Stephen Frost sent in a patch to add support for restrictive RLS policies.

Peter Eisentraut sent in a patch to add a pg_sequence catalog.

Simon Riggs sent in a patch to reduce lock levels with some comments as to how and why.

Kurt Roeckx sent in a patch to allow compiling with OpenSSL 1.1.

Amit Kapila sent in another revision of a patch to implement WAL-logged hash indexes.

Michaël Paquier sent in another revision of a patch to disallow database and role names with carriage return or linefeed in them.

Haribabu Kommi sent in a patch to add a new SQL counter statistics view, pg_stat_sql.

Michaël Paquier sent in two more revisions of a patch to fix some data durability issues in pg_basebackup and pg_receivexlog.

Rahila Syed sent in two revisions of a patch to cause psql to error when autocommit is turned on inside a transaction.

Tom Lane sent in another revision of a patch to add a \timing interval to psql.

Craig Ringer sent in three revisions of a patch to emit a HINT when COPY can't find a file.

Simon Riggs sent in a patch to fix docs' incorrect assumption that decoding slots cannot go backwards from SQL.

Etsuro Fujita sent in another revision of a patch to push down more full joins in the PostgreSQL FDW.

Andy Grundman sent in a patch to avoid use of __attribute__ when building with old Sun compiler versions.

Fabien COELHO sent in another revision of a patch to allow storing select results into variables in pgbench.

Claudio Freire and Simon Riggs traded patches to allow setting maintenance_work_mem or autovacuum_work_mem higher than 1GB (and be effective).

Craig Ringer sent in two revisions of a patch to dirty replication slots when accessed via the SQL interface.

Michaël Paquier sent in another revision of a patch to support SCRAM-SHA-256 for passwords, along with supporting syntax.

Tomas Vondra sent in another revision of a patch to allow index-only scans for expressions.

Vik Fearing sent in a patch to add --wait and --no-wait long options to match the shorter corresponding -w and -W in pg_ctl.

Tom Lane sent in another revision of a patch to enable transactional enum alters and renames.

Pavel Stehule sent in another revision of a patch to add xmltable().

Emre Hasegeli sent in another revision of a patch to fix some floating point comparison inconsistencies for the geometric types.

Jim Nasby sent in a patch to fix some formatting in the comments of execnodes.h.

Andreas Karlsson sent in two more revisions of a patch to fix compilation with OpenSSL 1.1.

Amit Kapila sent in another revision of a patch to use granular locking in CLOG.

Victor Wagner sent in another revision of a patch to implement failover on libpq connect level.

Craig Ringer sent in a patch to send catalog_xmin separately in hot standby feedback.

by N Bougain on Wednesday 7 September 2016 at 20:11

Sunday 4 September 2016


PostgreSQL 9.6 RC 1 released

The PostgreSQL Global Development Group (PGDG) has announced the release of the first release candidate of PostgreSQL 9.6. This version should be essentially identical to the final release, due out in a few weeks. It contains fixes for all the issues identified during the testing phase.

Changes since Beta 4

PostgreSQL 9.6 RC 1 fixes bugs reported by users who tested Beta 4, including:

 * SQL functions to inspect index access methods
 * Bug fixes for bloom indexes
 * A new regression test for TOAST insertion
 * Better locale handling for parallel query errors
 * Many documentation updates

As of RC 1, parallel query is disabled by default in postgresql.conf. Users who want to parallelize their queries must raise the max_parallel_workers_per_gather parameter.
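As a minimal sketch of that setting (the value 2 is purely illustrative, not a recommendation; 0 keeps parallel query disabled):

```sql
-- postgresql.conf sets the default; the setting can also be changed per session:
SET max_parallel_workers_per_gather = 2;

-- Confirm the current value; with a value above 0, eligible plans
-- may use a Gather node backed by parallel workers.
SHOW max_parallel_workers_per_gather;
```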

Roadmap

This is the first release candidate for PostgreSQL 9.6. Further release candidates will be published until all reported bugs are resolved and the final 9.6.0 version ships. For more information, see the page dedicated to Beta testing.


by daamien on Sunday 4 September 2016 at 19:09

Friday 2 September 2016


PostgreSQL Weekly News - 28 August 2016

[Translator's note: a PostgreSQL meetup is being organized in Nantes.]

News from PostgreSQL-derived products

PostgreSQL-related job offers for August

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA license. The original version can be found at:

Submit your articles or announcements by Sunday at 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a), and in Spanish to pwn (a)

Applied patches

Noah Misch pushed:

Tom Lane pushed:

  • initdb now needs submake-libpq and submake-libpgfeutils. More fallout from commit a00c58314. Pointed out by Michael Paquier.
  • Refactor some network.c code to create cidr_set_masklen_internal(). Merge several copies of "copy an inet value and adjust the mask length" code to create a single, conveniently C-callable function. This function is exported for future use by inet SPGiST support, but it's good cleanup anyway since we had three slightly-different-for-no-good-reason copies. (Extracted from a larger patch, to separate new code from refactoring of old code) Emre Hasegeli
  • Improve SP-GiST opclass API to better support unlabeled nodes. Previously, the spgSplitTuple action could only create a new upper tuple containing a single labeled node. This made it useless for opclasses that prefer to work with fixed sets of nodes (labeled or otherwise), which meant that restrictive prefixes could not be used with such node definitions. Change the output field set for the choose() method to allow it to specify any valid node set for the new upper tuple, and to specify which of these nodes to place the modified lower tuple in. In addition to its primary use for fixed node sets, this feature could allow existing opclasses that use variable node sets to skip a separate spgAddNode action when splitting a tuple, by setting up the node needed for the incoming value as part of the spgSplitTuple action. However, care would have to be taken to add the extra node only when it would not make the tuple bigger than before. (spgAddNode can enlarge the tuple, spgSplitTuple can't.) This is a prerequisite for an upcoming SP-GiST inet opclass, but is being committed separately to increase the visibility of the API change. In passing, improve the documentation about the traverse-values feature that was added by commit ccd6eb49a. Emre Hasegeli, with cosmetic adjustments and documentation rework by me Discussion: <>
  • Create an SP-GiST opclass for inet/cidr. This seems to offer significantly better search performance than the existing GiST opclass for inet/cidr, at least on data with a wide mix of network mask lengths. (That may suggest that the data splitting heuristics in the GiST opclass could be improved.) Emre Hasegeli, with mostly-cosmetic adjustments by me Discussion: <>
  • Fix network_spgist.c build failures from missing AF_INET definition. AF_INET is apparently defined in something that's pulled in automatically on Linux, but the buildfarm says that's not true everywhere. Comparing to network_gist.c suggests that including <sys/socket.h> ought to fix it, and the POSIX standard concurs.
  • Suppress compiler warnings in non-cassert builds. With Asserts off, these variables are set but never used, resulting in warnings from pickier compilers. Fix that with our standard solution. Per report from Jeff Janes.
  • Fix improper repetition of previous results from a hashed aggregate. ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <>
  • Fix small query-lifespan memory leak in bulk updates. When there is an identifiable REPLICA IDENTITY index on the target table, heap_update leaks the id_attrs bitmapset. That's not many bytes, but it adds up over enough rows, since the code typically runs in a query-lifespan context. Bug introduced in commit e55704d8b, which did a rather poor job of cloning the existing use-pattern for RelationGetIndexAttrBitmap(). Per bug #14293 from Zhou Digoal. Back-patch to 9.4 where the bug was introduced. Report: <>
  • Fix instability in parallel regression tests. Commit f0c7b789a added a test case in case.sql that creates and then drops both an '=' operator and the type it's for. Given the right timing, that can cause a "cache lookup failed for type" failure in concurrent sessions, which see the '=' operator as a potential match for '=' in a query, but then the type is gone by the time they inquire into its properties. It might be nice to make that behavior more robust someday, but as a back-patchable solution, adjust the new test case so that the operator is never visible to other sessions. Like the previous commit, back-patch to all supported branches. Discussion: <>
  • Fix logic for adding "parallel worker" context line to worker errors. The previous coding here was capable of adding a "parallel worker" context line to errors that were not, in fact, returned from a parallel worker. Instead of using an errcontext callback to add that annotation, just paste it onto the message by hand; this looks uglier but is more reliable. Discussion: <>
  • Fix assorted small bugs in ThrowErrorData(). Copy the palloc'd strings into the correct context, ie ErrorContext not wherever the source ErrorData is. This would be a large bug, except that it appears that all catchers of thrown errors do either EmitErrorReport or CopyErrorData before doing anything that would cause transient memory contexts to be cleaned up. Still, it's wrong and it will bite somebody someday. Fix failure to copy cursorpos and internalpos. Utter the appropriate incantations involving recursion_depth, so that we'll behave sanely if we get an error inside pstrdup. (In general, the body of this function ought to act like, eg, errdetail().) Per code reading induced by Jakob Egger's report.
  • Put static forward declarations in elog.c back into same order as code. The guiding principle for the last few patches in this area apparently involved throwing darts. Cosmetic only, but back-patch to 9.6 because there is no reason for 9.6 and HEAD to diverge yet in this file.
  • Fix potential memory leakage from HandleParallelMessages(). HandleParallelMessages leaked memory into the caller's context. Since it's called from ProcessInterrupts, there is basically zero certainty as to what CurrentMemoryContext is, which means we could be leaking into long-lived contexts. Over the processing of many worker messages that would grow to be a problem. Things could be even worse than just a leak, if we happened to service the interrupt while ErrorContext is current: elog.c thinks it can reset that on its own whim, possibly yanking storage out from under HandleParallelMessages. Give HandleParallelMessages its own dedicated context instead, which we can reset during each call to ensure there's no accumulation of wasted memory. Discussion: <>
  • Add a nonlocalized version of the severity field to client error messages. This has been requested a few times, but the use-case for it was never entirely clear. The reason for adding it now is that transmission of error reports from parallel workers fails when NLS is active, because pq_parse_errornotice() wrongly assumes that the existing severity field is nonlocalized. There are other ways we could have fixed that, but the other options were basically kluges, whereas this way provides something that's at least arguably a useful feature along with the bug fix. Per report from Jakob Egger. Back-patch into 9.6, because otherwise parallel query is essentially unusable in non-English locales. The problem exists in 9.5 as well, but we don't want to risk changing on-the-wire behavior in 9.5 (even though the possibility of new error fields is specifically called out in the protocol document). It may be sufficient to leave the issue unfixed in 9.5, given the very limited usefulness of pq_parse_errornotice in that version. Discussion: <>
  • Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <>
  • Update 9.6 release notes through today.

Peter Eisentraut pushed:

Robert Haas pushed:

Bruce Momjian pushed:

Kevin Grittner pushed:

Heikki Linnakangas pushed:

  • Support OID system column in postgres_fdw. You can use ALTER FOREIGN TABLE SET WITH OIDS on a foreign table, but the oid column read out as zeros, because the postgres_fdw didn't know about it. Teach postgres_fdw how to fetch it. Etsuro Fujita, with an additional test case by me. Discussion: <>

Pending patches

Michaël Paquier sent in another revision of a patch to avoid crashes in pg_basebackup.

Michaël Paquier sent in a patch to clean up parts of the make system.

Heikki Linnakangas sent in a patch to fix a race condition in GetOldestActiveTransactionId().

Tom Lane sent in a patch to arrange better locale-specific-character-class handling for regexps.

Michaël Paquier sent in a patch to disallow \r and \n characters in database and role names.

Craig Ringer sent in another revision of a patch to add pipelining (batch) support for libpq.

Gabriele Bartolini, Michaël Paquier, and Simon Riggs traded patches to fix an issue where pg_receivexlog does not report flush position with --synchronous.

Thomas Munro sent in a patch to fix an error in Linux's fallocate() that could cause a bus error.

Amit Kapila and Thomas Munro traded patches to add dsm_unpin_segment().

Craig Ringer sent in two more revisions of a patch to add txid_status(bigint) for transaction traceability.

Heikki Linnakangas sent in another revision of a patch to add CSN-based snapshots.

Ashutosh Sharma sent in another revision of a patch to add some tests for hash index coverage.

Amit Kapila sent in two more revisions of a patch to add WAL logging for hash indexes.

Aleksander Alekseev and Heikki Linnakangas traded patches to fix the RBTree iteration interface.

Kuntal Ghosh sent in two more revisions of a patch to add a WAL consistency check facility.

Michaël Paquier sent in another revision of a patch to track wait events for latches.

Fabien COELHO sent in a patch to fix pgbench stats when using \sleep.

Stephen Frost sent in another revision of a patch to remove all superuser() checks from pgstattuple.

Masahiko Sawada sent in a WIP patch to add block level parallel vacuum.

Michaël Paquier sent in two more revisions of a patch to add LSN as a recovery target.

Pavel Stehule sent in two more revisions of a patch to add xmltable().

Martín Marqués and Michaël Paquier traded patches to fix some issues where pg_dump would dump objects created by extensions separately from the extensions themselves.

Tom Lane sent in a patch to wire into the planner the assumption that VALUES() provided are distinct.

Dagfinn Ilmari Mannsåker sent in another revision of a patch to add ALTER TYPE RENAME VALUE.

Andrew Gierth sent in a patch to fix an issue with some LATERAL queries.

Masahiko Sawada sent in another revision of a patch to make lazy_scan_heap faster.

Artur Zakirov sent in two more revisions of a patch to fix a bug in to_timestamp().

Petr Jelinek sent in another revision of a patch to implement logical replication.

Takayuki Tsunakawa sent in a patch to fix an issue that could cause cascaded replicas to stick.

Mithun Cy sent in another revision of a patch to implement failover at the libpq connect level.

Masahiko Sawada sent in a patch to support 2PC with foreign servers and use that infrastructure to update the PostgreSQL FDW.

Amit Langote sent in another revision of a patch to add declarative partitioning.

Dilip Kumar sent in another revision of a patch to fix a cache lookup failure.

Brandur sent in a patch to add SortSupport for Postgres' macaddr.

Tom Lane sent in a patch to add a nonlocalized error severity diagnostic.

Andreas Karlsson sent in a patch to remove OpenSSL < 0.9.8 on the grounds that it's no longer supported even by OpenSSL, and supporting it breaks newer versions.

Andres Freund sent in another revision of a patch to implement targetlist SRFs using ROWS FROM().

Joe Conway sent in another revision of a patch to add some RLS-related docs.

by N Bougain on Friday 2 September 2016 at 00:29

Tuesday 23 August 2016


PostgreSQL Weekly News - 21 August 2016

PostgreSQL-related job offers for August

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA license. The original version can be found at:

Submit your articles or announcements by Sunday at 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a), and in Spanish to pwn (a)

Applied patches

Tom Lane pushed:

  • Simplify the process of perltidy'ing our Perl files. Wrap the perltidy invocation into a shell script to reduce the risk of copy-and-paste errors. Include removal of *.bak files in the script, so they don't accidentally get committed. Improve the directions in the README file.
  • Final pgindent + perltidy run for 9.6.
  • Stamp HEAD as 10devel. This is a good bit more complicated than the average new-version stamping commit, because it includes various adjustments in pursuit of changing from three-part to two-part version numbers. It's likely some further work will be needed around that change; but this is enough to get through the regression tests, at least in Unix builds. Peter Eisentraut and Tom Lane
  • Stamp shared-library minor version numbers for v10.
  • Update git_changelog to know that there's a 9.6 branch. Missed this in the main 10devel version stamping patch.
  • Allow .so minor version numbers above 9 in .gitignore. Needed now that libpq's minor version has reached 10.
  • Doc: remove out-of-date claim that pg_am rows must be inserted by hand. Commit 473b93287 added a sentence about that, but neglected to remove the adjacent sentence it had falsified. Per Alexander Law.
  • Doc: copy-editing in create_access_method.sgml. Improve shaky English grammar. And markup.
  • Remove separate version numbering for ecpg preprocessor. Once upon a time, it made sense for the ecpg preprocessor to have its own version number, because it used a manually-maintained grammar that wasn't always in sync with the core grammar. But those days are thankfully long gone, leaving only a maintenance nuisance behind. Let's use the PG v10 version numbering changeover as an excuse to get rid of the ecpg version number and just have ecpg identify itself by PG_VERSION. From the user's standpoint, ecpg will go from "4.12" in the 9.6 branch to "10" in the 10 branch, so there's no failure of monotonicity. Discussion: <>
  • Automate the maintenance of SO_MINOR_VERSION for our shared libraries. Up to now we've manually adjusted these numbers in several different Makefiles at the start of each development cycle. While that's not much work, it's easily forgotten, so let's get rid of it by setting the SO_MINOR_VERSION values directly from $(MAJORVERSION). In the case of libpq, this dev cycle's value of SO_MINOR_VERSION happens to be "10" anyway, so this switch is transparent. For ecpg's shared libraries, this will result in skipping one or two minor version numbers between v9.6 and v10, which seems like no big problem; and it was a bit inconsistent that they didn't have equal minor version numbers anyway. Discussion: <>
  • Fix assorted places in psql to print version numbers >= 10 in new style. This is somewhat cosmetic, since as long as you know what you are looking at, "10.0" is a serviceable substitute for "10". But there is a potential for confusion between version numbers with minor numbers and those without --- we don't want people asking "why is psql saying 10.0 when my server is 10.2". Therefore, back-patch as far as practical, which turns out to be 9.3. I could have redone the patch to use fprintf(stderr) in place of psql_error(), but it seems more work than is warranted for branches that will be EOL or nearly so by the time v10 comes out. Although only psql seems to contain any code that needs this, I chose to put the support function into fe_utils, since it seems likely we'll need it in other client programs in future. (In 9.3-9.5, use dumputils.c, the predecessor of fe_utils/string_utils.c.) In HEAD, also fix the backend code that whines about loadable-library version mismatch. I don't see much need to back-patch that.
  • Suppress -Wunused-result warning for strtol(). I'm not sure which bozo thought it's a problem to use strtol() only for its endptr result, but silence the warning using same method used elsewhere. Report: <>
  • Improve parsetree representation of special functions such as CURRENT_DATE. We implement a dozen or so parameterless functions that the SQL standard defines special syntax for. Up to now, that was done by converting them into more or less ad-hoc constructs such as "'now'::text::date". That's messy for multiple reasons: it exposes what should be implementation details to users, and performance is worse than it needs to be in several cases. To improve matters, invent a new expression node type SQLValueFunction that can represent any of these parameterless functions. Bump catversion because this changes stored parsetrees for rules. Discussion: <>
  • Improve plpgsql's memory management to fix some function-lifespan leaks. In some cases, exiting out of a plpgsql statement due to an error, then catching the error in a surrounding exception block, led to leakage of temporary data the statement was working with, because we kept all such data in the function-lifespan SPI Proc context. Iterating such behavior many times within one function call thus led to noticeable memory bloat. To fix, create an additional memory context meant to have statement lifespan. Since many plpgsql statements, particularly the simpler/more common ones, don't need this, create it only on demand. Reset this context at the end of any statement that uses it, and arrange for exception cleanup to reset it too, thereby fixing the memory-leak issue. Allow a stack of such contexts to exist to handle cases where a compound statement needs statement-lifespan data that persists across calls of inner statements. While at it, clean up code and improve comments referring to the existing short-term memory context, which by plpgsql convention is the per-tuple context of the eval_econtext ExprContext. We now uniformly refer to that as the eval_mcontext, whereas the new statement-lifespan memory contexts are called stmt_mcontext. This change adds some context-creation overhead, but on the other hand it allows removal of some retail pfree's in favor of context resets. On balance it seems to be about a wash performance-wise. In principle this is a bug fix, but it seems too invasive for a back-patch, and the infrequency of complaints weighs against taking the risk in the back branches. So we'll fix it only in HEAD, at least for now. Tom Lane, reviewed by Pavel Stehule Discussion: <>
  • Fix -e option in contrib/intarray/bench/ As implemented, -e ran an EXPLAIN but then discarded the output, which certainly seems pointless. Make it print to stdout instead. It's been like that forever, so back-patch to all supported branches. Daniel Gustafsson, reviewed by Andreas Scherbaum. Patch: <>
  • Implement regexp_match(), a simplified alternative to regexp_matches(). regexp_match() is like regexp_matches(), but it disallows the 'g' flag and in consequence does not need to return a set. Instead, it returns a simple text array value, or NULL if there's no match. Previously people usually got that behavior with a sub-select, but this way is considerably more efficient. Documentation adjusted so that regexp_match() is presented first and then regexp_matches() is introduced as a more complicated version. This is a bit historically revisionist but seems pedagogically better. Still TODO: extend contrib/citext to support this function. Emre Hasegeli, reviewed by David Johnston Discussion: <>
  • Support the new regexp_match() function for citext. Emre Hasegeli Patch: <>
  • Improve psql's tab completion for ALTER EXTENSION foo UPDATE ... Offer a list of available versions for that extension. Formerly, since there was no special support for this, it triggered off the UPDATE keyword and offered a list of table names --- not too helpful. Jeff Janes, reviewed by Gerdan Santos Patch: <>
  • Improve psql's tab completion for \l. Offer a list of database names; formerly no help was offered. Ian Barwick, reviewed by Gerdan Santos Patch: <>
  • In plpgsql, don't try to convert int2vector or oidvector to expanded array. These types are storage-compatible with real arrays, but they don't support toasting, so of course they can't support expansion either. Per bug #14289 from Michael Overmeyer. Back-patch to 9.5 where expanded arrays were introduced. Report: <>
  • Update line count totals for psql help displays. As usual, we've been pretty awful about maintaining these counts. They're not all that critical, perhaps, but let's get them right at release time. Also fix 9.5, which I notice is just as bad. It's probably wrong further back, but the lack of --help=foo options before 9.5 makes it too painful to count.
  • Remove typedef celt from the regex library, along with macro NOCELT. The regex library used to have a notion of a "collating element" that was distinct from a "character", but Henry Spencer never actually implemented his planned support for multi-character collating elements, and the Tcl crew ripped out most of the stubs for that years ago. The only thing left that distinguished the "celt" typedef from the "chr" typedef was that "celt" was supposed to also be able to hold the not-a-character "NOCELT" value. However, NOCELT was not used anywhere after the MCCE stub removal changes, which means there's no need for celt to be different from chr. Removing the separate typedef simplifies matters and also removes a trap for the unwary, in that celt is signed while chr may not be, so comparisons could mean different things. There's no bug there today because we restrict CHR_MAX to be less than INT_MAX, but I think there may have been such bugs before we did that, and there could be again if anyone ever decides to fool with the range of chr. This patch also removes assorted unnecessary casts to "chr" of values that are already chrs. Many of these seem to be leftover from days when the code was compatible with pre-ANSI C.
  • Clean up another pre-ANSI-C-ism in regex code: get rid of pcolor typedef. pcolor was used to represent function arguments that are nominally of type color, but when using a pre-ANSI C compiler would be passed as the promoted integer type. We really don't need that anymore.
  • Speed up planner's scanning for parallel-query hazards. We need to scan the whole parse tree for parallel-unsafe functions. If there are none, we'll later need to determine whether particular subtrees contain any parallel-restricted functions. The previous coding retained no knowledge from the first scan, even though this is very wasteful in the common case where the query contains only parallel-safe functions. We can bypass all of the later scans by remembering that fact. This provides a small but measurable speed improvement when the case applies, and shouldn't cost anything when it doesn't. Patch by me, reviewed by Robert Haas Discussion: <>
  • Guard against parallel-restricted functions in VALUES expressions. Obvious brain fade in set_rel_consider_parallel(). Noticed it while adjusting the adjacent RTE_FUNCTION case. In 9.6, also make the code look more like what I just did in HEAD by removing the unnecessary function_rte_parallel_ok subroutine (it does nothing that expression_tree_walker wouldn't do).
  • Use LEFT JOINs in some system views in case referenced row doesn't exist. In particular, left join to pg_authid so that rows in pg_stat_activity don't disappear if the session's owning user has been dropped. Also convert a few joins to pg_database to left joins, in the same spirit, though that case might be harder to hit. We were doing this in other views already, so it was a bit inconsistent that these views didn't. Oskari Saarenmaa, with some further tweaking by me Discussion: <>
  • Allow empty queries in pgbench. This might have been too much of a foot-gun before 9.6, but with the new commands-end-at-semicolons parsing rule, the only way to get an empty query into a script is to explicitly write an extra ";". So we may as well allow the case. Fabien Coelho Patch: <alpine.DEB.2.20.1607090922170.3412@sto>
  • Make initdb's suggested "pg_ctl start" command line more reliable. The original coding here was not nearly careful enough about quoting special characters, and it didn't get corner cases right for constructing the pg_ctl path either. Use join_path_components() and appendShellString() to do it honestly, so that the string will more likely work if blindly copied-and-pasted. While at it, teach appendShellString() not to quote strings that clearly don't need it, so that the output from initdb doesn't become uglier than it was before in typical cases where quoting is not needed. Ryan Murphy, reviewed by Michael Paquier and myself Discussion: <>
  • initdb now needs to reference libpq include files in MSVC builds. Fallout from commit a00c58314. Per buildfarm.
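The regexp_match() entry in the list above can be illustrated with a short sketch, following the behavior described there (first match only, returned as a text array, NULL if there is no match):

```sql
-- regexp_match(): first match only, as a text array; NULL when nothing matches.
SELECT regexp_match('foobarbequebaz', 'bar.*que');  -- {barbeque}

-- regexp_matches() with the 'g' flag still returns a set, one row per match,
-- which previously required a sub-select to reduce to a single value.
SELECT regexp_matches('foobarbaz', 'b..', 'g');     -- two rows: {bar}, {baz}
```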

Robert Haas pushed:

  • Once again allow LWLocks to be used within DSM segments. Prior to commit 7882c3b0b95640e361f1533fe0f2d02e4e5d8610, it was possible to use LWLocks within DSM segments, but that commit broke this use case by switching from a doubly linked list to a circular linked list. Switch back, using a new bit of general infrastructure for maintaining lists of PGPROCs. Thomas Munro, reviewed by me.
  • Fix possible crash due to incorrect allocation context. Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 aimed to reduce leakage from tqueue.c, which is good. Unfortunately, by changing the memory context in which all of gather_readnext() executes, it also changed the context in which ExecShutdownGatherWorkers executes, which is not good, because that function eventually causes a call to ExecParallelRetrieveInstrumentation, which proceeds to allocate planstate->worker_instrument in a short-lived context, causing a crash. Rushabh Lathia, reviewed by Amit Kapila and by me.

Peter Eisentraut pushed:

  • Fix typos. From: Alexander Law <>
  • doc: Remove some confusion from pg_archivecleanup doc. From: Jeff Janes <>
  • Improve formatting of comments in plpgsql.h. This file had some unusual comment layout. Most of the comments introducing structs ended up to the right of the screen and following the start of the struct. Some comments for struct members ended up after the member definition. Fix that by moving comments consistently before what they are describing. Also add missing struct tags where missing so that it is easier to tell what the struct is.
  • doc: Speed up XSLT builds. The upstream XSLT stylesheets use some very general XPath expressions in some places that end up being very slow. We can optimize them with knowledge about the DocBook document structure and our particular use thereof. For example, when counting preceding chapters to get a number for the current chapter, we only need to count preceding sibling nodes (more or less) instead of searching through the entire node tree for chapter elements. This change attacks the slowest pieces as identified by xsltproc --profile. This makes the HTML build roughly 10 times faster, resulting in the new total build time being about the same as the old DSSSL-based build. Some of the non-HTML build targets (especially FO) will also benefit a bit, but they have not been specifically analyzed. With this, also remove the parameter, which was previously a hack to get the build to a manageable speed. Alexander Lakhin <>, with some additional tweaking by me
  • Remove obsolete replacement system() on darwin. Per comment in the file, this was fixed around OS X 10.2.

Bruce Momjian pushed:

Magnus Hagander pushed:

Andres Freund pushed:

  • Properly re-initialize replication slot shared memory upon creation. Slot creation did not clear all fields upon creation. After start the memory is zeroed, but when a physical replication slot was created in the shared memory of a previously existing logical slot, catalog_xmin would not be cleared. That in turn would prevent vacuum from doing its duties. To fix initialize all the fields. To make similar future bugs less likely, zero all of ReplicationSlotPersistentData, and re-order the rest of the initialization to be in struct member order. Analysis: Andrew Gierth. Reported-By: Author: Michael Paquier. Discussion: <>. Backpatch: 9.4, where replication slots were introduced
  • Fix deletion of speculatively inserted TOAST on conflict. INSERT .. ON CONFLICT runs a pre-check of the possible conflicting constraints before performing the actual speculative insertion. In case the inserted tuple included TOASTed columns the ON CONFLICT condition would be handled correctly in case the conflict was caught by the pre-check, but if two transactions entered the speculative insertion phase at the same time, one would have to re-try, and the code for aborting a speculative insertion did not handle deleting the speculatively inserted TOAST datums correctly. TOAST deletion would fail with "ERROR: attempted to delete invisible tuple" as we attempted to remove the TOAST tuples using simple_heap_delete which reasoned that the given tuples should not be visible to the command that wrote them. This commit updates the heap_abort_speculative() function which aborts the conflicting tuple to use itself, via toast_delete, for deleting associated TOAST datums. Like before, the inserted toast rows are not marked as being speculative. This commit also adds a isolationtester spec test, exercising the relevant code path. Unfortunately 9.5 cannot handle two waiting sessions, and thus cannot execute this test. Reported-By: Viren Negi, Oskari Saarenmaa. Author: Oskari Saarenmaa, edited a bit by me. Bug: #14150. Discussion: <>. Backpatch: 9.5, where ON CONFLICT was introduced
  • Add alternative output for ON CONFLICT toast isolation test. On some buildfarm animals the isolationtest added in 07ef0351 failed, as the order in which processes are run after unlocking is not guaranteed. Add an alternative output for that. Discussion: <> Backpatch: 9.6, like the test in the aforementioned commit

Heikki Linnakangas pushed:

  • Refactor sendAuthRequest. This way sendAuthRequest doesn't need to know the details of all the different authentication methods. This is in preparation for adding SCRAM authentication, which will add yet another authentication request message type, with different payload. Reviewed-By: Michael Paquier. Discussion: <>
  • Refactor RandomSalt to handle salts of different lengths. All we need is 4 bytes at the moment, for MD5 authentication. But in upcoming patches for SCRAM authentication, SCRAM will need a salt of a different length. It's less scary for the caller to pass the buffer length anyway, than to assume a certain-sized output buffer. Author: Michael Paquier. Discussion: <>

Álvaro Herrera pushed:

  • reorderbuffer: preserve errno while reporting error. Clobbering errno during cleanup after an error is an oft-repeated, easy to make mistake. Deal with it here as everywhere else, by saving it aside and restoring after cleanup, before ereport'ing. In passing, add a missing errcode declaration in another ereport() call in the same file, which I noticed while skimming the file looking for similar problems. Backpatch to 9.4, where this code was introduced.

Pending patches

Thomas Munro sent in another revision of a patch to add condition variables.

Haribabu Kommi sent in another revision of a patch to add a pg_hba_file_settings view.

Artur Zakirov sent in a patch to add a to_date_valid() function.

Peter Geoghegan sent in another revision of a patch to displace heap's root during tuplesort merge.

Ildar Musin sent in a patch to add index-only scans for expressions.

Anastasia Lubennikova sent in another revision of a patch to add covering + unique indexes.

Thomas Munro sent in two more revisions of a patch to add barriers.

David Steele sent in a patch to exclude additional directories in pg_basebackup.

Haribabu Kommi sent in a patch to add a function to control the number of parallel workers that can be generated for a parallel query, and a contrib extension to use same to consider system load while generating parallel workers.

Peter Eisentraut sent in another revision of a patch to set log_line_prefix and application name in test drivers.

Peter Eisentraut sent in another revision of a patch to add NEXT VALUE FOR.

Peter Eisentraut sent in a patch to run the select_parallel test by itself.

Jeff Janes sent in a patch to mention that hash indexes can be used in uniqueness constraints via EXCLUDE constraints.

Craig Ringer sent in a patch to add a function to get the 32-bit xid from a bigint extended xid-with-epoch.

Oskari Saarenmaa sent in two revisions of a patch to use pread and pwrite instead of lseek + write and read.

Artur Zakirov sent in another revision of a patch to add to_timestamp() format checking.

Claudio Freire sent in a patch to keep indexes sorted by heap physical location.

Andrew Borodin sent in another revision of a patch to optimize memmoves in gistplacetopage for fixed size updates.

Emre Hasegeli sent in a patch to add a regexp_match() returning text for citext.

Andres Freund sent in another revision of a patch to improve scalability of md.c for large relations and speed up PageIsVerified() for the all zeroes case.

Amit Langote and Robert Haas traded patches to fix the ALTER TABLE docs to mention that VALIDATE CONSTRAINT will fail if ONLY is specified and there are descendant tables.

Michaël Paquier sent in another revision of a patch to add checks for the case when malloc() returns NULL.

Michaël Paquier sent in a patch to move the md5 and ip implementations to src/common, where they're usable by more things.

Peter Eisentraut sent in a patch to allow forcing pg_basebackup to clean created directories on failure.

Bruce Momjian sent in two revisions of a patch to make pg_hba.conf case-insensitive.

Alexander Korotkov sent in a patch to cacheline align PGXACT.

Dagfinn Ilmari Mannsåker sent in a patch to add ALTER TYPE ... RENAME VALUE for enums.

Thomas Munro sent in a patch to add a new subsystem: Dynamic Shared Memory Areas.

Pavel Stehule sent in another revision of a patch to add xmltable().

Etsuro Fujita sent in a patch to push down more full joins in postgres_fdw.

Peter Eisentraut sent in a patch to make better use of existing enums in plpgsql.

Peter Geoghegan sent in a patch to fix a bug in abbreviated keys abort handling, that bug having been found with amcheck.

Adrien Nayrat and Michaël Paquier traded patches to add LSN as a recovery target.

Aleksander Alekseev sent in another revision of a patch to make PostgreSQL sanitizers-friendly and prevent information disclosure.

Haribabu Kommi sent in a POC patch based on ProcessUtilityHook to restrict the CREATE TRIGGER, CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, CREATE VIEW and CREATE POLICY commands from normal users.

Craig Ringer sent in two revisions of a patch to implement a txid_status(bigint) function to report the commit status of a transaction.

Petr Jelinek sent in another revision of a patch to implement logical replication in core.

Thomas Munro sent in another revision of a patch to implement dsm_unpin_segment().

by N Bougain on Tuesday 23 August 2016 at 22:37

Wednesday 17 August 2016


PostgreSQL Weekly News - 14 August 2016

Security updates 9.5.4, 9.4.9, 9.3.14, 9.2.18 and 9.1.23 have been released. Update as soon as possible!

PostgreSQL 9.6 Beta 4 is available. Get testing!

News from PostgreSQL-derived products

PostgreSQL-related job offers for August

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under a CC BY-NC-SA licence. The original version can be found at:

Submit your articles or announcements by Sunday at 15:00 (Pacific time). Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied patches

  • Fix TOAST access failure in RETURNING queries. Discussion of commit 3e2f3c2e4 exposed a problem that is of longer standing: since we don't detoast data while sticking it into a portal's holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we release the query's snapshot as soon as we're done loading the holdStore, later readout of the holdStore can do TOAST fetches against data that can no longer be seen by any of the session's live snapshots. This means that a concurrent VACUUM could remove the TOAST data before we can fetch it. Commit 3e2f3c2e4 exposed the problem by showing that sometimes we had *no* live snapshots while fetching TOAST data, but we'd be at risk anyway. I believe this code was all right when written, because our management of a session's exposed xmin was such that the TOAST references were safe until end of transaction. But that's no longer true now that we can advance or clear our PGXACT.xmin intra-transaction. To fix, copy the query's snapshot during FillPortalStore() and save it in the Portal; release it only when the portal is dropped. This essentially implements a policy that we must hold a relevant snapshot whenever we access potentially-toasted data. We had already come to that conclusion in other places, cf commits 08e261cbc94ce9a7 and ec543db77b6b72f2. I'd have liked to add a regression test case for this, but I didn't see a way to make one that's not unreasonably bloated; it seems to require returning a toasted value to the client, and those will be big. In passing, improve PortalRunUtility() so that it positively verifies that its ending PopActiveSnapshot() call will pop the expected snapshot, removing a rather shaky assumption about which utility commands might do their own PopActiveSnapshot(). There's no known bug here, but now that we're actively referencing the snapshot it's almost free to make this code a bit more bulletproof. 
We might want to consider back-patching something like this into older branches, but it would be prudent to let it prove itself more in HEAD beforehand. Discussion: <>
  • Fix crash when pg_get_viewdef_name_ext() is passed a non-view relation. Oversight in commit 976b24fb4. Andreas Seltenreich Report: <>
  • Fix misestimation of n_distinct for a nearly-unique column with many nulls. If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <>
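A sketch of the situation the commit describes (table and column names are made up): a column where every non-null value is unique but most rows are NULL.

```sql
-- 90% NULLs; the remaining values are all distinct
CREATE TABLE demo (x int);
INSERT INTO demo
  SELECT CASE WHEN i % 10 = 0 THEN i END FROM generate_series(1, 100000) i;
ANALYZE demo;

-- Before the fix, ANALYZE stored n_distinct = -1 (i.e. 100% of all rows);
-- with the fix it is discounted by null_frac, roughly -0.1 here.
SELECT null_frac, n_distinct FROM pg_stats
WHERE tablename = 'demo' AND attname = 'x';
```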
  • Release notes for 9.5.4, 9.4.9, 9.3.14, 9.2.18, 9.1.23.
  • Update 9.6 release notes through today.
  • Update 9.6 release notes through today.
  • Stamp 9.6beta4.
  • Doc: clarify description of CREATE/ALTER FUNCTION ... SET FROM CURRENT. Per discussion with David Johnston.
  • Doc: write some for adminpack. Previous contents of adminpack.sgml were rather far short of project norms. Not to mention being outright wrong about the signature of pg_file_read().
  • Fix busted Assert for CREATE MATVIEW ... WITH NO DATA. Commit 874fe3aea changed the command tag returned for CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA, but missed that there was code in spi.c that expected the command tag to always be "SELECT". Fortunately, the consequence was only an Assert failure, so this oversight should have no impact in production builds. Since this code path was evidently un-exercised, add a regression test. Per report from Shivam Saxena. Back-patch to 9.3, like the previous commit. Michael Paquier Report: <>
  • Trivial cosmetic cleanup in bloom/blutils.c. Don't spell "InvalidOid" as "0". Initialize method fields in the same order as amapi.h declares them (and every other AM handler initializes them).
  • Last-minute updates for release notes. Security: CVE-2016-5423, CVE-2016-5424
  • Fix two errors with nested CASE/WHEN constructs. ExecEvalCase() tried to save a cycle or two by passing &econtext->caseValue_isNull as the isNull argument to its sub-evaluation of the CASE value expression. If that subexpression itself contained a CASE, then *isNull was an alias for econtext->caseValue_isNull within the recursive call of ExecEvalCase(), leading to confusion about whether the inner call's caseValue was null or not. In the worst case this could lead to a core dump due to dereferencing a null pointer. Fix by not assigning to the global variable until control comes back from the subexpression. Also, avoid using the passed-in isNull pointer transiently for evaluation of WHEN expressions. (Either one of these changes would have been sufficient to fix the known misbehavior, but it's clear now that each of these choices was in itself dangerous coding practice and best avoided. There do not seem to be any similar hazards elsewhere in execQual.c.) Also, it was possible for inlining of a SQL function that implements the equality operator used for a CASE comparison to result in one CASE expression's CaseTestExpr node being inserted inside another CASE expression. This would certainly result in wrong answers since the improperly nested CaseTestExpr would be caused to return the inner CASE's comparison value not the outer's. If the CASE values were of different data types, a crash might result; moreover such situations could be abused to allow disclosure of portions of server memory. To fix, teach inline_function to check for "bare" CaseTestExpr nodes in the arguments of a function to be inlined, and avoid inlining if there are any. Heikki Linnakangas, Michael Paquier, Tom Lane Report: Report: <> Security: CVE-2016-5423
  • Fix inappropriate printing of never-measured times in EXPLAIN. EXPLAIN (ANALYZE, TIMING OFF) would print an elapsed time of zero for a trigger function, because no measurement has been taken but it printed the field anyway. This isn't what EXPLAIN does elsewhere, so suppress it. In the same vein, EXPLAIN (ANALYZE, BUFFERS) with non-text output format would print buffer I/O timing numbers even when no measurement has been taken because track_io_timing is off. That seems not per policy, either, so change it. Back-patch to 9.2 where these features were introduced. Maksim Milyutin Discussion: <>
  • Doc: clarify that DROP ... CASCADE is recursive. Apparently that's not obvious to everybody, so let's belabor the point. In passing, document that DROP POLICY has CASCADE/RESTRICT options (which it does, per gram.y) but they do nothing (I assume, anyway). Also update some long-obsolete commentary in gram.y. Discussion: <>
  • Add SQL-accessible functions for inspecting index AM properties. Per discussion, we should provide such functions to replace the lost ability to discover AM properties by inspecting pg_am (cf commit 65c5fcd35). The added functionality is also meant to displace any code that was looking directly at pg_index.indoption, since we'd rather not believe that the bit meanings in that field are part of any client API contract. As future-proofing, define the SQL API to not assume that properties that are currently AM-wide or index-wide will remain so unless they logically must be; instead, expose them only when inquiring about a specific index or even specific index column. Also provide the ability for an index AM to override the behavior. In passing, document pg_am.amtype, overlooked in commit 473b93287. Andrew Gierth, with kibitzing by me and others Discussion: <>
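The added functions are pg_indexam_has_property(), pg_index_has_property() and pg_index_column_has_property(); a quick sketch of their use (the index name is hypothetical):

```sql
-- AM-wide property, looked up by access method OID
SELECT pg_indexam_has_property(a.oid, 'can_order')
FROM pg_am a WHERE a.amname = 'btree';

-- Per-index and per-column properties ('my_index' is a placeholder name)
SELECT pg_index_has_property('my_index'::regclass, 'clusterable');
SELECT pg_index_column_has_property('my_index'::regclass, 1, 'orderable');
```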
  • Fix assorted bugs in contrib/bloom. In blinsert(), cope with the possibility that a page we pull from the notFullPage list is marked BLOOM_DELETED. This could happen if VACUUM recently marked it deleted but hasn't (yet) updated the metapage. We can re-use such a page safely, but we *must* reinitialize it so that it's no longer marked deleted. Fix blvacuum() so that it updates the notFullPage list even if it's going to update it to empty. The previous "optimization" of skipping the update seems pretty dubious, since it means that the next blinsert() will uselessly visit whatever pages we left in the list. Uniformly treat PageIsNew pages the same as deleted pages. This should allow proper recovery if a crash occurs just after relation extension. Properly use vacuum_delay_point, not assorted ad-hoc CHECK_FOR_INTERRUPTS calls, in the blvacuum() main loop. Fix broken tuple-counting logic: blvacuum.c counted the number of live index tuples over again in each scan, leading to VACUUM VERBOSE reporting some multiple of the actual number of surviving index tuples after any vacuum that removed any tuples (since they'd be counted in blvacuum, maybe more than once, and then again in blvacuumcleanup, without ever zeroing the counter). It's sufficient to count them in blvacuumcleanup. stats->estimated_count is a boolean, not a counter, and we don't want to set it true, so don't add tuple counts to it. Add a couple of Asserts that we don't overrun available space on a bloom page. I don't think there's any bug there today, but the way the FreeBlockNumberArray size calculation is set up is scarily fragile, and BloomPageGetFreeSpace isn't much better. The Asserts should help catch any future mistakes. Per investigation of a report from Jeff Janes. I think the first item above may explain his report; the other changes were things I noticed while casting about for an explanation. Report: <>
  • Remove bogus dependencies on NUMERIC_MAX_PRECISION. NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision and scale you can write in a numeric typmod. It might once have had something to do with the allowed range of a typmod-less numeric value, but at least since 9.1 we've allowed, and documented that we allowed, any value that would physically fit in the numeric storage format; which is something over 100000 decimal digits, not 1000. Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit on the allowed range of the exponent in scientific-format input. That was especially silly in view of the fact that you can enter larger numbers as long as you don't use 'e' to do it. Just constrain the value enough to avoid localized overflow, and let make_result be the final arbiter of what is too large. Likewise adjust ecpg's equivalent of this code. Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the number of base-NBASE digits it would accept. That created a dump/restore hazard for binary COPY without doing anything useful; the wire-format limit on number of digits (65535) is about as tight as we would want. In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what the numeric range limit is. That code doesn't exist in the back branches. Per gripe from Aravind Kumar. Back-patch to all supported branches, since they all contain the documentation claim about allowed range of NUMERIC (cf commit cabf5d84b). Discussion: <>

Peter Eisentraut pushed:

Bruce Momjian pushed:

Noah Misch pushed:

  • Promote pg_dumpall shell/connstr quoting functions to src/fe_utils. Rename these newly-extern functions with terms more typical of their new neighbors. No functional changes; a subsequent commit will use them in more places. Back-patch to 9.1 (all supported versions). Back branches lack src/fe_utils, so instead rename the functions in place; the subsequent commit will copy them into the other programs using them. Security: CVE-2016-5424
  • Sort out paired double quotes in \connect, \password and \crosstabview. In arguments, these meta-commands wrongly treated each pair as closing the double quoted string. Make the behavior match the documentation. This is a compatibility break, but I more expect to find software with untested reliance on the documented behavior than software reliant on today's behavior. Back-patch to 9.1 (all supported versions). Reviewed by Tom Lane and Peter Eisentraut. Security: CVE-2016-5424
  • Reject, in pg_dumpall, names containing CR or LF. These characters prematurely terminate Windows shell command processing, causing the shell to execute a prefix of the intended command. The chief alternative to rejecting these characters was to bypass the Windows shell with CreateProcess(), but the ability to use such names has little value. Back-patch to 9.1 (all supported versions). This change formally revokes support for these characters in database names and roles names. Don't document this; the error message is self-explanatory, and too few users would benefit. A future major release may forbid creation of databases and roles so named. For now, check only at known weak points in pg_dumpall. Future commits will, without notice, reject affected names from other frontend programs. Also extend the restriction to pg_dumpall --dbname=CONNSTR arguments and --file arguments. Unlike the effects on role name arguments and database names, this does not reflect a broad policy change. A migration to CreateProcess() could lift these two restrictions. Reviewed by Peter Eisentraut. Security: CVE-2016-5424
  • Field conninfo strings throughout src/bin/scripts. These programs nominally accepted conninfo strings, but they would proceed to use the original dbname parameter as though it were an unadorned database name. This caused "reindexdb dbname=foo" to issue an SQL command that always failed, and other programs printed a conninfo string in error messages that purported to print a database name. Fix both problems by using PQdb() to retrieve actual database names. Continue to print the full conninfo string when reporting a connection failure. It is informative there, and if the database name is the sole problem, the server-side error message will include the name. Beyond those user-visible fixes, this allows a subsequent commit to synthesize and use conninfo strings without that implementation detail leaking into messages. As a side effect, the "vacuuming database" message now appears after, not before, the connection attempt. Back-patch to 9.1 (all supported versions). Reviewed by Michael Paquier and Peter Eisentraut. Security: CVE-2016-5424
  • Fix Windows shell argument quoting. The incorrect quoting may have permitted arbitrary command execution. At a minimum, it gave broader control over the command line to actors supposed to have control over a single argument. Back-patch to 9.1 (all supported versions). Security: CVE-2016-5424
  • Obstruct shell, SQL, and conninfo injection via database and role names. Due to simplistic quoting and confusion of database names with conninfo strings, roles with the CREATEDB or CREATEROLE option could escalate to superuser privileges when a superuser next ran certain maintenance commands. The new coding rule for PQconnectdbParams() calls, documented at conninfo_array_parse(), is to pass expand_dbname=true and wrap literal database names in a trivial connection string. Escape zero-length values in appendConnStrVal(). Back-patch to 9.1 (all supported versions). Nathan Bossart, Michael Paquier, and Noah Misch. Reviewed by Peter Eisentraut. Reported by Nathan Bossart. Security: CVE-2016-5424
  • Introduce a psql "\connect -reuse-previous=on|off" option. The decision to reuse values of parameters from a previous connection has been based on whether the new target is a conninfo string. Add this means of overriding that default. This feature arose as one component of a fix for security vulnerabilities in pg_dump, pg_dumpall, and pg_upgrade, so back-patch to 9.1 (all supported versions). In 9.3 and later, comment paragraphs that required update had already-incorrect claims about behavior when no connection is open; fix those problems. Security: CVE-2016-5424

Álvaro Herrera pushed:

Simon Riggs pushed:

Pending patches

Thomas Munro sent in a patch to implement dsm_unpin_segment.

Peter Eisentraut sent in a patch to set log_line_prefix and application name in test drivers for pg_regress.

David G. Johnston sent in a patch to make comprehension of the code in "TS_phrase_execute" a bit easier.

Amit Langote sent in another revision of the patch set to implement declarative table partitioning.

Robert Haas sent in a patch to run pgindent before the branch.

Tom Lane sent in a patch to allow GIN array_ops to work on anyarray.

Michaël Paquier sent in a patch to forbid use of LF and CR characters in database and role names.

Robert Haas sent in a patch to introduce condition variables.

Craig Ringer sent in a patch to give a more helpful hint as to when psql's \copy or similar trickery might be needed, as opposed to COPY.

Martín Marqués sent in a patch to add a test which will fail if it finds tables created by EXTENSIONs in pg_dump output.

Thomas Munro sent in a patch to create jumbo DSM segments, producing a SIGBUS if overallocated, which is not the right error.

Rushabh Lathia sent in a patch to fix a crash caused by a run of TPC-H.

Andreas Scherbaum sent in another revision of a patch to add to_date_valid().

Thomas Munro sent in a patch to add "barriers" to aid parallel computation.

Ryan Murphy sent in two revisions of a patch to make it possible for initdb to work with paths that contain spaces.

by N Bougain on Wednesday 17 August 2016 at 23:03

Saturday 13 August 2016

Adrien Nayrat

PostgreSQL: Deferring constraint checking

Deferring constraint checking

Note: this article was written as part of my work at Dalibo.

Postgres follows the ACID model and thus guarantees database consistency: a transaction takes the database from one stable state to another.

The data in the various tables are not independent; they obey semantic rules established when the conceptual data model was designed. The main purpose of integrity constraints is to guarantee the consistency of the data with one another, that is, to ensure that they respect those semantic rules. If an insert, update or delete violates these rules, the operation is simply rolled back.

The engine checks constraints on every modification (when constraints have been defined). It is also possible to defer constraint checking to the end of the transaction, at commit time. The checks are then performed only on the net changes left by the transaction's DELETE, UPDATE and INSERT operations.

Example:

 CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 text);
 CREATE TABLE t2 (c1 INT REFERENCES t1(c1), c2 text);

 INSERT INTO t2 (c1, c2) VALUES (3, 'c');
 ERROR:  insert or update on table "t2" violates foreign key constraint "t2_c1_fkey"
 DETAIL:  Key (c1)=(3) is not present in table "t1".

The row inserted into t2 must satisfy the referential integrity constraint, and table t1 contains no row where c1=3. Let's insert a valid row:


 INSERT INTO t2 (c1, c2) VALUES (1, 'a');

 SELECT * FROM t1;
  c1 | c2
 ----+----
   1 | a
   2 | b
 (2 rows)

 SELECT * FROM t2;
  c1 | c2
 ----+----
   1 | a
 (1 row)

What happens if we try to modify t1's primary key?

 UPDATE t1 SET c1=3 WHERE c1=1;
 ERROR:  update or delete on table "t1" violates foreign key constraint "t2_c1_fkey" on table "t2"
 DETAIL:  Key (c1)=(1) is still referenced from table "t2".

The constraint is checked during the UPDATE and raises an error. You can ask the engine to perform constraint checking at the end of the transaction with SET CONSTRAINTS ALL DEFERRED;. Note also that the ALL keyword can be replaced by the name of a constraint (provided it is named and declared DEFERRABLE).
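For a single named constraint, the statement would look like this (assuming the constraint has been declared DEFERRABLE):

```sql
BEGIN;
SET CONSTRAINTS t2_c1_fkey DEFERRED;
-- statements whose foreign key checks are now postponed until COMMIT
COMMIT;
```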

 SET CONSTRAINTS ALL DEFERRED;
 UPDATE t1 SET c1=3 WHERE c1=1;
 ERROR:  update or delete on table "t1" violates foreign key constraint "t2_c1_fkey" on table "t2"
 DETAIL:  Key (c1)=(1) is still referenced from table "t2".

It still doesn't work: we must specify that enforcement of the constraint may be deferred, using the DEFERRABLE keyword.
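The DDL is elided in this copy of the post; one plausible way to make the constraint deferrable is to recreate it with the DEFERRABLE keyword:

```sql
ALTER TABLE t2 DROP CONSTRAINT t2_c1_fkey;
ALTER TABLE t2 ADD CONSTRAINT t2_c1_fkey
  FOREIGN KEY (c1) REFERENCES t1 (c1) DEFERRABLE;
```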


 BEGIN;
 SET CONSTRAINTS ALL DEFERRED;
 UPDATE t1 SET c1=3 WHERE c1=1;
 UPDATE t2 SET c1=3 WHERE c1=1;
 COMMIT;

 SELECT * FROM t1;
  c1 | c2
 ----+----
   2 | b
   3 | a
 (2 rows)

 SELECT * FROM t2;
  c1 | c2
 ----+----
   3 | a
 (1 row)

In this case the engine performs the check at the end of the transaction.

Another advantage: if a row is deleted and reinserted within the same transaction, the checks on that row are not executed (they would be pointless).

Example: we empty the tables, then insert one million rows.

TRUNCATE t1 cascade;

EXPLAIN ANALYSE INSERT INTO t1 (c1,c2) (SELECT *, md5(i::text) FROM (SELECT * FROM generate_series(1,1000000)) i);

We then insert one million rows into t2, delete them, and reinsert them (without telling the engine to defer constraint checking).


EXPLAIN analyse  INSERT INTO T2 (c1,c2) (SELECT  *,md5(i::text) FROM (SELECT * FROM generate_series(1,1000000)) i);

 Insert on t2  (cost=0.00..17.50 rows=1000 width=36) (actual time=3451.308..3451.308 rows=0 loops=1)
 ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=36) (actual time=170.218..1882.406 rows=1000000 loops=1)
 Planning time: 0.054 ms
 Trigger for constraint t2_c1_fkey: time=16097.543 calls=1000000
 Execution time: 19654.595 ms
 (5 rows)

Time: 19654.971 ms

DELETE FROM t2 WHERE c1 <= 1000000;
 DELETE 1000000
 Time: 2088.318 ms

EXPLAIN analyse  INSERT INTO T2 (c1,c2) (SELECT  *,md5(i::text) FROM (SELECT * FROM generate_series(1,1000000)) i);
 Insert on t2  (cost=0.00..17.50 rows=1000 width=36) (actual time=3859.265..3859.265 rows=0 loops=1)
 ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=36) (actual time=169.826..1845.421 rows=1000000 loops=1)
 Planning time: 0.054 ms
 Trigger for constraint t2_c1_fkey: time=14600.258 calls=1000000
 Execution time: 18558.108 ms
 (5 rows)

Time: 18558.386 ms
 Time: 8.563 ms

The engine performs the checks on each insertion (about 18 seconds per insert).

Let's run the same operation while deferring the constraint checks:

 Time: 0.130 ms


EXPLAIN analyse  INSERT INTO T2 (c1,c2) (SELECT  *,md5(i::text) FROM (SELECT * FROM generate_series(1,1000000)) i);

 Insert on t2  (cost=0.00..17.50 rows=1000 width=36) (actual time=3241.172..3241.172 rows=0 loops=1)
 ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=36) (actual time=169.770..1831.893 rows=1000000 loops=1)
 Planning time: 0.096 ms
 Execution time: 3269.624 ms
 (4 rows)

Time: 3270.018 ms

DELETE FROM t2 WHERE c1 <= 1000000;
 DELETE 2000000
 Time: 2932.070 ms

EXPLAIN analyse  INSERT INTO T2 (c1,c2) (SELECT  *,md5(i::text) FROM (SELECT * FROM generate_series(1,1000000)) i);
 Insert on t2  (cost=0.00..17.50 rows=1000 width=36) (actual time=3181.294..3181.294 rows=0 loops=1)
 ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=36) (actual time=170.137..1811.889 rows=1000000 loops=1)
 Planning time: 0.055 ms
 Execution time: 3209.712 ms
 (4 rows)

Time: 3210.067 ms

 Time: 16630.155 ms

The inserts are faster, but the commit takes longer because that is when the engine checks the constraints. In the end, the engine performs a single check at the end of the transaction (commit), so the overall operation is faster.

The constraint can also be created with the attribute DEFERRABLE INITIALLY DEFERRED, which makes SET CONSTRAINTS ALL DEFERRED unnecessary.
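A minimal sketch of a table whose foreign key is checked at commit time by default, with no SET CONSTRAINTS needed:

```sql
CREATE TABLE t2 (
  c1 INT REFERENCES t1(c1) DEFERRABLE INITIALLY DEFERRED,
  c2 text
);
```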

If the modified records violate the constraints, the transaction is rolled back at commit time:

 c1 | c2
 1 | un
 (1 row)

 SET constraints ALL deferred ;
 anayrat=# INSERT INTO t2 VALUES ('2','un');
 anayrat=# commit;
 DETAIL:  Key (c1)=(2) is not present in table "t1".

On the other hand, disabling constraints can cause real problems if you later want to re-enable them and the data no longer allows it (the constraint is actually violated). This approach remains possible within a transaction, but it takes an exclusive lock on the modified table for the whole transaction, which can cause serious performance problems.
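Disabling a foreign key amounts to disabling its implementation triggers, which takes an ACCESS EXCLUSIVE lock on the table (a sketch; disabling system triggers requires superuser privileges):

```sql
BEGIN;
ALTER TABLE t2 DISABLE TRIGGER ALL;  -- ACCESS EXCLUSIVE lock held until COMMIT
-- bulk modifications here, with no constraint checking
ALTER TABLE t2 ENABLE TRIGGER ALL;   -- does NOT re-check existing rows
COMMIT;
```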

A constraint can also be declared NOT VALID. Creating it is then almost immediate: the existing rows are not validated, but every row inserted or updated afterwards is checked against the constraint.

You can later ask the engine to check the constraint for all existing records with the VALIDATE CONSTRAINT statement. This statement used to take an exclusive lock on the table. Starting with version 9.4, the lock is lighter: SHARE UPDATE EXCLUSIVE on the modified table and, if the constraint is a foreign key, ROW SHARE on the referenced table.
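A sketch of the two steps, assuming the constraint from the earlier examples:

```sql
ALTER TABLE t2 ADD CONSTRAINT t2_c1_fkey
  FOREIGN KEY (c1) REFERENCES t1(c1) NOT VALID;  -- existing rows are not checked

ALTER TABLE t2 VALIDATE CONSTRAINT t2_c1_fkey;   -- scans t2 and checks every row
```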

 c1 | c2
 1 | un
 (1 row)

 c1 | c2
 (0 rows)

INSERT INTO t2 VALUES (2,'deux');
 c1 |  c2
 2 | deux
 (1 row)


Table t2 now contains a record that violates the constraint t2_c1_fkey. No error is raised, because checking only happens on new modifications:

INSERT INTO t2 VALUES (3,'trois');
 DETAIL:  Key (c1)=(3) is not present in table "t1".

Likewise, an error is indeed raised when the constraint is validated:

 DETAIL:  Key (c1)=(2) is not present in table "t1".

After deleting the offending record, the constraints can be validated:



by Adrien Nayrat, Saturday 13 August 2016 at 20:42

Tuesday 9 August 2016


PostgreSQL Weekly News - 7 August 2016

PGConf.ASIA 2016 will take place on 2 & 3 December in Tokyo. Send your proposals to pgconf-asia-2016-submission AT pgconf DOT asia (submission deadline: 31 August at midnight):

PostgreSQL Product News

PostgreSQL Jobs for August

PostgreSQL Local

  • PostgresOpen 2016 will be held in Dallas (Texas, USA), 13-16 September. The call for papers has been issued:
  • PostgreSQL Session on 22 September 2016 in Lyon (France). The submission deadline is 20 May: send your proposals to call-for-paper AT postgresql-sessions DOT org
  • PostgreSQL Conference Europe 2016 will take place in Tallinn, Estonia, 1-4 November 2016. Early-bird registration is open until 14 September:
  • PgConf Silicon Valley 2016 will take place 14-16 November 2016:
  • CHAR(16) will be held in New York on 6 December 2016. The call for papers runs until 13 September, midnight (EDT):

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under the CC BY-NC-SA license. The original version can be found at:

Submit your articles or announcements before Sunday 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a), and in Spanish to pwn (a)

Applied Patches

Fujii Masao pushed:

  • Fix pg_basebackup so that it accepts 0 as a valid compression level. The help message for pg_basebackup specifies that the numbers 0 through 9 are accepted as valid values of -Z option. But, previously -Z 0 was rejected as an invalid compression level. Per discussion, it's better to make pg_basebackup treat 0 as valid compression level meaning no compression, like pg_dump. Back-patch to all supported versions. Reported-By: Jeff Janes Reviewed-By: Amit Kapila Discussion:
  • Remove unused arguments from pg_replication_origin_xact_reset function. The document specifies that pg_replication_origin_xact_reset function doesn't have any argument variables. But previously it was actually defined so as to have two argument variables, though they were not used at all. That is, the pg_proc entry for that function was incorrect. This patch fixes the pg_proc entry and removes those two arguments from the function definition. No back-patch because this change needs a catalog version bump although the issue exists in 9.5 as well. Instead, a note about those unused argument variables will be added to 9.5 document later. Catalog version bumped due to the change of pg_proc.

Michael Meskes pushed:

Bruce Momjian pushed:

Tom Lane pushed:

  • Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch. This coding pattern creates a race condition, because if an interesting interrupt happens after we've checked InterruptPending but before we reset our latch, the latch-setting done by the signal handler would get lost, and then we might block at WaitLatch in the next iteration without ever noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, but not between them. Aside from fixing the bugs, add some explanatory comments to latch.h to perhaps forestall the next person from making the same mistake. In HEAD, also replace gather_readnext's direct call of HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean or useful for this one caller to bypass ProcessInterrupts and go straight to HandleParallelMessages; not least because that fails to consider the InterruptPending flag, resulting in useless work both here (if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call (if it is). This thinko seems to have been introduced in the initial coding of storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all the subsequent parallel-query support logic. Back-patch relevant hunks to 9.4 to extirpate the error everywhere. Discussion: <>
  • Minor cleanup for access/transam/parallel.c. ParallelMessagePending *must* be marked volatile, because it's set by a signal handler. On the other hand, it's pointless for HandleParallelMessageInterrupt to save/restore errno; that must be, and is, done at the outer level of the SIGUSR1 signal handler. Calling CHECK_FOR_INTERRUPTS() inside HandleParallelMessages, which itself is called from CHECK_FOR_INTERRUPTS(), seems both useless and hazardous. The comment claiming that this is needed to handle the error queue going away is certainly misguided, in any case. Improve a couple of error message texts, and use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to report loss of parallel worker connection, since that's what's used in e.g. tqueue.c. (Maybe it would be worth inventing a dedicated ERRCODE for this type of failure? But I do not think ERRCODE_INTERNAL_ERROR is appropriate.) Minor stylistic cleanups.
  • Fix pg_dump's handling of public schema with both -c and -C options. Since -c plus -C requests dropping and recreating the target database as a whole, not dropping individual objects in it, we should assume that the public schema already exists and need not be created. The previous coding considered only the state of the -c option, so it would emit "CREATE SCHEMA public" anyway, leading to an unexpected error in restore. Back-patch to 9.2. Older versions did not accept -c with -C so the issue doesn't arise there. (The logic being patched here dates to 8.0, cf commit 2193121fa, so it's not really wrong that it didn't consider the case at the time.) Note that versions before 9.6 will still attempt to emit REVOKE/GRANT on the public schema; but that happens without -c/-C too, and doesn't seem to be the focus of this complaint. I considered extending this stanza to also skip the public schema's ACL, but that would be a misfeature, as it'd break cases where users intentionally changed that ACL. The real fix for this aspect is Stephen Frost's work to not dump built-in ACLs, and that's not going to get back-ported. Per bugs #13804 and #14271. Solution found by David Johnston and later rediscovered by me. Report: <> Report: <>
  • Block interrupts during HandleParallelMessages(). As noted by Alvaro, there are CHECK_FOR_INTERRUPTS() calls in the shm_mq.c functions called by HandleParallelMessages(). I believe they're all unreachable since we always pass nowait = true, but it doesn't seem like a great idea to assume that no such call will ever be reachable from HandleParallelMessages(). If that did happen, there would be a risk of a recursive call to HandleParallelMessages(), which it does not appear to be designed for --- for example, there's nothing that would prevent out-of-order processing of received messages. And certainly such cases cannot easily be tested. So let's prevent it by holding off interrupts for the duration of the function. Back-patch to 9.5 which contains identical code. Discussion: <>
  • Remove duplicate InitPostmasterChild() call while starting a bgworker. This is apparently harmless on Windows, but on Unix it results in an assertion failure. We'd not noticed because this code doesn't get used on Unix unless you build with -DEXEC_BACKEND. Bug was evidently introduced by sloppy refactoring in commit 31c453165. Thomas Munro Discussion: <>
  • Do not let PostmasterContext survive into background workers. We don't want postmaster child processes to contain a copy of the postmaster's PostmasterContext. That would be a waste of memory at least, and at worst a security issue, since there are copies of the semi-sensitive pg_hba and pg_ident data in there. All other child process types delete the PostmasterContext after forking, but the original coding of the background worker patch (commit da07a1e85) did not do so. It appears that the only reason for that was to avoid copying the bgworker's MyBgworkerEntry out of that context; but the couple of additional statements needed to do so are hardly good justification for it. Hence, copy that data and then clear the context as other child processes do. Because this patch changes the memory context in which a bgworker function gains control, back-patching it would be a bit risky, so we won't fix this in back branches. The "security" complaint is pretty thin anyway for generic bgworkers; only with the introduction of parallel query is there any question of running untrusted code in a bgworker process. Discussion: <>
  • Make INSERT-from-multiple-VALUES-rows handle targetlist indirection better. Previously, if an INSERT with multiple rows of VALUES had indirection (array subscripting or field selection) in its target-columns list, the parser handled that by applying transformAssignedExpr() to each element of each VALUES row independently. This led to having ArrayRef assignment nodes or FieldStore nodes in each row of the VALUES RTE. That works for simple cases, but in bug #14265 Nuri Boardman points out that it fails if there are multiple assignments to elements/fields of the same target column. For such cases to work, rewriteTargetListIU() has to nest the ArrayRefs or FieldStores together to produce a single expression to be assigned to the column. But it failed to find them in the top-level targetlist and issued an error about "multiple assignments to same column". We could possibly fix this by teaching the rewriter to apply rewriteTargetListIU to each VALUES row separately, but that would be messy (it would change the output rowtype of the VALUES RTE, for example) and inefficient. Instead, let's fix the parser so that the VALUES RTE outputs are just the user-specified values, cast to the right type if necessary, and then the ArrayRefs or FieldStores are applied in the top-level targetlist to Vars representing the RTE's outputs. This is the same parsetree representation already used for similar cases with INSERT/SELECT syntax, so it allows simplifications in ruleutils.c, which no longer needs to treat INSERT-from-multiple-VALUES as its own special case. This implementation works by applying transformAssignedExpr to the VALUES entries as before, and then stripping off any ArrayRefs or FieldStores it adds. With lots of VALUES rows it would be noticeably more efficient to not add those nodes in the first place. But that's just an optimization not a bug fix, and there doesn't seem to be any good way to do it without significant refactoring. 
(A non-invasive answer would be to apply transformAssignedExpr + stripping to just the first VALUES row, and then just forcibly cast remaining rows to the same data types exposed in the first row. But this way would lead to different, not-INSERT-specific errors being reported in casting failure cases, so it doesn't seem very nice.) So leave that for later; this patch at least isn't making the per-row parsing work worse, and it does make the finished parsetree smaller, saving rewriter and planner work. Catversion bump because stored rules containing such INSERTs would need to change. Because of that, no back-patch, even though this is a very long-standing bug. Report: <> Discussion: <>
  • Fix bogus coding in WaitForBackgroundWorkerShutdown(). Some conditions resulted in "return" directly out of a PG_TRY block, which left the exception stack dangling, and to add insult to injury failed to restore the state of set_latch_on_sigusr1. This is a bug only in 9.5; in HEAD it was accidentally fixed by commit db0f6cad4, which removed the surrounding PG_TRY block. However, I (tgl) chose to apply the patch to HEAD as well, because the old coding was gratuitously different from WaitForBackgroundWorkerStartup(), and there would indeed have been no bug if it were done like that to start with. Dmitry Ivanov Discussion: <1637882.WfYN5gPf1A@abook>
  • Update time zone data files to tzdata release 2016f. DST law changes in Kemerovo and Novosibirsk. Historical corrections for Azerbaijan, Belarus, and Morocco. Asia/Novokuznetsk and Asia/Novosibirsk now use numeric time zone abbreviations instead of invented ones. Zones for Antarctic bases and other locations that have been uninhabited for portions of the time span known to the tzdata database now report "-00" rather than "zzz" as the zone abbreviation for those time spans. Also, I decided to remove some of the timezone/data/ files that we don't use. At one time that subdirectory was a complete copy of what IANA distributes in the tzdata tarballs, but that hasn't been true for a long time. There seems no good reason to keep shipping those specific files but not others; they're just bloating our tarballs.
  • Re-pgindent tsvector_op.c. Messed up by recent commits --- this is annoying me while trying to fix some bugs here.
  • Fix ts_delete(tsvector, text[]) to cope with duplicate array entries. Such cases either failed an Assert, or produced a corrupt tsvector in non-Assert builds, as reported by Andreas Seltenreich. The reason is that tsvector_delete_by_indices() just assumed that its input array had no duplicates. Fix by explicitly de-duping. In passing, improve some comments, and fix a number of tests for null values to use ERRCODE_NULL_VALUE_NOT_ALLOWED not ERRCODE_INVALID_PARAMETER_VALUE. Discussion: <>
  • Make array_to_tsvector() sort and de-duplicate the given strings. This is required for the result to be a legal tsvector value. Noted while fooling with Andreas Seltenreich's ts_delete() crash. Discussion: <>
  • Fix copy-and-pasteo in 81c766b3fd41c78c634d78ebae8d316808dfc630. Report: <>
  • Teach libpq to decode server version correctly from future servers. Beginning with the next development cycle, PG servers will report two-part not three-part version numbers. Fix libpq so that it will compute the correct numeric representation of such server versions for reporting by PQserverVersion(). It's desirable to get this into the field and back-patched ASAP, so that older clients are more likely to understand the new server version numbering by the time any such servers are in the wild. (The results with an old client would probably not be catastrophic anyway for a released server; for example "10.1" would be interpreted as 100100 which would be wrong in detail but would not likely cause an old client to misbehave badly. But "10devel" or "10beta1" would result in sversion==0 which at best would result in disabling all use of modern features.) Extracted from a patch by Peter Eisentraut; comments added by me Patch: <>
  • In B-tree page deletion, clean up properly after page deletion failure. In _bt_unlink_halfdead_page(), we might fail to find an immediate left sibling of the target page, perhaps because of corruption of the page sibling links. The code intends to cope with this by just abandoning the deletion attempt; but what actually happens is that it fails outright due to releasing the same buffer lock twice. (And error recovery masks a second problem, which is possible leakage of a pin on another page.) Seems to have been introduced by careless refactoring in commit efada2b8e. Since there are multiple cases to consider, let's make releasing the buffer lock in the failure case the responsibility of _bt_unlink_halfdead_page() not its caller. Also, avoid fetching the leaf page's left-link again after we've dropped lock on the page. This is probably harmless, but it's not exactly good coding practice. Per report from Kyotaro Horiguchi. Back-patch to 9.4 where the faulty code was introduced. Discussion: <>
  • First-draft release notes for 9.5.4. As usual, the release notes for other branches will be made by cutting these down, but put them up for community review first.
  • Don't propagate a null subtransaction snapshot up to parent transaction. This oversight could cause logical decoding to fail to decode an outer transaction containing changes, if a subtransaction had an XID but no actual changes. Per bug #14279 from Marko Tiikkaja. Patch by Marko based on analysis by Andrew Gierth. Discussion: <>
  • Avoid crashing in GetOldestSnapshot() if there are no known snapshots. The sole caller expects NULL to be returned in such a case, so make it so and document it. Per reports from Andreas Seltenreich and Regina Obe. This doesn't really fix their problem, as now their RETURNING queries will say "ERROR: no known snapshots", but in any case this function should not dump core in a reasonably-foreseeable situation. Report: <> Report: <>

Peter Eisentraut pushed:

Kevin Grittner pushed:

Álvaro Herrera pushed:

  • Fix assorted problems in recovery tests. In test 001_stream_rep we're using pg_stat_replication.write_location to determine catch-up status, but we care about xlog having been applied not just received, so change that to apply_location. In test 003_recovery_targets, we query the database for a recovery target specification and later for the xlog position supposedly corresponding to that recovery specification. If for whatever reason more WAL is written between the two queries, the recovery specification is earlier than the xlog position used by the query in the test harness, so we wait forever, leading to test failures. Deal with this by using a single query to extract both items. In 2a0f89cd717 we tried to deal with it by giving them more tests to run, but in hindsight that was obviously doomed to failure (no revert of that, though). Per hamster buildfarm failures. Author: Michaël Paquier

Robert Haas pushed:

  • Prevent "snapshot too old" from trying to return pruned TOAST tuples. Previously, we tested for MVCC snapshots to see whether they were too old, but not TOAST snapshots, which can lead to complaints about missing TOAST chunks if those chunks are subject to early pruning. Ideally, the threshold lsn and timestamp for a TOAST snapshot would be that of the corresponding MVCC snapshot, but since we have no way of deciding which MVCC snapshot was used to fetch the TOAST pointer, use the oldest active or registered snapshot instead. Reported by Andres Freund, who also sketched out what the fix should look like. Patch by me, reviewed by Amit Kapila.
  • Change InitToastSnapshot to a macro. tqual.h is included in some front-end compiles, and a static inline breaks on buildfarm member castoroides. Since the macro is never referenced, it should dodge that problem, although this doesn't seem like the cleanest way of hiding things from front-end compiles. Report and review by Tom Lane; patch by me.

Andres Freund pushed:

  • Fix hard to hit race condition in heapam's tuple locking code. As mentioned in its commit message, eca0f1db left open a race condition, where a page could be marked all-visible, after the code checked PageIsAllVisible() to pin the VM, but before the page is locked. Plug that hole. Reviewed-By: Robert Haas, Andres Freund Author: Amit Kapila Discussion: Backpatch: -

Pending Patches

Etsuro Fujita sent in a patch to clarify the documentation around PlanForeignModify.

Karan Sikka sent in a patch to implement Boyer-Moore searching in LIKE queries.

Vladimir Sitnikov sent in a WIP patch to try CreateOneShotCachedPlan for unnamed statements in exec_parse_message.

Peter Geoghegan sent in a patch to implement parallel tuplesort.

Bruce Momjian sent in a patch to change usages of prefixes (k- to K, for example).

Tomas Vondra sent in a patch to add two slab-like memory allocators.

Masahiko Sawada sent in a patch to add a quorum commit for multiple synchronous replication.

Etsuro Fujita sent in two revisions of a patch to fix an oddity in EXPLAIN for foreign/custom join pushdown plans.

Tomas Vondra sent in another revision of a patch to add multivariate statistics.

Alexey Grishchenko sent in a patch to add support for multi-dimensional arrays to PL/PythonU.

Andres Freund sent in a PoC patch to fix some messed-up-ness in SRF handling in target lists.

Masahiko Sawada sent in a patch to ensure that lazy_scan_heap accesses the visibility map only once.

Peter Eisentraut sent in a WIP patch to stamp version 10 for development.

Peter Eisentraut sent in another revision of a patch to add a -w (wait) pg_ctl promote option.

Ildar Musin sent in a patch to remove a now-unused argument in the PinBuffer function.

Mithun Cy sent in a patch to add coverage tests for hash indexes.

Michaël Paquier sent in two revisions of a patch to fix a crash in pg_basebackup uncovered by Andreas Seltenreich's sqlsmith.

Peter Eisentraut sent in a patch to improve DefElem list processing.

Aleksander Alekseev sent in another revision of a patch to add a "fast" variant of temp tables that doesn't bloat the catalog.

Tom Lane sent in two revisions of a patch to fix an issue that resulted in bogus ANALYZE results for an otherwise-unique column with many nulls.

Amit Kapila sent in another revision of a patch to implement WAL-logged hash indexes.

Jeff Janes sent in a patch to add a new GUC, notice_lock_waits, which acts like log_lock_waits but sends the message as NOTICE so it will show up on interactive connections like psql.

Petr Jelinek sent in a WIP patch to add logical replication, and there was much rejoicing.

Peter Eisentraut sent in a patch to fix an overflow in the MONEY type.

Anastasia Lubennikova sent in a patch to refactor the heapam code.

Thomas Munro sent in a patch to consolidate the 'unique array values' logic into a reusable function.

Pavel Stehule sent in a PoC patch to add an xmltable function.

Andreas Seltenreich sent in a patch to fix a crash in pg_get_viewdef_name_ext().

by N Bougain, Tuesday 9 August 2016 at 23:23

Monday 1 August 2016


PostgreSQL Weekly News - 31 July 2016

PostgreSQL Product News

PostgreSQL Jobs for July

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under the CC BY-NC-SA license. The original version can be found at:

Submit your articles or announcements before Sunday 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a), and in Spanish to pwn (a)

Applied Patches

Álvaro Herrera pushed:

Fujii Masao pushed:

Peter Eisentraut pushed:

Tom Lane pushed:

  • Fix constant-folding of ROW(...) IS [NOT] NULL with composite fields. The SQL standard appears to specify that IS [NOT] NULL's tests of field nullness are non-recursive, ie, we shouldn't consider that a composite field with value ROW(NULL,NULL) is null for this purpose. ExecEvalNullTest got this right, but eval_const_expressions did not, leading to weird inconsistencies depending on whether the expression was such that the planner could apply constant folding. Also, adjust the docs to mention that IS [NOT] DISTINCT FROM NULL can be used as a substitute test if a simple null check is wanted for a rowtype argument. That motivated reordering things so that IS [NOT] DISTINCT FROM is described before IS [NOT] NULL. In HEAD, I went a bit further and added a table showing all the comparison-related predicates. Per bug #14235. Back-patch to all supported branches, since it's certainly undesirable that constant-folding should change the semantics. Report and patch by Andrew Gierth; assorted wordsmithing and revised regression test cases by me. Report: <>
  • Allow functions that return sets of tuples to return simple NULLs. ExecMakeTableFunctionResult(), which is used in SELECT FROM function(...) cases, formerly treated a simple NULL output from a function that both returnsSet and returnsTuple as a violation of the SRF protocol. What seems better is to treat a NULL output as equivalent to ROW(NULL,NULL,...). Without this, cases such as SELECT FROM unnest(...) on an array of composite are vulnerable to unexpected and not-very-helpful failures. Old code comments here suggested an alternative of just ignoring simple-NULL outputs, but that doesn't seem very principled. This change had been hung up for a long time due to uncertainty about how much we wanted to buy into the equivalence of simple NULL and ROW(NULL,NULL,...). I think that's been mostly resolved by the discussion around bug #14235, so let's go ahead and do it. Per bug #7808 from Joe Van Dyk. Although this is a pretty old report, fixing it smells a bit more like a new feature than a bug fix, and the lack of other similar complaints suggests that we shouldn't take much risk of destabilization by back-patching. (Maybe that could be revisited once this patch has withstood some field usage.) Andrew Gierth and Tom Lane Report: <>
  • Fix cost_rescan() to account for multi-batch hashing correctly. cost_rescan assumed that we don't need to rebuild the hash table when rescanning a hash join. However, that's currently only true for single-batch joins; for a multi-batch join we must charge full freight. This probably has escaped notice because we'd be unlikely to put a hash join on the inside of a nestloop anyway. Nonetheless, it's wrong. Fix in HEAD, but don't backpatch for fear of destabilizing plans in stable releases.
  • tqueue.c's record-typmod hashtables need the HASH_BLOBS option. The keys are integers, not strings. The code accidentally worked on little-endian machines, at least up to 256 distinct record types within a session, but failed utterly on big-endian. This was unexpectedly exposed by a test case added by commit 4452000f3, which apparently is the only parallelizable query in the regression suite that uses more than one anonymous record type. Fortunately, buildfarm member mandrill is big-endian and is running with force_parallel_mode on, so it failed.
  • Register atexit hook only once in pg_upgrade. start_postmaster() registered stop_postmaster_atexit as an atexit(3) callback each time through, although the obvious intention was to do so only once per program run. The extra registrations were harmless, so long as we didn't exceed ATEXIT_MAX, but still it's a bug. Artur Zakirov, with bikeshedding by Kyotaro Horiguchi and me Discussion: <>
  • Improve documentation about CREATE TABLE ... LIKE. The docs failed to explain that LIKE INCLUDING INDEXES would not preserve the names of indexes and associated constraints. Also, it wasn't mentioned that EXCLUDE constraints would be copied by this option. The latter oversight seems enough of a documentation bug to justify back-patching. In passing, do some minor copy-editing in the same area, and add an entry for LIKE under "Compatibility", since it's not exactly a faithful implementation of the standard's feature. Discussion: <>
  • Fix assorted fallout from IS [NOT] NULL patch. Commits 4452000f3 et al established semantics for NullTest.argisrow that are a bit different from its initial conception: rather than being merely a cache of whether we've determined the input to have composite type, the flag now has the further meaning that we should apply field-by-field testing as per the standard's definition of IS [NOT] NULL. If argisrow is false and yet the input has composite type, the construct instead has the semantics of IS [NOT] DISTINCT FROM NULL. Update the comments in primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print such cases correctly. In the case of ruleutils.c, this merely results in cosmetic changes in EXPLAIN output, since the case can't currently arise in stored rules. However, it represents a live bug for deparse.c, which would formerly have sent a remote query that had semantics different from the local behavior. (From the user's standpoint, this means that testing a remote nested-composite column for null-ness could have had unexpected recursive behavior much like that fixed in 4452000f3.) In a related but somewhat independent fix, make plancat.c set argisrow to false in all NullTest expressions constructed to represent "attnotnull" constructs. Since attnotnull is actually enforced as a simple null-value check, this is a more accurate representation of the semantics; we were previously overpromising what it meant for composite columns, which might possibly lead to incorrect planner optimizations. (It seems that what the SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so arguably we are violating the spec and should fix attnotnull to do the other thing. If we ever do, this part should get reverted.) Back-patch, same as the previous commit. Discussion: <>
  • Teach parser to transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest. Now that we've nailed down the principle that NullTest with !argisrow is fully equivalent to SQL's IS [NOT] DISTINCT FROM NULL, let's teach the parser about it. This produces a slightly more compact parse tree and is much more amenable to optimization than a DistinctExpr, since the planner knows a good deal about NullTest and next to nothing about DistinctExpr. I'm not sure that there are all that many queries in the wild that could be improved by this, but at least one source of such cases is the patch just made to postgres_fdw to emit IS [NOT] DISTINCT FROM NULL when IS [NOT] NULL isn't semantically correct. No back-patch, since to the extent that this does affect planning results, it might be considered undesirable plan destabilization.
  • Guard against empty buffer in gets_fromFile()'s check for a newline. Per the fgets() specification, it cannot return without reading some data unless it reports EOF or error. So the code here assumed that the data buffer would necessarily be nonempty when we go to check for a newline having been read. However, Agostino Sarubbo noticed that this could fail to be true if the first byte of the data is a NUL (\0). The fgets() API doesn't really work for embedded NULs, which is something I don't feel any great need for us to worry about since we generally don't allow NULs in SQL strings anyway. But we should not access off the end of our own buffer if the case occurs. Normally this would just be a harmless read, but if you were unlucky the byte before the buffer would contain '\n' and we'd overwrite it with '\0', and if you were really unlucky that might be valuable data and psql would crash. Agostino reported this to pgsql-security, but after discussion we concluded that it isn't worth treating as a security bug; if you can control the input to psql you can do far more interesting things than just maybe-crash it. Nonetheless, it is a bug, so back-patch to all supported versions.
  • Fix pq_putmessage_noblock() to not block. An evident copy-and-pasteo in commit 2bd9e412f broke the non-blocking aspect of pq_putmessage_noblock(), causing it to behave identically to pq_putmessage(). That function is nowadays used only in walsender.c, so that the net effect was to cause walsenders to hang up waiting for the receiver in situations where they should not. Kyotaro Horiguchi Patch: <>
  • Fix tqueue.c's range-remapping code. It's depressingly clear that nobody ever tested this.
  • Fix worst memory leaks in tqueue.c. TupleQueueReaderNext() leaks like a sieve if it has to do any tuple disassembly/reconstruction. While we could try to clean up its allocations piecemeal, it seems like a better idea just to insist that it should be run in a short-lived memory context, so that any transient space goes away automatically. I chose to have nodeGather.c switch into its existing per-tuple context before the call, rather than inventing a separate context inside tqueue.c. This is sufficient to stop all leakage in the simple case I exhibited earlier today (see link below), but it does not deal with leaks induced in more complex cases by tqueue.c's insistence on using TopMemoryContext for data that it's not actually trying hard to keep track of. That issue is intertwined with another major source of inefficiency, namely failure to cache lookup results across calls, so it seems best to deal with it separately. In passing, improve some comments, and modify gather_readnext's method for deciding when it's visited all the readers so that it's more obviously correct. (I'm not actually convinced that the previous code *is* correct in the case of a reader deletion; it certainly seems fragile.) Discussion: <>
  • Code review for tqueue.c: fix memory leaks, speed it up, other fixes. When doing record typmod remapping, tqueue.c did fresh catalog lookups for each tuple it processed, which was pretty horrible performance-wise (it seemed to about halve the already none-too-quick speed of bulk reads in parallel mode). Worse, it insisted on putting bits of that data into TopMemoryContext, from where it never freed them, causing a session-lifespan memory leak. (I suppose this was coded with the idea that the sender process would quit after finishing the query --- but the receiver uses the same code.) Restructure to avoid repetitive catalog lookups and to keep that data in a query-lifespan context, in or below the context where the TQueueDestReceiver or TupleQueueReader itself lives. Fix some other bugs such as continuing to use a tupledesc after releasing our refcount on it. Clean up cavalier datatype choices (typmods are int32, please, not int, and certainly not Oid). Improve comments and error message wording.
  • Doc: remove claim that hash index creation depends on effective_cache_size. This text was added by commit ff213239c, and not long thereafter obsoleted by commit 4adc2f72a (which made the test depend on NBuffers instead); but nobody noticed the need for an update. Commit 9563d5b5e adds some further dependency on maintenance_work_mem, but the existing verbiage seems to cover that with about as much precision as we really want here. Let's just take it all out rather than leaving ourselves open to more errors of omission in future. (That solution makes this change back-patchable, too.) Noted by Peter Geoghegan. Discussion: <>
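
The gets_fromFile() fix above comes down to a single guard; a minimal sketch (with a hypothetical helper name, not psql's actual code):

```c
#include <assert.h>
#include <string.h>

/* Before looking at the last byte of what fgets() returned, make sure
 * the buffer is non-empty.  If the input starts with a NUL byte,
 * strlen(buf) is 0 and buf[strlen(buf) - 1] would read -- and possibly
 * let us overwrite -- the byte *before* the buffer. */
static int ends_with_newline(const char *buf)
{
    size_t len = strlen(buf);

    return len > 0 && buf[len - 1] == '\n';
}
```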

Robert Haas pushed:

  • Repair damage done by citext--1.1--1.2.sql. That script is incorrect in that it sets the combine function for max(citext) twice instead of setting the combine function for max(citext) once and the combine function for min(citext) once. The consequence is that if you install 1.0 or 1.1 and then update to 1.2, you end up with min(citext) not having a combine function, contrary to what was intended. If you install 1.2 directly, you're OK. Fix things up by defining a new 1.3 version. Upgrading from 1.2 to 1.3 won't change anything for people who first installed the 1.2 version, but people upgrading from 1.0 or 1.1 will get the right catalog contents once they reach 1.3. Report and patch by David Rowley, reviewed by Andreas Karlsson.
  • Change various deparsing functions to return NULL for invalid input. Previously, some functions returned various fixed strings and others failed with a cache lookup error. Per discussion, standardize on returning NULL. Although user-exposed "cache lookup failed" error messages might normally qualify for bug-fix treatment, no back-patch; the risk of breaking user code which is accustomed to the current behavior seems too high. Michael Paquier
  • Fix thinko in copyParamList. There's no point in consulting retval->paramMask; it's always NULL. Instead, we should consult from->paramMask. Reported by Andrew Gierth.
  • Eliminate a few more user-visible "cache lookup failed" errors. Michael Paquier
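
The convention standardized above (return NULL for invalid input rather than raising "cache lookup failed") can be illustrated with a made-up mini-catalog; the table and `format_object_or_null` are hypothetical stand-ins, not PostgreSQL's actual deparsing API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry { int oid; const char *name; };

/* Hypothetical mini-catalog of type names keyed by OID. */
static const struct entry catalog[] = {
    {16, "bool"}, {23, "int4"}, {25, "text"},
};

/* Per the convention above: an unknown OID yields NULL, leaving the
 * caller to decide how to react, instead of erroring out. */
static const char *format_object_or_null(int oid)
{
    for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (catalog[i].oid == oid)
            return catalog[i].name;
    return NULL;
}
```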

Bruce Momjian pushed:

Stephen Frost pushed:

  • Correctly handle owned sequences with extensions. With the refactoring of pg_dump to handle components, getOwnedSeqs needs to be a bit more intelligent regarding which components to dump when. Specifically, we can't simply use the owning table's components as the set of components to dump as the table might only be including certain components while all components of the sequence should be dumped, for example, when the table is a member of an extension while the sequence is not. Handle this by combining the set of components to be dumped for the sequence explicitly and those to be dumped for the table when setting the components to be dumped for the sequence. Also add a number of regression tests around this to, hopefully, catch any future changes which break the expected behavior. Discovered by: Philippe BEAUDOIN Reviewed by: Michael Paquier
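
The component-combining logic described above amounts to a set union; a toy sketch with made-up flag names (pg_dump's real set of dump components is richer):

```c
#include <assert.h>

/* Hypothetical dump-component flags. */
enum {
    COMP_DEFINITION = 1 << 0,
    COMP_DATA       = 1 << 1,
    COMP_ACL        = 1 << 2,
};

/* The owned sequence dumps the union of its own components and its
 * owning table's, rather than blindly inheriting the table's possibly
 * restricted set (e.g. when the table is an extension member but the
 * sequence is not). */
static unsigned components_for_owned_seq(unsigned seq_components,
                                         unsigned table_components)
{
    return seq_components | table_components;
}
```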

Pending patches

Thomas Munro sent in three more revisions of a patch to add LWLocks for DSM memory.

Michaël Paquier sent in another revision of a patch to add SCRAM-SHA-256 authentication over the SASL communication protocol.

Kyotaro HORIGUCHI sent in a patch to remove validation status condition from equalTupleDescs.

Amit Langote sent in a patch to make equalTupleDescs() parameterized on whether or not to perform an equality check on TupleConstr.

Kyotaro HORIGUCHI sent in a patch to split equalTupleDescs into two functions.

Tom Lane sent in a patch to create statement-level temporary memory contexts in plpgsql.

Heikki Linnakangas sent in a patch to optimize SUM().

Fujii Masao sent in a patch to add a 0 as a possible backup compression level for pg_basebackup.

Amit Langote sent in a patch to fix a comment on ATExecValidateConstraint.

Andrew Borodin sent in two more revisions of a patch to optimize GiST and BRIN memmoves.

Thomas Munro sent in a patch to clarify the meaning of NOT NULL constraints in light of implementation details.

Amit Kapila sent in a patch to fix some locking and pinning issues in the freeze map code.

John Harvey and Michaël Paquier traded patches to fix an issue with Perl on Windows.

Andrew Gierth sent in a patch to document the new index description functions which replace functionality taken out of the catalog.

Andres Freund sent in a PoC patch to add a new high-performance hashing function.

David Fetter sent in another revision of a patch to allow disallowing UPDATEs and DELETEs that lack a WHERE clause.

Etsuro Fujita sent in a patch to make the FDW infrastructure be more explicit about what foreign objects it is joining remotely.

Robert Haas sent in two revisions of a patch intended to fix an issue where old_snapshot_threshold allows heap:toast disagreement.

Tom Lane sent in a patch to fix an issue with target-column indirection in INSERT with multiple VALUES.

Aleksander Alekseev sent in two revisions of a patch to fix the RBtree iteration interface.

Fujii Masao sent in a patch to remove some unneeded arguments from the definition of pg_replication_origin_xact_reset().

Alexey Grishchenko sent in a patch to fix a slowness in PL/Python input array traversal.

Thomas Munro sent in a patch to remove a double invocation of InitPostmasterChild in bgworker with -DEXEC_BACKEND.

Michaël Paquier sent in two more revisions of a patch to fix wal_level minimal.

Aleksander Alekseev sent in a patch to make a faster version of temporary tables by not making a catalog entry.

by N Bougain on Monday, August 1, 2016 at 22:47

Monday, July 25, 2016


PostgreSQL Weekly News - July 24, 2016

PostgreSQL 9.6 Beta 3 is available. Get testing!

PostgreSQL Conference Europe 2016 will take place in Tallinn, Estonia, from November 1 to 4, 2016. The call for papers is open until August 7, and early-bird registration is open until September 14:

PostgreSQL-related jobs in July

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA license. The original version can be found at:


Applied patches

Tom Lane pushed:

  • Correctly set up aggregate FILTER expression in partial-aggregation plans. The aggfilter expression should be removed from the parent (combining) Aggref, since it's not supposed to apply the filter, and indeed cannot because any Vars used in the filter would not be available after the lower-level aggregation step. Per report from Jeff Janes. (This has been broken since the introduction of partial aggregation, I think. The error became obvious after commit 59a3795c2, when setrefs.c began processing the parent Aggref's fields normally and thus would detect such Vars. The special-case coding previously used in setrefs.c skipped over the parent's aggfilter field without processing it. That was broken in its own way because no other setrefs.c processing got applied either; though since the executor would not execute the filter expression, only initialize it, that oversight might not have had any visible symptoms at present.) Report: <>
  • Fix regression tests to work in Welsh locale. Welsh (cy_GB) apparently sorts 'dd' after 'f', creating problems analogous to the odd sorting of 'aa' in Danish. Adjust regression test case to not use data that provokes that. Jeff Janes Patch: <>
  • Remove GetUserMappingId() and GetUserMappingById(). These functions were added in commits fbe5a3fb7 and a104a017f, but commit 45639a052 removed their only callers. Put the related code in foreign.c back to the way it was in 9.5, to avoid pointless cross-version diffs. Etsuro Fujita Patch: <>
  • Make contrib regression tests safe for Danish locale. In btree_gin and citext, avoid some not-particularly-interesting dependencies on the sorting of 'aa'. In tsearch2, use COLLATE "C" to remove an uninteresting dependency on locale sort order (and thereby allow removal of a variant expected-file). Also, in citext, avoid assuming that lower('I') = 'i'. This isn't relevant to Danish but it does fail in Turkish.
  • Make pltcl regression tests safe for Danish locale. Another peculiarity of Danish locale is that it has an unusual idea of how to sort upper vs. lower case. One of the pltcl test cases has an issue with that. Now that COLLATE works in all supported branches, we can just change the test to be locale-independent, and get rid of the variant expected file that used to support non-C locales.
  • Make core regression tests safe for Danish locale. Some tests added in 9.5 depended on 'aa' sorting before 'bb', which doesn't hold true in Danish. Use slightly different test data to avoid the problem. Jeff Janes Report: <>
  • Remove very-obsolete estimates of shmem usage from postgresql.conf.sample. runtime.sgml used to contain a table of estimated shared memory consumption rates for max_connections and some other GUCs. Commit 390bfc643 removed that on the well-founded grounds that (a) we weren't maintaining the entries well and (b) it no longer mattered so much once we got out from under SysV shmem limits. But it missed that there were even-more-obsolete versions of some of those numbers in comments in postgresql.conf.sample. Remove those too. Back-patch to 9.3 where the aforesaid commit went in.
  • Stamp 9.6beta3.
  • Doc: improve discussion of plpgsql's GET DIAGNOSTICS, other minor fixes. 9.4 added a second description of GET DIAGNOSTICS that was totally independent of the existing one, resulting in each description lying to the extent that it claimed the set of status items it described was complete. Fix that, and do some minor markup improvement. Also some other small fixes per bug #14258 from Dilian Palauzov. Discussion: <>
  • Doc: fix table of BRIN operator strategy numbers. brin-extensibility-inclusion-table was confused in places about the difference between strategy 4 (RTOverRight) and strategy 5 (RTRight). Alexander Law
  • Remove obsolete comment. Peter Geoghegan
  • Establish conventions about global object names used in regression tests. To ensure that "make installcheck" can be used safely against an existing installation, we need to be careful about what global object names (database, role, and tablespace names) we use; otherwise we might accidentally clobber important objects. There's been a weak consensus that test databases should have names including "regression", and that test role names should start with "regress_", but we didn't have any particular rule about tablespace names; and neither of the other rules was followed with any consistency either. This commit moves us a long way towards having a hard-and-fast rule that regression test databases must have names including "regression", and that test role and tablespace names must start with "regress_". It's not completely there because I did not touch some test cases in rolenames.sql that test creation of special role names like "session_user". That will require some rethinking of exactly what we want to test, whereas the intent of this patch is just to hit all the cases in which the needed renamings are cosmetic. There is no enforcement mechanism in this patch either, but if we don't add one we can expect that the tests will soon be violating the convention again. Again, that's not such a cosmetic change and it will require discussion. (But I did use a quick-hack enforcement patch to find these cases.) Discussion: <>
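
The naming conventions established above are easy to check mechanically; a toy version of the "quick-hack enforcement patch" idea mentioned in the commit, with hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Test role and tablespace names must start with "regress_". */
static int role_or_tablespace_name_ok(const char *name)
{
    return strncmp(name, "regress_", strlen("regress_")) == 0;
}

/* Test database names must contain "regression". */
static int database_name_ok(const char *name)
{
    return strstr(name, "regression") != NULL;
}
```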

Andres Freund pushed:

  • Clear all-frozen visibilitymap status when locking tuples. Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so correctly relies on not adding xids to the heap without also unsetting the visibilitymap flag. Tuple locking related code has not done so. To allow selectively resetting all-frozen - to avoid pessimizing heap_lock_tuple - visibilitymap_clear() now supports selectively resetting the all-frozen bit. To avoid having to use visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical section, have visibilitymap_clear() return whether any bits have been reset. There's a remaining issue (denoted by XXX): After the PageIsAllVisible() check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page status could theoretically change. Practically that currently seems impossible, because updaters will hold a page level pin already. Due to the next beta coming up, it seems better to get the required WAL magic bump done before resolving this issue. The flags fields added to xl_heap_lock and xl_heap_lock_updated require bumping the WAL magic. Since there's already been a catversion bump since the last beta, that's not an issue. Reviewed-By: Robert Haas, Amit Kapila and Andres Freund Author: Masahiko Sawada, heavily revised by Andres Freund Discussion: Backpatch: -
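
The visibilitymap_clear() change above follows a common clear-and-report pattern, so the caller need not re-read the status inside a critical section; a miniature sketch with made-up types (the real function operates on the visibility map itself):

```c
#include <assert.h>

/* Clear the requested flag bits and report whether any were actually
 * set, so one call both mutates and answers the question. */
static int clear_flags(unsigned char *status, unsigned char mask)
{
    unsigned char old = *status;

    *status &= (unsigned char) ~mask;
    return (old & mask) != 0;   /* were any bits reset? */
}
```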

Peter Eisentraut pushed:

Magnus Hagander pushed:

Kevin Grittner pushed:

Robert Haas pushed:

Noah Misch pushed:

Pending patches

Emre Hasegeli sent another revision of the patch to fix some floating point comparison inconsistencies of the geometric types.

Haribabu Kommi sent another revision of the patch to implement multi-tenancy using row-based access controls.

Ashutosh Bapat sent in a patch to implement the logic to assess whether two partitioned tables can be joined using the partition-wise join technique described earlier.

Masahiko Sawada sent in a patch to rename a file to fill in a numbering gap.

David Fetter sent in two revisions of a patch to make it possible to disallow UPDATEs and DELETEs which include no WHERE clause.

Kyotaro HORIGUCHI sent in patches to document the reason why replayEndRecPtr is used as the recovery-end point, and to fix the documentation for pg_stop_backup.

Kyotaro HORIGUCHI sent in another flock of patches in service of asynchronous and vectorized execution.

Amit Langote sent in a patch to fix an oversight in the interaction of marking constraints as (IN)VALID and table inheritance.

Etsuro Fujita sent in a patch to remove some lightly-used user mapping helper functions.

Mithun Cy sent in a patch to cache the meta page of Hash index in backend-private memory.

Anton Dignös sent in a patch to add temporal query processing with range types.

Amit Kapila sent in a patch to fix a concurrency issue in the freeze map code.

Thomas Munro sent in a patch to fix the problem of LWLocks not working in DSM segments by adding a non-circular variant of the dlist interface that uses NULL list termination.

Andrew Borodin sent in another revision of the patch to optimize memmoves in gistplacetopage for fixed-size updates.

by N Bougain on Monday, July 25, 2016 at 22:48

Tuesday, July 19, 2016


PostgreSQL Weekly News - July 17, 2016

CHAR(16), an international conference presenting and celebrating the significant advances the PostgreSQL development teams have made in clustering, high availability, and replication, will take place in New York on December 6, 2016. The call for papers is open; the submission deadline is midnight EDT, September 30, 2016:

Derived product news

PostgreSQL-related jobs in July

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA license. The original version can be found at:


Applied patches

Tom Lane pushed:

  • Revert "Add some temporary code to record stack usage at server process exit." This reverts commit 88cf37d2a86d5b66380003d7c3384530e3f91e40 as well as follow-on commits ea9c4a16d5ad88a1d28d43ef458e3209b53eb106 and c57562725d219c4249b82f4a4fb5aaeee3ae0d53. We've learned about as much as we can from the buildfarm.
  • Improve output of psql's \df+ command. Add display of proparallel (parallel-safety) when the server is >= 9.6, and display of proacl (access privileges) for all server versions. Minor tweak of column ordering to keep related columns together. Michael Paquier Discussion: <>
  • Print a given subplan only once in EXPLAIN. We have, for a very long time, allowed the same subplan (same member of the PlannedStmt.subplans list) to be referenced by more than one SubPlan node; this avoids problems for cases such as subplans within an IndexScan's indxqual and indxqualorig fields. However, EXPLAIN had not gotten the memo and would print each reference as though it were an independent identical subplan. To fix, track plan_ids of subplans we've printed and don't print the same plan_id twice. Per report from Pavel Stehule. BTW: the particular case of IndexScan didn't cause visible duplication in a plain EXPLAIN, only EXPLAIN ANALYZE, because in the former case we short-circuit executor startup before the indxqual field is processed by ExecInitExpr. That seems like it could easily lead to other EXPLAIN problems in future, but it's not clear how to avoid it without breaking the "EXPLAIN a plan using hypothetical indexes" use-case. For now I've left that issue alone. Although this is a longstanding bug, it's purely cosmetic (no great harm is done by the repeat printout) and we haven't had field complaints before. So I'm hesitant to back-patch it, especially since there is some small risk of ABI problems due to the need to add a new field to ExplainState. In passing, rearrange order of fields in ExplainState to be less random, and update some obsolete comments about when/where to initialize them. Report: <>
  • Allow IMPORT FOREIGN SCHEMA within pl/pgsql. Since IMPORT FOREIGN SCHEMA has an INTO clause, pl/pgsql needs to be aware of that and avoid capturing the INTO as an INTO-variables clause. This isn't hard, though it's annoying to have to make IMPORT a plpgsql keyword just for this. (Fortunately, we have the infrastructure now to make it an unreserved keyword, so at least this shouldn't break any existing pl/pgsql code.) Per report from Merlin Moncure. Back-patch to 9.5 where IMPORT FOREIGN SCHEMA was introduced. Report: <>
  • Fix obsolete header-file reference in pg_buffercache docs. Commit 2d0019049 moved enum ForkNumber from relfilenode.h into relpath.h, but missed updating this documentation reference. Alexander Law
  • Add a regression test case to improve code coverage for tuplesort. Test the external-sort code path in CLUSTER for two different scenarios: multiple-pass external sorting, and the best case for replacement selection, where only one run is produced, so that no merge is required. This test would have caught the bug fixed in commit 1b0fc8507, at least when run with valgrind enabled. In passing, add a short-circuit test in plan_cluster_use_sort() to make dead certain that it selects sorting when enable_indexscan is off. As things stand, that would happen anyway, but it seems like good future proofing for this test. Peter Geoghegan Discussion: <>
  • Minor test adjustment. Dept of second thoughts: given the RESET SESSION AUTHORIZATION that was just added by commit cec550139, we don't need the reconnection that used to be here. Might as well buy back a few microseconds.
  • Fix GiST index build for NaN values in geometric types. GiST index build could go into an infinite loop when presented with boxes (or points, circles or polygons) containing NaN component values. This happened essentially because the code assumed that x == x is true for any "double" value x; but it's not true for NaNs. The looping behavior was not the only problem though: we also attempted to sort the items using simple double comparisons. Since NaNs violate the trichotomy law, qsort could (in principle at least) get arbitrarily confused and mess up the sorting of ordinary values as well as NaNs. And we based splitting choices on box size calculations that could produce NaNs, again resulting in undesirable behavior. To fix, replace all comparisons of doubles in this logic with float8_cmp_internal, which is NaN-aware and is careful to sort NaNs consistently, higher than any non-NaN. Also rearrange the box size calculation to not produce NaNs; instead it should produce an infinity for a box with NaN on one side and not-NaN on the other. I don't by any means claim that this solves all problems with NaNs in geometric values, but it should at least make GiST index insertion work reliably with such data. It's likely that the index search side of things still needs some work, and probably regular geometric operations too. But with this patch we're laying down a convention for how such cases ought to behave. Per bug #14238 from Guang-Dih Lei. Back-patch to 9.2; the code used before commit 7f3bd86843e5aad8 is quite different and doesn't lock up on my simple test case, nor on the submitter's dataset. Report: <> Discussion: <>
  • Improve documentation about search_path for SECURITY DEFINER functions. Clarify that the reason for recommending that pg_temp be put last is to prevent temporary tables from capturing unqualified table names. Per discussion with Albe Laurenz. Discussion: <>
  • Avoid invalidating all foreign-join cached plans when user mappings change. We must not push down a foreign join when the foreign tables involved should be accessed under different user mappings. Previously we tried to enforce that rule literally during planning, but that meant that the resulting plans were dependent on the current contents of the pg_user_mapping catalog, and we had to blow away all cached plans containing any remote join when anything at all changed in pg_user_mapping. This could have been improved somewhat, but the fact that a syscache inval callback has very limited info about what changed made it hard to do better within that design. Instead, let's change the planner to not consider user mappings per se, but to allow a foreign join if both RTEs have the same checkAsUser value. If they do, then they necessarily will use the same user mapping at runtime, and we don't need to know specifically which one that is. Post-plan-time changes in pg_user_mapping no longer require any plan invalidation. This rule does give up some optimization ability, to wit where two foreign table references come from views with different owners or one's from a view and one's directly in the query, but nonetheless the same user mapping would have applied. We'll sacrifice the first case, but to not regress more than we have to in the second case, allow a foreign join involving both zero and nonzero checkAsUser values if the nonzero one is the same as the prevailing effective userID. In that case, mark the plan as only runnable by that userID. The plancache code already had a notion of plans being userID-specific, in order to support RLS. It was a little confused though, in particular lacking clarity of thought as to whether it was the rewritten query or just the finished plan that's dependent on the userID. 
Rearrange that code so that it's clearer what depends on which, and so that the same logic applies to both RLS-injected role dependency and foreign-join-injected role dependency. Note that this patch doesn't remove the other issue mentioned in the original complaint, which is that while we'll reliably stop using a foreign join if it's disallowed in a new context, we might fail to start using a foreign join if it's now allowed, but we previously created a generic cached plan that didn't use one. It was agreed that the chance of winning that way was not high enough to justify the much larger number of plan invalidations that would have to occur if we tried to cause it to happen. In passing, clean up randomly-varying spelling of EXPLAIN commands in postgres_fdw.sql, and fix a COSTS ON example that had been allowed to leak into the committed tests. This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the previous attempt at ensuring we wouldn't push down foreign joins that span permissions contexts. Etsuro Fujita and Tom Lane Discussion: <>
  • Advance PG_CONTROL_VERSION. This should have been done in commit 73c986adde5d73a5 which added several new fields to pg_control, and again in commit 5028f22f6eb05798 which changed the CRC algorithm, but it wasn't. It's far too late to fix it in the 9.5 branch, but let's do so in 9.6, so that if a 9.6 postmaster is started against a 9.4-era pg_control it will complain about a versioning problem rather than a CRC failure. We already forced initdb/pg_upgrade for beta3, so there's no downside to doing this now. Discussion: <>
  • Clarify usage of clientcert authentication option. For some reason this option wasn't discussed at all in client-auth.sgml. Document it there, and be more explicit about its relationship to the "cert" authentication method. Per gripe from Srikanth Venkatesh. I failed to resist the temptation to do some minor wordsmithing in the same area, too. Discussion: <>
  • Fix crash in close_ps() for NaN input coordinates. The Assert() here seems unreasonably optimistic. Andreas Seltenreich found that it could fail with NaNs in the input geometries, and it seems likely to me that it might fail in corner cases due to roundoff error, even for ordinary input values. As a band-aid, make the function return SQL NULL instead of crashing. Report: <>
  • Add regression test case exercising the sorting path for hash index build. We've broken this code path at least twice in the past, so it's prudent to have a test case that covers it. To allow exercising the code path without creating a very large (and slow to run) test case, redefine the sort threshold to be bounded by maintenance_work_mem as well as the number of available buffers. While at it, fix an ancient oversight that when building a temp index, the number of available buffers is not NBuffers but NLocBuffer. Also, if assertions are enabled, apply a direct test that the sort actually does return the tuples in the expected order. Peter Geoghegan Patch: <>
  • Improve test case exercising the sorting path for hash index build. On second thought, we should probably do at least a minimal check that the constructed index is valid, since the big problem with the most recent breakage was not whether the sorting was correct but that the index had incorrect hash codes placed in it.
  • Update 9.6 release notes through today.
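
The NaN-handling fix for GiST above relies on a comparator that restores the trichotomy law; a sketch in the spirit of float8_cmp_internal (an illustration, not the backend's exact code):

```c
#include <assert.h>
#include <math.h>

/* NaN-aware comparison: NaNs compare equal to each other and greater
 * than any non-NaN.  Raw '<' / '==' comparisons lose trichotomy
 * because NaN == NaN is false, which is what could confuse qsort. */
static int float8_cmp(double a, double b)
{
    if (isnan(a))
        return isnan(b) ? 0 : 1;
    if (isnan(b))
        return -1;
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}
```

Because every pair of values now has a well-defined ordering, sorting mixed NaN/non-NaN data is total and stable in its treatment of NaNs, which is the property the index build needs.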

Magnus Hagander pushed:

Peter Eisentraut pushed:

Stephen Frost pushed:

Teodor Sigaev pushed:

Alvaro Herrera pushed:

  • Avoid serializability errors when locking a tuple with a committed update. When key-share locking a tuple that has been not-key-updated, and the update is a committed transaction, in some cases we raised serializability errors: ERROR: could not serialize access due to concurrent update Because the key-share doesn't conflict with the update, the error is unnecessary and inconsistent with the case that the update hasn't committed yet. This causes problems for some usage patterns, even if it can be claimed that it's sufficient to retry the aborted transaction: given a steady stream of updating transactions and a long locking transaction, the long transaction can be starved indefinitely despite multiple retries. To fix, we recognize that HeapTupleSatisfiesUpdate can return HeapTupleUpdated when an updating transaction has committed, and that we need to deal with that case exactly as if it were a non-committed update: verify whether the two operations conflict, and if not, carry on normally. If they do conflict, however, there is a difference: in the HeapTupleBeingUpdated case we can just sleep until the concurrent transaction is gone, while in the HeapTupleUpdated case this is not possible and we must raise an error instead. Per trouble report from Olivier Dony. In addition to a couple of test cases that verify the changed behavior, I added a test case to verify the behavior that remains unchanged, namely that errors are raised when an update that modifies the key is used. That must still generate serializability errors. One pre-existing test case changes behavior; per discussion, the new behavior is actually the desired one. Discussion: Backpatch to 9.3, where the problem appeared.

Andres Freund pushed:

  • Make HEAP_LOCK/HEAP2_LOCK_UPDATED replay reset HEAP_XMAX_INVALID. 0ac5ad5 started to compress infomask bits in WAL records. Unfortunately the replay routines for XLOG_HEAP_LOCK/XLOG_HEAP2_LOCK_UPDATED forgot to reset the HEAP_XMAX_INVALID (and some other) hint bits. Luckily that's not problematic in the majority of cases, because after a crash/on a standby row locks aren't meaningful. Unfortunately that does not hold true in the presence of prepared transactions. This means that after a crash, or after promotion, row level locks held by a prepared, but not yet committed, prepared transaction might not be enforced. Discussion: Backpatch: 9.3, the oldest branch on which 0ac5ad5 is present.
  • Fix torn-page, unlogged xid and further risks from heap_update(). When heap_update needs to look for a page for the new tuple version, because the current one doesn't have sufficient free space, or when columns have to be processed by the tuple toaster, it has to release the lock on the old page during that. Otherwise there'd be lock ordering and lock nesting issues. To avoid concurrent sessions from trying to update / delete / lock the tuple while the page's content lock is released, the tuple's xmax is set to the current session's xid. That unfortunately was done without any WAL logging, thereby violating the rule that no XIDs may appear on disk, without an according WAL record. If the database were to crash / fail over when the page level lock is released, and some activity lead to the page being written out to disk, the xid could end up being reused; potentially leading to the row becoming invisible. There might be additional risks by not having t_ctid point at the tuple itself, without having set the appropriate lock infomask fields. To fix, compute the appropriate xmax/infomask combination for locking the tuple, and perform WAL logging using the existing XLOG_HEAP_LOCK record. That allows the fix to be backpatched. This issue has existed for a long time. There appears to have been partial attempts at preventing dangers, but these never have fully been implemented, and were removed a long time ago, in 11919160 (cf. HEAP_XMAX_UNLOGGED). In master / 9.6, there's an additional issue, namely that the visibilitymap's freeze bit isn't reset at that point yet. Since that's a new issue, introduced only in a892234f830, that'll be fixed in a separate commit. Author: Masahiko Sawada and Andres Freund. Reported-By: Different aspects by Thomas Munro, Noah Misch, and others Discussion: Backpatch: 9.1/all supported versions
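The prepared-transaction hazard fixed in the first of these commits can be illustrated with a short sketch (hypothetical table `accounts`; `max_prepared_transactions` must be set above zero):

```sql
-- Take a row lock inside a transaction, then prepare it for two-phase commit.
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
PREPARE TRANSACTION 'xfer-1';

-- After a crash-and-recovery, or after a standby promotion, the row lock
-- must still be enforced: other sessions' UPDATE / DELETE / SELECT FOR
-- UPDATE on that row should block until the transaction is resolved with
--   COMMIT PREPARED 'xfer-1';   -- or ROLLBACK PREPARED 'xfer-1';
-- Before the fix, WAL replay could clear the lock's infomask bits, so the
-- lock was silently lost.
```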

Pending patches

Masahiko Sawada sent another revision of a patch to fix heap_update setting xmax without WAL logging in the already_marked = true case.

Jeff Janes sent another revision of a patch which introduces a config variable (JJNOWAL) that skips all WAL, except for the WAL of certain checkpoints (which are needed for initdb and to restart the server after a clean shutdown).

Michael Paquier sent another revision of a patch to allow pointing pg_basebackup at a disconnected standby.

Etsuro Fujita sent two revisions of a patch to fix a misalignment between cached plans and FDW user mappings.

Michael Paquier sent another revision of a patch to remove the RecoveryTargetTLI dead variable from XLogCtlData.

Michael Paquier sent a patch to simplify the interface of UpdateMinRecoveryPoint.

Andres Freund sent in two revisions of a patch to improve the performance of the executor.

Corey Huinker sent another revision of a patch to add an "interval" argument to psql's \timing, making it less necessary to do arithmetic modulo 3,600,000 by hand.

Tom Lane sent a patch to move the source code in \df+ to a footer.

Fabien COELHO sent two more revisions of a patch to fix meta command only scripts in pgbench.

Fabien COELHO sent another revision of a patch to compute and show latency consistently in pgbench.

Laurenz Albe sent two revisions of a patch to fix the documentation for CREATE FUNCTION, which refers to a temporary namespace no longer searched.

Masahiko Sawada sent a patch to fix a typo in contrib/postgres_fdw/deparse.c, to wit: s/whenver/whenever/g

Amit Kapila sent another revision of a patch to fix hash indexes so they're WAL logged.

Dilip Kumar sent in two more revisions of a patch to fix the pg_indexes view.

Magnus Hagander sent a patch to allow pg_xlogdump to follow into the future similar to the way "tail -f" does.

Dmitriy Sarafannikov sent a patch to backport 98a64d0bd713cb89e61bef6432befc4b7b5da59e to PostgreSQL 9.5.

Fabien COELHO sent in another revision of a patch to allow pgbench to store select results into variables.

Pavel Stehule sent another revision of a patch to add a RAW output format to COPY.

Jun Cheol Gim sent in a patch to add created and last_updated fields to pg_stat_statements.

by N Bougain on Tuesday 19 July 2016 at 00:08

Thursday 30 June 2016


Call for support: needs 2 Linux VMs

Hello everyone,

Between 2012 and 2015, the platform was hosted free of charge by Gandi. Gandi now wishes to end this support, and we warmly thank them for their help over the past three years.

The admin team is therefore preparing to migrate these services, and to that end we are looking for two virtual Linux servers, each with the following configuration:

  • 50 GB of disk space;
  • 4 virtual CPUs;
  • 1 GB of RAM;
  • unlimited bandwidth

Obviously, more is welcome :-)

We would also like the following hosting conditions:

  • 4 public IPv4 addresses;
  • a GNU/Linux operating system (we prefer Debian);
  • direct access so we can restart the virtual machines ourselves;
  • daily backups of the machines;
  • a simple (email, phone) and direct way to contact the platform's administrators (if possible 7 days a week, and preferably in French).

The team would also like to sign an agreement guaranteeing the availability of the virtual machines, free of charge, for a period of 1 to 3 years.

As a reminder, the platform comprises:

  • the website, its blog and its forum;
  • the French translation of the PostgreSQL documentation;
  • the French translation of the Slony documentation;
  • the PG Day France website;
  • the mail service for the domain, and more;
  • the web services of the postgresqlfr association;
  • the planet of French-speaking PostgreSQL blogs;
  • other things that escape me as I write these lines :)

Donating a virtual machine to the team is therefore an active way to help all of these projects and, more generally, to support the French-speaking PostgreSQL community...

If you can provide all or part of this infrastructure, please contact us at

Have a good day!

Damien Clochard, for the admin team of

PS: Note that the team is autonomous, volunteer-run and open to anyone willing to help. We have no bank account and we do not accept donations in cash or via PayPal.

We don't need money. We need volunteers and hardware ;-)

by daamien on Thursday 30 June 2016 at 07:23

Wednesday 29 June 2016

Damien Clochard

PostgreSQL 9.6 beta 2 released

A week ago, the second beta of the upcoming PostgreSQL release came out. Many bugs have been found and fixed since the first beta, particularly around parallelism. If you haven't yet had time to explore the new features awaiting you, now is the time!

There's plenty on the menu:

  • Parallel query processing (sequential scans, joins and aggregates)
  • For the postgres_fdw connector, pushdown of sorts, joins and DML statements (UPDATE/DELETE) on foreign tables
  • For Hot Standby replication: support for multiple synchronous standbys and the new “remote_apply” mode
  • For transactions: the pg_visibility extension
  • KNN operator support for the cube extension
  • A command progress indicator
  • More information about wait events in pg_stat_activity
  • Index-only scans for partial indexes
  • Full-text search on whole phrases
  • A new API for physical backups

And more!

Warning! There have been quite a few notable changes over the past few weeks. You must migrate your data from beta1 to beta2 by reloading it (dump then restore), and you may get errors during the process. The main changes are:

  • The max_parallel_degree parameter has been renamed max_parallel_workers_per_gather
  • an integrity-check function has been added to pg_visibility
  • VACUUM (DISABLE_PAGE_SKIPPING) has been added for emergencies
  • the pg_truncate_visibility_map function has been added
  • the min_parallel_relation_size parameter has been added
  • the backend_flush_after parameter has been disabled
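For example, configurations written for beta1 need updating for the renamed parameter (the value here is illustrative):

```sql
-- postgresql.conf, beta1:
--   max_parallel_degree = 4
-- postgresql.conf, beta2 and later:
--   max_parallel_workers_per_gather = 4

-- Or per session:
SET max_parallel_workers_per_gather = 4;
SHOW max_parallel_workers_per_gather;
```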

To follow the bugs found and the progress of the fixes, see the Open Items page.

And if you're looking for a way to explore this new release while contributing to the community, there are still a few documentation pages left to translate!

by Damien Clochard on Wednesday 29 June 2016 at 23:52

Tuesday 28 June 2016


PostgreSQL Weekly News - 26 June 2016

PostgreSQL 9.6 Beta 2 is available. Get testing!

PGDay POA 2016 will take place in Porto Alegre (Rio Grande do Sul, Brazil) on 14 July 2016, in conjunction with FISL17:

PGDay Philly 2016 will be held on 21 July at Huntsman Hall at the Wharton School, sharing the venue with this year's DjangoCon. For more information, see

News from derived products

PostgreSQL jobs for June

PostgreSQL Local

  • FOSS4G NA (Free and Open Source Software for Geospatial - North America) will be held in Raleigh, North Carolina, from 2 to 5 May 2016:
  • PGCon 2016 will be held in Ottawa from 17 to 21 May 2016:
  • The Swiss PGDay will this year be held at the University of Applied Sciences (HSR) in Rapperswil on 24 June 2016:
  • "5432 ... Meet us!" will take place in Milan (Italy) on 28 & 29 June 2016. Registration is open:
  • PG Day UK will take place on 5 July 2016:
  • PostgresOpen 2016 will take place in Dallas (Texas, USA) from 13 to 16 September. The call for speakers has been launched:
  • PostgreSQL Session on 22 September 2016 in Lyon (France). The submission deadline is 20 May: send your proposals to call-for-paper AT postgresql-sessions DOT org
  • PgConf Silicon Valley 2016 will take place from 14 to 16 November 2016:

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under the CC BY-NC-SA licence. The original version can be found at:

Submit your articles or announcements before Sunday 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied patches

Tom Lane pushed:

  • Docs: improve description of psql's %R prompt escape sequence. Dilian Palauzov pointed out in bug #14201 that the docs failed to mention the possibility of %R producing '(' due to an unmatched parenthesis. He proposed just adding that in the same style as the other options were listed; but it seemed to me that the sentence was already nearly unintelligible, so I rewrote it a bit more extensively. Report: <>
  • Fix comparison of similarity to threshold in GIST trigram searches. There was some very strange code here, dating to commit b525bf77, that purported to work around an ancient gcc bug by forcing a float4 comparison to be done as int instead. Commit 5871b8848 broke that when it changed one side of the comparison to "double" but left the comparison code alone. Commit f576b17cd doubled down on the weirdness by introducing a "volatile" marker, which had nothing to do with the actual problem. Guess that the gcc bug, even if it's still present in the wild, was triggered by comparison of float4's and can be avoided if we store the result of cnt_sml() into a double before comparing to the double "nlimit". This will at least work correctly on non-broken compilers, and it's way more readable. Per bug #14202 from Greg Navis. Add a regression test based on his example. Report: <>
  • pg_trgm's set_limit() function is parallel unsafe, not parallel restricted. Per buildfarm. Fortunately, it's not quite too late to squeeze this fix into the pg_trgm 1.3 update.
  • Add missing check for malloc failure in plpgsql_extra_checks_check_hook(). Per report from Andreas Seltenreich. Back-patch to affected versions. Report: <>
  • Stamp 9.6beta2.
  • Refactor planning of projection steps that don't need a Result plan node. The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4) assumed that we could always determine during Path formation whether or not we would need a Result plan node to perform projection of a targetlist. That turns out not to work very well, though, because createplan.c still has some responsibilities for choosing the specific target list associated with sorting/grouping nodes (in particular it might choose to add resjunk columns for sorting). We might not ever refactor that --- doing so would push more work into Path formation, which isn't attractive --- and we certainly won't do so for 9.6. So, while create_projection_path and apply_projection_to_path can tell for sure what will happen if the subpath is projection-capable, they can't tell for sure when it isn't. This is at least a latent bug in apply_projection_to_path, which might think it can apply a target to a non-projecting node when the node will end up computing something different. Also, I'd tied the creation of a ProjectionPath node to whether or not a Result is needed, but it turns out that we sometimes need a ProjectionPath node anyway to avoid modifying a possibly-shared subpath node. Callers had to use create_projection_path for such cases, and we added code to them that knew about the potential omission of a Result node and attempted to adjust the cost estimates for that. That was uncertainly correct and definitely ugly/unmaintainable. To fix, have create_projection_path explicitly check whether a Result is needed and adjust its cost estimate accordingly, though it creates a ProjectionPath in either case. apply_projection_to_path is now mostly just an optimized version that can avoid creating an extra Path node when the input is known to not be shared with any other live path. (There is one case that create_projection_path doesn't handle, which is pushing parallel-safe expressions below a Gather node. 
We could make it do that by duplicating the GatherPath, but there seems no need as yet.) create_projection_plan still has to recheck the tlist-match condition, which means that if the matching situation does get changed by createplan.c then we'll have made a slightly incorrect cost estimate. But there seems no help for that in the near term, and I doubt it occurs often enough, let alone would change planning decisions often enough, to be worth stressing about. I added a "dummypp" field to ProjectionPath to track whether create_projection_path thinks a Result is needed. This is not really necessary as-committed because create_projection_plan doesn't look at the flag; but it seems like a good idea to remember what we thought when forming the cost estimate, if only for debugging purposes. In passing, get rid of the target_parallel parameter added to apply_projection_to_path by commit 54f5c5150. I don't think that's a good idea because it involves callers in what should be an internal decision, and opens us up to missing optimization opportunities if callers think they don't need to provide a valid flag, as most don't. For the moment, this just costs us an extra has_parallel_hazard call when planning a Gather. If that starts to look expensive, I think a better solution would be to teach PathTarget to carry/cache knowledge of parallel-safety of its contents.
  • Document that dependency tracking doesn't consider function bodies. If there's anyplace in our SGML docs that explains this behavior, I can't find it right at the moment. Add an explanation in "Dependency Tracking" which seems like the authoritative place for such a discussion. Per gripe from Michelle Schwan. While at it, update this section's example of a dependency-related error message: they last looked like that in 8.3. And remove the explanation of dependency updates from pre-7.3 installations, which is probably no longer worth anybody's brain cells to read. The bogus error message example seems like an actual documentation bug, so back-patch to all supported branches. Discussion: <>
  • Make "postgres -C guc" print "" not "(null)" for null-valued GUCs. Commit 0b0baf262 et al made this case print "(null)" on the grounds that that's what happened on platforms that didn't crash. But neither behavior was actually intentional. What we should print is just an empty string, for compatibility with the behavior of SHOW and other ways of examining string GUCs. Those code paths don't distinguish NULL from empty strings, so we should not here either. Per gripe from Alain Radix. Like the previous patch, back-patch to 9.2 where -C option was introduced. Discussion: <>
  • Fix type-safety problem with parallel aggregate serial/deserialization. The original specification for this called for the deserialization function to have signature "deserialize(serialtype) returns transtype", which is a security violation if transtype is INTERNAL (which it always would be in practice) and serialtype is not (which ditto). The patch blithely overrode the opr_sanity check for that, which was sloppy-enough work in itself, but the indisputable reason this cannot be allowed to stand is that CREATE FUNCTION will reject such a signature and thus it'd be impossible for extensions to create parallelizable aggregates. The minimum fix to make the signature type-safe is to add a second, dummy argument of type INTERNAL. But to lock it down a bit more and make misuse of INTERNAL-accepting functions less likely, let's get rid of the ability to specify a "serialtype" for an aggregate and just say that the only useful serialtype is BYTEA --- which, in practice, is the only interesting value anyway, due to the usefulness of the send/recv infrastructure for this purpose. That means we only have to allow "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal" as the signatures for these support functions. In passing fix bogus signature of int4_avg_combine, which I found thanks to adding an opr_sanity check on combinefunc signatures. catversion bump due to removing pg_aggregate.aggserialtype and adjusting signatures of assorted built-in functions. David Rowley and Tom Lane Discussion: <>
  • Update oidjoins regression test for 9.6. Looks like we had some more catalog drift recently.
  • Improve user-facing documentation for partial/parallel aggregation. Add a section to xaggr.sgml, as we have done in the past for other extensions to the aggregation functionality. Assorted wordsmithing and other minor improvements. David Rowley and Tom Lane
  • Fix small memory leak in partial-aggregate deserialization functions. A deserialize function's result is short-lived data during partial aggregation, since we're just going to pass it to the combine function and then it's of no use anymore. However, the built-in deserialize functions allocated their results in the aggregate state context, resulting in a query-lifespan memory leak. It's probably not possible for this to amount to anything much at present, since the number of leaked results would only be the number of worker processes. But it might become a problem in future. To fix, don't use the same convenience subroutine for setting up results that the aggregate transition functions use. David Rowley Report: <>
  • Fix building of large (bigger than shared_buffers) hash indexes. When the index is predicted to need more than NBuffers buckets, CREATE INDEX attempts to sort the index entries by hash key before insertion, so as to reduce thrashing. This code path got broken by commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not just an alias for index_form_tuple(). The index got built anyway, but with garbage data, so that searches for pre-existing tuples always failed. Fix by refactoring to separate construction of the indexable data from calling index_form_tuple(). Per bug #14210 from Daniel Newman. Back-patch to 9.5 where the bug was introduced. Report: <>
  • Simplify planner's final setup of Aggrefs for partial aggregation. Commit e06a38965's original coding for constructing the execution-time expression tree for a combining aggregate was rather messy, involving duplicating quite a lot of code in setrefs.c so that it could inject a nonstandard matching rule for Aggrefs. Get rid of that in favor of explicitly constructing a combining Aggref with a partial Aggref as input, then allowing setref's normal matching logic to match the partial Aggref to the output of the lower plan node and hence replace it with a Var. In passing, rename and redocument make_partialgroup_input_target to have some connection to what it actually does.
  • Rethink node-level representation of partial-aggregation modes. The original coding had three separate booleans representing partial aggregation behavior, which was confusing, unreadable, and error-prone, not least because the booleans weren't always listed in the same order. It was also inadequate for the allegedly-desirable future extension to support intermediate partial aggregation, because we'd need separate markers for serialization and deserialization in such a case. Merge these bools into an enum "AggSplit" to provide symbolic names for the supported operating modes (and document what those are). By assigning the values of the enum constants carefully, we can treat AggSplit values as options bitmasks so that tests of what to do aren't noticeably more expensive than before. While at it, get rid of Aggref.aggoutputtype. That's not needed since commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison code, and it likewise seemed more confusing than helpful. Assorted comment cleanup as well (there's still more that I want to do in that line). catversion bump for change in Aggref node contents. Should be the last one for partial-aggregation changes. Discussion: <>
  • Avoid making a separate pass over the query to check for partializability. It's rather silly to make a separate pass over the tlist + HAVING qual, and a separate set of visits to the syscache, when get_agg_clause_costs already has all the required information in hand. This nets out as less code as well as fewer cycles.
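The serialization signatures locked down in the parallel-aggregate commit above imply support functions of the following shape. A minimal sketch (the function names are illustrative, and the underlying C functions are assumed to exist):

```sql
-- Only these signatures are now accepted for the support functions:
--   serialize(internal)          RETURNS bytea
--   deserialize(bytea, internal) RETURNS internal
CREATE AGGREGATE my_avg (numeric) (
    SFUNC        = my_avg_accum,        -- transition function
    STYPE        = internal,
    COMBINEFUNC  = my_avg_combine,      -- merges two partial states
    SERIALFUNC   = my_avg_serialize,    -- internal -> bytea
    DESERIALFUNC = my_avg_deserialize,  -- (bytea, internal) -> internal
    FINALFUNC    = my_avg_final,
    PARALLEL     = SAFE
);
```

Note that no serialtype can be specified any more: the serialized representation is always bytea.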

Magnus Hagander pushed:

Peter Eisentraut pushed:

Bruce Momjian pushed:

Andrew Dunstan pushed:

Robert Haas pushed:

Álvaro Herrera pushed:

  • Fix handling of multixacts predating pg_upgrade. After pg_upgrade, it is possible that some tuples' Xmax have multixacts corresponding to the old installation; such multixacts cannot have running members anymore. In many code sites we already know not to read them and clobber them silently, but at least when VACUUM tries to freeze a multixact or determine whether one needs freezing, there's an attempt to resolve it to its member transactions by calling GetMultiXactIdMembers, and if the multixact value is "in the future" with regards to the current valid multixact range, an error like this is raised: ERROR: MultiXactId 123 has not been created yet -- apparent wraparound and vacuuming fails. Per discussion with Andrew Gierth, it is completely bogus to try to resolve multixacts coming from before a pg_upgrade, regardless of where they stand with regards to the current valid multixact range. It's possible to get from under this problem by doing SELECT FOR UPDATE of the problem tuples, but if tables are large, this is slow and tedious, so a more thorough solution is desirable. To fix, we realize that multixacts in xmax created in 9.2 and previous have a specific bit pattern that is never used in 9.3 and later (we already knew this, per comments and infomask tests sprinkled in various places, but we weren't leveraging this knowledge appropriately). Whenever the infomask of the tuple matches that bit pattern, we just ignore the multixact completely as if Xmax wasn't set; or, in the case of tuple freezing, we act as if an unwanted value is set and clobber it without decoding. This guarantees that no errors will be raised, and that the values will be progressively removed until all tables are clean. Most callers of GetMultiXactIdMembers are patched to recognize directly that the value is a removable "empty" multixact and avoid calling GetMultiXactIdMembers altogether. 
To avoid changing the signature of GetMultiXactIdMembers() in back branches, we keep the "allow_old" boolean flag but rename it to "from_pgupgrade"; if the flag is true, we always return an empty set instead of looking up the multixact. (I suppose we could remove the argument in the master branch, but I chose not to do so in this commit). This was broken all along, but the error-facing message appeared first because of commit 8e9a16ab8f7f and was partially fixed in a25c2b7c4db3. This fix, backpatched all the way back to 9.3, goes approximately in the same direction as a25c2b7c4db3 but should cover all cases. Bug analysis by Andrew Gierth and Álvaro Herrera. A number of public reports match this bug:
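As the commit message notes, there is a manual way out on unpatched servers: re-lock the affected rows so their old xmax is replaced (sketch, with a hypothetical table name):

```sql
-- SELECT FOR UPDATE rewrites each tuple's xmax, replacing any
-- pre-pg_upgrade multixact; slow and tedious on large tables,
-- hence the real fix above.
BEGIN;
SELECT * FROM problem_table FOR UPDATE;
COMMIT;
```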

Rejected patches (so far)

No one was disappointed this week :-)

Pending patches

Andreas Karlsson sent in another revision of a patch to tag extension functions with their parallel safety status.

Paul A Jungwirth sent in another revision of a patch to add GiST support for UUIDs.

Michaël Paquier sent in three revisions of a patch to add conninfo to the pg_stat_wal_receiver view.

Muhammad Asif Naeem sent in another revision of a patch to add an EMERGENCY option to VACUUM.

Etsuro Fujita sent in a patch to remove an unnecessary return from deparseFromExprForRel in contrib/postgres_fdw/deparse.c.

Teodor Sigaev sent in another revision of a patch to change the precedence of the tsquery operators (|, &, <->) and simplify the code around printing and parsing tsquery.

Amit Langote and Ashutosh Bapat traded patches to fix an issue where whole-row references got a wrong result in foreign join pushdowns.

Bruce Momjian sent in a patch to remove references to UT1 from the documentation.

Michaël Paquier sent in two revisions of a patch to fix the documentation for pg_visibility.

Michaël Paquier sent in another revision of a patch to fix an issue where pg_basebackup from a disconnected standby fails.

SAWADA Masahiko sent in a patch to fix a mismatch between a comment and function argument names in bufmgr.

Peter Geoghegan sent in a patch to remove some dead code in tuplesort.c.

Michaël Paquier sent in a patch to ensure that in Win32 with debugging turned on, it always produces crashdumps.

David Rowley sent in a patch to refactor the representation of partial aggregate steps.

David Rowley sent in a patch to fix a bug in citext's upgrade script for parallel aggregates.

Julien Rouhaud sent in another revision of a patch to add a global_max_parallel_workers GUC.

Piotr Stefaniak sent in some improvements to pg_bsd_indent.

Piotr Stefaniak sent in a patch to make some cosmetic improvements around shm_mq and test_shm_mq.

by N Bougain on Tuesday 28 June 2016 at 07:05

Friday 20 May 2016

Guillaume Lelarge

Some news about the manual translations

I've spent a lot of time recently on the French translation of the PostgreSQL manual.

  • updates for the latest minor versions
  • fixes allowing generation of the PDFs already available (9.1, 9.2, 9.3 and 9.4), as well as the 9.5 PDF
  • merge for the 9.6 translation

They are all available on the website

As a result, translation of the 9.6 manual can begin. For those interested, it's over here:

Feel free to send me any questions if you'd like to participate.

by Guillaume Lelarge on Friday 20 May 2016 at 20:35

Friday 13 May 2016


PostgreSQL 9.6 Beta is out! Test the parallelism

The PostgreSQL Global Development Group (PGDG) has announced the release of the first beta of PostgreSQL 9.6. This version gives a preview of all the features that will be in the final release, although some details may still change. The PostgreSQL project encourages all users to test their applications against this version.

Main features of 9.6

Version 9.6 contains major changes and promising improvements, notably:

  • Parallelism: sequential scans, joins and aggregates can now be executed in parallel;
  • Consistent reads across multiple synchronous Hot Standby nodes;
  • Phrase full-text search;
  • The postgres_fdw connector can push down sorts, joins, UPDATEs and DELETEs to remote servers;
  • Reduced autovacuum impact on large tables, by avoiding re-freezing old data.

Among these new features, parallelism should bring a significant performance gain for certain queries.
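To see whether a query takes one of the new parallel paths, EXPLAIN shows Gather and Parallel Seq Scan nodes. A sketch against a hypothetical table:

```sql
SET max_parallel_workers_per_gather = 2;
EXPLAIN SELECT count(*) FROM big_table WHERE category = 42;
-- A parallel plan looks roughly like:
--   Finalize Aggregate
--     ->  Gather
--           Workers Planned: 2
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on big_table
--                       Filter: (category = 42)
```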

The bug hunt is open

As with every major release, the improvements in PostgreSQL involve changes across very large parts of the code. The PostgreSQL community is counting on you to test this version in your own context and to report bugs and regressions before 9.6.0 is released. Beyond the new features, you can also check whether:

  • parallel queries improve performance for you;
  • parallel queries crash or cause data loss;
  • PostgreSQL works correctly on your platforms;
  • the vacuum freeze improvement reduces autovacuum activity on large tables;
  • phrase full-text search returns the expected results.

Version 9.6 beta 1 also changes the binary backup API. DBAs are invited to test 9.6 with backup tools such as pgBackRest, Barman, WAL-E or any other in-house or packaged tool.

This is a beta, so minor changes to behaviour, features and APIs are still possible. Your feedback and your tests will help finalize the new features, so start testing as soon as possible! The quality of user testing has a significant impact on the final release date.


This is the first beta of version 9.6. The PostgreSQL project will release further betas as often as necessary, followed by release candidates and then the final version, expected for late 2016. For more information, see this page:


by daamien on Friday 13 May 2016 at 15:03

PostgreSQL 9.5.3, 9.4.8, 9.3.13, 9.2.17 and 9.1.22 released

The PostgreSQL Global Development Group has released an update to all supported versions of the DBMS, comprising versions 9.5.3, 9.4.8, 9.3.13, 9.2.17 and 9.1.22. These minor releases fix a number of problems reported by users over the last two months. Users affected by the fixed problems should update their installations immediately; others should plan to update as soon as possible.

Bug fixes and improvements

This update fixes several problems that caused downtime for users:

  • Suppression de la file d’erreur OpenSSL avant un appel à OpenSSL, pour éviter des erreurs sur les connexions SSL, en particulier en utilisant les wrappers OpenSSL Python, Ruby ou PHP
  • Correction de l’optimiseur pour éviter des erreurs de type “failed to build N-way joins”
  • Correction de la gestion des équivalences dans les plans d’exécution utilisant plusieurs Nested Loop, qui pouvaient retourner des lignes ne correspondant pas à la clause WHERE
  • Correction de deux fuites de mémoire lors de l’utilisation d’index GIN, dont une pouvant potentiellement entraîner des corruptions d’index

Cette mise à jour inclut également des corrections pour des problèmes signalés, dont la plupart affectent toutes les versions supportées :

  • Correction d’erreurs de parsing lorsque le paramètre operator_precedence_warning est activé
  • Correction d’un possible mauvais comportement de la fonction to_timestamp() avec les codes TH, th et Y,YYY
  • Correction de l’export des vues et des règles qui utilisent ANY (array) dans un sous-SELECT
  • Interdiction d’utiliser un caractère de saut de ligne dans la valeur d’un paramètre avec ALTER SYSTEM
  • Correction de différents problèmes liés à la suppression du lien symbolique vers un tablespace
  • Correction d’un plantage du décodage logique sur certaines plate-formes sensibles à l’alignement des données
  • Suppression des demandes répétées de feedback envoyées au récepteur lors de l’arrêt du processus walsender
  • Correction de plusieurs problèmes de pg_upgrade
  • Ajout du support de la compilation avec Visual Studio 2015

Cette publication contient aussi la version 2016d de tzdata, avec des mises à jour de fuseaux horaires pour la Russie et le Venezuela, dont les nouvelles zones Kirov et Tomsk.

Mise à jour

Les mises à jour mineures sont cumulatives. Comme pour les autres versions mineures, les utilisateurs n’ont pas besoin de sauvegarder et restaurer leurs bases de données ni d’utiliser pg_upgrade pour appliquer cette mise à jour. Vous pouvez simplement arrêter PostgreSQL, mettre à jour les binaires et le redémarrer. Les utilisateurs qui ont sauté plusieurs mises à jour mineures peuvent avoir besoin de réaliser des opérations supplémentaires après la mise à jour ; voir les notes de version pour les détails.


par daamien le vendredi 13 mai 2016 à 11h51

mardi 3 mai 2016


Nouvelles hebdomadaires de PostgreSQL - 1er mai 2016

Les nouveautés des produits dérivés

Offres d'emplois autour de PostgreSQL en mai

PostgreSQL Local

  • FOSS4G NA (Free and Open Source Software for Geospatial - North America) se tiendra à Raleigh, en Caroline du Nord, du 2 au 5 mai 2016 :
  • La PGCon 2016 se tiendra du 17 au 21 mai 2016 à Ottawa :
  • Le PGDay suisse se tiendra cette année à l'Université des Sciences Appliquées (HSR) de Rapperswil, le 24 juin 2016 :
  • "5432 ... Meet us!" aura lieu à Milan (Italie) les 28 & 29 juin 2016. Les inscriptions sont ouvertes :
  • Le PG Day UK aura lieu le 5 juillet 2016 :
  • Le PostgresOpen 2016 aura lieu à Dallas (Texas, USA) du 13 au 16 septembre. L'appel à conférenciers a été lancé :
  • Session PostgreSQL le 22 septembre 2016 à Lyon (France). La date limite de candidature est le 20 mai : envoyez vos propositions à call-for-paper AT postgresql-sessions DOT org.
  • La PgConf Silicon Valley 2016 aura lieu du 14 au 16 novembre 2016 :

PostgreSQL dans les médias

PostgreSQL Weekly News / les nouvelles hebdomadaires vous sont offertes cette semaine par David Fetter. Traduction par l'équipe PostgreSQLFr sous licence CC BY-NC-SA. La version originale se trouve à l'adresse suivante :

Proposez vos articles ou annonces avant dimanche 15:00 (heure du Pacifique). Merci de les envoyer en anglais à david (a), en allemand à pwn (a), en italien à pwn (a) et en espagnol à pwn (a)

Correctifs appliqués

Peter Eisentraut pushed:

Tom Lane pushed:

  • Try harder to detect a port conflict in PostgresNode.pm. Commit fab84c7787f25756 tried to get away without doing an actual bind(), but buildfarm results show that that doesn't get the job done. So we must really bind to the target port --- and at least on my Linux box, we need a listen() as well, or conflicts won't be detected. We rely on SO_REUSEADDR to prevent problems from starting a postmaster on the socket immediately after we've bound to it in the test code. (There may be platforms where that doesn't work too well. But fortunately, we only really care whether this works on Windows, and there the default behavior should be OK.)
  • New method for preventing compile-time calculation of degree constants. Commit 65abaab547a5758b tried to prevent the scaling constants used in the degree-based trig functions from being precomputed at compile time, because some compilers do that with functions that don't yield results identical-to-the-last-bit to what you get at runtime. A report from Peter Eisentraut suggests that some recent compilers are smart enough to see through that trick, though. Instead, let's put the inputs to these calculations into non-const global variables, which should be a more reliable way of convincing the compiler that it can't assume that they are compile-time constants. (If we really get desperate, we could mark these variables "volatile", but I do not believe we should have to.)
  • Yet more portability hacking for degree-based trig functions. The true explanation for Peter Eisentraut's report of inexact asind results seems to be that (a) he's compiling into x87 instruction set, which uses wider-than-double float registers, plus (b) the library function asin() on his platform returns a result that is wider than double and is not rounded to double width. To fix, we have to force the function's result to be rounded comparably to what happened to the scaling constant asin_0_5. Experimentation suggests that storing it into a volatile local variable is the least ugly way of making that happen. Although only asin() is known to exhibit an observable inexact result, we'd better do this in all the places where we're hoping to get an exact result by scaling.
  • Fix order of shutdown cleanup operations in PostgresNode.pm. Previously, database clusters created by a TAP test were shut down by DESTROY methods attached to the PostgresNode objects representing them. The trouble with that is that if the objects survive into the final global destruction phase (which they do), Perl executes the DESTROY methods in an unspecified order. Thus, the order of shutdown of multiple clusters was indeterminate, which might lead to not-very-reproducible errors getting logged (eg from a slave whose master might or might not get killed first). Worse, the File::Temp objects representing the temporary PGDATA directories might get destroyed before the PostgresNode objects, resulting in attempts to delete PGDATA directories that still have live servers in them. On Windows, this would lead to directory deletion failures; on Unix, it usually had no effects worse than erratic "could not open temporary statistics file "pg_stat/global.tmp": No such file or directory" log messages. While none of this would affect the reported result of the TAP test, which is already determined, it could be very confusing when one is trying to understand from the logs what went wrong with a failed test. To fix, do the postmaster shutdowns in an END block rather than at object destruction time. The END block will execute at a well-defined (and reasonable) time during script termination, and it will stop the postmasters in order of PostgresNode object creation. (Perhaps we should change that to be reverse order of creation, but the main point here is that we now have control which we did not before.) Use "pg_ctl stop", not an asynchronous kill(SIGQUIT), so that we wait for the postmasters to shut down before proceeding with directory deletion. Deletion of temporary directories still happens in an unspecified order during global destruction, but I can see no reason to care about that once the postmasters are stopped.
  • Add a --brief option to git_changelog. In commit c0b050192, Andres introduced the idea of including one-line commit references in our major release notes. Teach git_changelog to emit a (lightly adapted) version of that format, so that we don't have to laboriously add it to the notes after the fact. The default output isn't changed, since I anticipate still using that for minor release notes.
  • Clean up parsing of synchronous_standby_names GUC variable. Commit 989be0810dffd08b added a flex/bison lexer/parser to interpret synchronous_standby_names. It was done in a pretty crufty way, though, making assorted end-use sites responsible for calling the parser at the right times. That was not only vulnerable to errors of omission, but made it possible that lexer/parser errors occur at very undesirable times, and created memory leakages even if there was no error. Instead, perform the parsing once during check_synchronous_standby_names and let guc.c manage the resulting data. To do that, we have to flatten the parsed representation into a single hunk of malloc'd memory, but that is not very hard. While at it, work a little harder on making useful error reports for parsing problems; the previous code felt that "synchronous_standby_names parser returned 1" was an appropriate user-facing error message. (To be fair, it did also log a syntax error message, but separately from the GUC problem report, which is at best confusing.) It had some outright bugs in the face of invalid input, too. I (tgl) also concluded that we need to restrict unquoted names in synchronous_standby_names to be just SQL identifiers. The previous coding would accept darn near anything, which (1) makes the quoting convention both nearly-unnecessary and formally ambiguous, (2) makes it very hard to understand what is a syntax error and what is a creative interpretation of the input as a standby name, and (3) makes it impossible to further extend the syntax in future without a compatibility break. I presume that we're intending future extensions of the syntax, else this parsing infrastructure is massive overkill, so (3) is an important objection. Since we've taken a compatibility hit for non-identifier names with this change anyway, we might as well lock things down now and insist that users use double quotes for standby names that aren't identifiers. Kyotaro Horiguchi and Tom Lane
  • Use memmove() not memcpy() to slide some pointers down. The previous coding here was formally undefined, though it seems to accidentally work on most platforms in the buildfarm. Caught by some OpenBSD platforms in which libc contains an assertion check for overlapping areas passed to memcpy(). Thomas Munro
  • Revert "Convert contrib/seg's bool-returning SQL functions to V1 call convention." This reverts commit c8e81afc60093b199a128ccdfbb692ced8e0c9cd. That turns out to have been based on a faulty diagnosis of why the VS2015 build was misbehaving. Instead, we need to fix DatumGetBool().
  • Adjust DatumGetBool macro, this time for sure. Commit 23a41573c attempted to fix the DatumGetBool macro to ignore bits in a Datum that are to the left of the actual bool value. But it did that by casting the Datum to bool; and on compilers that use C99 semantics for bool, that ends up being a whole-word test, not a 1-byte test. This seems to be the true explanation for contrib/seg failing in VS2015. To fix, use GET_1_BYTE() explicitly. I think in the previous patch, I'd had some idea of not having to commit to bool being exactly 1 byte wide, but regardless of what the compiler's bool is, boolean columns and Datums are certainly 1 byte wide. The previous fix was (eventually) back-patched into all active versions, so do likewise with this one.
  • Fix mishandling of equivalence-class tests in parameterized plans. Given a three-or-more-way equivalence class, such as X.X = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is *extremely* hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.
  • Remove warning about num_sync being too large in synchronous_standby_names. If we're not going to reject such setups entirely, throwing a WARNING in check_synchronous_standby_names() is unhelpful, because it will cause the warning to be logged again every time the postmaster receives SIGHUP. Per discussion, just remove the warning. In passing, improve the documentation for synchronous_commit, which had not gotten the word that now there can be more than one synchronous standby.
  • Fix planner crash from pfree'ing a partial path that a GatherPath uses. We mustn't run generate_gather_paths() during add_paths_to_joinrel(), because that function can be invoked multiple times for the same target joinrel. Not only is it wasteful to build GatherPaths repeatedly, but a later add_partial_path() could delete the partial path that a previously created GatherPath depends on. Instead establish the convention that we do generate_gather_paths() for a rel only just before set_cheapest(). The code was accidentally not broken for baserels, because as of today there never is more than one partial path for a baserel. But that assumption obviously has a pretty short half-life, so move the generate_gather_paths() calls for those cases as well. Also add some generic comments explaining how and why this all works. Per fuzz testing by Andreas Seltenreich. Report: <>
  • Small improvements to OPTIMIZER_DEBUG code. Now that Paths have their own rows field, print that rather than the parent relation's rowcount. Show the relid sets associated with Paths using table names rather than numbers; since this code is able to print simple Var references using table names, it seems a bit silly that print_relids can't. Print the cheapest_parameterized_paths list for a RelOptInfo, and include information about a parameterized path's required_outer rels. Noted while trying to use this feature to debug Alexander Kirkouski's recent bug report.
  • Update contrib/unaccent documentation about its unaccent.rules file. Commit 1bbd52cb9a4aa61a didn't bother with such niceties.
  • Add a --non-master-only option to git_changelog. This has the inverse effect of --master-only. It's needed to help find cases where a commit should not be described in major release notes because it was back-patched into older branches, though not at the same time as the HEAD commit.

Kevin Grittner pushed:

  • Fix C comment typo and redundant test
  • Add a few entries to the tail of time mapping, to see old values. Without a few entries beyond old_snapshot_threshold, the lookup would often fail, resulting in the more aggressive pruning or vacuum being skipped often enough to matter. This was very clearly shown by a python test script posted by Ants Aasma, and was likely a factor in an earlier but somewhat less clear-cut test case posted by Jeff Janes. This patch makes no change to the logic, per se -- it just makes the array of mapping entries big enough to make lookup misses based on timing much less likely. An occasional miss is still possible if a thread stalls for more than 10 minutes, but that does not create any problem with correctness of behavior. Besides, if things are so busy that a thread is stalling for more than 10 minutes, it is probably OK to skip the more aggressive cleanup at that particular point in time.

Magnus Hagander pushed:

Robert Haas pushed:

Teodor Sigaev pushed:

  • Fix tsearch docs. Remove mention of setweight(tsquery) which wasn't included in 9.6. Also replace old forgotten phrase operator to new one. Dmitry Ivanov
  • Prevent multiple cleanup process for pending list in GIN. Previously, ginInsertCleanup could exit early if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. But in this case vacuum could miss tuples to be deleted. Cleanup process now locks metapage with a help of heavyweight LockPage(ExclusiveLock), and it guarantees that there is no another cleanup process at the same time. Lock is taken differently depending on caller of cleanup process: any vacuums and gin_clean_pending_list() will be blocked until lock becomes available, ordinary insert uses conditional lock to prevent indefinite waiting on lock. Insert into pending list doesn't use this lock, so insertion isn't blocked. Also, patch adds stopping of cleanup process when at-start-cleanup-tail is reached in order to prevent infinite cleanup in case of massive insertion. But it will stop only for automatic maintenance tasks like autovacuum. Patch introduces choice of limit of memory to use: autovacuum_work_mem, maintenance_work_mem or work_mem depending on call path. Patch for previous releases should be reworked due to changes between 9.6 and previous ones in this area. Discover and diagnostics by Jeff Janes and Tomas Vondra Patch by me with some ideas of Jeff Janes
  • Prevent to use magic constants. Use macros for definition amstrategies/amsupport fields instead of hardcoded values. Author: Nikolay Shaplov with addition for contrib/bloom

Noah Misch pushed:

  • Impose a full barrier in generic-xlc.h atomics functions. pg_atomic_compare_exchange_*_impl() were providing only the semantics of an acquire barrier. Buildfarm members hornet and mandrill revealed this deficit beginning with commit 008608b9d51061b1f598c197477b3dc7be9c4a64. While we have no report of symptoms in 9.5, we can't rule out the possibility of certain compilers, hardware, or extension code relying on these functions' specified barrier semantics. Back-patch to 9.5, where commit b64d92f1a5602c55ee8b27a7ac474f03b7aee340 introduced atomics. Reviewed by Andres Freund.

Andres Freund pushed:

  • Don't open formally non-existent segments in _mdfd_getseg(). Before this commit _mdfd_getseg(), in contrast to mdnblocks(), did not verify whether all segments leading up to the to-be-opened one, were RELSEG_SIZE sized. That is e.g. not the case after truncating a relation, because later segments just get truncated to zero length, not removed. Once a "non-existent" segment has been opened in a session, mdnblocks() will return wrong results, causing errors like "could not read block %u in file" when accessing blocks. Closing the session, or the later arrival of relevant invalidation messages, would "fix" the problem. That, so far, was mostly harmless, because most segment accesses are only done after an mdnblocks() call. But since 428b1d6b29ca we try to open segments that might have been deleted, to trigger kernel writeback from a backend's queue of recent writes. To fix, check segment sizes in _mdfd_getseg() when opening previously unopened segments. In practice this shouldn't imply a lot of additional lseek() calls, because mdnblocks() will most of the time already have opened all relevant segments. This commit also fixes a second problem, namely that _mdfd_getseg( EXTENSION_RETURN_NULL) extends files during recovery, which is not desirable for the mdwriteback() case. Add EXTENSION_REALLY_RETURN_NULL, which does not behave that way, and use it. Reported-By: Thom Brown Author: Andres Freund, Abhijit Menon-Sen Reviewed-By: Robert Haas, Fabien Coelho Discussion: Fixes: 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf
  • Emit invalidations to standby for transactions without xid. So far, when a transaction with pending invalidations, but without an assigned xid, committed, we simply ignored those invalidation messages. That's problematic, because those are actually sent for a reason. Known symptoms of this include that existing sessions on a hot-standby replica sometimes fail to notice new concurrently built indexes and visibility map updates. The solution is to WAL log such invalidations in transactions without an xid. We considered to alternatively force-assign an xid, but that'd be problematic for vacuum, which might be run in systems with few xids. Important: This adds a new WAL record, but as the patch has to be back-patched, we can't bump the WAL page magic. This means that standbys have to be updated before primaries; otherwise "PANIC: standby_redo: unknown op code 32" errors can be encountered. Reported-By: Васильев Дмитрий, Masahiko Sawada Discussion:
  • Remember asking for feedback during walsender shutdown. Since 5a991ef8 we're explicitly asking for feedback from the receiving side when shutting down walsender, if there's not yet replicated data. Unfortunately we didn't remember (i.e. set waiting_for_ping_response to true) having asked for feedback, leading to scenarios in which replies were requested at a high frequency. I can't reproduce this problem on my laptop, I think that's because the problem requires a significant TCP window to manifest due to the !pq_is_send_pending() condition. But since this clearly is a bug, let's fix it. There's quite possibly more wrong than just this though. While fiddling with WalSndDone(), I rewrote a hard to understand comment about looking at the flush vs. the write position. Reported-By: Nick Cleaton, Magnus Hagander Author: Nick Cleaton Discussion: Backpatch: 9.4, where 5a991ef8 introduced the use of feedback messages during shutdown.

Andrew Dunstan pushed:

Correctifs rejetés (à ce jour)

No one was disappointed this week :-)

Correctifs en attente

Stephen Frost sent in another revision of a patch to fix pg_dump for catalog ACLs.

Sehrope Sarkuni sent in a patch to add jsonb_compact(...) for whitespace-free jsonb to text.

Peter Eisentraut sent in another revision of a patch to fix an OpenSSL error queue bug.

Kyotaro HORIGUCHI sent in two more revisions of a patch to fix synchronous replication update configuration.

Ashutosh Sharma sent in a patch to allow pg_basebackup to create pg_stat_tmp and pg_replslot as empty directories if they are symbolic links.

Etsuro Fujita sent in a patch to fix subtransaction callbacks in the PostgreSQL FDW.

Christian Ullrich sent in three revisions of a patch to enable parallel builds with MSVC.

Dean Rasheed sent in a patch to fix an issue in psql's \ev feature which adds support for missing VIEW options like WITH CHECK OPTION.

Daniel Gustafsson sent in a patch to remove some unused macros from contrib.

Julien Rouhaud sent in two revisions of a patch to fix an issue with FK joins that could cause a segfault.

Craig Ringer sent in a patch to allow a stop LSN to be specified to pg_recvlogical.

Andreas Karlsson sent in a patch to fix the parallel safety markings for some built-in functions that got missed in the shuffle.

Ian Lawrence Barwick sent in a patch to add tab completion to psql's \l command.

Amit Kapila sent in a patch to fix an issue sqlsmith uncovered in apply_projection_to_path.

par N Bougain le mardi 3 mai 2016 à 23h33

mercredi 27 avril 2016


Nouvelles hebdomadaires de PostgreSQL - 24 avril 2016

Session PostgreSQL le 22 septembre 2016 à Lyon (France). La date limite de candidature est le 20 mai : envoyez vos propositions à call-for-paper AT postgresql-sessions DOT org.

Le PostgresOpen 2016 aura lieu à Dallas (Texas, USA) du 13 au 16 septembre. L'appel à conférenciers a été lancé :

Les nouveautés des produits dérivés

Offres d'emplois autour de PostgreSQL en avril

PostgreSQL Local

  • FOSS4G NA (Free and Open Source Software for Geospatial - North America) se tiendra à Raleigh, en Caroline du Nord, du 2 au 5 mai 2016 :
  • La PGCon 2016 se tiendra du 17 au 21 mai 2016 à Ottawa :
  • Le PGDay suisse se tiendra cette année à l'Université des Sciences Appliquées (HSR) de Rapperswil, le 24 juin 2016 :
  • "5432 ... Meet us!" aura lieu à Milan (Italie) les 28 & 29 juin 2016. Les inscriptions sont ouvertes :
  • Le PG Day UK aura lieu le 5 juillet 2016 :
  • Session PostgreSQL le 22 septembre 2016 à Lyon (France). La date limite de candidature est le 20 mai : envoyez vos propositions à call-for-paper AT postgresql-sessions DOT org.
  • La PgConf Silicon Valley 2016 aura lieu du 14 au 16 novembre 2016 :

PostgreSQL dans les médias

PostgreSQL Weekly News / les nouvelles hebdomadaires vous sont offertes cette semaine par David Fetter. Traduction par l'équipe PostgreSQLFr sous licence CC BY-NC-SA. La version originale se trouve à l'adresse suivante :

Proposez vos articles ou annonces avant dimanche 15:00 (heure du Pacifique). Merci de les envoyer en anglais à david (a), en allemand à pwn (a), en italien à pwn (a) et en espagnol à pwn (a)

Correctifs appliqués

Peter Eisentraut pushed:

Fujii Masao pushed:

Tom Lane pushed:

  • Further reduce the number of semaphores used under --disable-spinlocks. Per discussion, there doesn't seem to be much value in having NUM_SPINLOCK_SEMAPHORES set to 1024: under any scenario where you are running more than a few backends concurrently, you really had better have a real spinlock implementation if you want tolerable performance. And 1024 semaphores is a sizable fraction of the system-wide SysV semaphore limit on many platforms. Therefore, reduce this setting's default value to 128 to make it less likely to cause out-of-semaphores problems.
  • Make partition-lock-release coding more transparent in BufferAlloc(). Coverity complained that oldPartitionLock was possibly dereferenced after having been set to NULL. That actually can't happen, because we'd only use it if (oldFlags & BM_TAG_VALID) is true. But nonetheless Coverity is justified in complaining, because at line 1275 we actually overwrite oldFlags, and then still expect its BM_TAG_VALID bit to be a safe guide to whether to release the oldPartitionLock. Thus, the code would be incorrect if someone else had changed the buffer's BM_TAG_VALID flag meanwhile. That should not happen, since we hold pin on the buffer throughout this sequence, but it's starting to look like a rather shaky chain of logic. And there's no need for such assumptions, because we can simply replace the (oldFlags & BM_TAG_VALID) tests with (oldPartitionLock != NULL), which has identical results and makes it plain to all comers that we don't dereference a null pointer. A small side benefit is that the range of liveness of oldFlags is greatly reduced, possibly allowing the compiler to save a register. This is just cleanup, not an actual bug fix, so there seems no need for a back-patch.
  • Improve regression tests for degree-based trigonometric functions. Print the actual value of each function result that's expected to be exact, rather than merely emitting a NULL if it's not right. Although we print these with extra_float_digits = 3, we should not trust that the platform will produce a result visibly different from the expected value if it's off only in the last place; hence, also include comparisons against the exact values as before. This is a bit bulkier and uglier than the previous printout, but it will provide more information and be easier to interpret if there's a test failure. Discussion: <>
  • Fix memory leak and other bugs in ginPlaceToPage() & subroutines. Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier. One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem. Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section. To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.) In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery. Report: <>
  • Honor PGCTLTIMEOUT environment variable for pg_regress' startup wait. In commit 2ffa86962077c588 we made pg_ctl recognize an environment variable PGCTLTIMEOUT to set the default timeout for starting and stopping the postmaster. However, pg_regress uses pg_ctl only for the "stop" end of that; it has bespoke code for starting the postmaster, and that code has historically had a hard-wired 60-second timeout. Further buildfarm experience says it'd be a good idea if that timeout were also controlled by PGCTLTIMEOUT, so let's make it so. Like the previous patch, back-patch to all active branches. Discussion: <>
  • PGDLLIMPORT-ify old_snapshot_threshold. Revert commit 7cb1db1d9599f0a09d6920d2149d956ef6d88b0e, which represented a misunderstanding of the problem (if snapmgr.h weren't already included in bufmgr.h, things wouldn't compile anywhere). Instead install what I think is the real fix.
  • Fix ruleutils.c's dumping of ScalarArrayOpExpr containing an EXPR_SUBLINK. When we shoehorned "x op ANY (array)" into the SQL syntax, we created a fundamental ambiguity as to the proper treatment of a sub-SELECT on the righthand side: perhaps what's meant is to compare x against each row of the sub-SELECT's result, or perhaps the sub-SELECT is meant as a scalar sub-SELECT that delivers a single array value whose members should be compared against x. The grammar resolves it as the former case whenever the RHS is a select_with_parens, making the latter case hard to reach --- but you can get at it, with tricks such as attaching a no-op cast to the sub-SELECT. Parse analysis would throw away the no-op cast, leaving a parsetree with an EXPR_SUBLINK SubLink directly under a ScalarArrayOpExpr. ruleutils.c was not clued in on this fine point, and would naively emit "x op ANY ((SELECT ...))", which would be parsed as the first alternative, typically leading to errors like "operator does not exist: text = text[]" during dump/reload of a view or rule containing such a construct. To fix, emit a no-op cast when dumping such a parsetree. This might well be exactly what the user wrote to get the construct accepted in the first place; and even if she got there with some other dodge, it is a valid representation of the parsetree. Per report from Karl Czajkowski. He mentioned only a case involving RLS policies, but actually the problem is very old, so back-patch to all supported branches. Report: <>
  • Remove dead code in win32.h. There's no longer a need for the MSVC-version-specific code stanza that forcibly redefines errno code symbols, because since commit 73838b52 we're unconditionally redefining them in the stanza before this one anyway. Now it's merely confusing and ugly, so get rid of it; and improve the comment that explains what's going on here. Although this is just cosmetic, back-patch anyway since I'm intending to back-patch some less-cosmetic changes in this same hunk of code.
  • Improve TranslateSocketError() to handle more Windows error codes. The coverage was rather lean for cases that bind() or listen() might return. Add entries for everything that there's a direct equivalent for in the set of Unix errnos that elog.c has heard of.
  • Fix planner failure with full join in RHS of left join. Given a left join containing a full join in its righthand side, with the left join's joinclause referencing only one side of the full join (in a non-strict fashion, so that the full join doesn't get simplified), the planner could fail with "failed to build any N-way joins" or related errors. This happened because the full join was seen as overlapping the left join's RHS, and then recent changes within join_is_legal() caused that function to conclude that the full join couldn't validly be formed. Rather than try to rejigger join_is_legal() yet more to allow this, I think it's better to fix initsplan.c so that the required join order is explicit in the SpecialJoinInfo data structure. The previous coding there essentially ignored full joins, relying on the fact that we don't flatten them in the joinlist data structure to preserve their ordering. That's sufficient to prevent a wrong plan from being formed, but as this example shows, it's not sufficient to ensure that the right plan will be formed. We need to work a bit harder to ensure that the right plan looks sane according to the SpecialJoinInfos. Per bug #14105 from Vojtech Rylko. This was apparently induced by commit 8703059c6 (though now that I've seen it, I wonder whether there are related cases that could have failed before that); so back-patch to all active branches. Unfortunately, that patch also went into 9.0, so this bug is a regression that won't be fixed in that branch.
  • Fix unexpected side-effects of operator_precedence_warning. The implementation of that feature involves injecting nodes into the raw parsetree where explicit parentheses appear. Various places in parse_expr.c that test to see "is this child node of type Foo" need to look through such nodes, else we'll get different behavior when operator_precedence_warning is on than when it is off. Note that we only need to handle this when testing untransformed child nodes, since the AEXPR_PAREN nodes will be gone anyway after transformExprRecurse. Per report from Scott Ribe and additional code-reading. Back-patch to 9.5 where this feature was added. Report: <>
  • Convert contrib/seg's bool-returning SQL functions to V1 call convention. It appears that we can no longer get away with using V0 call convention for bool-returning functions in newer versions of MSVC. The compiler seems to generate code that doesn't clear the higher-order bits of the result register, causing the bool result Datum to often read as "true" when "false" was intended. This is not very surprising, since the function thinks it's returning a bool-width result but fmgr_oldstyle assumes that V0 functions return "char *"; what's surprising is that that hack worked for so long on so many platforms. The only functions of this description in core+contrib are in contrib/seg, which we'd intentionally left mostly in V0 style to serve as a warning canary if V0 call convention breaks. We could imagine hacking things so that they're still V0 (we'd have to redeclare the bool-returning functions as returning some suitably wide integer type, like size_t, at the C level). But on the whole it seems better to convert 'em to V1. We can still leave the pointer- and int-returning functions in V0 style, so that the test coverage isn't gone entirely. Back-patch to 9.5, since our intention is to support VS2015 in 9.5 and later. There's no SQL-level change in the functions' behavior so back-patching should be safe enough. Discussion: <> Michael Paquier, adjusted some by me
  • Rename strtoi() to strtoint(). NetBSD has seen fit to invent a libc function named strtoi(), which conflicts with the long-established static functions of the same name in datetime.c and ecpg's interval.c. While muttering darkly about intrusions on application namespace, we'll rename our functions to avoid the conflict. Back-patch to all supported branches, since this would affect attempts to build any of them on recent NetBSD. Thomas Munro
  • Improve the logic for detecting already-in-use ports. Buildfarm members bowerbird and jacana have shown intermittent "could not bind IPv4 socket" failures in the BinInstallCheck stage since mid-December, shortly after commits 1caef31d9e550408 and 9821492ee417a591 changed the logic for selecting which port to use in temporary installations. One plausible explanation is that we are randomly selecting ports that are already in use for some non-Postgres purpose. Although the code tried to defend against already-in-use ports, it used pg_isready to probe the port which is quite unhelpful: if some non-Postgres server responds at the given address, pg_isready will generally say "no response", leading to exactly the wrong conclusion about whether the port is free. Instead, let's use a simple TCP connect() call to see if anything answers without making assumptions about what it is. Note that this means there's no direct check for a conflicting Unix socket, but that should be okay because there should be no other Unix sockets in use in the temporary socket directory created for a test run. This is only a partial solution for the TCP case, since if the port number is in use for an outgoing connection rather than a listening socket, we'll fail to detect that. We could try to bind() to the proposed port as a means of detecting that case, but that would introduce its own failure modes, since the system might consider the address to remain reserved for some period of time after we drop the bound socket. Close study of the errors returned by bowerbird and jacana suggests that what we're seeing there may be conflicts with listening not outgoing sockets, so let's try this and see if it improves matters. It's certainly better than what's there now, in any case. Michael Paquier, adjusted by me to work on non-Windows as well as Windows

Kevin Grittner pushed:

  • Revert no-op changes to BufferGetPage(). The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
  • Inline initial comparisons in TestForOldSnapshot(). Even with old_snapshot_threshold = -1 (which disables the "snapshot too old" feature), performance regressions were seen at moderate to high concurrency. For example, a one-socket, four-core system running 200 connections at saturation could see up to a 2.3% regression, with larger regressions possible on NUMA machines. By inlining the early (smaller, faster) tests in the TestForOldSnapshot() function, the i7 case dropped to a 0.2% regression, which could easily just be noise, and is clearly an improvement. Further testing will show whether more is needed.
  • Include snapmgr.h in blscan.c. Windows builds on buildfarm are failing because old_snapshot_threshold is not found in the bloom filter contrib module.

Magnus Hagander pushed:

Robert Haas pushed:

  • Forbid parallel Hash Right Join or Hash Full Join. That won't work. You'll get bogus null-extended rows. Mithun Cy
  • Add pg_dump support for the new PARALLEL option for aggregates. This was an oversight in commit 41ea0c23761ca108e2f08f6e3151e3cb1f9652a1. Fabrízio de Royes Mello, per a report from Tushar Ahuja
  • postgres_fdw: Don't push down certain full joins. If there's a filter condition on either side of a full outer join, it is neither correct to attach it to the join's ON clause nor to throw it into the toplevel WHERE clause. Just don't push down the join in that case. To maximize the number of cases where we can still push down full joins, push inner join conditions into the ON clause at the first opportunity rather than postponing them to the top-level WHERE clause. This produces nicer SQL, anyway. This bug was introduced in e4106b2528727c4b48639c0e12bf2f70a766b910. Ashutosh Bapat, per report from Rajkumar Raghuwanshi.
  • Allow queries submitted by postgres_fdw to be canceled. This fixes a problem which is not new, but with the advent of direct foreign table modification in 0bf3ae88af330496517722e391e7c975e6bad219, it's somewhat more likely to be annoying than previously. So, arrange for a local query cancelation to propagate to the remote side. Michael Paquier, reviewed by Etsuro Fujita. Original report by Thom Brown.
  • Fix assorted defects in 09adc9a8c09c9640de05c7023b27fb83c761e91c. That commit increased all shared memory allocations to the next higher multiple of PG_CACHE_LINE_SIZE, but it didn't ensure that allocation started on a cache line boundary. It also failed to remove a couple other pieces of now-useless code. BUFFERALIGN() is perhaps obsolete at this point, and likely should be removed at some point, too, but that seems like it can be left to a future cleanup. Mistakes all pointed out by Andres Freund. The patch is mine, with a few extra assertions which I adopted from his version of this fix.
  • Comment improvements for ForeignPath. It's not necessarily just scanning a base relation any more. Amit Langote and Etsuro Fujita
  • Prevent possible crash reading pg_stat_activity. Also, avoid reading PGPROC's wait_event field twice, once for the wait event and again for the wait_event_type, because the value might change in the middle. Petr Jelinek and Robert Haas

Bruce Momjian pushed:

Andres Freund pushed:

  • Fix documentation & config inconsistencies around 428b1d6b2. Several issues: 1) checkpoint_flush_after doc and code disagreed about the default 2) new GUCs were missing from postgresql.conf.sample 3) Outdated source-code comment about bgwriter_flush_after's default 4) Sub-optimal categories assigned to new GUCs 5) Docs suggested backend_flush_after is PGC_SIGHUP, but it's PGC_USERSET. 6) Spell out int as integer in the docs, as done elsewhere Reported-By: Magnus Hagander, Fujii Masao Discussion:

Rejected patches (to date)

No one was disappointed this week :-)

Pending patches

Michaël Paquier sent in a patch to ensure that a reserved role is never a member of another role or group.

Kyotaro HORIGUCHI sent in a patch to fix the documentation for synchronous_standby_names.

Kyotaro HORIGUCHI sent in two more revisions of a patch to fix synchronous replication update configuration.

Michaël Paquier sent in another revision of a patch to fix an OOM in libpq and infinite loop with getCopyStart().

Fujii Masao sent in a patch to add error checks to BRIN summarize new values.

Michaël Paquier sent in another revision of a patch to do hot standby checkpoints.

Amit Langote sent in two more revisions of a patch to implement declarative partitioning.

Amit Langote sent in two revisions of a patch to fix some issues in the Bloom documentation.

David Rowley sent in two more revisions of a patch to fix EXPLAIN VERBOSE with parallel aggregate.

Ants Aasma sent in another revision of a patch to update old snapshot map once per tick.

Dmitry Ivanov sent in a patch to fix some of the documentation for the new phrase search capability.

Michaël Paquier sent in a patch to change contrib/seg/ to convert functions to use the V1 declaration.

Juergen Hannappel sent in a patch to add an option to pg_dumpall to exclude tables from the dump.

Andres Freund sent in a patch to keep from opening formally non-existent segments in _mdfd_getseg().

Thomas Munro sent in a patch to implement kqueue for *BSD.

Amit Kapila sent in a patch to fix an old snapshot threshold performance issue.

Noah Misch sent in a patch to add xlc atomics.

Andres Freund sent in a patch to emit invalidations to standby for transactions without xid.

Andrew Dunstan sent in a patch to add transactional enum additions.

Andrew Dunstan sent in a patch to add VS2015 support.

Simon Riggs sent in a patch to fix some suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

by N Bougain on Wednesday 27 April 2016 at 00:17

Thursday 21 April 2016

Adrien Nayrat

BRIN indexes – Performance

PostgreSQL 9.5, released in January 2016, introduces a new index type: BRIN indexes, for Block Range INdex. They are recommended for large tables whose data is correlated with its physical location. I decided to devote a series of articles to these indexes:

For your information, I will be at PGDay France in Lille on Tuesday 31 May to present this index type. There will also be plenty of other interesting talks!

This article is the last in the series; it covers performance (maintenance, reads, inserts…).


The previous articles covered how BRIN indexes work and what makes them special. This article focuses on performance. The earlier examples used small data volumes; now we will see what these indexes can bring on a much larger volume.

The tests were run on a laptop; the WAL, the table and the indexes are all stored on a mechanical disk. Results will differ depending on the hardware used. These figures are purely indicative and mainly serve to give an order of magnitude.


First, we need to create a table with a significant data volume.

For example: a measurement system with 100 probes and one measurement per second. Over a year this gives 100*365*24*3600 measurements => a little over 3 billion rows.

-- Use a function to generate random text.
-- Found here:
-- (the CREATE FUNCTION header and control-flow lines were lost in this
--  copy of the article; they are reconstructed below)
CREATE OR REPLACE FUNCTION random_string(length integer) RETURNS text AS $$
DECLARE
  chars text[] := '{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}';
  result text := '';
  i integer := 0;
BEGIN
  IF length < 0 THEN
    RAISE EXCEPTION 'Given length cannot be less than 0';
  END IF;
  FOR i IN 1..length LOOP
    result := result || chars[1 + random() * (array_length(chars, 1) - 1)];
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql;
-- Create the table containing the probes
CREATE TABLE probe (id serial PRIMARY KEY, name text);
INSERT INTO probe (name ) SELECT random_string(5) FROM generate_series(1,100);

-- Create the table (named "data", as referenced by the CREATE INDEX statements below)
CREATE TABLE data AS
WITH generation AS (
SELECT '2015-01-01'::TIMESTAMP + i * INTERVAL '1 second' AS date_metric,sonde::text,random() AS metric
FROM generate_series(0, 3600*24*365) i,
LATERAL (SELECT name FROM probe) sonde)
SELECT * FROM generation;

The resulting table is a little over 150 GB; it does not fit in the RAM of the machine hosting the instance, let alone in PostgreSQL's shared memory.


We create several indexes to compare their sizes:

CREATE INDEX metro_btree_idx ON data USING btree (date_metric);
CREATE INDEX metro_brin_idx_8 ON data USING brin (date_metric) WITH (pages_per_range = 8);
CREATE INDEX metro_brin_idx_16 ON data USING brin (date_metric) WITH (pages_per_range = 16);
CREATE INDEX metro_brin_idx_32 ON data USING brin (date_metric) WITH (pages_per_range = 32);
CREATE INDEX metro_brin_idx_64 ON data USING brin (date_metric) WITH (pages_per_range = 64);
CREATE INDEX metro_brin_idx_128 ON data USING brin (date_metric);
CREATE INDEX metro_brin_idx_256 ON data USING brin (date_metric) WITH (pages_per_range = 256);

Here are the results obtained for index creation time and index size:


Index creation was 4 times faster for the BRIN indexes. Their creation would likely have been faster still on more capable storage.

The index sizes are also striking: the b-tree index is 66 GB, while the BRIN index with the default pages_per_range is only 5 MB.

We can immediately see the gains in disk space and index creation speed. Maintenance operations (REINDEX) will be much easier.
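The size comparison above can be reproduced with a catalog query along these lines (a sketch; it assumes the "data" table and the index names used in the examples above):

```sql
-- List every index on the "data" table with its on-disk size,
-- largest first (the b-tree should dwarf the BRIN indexes)
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'data'::regclass
ORDER BY pg_relation_size(c.oid) DESC;
```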

Read performance

We will run several tests; the idea is to highlight the behavioral differences between BRIN and b-tree indexes.

The query used is quite simple:

EXPLAIN (ANALYSE,BUFFERS,VERBOSE) SELECT date_metric,sonde,metric FROM DATA WHERE date_metric = 'xxx';

To get a result with few rows:

WHERE date_metric = '2015-05-01 00:00:00'::TIMESTAMP

To get more results, we will use an interval with:

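The exact predicate was not preserved in this copy of the article; it was a range condition of this general form (the bounds below are illustrative, not the ones used in the original tests):

```sql
WHERE date_metric BETWEEN '2015-05-01 00:00:00'::TIMESTAMP
                      AND '2015-06-01 00:00:00'::TIMESTAMP
```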

Here are the results obtained:

100 rows:
  BRIN: 24 ms, 697 blocks read; Btree: 0.06 ms, 7 blocks read
  Gain: Btree (x400 on duration, x100 on volume of data read)
267 million rows:
  BRIN: 170 s, 13 GB read; Btree: 228 s, 18 GB read
  Gain: BRIN (x1.3 on duration, x1.4 on volume of data read)
777 million rows:
  BRIN: 8 min, 38 GB read; Btree: 11 min, 54 GB read
  Gain: BRIN (x1.37 on duration, x1.4 on volume of data read)
1.3 billion rows:
  BRIN: 13 min, 63 GB read; seqscan: 32 min, 153 GB read; Btree (enable_seqscan off): 90 GB read
  Gain on duration: BRIN x2 vs seqscan, BRIN x1.4 vs Btree
  Gain on data read: BRIN x2.4 vs seqscan, BRIN x1.4 vs Btree

To compare the volume of data read and the execution time, we can drop the indexes inside a transaction:

BEGIN;
DROP INDEX ...;
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT ...;
ROLLBACK;

For the first test, the planner chooses the b-tree index. After dropping the b-tree index, it chooses the BRIN index.

For tests 2 and 3, the planner chooses the BRIN index; after dropping the BRIN index, it chooses the b-tree index.

For the last test I added more measurements. Dropping the BRIN index makes the planner fall back to a seqscan (a scan of the whole table). To get comparisons consistent with the previous results, I therefore dropped the BRIN index and also disabled sequential scans (set enable_seqscan to 'off';).

Overall we see a 30-40% gain in the cases where many results are requested. The engine reads fewer blocks when it uses the BRIN indexes; the b-tree index is large, so reading it is costly.

On the other hand, the b-tree index is particularly efficient when the query is very selective and few rows are returned. With a BRIN index, the engine starts by reading the entire index, then reads every block range that may contain the searched value, some of which contain no matching rows at all. These extra reads show up in the query's execution time.
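The extra reads described here are visible in the EXPLAIN output: a BRIN scan appears as a bitmap plan, and the wasted reads show up as rows removed by the recheck. A sketch of the typical plan shape (illustrative, not actual output from these tests):

```sql
-- Typical EXPLAIN (ANALYZE, BUFFERS) plan shape for a BRIN lookup:
-- Bitmap Heap Scan on data
--   Recheck Cond: (date_metric = '2015-05-01 00:00:00'::timestamp)
--   Rows Removed by Index Recheck: ...
--   ->  Bitmap Index Scan on metro_brin_idx_128
--         Index Cond: (date_metric = '2015-05-01 00:00:00'::timestamp)
```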

Insert performance

Since BRIN indexes are smaller and faster to create, one may wonder what the overhead of maintaining them is when inserting data. To measure it, we create a table and time the insertion of 10 million rows depending on which indexes already exist on the table. To isolate the overhead of index maintenance, the table is unlogged, which avoids writes to the WAL. Autovacuum is also disabled.

-- Unlogged table, autovacuum disabled (see above); the exact CREATE TABLE
-- was lost in this copy and is reconstructed here (c1 matches the indexes below)
CREATE UNLOGGED TABLE brin_demo_2 (c1 integer) WITH (autovacuum_enabled = off);

-- Baseline: no index
INSERT INTO brin_demo_2 SELECT * FROM generate_series(1,10000000);
TRUNCATE brin_demo_2;

-- Default BRIN index (pages_per_range = 128)
CREATE INDEX brin_demo_2_brin_idx ON brin_demo_2 USING brin (c1);
INSERT INTO brin_demo_2 SELECT * FROM generate_series(1,10000000);
DROP INDEX brin_demo_2_brin_idx;
TRUNCATE brin_demo_2;

-- BRIN index with larger block ranges
CREATE INDEX brin_demo_2_brin_idx ON brin_demo_2 USING brin (c1) WITH (pages_per_range = 256);
INSERT INTO brin_demo_2 SELECT * FROM generate_series(1,10000000);
DROP INDEX brin_demo_2_brin_idx;
TRUNCATE brin_demo_2;

Here are the results obtained:

As with the read-performance figures, these numbers only reflect insert times on my hardware. The table, the indexes and the WAL sit on the same physical disk, which slows down the operations.

Still, we can see that inserting data into a table with a BRIN index is cheaper than with a b-tree index. We can also see that there is no significant difference between the various BRIN index variants.


This series of articles presented the principles behind BRIN indexes, then how they work through simple examples.

We then saw how important correlation is to fully exploit these indexes. Finally, we tried to measure the gains this index type can bring in several areas (maintenance, read and insert performance).

Describing how an index works while simplifying its representation is a tricky exercise; it is easy to sacrifice substance for form. Presenting figures is also delicate, as they depend so heavily on context. I made the effort to detail how I obtained them so that everyone can reproduce their own tests. The idea is to give an overview of the use cases for this type of index.

Overall, keep in mind that BRIN indexes are useful for large tables whose data correlates strongly with its physical location. They will be slower than b-tree indexes when the search needs to read only a few blocks, and somewhat faster than b-tree indexes when the engine has to read many blocks (there are fewer blocks to read in the index).
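The physical correlation mentioned here can be checked in pg_stats before choosing a BRIN index; values close to 1 (or -1) are the favorable case. For example, against the test table used above (run ANALYZE first so the statistics exist):

```sql
-- Correlation between the column's logical order and its physical order on disk
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'data' AND attname = 'date_metric';
```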

Studying this index opens up other avenues, such as taking correlation into account in cost estimation. I also thought about the possibility of using one index to build another.

Take the example of the large table (150 GB): if we want to create a partial index on the previous month, the engine will scan the entire table to build the index. One could imagine building the b-tree index using the BRIN index, so as to scan only the rows for the previous month.


by Adrien Nayrat on Thursday 21 April 2016 at 20:58

Nicolas Gollet

A PostgreSQL 9.4 and 9.5 bug on Windows?

Some of us hit a bug when installing PostgreSQL 9.4 or 9.5 for Windows using the installer package provided by EnterpriseDB.

The installer fails with the message:

Problem running post-install step, Installation may not complete correctly The database cluster initialisation failed.

Looking at the installation logs, we can see:

initializing dependencies ... child process was terminated by exception 0xC000001D

Error code 0xC000001D means:
STATUS_ILLEGAL_INSTRUCTION = {EXCEPTION} Illegal Instruction An attempt was made to execute an illegal instruction.

In other words, an instruction not supported by the system was used.

This problem occurs when a system does not correctly expose the extensions supported by the processor (the CPUID flags).

For example, the CPU exposes the AVX2 flag without exposing the XSAVE and OSXSAVE flags, making support for the Advanced Vector Extensions only partial. This partial support is not detected by the Visual Studio 2013 runtime, which crashes the application when these instructions are used. Here, during installation, it is the "postgres.exe" process that crashes...

This problem shows up when the hypervisor hosting a virtual machine is buggy, for example with Citrix XenServer 6.5. (Similar behavior can also occur on some workstations with a buggy BIOS.)

You can easily check whether a system exposes all the flags correctly by running this small program, available on my GitHub.

Regarding XenServer 6.5, Service Pack 1 (SP1) fixes the flag exposure on virtual machines by disabling AVX even when it is available on the host.

So this is not really a PostgreSQL bug, but rather a bug in the hypervisor (or machine), which misreports what the CPU supports, and also in the Visual C++ 2013 runtime, which cannot detect partial support...

This goes to show that virtualization is not completely transparent... and how important it is to keep hypervisors up to date!

A similar bug was detected and fixed in the latest PostgreSQL minor releases, occurring when a Windows operating system that does not support AVX runs on processors that do: more info here.



by Nicolas GOLLET on Thursday 21 April 2016 at 13:03

Tuesday 19 April 2016

Damien Clochard

Big survey on the PostgreSQL ecosystem

PostgreSQL's success is undeniable, yet there is still a lot of software that is not compatible with Postgres. Every time I staff a PostgreSQL booth at a trade show or conference, the same question comes up: "Is software X compatible with PostgreSQL?".

To better understand the compatibility needs, Takayuki Tsunakawa has just launched a large survey to identify software that does not yet support PostgreSQL. A chance to measure how far vendors still have to go...

For the survey to be informative, as many people as possible must of course answer it!

Have you ever thought: "If only this software were compatible with Postgres..."? Then now is the time to speak up by answering the survey below:

PostgreSQL Ecosystem Survey

It is very simple and takes 3 steps:

  1. Choose the software category
  2. Give the software's name
  3. Add a comment if needed

You can report as many software products as you like.

The survey results are available in real time here, and the discussion about this initiative takes place on the pgsql-advocacy list.

by Damien Clochard on Tuesday 19 April 2016 at 22:52

Monday 18 April 2016


PostgreSQL Weekly News - 17 April 2016

A PostgreSQL Session will be held on 22 September 2016 in Lyon (France). The call for papers closes on 20 May: send your proposals to call-for-paper AT postgresql-sessions DOT org.

[translator's note: PostgreSQL meetups in Lyon on 20 April and in Nantes on 26 April]

News from PostgreSQL-derived products

PostgreSQL job offers for April

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under the CC BY-NC-SA license. The original version can be found at the following address:

Submit your articles or announcements by Sunday 15:00 (Pacific time). Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied patches

Fujii Masao pushed:

  • Use ereport(ERROR) instead of Assert() to emit syncrep_parser error. The existing code would either Assert or generate an invalid SyncRepConfig variable, neither of which is desirable. A regular error should be thrown instead. This commit silences compiler warning in non assertion-enabled builds. Per report from Jeff Janes. Suggested fix by Tom Lane.
  • Remove unused function GetOldestWALSendPointer from walsender code. That unused function was introduced as a sample because synchronous replication or replication monitoring tools might need it in the future. Recently commit 989be08 added the function SyncRepGetOldestSyncRecPtr which provides almost the same functionality for multiple synchronous standbys feature. So it's time to remove that unused sample function. This commit does that.
  • Fix duplicated index entry in doc. Commit cfe96ae corrected the name of pg_logical_emit_message() in its index entry. But this typo fix caused duplicated index entry because there was another index entry for the function. Spotted by Tom Lane.
  • Make regression test for multiple synchronous standbys more stable. The regression test checks whether the output of pg_stat_replication is expected or not after changing synchronous_standby_names and reloading the configuration file. Regarding this test logic, previously there was a timing issue which made the test result unstable. That is, pg_stat_replication could return unexpected result during small window after the configuration file was reloaded before new setting value took effect, and which made the test fail. This commit changes the test logic so that it uses a loop with a timeout to give some room for the test to pass. Now the test fails only when pg_stat_replication keeps returning unexpected result for 30 seconds. Michael Paquier

Peter Eisentraut pushed:

Tom Lane pushed:

  • Fix missing "volatile" in PLy_output(). Commit 5c3c3cd0a3046339 plastered "volatile" on a bunch of variables in PLy_output(), but removed the one that actually mattered, ie the one on "oldcontext". This allows some versions of clang to generate code in which "oldcontext" has been trashed when control reaches the PG_CATCH block. Per buildfarm member tick.
  • Fix freshly-introduced PL/Python portability bug. It turns out that those PyErr_Clear() calls I removed from plpy_elog.c in 7e3bb080387f4143 et al were not quite as random as they appeared: they mask a Python 2.3.x bug. (Specifically, it turns out that PyType_Ready() can fail if the error indicator is set on entry, and PLy_traceback's fetch of frame.f_code may be the first operation in a session that requires the "frame" type to be readied. Ick.) Put back the clear call, but in a more centralized place closer to what it's protecting, and this time with a comment warning what it's really for. Per buildfarm member prairiedog. Although prairiedog was only failing on HEAD, it seems clearly possible for this to occur in older branches as well, so back-patch to 9.2 the same as the previous patch.
  • Fix two places that thought Windows64 is indicated by WIN64 macro. Everyplace else thinks it's _WIN64, so make these places fall in line. The pg_regress.c usage is not going to result in any change in behavior, only suppressing (or not) a compiler warning about downcasting HANDLEs. So there seems no need for back-patching there. The libpq/win32.mak usage might represent an actual bug, if anyone were using this script to build for Windows64, which perhaps nobody is. Given the lack of field complaints, no back-patch here either. pg_regress.c problem found by Christian Ullrich, the other by me.
  • Fix _SPI_execute_plan() for CREATE TABLE IF NOT EXISTS foo AS ... When IF NOT EXISTS was added to CREATE TABLE AS, this logic didn't get the memo, possibly resulting in an Assert failure. It looks like there would have been no ill effects in a non-Assert build, though. Back-patch to 9.5 where the IF NOT EXISTS option was added. Stas Kelvich
  • Remove unnecessary definition of _WIN64 in libpq/win32.mak. In commit b0e40d189325dc7a54d2546245e766f8c47a7c8d, I should have just removed the /D switch defining WIN64. The reason the code worked before is that all Windows64 compilers automatically predefine _WIN64. Perhaps at one time we had code that depended on WIN64 being defined, but it's long gone, and we should not encourage any reappearance. Per discussion with Christian Ullrich.
  • In generic WAL application and replay, ensure page "hole" is always zero. The previous coding could allow the contents of the "hole" between pd_lower and pd_upper to diverge during replay from what it had been when the update was originally applied. This would pose a problem if checksums were in use, and in any case would complicate forensic comparisons between master and slave servers. So force the "hole" to contain zeroes, both at initial application of a generically-logged action, and at replay. Alexander Korotkov, adjusted slightly by me
  • Improve API of GenericXLogRegister(). Rename this function to GenericXLogRegisterBuffer() to make it clearer what it does, and leave room for other sorts of "register" actions in future. Also, replace its "bool isNew" argument with an integer flags argument, so as to allow adding more flags in future without an API break. Alexander Korotkov, adjusted slightly by me
  • Improve coding of column-name parsing in psql's new crosstabview.c. Coverity complained about this code, not without reason because it was rather messy. Adjust it to not scribble on the passed string; that adds one malloc/free cycle per column name, which is going to be insignificant in context. We can actually const-ify both the string argument and the PGresult. Daniel Verité, with some further cleanup by me
  • Redefine create_upper_paths_hook as being invoked once per upper relation. Per discussion, this gives potential users of the hook more flexibility, because they can build custom Paths that implement only one stage of upper processing atop core-provided Paths for earlier stages.
  • Improve documentation for \crosstabview. Fix misleading syntax summary (there cannot be a space between colH and scolH). Provide a link from the existing crosstab() function's documentation to \crosstabview. Copy-edit the command's description. Christoph Berg and Tom Lane
  • Fix assorted portability issues with using msync() for data flushing. Commit 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf introduced the use of msync() for flushing dirty data from the kernel's file buffers. Several portability issues were overlooked, though: * Not all implementations of mmap() think that nbytes == 0 means "map the whole file". To fix, use lseek() to find out the true length. Fix callers of pg_flush_data to be aware that nbytes == 0 may result in trashing the file's seek position. * Not all implementations of mmap() will accept partial-page mmap requests. To fix, round down the length request to whatever sysconf() says the page size is. (I think this is OK from a portability standpoint, because sysconf() is required by SUS v2, and we aren't trying to compile this part on Windows anyway. Buildfarm should let us know if not.) * On 32-bit machines, the file size might exceed the available free address space, or even exceed what will fit in size_t. Check for the latter explicitly to avoid passing a false request size to mmap(). If mmap fails, silently fall through to the next implementation method, rather than bleating to the postmaster log and giving up. * mmap'ing directories fails on some platforms, and even if it works, msync'ing the directory is quite unlikely to help, as for that matter are the other flush implementations. In pre_sync_fname(), just skip flush attempts on directories. In passing, copy-edit the comments a bit. Stas Kelvich and myself
  • Widen amount-to-flush arguments of FileWriteback and callers. It's silly to define these counts as narrower than they might someday need to be. Also, I believe that the BLCKSZ * nflush calculation in mdwriteback was capable of overflowing an int.
  • Fix pg_dump so pg_upgrade'ing an extension with simple opfamilies works. As reported by Michael Feld, pg_upgrade'ing an installation having extensions with operator families that contain just a single operator class failed to reproduce the extension membership of those operator families. This caused no immediate ill effects, but would create problems when later trying to do a plain dump and restore, because the seemingly-not-part-of-the-extension operator families would appear separately in the pg_dump output, and then would conflict with the families created by loading the extension. This has been broken ever since extensions were introduced, and many of the standard contrib extensions are affected, so it's a bit astonishing nobody complained before. The cause of the problem is a perhaps-ill-considered decision to omit such operator families from pg_dump's output on the grounds that the CREATE OPERATOR CLASS commands could recreate them, and having explicit CREATE OPERATOR FAMILY commands would impede loading the dump script into pre-8.3 servers. Whatever the merits of that decision when 8.3 was being written, it looks like a poor tradeoff now. We can fix the pg_upgrade problem simply by removing that code, so that the operator families are dumped explicitly (and then will be properly made to be part of their extensions). Although this fixes the behavior of future pg_upgrade runs, it does nothing to clean up existing installations that may have improperly-linked operator families. Given the small number of complaints to date, maybe we don't need to worry about providing an automated solution for that; anyone who needs to clean it up can do so with manual "ALTER EXTENSION ADD OPERATOR FAMILY" commands, or even just ignore the duplicate-opfamily errors they get during a pg_restore. In any case we need this fix. Back-patch to all supported branches. Discussion: <>
  • Fix broken dependency-mongering for index operator classes/families. For a long time, opclasscmds.c explained that "we do not create a dependency link to the AM [for an opclass or opfamily], because we don't currently support DROP ACCESS METHOD". Commit 473b93287040b200 invented DROP ACCESS METHOD, but it batted only 1 for 2 on adding the dependency links, and 0 for 2 on updating the comments about the topic. In passing, undo the same commit's entirely inappropriate decision to blow away an existing index as a side-effect of create_am.sql.
  • Fix prototype of pgwin32_bind(). I (tgl) had copied-and-pasted this from pgwin32_accept(), failing to notice that the third parameter should be "int" not "int *". David Rowley
  • Provide errno-translation wrappers around bind() and listen() on Windows. I've seen one too many "could not bind IPv4 socket: No error" log entries from the Windows buildfarm members. Per previous discussion, this is likely caused by the fact that we're doing nothing to translate WSAGetLastError() to errno. Put in a wrapper layer to do that. If this works as expected, it should get back-patched, but let's see what happens in the buildfarm first. Discussion: <>
  • Docs: clarify description of LIMIT/OFFSET behavior. Section 7.6 was a tad confusing because it specified what LIMIT NULL does, but neglected to do the same for OFFSET NULL, making this look like perhaps a special case or a wrong restatement of the bit about LIMIT ALL. Wordsmith a bit while at it. Per bug #14084.
  • Adjust datatype of ReplicationState.acquired_by. It was declared as "pid_t", which would be fine except that none of the places that printed it in error messages took any thought for the possibility that it's not equivalent to "int". This leads to warnings on some buildfarm members, and could possibly lead to actually wrong error messages on those platforms. There doesn't seem to be any very good reason not to just make it "int"; it's only ever assigned from MyProcPid, which is int. If we want to cope with PIDs that are wider than int, this is not the place to start. Also, fix the comment, which seems to perhaps be a leftover from a time when the field was only a bool? Per buildfarm. Back-patch to 9.5 which has same issue.
  • Adjust signature of walrcv_receive hook. Commit 314cbfc5da988eff redefined the signature of this hook as typedef int (*walrcv_receive_type) (char **buffer, int *wait_fd); But in fact the type of the "wait_fd" variable ought to be pgsocket, which is what WaitLatchOrSocket expects, and which is necessary if we want to be able to assign PGINVALID_SOCKET to it on Windows. So fix that.
  • Fix core dump in ReorderBufferRestoreChange on alignment-picky platforms. When re-reading an update involving both an old tuple and a new tuple from disk, reorderbuffer.c was careless about whether the new tuple is suitably aligned for direct access --- in general, it isn't. We'd missed seeing this in the buildfarm because the contrib/test_decoding tests exercise this code path only a few times, and by chance all of those cases have old tuples with length a multiple of 4, which is usually enough to make the access to the new tuple's t_len safe. For some still-not-entirely-clear reason, however, Debian's sparc build gets a bus error, as reported by Christoph Berg; perhaps it's assuming 8-byte alignment of the pointer? The lack of previous field reports is probably because you need all of these conditions to trigger a crash: an alignment-picky platform (not Intel), a transaction large enough to spill to disk, an update within that xact that changes a primary-key field and has an odd-length old tuple, and of course logical decoding tracing the transaction. Avoid the alignment assumption by using memcpy instead of fetching t_len directly, and add a test case that exposes the crash on picky platforms. Back-patch to 9.4 where the bug was introduced. Discussion: <>
  • Rethink \crosstabview's argument parsing logic. \crosstabview interpreted its arguments in an unusual way, including doing case-insensitive matching of unquoted column names, which is surely not the right thing. Rip that out in favor of doing something equivalent to the dequoting/case-folding rules used by other psql commands. To keep it simple, change the syntax so that the optional sort column is specified as a separate argument, instead of the also-quite-unusual syntax that attached it to the colH argument with a colon. Also, rework the error messages to be closer to project style.
  • Fix memory leak in GIN index scans. The code had a query-lifespan memory leak when encountering GIN entries that have posting lists (rather than posting trees, ie, there are a relatively small number of heap tuples containing this index key value). With a suitable data distribution this could add up to a lot of leakage. Problem seems to have been introduced by commit 36a35c550, so back-patch to 9.4. Julien Rouhaud
  • Fix portability problem induced by commit a6f6b7819. pg_xlogdump includes bufmgr.h. With a compiler that emits code for static inline functions even when they're unreferenced, that leads to unresolved external references in the new static-inline version of BufferGetPage(). So hide it with #ifndef FRONTEND, as we've done for similar issues elsewhere. Per buildfarm member pademelon.
  • Fix possible crash in ALTER TABLE ... REPLICA IDENTITY USING INDEX. Careless coding added by commit 07cacba983ef79be could result in a crash or a bizarre error message if someone tried to select an index on the OID column as the replica identity index for a table. Back-patch to 9.4 where the feature was introduced. Discussion: David Rowley
  • Use less-generic names in matview.sql. The original coding of this test used table and view names like "t", "tv", "foo", etc. This tended to interfere with doing simple manual tests in the regression database; not to mention that it posed a considerable risk of conflict with other regression test scripts. Prefix these names with "mvtest_" to avoid such conflicts. Also, change transiently-created role name to be "regress_xxx" per discussions about being careful with regression-test role creation.
  • Disallow creation of indexes on system columns (except for OID). Although OID acts pretty much like user data, the other system columns do not, so an index on one would likely misbehave. And it's pretty hard to see a use-case for one, anyway. Let's just forbid the case rather than worry about whether it should be supported. David Rowley
  • Adjust spin.c's spinlock emulation so that 0 is not a valid spinlock value. We've had repeated troubles over the years with failures to initialize spinlocks correctly; see 6b93fcd14 for a recent example. Most of the time, on most platforms, such oversights can escape notice because all-zeroes is the expected initial content of an slock_t variable. The only platform we have where the initialized state of an slock_t isn't zeroes is HPPA, and that's practically gone in the wild. To make it easier to catch such errors without needing one of those, adjust the --disable-spinlocks code so that zero is not a valid value for an slock_t for it. In passing, remove a bunch of unnecessary #include's from spin.c; commit daa7527afc227443 removed all the intermodule coupling that made them necessary.
  • Avoid code duplication in \crosstabview. In commit 6f0d6a507 I added a duplicate copy of psqlscanslash's identifier downcasing code, but actually it's not hard to split that out as a callable subroutine and avoid the duplication.

Stephen Frost pushed:

  • Correct copyright for newly added genericdesc.c. It's 2016 these days (no, not entirely sure how we got here either). Pointed out by Amit Langote.
  • Disallow SET SESSION AUTHORIZATION pg_*. As part of reserving the pg_* namespace for default roles, and in line with SET ROLE and other previous efforts, disallow setting the role to a default/reserved role using SET SESSION AUTHORIZATION. These checks and restrictions on what is allowed regarding default/reserved roles are under debate, but it seems prudent to ensure that the existing checks at least cover the intended cases while the debate rages on. On me to clean it up if the consensus decision is to remove these checks.
  • In recordExtensionInitPriv(), keep the scan til we're done with it. For reasons of sheer brain fade, we (I) was calling systable_endscan() immediately after systable_getnext() and expecting the tuple returned by systable_getnext() to still be valid. That's clearly wrong. Move the systable_endscan() down below the tuple usage. Discovered initially by Pavel Stehule and then also by Alvaro. Add a regression test based on Alvaro's testing.

Kevin Grittner pushed:

  • Make oldSnapshotControl a pointer to a volatile structure. It was incorrectly declared as a volatile pointer to a non-volatile structure. Eliminate the OldSnapshotControl struct definition; it is really not needed. Pointed out by Tom Lane. While at it, add OldSnapshotControlData to pgindent's list of structures.
  • Use static inline function for BufferGetPage(). I was initially concerned that some of the hundreds of references to BufferGetPage() where the literal BGP_NO_SNAPSHOT_TEST was passed might not optimize as well as a macro, leading to some hard-to-find performance regressions in corner cases. Inspection of disassembled code has shown identical code at all inspected locations, and the size difference doesn't amount to even one byte per such call. So make it readable. Per gripes from Álvaro Herrera and Tom Lane.
  • Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0. On a big NUMA machine with 1000 connections in saturation load there was a performance regression due to spinlock contention, for acquiring values which were never used. Just fill with dummy values if we're not going to use them. This patch has not been benchmarked yet on a big NUMA machine, but it seems like a good idea on general principle, and it seemed to prevent an apparent 2.2% regression on a single-socket i7 box running 200 connections at saturation load.

Teodor Sigaev pushed:

Robert Haas pushed:

  • Fix costing for parallel aggregation. The original patch kind of ignored the fact that we were doing something different from a costing point of view, but nobody noticed. This patch fixes that oversight. David Rowley
  • Use PG_INT32_MIN instead of reiterating the constant. Makes no difference, but it's cleaner this way. Michael Paquier
  • Tweak EXPLAIN for parallel query to show workers launched. The previous display was sort of confusing, because it didn't distinguish between the number of workers that we planned to launch and the number that actually got launched. This has already confused several people, so display both numbers and label them clearly. Julien Rouhaud, reviewed by me.
  • postgres_fdw: Clean up handling of system columns. Previously, querying the xmin column of a single postgres_fdw foreign table fetched the tuple length, xmax the typmod, and cmin or cmax the composite type OID of the tuple. However, when you queried several such tables and the join got shipped to the remote side, these columns ended up containing the remote values of the corresponding columns. Both behaviors are rather unprincipled, the former for obvious reasons and the latter because the remote values of these columns don't have any local significance; our transaction IDs are in a different space than those of the remote machine. Clean this up by setting all of these fields to 0 in both cases. Also fix the handling of tableoid to be sane. Robert Haas and Ashutosh Bapat, reviewed by Etsuro Fujita.

Andres Freund pushed:

  • Avoid atomic operation in MarkLocalBufferDirty(). The recent patch to make Pin/UnpinBuffer lockfree in the hot path (48354581a) accidentally used pg_atomic_fetch_or_u32() in MarkLocalBufferDirty(). Other code operating on local buffers was careful to use only pg_atomic_read/write_u32, which just read/write from memory, to avoid unnecessary overhead. On its own that'd just make MarkLocalBufferDirty() slightly less efficient, but in addition InitLocalBuffers() doesn't call pg_atomic_init_u32() - thus the spinlock fallback for the atomic operations isn't initialized. That in turn caused, as reported by Tom, buildfarm animal gaur to fail. As those errors are actually useful against this type of error, continue to omit - intentionally this time - initialization of the atomic variable. In addition, add an explicit note about only using pg_atomic_read/write on local buffers' state to BufferDesc's description. Reported-By: Tom Lane Discussion:
  • Make init_spin_delay() C89 compliant and change stuck spinlock reporting. The current definition of init_spin_delay (introduced recently in 48354581a) wasn't C89 compliant. It's not legal to refer to non-constant expressions, and the ptr argument was one. This, as reported by Tom, led to a failure on buildfarm animal pademelon. The pointer, especially on systems with ASLR, isn't super helpful anyway, though. So instead of making init_spin_delay into an inline function, make s_lock_stuck() report the function name in addition to file:line and change init_spin_delay() accordingly. While not a direct replacement, the function name is likely more useful anyway (line numbers are often hard to interpret in third party reports). This also fixes what file/line number is reported for waits via s_lock(). As PG_FUNCNAME_MACRO is now used outside of elog.h, move it to c.h. Reported-By: Tom Lane Discussion:
  • Add required database and origin filtering for logical messages. Logical messages, added in 3fe3511d05, during decoding failed to filter messages emitted in other databases and messages emitted "under" a replication origin the output plugin isn't interested in. Add tests to verify that both types of filtering actually work. While touching message.sql remove hunk obsoleted by d25379e. Bump XLOG_PAGE_MAGIC because xl_logical_message changed and because 3fe3511d05 had omitted doing so. 3fe3511d05 additionally didn't bump catversion, but 7a542700d has done so since. Author: Petr Jelinek Reported-By: Andres Freund Discussion:
  • Remove trailing commas in enums. These aren't valid C89. Found thanks to gcc's -Wc90-c99-compat. These exist in differing places in most supported branches.
  • Make init_spin_delay() C89 compliant #2. My previous attempt at doing so, in 80abbeba23, was not sufficient. While that fixed the problem for bufmgr.c and lwlock.c, s_lock.c still has non-constant expressions in the struct initializer, because the file/line/function information comes from the caller of s_lock(). Give up on using a macro, and use a static inline instead. Discussion:
  • Fix trivial typo.

Magnus Hagander pushed:

Rejected Patches (so far)

No one was disappointed this week :-)

Pending Patches

Michaël Paquier sent in two more revisions of a patch to add VS 2015 support to MSVC.

Robert Haas sent in a patch to fix some alignment issues in src/backend/storage/buffer/buf_init.c.

Stas Kelvich and Michaël Paquier traded patches to speed up two-phase commits.

Etsuro Fujita sent in a patch to improve the way foreign tables get written to in the PostgreSQL FDW.

Anastasia Lubennikova sent in another revision of a patch to implement covering unique indexes.

Amit Kapila sent in a patch to pad the PGXACT struct out to 64 bytes.

Teodor Sigaev sent in a patch to fix some GIN index corruption bugs.

David Rowley sent in two revisions of a patch to fix some issues with EXPLAIN output for parallel aggregates.

Ashutosh Sharma sent in a patch to fix an issue with pg_basebackup's handling of symlinks.

Peter Geoghegan sent in a patch to remove an obsolete comment from fmgr.c.

Amit Langote sent in another revision of a patch to implement declarative partitioning.

Michaël Paquier sent in three more revisions of a patch to fix interrupt handling in the PostgreSQL FDW.

David Rowley sent in three revisions of a patch to disallow creating unique indexes on system columns where that doesn't make sense.

Etsuro Fujita sent in a patch to fix an issue where a combination of FULL and INNER joins could produce a wrong result with the PostgreSQL FDW.

Feike Steenbergen sent in a patch to make pg_get_functiondef actually add parallel indicators.

Piotr Stefaniak sent in another revision of a patch to avoid passing null pointers to functions that require the pointers to be non null.

Magnus Hagander sent in a patch to fix some issues with the backup docs.

Craig Ringer sent in a patch to enable logical timeline following in the walsender.

Terence Ferraro sent in a patch to make it possible for an environment variable to be provided to libpq to specify where to find the SSL certificate/key files used for a secure connection.

by N Bougain on Monday 18 April 2016 at 22:06

Friday 15 April 2016


PostgreSQL Weekly News - April 10 2016

PostgreSQL Product News

PostgreSQL Jobs for April

PostgreSQL Local

PostgreSQL in the News

PostgreSQL Weekly News is brought to you this week by David Fetter. Translation by the PostgreSQLFr team under a CC BY-NC-SA licence. The original version can be found at:

Submit your news and announcements by Sunday at 15:00 Pacific time. Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a) and in Spanish to pwn (a)

Applied Patches

Tom Lane pushed:

  • Clean up dubious code in contrib/seg. The restore() function assumed that the result of sprintf() with %e format would necessarily contain an 'e', which is false: what if the supplied number is an infinity or NaN? If that did happen, we'd get a null-pointer-dereference core dump. The case appears impossible currently, because seg_in() does not accept such values, and there are no seg-creating functions that would create one. But it seems unwise to rely on it never happening in future. Quite aside from that, the code was pretty ugly: it relied on modifying a static format string when it could use a "*" precision argument, and it used strtok() entirely gratuitously, and it stripped off trailing spaces by hand instead of just not asking for them to begin with. Coverity noticed the potential null pointer dereference (though I wonder why it didn't complain years ago, since this code is ancient). Since this is just code cleanup and forestalling a hypothetical future bug, there seems no need for back-patching.
  • Fix latent portability issue in pgwin32_dispatch_queued_signals(). The first iteration of the signal-checking loop would compute sigmask(0) which expands to 1<<(-1) which is undefined behavior according to the C standard. The lack of field reports of trouble suggest that it evaluates to 0 on all existing Windows compilers, but that's hardly something to rely on. Since signal 0 isn't a queueable signal anyway, we can just make the loop iterate from 1 instead, and save a few cycles as well as avoiding the undefined behavior. In passing, avoid evaluating the volatile expression UNBLOCKED_SIGNAL_QUEUE twice in a row; there's no reason to waste cycles like that. Noted by Aleksander Alekseev, though this isn't his proposed fix. Back-patch to all supported branches.
  • Introduce a LOG_SERVER_ONLY ereport level, which is never sent to client. This elevel is useful for logging audit messages and similar information that should not be passed to the client. It's equivalent to LOG in terms of decisions about logging priority in the postmaster log, but messages with this elevel will never be sent to the client. In the current implementation, it's just an alias for the longstanding COMMERROR elevel (or more accurately, we've made COMMERROR an alias for this). At some point it might be interesting to allow a LOG_ONLY flag to be attached to any elevel, but that would be considerably more complicated, and it's not clear there's enough use-cases to justify the extra work. For now, let's just take the easy 90% solution. David Steele, reviewed by Fabien Coelho, Petr Jelínek, and myself
  • Add a \gexec command to psql for evaluation of computed queries. \gexec executes the just-entered query, like \g, but instead of printing the results it takes each field as a SQL command to send to the server. Computing a series of queries to be executed is a fairly common thing, but up to now you always had to resort to kluges like writing the queries to a file and then inputting the file. Now it can be done with no intermediate step. The implementation is fairly straightforward except for its interaction with FETCH_COUNT. ExecQueryUsingCursor isn't capable of being called recursively, and even if it were, its need to create a transaction block interferes unpleasantly with the desired behavior of \gexec after a failure of a generated query (i.e., that it can continue). Therefore, disable use of ExecQueryUsingCursor when doing the master \gexec query. We can still apply it to individual generated queries, however, and there might be some value in doing so. While testing this feature's interaction with single-step mode, I (tgl) was led to conclude that SendQuery needs to recognize SIGINT (cancel_pressed) as a negative response to the single-step prompt. Perhaps that's a back-patchable bug fix, but for now I just included it here. Corey Huinker, reviewed by Jim Nasby, Daniel Vérité, and myself
  • Add a few comments about ANALYZE's strategy for collecting MCVs. Alex Shulgin complained that the underlying strategy wasn't all that apparent, particularly not the fact that we intentionally have two code paths depending on whether we think the column has a limited set of possible values or not. Try to make it clearer.
  • Partially revert commit 3d3bf62f30200500637b24fdb7b992a99f9704c3. On reflection, the pre-existing logic in ANALYZE is specifically meant to compare the frequency of a candidate MCV against the estimated frequency of a random distinct value across the whole table. The change to compare it against the average frequency of values actually seen in the sample doesn't seem very principled, and if anything it would make us less likely, not more likely, to consider a value an MCV. So revert that, but keep the aspect of considering only nonnull values, which definitely is correct. In passing, rename the local variables in these stanzas to "ndistinct_table", to avoid confusion with the "ndistinct" that appears at an outer scope in compute_scalar_stats.
  • Disallow newlines in parameter values to be set in ALTER SYSTEM. As noted by Julian Schauder in bug #14063, the configuration-file parser doesn't support embedded newlines in string literals. While there might someday be a good reason to remove that restriction, there doesn't seem to be one right now. However, ALTER SYSTEM SET could accept strings containing newlines, since many of the variable-specific value-checking routines would just see a newline as whitespace. This led to writing a file that was broken and had to be removed manually. Pending a reason to work harder, just throw an error if someone tries this. In passing, fix several places in the ALTER SYSTEM logic that failed to provide an errcode() for an ereport(), and thus would falsely log the failure as an internal XX000 error. Back-patch to 9.4 where ALTER SYSTEM was introduced.
  • Fix PL/Python for recursion and interleaved set-returning functions. PL/Python failed if a PL/Python function was invoked recursively via SPI, since arguments are passed to the function in its global dictionary (a horrible decision that's far too ancient to undo) and it would delete those dictionary entries on function exit, leaving the outer recursion level(s) without any arguments. Not deleting them would be little better, since the outer levels would then see the innermost level's arguments. Since PL/Python uses ValuePerCall mode for evaluating set-returning functions, it's possible for multiple executions of the same SRF to be interleaved within a query. PL/Python failed in such a case, because it stored only one iterator per function, directly in the function's PLyProcedure struct. Moreover, one interleaved instance of the SRF would see argument values that should belong to another. Hence, invent code for saving and restoring the argument entries. To fix the recursion case, we only need to save at recursive entry and restore at recursive exit, so the overhead in non-recursive cases is negligible. To fix the SRF case, we have to save when suspending a SRF and restore when resuming it, which is potentially not negligible; but fortunately this is mostly a matter of manipulating Python object refcounts and should not involve much physical data copying. Also, store the Python iterator and saved argument values in a structure associated with the SRF call site rather than the function itself. This requires adding a memory context deletion callback to ensure that the SRF state is cleaned up if the calling query exits before running the SRF to completion. Without that we'd leak a refcount to the iterator object in such a case, resulting in session-lifespan memory leakage. 
(In the pre-existing code, there was no memory leak because there was only one iterator pointer, but what would happen is that the previous iterator would be resumed by the next query attempting to use the SRF. Hardly the semantics we want.) We can buy back some of whatever overhead we've added by getting rid of PLy_function_delete_args(), which seems a useless activity: there is no need to delete argument entries from the global dictionary on exit, since the next time anyone would see the global dict is on the next fresh call of the PL/Python function, at which time we'd overwrite those entries with new arg values anyway. Also clean up some really ugly coding in the SRF implementation, including such gems as returning directly out of a PG_TRY block. (The only reason that failed to crash hard was that all existing call sites immediately exited their own PG_TRY blocks, popping the dangling longjmp pointer before there was any chance of it being used.) In principle this is a bug fix; but it seems a bit too invasive relative to its value for a back-patch, and besides the fix depends on memory context callbacks so it could not go back further than 9.5 anyway. Alexey Grishchenko and Tom Lane
  • Run pgindent on a batch of (mostly-planner-related) source files. Getting annoyed at the amount of unrelated chatter I get from pgindent'ing Rowley's unique-joins patch. Re-indent all the files it touches.
  • Refactor join_is_removable() to separate out distinctness-proving logic. Extracted from pending unique-join patch, since this is a rather large delta but it's simply moving code out into separately-accessible subroutines. I (tgl) did choose to add a bit more logic to rel_supports_distinctness, so that it verifies that there's at least one potentially usable unique index rather than just checking indexlist != NIL. Otherwise there's no functional change here. David Rowley
  • Fix multiple bugs in tablespace symlink removal. Don't try to examine S_ISLNK(st.st_mode) after a failed lstat(). It's undefined. Also, if the lstat() reported ENOENT, we do not wish that to be a hard error, but the code might nonetheless treat it as one (giving an entirely misleading error message, too) depending on luck-of-the-draw as to what S_ISLNK() returned. Don't throw error for ENOENT from rmdir(), either. (We're not really expecting ENOENT because we just stat'd the file successfully; but if we're going to allow ENOENT in the symlink code path, surely the directory code path should too.) Generate an appropriate errcode for its-the-wrong-type-of-file complaints. (ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write errcode() around it certainly doesn't work, and not writing an errcode at all is not per project policy.) Valgrind noticed the undefined S_ISLNK result; the other problems emerged while reading the code in the area. All of this appears to have been introduced in 8f15f74a44f68f9c. Back-patch to 9.5 where that commit appeared.
  • Add BSD authentication method. Create a "bsd" auth method that works the same as "password" so far as clients are concerned, but calls the BSD Authentication service to check the password. This is currently only available on OpenBSD. Marisa Emerson, reviewed by Thomas Munro
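A minimal pg_hba.conf line using the new method might look like this (illustrative; only meaningful on OpenBSD builds):

```
# pg_hba.conf — check passwords via OpenBSD's BSD Authentication service
local   all   all   bsd
```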
  • Fix unstable regression test output. Output order from the pg_indexes view might vary depending on the phase of the moon, so add ORDER BY to ensure stable results of tests added by commit 386e3d7609c49505e079c40c65919d99feb82505. Per buildfarm.
  • Run pgindent on generic_xlog.c. This code desperately needs some micro-optimization, and I'd like it to be formatted a bit more nicely while I work on it.
  • Code review/prettification for generic_xlog.c. Improve commentary, use more specific names for the delta fields, const-ify pointer arguments where possible, avoid assuming that initializing only the first element of a local array will guarantee that the remaining elements end up as we need them. (I think that code in generic_redo actually worked, but only because InvalidBuffer is zero; this is a particularly ugly way of depending on that ...)
  • Get rid of blinsert()'s use of GenericXLogUnregister(). That routine is dangerous, and unnecessary once we get rid of this one caller. In passing, fix failure to clean up temp memory context, or switch back to caller's context, during slowest exit path.
  • Get rid of GenericXLogUnregister(). This routine is unsafe as implemented, because it invalidates the page image pointers returned by previous GenericXLogRegister() calls. Rather than complicate the API or the implementation to avoid that, let's just get rid of it; the use-case for having it seems much too thin to justify a lot of work here. While at it, do some wordsmithing on the SGML docs for generic WAL.
  • Fix PL/Python ereport() test to work on Python 2.3. Per buildfarm. Pavel Stehule
  • Micro-optimize GenericXLogFinish(). Make the inner comparison loops of computeDelta() as tight as possible by pulling considerations of valid and invalid ranges out of the inner loops, and extending a match or non-match detection as far as possible before deciding what to do next. To keep this tractable, give up the possibility of merging fragments across the pd_lower to pd_upper gap. The fraction of pages where that could happen (ie, there are 4 or fewer bytes in the gap, *and* data changes immediately adjacent to it on both sides) is too small to be worth spending cycles on. Also, avoid two BLCKSZ-length memcpy()s by computing the delta before moving data into the target buffer, instead of after. This doesn't save nearly as many cycles as being tenser about computeDelta(), but it still seems worth doing. On my machine, this patch cuts a full 40% off the runtime of contrib/bloom's regression test.
  • Further minor improvement in generic_xlog.c: always say REGBUF_STANDARD. Since we're requiring pages handled by generic_xlog.c to be standard format, specify REGBUF_STANDARD when doing a full-page image, so that xloginsert.c can compress out the "hole" between pd_lower and pd_upper. Given the current API in which this path will be taken only for a newly initialized page, the hole is likely to be particularly large in such cases, so that this oversight could easily be performance-significant. I don't notice any particular change in the runtime of contrib/bloom's regression test, though.
  • Improve contrib/bloom regression test using code coverage info. Originally, this test created a 100000-row test table, which made it run rather slowly compared to other contrib tests. Investigation with gcov showed that we got no further improvement in code coverage after the first 700 or so rows, making the large table 99% a waste of time. Cut it back to 2000 rows to fix the runtime problem and still leave some headroom for testing behaviors that may appear later. A closer look at the gcov results showed that the main coverage omissions in contrib/bloom occurred because the test never filled more than one entry in the notFullPage array; which is unsurprising because it exercised index cleanup only in the scenario of complete table deletion, allowing every page in the index to become deleted rather than not-full. Add testing that allows the not-full path to be exercised as well. Also, test the amvalidate function, because blvalidate.c had zero coverage without that, and besides it's a good idea to check for mistakes in the bloom opclass definitions.
  • Fix access-to-already-freed-memory issue in plpython's error handling. PLy_elog() could attempt to access strings that Python had already freed, because the strings that PLy_get_spi_error_data() returns are simply pointers into storage associated with the error "val" PyObject. That's fine at the instant PLy_get_spi_error_data() returns them, but just after that PLy_traceback() intentionally releases the only refcount on that object, allowing it to be freed --- so that the strings we pass to ereport() are dangling pointers. In principle this could result in garbage output or a coredump. In practice, I think the risk is pretty low, because there are no Python operations between where we decrement that refcount and where we use the strings (and copy them into PG storage), and thus no reason for Python to recycle the storage. Still, it's clearly hazardous, and it leads to Valgrind complaints when running under a Valgrind that hasn't been lobotomized to ignore Python memory allocations. The code was a mess anyway: we fetched the error data out of Python (clearing Python's error indicator) with PyErr_Fetch, examined it, pushed it back into Python with PyErr_Restore (re-setting the error indicator), then immediately pulled it back out with another PyErr_Fetch. Just to confuse matters even more, there were some gratuitous-and-yet-hazardous PyErr_Clear calls in the "examine" step, and we didn't get around to doing PyErr_NormalizeException until after the second PyErr_Fetch, making it even less clear which object was being manipulated where and whether we still had a refcount on it. (If PyErr_NormalizeException did substitute a different "val" object, it's possible that the problem could manifest for real, because then we'd be doing assorted Python stuff with no refcount on the object we have string pointers into.) So, rearrange all that into some semblance of sanity, and don't decrement the refcount on the Python error objects until the end of PLy_elog(). 
In HEAD, I failed to resist the temptation to reformat some messy bits from 5c3c3cd0a3046339 along the way. Back-patch as far as 9.2, because the code is substantially the same that far back. I believe that 9.1 has the bug as well; but the code around it is rather different and I don't want to take a chance on breaking something for what seems a low-probability problem.
  • Clean up foreign-key caching code in planner. Coverity complained that the code added by 015e88942aa50f0d lacked an error check for SearchSysCache1 failures, which it should have. But the code was pretty duff in other ways too, including failure to think about whether it could really cope with arrays of different lengths.
  • pg_dump: add missing "destroyPQExpBuffer(query)" in dumpForeignServer(). Coverity complained about this resource leak (why now, I don't know, since it's been like that a long time). Our general policy in pg_dump is that PQExpBuffers are worth cleaning up, so do it here too. But don't bother with a back-patch, because it seems unlikely that very many databases contain enough FOREIGN SERVER objects to notice.
  • Add comment about intentional fallthrough in switch. Coverity complained about an apparent missing "break" in a switch added by bb140506df605fab. The human-readable comments are pretty clear that this is intentional, but add a standard /* FALL THRU */ comment to make it clear to tools too.
  • Fix poorly thought-through code from commit 5c3c3cd0a3046339. It's not entirely clear to me whether PyString_AsString can return null (looks like the answer might vary between Python 2 and 3). But in any case, this code's attempt to cope with the possibility was quite broken, because pstrdup() neither allows a null argument nor ever returns a null. Moreover, the code below this point assumes that "message" is a palloc'd string, which would not be the case for a dgettext result. Fix both problems by doing the pstrdup step separately.

Dean Rasheed pushed:

  • Improve estimate of distinct values in estimate_num_groups(). When adjusting the estimate for the number of distinct values from a rel in a grouped query to take into account the selectivity of the rel's restrictions, use a formula that is less likely to produce under-estimates. The old formula simply multiplied the number of distinct values in the rel by the restriction selectivity, which would be correct if the restrictions were fully correlated with the grouping expressions, but can produce significant under-estimates in cases where they are not well correlated. The new formula is based on the random selection probability, and so assumes that the restrictions are not correlated with the grouping expressions. This is guaranteed to produce larger estimates, and of course risks over-estimating in cases where the restrictions are correlated, but that has less severe consequences than under-estimating, which might lead to a HashAgg that consumes an excessive amount of memory. This could possibly be improved upon in the future by identifying correlated restrictions and using a hybrid of the old and new formulae. Author: Tomas Vondra, with some hacking by me Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane Discussion:
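The difference between the two formulae can be sketched in Python. This is a simplified model of the math described above, not the actual planner code; the function and variable names are illustrative:

```python
def distinct_old(d, n, k):
    # old formula: scale the rel's distinct count d by the
    # restriction selectivity k/n (n total rows, k selected rows)
    return d * (k / n)

def distinct_new(d, n, k):
    # new formula: random selection probability. A given distinct value
    # (covering roughly n/d rows) is missed entirely by the selection
    # with probability ((n - k) / n) ** (n / d), so it survives with
    # the complement of that probability.
    return d * (1.0 - ((n - k) / n) ** (n / d))

# 100 distinct values, 10000 rows, restriction keeps 1000 rows:
print(distinct_old(100, 10000, 1000))   # 10.0 — a likely under-estimate
print(distinct_new(100, 10000, 1000))   # just under 100 — never smaller than the old estimate
```

With uncorrelated restrictions, nearly every distinct value still appears among the selected rows, which is why the new estimate stays close to d while the old one collapses with the selectivity.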

Teodor Sigaev pushed:

Álvaro Herrera pushed:

Peter Eisentraut pushed:

  • Fix error message from wal_level value renaming found by Ian Barwick.
  • pg_dump: Add table qualifications to some tags. Some object types have names that are only unique for one table. But for those we generally didn't put the table name into the dump TOC tag. So it was impossible to identify these objects if the same name was used for multiple tables. This affects policies, column defaults, constraints, triggers, and rules. Fix by adding the table name to the TOC tag, so that it now reads "$schema $table $object". Reviewed-by: Michael Paquier <>
  • Set PAM_RHOST item for PAM authentication. The PAM_RHOST item is set to the remote IP address or host name and can be used by PAM modules. A pg_hba.conf option is provided to choose between IP address and resolved host name. From: Grzegorz Sampolski <> Reviewed-by: Haribabu Kommi <>
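Roughly, the new pg_hba.conf option is used like this (the `pam_use_hostname` option name is taken from the 9.6 documentation; the line is illustrative):

```
# pass the resolved host name, rather than the IP address, to PAM modules
host  all  all  0.0.0.0/0  pam  pamservice=postgresql  pam_use_hostname=1
```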
  • Fix printf format.
  • Replace printf format %i by %d. see also ce8d7bb6440710058503d213b2aafcdf56a5b481
  • Distrust external OpenSSL clients; clear err queue. OpenSSL has an unfortunate tendency to mix per-session state error handling with per-thread error handling. This can cause problems when programs that link to libpq with OpenSSL enabled have some other use of OpenSSL; without care, one caller of OpenSSL may cause problems for the other caller. Backend code might similarly be affected, for example when a third party extension independently uses OpenSSL without taking the appropriate precautions. To fix, don't trust other users of OpenSSL to clear the per-thread error queue. Instead, clear the entire per-thread queue ahead of certain I/O operations when it appears that there might be trouble (these I/O operations mostly need to call SSL_get_error() to check for success, which relies on the queue being empty). This is slightly aggressive, but it's pretty clear that the other callers have a very dubious claim to ownership of the per-thread queue. Do this in both frontend and backend code. Finally, be more careful about clearing our own error queue, so as to not cause these problems ourself. It's possible that control previously did not always reach SSLerrmessage(), where ERR_get_error() was supposed to be called to clear the queue's earliest code. Make sure ERR_get_error() is always called, so as to spare other users of OpenSSL the possibility of similar problems caused by libpq (as opposed to problems caused by a third party OpenSSL library like PHP's OpenSSL extension). Again, do this in both frontend and backend code. See bug #12799 and Based on patches by Dave Vitek and Peter Eisentraut. From: Peter Geoghegan <>

Magnus Hagander pushed:

  • Fix typo. Etsuro Fujita
  • Implement backup API functions for non-exclusive backups. Previously non-exclusive backups had to be done using the replication protocol and pg_basebackup. With this commit it's now possible to make them using pg_start_backup/pg_stop_backup as well, as long as the backup program can maintain a persistent connection to the database. Doing this, backup_label and tablespace_map are returned as results from pg_stop_backup() instead of being written to the data directory. This makes the server safe from a crash during an ongoing backup, which can be a problem with exclusive backups. The old syntax of the functions remain and work exactly as before, but since the new syntax is safer this should eventually be deprecated and removed. Only reference documentation is included. The main section on backup still needs to be rewritten to cover this, but since that is already scheduled for a separate large rewrite, it's not included in this patch. Reviewed by David Steele and Amit Kapila
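In SQL terms, a non-exclusive backup on a single persistent connection looks roughly like this (the third argument of pg_start_backup selects non-exclusive mode; the label is made up):

```sql
SELECT pg_start_backup('nightly', false, false);  -- label, fast checkpoint, exclusive = false
-- ... copy the data directory with an external tool while this session stays open ...
SELECT * FROM pg_stop_backup(false);  -- returns the LSN plus the backup_label
                                      -- and tablespace_map contents as result columns
```

Because the label and tablespace map come back as query results instead of being written into the data directory, a crash during the copy leaves nothing behind that could confuse a later restart.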
  • Add authentication parameters compat_realm and upn_username for SSPI. These parameters are available for SSPI authentication only, to make it behave more like "normal gssapi" while making it possible to maintain compatibility. compat_realm is on by default, but can be turned off to make the authentication use the full Kerberos realm instead of the NetBIOS name. upn_username is off by default, and can be turned on to return the user's Kerberos UPN rather than the SAM-compatible name (a user in Active Directory can have both a legacy SAM-compatible username and a new Kerberos one. Normally they are the same, but not always) Author: Christian Ullrich Reviewed by: Robbie Harwood, Alvaro Herrera, me

Robert Haas pushed:

Fujii Masao pushed:

  • Support multiple synchronous standby servers. Previously synchronous replication offered only the ability to confirm that all changes made by a transaction had been transferred to at most one synchronous standby server. This commit extends synchronous replication so that it supports multiple synchronous standby servers. It enables users to consider one or more standby servers as synchronous, and increase the level of transaction durability by ensuring that transaction commits wait for replies from all of those synchronous standbys. Multiple synchronous standby servers are configured in synchronous_standby_names which is extended to support new syntax of 'num_sync ( standby_name [ , ... ] )', where num_sync specifies the number of synchronous standbys that transaction commits need to wait for replies from and standby_name is the name of a standby server. The syntax of 'standby_name [ , ... ]' which was used in 9.5 or before is also still supported. It's the same as new syntax with num_sync=1. This commit doesn't include "quorum commit" feature which was discussed in pgsql-hackers. Synchronous standbys are chosen based on their priorities. synchronous_standby_names determines the priority of each standby for being chosen as a synchronous standby. The standbys whose names appear earlier in the list are given higher priority and will be considered as synchronous. Other standby servers appearing later in this list represent potential synchronous standbys. The regression test for multiple synchronous standbys is not included in this commit. It should come later. Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs, Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen, Rajeev Rastogi Many thanks to the various individuals who were involved in discussing and developing this feature.
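The new syntax described above, in postgresql.conf form (standby names are examples):

```
# commits wait for replies from 2 synchronous standbys; s1 and s2 are
# chosen by priority, s3 is a potential synchronous standby that takes
# over if one of the first two disconnects
synchronous_standby_names = '2 (s1, s2, s3)'
```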
  • Use proper format specifier %X/%X for LSN, again. Commit cee31f5 fixed this problem, but commit 989be08 accidentally reverted the fix. Thomas Munro
  • Fix a couple of places in doc that implied there was only one sync standby. Thomas Munro
  • Add regression tests for multiple synchronous standbys. Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me Reviewed-By: Kyotaro Horiguchi

Simon Riggs pushed:

  • Revert bf08f2292ffca14fd133aa0901d1563b6ecd6894. Remove recent changes to logging XLOG_RUNNING_XACTS by request.
  • Avoid archiving XLOG_RUNNING_XACTS on idle server. If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle. Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier, though this is not his proposed fix. Simple non-invasive patch to allow later backpatch to 9.4 and 9.5
  • Modify test_decoding/messages to remove non-ascii chars
  • Generic Messages for Logical Decoding. API and mechanism to allow generic messages to be inserted into WAL that are intended to be read by logical decoding plugins. This commit adds an optional new callback to the logical decoding API. Messages are either text or bytea. Messages can be transactional, or not, and are identified by a prefix to allow multiple concurrent decoding plugins. (Not to be confused with Generic WAL records, which are intended to allow crash recovery of extensible objects.) Author: Petr Jelinek and Andres Freund Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs Discussion:
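The SQL-level entry point added by this commit can be exercised like this (prefix and payload are made up; a decoding plugin registered for that prefix would see the message):

```sql
-- emit a transactional message into WAL; it is delivered to logical
-- decoding output plugins in commit order
SELECT pg_logical_emit_message(true, 'myapp', 'hello, decoders');
```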
  • Load FK defs into relcache for use by planner. Fastpath ignores this if no triggers defined. Author: Tomas Vondra, with fastpath and comments added by me Reviewers: David Rowley, Simon Riggs
  • Use Foreign Key relationships to infer multi-column join selectivity. In cases where joins use multiple columns we currently assess each join separately causing gross mis-estimates for join cardinality. This patch adds use of FK information for the first time into the planner. When FKs are present and we have multi-column join information, plan estimates will be drastically improved. Cases with multiple FKs are handled, though partial matches are ignored currently. Net effect is substantial performance improvements for joins in many common cases. Additional planning time is isolated to cases that are currently performing poorly, measured at 0.08 - 0.15 ms. Please watch for planner performance regressions; circumstances seem unlikely but the law of unintended consequences may apply at some point. Additional complex tests welcome to prove this before release. Tests can be performed using SET enable_fkey_estimates = on | off using scripts provided during Hackers discussions, message id: Authors: Tomas Vondra and David Rowley Reviewed and tested by Simon Riggs, adding comments only

Stephen Frost pushed:

  • Add new catalog called pg_init_privs. This new catalog holds the privileges which the system was initialized with at initdb time, along with any permissions set by extensions at CREATE EXTENSION time. This allows pg_dump (and any other similar use-cases) to detect when the privileges set on initdb-created or extension-created objects have been changed from what they were set to at initdb/extension-creation time and handle those changes appropriately. Reviews by Alexander Korotkov, Jose Luis Tallon
  • In pg_dump, use a bitmap to represent what to include. pg_dump has historically used a simple boolean 'dump' value to indicate if a given object should be included in the dump or not. Instead, use a bitmap which breaks down the components of an object into their distinct pieces and use that bitmap to only include the components requested. This does not include any behavioral change, but is in preparation for the change to dump out just ACLs for objects in pg_catalog. Reviews by Alexander Korotkov, Jose Luis Tallon
  • In pg_dump, include pg_catalog and extension ACLs, if changed. Now that all of the infrastructure exists, add in the ability to dump out the ACLs of the objects inside of pg_catalog or the ACLs for objects which are members of extensions, but only if they have been changed from their original values. The original values are tracked in pg_init_privs. When pg_dump'ing 9.6-and-above databases, we will dump out the ACLs for all objects in pg_catalog and the ACLs for all extension members, where the ACL has been changed from the original value which was set during either initdb or CREATE EXTENSION. This should not change dumps against pre-9.6 databases. Reviews by Alexander Korotkov, Jose Luis Tallon
  • In pg_dump, split "dump" into "dump" and "dump_contains". Historically, the "dump" component of the namespace has been used to decide if the objects inside of the namespace should be dumped also. Given that "dump" is now a bitmask and may be partial, and we may want to dump out all components of the namespace object but only some of the components of objects contained in the namespace, create a "dump_contains" bitmask which will represent what components of the objects inside of a namespace should be dumped out. No behavior change here, but in preparation for a change where we will dump out just the ACLs of objects in pg_catalog, but we might not dump out the ACL of the pg_catalog namespace itself (for instance, when it hasn't been changed from the value set at initdb time). Reviews by Alexander Korotkov, Jose Luis Tallon
  • Bump catversion for pg_dump dump catalog ACL patches. Pointed out by Tom.
  • Use GRANT system to manage access to sensitive functions. Now that pg_dump will properly dump out any ACL changes made to functions which exist in pg_catalog, switch to using the GRANT system to manage access to those functions. This means removing 'if (!superuser()) ereport()' checks from the functions themselves and then REVOKEing EXECUTE right from 'public' for these functions in system_views.sql. Reviews by Alexander Korotkov, Jose Luis Tallon
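Under the new scheme, an administrator can delegate such a function without handing out superuser, along these lines (the role name is hypothetical; whether a particular function is covered depends on the commit's list):

```sql
-- allow a non-superuser maintenance role to rotate the server log
GRANT EXECUTE ON FUNCTION pg_rotate_logfile() TO maintenance_role;
```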
  • GRANT rights to CURRENT_USER instead of adding roles. We shouldn't be adding roles during the regression tests as that can cause back-to-back installcheck runs to fail and users running the regression tests likely don't want those extra roles. Pointed out by Tom
  • In dumpTable, re-instate the skipping logic. Pretty sure I removed this based on some incorrect thinking that it was no longer possible to reach this point for a table which will not be dumped, but that's clearly wrong. Pointed out on IRC by Erik Rijkers.
  • Fix improper usage of 'dump' bitmap. Now that 'dump' is a bitmap, we can't simply set it to 'true'. Noticed while debugging the prior issue.
  • Create default roles. This creates an initial set of default roles which administrators may use to grant access to, historically, superuser-only functions. Using these roles instead of granting superuser access reduces the number of superuser roles required for a system. Documentation for each of the default roles has been added to user-manag.sgml. Bump catversion to 201604082, as we had a commit that bumped it to 201604081 and another that set it back to 201604071... Reviews by José Luis Tallón and Robert Haas
  • Reserve the "pg_" namespace for roles. This will prevent users from creating roles which begin with "pg_" and will check for those roles before allowing an upgrade using pg_upgrade. This will allow for default roles to be provided at initdb time. Reviews by José Luis Tallón and Robert Haas
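After this change, an ordinary CREATE ROLE with the reserved prefix is rejected (role name invented, error text approximate):

```sql
CREATE ROLE pg_backup_operator;
-- ERROR:  role name "pg_backup_operator" is reserved
```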

Noah Misch pushed:

  • Standardize GetTokenInformation() error reporting. Commit c22650cd6450854e1a75064b698d7dcbb4a8821a sparked a discussion about diverse interpretations of "token user" in error messages. Expel old and new specimens of that phrase by making all GetTokenInformation() callers report errors the way GetTokenUser() has been reporting them. These error conditions almost can't happen, so users are unlikely to observe this change. Reviewed by Tom Lane and Stephen Frost.
  • Remove redundant message in AddUserToTokenDacl(). GetTokenUser() will have reported an adequate error message. These error conditions almost can't happen, so users are unlikely to observe this change. Reviewed by Tom Lane and Stephen Frost.

Kevin Grittner pushed:

  • Detect SSI conflicts before reporting constraint violations. While prior to this patch the user-visible effect on the database of any set of successfully committed serializable transactions was always consistent with some one-at-a-time order of execution of those transactions, the presence of declarative constraints could allow errors to occur which were not possible in any such ordering, and developers had no good workarounds to prevent user-facing errors where they were not necessary or desired. This patch adds a check for serialization failure ahead of duplicate key checking so that if a developer explicitly (redundantly) checks for the pre-existing value they will get the desired serialization failure where the problem is caused by a concurrent serializable transaction; otherwise they will get a duplicate key error. While it would be better if the reads performed by the constraints could count as part of the work of the transaction for serialization failure checking, and we will hopefully get there some day, this patch allows a clean and reliable way for developers to work around the issue. In many cases existing code will already be doing the right thing for this to "just work". Author: Thomas Munro, with minor editing of docs by me Reviewed-by: Marko Tiikkaja, Kevin Grittner
  • Modify BufferGetPage() to prepare for "snapshot too old" feature. This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
  • Add snapshot_too_old to NSVC @contrib_excludes. The buildfarm showed failure for Windows MSVC builds due to this omission. This might not be the only problem with the Makefile for this feature, but hopefully this will get it past the immediate problem. Fix suggested by Tom Lane
  • Fix typo in C comment.
  • Turn special page pointer validation into a static inline function. Inclusion of multiple macros inside another macro was pushing MSVC past its size limit. Reported by buildfarm.
  • Add the "snapshot too old" feature. This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.
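In postgresql.conf terms (the one-hour value is just an example; the GUC is measured in minutes):

```
# snapshots older than ~1 hour may get a "snapshot too old" error when
# reading recently modified pages, in exchange for letting pruning and
# vacuum remove dead tuples those snapshots would otherwise protect
old_snapshot_threshold = '1h'   # -1 disables (default); 0 is for testing only
```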

Andres Freund pushed:

  • Increase maximum number of clog buffers. Benchmarking has shown that the current number of clog buffers limits scalability. We've previously increased the number in 33aaa139, but that's not sufficient with a large number of clients. We've benchmarked the cost of increasing the limit by benchmarking worst case scenarios; testing showed that 128 buffers don't cause a regression, even in contrived scenarios, whereas 256 does. There are a number of more complex patches flying around to address various clog scalability problems, but this is simple enough that we can get it into 9.6; and is beneficial even after those patches have been applied. It is a bit unsatisfactory to increase this in small steps every few releases, but a better solution seems to require a rewrite of slru.c; not something done quickly. Author: Amit Kapila and Andres Freund Discussion:
  • Expose more out/readfuncs support functions. Previously bcac23d exposed a subset of support functions, namely the ones Kaigai found useful. Some functions were still missing to use the facility in an external project. To avoid having to add functions piecemeal, add all the functions which are used to define READ_* and WRITE_* macros; users of the extensible node functionality are likely to need these. Additionally expose outDatum(), which doesn't have its own WRITE_ macro, as it needs information from the embedding struct. Discussion:
  • Avoid the use of a separate spinlock to protect a LWLock's wait queue. Previously we used a spinlock, in addition to the atomically manipulated ->state field, to protect the wait queue. But it's pretty simple to instead perform the locking using a flag in state. Due to 6150a1b0 BufferDescs, on platforms (like PPC) with > 1 byte spinlocks, increased their size above 64 bytes. As 64 bytes are the size we pad allocated BufferDescs to, this can increase false sharing; causing performance problems in turn. Together with the previous commit this reduces the size to <= 64 bytes on all common platforms. Author: Andres Freund Discussion:
  • Allow Pin/UnpinBuffer to operate in a lockfree manner. Pinning/Unpinning a buffer is a very frequent operation; especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table). To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows them to be manipulated together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm). As not all operations can easily be implemented in a lockfree manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There are some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered. As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It now has already two users, and more are coming up; there's a follow-up patch for lwlock.c at least. This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas. On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed. Author: Alexander Korotkov and Andres Freund Discussion: 2400449.GjM57CE0Yg@dinodell

Andrew Dunstan pushed:

Rejected patches (so far)

No one was disappointed this week :-)

Pending patches

Kyotaro HORIGUCHI sent in two more revisions of a patch to add tab completion for IF [NOT] EXISTS to psql.

Emre Hasegeli sent in a patch to change "magick" to "magic."

Artur Zakirov sent in a patch to fix a typo in the documentation of indexam.

Aleksander Alekseev sent in two revisions of a patch to fix an issue in how sigmask works.

David Steele sent in two more revisions of a patch to filter messages including errors that might expose information needlessly.

Fabrízio de Royes Mello sent in two more revisions of a patch to add a sequence access method.

Amit Langote and Kyotaro HORIGUCHI traded patches to fix an issue where altering a foreign table failed to invalidate prepared statement execution plans that depended on its previous state.

Robbie Harwood sent in another revision of a patch to implement GSSAPI encryption.

Etsuro Fujita sent in another revision of a patch to fix odd system-column handling in postgres_fdw join pushdown.

Etsuro Fujita and Rushabh Lathia traded patches to optimize writes to the postgres fdw.

Rod Taylor sent in a patch to implement LOCK TABLE .. DEFERRABLE.

Craig Ringer sent in a patch to add some tests to pg_xlogdump.

Craig Ringer sent in a patch to fix incorrect comments introduced in logical decoding timeline following.

Anastasia Lubennikova sent in five more revisions of a patch to add covering unique indexes.

Pavel Stěhule sent in another revision of a patch to create a RAW output format for COPY.

Aleksander Alekseev sent in a patch to simplify reorderbuffer.c.

Simon Riggs sent in another revision of a patch to avoid archiving XLOG_RUNNING_XACTS on idle server.

Muhammad Asif Naeem sent in a patch to add an EMERGENCY option to VACUUM that forces it to avoid extending any entries in the VM or FSM.

David Rowley and Tom Lane traded patches to improve performance of outer joins when the outer side is unique.

Fabien COELHO sent in a patch to allow seeding randomness in pgbench.

WANGSHUO sent in a patch to allow UPDATE to operate on column aliases.

Michaël Paquier sent in another revision of a patch to add support for VS 2015 in MSVC scripts.

Michaël Paquier sent in a patch to fix parallel pg_dump on Win32.

Aleksander Alekseev sent in a patch to fix an issue in snapmgr.c that wrongly assumes that subxcnt > 0 iff xcnt > 0.

Anastasia Lubennikova sent in a patch to make the amcheck tool work with covering unique indexes.

Constantin S. Pan sent in another revision of a patch to speed up GIN index builds with parallel workers.

Stephen Frost sent in a patch to fix breakage of pg_dump caused by the covering unique indexes patch.

Stephen Frost sent in a patch to remove superuser checks in pgstattuple 1.4.

David Fetter sent in two more revisions of a patch to implement weighted central moments as aggregates.

Stas Kelvich sent in another revision of a patch to speed up 2PC transactions.

Jeff Janes sent in a patch to add tab completion for ALTER EXTENSION to psql.

David Rowley sent in a patch to make parallel aggregate costs consider combine/serial/deserial functions.

Stephen Frost sent in a patch to add regression tests for CREATE ROLE/USAGE.

Andres Freund sent in a patch to disable pymalloc when running under valgrind.

by N Bougain on Friday 15 April 2016 at 20:42

Monday 4 April 2016


PostgreSQL Weekly News - April 3, 2016

Security updates 9.5.2, 9.4.7, 9.3.12, 9.2.16, and 9.1.21 have been released. Update as soon as possible!

The first meetup of the Islamabad PUG (Pakistan) will take place on April 8. Details and RSVP below:

News about derived products

PostgreSQL-related job offers in April

PostgreSQL Local

PostgreSQL in the media

PostgreSQL Weekly News is brought to you this week by David Fetter. Translated by the PostgreSQLFr team under a CC BY-NC-SA license. The original version can be found at:

Submit your articles or announcements before Sunday 15:00 (Pacific time). Please send them in English to david (a), in German to pwn (a), in Italian to pwn (a), and in Spanish to pwn (a)

Applied patches

Andres Freund pushed:

  • pg_rewind: Close backup_label file descriptor. This was a relatively harmless leak, as createBackupLabel() is only called once per pg_rewind invocation. Author: Michael Paquier Reported-By: Michael Paquier Discussion: Backpatch: 9.5, where pg_rewind was introduced
  • Fix LWLockReportWaitEnd() parameter list to be (void). Previously it was an "old style" function declaration.
  • pg_rewind: fsync target data directory. Previously pg_rewind did not fsync any files. That's problematic, given that the target directory is modified. If the database was started afterwards, 2ce439f33 luckily already caused the data directory to be synced to disk at postmaster startup; reducing the scope of the problem. To fix, use initdb -S, at the end of the pg_rewind run. It doesn't seem worthwhile to duplicate the code into pg_rewind, and initdb -S is already used that way by pg_upgrade. Reported-By: Andres Freund Author: Michael Paquier, somewhat edited by me Discussion: Backpatch: 9.5, where pg_rewind was introduced

Tom Lane pushed:

  • Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize(). This avoids a possible divide-by-zero in the following calculation, and rounding the number to an integer seems like saner behavior anyway. Assuming IEEE math, the division would yield +Infinity which would get replaced by 1.0 at the bottom of the function, so nothing really interesting would ensue; but avoiding divide-by-zero seems like a good idea on general principles. Per report from Piotr Stefaniak. No back-patch since this seems mostly cosmetic.
  • Guard against zero vardata.rel->tuples in estimate_hash_bucketsize(). If the referenced rel was proven empty, we'd compute 0/0 here, which results in the function returning NaN. That's a bit more serious than the other zero-divide case. Still, it only seems to be possible in HEAD, so no back-patch. Per report from Piotr Stefaniak. I looked through the rest of selfuncs.c and found no other likely trouble spots.
  • Release notes for 9.5.2, 9.4.7, 9.3.12, 9.2.16, 9.1.21.
  • Code and docs review for commit 3187d6de0e5a9e805b27c48437897e8c39071d45. Fix up check for high-bit-set characters, which provoked "comparison is always true due to limited range of data type" warnings on some compilers, and was unlike the way we do it elsewhere anyway. Fix omission of "$" from the set of valid identifier continuation characters. Get rid of sanitize_text(), which was utterly inconsistent with any other error report anywhere in the system, and wasn't even well designed on its own terms (double-quoting the result string without escaping contained double quotes doesn't seem very well thought out). Fix up error messages, which didn't follow the message style guidelines very well, and were overly specific in situations where the actual mistake might not be what they said. Improve documentation. (I started out just intending to fix the compiler warning, but the more I looked at the patch the less I liked it.)
  • Document errhidecontext() where it ought to be documented. Seems to have been missed when this function was added. Noted while looking at David Steele's proposal to add another similar function.
  • Sync our copy of the timezone library with IANA release tzcode2016c. We hadn't done this in about six years, which proves to have been a mistake because there's been a lot of code churn upstream, making the merge rather painful. But putting it off any further isn't going to lessen the pain, and there are at least two incompatible changes that we need to absorb before someone starts complaining that --with-system-tzdata doesn't work at all on their platform, or we get blindsided by a tzdata release that our out-of-date zic can't compile. Last week's "time zone abbreviation differs from POSIX standard" mess was a wake-up call in that regard. This is a sufficiently large patch that I'm afraid to back-patch it immediately, though the foregoing considerations imply that we probably should do so eventually. For the moment, just put it in HEAD so that it can get some testing. Maybe we can wait till the end of the 9.6 beta cycle before deeming it okay.
  • Fix MSVC build for changes in zic. zic now only needs zic.c, but I didn't realize knowledge about it was hardwired into Per buildfarm.
  • Sync tzload() and tzparse() APIs with IANA release tzcode2016c. This brings us a bit closer to matching upstream, but since it affects files outside src/timezone/, we might choose not to back-patch it. Hence keep it separate from the main update patch.
  • Fix portability issues in 86c43f4e22c0771fd0cc6bce2799802c894ee2ec. INT64_MIN/MAX should be spelled PG_INT64_MIN/MAX, per well established convention in our sources. Less obviously, a symbol named DOUBLE causes problems on Windows builds, so rename that to DOUBLE_CONST; and rename INTEGER to INTEGER_CONST for consistency. Also, get rid of incorrect/obsolete hand-munging of yycolumn, and fix the grammar for float constants to handle expected cases such as ".1". First two items by Michael Paquier, second two by me.
  • Fix zic for Windows. The new coding of dolink() is dependent on link() returning an on-point errno when it fails; but the quick-hack implementation of link() that we'd put in for Windows didn't bother with setting errno. Fix that. Analysis and patch by Christian Ullrich.
  • Protect zic's symlink() call with #ifdef HAVE_SYMLINK. The IANA crew seem to think that symlink() exists everywhere nowadays, and they may well be right. But we use #ifdef HAVE_SYMLINK elsewhere so for consistency we should do it here too. Noted by Michael Paquier.
  • Avoid possibly-unsafe use of Windows' FormatMessage() function. Whenever this function is used with the FORMAT_MESSAGE_FROM_SYSTEM flag, it's good practice to include FORMAT_MESSAGE_IGNORE_INSERTS as well. Otherwise, if the message contains any %n insertion markers, the function will try to fetch argument strings to substitute --- which we are not passing, possibly leading to a crash. This is exactly analogous to the rule about not giving printf() a format string you're not in control of. Noted and patched by Christian Ullrich. Back-patch to all supported branches.
  • Allow to_timestamp(float8) to convert float infinity to timestamp infinity. With the original SQL-function implementation, such cases failed because we don't support infinite intervals. Converting the function to C lets us bypass the interval representation, which should be a bit faster as well as more flexible. Vitaly Burovoy, reviewed by Anastasia Lubennikova
  • Fix interval_mul() to not produce insane results. interval_mul() attempts to prevent its calculations from producing silly results, but it forgot that zero times infinity yields NaN in IEEE arithmetic. Hence, a case like '1 second'::interval * 'infinity'::float8 produced a NaN for the months product, which didn't trigger the range check, resulting in bogus and possibly platform-dependent output. This isn't terribly obvious to the naked eye because if you try that exact case, you get "interval out of range" which is what you expect --- but if you look closer, the error is coming from interval_out not interval_mul. interval_mul has allowed a bogus value into the system. Fix by adding isnan tests. Noted while testing Vitaly Burovoy's fix for infinity input to to_timestamp(). Given the lack of field complaints, I doubt this is worth a back-patch.
  • Remove TZ environment-variable entry from postgres reference page. The server hasn't paid attention to the TZ environment variable since commit ca4af308c32d03db, but that commit missed removing this documentation reference, as did commit d883b916a947a3c6 which added the reference where it now belongs (initdb). Back-patch to 9.2 where the behavior changed. Also back-patch d883b916a947a3c6 as needed. Matthew Somerville
  • Remove just-added tests for to_timestamp(float8) with out-of-range inputs. Reporting the specific out-of-range input value produces platform-dependent results. We could skip reporting the value, but that's contrary to our message style guidelines and unhelpful to users. Or we could add a separate expected-output file for Windows, but that would be a substantial maintenance burden, and these test cases seem unlikely to be worth it. Per buildfarm.
  • Suppress uninitialized-variable warnings. My compiler doesn't like the lack of initialization of "flag", and I think it's right: if there were zero keys we'd have an undefined result. The AND of zero items is TRUE, so initialize to TRUE.
  • Improve portability of I/O behavior for the geometric types. Formerly, the geometric I/O routines such as box_in and point_out relied directly on strtod() and sprintf() for conversion of the float8 component values of their data types. However, the behavior of those functions is pretty platform-dependent, especially for edge-case values such as infinities and NaNs. This was exposed by commit acdf2a8b372aec1d, which added test cases involving boxes with infinity endpoints, and immediately failed on Windows and AIX buildfarm members. We solved these problems years ago in the main float8in and float8out functions, so let's fix it by making the geometric types use that code instead of depending directly on the platform-supplied functions. To do this, refactor the float8in code so that it can be used to parse just part of a string, and as a convenience make the guts of float8out usable without going through DirectFunctionCall. While at it, get rid of geo_ops.c's fairly shaky assumptions about the maximum output string length for a double, by having it build results in StringInfo buffers instead of fixed-length strings. In passing, convert all the "invalid input syntax for type foo" messages in this area of the code into "invalid input syntax for type %s" to reduce the number of distinct translatable strings, per recent discussion. We would have needed a fair number of the latter anyway for code-sharing reasons, so we might as well just go whole hog. Note: this patch is by no means intended to guarantee that the geometric types uniformly behave sanely for infinity or NaN component values. But any bugs we have in that line were there all along, they were just harder to reach in a platform-independent way.
  • Last-minute updates for release notes. Security: CVE-2016-2193, CVE-2016-3065
  • Support using index-only scans with partial indexes in more cases. Previously, the planner would reject an index-only scan if any restriction clause for its table used a column not available from the index, even if that restriction clause would later be dropped from the plan entirely because it's implied by the index's predicate. This is a fairly common situation for partial indexes because predicates using columns not included in the index are often the most useful kind of predicate, and we have to duplicate (or at least imply) the predicate in the WHERE clause in order to get the index to be considered at all. So index-only scans were essentially unavailable with such partial indexes. To fix, we have to do detection of implied-by-predicate clauses much earlier in the planner. This patch puts it in check_index_predicates (nee check_partial_indexes), meaning it gets done for every partial index, whereas we previously only considered this issue at createplan time, so that the work was only done for an index actually selected for use. That could result in a noticeable planning slowdown for queries against tables with many partial indexes. However, testing suggested that there isn't really a significant cost, especially not with reasonable numbers of partial indexes. We do get a small additional benefit, which is that cost_index is more accurate since it correctly discounts the evaluation cost of clauses that will be removed. We can also avoid considering such clauses as potential indexquals, which saves useless matching cycles in the case where the predicate columns aren't in the index, and prevents generating bogus plans that double-count the clause's selectivity when the columns are in the index. Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and Konstantin Knizhnik, and whacked around a little by me
  • Another zic portability fix. I should have remembered that we can't use INT64_MODIFIER with sscanf(): configure chooses that to work with snprintf(), but it might be for our src/port/snprintf.c implementation and so not compatible with the platform's sscanf(). This appears to be the explanation for buildfarm member frogmouth's continuing unhappiness with the tzcode update. Fortunately, in all of the places where zic is attempting to read into an int64 variable, it's reading a year which certainly will fit just fine into an int. So make it read into an int with %d, and then cast or copy as necessary.
  • Fix oversight in getParamDescriptions(), and improve comments. When getParamDescriptions was changed to handle out-of-memory better by cribbing error recovery logic from getRowDescriptions/getAnotherTuple, somebody omitted to copy the stanza about checking for excess data in the message. But you need to do that, since continue'ing out of the switch in pqParseInput3 means no such check gets applied there anymore. Noted while looking at Michael Paquier's patch that made yet another copy of this advance_and_error logic. (This whole business desperately needs refactoring, because I sure don't want to see a dozen copies of this code, but that's where we seem to be headed. What's more, the "suspend parsing on EOF return" convention is a holdover from protocol 2 and shouldn't exist at all in protocol 3, because we don't process partial messages anymore. But for now, just fix the obvious bug.) Also, fix some wrong/missing comments about what the API spec is for these three functions. This doesn't seem worthy of back-patching, even though it's a bug; the case shouldn't ever arise in the field.
  • Get rid of minus zero in box regression test. Commit acdf2a8b added a test case involving minus zero as a box endpoint. This is not very portable, as evidenced by the several older buildfarm members that are failing on the test because they print minus zero as just "0". If there were any significant reason to test this behavior, we could consider carrying a separate expected-file; but it doesn't look to me like there's adequate justification to accept such a maintenance burden. Just change the test to use plain zero, instead.
  • Omit null rows when applying the Haas-Stokes estimator for ndistinct. Previously, we included null rows in the values of n and N that went into the formula, which amounts to considering null as a value in its own right; but the d and f1 values do not include nulls. This is inconsistent, and it contributes to significant underestimation of ndistinct when the column is mostly nulls. In any case stadistinct is defined as the number of distinct non-null values, so we should exclude nulls when doing this computation. This is an aboriginal bug in our application of the Haas-Stokes formula, but we'll refrain from back-patching for fear of destabilizing plan choices in released branches. While at it, make the code a bit more readable by omitting unnecessary casts and intermediate variables. Observation and original patch by Tomas Vondra, adjusted to fix both uses of the formula by Alex Shulgin, cosmetic improvements by me
  • Omit null rows when setting the threshold for what's a most-common value. As with the previous patch, large numbers of null rows could skew this calculation unfavorably, causing us to discard values that have a legitimate claim to be MCVs, since our definition of MCV is that it's most common among the non-null population of the column. Hence, make the numerator of avgcount be the number of non-null sample values not the number of sample rows; likewise for maxmincount in the compute_scalar_stats variant. Also, make the denominator be the number of distinct values actually observed in the sample, rather than reversing it back out of the computed stadistinct. This avoids depending on the accuracy of the Haas-Stokes approximation, and really it's what we want anyway; the threshold should depend only on what we see in the sample, not on what we extrapolate about the contents of the whole column. Alex Shulgin, reviewed by Tomas Vondra and myself
  • Suppress compiler warning. Some buildfarm members are showing "comparison is always false due to limited range of data type" complaints on this test, so #ifdef it out on machines with 32-bit int.
  • Add missing "static". Per buildfarm member pademelon.
  • Make all the declarations of WaitEventSetWaitBlock be marked "inline". The inconsistency here triggered compiler warnings on some buildfarm members, and it's surely pretty pointless.
  • Add psql \errverbose command to see last server error at full verbosity. Often, upon getting an unexpected error in psql, one's first wish is that the verbosity setting had been higher; for example, to be able to see the schema-name field or the server code location info. Up to now the only way has been to adjust the VERBOSITY variable and repeat the failing query. That's a pain, and it doesn't work if the error isn't reproducible. This commit adds a psql feature that redisplays the most recent server error at full verbosity, without needing to make any variable changes or re-execute the failed command. We just need to hang onto the latest error PGresult in case the user executes \errverbose, and then apply libpq's new PQresultVerboseErrorMessage() function to it. This will consume some trivial amount of psql memory, but otherwise the cost when the feature isn't used should be negligible. Alex Shulgin, reviewed by Daniel Vérité, some improvements by me
  • Add libpq support for recreating an error message with different verbosity. Often, upon getting an unexpected error in psql, one's first wish is that the verbosity setting had been higher; for example, to be able to see the schema-name field or the server code location info. Up to now the only way has been to adjust the VERBOSITY variable and repeat the failing query. That's a pain, and it doesn't work if the error isn't reproducible. This commit adds support in libpq for regenerating the error message for an existing error PGresult at any desired verbosity level. This is almost just a matter of refactoring the existing code into a subroutine, but there is one bit of possibly-needed information that was not getting put into PGresults: the text of the last query sent to the server. We must add that string to the contents of an error PGresult. But we only need to save it if it might be used, which with the existing error-formatting code only happens if there is a PG_DIAG_STATEMENT_POSITION error field, which is probably pretty rare for errors in production situations. So really the overhead when the feature isn't used should be negligible. Alex Shulgin, reviewed by Daniel Vérité, some improvements by me
  • Clean up some stuff in new contrib/bloom module. Coverity complained about implicit sign-extension in the BloomPageGetFreeSpace macro, probably because sizeOfBloomTuple isn't wide enough for size calculations. No overflow is really possible as long as maxoff and sizeOfBloomTuple are small enough to represent a realistic situation, but it seems like a good idea to declare sizeOfBloomTuple as Size not int32. Add missing check on BloomPageAddItem() result, again from Coverity. Avoid core dump due to not allocating so->sign array when scan->numberOfKeys is zero. Also thanks to Coverity. Use FLEXIBLE_ARRAY_MEMBER rather than declaring an array as size 1 when it isn't necessarily. Very minor beautification of related code. Unfortunately, none of the Coverity-detected mistakes look like they could account for the remaining buildfarm unhappiness with this module. It's barely possible that the FLEXIBLE_ARRAY_MEMBER mistake does account for that, if it's enabling bogus compiler optimizations; but I'm not terribly optimistic. We probably still have bugs to find here.
  • Fix contrib/bloom to not fail under CLOBBER_CACHE_ALWAYS. The code was supposing that rd_amcache wouldn't disappear from under it during a scan; which is wrong. Copy the data out of the relcache rather than trying to reference it there.
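Several of the fixes above (the estimate_hash_bucketsize divide-by-zero guards and the interval_mul fix) hinge on the same IEEE 754 behavior: a nonzero value divided by zero yields infinity, while 0/0 and 0 * infinity yield NaN, and NaN compares false against everything, so it slips past ordinary range checks. A small C demonstration (the range-check helper is a deliberately naive illustration, not PostgreSQL code):

```c
#include <assert.h>
#include <math.h>

/* A naive range check of the kind interval_mul effectively relied on.
 * It never fires for NaN, because every comparison with NaN is false --
 * which is how a bogus value could slip through before the isnan()
 * tests were added. */
static int naive_in_range(double v)
{
    return !(v < -1e6 || v > 1e6);  /* NaN answers "in range" here */
}
```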
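The two ANALYZE-related commits above change which rows feed the Haas-Stokes ndistinct estimator. A simplified C sketch of the formula as described (the variable names and clamping here are illustrative; the real logic in analyze.c is more involved):

```c
#include <assert.h>

/* Haas-Stokes ndistinct estimate:
 *   D_hat = n*d / (n - f1 + f1*n/N)
 * where, after the fix above, n is the number of NON-NULL sample values,
 * N the estimated number of non-null rows in the table, d the number of
 * distinct values seen in the sample, and f1 the number of values seen
 * exactly once. */
static double haas_stokes(double n, double N, double d, double f1)
{
    double denom = n - f1 + f1 * n / N;
    double est = n * d / denom;

    /* Clamp to the sensible range: at least the distinct values we
     * actually saw, at most the number of rows. */
    if (est < d)
        est = d;
    if (est > N)
        est = N;
    return est;
}
```

Intuitively, f1 (singletons) measures how much of the population the sample missed: with no singletons the estimate collapses to d, while an all-singleton sample pushes it toward N. Counting nulls in n and N inflated the denominator without touching d or f1, which is exactly the underestimation the commit describes.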

Teodor Sigaev pushed:

Álvaro Herrera pushed:

  • Fix minor leak in pg_dump for ACCESS METHOD. Bug reported by Coverity. Author: Michaël Paquier
  • pg_rewind: Improve internationalization. This is mostly cosmetic since two of the three changes are debug messages, and the third one is just a progress indicator. Author: Michaël Paquier
  • Update expected file from quoting change. I neglected to update this in 59a2111b23f. Per buildfarm
  • Mention BRIN as able to do multi-column indexes. Documentation mentioned B-tree, GiST and GIN as able to do multicolumn indexes; I failed to add BRIN to the list. Author: Petr Jediný Reviewed-By: Fujii Masao, Emre Hasegeli
  • PostgresNode: initialize $timed_out if passed. Corrects an oversight in 2c83f435a3 where the $timed_out reference var isn't initialized; using it would require the caller to initialize it beforehand, which is cumbersome. Author: Craig Ringer
  • pgbench: allow a script weight of zero. This refines the previous weight range and allows a script to be "turned off" by passing a zero weight, which is useful when scripting multiple pgbench runs. I did not apply the suggested warning when a script uses zero weight; we use the principle elsewhere that if there's nothing to be done, do nothing quietly. Adjust docs accordingly. Author: Jeff Janes, Fabien Coelho
  • I forgot the alternate expected file in previous commit. Without this, the test_slot_timelines module fails "make installcheck" because the required feature is not enabled in a stock server. Per buildfarm
  • Blind attempt at fixing Win32 issue on 24c5f1a103c. As best as I can tell, MyReplicationSlot needs to be PGDLLIMPORT in order for the new test_slot_timelines test module to compile. Per buildfarm
  • Fix broken variable declaration. Author: Konstantin Knizhnik
  • Add missing checks to some of pageinspect's BRIN functions. brin_page_type() and brin_metapage_info() did not enforce being called by superuser, like other pageinspect functions that take bytea do. Since they don't verify the passed page thoroughly, it is possible to use them to read the server memory with a carefully crafted bytea value, up to a few kilobytes from where the input bytea is located. Have them throw errors if called by a non-superuser. Report and initial patch: Andreas Seltenreich Security: CVE-2016-3065
  • Fix recovery_min_apply_delay test. Previously this test was relying too much on WAL replay to occur in the exact configured interval, which was unreliable on slow or overly busy servers. Use a custom loop instead of poll_query_until, which is hopefully more reliable. Per continued failures on buildfarm member hamster (which is probably the only one running this test suite) Author: Michaël Paquier
  • Enable logical slots to follow timeline switches. When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
  • Type names should not be quoted. Our actual convention, contrary to what I said in 59a2111b23f, is not to quote type names, as evidenced by unquoted use of format_type_be() result value in error messages. Remove quotes from recently tweaked messages accordingly. Per note from Tom Lane
  • Improve internationalization of messages involving type names. Change the slightly different variations of the message function FOO must return type BAR to a single wording, removing the variability in type name so that they all create a single translation entry; since the type name is not to be translated, there's no point in it being part of the message anyway. Also, change them all to use the same quoting convention, namely that the function name is not to be quoted but the type name is. (I'm not quite sure why this is so, but it's the clear majority.) Some similar messages such as "encoding conversion function FOO must ..." are also changed.
  • Fix logical_decoding_timelines test crashes. In the test_slot_timelines test module, we were abusing passing NULL values, which were received as zeroes on x86 but break on ARM (buildfarm member hamster) by crashing instead. Fix the breakage by marking these functions as STRICT; the InvalidXid value that was previously implicit in NULL values (on x86 at least) can now be passed as 0. Failing to follow the fmgr protocol to check for NULLs beforehand was causing ARM to fail, as evidenced by segmentation faults in buildfarm member hamster. In order to use the new functionality in the test script, use COALESCE in the right spot to avoid forwarding NULL values. This was diagnosed from the hamster crash by Craig Ringer, who also proposed a different patch (checking for NULL values explicitly in the C function code, and keeping the non-strictness in the C functions). I decided to go with this approach instead.
  • pgbench: Remove unused parameter. For some reason this parameter was introduced as unused in 3da0dfb4b146, and has never been used for anything. Remove it. Author: Fabien Coelho
  • test_slot_timelines: Fix alternate expected output
  • XLogReader general code cleanup. Some minor tweaks and comment additions, for cleanliness sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup. Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.

Robert Haas pushed:

  • pgbench: Support double constants and functions. The new functions are pi(), random(), random_exponential(), random_gaussian(), and sqrt(). I was worried that this would be slower than before, but, if anything, it actually turns out to be slightly faster, because we now express the built-in pgbench scripts using fewer lines; each \setrandom can be merged into a subsequent \set. Fabien Coelho
  • Fix typo in comment. Thomas Munro
  • On all Windows platforms, not just Cygwin, use _timezone and _tzname. Up until now, we've been using timezone and tzname, but Visual Studio 2015 (for which we wish to add support) no longer declares those symbols. All versions since Visual Studio 2003 apparently support the underscore-equipped names, and we don't support anything older than Visual Studio 2005, so this should work OK everywhere. But let's see what the buildfarm thinks. Michael Paquier, reviewed by Petr Jelinek
  • Don't require a user mapping for FDWs to work. Commit fbe5a3fb73102c2cfec11aaaa4a67943f4474383 accidentally changed this behavior; put things back the way they were, and add some regression tests. Report by Andres Freund; patch by Ashutosh Bapat, with a bit of kibitzing by me.
  • Rework custom scans to work more like the new extensible node stuff. Per discussion, the new extensible node framework is thought to be better designed than the custom path/scan/scanstate stuff we added in PostgreSQL 9.5. Rework the latter to be more like the former. This is not backward-compatible, but we generally don't promise that for C APIs, and there probably aren't many people using this yet anyway. KaiGai Kohei, reviewed by Petr Jelinek and me. Some further cosmetic changes by me.
  • pgbench: Remove \setrandom. You can now do the same thing via \set using the appropriate function, either random(), random_gaussian(), or random_exponential(), depending on the desired distribution. This is not backward-compatible, but per discussion, it's worth it to avoid having the old syntax hang around forever. Fabien Coelho, reviewed by Michael Paquier, and adjusted by me.
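As a sketch of the migration this commit requires (values here are illustrative), a pgbench script line that previously used `\setrandom` is rewritten with `\set` and the new functions:

```
-- old, removed syntax:
--   \setrandom aid 1 100000
\set aid random(1, 100000)
\set delta random(-5000, 5000)
```

The Gaussian and exponential variants follow the same pattern, e.g. `\set aid random_gaussian(1, 100000, 2.5)`.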
  • Fix pgbench documentation error. The description of what the per-transaction log file says for skipped transactions is just plain wrong. Report and patch by Tomas Vondra, reviewed by Fabien Coelho and modified by me.
  • Improve pgbench docs regarding per-transaction logging. The old documentation didn't know about the new -b flag, only about -f. Fabien Coelho
  • Allow aggregate transition states to be serialized and deserialized. This is necessary infrastructure for supporting parallel aggregation for aggregates whose transition type is "internal". Such values can't be passed between cooperating processes, because they are just pointers. David Rowley, reviewed by Tomas Vondra and by me.
  • Fix bug in aggregate (de)serialization commit. resulttypeLen and resulttypeByVal must be set correctly when serializing aggregates, not just when finalizing them. This was in David's final patch but I downloaded the wrong version by mistake and failed to spot the error. David Rowley
  • Add new replication mode synchronous_commit = 'remote_apply'. In this mode, the master waits for the transaction to be applied on the remote side, not just written to disk. That means that you can count on a transaction started on the standby to see all commits previously acknowledged by the master. To make this work, the standby sends a reply after replaying each commit record generated with synchronous_commit >= 'remote_apply'. This introduces a small inefficiency: the extra replies will be sent even by standbys that aren't the current synchronous standby. But previously-existing synchronous_commit levels make no attempt at all to optimize which replies are sent based on what the primary cares about, so this is no worse, and at least avoids any extra replies for people not using the feature at all. Thomas Munro, reviewed by Michael Paquier and by me. Some additional tweaks by me.
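A minimal configuration sketch of the new level on the primary (the standby name below is just an example; it must match a standby's application_name):

```
# postgresql.conf on the primary
synchronous_commit = remote_apply
synchronous_standby_names = 's1'
```

With this setting, a COMMIT acknowledged by the primary is guaranteed to be visible to queries subsequently started on the named standby.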

Magnus Hagander pushed:

Fujii Masao pushed:

Stephen Frost pushed:

  • Reset plan->row_security_env and planUserId. In the plancache, we check if the environment we planned the query under has changed in a way which requires us to re-plan, such as when the user for whom the plan was prepared changes and RLS is being used (and, therefore, there may be different policies to apply). Unfortunately, while those values were set and checked, they were not being reset when the query was re-planned and therefore, in cases where we change role, re-plan, and then change role again, we weren't re-planning again. This leads to potentially incorrect policies being applied in cases where role-specific policies are used and a given query is planned under one role and then executed under other roles, which could happen under security definer functions or when a common user and query is planned initially and then re-used across multiple SET ROLEs. Further, extensions which made use of CopyCachedPlan() may suffer from similar issues as the RLS-related fields were not properly copied as part of the plan and therefore RevalidateCachedQuery() would copy in the current settings without invalidating the query. Fix by using the same approach used for 'search_path', where we set the correct values in CompleteCachedPlan(), check them early on in RevalidateCachedQuery() and then properly reset them if re-planning. Also, copy through the values during CopyCachedPlan(). Pointed out by Ashutosh Bapat. Reviewed by Michael Paquier. Back-patch to 9.5 where RLS was introduced. Security: CVE-2016-2193
  • Fix typo in pg_regress.c. s/afer/after Pointed out by Andreas 'ads' Scherbaum

Noah Misch pushed:

Simon Riggs pushed:

  • Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases. Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require complex interlocking that matched the requirements on the master. This required an O(N) operation that became a significant problem with large indexes, causing replication delays of seconds or in some cases minutes while the XLOG_BTREE_VACUUM was replayed. This commit skips the pin scan that was previously required, by observing in detail when and how it is safe to do so, with full documentation. The pin scan is skipped only in replay; the VACUUM code path on master is not touched here and WAL is identical. The current commit applies in all cases, effectively replacing commit 687f2cd7a0150647794efe432ae0397cb41b60ff.

Rejected patches (so far)

No one was disappointed this week :-)

Pending patches

Andres Freund and Michaël Paquier traded patches to remove a race condition that could have caused fsync to be assumed to have succeeded when it had not actually been checked.

SAWADA Masahiko sent in three more revisions of a patch to allow having N>1 synchronous standby servers.

Alexander Korotkov sent in a patch to prevent a false alarm about xid wraparound.

Anastasia Lubennikova sent in another revision of a patch to implement covering + unique indexes.

Peter Geoghegan sent in a patch to add an amcheck extension to contrib.

Kyotaro HORIGUCHI sent in a patch for an issue in psql to fix the condition to decide whether to add schema names.

Etsuro Fujita sent in a patch to add GetFdwScanTupleExtraData() and FillFdwScanTupleSysAttr().

Thomas Munro sent in four more revisions of a patch to implement "causal reads."

Dilip Kumar and Robert Haas traded patches to help scale relation extension.

Kyotaro HORIGUCHI and Artur Zakirov traded patches to add tab completion support for IF [NOT] EXISTS to psql.

Alexander Korotkov sent in five more revisions of a patch to move PinBuffer and UnpinBuffer to atomics.

Thomas Munro sent in another revision of a patch to add kqueue support.

Christian Ullrich sent in another revision of a patch to fix an SSPI authentication failure where the wrong realm name was used.

Fabrízio de Royes Mello and Petr Jelínek traded patches to add a sequence access method and implement gapless sequences using same.

Teodor Sigaev sent in another revision of a patch to add ICU support.

Alexander Korotkov sent in another revision of a patch to add partial sort.

Tom Lane sent in a patch to fix an issue that manifested as pg_restore casting check constraints differently.

Amit Langote sent in a patch to perform constraint name uniqueness check for index constraints.

Michaël Paquier sent in a patch to add a missing mention of GSSAPI in MSVC's

Dmitry Dolgov sent in two more revisions of a patch to add jsonb_insert().

Ian Lawrence Barwick sent in a patch to fix a replication slot creation error message in 9.6.

Andres Freund and Amit Kapila traded patches to speed up clog Access by increasing CLOG buffers.

Tomas Vondra and Dean Rasheed traded patches to improve GROUP BY estimation.

Pavel Stěhule sent in another revision of a patch to add a raw mode for COPY.

Aleksander Alekseev sent in a patch to implement a --disable-setproctitle flag.

Teodor Sigaev and Dmitry Ivanov traded patches to add a phrase search option to tsearch.

Bernd Helmle sent in a patch to fix an issue where a standalone backend can PANIC during recovery.

Andres Freund sent in another revision of a patch to avoid the use of a separate spinlock to protect LWLock's wait queue.

Peter Eisentraut sent in a patch to add an optional SSL indicator to the psql prompt.

Abhijit Menon-Sen sent in another revision of a patch to work around some issues in extension dependencies.

Stephen Frost sent in a patch to add new catalog called pg_init_privs, change pg_dump to use a bitmap to represent what to include, split "dump" into "dump" and "dump_contains", include pg_catalog and extension ACLs, if changed, and use the GRANT system to manage access to sensitive functions.

Kevin Grittner sent in another revision of a patch to implement "snapshot too old," configured by time.

David Rowley sent in two more revisions of a patch to improve performance for joins where outer side is unique.

Noah Misch sent in another revision of a patch to refer to a TOKEN_USER as "token user".

Fabien COELHO sent in a patch which adds a bunch of operators (bitwise: & | ^ ~, comparisons: =/== <>/!= < <= > >=, logical: and/&& or/|| xor/^^ not/!) and functions (exp, ln, if) to pgbench.

Tom Lane sent in a patch to make a better MCV cutoff.

by N Bougain on Monday 4 April 2016 at 21:50

Sunday 13 March 2016

Guillaume Lelarge

Translation of the 9.5 manual completed

This one is very late, but we have nonetheless finished translating the PostgreSQL 9.5 manual. Naturally, all the manuals have also been updated with the latest minor releases.

Feel free to report any problems with the translation to me.

Likewise, I have practically finished translating the applications. That translation should be available for version 9.5.2 (no release date known yet).

by Guillaume Lelarge on Sunday 13 March 2016 at 10:41

Friday 11 March 2016

Nicolas Gollet

PostgreSQL Studio, a web GUI for PostgreSQL

PostgreSQL Studio is a web-oriented graphical interface for PostgreSQL, compatible with the latest PostgreSQL release (9.5, with support for RLS, for example). The product is aimed more at developers than at DBAs/administrators.

For the list of features, see the official site.

pgstudio installs easily into a servlet container such as Tomcat, Jetty or JBoss.

So that you can judge for yourself, I have deployed the latest version, 2.0, on Red Hat's "cloud" platform in a JBoss/Java 7 container:

In my opinion the product is not yet 100% mature, but it is well on its way...

Happy testing :)

by Nicolas Gollet on Friday 11 March 2016 at 17:53

Tuesday 8 March 2016

Sébastien Lardière

Dates to remember

Three dates to remember around PostgreSQL: 17 March, in Nantes, a first meetup, where I will present what's new in PostgreSQL 9.5; 31 March, in Paris, where I will try to wind back through the history of databases; 31 May, in Lille, where I will dive into PostgreSQL's storage structures. These three dates are an opportunity... Read more: Dates à retenir

by Sébastien Lardière on Tuesday 8 March 2016 at 08:30

Saturday 6 February 2016

Guillaume Lelarge

Translation of the 9.5 manual begun

I have finally finished merging the manual for version 9.5. Only just before 9.5 itself, since what little time I had was devoted to my book. But now we can get to work. In fact, Flavie has already started and has translated a batch of new files. There is still plenty left to do, though. For those interested, it's this way:

Feel free to send me any questions if you would like to take part.

by Guillaume Lelarge on Saturday 6 February 2016 at 12:18

Thursday 4 February 2016

Rodolphe Quiédeville

Indexing short-string searches in PostgreSQL

Search fields in web applications can offer results as each character is typed into the form. To avoid over-soliciting the data storage systems, the standard modules let you set a lower bound, with the search only becoming effective from the third character typed. This 3-character limit is explained by the possibility of simply defining trigram indexes in the database; in PostgreSQL this is done with the standard pg_trgm extension (for a detailed study of trigrams, I recommend reading this article).

While this technique has brought a lot of comfort to search forms, it raises a problem when you need to search for a two-character string. Inappropriate, even counter-productive, you may say (and I rather agree), but imagine Mr or Mrs Ba, who are present in the database and whose first name was forgotten at data entry, or who have no first name: they will never show up in these search forms, which is quite unfortunate for them.

In this article we will see how to solve this problem. Let's start by creating a table with 50,000 rows of random text data:

CREATE TABLE blog AS SELECT s, md5(random()::text) as d 
   FROM generate_series(1,50000) s;
~# SELECT * from blog LIMIT 4;
 s |                 d                
 1 | 8fa4044e22df3bb0672b4fe540dec997
 2 | 5be79f21e03e025f00dea9129dc96afa
 3 | 6b1ffca1425326bef7782865ad4a5c5e
 4 | 2bb3d7093dc0fffd5cebacd07581eef0
(4 rows)

Suppose we are fans of 1980s music and want to know whether our table contains any text with the string fff.

~# EXPLAIN ANALYZE SELECT * FROM blog WHERE d like '%fff%';

                                             QUERY PLAN                                              
 Seq Scan on blog  (cost=0.00..1042.00 rows=5 width=37) (actual time=0.473..24.130 rows=328 loops=1)
   Filter: (d ~~ '%fff%'::text)
   Rows Removed by Filter: 49672
 Planning time: 0.197 ms
 Execution time: 24.251 ms
(5 rows)

Without an index, this unsurprisingly results in a sequential scan of the table, so let's add an index. To index this column with a GIN index, we will use the gin_trgm_ops operator class available in the pg_trgm extension.

~# CREATE INDEX blog_trgm_idx ON blog USING GIN(d gin_trgm_ops);
~# EXPLAIN ANALYZE SELECT * FROM blog WHERE d like '%fff%';
                                                       QUERY PLAN                                                        
 Bitmap Heap Scan on blog  (cost=16.04..34.46 rows=5 width=37) (actual time=0.321..1.336 rows=328 loops=1)
   Recheck Cond: (d ~~ '%fff%'::text)
   Heap Blocks: exact=222
   ->  Bitmap Index Scan on blog_trgm_idx  (cost=0.00..16.04 rows=5 width=0) (actual time=0.176..0.176 rows=328 loops=1)
         Index Cond: (d ~~ '%fff%'::text)
 Planning time: 0.218 ms
 Execution time: 1.451 ms

This time the index could be used; note in passing that the query time is cut by a factor of 20. But if we now want to search for a string of only 2 characters, a sequential scan happens again: our trigram index has become useless for this new search.

~# EXPLAIN ANALYZE SELECT * FROM blog WHERE d like '%ff%';
                                               QUERY PLAN                                                
 Seq Scan on blog  (cost=0.00..1042.00 rows=3030 width=37) (actual time=0.016..11.712 rows=5401 loops=1)
   Filter: (d ~~ '%ff%'::text)
   Rows Removed by Filter: 44599
 Planning time: 0.165 ms
 Execution time: 11.968 ms

This is where bigram indexes come in, which, as their name suggests, work on pairs of characters rather than triplets. First we will test pgroonga, packaged for Debian, Ubuntu, CentOS and other exotic systems; you will find full setup instructions on the project's installation page.

The packaged builds of version 1.0.0 currently support only versions 9.3 and 9.4, but the sources have just been tagged 1.0.1 with 9.5 support.

The index is then created using:

~# CREATE INDEX blog_pgroonga_idx ON blog USING pgroonga(d);
~# EXPLAIN ANALYZE SELECT * FROM blog WHERE d like '%ff%';
                                                           QUERY PLAN                                                            
 Bitmap Heap Scan on blog  (cost=27.63..482.51 rows=3030 width=37) (actual time=3.721..5.874 rows=2378 loops=1)
   Recheck Cond: (d ~~ '%ff%'::text)
   Heap Blocks: exact=416
   ->  Bitmap Index Scan on blog_pgroonga_idx  (cost=0.00..26.88 rows=3030 width=0) (actual time=3.604..3.604 rows=2378 loops=1)
         Index Cond: (d ~~ '%ff%'::text)
 Planning time: 0.280 ms
 Execution time: 6.230 ms

The index is used again, with the expected performance gain.

Another solution: pg_bigm, which is dedicated more specifically to bigram indexes. It installs either from RPM packages or directly from source, with clear and detailed instructions on the website. pg_bigm supports every version from 9.1 up to 9.5.

~# CREATE INDEX blog_bigm_idx ON blog USING GIN(d gin_bigm_ops);
~# EXPLAIN ANALYZE SELECT * FROM blog WHERE d like '%ff%';
                                                         QUERY PLAN                                                          
 Bitmap Heap Scan on blog  (cost=35.48..490.36 rows=3030 width=37) (actual time=2.121..5.347 rows=5401 loops=1)
   Recheck Cond: (d ~~ '%ff%'::text)
   Heap Blocks: exact=417
   ->  Bitmap Index Scan on blog_bigm_idx  (cost=0.00..34.73 rows=3030 width=0) (actual time=1.975..1.975 rows=5401 loops=1)
         Index Cond: (d ~~ '%ff%'::text)
 Planning time: 4.406 ms
 Execution time: 6.052 ms

On a 500k-tuple table, index creation takes 6.5 seconds with pg_bigm versus 4.8 with pgroonga; for reads I found no query pattern showing a real difference, although pgroonga claims to be faster than pg_bigm. The former being more recent than the latter, one can expect it to have benefited from the latter's experience.

On the licensing side, both projects are released under the PostgreSQL License.

The real difference between the two projects is that pgroonga is a subproject of the wider Groonga project, which is dedicated to full-text search; there is for example Mgroonga, whose target you will easily guess. pg_bigm, on the other hand, is a standalone project that implements only bigrams in PostgreSQL.

You now have two methods for indexing 2-grams; just take care not to overuse them.

PostgreSQL version 9.4.5 was used for writing this article.

by Rodolphe Quiédeville on Thursday 4 February 2016 at 08:38

Monday 1 February 2016

Guillaume Lelarge

My book is out: "PostgreSQL, architecture et notions avancées"

After practically two years of work, my book has finally been published. To be honest, it is quite astonishing to hold it in my hands: a real book, with a real binding and a real cover, written by yourself. It brings a lot of pride, along with quite a few questions about how it will be received.

That said, without knowing whether the book will be a success in itself, it is already a personal success for me. The challenge was to write a 300-page book on PostgreSQL, the book I would have liked to have in my hands when I started using this DBMS, more than 15 years ago now, at the prompting of my former boss.

The result lives up to my hopes, and the first feedback is very positive. The book provides many explanations of how PostgreSQL works and behaves, so that it is no longer some black box that executes queries. The review written by Jean-Michel Armand in GNU/Linux Magazine France issue 190 is really very interesting. I agree with its author that the beginning is rather steep: you dive straight into the technical side, without much being shown of how it is used in production. That part is only covered afterwards. It is a question I had asked myself while writing, but it is the eternal chicken-and-egg problem... You have to start somewhere: either you explain the technical foundations (which is a bit rough) and then show how they are applied, or you do the opposite. Neither solution is clearly better than the other. The choice I made still seems the right one to me, even now. But indeed, there are two ways to read the book: starting from the beginning, or going straight to the thematic chapters.

I am already ready to get back to work to offer an even better second edition. That new edition could be based on the next major version of PostgreSQL, currently numbered 9.6, which already includes some very exciting new features. But that edition will only really be worthwhile once it takes into account feedback from readers of the first edition, to correct and improve what needs it. So don't hesitate to send me any comments on the book; they will be much appreciated.

by Guillaume Lelarge on Monday 1 February 2016 at 17:57

Wednesday 13 January 2016

Sébastien Lardière

Version 9.5 of PostgreSQL - 3

A new major version of PostgreSQL has been available since 7 January. Each PostgreSQL version adds its share of features, for both the developer and the administrator. This version brings many features aimed at improving performance when querying large data volumes.

This three-part presentation introduces three types of features:

This last post in the series lists some configuration parameters that make their appearance in this new version. Tracking COMMIT timestamps: the track_commit_timestamp parameter marks each COMMIT in the write-ahead logs ("WAL") with the server's date and time. This parameter is a boolean, and... Read more: Version 9.5 de PostgreSQL - 3

by Sébastien Lardière on Wednesday 13 January 2016 at 15:12

Rodolphe Quiédeville

Multi-column GIN and GiST indexes

This post will interest all users of hstore or json columns with PostgreSQL. Although it takes hstore as its example, it applies equally to json or jsonb columns.

Let's start by creating a table and filling it with 100,000 rows of random data. Our example represents articles associated with a language identifier (lang_id) and categorized tags (tags); here each article may be associated with a country, which will be Turkey or Iceland.

~# CREATE TABLE article (id int4, lang_id int4, tags hstore);
~# INSERT INTO article 
SELECT generate_series(1,10e4::int4), cast(random()*20 as int),
CASE WHEN random() > 0.5 
THEN 'country=>Turquie'::hstore 
WHEN random() > 0.8 THEN 'country=>Islande' ELSE NULL END AS x;
INSERT 0 100000

For efficient searches of articles in a given language, we add a B-tree index on the lang_id column and a GIN index on the tags column.

~# CREATE INDEX ON article(lang_id);
~# CREATE INDEX ON article USING GIN (tags);

We now have our data and our indexes; we can start searching. Let's look for every article written in French (we assume the id for French is 17) that is associated with a country (it has a country tag), and analyse the execution plan.

~# EXPLAIN ANALYZE SELECT * FROM article WHERE lang_id=17 AND tags ? 'country';
                                                                QUERY PLAN                                                                
 Bitmap Heap Scan on article  (cost=122.42..141.21 rows=5 width=35) (actual time=12.348..13.912 rows=3018 loops=1)
   Recheck Cond: ((tags ? 'country'::text) AND (lang_id = 17))
   Heap Blocks: exact=663
   ->  BitmapAnd  (cost=122.42..122.42 rows=5 width=0) (actual time=12.168..12.168 rows=0 loops=1)
         ->  Bitmap Index Scan on article_tags_idx  (cost=0.00..12.75 rows=100 width=0) (actual time=11.218..11.218 rows=60051 loops=1)
               Index Cond: (tags ? 'country'::text)
         ->  Bitmap Index Scan on article_lang_id_idx  (cost=0.00..109.42 rows=4950 width=0) (actual time=0.847..0.847 rows=5016 loops=1)
               Index Cond: (lang_id = 17)
 Planning time: 0.150 ms
 Execution time: 14.111 ms
(10 rows)

Logically enough, we get 2 index scans, followed by a combining operation to produce the final result. To gain a little performance, the natural thought is to create a multi-column index containing lang_id and tags, but if you have already tried, you got this error message:

~# CREATE INDEX ON article USING GIN (lang_id, tags);
ERROR:  42704: data type integer has no default operator class for access method "gin"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.
LOCATION:  GetIndexOpClass, indexcmds.c:1246

The HINT gives an interesting lead: GIN indexes cannot be applied to columns of type int4 (among many others).

The solution lies in a standard extension that combines the GIN and B-tree operators, btree_gin; let us note right away that the equivalent btree_gist exists.

Like any extension, it installs as simply as:

~# CREATE EXTENSION btree_gin;

We can now create our multi-column index and replay our query to see the difference.

~# CREATE INDEX ON article USING GIN (lang_id, tags);
~# EXPLAIN ANALYZE SELECT * FROM article WHERE lang_id=17 AND tags ? 'country';
                                                             QUERY PLAN                                                              
 Bitmap Heap Scan on article  (cost=24.05..42.84 rows=5 width=35) (actual time=1.983..3.777 rows=3018 loops=1)
   Recheck Cond: ((lang_id = 17) AND (tags ? 'country'::text))
   Heap Blocks: exact=663
   ->  Bitmap Index Scan on article_lang_id_tags_idx  (cost=0.00..24.05 rows=5 width=0) (actual time=1.875..1.875 rows=3018 loops=1)
         Index Cond: ((lang_id = 17) AND (tags ? 'country'::text))
 Planning time: 0.211 ms
 Execution time: 3.968 ms
(7 rows)

Reading this second explain, the gain is explicit: even with a small data set the estimated cost is divided by 3, and we save one index read and one combining operation. We can now drop the other 2 indexes and keep only this one.
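That cleanup can be done by dropping the two single-column indexes by the names PostgreSQL generated for them in this example:

```sql
DROP INDEX article_lang_id_idx;
DROP INDEX article_tags_idx;
```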

by Rodolphe Quiédeville on Wednesday 13 January 2016 at 14:11

Tuesday 12 January 2016

Sébastien Lardière

Version 9.5 of PostgreSQL - 2

A new major version of PostgreSQL has been available since 7 January. Each PostgreSQL version adds its share of features, for both the developer and the administrator. This version brings many features aimed at improving performance when querying large data volumes.

This three-part presentation introduces three types of features:

This second post therefore covers the engine's new internal methods, that is, the new tools PostgreSQL has for processing data. BRIN indexes: this is a new index type, created to solve access problems on very large data volumes; the use case is a log table, in which the data are... Read more: Version 9.5 de PostgreSQL - 2

by Sébastien Lardière on Tuesday 12 January 2016 at 10:45

Friday 8 January 2016

Sébastien Lardière

Version 9.5 of PostgreSQL

A new major version of PostgreSQL has been available since 7 January. Each PostgreSQL version adds its share of features, for both the developer and the administrator. This version brings many features aimed at improving performance when querying large data volumes.

This three-part presentation introduces three types of features:

This first post therefore covers the SQL language additions in version 9.5 of PostgreSQL. These additions touch many functional areas, from security to performance management by way of transaction handling. UPSERT: this keyword actually refers to the ability to intercept a primary-key error on a statement... Read more: Version 9.5 de PostgreSQL

by Sébastien Lardière on Friday 8 January 2016 at 10:49

Wednesday 30 December 2015

Nicolas Thauvin

Thoughts on archiving WAL files

As PostgreSQL makes its merry way forward, we now end up with configurations where WAL files must be archived several times over, because there are both PITR backups and replication.

The biggest trap when you have PITR and replication, or several standby servers, is forgetting that every element of the architecture that consumes WAL must have its own directory of archived WAL, because each of them purges the files differently.

It is easy to fall into the trap by telling yourself, "no problem purging old files, the PITR or the furthest-behind slave will do the purging". If the PITR purge runs while the standby is disconnected from the master, that purge can break replication.

The solution is to archive the same WAL file several times. To optimise this, the most efficient approach in practice is to use hard links: in short, you archive the WAL file once and create as many hard links as needed for the other consumers of archived WAL. Remember that the data is deleted only once no link remains and no process still holds the file open; this is not to be confused with a symbolic link.
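As a sketch of this hard-link approach (the directory layout and consumer names are illustrative, not pitrery or production code):

```shell
#!/bin/sh
# Sketch: archive a WAL segment once, then hard-link it into one directory
# per consumer so each can purge on its own schedule.
# Everything lives under a throwaway temp directory for illustration.
set -e
ROOT=$(mktemp -d)
mkdir -p "$ROOT/slave1" "$ROOT/slave2" "$ROOT/pitr"

# Simulate the %p / %f pair passed by archive_command
seg=$(mktemp)
printf 'wal data' > "$seg"
f=000000010000000000000001

cp "$seg" "$ROOT/slave1/$f"           # one real copy
for h in slave2 pitr; do
  ln "$ROOT/slave1/$f" "$ROOT/$h/$f"  # hard links: no extra data stored
done

# The segment's data survives until the last consumer removes its link
rm "$ROOT/slave1/$f"
cat "$ROOT/pitr/$f"                   # still readable: prints "wal data"
```

Each consumer then runs its own purge (pg_archivecleanup or PITR retention) against its own directory without affecting the others.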

To archive quickly, it is best to avoid compression and to store the archives either locally or on an NFS share, archiving over SSH being the slowest. It is all a trade-off between encrypting network traffic and available disk space, hard links being fast to create and consuming a negligible amount of extra disk space.

Finally, PostgreSQL runs the archive command via the system() call, which forks a shell: all the shell's facilities are therefore available, for example:

archive_command = 'rsync -a %p slave1:/archived_xlog/slave1/%f && ssh slave1 "for h in slave2 pitr; do ln /archived_xlog/slave1/%f /archived_xlog/$h/%f; done"'

Yes: one loop and a single rsync copy over SSH for 3 uses. You will probably prefer to turn this into a script to make things more readable. It also works for restore_command and archive_cleanup_command in recovery.conf:

restore_command = 'cp /archived_xlog/$(hostname)/%f %p'
archive_cleanup_command = 'pg_archivecleanup /archived_xlog/$(hostname) %r'

Wednesday 30 December 2015 at 21:01

Thursday 19 November 2015

Guillaume Lelarge

Final version of the book

It is not out yet. It is practically finished; we are waiting to have the printed version of the book.

Nevertheless, I can already list what is new compared with beta 0.4:

  • Overall
    • text updated for 9.5
    • security chapter added
    • planner chapter added
    • examples updated with PostgreSQL 9.5 beta 1
  • Files
    • added a diagram of the relationships between tables, the FSM and the VM
    • added descriptions of the pg_dynshmem and pg_logical directories
  • File contents
    • added information on how data is stored, column by column
    • added a diagram of the logical and physical structure of a B-tree index
    • added a description of GIN indexes
    • added a description of GiST indexes
    • added a description of SP-GiST indexes
  • Memory architecture
    • work_mem computation for a sort
    • maintenance_work_mem computation for a VACUUM
  • Transaction management
    • management of locks and concurrent access
  • Maintenance
    • description of VACUUM VERBOSE output

I admit I can't wait to have the final version in my hands :-) Well, yes, it is a year and a half of relentless work, after all!

by Guillaume Lelarge on Thursday 19 November 2015 at 22:36

Wednesday 21 October 2015

Nicolas Thauvin

pitrery 1.10

One week after the release of version 1.9, a colleague found a bug in the WAL restore script, restore_xlog. When SSH is used to store the archived WAL files, the script ignores the parameters from the configuration file that specify the user and the remote host.

As a result, the restore does not work when PostgreSQL starts up, unless you modify the restore_xlog invocation in the restore_command parameter of recovery.conf to pass these two pieces of information, since on the command line they are indeed taken into account.

You can also add to the configuration file:

RESTORE_COMMAND="/path/to/restore_xlog -C <config_file> -h <archive_host> -u <archive_user> %f %p"

Version 1.10 fixes this problem; see the pitrery website.

Wednesday 21 October 2015 at 16:42

Tuesday 22 September 2015

Guillaume Lelarge

Version beta 0.4 of the book

The last beta dated from mid-May. A lot has happened in the four months since. Four new chapters are now available:

  • Backup
  • Replication
  • Statistics
  • Maintenance

But that is obviously not all. Among the important changes:

  • Files, Processes, and Memory chapters
    • Added the disk/process/memory diagrams
  • Physical file contents chapter
    • Moved the information on the contents of the write-ahead log into this chapter
    • Added a description of the contents of a B-tree index
    • Added a description of the contents of a Hash index
    • Added a description of the contents of a BRIN index
    • Restructured the chapter as a whole
  • Process architecture chapter
    • Added subsections to the description of the postmaster and startup processes
    • Added an example of the unexpected death of a PostgreSQL server process
  • Memory architecture chapter
    • Added more details on the buffer cache (shared_buffers)
  • Transaction management chapter
    • Added information on the CLOG, FrozenXid, and Hint Bits
  • Object management chapter
    • Added a section on the security-specific options of views and functions
    • Added a paragraph on the serial pseudo-type
  • Miscellaneous
    • Examples updated with PostgreSQL 9.4.4

In short, it's right here.

As for the next version? It should be the final one. It will include the Security chapter (already written, currently under review) and the chapter on the query planner (being written). It should also be fully updated for version 9.5 (whose beta should be released in early October).

Happy reading, and I'm always interested to hear what you think (via the forum set up by the publisher or via my email address).

by Guillaume Lelarge on Tuesday 22 September 2015 at 21:59