Streaming a csv from postgresql

If you want to build an endpoint to download a csv, that could contain a large number of rows; you want to use streams, so you don’t need to hold all the data in memory before writing it.

If you are already using the pg client, it has a nifty add-on for this purpose:

const { Client } = require('pg');
const QueryStream = require('pg-query-stream');
const csvWriter = require("csv-write-stream");

module.exports = function(connectionString) {
    this.handle = function(req, res) {
        var sql = "SELECT...";
        var args = [...];

        const client = new Client({connectionString});
        client.connect().then(() => {
            var stream = new QueryStream(sql, args);
            stream.on('end', () => {
                client.end();
            });
            var query = client.query(stream);

            var writer = csvWriter();
            res.contentType("text/csv");
            writer.pipe(res);

            query.pipe(writer);
        });
    };
};

If you need to transform the data, you can add another step:

...

const transform = require('stream-transform');

            ...

            var query = client.query(stream);

            var transformer = transform(r => ({
                "User ID": r.user_id,
                "Created": r.created.toISOString(),
                ...
            }));

            ...

            query.pipe(transformer).pipe(writer);

Grokking postgresql logs with logstash

Logstash provides a grok pattern for postgresql logs. Unfortunately, it doesn’t seem to be compatible with our postgres version (9.4), and our messages were all tagged with “_grokparsefailure”.

Using the fantastic grok debugger, I was able to produce something that worked:

%{DATESTAMP:timestamp} %{TZ} %{DATA:user_id} %{GREEDYDATA:connection_id} %{DATA:level}:  %{GREEDYDATA:msg}

I’ve created an issue here, to track it.

Connecting to postgres using domain sockets and libpq

We recently started using Goose to run our migrations, and I wanted to connect to the Postgres instance on a local machine using a domain socket. The documentation states:

host
Name of host to connect to. If this begins with a slash, it specifies Unix-domain communication rather than TCP/IP communication; the value is the name of the directory in which the socket file is stored. The default behavior when host is not specified is to connect to a Unix-domain socket in /tmp (or whatever socket directory was specified when PostgreSQL was built). On machines without Unix-domain sockets, the default is to connect to localhost.

So I assumed that specifying a connection string without a host would “just work”:

db, err := sql.Open("postgres", "dbname=foo")

This doesn’t return an error, but any attempt to use the connection resulted in a “bad connection” failure. For some reason I needed to specify the path to the socket (which wasn’t in /tmp on the Debian instance I was using!):

db, err := sql.Open("postgres", "host=/run/postgresql dbname=foo sslmode=disable")