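What's the fastest way to do a bulk insert into Postgres?

I need to programmatically insert tens of millions of records into a Postgres database. Presently, I'm executing thousands of insert statements in a single query.

Is there a better way to do this, some bulk insert statement I don't know about?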

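Current answer

The query below creates a test table with a generate_series column holding 10000 rows. I usually create test tables like this to test query performance; see the documentation for generate_series():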

CREATE TABLE test AS SELECT generate_series(1, 10000);

postgres=# SELECT count(*) FROM test;
 count
-------
 10000
(1 row)

postgres=# SELECT * FROM test;
 generate_series
-----------------
               1
               2
               3
               4
               5
               6
-- More --

And if you already have the test table, run the query below to insert 10000 rows:

INSERT INTO test (generate_series) SELECT generate_series(1, 10000);

2022-12-17 11:18:32

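Other answers

You can use COPY table TO … WITH BINARY, which is "somewhat faster than the text and CSV formats." Only do this if you have millions of rows to insert, and if you are comfortable with binary data.

Here is an example recipe in Python, using psycopg2 with binary input.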
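The linked recipe is not reproduced here. As a rough illustration of what "binary input" involves, here is a minimal sketch that builds a PGCOPY binary stream for a two-column table. The function name encode_binary_copy and the column layout (one int4, one text) are assumptions made for this example, and it assumes a UTF-8 client encoding.

```python
import io
import struct

# Fixed 11-byte signature that opens every PGCOPY binary stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_binary_copy(rows):
    """Encode rows of (int, str-or-None) as PGCOPY binary data.

    Hypothetical helper: assumes a target table with an int4 column
    followed by a text column, and a UTF-8 client encoding.
    """
    buf = io.BytesIO()
    buf.write(PGCOPY_SIGNATURE)
    # 32-bit flags field (0) and 32-bit header extension length (0).
    buf.write(struct.pack("!ii", 0, 0))
    for number, text in rows:
        buf.write(struct.pack("!h", 2))            # number of fields in this tuple
        buf.write(struct.pack("!ii", 4, number))   # int4: byte length 4, then the value
        if text is None:
            buf.write(struct.pack("!i", -1))       # length -1 marks NULL, no data follows
        else:
            data = text.encode("utf-8")
            buf.write(struct.pack("!i", len(data)))
            buf.write(data)
    buf.write(struct.pack("!h", -1))               # file trailer
    return buf.getvalue()
```

With psycopg2, a payload like this could be wrapped in io.BytesIO and passed to cursor.copy_expert() together with a COPY … FROM STDIN statement in binary format. Every column type has its own binary representation, so the text format is usually less error-prone unless the speed difference matters.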

2011-11-17 09:33:08

There is an alternative to using COPY: the multi-row VALUES syntax that Postgres supports. From the documentation:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

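The above code inserts two rows, but you can extend it arbitrarily until you hit the maximum number of prepared statement tokens (it might be $999, but I'm not 100% sure about that). Sometimes one cannot use COPY, and this is a worthy replacement for those situations.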
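When the rows come from a program, the multi-row statement is usually generated rather than typed out. Below is a minimal sketch of that idea. The helper names chunked and build_multirow_insert are made up for this example, the %s placeholder style is the one psycopg2 uses, and the batch size is left to the caller because the answer above is itself unsure about the parameter limit.

```python
def chunked(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for one multi-row INSERT with %s placeholders.

    Hypothetical helper: table and column names are interpolated as-is,
    so they must come from trusted code, never from user input.
    """
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([one_row] * len(rows)),
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

Each batch from chunked() can then be sent with something like cursor.execute(sql, params), ideally with all batches inside a single transaction.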

2015-01-27 10:18:29

As others have noted, when importing data into Postgres, things will be slowed down by the checks that Postgres is designed to do for you. Also, you often need to manipulate the data in one way or another so that it is suitable for use. Anything that can be done outside of the Postgres process means you can import using the COPY protocol.

For my use, I regularly import data from the httparchive.org project using pgloader. As the source files are created by MySQL, you need to be able to handle some MySQL oddities, such as the use of \N for an empty value, along with encoding problems. The files are also so large that, at least on my machine, using FDW runs out of memory. pgloader makes it easy to create a pipeline that lets you select the fields you want, cast them to the relevant data types, and do any additional work before the data goes into your main database, so that index updates, etc. are minimal.
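This is not how pgloader works internally. It is only a small sketch of the general idea of preparing data outside Postgres so that it can be streamed in through COPY. The helper names copy_escape and rows_to_copy_text are made up for this example. It writes rows in COPY's default text format, where columns are separated by tabs and NULL is written as \N.

```python
import io


def copy_escape(value):
    """Render one value for COPY's text format (hypothetical helper)."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    text = str(value)
    # Backslash must be escaped first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_text(rows):
    """Return a file-like object holding rows as tab-separated COPY text."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(value) for value in row))
        buf.write("\n")
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf
```

The resulting buffer is the kind of object that psycopg2's cursor.copy_from() or cursor.copy_expert() accepts. Any field selection or type casting would happen while building the rows, before they reach the database.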

2022-01-05 12:14:22

Maybe I'm late already, but there is a Java library called pgbulkinsert by Bytefish. My team and I were able to bulk insert 1 million records in 15 seconds. Of course, we performed some other operations as well: reading 1M+ records from a file sitting on Minio, doing a couple of processing steps on those records, filtering out duplicates, and then finally inserting 1M records into the Postgres database. All of these steps completed within 15 seconds. I don't remember exactly how long the DB operation took, but I think it was less than 5 seconds. Find more details at https://www.bytefish.de/blog/pgbulkinsert_bulkprocessor.html

2021-07-29 17:23:02

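PostgreSQL has a guide on how to best populate a database initially, and it suggests using the COPY command for bulk loading rows. The guide also has some other good tips on how to speed up the process, like removing indexes and foreign keys before loading the data (and adding them back afterwards).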
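The order of operations described above can be sketched as a list of statements to run in sequence. The helper name bulk_load_plan, the index name, and the file path are placeholders invented for this example, and the COPY form shown reads a file on the database server, which requires the appropriate privileges.

```python
def bulk_load_plan(table, indexes, csv_path):
    """Return SQL statements for a drop-load-recreate bulk load.

    Hypothetical helper: indexes maps an index name to the CREATE INDEX
    statement that rebuilds it. Identifiers and the path are interpolated
    as-is and must come from trusted code.
    """
    plan = []
    # 1. Drop the indexes so they are not maintained row by row during the load.
    for name in indexes:
        plan.append("DROP INDEX IF EXISTS {}".format(name))
    # 2. Load everything with a single COPY.
    plan.append("COPY {} FROM '{}' WITH (FORMAT csv)".format(table, csv_path))
    # 3. Rebuild each index once over the full table.
    plan.extend(indexes.values())
    # 4. The same guide also recommends running ANALYZE afterwards.
    plan.append("ANALYZE {}".format(table))
    return plan
```

Foreign key constraints would follow the same pattern: ALTER TABLE … DROP CONSTRAINT before the load and ALTER TABLE … ADD CONSTRAINT after it.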

2009-04-17 03:57:23
