couch-to-postgres: a Node library to stream CouchDB changes into PostgreSQL

Node library to stream CouchDB changes into PostgreSQL
  • Source: http://www.github.com/sysadminmike/couch-to-postgres
  • Git URL: git://www.github.com/sysadminmike/couch-to-postgres.git
  • Clone: git clone http://www.github.com/sysadminmike/couch-to-postgres
    couch-to-postgres/pgcouch/couchpg/couchgres/postcouch

    A Node library to stream CouchDB changes into PostgreSQL, with a simple client example. Based on https://github.com/orchestrate-io/orchestrate-couchdb.

    By adding a few extra bits it allows not only SELECT queries on the data, but also UPDATE/INSERT/DELETE on your couchdb docs from within postgres. It is also possible to use your couch views as tables.

    Basically it allows postgres to use couchdb as its data store, a bit like a foreign data wrapper (e.g. couchdb_fdw), except postgres keeps a near-real-time copy of the records.
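    The flow of the simple client can be sketched as: follow the couch `_changes` feed and turn each row into a write against the mirror table. A minimal JavaScript sketch of that transformation (the function name and the `ON CONFLICT` upsert are illustrative assumptions, not the library's actual code; `ON CONFLICT` needs postgres 9.5+, while this article runs against 9.4):

```javascript
// Turn one row from CouchDB's _changes feed into a parameterised
// statement for the mirror table: deletions remove the row, anything
// else upserts the latest revision of the doc.
function changeToUpsert(tableName, change) {
  if (change.deleted) {
    return { text: `DELETE FROM ${tableName} WHERE id = $1`, values: [change.id] };
  }
  return {
    text:
      `INSERT INTO ${tableName} (id, doc) VALUES ($1, $2) ` +
      `ON CONFLICT (id) DO UPDATE SET doc = EXCLUDED.doc`,
    values: [change.id, JSON.stringify(change.doc)],
  };
}

const q = changeToUpsert('example', { id: '1234567', doc: { _id: '1234567', myvar: 'foo' } });
// q.values → [ '1234567', '{"_id":"1234567","myvar":"foo"}' ]
```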

    For example:

    Add a doc in couch:

    
      curl -X PUT http://192.168.3.21:5984/example/1234567 -d '{"myvar":"foo"}'
      {"ok":true,"id":"1234567","rev":"1-d3747a58baa817834a21ceeaf3084c41"}

    Look at it in postgres:

    
      postgresdb=> SELECT id, doc FROM example WHERE id='1234567';
         id    |                                     doc
      ---------+----------------------------------------------------------------------------------
       1234567 | {"_id":"1234567","_rev":"1-d3747a58baa817834a21ceeaf3084c41","myvar":"foo"}
      (1 row)

    Update the doc using postgres:

    
      postgresdb=> UPDATE example
      postgresdb-> SET doc=json_object_set_key(doc::json, 'myvar'::text, 'bar'::text)::jsonb, from_pg=true
      postgresdb-> WHERE id='1234567';
      DEBUG:  pgsql-http: queried http://192.168.3.21:5984/example/1234567
      CONTEXT:  SQL statement "SELECT headers FROM http_post('http://192.168.3.21:5984/' || TG_TABLE_NAME || '/' || NEW.id::text, '', NEW.doc::text, 'application/json'::text)"
      PL/pgSQL function couchdb_put() line 9 at SQL statement
      UPDATE 0

    The couchdb_put function needs a little more work.

    Look at it in couch:

    
      curl -X GET http://192.168.3.21:5984/example/1234567
      {"_id":"1234567","_rev":"2-b9f4c54fc36bdeb78c31590920c9751b","myvar":"bar"}

    And in postgres:

    
      postgresdb=> SELECT id, doc FROM example WHERE id='1234567';
         id    |                                     doc
      ---------+----------------------------------------------------------------------------------
       1234567 | {"_id":"1234567","_rev":"2-b9f4c54fc36bdeb78c31590920c9751b","myvar":"bar"}
      (1 row)

    Add a doc using postgres:

    
     postgresdb=> INSERT INTO example (id, doc, from_pg) VALUES ('7654321', json_object('{_id,myvar}','{7654321, 100}')::jsonb, true);
    
    
     DEBUG: pgsql-http: queried http://192.168.3.21:5984/example/7654321
    
    
     CONTEXT: SQL statement"SELECT headers FROM http_post('http://192.168.3.21:5984/' || TG_TABLE_NAME || '/' || NEW.id::text, '', NEW.doc::text, 'application/json'::text)"
    
    
     PL/pgSQL function couchdb_put() line 9 at SQL statement
    
    
     INSERT 0 0
    
    
    
    

    Look at it in couch:

    
      curl -X GET http://192.168.3.21:5984/example/7654321
      {"_id":"7654321","_rev":"1-08343cb32bb0903348c0903e574cfbd0","myvar":"100"}

    Update the doc created by postgres using couch:

    
      curl -X PUT http://192.168.3.21:5984/example/7654321 -d '{"_id":"7654321","_rev":"1-08343cb32bb0903348c0903e574cfbd0","myvar":"50"}'
      {"ok":true,"id":"7654321","rev":"2-5057c4942c6b92f8a9e2c3e5a75fd0b9"}

    Look at it in postgres:

    
      SELECT id, doc FROM example WHERE id='1234567';
         id    |                                     doc
      ---------+----------------------------------------------------------------------------------
       1234567 | {"_id":"1234567","_rev":"2-b9f4c54fc36bdeb78c31590920c9751b","myvar":"bar"}
      (1 row)

    Add some more docs:

    
      INSERT INTO example (id, doc, from_pg) VALUES ('test1', json_object('{_id,myvar}','{test1, 100}')::jsonb, true);
      INSERT INTO example (id, doc, from_pg) VALUES ('test2', json_object('{_id,myvar}','{test2, 50}')::jsonb, true);

    Or:

    
      curl -X PUT http://192.168.3.21:5984/example/test3 -d '{"_id":"test3","myvar":"100"}'
      curl -X PUT http://192.168.3.21:5984/example/test4 -d '{"_id":"test4","myvar":"50"}'
      curl -X PUT http://192.168.3.21:5984/example/test5 -d '{"_id":"test5","myvar":"70"}'
      curl -X PUT http://192.168.3.21:5984/example/test6 -d '{"_id":"test6","myvar":"20"}'
      curl -X PUT http://192.168.3.21:5984/example/test7 -d '{"_id":"test7","myvar":"10"}'

    Perform a query on the docs:

    
      SELECT id, doc->'myvar' AS myvar FROM example
      WHERE id LIKE 'test%' AND CAST(doc->>'myvar' AS numeric) > 50
      ORDER BY myvar

        id   | myvar
      -------+-------
       test3 | "100"
       test1 | "100"
       test5 | "70"
      (3 rows)

    Update some of the docs:

    
      UPDATE example
      SET doc=json_object_set_key(
            doc::json, 'myvar'::text, (CAST(doc->>'myvar'::text AS numeric) + 50)::text
          )::jsonb,
          from_pg=true
      WHERE id LIKE 'test%' AND CAST(doc->>'myvar' AS numeric) < 60

    Perform the same query again:

    
      SELECT id, doc->'myvar' AS myvar FROM example
      WHERE id LIKE 'test%' AND CAST(doc->>'myvar' AS numeric) > 50
      ORDER BY myvar

        id   | myvar
      -------+-------
       test4 | "100"
       test2 | "100"
       test3 | "100"
       test1 | "100"
       test7 | "60"
       test5 | "70"
       test6 | "70"
      (7 rows)

    At first I didn't spot that the ordering above is wrong (myvar is a JSON string, so it sorts lexically rather than numerically), so you need to be careful:

    
      SELECT id, CAST(doc->>'myvar' AS numeric) as myvar FROM example
      WHERE id LIKE 'test%' AND CAST(doc->>'myvar' AS numeric) > 50
      ORDER BY myvar, doc->>'_id'

        id   | myvar
      -------+-------
       test7 |  "60"
       test5 |  "70"
       test6 |  "70"
       test1 | "100"
       test2 | "100"
       test3 | "100"
       test4 | "100"
      (7 rows)

    The ordering is now correct.
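    The gotcha: `doc->'myvar'` is a JSON string, so ORDER BY compares it as text, where '100' sorts before '60'. The same effect in plain JavaScript:

```javascript
// String sort vs numeric sort of the same myvar values.
const values = ['100', '60', '70', '100'];

const lexical = [...values].sort();                             // compares as text
const numeric = [...values].sort((a, b) => Number(a) - Number(b));

// lexical → [ '100', '100', '60', '70' ]  (because '1' < '6' < '7')
// numeric → [ '60', '70', '100', '100' ]
```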

    And finally in couch:

    
      curl -s -X POST '192.168.3.21:5984/example/_temp_view?include_docs=false' -H 'Content-Type: application/json' \
       -d '{"map":"function(doc) { emit(doc._id, doc.myvar) };"}'
      {"total_rows":7,"offset":0,"rows":[
      {"id":"test1","key":"test1","value":"100"},
      {"id":"test2","key":"test2","value":"100"},
      {"id":"test3","key":"test3","value":"100"},
      {"id":"test4","key":"test4","value":"100"},
      {"id":"test5","key":"test5","value":"70"},
      {"id":"test6","key":"test6","value":"70"},
      {"id":"test7","key":"test7","value":"60"}
      ]}

    You can also use couchdb views as tables:

    The couch design doc:

    
      {
        "_id": "_design/mw_views",
        "language": "javascript",
        "views": {
          "by_feedName": {
            "map": "function(doc) { emit(doc.feedName,null); }",
            "reduce": "_count"
          },
          "by_tags": {
            "map": "function(doc) { for(var i in doc.tags) { emit (doc.tags[i],null); } }",
            "reduce": "_count"
          }
        }
      }

      WITH by_feedname_reduced AS (
        SELECT * FROM json_to_recordset(
          (SELECT (content::json->>'rows')::json
           FROM http_get('http://192.168.3.23:5984/articles/_design/mw_views/_view/by_feedName?group=true'))
        ) AS x (key text, value text)
      )
      SELECT * FROM by_feedname_reduced WHERE value::numeric > 6000 ORDER BY key

    This takes about a second to run, though the initial build of the view on a fresh copy of the couchdb took around 20 minutes.

    The equivalent query using the data in postgres:

    
      WITH tbl AS (
        SELECT DISTINCT doc->>'feedName' as key, COUNT(doc->>'feedName') AS value
        FROM articles
        GROUP BY doc->>'feedName'
      )
      SELECT key, value FROM tbl WHERE value > 6000 ORDER BY key;

    This takes just over 4 seconds.

    Testing against the articles database from birdreader (https://github.com/glynnbird/birdreader):

    
      curl -X GET http://localhost:5984/articles
      {"db_name":"articles","doc_count":63759,"doc_del_count":2,"update_seq":92467,"purge_seq":0,"compact_running":false,"disk_size":151752824,"data_size":121586165,"instance_start_time":"1418686121041424","disk_format_version":6,"committed_update_seq":92467}

      SELECT DISTINCT jsonb_object_keys(doc) AS myfields
      FROM articles ORDER BY myfields

    This queries every doc and retrieves the fields used across the couch documents.
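    In JavaScript terms that query is doing the equivalent of collecting the distinct top-level field names across every doc (a sketch of the semantics only, not how postgres executes it):

```javascript
// Distinct, sorted top-level field names across a set of docs,
// mirroring SELECT DISTINCT jsonb_object_keys(doc) ... ORDER BY myfields.
function distinctKeys(docs) {
  const keys = new Set();
  for (const doc of docs) {
    for (const k of Object.keys(doc)) keys.add(k);
  }
  return [...keys].sort();
}

distinctKeys([
  { _id: '1', _rev: '1-a', myvar: 'foo' },
  { _id: '2', _rev: '1-b', tags: ['x'] },
]);
// → [ '_id', '_rev', 'myvar', 'tags' ]
```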

    Against another couch database which uses a 'type' field to store different document types in the same database (about 70k docs):

    
      SELECT DISTINCT doc->>'type' as doctype, count(doc->>'type')
      FROM mytable GROUP BY doctype ORDER BY doctype

    This runs in under a second.

    
      SELECT DISTINCT doc->>'type' as doctype, jsonb_object_keys(doc) AS myfields
      FROM mytable
      ORDER BY doctype, myfields;

    The query above takes only about 10 seconds with no indexes. I have done no indexing or tuning at all, just the default FreeBSD postgresql94-server-9.4.r1 port.

    Example setup of the simple client and postgres configuration.

    
      git clone git@github.com:sysadminmike/couch-to-postgres.git

    Get the required modules:

     
      npm install

    Edit ./bin/index.js to configure your settings:

    
      var settings =
        {
          couchdb: {
            url: 'http://192.168.3.21:5984',
            pgtable: 'example',
            database: 'example'
          }
        };

      pgclient = new pg.Client("postgres://mike@localhost/pgdatabase");

    Before starting it for the first time, create the since_checkpoints table:

    
      CREATE TABLE since_checkpoints
      (
        pgtable text NOT NULL,
        since numeric DEFAULT 0,
        enabled boolean DEFAULT false, -- not used in the simple client example
        CONSTRAINT since_checkpoint_pkey PRIMARY KEY (pgtable)
      )

    This table is used to store the checkpoint of the data synced, performing a similar job to the couchdb _replicator database.

    Create a table to store the couch docs:

    
      CREATE TABLE example
      (
        id text NOT NULL,
        doc jsonb,
        CONSTRAINT example_pkey PRIMARY KEY (id)
      )

    Start watching for changes:

     
      ./bin/index.js

    It will add a record to the since_checkpoints table and begin syncing.

    At this point you can perform SELECT queries against the docs as in the examples above. This should be fine to use against a production couchdb, as it makes no changes to it and performs a similar task to the elasticsearch river plugin. With a bit of copy/paste you can then make simple scripts with sql, or run one-liners in a shell using curl.

    Also take a look at ./bin/daemon.js and https://github.com/sysadminmike/couch-to-postgres/blob/master/daemon-README.md.

    To handle UPDATE/INSERT/DELETE some more setup is needed. Note this is still experimental, so I wouldn't point it at any production data yet.

    First install the pgsql-http postgres extension (https://github.com/pramsey/pgsql-http) on your postgres server so that it can query http.

    If you don't know how to install postgres extensions, see the note at the end of this page. Be aware the latest version of the extension has an updated http_post signature; I haven't tested anything to do with DELETE yet, but that will need the newer version:

    
    http_post(uri VARCHAR, content VARCHAR, content_type VARCHAR)
    
    
    
    

    However, on this page I am using the older:

    
    http_post(url VARCHAR, params VARCHAR, data VARCHAR, contenttype VARCHAR DEFAULT NULL)
    
    
    
    

    So bear that in mind if you are setting this up.

    Then add it to the database you want to use it in:

    
    CREATE EXTENSION http
    
    
    
    

    If you haven't already done so, create the since_checkpoints table:

    
      CREATE TABLE since_checkpoints
      (
        pgtable text NOT NULL,
        since numeric DEFAULT 0,
        enabled boolean DEFAULT false,
        CONSTRAINT since_checkpoint_pkey PRIMARY KEY (pgtable)
      );

    Add a function to put data into couchdb:

    
      CREATE OR REPLACE FUNCTION couchdb_put() RETURNS trigger AS $BODY$
      DECLARE
        RES RECORD;
      BEGIN
        IF (NEW.from_pg) IS NULL THEN
          RETURN NEW;
        ELSE
          SELECT status FROM http_post('http://192.168.3.21:5984/' || TG_TABLE_NAME || '/' || NEW.id::text, '', NEW.doc::text, 'application/json'::text) INTO RES;
          -- Need to check RES for response code
          -- RAISE EXCEPTION 'Result: %', RES;
          RETURN null;
        END IF;
      END;
      $BODY$
      LANGUAGE plpgsql VOLATILE

    Add a function to modify fields inside the PostgreSQL JSON datatype, from: http://stackoverflow.com/questions/18209625/how-do-i-modify-fields-inside-the-new-postgresql-json-datatype

    
      CREATE OR REPLACE FUNCTION json_object_set_key(json json, key_to_set text, value_to_set anyelement)
        RETURNS json AS
      $BODY$
      SELECT COALESCE(
        (SELECT ('{' || string_agg(to_json("key") || ':' || "value", ',') || '}')
         FROM (SELECT *
               FROM json_each("json")
               WHERE "key" <> "key_to_set"
               UNION ALL
               SELECT "key_to_set", to_json("value_to_set")) AS "fields"),
        '{}'
      )::json
      $BODY$
        LANGUAGE sql IMMUTABLE STRICT;

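    For comparison, the same set-one-key operation is a one-liner in JavaScript; this is all the SQL function is emulating (returning a copy of the doc with one field replaced, without mutating the original):

```javascript
// JS equivalent of json_object_set_key: copy the doc, set one key.
function jsonObjectSetKey(doc, keyToSet, valueToSet) {
  return { ...doc, [keyToSet]: valueToSet };
}

const doc = { _id: '1234567', myvar: 'foo' };
const updated = jsonObjectSetKey(doc, 'myvar', 'bar');
// updated → { _id: '1234567', myvar: 'bar' }, doc itself is unchanged
```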
    Create the table to hold the docs:

    
      CREATE TABLE example
      (
        id text NOT NULL,
        doc jsonb,
        from_pg boolean, -- for the trigger; nothing is stored here
        CONSTRAINT example_pkey PRIMARY KEY (id)
      );

    Create a trigger to stop data being inserted directly into the table and send it to couch instead:

    
      CREATE TRIGGER add_doc_to_couch
      BEFORE INSERT OR UPDATE
      ON example FOR EACH ROW EXECUTE PROCEDURE couchdb_put();

    Note: all INSERT and UPDATE queries in postgres must set "from_pg=true", otherwise postgres will put the data into the table itself rather than sending it to couch.

    I intend to reverse this logic and have the library add the field itself, so you can issue inserts/updates without it.
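    The trigger's routing rule boils down to: no from_pg means the row arrived from the changes feed and should be stored; from_pg=true means the write originated in postgres and is forwarded to couch instead (couch then streams it back). A sketch of that branch, with a hypothetical helper name:

```javascript
// Mirrors the IF in couchdb_put(): rows from the change feed are
// stored locally; user writes made in postgres are posted to couch
// and not stored (the change feed will bring them back).
function routeWrite(row) {
  if (row.from_pg == null) return 'store-in-postgres';
  return 'post-to-couchdb';
}

routeWrite({ id: '1', doc: {} });                 // 'store-in-postgres'
routeWrite({ id: '1', doc: {}, from_pg: true });  // 'post-to-couchdb'
```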

    You can now start the node client and give it a test.

    There are a couple of variables in ./lib/index.js that may need adjusting; they need to be moved into the config options.

    In the checkpoint_changes function:

    
      ckwait = 3 * 1000;

    This is how often the stream is checkpointed while it is active. Adjust it depending on how busy your couchdb is. When the stream is idle it increases to 10 seconds.
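    The interval logic amounts to: checkpoint every 3 seconds while changes are flowing, back off to 10 seconds when idle. A sketch (the helper name is made up; the constants are the values from the text):

```javascript
const ACTIVE_MS = 3 * 1000;   // ckwait while the stream is active
const IDLE_MS = 10 * 1000;    // ckwait once the checkpoint is current

// Pick the delay before the next checkpoint based on whether the
// last check saw the checkpoint move.
function nextCheckpointDelay(checkpointMoved) {
  return checkpointMoved ? ACTIVE_MS : IDLE_MS;
}

nextCheckpointDelay(true);   // 3000
nextCheckpointDelay(false);  // 10000
```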

    In the startFollowing function there is:

        // The inactivity timer is the time between changes, or between the
        // initial connection and the first change.
        stream.inactivity_ms = 30000;

    It might be possible to use notifications instead, with the node client listening for a message the first time couchdb_put() is called (can a timer be done in pgsql?). Or node could be notified of every update and only need to wake up after an idle period.

    Performance-wise, compared to the php dump script:

    Testing against a couchdb with about 150MB of docs, it takes about 2 minutes to add all the docs to an empty postgres table and then keeps it in sync.

    For an initial sync or resync, the couch-to-postgres-php-dumper script (https://github.com/sysadminmike/couch-to-postgres-php-dump) takes about 28 minutes.

    Replicating the same database from one couch jail to another on the same machine takes about 8 minutes:

    
    {"session_id":"661411f2137c64efc940f55b802dc35b","start_time":"Tue, 16 Dec 2014 17:00:05 GMT","end_time":"Tue, 16 Dec 2014 17:08:10 GMT","start_last_seq":0,"end_last_seq":92862,"recorded_seq":92862,"missing_checked":63840,"missing_found":63840,"docs_read":63840,"docs_written":63840,"doc_write_failures":0}
    
    
    
    

    Looking at top, in most cases postgres is waiting on disk rather than being a cpu-bound process, whether it's a single php process hitting couch via curl or several databases being handled at once.

    Some further testing with dodgy postgres conf settings:

    
      fsync = off
      synchronous_commit = off

    Probably acceptable here, considering postgres is not the master data store and a full rebuild of the data takes under 2 minutes:

    
      mike:~/postgres-couch/couch-to-postgres-test % time ./bin/index.js
      articlespg: {"db_name":"articlespg","doc_count":63838,"doc_del_count":2,"update_seq":63840,"purge_seq":0,"compact_running":false,"disk_size":242487416,"data_size":205414817,"instance_start_time":"1418749205916149","disk_format_version":6,"committed_update_seq":63840}
      Connected to postgres
      articlespg: initial since=0
      articlespg: Starting checkpointer
      articlespg: Checkpoint set to 7180 next check in 3 seconds
      articlespg: Checkpoint set to 9344 next check in 3 seconds
      articlespg: Checkpoint set to 11536 next check in 3 seconds
      ...
      articlespg: Checkpoint set to 60920 next check in 3 seconds
      articlespg: Checkpoint set to 63636 next check in 3 seconds
      articlespg: Checkpoint set to 63840 next check in 3 seconds
      articlespg: Checkpoint 63840 is current next check in: 10 seconds
      ^C45.919u 3.226s 1:42.10 48.1% 10864+321k 158+0io 0pf+0w
      mike:~/postgres-couch/couch-to-postgres-test %

    So under 2 minutes for the initial sync of the same test database, about 4 times quicker than couch-to-couch replication. I suspect it's also faster than the elasticsearch river doing a similar task.

    A snippet from top while syncing:

    
        PID USERNAME   THR PRI NICE   SIZE     RES STATE   C   TIME   WCPU COMMAND
      57635 mike         6  45    0  621M  66064K uwait   1   0:25 50.78% node
      57636 70           1  36    0  186M  97816K sbwait  1   0:11 22.75% postgres
      44831 919         11  24    0  181M  30048K uwait   0  67:28 20.51% beam.smp
      23891 919         11  20    0  232M  69168K uwait   0  26:22  0.39% beam.smp
      57624 70           1  20    0  180M  17840K select  0   0:00  0.29% postgres
      57622 70           1  21    0  180M  65556K select  1   0:00  0.20% postgres

    A possible way to set up multi-master, using couchdbs as the master data store and couch replication between all locations:

    
       Location 1                         Location 2
       Postgres == CouchDB ---------- CouchDB == Postgres
                      \                 /
                       \               /
                        \             /
                         Location 3
                          CouchDB
                            ||
                          Postgres

       Where == is the node client keeping the paired postgres up to date
       And ---- is couchdb performing replication

    Addendum, following a popular comment/suggestion.

    How to do a bulk update:

    
      WITH new_docs AS (
        SELECT json_object_set_key(doc::json, 'test'::text, 'Couch & Postgres are scool'::text)::jsonb AS docs
        FROM articlespg
      ),
      agg_docs AS (
        SELECT json_agg(docs) AS aggjson FROM new_docs
      )
      SELECT headers FROM
        http_post('http://192.168.3.21:5984/articlespg/_bulk_docs', '',
          '{"all_or_nothing":true,"docs":' || (SELECT * FROM agg_docs) || '}',
          'application/json'::text);

    I tried this against the articles test database I've been using and it fell over; it only worked when updating fewer than about 100 rows at a time:

    
      DEBUG:  pgsql-http: queried http://192.168.3.21:5984/articlespg/_bulk_docs
      ERROR:  Failed to connect to 192.168.3.21 port 5984: Connection refused
      couchplay=>

    However, if we split the request into smaller chunks:

    
      WITH newdocs AS ( -- Make the change to the json here
        SELECT id, json_object_set_key(doc::json, 'test'::text, 'Couch & Postgres are scool'::text)::jsonb AS docs
        FROM articlespg
      ),
      chunked AS ( -- Get in chunks
        SELECT docs, ((ROW_NUMBER() OVER (ORDER BY id) - 1) / 50) + 1 AS chunk_no
        FROM newdocs
      ),
      chunked_newdocs AS ( -- Bulk up bulk_docs chunks to send
        SELECT json_agg(docs) AS bulk_docs, chunk_no FROM chunked GROUP BY chunk_no ORDER BY chunk_no
      )
      SELECT chunk_no, status FROM chunked_newdocs,
        http_post('http://192.168.3.21:5984/articlespg/_bulk_docs', '',
          '{"all_or_nothing":true,"docs":' || (bulk_docs) || '}',
          'application/json'::text);

    I'm not sure what a safe chunk size is; in this case 50 docs at a time seemed fine, and bigger chunks began to fall over.
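    The chunk numbering in the SQL, ((ROW_NUMBER() - 1) / 50) + 1, is plain integer division; the same batching in JavaScript (a sketch, with each chunk becoming one _bulk_docs POST body):

```javascript
// Split docs into numbered chunks of at most `size` docs each,
// matching ((ROW_NUMBER() OVER (ORDER BY id) - 1) / size) + 1.
function chunkDocs(docs, size) {
  const chunks = [];
  for (let i = 0; i < docs.length; i += size) {
    chunks.push({ chunk_no: i / size + 1, docs: docs.slice(i, i + size) });
  }
  return chunks;
}

// One _bulk_docs request body per chunk:
const bulkBody = (chunk) => JSON.stringify({ all_or_nothing: true, docs: chunk.docs });

chunkDocs(['a', 'b', 'c'], 2);
// → [ { chunk_no: 1, docs: [ 'a', 'b' ] }, { chunk_no: 2, docs: [ 'c' ] } ]
```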

    While the chunked bulk update is running I can see the changes streaming back into postgres almost immediately, so I think this is better to use than a plain UPDATE.

    I think it would be better to wrap all of this POSTing up into a function, something like:

    
     post_docs(docs,chunk_size) - returning recordset of status codes? or just true/false?
    
    
    
    

    How to handle the case where there are 5 chunks and the first 2 succeed but the 3rd fails? A transaction can be rolled back; if the function were given both the old docs and the new docs, then when one post_docs chunk fails it could roll back the chunks that succeeded.

    To be used something like:

    
      SELECT post_docs(json_object_set_key(doc::json, 'test'::text, 'Couch & Postgres are scool'::text)::jsonb, 100)
        AS results
      FROM articlespg

    This would also make it very simple to populate a new database: just add a new database in couch and change the url to point to it:

    
      chunked AS (
        SELECT docs, ((ROW_NUMBER() OVER (ORDER BY id) - 1) / 500) + 1 AS chunk_no
        FROM articlespg
      ),
      chunked_newdocs AS (
        SELECT json_agg(docs) AS bulk_docs, chunk_no FROM chunked GROUP BY chunk_no ORDER BY chunk_no
      )
      SELECT chunk_no, status FROM chunked_newdocs,
        http_post('http://192.168.3.21:5984/NEW_articlespg_COPY/_bulk_docs', '',
          '{"all_or_nothing":true,"docs":' || (bulk_docs) || '}', 'application/json'::text);

    I would guess this may be quicker than couch replication.

    Note on installing the pgsql http module:

    https://wiki.postgresql.org/wiki/Building_and_Installing_PostgreSQL_Extension_Modules

    For FreeBSD you need curl and the gettext tools installed.

    
      # gmake PG_CONFIG=/usr/local/bin/pg_config
      cc -O2 -pipe -fstack-protector -fno-strict-aliasing -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fPIC -DPIC -I. -I./-I/usr/local/include/postgresql/server -I/usr/local/include/postgresql/internal -I/usr/local/include/libxml2 -I/usr/include -I/usr/local/include -I/usr/local/include -c -o http.o http.c
      http.c:89:1: warning: unused function 'header_value' [-Wunused-function]
      header_value(const char* header_str, const char* header_name)
      ^
      1 warning generated.
      cc -O2 -pipe -fstack-protector -fno-strict-aliasing -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fPIC -DPIC -I. -I./-I/usr/local/include/postgresql/server -I/usr/local/include/postgresql/internal -I/usr/local/include/libxml2 -I/usr/include -I/usr/local/include -I/usr/local/include -c -o stringbuffer.o stringbuffer.c
      cc -O2 -pipe -fstack-protector -fno-strict-aliasing -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fPIC -DPIC -shared -o http.so http.o stringbuffer.o -L/usr/local/lib -L/usr/local/lib -pthread -Wl,-rpath,/usr/lib:/usr/local/lib -fstack-protector -L/usr/local/lib -L/usr/lib -L/usr/local/lib -Wl,--as-needed -Wl,-R'/usr/local/lib' -L/usr/local/lib -lcurl

      # gmake PG_CONFIG=/usr/local/bin/pg_config install
      /bin/mkdir -p '/usr/local/lib/postgresql'
      /bin/mkdir -p '/usr/local/share/postgresql/extension'
      /bin/mkdir -p '/usr/local/share/postgresql/extension'
      /usr/bin/install -c -o root -g wheel -m 755 http.so '/usr/local/lib/postgresql/http.so'
      /usr/bin/install -c -o root -g wheel -m 644 http.control '/usr/local/share/postgresql/extension/'
      /usr/bin/install -c -o root -g wheel -m 644 http--1.0.sql '/usr/local/share/postgresql/extension/'

    Further thoughts/ideas/issues, or want to help? https://github.com/sysadminmike/couch-to-postgres/issues

    Some more testing:

    Updating all the records in the test articles database.

    
      couchplay=> SELECT id, doc->>'read' FROM articlespg WHERE doc->>'read'='false';
              id        | ?column?
      ------------------+----------
       _design/mw_views | false
      (1 row)

    Only the design doc is returned.

    Then running:

    
      UPDATE articlespg
      SET doc = json_object_set_key(doc::json, 'read'::text, true)::jsonb, from_pg=true;

    Interestingly, while the update is running the table is locked, but the changes don't finish feeding through to couch until the update completes.

    While the query runs you can see the commits appearing in the couch update_seq:

    
      articlespg: {"db_name":"articlespg","doc_count":63838,"doc_del_count":2,"update_seq":233296,"purge_seq":0,"compact_running":false,"disk_size":2145373958,"data_size":214959726,"instance_start_time":"1418762851354294","disk_format_version":6,"committed_update_seq":233224}
      articlespg: Checkpoint 192414 is current next check in: 10 seconds
      PG_WATCHDOG: OK

      articlespg: {"db_name":"articlespg","doc_count":63838,"doc_del_count":2,"update_seq":242301,"purge_seq":0,"compact_running":false,"disk_size":2234531964,"data_size":215440194,"instance_start_time":"1418762851354294","disk_format_version":6,"committed_update_seq":242301}
      articlespg: Checkpoint 192414 is current next check in: 10 seconds
      PG_WATCHDOG: OK

    The feed changes as soon as the query returns, so the table does appear to be locked while the update runs. *** I need to look into this more; at one point during testing I introduced something that stopped updates coming through.

    It took 475 seconds for the UPDATE to return, then about another 3 minutes after returning for all the records to finish updating in couch.

    Since the puts happen during the transaction, I think it's possible for the UPDATE to succeed while an individual put fails: if a single put fails postgres assumes no data was updated, yet most of the docs will have been.
    Note: so far I have not had a single insert or update fail, but I haven't tried killing couch or postgres mid-update either.

    This gives me an idea for another use. By simply pointing the put function at a different ip or database, you can populate a new couchdb with a subset of a couchdb table held in postgres, e.g.:

    
      --SELECT headers FROM http_post('http://192.168.3.21:5984/' || TG_TABLE_NAME || '/' || NEW.id::text, '', NEW.doc::text, 'application/json'::text) INTO RES;
      SELECT headers FROM http_post('http://192.168.3.21:5984/articlespg-subset' || '/' || NEW.id::text, '', NEW.doc::text, 'application/json'::text) INTO RES;

    Then run the update, but with a WHERE clause:

    
      UPDATE articlespg
      SET doc = json_object_set_key(doc::json, 'read'::text, true)::jsonb, from_pg=true
      WHERE doc->>'feedName' = '::Planet PostgreSQL::';

    About 10 seconds later there is a new couchdb populated with the 761 docs matching the WHERE clause:

    
    {"db_name":"articlespg-subset","doc_count":761,"doc_del_count":0,"update_seq":761,"purge_seq":0,"compact_running":false,"disk_size":6107249,"data_size":3380130,"instance_start_time":"1418770153501066","disk_format_version":6,"committed_update_seq":761}
    
    
    
    

    Far simpler than creating a design doc for filtered replication. There's also no reason why you couldn't join two couch database tables in postgres and merge them into a new couchdb.

    I also did a quick test querying it from Access via ODBC with a pass-through sql query.

    I think I like the POSTing (from postgres) approach best; most of the work would be done by the http_post function.

    (I think there are some nice things that could be done with SELECTs too.)
