《PostgreSQL 任意字段数组合 AND\OR 条件,指定返回结果条数,构造测试数据算法举例》

《PostgreSQL 实践 - 实时广告位推荐 2 (任意字段组合、任意维度组合搜索、输出TOP-K)》

《PostgreSQL ADHoc(任意字段组合)查询 与 字典化 (rum索引加速) - 实践与方案1》

《HTAP数据库 PostgreSQL 场景与性能测试之 20 - (OLAP) 用户画像圈人场景 - 多个字段任意组合条件筛选与透视》

1亿记录,128个字段,任意字段组合查询。性能如何?

PG凭什么可以搞定大数据量的任意字段组合实时搜索?

《PostgreSQL 并行计算解说 汇总》

2、写入1亿数据

  1. insert into test (c1) select random()*100 from generate_series(1,100);
  2. nohup pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 20000 >/dev/null 2>&1 &

3、写完后的大小

  1. postgres=# \dt+ test
  2. List of relations
  3. Schema | Name | Type | Owner | Size | Description
  4. --------+------+-------+----------+-------+-------------
  5. public | test | table | postgres | 55 GB |
  6. (1 row)
  7. postgres=# select count(*) from test;
  8. count
  9. -----------
  10. 100000000
  11. (1 row)

4、高效率创建索引

  1. vi idx.sql
  2. vacuum (analyze,verbose) test;
  3. set maintenance_work_mem='8GB';
  4. set max_parallel_workers=128;
  5. set max_parallel_workers_per_gather=32;
  6. set min_parallel_index_scan_size=0;
  7. set min_parallel_table_scan_size=0;
  8. set parallel_setup_cost=0;
  9. set parallel_tuple_cost=0;
  10. set max_parallel_maintenance_workers=16;
  11. alter table test set (parallel_workers=64);
  12. do language plpgsql $$
  13. declare
  14. sql text;
  15. begin
  16. for i in 1..128 loop
  17. execute format('create index idx_test_%s on test (c%s) %s', i, i, 'tablespace tbs_8001');
  18. end loop;
  19. end;
  20. $$;
  21. vacuum (analyze,verbose) test;
  22. nohup psql -f ./idx.sql >/dev/null 2>&1 &

5、建完索引后

当前有129个索引,写入性能如何?

9505行/s。

  1. transaction type: ./test.sql
  2. scaling factor: 1
  3. query mode: prepared
  4. number of clients: 24
  5. duration: 120 s
  6. number of transactions actually processed: 11433
  7. latency average = 252.195 ms
  8. tps = 95.054689 (including connections establishing)
  9. tps = 95.058210 (excluding connections establishing)
  10. statement latencies in milliseconds:
  11. 252.179 insert into test (c1) select random()*100 from generate_series(1,100);

瓶颈,磁盘读写5.5GB/s。

  1. Total DISK READ : 207.91 K/s | Total DISK WRITE : 3.54 G/s
  2. Actual DISK READ: 207.91 K/s | Actual DISK WRITE: 2015.64 M/s
  3. TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
  4. 55887 be/4 digoal 15.40 K/s 158.54 M/s 0.00 % 1.05 % postgres: postgres postgres [local] INSERT
  5. 55872 be/4 digoal 7.70 K/s 157.62 M/s 0.00 % 0.84 % postgres: postgres postgres [local] INSERT
  6. 55886 be/4 digoal 23.10 K/s 158.78 M/s 0.00 % 0.78 % postgres: postgres postgres [local] INSERT
  7. 55897 be/4 digoal 7.70 K/s 158.79 M/s 0.00 % 0.75 % postgres: postgres postgres [local] INSERT
  8. 55889 be/4 digoal 0.00 B/s 158.72 M/s 0.00 % 0.69 % postgres: postgres postgres [local] INSERT
  9. 55894 be/4 digoal 0.00 B/s 157.25 M/s 0.00 % 0.69 % postgres: postgres postgres [local] INSERT
  10. 55888 be/4 digoal 7.70 K/s 136.26 M/s 0.00 % 0.68 % postgres: postgres postgres [local] INSERT
  11. 55885 be/4 digoal 7.70 K/s 143.24 M/s 0.00 % 0.67 % postgres: postgres postgres [local] INSERT
  12. 55890 be/4 digoal 0.00 B/s 159.07 M/s 0.00 % 0.67 % postgres: postgres postgres [local] INSERT
  13. 55865 be/4 digoal 15.40 K/s 158.27 M/s 0.00 % 0.65 % postgres: postgres postgres [local] INSERT
  14. 55900 be/4 digoal 7.70 K/s 151.00 M/s 0.00 % 0.64 % postgres: postgres postgres [local] INSERT
  15. 55891 be/4 digoal 0.00 B/s 160.40 M/s 0.00 % 0.63 % postgres: postgres postgres [local] INSERT
  16. 55896 be/4 digoal 0.00 B/s 158.79 M/s 0.00 % 0.62 % postgres: postgres postgres [local] INSERT
  17. 55902 be/4 digoal 15.40 K/s 157.65 M/s 0.00 % 0.62 % postgres: postgres postgres [local] INSERT
  18. 55875 be/4 digoal 0.00 B/s 158.52 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
  19. 55892 be/4 digoal 7.70 K/s 136.20 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
  20. 55868 be/4 digoal 0.00 B/s 139.10 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
  21. 55895 be/4 digoal 0.00 B/s 159.75 M/s 0.00 % 0.57 % postgres: postgres postgres [local] INSERT
  22. 55898 be/4 digoal 0.00 B/s 113.43 M/s 0.00 % 0.55 % postgres: postgres postgres [local] INSERT
  23. 55880 be/4 digoal 46.20 K/s 121.68 M/s 0.00 % 0.50 % postgres: postgres postgres [local] INSERT
  24. 55884 be/4 digoal 23.10 K/s 126.35 M/s 0.00 % 0.47 % postgres: postgres postgres [local] INSERT
  25. 55901 be/4 digoal 15.40 K/s 117.46 M/s 0.00 % 0.46 % postgres: postgres postgres [local] INSERT
  26. 55899 be/4 digoal 7.70 K/s 115.13 M/s 0.00 % 0.46 % postgres: postgres postgres [local] INSERT

瓶颈在读写数据文件

  1. postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity where wait_event is not null group by 1,2 order by 3 desc;
  2. wait_event_type | wait_event | count
  3. -----------------+---------------------+-------
  4. IO | DataFileWrite | 15
  5. IO | DataFileRead | 5
  6. Activity | WalWriterMain | 1
  7. Activity | LogicalLauncherMain | 1
  8. Activity | CheckpointerMain | 1
  9. Activity | AutoVacuumMain | 1
  10. (6 rows)

任意字段组合查询性能如何

1、

2、

  1. set min_parallel_index_scan_size=0;
  2. set min_parallel_table_scan_size=0;
  3. set parallel_setup_cost=0;
  4. set parallel_tuple_cost=0;
  5. set work_mem='1GB';
  6. set max_parallel_workers=128;
  7. set max_parallel_workers_per_gather=24;
  8. set random_page_cost =1.1;
  9. set effective_cache_size ='400GB';
  10. set enable_bitmapscan=off;
  1. count
  2. -------
  3. 9764
  4. (1 row)
  5. Time: 50.160 ms
  6. postgres=# select count(*) from test where c1=2 and c99 between 100 and 1000 and c98 between 100 and 200 and c2=1;
  7. count
  8. -------
  9. 0
  10. (1 row)
  11. Time: 20.969 ms
  12. postgres=# select count(*) from test where c1=2 and c99 between 100 and 10000 and c108 between 100 and 10000;
  13. count
  14. -------
  15. 102
  16. (1 row)
  17. Time: 72.359 ms
  18. postgres=# select count(*) from test where c1=2 and c99=1;
  19. count
  20. -------
  21. 2
  22. (1 row)
  23. Time: 1.118 ms

3、OR

  1. set enable_bitmapscan=on;
  2. postgres=# explain select count(*) from test where c1=2 and c99=1 or c100 between 10 and 100;
  3. QUERY PLAN
  4. --------------------------------------------------------------------------------------------
  5. Aggregate (cost=10000010781.91..10000010781.92 rows=1 width=8)
  6. -> Bitmap Heap Scan on test (cost=10000000130.57..10000010758.33 rows=9430 width=0)
  7. Recheck Cond: ((c99 = 1) OR ((c100 >= 10) AND (c100 <= 100)))
  8. Filter: (((c1 = 2) AND (c99 = 1)) OR ((c100 >= 10) AND (c100 <= 100)))
  9. -> BitmapOr (cost=130.57..130.57 rows=9526 width=0)
  10. -> Bitmap Index Scan on idx_test_99 (cost=0.00..2.39 rows=96 width=0)
  11. Index Cond: (c99 = 1)
  12. -> Bitmap Index Scan on idx_test_100 (cost=0.00..123.47 rows=9430 width=0)
  13. Index Cond: ((c100 >= 10) AND (c100 <= 100))
  14. (9 rows)
  15. Time: 1.281 ms
  16. postgres=# select count(*) from test where c1=2 and c99=1 or c100 between 10 and 100;
  17. count
  18. -------
  19. 9174
  20. (1 row)

性能差异:

2、扫描量

3、运算量(与结果集大小无直接关系,关键看扫描方法和中间计算量)。

写入能力:129个索引,写入9505行/s。瓶颈在IO侧,通过提升IO能力,加分区可以提高。

《PostgreSQL 设计优化case - 大宽表任意字段组合查询索引如何选择(btree, gin, rum) - (含单个索引列数超过32列的方法)》

《PostgreSQL ADHoc(任意字段组合)查询(rums索引加速) - 非字典化,普通、数组等组合字段生成新数组》

《PostgreSQL 实践 - 实时广告位推荐 1 (任意字段组合、任意维度组合搜索、输出TOP-K)》

《PostgreSQL 如何高效解决 按任意字段分词检索的问题 - case 1》

《PostgreSQL 多字段任意组合搜索的性能》