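"PostgreSQL: AND/OR conditions over an arbitrary number of columns with a limited result count, and an example algorithm for generating test data"
"PostgreSQL in practice: real-time ad slot recommendation 2 (arbitrary column combinations, arbitrary dimension combinations, TOP-K output)"
"PostgreSQL ad hoc (arbitrary column combination) queries and dictionary encoding (accelerated with rum indexes): practice and solution 1"
"HTAP database PostgreSQL scenarios and performance tests, part 20: (OLAP) user-profile audience selection, filtering and pivoting on arbitrary combinations of columns"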
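100 million rows, 128 columns, and queries on arbitrary column combinations: how does it perform?
What lets PostgreSQL handle real-time search over arbitrary column combinations on a large data set?
2. Load 100 million rows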
insert into test (c1) select random()*100 from generate_series(1,100);
nohup pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 20000 >/dev/null 2>&1 &
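The pgbench options above account for the row count: 50 clients, 20,000 transactions per client (`-t` is per client), and 100 rows per transaction from `generate_series(1,100)`. A minimal check of that arithmetic, with illustrative variable names that are not part of the original:

```python
# Values read from the pgbench command line and the INSERT statement above.
clients = 50             # -c 50
tx_per_client = 20_000   # -t 20000 (transactions per client)
rows_per_tx = 100        # generate_series(1,100)

total_rows = clients * tx_per_client * rows_per_tx
print(f"{total_rows:,} rows")  # 100,000,000 rows
```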
3. Table size after loading
postgres=# \dt+ test
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+-------+-------------
public | test | table | postgres | 55 GB |
(1 row)
postgres=# select count(*) from test;
count
-----------
100000000
(1 row)
4. Build the indexes efficiently
vi idx.sql
vacuum (analyze,verbose) test;
set maintenance_work_mem='8GB';
set max_parallel_workers=128;
set max_parallel_workers_per_gather=32;
set min_parallel_index_scan_size=0;
set min_parallel_table_scan_size=0;
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set max_parallel_maintenance_workers=16;
alter table test set (parallel_workers=64);
do language plpgsql $$
declare
sql text;
begin
for i in 1..128 loop
execute format('create index idx_test_%s on test (c%s) %s', i, i, 'tablespace tbs_8001');
end loop;
end;
$$;
vacuum (analyze,verbose) test;
nohup psql -f ./idx.sql >/dev/null 2>&1 &
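For reference, the DO loop in idx.sql expands to 128 separate `create index` statements, one per column. The sketch below reproduces the strings that `format()` would build; the function name `index_statements` is hypothetical and only serves the illustration.

```python
def index_statements(n_columns=128, tablespace="tbs_8001"):
    """Return the statements the DO loop would pass to EXECUTE."""
    return [
        f"create index idx_test_{i} on test (c{i}) tablespace {tablespace}"
        for i in range(1, n_columns + 1)
    ]

stmts = index_statements()
print(len(stmts))   # number of statements generated
print(stmts[0])     # first statement (column c1)
print(stmts[-1])    # last statement (column c128)
```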
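5. After the indexes are built
The table now has 129 indexes. How does write performance hold up?
About 9,505 rows/s (95.05 tps × 100 rows per transaction).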
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 24
duration: 120 s
number of transactions actually processed: 11433
latency average = 252.195 ms
tps = 95.054689 (including connections establishing)
tps = 95.058210 (excluding connections establishing)
statement latencies in milliseconds:
252.179 insert into test (c1) select random()*100 from generate_series(1,100);
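The rows-per-second figure follows directly from the pgbench report: each transaction inserts 100 rows. As a sanity check, the reported tps should also be close to the client count divided by the average latency. A small sketch of both calculations, using only numbers from the output above:

```python
tps = 95.054689             # "including connections establishing"
rows_per_tx = 100           # generate_series(1,100) per transaction
clients = 24                # number of clients
latency_s = 252.195 / 1000  # latency average, converted to seconds

rows_per_second = tps * rows_per_tx
tps_from_latency = clients / latency_s  # expected to be close to tps

print(f"{rows_per_second:.0f} rows/s")
print(f"{tps_from_latency:.2f} tps estimated from latency")
```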
Bottleneck: disk I/O, with reads and writes at 5.5 GB/s.
Total DISK READ : 207.91 K/s | Total DISK WRITE : 3.54 G/s
Actual DISK READ: 207.91 K/s | Actual DISK WRITE: 2015.64 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
55887 be/4 digoal 15.40 K/s 158.54 M/s 0.00 % 1.05 % postgres: postgres postgres [local] INSERT
55872 be/4 digoal 7.70 K/s 157.62 M/s 0.00 % 0.84 % postgres: postgres postgres [local] INSERT
55886 be/4 digoal 23.10 K/s 158.78 M/s 0.00 % 0.78 % postgres: postgres postgres [local] INSERT
55897 be/4 digoal 7.70 K/s 158.79 M/s 0.00 % 0.75 % postgres: postgres postgres [local] INSERT
55889 be/4 digoal 0.00 B/s 158.72 M/s 0.00 % 0.69 % postgres: postgres postgres [local] INSERT
55894 be/4 digoal 0.00 B/s 157.25 M/s 0.00 % 0.69 % postgres: postgres postgres [local] INSERT
55888 be/4 digoal 7.70 K/s 136.26 M/s 0.00 % 0.68 % postgres: postgres postgres [local] INSERT
55885 be/4 digoal 7.70 K/s 143.24 M/s 0.00 % 0.67 % postgres: postgres postgres [local] INSERT
55890 be/4 digoal 0.00 B/s 159.07 M/s 0.00 % 0.67 % postgres: postgres postgres [local] INSERT
55865 be/4 digoal 15.40 K/s 158.27 M/s 0.00 % 0.65 % postgres: postgres postgres [local] INSERT
55900 be/4 digoal 7.70 K/s 151.00 M/s 0.00 % 0.64 % postgres: postgres postgres [local] INSERT
55891 be/4 digoal 0.00 B/s 160.40 M/s 0.00 % 0.63 % postgres: postgres postgres [local] INSERT
55896 be/4 digoal 0.00 B/s 158.79 M/s 0.00 % 0.62 % postgres: postgres postgres [local] INSERT
55902 be/4 digoal 15.40 K/s 157.65 M/s 0.00 % 0.62 % postgres: postgres postgres [local] INSERT
55875 be/4 digoal 0.00 B/s 158.52 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
55892 be/4 digoal 7.70 K/s 136.20 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
55868 be/4 digoal 0.00 B/s 139.10 M/s 0.00 % 0.58 % postgres: postgres postgres [local] INSERT
55895 be/4 digoal 0.00 B/s 159.75 M/s 0.00 % 0.57 % postgres: postgres postgres [local] INSERT
55898 be/4 digoal 0.00 B/s 113.43 M/s 0.00 % 0.55 % postgres: postgres postgres [local] INSERT
55880 be/4 digoal 46.20 K/s 121.68 M/s 0.00 % 0.50 % postgres: postgres postgres [local] INSERT
55884 be/4 digoal 23.10 K/s 126.35 M/s 0.00 % 0.47 % postgres: postgres postgres [local] INSERT
55901 be/4 digoal 15.40 K/s 117.46 M/s 0.00 % 0.46 % postgres: postgres postgres [local] INSERT
55899 be/4 digoal 7.70 K/s 115.13 M/s 0.00 % 0.46 % postgres: postgres postgres [local] INSERT
The bottleneck is reading and writing data files:
postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity where wait_event is not null group by 1,2 order by 3 desc;
wait_event_type | wait_event | count
-----------------+---------------------+-------
IO | DataFileWrite | 15
IO | DataFileRead | 5
Activity | WalWriterMain | 1
Activity | LogicalLauncherMain | 1
Activity | CheckpointerMain | 1
Activity | AutoVacuumMain | 1
(6 rows)
How do queries on arbitrary column combinations perform?
1.
2.
set min_parallel_index_scan_size=0;
set min_parallel_table_scan_size=0;
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set work_mem='1GB';
set max_parallel_workers=128;
set max_parallel_workers_per_gather=24;
set random_page_cost =1.1;
set effective_cache_size ='400GB';
set enable_bitmapscan=off;
count
-------
9764
(1 row)
Time: 50.160 ms
postgres=# select count(*) from test where c1=2 and c99 between 100 and 1000 and c98 between 100 and 200 and c2=1;
count
-------
0
(1 row)
Time: 20.969 ms
postgres=# select count(*) from test where c1=2 and c99 between 100 and 10000 and c108 between 100 and 10000;
count
-------
102
(1 row)
Time: 72.359 ms
postgres=# select count(*) from test where c1=2 and c99=1;
count
-------
2
(1 row)
Time: 1.118 ms
3. OR
set enable_bitmapscan=on;
postgres=# explain select count(*) from test where c1=2 and c99=1 or c100 between 10 and 100;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=10000010781.91..10000010781.92 rows=1 width=8)
-> Bitmap Heap Scan on test (cost=10000000130.57..10000010758.33 rows=9430 width=0)
Recheck Cond: ((c99 = 1) OR ((c100 >= 10) AND (c100 <= 100)))
Filter: (((c1 = 2) AND (c99 = 1)) OR ((c100 >= 10) AND (c100 <= 100)))
-> BitmapOr (cost=130.57..130.57 rows=9526 width=0)
-> Bitmap Index Scan on idx_test_99 (cost=0.00..2.39 rows=96 width=0)
Index Cond: (c99 = 1)
-> Bitmap Index Scan on idx_test_100 (cost=0.00..123.47 rows=9430 width=0)
Index Cond: ((c100 >= 10) AND (c100 <= 100))
(9 rows)
Time: 1.281 ms
postgres=# select count(*) from test where c1=2 and c99=1 or c100 between 10 and 100;
count
-------
9174
(1 row)
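In the query above, AND binds more tightly than OR, so the WHERE clause is evaluated as (c1=2 AND c99=1) OR (c100 BETWEEN 10 AND 100), which is what the Filter line of the plan shows. Python's `and` and `or` follow the same precedence, so the predicate can be sketched on a few hand-made rows. The sample rows are hypothetical and are not data from the test table.

```python
def matches(row):
    # Same shape as the SQL predicate; `and` binds tighter than `or`.
    return row["c1"] == 2 and row["c99"] == 1 or 10 <= row["c100"] <= 100

rows = [
    {"c1": 2, "c99": 1, "c100": 5000},  # satisfies the AND branch
    {"c1": 7, "c99": 3, "c100": 50},    # satisfies only the OR branch
    {"c1": 7, "c99": 1, "c100": 5000},  # c99 = 1 alone is not enough
]
results = [matches(r) for r in rows]
print(results)  # [True, True, False]
```

The third row also illustrates why the plan needs the Filter step: the bitmap scan on idx_test_99 selects every row with c99 = 1, and the c1 = 2 condition is applied afterwards on the heap.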
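What drives the performance differences:
2. The amount of data scanned
3. The amount of computation (not directly tied to the size of the result set; what matters is the scan method and the volume of intermediate computation).
Write throughput: with 129 indexes, 9,505 rows/s. The bottleneck is on the I/O side; more I/O capacity and partitioning the table can raise it.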
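"PostgreSQL design optimization case: choosing indexes (btree, gin, rum) for arbitrary column combination queries on a wide table (including how to put more than 32 columns in a single index)"
"PostgreSQL ad hoc (arbitrary column combination) queries (accelerated with rum indexes): without dictionary encoding, building a new array from plain, array, and other combined columns"
"PostgreSQL in practice: real-time ad slot recommendation 1 (arbitrary column combinations, arbitrary dimension combinations, TOP-K output)"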