Indexing Complex PostgreSQL Data Types

Indexing
Complex
PostgreSQL

Data
Types
Jonathan
S.
Katz
-‐
PGDay
UK
2014
-‐
July
9,
2014

About
• Jonathan
S.
Katz

– CTO,
VenueBook

– Co-‐Organizer,
NYC
PostgreSQL
User
Group

– PGConf
NYC
2015

• Mar
25
-‐
27,
2015

• New
York
Marriott
Downtown

• http://nyc.pgconf.us

– @jkatz05
2

Quick
Overview
• Introductory
Talk
with
demos
and
fun

• B-‐trees

• GiST:
Generalized
Search
Trees

• GIN:
Generalized
Inverted
Index

• SP-‐GiST:
Space
Partitioned
Generalized
Search

Trees
3

Assumptions
• PostgreSQL
9.3+

• most
will
be
9.0+

• PostGIS
2.0+

• Believe
it
will
work
for
most
available

versions

• PostgreSQL
9.4
beta?
4

The
Beginning
6
4
3
8
7
2
9
1
6
10
6

Find
3
7
4
3
8
7
2
9
1
6
10
5

Find
3
8
4
3
8
7
2
9
1
6
10
5

Find
3
9
4
3
8
7
2
9
1
6
10
5

Find
10
10
4
3
8
7
2
9
1
6
10
5

Find
10
11
4
3
8
7
2
9
1
6
10
5

Find
10
12
4
3
8
7
2
9
1
6
10
5

Find
10
13
4
3
8
7
2
9
1
6
10
5

Find
10
14
4
3
8
7
2
9
1
6
10
5

Find
10
15
4
3
8
7
2
9
1
6
10
5

Find
10
16
4
3
8
7
2
9
1
6
10
5

Find
10
17
4
3
8
7
2
9
1
6
10
5

Find
10
18
4
3
8
7
2
9
1
6
10
5

Find
10
19
4
3
8
7
2
9
1
6
10
5

What
We
Learned
• Without
any
data
structure
around
search,
we

rely
on
"hope"

• Assumed
"unique
values"
and
"equality"

– would
have
to
scan
all
rows
otherwise

• …and
what
about:

– INSERT

– UPDATE

– DELETE
20

What
We
Need
• Need
a
data
structure
for
search
that:

– allows
efficient
lookups

– plays
nicely
with
disk
I/O

– does
not
take
too
long
for
updates

21

B-‐Trees
• "default"
index

• quick
traversal
to
leaf
nodes

• leaf
nodes
pre-‐sorted

• node
size
designed
to
fit
in
disk
block
size

– "degree"
of
nodes
has
max-‐size

• theoretical
performance

– reads:
O(log
n)

– writes:
O(log
n)

– space:
O(n)
22

B-‐Trees
and
PostgreSQL
• supports

– <=,
<,
=,
>,
>=

– BETWEEN,
IN

– IS
NOT
NULL,
IS
NULL

– LIKE
in
specific
case
of
‘plaintext%’

– ~
in
specific
case
of
‘^plaintext’

– ILIKE
and
~*
if
pattern
starts
with
nonalpha
characters

• does
not
support

• IS
NOT
DISTINCT
FROM
23

B-‐Trees
and
PostgreSQL
• data
types
supported

– any
data
type
with
all
the
equality
operators
defined

– number
types

• integer,
numeric,
decimal

– text

• char,
varchar,
text

– date
/
times

• timestamptz,
timestamp,
date,
time,
timetz,
interval

- arrays,
ranges
24

Demo
Specs
+
Configuration
• Hardware
specs

– 2.3GHz
Intel
i7
2x4
core

– 16GB
RAM
DDR3
1600MHz

– Apple
SSD
6Gbps

• postgresql.conf
(9.3,
9.4
beta
1)

– shared_buffers
=
1GB

– work_mem
=
64MB

– maintenance_work_mem
=
1024MB

– effective_cache_size
=
8GB
25

Demo
#1:
Basic
Indexing
Plans
26

Demo
#1
Notes
• Index
maintenance

• VACUUM
–
"cleans
up"
after
writes
on
table
/
indexes

– ANALYZE
–
keeps
statistics
up-‐to-‐date
for
planner

!
VACUUM ANALYZE tablename;

!
• Good
idea
to
leave
autovacuum
on
27

Indexing
in
Production
• CREATE
INDEX
CONCURRENTLY

• REINDEX

– corruption,
bloat,
invalid

• FILLFACTOR

– 10
–
100

– default:
90

– strategy:
lower
%
::
write-‐activity

• TABLESPACE

• NULLS
LAST,
NULLS
FIRST
28

Demo
#2:
Partial
Indexes
29
CREATE INDEX indexname ON tablename (columnname)
WHERE somecondition;

Demo
#2
Notes
• Partial
Indexes
are

– good
if
known
to
query
limited
subset
of
table

– take
up
less
space

– allow
for
much
quicker
writes

• Like
all
good
things,
do
not
overuse
and

saturate
your
I/O
30

Unique
Indexes
• only
for
B-‐trees

• NULL
not
unique

• use
UNIQUE
constraints
–
automatically
create

indexes

!
CREATE TABLE foo (bar int UNIQUE);

-- or

CREATE UNIQUE INDEX foo_bar_idx ON foo (bar);

ALTER TABLE foo ADD CONSTRAINT a_unique
USING INDEX a_idx;
31

Multi-‐Column
Indexes
• Useful
for

– querying
two
columns
that
are
frequently
queried

together

– enforcing
UNIQUEness
across
columns

• n.b.
creating
UNIQUE
constraint
on
table
creates
UNIQUE

INDEX

• PostgreSQL
supports

– up
to
32
columns

– B-‐tree,
GiST,
GIN

• Be
careful
of
how
you
choose
initial
column
order!
32

Multi-‐Column
Indexes
33
CREATE INDEX multicolumn_idx ON
tablename (col1, col2);
!
!
!
CREATE UNIQUE INDEX pn_idx ON
phone_numbers (country_code, national_number)
WHERE extension IS NULL

Demo
#3
Notes
• Multi-‐column
indexes
can
be

– efficient
for
speed
+
space

– inefficient
with
performance

• Usage
depends
on
your
application
needs
34

Expression
Indexes
• can
index
on
expressions
to
speed
up
lookups

– e.g.
case
insensitive
email
addresses

– can
use
functions
or
scalars

• (x * y) / 100

• COALESCE(first_name, '') || ' ' ||
COALESCE(last_name, '')

• LOWER(email_address)

• tradeoff:
slower
writes
35

Demo
#4:
Expression
Indexes
36

Demo
#4
Notes
• fast
lookups,
slow
updates

• size
on
disk

• cool
data
types

– geometric

– JSON
37

Geometric
Data
Types
38

Geometric
Data
Types
CREATE TABLE points (coord point);
!
CREATE INDEX points_idx ON points (coord);
ERROR: data type point has no default
operator class for access method "btree"
HINT: You must specify an operator class
for the index or define a default operator
class for the data type.
39

GiST
• "generalized
search
tree"

• infrastructure
that
provides
template
to
create
arbitrary

indexing
schemes

– supports
concurrency,
logging,
searching
–
only
have
to
define

behavior

– user-‐defined
operator
class

• <<,
&<,
&>,
>>,
<<|,
&<|,
|&>,
|>>,
@>,
<@,
~=,
&&

– have
to
implement
functions
in
interface

• supports
lossless
+
lossy
indexes

• provides
support
for
"nearest-‐neighbor"
queries
–

"KNN-‐Gist"
40
CREATE INDEX points_coord_gist_idx ON points
USING gist(coord)

Demo
#5:
Geometric
GiST
+
KNN-‐Gist
41

Demo
#5
Notes
• GiST
indexes
on
geometric
types
radically

speedup
reads

• Writes
are
slower
due
to
distance
calculation

• Index
size
can
be
very
big
42

PostGIS
• For
when
you
are
doing
real
things
with
shapes
43• (and
geographic
information
systems)

PostGIS
+
Indexes
• B-‐Tree?

• R-‐Tree?

• PostGIS
docs
do
not
recommend
using
just

an
R-‐Tree
index

• GiST

• overlaps!

containment!

• uses
a
combination
of
GiST
+
R-‐Tree
44

PostGIS
+
GiST
45
2-‐D
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
USING gist(geom);
N-‐D
(PostGIS
2.0+
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
USING gist(geom gist_geometry_ops_nd);

Example
-‐
USA
Zipcode
Boundaries
• 33,120
rows

• geom:
MultiPolygon

• 52MB
without
indexes

• With
geometry
GiST
+
integer
B-‐Tree:
869MB
46

What
Zipcode
Is
My
Office
In?
• Geocoded
Address

• Lat,Long
=
40.7356197,-‐73.9891102

• PostGIS:
POINT(-‐73.9891102
40.7356197)

• 4269
-‐
“SRID”
-‐
unique
ID
for
coordinate
system

definitions
47
SELECT zcta5ce10 AS zipcode
FROM zipcodes
WHERE ST_Contains(
geom, --MultiPolygon
ST_GeomFromText('POINT(-73.9891102 40.7356197)', 4269)
);

What
Zipcode
Is
My
Office
In?
• No
Index
48
Seq Scan on zipcodes (cost=0.00..15382.00 rows=1 width=6) (actual
time=64.780..5153.485 rows=1 loops=1)
Filter: ((geom &&
'0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) AND
_st_contains(geom,
'0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry))
Rows Removed by Filter: 33119
Total runtime: 5153.505 ms

What
Zipcode
Is
My
Office
In?
• Here’s
the
GiST:
49
Index Scan using zipcodes_geom_gist on zipcodes (cost=0.28..8.54
rows=1 width=6) (actual time=0.120..0.207 rows=1 loops=1)
Index Cond: (geom &&
'0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
Filter: _st_contains(geom,
'0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
!

PostGIS
+
GiST
Conclusion
• 5153ms
=>
0.235ms
=
21,900x
speedup
:-‐)
50
SELECT zcta5ce10 AS zipcode
FROM zipcodes
WHERE ST_Contains(
geom,
ST_GeomFromText('POINT(-73.9891102 40.7356197)',
4269)
);
!
zipcode
---------
10003

Full
Text
Search
• PostgreSQL
offers
full
text
search
with
the

tsearch2
engine

– algorithms
for
performing
FTS

– to_tsvector('english',
content)
@@

to_tsquery('irish
&
conference
|
meeting')

– provides
indexing
capabilities
for
efficient
search
51

Test
Data
Set
• Wikipedia
English
category
titles
–
all

1,823,644
that
I
downloaded
52

Full-‐Text
Search:
Basics
53
SELECT title
FROM category
WHERE
to_tsvector('english', title) @@ to_tsquery('united & kingdom’);
!
title
-----
Lists of railway stations in the United Kingdom
Political history of the United Kingdom
Military of the United Kingdom
United Kingdom constitution
Television channels in the United Kingdom
United Kingdom
Roman Catholic secondary schools in the United Kingdom
[results truncated]
!
!
QUERY PLAN
------------
Seq Scan on category (cost=0.00..49262.77 rows=46 width=29) (actual time=21.900..16809.890
rows=8810 loops=1)
Filter: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
!

Full-‐Text
Search
+
GiST
54
CREATE INDEX category_title_gist_idx ON category
USING gist(to_tsvector('english', title));
!
SELECT title
FROM category
WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom');
QUERY PLAN
-------------
Bitmap Heap Scan on category (cost=4.77..182.47 rows=46 width=29) (actual time=75.517..180.650
rows=8810 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united &
kingdom'::text))
-> Bitmap Index Scan on category_title_gist_idx (cost=0.00..4.76 rows=46 width=0) (actual
time=74.687..74.687 rows=8810 loops=1)
Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united &
kingdom’::text))
!

Full
Text
Search
+
GiST
• GiST
indexes
can
produce
false
positives

– "documents"
represented
by
fixed
length
signature

• words
are
hashed
into
single
bits
and
concatenated

– when
false
positive
occurs,
row
is
returned
and

checked
to
see
if
false
match

• Extra
validations
=
performance
degradation
55

Performance
Summary
with
GiST
• initial
index
build
takes
awhile
=>
slow
writes

• reads
are
quick

• Table
size:
271MB

• Index
size:
83MB
56

GIN
Index
• "generalized
inverted
index"

• supports
searching
within
composite
data

– arrays,
full-‐text
documents,
hstore

• key
is
stored
once
and
points
to
composites
it
is

contained
in

• like
GiST,
provides
index
infrastructure
to
extend
GIN

based
on
behavior

– supports
operators
<@,
@>,
=,
&&

• GIN
performance
⬄
log(#
unique
things)
57

Full
Text
Search
+
GIN
58
CREATE INDEX category_title_gin_idx ON category
USING gin(to_tsvector('english', title));
!
EXPLAIN ANALYZE SELECT title FROM category WHERE to_tsvector('english',
title) @@ to_tsquery('united & kingdom');
!
QUERY PLAN
-------
Bitmap Heap Scan on category (cost=28.36..206.06 rows=46 width=29) (actual time=8.864..14.674
rows=8810 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united &
kingdom'::text))
-> Bitmap Index Scan on category_title_gin_idx (cost=0.00..28.35 rows=46 width=0) (actual
time=7.905..7.905 rows=8810 loops=1)
Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united &
kingdom'::text))
!
!

Performance
Summary
with
GIN
• index
build
was
much
quicker

• significant
speedup
from
no
index

– (12,000ms
=>
15ms)

• significant
speedup
from
GiST

– (181ms
=>
15ms)

• Table
size:
271MB

• Index
size:

• 9.3:
71MB

• 9.4
beta
1:

40MB
59

What
Was
Not
Discussed
• Word
density

– prior
to
9.3,
performance
issues
with
greater
word

density

• Type
of
text
data
–
phrases
vs
paragraphs
60

Full
Text
Search
–
GiST
vs
GIN
• Reads

– overall,
GIN
should
win

• Writes

– traditionally,
GiST
has
better
performance
for

writes

– GIN

• FASTUPDATE

• 9.4:
compression
61

Regular
Expression
Indexes
• Added
in
9.3

• Support
for
LIKE/ILIKE
wildcard
indexes
in
9.1

– title
LIKE
'%ab%e'

• Uses
pg_trgm
extension
+
GIN

!
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX category_title_regex_idx ON
category USING GIN(title gin_trgm_ops);
62

Regular
Expressions
-‐
No
Index
63
EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';
!
QUERY PLAN
----------
Seq Scan on category (cost=0.00..40144.55 rows=182 width=29) (actual
time=2.509..4260.792 rows=5878 loops=1)
Filter: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
!

Regular
Expressions
-‐
Indexed
64
CREATE INDEX category_title_regex_idx ON category
USING gin(title gin_trgm_ops);
!
EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';
QUERY PLAN
-----------
Bitmap Heap Scan on category (cost=197.41..871.77 rows=182 width=29) (actual
time=107.445..146.713 rows=5878 loops=1)
Recheck Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
Rows Removed by Index Recheck: 4712
-> Bitmap Index Scan on category_title_regex_idx (cost=0.00..197.37 rows=182 width=0) (actual
time=106.645..106.645 rows=10590 loops=1)
Index Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
!
!

Range
Types
• stores
range
data

–
1
to
6

– 2013-‐10-‐29
–
2013-‐11-‐2

• easy-‐to-‐use
operators
to
check
inclusion,

overlaps

• built-‐in
types:
integers,
numerics,
dates,

timestamps

• extensible
65

Range
Type
Examples
66
--find all ranges that overlap with [100, 200)
!
SELECT * FROM ranges WHERE int4range(100, 200) && range;
!
range
-----------
[10,102)
[13,102)
[18,108)
[32,101)
[34,134)
[37,123)
[43,111)
[46,132)
[48,107)
[results trunctated]
!QUERY PLAN
-----------
Seq Scan on ranges (cost=0.00..14239.86 rows=7073 width=32) (actual time=0.018..185.411 rows=143 loops
Filter: ('[100,200)'::int4range && range)
!

Range
Types
+
GiST
67
CREATE INDEX ranges_range_gist_idx ON ranges USING gist(range);
!
EXPLAIN ANALYZE SELECT * FROM ranges WHERE
int4range(100, 200) && range;
!
QUERY PLAN
------------
Bitmap Heap Scan on ranges (cost=5.29..463.10 rows=130 width=13)
(actual time=0.120..0.135 rows=144 loops=1)
Recheck Cond: ('[100,200)'::int4range && range)
-> Bitmap Index Scan on ranges_range_gist_idx (cost=0.00..5.26
Index Cond: ('[100,200)'::int4range && range)
!
!

SP-‐GiST
• space-‐partitioned
generalized
search
tree

• ideal
for
non-‐balanced
data
structures

– k-‐d
trees,
quad-‐trees,
suffix
trees

– divides
search
space
into
partitions
of
unequal
size

• matching
partitioning
rule
=
fast
search

• traditionally
for
"in-‐memory"
transactions,

converted
to
play
nicely
with
I/O
68

Range
Types:
GiST
vs
SP-‐Gist
CREATE TABLE ranges AS
SELECT
int4range(
(random()*5)::int,
(random()*5)::int + 5
) AS range
FROM generate_series(1,<N>) x;
!
SELECT * FROM ranges WHERE range <operator>
int4range(3,6);
69

N
=
1,000,000
70
CREATE INDEX ranges_range_spgist_idx ON ranges USING spgist(range);
ERROR: unexpected spgdoinsert() failure
Fixed in 9.3.2
GiST
Used GiST
Time SP-Gist Used SP-GiST
Time= Yes 121 Yes 37
&& No 257
No 260
@> No 223 No 223
<@ Yes 163
Yes 111
<< Yes 95 Yes 5
>> Yes 95 Yes 25
&< No 184 No 185
&> No 203 No 203

Range
Types:
GiST
vs
SP-‐GiST
CREATE TABLE ranges AS
SELECT
int4range(x, x + (random()*5)::int + 5)
AS range
FROM generate_series(1,<N>) x;
71

N
=
250,000
72
GiST
Used GiST
Time SP-‐GiST
Used SP-‐GiST
Time
= Yes 0.5 Yes 0.7
&& Yes 0.3 Yes 0.3
@> Yes 0.3 Yes 0.3
<@ Yes 0.06 Yes 0.25
<< No 40 Yes 0.2
>> No 60 No 60
&< Yes 0.3 Yes 0.2
&> No 74 No 61

GiST
vs
SP-‐GiST:
Space
73
GiST
Clustered SP-‐GiST
Clustered GiST
Sparse SP-‐GiST
Sparse
100K
Size 6MB 5MB 6MB 11MB
100K
Time 0.5s .4s 2.5s 7.8s
250K
250K
Time 1.5s 1.1s 6.3s 47.2s
500K
500K
Time 3.1s 3.0s 13.9s 192s
1MM
Size 59MB 52MB! 60MB 110MB
1MM
Time 5.1s 5.7s 29.2 777s

Integer
Arrays
74
CREATE UNLOGGED TABLE int_arrays AS
SELECT ARRAY[x, x + 1, x + 2] AS data
FROM generate_series(1,1000000) x;
!
CREATE INDEX int_arrays_data_idx ON
int_arrays (data);
!
CREATE INDEX int_arrays_data_gin_idx ON
int_arrays USING GIN(data);

B-‐Tree(?)
+
Integer
Arrays
75
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE 5432 = ANY (data);
QUERY PLAN
-----------
Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual
time=1.260..159.197 rows=3 loops=1)
Filter: (5432 = ANY (data))
!

GIN
+
Integer
Arrays
76
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE ARRAY[5432] <@ data;
QUERY PLAN
-----------
Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33)
Recheck Cond: ('{5432}'::integer[] <@ data)
-> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50
Index Cond: ('{5432}'::integer[] <@ data)
!

Hash
Indexes
• only
work
with
"="
operator

• are
still
not
WAL
logged
as
of
9.4
beta
1

– not
crash
safe

– not
replicated
77

btree_gin
78
CREATE EXTENSION IF NOT EXISTS btree_gin;
!
CREATE UNLOGGED TABLE numbers AS
SELECT (random() * 2000)::int AS a FROM generate_series(1, 2000000) x;
!
CREATE INDEX numbers_gin_idx ON numbers USING gin(a);
!
EXPLAIN ANALYZE SELECT * FROM numbers WHERE a = 1000;
!
QUERY PLAN
------------
Bitmap Heap Scan on numbers (cost=113.50..9509.26 rows=10000 width=4) (actual
time=0.388..1.459 rows=991 loops=1)
Recheck Cond: (a = 1000)
-> Bitmap Index Scan on numbers_gin_idx (cost=0.00..111.00 rows=10000 width=0)
Index Cond: (a = 1000)
!

btree_gin
vs
btree
79
-- btree
SELECT pg_size_pretty(pg_total_relation_size('numbers_idx'));
pg_size_pretty
----------------
43 MB
!
!
!
-- GIN
SELECT pg_size_pretty(pg_total_relation_size('numbers_gin_idx'));
pg_size_pretty
----------------
16 MB
• Only
use
GIN
over
btree
if
you
have
a
lot
of
duplicate
entries

hstore
-‐
the
PostgreSQL
Key-‐Value

Store
80
CREATE EXTENSION IF NOT EXISTS hstore;
!
CREATE UNLOGGED TABLE keypairs AS
SELECT
(x || ' => ' || (x + (random() * 5)::int))::hstore AS data
SELECT pg_size_pretty(pg_relation_size('keypairs'));
!
!
SELECT * FROM keypairs WHERE data ? ‘3';
data
----------
"3"=>"4"
!
EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? ‘3';
QUERY PLAN
-----------
Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.065..208.808
rows=1 loops=1)
Filter: (data ? '3'::text)
!

hstore
-‐
the
PostgreSQL
Key-‐Value

Store
81
CREATE INDEX keypairs_data_gin_idx ON keypairs
USING gin(data);
!
EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? ‘3';
!
QUERY PLAN
-----------
Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual
time=0.044..0.045 rows=1 loops=1)
Recheck Cond: (data ? '3'::text)
-> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0)
Index Cond: (data ? '3'::text)
!

JSONB:
Coming
in
9.4
82
INSERT INTO documents
SELECT row_to_json(ROW(x, x + 2, x + 3))::jsonb
!
!
CREATE INDEX documents_data_gin_idx ON documents
USING gin(data jsonb_path_ops);
!
!
!
SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
data
--------------------------------
{"f1": 10, "f2": 12, "f3": 13}
!
!
Execution time: 0.084 ms

Awesome
vs
WTF:

A
Note
On
Operator
Indexability
83
EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
!
QUERY PLAN
-----------
Bitmap Heap Scan on documents (cost=27.75..3082.65 rows=1000 width=66) (actual time=0.029..0.030
rows=1 loops=1)
Recheck Cond: (data @> '{"f1": 10}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on documents_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual
time=0.014..0.014 rows=1 loops=1)
Index Cond: (data @> '{"f1": 10}'::jsonb)
!
!
EXPLAIN ANALYZE SELECT * FROM documents WHERE '{ "f1": 10 }' <@ data;
!
QUERY PLAN
-----------
Seq Scan on documents (cost=0.00..24846.00 rows=1000 width=66) (actual time=0.015..245.924
rows=1 loops=1)
Filter: ('{"f1": 10}'::jsonb <@ data)
!

For
More
Information…
• http://www.postgresql.org/docs/current/static/
indexes.html

gist.html

gin.html

spgist.html

• GiST
+
GIN
+
Full
Text
Search:

– http://www.postgresql.org/docs/current/static/textsearch-‐
indexes.html
84

Conclusion
• Postgres
has
*a
lot*
of
different
types
of

indexes,
and
variations
on
each
of
its
engines

• Extensions
make
use
of
PostgreSQL
indexes

– PostGIS

• Need
to
understand
where
index
usage
is

appropriate
in
your
application
85

Thanks!
• http://nyc.pgconf.us

• Twitter:
@jkatz05
86

Indexing Complex PostgreSQL Data Types

More Related Content

Indexing Complex PostgreSQL Data Types

Editor's Notes