Can we use this index, please? – Why not?

It’s Christmas time and relatively quiet in my day job, so let’s make it story time again! One more tale from the trenches: how wrong you can go with one table and one index?

Several weeks ago, a user asked me why one of the queries had an “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes, but 3 minutes – when you have a several hundred million row table and your select yields just over a thousand rows, it’s a classical “short query,” so you should be able to get results in milliseconds.

The original query was over a view with self-join, and at first, I suspected that something was wrong with the view itself, but then I got it down to one SELECT from one table, which was indeed super-slow: taking minutes while it should have taken seconds. The “inconsistency” was due to the high I/O and dependent on what was in the shared buffers at the execution time. The query looked like this:

SELECT * FROM large_table
  WHERE col1='AAA'
  AND col2='BCD'
  AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
  AND extract (hour FROM created_at)=16
  AND extract (minute FROM created_at)=15

There was an index on all of the attributes which were referenced in the query:

CREATE INDEX large_table_index ON large_table (col1, col2, created_at);

The query plan looked perfect: INDEX SCAN using that index; however, the query was incredibly slow because, for each fetched record, the hour and minute had to be verified (and you’ve already guessed that the table was not only large but also wide).

According to the execution plan, the number of rows selected during the index scan was about 30M, and subsequent filtering reduced it to a little bit over 1K. I started to think that as ridiculous as it sounds, it could be a good idea to create an additional partial index or to include the “hour” and “minute” parts into the index. (Un)fortunately, both of these solutions didn’t work because extract and other alternatives are not immutable and can’t be used in indexes. I didn’t know what to do, but at some point, I ran

SELECT count(*) FROM large_table WHERE col1='AAA' AND col2='BCD' AND created_at BETWEEN '01-01-2012' AND '12-31-2012' AND extract (hour FROM created_at)=16 AND extract (minute FROM created_at)=15

just because I needed this count, and to my astonishment, it ran in milliseconds! Immediately, I ran EXPLAIN ANALYZE and saw that in this case, Postgres chose INDEX ONLY SCAN! Since the whole record was not needed, the filtering was performed in the index blocks themselves!

That was great, and there was no reason the original query could not be optimized the same way, but how could I explain it to the query planner? I remembered my conversation with the user who mentioned that “in most cases, this index works perfectly, and the results for any interval are returned very fast.” Do not ask why I had decided to rewrite a query as presented below, but it did the trick! I guess, in the moments like this, I do “think like Postgres.”

SELECT * FROM large_table
WHERE (col1, col2, created_at) IN (
   SELECY col1, col2, created_at 
   FROM large_table
     WHERE col1='AAA'
     AND col2='BCD'
     AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
     AND extract (hour FROM created_at)=16
     AND extract (minute FROM created_at)=15)

I hope you enjoyed reading this Christmas story as much as I enjoyed sharing it!

15 responses to “Can we use this index, please? – Why not?”

virendercse

December 30, 2024 at 6:25 am

Nice trick, Thanks for sharing. I wonder why Postgres could not filter on it’s own before reading the heap block.

BTW I was able to create a partial index:

create table t (dt timestamp);

create index idx on t (dt) where extract (hour FROM dt)=16 AND extract (minute FROM dt)=15;

Hettie D.

December 30, 2024 at 6:37 am

Which version was it? Now I am curious!

Pingback: PostgreSQL and Indexing on EXTRACT() – Curated SQL
tomjohnkin

December 30, 2024 at 4:39 pm

Thanks Hettie for this excellent blog.

- Hettie D.
  
  December 30, 2024 at 8:38 pm
  
  You’re welcome 🙂
  
M.S. Dousti

December 30, 2024 at 8:34 pm

There are several extract functions, all of which are immutable except the one on timestamptz.

Here’s the output from `df+ extract` in psql:

The reasoning is that when time zones are involved, a change in Postgres config (PGTZ) can change the result of this function, as shown in this example: https://dbfiddle.uk/on2PgaNz

If the column type is timestamptz, but you are certain that the database timezone will never change, you can “trick” Postgres by defining an immutable function that calls `extract` internally: https://dbfiddle.uk/GfR671be

- Hettie D.
  
  December 30, 2024 at 8:37 pm
  
  Yes, I understand the reasoning for why this function is mutable, and I know that one of the ways to get around it is to build an immutable version, however, I really wanted to avoid building of additional index(es) because I knew that everything was already there!
  
  - Sadeq Dousti
    
    December 30, 2024 at 8:44 pm
    
    Makes sense, especially with large tables!
    
    This part of the article, however, led me to think that you wanted to create it any way, but Postgres didn’t let you
    
    I started to think that as ridiculous as it sounds, it could be a good idea to create an additional partial index or to include the “hour” and “minute” parts into the index. (Un)fortunately, both of these solutions didn’t work because extract and other alternatives are not immutable and can’t be used in indexes.
    
    - Hettie D.
      
      December 30, 2024 at 8:51 pm
      
      It was an excuse :). Everyone around me was saying “why you don’t create a partial index?” If there won’t be a better way, I would choose that route, but in this case, there was a way!
      
    - Hettie D.
      
      December 30, 2024 at 8:52 pm
      
      It’s a Christmas story, after all, and it should not be boring 🙂
      
      - Sadeq Dousti
        
        December 30, 2024 at 9:04 pm
        
        Yes, yyes, a fantastic one! Thank you for sharing.
        
        
        Hettie D.
        
        December 30, 2024 at 9:41 pm
        
        😀
        
@FranckPachot

January 3, 2025 at 5:37 am

Happy New Year! Thanks for this great rewrite tip to workaround the absence of filtering on index entries.

It inspires me the following post:
https://dev.to/franckpachot/index-filtering-in-postgresql-and-yugabytedb-1ck7

- Hettie D.
  
  January 3, 2025 at 6:40 am
  
  Thank you for the feedback! reading your post now 🙂
  
- Hettie D.
  
  January 6, 2025 at 10:05 pm
  
  Thank you for your blog post, it was very informative! I have to say, however, that “to text” function didn’t work in my case either – it was also not immutable. So what I did was pretty much the only possible option.

	Hettie D. on Can we use this index, please?…
	@FranckPachot on Can we use this index, please?…
	Hettie D. on Optimizing access to partition…
	Pavel Luzanov on Optimizing access to partition…
	Hettie D. on Can we use this index, please?…

Can we use this index, please? – Why not?

15 responses to “Can we use this index, please? – Why not?”

Leave a comment Cancel reply

Pages

Recent Posts

Recent Comments

Archives

Categories

Follow Blog via Email

Meta

Can we use this index, please? – Why not?

Share this:

Related

15 responses to “Can we use this index, please? – Why not?”

Leave a comment Cancel reply

Pages

Recent Posts

Recent Comments

Archives

Categories

Follow Blog via Email

Meta