PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
The document provides an overview of PostgreSQL performance tuning. It discusses caching, query processing internals, and optimization of storage and memory usage. Specific topics covered include the PostgreSQL configuration parameters for tuning shared buffers, work memory, and free space map settings.
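A minimal sketch of the kind of tuning the talk covers, using `ALTER SYSTEM`; the values are hypothetical starting points, not recommendations from the document:

```sql
-- Hypothetical starting values; tune for your workload and hardware.
ALTER SYSTEM SET shared_buffers = '4GB';   -- often sized around 25% of RAM
ALTER SYSTEM SET work_mem = '64MB';        -- per sort/hash operation, per backend
SELECT pg_reload_conf();                   -- shared_buffers still requires a restart
```

Note that the free space map parameters mentioned (e.g. `max_fsm_pages`) only exist as configuration settings in releases before PostgreSQL 8.4; later versions manage the free space map automatically.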
PostgreSQL 15 and its Major Features - (Aakash M - Mydbops)
PostgreSQL 15 was released on October 13. This presentation covers the major features and improvements in PostgreSQL 15, and how they deliver better performance and greater reliability.
Robert Haas
Why does my query need a plan? Sequential scan vs. index scan. Join strategies. Join reordering. Joins you can't reorder. Join removal. Aggregates and DISTINCT. Using EXPLAIN. Row count and cost estimation. Things the query planner doesn't understand. Other ways the planner can fail. Parameters you can tune. Things that are nearly always slow. Redesigning your schema. Upcoming features and future work.
[Pgday.Seoul 2017] 6. The story of GIN vs GiST indexes - 박진우
B-tree is ideal for unique values while GIN is ideal for indexes with many duplicates. GiST can index most data types and is useful for operations like containment and overlap. A comparison found that GIN indexes have faster search times but slower update times than GiST indexes, and GIN indexes are larger in size and take longer to build. In summary, the best index type depends on the data characteristics and query operations.
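The tradeoff above can be sketched on a hypothetical table (the table and column names are illustrative, not from the talk):

```sql
-- B-tree suits the unique key column; GIN suits the many-duplicates array column.
CREATE TABLE docs (
    id   bigint PRIMARY KEY,   -- B-tree index created implicitly
    tags text[]
);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);

-- GIN accelerates containment queries like this one:
SELECT id FROM docs WHERE tags @> ARRAY['postgres'];

-- A GiST index would instead be the choice for overlap/containment on
-- geometric or range columns, e.g.:
--   CREATE INDEX ON rooms USING gist (bounding_box);
```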
The document discusses the Performance Schema in MySQL. It provides an overview of what the Performance Schema is and how it can be used to monitor events within a MySQL server. It also describes how to configure the Performance Schema by setting up actors, objects, instruments, consumers and threads to control what is monitored. Finally, it explains how to initialize the Performance Schema by truncating existing summary tables before collecting new performance data.
The document discusses PostgreSQL query planning and tuning. It covers the key stages of query execution including syntax validation, query tree generation, plan estimation, and execution. It describes different plan nodes like sequential scans, index scans, joins, and sorts. It emphasizes using EXPLAIN to view and analyze the execution plan for a query, which can help identify performance issues and opportunities for optimization. EXPLAIN shows the estimated plan while EXPLAIN ANALYZE shows the actual plan after executing the query.
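The EXPLAIN distinction described above looks like this in practice (hypothetical table):

```sql
-- Shows only the planner's estimated plan and costs; the query is not run.
EXPLAIN SELECT * FROM docs WHERE id = 42;

-- Actually executes the query, then reports estimated vs. actual row
-- counts and real timings per plan node.
EXPLAIN ANALYZE SELECT * FROM docs WHERE id = 42;
```

Comparing estimated and actual row counts in the `EXPLAIN ANALYZE` output is the usual way to spot stale statistics or misestimated predicates.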
1. The document summarizes a presentation about parallel query in AWS Aurora. It discusses Aurora architecture, parallel query features and implementation steps, use cases, prerequisites, and provides examples testing performance with and without parallel query enabled.
2. Parallel query allows SQL queries to execute in parallel across multiple Aurora nodes, improving performance for queries with certain characteristics like equal, in, and range filters.
3. Test results show parallel query significantly reducing query execution time from hours to minutes for large analytical queries on a 255GB database.
Top 10 Mistakes When Migrating From Oracle to PostgreSQL - Jim Mlodgenski
As more and more people move to PostgreSQL from Oracle, a pattern of mistakes is emerging. They can be caused by the tools being used, or simply by not understanding how PostgreSQL differs from Oracle. In this talk we will discuss the top mistakes people generally make when moving to PostgreSQL from Oracle, and what the correct course of action is.
The document discusses tuning autovacuum in PostgreSQL. It provides an overview of autovacuum, how it helps prevent database bloat, and best practices for configuring autovacuum parameters like autovacuum_vacuum_threshold, autovacuum_analyze_threshold, autovacuum_naptime, and autovacuum_max_workers. It emphasizes regularly monitoring for bloat, configuring autovacuum appropriately based on table sizes and usage, and avoiding manual vacuuming when not needed.
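As a sketch of the per-table configuration the summary mentions (table name and values are hypothetical):

```sql
-- Vacuum a hot, frequently-updated table more aggressively than the default.
ALTER TABLE orders SET (
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_scale_factor = 0.01
);

-- Cluster-wide knobs such as autovacuum_naptime and autovacuum_max_workers
-- live in postgresql.conf (autovacuum_max_workers requires a restart).
```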
This talk explores PostgreSQL 15 enhancements (along with some history) and looks at how they improve developer experience (MERGE and SQL/JSON), optimize support for backups and compression, logical replication improvements, enhanced security and performance, and more.
re:Invent 2022 DAT326 Deep dive into Amazon Aurora and its innovations - Grant McAlister
With an innovative architecture that decouples compute from storage as well as advanced features like Global Database and low-latency read replicas, Amazon Aurora reimagines what it means to be a relational database. The result is a modern database service that offers performance and high availability at scale, fully open-source MySQL- and PostgreSQL-compatible editions, and a range of developer tools for building serverless and machine learning-driven applications. In this session, dive deep into some of the most exciting features Aurora offers, including Aurora Serverless v2 and Global Database. Also learn about recent innovations that enhance performance, scalability, and security while reducing operational challenges.
Learn more at www.opensourceschool.fr
This material is distributed under the Creative Commons (CC BY-SA 3.0 FR) Attribution - ShareAlike 3.0 France license
Outline:
1. Introduction
2. Installation
3. The psql client
4. Authentication and privileges
5. Backup and restoration
6. Internal Architecture
7. Performance optimization
8. Stats and monitoring
9. Logs
10. Replication
This document provides an overview of PostgreSQL, including its history, capabilities, advantages over other databases, best practices, and references for further learning. PostgreSQL is an open source relational database management system that has been in development for over 30 years. It offers rich SQL support, high performance, ACID transactions, and extensive extensibility through features like JSON, XML, and programming languages.
Bruce Momjian - Inside PostgreSQL Shared Memory @ Postgres Open
PostgreSQL uses shared memory structures to coordinate access to data across multiple database processes. The main shared memory structures include shared buffers for caching data pages, a proc array for tracking server processes, lightweight locks for synchronizing access to shared resources, and lock hashes for coordinating locks on database objects. Other shared structures store information for multi-version concurrency control, two-phase commit, subtransactions, the write-ahead log, and background worker synchronization.
This document discusses PostgreSQL's VACUUM utility. It explains that VACUUM is needed to reclaim space from deleted and updated tuples, prevent transaction ID wraparound issues, and update statistics. The document covers various aspects that interact with VACUUM like commit logs, visibility maps, and free space maps. It also describes the tasks performed by VACUUM, options available, and tuning autovacuum. Finally, it provides a high-level overview of the internal workings of VACUUM.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
This document discusses PostgreSQL statistics and how to use them effectively. It provides an overview of various PostgreSQL statistics sources like views, functions and third-party tools. It then demonstrates how to analyze specific statistics like those for databases, tables, indexes, replication and query activity to identify anomalies, optimize performance and troubleshoot issues.
The paperback version is available on lulu.com there http://goo.gl/fraa8o
This is the first volume of the PostgreSQL database administration book. The book covers the steps for installing, configuring, and administering PostgreSQL 9.3 on Debian Linux. It covers both the logical and physical aspects of PostgreSQL. Two chapters are dedicated to the backup/restore topic.
Have you ever needed to get some additional write throughput from MySQL? If yes, you probably found that setting sync_binlog to 0 (and trx_commit to 2) gives you an extra performance boost. As with all such easy optimisations, it comes at a cost. This talk explains how this tuning works, presents its consequences, and makes recommendations to avoid them. This will bring us to the details of how MySQL commits transactions and how those are replicated to slaves. Come to this talk to learn how to get the benefit of this tuning the right way, and to learn some replication internals.
Efficient Query Processing in Geographic Web Search Engines - Yen-Yu Chen
Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
The document discusses PostgreSQL extensions for indexing and querying semi-structured data like JSON. It introduces hstore, an existing PostgreSQL extension for storing key-value pairs, and notes its limitations compared to JSON. It then summarizes upcoming talks on supporting JSON natively in PostgreSQL, including indexing JSON with GIN and GIST indexes, a JSON query language called Jsquery, and a new indexing access method called VODKA. Exercises are also planned for working with JSON GIN indexes and Jsquery.
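The jsonb-with-GIN pattern the abstract refers to can be sketched as follows (hypothetical table; jsonb and its GIN support arrived in PostgreSQL 9.4):

```sql
CREATE TABLE events (body jsonb);

-- Default jsonb GIN operator class supports containment (@>) and
-- key-existence (?) operators.
CREATE INDEX events_body_gin ON events USING gin (body);

-- Containment query answered via the GIN index:
SELECT * FROM events WHERE body @> '{"type": "login"}';
```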
Building a Complex, Real-Time Data Management Application - Jonathan Katz
Congratulations: you've been selected to build an application that will manage whether or not the rooms for PGConf.EU are being occupied by a session!
On the surface, this sounds simple, but we will be managing the rooms of PGConf.EU, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the PGConf.EU website checking to see what availability each of the PGConf.EU rooms has.
To do this, we will explore the following PostgreSQL features:
* Data types and their functionality, such as:
* Data/Time types
* Ranges
* Indexes such as:
* GiST
* SP-GiST
* Common Table Expressions and Recursion
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all PGConf.EU attendees, made possible by the innovation of PostgreSQL!
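A core piece of such a room-booking schema can be sketched with a range type and a GiST exclusion constraint (hypothetical names; `btree_gist` is needed so the integer column can participate in the GiST constraint):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- The exclusion constraint rejects any two rows for the same room
-- whose time ranges overlap (&&), enforced by a GiST index.
CREATE TABLE room_bookings (
    room_id int,
    during  tstzrange,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
```

Inserting two overlapping bookings for the same room then fails with a constraint violation instead of silently double-booking.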
PostgreSQL is an object-relational database management system (ORDBMS) that is cross-platform and implements the majority of the SQL:2011 standard. It uses a client-server model with a postmaster daemon process that manages connections to backend server processes. PostgreSQL supports features like ACID compliance, transactions, complex queries, user-defined objects, and built-in replication. It allows custom indexing and inheritance between tables. To get started, users can create databases and tables, populate them with data, and perform queries.
Ryan will expand on his popular blog series and drill down into the internals of the database. Ryan will discuss optimizing query performance, best indexing schemes, how to manage clustering (including meta and data nodes), the impact of IFQL on the database, the impact of cardinality on performance, TSI, and other internals that will help you architect better solutions around InfluxDB.
The document discusses SQL query performance analysis. It covers topics like the query optimizer, execution plans, statistics analysis, and different types of queries and indexes. The query optimizer uses cost-based optimization to determine the most efficient execution plan. Statistics help it estimate cardinality and choose good plans, so keeping statistics up to date is important. The goal is to have queries use indexes through seeks instead of scans whenever possible to improve performance.
PostgreSQL and the future
Aaron Thul discusses PostgreSQL 9.0 which includes new features like streaming replication and improved error messages. He talks about the growing PostgreSQL community and major events. Potential threats to PostgreSQL include patent attacks and hiring away volunteer developers. The presentation encourages best practices like avoiding unnecessary data types and indexes to improve performance.
The document discusses SQL query performance analysis. It covers topics like the query optimizer, execution plans, statistics analysis, and different types of queries and scanning. The query optimizer is cost-based and determines the most efficient execution plan using cardinality estimates and cost models. Ad hoc queries are non-parameterized queries that SQL Server treats differently than prepared queries. Execution plans show the steps and methods used to retrieve and process data. Statistics help the optimizer generate accurate cardinality estimates to pick high-performing plans.
JSONB data types, corresponding indexes, and the jsquery module - Oleg Bartunov, ... - Yandex
- The document discusses schema-less PostgreSQL, including current and future features like hstore and JSON support
- Hstore was introduced in 2003 and provides a flexible way to store semi-structured data, but has limitations as it only supports key-value pairs
- JSON has become more popular and supports hierarchical data structures, but early implementations in PostgreSQL were slow due to textual storage
- Recent developments include the introduction of binary-stored JSONB in PostgreSQL 9.4, which addresses performance issues by avoiding reparsing and supports indexing
- JSONB outperforms regular JSON for input, access, and search performance on real-world bookmark data, with up to 20x faster access times for getting values by key
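The access pattern the benchmark measured (getting a value by key) looks like this in jsonb, on a hypothetical bookmarks table:

```sql
-- ->> extracts a value by key as text; @> is indexable containment.
SELECT body->>'title'
FROM bookmarks
WHERE body @> '{"tags": ["postgres"]}';
```

With plain `json` storage the document is reparsed on every access, which is the performance gap jsonb's binary format closes.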
The document discusses spatial SQL and databases. It provides an agenda for installing software, building a database, importing shapefiles, and writing queries. It then defines spatial SQL, discusses drivers for increased use of location data, and lists databases that support spatial SQL like Oracle, MySQL, SQL Server, Spatial Lite, and PostGIS. Finally, it covers functionality of spatial databases and shapefiles, and capabilities of PostGIS like spatial indexing.
Basic concepts. Webinar 1: Introduction to NoSQL - MongoDB
This document contains an agenda and summary for a webinar on introducing NoSQL databases. The webinar covers why NoSQL databases exist, different types of NoSQL databases including key-value stores, column stores, graph stores, multi-model databases and document stores. It also discusses MongoDB specifically, covering its document data model, indexing, querying, aggregation capabilities, replication and sharding for scalability. The webinar invites participants to a follow up session on building a first MongoDB application.
SQL-based databases have been around for decades and they power a wide range of applications. So what exactly do NoSQL databases bring to the table? In this webcast, you'll find out how NoSQL can liberate your development cycle, allow your application to scale and improve your system's uptime.
This document discusses various techniques for enhancing the performance of .NET applications, including:
1) Implementing value types correctly by overriding Equals, GetHashCode, and IEquatable<T> to avoid boxing;
2) Applying precompilation techniques like NGen to improve startup time;
3) Using unsafe code and pointers for high-performance scenarios like reading structures from streams at over 100 million structures per second;
4) Choosing appropriate collection types like dictionaries for fast lookups or linked lists for fast insertions/deletions.
10 Reasons to Start Your Analytics Project with PostgreSQL - Satoshi Nagayasu
PostgreSQL provides several advantages for analytics projects:
1) It allows connecting to external data sources and performing analytics queries across different data stores using features like foreign data wrappers.
2) Features like materialized views, transactional DDLs, and rich SQL capabilities help build effective data warehouses and data marts for analytics.
3) Performance optimizations like table partitioning, BRIN indexes, and parallel queries enable PostgreSQL to handle large datasets and complex queries efficiently.
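The partitioning and BRIN techniques listed above can be sketched as follows (hypothetical table; BRIN works best when values correlate with physical row order, as in append-only time-series data):

```sql
-- Range partitioning by timestamp (declarative partitioning, PG 10+).
CREATE TABLE metrics (ts timestamptz, value numeric)
    PARTITION BY RANGE (ts);

CREATE TABLE metrics_2024 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- BRIN index: tiny compared to B-tree, summarizes block ranges.
CREATE INDEX metrics_ts_brin ON metrics USING brin (ts);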
Nearly every application uses some sort of data storage. Proper data structure can lead to increased performance, reduced application complexity, and ensure data integrity. Foreign keys, indexes, and correct data types truly are your best friends when you respect them and use them for the correct purposes. Structuring data to be normalized and with the correct data types can lead to significant performance increases. Learn how to structure your tables to achieve normalization, performance, and integrity, by building a database from the ground up during this tutorial.
This document provides an overview and introduction to Cassandra including:
- An agenda that outlines the topics covered in the overview including architecture, data modeling differences from RDBMS, and CQL.
- Recommended resources for learning more about Cassandra including documentation, video courses, books, and articles.
- Requirements that Cassandra aims to meet for database management including scaling, uptime, performance, and cost.
- Key aspects of Cassandra including being open source, distributed, decentralized, scalable, fault tolerant, and using a flexible data model.
- Examples of large companies that use Cassandra in production including Apple, Netflix, eBay, and others handling large datasets.
SQL, SciPy, Streamlit: Introduction to the basics of SQL, SciPy and Streamlit - chaitalidarode1
This document provides information about the Software Workshop Lab course. The course aims to help students understand database, Python, and data processing/visualization concepts. It covers MySQL, scientific computation using Python, Pandas, Matplotlib, and application development with Streamlit. The course has 4 modules that cover basic SQL, scientific Python libraries, data processing/visualization, and Streamlit app development. Students will learn skills applicable for jobs in software and IT industries.
This document provides an overview and agenda for a presentation on Amazon DynamoDB. It discusses key features of DynamoDB including its data model, scaling capabilities, and data modeling best practices. It also provides examples of how to model and query common data scenarios like event logging, product catalogs, messaging apps, and multiplayer gaming.
2015-12-05 Alexander Korotkov, Ivan Panchenko - Semi-structured data... - HappyDev
The emergence of a large number of NoSQL DBMSs is driven by the requirements of modern information systems, which most traditional relational databases do not satisfy. One such requirement is support for data whose structure is not defined in advance. However, choosing a NoSQL database for the sake of schema-less data can mean losing advantages that mature SQL solutions provide, namely transactions and fast row reads from tables. PostgreSQL, a leading relational DBMS, supported semi-structured data long before NoSQL appeared, and that support gained new life in the latest release with the jsonb data type, which not only supports the JSON standard but also delivers performance comparable to, or even exceeding, the most popular NoSQL DBMSs.
Vectors are the new JSON in PostgreSQL (SCaLE 21x) - Jonathan Katz
Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. The heightened ease-of-use of AI/ML has led to a surge of interest in storing vector data alongside application data, leading to some unique challenges. PostgreSQL has seen this story before with JSON, when JSON became the lingua franca of the web. So how can you use PostgreSQL to manage your vector data, and what challenges should you be aware of?
In this session, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including looking at best practices for using pgvector, an extension that adds additional vector search capabilities to PostgreSQL. Finally, we'll review ongoing development in both PostgreSQL and pgvector that will make it easier and more performant to search vector data in PostgreSQL.
There are parallels between storing JSON data in PostgreSQL and storing vectors that are produced from AI/ML systems. This lightning talk briefly covers the similarities in use-cases in storing JSON and vectors in PostgreSQL, shows some of the use-cases developers have for querying vectors in Postgres, and some roadmap items for improving PostgreSQL as a vector database.
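A minimal pgvector sketch of the searches described above (hypothetical table; a 3-dimensional vector keeps the example short, real embeddings are much wider):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    embedding vector(3)
);

-- Exact nearest-neighbour search by L2 distance (<-> operator).
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;

-- Approximate index for larger datasets (pgvector's HNSW support).
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```

pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product, each with matching operator classes.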
Build a Complex, Realtime Data Management App with Postgres 14! - Jonathan Katz
Congratulations: you've been selected to build an application that will manage reservations for rooms!
On the surface, this sounds simple, but you are building a system for managing a high traffic reservation web page, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the website checking to see what availability each room has.
Fortunately, PostgreSQL is prepared for this! And even better, we will be using Postgres 14 to make the problem even easier!
We will explore the following PostgreSQL features:
* Data types and their functionality, such as:
* Data/Time types
* Ranges / Multiranges
* Indexes such as:
* GiST
* Common Table Expressions and Recursion (though multiranges will make things easier!)
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all users made possible by the innovation of PostgreSQL!
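A taste of why multiranges (new in Postgres 14) simplify availability logic: a single multirange value can hold several disjoint intervals, and set operations work directly on it (times below are hypothetical):

```sql
-- A room's free slots minus a newly booked interval, in one expression:
SELECT tstzmultirange(
           tstzrange('2024-01-01 09:00+00', '2024-01-01 10:00+00'),
           tstzrange('2024-01-01 13:00+00', '2024-01-01 14:00+00')
       )
     - tstzmultirange(
           tstzrange('2024-01-01 09:30+00', '2024-01-01 09:45+00')
       );
```

With plain ranges this subtraction could split one range into two, forcing procedural code; multiranges absorb the split automatically.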
Get Your Insecure PostgreSQL Passwords to SCRAM - Jonathan Katz
Passwords: they just seem to work. You connect to your PostgreSQL database and you are prompted for your password. You type in the correct character combination, and presto! you're in, safe and sound.
But what if I told you that all was not as it seemed. What if I told you there was a better, safer way to use passwords with PostgreSQL? What if I told you it was imperative that you upgraded, too?
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), specified in RFC 5802, as a way to securely authenticate passwords. The SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other, using a series of cryptographic methods.
In this talk, we will look at:
* A history of the evolution of password storage and authentication in PostgreSQL
* How SCRAM works with a step-by-step deep dive into the algorithm (and convince you why you need to upgrade!)
* SCRAM channel binding, which helps prevent MITM attacks during authentication
* How to safely set and modify your passwords, as well as how to upgrade to SCRAM-SHA-256 (which we will do live!)
all of which will be explained by some adorable elephants and hippos!
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers supports it, how to upgrade your passwords to using SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
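The key-derivation steps behind SCRAM can be sketched in a few lines. Below is a minimal illustration — not PostgreSQL's implementation — of how a SCRAM-SHA-256 verifier is derived per RFC 5802/RFC 7677, using Python's standard hashlib and hmac; the password and salt are placeholders:

```python
import hashlib
import hmac
import os

def scram_sha256_verifier(password: str, salt: bytes, iterations: int = 4096):
    """Derive the pieces a SCRAM-SHA-256 server stores (RFC 5802/7677).

    The server keeps StoredKey and ServerKey -- never the password itself --
    so the password is never sent or stored in recoverable form.
    """
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    return stored_key, server_key

# Hypothetical credentials, for illustration only
salt = os.urandom(16)
stored_key, server_key = scram_sha256_verifier("correct horse battery", salt)
```

During authentication, client and server exchange nonces and proofs computed from these keys, so each side can verify the other knows the password without transmitting it.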
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
Jonathan S. Katz gave a talk on safely protecting passwords in PostgreSQL. He discussed:
- The evolution of password management in PostgreSQL, from storing passwords in plain text to using md5 hashes to modern SCRAM authentication.
- How plain text and md5 password storage are insecure as passwords can be intercepted or cracked.
- The SCRAM authentication standard which allows two parties to verify they know a secret without exchanging the secret directly.
- How PostgreSQL implements SCRAM-SHA-256 to generate a secure verifier from the password and authenticate users with random salts and iterations to secure against brute force attacks.
Operating PostgreSQL at Scale with KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
All this sounds great, but if you are new to the world of containers, it can be very overwhelming to find a place to start. In this talk, which centers around demos, we will see how you can get PostgreSQL up and running in a containerized environment with some advanced sidecars in only a few steps! We will also see how it extends to a larger production environment with Kubernetes, and what the future holds for PostgreSQL in a containerized world.
We will cover the following:
* Why containers are important and what they mean for PostgreSQL
* Create a development environment with PostgreSQL, pgadmin4, monitoring, and more
* How to use Kubernetes to create your own "database-as-a-service"-like PostgreSQL environment
* Trends in the container world and how it will affect PostgreSQL
At the conclusion of the talk, you will understand the fundamentals of how to use container technologies with PostgreSQL and be on your way to running a containerized PostgreSQL environment at scale!
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL along with pgadmin4 and monitoring
- Running PostgreSQL on Kubernetes with a Demo
- Trends in the container world and how it will affect PostgreSQL
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL container
- Extending your setup with a pgadmin4 container
- Container orchestration: What this means, and how to use Kubernetes to leverage database-as-a-service with PostgreSQL
- Trends in the container world and how it will affect PostgreSQL
Developing and Deploying Apps with the Postgres FDWJonathan Katz
This document summarizes Jonathan Katz's experience building a foreign data wrapper (FDW) between two PostgreSQL databases to enable an API for his company VenueBook. He created separate "app" and "api" databases, with the api database using FDWs to access tables in the app database. This allowed inserting and querying data across databases. However, he encountered permission errors and had to grant various privileges on the remote database to make it work properly, demonstrating the importance of permissions management with FDWs.
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
* Key-value store (hstore)
* JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations through some of my experimentations. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied to.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
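The idea behind KNN search can be sketched without any index at all: rank candidates by distance to the query point and keep the k closest. A brute-force illustration with made-up points (a KNN-GiST index reaches the same answer without scanning every row, by walking the tree in order of distance):

```python
import heapq
import math

def knn(points, query, k):
    """Brute-force k-nearest-neighbor by Euclidean distance: O(n log k)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

# Hypothetical coordinates, for illustration only
offices = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.5)]
print(knn(offices, (1.2, 0.9), 2))  # the two closest points
```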
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
All data is relational and can be represented through relational algebra, right? Perhaps, but there are other ways to represent data, and the PostgreSQL team continues to work on making it easier and more efficient to do so!
With the upcoming 9.4 release, PostgreSQL is introducing the "JSONB" data type which allows for fast, compressed, storage of JSON formatted data, and for quick retrieval. And JSONB comes with all the benefits of PostgreSQL, like its data durability, MVCC, and of course, access to all the other data types and features in PostgreSQL.
How fast is JSONB? How do we access data stored with this type? What can it do with the rest of PostgreSQL? What can't it do? How can we leverage this new data type and make PostgreSQL scale horizontally? Follow along with our presentation as we try to answer these questions.
Just like life, our code must evolve to meet the demands of an ever-changing world. Adaptability is key in developing for the web, tablets, APIs, or serverless applications. Multi-runtime development is the future, and that future is dynamic. Enter BoxLang: Dynamic. Modular. Productive. (www.boxlang.io)
BoxLang transforms development with its dynamic design, enabling developers to write expressive, functional code effortlessly. Its modular architecture ensures flexibility, allowing easy integration into your existing ecosystems.
Interoperability at Its Core
BoxLang boasts 100% interoperability with Java, seamlessly blending traditional and modern development practices. This opens up new possibilities for innovation and collaboration.
Multi-Runtime Versatility
From a compact 6MB OS binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, WebAssembly, Android, and more, BoxLang is designed to adapt to any runtime environment. BoxLang combines modern features from CFML, Node, Ruby, Kotlin, Java, and Clojure with the familiarity of Java bytecode compilation. This makes it the go-to language for developers looking to the future while building a solid foundation.
Empowering Creativity with IDE Tools
Unlock your creative potential with powerful IDE tools designed for BoxLang, offering an intuitive development experience that streamlines your workflow. Join us as we redefine JVM development and step into the era of BoxLang. Welcome to the future.
Understanding Traditional AI with Custom Vision & MuleSoft.pptxshyamraj55
Understanding Traditional AI with Custom Vision & MuleSoft.pptx | ### Slide Deck Description:
This presentation features Atul, a Senior Solution Architect at NTT DATA, sharing his journey into traditional AI using Azure's Custom Vision tool. He discusses how AI mimics human thinking and reasoning, differentiates between predictive and generative AI, and demonstrates a real-world use case. The session covers the step-by-step process of creating and training an AI model for image classification and object detection—specifically, an ad display that adapts based on the viewer's gender. Atul highlights the ease of implementation without deep software or programming expertise. The presentation concludes with a Q&A session addressing technical and privacy concerns.
Future-Proof Your Career with AI OptionsDianaGray10
Learn about the difference between automation, AI and agentic and ways you can harness these to further your career. In this session you will learn:
Introduction to automation, AI, agentic
Trends in the marketplace
Take advantage of UiPath training and certification
In demand skills needed to strategically position yourself to stay ahead
❓ If you have any questions or feedback, please refer to the "Women in Automation 2025" dedicated Forum thread. You can find there extra details and updates.
World Information Architecture Day 2025 - UX at a CrossroadsJoshua Randall
User Experience stands at a crossroads: will we live up to our potential to design a better world? or will we be co-opted by “product management” or another business buzzword?
Looking backwards, this talk will show how UX has repeatedly failed to create a better world, drawing on industry data from Nielsen Norman Group, Baymard, MeasuringU, WebAIM, and others.
Looking forwards, this talk will argue that UX must resist hype, say no more often and collaborate less often (you read that right), and become a true profession — in order to be able to design a better world.
Formal Methods: Whence and Whither? [Martin Fränzle Festkolloquium, 2025]Jonathan Bowen
Alan Turing arguably wrote the first paper on formal methods 75 years ago. Since then, there have been claims and counterclaims about formal methods. Tool development has been slow but aided by Moore’s Law with the increasing power of computers. Although formal methods are not widespread in practical usage at a heavyweight level, their influence has crept into software engineering practice to the extent that they are no longer necessarily called formal methods in their use. In addition, in areas where safety and security are important, with the increasing use of computers in such applications, formal methods are a viable way to improve the reliability of such software-based systems. Their use in hardware, where a mistake can be very costly, is also important. This talk explores the journey of formal methods to the present day and speculates on future directions.
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog GavraScyllaDB
Learn how Responsive replaced embedded RocksDB with ScyllaDB in Kafka Streams, simplifying the architecture and unlocking massive availability and scale. The talk covers unbundling stream processors, key ScyllaDB features tested, and lessons learned from the transition.
Gojek Clone is a versatile multi-service super app that offers ride-hailing, food delivery, payment services, and more, providing a seamless experience for users and businesses alike on a single platform.
DealBook of Ukraine: 2025 edition | AVentures CapitalYevgen Sysoyev
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2024 and the first deals of 2025.
Field Device Management Market Report 2030 - TechSci ResearchVipin Mishra
The Global Field Device Management (FDM) Market is expected to experience significant growth in the forecast period from 2026 to 2030, driven by the integration of advanced technologies aimed at improving industrial operations.
📊 According to TechSci Research, the Global Field Device Management Market was valued at USD 1,506.34 million in 2023 and is anticipated to grow at a CAGR of 6.72% through 2030. FDM plays a vital role in the centralized oversight and optimization of industrial field devices, including sensors, actuators, and controllers.
Key tasks managed under FDM include:
Configuration
Monitoring
Diagnostics
Maintenance
Performance optimization
FDM solutions offer a comprehensive platform for real-time data collection, analysis, and decision-making, enabling:
Proactive maintenance
Predictive analytics
Remote monitoring
By streamlining operations and ensuring compliance, FDM enhances operational efficiency, reduces downtime, and improves asset reliability, ultimately leading to greater performance in industrial processes. FDM’s emphasis on predictive maintenance is particularly important in ensuring the long-term sustainability and success of industrial operations.
For more information, explore the full report: https://shorturl.at/EJnzR
Major companies operating in Global Field Device Management Market are:
General Electric Co
Siemens AG
ABB Ltd
Emerson Electric Co
Aveva Group Ltd
Schneider Electric SE
STMicroelectronics Inc
Techno Systems Inc
Semiconductor Components Industries LLC
International Business Machines Corporation (IBM)
#FieldDeviceManagement #IndustrialAutomation #PredictiveMaintenance #TechInnovation #IndustrialEfficiency #RemoteMonitoring #TechAdvancements #MarketGrowth #OperationalExcellence #SensorsAndActuators
DevNexus - Building 10x Development Organizations.pdfJustin Reock
Developer Experience is Dead! Long Live Developer Experience!
In this keynote-style session, we’ll take a detailed, granular look at the barriers to productivity developers face today and modern approaches for removing them. 10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method, we invent to deliver products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches works? DORA? SPACE? DevEx? What should we invest in and create urgency behind today so we don’t have the same discussion again in a decade?
The Future of Repair: Transparent and Incremental by Botond DénesScyllaDB
Regularly run repairs are essential to keep clusters healthy, yet having a good repair schedule is more challenging than it should be. Repairs often take a long time, preventing running them often. This has an impact on data consistency and also limits the usefulness of the new repair based tombstone garbage collection. We want to address these challenges by making repairs incremental and allowing for automatic repair scheduling, without relying on external tools.
UiPath Automation Developer Associate Training Series 2025 - Session 2DianaGray10
In session 2, we will introduce you to Data manipulation in UiPath Studio.
Topics covered:
Data Manipulation
What is Data Manipulation
Strings
Lists
Dictionaries
RegEx Builder
Date and Time
Required Self-Paced Learning for this session:
Data Manipulation with Strings in UiPath Studio (v2022.10) 2 modules - 1h 30m - https://academy.uipath.com/courses/data-manipulation-with-strings-in-studio
Data Manipulation with Lists and Dictionaries in UiPath Studio (v2022.10) 2 modules - 1h - https://academy.uipath.com/courses/data-manipulation-with-lists-and-dictionaries-in-studio
Data Manipulation with Data Tables in UiPath Studio (v2022.10) 2 modules - 1h 30m - https://academy.uipath.com/courses/data-manipulation-with-data-tables-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (平山毅)Tsuyoshi Hirayama
DAO UTokyo 2025
東京大学情報学環 ブロックチェーン研究イニシアティブ
https://utbciii.com/2024/12/12/announcing-dao-utokyo-2025-conference/
Session 1 :DLT mass adoption
IBM Tsuyoshi Hirayama (平山毅)
2. About
• Jonathan S. Katz
  – CTO, VenueBook
  – Co-Organizer, NYC PostgreSQL User Group
  – PGConf NYC 2015
    • Mar 25 - 27, 2015
    • New York Marriott Downtown
    • http://nyc.pgconf.us
  – @jkatz05
3. Quick Overview
• Introductory talk with demos and fun
• B-trees
• GiST: Generalized Search Trees
• GIN: Generalized Inverted Index
• SP-GiST: Space-Partitioned Generalized Search Trees
4. Assumptions
• PostgreSQL 9.3+
  – most will be 9.0+
• PostGIS 2.0+
• Believe it will work for most available versions
• PostgreSQL 9.4 beta?
20. What We Learned
• Without any data structure around search, we rely on "hope"
• Assumed "unique values" and "equality"
  – would have to scan all rows otherwise
• …and what about:
  – INSERT
  – UPDATE
  – DELETE
21. What We Need
• Need a data structure for search that:
  – allows efficient lookups
  – plays nicely with disk I/O
  – does not take too long for updates
22. B-Trees
• "default" index
• quick traversal to leaf nodes
• leaf nodes pre-sorted
• node size designed to fit in disk block size
  – "degree" of nodes has max-size
• theoretical performance
  – reads: O(log n)
  – writes: O(log n)
  – space: O(n)
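The O(log n) read behavior can be illustrated with plain binary search over pre-sorted keys, which is essentially what one descent through a B-tree does (a real B-tree just uses wide, disk-block-sized nodes instead of a flat array). The sample keys here are invented:

```python
import bisect

# A sorted list stands in for B-tree leaf entries: (key, row_pointer) pairs.
entries = sorted((k, f"row-{k}") for k in [42, 7, 19, 88, 3, 57])
keys = [k for k, _ in entries]

def lookup(key):
    """Binary search: O(log n) comparisons, like descending a B-tree."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][1]
    return None

def range_scan(lo, hi):
    """Range predicates (BETWEEN, <=, >=) map to one descent plus a
    walk along the pre-sorted leaf level."""
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return [entries[x][1] for x in range(i, j)]
```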
23. B-Trees and PostgreSQL
• supports
  – <=, <, =, >, >=
  – BETWEEN, IN
  – IS NULL, IS NOT NULL
  – LIKE in the specific case of 'plaintext%'
  – ~ in the specific case of '^plaintext'
  – ILIKE and ~* if the pattern starts with non-alpha characters
• does not support
  – IS NOT DISTINCT FROM
24. B-Trees and PostgreSQL
• data types supported
  – any data type with all the equality operators defined
  – number types
    • integer, numeric, decimal
  – text
    • char, varchar, text
  – date / times
    • timestamptz, timestamp, date, time, timetz, interval
  – arrays, ranges
27. Demo #1 Notes
• Index maintenance
  – VACUUM – "cleans up" after writes on table / indexes
  – ANALYZE – keeps statistics up-to-date for the planner

VACUUM ANALYZE tablename;

• Good idea to leave autovacuum on
28. Indexing in Production
• CREATE INDEX CONCURRENTLY
• REINDEX
  – corruption, bloat, invalid
• FILLFACTOR
  – 10 – 100
  – default: 90
  – strategy: lower % :: write-activity
• TABLESPACE
• NULLS LAST, NULLS FIRST
29. Demo #2: Partial Indexes

CREATE INDEX indexname ON tablename (columnname)
WHERE somecondition;
30. Demo #2 Notes
• Partial indexes are
  – good if known to query a limited subset of a table
  – take up less space
  – allow for much quicker writes
• Like all good things, do not overuse and saturate your I/O
31. Unique Indexes
• only for B-trees
• NULL not unique
• use UNIQUE constraints – automatically create indexes

CREATE TABLE foo (bar int UNIQUE);
-- or
CREATE UNIQUE INDEX foo_bar_idx ON foo (bar);
ALTER TABLE foo ADD CONSTRAINT a_unique
  UNIQUE USING INDEX a_idx;
32. Multi-Column Indexes
• Useful for
  – querying two columns that are frequently queried together
  – enforcing UNIQUEness across columns
    • n.b. creating a UNIQUE constraint on a table creates a UNIQUE INDEX
• PostgreSQL supports
  – up to 32 columns
  – B-tree, GiST, GIN
• Be careful of how you choose initial column order!
33. Multi-Column Indexes

CREATE INDEX multicolumn_idx ON
  tablename (col1, col2);

CREATE UNIQUE INDEX pn_idx ON
  phone_numbers (country_code, national_number)
  WHERE extension IS NULL;
34. Demo #3 Notes
• Multi-column indexes can be
  – efficient for speed + space
  – inefficient with performance
• Usage depends on your application needs
35. Expression Indexes
• can index on expressions to speed up lookups
  – e.g. case-insensitive email addresses
  – can use functions or scalars
    • (x * y) / 100
    • COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
    • LOWER(email_address)
• tradeoff: slower writes
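The mechanics can be sketched in miniature: evaluate the expression once at write time, then look up rows by the precomputed value. A toy illustration — not PostgreSQL's on-disk structure — with hypothetical email data:

```python
# A toy "expression index": the indexed value is an expression of the column
# (here LOWER(email)), computed once per write, so case-insensitive lookups
# avoid re-evaluating the expression against every row.
class ExpressionIndex:
    def __init__(self, expr):
        self.expr = expr      # the indexed expression, e.g. str.lower
        self.index = {}       # expr(value) -> list of row ids

    def insert(self, row_id, value):
        # The write pays the cost of evaluating the expression...
        self.index.setdefault(self.expr(value), []).append(row_id)

    def lookup(self, probe):
        # ...so the read only evaluates it once, on the probe value.
        return self.index.get(self.expr(probe), [])

idx = ExpressionIndex(str.lower)
idx.insert(1, "Alice@Example.com")   # hypothetical rows
idx.insert(2, "BOB@example.com")
```

This is also why expression indexes slow writes down: every INSERT and UPDATE must evaluate the expression.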
39. Geometric Data Types

CREATE TABLE points (coord point);

CREATE INDEX points_idx ON points (coord);
ERROR:  data type point has no default operator class for access method "btree"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.
40. GiST
• "generalized search tree"
• infrastructure that provides a template to create arbitrary indexing schemes
  – supports concurrency, logging, searching – only have to define behavior
  – user-defined operator class
    • <<, &<, &>, >>, <<|, &<|, |&>, |>>, @>, <@, ~=, &&
  – have to implement functions in interface
• supports lossless + lossy indexes
• provides support for "nearest-neighbor" queries – "KNN-GiST"

CREATE INDEX points_coord_gist_idx ON points
  USING gist(coord);
42. Demo #5 Notes
• GiST indexes on geometric types radically speed up reads
• Writes are slower due to distance calculation
• Index size can be very big
43. PostGIS
• For when you are doing real things with shapes
• (and geographic information systems)
44. PostGIS + Indexes
• B-Tree?
• R-Tree?
  – PostGIS docs do not recommend using just an R-Tree index
• GiST
  – overlaps! containment!
  – uses a combination of GiST + R-Tree
45. PostGIS + GiST

2-D:
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
  USING gist(geom);

N-D (PostGIS 2.0+):
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
  USING gist(geom gist_geometry_ops_nd);
46. Example - USA Zipcode Boundaries
• 33,120 rows
• geom: MultiPolygon
• 52MB without indexes
• With geometry GiST + integer B-Tree: 869MB
47. What Zipcode Is My Office In?
• Geocoded Address
  – Lat,Long = 40.7356197,-73.9891102
  – PostGIS: POINT(-73.9891102 40.7356197)
  – 4269 – "SRID" – unique ID for coordinate system definitions

SELECT zcta5ce10 AS zipcode
FROM zipcodes
WHERE ST_Contains(
  geom, -- MultiPolygon
  ST_GeomFromText('POINT(-73.9891102 40.7356197)', 4269)
);
48. What Zipcode Is My Office In?
• No Index

Seq Scan on zipcodes  (cost=0.00..15382.00 rows=1 width=6) (actual time=64.780..5153.485 rows=1 loops=1)
  Filter: ((geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) AND _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry))
  Rows Removed by Filter: 33119
Total runtime: 5153.505 ms
49. What Zipcode Is My Office In?
• Here’s the GiST:

Index Scan using zipcodes_geom_gist on zipcodes  (cost=0.28..8.54 rows=1 width=6) (actual time=0.120..0.207 rows=1 loops=1)
  Index Cond: (geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
  Filter: _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
  Rows Removed by Filter: 1
Total runtime: 0.235 ms
51. Full Text Search
• PostgreSQL offers full text search with the tsearch2 engine
  – algorithms for performing FTS
  – to_tsvector('english', content) @@ to_tsquery('irish & conference | meeting')
  – provides indexing capabilities for efficient search
52. Test Data Set
• Wikipedia English category titles – all 1,823,644 that I downloaded
53. Full-Text Search: Basics

SELECT title
FROM category
WHERE
  to_tsvector('english', title) @@ to_tsquery('united & kingdom');

title
-----
Lists of railway stations in the United Kingdom
Political history of the United Kingdom
Military of the United Kingdom
United Kingdom constitution
Television channels in the United Kingdom
United Kingdom
Roman Catholic secondary schools in the United Kingdom
[results truncated]

QUERY PLAN
------------
Seq Scan on category  (cost=0.00..49262.77 rows=46 width=29) (actual time=21.900..16809.890 rows=8810 loops=1)
  Filter: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  Rows Removed by Filter: 1814834
Total runtime: 16811.108 ms
54. Full-Text Search + GiST

CREATE INDEX category_title_gist_idx ON category
  USING gist(to_tsvector('english', title));

SELECT title
FROM category
WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom');

QUERY PLAN
-------------
Bitmap Heap Scan on category  (cost=4.77..182.47 rows=46 width=29) (actual time=75.517..180.650 rows=8810 loops=1)
  Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  ->  Bitmap Index Scan on category_title_gist_idx  (cost=0.00..4.76 rows=46 width=0) (actual time=74.687..74.687 rows=8810 loops=1)
        Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
Total runtime: 181.354 ms
55. Full Text Search + GiST
• GiST indexes can produce false positives
  – "documents" represented by fixed-length signature
    • words are hashed into single bits and concatenated
  – when a false positive occurs, the row is returned and checked to see if it is a false match
• Extra validations = performance degradation
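The fixed-length signature scheme can be sketched as follows: each word sets one bit, document signatures OR the bits together, and a query can only ever rule documents out, never confirm them — which is why the recheck against the actual row is needed. A rough illustration (the bit count and hash function here are arbitrary choices, not what PostgreSQL uses):

```python
import zlib

SIG_BITS = 64  # fixed signature length; real tsvector signatures are larger

def signature(words):
    """Hash each word to a single bit and OR the bits together."""
    sig = 0
    for w in words:
        sig |= 1 << (zlib.crc32(w.encode()) % SIG_BITS)
    return sig

def might_contain(doc_sig, query_words):
    """Lossy test: all query bits set => *possible* match.
    Distinct words can hash to the same bit, so a True here may be a
    false positive that the recheck must filter out."""
    q = signature(query_words)
    return doc_sig & q == q
```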
56. Performance Summary with GiST
• initial index build takes awhile => slow writes
• reads are quick
• Table size: 271MB
• Index size: 83MB
57. GIN Index
• "generalized inverted index"
• supports searching within composite data
  – arrays, full-text documents, hstore
• key is stored once and points to the composites it is contained in
• like GiST, provides index infrastructure to extend GIN based on behavior
  – supports operators <@, @>, =, &&
• GIN performance ⬄ log(# unique things)
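The inverted-index structure behind GIN can be sketched with a dictionary of posting sets: each key is stored once and maps to the documents containing it, so an AND query like 'united & kingdom' becomes a set intersection. The sample documents below are invented:

```python
from collections import defaultdict

# A minimal inverted index, the idea behind GIN: each key (word) is stored
# once and points to the set of documents that contain it.
class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, words):
        for w in words:
            self.postings[w].add(doc_id)

    def search_and(self, *words):
        """'united & kingdom' style query: intersect the posting sets."""
        sets = [self.postings.get(w, set()) for w in words]
        return set.intersection(*sets) if sets else set()

gin = InvertedIndex()
gin.add(1, ["united", "kingdom", "constitution"])
gin.add(2, ["united", "states"])
gin.add(3, ["military", "of", "the", "united", "kingdom"])
```

Because each distinct key appears once, lookup cost scales with the number of unique keys rather than the number of documents.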
58. Full Text Search + GIN

CREATE INDEX category_title_gin_idx ON category
  USING gin(to_tsvector('english', title));

EXPLAIN ANALYZE SELECT title FROM category
WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom');

QUERY PLAN
-------
Bitmap Heap Scan on category  (cost=28.36..206.06 rows=46 width=29) (actual time=8.864..14.674 rows=8810 loops=1)
  Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  ->  Bitmap Index Scan on category_title_gin_idx  (cost=0.00..28.35 rows=46 width=0) (actual time=7.905..7.905 rows=8810 loops=1)
        Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
Total runtime: 15.157 ms
59. Performance Summary with GIN
• index build was much quicker
• significant speedup from no index (12,000ms => 15ms)
• significant speedup from GiST (181ms => 15ms)
• Table size: 271MB
• Index size:
  – 9.3: 71MB
  – 9.4 beta 1: 40MB
60. What Was Not Discussed
• Word density
  – prior to 9.3, performance issues with greater word density
• Type of text data – phrases vs paragraphs
61. Full Text Search – GiST vs GIN
• Reads
  – overall, GIN should win
• Writes
  – traditionally, GiST has better performance for writes
  – GIN
    • FASTUPDATE
    • 9.4: compression
62. Regular Expression Indexes
• Added in 9.3
• Support for LIKE/ILIKE wildcard indexes in 9.1
  – title LIKE '%ab%e'
• Uses pg_trgm extension + GIN

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX category_title_regex_idx ON
  category USING GIN(title gin_trgm_ops);
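The trigram decomposition that gin_trgm_ops indexes can be approximated in a few lines: pad the word with two leading and one trailing space, then take every three-character window. A sketch (pg_trgm's exact normalization rules may differ):

```python
def trigrams(word):
    """Approximate pg_trgm's decomposition: pad with two leading and one
    trailing space, then take every 3-character window."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Jaccard-style overlap of trigram sets, in the spirit of
    pg_trgm's similarity() function."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

A regex or wildcard search can then extract the literal trigrams a match must contain and use the index to fetch only candidate rows, with a recheck of the full pattern afterwards.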
63. Regular Expressions - No Index

EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';

QUERY PLAN
----------
Seq Scan on category  (cost=0.00..40144.55 rows=182 width=29) (actual time=2.509..4260.792 rows=5878 loops=1)
  Filter: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
  Rows Removed by Filter: 1817766
Total runtime: 4261.204 ms
64. Regular Expressions - Indexed

CREATE INDEX category_title_regex_idx ON category
  USING gin(title gin_trgm_ops);

EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';

QUERY PLAN
-----------
Bitmap Heap Scan on category  (cost=197.41..871.77 rows=182 width=29) (actual time=107.445..146.713 rows=5878 loops=1)
  Recheck Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
  Rows Removed by Index Recheck: 4712
  ->  Bitmap Index Scan on category_title_regex_idx  (cost=0.00..197.37 rows=182 width=0) (actual time=106.645..106.645 rows=10590 loops=1)
        Index Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
Total runtime: 147.026 ms
65. Range Types
• stores range data
  – 1 to 6
  – 2013-10-29 – 2013-11-2
• easy-to-use operators to check inclusion, overlaps
• built-in types: integers, numerics, dates, timestamps
• extensible
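The overlap and containment operators on half-open ranges reduce to two comparisons each, which is part of what makes them cheap to test inside an index. A sketch using [lo, hi) pairs, mirroring int4range's default inclusive-lower / exclusive-upper bounds:

```python
# Half-open ranges [lo, hi), matching int4range's default bounds.

def overlaps(a, b):
    """The && operator: two ranges overlap when each starts before
    the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def contains(a, b):
    """The @> operator: a contains b when b fits entirely inside a."""
    return a[0] <= b[0] and b[1] <= a[1]
```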
66. Range Type Examples

-- find all ranges that overlap with [100, 200)

SELECT * FROM ranges WHERE int4range(100, 200) && range;

range
-----------
[10,102)
[13,102)
[18,108)
[32,101)
[34,134)
[37,123)
[43,111)
[46,132)
[48,107)
[results truncated]

QUERY PLAN
-----------
Seq Scan on ranges  (cost=0.00..14239.86 rows=7073 width=32) (actual time=0.018..185.411 rows=143 loops=1)
  Filter: ('[100,200)'::int4range && range)
  Rows Removed by Filter: 999857
Total runtime: 185.439 ms
67. Range Types + GiST

CREATE INDEX ranges_range_gist_idx ON ranges USING gist(range);

EXPLAIN ANALYZE SELECT * FROM ranges WHERE
  int4range(100, 200) && range;

QUERY PLAN
------------
Bitmap Heap Scan on ranges  (cost=5.29..463.10 rows=130 width=13) (actual time=0.120..0.135 rows=144 loops=1)
  Recheck Cond: ('[100,200)'::int4range && range)
  ->  Bitmap Index Scan on ranges_range_gist_idx  (cost=0.00..5.26 rows=130 width=0) (actual time=0.109..0.109 rows=144 loops=1)
        Index Cond: ('[100,200)'::int4range && range)
Total runtime: 0.168 ms
68. SP-GiST
• space-partitioned generalized search tree
• ideal for non-balanced data structures
– k-d trees, quad-trees, suffix trees
– divides search space into partitions of unequal size
• matching partitioning rule = fast search
• traditionally for "in-memory" data structures, converted to play nicely with I/O
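A minimal sketch of the idea (table and index names here are hypothetical): SP-GiST's default operator class for point builds a quad-tree, recursively splitting the plane into quadrants of unequal population:

```sql
CREATE TABLE pts AS
SELECT point(random() * 1000, random() * 1000) AS p
FROM generate_series(1, 1000000) x;

-- quad_point_ops is the default SP-GiST operator class for point
CREATE INDEX pts_p_spgist_idx ON pts USING spgist(p);

-- "point contained in box" can now be answered by descending the quad-tree
EXPLAIN SELECT * FROM pts WHERE p <@ box '((100,100),(200,200))';
```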
69. Range Types: GiST vs SP-GiST
CREATE TABLE ranges AS
SELECT
int4range(
(random()*5)::int,
(random()*5)::int + 5
) AS range
FROM generate_series(1,<N>) x;
!
SELECT * FROM ranges WHERE range <operator>
int4range(3,6);
70. N = 1,000,000
CREATE INDEX ranges_range_spgist_idx ON ranges USING spgist(range);
ERROR: unexpected spgdoinsert() failure
Fixed in 9.3.2
Operator | GiST Used | GiST Time (ms) | SP-GiST Used | SP-GiST Time (ms)
---------|-----------|----------------|--------------|------------------
=        | Yes       | 121            | Yes          | 37
&&       | No        | 257            | No           | 260
@>       | No        | 223            | No           | 223
<@       | Yes       | 163            | Yes          | 111
<<       | Yes       | 95             | Yes          | 5
>>       | Yes       | 95             | Yes          | 25
&<       | No        | 184            | No           | 185
&>       | No        | 203            | No           | 203
71. Range Types: GiST vs SP-GiST
CREATE TABLE ranges AS
SELECT
int4range(x, x + (random()*5)::int + 5)
AS range
FROM generate_series(1,<N>) x;
72. N = 250,000
Operator | GiST Used | GiST Time (ms) | SP-GiST Used | SP-GiST Time (ms)
---------|-----------|----------------|--------------|------------------
=        | Yes       | 0.5            | Yes          | 0.7
&&       | Yes       | 0.3            | Yes          | 0.3
@>       | Yes       | 0.3            | Yes          | 0.3
<@       | Yes       | 0.06           | Yes          | 0.25
<<       | No        | 40             | Yes          | 0.2
>>       | No        | 60             | No           | 60
&<       | Yes       | 0.3            | Yes          | 0.2
&>       | No        | 74             | No           | 61
74. Integer Arrays
CREATE UNLOGGED TABLE int_arrays AS
SELECT ARRAY[x, x + 1, x + 2] AS data
FROM generate_series(1,1000000) x;
!
CREATE INDEX int_arrays_data_idx ON
int_arrays (data);
!
CREATE INDEX int_arrays_data_gin_idx ON
int_arrays USING GIN(data);
75. B-Tree(?) + Integer Arrays
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE 5432 = ANY (data);
QUERY PLAN
-----------
Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual
time=1.260..159.197 rows=3 loops=1)
Filter: (5432 = ANY (data))
Rows Removed by Filter: 999997
!
Total runtime: 159.222 ms
76. GIN + Integer Arrays
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE ARRAY[5432] <@ data;
QUERY PLAN
-----------
Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33)
(actual time=0.020..0.021 rows=3 loops=1)
Recheck Cond: ('{5432}'::integer[] <@ data)
-> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50
rows=5000 width=0) (actual time=0.014..0.014 rows=3 loops=1)
Index Cond: ('{5432}'::integer[] <@ data)
!
Total runtime: 0.045 ms
77. Hash Indexes
• only work with the "=" operator
• are still not WAL-logged as of 9.4 beta 1
– not crash safe
– not replicated
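A minimal sketch of a hash index (the table and names are hypothetical); it only helps equality predicates, so range and pattern predicates must fall back to other plans:

```sql
CREATE TABLE sessions AS
SELECT md5(x::text) AS token
FROM generate_series(1, 1000000) x;

CREATE INDEX sessions_token_hash_idx ON sessions USING hash(token);

-- equality can use the hash index...
EXPLAIN SELECT * FROM sessions WHERE token = md5('42');
-- ...but a range predicate cannot
EXPLAIN SELECT * FROM sessions WHERE token > 'f';
```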
78. btree_gin
CREATE EXTENSION IF NOT EXISTS btree_gin;
!
CREATE UNLOGGED TABLE numbers AS
SELECT (random() * 2000)::int AS a FROM generate_series(1, 2000000) x;
!
CREATE INDEX numbers_gin_idx ON numbers USING gin(a);
!
EXPLAIN ANALYZE SELECT * FROM numbers WHERE a = 1000;
!
QUERY PLAN
------------
Bitmap Heap Scan on numbers (cost=113.50..9509.26 rows=10000 width=4) (actual
time=0.388..1.459 rows=991 loops=1)
Recheck Cond: (a = 1000)
-> Bitmap Index Scan on numbers_gin_idx (cost=0.00..111.00 rows=10000 width=0)
(actual time=0.232..0.232 rows=991 loops=1)
Index Cond: (a = 1000)
!
Total runtime: 1.563 ms
79. btree_gin vs btree
-- btree
SELECT pg_size_pretty(pg_total_relation_size('numbers_idx'));
pg_size_pretty
----------------
43 MB
!
!
!
-- GIN
SELECT pg_size_pretty(pg_total_relation_size('numbers_gin_idx'));
pg_size_pretty
----------------
16 MB
• Only use GIN over btree if you have a lot of duplicate entries
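The plain btree index compared above, numbers_idx, is never created anywhere in the deck; presumably it was built along these lines:

```sql
-- assumed definition of numbers_idx (not shown on the slides)
CREATE INDEX numbers_idx ON numbers (a);
```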
80. hstore - the PostgreSQL Key-Value Store
CREATE EXTENSION IF NOT EXISTS hstore;
!
CREATE UNLOGGED TABLE keypairs AS
SELECT
(x || ' => ' || (x + (random() * 5)::int))::hstore AS data
FROM generate_series(1,1000000) x;
SELECT pg_size_pretty(pg_relation_size('keypairs'));
!
!
SELECT * FROM keypairs WHERE data ? '3';
data
----------
"3"=>"4"
!
EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
QUERY PLAN
-----------
Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.065..208.808
rows=1 loops=1)
Filter: (data ? '3'::text)
Rows Removed by Filter: 999999
!
Total runtime: 208.825 ms
81. hstore - the PostgreSQL Key-Value Store
CREATE INDEX keypairs_data_gin_idx ON keypairs
USING gin(data);
!
EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
!
QUERY PLAN
-----------
Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual
time=0.044..0.045 rows=1 loops=1)
Recheck Cond: (data ? '3'::text)
-> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0)
(actual time=0.039..0.039 rows=1 loops=1)
Index Cond: (data ? '3'::text)
!
Total runtime: 0.071 ms
82. JSONB: Coming in 9.4
CREATE UNLOGGED TABLE documents (data jsonb);  -- assumed; table creation not shown on the slide
!
INSERT INTO documents
SELECT row_to_json(ROW(x, x + 2, x + 3))::jsonb
FROM generate_series(1,1000000) x;
!
!
CREATE INDEX documents_data_gin_idx ON documents
USING gin(data jsonb_path_ops);
!
!
!
SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
data
--------------------------------
{"f1": 10, "f2": 12, "f3": 13}
!
!
Execution time: 0.084 ms
83. Awesome vs WTF: A Note On Operator Indexability
EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
!
QUERY PLAN
-----------
Bitmap Heap Scan on documents (cost=27.75..3082.65 rows=1000 width=66) (actual time=0.029..0.030
rows=1 loops=1)
Recheck Cond: (data @> '{"f1": 10}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on documents_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual
time=0.014..0.014 rows=1 loops=1)
Index Cond: (data @> '{"f1": 10}'::jsonb)
!
Execution time: 0.084 ms
!
EXPLAIN ANALYZE SELECT * FROM documents WHERE '{ "f1": 10 }' <@ data;
!
QUERY PLAN
-----------
Seq Scan on documents (cost=0.00..24846.00 rows=1000 width=66) (actual time=0.015..245.924
rows=1 loops=1)
Filter: ('{"f1": 10}'::jsonb <@ data)
Rows Removed by Filter: 999999
!
Execution time: 245.947 ms
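The fix implied by the two plans above is to commute the predicate: `a <@ b` is logically equivalent to `b @> a`, and only the `@>` form matches the operators that `jsonb_path_ops` can index:

```sql
-- Not indexable with jsonb_path_ops: the indexed column is on the right
SELECT * FROM documents WHERE '{ "f1": 10 }' <@ data;

-- Equivalent and indexable: the indexed column is on the left of @>
SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
```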
84. For More Information…
• http://www.postgresql.org/docs/current/static/indexes.html
• http://www.postgresql.org/docs/current/static/gist.html
• http://www.postgresql.org/docs/current/static/gin.html
• http://www.postgresql.org/docs/current/static/spgist.html
• GiST + GIN + Full Text Search:
– http://www.postgresql.org/docs/current/static/textsearch-indexes.html
85. Conclusion
• Postgres has *a lot* of different types of indexes, and variations on each of its engines
• Extensions make use of PostgreSQL indexes
– PostGIS
• Need to understand where index usage is appropriate in your application