Manticore 6
March 10th 2023
Manticore Buddy
● Written in PHP 8
● Supports multithreading
● Comes as a self-extracting PHAR
● Requires manticore-executor
● manticore-executor is just a statically built PHP binary with all the
required modules
● apt/yum install manticore manticore-extra installs
everything, including:
○ Manticore Server
○ Manticore Buddy
○ manticore-executor
● Works on Linux, Windows and macOS
● Manticore Search starts Buddy when it starts
○ And restarts it if it stops
○ Can be disabled
● Pluggable architecture in progress
● Applications: SHOW QUERIES, auto schema, mysqldump, shards
orchestration, Apache Superset, Grafana,
OpenSearch/Elasticsearch Kibana/Logstash/Elasticsearch, new /cli
and many more
SHOW QUERIES and KILL
● SHOW THREADS shows threads with thread IDs
● SHOW QUERIES shows queries with session IDs
● KILL <session id> kills the SELECT running in that
session
● Only SELECTs can be killed
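A minimal sketch of how the SHOW QUERIES / KILL pair might be scripted: pick long-running SELECTs and build one KILL statement per session. The QueryRow fields (session_id, query, elapsed_ms) are a hypothetical row shape, not the actual SHOW QUERIES columns; check the real output of your Manticore version before adapting this.

```python
from dataclasses import dataclass


@dataclass
class QueryRow:
    # Hypothetical shape of one SHOW QUERIES row (assumed field names).
    session_id: int
    query: str
    elapsed_ms: float


def kill_statements(rows: list[QueryRow], max_elapsed_ms: float) -> list[str]:
    """Build KILL statements for SELECTs running longer than the limit.

    Non-SELECT queries are skipped because only SELECTs can be killed.
    """
    statements = []
    for row in rows:
        is_select = row.query.lstrip().upper().startswith("SELECT")
        if is_select and row.elapsed_ms > max_elapsed_ms:
            statements.append(f"KILL {row.session_id}")
    return statements


if __name__ == "__main__":
    sample = [
        QueryRow(11, "SELECT * FROM products WHERE MATCH('phone')", 5200.0),
        QueryRow(12, "INSERT INTO products VALUES (1, 'x')", 9000.0),
        QueryRow(13, "select id from products", 40.0),
    ]
    print(kill_statements(sample, max_elapsed_ms=1000.0))  # ['KILL 11']
```

The long-running INSERT is left alone on purpose: filtering on the statement type mirrors the "only SELECTs can be killed" rule rather than relying on the server to reject it.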
Elasticsearch-compatible writes
New endpoints
● POST /<table name>/_create/<id>
● POST /<table name>/_create/ (auto id)
● PUT /<table name>/_doc/<id>
● POST /_bulk
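As a rough illustration of the POST /_bulk endpoint, the sketch below builds an Elasticsearch-style NDJSON payload (an action line followed by the document, newline-terminated) and wraps it in a request aimed at port 9308, Manticore's default HTTP port. The "index" action with "_index"/"_id" follows the Elasticsearch _bulk convention; which actions and fields Manticore accepts is an assumption to verify against its documentation.

```python
import json
import urllib.request


def build_bulk_body(table: str, docs: dict[int, dict]) -> bytes:
    """Serialize documents as an Elasticsearch-style _bulk NDJSON payload.

    Each document takes two lines: an action line naming the target table
    and id, then the document source. The payload ends with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": table, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return ("\n".join(lines) + "\n").encode("utf-8")


def bulk_request(body: bytes, host: str = "127.0.0.1",
                 port: int = 9308) -> urllib.request.Request:
    # 9308 is Manticore's default HTTP port; adjust to your setup.
    return urllib.request.Request(
        f"http://{host}:{port}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_bulk_body("products", {1: {"title": "phone", "price": 19.5}})
    print(body.decode())
    req = bulk_request(body)
    # urllib.request.urlopen(req) would send it to a running instance.
```

The request is only built, not sent, so the payload format can be inspected without a server.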
Secondary indexes are ON by default
● Manticore 5: secondary indexes are OFF for searching, but ON for
indexing
● Manticore 6: ON for searching too
● The secondary index (SI) format has changed
● The algorithms have been optimized too
Auto-schema
● You don’t have to create a table before
writing to it
● Manticore detects data types automatically:
○ Textual data => text
○ Email => string
○ Numeric => int/bigint/float
○ JSON => json
○ Array => multi/multi64
● ON by default
● searchd.auto_schema=0 disables it
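The type-detection rules above can be approximated with a small heuristic like the one below. It is an illustrative sketch, not Manticore's actual detection code: the email regex, the 32-bit unsigned cut-off between int and bigint (and between multi and multi64), and raising an error for unhandled values are all assumptions.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # assumed, simplistic
UINT32_MAX = 2**32 - 1


def detect_type(value) -> str:
    """Guess a column type for one JSON value (illustrative heuristic)."""
    if isinstance(value, int):
        return "int" if 0 <= value <= UINT32_MAX else "bigint"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string" if EMAIL_RE.match(value) else "text"
    if isinstance(value, dict):
        return "json"
    if isinstance(value, list) and all(isinstance(v, int) for v in value):
        fits_32 = all(0 <= v <= UINT32_MAX for v in value)
        return "multi" if fits_32 else "multi64"
    raise ValueError(f"this sketch has no rule for {value!r}")


def detect_schema(doc: dict) -> dict[str, str]:
    """Map each field of the first written document to a guessed type."""
    return {field: detect_type(value) for field, value in doc.items()}


if __name__ == "__main__":
    print(detect_schema({
        "title": "Red phone", "email": "a@b.com", "price": 19.5,
        "qty": 3, "big": 2**40, "meta": {"c": 1}, "tags": [1, 2],
    }))
```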
Revamp of cost-based query optimizer
● Manticore uses a cost-based optimizer (CBO) to decide how to execute non-full-text queries:
○ Just do a plain scan
○ Or use additional data/algorithms:
■ Docid index
■ Columnar scan
■ Secondary indexes
● The CBO estimates execution cost using attribute statistics such as histograms, PGM and
columnar storage statistics
● Execution cost is calculated for every filter in the query
● Multithreaded query execution is taken into account
○ E.g. queries using secondary/docid indexes always run in a single thread
● Optimizer hints:
○ /*+ DocidIndex(id) */
○ /*+ SecondaryIndex(<attr_name1>) */
○ /*+ ColumnarScan(<attr_name1>) */
○ /*+ NO_ColumnarScan(id) */
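To make the per-filter cost comparison concrete, here is a toy model, not Manticore's real cost formulas. The constants SCAN_COST_PER_ROW and SI_COST_PER_MATCH are hypothetical, selectivity stands in for what histograms would estimate, and the secondary-index path is modelled as single-threaded while the scan is split across threads, as the slide describes.

```python
import math
from dataclasses import dataclass


@dataclass
class FilterStats:
    name: str
    selectivity: float  # estimated fraction of matching rows (e.g. from a histogram)


# Hypothetical cost constants; the real CBO costs are internal to Manticore.
SCAN_COST_PER_ROW = 1.0
SI_COST_PER_MATCH = 4.0


def plan_filter(f: FilterStats, total_rows: int, threads: int) -> str:
    """Choose 'scan' or 'secondary_index' for one filter by estimated cost."""
    # A plain scan can be split across threads.
    scan_cost = total_rows * SCAN_COST_PER_ROW / threads
    # Index lookup plus fetching each match, done in a single thread.
    si_cost = (math.log2(max(total_rows, 2))
               + f.selectivity * total_rows * SI_COST_PER_MATCH)
    return "secondary_index" if si_cost < scan_cost else "scan"


def plan_query(filters: list[FilterStats], total_rows: int,
               threads: int) -> dict[str, str]:
    """Cost every filter in the query independently."""
    return {f.name: plan_filter(f, total_rows, threads) for f in filters}


if __name__ == "__main__":
    filters = [FilterStats("price", 0.001), FilterStats("in_stock", 0.5)]
    print(plan_query(filters, total_rows=1_000_000, threads=8))
    # {'price': 'secondary_index', 'in_stock': 'scan'}
```

The thread count matters in this model: a filter matching 20% of a million rows favours the secondary index with one thread but a parallel scan with eight, which is the kind of trade-off the optimizer hints let you override.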
Telemetry
● Purpose: to understand which features we need to prioritize
● How to disable:
○ searchd.telemetry = 0
○ Env. var. TELEMETRY=0
● All metrics are completely anonymous and no sensitive information is transmitted
● Examples of what we don’t collect:
○ Any data
○ Table names, field names, hostnames
● Examples of what we collect:
○ Depersonalized machine ID (hash(machine id))
○ OS name
○ Manticore version information
○ Instance uptime
○ Whether the instance is running in RT mode or plain mode
○ Whether MCL (Manticore Columnar Library) is used
○ Whether a crash happened
○ Whether a backup was made
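A small sketch of the two telemetry switches and the depersonalized machine ID. It assumes telemetry is on unless searchd.telemetry = 0 or TELEMETRY=0 is set, and it uses SHA-256 purely as an example: the slide only says hash(machine id), without naming the hash function.

```python
import hashlib
import os
from typing import Mapping, Optional


def telemetry_enabled(searchd_telemetry: Optional[int],
                      env: Mapping[str, str]) -> bool:
    """Telemetry is off if searchd.telemetry = 0 or TELEMETRY=0 is set."""
    if searchd_telemetry == 0:
        return False
    if env.get("TELEMETRY") == "0":
        return False
    return True


def depersonalized_machine_id(machine_id: str) -> str:
    # SHA-256 is an assumption; only a one-way hash leaves the machine.
    return hashlib.sha256(machine_id.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(telemetry_enabled(None, os.environ))
    print(depersonalized_machine_id("example-machine-id"))
```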
FREEZE/UNFREEZE and manticore-backup
● FREEZE is not LOCK. After FREEZE you can:
○ Read from the table.
○ And write to the table to some extent.
● FREEZE returns the list of table files that are
immutable until you UNFREEZE
● Eventually the table will be locked, but chances
are the backup will already have been made by then
● manticore-backup uses
FREEZE/UNFREEZE
● Use manticore-backup --restore to
restore the whole instance
● Use IMPORT TABLE to restore a specific table
● manticore-backup is written in PHP and
uses manticore-executor
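The FREEZE, copy, UNFREEZE pattern that manticore-backup relies on can be sketched as below. This is not manticore-backup's code: execute is a hypothetical helper that runs one SQL statement and returns rows, and treating the first column of each FREEZE row as a file path is an assumption. The key point is the finally block, which unfreezes the table even if a copy fails.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def backup_table(execute: Callable[[str], Iterable[tuple]],
                 table: str, dest: Path) -> list[Path]:
    """Copy a table's files while it is frozen, then always unfreeze it."""
    dest.mkdir(parents=True, exist_ok=True)
    # FREEZE returns the files that stay immutable until UNFREEZE.
    rows = list(execute(f"FREEZE {table}"))
    copied: list[Path] = []
    try:
        for row in rows:
            src = Path(row[0])  # assumed: first column holds the file path
            target = dest / src.name  # flat copy into the backup directory
            shutil.copy2(src, target)
            copied.append(target)
    finally:
        # Release the table even if a copy fails, so writes can resume.
        execute(f"UNFREEZE {table}")
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "products.demo").write_text("demo")
        calls = []

        def fake_execute(sql: str) -> list[tuple]:
            # Stand-in for a real SQL connection (hypothetical).
            calls.append(sql)
            if sql.startswith("FREEZE"):
                return [(str(data / "products.demo"),)]
            return []

        backup_table(fake_execute, "products", Path(tmp) / "backup")
        print(calls)  # ['FREEZE products', 'UNFREEZE products']
```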
SQL BACKUP
● The SQL BACKUP command does the
same as manticore-backup
and uses the same code
● BACKUP … OPTION async=1
● No RESTORE command yet
Plans:
● Plain tables backup
● Backup to an external storage
(e.g. S3)
Dynamic max_matches and accurate aggregation
● Default max_matches (1000) can be automatically
increased up to max_matches_increase_threshold
(16384)
○ Useful when pseudo-sharding is on (which it is by
default)
● SELECT .. OPTION accurate_aggregation=1:
○ May disable query parallelization to guarantee
accurate aggregation
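A simplified model of the dynamic max_matches rule, assuming the increase applies only with pseudo-sharding on and when the number of unique values of the grouping attribute is below max_matches_increase_threshold. The real server-side logic may consider more factors; this sketch only shows the shape of the decision.

```python
DEFAULT_MAX_MATCHES = 1000
MAX_MATCHES_INCREASE_THRESHOLD = 16384


def effective_max_matches(unique_group_values: int,
                          pseudo_sharding: bool = True) -> int:
    """Simplified model of the automatic max_matches increase."""
    if (pseudo_sharding
            and DEFAULT_MAX_MATCHES < unique_group_values
            < MAX_MATCHES_INCREASE_THRESHOLD):
        # In this model every group fits in the result set, so per-group
        # aggregates are not truncated across pseudo-shards.
        return unique_group_values
    return DEFAULT_MAX_MATCHES


if __name__ == "__main__":
    for groups in (500, 5000, 100_000):
        print(groups, effective_max_matches(groups))
```

Above the threshold the model falls back to the default, which is where OPTION accurate_aggregation=1 becomes the relevant knob.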
Arm64 support
● Manticore Search
● Manticore Columnar Library
● Official Docker image
64-bit ID
● Manticore 5: from 1 to 2^63 - 1
● Manticore 6: from -2^63 + 1 to 2^63 - 1
● Next version: from 1 to 2^64 - 1 + UINT64()
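The ID ranges above differ only in how the same 64 bits are interpreted. The sketch below shows the two's-complement arithmetic, assuming that UINT64() exposes the stored value as unsigned; the slide does not spell out its exact semantics, and as_uint64/as_int64 are illustrative helpers, not Manticore functions.

```python
U64_MASK = 2**64 - 1

# ID ranges listed on the slide.
MANTICORE5_RANGE = (1, 2**63 - 1)
MANTICORE6_RANGE = (-(2**63) + 1, 2**63 - 1)
NEXT_RANGE = (1, 2**64 - 1)


def as_uint64(signed_id: int) -> int:
    """Reinterpret a signed 64-bit id as unsigned (two's complement)."""
    if not -(2**63) <= signed_id < 2**63:
        raise ValueError("not a signed 64-bit value")
    return signed_id & U64_MASK


def as_int64(unsigned_id: int) -> int:
    """Reinterpret an unsigned 64-bit id as signed."""
    if not 0 <= unsigned_id <= U64_MASK:
        raise ValueError("not an unsigned 64-bit value")
    return unsigned_id - 2**64 if unsigned_id >= 2**63 else unsigned_id


if __name__ == "__main__":
    print(as_uint64(-1))         # 18446744073709551615
    print(as_int64(2**64 - 1))   # -1
```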
Index -> table
● History: “full-text index”
● Then we added attributes, docid index,
secondary index
● “Index” became confusing
● So we renamed “index” to “table”:
○ in the docs
○ in all the commands
● The old commands still work
Breaking changes in Manticore 6
● New columnar storage format
○ Tables have to be rebuilt
● New secondary index file format
○ ALTER TABLE <table name> REBUILD SECONDARY
○ Since secondary indexes are now on by default, backward compatibility will be
maintained from now on
● New binlog file format
○ Manticore has to be stopped cleanly before upgrading
Bugs
● 80+ bugs fixed in Manticore 6
● Newly known issue: a memory leak related to Buddy
● Preparing the Manticore 6.0.4 maintenance
release
● Updated release process: we will release more
frequently
Plans
● Performance:
○ FT + secondary: ✅
○ select count(*) where attr… (in progress)
○ Weak-AND for FT (nearest plans)
○ Per-index binlog
○ Parallel merging
● Buddy-related:
○ mysqldump ✅
○ /cli ✅
○ Auto-schema for Elastic-like writes ✅
○ Integrations with various clients (in progress)
○ Kibana / OpenSearch dashboards
● Features:
○ Auto-sharding and orchestration
Questions