A random list of Apache Cassandra anti-patterns. There is a lot of information on what to use Cassandra for and how, but not much on what not to do. This presentation works towards filling that gap.
This document summarizes several Cassandra anti-patterns including:
- Using a non-Oracle JVM, which is not recommended.
- Putting the commit log and data directories on the same disk, which hurts performance.
- Using EBS volumes on EC2, which can have unpredictable performance and throughput issues.
- Configuring overly large JVM heaps (over 16GB), which can cause garbage collection issues.
- Performing large batch mutations in a single operation, which risks timeouts if they are not broken into smaller batches.
This document discusses best practices for running Cassandra on Amazon EC2. It recommends instance sizes like m1.xlarge for most use cases. It emphasizes configuring data and commit logs on ephemeral drives for better performance than EBS volumes. It also stresses the importance of distributing nodes across availability zones and regions for high availability. Overall, the document provides guidance on optimizing Cassandra deployments on EC2 through choices of hardware, data storage, networking and operational practices.
Autovacuum, explained for engineers, new improved version, PGConf.eu 2015 Vienna (PostgreSQL-Consulting)
Autovacuum is PostgreSQL's automatic vacuum process that helps manage bloat and garbage collection. It is critical for performance but is often improperly configured by default settings. Autovacuum works table-by-table to remove expired rows in small portions to avoid long blocking operations. Its settings like scale factors, thresholds, and costs can be tuned more aggressively for OLTP workloads to better control bloat and avoid long autovacuum operations.
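To make the summary concrete, PostgreSQL's documented trigger condition for autovacuuming a table can be sketched in a few lines of Python (50 and 0.2 are the stock defaults; the aggressive scale factor is an illustrative value, not a recommendation from the talk):

    # autovacuum fires on a table once dead tuples exceed:
    #   autovacuum_vacuum_threshold
    #   + autovacuum_vacuum_scale_factor * reltuples
    def vacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
        return threshold + scale_factor * reltuples

    print(vacuum_trigger(10_000_000))                     # defaults: ~2,000,050 dead tuples
    print(vacuum_trigger(10_000_000, scale_factor=0.01))  # more aggressive OLTP-style tuning

Lowering the scale factor makes autovacuum run in smaller, more frequent portions, which is exactly the bloat-control behaviour described above.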
C* Summit 2013: Practice Makes Perfect: Extreme Cassandra Optimization by Alb... (DataStax Academy)
Ooyala has been using Apache Cassandra since version 0.4. Our data ingest volume has exploded since 0.4 and Cassandra has scaled along with us. Al will cover many topics from an operational perspective on how to manage, tune, and scale Cassandra in a production environment.
Sharding: Past, Present and Future with Krutika Dhananjay (Gluster.org)
- Sharding is a client-side translator that splits files into equally sized chunks or shards to improve performance and utilization of storage resources. It sits above the distributed hash table (DHT) in Gluster.
- Sharding benefits virtual machine image storage by allowing data healing and replication at the shard level for better scalability. It also distributes load more evenly across bricks.
- For general purpose use, sharding aims to maximize parallelism during writes while maintaining consistency through atomic operations and locking frameworks. Key challenges include updating file metadata without locking and handling operations like truncates and appends correctly across shards.
This document discusses scaling Cassandra for big data applications. It describes how Ooyala uses Cassandra for fast access to data generated by MapReduce, high availability key-value storage from Storm, and playhead tracking for cross-device resume. It outlines Ooyala's experience migrating to newer Cassandra versions as data doubled yearly, including removing expired tombstones, schema changes, and Linux performance tuning.
The document discusses various ways to tune Linux and MySQL for performance. It recommends measuring different aspects of the database, operating system, disk and application performance. Some specific tuning techniques discussed include testing different IO schedulers, increasing the number of InnoDB threads, reducing swapping by lowering the swappiness value, enabling interleave mode for NUMA systems, and potentially using huge pages, though noting the complexity of configuring huge pages. The key message is that default settings may not be optimal and testing is needed to understand each individual system's performance.
This document discusses logical replication with pglogical. It begins by explaining that pglogical performs row-oriented replication and outputs replication data that can be used in various ways. It then covers the architectures of standalone PostgreSQL, physical replication, and logical replication. The rest of the document discusses key aspects of pglogical such as its output plugin, selective replication capabilities, performance and future plans, and examples of using the output with other applications.
The document summarizes the results of benchmarking and comparing the performance of PostgreSQL databases hosted on Amazon EC2, RDS, and Heroku. It finds that EC2 provides the most configuration options but requires more management, RDS offers simplified deployment but less configuration options, and Heroku requires no management but has limited configuration and higher costs. Benchmark results show EC2 performing best for raw performance while RDS and Heroku trade off some performance for manageability. Heroku was the most expensive option.
The document discusses data modeling goals and examples for Cassandra. It provides guidance on keeping related data together on disk, avoiding normalization, and modeling time series data. Examples covered include mapping time series data points to Cassandra rows and columns, querying time slices, bucketing data, and eventually consistent transaction logging to provide atomicity. The document aims to help with common Cassandra modeling questions and patterns.
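As a minimal sketch of the bucketing idea mentioned above (the hourly bucket and key layout are illustrative assumptions, not taken from the document):

    from datetime import datetime, timezone

    # one row per sensor per hour keeps rows bounded while related
    # data points stay together on disk
    def row_key(sensor_id, ts):
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        return f"{sensor_id}:{bucket.isoformat()}"

    print(row_key("sensor-42", datetime(2013, 4, 1, 10, 37, tzinfo=timezone.utc)))
    # sensor-42:2013-04-01T10:00:00+00:00

Querying a time slice then becomes a read of one (or a few) contiguous rows rather than a scatter across the cluster.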
This document summarizes a presentation about PostgreSQL replication. It discusses different replication terms like master/slave and primary/secondary. It also covers replication mechanisms like statement-based and binary replication. The document outlines how to configure and administer replication through files like postgresql.conf and recovery.conf. It discusses managing replication including failover, failback, remastering and replication lag. It also covers synchronous replication and cascading replication setups.
Seastore: Next Generation Backing Store for Ceph (ScyllaDB)
Ceph is an open source distributed file system addressing file, block, and object storage use cases. Next generation storage devices require a change in strategy, so the community has been developing crimson-osd, an eventual replacement for ceph-osd intended to minimize cpu overhead and improve throughput and latency. Seastore is a new backing store for crimson-osd targeted at emerging storage technologies including persistent memory and ZNS devices.
PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky (PostgreSQL-Consulting)
This talk is prepared as a bunch of slides, where each slide describes a really bad way people can screw up their PostgreSQL database and provides a weight - how frequently I saw that kind of problem. Right before the talk I will reshuffle the deck to draw twenty random slides and explain why such practices are bad and how to avoid running into them.
Josh Berkus
Most users know that PostgreSQL has a 23-year development history. But did you know that Postgres code is used for over a dozen other database systems? Thanks to our liberal licensing, many companies and open source projects over the years have taken the Postgres or PostgreSQL code, changed it, added things to it, and/or merged it into something else. Illustra, Truviso, Aster, Greenplum, and others have seen the value of Postgres not just as a database but as some darned good code they could use. We'll explore the lineage of these forks, and go into the details of some of the more interesting ones.
This document summarizes Josh Berkus's presentation on new features in PostgreSQL versions 9.1, 9.2, and the upcoming 9.3. Some key highlights include improvements to read and write performance, the addition of JSON data type and PL/v8 and PL/Coffee procedural languages, index-only scans, cascading replication, SP-GiST indexing, and many new monitoring and administration features. Josh Berkus is available for questions at [email protected] and encourages attendees to upcoming PostgreSQL conferences.
P99CONF — What We Need to Unlearn About Persistent Storage (ScyllaDB)
System software engineers have long been taught that disks are slow and sequential I/O is key to performance. With SSD drives I/O really got much faster but not simpler. In this brave new world of rocket-speed throughputs an engineer has to distinguish sustained workload from bursts, (still) take care about I/O buffer sizes, account for disks’ internal parallelism and study mixed I/O characteristics in advance. In this talk we will share some key performance measurements of the modern hardware we’re taking at ScyllaDB and our opinion about the implications for the database and system software design.
This document summarizes the results of benchmarking PostgreSQL database performance on several cloud platforms, including AWS EC2, RDS, Google Compute Engine, DigitalOcean, Rackspace, and Heroku.
The benchmarks tested small and large instance sizes across the clouds on different workload types, including in-memory and disk-based transactions and queries. Key metrics measured were transactions per second (TPS), load time to set up the database, and cost per TPS and load bandwidth.
The results show large performance and cost variations between clouds and instance types. In general, dedicated instances like EC2 outperformed shared instances, and DBaaS options like RDS were more expensive but offered higher availability. The document discusses challenges
NDB Cluster 8.0 was benchmarked using YCSB, the de facto cloud benchmark. The benchmark showed that:
1) NDB Cluster achieved the highest throughput of any distributed in-memory transactional database, scaling linearly as data nodes were added.
2) Increasing the number of rows in the cluster from 300M to 600M rows showed no impact on performance or latency.
3) Performance was optimized for latency versus throughput by adjusting load generators and the number of data manager threads per data node.
Responding rapidly when you have 100+ GB data sets in Java (Peter Lawrey)
One way to speed up your application is to bring more of your data into memory. But how do you handle hundreds of GB of data in a JVM, and what tools can help you?
Mentions: Speedment, Azul, Terracotta, Hazelcast and Chronicle.
HighLoad Solutions On MySQL / Xiaobin Lin, Alibaba (Ontico)
The document summarizes several issues encountered with high load on Alibaba MySQL databases and solutions implemented:
1) Hotspot updating of a single row caused deadlocks; implementing queueing on the primary key resolved this.
2) Unexpected long transactions under high load led to clients waiting long periods; committing transactions early where possible addressed this.
3) More than 50,000 active threads overwhelmed MySQL's capabilities; implementing actions based on low and high thread thresholds helped.
This document discusses best practices for containerizing Java applications to avoid out of memory errors and performance issues. It covers choosing appropriate Java versions, garbage collector tuning, sizing heap memory correctly while leaving room for operating system caches, avoiding swapping, and monitoring applications to detect issues. Key recommendations include using the newest Java version possible, configuring the garbage collector appropriately for the workload, allocating all heap memory at startup, and monitoring memory usage to detect problems early.
There are two key choices when scaling a NoSQL data store: choosing between hash-based or range-based sharding, and choosing the right sharding key. Any choice is a trade-off between scalability of read, append, and update workloads. In this talk I will present the standard scaling techniques, some non-universal sharding tricks, less obvious reasons for hotspots, as well as techniques to avoid them.
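A minimal Python sketch of the two sharding styles the abstract contrasts (shard counts and range boundaries are hypothetical):

    import hashlib

    # hash sharding: spreads keys evenly, but destroys range locality
    def hash_shard(key, num_shards):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

    # range sharding: preserves order, but sequential appends hit one shard
    def range_shard(key, boundaries):
        # boundaries is a sorted list of upper bounds, e.g. ["g", "n", "t", "~"]
        for shard, bound in enumerate(boundaries):
            if key <= bound:
                return shard
        return len(boundaries) - 1

    print(hash_shard("user:alice", 8))
    print(range_shard("melon", ["g", "n", "t", "~"]))  # 1

An append-heavy workload keyed by timestamp is the classic hotspot under range sharding, which is one of the trade-offs the talk covers.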
Unikraft: Fast, Specialized Unikernels the Easy Way (ScyllaDB)
P99 CONF
Unikernels are famous for providing excellent performance in terms of boot times, throughput and memory consumption, to name a few metrics. However, they are infamous for making it hard and extremely time consuming to extract such performance, and for needing significant engineering effort in order to port applications to them. We introduce Unikraft, a novel micro-library OS that (1) fully modularizes OS primitives so that it is easy to customize the unikernel and include only relevant components and (2) exposes a set of composable, performance-oriented APIs in order to make it easy for developers to obtain high performance.
Our evaluation using off-the-shelf applications such as nginx, SQLite, and Redis shows that running them on Unikraft results in a 1.7x-2.7x performance improvement compared to Linux guests. In addition, Unikraft images for these apps are around 1MB, require less than 10MB of RAM to run, and boot in around 1ms on top of the VMM time (total boot time 3ms-40ms). Unikraft is a Linux Foundation open source project and can be found at www.unikraft.org.
Unless you have a problem which scales to many independent tasks easily, e.g. web services, you may find that the best way to improve throughput is by reducing latency. This talk starts with Little's Law and its consequences for high performance computing.
1) Automated failover involves detecting failure of the primary database, promoting a replica to be the new primary, and failing over applications to connect to the new primary.
2) Detecting failure involves multiple checks like connecting to the primary, checking processes, and using pg_isready. Promoting a replica requires choosing the most up-to-date one and running pg_ctl promote.
3) Failing over applications can be done by updating a configuration system and restarting apps, using a tool like Zookeeper, or by failing over a virtual IP with Pacemaker. Proxies can also be used to fail over connections.
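A minimal health-check sketch in Python, assuming psycopg2 and a hypothetical host name (in practice you would combine this with process checks and pg_isready, as the summary says):

    import psycopg2

    def primary_is_up(host):
        try:
            conn = psycopg2.connect(host=host, dbname="postgres", connect_timeout=3)
            conn.close()
            return True
        except psycopg2.OperationalError:
            return False

    if not primary_is_up("db-primary.example.com"):
        print("primary unreachable; consider promoting the most up-to-date replica")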
7 ways to crash Postgres
1. Do not apply updates and remain on outdated versions of PostgreSQL.
2. Run out of disk space by allowing the database to grow without monitoring disk usage. This can result in errors and panics.
3. Delete important database files and directories which causes the database to fail to start.
4. Set memory settings too high and overload the system memory, triggering out of memory kills of the PostgreSQL process.
5. Use faulty hardware without monitoring for failures which can lead to corrupted blocks and index errors.
6. Allow too many open connections without connection pooling which can prevent new connections.
7. Accumulate zombie locks by not closing transactions, slowing down other queries.
Cassandra nice use cases and worst anti patterns (Duyhai Doan)
This document discusses Cassandra use cases and anti-patterns. Some good use cases include rate limiting, fraud prevention, account validation, and storing sensor time series data. Poor designs include using Cassandra like a queue, storing null values, intensive updates to the same column, and dynamically changing the schema. The document provides examples and explanations of how to properly implement these scenarios in Cassandra.
Cassandra concepts, patterns and anti-patterns (Dave Gardner)
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
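A minimal sketch of the consistent hashing idea the talk covers (node names and the MD5-based token function are illustrative assumptions):

    import bisect
    import hashlib

    def token(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    # each node owns the arc of the ring up to its token
    nodes = ["node-a", "node-b", "node-c"]
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]

    def owner(key):
        idx = bisect.bisect(tokens, token(key)) % len(ring)
        return ring[idx][1]

    print(owner("user:alice"))

Adding or removing a node only moves the keys on the adjacent arc, which is what makes the scheme attractive for elastic clusters.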
- In Cassandra, data is modeled differently than in relational databases, with an emphasis on denormalizing data and organizing it to support common queries with minimal disk seeks
- Cassandra uses keyspaces, column families, rows, columns and timestamps to organize data, with columns ordered to enable efficient querying of ranges
- To effectively model data in Cassandra, you should think about common queries and design schemas to co-locate frequently accessed data on disk to minimize I/O during queries
Talk from CassandraSF 2012 showing the importance of real durability. Examples of use for row level isolation in Cassandra and the implementation of a transaction log pattern. The example used is a banking system on top of Cassandra with support for crediting/debiting an account, viewing an account balance and transferring money between accounts.
The document summarizes a workshop on Cassandra data modeling. It discusses four use cases: (1) modeling clickstream data by storing sessions and clicks in separate column families, (2) modeling a rolling time window of data points by storing each point in a column with a TTL, (3) modeling rolling counters by storing counts in columns indexed by time bucket, and (4) using transaction logs to achieve eventual consistency when modeling many-to-many relationships by serializing transactions and deleting logs after commit. The document provides recommendations and alternatives for each use case.
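For the rolling-time-window case, a minimal sketch with the modern DataStax Python driver (keyspace and table are hypothetical; USING TTL is standard CQL):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('metrics')

    # each point expires on its own after 24 hours, so the window
    # maintains itself without explicit deletes
    session.execute(
        "INSERT INTO points (series, ts, value) VALUES (%s, %s, %s) USING TTL 86400",
        ("cpu-load", 1365000000, 0.42),
    )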
Cassandra, Modeling and Availability at AMUG (Matthew Dennis)
brief high level comparison of modeling between relational databases and Cassandra followed by a brief description of how Cassandra achieves global availability
A high level overview of common Cassandra use cases, adoption reasons, BigData trends, DataStax Enterprise and the future of BigData given at the 7th Advanced Computing Conference in Seoul, South Korea
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical. Real-world use cases and associated data models.
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications, ranging from small start-ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet of clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems, and has extensive experience managing and resolving major system incidents.
Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/
Cassandra's got some properties which make it an ideal fit for building real-time analytics applications -- but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, talks about how and why they built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.
- Cassandra nodes are clustered in a ring, with each node assigned a random token range to own.
- Adding or removing nodes traditionally required manually rebalancing the token ranges, which was complex, impacted many nodes, and took the cluster offline.
- Virtual nodes assign each physical node multiple random token ranges of varying sizes, allowing incremental changes where new nodes "steal" ranges from others, distributing the load evenly without manual work or downtime.
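For reference, enabling virtual nodes in modern Cassandra is a one-line cassandra.yaml setting; 256 is a common value, shown here as an example rather than a recommendation from the document:

    num_tokens: 256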
Christopher Batey is a Technical Evangelist for Apache Cassandra. He discusses various anti-patterns to avoid when using Cassandra, including client-side joins, multi-partition queries, unlogged batches, mutable data, and more. He provides examples of how to model data and queries in Cassandra to avoid these anti-patterns, such as denormalizing data, bucketing time series data, and using logged batches in some cases. He emphasizes tracing queries and using a local multi-node cluster to test patterns before deploying.
Fears, misconceptions, and accepted anti patterns of a first time cassandra a... (Kinetic Data)
Cassandra can be successfully used for applications that are not extremely large scale or write-heavy. The document discusses fears, misconceptions, and accepted anti-patterns of first-time Cassandra users. It provides examples from a deployed application called Kinetic Request that uses Cassandra for multi-datacenter replication, durability, and scalability. Common concerns like atomicity, joins, lookups, updates, and queues are addressed, with solutions demonstrated from the real-world application. The key takeaways are that Cassandra has benefits even at moderate scales, the barriers are not as high as perceived, and to gain experience through experimentation and testing.
Aerospike is a key-value store optimized for fast caching with in-memory data structures and SSD support. Couchbase is optimized for caching with persistence to disk. Cassandra is best for big data archiving due to its efficient packing of data. MongoDB is a general-purpose document database best for web applications. YCSB is a popular benchmark for comparing NoSQL databases, but more tests are needed to evaluate features like secondary indexes.
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE (DataStax Academy)
The document provides 5 tips for using Cassandra and DSE: 1) Data modeling best practices to avoid secondary indexes, 2) Understanding compaction choices like size-tiered, leveled, and date-tiered and their use cases, 3) Common mistakes in proofs-of-concept like testing on different hardware and empty nodes, 4) Hardware recommendations like using moderate sized nodes with SSDs, and 5) Anti-patterns like loading large batches of data and modeling queues with improper partitioning.
The document discusses the CAP theorem, which states that a distributed computer system cannot simultaneously provide all three of the following properties - consistency, availability, and partition tolerance. It notes that at most two of the three properties can be satisfied. It provides examples of different database systems and which two properties they satisfy - RDBMS satisfies consistency and availability by forfeiting partition tolerance, while NoSQL systems typically satisfy availability and partition tolerance by forfeiting consistency. The document cautions that vendors do not always accurately represent the properties their systems provide and notes some limitations and workarounds for achieving different properties.
The document presents a comparison between the non-relational databases Cassandra and CouchDB. It discusses concepts such as ACID vs. BASE, the CAP theorem, and how each handles availability and consistency. It explains the data schemas and administration of Cassandra and CouchDB and compares their features and architectures.
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516] (Speedment, Inc.)
By leveraging memory-mapped files, Speedment and the Chronicle Engine support large Java maps that can easily exceed the size of your server’s RAM. Because the Java maps are mapped onto files, these maps can be shared instantly between several microservice JVMs, and new microservice instances can be added, removed, or restarted very quickly. Data can be retrieved with predictable ultralow latency for a wide range of operations. The solution can be synchronized with an underlying database so that your in-memory maps will be consistently “alive.” The mapped files can be tens of terabytes, which has been done in real-world deployment cases, and a large number of microservices can share these maps simultaneously. Learn more in this session.
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516] (Malin Weiss)
Microservices can provide terabytes of data in microseconds by mapping data from SQL databases into in-memory key-value stores and column key stores within JVMs. This is done through periodic synchronization of changed data from databases into memory and mapping the in-memory data into fast access structures. The in-memory data is then exposed through Java Stream and REST APIs to microservices for high performance querying and analysis of large datasets. This architecture allows microservices to quickly share access to large datasets and restart rapidly by reloading from the synchronized persistent stores.
Cassandra is used for real-time bidding in online advertising. It processes billions of bid requests per day with low latency requirements. Segment data, which assigns product or service affinity to user groups, is stored in Cassandra to reduce calculations and allow users to be bid on sooner. Tuning the cache size and understanding the active dataset helps optimize performance.
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory and disk configuration parameters, designing schemas and queries efficiently, and leveraging caching strategies.
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory parameters, effective indexing, and caching queries and data.
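A minimal postgresql.conf sketch of the step-3 parameters both summaries refer to (values are illustrative placeholders, not recommendations from the documents):

    shared_buffers = 8GB                  # dedicated buffer cache
    effective_cache_size = 24GB           # planner hint: OS cache included
    work_mem = 64MB                       # per-sort/hash memory
    checkpoint_completion_target = 0.9    # spread checkpoint I/O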
This document summarizes a presentation about building a high performance directory server in Java using the OpenDS project. It discusses the OpenDS architecture, design patterns used like asynchronous I/O and immutable objects, experiences tuning the Sun JVM like using large heap sizes and the CMS garbage collector. Performance tests showed search and modification rates of over 200,000 and 20,000 operations per second respectively on a Sun x4170 server. The presentation concludes by encouraging people to try OpenDS and get involved in the open source community around it.
NYJavaSIG - Big Data Microservices w/ Speedment (Speedment, Inc.)
Microservices solutions can provide fast access to large datasets by synchronizing SQL data into an in-JVM memory store and using key-value and column key stores. This allows querying terabytes of data in microseconds by mapping the data in memory and providing application programming interfaces. The solution uses periodic synchronization to initially load and periodically reload data, as well as reactive synchronization to capture and replay database changes.
The document discusses which database to use for different situations. It begins with explaining why a relational database may not be suitable for all problems and then describes different database categories including key-value stores, column family databases, document databases, graph databases, and Hadoop. It notes the characteristics and uses of each database type. The document concludes that the choice depends on factors like data structure, scalability needs, and workload.
This document discusses optimizing performance for high-load projects. It summarizes the delivery loads and technologies used for several projects including mGage, mobclix and XXXX. It then discusses optimizations made to improve performance, including using Solr for search, Redis for real-time data, Hadoop for reporting, and various Java optimizations in moving to Java 7. Specific optimizations discussed include reducing garbage collection, improving random number generation, and minimizing I/O operations.
From Raghu Ramakrishnan's presentation "Key Challenges in Cloud Computing and How Yahoo! is Approaching Them" at the 2009 Cloud Computing Expo in Santa Clara, CA, USA. Here's the talk description on the Expo's site: http://cloudcomputingexpo.com/event/session/510
Cassandra implementation for collecting data and presenting data (Chen Robert)
This document discusses Cassandra implementation for collecting and presenting data. It provides an overview of Cassandra, including why it was chosen, its architecture and data model. It describes how data is written to and read from Cassandra, and demonstrates the data model and graphing of data. Future uses of Cassandra are discussed.
Challenges in Maintaining a High Performance Search Engine Written in Java (lucenerevolution)
Presented by Simon Willnauer | Apache Lucene - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
During the last decade Apache Lucene became the de-facto standard in open source search technology. Thousands of applications from Twitter Scale Webservices to Computers playing Jeopardy rely on Lucene, a rock-solid, scaleable and fast information-retrieval library entirely written in Java. Maintaining and improving such a popular software library reveals tough challenges in testing, API design, data-structures, concurrency and optimizations. This talk presents the most demanding technical challenges the Lucene Development Team has solved in the past. It covers a number of areas of software development including concurrency & parallelism, testing infrastructure, data-structures, algorithms, API designs with respect to Garbage Collection, and Memory efficiency and efficient resource utilization. This talk doesn’t require any Apache Lucene or information-retrieval background in general. Knowledge about the Java programming language will certainly be helpful while the problems and techniques presented in this talk aren’t Java specific.
Project Voldemort is a distributed key-value store inspired by Amazon Dynamo and Memcached. It was originally developed at LinkedIn to handle high volumes of data and queries in a scalable way across multiple servers. Voldemort uses consistent hashing to partition and replicate data, vector clocks to resolve concurrent write conflicts, and a layered architecture to provide flexibility. It prioritizes performance, availability, and simplicity over more complex consistency guarantees. LinkedIn uses multiple Voldemort clusters to power various real-time services and applications.
The document discusses SQL versus NoSQL databases. It provides background on SQL databases and their advantages, then explains why some large tech companies have adopted NoSQL databases instead. Specifically, it describes how companies like Amazon, Facebook, and Google have such massive amounts of data that traditional SQL databases cannot adequately handle the scale, performance, and flexibility needs. It then summarizes some popular NoSQL databases like Cassandra, Hadoop, MongoDB that were developed to solve the challenges of scaling to big data workloads.
The document discusses Apache Cassandra, a distributed database management system designed to handle large amounts of data across many commodity servers. It was developed at Facebook and modeled after Google's Bigtable. The summary discusses key concepts like its use of consistent hashing to distribute data, support for tunable consistency levels, and focus on scalability and availability over traditional SQL features. It also provides an overview of how Cassandra differs from relational databases by not supporting joins, having an optional schema, and using a prematerialized and transaction-less model.
Cassandra from the trenches: migrating Netflix (Jason Brown)
Jason Brown gave a presentation on Netflix's experience migrating their AB testing infrastructure from Oracle databases to Cassandra in Amazon EC2. Some key points included migrating over 950 million records of customer allocation data and AB test metadata to a denormalized data model in Cassandra using composite columns. Indexes also needed to be rebuilt in Cassandra to support the necessary queries. Ongoing operations of compactions and repairs in Cassandra required tuning to optimize performance for Netflix's workload.
A brief history of Instagram's adoption cycle of the open source distributed database Apache Cassandra, in addition to details about its use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
1. If it’s not SQL, it’s not a database.
2. It takes 5+ years to build a database.
3. Listen to your users.
4. Too much magic is a bad thing.
5. It’s the cloud, stupid.
2. C* on a SAN
● fact: C* was designed, from the start, for
commodity hardware
● more than just not requiring a SAN, C*
actually performs better without one
● SPOF
● unnecessary (large) cost
● “(un)coordinated” IO from nodes
● SANs were designed to solve problems C*
doesn’t have
3. Commit Log + Data Directory
(on the same volume)
● conflicting IO patterns
● commit log is 100% sequential append only
● data directory is (usually) random on reads
● commit log is essentially serialized
● massive difference in write
throughput under load
● NB: does not apply to SSDs or EC2
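A minimal cassandra.yaml sketch of the separation this slide argues for (mount points are hypothetical):

    commitlog_directory: /mnt/commitlog   # dedicated spindle: sequential, append-only
    data_file_directories:
        - /mnt/data                       # separate volume: mostly random reads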
4. Oversize JVM Heaps
● 4 – 8 GB is good
(assuming sufficient RAM on your boxen)
● 10 – 12 GB is not bad
(and often “correct”)
● 16GB == max
● > 16GB => badness
● heap >= boxen RAM => badness
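In practice the heap is set in cassandra-env.sh; a sketch in the 4 - 8 GB range the slide recommends (exact numbers depend on your boxen):

    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"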
6. not using -pr on scheduled repairs
● -pr is kind of new
● only applies to scheduled repairs
● reduces work to 1/RF (e.g. 1/3)
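For example, a scheduled repair restricted to each node's primary range:

    nodetool repair -pr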
7. low file handle limit
● C* requires lots of file handles
(sorry, deal with it)
● Sockets and SSTables mostly
● 1024 (common default) is not sufficient
● fails in horrible, miserable, unpredictable ways
(though clear from the logs after the fact)
● 32K - 128K is common
● unlimited is also common, but personally I
prefer some sort of limit ...
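A sketch of raising the limit in /etc/security/limits.conf, assuming Cassandra runs as the cassandra user (pick a value in the 32K - 128K range above):

    cassandra  soft  nofile  100000
    cassandra  hard  nofile  100000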
8. Load Balancers
(in front of C*)
● clients will load balance
(C* has no master so this can work reliably)
● SPOF
● performance bottleneck
● unneeded complexity
● unneeded cost
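For illustration, client-side balancing with the modern DataStax Python driver (which postdates this deck; contact points are hypothetical):

    from cassandra.cluster import Cluster
    from cassandra.policies import RoundRobinPolicy

    # the driver discovers the rest of the ring and spreads requests
    # itself, so no load balancer sits in front of C*
    cluster = Cluster(
        contact_points=['10.0.0.1', '10.0.0.2', '10.0.0.3'],
        load_balancing_policy=RoundRobinPolicy(),
    )
    session = cluster.connect()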
9. restricting clients to a single node
● why?
● no really, I don’t understand how
this was thought to be a good idea
● thedailywtf.com territory
10. Unbalanced Ring
● used to be the number one
problem encountered
● OPSC automates the resolution of
this to two clicks (do it + confirm)
even across multiple data centers
● related: don’t let C* auto pick your
tokens, always specify initial_token
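A minimal sketch of computing evenly spaced initial_token values for RandomPartitioner, whose token space runs from 0 to 2**127:

    def initial_tokens(node_count):
        step = 2 ** 127 // node_count
        return [i * step for i in range(node_count)]

    for token in initial_tokens(4):
        print(token)   # one initial_token per node, evenly spaced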
11. Row Cache + Slice Queries
● the row cache is a row cache, not a query cache or slice
cache or magic cache or WTF-ever-you-thought-it-was cache
● for the obviously impaired: that’s why we called it a
row cache – because it caches rows
● laughable performance difference in some extreme
cases (e.g. 100X increase in throughput, 10X drop in
latency, maxed cpu to under 10% average)
12. Row Cache + Large Rows
● 2GB row? yeah, let’s cache that!!!
● related: wtf are you doing trying to
read a 2GB row all at once anyway?
13. OPP/BOP
● if you think you need BOP, check
again
● no seriously, you’re doing it wrong
● if you use BOP anyway:
● IRC will mock you
● your OPS team will plan your disappearance
● I will set up an auto-reply for your entire domain
that responds solely with “stop using BOP”
14. Unbounded Batches
● batches are sent as a single message
● they must fit entirely in memory
(both server side and client side)
● best size is very much an empirical
exercise depending on your HW, load,
data model, moon phase, etc
(start with 10 – 100 and tune)
● NB: streaming transport will address
this in future releases
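A sketch of bounding batch sizes with the modern Python driver (which postdates this deck); keyspace, table, and the chunk size of 50 are illustrative assumptions inside the 10 - 100 starting range suggested above:

    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('demo')
    insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

    def write_in_chunks(rows, chunk_size=50):
        for i in range(0, len(rows), chunk_size):
            batch = BatchStatement()
            for row in rows[i:i + chunk_size]:
                batch.add(insert, row)
            session.execute(batch)  # each message stays comfortably in memory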
15. Bad Rotational Math
● rotational disks require seek time
● 5ms is a fast seek time for a rotational disk
● you cannot get thousands of random seeks
per second from rotational disks
● caches/memory alleviate this, SSDs solve it
● maths are teh hard? buy SSDs
● everything fits in memory? I don’t care what
disks you buy
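The arithmetic the slide alludes to, as a sketch:

    seek_time_s = 0.005                  # 5 ms is a fast rotational seek
    seeks_per_disk = 1 / seek_time_s     # ~200 random seeks/sec per disk
    disks = 6
    print(disks * seeks_per_disk)        # ~1200 random reads/sec, best case

    # thousands of uncached random reads per second therefore require
    # many spindles, a large cache, or SSDs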
16. 32 Bit JVMs
● C* deals (usually) with BigData
● 32 bits cannot address BigData
● mmap, file offsets, heaps, caches
● always wrong? no, I guess not ...
17. EBS volumes
● nice in theory, but ...
● not predictable
● freezes common
● outages common
● stripe ephemeral drives instead
● provisioned IOPS EBS?
future hazy, ask again later
18. Non-Sun (err, Oracle) JVM
● at least u22, but in general the
latest release
(unless you have specific reasons otherwise)
● this is changing
● some people (successfully) use
OpenJDK anyway
19. Super Columns
● 10-15 percent overhead on reads and writes
● entire super column is always held in memory
at all stages
● most C* devs hate working on them
● C* and DataStax are committed to maintaining
the API going forward, but they should be
avoided for new projects
● composite columns are an alternative
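For new projects, the composite-column equivalent of a super column family looks roughly like this compound primary key in CQL, issued here through the Python driver (the schema is a hypothetical illustration):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('demo')
    session.execute("""
        CREATE TABLE IF NOT EXISTS user_events (
            user_id    text,
            event_time timestamp,
            event_type text,
            payload    text,
            PRIMARY KEY (user_id, event_time, event_type)
        )
    """)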
20. Not Running OPSC
● extremely useful for postmortems
● trivial (usually) to set up
● DataStax offers a free version
(you have no excuse now)