Getting Started
===============

First, make sure you have the driver properly :doc:`installed <installation>`.

Connecting to Cassandra
-----------------------
Before we can start executing any queries against Cassandra we need to set up
our :class:`~.Cluster`. As the name suggests, you will typically have one
instance of :class:`~.Cluster` for each Cassandra cluster you want to interact
with.

The simplest way to create a :class:`~.Cluster` looks like this:

.. code-block:: python

    from cassandra.cluster import Cluster
    cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])

The set of IP addresses we pass to the :class:`~.Cluster` is simply
an initial set of contact points. After the driver connects to one
of these addresses it will automatically discover the rest of the
nodes in the cluster and connect to them, so you don't need to list
every node in your cluster.

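Roughly speaking, discovery only needs one reachable contact point. The toy model below illustrates that idea in plain Python, without the driver: ``peers_of`` is a hypothetical stand-in for asking a live node which other nodes it knows about (the real driver learns this by querying Cassandra's system tables over a control connection), and ``reachable`` simulates which addresses accept connections. It is a sketch of the concept, not the driver's implementation.

.. code-block:: python

    def discover_nodes(contact_points, reachable, peers_of):
        """Toy model of contact-point discovery (not driver code).

        contact_points: addresses passed to Cluster(...)
        reachable: set of addresses that accept connections (simulated)
        peers_of: dict mapping a node to the peers it reports (simulated)
        """
        for address in contact_points:
            if address not in reachable:
                continue  # this contact point is down; try the next one
            # One live node is enough: ask it about the rest of the cluster.
            return {address} | set(peers_of.get(address, ()))
        raise RuntimeError("Unable to connect to any contact point")


    # A hypothetical five-node cluster where only two nodes were listed
    # and the first listed node happens to be down.
    topology = {
        '10.1.1.3': ['10.1.1.4', '10.1.1.5', '10.1.1.6', '10.1.1.7'],
        '10.1.1.4': ['10.1.1.3', '10.1.1.5', '10.1.1.6', '10.1.1.7'],
    }
    nodes = discover_nodes(['10.1.1.3', '10.1.1.4'],
                           reachable={'10.1.1.4'}, peers_of=topology)

Listing a few contact points rather than a single one simply gives discovery more than one chance to succeed.
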
If you need to use a non-standard port, use SSL, or customize the driver's
behavior in some other way, this is the place to do it:

.. code-block:: python

    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy

    cluster = Cluster(
        contact_points=['10.1.1.3', '10.1.1.4', '10.1.1.5'],
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='US_EAST'),
        port=9042)

You can find a more complete list of options in the :class:`~.Cluster` documentation.

Instantiating a :class:`~.Cluster` does not actually connect us to any nodes.
To establish connections and begin executing queries we need a
:class:`~.Session`, which is created by calling :meth:`.Cluster.connect()`.
The :meth:`~.Cluster.connect()` method takes an optional ``keyspace`` argument
which sets the default keyspace for all queries made through that :class:`~.Session`:

.. code-block:: python

    cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
    session = cluster.connect('mykeyspace')

You can always change a Session's keyspace using :meth:`~.Session.set_keyspace` or
by executing a ``USE <keyspace>`` query:

.. code-block:: python

    session.set_keyspace('users')
    # or you can do this instead
    session.execute('USE users')

Executing Queries
-----------------
Now that we have a :class:`.Session` we can begin to execute queries. The most
basic and natural way to execute a query is to use :meth:`~.Session.execute()`:

.. code-block:: python

    rows = session.execute('SELECT name, age, email FROM users')
    for user_row in rows:
        print user_row.name, user_row.age, user_row.email

This will transparently pick a Cassandra node to execute the query against
and handle any retries that are necessary if the operation fails.

By default, each row in the result set will be a
`namedtuple <http://docs.python.org/2/library/collections.html#collections.namedtuple>`_.
Each row will have a matching attribute for each column in the result,
such as ``name``, ``age``, and so on. You can also treat them as normal tuples
by unpacking them or accessing fields by position:

.. code-block:: python

    rows = session.execute('SELECT name, age, email FROM users')
    for (name, age, email) in rows:
        print name, age, email

.. code-block:: python

    rows = session.execute('SELECT name, age, email FROM users')
    names = [row[0] for row in rows]
    ages = [row[1] for row in rows]
    emails = [row[2] for row in rows]

If you prefer another result format, such as a ``dict`` per row, you
can change the :attr:`~.Session.row_factory` attribute.

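Conceptually, a row factory is a callable that turns a result's column names and raw row tuples into the objects you iterate over. The standalone sketch below imitates the default namedtuple rows and a ``dict``-per-row alternative without a cluster. The ``(colnames, rows)`` call signature and the function names are assumptions made for illustration; check the :attr:`~.Session.row_factory` documentation for the factories your driver version actually ships.

.. code-block:: python

    from collections import namedtuple


    def namedtuple_rows(colnames, rows):
        """Sketch of the default behavior: one namedtuple per row."""
        Row = namedtuple('Row', colnames)
        return [Row(*row) for row in rows]


    def dict_rows(colnames, rows):
        """Sketch of a dict-per-row factory."""
        return [dict(zip(colnames, row)) for row in rows]


    # Simulated raw result for: SELECT name, age, email FROM users
    colnames = ['name', 'age', 'email']
    raw = [('alice', 30, 'alice@example.com'), ('bob', 42, 'bob@example.com')]

    as_tuples = namedtuple_rows(colnames, raw)
    as_dicts = dict_rows(colnames, raw)

Namedtuples keep both attribute access and tuple unpacking; dicts trade positional access for lookup by column name.
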
Passing Parameters to CQL Queries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When executing non-prepared statements, the driver supports two forms of
parameter placeholders: positional and named.

Positional parameters are used with a ``%s`` placeholder. For example,
when you execute:

.. code-block:: python

    import uuid

    session.execute(
        """
        INSERT INTO users (name, credits, user_id)
        VALUES (%s, %s, %s)
        """,
        ("John O'Reilly", 42, uuid.uuid1())
    )

It is translated to the following CQL query:

.. code-block:: SQL

    INSERT INTO users (name, credits, user_id)
    VALUES ('John O''Reilly', 42, 2644bada-852c-11e3-89fb-e0b9a54a6d93)

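The translation above boils down to turning each Python value into a CQL literal and then substituting it for its ``%s``. The toy encoder below shows that idea for a few types; it is a simplified illustration rather than the driver's actual encoder, and ``to_cql_literal`` and ``bind_positional`` are hypothetical names.

.. code-block:: python

    import uuid


    def to_cql_literal(value):
        """Simplified Python-to-CQL literal conversion (illustration only)."""
        if value is None:
            return 'NULL'
        if isinstance(value, bool):  # check before int: bool subclasses int
            return 'true' if value else 'false'
        if isinstance(value, (int, float)):
            return repr(value)
        if isinstance(value, uuid.UUID):
            return str(value)  # UUIDs are written unquoted
        if isinstance(value, str):
            # Quote strings and escape embedded quotes by doubling them.
            return "'" + value.replace("'", "''") + "'"
        raise TypeError("unsupported type: %r" % type(value))


    def bind_positional(query, params):
        """Encode every parameter, then substitute it for its %s."""
        return query % tuple(to_cql_literal(v) for v in params)


    user_id = uuid.UUID('2644bada-852c-11e3-89fb-e0b9a54a6d93')
    cql = bind_positional(
        "INSERT INTO users (name, credits, user_id) VALUES (%s, %s, %s)",
        ("John O'Reilly", 42, user_id))

Note how the quote in ``O'Reilly`` comes out doubled, which is what keeps the string literal intact.
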
Note that you should use ``%s`` for all types of arguments, not just strings.
For example, this would be **wrong**:

.. code-block:: python

    session.execute("INSERT INTO USERS (name, age) VALUES (%s, %d)", ("bob", 42)) # wrong

Instead, use ``%s`` for the age placeholder.

If you need to use a literal ``%`` character, use ``%%``.

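One way to see why ``%d`` breaks: if every parameter is converted to its CQL literal text before Python's ``%`` operator runs (the approach assumed in this sketch), then ``%d`` receives a string and Python raises ``TypeError``, while ``%%`` collapses to a single literal ``%``. The ``encode`` and ``bind`` helpers are hypothetical, deliberately tiny stand-ins for the driver's encoding step.

.. code-block:: python

    def encode(value):
        """Hypothetical stand-in: turn a value into CQL literal text."""
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)


    def bind(query, params):
        return query % tuple(encode(v) for v in params)


    # %s works for every type, because every encoded value is a string.
    ok = bind("INSERT INTO users (name, age) VALUES (%s, %s)", ("bob", 42))

    # %d fails: it receives the string '42', not the number 42.
    try:
        bind("INSERT INTO users (name, age) VALUES (%s, %d)", ("bob", 42))
        failed = False
    except TypeError:
        failed = True

    # %% becomes a single literal % after substitution.
    pct = bind("UPDATE stats SET label = '50%%' WHERE id = %s", (1,))
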
**Note**: you must always use a sequence for the second argument, even if you are
only passing in a single variable:

.. code-block:: python

    session.execute("INSERT INTO foo (bar) VALUES (%s)", "blah") # wrong
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah")) # wrong
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah", )) # right
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ["blah"]) # right

The first line is wrong because a bare string is itself a sequence, so each of
its characters would be treated as a separate parameter. The second line is
incorrect because in Python, single-element tuples require a comma.

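Both pitfalls are easy to confirm in plain Python, with no driver involved: parentheses alone do not create a tuple, and iterating over a bare string yields its individual characters.

.. code-block:: python

    wrong_1 = "blah"
    wrong_2 = ("blah")     # just a parenthesized string
    right_1 = ("blah", )   # the trailing comma makes a one-element tuple
    right_2 = ["blah"]     # a one-element list works too

    # A bare string iterates character by character, so it cannot
    # stand in for a one-parameter sequence.
    params_from_string = list(wrong_1)
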
Named placeholders use the ``%(name)s`` form:

.. code-block:: python

    import uuid

    session.execute(
        """
        INSERT INTO users (name, credits, user_id, username)
        VALUES (%(name)s, %(credits)s, %(user_id)s, %(name)s)
        """,
        {'name': "John O'Reilly", 'credits': 42, 'user_id': uuid.uuid1()}
    )

Note that you can repeat placeholders with the same name, such as ``%(name)s``
in the above example.

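Named placeholders can be pictured with the same encode-then-substitute idea, using a ``dict`` instead of a sequence: Python's ``%`` operator looks each name up in the mapping, so a repeated ``%(name)s`` simply receives the same value twice. The ``encode`` and ``bind_named`` helpers are hypothetical and simplified, not driver code.

.. code-block:: python

    def encode(value):
        """Hypothetical stand-in: turn a value into CQL literal text."""
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)


    def bind_named(query, params):
        # Encode each value once; %-formatting may reuse a key many times.
        return query % {key: encode(value) for key, value in params.items()}


    cql = bind_named(
        "INSERT INTO users (name, credits, username) "
        "VALUES (%(name)s, %(credits)s, %(name)s)",
        {'name': "John O'Reilly", 'credits': 42})
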
Only data values should be supplied this way. Other items, such as keyspaces,
table names, and column names, should be set ahead of time (typically using
normal string formatting).

Type Conversions
^^^^^^^^^^^^^^^^
For non-prepared statements, Python types are cast to CQL literals in the
following way:

.. table::

   +--------------------+-------------------------+
   | Python Type        | CQL Literal Type        |
   +====================+=========================+
   | ``None``           | ``NULL``                |
   +--------------------+-------------------------+
   | ``bool``           | ``bool``                |
   +--------------------+-------------------------+
   | ``float``          | | ``float``             |
   |                    | | ``double``            |
   +--------------------+-------------------------+
   | | ``int``          | | ``int``               |
   | | ``long``         | | ``bigint``            |
   |                    | | ``varint``            |
   |                    | | ``counter``           |
   +--------------------+-------------------------+
   | ``decimal.Decimal``| ``decimal``             |
   +--------------------+-------------------------+
   | | ``str``          | | ``ascii``             |
   | | ``unicode``      | | ``varchar``           |
   |                    | | ``text``              |
   +--------------------+-------------------------+
   | | ``buffer``       | ``blob``                |
   | | ``bytearray``    |                         |
   +--------------------+-------------------------+
   | | ``date``         | ``timestamp``           |
   | | ``datetime``     |                         |
   +--------------------+-------------------------+
   | | ``list``         | ``list``                |
   | | ``tuple``        |                         |
   | | generator        |                         |
   +--------------------+-------------------------+
   | | ``set``          | ``set``                 |
   | | ``frozenset``    |                         |
   +--------------------+-------------------------+
   | | ``dict``         | ``map``                 |
   | | ``OrderedDict``  |                         |
   +--------------------+-------------------------+
   | ``uuid.UUID``      | | ``timeuuid``          |
   |                    | | ``uuid``              |
   +--------------------+-------------------------+

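The collection rows of the table correspond to CQL's literal syntax: ``[...]`` for lists, ``{...}`` for sets, and ``{key: value}`` for maps. The recursive sketch below illustrates that mapping for a handful of types, using Python 3 names (so ``str`` covers both ``str`` and ``unicode``). ``cql_literal`` is a hypothetical, simplified encoder written for this page, not the driver's implementation, and it skips types such as ``decimal.Decimal``, blobs, and timestamps.

.. code-block:: python

    import types
    import uuid


    def cql_literal(value):
        """Render a Python value as CQL literal text (simplified sketch)."""
        if value is None:
            return 'NULL'
        if isinstance(value, bool):  # before int: bool subclasses int
            return 'true' if value else 'false'
        if isinstance(value, (int, float)):
            return repr(value)
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        if isinstance(value, uuid.UUID):
            return str(value)  # UUIDs are written unquoted
        if isinstance(value, dict):  # also covers OrderedDict
            pairs = ', '.join('%s: %s' % (cql_literal(k), cql_literal(v))
                              for k, v in value.items())
            return '{' + pairs + '}'
        if isinstance(value, (set, frozenset)):
            # CQL sets are unordered; sort only to get stable output here.
            return '{' + ', '.join(sorted(cql_literal(v) for v in value)) + '}'
        if isinstance(value, (list, tuple, types.GeneratorType)):
            return '[' + ', '.join(cql_literal(v) for v in value) + ']'
        raise TypeError('no CQL literal for %r' % type(value))
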
Asynchronous Queries
^^^^^^^^^^^^^^^^^^^^
The driver supports asynchronous query execution through
:meth:`~.Session.execute_async()`. Instead of waiting for the query to
complete and returning rows directly, this method almost immediately
returns a :class:`~.ResponseFuture` object. There are two ways of
getting the final result from this object.

The first is by calling :meth:`~.ResponseFuture.result()` on it. If
the query has not yet completed, this will block until it has and
then return the result or raise an exception if an error occurred.
For example:

.. code-block:: python

    import logging

    from cassandra import ReadTimeout

    log = logging.getLogger(__name__)

    query = "SELECT * FROM users WHERE user_id=%s"
    future = session.execute_async(query, [user_id])

    # ... do some other work

    try:
        rows = future.result()
        user = rows[0]
        print user.name, user.age
    except ReadTimeout:
        log.exception("Query timed out:")

This works well for executing many queries concurrently:

.. code-block:: python

    # build a list of futures
    futures = []
    query = "SELECT * FROM users WHERE user_id=%s"
    for user_id in ids_to_fetch:
        futures.append(session.execute_async(query, [user_id]))

    # wait for them to complete and use the results
    for future in futures:
        rows = future.result()
        print rows[0].name

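The fan-out/fan-in shape of that loop does not depend on Cassandra. Purely as an analogy, the sketch below reproduces it with the standard library's ``concurrent.futures``: ``fetch_user`` is a hypothetical function that simulates one query round trip with a short sleep, and ``ThreadPoolExecutor`` futures stand in for :class:`~.ResponseFuture` objects (the driver itself uses an event loop rather than a thread per request).

.. code-block:: python

    import time
    from concurrent.futures import ThreadPoolExecutor


    def fetch_user(user_id):
        """Hypothetical stand-in for one query round trip."""
        time.sleep(0.05)  # simulated network latency
        return [{'user_id': user_id, 'name': 'user-%d' % user_id}]


    ids_to_fetch = list(range(10))

    with ThreadPoolExecutor(max_workers=10) as executor:
        start = time.monotonic()
        # Fan out: start every request before waiting on any of them.
        futures = [executor.submit(fetch_user, uid) for uid in ids_to_fetch]
        # Fan in: result() blocks until that particular request is done.
        names = [future.result()[0]['name'] for future in futures]
        elapsed = time.monotonic() - start

Because every request is in flight before the first ``result()`` call, the total wait is roughly one simulated latency rather than ten, and results still come back in submission order.
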
Alternatively, instead of calling :meth:`~.ResponseFuture.result()`,
you can attach callback and errback functions through the
:meth:`~.ResponseFuture.add_callback()`,
:meth:`~.ResponseFuture.add_errback()`, and
:meth:`~.ResponseFuture.add_callbacks()` methods. If you have used
Twisted Python before, this is designed to be a lightweight version of
that:

.. code-block:: python

    def handle_success(rows):
        user = rows[0]
        try:
            process_user(user.name, user.age, user.id)
        except Exception:
            log.error("Failed to process user %s", user.id)
            # don't re-raise errors in the callback

    def handle_error(exception):
        log.error("Failed to fetch user info: %s", exception)

    future = session.execute_async(query, [user_id])
    future.add_callbacks(handle_success, handle_error)

There are a few important things to remember when working with callbacks:

* **Exceptions that are raised inside the callback functions will be logged and then ignored.**
* Your callback will be run on the event loop thread, so any long-running
  operations will prevent other requests from being handled.

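To make the first rule concrete, here is a small, hypothetical ``MiniFuture`` class written for this page. It is not the driver's :class:`~.ResponseFuture` and its ``set_result``/``set_exception`` method names are made up; it only mimics the behavior described above: callbacks run when a result arrives, errbacks run on failure, and an exception raised inside a callback is logged and swallowed instead of propagating. For brevity it only handles callbacks registered before the result arrives.

.. code-block:: python

    import logging

    log = logging.getLogger(__name__)


    class MiniFuture(object):
        """Hypothetical toy future mimicking the callback rules above."""

        def __init__(self):
            self._callbacks = []
            self._errbacks = []

        def add_callbacks(self, callback, errback):
            self._callbacks.append(callback)
            self._errbacks.append(errback)

        def _run_all(self, functions, arg):
            for fn in functions:
                try:
                    fn(arg)
                except Exception:
                    # Logged, then ignored: the caller never sees it.
                    log.exception("Failed to run callback %r", fn)

        def set_result(self, rows):
            self._run_all(self._callbacks, rows)

        def set_exception(self, exc):
            self._run_all(self._errbacks, exc)


    seen = []
    future = MiniFuture()
    future.add_callbacks(lambda rows: seen.append(('ok', rows)),
                         lambda exc: seen.append(('error', exc)))
    future.add_callbacks(lambda rows: 1 / 0,  # a buggy callback
                         lambda exc: None)
    future.set_result(['row1'])  # does not raise despite the ZeroDivisionError
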
Setting a Consistency Level
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The consistency level used for a query determines how many of the
replicas of the data you are interacting with need to respond for
the query to be considered a success.

By default, :attr:`.ConsistencyLevel.ONE` will be used for all queries. To
specify a different consistency level, you will need to wrap your queries
in a :class:`~.SimpleStatement`:

.. code-block:: python

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    query = SimpleStatement(
        "INSERT INTO users (name, age) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(query, ('John', 42))

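For ``QUORUM`` specifically, "how many replicas" means a strict majority, ``floor(RF / 2) + 1``, where RF is the replication factor (summed across data centers when a keyspace spans several). The hypothetical helpers below are plain arithmetic, not driver code. They show, for instance, that a replication factor of 3 lets ``QUORUM`` requests succeed with one replica down, while a replication factor of 2 tolerates none.

.. code-block:: python

    def quorum(replication_factor):
        """Replicas that must respond at QUORUM: a strict majority."""
        if replication_factor < 1:
            raise ValueError("replication factor must be at least 1")
        return replication_factor // 2 + 1


    def tolerated_failures(replication_factor):
        """Replicas that can be down while QUORUM requests still succeed."""
        return replication_factor - quorum(replication_factor)
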
Prepared Statements
-------------------
Prepared statements are queries that are parsed by Cassandra and then saved
for later use. When the driver uses a prepared statement, it only needs to
send the values of parameters to bind. This lowers network traffic
and CPU utilization within Cassandra because Cassandra does not have to
re-parse the query each time.

To prepare a query, use :meth:`.Session.prepare()`:

.. code-block:: python

    user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")

    users = []
    for user_id in user_ids_to_query:
        rows = session.execute(user_lookup_stmt, [user_id])
        users.extend(rows)

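The key habit in that example is preparing once and executing many times. If statements are created in several places, a small cache keyed by the query string keeps each one from being prepared repeatedly. ``StatementCache`` below is a hypothetical helper, not part of the driver; it accepts any ``prepare`` callable (with the driver, presumably ``session.prepare``), and here a fake ``prepare`` that records its calls stands in so the sketch runs without a cluster.

.. code-block:: python

    class StatementCache(object):
        """Hypothetical helper: prepare each distinct query string only once."""

        def __init__(self, prepare):
            self._prepare = prepare  # e.g. session.prepare with the real driver
            self._statements = {}

        def get(self, query):
            statement = self._statements.get(query)
            if statement is None:
                statement = self._prepare(query)
                self._statements[query] = statement
            return statement


    # Fake prepare() that records how often it is called.
    prepare_calls = []


    def fake_prepare(query):
        prepare_calls.append(query)
        return ('prepared', query)


    cache = StatementCache(fake_prepare)
    for _ in range(3):
        stmt = cache.get("SELECT * FROM users WHERE user_id=?")
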
:meth:`~.Session.prepare()` returns a :class:`~.PreparedStatement` instance
which can be used in place of :class:`~.SimpleStatement` instances or literal
string queries. It is automatically prepared against all nodes, and the driver
handles re-preparing against new nodes and restarted nodes when necessary.

Note that the placeholders for prepared statements are ``?`` characters. This
is different from simple, non-prepared statements (although future versions
of the driver may use the same placeholders for both). Cassandra 2.0 added
support for named placeholders; the 1.0 version of the driver does not support
them, but the 2.0 version will.