Commit c77351d

Installation, more complete Getting Started guide
1 parent 496e63d commit c77351d

2 files changed

Lines changed: 435 additions & 35 deletions

docs/api/getting_started.rst

Lines changed: 324 additions & 35 deletions
@@ -1,52 +1,341 @@
11
Getting Started
22
===============
33

4-
Before we can start executing any queries against Cassandra we need to setup our cluster. Setting up our cluster
5-
allows us to set specific options like what port CQL native transport is listening to for connections, SSL options,
6-
the loadbalancing policy to use for handling queries and more which can be found here :doc:`/api/cassandra/cluster` ::
4+
First, make sure you have the driver properly :doc:`installed <installation>`.
75

8-
from cassandra.cluster import Cluster
6+
Connecting to Cassandra
7+
-----------------------
8+
Before we can start executing any queries against Cassandra we need to set up
9+
our :class:`~.Cluster`. As the name suggests, you will typically have one
10+
instance of :class:`~.Cluster` for each Cassandra cluster you want to interact
11+
with.
912

10-
options = {
11-
'contact_points': ['10.1.1.3', '10.1.1.4', '10.1.1.5'],
12-
'port': 9042
13-
}
13+
The simplest way to create a :class:`~.Cluster` is like this:
1414

15-
cluster = Cluster(**options)
16-
session = cluster.connect(keyspace='users')
15+
.. code-block:: python
1716
18-
Instantiating a cluster does not actually connect us to any nodes. To begin executing queries we need a session, which is created by calling cluster.connect(). connect takes an optional 'keyspace' argument allowing all queries in that session to be operated on that keyspace. Alternatively, you can set the keyspace after the session is created. Sessions should NOT be instantiated outside the use of a Cluster. The Cluster handles the
19-
creation and disposal of sessions. ::
17+
    from cassandra.cluster import Cluster
    cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
2019
21-
session.set_keyspace('users')
20+
The set of IP addresses we pass to the :class:`~.Cluster` is simply
21+
an initial set of contact points. After the driver connects to one
22+
of these addresses it will automatically discover the rest of the
23+
nodes in the cluster and connect to them, so you don't need to list
24+
every node in your cluster.
2225

23-
Now that we have a session we can begin to execute queries. If you are using Cassandra in a way that allows you to not block calls you can execute queries asynchronously. More about the execute and execute_async functions can be found here :doc:`/api/cassandra/cluster` (this should link to execute and execute_async) ::
26+
If you need to use a non-standard port, use SSL, or customize the driver's
27+
behavior in some other way, this is the place to do it:
2428

25-
# without async (blocking)
26-
result = session.execute('SELECT * FROM users')
27-
for row in results:
28-
print row
29+
.. code-block:: python
2930
30-
# with async
31-
def handle_success(results):
32-
for row in results:
33-
print row
31+
    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy

    cluster = Cluster(
        contact_points=['10.1.1.3', '10.1.1.4', '10.1.1.5'],
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='US_EAST'),
        port=9042)
3738
38-
future = session.execute_async('SELECT * FROM users')
39-
future.add_callbacks(handle_success, handle_error)
4039
41-
When executing queries from a session, the driver picks a Cassandra node to act as the coordinator. As shown earlier in our Cluster example with the options we passed several ip addresses which the driver can communicate with. By default the driver will touch those nodes in the list and grab the ip addresses of any nodes that those nodes have found via gossip. To change this behavior you can use different Load Balancing policies that are available here :doc:`/api/cassandra/policies` ::
40+
You can find a more complete list of options in the :class:`~.Cluster` documentation.
4241

43-
from cassandra.cluster import Cluster
44-
from cassandra.policies import DCAwareRoundRobinPolicy
42+
Instantiating a :class:`~.Cluster` does not actually connect us to any nodes.
43+
To establish connections and begin executing queries we need a
44+
:class:`~.Session`, which is created by calling :meth:`.Cluster.connect()`.
45+
The :meth:`~.Cluster.connect()` method takes an optional ``keyspace`` argument
46+
which sets the default keyspace for all queries made through that :class:`~.Session`:
4547

46-
options = {
47-
'contact_points': ['10.1.1.3', '10.1.1.4', '10.1.1.5'],
48-
'port': 9042,
49-
'load_balancing_policy': DCAwareRoundRobinPolicy(local_dc='datacenter1')
50-
}
48+
.. code-block:: python
5149
52-
cluster = Cluster(**options)
50+
    cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
    session = cluster.connect('mykeyspace')
52+
53+
54+
You can always change a Session's keyspace using :meth:`~.Session.set_keyspace` or
55+
by executing a ``USE <keyspace>`` query:
56+
57+
.. code-block:: python
58+
59+
    session.set_keyspace('users')
    # or you can do this instead
    session.execute('USE users')
62+
63+
64+
Executing Queries
65+
-----------------
66+
Now that we have a :class:`.Session` we can begin to execute queries. The most
67+
basic and natural way to execute a query is to use :meth:`~.Session.execute()`:
68+
69+
.. code-block:: python
70+
71+
    rows = session.execute('SELECT name, age, email FROM users')
    for user_row in rows:
        print user_row.name, user_row.age, user_row.email
74+
75+
This will transparently pick a Cassandra node to execute the query against
76+
and handle any retries that are necessary if the operation fails.
77+
78+
By default, each row in the result set will be a
79+
`namedtuple <http://docs.python.org/2/library/collections.html#collections.namedtuple>`_.
80+
Each row will have a matching attribute for each column defined in the schema,
81+
such as ``name``, ``age``, and so on. You can also treat them as normal tuples
82+
by unpacking them or accessing fields by position:
83+
84+
.. code-block:: python
85+
86+
    rows = session.execute('SELECT name, age, email FROM users')
    for (name, age, email) in rows:
        print name, age, email
89+
90+
.. code-block:: python
91+
92+
    rows = session.execute('SELECT name, age, email FROM users')
    names = [row[0] for row in rows]
    ages = [row[1] for row in rows]
    emails = [row[2] for row in rows]
96+
97+
If you prefer another result format, such as a ``dict`` per row, you
98+
can change the :attr:`~.Session.row_factory` attribute.
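
The access styles described above can be tried without a running cluster. The following is a minimal sketch that stands in for a result row with a plain ``collections.namedtuple``; the ``UserRow`` type and its values are made up for illustration and are not produced by the driver:

```python
from collections import namedtuple

# Hypothetical stand-in for one row of "SELECT name, age, email FROM users".
UserRow = namedtuple('UserRow', ['name', 'age', 'email'])
row = UserRow(name='alice', age=30, email='alice@example.com')

# Access by attribute, by position, or by unpacking.
by_attribute = row.name
by_position = row[1]
name, age, email = row

# A dict-per-row view, similar in spirit to what a dict-based row factory gives you.
as_dict = dict(row._asdict())
```

Which style to use is mostly a matter of readability: attribute access documents itself, while unpacking is compact when every column is needed.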
99+
100+
Passing Parameters to CQL Queries
101+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
102+
When executing non-prepared statements, the driver supports two forms of
103+
parameter place-holders: positional and named.
104+
105+
Positional parameters are used with a ``%s`` placeholder. For example,
106+
when you execute:
107+
108+
.. code-block:: python
109+
110+
    import uuid

    session.execute(
        """
        INSERT INTO users (name, credits, user_id)
        VALUES (%s, %s, %s)
        """,
        ("John O'Reilly", 42, uuid.uuid1())
    )
117+
118+
It is translated to the following CQL query:
119+
120+
.. code-block:: SQL
121+
122+
    INSERT INTO users (name, credits, user_id)
    VALUES ('John O''Reilly', 42, 2644bada-852c-11e3-89fb-e0b9a54a6d93)
124+
125+
Note that you should use ``%s`` for all types of arguments, not just strings.
126+
For example, this would be **wrong**:
127+
128+
.. code-block:: python
129+
130+
    session.execute("INSERT INTO USERS (name, age) VALUES (%s, %d)", ("bob", 42))  # wrong
131+
132+
Instead, use ``%s`` for the age placeholder.
133+
134+
If you need to use a literal ``%`` character, use ``%%``.
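
One way to see why ``%s`` is the only placeholder you need: assuming the driver first renders every parameter to CQL literal text and then substitutes that text into the query with Python's ``%`` string formatting (an assumption about the mechanism, not something stated above), every value is already a string by the time it is substituted. The sketch below uses hand-written literals in place of the driver's encoding, and the ``notes`` table is hypothetical:

```python
# Parameters already rendered as CQL literal text (written by hand here).
encoded = ("'bob'", "42")

rendered = "INSERT INTO users (name, age) VALUES (%s, %s)" % encoded

# %d expects a number, but the encoded value is text, so formatting fails.
try:
    "INSERT INTO users (name, age) VALUES (%s, %d)" % encoded
    percent_d_failed = False
except TypeError:
    percent_d_failed = True

# A doubled %% survives formatting as a single literal % character.
# The "notes" table is made up for this example.
with_percent = "SELECT * FROM notes WHERE body = %s AND tag = '100%%'" % ("'x'",)
```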
135+
136+
**Note**: you must always use a sequence for the second argument, even if you are
137+
only passing in a single variable:
138+
139+
.. code-block:: python
140+
141+
    session.execute("INSERT INTO foo (bar) VALUES (%s)", "blah")  # wrong
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah"))  # wrong
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah", ))  # right
    session.execute("INSERT INTO foo (bar) VALUES (%s)", ["blah"])  # right
145+
146+
147+
Note that the second line is incorrect because in Python, single-element tuples
148+
require a comma.
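
The difference is easy to check in plain Python, with no driver involved. A short illustration:

```python
not_a_tuple = ("blah")   # parentheses alone do nothing: this is still a str
one_tuple = ("blah",)    # the trailing comma is what makes it a tuple
one_list = ["blah"]      # a one-element list is also a valid sequence

kinds = [type(v).__name__ for v in (not_a_tuple, one_tuple, one_list)]
```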
149+
150+
Named place-holders use the ``%(name)s`` form:
151+
152+
.. code-block:: python
153+
154+
    session.execute(
        """
        INSERT INTO users (name, credits, user_id, username)
        VALUES (%(name)s, %(credits)s, %(user_id)s, %(name)s)
        """,
        {'name': "John O'Reilly", 'credits': 42, 'user_id': uuid.uuid1()}
    )
161+
162+
Note that you can repeat placeholders with the same name, such as ``%(name)s``
163+
in the above example.
164+
165+
Only data values should be supplied this way. Other items, such as keyspaces,
166+
table names, and column names should be set ahead of time (typically using
167+
normal string formatting).
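
As a rough sketch of setting a table name ahead of time, the example below fills in the identifier with ``str.format`` and leaves the ``%s`` data placeholder in place. ``ALLOWED_TABLES`` and ``build_select`` are hypothetical names introduced for this example. The whitelist check is a design choice, not a driver requirement: identifiers are not escaped the way data values are, so it is safer not to accept them from untrusted input.

```python
# Hypothetical whitelist of table names the application is willing to query.
ALLOWED_TABLES = {'users', 'users_by_email'}

def build_select(table):
    """Return a query with the table name filled in and a %s data placeholder left intact."""
    if table not in ALLOWED_TABLES:
        raise ValueError("unexpected table name: %r" % (table,))
    # str.format does not touch the %s placeholder, so it is still there
    # for the data value to be supplied at execution time.
    return "SELECT * FROM {0} WHERE user_id=%s".format(table)

query = build_select('users')
```

The resulting string can then be passed to ``session.execute()`` together with a sequence of data values, as in the earlier examples.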
168+
169+
Type Conversions
170+
^^^^^^^^^^^^^^^^
171+
For non-prepared statements, Python types are cast to CQL literals in the
172+
following way:
173+
174+
.. table::

    +--------------------+-------------------------+
    | Python Type        | CQL Literal Type        |
    +====================+=========================+
    | ``None``           | ``NULL``                |
    +--------------------+-------------------------+
    | ``bool``           | ``boolean``             |
    +--------------------+-------------------------+
    | ``float``          | | ``float``             |
    |                    | | ``double``            |
    +--------------------+-------------------------+
    | | ``int``          | | ``int``               |
    | | ``long``         | | ``bigint``            |
    |                    | | ``varint``            |
    |                    | | ``counter``           |
    +--------------------+-------------------------+
    | ``decimal.Decimal``| ``decimal``             |
    +--------------------+-------------------------+
    | | ``str``          | | ``ascii``             |
    | | ``unicode``      | | ``varchar``           |
    |                    | | ``text``              |
    +--------------------+-------------------------+
    | | ``buffer``       | ``blob``                |
    | | ``bytearray``    |                         |
    +--------------------+-------------------------+
    | | ``date``         | ``timestamp``           |
    | | ``datetime``     |                         |
    +--------------------+-------------------------+
    | | ``list``         | ``list``                |
    | | ``tuple``        |                         |
    | | generator        |                         |
    +--------------------+-------------------------+
    | | ``set``          | ``set``                 |
    | | ``frozenset``    |                         |
    +--------------------+-------------------------+
    | | ``dict``         | ``map``                 |
    | | ``OrderedDict``  |                         |
    +--------------------+-------------------------+
    | ``uuid.UUID``      | | ``timeuuid``          |
    |                    | | ``uuid``              |
    +--------------------+-------------------------+
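
To make the table concrete, here is a deliberately simplified sketch of how a few Python values could be rendered as CQL literals. ``to_cql_literal`` is a hypothetical function written for this example; the driver's real encoder covers more types and edge cases than this:

```python
def to_cql_literal(value):
    """Render a small subset of Python values as CQL literal text."""
    if value is None:
        return 'NULL'
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return 'true' if value else 'false'
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Single quotes inside text are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, (list, tuple)):
        return '[' + ', '.join(to_cql_literal(v) for v in value) + ']'
    if isinstance(value, dict):
        items = (to_cql_literal(k) + ': ' + to_cql_literal(v) for k, v in value.items())
        return '{' + ', '.join(items) + '}'
    raise TypeError("unsupported type: %s" % type(value).__name__)

literals = [to_cql_literal(v) for v in (None, True, 42, "John O'Reilly", [1, 2], {'a': 1})]
```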
216+
217+
218+
Asynchronous Queries
219+
^^^^^^^^^^^^^^^^^^^^
220+
The driver supports asynchronous query execution through
221+
:meth:`~.Session.execute_async()`. Instead of waiting for the query to
222+
complete and returning rows directly, this method almost immediately
223+
returns a :class:`~.ResponseFuture` object. There are two ways of
224+
getting the final result from this object.
225+
226+
The first is by calling :meth:`~.ResponseFuture.result()` on it. If
227+
the query has not yet completed, this will block until it has and
228+
then return the result or raise an Exception if an error occurred.
229+
For example:
230+
231+
.. code-block:: python
232+
233+
    from cassandra import ReadTimeout

    query = "SELECT * FROM users WHERE user_id=%s"
    future = session.execute_async(query, [user_id])

    # ... do some other work

    try:
        rows = future.result()
        user = rows[0]
        print user.name, user.age
    except ReadTimeout:
        log.exception("Query timed out:")
246+
247+
This works well for executing many queries concurrently:
248+
249+
.. code-block:: python
250+
251+
    # build a list of futures
    futures = []
    query = "SELECT * FROM users WHERE user_id=%s"
    for user_id in ids_to_fetch:
        futures.append(session.execute_async(query, [user_id]))

    # wait for them to complete and use the results
    for future in futures:
        rows = future.result()
        print rows[0].name
261+
262+
Alternatively, instead of calling :meth:`~.ResponseFuture.result()`,
263+
you can attach callback and errback functions through the
264+
:meth:`~.ResponseFuture.add_callback()`,
265+
:meth:`~.ResponseFuture.add_errback()`, and
266+
:meth:`~.ResponseFuture.add_callbacks()` methods. If you have used
267+
Twisted Python before, this is designed to be a lightweight version of
268+
that:
269+
270+
.. code-block:: python
271+
272+
    def handle_success(rows):
        user = rows[0]
        try:
            process_user(user.name, user.age, user.id)
        except Exception:
            log.error("Failed to process user %s", user.id)
            # don't re-raise errors in the callback

    def handle_error(exception):
        log.error("Failed to fetch user info: %s", exception)


    future = session.execute_async(query)
    future.add_callbacks(handle_success, handle_error)
286+
287+
There are a few important things to remember when working with callbacks:

* **Exceptions that are raised inside the callback functions will be logged and then ignored.**
* Your callback will be run on the event loop thread, so any long-running
  operations will prevent other requests from being handled.
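
Because exceptions raised in a callback are only logged, one option is to catch and log them yourself with enough context to debug. The following is a minimal sketch; ``logged_callback`` is a hypothetical helper, not part of the driver, and it is exercised here by calling the wrapped function directly rather than through a :class:`~.ResponseFuture`:

```python
import functools
import logging

log = logging.getLogger(__name__)

def logged_callback(func):
    """Wrap a callback so any exception is logged with a traceback and not re-raised."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            log.exception("Callback %s failed", func.__name__)
            return None
    return wrapper

@logged_callback
def handle_success(rows):
    # Raises IndexError when the query returned no rows.
    return rows[0]

first = handle_success(['alice'])   # normal path: returns the first row
missing = handle_success([])        # error path: logged, returns None
```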
291+
292+
293+
Setting a Consistency Level
294+
^^^^^^^^^^^^^^^^^^^^^^^^^^^
295+
The consistency level used for a query determines how many of the
296+
replicas of the data you are interacting with need to respond for
297+
the query to be considered a success.
298+
299+
By default, :attr:`.ConsistencyLevel.ONE` will be used for all queries. To
300+
specify a different consistency level, you will need to wrap your queries
301+
in a :class:`~.SimpleStatement`:
302+
303+
.. code-block:: python
304+
305+
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    query = SimpleStatement(
        "INSERT INTO users (name, age) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(query, ('John', 42))
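
As a worked example of "how many replicas", Cassandra treats a quorum as a majority of the replicas, that is ``floor(replication_factor / 2) + 1``. The ``quorum`` helper below is purely illustrative and is not part of the driver:

```python
def quorum(replication_factor):
    """Number of replicas that must respond at QUORUM for a given replication factor."""
    return replication_factor // 2 + 1

# Replicas required at QUORUM for a few common replication factors.
required = {rf: quorum(rf) for rf in (1, 2, 3, 5)}
```

With a replication factor of 3, for instance, a ``QUORUM`` query needs two replicas to respond, so it can tolerate one replica being unavailable.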
312+
313+
Prepared Statements
314+
-------------------
315+
Prepared statements are queries that are parsed by Cassandra and then saved
316+
for later use. When the driver uses a prepared statement, it only needs to
317+
send the values of parameters to bind. This lowers network traffic
318+
and CPU utilization within Cassandra because Cassandra does not have to
319+
re-parse the query each time.
320+
321+
To prepare a query, use :meth:`.Session.prepare()`:
322+
323+
.. code-block:: python
324+
325+
    user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")

    users = []
    for user_id in user_ids_to_query:
        user = session.execute(user_lookup_stmt, [user_id])
        users.append(user)
331+
332+
:meth:`~.Session.prepare()` returns a :class:`~.PreparedStatement` instance
333+
which can be used in place of :class:`~.SimpleStatement` instances or literal
334+
string queries. It is automatically prepared against all nodes, and the driver
335+
handles re-preparing against new nodes and restarted nodes when necessary.
336+
337+
Note that the placeholders for prepared statements are ``?`` characters. This
338+
is different from the ``%s`` style used for simple, non-prepared statements (although future versions
339+
of the driver may use the same placeholders for both). Cassandra 2.0 added
340+
support for named placeholders; the 1.0 version of the driver does not support
341+
them, but the 2.0 version will.
