Cloudsearch DG


Amazon CloudSearch

Developer Guide
API Version 2011-02-01

Amazon CloudSearch Developer Guide

Amazon CloudSearch: Developer Guide


Copyright 2012 Amazon Web Services LLC or its affiliates. All rights reserved. The following are trademarks or registered trademarks of Amazon: Amazon, Amazon.com, Amazon.com Design, Amazon DevPay, Amazon EC2, Amazon Web Services Design, AWS, CloudFront, EC2, Elastic Compute Cloud, Kindle, and Mechanical Turk. In addition, Amazon.com graphics, logos, page headers, button icons, scripts, and service names are trademarks, or trade dress of Amazon in the U.S. and/or other countries. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


What Is Amazon CloudSearch? (p. 1)
Search Data Format (p. 2)
Search Domain Configuration (p. 3)
Search Requests (p. 5)
Getting Started (p. 6)
Step 1: Before You Begin (p. 6)
Step 2: Create a Search Domain (p. 7)
Step 3: Send Data for Indexing (p. 11)
Step 4: Search Your Amazon CloudSearch Domain (p. 14)
Step 5: Delete Your Amazon CloudSearch Movies Domain (p. 19)
Making API Requests (p. 21)
Endpoints (p. 21)
Making Configuration Requests (p. 22)
  Request Authentication (p. 23)
Making Document Service Requests (p. 24)
Making Search Requests (p. 25)
Creating a Search Domain (p. 27)
Configuring Access for a Search Domain (p. 32)
Getting Domain Information (p. 37)
Deleting a Domain (p. 43)
Preparing Your Data (p. 46)
Mapping Document Data to Index Fields (p. 46)
Creating SDF Batches (p. 47)
  Document Versions (p. 49)
  Adding and Updating Documents (p. 49)
  Deleting Documents (p. 50)
  Generating SDF (p. 51)
Configuring Index Fields (p. 53)
Adding Sources for a Field (p. 54)
Command Line Tools (p. 55)
AWS Management Console (p. 56)
API (p. 62)
Configuring Text Options (p. 63)
Configuring Stemming (p. 63)
Configuring Stopwords (p. 66)
Configuring Synonyms (p. 69)
Uploading Data (p. 72)
Indexing Document Data (p. 77)
Searching Your Data (p. 80)
Submitting Search Requests (p. 81)
Searching Text Fields (p. 82)
  Using Boolean Operators in Text Searches (p. 84)
  Using Wildcards in Text Searches (p. 85)
  Searching for Phrases in Text Fields (p. 86)
Searching Literal Fields (p. 86)
Searching Uint Fields (p. 87)
Constructing Boolean Search Queries (p. 88)
Controlling Search Results (p. 89)
  Getting Results as XML (p. 89)
  Paginating Results (p. 90)
  Retrieving Data from Index Fields (p. 90)
  Sorting Results (p. 91)
Getting and Using Facet Information (p. 91)
  Getting Facet Information for Text and Literal Fields (p. 92)
  Getting Facet Information for Uint Fields (p. 92)
  Getting Facet Information for Particular Values (p. 93)
  Sorting Facet Information (p. 94)
  Using Facet Information (p. 95)
Customizing Result Ranking (p. 98)
Configuring Rank Expressions (p. 98)
Ranking Search Results (p. 102)
Constraining Search Results (p. 102)
Command Line Tool Reference (p. 103)
Using the Command Line Tools (p. 103)
  Prerequisites (p. 104)
  Installing the Command Line Tools (p. 104)
  Running the Amazon CloudSearch Commands (p. 106)
cs-configure-access-policies (p. 106)
cs-configure-fields (p. 109)
cs-configure-ranking (p. 111)
cs-configure-text-options (p. 113)
cs-create-domain (p. 115)
cs-configure-from-sdf (p. 116)
cs-delete-domain (p. 117)
cs-describe-domain (p. 118)
cs-index-documents (p. 120)
cs-post-sdf (p. 121)
Experimental Tools (p. 122)
  cs-generate-sdf (p. 122)
Configuration API Reference (p. 126)
Actions (p. 126)
  CreateDomain (p. 128)
  DefineIndexField (p. 129)
  DefineRankExpression (p. 131)
  DeleteDomain (p. 132)
  DeleteIndexField (p. 133)
  DeleteRankExpression (p. 134)
  DescribeDefaultSearchField (p. 135)
  DescribeDomains (p. 136)
  DescribeIndexFields (p. 137)
  DescribeRankExpressions (p. 138)
  DescribeServiceAccessPolicies (p. 139)
  DescribeStemmingOptions (p. 140)
  DescribeStopwordOptions (p. 141)
  DescribeSynonymOptions (p. 142)
  IndexDocuments (p. 143)
  UpdateDefaultSearchField (p. 144)
  UpdateServiceAccessPolicies (p. 146)
  UpdateStemmingOptions (p. 148)
  UpdateStopwordOptions (p. 150)
  UpdateSynonymOptions (p. 152)
Data Types (p. 153)
  AccessPoliciesStatus (p. 154)
  CreateDomainResult (p. 155)
  DefaultSearchFieldStatus (p. 155)
  DefineIndexFieldResult (p. 155)
  DefineRankExpressionResult (p. 156)
  DeleteDomainResult (p. 156)
  DeleteIndexFieldResult (p. 156)
  DeleteRankExpressionResult (p. 156)
  DescribeDefaultSearchFieldResult (p. 157)
  DescribeDomainsResult (p. 157)
  DescribeIndexFieldsResult (p. 157)
  DescribeRankExpressionsResult (p. 158)
  DescribeServiceAccessPoliciesResult (p. 158)
  DescribeStemmingOptionsResult (p. 158)
  DescribeStopwordOptionsResult (p. 158)
  DescribeSynonymOptionsResult (p. 159)
  DomainStatus (p. 159)
  IndexDocumentsResult (p. 160)
  IndexField (p. 161)
  IndexFieldStatus (p. 162)
  LiteralOptions (p. 162)
  NamedRankExpression (p. 162)
  OptionStatus (p. 164)
  RankExpressionStatus (p. 164)
  ServiceEndpoint (p. 165)
  SourceAttribute (p. 165)
  SourceData (p. 166)
  SourceDataMap (p. 166)
  SourceDataTrimTitle (p. 166)
  StemmingOptionsStatus (p. 167)
  StopwordOptionsStatus (p. 167)
  SynonymOptionsStatus (p. 168)
  TextOptions (p. 168)
  UIntOptions (p. 169)
  UpdateDefaultSearchFieldResult (p. 169)
  UpdateServiceAccessPoliciesResult (p. 169)
  UpdateStemmingOptionsResult (p. 170)
  UpdateStopwordOptionsResult (p. 170)
  UpdateSynonymOptionsResult (p. 170)
Common Query Parameters (p. 171)
Common Errors (p. 172)
Document Service API Reference (p. 174)
documents/batch (p. 174)
  documents/batch JSON API (p. 175)
  documents/batch XML API (p. 178)
Search API Reference (p. 184)
search (p. 184)
  Search Requests (p. 185)
  Search Response (p. 190)
  Search Status Codes (p. 194)
Troubleshooting (p. 196)
Limits (p. 198)
Articles and Tutorials (p. 200)
Amazon CloudSearch Glossary (p. 201)
Document History (p. 205)


What Is Amazon CloudSearch?


Topics
• Search Data Format in Amazon CloudSearch (p. 2)
• Search Domain Configuration in Amazon CloudSearch (p. 3)
• Search Requests in Amazon CloudSearch (p. 5)

Amazon CloudSearch is a fully-managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website. Amazon CloudSearch enables you to search large collections of data such as web pages, document files, forum posts, or product information. With Amazon CloudSearch, you can quickly add search capabilities to your website without having to become a search expert or worry about hardware provisioning, setup, and maintenance. As your volume of data and traffic fluctuates, Amazon CloudSearch automatically scales to meet your needs.

You can use Amazon CloudSearch to index and search both structured data and plain text. Amazon CloudSearch supports full text search, searching within fields, prefix searches, Boolean searches, and faceting. You can get search results in JSON or XML, sort and filter results based on field values, and rank results alphabetically, numerically, or according to custom rank expressions.

To build a search solution with Amazon CloudSearch, you:
• Create and configure a search domain. A search domain encapsulates your searchable data and the search instances that handle your search requests. You set up a separate domain for each different data set you want to search.
• Upload the data you want to search to your domain. Amazon CloudSearch automatically indexes your data and deploys the search index to one or more search instances.
• Search your domain. You send search requests to your domain's search endpoint as an HTTP/HTTPS GET request.

The rest of this section introduces the key concepts and terms that will help you understand what you need to do to build a search solution with Amazon CloudSearch:
• Search Data Format in Amazon CloudSearch (p. 2) describes the kinds of data you can search and the format used to submit data to Amazon CloudSearch for indexing.
• Search Domain Configuration in Amazon CloudSearch (p. 3) describes the options you can configure to control how your data is indexed and what information you can retrieve when you search.
• Search Requests in Amazon CloudSearch (p. 5) describes how search requests are submitted and processed.
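As a rough illustration of the last step above (sending a search as an HTTP GET request), the following Python sketch builds a search URL. The endpoint host name is a made-up placeholder, and the /2011-02-01/search path and the q and return-fields parameter names are assumptions here; check them against the Search API Reference (p. 184) before relying on them. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_search_url(search_endpoint, query, return_fields=None,
                     api_version="2011-02-01"):
    """Build a GET URL for a search request.

    The path layout and parameter names are assumptions for illustration,
    not taken from this section of the guide.
    """
    params = {"q": query}
    if return_fields:
        # Assumed: a comma-separated list of index fields to return.
        params["return-fields"] = ",".join(return_fields)
    return f"http://{search_endpoint}/{api_version}/search?{urlencode(params)}"


# Hypothetical endpoint; a real one is assigned when the domain is created.
url = build_search_url(
    "search-movies-example.us-east-1.cloudsearch.amazonaws.com",
    "star wars",
    return_fields=["title"],
)
print(url)
```

The resulting URL could then be fetched with any HTTP client; the response format (JSON or XML) is covered later in the guide.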


For a high-level overview of Amazon CloudSearch, service highlights, and pricing information, see the Amazon CloudSearch detail page.

The rest of this guide describes how to use Amazon CloudSearch and provides detailed information about the APIs and command line tools. If you are new to Amazon CloudSearch, you should begin with Getting Started with Amazon CloudSearch (p. 6). For more information about working with your own data sets, see Preparing Your Data for Amazon CloudSearch (p. 46). For more information about constructing searches with the Amazon CloudSearch query language, see Searching Your Data with Amazon CloudSearch (p. 80).

The following table lets you jump directly to specific task or reference topics.

How Do I? / Relevant Sections

Set up my first search domain
    Getting Started with Amazon CloudSearch (p. 6)

Manage my search domains
    Creating an Amazon CloudSearch Domain (p. 27)
    Configuring Access for an Amazon CloudSearch Domain (p. 32)
    Getting Information about an Amazon CloudSearch Domain (p. 37)
    Deleting an Amazon CloudSearch Domain (p. 43)

Configure index and search options for my domains
    Configuring Index Fields for an Amazon CloudSearch Domain (p. 53)
    Configuring Text Options for an Amazon CloudSearch Domain (p. 63)
    Customizing Result Ranking with Amazon CloudSearch (p. 98)

Format my data for Amazon CloudSearch
    Preparing Your Data for Amazon CloudSearch (p. 46)

Upload and index my data
    Uploading Data to an Amazon CloudSearch Domain (p. 72)
    Indexing Document Data with Amazon CloudSearch (p. 77)

Search my domains
    Searching Your Data with Amazon CloudSearch (p. 80)

Install and use the Amazon CloudSearch command line tools
    Amazon CloudSearch Command Line Tool Reference (p. 103)

Get more information about the Amazon CloudSearch APIs
    Making Amazon CloudSearch API Requests (p. 21)
    Amazon CloudSearch Configuration API Reference (p. 126)
    Amazon CloudSearch Document Service API Reference (p. 174)
    Amazon CloudSearch Search API Reference (p. 184)
    Limits in Amazon CloudSearch (p. 198)

Search Data Format in Amazon CloudSearch


The collection of data that you want to search (sometimes referred to as your corpus) can consist of unstructured full-text documents, semi-structured documents such as those formatted in mark-up languages like XML, or structured data that conforms to a strict data model. To make your data searchable, you describe it using the Search Data Format (SDF) and upload the resulting SDF data to your search domain. Each item that you want to be able to return as a search result (such as a forum post or web page) is represented as a document in SDF. Every document has a unique id (docid), a version number, and one or more fields that contain the data that you want to search and return in results. An SDF batch is a collection of add and delete requests for individual documents. SDF batches must be valid JSON or XML and conform to the SDF data conventions.


Amazon CloudSearch generates a search index from your SDF data according to your domain's configuration options. As your data changes, you submit SDF updates to add, change, or delete documents from your index. Updates are applied continuously, so your changes become searchable in near real-time. For information about how to represent your data in SDF, see Preparing Your Data for Amazon CloudSearch (p. 46). To see the JSON schema for SDF, go to JSON documents/batch Requests (p. 175). To see the XML schema for SDF, go to XML documents/batch Requests (p. 178).
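To make the shape of an SDF batch more concrete, here is a minimal Python sketch that assembles a JSON batch containing one add request and one delete request. The key names (type, id, version, lang, fields), the document ids, and the field names are assumptions chosen for illustration; the authoritative schema is the one referenced in JSON documents/batch Requests (p. 175).

```python
import json


def add_op(doc_id, version, fields, lang="en"):
    """An add request: document id, version number, and field data.

    Key names are assumed for illustration; see the documents/batch schema.
    """
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def delete_op(doc_id, version):
    """A delete request: only the id and a version number are needed."""
    return {"type": "delete", "id": doc_id, "version": version}


# A batch is a collection of add and delete requests for individual documents.
batch = [
    add_op("post_1001", 1, {"title": "Welcome to the forum",
                            "body": "Introduce yourself here."}),
    delete_op("post_0042", 2),
]

# Serialize to JSON; the result is what would be uploaded to the domain.
sdf_json = json.dumps(batch, indent=2)
print(sdf_json)
```

Keeping the add and delete helpers separate makes it harder to accidentally send field data with a delete request.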

Search Domain Configuration in Amazon CloudSearch


To build an index from your SDF data, Amazon CloudSearch needs to know what data you want to search, what data you want to be able to include in the search results, what data you want to use as facets, and whether any custom stopwords, synonyms, and stems need to be defined for your data set. You define this metadata in your domain configuration by configuring indexing and text options.

In your domain configuration, you also specify access policies to control who can send data updates and search your domain, and rank expressions to customize how search results are ranked.

Indexing Options
A domain's indexing options configure the index fields that will be included in the search index. An index field represents a named field and value pair that you want to store in your index. You configure an index field for each SDF document field that will be searched, used as a facet, or returned in search results.

Index Fields
Every index field has a unique name and a source that specifies one or more SDF document fields. The sources are used to populate the index field. If no source is specified, the source defaults to the SDF document field that has the same name as the index field. An index field definition also includes meta-information such as:
• The index field type.
• Whether a literal field is searchable. (Text and uint fields are always searchable.)
• Whether the value of a text or literal field can be returned in results. (Uint fields are always returnable.)
• Whether facet counts can be calculated for a text or literal field. (Facet counts can always be calculated for uint fields.)

Amazon CloudSearch supports three types of index fields:
• text: contains arbitrary alphanumeric data. For example, a text field might contain a name, description, or the entire body of a document. Text fields are always searchable and Amazon CloudSearch performs text processing on them according to the stopwords, synonyms, and stems you configure in your domain's text options.
• literal: contains an identifier or other data that you want to be able to match exactly. Unlike text fields, Amazon CloudSearch does not perform any text processing on literal fields. Literal fields can be used for fields that have a small set of possible values, as well as for more arbitrary values like email addresses or titles where an exact match is important. Literal fields are frequently used to enable faceted searches where you want to count the number of exact matches for a particular value.
• uint: contains an unsigned integer value. For example, you might use a uint field for a field that contains a quantity or numerical rating, or for a date field that contains a time_t value.

For information about how to configure index fields for Amazon CloudSearch, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

Facets
A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a facet. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)

A facet can be any numeric field or a text or literal field that has faceting enabled in your domain configuration. To request facet information in your search request, you specify:
• One or more facets
• Facet constraints that specify the particular values you want to count (optional)
• How you want the facet values to be sorted in the results (optional)

For each facet, Amazon CloudSearch calculates the number of hits that share the same value. If you specify constraints, the facet counts are calculated only for values that match the constraints. Only constraints that have matches are included in the facet results.

Note
Values from a facet-enabled text or literal field cannot be returned in the search results. Text and literal fields can be facet-enabled or result-enabled, but not both. If you want to return the value from an SDF document field as well as use the field as a facet, create two index fields that use the same SDF document field as a source and make one result-enabled, and the other facet-enabled.

For information about configuring facets, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53). For information about using facet information to support faceted navigation, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).
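The index field rules described so far can be summarized in a small validation sketch. The IndexField class and its attribute names are hypothetical and are not part of the Amazon CloudSearch API; the sketch only illustrates which options are fixed and which are configurable for each field type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexField:
    """Hypothetical model of an index field (not a CloudSearch API type)."""
    name: str
    field_type: str                 # "text", "literal", or "uint"
    source: Optional[str] = None    # SDF document field; defaults to the field name
    search_enabled: bool = False    # configurable only for literal fields
    facet_enabled: bool = False
    result_enabled: bool = False

    def __post_init__(self):
        if self.field_type not in ("text", "literal", "uint"):
            raise ValueError(f"unknown field type: {self.field_type}")
        if self.source is None:
            # With no source specified, use the SDF field with the same name.
            self.source = self.name
        if self.field_type == "uint":
            # uint fields are always searchable, facetable, and returnable.
            self.search_enabled = True
            self.facet_enabled = True
            self.result_enabled = True
        elif self.field_type == "text":
            # Text fields are always searchable.
            self.search_enabled = True
        if (self.field_type != "uint"
                and self.facet_enabled and self.result_enabled):
            raise ValueError(
                "text and literal fields can be facet-enabled or "
                "result-enabled, but not both")
```

To both return and facet on the same SDF field, you would define two such fields that share a source, for example IndexField("genre", "literal", facet_enabled=True) and IndexField("genre_display", "literal", source="genre", result_enabled=True), where genre_display is an illustrative name.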

Text Options
During indexing, Amazon CloudSearch performs a number of text-processing steps on text fields. First, Amazon CloudSearch tokenizes the field values, stripping punctuation and splitting the text into individual terms that are indexed separately. For example, the string "spider-man" would be split into two terms, spider and man. Text fields are then processed using the domain-specific stopword, stemming, and synonym dictionaries:
• Stopwords configured for the domain are excluded from the index. For example, the stopwords dictionary generally contains insignificant, frequently occurring terms such as "a", "and", and "the" that would result in a massive number of matches if they were included in the index.
• Related words are mapped to a common stem according to the stemming dictionary configured for the domain. For example, the stemming dictionary might map "running" and "ran" to the stem "run".
• Synonyms are mapped according to the synonym dictionary configured for the domain. For example, the synonym dictionary might define "colt" and "filly" as synonyms for "horse".

Amazon CloudSearch defines a default stopword dictionary that you can fine-tune for your application. Stemming and synonym dictionaries are application-specific and are empty by default.

For information about how to configure stopwords, stems, and synonyms for your domain, see Configuring Text Options for an Amazon CloudSearch Domain (p. 63).
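The processing steps described in this section can be illustrated with a short sketch. This is not how Amazon CloudSearch is implemented: the dictionaries contain only the example entries from this section, lowercasing the input is an assumption, and mapping each synonym to a single term is a simplification.

```python
import re

# Example dictionary entries taken from the text above.
STOPWORDS = {"a", "and", "the"}
STEMS = {"running": "run", "ran": "run"}
SYNONYMS = {"colt": "horse", "filly": "horse"}


def process_text(value):
    """Tokenize a field value, then apply stopwords, stems, and synonyms."""
    # Tokenize: lowercase (an assumption), then split on any run of
    # characters that are not letters or digits, which strips punctuation.
    terms = [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]
    # Exclude stopwords from the index.
    terms = [t for t in terms if t not in STOPWORDS]
    # Map related words to a common stem.
    terms = [STEMS.get(t, t) for t in terms]
    # Map synonyms (simplified to a one-to-one replacement).
    return [SYNONYMS.get(t, t) for t in terms]
```

For example, process_text("spider-man") yields the two terms spider and man.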

Access Policies
Access to your search domain's endpoints is restricted by IP address so that only authorized hosts can submit documents and send search requests. IP address authorization is used only to control access to the document and search endpoints. All Amazon CloudSearch configuration requests must be authenticated using standard AWS authentication. Amazon CloudSearch access policies are specified using the AWS Identity and Access Management (IAM) Access Policy Language. For information about how to configure access policies for your domain, see Configuring Access for an Amazon CloudSearch Domain (p. 32).

Rank Expressions
You can customize how search results are ranked by defining your own rank expressions. Rank expressions are numeric expressions that can be used at search time to calculate a score for every document that matches the search. A rank expression uses standard numeric operators and functions and can reference uint fields, other rank expressions, and a document's text_relevance score. When you submit search requests, you specify the rank expression(s) you want to use to rank or constrain the search results.

A document's text_relevance score indicates how relevant a particular search hit is to the search request. To calculate the relevance score, Amazon CloudSearch takes into account how many times the search terms appear (term frequency) and how close the search terms are to each other (proximity).

For information about how to configure rank expressions for your domain, see Customizing Result Ranking with Amazon CloudSearch (p. 98).

Search Requests in Amazon CloudSearch


You submit search requests to your domain's search endpoint as HTTP/HTTPS GET requests. You can perform free text and Boolean searches, and specify a variety of options to constrain your search, request facet information, control ranking, and specify what you want to be returned in the results. You can get search results in either JSON or XML. By default, Amazon CloudSearch returns results in JSON.

When you submit a search request, Amazon CloudSearch performs text processing on the search terms. The search terms are tokenized, stopwords are removed, and stems are mapped according to the domain configuration. Once this preprocessing is complete, Amazon CloudSearch looks up the search terms in the index and identifies all of the documents that match the request. To generate a response, Amazon CloudSearch processes this list of search hits to filter and rank the matching documents and compute facets. Amazon CloudSearch then returns the response in JSON or XML.

By default, Amazon CloudSearch returns search results ranked according to the hits' text_relevance scores. Alternatively, your request can specify the index field or rank expression that you want to use to sort the hits. For example, you might want to rank hits by an index field that contains the price or a rank expression that calculates popularity.

For more information about searching, ranking, and paginating results, see Searching Your Data with Amazon CloudSearch (p. 80).

Getting Started with Amazon CloudSearch


Topics
• Step 1: Before You Begin with Amazon CloudSearch (p. 6)
• Step 2: Create an Amazon CloudSearch Domain (p. 7)
• Step 3: Send Data to Amazon CloudSearch for Indexing (p. 11)
• Step 4: Search Your Amazon CloudSearch Domain (p. 14)
• Step 5: Delete Your Amazon CloudSearch Movies Domain (p. 19)

To start searching your data with Amazon CloudSearch, you simply:
• Create and configure a search domain
• Upload and index the data you want to search
• Send search requests to your domain

This tutorial shows you how to get up and running using the AWS Management Console for Amazon CloudSearch. To make it even easier to get started, we've generated a sample data set of over 5,000 popular movie titles that you can download and examine, upload to your own search domain, and submit search queries against to see how Amazon CloudSearch works. Using the AWS Management Console and the sample movie data, you'll quickly have your own searchable movie database running in Amazon CloudSearch.

The following video steps through this tutorial and shows how to create your first search domain through the console: Getting Started with Amazon CloudSearch.

Step 1: Before You Begin with Amazon CloudSearch


To use Amazon CloudSearch, you need an AWS account. Your AWS account enables you to access Amazon CloudSearch and other AWS services, such as Amazon S3 and EC2. As with other AWS services, you pay only for the Amazon CloudSearch resources you use. There are no sign-up fees and charges are not incurred until you create a search domain. If you already have an AWS account, you are automatically signed up for Amazon CloudSearch.

To create an AWS account


1. Go to https://aws.amazon.com and click Sign Up Now.
2. Follow the instructions to sign up. You will need to enter payment information before you can begin using Amazon CloudSearch.

Step 2: Create an Amazon CloudSearch Domain


An Amazon CloudSearch domain encapsulates a collection of data you want to search, the search instances that process your search requests, and a configuration that controls how your data is indexed and searched. You create a separate search domain for each collection of data you want to make searchable. For each domain, you configure indexing options that describe the fields you want to include in your index and how you want to use them, text options that define domain-specific stopwords, stems, and synonyms, rank expressions that you can use to customize how search results are ranked, and access policies that control access to the domain's document and search endpoints.

You interact with a search domain to:
• Configure index and search options
• Submit data for indexing
• Perform searches

Each domain has a unique endpoint through which you submit search requests to the domain. For example, the endpoint for a domain called movies created in the US East region might be:
search.123456789012-movies.us-east-1.cloudsearch.amazonaws.com

When creating a search domain, you specify a unique name for the domain. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. The allowed characters are: a-z, 0-9, and hyphen (-). Currently, all domains are created in the AWS Region us-east-1.

To configure the new domain, you need to specify:
• The index fields you want to be able to search, use as facets, and return in search results.
• Access policies for the domain's document service and search service endpoints.

This tutorial shows you how to create and interact with a domain using the Amazon CloudSearch console. For information about how to use the command line tools and APIs, see Creating an Amazon CloudSearch Domain (p. 27).
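If you generate domain names programmatically, you can check them against the naming rules above before creating a domain. The following is a minimal sketch; the function name is illustrative, and the service remains the authority on whether a name is accepted.

```python
import re

# Starts with a lowercase letter or digit, followed by 2 to 27 more
# characters from a-z, 0-9, and hyphen, for 3 to 28 characters in total.
DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name):
    """Return True if name satisfies the documented naming rules."""
    return DOMAIN_NAME.fullmatch(name) is not None
```

With this check, "movies" and "imdb-movies" pass, while "Movies", "my_domain", "-movies", and "ab" are rejected.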

Important
The domain you're about to create will be live and you will incur the standard Amazon CloudSearch usage fees for the domain until you delete it. For more information about Amazon CloudSearch usage rates, go to the Amazon CloudSearch detail page.

To create your movies domain


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.

2. On the Welcome to Amazon CloudSearch page, click Create Your First Search Domain.

3. On the NAME YOUR DOMAIN step, enter a name for your new domain and click Continue. Domain names must start with a letter or number and be at least 3 and no more than 28 characters. Domain names can contain the following characters: a-z (lower case), 0-9, and - (hyphen). Upper case letters and underscores are not allowed.

4. On the CONFIGURE INDEX step, click Use a predefined configuration, select IMDB movies (demo), and click Continue. You can also automatically configure a search domain by choosing the predefined configuration for the type of data you want to index, or by uploading a sample of your data.

5. On the REVIEW INDEX CONFIGURATION step, review the index fields that will be configured. Five fields are configured automatically for the imdb-movie data: actor, director, genre, title, and year. The actor, director, and title fields are text fields and will be searched by default if no search field is specified in a search request. The contents of those fields can also be returned in search results. The genre field is configured as a literal field and is designated as a facet so it can be used to sort and filter the results. Because it's a facet, it cannot be returned in the search results. If you want to retrieve the contents of the genre field when you search, you can configure an additional field with the same source data and make it result-enabled. (For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).) The year field is configured as a uint field. You cannot change the configuration of a uint field; uint fields are always search-enabled, facet-enabled, and result-enabled. When you are finished reviewing the indexing options, click Continue.

6. On the SET UP ACCESS POLICIES step, click Recommended rules and click Continue. The recommended rules allow access to the search endpoint from all IP addresses, and restrict access to the document service to the IP address you specify.

Important
If you do not configure access rules for your search domain, you will only be able to interact with the domain through the Amazon CloudSearch console. By default, the document service and search service endpoints are configured to block all IP addresses. Keep in mind that if you do not have a static IP address, you must re-authorize your computer whenever your IP address changes. If your IP address is assigned dynamically, it is also likely that you're sharing that address with other computers on your network. This means that when you authorize the IP address, all computers that share it will be able to access your search domain's document service endpoint.

7. On the CONFIRM step, review the domain configuration and click Confirm to create your domain.

8. Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.

When you create a new domain, Amazon CloudSearch initializes resources for the domain, which can take around half an hour. During this initialization process, the status of the domain will be LOADING. You can begin uploading the data you want to search as soon as the domain status changes to PROCESSING. Once the status changes to ACTIVE, your domain will be fully functional and available to process search requests.

Note
While you can start uploading documents through the console once the domain status reaches the PROCESSING state, you won't be able to upload data through the command line tools or document service API until the domain status is ACTIVE.
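The status rules above amount to a small decision table. The sketch below restates them in code; the function names are illustrative and the status strings are the ones shown in the console.

```python
def can_upload(status, via_console):
    """Return True if documents can be uploaded in the given domain status.

    The console accepts uploads in PROCESSING or ACTIVE; the command line
    tools and document service API require ACTIVE.
    """
    if status == "ACTIVE":
        return True
    return status == "PROCESSING" and via_console


def can_search(status):
    """Return True if the domain can process search requests."""
    return status == "ACTIVE"
```

A script that uploads through the document service API could poll the domain status and wait until can_upload(status, via_console=False) is true before posting data.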

Step 3: Send Data to Amazon CloudSearch for Indexing


You upload the data you want to search to your domain so that Amazon CloudSearch can build and deploy a searchable index. The format used to submit documents to Amazon CloudSearch is called Search Data Format (SDF). The AWS Management Console can automatically generate SDF from several types of files:
• Comma Separated Value (.csv)
• Adobe Portable Document Format (.pdf)
• HTML (.htm, .html)
• Microsoft Excel (.xls, .xlsx)
• Microsoft PowerPoint (.ppt, .pptx)
• Microsoft Word (.doc, .docx)
• Text Documents (.txt)
• JSON Documents (.json)
• XML Documents (.xml)

For most file types, including JSON and XML, the contents of the file are treated as a single content field. However, CSV files are handled differently. When you upload CSV files that contain a header row, each column is treated as a field, and each row is treated as a separate document. If you upload multiple types of files, any CSV files are parsed row-by-row, and any non-CSV files are treated as individual documents.

The sample IMDB movies data is already formatted as SDF and contains add requests for over 5,000 popular movies. Each add request specifies a unique ID for the movie, a document version number, and fields that contain the movie data such as title and genre.

This tutorial shows how to submit data through the Amazon CloudSearch console, but you can also convert and post data (p. 51) with the command line tools, and submit SDF batches through the document service API (p. 174).
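The row-by-row handling of CSV files can be sketched as follows. This is not the console's actual conversion code: the function name, the id_column argument, and the dictionary keys are assumptions chosen to mirror the description of an add request (an ID, a version number, and fields). See Preparing Your Data for Amazon CloudSearch (p. 46) for the actual format.

```python
import csv
import io


def csv_to_add_requests(csv_text, id_column, version=1):
    """Turn CSV text with a header row into one add request per row."""
    reader = csv.DictReader(io.StringIO(csv_text))  # header row names the fields
    batch = []
    for row in reader:
        # Every column except the ID column becomes a document field.
        fields = {name: value for name, value in row.items()
                  if name != id_column}
        batch.append({
            "type": "add",            # illustrative key names
            "id": row[id_column],
            "version": version,
            "fields": fields,
        })
    return batch
```

Calling csv_to_add_requests on a file with the columns id, title, and year produces one document per row, each with title and year fields.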

To add the sample data to your movies domain


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of your movies domain to view the domain dashboard.
3. At the top of the domain dashboard, click the Upload Documents button.

Note
The Upload Documents button is available once the domain status is PROCESSING or ACTIVE. You will not be able to search uploaded documents until the domain status is ACTIVE.

4. On the DOCUMENT SOURCE step, select Predefined data, choose IMDB movies (demo), and click Continue.

5. On the REVIEW DOCUMENTS step, review the upload summary and click Upload Documents to send the data to your domain for indexing.

Note
If you'd like to see what the SDF data looks like, click Download the generated SDF files. For more information about SDF and preparing your own data, see Preparing Your Data for Amazon CloudSearch (p. 46).

6. On the DOCUMENT SUMMARY step, click Finish to return to the domain dashboard.

That's it! You now have a fully functional Amazon CloudSearch domain that you can start searching. The data is automatically indexed in near real-time, so you can start searching your domain right away.

Step 4: Search Your Amazon CloudSearch Domain


You can use the search tester in the Amazon CloudSearch console to perform simple text searches. To perform more complex Boolean queries, you can submit search requests through a Web browser or send HTTP requests using cURL or any HTTP library.

Searching with the Search Tester


The search tester enables you to choose which fields you want to search, sort the results, and browse any facets that are configured for the domain. By default, results are sorted according to an automatically generated relevance score, text_relevance. (For more information about customizing how results are ranked, see Customizing Result Ranking with Amazon CloudSearch (p. 98).)

To search your domain


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of your movies domain.
3. In the Navigation panel, click the Run a Test Search link for your movies domain.

4. Select the field(s) you want to search, enter the text you want to search for, and click Go.

To view the HTTP search request that was sent to your domain's search endpoint and the JSON or XML response returned by Amazon CloudSearch, click the view raw link for the response format you want to see.
You can copy and paste the request URL to submit the request and view the response from a Web browser. Requests can be sent via HTTP or HTTPS.

Submitting Search Requests from a Web Browser


To perform more complex searches, you can submit your own search requests directly to your search endpoint. You can perform simple and Boolean searches and specify a variety of options to constrain your search, request facet information, customize ranking, and control what information is returned in the results. For example, to search your movies domain and get the titles of all of the available Star Wars movies, append the following search string to your search endpoint. (2011-02-01 is the API version and must be specified.)
/2011-02-01/search?q=star+wars&return-fields=title
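A request like the one above can also be assembled programmatically. The following sketch uses only the Python standard library; the function name and its arguments are illustrative, and the optional results_type argument sets the results-type parameter, which selects the response format.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"  # must be specified in every request


def build_search_url(endpoint, query, return_fields=(), results_type=None):
    """Build a search request URL for a domain's search endpoint."""
    params = [("q", query)]
    if return_fields:
        params.append(("return-fields", ",".join(return_fields)))
    if results_type:
        params.append(("results-type", results_type))
    # urlencode turns spaces into "+" and percent-encodes reserved
    # characters; commas are kept so field lists stay readable.
    return f"http://{endpoint}/{API_VERSION}/search?{urlencode(params, safe=',')}"
```

Passing your domain's search endpoint, the query "star wars", and return_fields=["title"] yields a URL ending in /2011-02-01/search?q=star+wars&return-fields=title, which you can paste into a browser or fetch with any HTTP library.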

Note
Your domain's search endpoint is shown on the domain dashboard. You can also perform a search from the AWS Management Console, view the raw request and response, and copy the request URL from the Search Request field.

By default, Amazon CloudSearch returns the response in JSON. You can also get the search results formatted in XML by specifying the results-type parameter, results-type=xml. (Errors are always returned in JSON.) The following image shows the results of the previous query.

Filtering Results
You can use the Boolean query option, bq, to find documents that have particular numeric attributes. You can filter based on an exact value in a field, an inequality, or a range of values, as in these examples:

• bq=year:2000 matches documents with the year 2000.
• bq=year:2000.. matches documents with a year greater than or equal to 2000.
• bq=year:..2000 matches documents with a year less than or equal to 2000.
• bq=year:2000..2011 matches documents with a year between 2000 and 2011, inclusive.

For example, the following Boolean query searches for "star", finds all of the matching movies that were released in or before 2000, and returns the title and year of each one:
/2011-02-01/search?bq=(and 'star' year:..2000)&return-fields=title,year

The response shows the number of matching documents and the requested fields for each hit.

For more information about constructing search queries, see Searching Your Data with Amazon CloudSearch (p. 80).
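The meaning of the four filter forms can be expressed as a small predicate. This sketch only mirrors the semantics listed above for a single uint value; it is not the service's query parser, and the function name is illustrative.

```python
def matches_uint_filter(expr, value):
    """Evaluate a filter such as '2000', '2000..', '..2000', or '2000..2011'."""
    if ".." not in expr:
        # Exact value.
        return value == int(expr)
    low, high = expr.split("..", 1)
    if low and value < int(low):
        return False   # below the inclusive lower bound
    if high and value > int(high):
        return False   # above the inclusive upper bound
    return True
```

For example, a movie released in 1977 matches ..2000 but not 2000.., and a movie released in 2011 matches 2000..2011 because both bounds are inclusive.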

Ranking the Search Results


By default, Amazon CloudSearch ranks the search results according to an automatically generated text_relevance score. You can change how results are ranked by specifying the rank option in your search request to specify the field or rank expression you want to use for ranking. (A rank expression is a custom numeric expression that can be evaluated for each document in the set of matching documents. For information about defining your own rank expressions, see Customizing Result Ranking with Amazon CloudSearch (p. 98).) If you specify a text field with the rank option, the results are sorted alphabetically according to that field. For example, to rank results from your movies domain alphabetically by title, add &rank=title to your query string:
/2011-02-01/search?bq=(and genre:'sci-fi' year:..2000)&return-fields=title,year&rank=title

When you rank alphabetically, the results are sorted in ascending order by default. Any values that begin with a numeral are listed before the first A entry:

Similarly, you can specify an integer field with the rank option to sort the results numerically. By default, when you rank alphabetically or numerically, results are returned in ascending order. You can prefix the field name with a minus (-) if you want the results returned in descending order. If you specify multiple rank options, the first option is used as the primary sort field, the second option is used as the secondary sort field, and so on. For more information about ranking results, see Customizing Result Ranking with Amazon CloudSearch (p. 98).
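The ordering rules for multiple rank options can be illustrated with a client-side sketch. Amazon CloudSearch performs the ranking for you; the function below is hypothetical and only demonstrates how primary and secondary sort fields and the minus prefix interact.

```python
def rank_hits(hits, rank_options):
    """Order hits by several rank options, such as ["-year", "title"]."""
    ranked = list(hits)
    # Apply the least significant option first. Python's sort is stable,
    # so earlier (more significant) options take precedence in the result.
    for option in reversed(rank_options):
        descending = option.startswith("-")
        field = option[1:] if descending else option
        ranked.sort(key=lambda hit: hit[field], reverse=descending)
    return ranked
```

With rank_options=["-year", "title"], hits are ordered from newest to oldest, and movies released in the same year are listed alphabetically by title.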

Getting Facet Information


A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a facet. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)

A facet can be any numeric field or a text or literal field that has faceting enabled in your domain configuration. To request facet information in your search request, you specify:
• One or more facets
• Facet constraints that specify the particular values you want to count (optional)
• How you want the facet values to be sorted in the results (optional)

For each facet, Amazon CloudSearch calculates the number of hits that share the same value. If you specify constraints, the facet counts are calculated only for values that match the constraints. Only constraints that have matches are included in the facet results.

Note
Values from a facet-enabled text or literal field cannot be returned in the search results. Text and literal fields can be facet-enabled or result-enabled, but not both. If you want to return the value from an SDF document field as well as use the field as a facet, create two index fields that use the same SDF document field as a source and make one result-enabled, and the other facet-enabled.

To get facet counts with your search results


Use the facet option to specify the fields for which you want to compute facets. For the sample IMDB movies data, faceting is enabled for one field, genre.

/2011-02-01/search?q=star&return-fields=title&facet=genre

The facets appear below the hits in the results.

If you want to compute facet counts for selected values of a facet field, you can set facet constraints for the field. Facet constraints do not constrain the results themselves, only the facet counts that are returned. For example, the following request only counts the movies that are in the Sci-Fi, Fantasy, or Thriller genres:
/2011-02-01/search?q=star&return-fields=title&facet=genre&facet-genre-constraints='Sci-Fi','Fantasy','Thriller'

This constrains the facet counts to the three specified values:

For more information about faceted searches, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).
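To make the effect of facet constraints concrete, the following sketch counts facet values over a list of hits. It is an illustration of the counting rules only, not the service's implementation; representing each hit as a dictionary whose facet field holds a list of values is an assumption.

```python
from collections import Counter


def facet_counts(hits, facet, constraints=None):
    """Count how many hits share each value of a facet field."""
    counts = Counter()
    for hit in hits:
        for value in hit.get(facet, []):
            # Constraints limit which values are counted; they do not
            # remove any hits from the result set.
            if constraints is None or value in constraints:
                counts[value] += 1
    # Values with no matching hits never appear in the result.
    return dict(counts)
```

If two of three hits are tagged Sci-Fi and none are tagged Fantasy or Thriller, constraining the genre facet to those three values returns a single count, Sci-Fi: 2, while all three hits are still returned.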

Step 5: Delete Your Amazon CloudSearch Movies Domain


When you are finished experimenting with your movies domain, you need to delete it to avoid incurring additional usage fees.

Important
Deleting a domain deletes the index associated with the domain and takes the domain's document and search endpoints offline permanently.

To delete your imdb-movies domain


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of your movies domain to view the domain dashboard.
3. At the top of the domain dashboard, click the Delete this Domain button.

4. In the Delete Domain dialog box, select the Delete the domain option and click OK to permanently remove the domain and all of its data.

Note
It can take around 15 minutes to delete the domain and its resources. Until then, the domain status will be BEING DELETED.

Wondering where to go next? What Is Amazon CloudSearch? (p. 1) has a guide to the rest of the Amazon CloudSearch developer documentation. For more information about the Amazon CloudSearch query language, see Searching Your Data with Amazon CloudSearch (p. 80). If you're ready to set up a domain with your own data, see Preparing Your Data for Amazon CloudSearch (p. 46) and Uploading Data to an Amazon CloudSearch Domain (p. 72) for information about formatting and submitting your data to Amazon CloudSearch.

Making Amazon CloudSearch API Requests


Topics
• Endpoints for Amazon CloudSearch (p. 21)
• Making Configuration Requests in Amazon CloudSearch (p. 22)
• Making Document Service Requests in Amazon CloudSearch (p. 24)
• Making Search Requests in Amazon CloudSearch (p. 25)

This section describes how to make requests using the Amazon CloudSearch APIs. Endpoints for Amazon CloudSearch (p. 21) describes the endpoints you use to contact the Amazon CloudSearch configuration service, document service, and search service. The following sections describe how to submit requests to each of the services.

Endpoints for Amazon CloudSearch


Amazon CloudSearch provides separate endpoints for accessing the configuration, search, and document services. The configuration service is accessed through a general endpoint, cloudsearch.us-east-1.amazonaws.com. To access the other services, you use Amazon CloudSearch domain-specific endpoints:
• http://doc-domainname-domainid.us-east-1.cloudsearch.amazonaws.com is the document service endpoint, used to submit documents to the domain for indexing.
• http://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com is the search endpoint, used to submit search requests to the domain.

You must specify the API version in every Amazon CloudSearch request. The current Amazon CloudSearch API version is 2011-02-01. For example:

http://search-movies-h2pc7ftfnsdlqh6pqqawbftrhu.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star
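The endpoint pattern described above can be sketched as a small helper. The function name is illustrative; in practice you should copy the endpoints shown on your domain dashboard rather than constructing them, because the domain ID is assigned by the service.

```python
REGION = "us-east-1"
API_VERSION = "2011-02-01"


def domain_endpoint(service, domain_name, domain_id):
    """Return the host name of a domain's document or search endpoint."""
    if service not in ("doc", "search"):
        raise ValueError("service must be 'doc' or 'search'")
    return f"{service}-{domain_name}-{domain_id}.{REGION}.cloudsearch.amazonaws.com"


# Every request path starts with the API version.
search_host = domain_endpoint("search", "movies", "h2pc7ftfnsdlqh6pqqawbftrhu")
example_url = f"http://{search_host}/{API_VERSION}/search?q=star"
```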

Making Configuration Requests in Amazon CloudSearch


You submit Amazon CloudSearch configuration requests using the AWS Query protocol. AWS Query requests are HTTP or HTTPS requests submitted via HTTP GET or POST with a query parameter named Action. The endpoint for Amazon CloudSearch configuration requests is cloudsearch.us-east-1.amazonaws.com. The region name is us-east-1, and the service name is cloudsearch.

Structure of a Configuration Request


This guide shows Amazon CloudSearch configuration requests as URLs, which can be used directly in a browser. The URL contains three parts:
• Endpoint: the Web service entry point to act on, cloudsearch.us-east-1.amazonaws.com.
• Action: the Amazon CloudSearch configuration action you want to perform. For a complete list of actions, see Actions (p. 126).
• Parameters: any request parameters required for the specified action.

Each query request must also include some common parameters to handle authentication. For more information, see Request Authentication (p. 23).

You must specify the Version parameter in every Amazon CloudSearch configuration request. The current Amazon CloudSearch API version is 2011-02-01. For example, the following GET request creates a new search domain called movies:
https://cloudsearch.us-east-1.amazonaws.com
?Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120712/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-07-12T21:41:29.094Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=c7600a00fea082dac002b247f9d6812f25195fbaf7f0a6fc4ce08a39666c6a10

Note
Although the GET requests are shown as URLs, the parameter values are shown unencoded to make them easier to read. Keep in mind that you must URL encode parameter values when submitting requests.
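URL encoding the parameter values can be done with a standard library function. The sketch below percent-encodes every name and value and joins them into a query string; the function name is illustrative.

```python
from urllib.parse import quote


def encode_query(params):
    """Percent-encode parameter names and values and join them with '&'."""
    # safe="" means that "/" and ":" are encoded too; only letters,
    # digits, and the characters "-", "_", ".", and "~" are left as-is.
    return "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in params.items()
    )
```

For example, the X-Amz-Credential value shown above is sent with each slash encoded as %2F, and each colon in the X-Amz-Date value is sent as %3A.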

Request Authentication
Requests submitted to the Configuration API are authenticated using your AWS credentials. You must include authorization parameters and a digital signature in every request. Amazon CloudSearch supports AWS Signature Version 4. For detailed signing instructions, see Signature V4 Signing Process in the AWS General Reference. To create a signature for a request, you create a canonicalized version of the query string and compute an RFC 2104-compliant HMAC signature using a signing key derived from your AWS Secret Access key. For example, to construct a CreateDomain request, you need the following information:
Region name: us-east-1
Service name: cloudsearch
API version: 2011-02-01
Date: 2012-07-12T21:41:29.094Z
Access key: AKIAIOSFODNN7EXAMPLE
Secret key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY
Action: CreateDomain
Action Parameters: DomainName=movies

The canonical query string for a CreateDomain request looks like this:
Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120712/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-07-12T21:41:29.094Z
&X-Amz-SignedHeaders=host

The string to sign looks like this:


AWS4-HMAC-SHA256
20120712T214129Z
20120712/us-east-1/cloudsearch/aws4_request
a4cf362487306de739da7c697220a47373da10975702b0d9f80b6a6a7477df4a

The final signed request looks like this:


https://cloudsearch.us-east-1.amazonaws.com
?Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120712/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-07-12T21:41:29.094Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=c7600a00fea082dac002b247f9d6812f25195fbaf7f0a6fc4ce08a39666c6a10
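The signing step can be sketched in Python as follows. This is a sketch under stated assumptions rather than a complete signer: it uses the standard Signature Version 4 key derivation (date, region, service, then the literal aws4_request), takes the string to sign shown above as given, and uses hypothetical function names. It is not guaranteed to reproduce the example signature exactly, because that depends on the precise canonical request that was hashed.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """Derive a Signature Version 4 signing key from the secret access key."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign, signing_key):
    """Return the hex-encoded signature for the string to sign."""
    return hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# The four lines of the string to sign, taken from the example above.
string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",
    "20120712T214129Z",
    "20120712/us-east-1/cloudsearch/aws4_request",
    "a4cf362487306de739da7c697220a47373da10975702b0d9f80b6a6a7477df4a",
])

signing_key = derive_signing_key(
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY",  # example secret key from this guide
    "20120712",
    "us-east-1",
    "cloudsearch",
)
signature = sign(string_to_sign, signing_key)
print(signature)  # value for the X-Amz-Signature parameter
```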


Making Document Service Requests in Amazon CloudSearch


The document service API is a REST-style API that has a single resource, documents/batch. You submit documents/batch requests to your domain's unique document service endpoint to add, update, or delete documents. A documents/batch request is an HTTP request, as defined by RFC 2616. All documents/batch requests must be submitted using HTTP POST. You must specify the Amazon CloudSearch API version in every document service request. The current Amazon CloudSearch API version is 2011-02-01. The version number precedes the resource name in the request, for example POST /2011-02-01/documents/batch.

Access Control
Access to a search domain's document service is restricted by IP address so that only authorized hosts can submit document changes. By default, your search domain will not accept document service requests from any IP addresses. You must authorize specific IP addresses or address ranges before you can submit documents through the command line tools or APIs. You can configure access policies from the Amazon CloudSearch console, using the cs-configure-access-policies command, or with the UpdateServiceAccessPolicies configuration action.

Request Headers
A documents/batch request must include the following headers:
Content-Length: the length of the request body, in bytes.
Content-Type: the type of data in the request body, application/json or application/xml.
Host: your domain's document service endpoint.
By default, the response to a documents/batch request is returned in JSON. You can set the Accept header to application/xml if you want an XML response.

Request Body
The body of a documents/batch request contains a JSON or XML description of the document operations you want to perform. This description conforms to the Search Data Format (SDF). For more information about SDF, see Uploading Data to an Amazon CloudSearch Domain (p. 72).
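As a rough illustration of how the headers and body described above fit together, the following Python sketch assembles the path, headers, and JSON body of a documents/batch request without sending it. The function name build_batch_request is hypothetical, and the endpoint is the placeholder used in this guide, not a live domain.

```python
import json

API_VERSION = "2011-02-01"


def build_batch_request(doc_endpoint, batch):
    """Return (path, headers, body) for a documents/batch POST request.

    Hypothetical helper; the batch is a list of SDF add/delete operations.
    """
    body = json.dumps(batch).encode("utf-8")
    headers = {
        "Host": doc_endpoint,
        "Content-Type": "application/json",
        # Content-Length is the size of the encoded body in bytes.
        "Content-Length": str(len(body)),
        "Accept": "application/json",
    }
    path = f"/{API_VERSION}/documents/batch"
    return path, headers, body


batch = [{"type": "delete", "id": "tt0434409", "version": 1337648735}]
path, headers, body = build_batch_request(
    "doc-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    batch,
)
print(path)
print(headers["Content-Length"])

# To submit the request, something like the following could be used:
#   import http.client
#   conn = http.client.HTTPConnection(headers["Host"])
#   conn.request("POST", path, body=body, headers=headers)
```

Computing Content-Length from the encoded bytes, rather than from the string length, keeps the header correct when field values contain non-ASCII characters.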

Example Request
POST /2011-02-01/documents/batch HTTP/1.1
Accept: application/json
Content-Length: 1176
Content-Type: application/json
Host: doc-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com

[
  { "type": "add",
    "id": "tt0484562",
    "version": 1337648735,
    "lang": "en",
    "fields": {
      "title": "The Seeker: The Dark Is Rising",
      "director": "Cunningham, David L.",
      "genre": ["Adventure","Drama","Fantasy","Thriller"],
      "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances"]
    }
  },
  { "type": "delete",
    "id": "tt0434409",
    "version": 1337648735
  }
]

Making Search Requests in Amazon CloudSearch


The search API is a REST-style API that has a single resource, search. You submit search requests to your domain's unique search service endpoint. Search requests are HTTP requests, as defined by RFC 2616. Search requests must be submitted using HTTP GET. You must specify the Amazon CloudSearch API version in every search request. The current Amazon CloudSearch API version is 2011-02-01. The version number precedes the resource name in the request, for example GET /2011-02-01/search.

Access Control
Access to a search domain's search service is restricted by IP address so that only authorized hosts can submit search requests. By default, your domain will not accept search requests from any IP addresses. You have to authorize specific IP addresses or address ranges before you can search the domain. You can do this from the Amazon CloudSearch console, through the cs-configure-access-policies command, or with the UpdateServiceAccessPolicies configuration action.

Request Headers
A search request must include the Host header, which specifies your domain's search service endpoint. Optionally, you can also specify the following headers:
Cache-Control: forces the revalidation of results when a cached result document would otherwise be returned.
Origin: specifies the domain that wants to use the response data, as described by the W3C Cross-Origin Resource Sharing draft.
By default, the response to a search request is returned in JSON. You can set the Accept header to application/xml if you want an XML response.


Request Parameters
Search parameters are specified in the query string. For more information about constructing searches, see Searching Your Data with Amazon CloudSearch (p. 80).

Example Request
GET /2011-02-01/search?q=star+wars&return-fields=title HTTP/1.1
Host: search-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com
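A search URL equivalent to the example request might be built as in the following Python sketch. The helper name build_search_url is hypothetical, and the http scheme is an assumption, since the example shows only the request line and Host header.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"


def build_search_url(search_endpoint, params, scheme="http"):
    """Return a search request URL for the given search endpoint.

    Hypothetical helper; params is a mapping of search parameters.
    """
    # urlencode escapes each value and encodes spaces as "+".
    return f"{scheme}://{search_endpoint}/{API_VERSION}/search?{urlencode(params)}"


search_url = build_search_url(
    "search-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    {"q": "star wars", "return-fields": "title"},
)
print(search_url)
```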


Creating an Amazon CloudSearch Domain


A search domain encapsulates your data, metadata, and index configuration options such as text processing options and ranking options. Each domain is searched separately. If you have multiple collections of data that you want to make searchable, you can create a separate search domain for each collection.

When you create a new search domain, you must also configure access policies (p. 32), configure indexing options (p. 53), and upload your data (p. 72) before you can start submitting search requests (p. 80).

When you create a search domain, you must give it a unique name. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. The allowed characters are: a-z, 0-9, and hyphen (-). Upper case letters, underscores (_), and other special characters are not allowed in domain names. Currently, all domains are created in the AWS Region us-east-1.

Each domain has unique endpoints through which you upload data for indexing and submit search requests. For example, the endpoints for a domain called imdb-movies might be:
search-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com
doc-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com
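The domain naming rules above can be checked before calling the service. The following Python sketch encodes them as a regular expression; the function name is hypothetical, and the service itself remains the authority on which names it accepts.

```python
import re

# First character: a lowercase letter or digit.
# Remaining 2 to 27 characters: lowercase letters, digits, or hyphens.
_DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name):
    """Return True if name satisfies the documented domain naming rules."""
    return _DOMAIN_NAME.fullmatch(name) is not None


for candidate in ("movies", "imdb-movies", "Movies", "my_domain", "ab"):
    print(candidate, is_valid_domain_name(candidate))
```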

Important
By default, access to a new domain's document and search endpoints is blocked for all IP addresses. You must configure access policies for the domain to be able to submit search requests to the domain's search endpoint and upload data from the command line or through the domain's document endpoint. You can upload documents and search the domain through the Amazon CloudSearch console without configuring access policies.

You can create a search domain using the cs-create-domain (p. 28) command, from the Amazon CloudSearch console (p. 28), or using the CreateDomain (p. 31) configuration action.


Command Line Tools


You use the cs-create-domain (p. 115) command to create search domains. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference (p. 103).

To create a domain
Run the cs-create-domain command and specify the name of the domain you want to create with the --domain-name option. For example, to create a domain called movies:
cs-create-domain --domain-name movies
===========================================
Creating domain [movies]
Domain endpoints are currently being created. Use cs-describe-domain to check for endpoints.

It can take around half an hour to create endpoints for a new domain. By default, the cs-create-domain command returns immediately. If you specify the --wait option, the cs-create-domain command returns once your domain's endpoints are active. You can use the cs-describe-domain command to view a summary of the domain's status and configuration. For more information, see Getting Information about an Amazon CloudSearch Domain (p. 37).

AWS Management Console


The Amazon CloudSearch console enables you to easily create new search domains and provides a variety of options for configuring indexing options, including Cloning an Existing Domain's Indexing Options (p. 59).

To create a domain
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. At the top of the Navigation panel, click Create a New Domain. (If you are creating a domain for the first time, click Create Your First Search Domain on the Welcome page.)
3. On the NAME YOUR DOMAIN step, enter a name for your new domain and click Continue. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. Domain names can contain the following characters: a-z (lower case), 0-9, and - (hyphen). Upper case letters, underscores (_), and other special characters are not allowed in domain names.
4. On the CONFIGURE INDEX step, select Manual Configuration and click Continue. You can configure index fields and access policies when you first create the domain, or simply create a domain and configure it later. For more information about using the Amazon CloudSearch console to configure the domain, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53) and Configuring Access for an Amazon CloudSearch Domain (p. 32).
5. On the REVIEW INDEX CONFIGURATION step, click Continue to configure the index fields later. For more information about configuring index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
6. On the SET UP ACCESS POLICIES step, click Continue to set up access policies later. For more information about configuring access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).

Note
If you don't configure access policies, you will only be able to upload documents and submit search queries through the console. By default, the Document and Search endpoints are configured to block all IP addresses.

7. On the CONFIRM step, review the domain configuration and click Confirm to create your domain.
8. Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.


API
You use the CreateDomain (p. 128) configuration action to create new domains. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120328/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-28T21:54:28.711Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=f5f82e71838707de1f72bfc42cc021e0324e1befa5df7c39c2ac25c61b3c8dcb

Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).


Configuring Access for an Amazon CloudSearch Domain


Access to a search domain is restricted by IP address so that only authorized hosts can submit documents and send search requests. You can authorize individual IP addresses or address ranges (subnets). IP addresses are specified in the standard Classless Inter-Domain Routing (CIDR) format. For example, 10.24.34.0/24 specifies the range 10.24.34.0 - 10.24.34.255, while 10.24.34.0/32 specifies the single IP address 10.24.34.0. For more information about CIDR notation, see RFC 4632.

Amazon CloudSearch access policies are specified using the AWS Identity and Access Management (IAM) Access Policy Language. When you first create a search domain, it will not accept search or document service requests from any IP addresses. You must authorize specific IP addresses or address ranges before you can submit requests to your domain's endpoints through the command line tools or Amazon CloudSearch APIs.

After you authorize one or more IP addresses, you can start uploading documents (p. 72) with the cs-post-sdf command or by sending requests to the documents/batch API. After you upload your documents, you can start submitting search requests (p. 80).

You can also define access policy rules to deny particular addresses and address ranges. Deny rules take precedence over Allow rules.
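The CIDR examples and the rule that Deny takes precedence over Allow can be illustrated with Python's standard ipaddress module. This is only a local model of the behavior described above, with a hypothetical function name; it does not call Amazon CloudSearch or evaluate a real policy document.

```python
import ipaddress


def _in_any(ip, cidr_ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def is_authorized(address, allow, deny):
    """Model the documented rule: Deny wins, and the default is no access."""
    ip = ipaddress.ip_address(address)
    if _in_any(ip, deny):
        return False
    return _in_any(ip, allow)


subnet = ipaddress.ip_network("10.24.34.0/24")
single = ipaddress.ip_network("10.24.34.0/32")
print(subnet[0], subnet[-1], subnet.num_addresses)  # first, last, count
print(single.num_addresses)

allow = ["10.24.34.0/24"]
deny = ["10.24.34.7/32"]
print(is_authorized("10.24.34.8", allow, deny))
print(is_authorized("10.24.34.7", allow, deny))
```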

Note
IP address authorization is used only to control access to the document and search APIs. The Amazon CloudSearch configuration API uses standard AWS authentication. If you don't know your computer's IP address, you can go to http://www.whatsmyip.org/ to find out what it is. Keep in mind that if you do not have a static IP address, you must re-authorize your computer whenever your IP address changes. If your IP address is assigned dynamically, it is also likely that you're sharing that address with other computers on your network. This means that when you authorize the IP address, all computers that share it will be able to access your search domain's document and search endpoints.

Note
If you have made changes to your domain that require indexing, changes to the domain's access policies will not take effect until it is re-indexed. If re-indexing is needed, it will be indicated in the response to your update access policies request and shown on the domain dashboard in the console.


You can configure your access policies using the cs-configure-access-policies (p. 33) command, from the Amazon CloudSearch console (p. 34), or by uploading an IAM policy document with the UpdateServiceAccessPolicies (p. 35) configuration action.

Command Line Tools


You use the cs-configure-access-policies (p. 106) command to configure access to your domain's document and search endpoints. Access to each endpoint is configured separately. You can retrieve your domain's current policy document using the cs-configure-access-policies (p. 106) command by specifying the --retrieve option. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference (p. 103).

To configure access to your domain's document and search endpoints


1. Run the cs-configure-access-policies command in the --update mode to create a policy document that allows access to the document and search services. You specify the IP address or address range you want to authorize using the --allow option. To block specified addresses or address ranges, use the --deny option. You specify which service(s) you want to configure using the --service option to specify doc, search, or all. For example:
cs-configure-access-policies --domain-name movies --update --allow 192.0.2.0 --service all
=========================
Standardizing ip: 192.0.2.0; using: 192.0.2.0/32
[movies] Updating access policy:
{"Version":"2011-10-11","Id":"34f11d91-88d9-4e15-8ebe-05dffef103c6",
"Statement":[{"Sid":"1","Effect":"Allow","Action":"*",
"Resource":"arn:aws:cs:us-east-1:598352442322:search/movies",
"Condition":{"IpAddress":{"aws:SourceIp":["192.0.2.0/32"]}}},
{"Sid":"2","Effect":"Allow","Action":"*",
"Resource":"arn:aws:cs:us-east-1:598352442322:doc/movies",
"Condition":{"IpAddress":{"aws:SourceIp":["192.0.2.0/32"]}}}]}

Note
The Action name in the policy document is always set to the wildcard character (*). There are no specific action names supported at this time.

2. When prompted, enter y to confirm that you want to update the access policies for your domain.
Really update access policies for [movies] y/N: y
Your access policy update may take a few minutes to complete and its state will change to Active when complete. To check the state, use cs-configure-access-policies --retrieve-policy --service all

The --update option merges the specified policy rules with the existing policy document and uploads the revised policy document to the domain.


AWS Management Console


The Amazon CloudSearch console enables you to easily add access policy rules to authorize or block particular IP addresses or address ranges. The console provides four shortcuts for defining access policy rules:
Recommended rules: enables anyone to search your data, but only you will be able to add and delete documents. Your domain's search endpoint will be reachable from any IP address, but only you will have access to the document endpoint.
Allow only my IP address access to all services: only you will be able to search your data and add and delete documents. Your domain's endpoints will not be reachable from any other IP address.
Allow everyone access to all services: enables anyone to search your data and add and delete documents from your domain. Your domain's endpoints will be accessible from any IP address.
Deny everyone access to all services: your domain's document and search endpoints will not be directly accessible. You can only upload documents or submit search requests through the Amazon CloudSearch console.
You can start with one of the shortcuts, and add additional rules to fine-tune access to your domain's endpoints. Deny rules take precedence over allow rules.

Note
When you use the shortcuts, your IP address is automatically detected. If it's not correct or not the address you want to authorize, you can modify it before submitting your changes. You might need to work with your IT department to determine which IP addresses to authorize.

To add access policy rules


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain you want to configure, and then click the domain's Access Policies link.
3. On the domain's Access Policies page, choose one of the shortcuts or enter the IP addresses you want to authorize or block. To add additional IP addresses or address ranges to the rule, click the add (+) icon in the IP Ranges column. To remove an address or range from the rule, click its delete (-) icon in the IP Ranges column. To add a new rule to the policy, click the Add a New Rule button. To remove a rule from the policy, click the remove (x) button in the Remove column.
4. When you are done making changes to your access policy rules, click Submit. To exit without saving your changes, click Revert.

API
You use the UpdateServiceAccessPolicies (p. 146) configuration action to upload an IAM policy document that defines the access policies for your domain's document and search endpoints. For example:
https://cloudsearch.us-east-1.amazonaws.com
?AccessPolicies={"Statement": [
  {"Effect":"Allow", "Action": "*",
   "Resource": "arn:aws:cs:us-east-1:360924696794:search/movies",
   "Condition": { "IpAddress": { "aws:SourceIp": ["192.0.2.0/32"] } }},
  {"Effect":"Allow", "Action": "*",
   "Resource": "arn:aws:cs:us-east-1:360924696794:doc/movies",
   "Condition": { "IpAddress": { "aws:SourceIp": ["192.0.2.0/32"] } }}
] }
&Action=UpdateServiceAccessPolicies
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T19:27:45.110Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=801de749ab11a669925246f3d9454eee1dbc319f33523a4eb35a36ec93764e7d


Note
For readability, the request is shown without URL-encoding. Keep in mind that Amazon CloudSearch configuration requests must be URL-encoded. Configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).

A policy document for Amazon CloudSearch contains a collection of statements that allow or deny access to the search and document service endpoints based on IP address. Note that the Action name is always set to the wildcard character (*). There are no specific action names supported at this time. You can retrieve your domain's current policy document with the DescribeServiceAccessPolicies (p. 139) action. Access to each endpoint is configured separately. For example:
{
  "Statement":[
    {
      "Effect":"Allow",
      "Action":"*",
      "Resource":"arn:aws:cs:us-east-1:123456789012:doc/movies",
      "Condition":{
        "IpAddress":{
          "aws:SourceIp":"192.0.2.0/24"
        }
      }
    },
    {
      "Effect":"Allow",
      "Action":"*",
      "Resource":"arn:aws:cs:us-east-1:123456789012:search/movies",
      "Condition":{
        "IpAddress":{
          "aws:SourceIp":"192.0.2.0/24"
        }
      }
    }
  ]
}

The Amazon Resource Name (ARN) for a domain's endpoints is of the form:
arn:aws:cs:us-east-1:awsaccountid:service/domain
The service can be either doc or search. The domain is the name of the domain for which you are configuring access. You can get a domain's ARNs with the DescribeDomains configuration action or the cs-describe-domain command. For more information, see Getting Information about an Amazon CloudSearch Domain (p. 37).
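A policy document with this shape could be generated rather than written by hand. The following Python sketch builds one Allow statement per endpoint using the ARN form described above; the function name and its parameters are hypothetical, and the account ID is the placeholder used in the example.

```python
import json


def build_access_policy(account_id, domain, allowed_cidrs, region="us-east-1"):
    """Return a policy document allowing the given CIDR ranges on both endpoints.

    Hypothetical helper; Action is always "*" because no specific action
    names are supported.
    """
    statements = []
    for service in ("doc", "search"):
        statements.append({
            "Effect": "Allow",
            "Action": "*",
            "Resource": f"arn:aws:cs:{region}:{account_id}:{service}/{domain}",
            "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
        })
    return {"Statement": statements}


policy = build_access_policy("123456789012", "movies", ["192.0.2.0/24"])
print(json.dumps(policy, indent=2))
```

The serialized document would then be passed, URL-encoded, as the AccessPolicies parameter of an UpdateServiceAccessPolicies request.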


Getting Information about an Amazon CloudSearch Domain


You can retrieve the following information about each of your search domains:
Domain Name: the name of the domain.
Document Endpoint: the endpoint through which you can submit document updates.
Search Endpoint: the endpoint through which you can submit search requests.
Searchable Documents: the number of documents that have been indexed.
Index Fields: the name and type of each configured index field.
Rank Expressions: the name and type of each rank expression.

When a domain is first created, the domain status will indicate that the domain is currently being activated and no other information will be available. Once your domain's document and search endpoints are available, the domain status will show the endpoint addresses you use to add data and submit search requests. If you haven't submitted any data for indexing, the number of searchable documents will be zero. You can get domain information using the cs-describe-domain (p. 37) command, from the Amazon CloudSearch console (p. 38), or using the DescribeDomains (p. 41) configuration action. This section also shows how you can view your domain's access policies and text options through the console. For information about accessing them through the command line tools or API, see Configuring Access for an Amazon CloudSearch Domain (p. 32) and Configuring Text Options for an Amazon CloudSearch Domain (p. 63).

Command Line Tools


You use the cs-describe-domain (p. 118) command to get information about your search domains. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference (p. 103).


To get domain information


Run the cs-describe-domain command to get information about all of your domains. To get information about a specific domain, use the --domain-name option to specify the domain you are interested in. To include information about all of the domain's configured index fields and rank expressions, specify the --show-all option. For example, to get all of the available information about the domain named movies:
cs-describe-domain --domain-name movies --show-all
=== Domain Summary ===
Domain Name: movies
Document Service endpoint: doc-movies-h2pc7ftfnsdlqh6pqqawbftrhu.us-east-1.cloudsearch.amazonaws.com
Search Service endpoint: search-movies-h2pc7ftfnsdlqh6pqqawbftrhu.us-east-1.cloudsearch.amazonaws.com
SearchInstanceType: search.m1.small
SearchPartitionCount: 1
SearchInstanceCount: 1
Searchable Documents: 5208
Current configuration changes require a call to IndexDocuments: No

=== Domain Configuration ===
Access Policies:
================
State: Active
{"Version":"2011-10-11","Id":"34f11d91-88d9-4e15-8ebe-05dffef103c6",
"Statement":[{"Sid":"1","Effect":"Allow","Action":"*",
"Resource":"arn:aws:cs:us-east-1:598352442322:search/movies",
"Condition":{"IpAddress":{"aws:SourceIp":["207.171.191.60/32"]}}},
{"Sid":"2","Effect":"Allow","Action":"*",
"Resource":"arn:aws:cs:us-east-1:598352442322:doc/movies",
"Condition":{"IpAddress":{"aws:SourceIp":["207.171.191.60/32"]}}}]}

Fields:
=======
actor     Active  text     (Result)
director  Active  text     (Result)
genre     Active  literal  (Search Facet)
title     Active  text     (Result)
year      Active  uint     ()
======================

AWS Management Console


The Amazon CloudSearch console enables you to easily view information about all of your domains. The console's CloudSearch Dashboard shows a summary of all of the domains that you have created.


To view detailed information about a particular domain


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. Click the name of the domain in the Navigation panel. The Domain Dashboard shows the status summary for the selected domain.
3. To view the index fields configured for the domain, click the domain's Indexing Options link in the Navigation panel. (For more information about index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).)
4. To view the rank expressions configured for the domain, click the domain's Rank Expressions link in the Navigation panel. (For information about rank expressions, see Customizing Result Ranking with Amazon CloudSearch (p. 98).)
5. To view the access policies configured for the domain, click the domain's Access Policies link in the Navigation panel. (For information about access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).)
6. To view the stopwords, synonyms, and stemming options configured for the domain, click the domain's Text Options link in the Navigation panel. (For information about text options, see Configuring Text Options for an Amazon CloudSearch Domain (p. 63).)

API
You use the DescribeDomains (p. 136) configuration action to get information about your domains. To get information about specific domains, specify the DomainNames parameter. For example, to get information about the movies and imdb-movies domains:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DescribeDomains
&DomainNames.member.1=movies
&DomainNames.member.2=imdb-movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T20:28:51.229Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=d8a4b3098bb37b73c48398db57315b272b92cbfcd6b22ad1718c599b47466aea

If you omit the DomainNames parameter, DescribeDomains returns a summary of all your domains.
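The numbered DomainNames.member.N parameters can be generated from a list of names. The following Python sketch shows one way to do so; the function name is hypothetical, and the resulting parameters would still need to be combined with Action, Version, and the authentication parameters.

```python
def domain_names_params(domain_names):
    """Return DomainNames.member.N parameters, numbered from 1.

    Hypothetical helper; an empty list yields no parameters, which
    corresponds to requesting a summary of all domains.
    """
    return {
        f"DomainNames.member.{index}": name
        for index, name in enumerate(domain_names, start=1)
    }


member_params = domain_names_params(["movies", "imdb-movies"])
print(member_params)
```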

Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).


Deleting an Amazon CloudSearch Domain


If you are no longer using a search domain, you must delete it to avoid incurring additional usage fees. Deleting a domain deletes the index associated with the domain and takes the domain's document and search endpoints offline permanently. It can take several minutes to completely remove a domain and all of its resources. You can delete a domain using the cs-delete-domain (p. 43) command, from the Amazon CloudSearch console, or using the DeleteDomain (p. 45) configuration action.

Command Line Tools


You use the cs-delete-domain (p. 117) command to delete a search domain and all of its resources. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference (p. 103).

To delete a domain
1. Run the cs-delete-domain command and specify the name of the domain you want to delete. For example, to delete the movies domain:
cs-delete-domain --domain-name movies

2. When prompted, enter y to confirm that you want to delete the domain.
Really delete [movies] (y/N): y

AWS Management Console


You can easily delete a domain from the domain dashboard in the Amazon CloudSearch console.


To delete a domain
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain you want to delete.
3. On the Domain Dashboard, click the Delete this Domain button.
4. In the Delete Domain dialog box, select the check box and click OK to confirm that you want to delete the domain.


API
You use the DeleteDomain (p. 132) configuration action to remove a domain and all of its resources. The domain you want to delete is specified in the DomainName parameter. For example, to delete the movies domain:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DeleteDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T20:39:24.716Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=d17fcb306c5466cba0264d911889b4408082767a68afc8635a613d9fc6196a9f

Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).


Preparing Your Data for Amazon CloudSearch


Topics
Mapping Document Data to Index Fields in Amazon CloudSearch (p. 46)
Creating SDF Batches in Amazon CloudSearch (p. 47)

Before you can submit data to Amazon CloudSearch for indexing, you must describe it according to the Search Data Format (SDF). In SDF, each item that you want to be able to receive as a search result is represented as a document. Every document has a unique id (docid), a version number, and one or more fields that contain the data that you want to search and return in results. These document fields are used to populate the index fields you configure for your domain. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

The Mapping Document Data (p. 46) section discusses how your source data relates to your domain's indexing options. Creating SDF Batches (p. 47) describes how to format your data in SDF. For a detailed description of the SDF JSON and XML schemas, see the Amazon CloudSearch Document Service API Reference (p. 174).

Mapping Document Data to Index Fields in Amazon CloudSearch


You use the document fields defined in your SDF to populate the fields in your index. Any fields that occur in a document that are not used as a source for at least one index field are ignored and will not be searchable or returnable. Documents can contain a subset of the fields configured for the domain; a document does not have to contain every field. A document field can contain multiple values. When all sources are mapped to an index field, the total number of values in the index field cannot exceed 100. At search time, the document is returned as a hit if any of those values match the search query. Document fields can contain alphanumeric or unsigned integer data. You can map unsigned integer data to a uint index field and use it to construct rank expressions and enable searches within a range of values.

You can map alphanumeric data to either text fields or literal fields. A document field can contain up to 1 MB of data.

A uint field contains a 32-bit unsigned integer. If you're mapping timestamps to a uint field, you have to strip off the milliseconds or the timestamp will overflow the uint field. Uint fields are always searchable and can always be returned in search results and used as facets.

A text field contains arbitrary alphanumeric data such as a name, description, or even the entire body of a document. Text fields are always searchable. They are tokenized during indexing and Amazon CloudSearch performs additional text processing on them according to the stopwords, synonyms, and stems you configure in your domain's text options. The contents of a text field can also be returned in search results or the field can be used as a facet, but not both. Amazon CloudSearch can return up to 2 KB of data from a text field; if the field contents exceed 2 KB, only the first 2 KB is included in the results.

If a search request does not specify what field to search, by default Amazon CloudSearch searches all text fields. You can control what fields are searched by default by defining your own default search field. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

A literal field contains an identifier or other data that you want to be able to match exactly. Unlike text fields, literal fields are not tokenized; Amazon CloudSearch does not perform any text processing on them. Literal fields can be used for fields that have a small set of possible values, as well as for more arbitrary values like email addresses or brand names where an exact match is important. Literal fields are frequently used to enable faceted searches where you want to count the number of exact matches for a particular value.
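To illustrate the timestamp caveat for uint fields described above, the following Python sketch drops the milliseconds from an epoch timestamp and checks that the result fits in a 32-bit unsigned integer. The function name to_uint_seconds is hypothetical and is not part of any Amazon CloudSearch tool.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned integer can hold


def to_uint_seconds(timestamp_ms: int) -> int:
    """Convert an epoch timestamp in milliseconds to whole seconds for a uint field."""
    seconds = timestamp_ms // 1000  # strip off the milliseconds
    if not 0 <= seconds <= UINT32_MAX:
        raise ValueError(f"{seconds} does not fit in a 32-bit unsigned integer")
    return seconds


if __name__ == "__main__":
    # The raw millisecond value is larger than UINT32_MAX; the seconds value is not.
    print(to_uint_seconds(1333299607803))
```

The same conversion applies if you use timestamps as document version numbers.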

Creating SDF Batches in Amazon CloudSearch


You create SDF batches to describe the data that you want to make searchable. When you send SDF batches to a domain, the data is indexed automatically according to the domain's indexing and text options.

An SDF batch is a collection of add and delete operations that represent the documents you want to add, update, or delete from your domain. SDF batches can be described in either JSON or XML. The maximum batch size is 5 MB. The maximum size of an individual document is 1 MB. If you have a large number of documents, you must split your updates into batches of no more than 5 MB each.

For each document in a batch, you must specify:
• The operation you want to perform: add or delete.
• A unique ID for the document (docid). A document ID can contain the following characters: a-z (lower-case letters), 0-9, and _ (underscore). Document IDs cannot begin with an underscore.
• A document version number for the add or delete operation. The version is used to guarantee that older updates aren't accidentally applied, and to provide control over the ordering of concurrent updates to the service. The document service guarantees that the update with the highest version will be applied and remain there until an add or delete operation with a higher version number and the same document ID is received. If you submit multiple add or delete operations with the same version number, which one takes precedence is undefined. You must increase the version number every time you submit a new add or delete operation for a document. For more information, see Document Versions in Amazon CloudSearch (p. 49).
• The document language as a two-letter language code, such as en for English. (Add operations only.)
• A name-value pair for each document field. When specifying SDF in JSON, the value for a field cannot be null. (Add operations only.)

For example, the following JSON SDF batch adds one document and deletes one document:

[
  {
    "type": "add",
    "id": "tt0484562",
    "version": 1,
    "lang": "en",
    "fields": {
      "title": "The Seeker: The Dark Is Rising",
      "director": "Cunningham, David L.",
      "genre": ["Adventure","Drama","Fantasy","Thriller"],
      "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances",
                "Crewson, Wendy","Ludwig, Alexander","Cosmo, James",
                "Warner, Amelia","Hickey, John Benjamin","Piddock, Jim",
                "Lockhart, Emma"]
    }
  },
  {
    "type": "delete",
    "id": "tt0484575",
    "version": 2
  }
]
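A batch like the one above could be assembled programmatically. The following Python sketch builds add and delete operations and applies the document ID, null-value, and size rules described in this section. The helper names (add_op, delete_op, build_batch) are hypothetical. Treating 1 MB as 1024 × 1024 bytes and measuring sizes on the serialized JSON are assumptions, not documented behavior.

```python
import json
import re

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB batch limit (assumed to mean 5 * 1024 * 1024 bytes)
MAX_DOC_BYTES = 1024 * 1024        # 1 MB document limit (same assumption)
# a-z, 0-9, and underscore; the first character cannot be an underscore
DOCID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_]*")


def _check_docid(docid):
    if not DOCID_PATTERN.fullmatch(docid):
        raise ValueError(f"invalid document ID: {docid!r}")


def add_op(docid, version, lang, fields):
    """Build one add operation; JSON SDF does not allow null field values."""
    _check_docid(docid)
    if any(value is None for value in fields.values()):
        raise ValueError("field values cannot be null in JSON SDF")
    return {"type": "add", "id": docid, "version": version, "lang": lang, "fields": fields}


def delete_op(docid, version):
    """Build one delete operation (no language or fields)."""
    _check_docid(docid)
    return {"type": "delete", "id": docid, "version": version}


def build_batch(operations):
    """Serialize operations to UTF-8 JSON and enforce the size limits."""
    for op in operations:
        if len(json.dumps(op, ensure_ascii=False).encode("utf-8")) > MAX_DOC_BYTES:
            raise ValueError(f"document {op['id']} exceeds 1 MB")
    payload = json.dumps(operations, ensure_ascii=False).encode("utf-8")
    if len(payload) > MAX_BATCH_BYTES:
        raise ValueError("batch exceeds 5 MB; split it into smaller batches")
    return payload


if __name__ == "__main__":
    batch = build_batch([
        add_op("tt0484562", 1, "en", {"title": "The Seeker: The Dark Is Rising"}),
        delete_op("tt0484575", 2),
    ])
    print(batch.decode("utf-8"))
```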

Uploading SDF batches that contain invalid JSON or XML will produce unpredictable results. Processing stops when an error is encountered, but the preceding add and delete operations are applied to the domain. You can verify the validity of your JSON or XML data using tools such as xmllint and jsonlint.

Both JSON and XML batches can only contain UTF-8 characters that are valid in XML. Valid characters are the control characters tab (0009), carriage return (000D), and line feed (000A), and the legal characters of Unicode and ISO/IEC 10646. FFFE, FFFF, and the surrogate blocks D800-DBFF and DC00-DFFF are invalid and will cause errors. (For more information, see Extensible Markup Language (XML) 1.0 (Fifth Edition).) You can use the following regular expression to match invalid characters so you can remove them: /[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/ .

When formatting SDF in JSON, quotes (") and backslashes (\) within field values must be escaped with a backslash. For example:
"title":"Where the Wild Things Are"
"isbn":"0-06-025492-0"
"image":"images\\covers\\Where_The_Wild_Things_Are_(book)_cover.jpg"
"comment":"Sendak's \"Where the Wild Things Are\" is a children's classic."
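As a minimal sketch of both rules, the following Python code removes characters matched by the regular expression given above and relies on the standard json module to escape quotes and backslashes. The function name clean_text is hypothetical. Note that, as written, the expression also removes characters above U+FFFF.

```python
import json
import re

# Same character class as the regular expression quoted in the text.
INVALID_XML_CHARS = re.compile(r"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")


def clean_text(value: str) -> str:
    """Remove characters that the pattern above treats as invalid."""
    return INVALID_XML_CHARS.sub("", value)


if __name__ == "__main__":
    fields = {
        "image": "images\\covers\\Where_The_Wild_Things_Are_(book)_cover.jpg",
        # \x0b (vertical tab) is not a valid XML character and is removed.
        "comment": clean_text('Sendak\'s "Where the Wild Things Are"\x0b is a classic.'),
    }
    # json.dumps escapes the quotes and backslashes in the field values.
    print(json.dumps(fields, indent=2))
```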

When formatting SDF in XML, ampersands (&) and less-than symbols (<) within field values need to be represented with the corresponding entity references (&amp; and &lt;). For example:
<field name="title">Little Cow &amp; the Turtle</field>
<field name="isbn">0-84466-4774</field>
<field name="image">images\covers\Little_Cow_&amp;_the_Turtle_(book)_cover.jpg</field>
<field name="comment">&lt;insert comment></field>

If you have large blocks of user-generated content, you might want to wrap the entire field in a CDATA section, rather than replacing every occurrence with the entity reference. For example:

<field name="comment"><![CDATA[Monsters & mayhem--what's not to like! ]]></field>
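If you generate XML SDF in code, you can let a library apply the entity references rather than substituting them by hand. The following Python sketch uses escape from the standard xml.sax.saxutils module, which replaces &, <, and > in the value. The helper name xml_field is hypothetical, and the sketch assumes the field name is already a valid name that needs no escaping.

```python
from xml.sax.saxutils import escape


def xml_field(name: str, value: str) -> str:
    """Render one SDF <field> element with the value entity-escaped."""
    return f'<field name="{name}">{escape(value)}</field>'


if __name__ == "__main__":
    print(xml_field("title", "Little Cow & the Turtle"))
    print(xml_field("comment", "<insert comment>"))
```

Unlike the hand-written example above, escape also replaces > with &gt;, which is equally valid XML.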

The command line tools and Amazon CloudSearch console include an experimental mechanism for automatically generating SDF from a variety of source documents.

Document Versions in Amazon CloudSearch


In some cases, you might have multiple processes submitting batches of documents to your domain. Because it's possible that these batches could contain changes to the same document, and that the batches might be received out of order, Amazon CloudSearch always applies the add or delete operation that has the highest version number. If the version number specified for a document is lower than in a previously-received operation, the change is ignored. If the version number specified for a document is the same as in a previously-received operation, the results are undefined. There's no way to predict which operation will take precedence.

This means that to update or delete an existing document, you must specify a new, larger version number. You do not specify the version that you want to update or delete.

To successfully update and delete documents, you have to keep track of the version numbers. One common approach is to use timestamps for versioning. Keep in mind that the version is stored as a 32-bit unsigned integer. If you're versioning with timestamps, you have to strip off the milliseconds or the timestamp will overflow. If you want to be able to query the index for your documents' version numbers, create a version field and populate it with the current version each time you update a document.

When deleting documents, note that submitting a delete operation with the maximum version number, max(uint32_t), permanently removes the document from your domain. Because it is not possible to specify a higher version number, there is no way to add a later version of the document.
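The version rules above can be modeled with a small in-memory sketch. The following Python class is a toy illustration only, not how the document service is implemented; the class and method names are hypothetical. Because the guide says the outcome for equal version numbers is undefined, the sketch arbitrarily ignores them.

```python
UINT32_MAX = 2**32 - 1  # versions are stored as 32-bit unsigned integers


class VersionedStore:
    """Toy model of the rule that the operation with the highest version wins."""

    def __init__(self):
        self._docs = {}  # docid -> (version, fields), fields is None once deleted

    def apply(self, op):
        """Apply an add or delete operation; return True if it took effect."""
        docid, version = op["id"], op["version"]
        if not 0 <= version <= UINT32_MAX:
            raise ValueError("version must fit in a 32-bit unsigned integer")
        current = self._docs.get(docid)
        if current is not None and version <= current[0]:
            # Lower versions are ignored. Equal versions are undefined in the
            # real service; this sketch chooses to ignore them as well.
            return False
        fields = op["fields"] if op["type"] == "add" else None
        self._docs[docid] = (version, fields)
        return True

    def get(self, docid):
        """Return the document's fields, or None if it is absent or deleted."""
        entry = self._docs.get(docid)
        return None if entry is None else entry[1]


if __name__ == "__main__":
    store = VersionedStore()
    store.apply({"type": "add", "id": "tt0484562", "version": 2, "fields": {"title": "A"}})
    # An out-of-order delete with a lower version has no effect.
    print(store.apply({"type": "delete", "id": "tt0484562", "version": 1}))
    print(store.get("tt0484562"))
```

In this model, a delete submitted with version UINT32_MAX can never be superseded, which mirrors the permanent removal described above.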

Adding and Updating Documents in Amazon CloudSearch


An add operation specifies either a new document that you want to add to the index or an existing document that you want to update. When you add or update a document, you specify the document's ID, a new version number, the document language, and all of the fields the document contains. If a document field matches the name of an index field, it is automatically used as the source for that index field. You can also explicitly map one or more document fields to an index field. You don't have to specify every configured field for every document; documents can contain a subset of the configured fields. Document fields that are not used as a source for at least one index field are ignored during indexing.

Note
You must specify a new, larger version number every time you add or update a document. For more information, see Document Versions in Amazon CloudSearch (p. 49).

To add a document to a search domain


1. Specify an add operation in your SDF data that contains the id of the document you want to add and each of the fields that you want to be able to search or return in results. If you are updating an existing document, the version number must be greater than the document's current version number for the update to be applied. For example, the following operation would add version 1 of document tt0484562:

[
  {
    "type": "add",
    "id": "tt0484562",
    "version": 1,
    "lang": "en",
    "fields": {
      "title": "The Seeker: The Dark Is Rising",
      "director": "Cunningham, David L.",
      "genre": ["Adventure","Drama","Fantasy","Thriller"],
      "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances",
                "Crewson, Wendy","Ludwig, Alexander","Cosmo, James",
                "Warner, Amelia","Hickey, John Benjamin","Piddock, Jim",
                "Lockhart, Emma"]
    }
  }
]

2. Send the SDF data to your domain. You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint. For more information, see Uploading Data to an Amazon CloudSearch Domain (p. 72).

Deleting Documents in Amazon CloudSearch


A delete operation specifies a document that you want to remove from a domain's index. Once a document is deleted, it will no longer be searchable or returned in results. When deleting a document, you must specify a new, larger version number. You do not specify the version that you want to delete. For more information, see Document Versions in Amazon CloudSearch (p. 49).

Note
When posting SDF updates to delete documents, you have to specify each document that you want to delete. If you want to start over with an empty domain that has the same configuration, you can use the console to clone the domain. For more information, see Cloning an Existing Domain's Indexing Options (p. 59).

To delete a document from a search domain


1. Specify a delete operation in your SDF data that contains the id of the document you want to remove and an updated document version number. The version number must be greater than the document's current version number for the document to be deleted. For example, the following operation would remove document tt0484575, provided that its current version number is lower than 2:
[
  {
    "type": "delete",
    "id": "tt0484575",
    "version": 2
  }
]

2. Send the SDF data to your domain. You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint. For more information, see Uploading Data to an Amazon CloudSearch Domain (p. 72).

Generating SDF from Your Source Data (Experimental) in Amazon CloudSearch


The command line tools and Amazon CloudSearch console include an experimental mechanism for automatically generating SDF batches from several common file types: PDF, Microsoft Excel, Microsoft PowerPoint, Microsoft Word, CSV, text, HTML, JSON, and XML. When you generate SDF from a collection of files, your data is analyzed and an add document operation is created for each file. The file contents (and metadata if available) are parsed into one or more index fields. If you are generating SDF from CSV files, by default each row in the CSV file is treated as a separate document, and each column is treated as a field. If there is a column header called docid, the values in that column are used as the document IDs; otherwise, unique IDs are generated based on the filename and row number.

Note
Currently, only CSV files are parsed to automatically extract custom field data and generate multiple documents. When processing XML and JSON files, each file is treated as a single document and the contents of the file are used to populate a single text field.
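To make the default CSV behavior concrete, the following Python sketch turns each row of a CSV file into an add operation and each column into a field, using the docid column for document IDs when it is present. It is an illustration only and does not reproduce what cs-generate-sdf actually emits. The function name csv_to_sdf and the generated ID format (sanitized filename plus row number) are assumptions, since the guide does not specify the exact format.

```python
import csv
import io
import re


def csv_to_sdf(csv_text, filename, version, lang="en"):
    """Turn each CSV row into one SDF add operation (illustrative only)."""
    # Hypothetical ID scheme: lower-cased filename with unsupported
    # characters replaced by underscores, followed by the row number.
    stem = re.sub(r"[^a-z0-9_]", "_", filename.lower()).lstrip("_") or "doc"
    operations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        docid = row.get("docid") or f"{stem}_{row_number}"
        # Every other non-empty column becomes a document field.
        fields = {
            name: value
            for name, value in row.items()
            if name not in (None, "docid") and value
        }
        operations.append(
            {"type": "add", "id": docid, "version": version, "lang": lang, "fields": fields}
        )
    return operations


if __name__ == "__main__":
    data = "docid,title,year\ntt0484562,The Seeker: The Dark Is Rising,2007\n"
    for operation in csv_to_sdf(data, "data1.csv", version=1):
        print(operation)
```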

Command Line Tools


You can use the cs-generate-sdf command to automatically generate SDF from local files or data stored in Amazon S3. To process multiple files, you can specify multiple --source options, or use wildcards in the path or Amazon S3 URI that specifies the location of your source data.

To generate SDF
Run the cs-generate-sdf command. You must specify the --source option and either the --output option or the --domain option. If you are updating documents, you can use the --modified-after option to restrict processing to files or Amazon S3 objects modified after a particular time. You can also specify other options to control how the source data is parsed. For more information, see cs-generate-sdf (p. 122).
cs-generate-sdf --source c:\myAmazingDataSet\* --modified-after 2012-03-28T00:00 --output c:\myAmazingDataSet\SDF

To generate SDF from CSV data


Run the cs-generate-sdf command. By default, when processing CSV files, each row will be parsed as a separate document. (If you specify the --single-doc-per-csv option when processing CSV files, each file will be treated as a single document.)
cs-generate-sdf --source c:\myAmazingDataSet\data1.csv --output c:\myAmazingDataSet\SDF

Note
If you are processing multiple files, CSV files are processed as one document per row, and non-CSV files are processed as one document per file.

AWS Management Console


When you upload source documents through the Amazon CloudSearch console, they are automatically converted to SDF. You can use the console to upload up to 5 MB of data at a time. If you choose, you can download the generated SDF file. For more information about uploading data through the console, see Uploading Data to an Amazon CloudSearch Domain (p. 72).

Configuring Index Fields for an Amazon CloudSearch Domain


Topics
• Adding Sources for an Amazon CloudSearch Index Field (p. 54)
• Command Line Tools (p. 55)
• AWS Management Console (p. 56)
• API (p. 62)

Each document that you add to your search domain has a collection of fields that contain the data that can be searched or returned. The value of a field can be either text or a number. Every document must have a unique document ID, a version number, and at least one field.

In your domain configuration, you define all of the document fields you want to include in your index. Any fields that occur in a document that are not part of your domain configuration are ignored and will not be searchable or returnable. Documents can contain a subset of the fields configured for the domain; a document does not have to contain every field.

Note
By default, if no search field is specified in a search request, Amazon CloudSearch searches all text fields configured for the domain. You can change this behavior by specifying a default search field for the domain using the UpdateDefaultSearchField (p. 144) configuration action.

Amazon CloudSearch supports three types of index fields:
• text: a text field contains arbitrary alphanumeric data. A text field is always searchable. The value of a text field can either be returned in search results or the field can be used as a facet, but not both. By default, text fields are neither result-enabled nor facet-enabled.
• literal: a literal field contains an identifier or other data that you want to be able to match exactly. The value of a literal field can be returned in search results or the field can be used as a facet, but not both. By default, literal fields are not search-enabled, result-enabled, or facet-enabled.
• uint: a uint field contains an unsigned integer value. Uint fields are always searchable, the value of a uint field can always be returned in results, and faceting is always enabled. Uint fields can also be used in rank expressions.

Note
If your document data contains a text or literal field whose value you want to be able to return in results and also use as a facet, you can use the document field as a source for two different index fields, make one returnable, and enable faceting for the other.

When configuring index fields, you can specify:
• Whether literal fields can be searched
• Whether facets can be calculated for text or literal fields to enable filtering
• Whether the contents of a text or literal field can be returned in the search results
• A default value for the field
• Up to 20 data sources for the field

Note
Making text and literal fields result-enabled increases the size of your index, which can increase the cost of running your domain. When possible, it's best to retrieve large amounts of data from an external source, rather than embedding it in your index. Since it can take some time to apply document updates across the domain, critical data such as pricing information should be retrieved from an external source using the returned document IDs instead of returned from the index.

Field names must begin with a letter and be at least 3 and no more than 64 characters long. The allowed characters are: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as field names.

Adding Sources for an Amazon CloudSearch Index Field (p. 54) describes how document fields are used to populate your index fields. You can configure fields using the cs-configure-from-sdf or cs-configure-fields command (p. 55), through the Amazon CloudSearch console (p. 56), or using the DefineIndexField (p. 62) configuration action.
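The field naming rules described above can be checked before you submit a configuration request. The following Python sketch is one way to express them; the function name is_valid_field_name is hypothetical.

```python
import re

RESERVED_NAMES = {"body", "docid", "text_relevance"}
# A lower-case letter followed by 2 to 63 characters from a-z, 0-9, and _,
# which gives a total length of 3 to 64 characters.
FIELD_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{2,63}")


def is_valid_field_name(name: str) -> bool:
    """Return True if name satisfies the index field naming rules."""
    return bool(FIELD_NAME_PATTERN.fullmatch(name)) and name not in RESERVED_NAMES


if __name__ == "__main__":
    for candidate in ("year", "yr", "_year", "body", "release_date"):
        print(candidate, is_valid_field_name(candidate))
```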

Adding Sources for an Amazon CloudSearch Index Field


A source is a field in the SDF document data that you want to use to populate the index field. A field can have up to 20 sources. If you don't specify a source, the source field with the same name as the index field is used as the source.

You can specify sources to:
• Combine the contents of multiple fields into a single field.
• Strip common title words from a field so you can use it for sorting.
• Map values from the source to a different set of values in your new index field.

When defining a source for an index field, you specify the name of the source field, a default value to use if the source field doesn't exist in the document data, and how you want to use the source field:
• Copy: takes the data from the source field and puts it into the index field without any modifications. Copy is often used when you want to add multiple sources for an index field so you can easily search across all of the source fields. For example, you might copy the data from two source fields called actor and director into a single searchable field called people.
• Trim Title: takes the contents of the source field and removes common title prefixes such as "the", and then populates the index field with the trimmed data. For example, if the source field contains the title "The Catcher in the Rye", the trimmed version stored in the index field would be "Catcher in the Rye". Trim Title is often used to populate an index field you can use for sorting.
• Map: takes a key found in the source field and maps it to a value you want to store in the index field. For example, you might map the keys red and yellow to the value warm, and blue and green to the value cool.
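The three ways of using a source field can be sketched in a few lines of Python. This is only an illustration of the ideas, not the service's implementation. The function names are hypothetical, and the prefix list is an assumption: the guide names only "the" as an example of a common title prefix.

```python
# Assumed prefix list; the guide only mentions "the" as an example.
TITLE_PREFIXES = ("the ", "a ", "an ")


def copy_source(value):
    """Copy: use the source value without any modification."""
    return value


def trim_title(value):
    """Trim Title: drop a leading common title prefix, if there is one."""
    lowered = value.lower()
    for prefix in TITLE_PREFIXES:
        if lowered.startswith(prefix):
            return value[len(prefix):]
    return value


def map_value(value, mapping, default=None):
    """Map: translate a key from the source field into the stored value."""
    return mapping.get(value, default)


if __name__ == "__main__":
    document = {"actor": ["McShane, Ian"], "director": ["Cunningham, David L."]}
    # Two Copy sources feeding one index field called people.
    people = copy_source(document["actor"]) + copy_source(document["director"])
    print(people)
    print(trim_title("The Catcher in the Rye"))
    colors = {"red": "warm", "yellow": "warm", "blue": "cool", "green": "cool"}
    print(map_value("blue", colors))
```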

Command Line Tools


The command line tools provide two ways to configure your domain's index fields. You can:
• Use the cs-configure-from-sdf (p. 116) command to analyze one or more SDF batches and automatically configure index fields for your domain based on the content.
• Use the cs-configure-fields (p. 109) command to create and configure individual index fields.

Configuring Index Fields from SDF


To automatically configure your domain's index fields
1. Run the cs-configure-from-sdf command to analyze one or more SDF batches and configure fields for your domain. (For information about how to create SDF batches, see Preparing Your Data for Amazon CloudSearch (p. 46).) For example, to configure fields for the movies domain based on the SDF batch defined in moviedata.json:
cs-configure-from-sdf --domain-name movies --source moviedata.json

Detected source format for moviedata.json:json
Analyzing moviedata.json
-----------------------------------------------------------------------------
Existing field configuration for the domain - imdb-movies :
-----------------------------------------------------------------------------
Detected field configurations from all the sources :
genre literal (Search Facet)
title text (Result)
actor text (Result)
director text (Result)
-----------------------------------------------------------------------------
New proposed field configuration for the domain - imdb-movies :
genre literal (Search Facet) [NEW]
title text (Result) [NEW]
actor text (Result) [NEW]
director text (Result) [NEW]
-----------------------------------------------------------------------------

2. When prompted, enter y to confirm that you want to configure your domain with the specified fields. (You can easily modify the configuration later through the console or using the cs-configure-fields command.)
Configure [imdb-movies] with analyzed fields y/N: y

Note
When you add or reconfigure index fields, you must rebuild your index for the changes to be visible in search results. For more information, see Indexing Document Data with Amazon CloudSearch (p. 77).

Configuring Individual Fields


To add an index field to your domain
Run the cs-configure-fields (p. 109) command and specify the name of the new field with the --name option, and the field type with the --type option. For example, to add a uint year field to the movies domain:
cs-configure-fields --domain-name movies --name year --type uint
Updated 1 Index Field:
year RequiresIndexDocuments uint ()

By default, the source for the index field is the source field with the same name. You can specify up to 20 --source options to configure sources for the index field. The values from all of the specified sources are concatenated and copied to the index field.

Note
When you add fields or reconfigure existing fields, you need to explicitly issue a request to re-index your data when you are done making configuration changes. For more information, see Indexing Document Data with Amazon CloudSearch (p. 77).

AWS Management Console


You can easily configure individual index fields (p. 56) for your domain through the Indexing Options page in the Amazon CloudSearch console. You can also copy all of the index fields from an existing domain (p. 59) when you create a new domain.

Configuring Individual Fields


To configure a new index field
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain that you want to configure, and then click the domain's Indexing Options link.
3. To create a new index field, click Add Index Field to add a field specification to the list. (If you haven't created any fields yet, a blank field specification is shown on the Indexing Options page by default.)
4. Specify a unique name for the field and select the field type: text, literal, or uint. Field names must begin with a letter and be at least 3 and no more than 64 characters long. The allowed characters are: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be used as custom field names.
5. To make a literal field searchable, enable the Search checkbox.
6. To use a text or literal field as a facet to enable filtering, enable the Facet checkbox. (Note that the field can be facet-enabled or result-enabled, but not both.)
7. To allow a text or literal field value to be returned in search results, enable the Result checkbox. (Note that the field can be facet-enabled or result-enabled, but not both.)
8. Specify a default value for the field (optional). This value is used when no value is specified for the field in the document data.
9. To add a source for the field:
   a. Click the add link in the Source column.
   b. In the Add Source dialog box, enter the name of the source field you want to use as a source for the specified index field.
   c. Enter a default value to use for the index field if the specified source field doesn't exist in the document data (optional).
   d. Select a Transform Type to specify how the index field should be populated: Copy, Trim Title, or Map. For more information about using source fields, see Adding Sources for an Amazon CloudSearch Index Field (p. 54).
   e. If you select the Map transform type, enter one or more key-value pairs to specify how you want to map the source data to the index field.
   f. Click Add to save your changes.
10. To configure additional fields, click Add Index Field and repeat these configuration steps.
11. When you are done configuring fields, click Submit to save your changes. To restore the previous field configurations, click Revert.

Cloning an Existing Domain's Indexing Options


During development and testing, you might want to create an empty domain that has the same index fields as an existing domain. You can do this through the Amazon CloudSearch console.

To clone a search domain:


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. At the top of the Navigation panel, click the Create a New Domain button.
3. On the NAME YOUR DOMAIN step, enter a name for the new domain and click Continue.
4. On the CONFIGURE INDEX step, select Copy the configuration from another search domain, choose the domain you want to copy, and click Continue.

Note
This copies only the domain's indexing options. Access policies and text options are not copied from the specified domain.

5. On the REVIEW INDEX CONFIGURATION step, make any adjustments you want and click Continue. For more information about configuring index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
6. On the SET UP ACCESS POLICIES step, select access policies for the new domain and click Continue. Access policies are not automatically copied from the cloned domain. For more information about configuring access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).
7. On the CONFIRM step, review the domain configuration and click Confirm to create your domain.
8. Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.

API
You use the DefineIndexField (p. 129) configuration action to add field definitions to your domain configuration. If the specified field already exists, DefineIndexField replaces it. The type-specific options enable you to define a default value for a field, and enable or disable specific features for text and literal fields:
• FacetEnabled: controls whether facets can be calculated for this field. Calculating facets determines how many documents contain matching values for the field. Facet counts are not automatically returned for facet-enabled fields; they must be explicitly requested at search time. (Uint fields are always facet-enabled.)
• ResultEnabled: controls whether the contents of a text or literal field can be returned. (Uint fields are always returnable.)
• SearchEnabled: controls whether a literal field is searchable. (Text and uint fields are always searchable.)

For example, to create a uint index field called year and populate it with the data from the yearreleased field in the SDF data:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DefineIndexField
&DomainName=movies
&IndexField.IndexFieldName=year
&IndexField.IndexFieldType=uint
&IndexField.SourceAttributes.member.1.SourceDataCopy.SourceName=yearreleased
&IndexField.SourceAttributes.member.1.SourceDataFunction=Copy
&IndexField.UIntOptions.DefaultValue=0
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120401/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-01T17:00:07.803Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=b291a01dd69a49e04f4a84862b38e0758c53cf93b76dd452cc802886b20724bc

Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
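As a rough sketch, the unsigned portion of a request like the one above can be assembled with Python's standard urllib.parse.urlencode. The parameter names and endpoint are taken from the example; the function name define_uint_field_query is hypothetical. The sketch deliberately omits the X-Amz-* authentication parameters, so the resulting URL is not a valid request until it has been signed.

```python
from urllib.parse import urlencode

ENDPOINT = "https://cloudsearch.us-east-1.amazonaws.com"  # endpoint from the example above


def define_uint_field_query(domain, field_name, source_name, default_value=None):
    """Build the unsigned query string for a uint DefineIndexField request."""
    params = [
        ("Action", "DefineIndexField"),
        ("DomainName", domain),
        ("IndexField.IndexFieldName", field_name),
        ("IndexField.IndexFieldType", "uint"),
        ("IndexField.SourceAttributes.member.1.SourceDataCopy.SourceName", source_name),
        ("IndexField.SourceAttributes.member.1.SourceDataFunction", "Copy"),
    ]
    if default_value is not None:
        params.append(("IndexField.UIntOptions.DefaultValue", default_value))
    params.append(("Version", "2011-02-01"))
    # Signing parameters (X-Amz-Algorithm, X-Amz-Credential, and so on)
    # are intentionally left out of this sketch.
    return urlencode(params)


if __name__ == "__main__":
    query = define_uint_field_query("movies", "year", "yearreleased", default_value=0)
    print(f"{ENDPOINT}/?{query}")
```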


Configuring Text Options for an Amazon CloudSearch Domain


Topics
• Configuring Stemming in Amazon CloudSearch (p. 63)
• Configuring Stopwords in Amazon CloudSearch (p. 66)
• Configuring Synonyms in Amazon CloudSearch (p. 69)

Amazon CloudSearch enables you to specify the following text options to control how your domain's data is indexed:
• Stems: map related words to a common root word or stem.
• Stopwords: words that should be ignored during indexing and searching.
• Synonyms: words that have the same meaning as words that occur in your data and should produce the same search results.

Although the defaults work well in many cases, fine-tuning these dictionaries is one way to optimize the search results based on your knowledge of the data you are searching. When you modify a domain's text options, you must explicitly rebuild the index (p. 77) for the changes to be reflected in search results.

During text processing, search terms are converted to lower-case, so stopwords, stems, and synonyms are not case-sensitive and should be specified in lower-case.

Configuring Stemming in Amazon CloudSearch


A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran. During indexing, Amazon CloudSearch uses the stemming dictionary when it performs text-processing on text fields. At search time, the stemming dictionary is used to perform text-processing on the search request. This enables matching on variants of a word. For example, if you map the term running to the stem run and then search for running, the request matches documents that contain run as well as running.

Stems are specified as a collection of term and stem pairs. When you configure stemming options, the existing stemming dictionary is replaced with the mappings you specify.

By default, Amazon CloudSearch does not define any stems. However, some basic algorithmic stemming is always performed, such as removing plural suffixes. (This is done whether or not you specify a custom stemming dictionary.) The maximum size of a stemming dictionary is 500 KB.

You can configure stems using the cs-configure-text-options (p. 64) command, from the Amazon CloudSearch console (p. 64), or using the UpdateStemmingOptions (p. 65) configuration action.

Command Line Tools


You can use the cs-configure-text-options (p. 113) command to upload a text file that contains a list of term and stem pairs.

To configure stemming options


1. Create a text file for your stemming dictionary and specify one comma-separated term, stem pair per line. For example:
   mice, mouse
   people, person
   running, run
2. Run the cs-configure-text-options command with the --stems option to upload the stemming dictionary to your domain:
   cs-configure-text-options -d mydomain --stems stems.txt
   Updating Stemming options
   Read the stems file
   Sent 3 token stem pairs.
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
   cs-index-documents -d mydomain

AWS Management Console


You can configure a domain's stemming options from the Text Options panel in the Amazon CloudSearch console.

To configure stemming options


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
3. In the Text Options panel, click the Stemming tab.
4. For each term, stem pair you want to add to the stemming dictionary, enter the term and its stem and click the Add button. You can also edit the list directly or copy and paste the list into a text editor to make changes.
5. Click Submit to save your changes.
6. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.

API
Use the UpdateStemmingOptions (p. 148) configuration action to upload a JSON-formatted stemming dictionary to your domain. A stemming dictionary has a single JSON object with one property, stems. The value of the stems property is an object that contains a collection of string: value pairs that map terms to their stems:
{"stems": {"term1": "stem1", "term2": "stem2", "term3": "stem3"}}

For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStemmingOptions
&DomainName=movies
&Stems={"stems": {"mice": "mouse", "people": "person", "running": "run"}}
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:43:50.884Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=4f7a17dc53fbd7e08b3d3a0c4d771466fe48d2739c8d6333ebe0261d88941488
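As a rough illustration of the dictionary format described above, the following Python sketch converts term, stem lines (the same layout used for the command line tools) into the JSON document passed in the Stems parameter. The function name build_stems_document is hypothetical, and reading the 500 KB limit as 500 * 1024 bytes is an assumption made for this example; neither is part of the Amazon CloudSearch tools.

```python
import json

MAX_STEMS_BYTES = 500 * 1024  # assumption: "500 KB" read as 500 * 1024 bytes


def build_stems_document(lines):
    """Turn 'term, stem' lines into a {"stems": {...}} JSON string."""
    stems = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Entries are lowercased because text processing lowercases terms.
        term, stem = (part.strip().lower() for part in line.split(",", 1))
        stems[term] = stem
    document = json.dumps({"stems": stems})
    if len(document.encode("utf-8")) > MAX_STEMS_BYTES:
        raise ValueError("stemming dictionary exceeds the 500 KB limit")
    return document


print(build_stems_document(["mice, mouse", "people, person", "running, run"]))
```

The resulting string would still need to be URL-encoded and sent in a signed request before the service could accept it.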

Configuring Stopwords in Amazon CloudSearch


Stopwords are words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches.

During indexing, Amazon CloudSearch uses the stopword dictionary when it performs text-processing on text fields. In most cases, stopwords are not included in the index. (If multiple stopwords appear consecutively in a text field, they are not filtered out. This enables searches for phrases such as "to be or not to be" to return valid results.) At search time, the stopword dictionary is used to perform text-processing on the search request.

By default, Amazon CloudSearch defines the following stopwords for English (en):
a an and are as at be but by for in is it of on or the to was

You can configure stopwords using the cs-configure-text-options (p. 66) command, from the Amazon CloudSearch console (p. 67), or using the UpdateStopwordOptions (p. 68) configuration action.
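The rule about consecutive stopwords can be pictured with a toy model. The Python sketch below is one possible reading of that rule, not the service's implementation: it splits on whitespace (an assumption), drops a stopword that stands alone, and keeps any run of two or more stopwords. The function name filter_stopwords is hypothetical.

```python
DEFAULT_STOPWORDS = frozenset(
    "a an and are as at be but by for in is it of on or the to was".split()
)


def filter_stopwords(text, stopwords=DEFAULT_STOPWORDS):
    """Drop isolated stopwords but keep runs of two or more consecutive ones."""
    tokens = text.lower().split()
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in stopwords:
            kept.append(tokens[i])
            i += 1
            continue
        # Find the end of this run of consecutive stopwords.
        j = i
        while j < len(tokens) and tokens[j] in stopwords:
            j += 1
        if j - i > 1:
            kept.extend(tokens[i:j])  # a run of stopwords is preserved
        i = j
    return kept


print(filter_stopwords("A star is born"))
print(filter_stopwords("To be or not to be"))
```

Under this reading, "a" and "is" are dropped from the first title, while every word of the second phrase survives because its stopwords occur in runs.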

Command Line Tools


You can use the cs-configure-text-options (p. 113) command to upload a stopword dictionary.

To configure stopwords
1. Create a text file that contains your stopword dictionary. In the file, specify one stopword per line. For example:
   the
   or
   and
2. Run the cs-configure-text-options command with the --stopwords option to upload the stopword dictionary to your domain.
   cs-configure-text-options -d mydomain --stopwords stopwords.txt
   Updating stop words options
   Sent 3 stop words.
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
   cs-index-documents -d mydomain

AWS Management Console


You can configure a domain's stopword dictionary from the Text Options panel in the Amazon CloudSearch console.

To configure stopwords
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
3. On the Stopwords tab, for each word you want to add to the stopword dictionary, enter it in the Add a Stopword field and click Add. You can also edit the list directly or copy and paste the list into a text editor to make changes.
4. Click Submit to save your changes.
5. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.

API
Use the UpdateStopwordOptions (p. 150) configuration action to upload a JSON-formatted stopword dictionary to your domain. A stopword dictionary has a single JSON object with one property, stopwords. The value of the stopwords property is an array of strings:
{"stopwords": ["string1", "string2", "string3"]}

For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStopwordOptions
&DomainName=movies
&Stopwords={"stopwords": ["a", "an", "the", "of"]}
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:47:23.216Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=47bc42cfb11561dffade2779bac6af7f6b53d76d2a729f9e62b2e528d5eac319


Configuring Synonyms in Amazon CloudSearch


You can configure synonyms for terms that appear in the data you are searching. That way, if a user searches for the synonym rather than the indexed term, the results will include documents that contain the indexed term. For example, you might want to configure synonyms so that a search for "Rocky Four" or "Rocky 4" will match the movie titled "Rocky IV". To do that, you would configure 4 and four as synonyms of the indexed term IV.

If you want two terms to match the same documents, you must define them as synonyms of each other. For example:
cat, feline
feline, cat

The synonym dictionary is used during indexing to configure mappings for terms that occur in text fields. No synonym processing is done on the search request. By default, Amazon CloudSearch does not define any synonyms.

You can configure synonyms using the cs-configure-text-options (p. 69) command, from the Amazon CloudSearch console (p. 69), or using the UpdateSynonymOptions (p. 70) configuration action.

Command Line Tools


You can use the cs-configure-text-options (p. 113) command to upload a text file that contains your synonym dictionary.

To configure synonyms
1. Create a text file that contains your synonym dictionary. Each line in the file should specify a term followed by a comma-separated list of its synonyms. For example:
   cat, feline, kitten
   dog, canine, puppy
   horse, equine, colt
2. Run the cs-configure-text-options command with the --synonyms option to upload the synonym dictionary to your domain.
   cs-configure-text-options -d mydomain --synonyms synonyms.txt
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
   cs-index-documents -d mydomain

AWS Management Console


You can configure a domain's synonyms from the Text Options panel in the Amazon CloudSearch console.

To configure synonyms
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
3. In the Text Options panel, click the Synonyms tab.
4. For each term and synonym list that you want to add to your synonyms dictionary, enter the term in the Add a Term field and the comma-separated list of synonyms in the Synonyms field. You can also edit the list directly or copy and paste the list into a text editor to make changes.
5. Click Submit to save your changes.
6. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.

API
Call UpdateSynonymOptions (p. 152) to upload a JSON-formatted synonym dictionary to your domain. A synonym dictionary has a single JSON object with one property, synonyms. The value of the synonyms property is a collection of string: value pairs that map each term to one or more synonyms. To map a term to multiple synonyms, specify the synonyms as an array of strings:
{"synonyms": {
  "term1": ["synonym1", "synonym2"],
  "term2": ["synonym1"],
  "term3": ["synonym1", "synonym2", "synonym3"]
} }

For example:

https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateSynonymOptions
&DomainName=movies
&Synonyms={"synonyms": { "cat": ["feline", "kitten"], "dog": ["canine","puppy"]}}
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:56:03.214Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=cf9a49e63e25b0131da11c9dccb4c648ff243c01ea4282d14d88e2c6ec414523
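Because a synonym mapping only works in the direction in which it is defined, it can be convenient to generate the reverse mappings automatically. The following Python sketch builds the synonyms JSON document from lines in the "term, synonym, synonym" layout and, optionally, maps each synonym back to its term. The function name build_synonyms_document and its symmetric flag are hypothetical helpers for this example, not part of Amazon CloudSearch.

```python
import json


def build_synonyms_document(groups, symmetric=False):
    """Build a {"synonyms": {...}} JSON string from 'term, syn1, syn2' lines."""
    synonyms = {}
    for line in groups:
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        if len(words) < 2:
            continue  # a term with no synonyms contributes nothing
        term, others = words[0], words[1:]
        synonyms.setdefault(term, [])
        for other in others:
            if other not in synonyms[term]:
                synonyms[term].append(other)
            if symmetric:
                # Also map each synonym back to the term.
                reverse = synonyms.setdefault(other, [])
                if term not in reverse:
                    reverse.append(term)
    return json.dumps({"synonyms": synonyms})


print(build_synonyms_document(["cat, feline"], symmetric=True))
```

With symmetric=True, the single line "cat, feline" yields entries for both cat and feline, which corresponds to the two-line cat/feline dictionary shown earlier in this section.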


Uploading Data to an Amazon CloudSearch Domain


To make your data searchable, you must describe it according to the Search Data Format (SDF) and upload the resulting SDF batches to a search domain. Amazon CloudSearch can then generate a search index from your SDF data according to the index fields and text options that you have configured for the domain.

As your data changes, you submit SDF updates to add, change, or delete documents from your index. Amazon CloudSearch applies data updates continuously, so your changes become searchable in near real-time.

Amazon CloudSearch ensures that the most recent changes are applied to your domain using the document version numbers specified in the SDF add and delete operations. The operation with the greatest version number always takes precedence. To be applied, the version number in the add or delete operation must be greater than the document's current version number in the index. If the version number in an add or delete operation is less than the document's current version number, the operation is ignored. If an operation specifies the same document version that already exists in the index, the result is undefined: there's no guarantee which one will take precedence.
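The version rule can be sketched in a few lines of Python. This is a toy in-memory model, not how the service stores documents: the index dictionary and the function apply_operation are hypothetical, and the choice to ignore an operation whose version equals the current one is an assumption, since the text above leaves that case undefined.

```python
def apply_operation(index, op):
    """Apply an add or delete to `index` only if its version is newer.

    `index` maps document id -> (version, fields), with fields set to None
    for a deleted document. Returns True if the operation was applied.
    """
    current = index.get(op["id"])
    if current is not None and op["version"] <= current[0]:
        # Older versions are ignored. Equal versions are undefined in the
        # service; this sketch simply chooses to ignore them.
        return False
    fields = op["fields"] if op["type"] == "add" else None
    index[op["id"]] = (op["version"], fields)
    return True


index = {}
apply_operation(index, {"type": "add", "id": "tt1", "version": 2,
                        "fields": {"title": "Star Wars"}})
apply_operation(index, {"type": "delete", "id": "tt1", "version": 1})  # ignored
print(index)
```

A practical consequence is that every update to a document should carry a version number strictly greater than the one used before, for example a timestamp or a counter.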

Important
To successfully upload SDF data to your domain, it has to be valid JSON or XML and conform to the SDF data conventions.

For information about creating SDF batches, see Preparing Your Data for Amazon CloudSearch (p. 46). For information about configuring index fields for a domain, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

You can submit SDF data to a domain using the cs-post-sdf (p. 72) command, from the Amazon CloudSearch console (p. 73), or by posting it directly (p. 76) to the domain's Document endpoint.

Command Line Tools


You use the cs-post-sdf (p. 121) command to send SDF data to your search domain. The SDF batches can be local or stored in Amazon S3. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference (p. 103).


To send data to a domain for indexing


1. If you haven't already, prepare your data according to the SDF schema. For more information about generating SDF, see Preparing Your Data for Amazon CloudSearch (p. 46).
2. Run the cs-post-sdf command to upload your SDF data to your domain. You must specify at least one --source option to specify the location of the SDF data you want to upload.
   cs-post-sdf -d mydomain --source data1.sdf
   Processing: data1.sdf
   Detected source format for data1.sdf as json
   Status: success
   Added: 5208
   Deleted: 0

AWS Management Console


In the Amazon CloudSearch console, you can upload data to your domain from the domain dashboard. The console can automatically convert the following types of files to SDF during the upload process:
• Comma Separated Value (.csv)
• Adobe Portable Document Format (.pdf)
• HTML (.htm, .html)
• Microsoft Excel (.xls, .xlsx)
• Microsoft PowerPoint (.ppt, .pptx)
• Microsoft Word (.doc, .docx)
• Text Documents (.txt)
• JSON Documents (.json)
• XML Documents (.xml)

CSV files are parsed row-by-row and a separate document is generated for each row. All other types of files are treated as a single document. For more information about automatically generating SDF, see Preparing Your Data for Amazon CloudSearch (p. 46). You can also upload SDF batches through the Amazon CloudSearch console.

To send data to a domain for indexing


1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain.
3. At the top of the domain dashboard, click Upload Documents.
4. Select the location of the data you want to upload to your domain:
   • File(s) on my local disk
   • Object(s) from Amazon S3
   • Predefined data

   Note
   If you upload data in a format other than SDF, it will automatically be converted to SDF during the upload process.
5. If you are uploading local files, click Browse to choose the file(s) to upload.
6. If you are uploading objects from Amazon S3, select the bucket you want to upload from. To upload the entire contents of the bucket, leave the Prefix field empty and click Add. To upload selected objects, enter a filter in the Prefix field and click Add. (You can add multiple prefixes.)
7. If you are uploading predefined sample data, choose the data set that you want to use.
8. Once you've selected the data you want to upload, click Continue.
9. On the Review Documents step, review the documents to be uploaded and click Upload Documents to continue.
10. On the Document Summary step, if SDF batches have been automatically generated from your data, you can click Download the generated SDF files to get them. Click Finish to return to the domain dashboard.

API
You use the documents/batch (p. 174) document service API to post SDF data to your domain to add, update, or remove documents. For example:
curl -X POST --upload-file data1.sdf doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2011-02-01/documents/batch --header "Content-Type:application/json"
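The same request can be assembled with an HTTP library. The following Python sketch uses only the standard library and builds the request without sending it. The endpoint is the placeholder host from the curl example, the function name build_batch_request is hypothetical, and the http scheme is an assumption based on curl's default when no scheme is given.

```python
import urllib.request

# Placeholder endpoint copied from the curl example; substitute your own.
DOC_ENDPOINT = "doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com"


def build_batch_request(sdf_bytes, endpoint=DOC_ENDPOINT):
    """Build (but do not send) a POST request for documents/batch."""
    url = "http://" + endpoint + "/2011-02-01/documents/batch"
    return urllib.request.Request(
        url,
        data=sdf_bytes,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# b"[]" is only a placeholder body; a real call would pass the SDF file bytes.
request = build_batch_request(b"[]")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

For an XML batch, the Content-Type header would need to describe XML instead of JSON.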


Indexing Document Data with Amazon CloudSearch


When you send document updates to your domain, Amazon CloudSearch automatically updates the domain's search index with the new data in near real time. You don't have to do anything for the updates to be indexed. However, if you change the configuration of your domain's index fields or text options, you must explicitly rebuild your search index for those changes to be visible in search results. Because rebuilding the index can take a significant amount of time if you have a lot of data, you should finish making all of your configuration changes before re-indexing your documents.

When you make changes that require re-indexing, the domain status changes to NEEDS INDEXING. While the index is being rebuilt, the domain's status is PROCESSING. You can continue to submit search requests while indexing is in process, but the configuration changes won't be visible in search results until indexing completes and the domain's status changes to ACTIVE.

Note
Depending on the volume of data, building a full index can take a considerable amount of compute power. Amazon CloudSearch automatically manages the resources needed to build the index in a timely fashion. Most data updates and simple domain configuration changes are built and deployed in minutes. Indexing large volumes of data and applying configuration changes that require rebuilding the full index will take longer to complete.

You can initiate indexing using the cs-index-documents (p. 77) command, from the Amazon CloudSearch console (p. 78), or using the IndexDocuments (p. 79) configuration action.

Command Line Tools


You use the cs-index-documents (p. 120) command to rebuild your domain's search index.

To explicitly index your domain


Run the cs-index-documents command. For example, to rebuild the index for a domain called movies:

cs-index-documents --domain-name movies
===========================================
Indexing documents for domain [movies]
Now indexing fields:
===========================================
actor
director
genre
title
year
===========================================

AWS Management Console


When you make changes that require your domain's index to be rebuilt, the status shown on the domain dashboard changes to NEEDS INDEXING. The console also displays a message at the top of the configuration pages prompting you to run indexing when you are done making changes.

To run indexing
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain that needs indexing.
3. On the Domain Dashboard, click the Run Indexing button.
4. Click OK in the Starting Indexing dialog box to return to the domain dashboard.


API
You use the IndexDocuments (p. 143) configuration action to initiate indexing. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=IndexDocuments
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T22:41:07.764Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=cf2f7663cc7c80901474f889ab9b1b8e65deea5be1e2c527319bc8e16859d7a4

Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
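To show how the Action, DomainName, and Version parameters fit together, the following Python sketch assembles the unsigned part of a configuration request. The helper name build_config_url is hypothetical. The X-Amz-* signing parameters are deliberately left out, so the URL it produces would not be accepted by the service until it has been signed as described in Request Authentication.

```python
from urllib.parse import urlencode

CONFIG_ENDPOINT = "https://cloudsearch.us-east-1.amazonaws.com"


def build_config_url(action, domain_name, **extra):
    """Return the unsigned URL for a configuration action."""
    params = {"Action": action, "DomainName": domain_name}
    params.update(extra)  # e.g. Stems=..., Stopwords=..., Synonyms=...
    params["Version"] = "2011-02-01"
    # urlencode percent-encodes values such as JSON dictionaries.
    return CONFIG_ENDPOINT + "/?" + urlencode(params)


print(build_config_url("IndexDocuments", "movies"))
```

Action-specific parameters can be passed as keyword arguments, for example Stopwords='{"stopwords": ["a", "an"]}' for UpdateStopwordOptions.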


Searching Your Data with Amazon CloudSearch


Topics
• Submitting Search Requests in Amazon CloudSearch (p. 81)
• Searching Text Fields in Amazon CloudSearch (p. 82)
• Searching Literal Fields in Amazon CloudSearch (p. 86)
• Searching Uint Fields in Amazon CloudSearch (p. 87)
• Constructing Boolean Search Queries in Amazon CloudSearch (p. 88)
• Controlling How Search Results are Returned in Amazon CloudSearch (p. 89)
• Getting and Using Facet Information in Amazon CloudSearch (p. 91)

When you submit a search request, each document in your search domain is examined to see if it matches the query constraints specified in the request. The matching documents (search hits) are then sorted according to the ranking preferences specified in the request. If no ranking preferences are specified, the matching documents are sorted using their text_relevance scores. The text_relevance scores are in the range 0-1000. The rank is set to -text_relevance to list the documents in descending order with the highest-scoring documents first.

Query constraints can be specified in a number of ways. In the simplest case, a text string provided by the user is used as a query constraint to find documents that contain the specified text in the default search field. For example, you could specify the query constraint q=star+wars to retrieve a list of the documents that contain the terms "star" and "wars" in any text field:
http://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars

You can specify additional constraints to search specific fields and use Boolean logic to construct more complex queries. When you search text fields, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field. If the field being searched is a literal field, the field contents must exactly match the search string to be returned in results. You can search uint fields for a particular value or a range of values. In your search requests, you can also specify how you want Amazon CloudSearch to rank and return the search hits. Instead of using the default text_relevance scores, you can rank hits alphabetically, numerically, or according to your own custom rank expressions.

When you submit a search request, Amazon CloudSearch returns a response that specifies how the results were ranked, the match expression that was derived from your query constraints, and a collection of hits that represents the documents that match the query constraints. For example:
{
  "rank":"-text_relevance",
  "match-expr":"(label 'star wars')",
  "hits":{
    "found":7,
    "start":0,
    "hit":[
      {"id":"tt1185834"},
      {"id":"tt0076759"},
      {"id":"tt0121765"},
      {"id":"tt0080684"},
      {"id":"tt0086190"},
      {"id":"tt0120915"},
      {"id":"tt0121766"}]
  },
  "info":{
    "rid":"b7c167f6c2da6d93ecb53d18230cbc27146c9356f9c643ec9dec53e707b9af87f27b24b2f4b636a9",
    "time-ms":4,
    "cpu-time-ms":0
  }
}

• found: specifies the total number of documents that matched the query.
• start: specifies the offset of the first hit included in the response.
• id: specifies the unique document ID of an individual hit.

By default, a search response contains the IDs of the first 10 ranked hits. You can retrieve additional information for each hit by specifying which result-enabled fields should be included in the response. You can also control how many hits are returned at a time. When you want to page through a large set of matching documents, you can specify the offset of the first hit that you want to retrieve. For more information, see Controlling How Search Results are Returned in Amazon CloudSearch (p. 89).
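A client typically only needs a few values from this response. The following Python sketch reads found, start, and the hit IDs from a JSON response. The sample text is a trimmed copy of the response above (two hits, no rid), and the function name summarize_hits is hypothetical.

```python
import json

# Trimmed copy of the sample response, kept short for illustration.
SAMPLE_RESPONSE = """
{"rank": "-text_relevance",
 "match-expr": "(label 'star wars')",
 "hits": {"found": 7, "start": 0,
          "hit": [{"id": "tt1185834"}, {"id": "tt0076759"}]},
 "info": {"time-ms": 4, "cpu-time-ms": 0}}
"""


def summarize_hits(response_text):
    """Return (found, start, ids) from a JSON search response."""
    hits = json.loads(response_text)["hits"]
    ids = [hit["id"] for hit in hits["hit"]]
    return hits["found"], hits["start"], ids


found, start, ids = summarize_hits(SAMPLE_RESPONSE)
print(found, start, ids)
```

Comparing found with start plus the number of returned IDs is one simple way to tell whether more pages of results remain.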

Submitting Search Requests in Amazon CloudSearch


You submit search requests to your domain's search endpoint via HTTP GET. To construct a search request, you append the Amazon CloudSearch API version and the name of the resource you are accessing, 2011-02-01/search, and a query string that specifies the terms and constraints for your search and what you want to get back in the response. The maximum size of a search request is 8190 bytes, including the HTTP method, URI, and protocol version. For example, the following request performs a simple text search of the search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com domain and gets the contents of the title field:

http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars&return-fields=title

Note
The API version must be specified in all search requests. When there are updates to the Search API, you access them using a new API version.

The query string in a search request must be URL-encoded. You can use any method you want to send GET requests to your domain's search endpoint: you can enter the request URL directly in a Web browser, use cURL to submit the request, or generate an HTTP call using your favorite HTTP library.

By default, Amazon CloudSearch returns the response in JSON. You can also get the results formatted in XML by specifying the results-type parameter, results-type=xml.
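As a minimal sketch of URL-encoding a query string, the following Python function builds a search URL from a dictionary of parameters and rejects requests that would exceed the 8190-byte limit. The function name build_search_url is hypothetical, and approximating the request size as the "GET <URI> HTTP/1.1" request line is an assumption about how the limit is measured.

```python
from urllib.parse import urlencode

MAX_REQUEST_BYTES = 8190  # limit quoted for search requests


def build_search_url(endpoint, params):
    """Build a URL-encoded request for the 2011-02-01 search resource."""
    path = "/2011-02-01/search?" + urlencode(params)
    # Approximate the request size: HTTP method, URI, and protocol version.
    request_line = "GET " + path + " HTTP/1.1"
    if len(request_line.encode("ascii")) > MAX_REQUEST_BYTES:
        raise ValueError("search request exceeds 8190 bytes")
    return "http://" + endpoint + path


print(build_search_url(
    "search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    {"q": "star wars", "return-fields": "title"},
))
```

Passing {"results-type": "xml"} as an additional parameter would request the XML response format instead of the default JSON.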

Note
You can also use the Search Tester in the Amazon CloudSearch console to search your data, browse the results, and view the generated request URLs and JSON and XML responses. For more information, see Searching with the Search Tester (p. 14).

Searching Text Fields in Amazon CloudSearch


Amazon CloudSearch provides two ways to perform free text searches:
• You can use the q parameter (p. 82) to search the default search field for one or more terms. By default, this searches all text fields configured for the domain.
• You can use the bq parameter (p. 83) to search one or more text fields.

When you search text fields, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field, in any order. For example, in the sample movie data, the title field is configured as a text field. If you search the title field for "star", you will find all of the movies that contain star anywhere in the title field, such as star, star wars, and a star is born. This differs from searching a literal field, where the field value must be identical to the search string to be considered a match.

When searching text fields, you can:
• Use Boolean operators (p. 84) when specifying the terms you are searching for. Amazon CloudSearch supports three operators in text searches: - (NOT), | (OR), and + (AND).
• Use the wildcard operator (p. 85) to perform prefix searches. Amazon CloudSearch only supports the wildcard operator *, which matches zero or more characters at the end of the specified term.
• Use quotes to search for phrases (p. 86).

Searching Text Fields with the Query Parameter in Amazon CloudSearch


The query parameter, q, provides an easy way to search the default search field for one or more terms. When you create a domain, the default search field is configured to include all text fields in the index. You can use the UpdateDefaultSearchField (p. 144) configuration action to configure your own default search field.


By default, documents must contain all of the terms you specify to be considered a match. Unlike literal fields, the terms can occur anywhere within the text field, in any order. You can prefix a term with the - (NOT) operator to exclude all results that include that term. Similarly, you can separate terms with the | (OR) operator if you want to match documents that contain any of the specified terms. For more information, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84).

To search for a phrase rather than individual terms, enclose the phrase in double quotes. For more information, see Searching for Phrases in Text Fields in Amazon CloudSearch (p. 86).

For example, to search the default search field for star wars, specify q=star+wars in the query string:
https://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars

The following example shows the default JSON response:

{
  "rank":"-text_relevance",
  "match-expr":"(label 'star wars')",
  "hits":{
    "found":7,
    "start":0,
    "hit":[
      {"id":"tt1185834"},
      {"id":"tt0076759"},
      {"id":"tt0121765"},
      {"id":"tt0080684"},
      {"id":"tt0086190"},
      {"id":"tt0120915"},
      {"id":"tt0121766"}]
  },
  "info":{
    "rid":"b7c167f6c2da6d93ecb53d18230cbc27146c9356f9c643ec9dec53e707b9af87f27b24b2f4b636a9",
    "time-ms":4,
    "cpu-time-ms":0
  }
}

Searching Text Fields with the Boolean Query Parameter in Amazon CloudSearch
The Boolean query parameter, bq, provides a rich expression language for fine-grained control over document matching. You can search within particular fields and combine expressions with the and, or, and not prefix operators. In addition to searching text fields, you can use the bq parameter to search literal and uint fields. If you don't specify any fields when using the bq parameter, the default search field is used, just like with the q parameter. For example, the following queries produce the same results:
search?bq='star'
search?q=star


When constructing queries with bq, you must enclose the search terms within single quotes. By default, documents must contain all of the terms you specify to be considered a match. When you search text fields, the terms can occur anywhere within the text field, in any order. You can prefix a term with the - (NOT) operator to exclude all results that include that word. Similarly, you can separate terms with the | (OR) operator if you want to match documents that contain any of the specified terms. For more information, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84). To search for a phrase rather than individual terms, enclose the phrase in double quotes. For more information, see Searching for Phrases in Text Fields in Amazon CloudSearch (p. 86). To search a particular text field, prefix the search terms with the name of the field you want to search, followed by a colon. For example:
search?bq=title:'star'

This searches the title field of each document and matches all documents whose titles contain the term star. In addition to searching text fields, the bq parameter can be used to search specific literal (p. 86) and uint (p. 87) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).

Note
You can only search literal fields that are search-enabled in your domain's configuration. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

Using Boolean Operators in Amazon CloudSearch Text Searches


When searching text fields with either the q or bq parameter, you can use the Boolean operators + (AND), | (OR), and - (NOT). These shortcuts only work for text searches. To create Boolean queries that search uint and literal fields, you need to use the Boolean query syntax described in Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).

If you separate search terms with + or a space, Amazon CloudSearch matches documents that contain all of the specified search terms; that is, the terms are ANDed together. You can use the | (OR) operator to separate terms when you want to match documents that contain either the preceding term(s) or the following term(s).

To exclude documents that contain a particular term from the search results, prefix the term with the - (NOT) operator. For example, to search for all of the documents that don't contain the term star in the default search field, you would specify: search?q=-star. The NOT operator only applies to individual terms. Searching for search?q=-star+wars retrieves all documents that do not contain the term star, but do contain the term wars.

Note
To retrieve all of the documents in your domain, you can prefix a term that you know doesn't exist in your domain's data with the NOT operator, for example -1234567. However, keep in mind that this is a resource-intensive operation if you have a large dataset and might be subject to timeouts.

For example, when searching the sample movie data:
• search?q=star|wars matches movies that contain either star or wars in the default search field.
• search?bq=title:'story funny|underdog' matches movies that contain both the terms story and funny or the term underdog in the title field.
• search?bq=title:'red|white|blue' matches movies that contain either red, white, or blue in the title field.
• search?bq=actor:'"evans, chris"|"Garity, Troy"' matches movies that contain either the phrase evans, chris or the phrase Garity, Troy in the actor field.
• search?bq='title:-star+war|world' matches movies whose titles do not contain star, but do contain either war or world.

You can also use the Boolean operators when constructing queries using the full Boolean query syntax. For example, search?bq=(and director:'Lucas|Spielberg' (not actor:'"Ford, Harrison"')) matches movies that either Lucas or Spielberg directed but that did not star Harrison Ford. For more information about the Boolean query syntax, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).
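Expressions like these are easy to get wrong when typed by hand, because of the nested single and double quotes. The following Python sketch generates the field:'a|b|c' form for a list of alternatives and then URL-encodes it. The helper name any_of is hypothetical, treating any value that contains a space as a phrase is an assumption made for this example, and values that themselves contain quote characters are not handled.

```python
from urllib.parse import urlencode


def any_of(field, values):
    """Build field:'v1|v2|...'; values containing spaces are quoted as phrases."""
    parts = ['"%s"' % v if " " in v else v for v in values]
    return "%s:'%s'" % (field, "|".join(parts))


print(any_of("title", ["red", "white", "blue"]))
print(any_of("actor", ["evans, chris", "Garity, Troy"]))
# URL-encode the expression before placing it in the query string.
print("search?" + urlencode({"bq": any_of("title", ["red", "white", "blue"])}))
```

The first two lines printed reproduce the title and actor expressions from the examples above; the third shows the same title expression in its URL-encoded form.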

Using Wildcards in Amazon CloudSearch Text Searches


You can use the * (asterisk) wildcard operator to perform prefix matching. The * operator only applies to individual terms. When you append the * operator to a string, the string is treated as a prefix. Amazon CloudSearch matches results that contain the prefix followed by zero or more characters. If you're searching a text field, the matched prefix can occur anywhere within the contents of the field. You can also use the wildcard operator to perform "starts with" searches in literal fields. For more information, see Using Wildcards in Literal Searches in Amazon CloudSearch (p. 87).

Note
When performing wildcard searches on text fields, keep in mind that Amazon CloudSearch tokenizes the text fields during indexing and performs basic text processing such as removing the trailing s from plural terms. Normally, the same text processing is performed on the search query. However, when you use the wildcard operator, no text processing is performed on the prefix. This means that a search for a prefix that ends in "s" won't match the singular version of the term. This can happen for any term that ends in s, not just plurals. For example, if you search the actor field in the sample movie data for "Gillanders", there are three matching movies. If you search for "Gillander*", you get the same three movies. However, if you search for "Gillanders*" there are no matches. This is because the term is stored in the index as "Gillander"; "Gillanders" does not appear in the index.

For example, the following Boolean query searches the title field for the prefix star:
search?bq=title:'star*'&return-fields=title

If you perform this search against the sample movie data, the response will contain movies such as Stargate, Dark Star, and Starsky & Hutch:
{"rank":"-text_relevance",
 "match-expr":"(label title:'star*')",
 "hits":{"found":34,"start":0,
  "hit":[
   {"id":"tt1408101","data":{"title":["Untitled Star Trek Sequel"]}},
   {"id":"tt0111282","data":{"title":["Stargate"]}},
   {"id":"tt0335438","data":{"title":["Starsky & Hutch"]}},
   {"id":"tt0477095","data":{"title":["Starter for 10"]}},
   {"id":"tt1185834","data":{"title":["Star Wars: The Clone Wars"]}},
   {"id":"tt0069945","data":{"title":["Dark Star"]}},
   {"id":"tt0088172","data":{"title":["Starman"]}},
   {"id":"tt0844760","data":{"title":["Starship Troopers 3: Marauder"]}},
   {"id":"tt0092007","data":{"title":["Star Trek IV: The Voyage Home"]}},
   {"id":"tt0098382","data":{"title":["Star Trek V: The Final Frontier"]}}
  ]
 },
 "info":{
  "rid":"8a0620f6c72ff3e73c2a10e59f186fa89ba1fa67e3b160548fb2c7aa91bce7aebdc0b87198cf138a",
  "time-ms":3,
  "cpu-time-ms":0
 }}
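
The interaction between wildcards and text processing described in the note above can be illustrated with a toy model. This Python sketch is not how Amazon CloudSearch is implemented: the normalize function (lowercasing and dropping one trailing s), the tokenizer, and the sample actor value are all assumptions chosen only to reproduce the "Gillanders" behavior.

```python
def normalize(term: str) -> str:
    """Toy stand-in for text processing: lowercase and drop one trailing 's'."""
    term = term.lower()
    return term[:-1] if term.endswith("s") else term


def build_index(values):
    """Split values on commas and whitespace and store the normalized terms."""
    index = set()
    for value in values:
        for token in value.replace(",", " ").split():
            index.add(normalize(token))
    return index


def matches(index, query: str) -> bool:
    if query.endswith("*"):
        # Wildcard: the prefix is used as typed (only lowercased in this toy).
        prefix = query[:-1].lower()
        return any(term.startswith(prefix) for term in index)
    # Ordinary term: receives the same processing as the indexed text.
    return normalize(query) in index


index = build_index(["Gillanders, Beth"])  # hypothetical actor value
print(matches(index, "Gillanders"))   # True: normalized to "gillander"
print(matches(index, "Gillander*"))   # True: prefix of the stored term
print(matches(index, "Gillanders*"))  # False: "gillanders" is not in the index
```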

Searching for Phrases in Text Fields in Amazon CloudSearch


You can enclose a phrase in double quotes to match the complete phrase rather than the individual terms in the phrase. You can perform phrase searches with either the q or bq parameter. For example, the following queries produce the same results:
search?q="with love"
search?bq='"with love"'

If you perform this search against the sample movie data, you'll notice that the results for the phrase search contain one fewer hit than a simple search for the terms with love:
{"rank":"-text_relevance",
 "match-expr":"(label '\"with love\"')",
 "hits":{
  "found":4,
  "start":0,
  "hit":[
   {"id":"tt0062376"},
   {"id":"tt0309530"},
   {"id":"tt1179034"},
   {"id":"tt0057076"}
  ]
 },
 "info":{"rid":"7508c2e52f5c3c25eca625c994c1351ed8fed385d15bffaf9dd32aae31644e939b8656dcd8c96d09","time-ms":2,"cpu-time-ms":0}
}

Searching Literal Fields in Amazon CloudSearch


When you search a literal field, Amazon CloudSearch returns only those documents that contain an exact match for the complete search string in the specified field. For example, if the title field is configured as a literal field and you search for "star", the value of the title field must be star to be considered a match; star wars and a star is born will not be included in the search results. This differs from text fields, where the specified search terms can appear anywhere within the field in any order.

Literal fields are often used in conjunction with faceting to enable users to drill down into the results according to the faceted attributes. For more information about faceting, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).

Searching Literal Fields with the Boolean Query Parameter in Amazon CloudSearch
To search literal fields, you must use the Boolean Query parameter, bq. To search a literal field, prefix the search string with the name of the literal field you want to search, followed by a colon. The search string must be enclosed in single quotes. For example:
search?bq=genre:'sci-fi'

This searches the genre field of each document and matches all documents whose genre field contains the value sci-fi. To be a match, the field value must be an exact match for the search string. For example, documents that contain the value young adult sci-fi in the genre field will not be included in the search results when you search for "sci-fi". In addition to searching literal fields, the bq parameter can be used to search specific text (p. 82) and uint (p. 87) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).

Note
You can only search literal fields that are search-enabled in your domain's configuration. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

Using Wildcards in Literal Searches in Amazon CloudSearch


When searching literal fields, you can use the wildcard operator to find values that start with a particular string. For example, the genre field in the sample movie data is a literal field. If you search the genre field for "fi*", it will match all of the movies in the film-noir genre, but not the movies in the sci-fi genre. To be a match, the entire string up to the wildcard operator must match exactly.
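
The difference between exact and "starts with" matching on literal fields can be sketched in a few lines of Python. This is only an illustration of the matching rule described above; the literal_matches helper and the sample genre values are hypothetical, and the sketch compares strings exactly as given.

```python
def literal_matches(field_value: str, query: str) -> bool:
    """Exact match, or prefix match when the query ends with the * wildcard."""
    if query.endswith("*"):
        return field_value.startswith(query[:-1])
    return field_value == query


genres = ["film-noir", "sci-fi", "young adult sci-fi"]

# "fi*" only matches values that begin with "fi".
print([g for g in genres if literal_matches(g, "fi*")])     # ['film-noir']
# Without a wildcard, the whole value must match.
print([g for g in genres if literal_matches(g, "sci-fi")])  # ['sci-fi']
```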

Searching Uint Fields in Amazon CloudSearch


You can search uint fields for a particular value or a range of values. Uint fields are always search enabled.

Searching Uint Fields with the Boolean Query Parameter in Amazon CloudSearch
To search uint fields, you must use the Boolean Query parameter, bq. To search a uint field, prefix the value or range of values you want to search with the name of the uint field, followed by a colon. The integer value or range is not enclosed in single quotes. In addition to searching uint fields, you can use the bq parameter to search specific text (p. 82) and literal (p. 86) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).

Searching for an Integer Value in Amazon CloudSearch


The syntax for searching a uint field for a particular value is fieldname:integer. For example, to search the sample movie data for movies released in 2010, you would specify:
search?bq=year:2010

Searching for a Range of Values in Amazon CloudSearch


The syntax for searching a uint field for a range of values is <start>..<end>. The start and ending values of the range are included. For example, to search the sample data set for movies released from 2008 to 2010, you would specify the range as 2008..2010:
search?bq=year:2008..2010

Ranges can be open ended. For example, you could specify year:2002.. to find all matching movies released from 2002 onward, or ..1970 to find all the movies released through 1970:
search?bq=year:2002..
search?bq=year:..1970
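
The value and range forms above can be generated by a small helper. This Python sketch is illustrative only; the uint_range function name and its argument checks are assumptions, not part of any Amazon CloudSearch library.

```python
from typing import Optional


def uint_range(field: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Format a uint match expression such as year:2008..2010 or year:2002.."""
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    for bound in (start, end):
        if bound is not None and bound < 0:
            raise ValueError("uint values cannot be negative")
    if start is not None and end is not None and start > end:
        raise ValueError("start must not exceed end")
    if start is not None and start == end:
        return f"{field}:{start}"          # single value
    lo = "" if start is None else str(start)
    hi = "" if end is None else str(end)
    return f"{field}:{lo}..{hi}"           # closed or open-ended range


print("search?bq=" + uint_range("year", 2008, 2010))  # search?bq=year:2008..2010
print("search?bq=" + uint_range("year", start=2002))  # search?bq=year:2002..
print("search?bq=" + uint_range("year", end=1970))    # search?bq=year:..1970
```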

Constructing Boolean Search Queries in Amazon CloudSearch


The bq parameter enables you to combine matches against fields using the Boolean operators and, or, and not. When constructing Boolean search queries, you use parentheses to control the order of evaluation of the expression. When part of an expression is enclosed in parentheses, that part is evaluated first. The resulting value is used in the evaluation of the remainder of the expression. At a minimum, the entire expression must be enclosed in a single set of parentheses. For example, to search the title field for matches that either contain the string "star" or do not contain the string "wars":
search?bq=(or title:'star' (not title:'wars'))

You can use and, or, and not at the field level, and still use the - and | operators within the match expressions. For example, the following queries produce the same results:
search?bq=(or title:'star' title:'-wars')
search?bq=(or title:'star' (not title:'wars'))

For more information about using Boolean operators in match expressions, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84). You can construct Boolean search queries to combine searches against multiple fields. For example:


search?bq=(and title:'star' genre:'drama')

Note
If you don't get the results you expect from a search request, check the match-expr in the response to see how Amazon CloudSearch parsed the match expression specified in the bq parameter.
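
Nested Boolean expressions are easier to get right when they are assembled programmatically instead of by string concatenation. The following Python sketch shows one possible approach; the term and combine helpers are hypothetical names, and the sketch deliberately rejects values that contain single quotes rather than guessing at an escaping rule.

```python
def term(value: str, field: str = "") -> str:
    """Quote a match expression, optionally scoped to a field (title:'star')."""
    if "'" in value:
        raise ValueError("this sketch does not handle embedded single quotes")
    quoted = f"'{value}'"
    return f"{field}:{quoted}" if field else quoted


def combine(op: str, *exprs: str) -> str:
    """Wrap one or more expressions in a parenthesized and/or/not clause."""
    if op not in ("and", "or", "not"):
        raise ValueError(f"unsupported operator: {op}")
    if not exprs or (op == "not" and len(exprs) != 1):
        raise ValueError("wrong number of expressions for operator")
    return f"({op} {' '.join(exprs)})"


query = combine("or", term("star", "title"), combine("not", term("wars", "title")))
print(query)  # (or title:'star' (not title:'wars'))
print(combine("and", term("star", "title"), term("drama", "genre")))
# (and title:'star' genre:'drama')
```

Comparing the generated string with the match-expr value in the response is a quick way to confirm that the expression was parsed as intended.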

Controlling How Search Results are Returned in Amazon CloudSearch


You can specify query parameters in your search request to:
Get results as XML (p. 89)
Paginate the results (p. 90)
Retrieve field values (p. 90)
Sort the results (p. 91)

Getting Results as XML in Amazon CloudSearch


By default, Amazon CloudSearch search responses are formatted in JSON. To get results as XML, specify the query parameter results-type=xml in your search request:
search?q=star+wars&results-type=xml

Search responses formatted in XML contain exactly the same information as a JSON response:
<results>
  <rank>-text_relevance</rank>
  <match-expr>(label 'star wars')</match-expr>
  <hits found="7" start="0">
    <hit id="tt1185834"/>
    <hit id="tt0076759"/>
    <hit id="tt0121765"/>
    <hit id="tt0080684"/>
    <hit id="tt0086190"/>
    <hit id="tt0120915"/>
    <hit id="tt0121766"/>
  </hits>
  <facets/>
  <info rid="b7c167f6c2da6d93501039ad23f00811361e4acf6ca09ec98ae60af47463dfe4ce2e5565e736aa1f" time-ms="3" cpu-time-ms="0"/>
</results>

For detailed information about the JSON and XML response formats for search requests, see Search Response (p. 190).
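
As a rough illustration of consuming an XML response, the following Python sketch extracts the hit count and document IDs using only the standard library. The parse_hits helper is a hypothetical name and the sample is a shortened copy of the response above. The "{*}" namespace wildcard requires Python 3.8 or later and is used so that the same code works whether or not the results element declares an XML namespace.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<results>
  <rank>-text_relevance</rank>
  <match-expr>(label 'star wars')</match-expr>
  <hits found="7" start="0">
    <hit id="tt1185834"/>
    <hit id="tt0076759"/>
  </hits>
  <facets/>
</results>"""


def parse_hits(xml_text: str):
    """Return (found, start, [document ids]) from an XML search response."""
    root = ET.fromstring(xml_text)
    hits = root.find("{*}hits")  # matches the element in any namespace, or none
    ids = [hit.get("id") for hit in hits.findall("{*}hit")]
    return int(hits.get("found")), int(hits.get("start")), ids


found, start, ids = parse_hits(SAMPLE)
print(found, start, ids)  # 7 0 ['tt1185834', 'tt0076759']
```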


Paginating Results in Amazon CloudSearch


By default, Amazon CloudSearch returns the top ten hits according to the specified ranking. To control the number of hits returned in a result set, you use the size parameter. To request the next set of hits beginning from a particular offset, you use the start parameter. Note that the result set is zero-based: the first result is at index 0. For example, search?q=-star returns the first 10 hits that don't contain star in the default search field, starting at index 0. To get the next set of ten hits, set the start parameter to 10:
search?q=-star&start=10

If you want to retrieve 25 hits at a time, set the size parameter to 25. To get the first set of hits, you don't have to set the start parameter:
search?q=-star&size=25

For subsequent requests, use the start parameter to retrieve the set of hits you want. For example, to get the third batch of 25 hits specify:
search?q=-star&size=25&start=50
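
The start value for any batch is simply the zero-based batch number multiplied by the batch size. A minimal Python sketch of that arithmetic follows; the page_query helper is a hypothetical name.

```python
from urllib.parse import urlencode


def page_query(query: str, page: int, size: int = 10) -> str:
    """Build the query string for a zero-based page of results."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    params = {"q": query, "size": size, "start": page * size}
    return "search?" + urlencode(params)


# Third batch of 25 hits: pages are numbered 0, 1, 2, so start = 2 * 25 = 50.
print(page_query("-star", page=2, size=25))  # search?q=-star&size=25&start=50
```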

Retrieving Data from Index Fields in Amazon CloudSearch


By default, searches only return the IDs of the documents that match the search constraints. To include additional information, you can use the return-fields parameter to specify which index fields to include in the results. Integer fields (uint) can always be returned in results. However, only text and literal fields that are result enabled in the domain configuration can be returned. You can also specify the default text_relevance score as a return field. You can retrieve up to 2 KB of source data from an index field. All of the source data is indexed, but only the first 2 KB of data can be returned.

Note
Making fields result-enabled increases the size of your index, which can increase the cost of running your domain. You should only store document data in the search index by making fields result-enabled when it's difficult or costly to retrieve the data using other means. Since it can take some time to apply document updates across the domain, critical data such as pricing information should be retrieved using the returned document IDs instead of returned from the index.

To retrieve source data for result-enabled fields, you specify the return-fields parameter in the query string. You can specify a single return field, or up to 10 fields as a comma-separated list. For example, to include the actor, title, and default text_relevance score in the search results:
search?q=star+wars&return-fields=actor,title,text_relevance

The specified fields will be included for each hit:
{
  "id":"tt1185834",
  "data":{
    "actor":["Abercrombie, Ian","Baker, Dee Bradley","Burton, Corey",
             "Eckstein, Ashley","Futterman, Nika","Kane, Tom",
             "Lanter, Matt","Taber, Catherine","Taylor, James Arnold",
             "Wood, Matthew"],
    "text_relevance":["308"],
    "title":["Star Wars: The Clone Wars"]
  }
}
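
Note that every returned field value in the hit above is an array of strings, including text_relevance. The following Python sketch shows one way to read such a hit; the first_value helper is a hypothetical name and the hit is a shortened copy of the example above.

```python
import json

HIT_JSON = """{
  "id": "tt1185834",
  "data": {
    "actor": ["Abercrombie, Ian", "Baker, Dee Bradley"],
    "text_relevance": ["308"],
    "title": ["Star Wars: The Clone Wars"]
  }
}"""


def first_value(hit: dict, field: str, default=None):
    """Return the first value of a returned field, or default if it is absent."""
    values = hit.get("data", {}).get(field, [])
    return values[0] if values else default


hit = json.loads(HIT_JSON)
title = first_value(hit, "title")
score = int(first_value(hit, "text_relevance", "0"))  # values arrive as strings
print(hit["id"], title, score)  # tt1185834 Star Wars: The Clone Wars 308
```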

Sorting Results in Amazon CloudSearch


By default, results are sorted according to their text_relevance scores, with the highest-scoring documents listed first. You can use the rank parameter in your search requests to sort results alphabetically, numerically, or using your own custom rank expressions. You can use any result-enabled text or literal field to sort results alphabetically. For example, rank=title is specified in the following query to sort the results alphabetically by title:
search?q=star+wars&return-fields=title&rank=title

By default, results are listed in an ascending order. To sort in descending order, prefix the field name with - (minus sign):
search?q=star+wars&rank=-title

You can use any uint field to sort results numerically. For example, specifying rank=-year will sort the results by year with the most recent year listed first:
search?q=star+wars&return-fields=title,year&rank=-year

Note
If you don't specify the rank option, it is set to -text_relevance by default so the highest-scoring documents are listed first.

You can also define custom rank expressions and use them to sort results. For more information about creating and using your own rank expressions, see Customizing Result Ranking with Amazon CloudSearch (p. 98).

Getting and Using Facet Information in Amazon CloudSearch


Topics
Getting Facet Information for Text and Literal Fields in Amazon CloudSearch (p. 92)
Getting Facet Information for Uint Fields in Amazon CloudSearch (p. 92)
Getting Facet Information for Particular Values in Amazon CloudSearch (p. 93)
Sorting Facet Information in Amazon CloudSearch (p. 94)
Using Facet Information in Amazon CloudSearch (p. 95)

A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a particular field. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)

You can get facet information for any uint field and facet-enabled text and literal fields by specifying the facet parameter in your search request. Amazon CloudSearch also provides search parameters that enable you to control how facet values are returned and sorted. You can select which facets to retrieve, limit the number of facet values returned, and control the sorting of the facet values for each field.

Getting Facet Information for Text and Literal Fields in Amazon CloudSearch
When you request facet information for a text or literal field, Amazon CloudSearch returns facet counts for the top 40 values in the specified field. You can include the facet-FIELD-top-n parameter to limit the number of facet values that are returned for a particular field.

Note
To get facet information for a text or literal field, the field must be configured to enable faceting. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

For example, the following request gets facet counts for the five most frequently occurring values in the genre field:
search?bq=title:'star'&facet=genre&facet-genre-top-n=5

The response returns the facet information after the list of hits:
"facets":{
  "genre":{"constraints":[
    {"value":"Sci-Fi","count":20},
    {"value":"Action","count":18},
    {"value":"Adventure","count":16},
    {"value":"Thriller","count":10},
    {"value":"Fantasy","count":5}
  ]}
}

Getting Facet Information for Uint Fields in Amazon CloudSearch


When you request facet information for a uint field, Amazon CloudSearch returns the min and max values for the field. For example, when you specify facet=year, you get the first and last year that appears in the year field:
"facets":{"year":{"min":1974,"max":2012}}


To drill down into particular bins of integers, you use the facet-FIELD-constraints parameter. For more information, see Getting Facet Information for Particular Values in Amazon CloudSearch (p. 93).

Getting Facet Information for Particular Values in Amazon CloudSearch


The facet-FIELD-constraints parameter controls which facet values are returned for the specified facet. You specify the facet values you want to count as a comma-separated list. The values must be enclosed within single quotes. Note that the facet values are case sensitive: facet-genre-constraints='drama' is not the same as facet-genre-constraints='Drama'.

Note
If commas occur in a facet value you want to use as a constraint, the comma must be escaped with a backslash. For example, facet-actor-constraints='Bai\, Ling','Bryant\, Gene'.

For example, to find out how many documents have Drama or Sci-Fi in the genre field, you'd set facet-genre-constraints='Drama','Sci-Fi':
search?q=star&facet=genre&facet-genre-constraints='Drama','Sci-Fi'

In the response, the counts are only shown for the specified constraints:
"facets":{
  "genre":{"constraints":[
    {"value":"Sci-Fi","count":20},
    {"value":"Drama","count":4}
  ]}
}
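
The quoting and comma-escaping rules for text and literal constraints can be applied mechanically. The following Python sketch shows one way to do so; the facet_constraints helper is a hypothetical name, and the value it returns still needs to be URL-encoded before it is sent.

```python
def facet_constraints(values) -> str:
    """Quote each facet value and escape embedded commas with a backslash."""
    return ",".join("'" + value.replace(",", "\\,") + "'" for value in values)


print(facet_constraints(["Drama", "Sci-Fi"]))
# 'Drama','Sci-Fi'
print(facet_constraints(["Bai, Ling", "Bryant, Gene"]))
# 'Bai\, Ling','Bryant\, Gene'
```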

The facet-FIELD-constraints parameter can also be used with uint fields. You can specify individual values, as well as ranges of values, which enables you to do range-based binning. You can use the min and max values returned when you don't specify any constraints to calculate the ranges, and then get facet counts for each of those ranges with a subsequent search. The values and ranges are specified as a comma-separated list. For example, the following request gets facet counts for documents with a year value of 2000, 2001, 2002 through 2004, and all documents with year greater than or equal to 2005:
search?q=star&facet=year&facet-year-constraints=2000,2001,2002..2004,2005..

By default, the response shows the constraints with the highest counts first:
"facets":{
  "year":{"min":1970,"max":2012,
    "constraints":[
      {"value":"2005..","count":8},
      {"value":"2002..2004","count":2},
      {"value":"2001","count":1}
    ]
  }
}
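
To do range-based binning on a uint facet, you could take the min and max values from an unconstrained request and compute evenly sized bins from them. The following Python sketch shows one way to build the constraint list; the year_bins helper and the fixed bin width are assumptions made for illustration.

```python
def year_bins(lo: int, hi: int, width: int) -> str:
    """Split the inclusive range lo..hi into bins of at most `width` values."""
    if width < 1 or lo > hi:
        raise ValueError("width must be >= 1 and lo must not exceed hi")
    bins = []
    start = lo
    while start <= hi:
        end = min(start + width - 1, hi)   # range ends are inclusive
        bins.append(str(start) if start == end else f"{start}..{end}")
        start = end + 1
    return ",".join(bins)


constraints = year_bins(1970, 2012, 10)
print("search?q=star&facet=year&facet-year-constraints=" + constraints)
# search?q=star&facet=year&facet-year-constraints=1970..1979,1980..1989,1990..1999,2000..2009,2010..2012
```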

Sorting Facet Information in Amazon CloudSearch


You can use the facet-FIELD-sort parameter to control how the facet information is sorted in the search results. Amazon CloudSearch supports four sorting options:
alpha: sort the facet values alphabetically. The facet values are always sorted in ascending order when using the alpha option.
count: sort the facet values by their counts. The facet values are always sorted in descending order when using the count option.
max: sort the facet values according to the maximum values in the specified field. This option is specified as max(FIELD). By default, the facet values are sorted in ascending order. To sort in descending order, prefix the max option with a - (minus sign): -max(FIELD).
sum: sort the facet values according to the sum of the values in the specified field. This option is specified as sum(FIELD). By default, the facet values are sorted in ascending order. To sort in descending order, prefix the sum option with a - (minus sign): -sum(FIELD).

By default, facet information is sorted by facet counts. The - (minus) prefix cannot be used to reverse the sort order when using the alpha or count options.

To sort values for a facet field alphabetically


Specify facet-FIELD-sort=alpha:
search?bq=title:'star'&facet=genre&facet-genre-sort=alpha

To sort values for a facet field using the value of a uint field or rank expression
Specify facet-FIELD-sort=max(FIELD). When you use the max option, the score used for sorting is the maximum value in the specified field across all matching documents with that facet value. By default, the values are sorted in ascending order. You can prefix the max option with a - (minus sign) to reverse the order. For example, you could use the default text_relevance score to sort the facet values. In the following request, the facet value that has the matching document with the highest text_relevance score is listed first:
search?bq=title:'star'&facet=genre&facet-genre-sort=-max(text_relevance)

The maximum text_relevance score for each facet value is displayed in the facet information:
"facets": {
  "genre": {"constraints":[
    {"value":"Action","count":18,"score":288},
    {"value":"Adventure","count":16,"score":288},
    {"value":"Sci-Fi","count":20,"score":288},
    {"value":"Animation","count":1,"score":282},
    {"value":"Comedy","count":4,"score":282},
    {"value":"Thriller","count":10,"score":282},
    {"value":"Biography","count":1,"score":276},
    {"value":"Drama","count":3,"score":276},
    {"value":"Romance","count":1,"score":276},
    {"value":"Mystery","count":3,"score":274},
    {"value":"Music","count":1,"score":272},
    {"value":"Fantasy","count":5,"score":271},
    {"value":"Family","count":3,"score":270}
  ]}
}

To sum the values in a field and use the resulting score to sort the facet values
Specify facet-FIELD-sort=sum(FIELD). When you use the sum option, the score used for sorting is the sum of the values in the specified field for all matching documents with that facet value. By default, the values are listed in ascending order. For example:
search?bq='state'&facet=chief&facet-chief-sort=sum(majvotes)

The sum is displayed in the facet information as the score for the facet value:
"facets": {
  "chief": {
    "constraints": [
      {"value": "Roberts", "count": 116, "score": 869},
      ...
      {"value": "Warren", "count": 712, "score": 4932}
    ]
  }
}

Note
You can prefix the sum option with a - (minus sign) to list the values in descending order.
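
To make the max and sum options concrete, the following Python sketch computes the count and score for each facet value over a handful of made-up documents and then sorts by score. It is a toy model of the behavior described above, not the service's implementation; the documents, field values, and helper names are all hypothetical.

```python
from collections import defaultdict


def facet_scores(docs, facet_field: str, score_field: str, mode: str) -> dict:
    """Group matching docs by facet value and aggregate score_field per group."""
    if mode not in ("max", "sum"):
        raise ValueError("mode must be 'max' or 'sum'")
    groups = defaultdict(list)
    for doc in docs:
        for value in doc[facet_field]:
            groups[value].append(doc[score_field])
    aggregate = max if mode == "max" else sum
    return {value: {"count": len(scores), "score": aggregate(scores)}
            for value, scores in groups.items()}


def sort_facets(facets: dict, descending: bool = False) -> list:
    """Order facet values by score; descending mirrors the - (minus) prefix."""
    return sorted(facets.items(), key=lambda item: item[1]["score"], reverse=descending)


docs = [  # hypothetical matching documents
    {"genre": ["Action", "Sci-Fi"], "text_relevance": 288},
    {"genre": ["Comedy"], "text_relevance": 282},
    {"genre": ["Action"], "text_relevance": 270},
]
by_max = facet_scores(docs, "genre", "text_relevance", "max")
by_sum = facet_scores(docs, "genre", "text_relevance", "sum")
print(by_max["Action"])  # {'count': 2, 'score': 288}
print(by_sum["Action"])  # {'count': 2, 'score': 558}
print([value for value, _ in sort_facets(by_sum, descending=True)])
# ['Action', 'Sci-Fi', 'Comedy']
```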

Using Facet Information in Amazon CloudSearch


You can display facet information to enable users to more easily browse search results and zero in on the information they are interested in. For example, if a user is trying to find one of the Star Trek movies, but can't remember the full title, he might start by searching for "star". If you want to display top facets for actor and genre, you would specify those facets in the query, along with the number of facet values you want to retrieve for each facet:
search?q=star&facet=actor,genre&facet-actor-top-n=10&facet-genre-top-n=5&size=5&results-type=xml

This gives you the following information in the search response:
<results xmlns="http://cloudsearch.amazonaws.com/2011-02-01/results">
  <rank>-text_relevance</rank>
  <match-expr>(label 'star')</match-expr>
  <hits found="26" start="0">
    <hit id="tt1408101"/>
    <hit id="tt0069945"/>
    <hit id="tt1185834"/>
    <hit id="tt0092007"/>
    <hit id="tt0098382"/>
  </hits>
  <facets>
    <facet name="actor"/>
    <facet name="genre">
      <constraint value="Sci-Fi" count="20"/>
      <constraint value="Action" count="18"/>
      <constraint value="Adventure" count="17"/>
      <constraint value="Thriller" count="10"/>
      <constraint value="Fantasy" count="5"/>
    </facet>
  </facets>
  <info rid="3c5a461d28b76874a756e4d419a38646955da47864afeeef172add882f712bb0b7c9e486627e07e2" time-ms="3" cpu-time-ms="0"/>
</results>

Using the document IDs, you can retrieve the data you want to display for each hit from a separate system. By displaying the facet information, you can provide a way for the user to zero in on the movie he's looking for. For example, he might click "William Shatner" in the list of actors to see the subset of movies that William Shatner appeared in. To retrieve the subset, you can use the bq search parameter to perform a fielded search against the actor field and find the matches that contain star in any text field and William Shatner in the actor field.

Note
In this example, both the actor and genre fields have been configured as facets. If you want to try out these queries with the sample imdb-movie data, you'll need to modify your movie domain's indexing options to configure the actor field as a facet. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).

search?bq=(and 'star' actor:'William Shatner')&facet=actor,genre&facet-actor-top-n=10&facet-genre-top-n=5&size=5&results-type=xml

This retrieves the subset of hits along with the actor and genre facet information:
<results>
  <rank>-text_relevance</rank>
  <match-expr>(and 'star' actor:'William Shatner')</match-expr>
  <hits found="6" start="0">
    <hit id="tt0092007"/>
    <hit id="tt0098382"/>
    <hit id="tt0088170"/>
    <hit id="tt0079945"/>
    <hit id="tt0084726"/>
  </hits>
  <facets>
    <facet name="actor">
      <constraint value="Doohan, James" count="6"/>
      <constraint value="Kelley, DeForest" count="6"/>
      <constraint value="Koenig, Walter" count="6"/>
      <constraint value="Nichols, Nichelle" count="6"/>
      <constraint value="Nimoy, Leonard" count="6"/>
      <constraint value="Shatner, William" count="6"/>
      <constraint value="Takei, George" count="6"/>
      <constraint value="Butrick, Merritt" count="2"/>
      <constraint value="Lenard, Mark" count="2"/>
      <constraint value="Adamson, Joseph" count="1"/>
    </facet>
    <facet name="genre">
      <constraint value="Sci-Fi" count="6"/>
      <constraint value="Action" count="5"/>
      <constraint value="Adventure" count="5"/>
      <constraint value="Thriller" count="4"/>
      <constraint value="Mystery" count="2"/>
    </facet>
  </facets>
  <info rid="ccd66a5219f938d2d27598352059d8c34094e7b0695b7c51dc91631555cb382dc17ef8064dbc9fdd" time-ms="3" cpu-time-ms="0"/>
</results>

At this point, the user might remember that the movie he's trying to find also had Joseph Adamson in it and click on Joseph Adamson in the actor list. Again, you would use his selection to further refine the query:
search?bq=(and 'star' actor:'William Shatner' actor:'Adamson, Joseph')&return-fields=title&facet=actor,genre&facet-actor-top-n=10&facet-genre-top-n=5&size=5&results-type=xml

Now, there's just a single match that you can display to the user, Star Trek IV: The Voyage Home:
<results>
  <rank>-text_relevance</rank>
  <match-expr>(and 'star' actor:'William Shatner' actor:'Adamson, Joseph')</match-expr>
  <hits found="1" start="0">
    <hit id="tt0092007">
      <d name="title">Star Trek IV: The Voyage Home</d>
    </hit>
  </hits>
  <facets> ... </facets>
</results>

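
The drill-down flow in the preceding Star Trek example amounts to adding one fielded clause to the bq expression for each facet value the user selects. The following Python sketch shows one way to assemble that expression; the refine helper is a hypothetical name, and the sketch assumes that the selected values contain no single quotes.

```python
from urllib.parse import quote


def refine(base_term: str, selections) -> str:
    """Combine the original search term with the facet values selected so far."""
    parts = [f"'{base_term}'"]
    parts += [f"{field}:'{value}'" for field, value in selections]
    if len(parts) == 1:
        return parts[0]                  # nothing selected yet
    return "(and " + " ".join(parts) + ")"


selected = [("actor", "William Shatner"), ("actor", "Adamson, Joseph")]
bq = refine("star", selected)
print(bq)  # (and 'star' actor:'William Shatner' actor:'Adamson, Joseph')

# Percent-encode the expression before placing it in the request URL.
print("search?bq=" + quote(bq, safe="") + "&return-fields=title&size=5")
```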

Customizing Result Ranking with Amazon CloudSearch


Topics
Configuring Rank Expressions in Amazon CloudSearch (p. 98)
Ranking Search Results in Amazon CloudSearch (p. 102)
Constraining Search Results in Amazon CloudSearch (p. 102)

By default, search results are ranked according to their relevance to the search request. A document's default text_relevance score takes into account the proximity of the search terms and the frequency of those terms within the document compared to how common the term is across all documents in the domain.

To change how search results are ranked, you can:
Use any text or literal field to sort results alphabetically (p. 91).
Use any uint field to sort results numerically (p. 91).
Use a custom rank expression to rank results.

To define a rank expression, you construct a numeric expression using uint fields, other rank expressions, a document's default text_relevance score, and standard numeric operators and functions. To use a rank expression to customize result ranking, you use the rank option in your search requests to specify the rank expression you want to use to order the results. You can also use rank expressions in your search requests to set thresholds for search results through the t-FIELD option.

Configuring Rank Expressions in Amazon CloudSearch


A rank expression is a numeric expression that you can construct using uint fields, other rank expressions, a document's default text_relevance score, and standard numeric operators and functions. The rank expression syntax is based on JavaScript expressions and supports:
Integer, floating point, hex and octal literals
Arithmetic operators: + - * / %
Bitwise operators: | & ^ ~ << >> >>>
Boolean operators (including the ternary operator): && || ! ?:
Comparison operators: < <= == >= >
Common mathematical functions: abs ceil erf exp floor lgamma ln log2 log10 max min sqrt pow
Trigonometric library functions: acosh acos asinh asin atanh atan cosh cos sinh sin tanh tan
Miscellaneous functions: rand, time, min, max

JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators: the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression a || b, b is only evaluated if a is not true.

Rank expressions always return an integer value from 0 to the maximum unsigned 32-bit integer value. If the expression is invalid or evaluates to a negative value, it returns 0. If the expression evaluates to a value greater than the maximum, it returns the maximum value. Intermediate results are calculated as double-precision floating point values and the return value is rounded to the nearest integer.

Rank expression names must begin with a letter and be at least 3 and no more than 64 characters long. The following characters are allowed: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as rank expression names.

For example, if you define a uint field named popularity for your domain, you could use that field in conjunction with the default text_relevance score to construct a custom rank expression. The following expression bases 30% of a document's rank score on the popularity field, and 70% of its rank score on its default text_relevance score. (The default text_relevance score is in the range 0-1000, the popularity field is assumed to have values in the range 0-10000, and the expression returns a value in the range 0-1000.)
((0.3*popularity)/10.0)+(0.7*text_relevance)

For more information about using rank expressions to sort search results, see Sorting Results in Amazon CloudSearch (p. 91). In addition to specifying how you want to rank results, the Amazon CloudSearch Search API also enables you to specify threshold constraints. A threshold constraint can be based on the value of a uint field, or on the value of a rank expression. For example, if your documents have an available_on field that specifies a date as an epoch uint value, you could define a rank expression to exclude documents whose available_on value is later than the current time:
(time() > available_on)?1:0
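As a rough illustration of what this expression computes for each document, the Python sketch below returns 1 when the current time is later than available_on and 0 otherwise. The function name is_available and the optional now parameter (used to make the result reproducible) are assumptions made for this example.

```python
import time
from typing import Optional


def is_available(available_on: int, now: Optional[int] = None) -> int:
    """Mirror of (time() > available_on)?1:0 using epoch seconds."""
    current = int(time.time()) if now is None else now
    return 1 if current > available_on else 0
```

A threshold constraint that requires this value to be 1 would then keep only documents whose available_on time has already passed.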

For more information about using rank expressions to constrain search results, see Constraining Search Results in Amazon CloudSearch (p. 102).

You can configure rank expressions using the cs-configure-ranking (p. 99) command, from the Amazon CloudSearch console (p. 100), or using the DefineRankExpression (p. 101) configuration action.

Command Line Tools


You use the cs-configure-ranking (p. 111) command to define rank expressions for a domain.


To configure a rank expression


Run the cs-configure-ranking command to define a new rank expression. You specify a name for the expression with the --name option, and the numeric expression that you want to evaluate with the --expression option.
cs-configure-ranking --name popularhits --expression '((0.3*popularity)/10.0)+(0.7*text_relevance)'

AWS Management Console


To configure a rank expression
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Rank Expressions link.
3. In the Rank Expressions panel, click the Add a New Rank Expression button. The button is below the list of expressions configured for the domain.
4. Enter a name for the new expression in the Name field.
5. Enter the numerical expression you want to evaluate at search time in the Expression field. You can use the insert... menu to insert special values and mathematic and trigonometric functions.
6. Click Add a New Expression to configure additional rank expressions.
7. Click Submit to save your changes.

API
You use the DefineRankExpression (p. 131) configuration action to specify rank expressions. The name you specify in the RankExpression.RankName option is how you reference the expression in your search requests. You specify the numeric expression that you want to evaluate for each search result in the RankExpression.RankExpression option. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DefineRankExpression
&DomainName=movies
&RankExpression.RankExpression=((0.3*year)/10.0)+((0.7*text_relevance))
&RankExpression.RankName=popularhits
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120403/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-03T00:16:02.684Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=30205ede7907cf8a3fc41172fc63e323136a083b0967f96196bdea53f60d3cf3
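The request above is shown unencoded for readability. The following Python sketch shows one way the unsigned part of such a query string could be assembled with proper URL encoding. The helper name define_rank_expression_url is hypothetical, and the signing parameters (the X-Amz-* values) are deliberately left out, so the result is not a complete, sendable request.

```python
from urllib.parse import urlencode

# Default configuration service endpoint, as given in this guide.
CONFIG_ENDPOINT = "https://cloudsearch.us-east-1.amazonaws.com"


def define_rank_expression_url(domain: str, rank_name: str, expression: str) -> str:
    """Build the unsigned DefineRankExpression query (signature parameters omitted)."""
    params = {
        "Action": "DefineRankExpression",
        "DomainName": domain,
        "RankExpression.RankExpression": expression,
        "RankExpression.RankName": rank_name,
        "Version": "2011-02-01",
    }
    # urlencode percent-encodes characters such as ( ) * / and + in the expression.
    return CONFIG_ENDPOINT + "/?" + urlencode(params)
```

Encoding matters here because a literal + in a query string is normally decoded as a space, which would change the expression.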


Ranking Search Results in Amazon CloudSearch


To use a rank expression to control how search results are ordered, you use the rank option in your search requests to specify the name of the rank expression. You must negate the rank expression name by prepending - (minus) if you want to sort the results in descending order. For example:
search?q=star+wars&return-fields=title&rank=-popularhits

Constraining Search Results in Amazon CloudSearch


To use a rank expression as a threshold constraint for the search results, you use the threshold option in your search requests. The threshold option is specified as t-RANKNAME. For example:
search?q=star+wars&return-fields=title&t-is_available=1
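The rank and threshold options described in the two sections above can be combined into a small query builder. The Python sketch below is illustrative only: build_search_query and its parameters are hypothetical names, and it produces just the path and query portion of a request, not a full URL for a search endpoint.

```python
from urllib.parse import urlencode


def build_search_query(q, return_fields, rank=None, descending=False, thresholds=None):
    """Assemble a search query with optional rank and t-RANKNAME options."""
    params = [("q", q), ("return-fields", ",".join(return_fields))]
    if rank:
        # Prepending "-" to the rank expression name sorts in descending order.
        params.append(("rank", ("-" if descending else "") + rank))
    for name, value in (thresholds or {}).items():
        params.append(("t-" + name, str(value)))
    return "search?" + urlencode(params)
```

For example, build_search_query("star wars", ["title"], rank="popularhits", descending=True) reproduces the ranking request shown earlier.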


Amazon CloudSearch Command Line Tool Reference


Topics
Using the Command Line Tools for Amazon CloudSearch (p. 103)
cs-configure-access-policies (p. 106)
cs-configure-fields (p. 109)
cs-configure-ranking (p. 111)
cs-configure-text-options (p. 113)
cs-create-domain (p. 115)
cs-configure-from-sdf (p. 116)
cs-delete-domain (p. 117)
cs-describe-domain (p. 118)
cs-index-documents (p. 120)
cs-post-sdf (p. 121)
Experimental Tools for Amazon CloudSearch (p. 122)

This section provides detailed information about the Amazon CloudSearch command line tools. You can also access the reference information for each tool from the command line by specifying the --help option. For example, cs-generate-sdf --help.

The Amazon CloudSearch command line tools wrap the configuration and document service APIs to provide a simple way to set up and manage your search domains. Both the command line tools and sample IMDB movie data set are available from the Amazon CloudSearch developer tools page.

Using the Command Line Tools for Amazon CloudSearch


This section describes:
Prerequisites (p. 104) for installing the Amazon CloudSearch command line tools.
How to download and set up the Amazon CloudSearch command line tools. (p. 104)
How to run the Amazon CloudSearch command line tools. (p. 106)

Prerequisites for Installing the Amazon CloudSearch Command Line Tools


To use the Amazon CloudSearch command line tools, you need:
A basic familiarity with working in a Linux/UNIX or Windows environment.
A Java 6-compatible Java Runtime Environment (JRE).
A JAVA_HOME environment variable that points to your Java runtime. This environment variable should be set to the full path of the directory that contains the bin directory that contains the java (Linux/UNIX) or java.exe (Windows) executable.
Your AWS Credentials. To get your AWS credentials, sign in to the AWS Management Console, select Security Credentials from the My Account/Console menu, and click Access Credentials.

Installing the Command Line Tools for Amazon CloudSearch


To install the Amazon CloudSearch command line tools
1. To download the command line tools for Windows, go to https://aws.amazon.com/developertools/4320728073503020 and click the Download button.
2. To download the command line tools for Mac OS/Linux, go to https://aws.amazon.com/developertools/9054800585729911 and click the Download button.
3. Unpack the .zip or .tar.gz file. On Windows, we recommend unzipping the tools in the C:\CloudSearch directory.
4. Set the CS_HOME environment variable to point to the directory where you unpacked the tools. On Linux and UNIX, enter the following command:
export CS_HOME=install_directory_path

On Windows, enter the following command:


set CS_HOME=install_directory_path

Note
These examples temporarily set the CS_HOME and PATH variables for the duration of your terminal session. You can also set them permanently. On Linux and Mac OS X, add the export commands to your shell startup file (.profile, .bashrc, .tcshrc, or .zshrc) in your home directory. On Windows, you can do this through the Control Panel: Control Panel > System and Security > System > Advanced > Environment Variables.

5. Add the bin directory under CS_HOME to your PATH. On Linux and UNIX, enter the following command:
export PATH=$PATH:$CS_HOME/bin

On Windows, enter the following command:
set PATH=%PATH%;%CS_HOME%\bin

6. Make sure you have the Java 6 (or later) JRE installed and the JAVA_HOME environment variable is set to the full path of the directory that contains the bin directory in which the Java executable resides. For information about checking your Java installation, go to java.com.

Note
On Mac OS X, JAVA_HOME should be set using the /usr/libexec/java_home command. For example: export JAVA_HOME=$(/usr/libexec/java_home). For more information, see QA1170 on developer.apple.com.

7. Configure the command line tools to use your AWS identifiers. The Amazon CloudSearch command line tools look for your AWS identifiers in a text file on your local system in the location specified by the AWS_CREDENTIAL_FILE environment variable. If you have not already configured an AWS credential file:
a. Use a text editor to create a two-line text file that specifies your AWS identifiers. The first line sets the accessKey property and the second line sets the secretKey property. For example:
accessKey=AKIAIOSFODNN7EXAMPLE
secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY
b. Save the file using any name you want (for example, account-key).
c. Limit the file permissions to only the file owner. (For example, use chmod 600 on the file if you're using Linux/UNIX.)
d. Set the AWS_CREDENTIAL_FILE environment variable. On Linux and UNIX, enter the following command:
export AWS_CREDENTIAL_FILE=credential_file_path

On Windows, enter the following command:
set AWS_CREDENTIAL_FILE=credential_file_path

8. To verify that the Amazon CloudSearch tools are configured correctly, run the cs-describe-domain command. (Since you haven't configured any domains yet, the Domain Summary will be empty.)
cs-describe-domain

If you get an error, check the following:
If the system cannot find the specified path, your JAVA_HOME environment variable needs to be set to the location where you have the JRE installed. For example, C:\Program Files\Java\jre6.
If cs-describe-domain is not recognized as a command, check your PATH and make sure it contains the bin directory for the command line tools, for example /Users/username/CloudSearch/tools/bin.
If you get an InvalidClientTokenId error, your AWS credentials are not configured correctly. Make sure that you've configured the AWS_CREDENTIAL_FILE environment variable and that your credential file contains valid AWS identifiers.
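Before running the tools, it can help to check the setup from the installation steps programmatically. The Python sketch below is a minimal, hypothetical pre-flight check: it parses a credential file in the two-line accessKey/secretKey format and reports which of the environment variables named in the steps are missing. The function names and the exact wording of the messages are assumptions for this example and are not part of the command line tools.

```python
import os


def parse_credential_file(text: str) -> dict:
    """Parse key=value lines such as accessKey=... and secretKey=..."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


def check_environment(env: dict) -> list:
    """Return a list of setup problems; an empty list means nothing obvious is wrong."""
    problems = []
    for name in ("JAVA_HOME", "CS_HOME", "AWS_CREDENTIAL_FILE"):
        if not env.get(name):
            problems.append(name + " is not set")
    cs_home = env.get("CS_HOME")
    if cs_home:
        bin_dir = os.path.join(cs_home, "bin")
        if bin_dir not in env.get("PATH", "").split(os.pathsep):
            problems.append(bin_dir + " is not on PATH")
    return problems
```

Calling check_environment(dict(os.environ)) checks the current session. It only verifies that the variables are present, not that Java or the credentials are valid.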

Running the Amazon CloudSearch Commands


All of the Amazon CloudSearch commands require that you specify your AWS credentials. The easiest way to do that is to set up an AWS credential file and set the AWS_CREDENTIAL_FILE environment variable as described in the installation instructions. You can also explicitly specify your credentials with each request, either by using the --aws-credential-file option to specify the location of your credential file, or by specifying both the --access-key and --secret-key options. For most commands, you must also specify the name of your search domain with the -d or --domain-name option. (The one exception is that you can invoke the cs-describe-domain command without specifying the --domain-name option to list information about all of your Amazon CloudSearch domains.)
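If you call the tools from a script, the credential and domain options described above can be assembled into an argument list. The Python sketch below is one possible wrapper. The build_command helper is hypothetical; only the option names (--domain-name, --aws-credential-file, --access-key, --secret-key) come from this guide.

```python
def build_command(tool, domain=None, credential_file=None,
                  access_key=None, secret_key=None, extra=()):
    """Build an argv list for one of the cs-* commands."""
    argv = [tool]
    if domain:
        argv += ["--domain-name", domain]
    if credential_file:
        # An explicit credential file takes the place of AWS_CREDENTIAL_FILE.
        argv += ["--aws-credential-file", credential_file]
    elif access_key or secret_key:
        if not (access_key and secret_key):
            raise ValueError("--access-key and --secret-key must be given together")
        argv += ["--access-key", access_key, "--secret-key", secret_key]
    argv += list(extra)
    return argv
```

The resulting list could be passed to subprocess.run(argv, check=True). Passing secret keys on the command line may expose them to other users on the same system, so a credential file is usually the safer choice.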

cs-configure-access-policies
NAME
cs-configure-access-policies - Configure access to an Amazon CloudSearch domain.

SYNOPSIS
cs-configure-access-policies --service doc|search|all [--allow IP|CIDR|all] [--deny IP|CIDR|all] [--update] [--policy-file FILE] [--delete IP|CIDR] [--force] [--retrieve] COMMON_OPTIONS

DESCRIPTION
Defines access policies for a domain's document and search endpoints. When a domain is first created, it is configured to deny all access. To access the document or search services through the Amazon CloudSearch Command Line Tools or APIs, you must authorize one or more IP addresses.

This command provides two ways for you to update your domain's access policies:

--update
    Add or remove specific permissions from your domain's access policies. Changes are automatically merged with the domain's existing policy document.
--policy-file
    Upload a policy document to your domain. The uploaded file overwrites the domain's existing policy document.

When using the --update option, you can specify multiple --allow or --deny options to allow or block multiple IP addresses or address ranges. You must specify one or more --service options to indicate which service endpoints you want to apply the access policies to. Address ranges are specified using Classless Inter-Domain Routing (CIDR) notation with the base IP address followed by a / and a network mask that indicates the number of leftmost bits used to identify the network. If you don't specify a network mask it defaults to 32, which authorizes or blocks only the specified IP address.

When using the --policy-file option, the uploaded policy document replaces the domain's existing policy document. The specified file must be a valid AWS Identity and Access Management (IAM) policy document. (You can use the --retrieve mode to get the domain's current policy document.) For information about the IAM Access Policy Language, see http://docs.amazonwebservices.com/IAM/latest/UserGuide/index.html?AccessPolicyLanguage.html.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are querying or configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

UPDATE ACCESS POLICY OPTIONS
-al, --allow IP|CIDR
    Add access privileges for a specific IP address or CIDR block. Specify all to allow access from any IP address. Multiple --allow options can be specified to authorize multiple addresses or address ranges. Used in conjunction with the --update option.
-del, --delete IP|CIDR
    Delete the allow or deny rule configured for the specified IP address or CIDR block. Used in conjunction with the --update option.
-de, --deny IP|CIDR
    Deny access privileges for a specific IP address or CIDR block. Specify all to block access from all IP addresses. Multiple --deny options can be specified to block multiple addresses or address ranges. Used in conjunction with the --update option.
-se, --service SERVICE
    Specify the service to apply the policy changes to: doc, search, or all. All allow, deny, and delete options will be applied to the specified service. Multiple --service options can be specified to apply the same policies to multiple services. Required when using the --update option.
-u, --update
    Update the policy with the specified allow, deny and delete options. When using --update, you must also specify at least one --allow, --deny, or --delete option. You must also specify at least one of the domain's endpoints with the --service option.

POLICY FILE OPTIONS
-pf, --policy-file FILE
    Replace the domain's existing policy document with the specified JSON policy document. Can be specified as a path to a local file or an S3 URI.
-r, --retrieve
    Retrieve the domain's existing policy document.

MISCELLANEOUS OPTIONS
-f, --force
    Apply changes to the domain's access policies without confirmation. Can be used in conjunction with either the --update or --policy-file option.

EXAMPLES
Authorize addresses in the range 192.0.2.0 to 192.0.2.255 to access all services:
cs-configure-access-policies -d mydomain --update --allow 192.0.2.0/24 --service all COMMON_OPTIONS

Block a particular IP address from accessing the search service:
cs-configure-access-policies -d mydomain --update --deny 192.0.2.0 --service search COMMON_OPTIONS

Allow access to all services from any IP address:
cs-configure-access-policies -d mydomain --update --allow all --service all COMMON_OPTIONS

Upload a policy document and overwrite the domain's access policies without having to confirm the change:
cs-configure-access-policies -d mydomain --policy-file c:\mypolicydoc.json --force COMMON_OPTIONS
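To see which addresses a CIDR block such as 192.0.2.0/24 covers, and why an address with no network mask behaves like a /32, you can use Python's standard ipaddress module. This sketch only illustrates CIDR notation itself; describe_range is a hypothetical helper, and nothing here calls Amazon CloudSearch.

```python
import ipaddress


def describe_range(spec: str):
    """Return (first address, last address, address count) for an IP or CIDR block."""
    # strict=True rejects blocks whose base address has host bits set.
    network = ipaddress.ip_network(spec, strict=True)
    return str(network[0]), str(network[-1]), network.num_addresses
```

For 192.0.2.0/24 the leftmost 24 bits identify the network, leaving 8 bits (256 addresses) for hosts. A bare address is parsed with a 32-bit mask and therefore matches only itself.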

cs-configure-fields
NAME
cs-configure-fields - Define index fields for a domain.

SYNOPSIS
cs-configure-fields --name STRING --type text|literal|uint [--option search|nosearch|facet|nofacet|result|noresult] [--source STRING] [--default-value NUM] [--delete] COMMON_OPTIONS

DESCRIPTION
Defines the fields that will be included in a domain's index and specifies which fields can be searched, included in search results, or used as facets. You can also use this command to delete fields from the domain.

The --option values you can specify for a field depend on the field type:

- text
    Text fields are always searchable. You can specify the facet, nofacet, result, or noresult options for a text field. A text field can be used as a facet or returned in search results, but not both. By default, text fields are not facet or result enabled.
- literal
    You can specify the search, nosearch, facet, nofacet, result, or noresult options for a literal field. A literal field can be used as a facet or returned in search results, but not both. By default, literal fields are not searchable, facet-enabled, or result enabled.
- uint
    Uint fields can always be used as facets and returned in results. No --option values are valid for a uint field.

For more information about configuring indexing options, see the Amazon CloudSearch Developer Guide.


COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

INDEXING OPTIONS
--default-value NUM
    The default value for a uint field. This value will be added to any document that does not contain at least one value for the field.
--delete
    Delete the field specified by the --name and --type options.
--name STRING
    The name of the field you are configuring or deleting. Field names must begin with a letter and can contain the following characters: a-z (lower-case letters), 0-9, and _ (underscore). Field names must be at least 3 and no more than 28 characters. Required.
--option OPTION
    Configures an option for the field specified by the --name and --type options. Valid values: search, nosearch, facet, nofacet, result, noresult. Text and literal fields cannot have both the facet and result options enabled. By default, text and uint fields are always searchable and uint fields are always facet-enabled.
--source FIELD
    A source field for a compound field. The value of a compound field is the concatenation of the values of all of its sources.
--type TYPE
    The type of the field that you are configuring or deleting: text, literal, uint. Required.

EXAMPLES
Configure index fields:
cs-configure-fields -d mydomain --name title --type text --option result COMMON_OPTIONS
cs-configure-fields -d mydomain --name people --type text --source actor --source director COMMON_OPTIONS
cs-configure-fields -d mydomain --name category --type literal --option facet COMMON_OPTIONS
cs-configure-fields -d mydomain --name value --type uint --default-value 100 COMMON_OPTIONS

Delete an index field:
cs-configure-fields -d mydomain --name obsolete_field --type uint --delete COMMON_OPTIONS
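The naming and option rules described for cs-configure-fields can be checked before a command is issued. The Python sketch below encodes those rules as stated in this reference (name pattern and length, which --option values apply to each field type, and the facet/result exclusion). The validate_field function and its error strings are hypothetical; the actual tool may report problems differently.

```python
import re

# Begins with a letter; a-z, 0-9 and _ only; 3 to 28 characters in total.
FIELD_NAME = re.compile(r"[a-z][a-z0-9_]{2,27}")

VALID_OPTIONS = {
    "text": {"facet", "nofacet", "result", "noresult"},
    "literal": {"search", "nosearch", "facet", "nofacet", "result", "noresult"},
    "uint": set(),  # no --option values are valid for uint fields
}


def validate_field(name, field_type, options=()):
    """Return a list of rule violations for a proposed index field."""
    errors = []
    if not FIELD_NAME.fullmatch(name):
        errors.append("invalid field name")
    if field_type not in VALID_OPTIONS:
        errors.append("unknown field type")
        return errors
    for opt in options:
        if opt not in VALID_OPTIONS[field_type]:
            errors.append(f"option {opt} is not valid for {field_type} fields")
    if "facet" in options and "result" in options:
        errors.append("facet and result cannot both be enabled")
    return errors
```

An empty list means the field definition is consistent with the documented rules.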

cs-configure-ranking
NAME
cs-configure-ranking - Configure a custom rank expression for a domain.

SYNOPSIS
cs-configure-ranking --name STRING --expression EXPRESSION [--delete] COMMON_OPTIONS

DESCRIPTION
Enables you to specify a rank expression to control how search results are ranked. A rank expression is a numeric expression that can reference uint fields and other rank expressions by name. You can also reference a document's default text_relevance score in a rank expression. A document's text_relevance score is a value from 0 to 1000 (inclusive). To calculate the relevance score, Amazon CloudSearch takes into account how many times the search terms appear (term frequency) and how close the search terms are to each other (proximity).

All of the usual arithmetic, bitwise, boolean, and comparison operators and most common math C library functions can be used in rank expressions. Intermediate results are calculated as double-precision floating point values and the return value is rounded to the nearest integer. If the expression is invalid or evaluates to a negative value, it returns 0.

To use a rank expression to sort search results, you specify &rank=RANKEXPRESSION in your search requests. For more information about constructing and using rank expressions, see the Amazon CloudSearch API Reference and the Amazon CloudSearch Developer Guide.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

RANKING OPTIONS
--delete
    Delete the rank expression specified in the --name option.
--name STRING
    The name of the rank expression you are configuring or deleting. Required.
-ex, --expression EXPRESSION
    The rank expression to be computed when processing a search request. A rank expression is a numeric expression that can reference uint fields and other rank expressions by name, as well as a document's default text_relevance score.

EXAMPLES
cs-configure-ranking -d mydomain --name myrankexp --expression '((0.3*myuintfield)/10.0)+((0.7*text_relevance))' COMMON_OPTIONS
cs-configure-ranking -d mydomain --name myrankexp --expression 'text_relevance+myotherankexp/100000' COMMON_OPTIONS

cs-configure-text-options
NAME
cs-configure-text-options - Specify domain-specific stopwords, synonyms, and stems.

SYNOPSIS
cs-configure-text-options [--stopwords FILE|S3_URI] [--synonyms FILE|S3_URI] [--stems FILE|S3_URI] [--print-stopwords] [--print-synonyms] [--print-stems] COMMON_OPTIONS

DESCRIPTION
Amazon CloudSearch gives you control over how your content is indexed by enabling you to specify the following language-specific text options:

- stopwords
    Words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches. The default stopwords for English are: a, an, and, are, as, at, be, but, by, for, in, is, it, of, on, or, the, to, was.
- synonyms
    Words that have the same or nearly the same meaning as terms that appear in your corpus. When a user searches for a synonym rather than the indexed term, the results will include documents that contain the indexed term. No synonyms are defined by default.
- stems
    Define mappings between related words and a common stem. This enables matching on variants of a word. No stems are defined by default.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are querying or configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

TEXT OPTIONS
--stems FILE
    The path or S3 URI for a stemming dictionary file. The stemming dictionary file should contain one comma-separated term, stem pair per line. For example:
    mice, mouse
    people, person
    running, run
--stopwords FILE
    The path or S3 URI for a stopwords dictionary file. The stopwords dictionary file should contain one stopword per line. For example:
    the
    or
    and
--synonyms FILE
    The path or S3 URI for a synonyms dictionary file. Each line in the file should specify a term followed by a comma-separated list of its synonyms. For example:
    cat, feline, kitten
    dog, canine, puppy
    horse, equine, colt, filly
-psw, --print-stopwords
    List the domain's stopwords.
-psm, --print-stems
    List the domain's stems.
-psn, --print-synonyms
    List the domain's synonyms.

EXAMPLES
cs-configure-text-options -d mydomain --stems /home/mystems.txt --stopwords /home/mystopwords.txt --synonyms /home/mysynonyms.txt COMMON_OPTIONS
cs-configure-text-options -d mydomain --print-stopwords COMMON_OPTIONS
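The three dictionary file formats described above are simple enough to sanity-check before uploading them. The Python sketch below parses each format as described (one stopword per line, one term/stem pair per line, and a term followed by its synonyms). The function names are hypothetical, and how the service itself handles whitespace or malformed lines is not covered by this reference, so the trimming shown here is an assumption.

```python
def parse_stopwords(text: str) -> list:
    """One stopword per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_stems(text: str) -> dict:
    """One comma-separated term, stem pair per line."""
    stems = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Raises ValueError if a line does not contain exactly two parts.
        term, stem = [part.strip() for part in line.split(",")]
        stems[term] = stem
    return stems


def parse_synonyms(text: str) -> dict:
    """A term followed by a comma-separated list of its synonyms."""
    synonyms = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",") if part.strip()]
        if parts:
            synonyms[parts[0]] = parts[1:]
    return synonyms
```

Running these over your files catches obvious formatting slips, such as a stem line with a missing comma, before the files are passed to cs-configure-text-options.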

cs-create-domain
NAME
cs-create-domain - Create a new Amazon CloudSearch domain.

SYNOPSIS
cs-create-domain --domain-name STRING [--wait] COMMON_OPTIONS

DESCRIPTION
Creates a search domain with the name specified by the --domain-name option. Domain names must begin with a letter or number and can contain the following characters: a-z, 0-9, and -. Uppercase letters and underscores are not allowed. Domain names must be at least 3 and no more than 28 characters. By default, this command returns immediately. If you specify the --wait option, cs-create-domain will return once the domain is created.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are creating. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

DOMAIN OPTIONS
-w, --wait
    Wait for domain creation to complete before returning.

EXAMPLES
cs-create-domain -d mydomain --wait COMMON_OPTIONS
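The domain naming rules in the description above translate directly into a regular expression. The short Python sketch below is one way to check a candidate name before calling cs-create-domain. The helper name is hypothetical, and the service remains the final authority on what it accepts.

```python
import re

# Begins with a letter or number; a-z, 0-9 and - only; 3 to 28 characters in total.
DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name: str) -> bool:
    """Check a name against the documented domain naming rules."""
    return DOMAIN_NAME.fullmatch(name) is not None
```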

cs-configure-from-sdf
NAME
cs-configure-from-sdf - Define index fields for a domain based on the contents of one or more SDF batches.

SYNOPSIS
cs-configure-from-sdf --source PATH|S3_URI+ [--replace] [--force] COMMON_OPTIONS

DESCRIPTION
Scans SDF batches specified with the --source option and configures index fields for all of the document fields. Prompts for confirmation before making any changes unless you specify the --force option. By default, fields that have already been configured are left as-is. You can use the --replace option to overwrite the existing configuration.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

FIELD OPTIONS
-f, --force
    Apply changes to the domain's configuration without confirmation.
-re, --replace
    Upload configuration information for all identified fields and overwrite the configuration of any fields that were already defined. (Prompts for confirmation unless you also specify --force.)
-s, --source FILE
    The path to a file or an S3 URI that contains the data you want to scan. Required.

EXAMPLES
cs-configure-from-sdf -d mydomain --source s3://mybucket/myAmazingDataSet COMMON_OPTIONS
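To give a feel for what scanning an SDF batch for fields involves, the Python sketch below walks a JSON SDF batch and collects the field names it finds, guessing uint for fields whose values are all non-negative integers and text otherwise. This is purely illustrative: the infer_fields function and its type-guessing heuristic are assumptions made for this example and do not describe how cs-configure-from-sdf actually chooses field types.

```python
import json


def infer_fields(sdf_json: str) -> dict:
    """Map each field name found in "add" operations to a guessed field type."""
    fields = {}
    for operation in json.loads(sdf_json):
        if operation.get("type") != "add":
            continue  # delete operations carry no fields
        for name, value in operation.get("fields", {}).items():
            values = value if isinstance(value, list) else [value]
            # Assumed heuristic: non-negative integers suggest a uint field.
            is_uint = all(
                isinstance(v, int) and not isinstance(v, bool) and v >= 0
                for v in values
            )
            previous = fields.get(name)
            fields[name] = "uint" if is_uint and previous in (None, "uint") else "text"
    return fields
```

A field is only reported as uint if every document seen so far supplies non-negative integer values for it; a single non-integer value switches the guess to text.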

cs-delete-domain
NAME
cs-delete-domain - Permanently delete the specified domain and all of its data.

SYNOPSIS
cs-delete-domain --domain-name STRING [--force] COMMON_OPTIONS

DESCRIPTION
Deletes the search domain specified by the --domain-name option.

COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are deleting. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.

DELETE DOMAIN OPTIONS
-f, --force
    Delete the domain without prompting for confirmation.

EXAMPLES
Delete a domain without prompting for confirmation:
cs-delete-domain -d mydomain --force COMMON_OPTIONS

cs-describe-domain
NAME
cs-describe-domain - Display information about a domain, including its status and endpoints.

SYNOPSIS
cs-describe-domain [--show-all] COMMON_OPTIONS

DESCRIPTION
Display information about your configured domains. If the --domain-name option is specified, cs-describe-domain only shows information for the specified domain. This command returns a table that contains the following information about the domain(s):

Domain Name
    The name of the domain.
Document Service Endpoint
    The endpoint through which you can submit document updates.
Search Endpoint
    The endpoint through which you can submit search requests.
Searchable Documents
    The number of documents that have been indexed.
Index Fields
    The name and type of each configured index field. Only shown when --show-all is specified.
Ranking Fields
    The name and type of each ranking field. Only shown when --show-all is specified.
SearchPartitionCount
    The number of partitions being used to hold the search index.
SearchInstanceCount
    The number of search instances being used to process search requests.
SearchInstanceType
    The Amazon EC2 instance type being used to process search requests.

The domain status also indicates whether or not the index needs to be rebuilt to process configuration changes. COMMON OPTIONS -a, --access-key STRING Your AWS access key. Used with --secret-key. Must be specified if you don't use an AWS credential file. The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key. The name of the domain that you are querying. Required. The endpoint for the Amazon Cloud Search Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com. Display this help message. Your AWS secret key. Used in conjunction with --access-key. Must be specified if you don't an AWS credential file. -ve, --verbose Display verbose log messages.

-c, --aws-credential-file FILE

-d, --domain-name STRING

-e,

--endpoint URL

-h, --help -k, --secret-key STRING use

API Version 2011-02-01 119

Amazon CloudSearch Developer Guide cs-index-documents

-v, --version

Display the version number of the command line tools.

DESCRIBE DOMAIN OPTIONS -all, --show-all Display all available information for the domain, including configured fields.

EXAMPLES Get information about a particular domain: cs-describe-domain -d mydomain --show-all COMMON_OPTIONS

cs-index-documents
NAME
    cs-index-documents - Index a domain's documents.

SYNOPSIS
    cs-index-documents COMMON_OPTIONS

DESCRIPTION
    Builds and deploys a complete index for the domain specified by the --domain-name option.

COMMON OPTIONS
    -a, --access-key STRING
        Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
    -c, --aws-credential-file FILE
        The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
    -d, --domain-name STRING
        The name of the domain that you are indexing. Required.
    -e, --endpoint URL
        The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
    -h, --help
        Display this help message.
    -k, --secret-key STRING
        Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
    -ve, --verbose
        Display verbose log messages.
    -v, --version
        Display the version number of the command line tools.

INDEX DOCUMENTS OPTIONS
    No specific options.

EXAMPLES
    cs-index-documents -d mydomain COMMON_OPTIONS

cs-post-sdf
NAME
    cs-post-sdf - Upload the SDF documents that you want to index and search.

SYNOPSIS
    cs-post-sdf --source PATH|S3_URI+ COMMON_OPTIONS

DESCRIPTION
    Updates the contents of the domain specified by the --domain-name option with the documents specified by the --source option. The source documents must be specified in the SDF format, which can be generated from most types of files using the cs-generate-sdf command.

COMMON OPTIONS
    -a, --access-key STRING
        Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
    -c, --aws-credential-file FILE
        The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
    -d, --domain-name STRING
        The name of the domain that you are updating. Required.
    -e, --endpoint URL
        The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
    -h, --help
        Display this help message.
    -k, --secret-key STRING
        Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
    -ve, --verbose
        Display verbose log messages.
    -v, --version
        Display the version number of the command line tools.

UPDATE DOCUMENTS OPTIONS
    -s, --source PATH|S3_URI
        The path to a file or an S3 URI that contains the SDF data you want to upload.

EXAMPLES
    cs-post-sdf -d movies --source movies.sdf COMMON_OPTIONS

SEE ALSO
    cs-generate-sdf

Experimental Tools for Amazon CloudSearch


The following tools are provided on an experimental basis. Please let us know if you would like to see them fully supported and enhanced in future releases:
    cs-generate-sdf (p. 122)

cs-generate-sdf
NAME
    cs-generate-sdf - Experimental tool for analyzing the data you want to index and automatically generating SDF batches for indexing.

SYNOPSIS
    cs-generate-sdf --source PATH|S3_URI --output PATH|S3_URI [--modified-after yyyy-mm-ddTnn:nn] [--exclude-metadata] [--exclude-content] [--single-doc-per-csv] [--sdf-format json|xml] [--docid-prefix STRING] [--doc-version NUM] [--batch-size MB] [--batch-docs NUM] COMMON_OPTIONS

DESCRIPTION
    Analyze your data and generate SDF (Search Data Format) batches that can be submitted to Amazon CloudSearch for indexing using the cs-post-sdf command. The generated SDF batches can be saved to your local file system or to an S3 bucket. The cs-generate-sdf command can generate SDF batches from the following content types:

        text/csv
        text/html
        text/plain
        application/json
        application/msword
        application/pdf
        application/vnd.ms-excel
        application/vnd.ms-powerpoint
        application/vnd.openxmlformats-officedocument.presentationml.presentation
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        application/vnd.openxmlformats-officedocument.wordprocessingml.document
        application/xhtml+xml
        application/xml

    Generally, a single add document request is added to the SDF batch for each source file. Where possible, the contents of the source file are parsed into one or more index fields. If metadata is available for the file, an index field is added for each piece of metadata.

    CSV source files are automatically parsed to generate a separate document for each row in the CSV file. The contents of the first row are used to define the document fields. If you are processing multiple files, CSV files are parsed row-by-row, and non-CSV files are treated as individual documents. You can specify the --single-doc-per-csv option to override the default behavior and treat each CSV file as a single document. Specifying the --single-doc-per-csv option has no effect on non-CSV files.

    Note: Currently, only CSV files are parsed to automatically extract custom field data and generate multiple documents. When processing XML and JSON files, each file is treated as a separate document and the contents of the file are used to populate a single text field.

COMMON OPTIONS
    -a, --access-key STRING
        Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
    -c, --aws-credential-file FILE
        The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
    -d, --domain-name STRING
        The name of the domain that you are updating. Required.
    -e, --endpoint URL
        The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
    -h, --help
        Display this help message.
    -k, --secret-key STRING
        Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
    -ve, --verbose
        Display verbose log messages.
    -v, --version
        Display the version number of the command line tools.

REQUIRED SDF OPTIONS
    -o, --output PATH|S3_URI
        The local directory or S3 bucket where you want to save the generated SDF batches. You must either specify an output location with the --output option, or specify the --domain-name option to upload the generated SDF batches to a search domain.
    -s, --source PATH|S3_URI
        The local directory, file, or S3 bucket that contains the data that you want to create SDF batches from. You can process data from multiple locations by specifying multiple --source options. Accepts Apache Ant-style wildcards such as */** for files and S3 prefixes. Required.

ADVANCED SDF OPTIONS
    -bd, --batch-docs NUM
        The maximum number of documents in a batch.
    -bs, --batch-size MB
        The maximum batch size in MB. Defaults to 5MB.
    -sdpc, --single-doc-per-csv
        Treat the CSV file as a single document. If this option is specified, the contents of the CSV file will be treated as a single text field. This option has no effect on non-CSV files.
    -dp, --docid-prefix STRING
        The prefix to prepend to the document ID while processing CSV data. If not specified, the filename is used as the --docid-prefix. The docid column is used as the document ID if it is included in the CSV data; otherwise, the row number is used as the document ID.
    -dv, --doc-version NUM
        The version number to use for all of the generated SDF documents. Defaults to 1.
    -ec, --exclude-content
        Do not include the content of the source files in the generated SDF documents, only process the metadata.
    -em, --exclude-metadata
        Do not include the metadata of the source files in the generated SDF documents, only process the content.
    -format, --sdf-format json|xml
        The format of the generated SDF documents: json or xml. Defaults to json.
    -m, --modified-after TIMESTAMP
        Only process files or S3 objects modified after the specified date and time. Specified as yyyy-mm-ddTnn:nn.

EXAMPLES
    Generate an SDF batch from a plain text file:
        cs-generate-sdf --source c:\myAmazingDataSet\data1.txt --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate a single document for each CSV file:
        cs-generate-sdf --source c:\myAmazingDataSet\*.csv -sdpc --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate an SDF batch from multiple documents:
        cs-generate-sdf --source c:\myAmazingDataSet\data1.xml --source c:\myAmazingDataSet\data2.xml --source c:\myAmazingDataSet\data3.xml --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate SDF batches from all HTML documents in a directory:
        cs-generate-sdf --source c:\myAmazingDataSet\*.html --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate SDF batches from all Word or PDF documents in a directory:
        cs-generate-sdf --source c:\myAmazingDataSet\*.doc --source c:\myAmazingDataSet\*.pdf --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate SDF batches from all recognized file types:
        cs-generate-sdf --source c:\myAmazingDataSet\* --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS

    Generate SDF batches and upload them to your domain:
        cs-generate-sdf -d mydomain --source c:\myAmazingDataSet\* COMMON_OPTIONS

SEE ALSO
    cs-post-sdf
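The default CSV handling of cs-generate-sdf (first row defines the fields, one document per row, docid column or row number as the ID) can be sketched in Python. This is an illustration only, not the tool's implementation. The function name csv_to_sdf_adds is hypothetical, joining the prefix and the ID by plain concatenation is an assumption, counting data rows from 1 is an assumption, and the "lang" value of "en" is assumed. The add-operation layout (type, id, version, lang, fields) follows the SDF JSON format described elsewhere in this guide.

```python
import csv
import io
import json


def csv_to_sdf_adds(csv_text, docid_prefix, doc_version=1):
    """Turn CSV text into a list of SDF-style add operations, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row names the fields
    batch = []
    for row_number, row in enumerate(reader, start=1):
        fields = dict(row)
        # Use the docid column when present; otherwise fall back to the row number.
        raw_id = fields.pop("docid", None) or str(row_number)
        batch.append({
            "type": "add",
            "id": f"{docid_prefix}{raw_id}",  # assumption: plain concatenation
            "version": doc_version,
            "lang": "en",  # assumption: English content
            "fields": fields,
        })
    return batch


sample = "docid,title,year\ntt001,Night Train,1999\ntt002,Harbor Lights,2004\n"
print(json.dumps(csv_to_sdf_adds(sample, docid_prefix="movies_"), indent=2))
```

All field values stay strings in this sketch; converting numeric columns would depend on how the corresponding index fields are configured.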


Amazon CloudSearch Configuration API Reference


Topics
    Actions (p. 126)
    Data Types (p. 153)
    Common Query Parameters (p. 171)
    Common Errors (p. 172)

You use the Amazon CloudSearch Configuration API to create, configure, and manage search domains. The configuration service is accessed through a general endpoint, cloudsearch.us-east-1.amazonaws.com.

You submit Amazon CloudSearch configuration requests using the AWS Query protocol. AWS Query requests are HTTP or HTTPS requests submitted via HTTP GET or POST with a Query parameter named Action. The API version must be specified in all requests. The current Amazon CloudSearch API version is 2011-02-01.

Requests submitted to the Configuration API are authenticated using your AWS credentials. You must include authorization parameters and a digital signature in every request. Amazon CloudSearch supports AWS Signature Version 4. For detailed signing instructions, see Signature V4 Signing Process in the AWS General Reference.

The other APIs you use to interact with Amazon CloudSearch are:
    Amazon CloudSearch Document Service API Reference (p. 174): Submit the data you want to search.
    Amazon CloudSearch Search API Reference (p. 184): Search your domain.
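As a rough illustration of the request shape described above, the following Python sketch assembles an unsigned GET URL for a configuration action. The helper name build_query_url is hypothetical, and the Version parameter name is taken from the AWS Query convention. The sketch deliberately omits the Signature Version 4 authorization parameters, so the resulting URL would be rejected by the service as written.

```python
from urllib.parse import urlencode

CONFIG_ENDPOINT = "cloudsearch.us-east-1.amazonaws.com"
API_VERSION = "2011-02-01"


def build_query_url(action, params=None):
    """Return an unsigned HTTPS GET URL for a Configuration API action."""
    query = {"Action": action, "Version": API_VERSION}
    query.update(params or {})
    # Sort parameters so the URL is deterministic. A real request would also
    # need the Signature Version 4 authorization parameters, omitted here.
    return f"https://{CONFIG_ENDPOINT}/?{urlencode(sorted(query.items()))}"


print(build_query_url("CreateDomain", {"DomainName": "movies"}))
```

Action-specific parameters, such as DomainName, are passed alongside Action and Version in the same query string.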

Actions
The actions described in this guide are called using the AWS Query protocol. The following actions are supported:
    CreateDomain (p. 128)
    DefineIndexField (p. 129)
    DefineRankExpression (p. 131)
    DeleteDomain (p. 132)
    DeleteIndexField (p. 133)
    DeleteRankExpression (p. 134)
    DescribeDefaultSearchField (p. 135)
    DescribeDomains (p. 136)
    DescribeIndexFields (p. 137)
    DescribeRankExpressions (p. 138)
    DescribeServiceAccessPolicies (p. 139)
    DescribeStemmingOptions (p. 140)
    DescribeStopwordOptions (p. 141)
    DescribeSynonymOptions (p. 142)
    IndexDocuments (p. 143)
    UpdateDefaultSearchField (p. 144)
    UpdateServiceAccessPolicies (p. 146)
    UpdateStemmingOptions (p. 148)
    UpdateStopwordOptions (p. 150)
    UpdateSynonymOptions (p. 152)


CreateDomain
Description
Creates a new search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
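The DomainName constraints can be checked on the client before a request is sent. The following is a minimal Python sketch, assuming a hypothetical helper named is_valid_domain_name. It covers only the character and length rules stated above; uniqueness within the account and region can only be determined by the service.

```python
import re

# Starts with a lowercase letter or digit, then a-z, 0-9, or hyphen;
# 3 to 28 characters in total.
DOMAIN_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name):
    """Check a candidate name against the documented DomainName constraints."""
    return DOMAIN_NAME_RE.fullmatch(name) is not None


for candidate in ("movies", "my-domain-1", "My_Domain", "ab", "-movies"):
    print(candidate, is_valid_domain_name(candidate))
```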

Response Elements
The following elements come wrapped in a CreateDomainResult structure.

DomainStatus
    The current status of the search domain.
    Type: DomainStatus (p. 159)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409


DefineIndexField
Description
Configures an IndexField for the search domain. Used to create new fields and modify existing ones. If the field exists, the new configuration replaces the old one. You can configure a maximum of 200 index fields.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
IndexField
    Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType.
    Type: IndexField (p. 161)
    Required: Yes

Response Elements
The following elements come wrapped in a DefineIndexFieldResult structure.

IndexField
    The value of an IndexField and its current status.
    Type: IndexFieldStatus (p. 162)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
InvalidType
    The request was rejected because it specified an invalid type definition.
    HTTP Status Code: 409
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
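A client might keep the error codes listed for this action in a small lookup table. The Python sketch below copies the HTTP status codes from the Errors list above. The should_retry policy (retry only internal errors) is an assumption made for illustration and is not something this reference prescribes.

```python
# Error codes and HTTP status codes as listed in the Errors section above.
ERROR_STATUS = {
    "Base": 400,
    "Internal": 500,
    "InvalidType": 409,
    "LimitExceeded": 409,
    "ResourceNotFound": 409,
}


def should_retry(error_code):
    """Assumed policy: retry only internal (HTTP 500) errors."""
    return ERROR_STATUS.get(error_code) == 500


for code in ERROR_STATUS:
    print(code, ERROR_STATUS[code], "retry" if should_retry(code) else "do not retry")
```

The 409 errors indicate a problem with the request or the domain's current state, so resending the same request unchanged is unlikely to succeed.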


DefineRankExpression
Description
Configures a RankExpression for the search domain. Used to create new rank expressions and modify existing ones. If the expression exists, the new configuration replaces the old one. You can configure a maximum of 50 rank expressions.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
RankExpression
    A named expression that can be evaluated at search time and used for ranking or thresholding in a search query.
    Type: NamedRankExpression (p. 162)
    Required: Yes

Response Elements
The following elements come wrapped in a DefineRankExpressionResult structure.

RankExpression
    The value of a RankExpression and its current status.
    Type: RankExpressionStatus (p. 164)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
InvalidType
    The request was rejected because it specified an invalid type definition.
    HTTP Status Code: 409
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DeleteDomain
Description
Permanently deletes a search domain and all of its data.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes

Response Elements
The following elements come wrapped in a DeleteDomainResult structure.

DomainStatus
    The current status of the search domain.
    Type: DomainStatus (p. 159)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500


DeleteIndexField
Description
Removes an IndexField from the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
IndexFieldName
    A string that represents the name of an index field. Field names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.
    Required: Yes
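The IndexFieldName rules differ from the DomainName rules: underscores are allowed, hyphens are not, the first character must be a letter, and three names are reserved. A minimal Python sketch of a client-side check follows; the helper name is_valid_field_name is hypothetical.

```python
import re

RESERVED_NAMES = {"body", "docid", "text_relevance"}

# Begins with a lowercase letter, then a-z, 0-9, or underscore;
# 1 to 64 characters in total.
FIELD_NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")


def is_valid_field_name(name):
    """Check a candidate name against the documented IndexFieldName constraints."""
    return name not in RESERVED_NAMES and FIELD_NAME_RE.fullmatch(name) is not None


for candidate in ("title", "release_year", "docid", "2fast", "my-field"):
    print(candidate, is_valid_field_name(candidate))
```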

Response Elements
The following elements come wrapped in a DeleteIndexFieldResult structure.

IndexField
    The value of an IndexField and its current status.
    Type: IndexFieldStatus (p. 162)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DeleteRankExpression
Description
Removes a RankExpression from the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
RankName
    The name of the RankExpression to delete.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.
    Required: Yes

Response Elements
The following elements come wrapped in a DeleteRankExpressionResult structure.

RankExpression
    The value of a RankExpression and its current status.
    Type: RankExpressionStatus (p. 164)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DescribeDefaultSearchField
Description
Gets the default search field configured for the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes

Response Elements
The following elements come wrapped in a DescribeDefaultSearchFieldResult structure.

DefaultSearchField
    The name of the IndexField to use for search requests issued with the q parameter. The default is the empty string, which automatically searches all text fields.
    Type: DefaultSearchFieldStatus (p. 155)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DescribeDomains
Description
Gets information about the search domains owned by this account. Can be limited to specific domains. Shows all domains by default.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainNames.member.N
    Limits the DescribeDomains response to the specified search domains.
    Type: String list
    Required: No
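List parameters such as DomainNames.member.N are sent as one query parameter per list entry. The Python sketch below shows the expansion, assuming a hypothetical helper named encode_member_list and assuming that N starts at 1, as is conventional for the AWS Query protocol.

```python
def encode_member_list(name, values):
    """Expand a list into name.member.1, name.member.2, ... query parameters."""
    return {f"{name}.member.{i}": value for i, value in enumerate(values, start=1)}


params = {"Action": "DescribeDomains", "Version": "2011-02-01"}
params.update(encode_member_list("DomainNames", ["movies", "books"]))
print(params)
```

Passing an empty list adds no parameters, which corresponds to the default behavior of showing all domains.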

Response Elements
The following elements come wrapped in a DescribeDomainsResult structure.

DomainStatusList
    The current status of all of your search domains.
    Type: DomainStatus (p. 159) list

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500


DescribeIndexFields
Description
Gets information about the index fields configured for the search domain. Can be limited to specific fields by name. Shows all fields by default.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
FieldNames.member.N
    Limits the DescribeIndexFields response to the specified fields.
    Type: String list
    Required: No

Response Elements
The following elements come wrapped in a DescribeIndexFieldsResult structure.

IndexFields
    The index fields configured for the domain.
    Type: IndexFieldStatus (p. 162) list

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DescribeRankExpressions
Description
Gets the rank expressions configured for the search domain. Can be limited to specific rank expressions by name. Shows all rank expressions by default.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
RankNames.member.N
    Limits the DescribeRankExpressions response to the specified rank expressions.
    Type: String list
    Required: No

Response Elements
The following elements come wrapped in a DescribeRankExpressionsResult structure.

RankExpressions
    The rank expressions configured for the domain.
    Type: RankExpressionStatus (p. 164) list

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DescribeServiceAccessPolicies
Description
Gets information about the resource-based policies that control access to the domain's document and search services.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes

Response Elements
The following elements come wrapped in a DescribeServiceAccessPoliciesResult structure.

AccessPolicies
    A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
    Type: AccessPoliciesStatus (p. 154)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409


DescribeStemmingOptions
Description
Gets the stemming dictionary configured for the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171). Name Description Required

DomainName A string that represents the name of a domain. Domain names must be Yes unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.

Response Elements
The following elements come wrapped in a DescribeStemmingOptionsResult structure.

Name: Stems
Description: The stemming options configured for this search domain and the current status of those options.
Type: StemmingOptionsStatus (p. 167)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


DescribeStopwordOptions
Description
Gets the stopwords configured for the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Response Elements
The following elements come wrapped in a DescribeStopwordOptionsResult structure.

Name: Stopwords
Description: The stopword options configured for this search domain and the current status of those options.
Type: StopwordOptionsStatus (p. 167)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


DescribeSynonymOptions
Description
Gets the synonym dictionary configured for the search domain.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Response Elements
The following elements come wrapped in a DescribeSynonymOptionsResult structure.

Name: Synonyms
Description: The synonym options configured for this search domain and the current status of those options.
Type: SynonymOptionsStatus (p. 168)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


IndexDocuments
Description
Tells the search domain to start indexing its documents using the latest text processing options and IndexFields. This operation must be invoked to make options whose OptionStatus (p. 164) has OptionState of RequiresIndexDocuments visible in search results.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Response Elements
The following elements come wrapped in an IndexDocumentsResult structure.

Name: FieldNames
Description: The names of the fields that are currently being processed due to an IndexDocuments action.
Type: String list

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


UpdateDefaultSearchField
Description
Configures the default search field for the search domain. The default search field is used when a search request does not specify which fields to search. By default, it is configured to include the contents of all of the domain's text fields.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DefaultSearchField
Description: The IndexField to use for search requests issued with the q parameter. The default is an empty string, which automatically searches all text fields.
Type: String
Required: Yes

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes
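As an illustration of how the two parameters above fit into a query request, the following sketch assembles the action-specific part of a query string. It assumes the usual Action and Version common query parameters and leaves out all authentication and signature parameters; the function name and the field name title used in the test values are hypothetical.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"  # version string shown throughout this guide


def update_default_search_field_query(domain_name: str, field_name: str) -> str:
    """Build the action-specific part of the query string.

    Authentication and signature parameters are deliberately omitted, so the
    result is not a complete request.
    """
    params = {
        "Action": "UpdateDefaultSearchField",
        "Version": API_VERSION,
        "DomainName": domain_name,
        "DefaultSearchField": field_name,
    }
    return urlencode(params)
```

Passing an empty string as field_name yields DefaultSearchField= with no value, which corresponds to the documented default of searching all text fields.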

Response Elements
The following elements come wrapped in an UpdateDefaultSearchFieldResult structure.

Name: DefaultSearchField
Description: The value of the DefaultSearchField configured for this search domain and its current status.
Type: DefaultSearchFieldStatus (p. 155)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: InvalidType
Description: The request was rejected because it specified an invalid type definition.
HTTP Status Code: 409

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


UpdateServiceAccessPolicies
Description
Configures the policies that control access to the domain's document and search services. The maximum size of an access policy document is 100KB.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: AccessPolicies
Description: An IAM access policy as described in The Access Policy Language in Using AWS Identity and Access Management. The maximum size of an access policy document is 100KB.
Example: {"Statement": [{"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:search/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }}, {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:documents/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }} ] }
Type: String
Required: Yes

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes
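Because the AccessPolicies value is a JSON string, it is easy to introduce quoting mistakes when writing it by hand. The following sketch generates a policy document with the same shape as the example above and checks it against the documented size limit. The function name, its parameters, and the reading of 100KB as 100 * 1024 bytes are assumptions made for illustration.

```python
import json

MAX_POLICY_BYTES = 100 * 1024  # assumes "100KB" means 100 * 1024 bytes


def build_access_policy(region: str, account_id: str, domain: str, cidr: str) -> str:
    """Return a policy document allowing one CIDR block to reach both services."""
    statements = []
    # One statement per service, mirroring the search/ and documents/ ARNs
    # used in the example policy.
    for service in ("search", "documents"):
        statements.append({
            "Effect": "Allow",
            "Action": "*",
            "Resource": f"arn:aws:cs:{region}:{account_id}:{service}/{domain}",
            "Condition": {"IpAddress": {"aws:SourceIp": [cidr]}},
        })
    document = json.dumps({"Statement": statements})
    if len(document.encode("utf-8")) > MAX_POLICY_BYTES:
        raise ValueError("access policy document exceeds 100KB")
    return document
```

Serializing with a JSON library rather than string concatenation guarantees that every key, including aws:SourceIp, is quoted correctly.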

Response Elements
The following elements come wrapped in an UpdateServiceAccessPoliciesResult structure.

Name: AccessPolicies
Description: A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
Type: AccessPoliciesStatus (p. 154)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: InvalidType
Description: The request was rejected because it specified an invalid type definition.
HTTP Status Code: 409

Error: LimitExceeded
Description: The request was rejected because a resource limit has already been met.
HTTP Status Code: 409

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409
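A client might use the error codes and HTTP status codes listed in the Errors tables of this section to decide whether resending a request is worthwhile. The sketch below is one possible policy, not documented service behavior: it assumes that only the Internal error (HTTP 500) is transient, and the names ERROR_STATUS and should_retry are hypothetical.

```python
# HTTP status codes as listed in the Errors tables of this section.
ERROR_STATUS = {
    "Base": 400,
    "Internal": 500,
    "InvalidType": 409,
    "LimitExceeded": 409,
    "ResourceNotFound": 409,
}


def should_retry(error_code: str) -> bool:
    """Assumed policy: retry only server-side (5xx) failures."""
    status = ERROR_STATUS.get(error_code)
    return status is not None and status >= 500
```

The 400 and 409 errors describe a problem with the request itself, so resending the same request unchanged is unlikely to succeed.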


UpdateStemmingOptions
Description
Configures a stemming dictionary for the search domain. The stemming dictionary is used during indexing and when processing search requests. The maximum size of the stemming dictionary is 500KB.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Name: Stems
Description: Maps terms to their stems, serialized as a JSON document. The document has a single object with one property "stems" whose value is an object mapping terms to their stems. The maximum size of a stemming document is 500KB.
Example: { "stems": {"people": "person", "walking": "walk"} }
Type: String
Required: Yes
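The Stems value can be produced from an ordinary mapping. The following sketch serializes a term-to-stem dictionary into the documented shape and rejects documents over the size limit before they are sent. The function name and the reading of 500KB as 500 * 1024 bytes are assumptions.

```python
import json

MAX_STEMS_BYTES = 500 * 1024  # assumes "500KB" means 500 * 1024 bytes


def build_stems_document(stems: dict[str, str]) -> str:
    """Serialize a term-to-stem mapping in the documented {"stems": {...}} shape."""
    document = json.dumps({"stems": stems}, ensure_ascii=False)
    # Measure the encoded size, since non-ASCII terms take more than one byte.
    if len(document.encode("utf-8")) > MAX_STEMS_BYTES:
        raise ValueError("stemming document exceeds 500KB")
    return document
```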

Response Elements
The following elements come wrapped in an UpdateStemmingOptionsResult structure.

Name: Stems
Description: The stemming options configured for this search domain and the current status of those options.
Type: StemmingOptionsStatus (p. 167)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: InvalidType
Description: The request was rejected because it specified an invalid type definition.
HTTP Status Code: 409

Error: LimitExceeded
Description: The request was rejected because a resource limit has already been met.
HTTP Status Code: 409

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


UpdateStopwordOptions
Description
Configures stopwords for the search domain. Stopwords are used during indexing and when processing search requests. The maximum size of the stopwords dictionary is 10KB.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Name: Stopwords
Description: Lists stopwords serialized as a JSON document. The document has a single object with one property "stopwords" whose value is an array of strings. The maximum size of a stopwords document is 10KB.
Example: { "stopwords": ["a", "an", "the", "of"] }
Type: String
Required: Yes
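The structure required of a stopwords document (a single object with one property "stopwords" whose value is an array of strings) can be verified locally. The sketch below is an illustrative client-side check; the function name parse_stopwords_document is hypothetical.

```python
import json


def parse_stopwords_document(document: str) -> list[str]:
    """Return the stopword list, or raise ValueError if the shape is wrong."""
    data = json.loads(document)  # malformed JSON also raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"stopwords"}:
        raise ValueError('expected a single object with one property "stopwords"')
    words = data["stopwords"]
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError('"stopwords" must be an array of strings')
    return words
```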

Response Elements
The following elements come wrapped in an UpdateStopwordOptionsResult structure.

Name: Stopwords
Description: The stopword options configured for this search domain and the current status of those options.
Type: StopwordOptionsStatus (p. 167)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: InvalidType
Description: The request was rejected because it specified an invalid type definition.
HTTP Status Code: 409

Error: LimitExceeded
Description: The request was rejected because a resource limit has already been met.
HTTP Status Code: 409

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409


UpdateSynonymOptions
Description
Configures a synonym dictionary for the search domain. The synonym dictionary is used during indexing to configure mappings for terms that occur in text fields. The maximum size of the synonym dictionary is 100KB.

Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.
Required: Yes

Name: Synonyms
Description: Maps terms to their synonyms, serialized as a JSON document. The document has a single object with one property "synonyms" whose value is an object mapping terms to their synonyms. Each synonym is a simple string or an array of strings. The maximum size of a synonym document is 100KB.
Example: { "synonyms": {"cat": ["feline", "kitten"], "puppy": "dog"} }
Type: String
Required: Yes
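Since each synonym value may be either a single string or an array of strings, a small check before serialization can catch malformed entries. The following sketch is an assumption-laden illustration: the function name is hypothetical, and 100KB is read as 100 * 1024 bytes.

```python
import json

MAX_SYNONYMS_BYTES = 100 * 1024  # assumes "100KB" means 100 * 1024 bytes


def build_synonyms_document(synonyms: dict) -> str:
    """Serialize term-to-synonym mappings in the documented {"synonyms": {...}} shape."""
    for term, value in synonyms.items():
        # Treat a bare string as a one-element list so both forms share one check.
        values = [value] if isinstance(value, str) else value
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"synonyms for {term!r} must be a string or a list of strings")
    document = json.dumps({"synonyms": synonyms}, ensure_ascii=False)
    if len(document.encode("utf-8")) > MAX_SYNONYMS_BYTES:
        raise ValueError("synonym document exceeds 100KB")
    return document
```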

Response Elements
The following elements come wrapped in an UpdateSynonymOptionsResult structure.

Name: Synonyms
Description: The synonym options configured for this search domain and the current status of those options.
Type: SynonymOptionsStatus (p. 168)

Errors
For information about the common errors that all actions use, see Common Errors (p. 172).

Error: Base
Description: An error occurred while processing the request.
HTTP Status Code: 400

Error: Internal
Description: An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code: 500

Error: InvalidType
Description: The request was rejected because it specified an invalid type definition.
HTTP Status Code: 409

Error: LimitExceeded
Description: The request was rejected because a resource limit has already been met.
HTTP Status Code: 409

Error: ResourceNotFound
Description: The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code: 409

Data Types
The Amazon CloudSearch Configuration Service API contains several data types that various actions use. This section describes each data type in detail.

Note
The order of each element in the response is not guaranteed. Applications should not assume a particular order.

The following data types are supported:
AccessPoliciesStatus (p. 154)
CreateDomainResult (p. 155)
DefaultSearchFieldStatus (p. 155)
DefineIndexFieldResult (p. 155)
DefineRankExpressionResult (p. 156)
DeleteDomainResult (p. 156)
DeleteIndexFieldResult (p. 156)
DeleteRankExpressionResult (p. 156)
DescribeDefaultSearchFieldResult (p. 157)
DescribeDomainsResult (p. 157)
DescribeIndexFieldsResult (p. 157)
DescribeRankExpressionsResult (p. 158)
DescribeServiceAccessPoliciesResult (p. 158)
DescribeStemmingOptionsResult (p. 158)
DescribeStopwordOptionsResult (p. 158)
DescribeSynonymOptionsResult (p. 159)
DomainStatus (p. 159)
IndexDocumentsResult (p. 160)
IndexField (p. 161)
IndexFieldStatus (p. 162)
LiteralOptions (p. 162)
NamedRankExpression (p. 162)
OptionStatus (p. 164)
RankExpressionStatus (p. 164)
ServiceEndpoint (p. 165)
SourceAttribute (p. 165)
SourceData (p. 166)
SourceDataMap (p. 166)
SourceDataTrimTitle (p. 166)
StemmingOptionsStatus (p. 167)
StopwordOptionsStatus (p. 167)
SynonymOptionsStatus (p. 168)
TextOptions (p. 168)
UIntOptions (p. 169)
UpdateDefaultSearchFieldResult (p. 169)
UpdateServiceAccessPoliciesResult (p. 169)
UpdateStemmingOptionsResult (p. 170)
UpdateStopwordOptionsResult (p. 170)
UpdateSynonymOptionsResult (p. 170)

AccessPoliciesStatus
Description
A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.

Contents
Name: Options
Description: An IAM access policy as described in The Access Policy Language in Using AWS Identity and Access Management. The maximum size of an access policy document is 100KB.
Example: {"Statement": [{"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:search/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }}, {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:documents/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }} ] }
Type: String

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)


CreateDomainResult
Description
A response message that contains the status of a newly created domain.

Contents
Name: DomainStatus
Description: The current status of the search domain.
Type: DomainStatus (p. 159)

DefaultSearchFieldStatus
Description
The value of the DefaultSearchField configured for this search domain and its current status.

Contents
Name: Options
Description: The name of the IndexField to use as the default search field. The default is an empty string, which automatically searches all text fields.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)

DefineIndexFieldResult
Description
A response message that contains the status of an updated index field.

Contents
Name: IndexField
Description: The value of an IndexField and its current status.
Type: IndexFieldStatus (p. 162)


DefineRankExpressionResult
Description
A response message that contains the status of an updated RankExpression.

Contents
Name: RankExpression
Description: The value of a RankExpression and its current status.
Type: RankExpressionStatus (p. 164)

DeleteDomainResult
Description
A response message that contains the status of a newly deleted domain, or no status if the domain has already been completely deleted.

Contents
Name: DomainStatus
Description: The current status of the search domain.
Type: DomainStatus (p. 159)

DeleteIndexFieldResult
Description
A response message that contains the status of a deleted index field.

Contents
Name: IndexField
Description: The value of an IndexField and its current status.
Type: IndexFieldStatus (p. 162)

DeleteRankExpressionResult
Description
A response message that contains the status of a deleted RankExpression.


Contents
Name: RankExpression
Description: The value of a RankExpression and its current status.
Type: RankExpressionStatus (p. 164)

DescribeDefaultSearchFieldResult
Description
A response message that contains the default search field for a search domain.

Contents
Name: DefaultSearchField
Description: The name of the IndexField to use for search requests issued with the q parameter. The default is the empty string, which automatically searches all text fields.
Type: DefaultSearchFieldStatus (p. 155)

DescribeDomainsResult
Description
A response message that contains the status of one or more domains.

Contents
Name: DomainStatusList
Description: The current status of all of your search domains.
Type: DomainStatus (p. 159) list

DescribeIndexFieldsResult
Description
A response message that contains the index fields for a search domain.

Contents
Name: IndexFields
Description: The index fields configured for the domain.
Type: IndexFieldStatus (p. 162) list


DescribeRankExpressionsResult
Description
A response message that contains the rank expressions for a search domain.

Contents
Name: RankExpressions
Description: The rank expressions configured for the domain.
Type: RankExpressionStatus (p. 164) list

DescribeServiceAccessPoliciesResult
Description
A response message that contains the access policies for a domain.

Contents
Name: AccessPolicies
Description: A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
Type: AccessPoliciesStatus (p. 154)

DescribeStemmingOptionsResult
Description
A response message that contains the stemming options for a search domain.

Contents
Name: Stems
Description: The stemming options configured for this search domain and the current status of those options.
Type: StemmingOptionsStatus (p. 167)

DescribeStopwordOptionsResult
Description
A response message that contains the stopword options for a search domain.


Contents
Name: Stopwords
Description: The stopword options configured for this search domain and the current status of those options.
Type: StopwordOptionsStatus (p. 167)

DescribeSynonymOptionsResult
Description
A response message that contains the synonym options for a search domain.

Contents
Name: Synonyms
Description: The synonym options configured for this search domain and the current status of those options.
Type: SynonymOptionsStatus (p. 168)

DomainStatus
Description
The current status of the search domain.

Contents
Name: Created
Description: True if the search domain is created. It can take several minutes to initialize a domain when CreateDomain (p. 128) is called. Newly created search domains are returned from DescribeDomains (p. 136) with a false value for Created until domain creation is complete.
Type: Boolean

Name: Deleted
Description: True if the search domain has been deleted. The system must clean up resources dedicated to the search domain when DeleteDomain (p. 132) is called. Newly deleted search domains are returned from DescribeDomains (p. 136) with a true value for Deleted for several minutes until resource cleanup is complete.
Type: Boolean

Name: DocService
Description: The service endpoint for updating documents in a search domain.
Type: ServiceEndpoint (p. 165)

Name: DomainId
Description: An internally generated unique identifier for a domain.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.

Name: DomainName
Description: A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
Type: String
Length constraints: Minimum length of 3. Maximum length of 28.

Name: NumSearchableDocs
Description: The number of documents that have been submitted to the domain and indexed.
Type: Integer

Name: Processing
Description: True if processing is being done to activate the current domain configuration.
Type: Boolean

Name: RequiresIndexDocuments
Description: True if IndexDocuments (p. 143) needs to be called to activate the current domain configuration.
Type: Boolean

Name: SearchInstanceCount
Description: The number of search instances that are available to process search requests.
Type: Integer

Name: SearchInstanceType
Description: The instance type that is being used to process search requests.
Type: String
Valid Values: SearchInstance:t1.micro | SearchInstance:m1.small | SearchInstance:m1.large | SearchInstance:m2.xlarge

Name: SearchPartitionCount
Description: The number of partitions across which the search index is spread.
Type: Integer

Name: SearchService
Description: The service endpoint for requesting search results from a search domain.
Type: ServiceEndpoint (p. 165)
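The Boolean elements of DomainStatus can be combined to decide what a client should do next after calling DescribeDomains. The sketch below models only those four flags. The class, the function, the returned labels, and the order in which the flags are checked are illustrative assumptions rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class DomainFlags:
    """Subset of the DomainStatus elements relevant to readiness checks."""
    created: bool
    deleted: bool
    processing: bool
    requires_index_documents: bool


def next_step(flags: DomainFlags) -> str:
    """Return a hypothetical label describing what the caller should do next."""
    if flags.deleted:
        return "deleted"
    if not flags.created:
        return "wait-for-creation"
    if flags.requires_index_documents:
        return "call-IndexDocuments"
    if flags.processing:
        return "wait-for-processing"
    return "ready"
```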

IndexDocumentsResult
Description
The result of an IndexDocuments action.


Contents
Name: FieldNames
Description: The names of the fields that are currently being processed due to an IndexDocuments action.
Type: String list

IndexField
Description
Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType.

Contents
Name: IndexFieldName
Description: The name of a field in the search index. Field names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.

Name: IndexFieldType
Description: The type of field. Based on this type, exactly one of the UIntOptions (p. 169), LiteralOptions (p. 162) or TextOptions (p. 168) must be present.
Type: String
Valid Values: uint | literal | text

Name: LiteralOptions
Description: Options for literal field. Present if IndexFieldType specifies the field is of type literal.
Type: LiteralOptions (p. 162)

Name: SourceAttributes
Description: An optional list of source attributes that provide data for this index field. If not specified, the data is pulled from a source attribute with the same name as this IndexField. When one or more source attributes are specified, an optional data transformation can be applied to the source data when populating the index field. You can configure a maximum of 20 sources for an IndexField.
Type: SourceAttribute (p. 165) list

Name: TextOptions
Description: Options for text field. Present if IndexFieldType specifies the field is of type text.
Type: TextOptions (p. 168)

Name: UIntOptions
Description: Options for an unsigned integer field. Present if IndexFieldType specifies the field is of type unsigned integer.
Type: UIntOptions (p. 169)
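The naming rules and the relationship between IndexFieldType and the options elements lend themselves to a local check before an index field is defined. The following sketch represents an IndexField as a plain dictionary keyed by the element names above. The dictionary representation, the function name, and the problem messages are assumptions for illustration, and the check only flags options that belong to a different type.

```python
import re

# Derived from the documented rules: begins with a letter, then lowercase
# letters, digits, or underscores, 1 to 64 characters in total.
FIELD_NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")
RESERVED_NAMES = {"body", "docid", "text_relevance"}
OPTIONS_FOR_TYPE = {"uint": "UIntOptions", "literal": "LiteralOptions", "text": "TextOptions"}
MAX_SOURCES = 20


def validate_index_field(field: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    name = field.get("IndexFieldName", "")
    if FIELD_NAME_RE.fullmatch(name) is None:
        problems.append("invalid IndexFieldName")
    elif name in RESERVED_NAMES:
        problems.append("IndexFieldName is reserved")
    field_type = field.get("IndexFieldType")
    if field_type not in OPTIONS_FOR_TYPE:
        problems.append("IndexFieldType must be uint, literal, or text")
    else:
        # Options for any other type are invalid for this field.
        for type_name, options_key in OPTIONS_FOR_TYPE.items():
            if type_name != field_type and options_key in field:
                problems.append(f"{options_key} not allowed for type {field_type}")
    if len(field.get("SourceAttributes", [])) > MAX_SOURCES:
        problems.append("more than 20 SourceAttributes")
    return problems
```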


IndexFieldStatus
Description
The value of an IndexField and its current status.

Contents
Name: Options
Description: Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType.
Type: IndexField (p. 161)

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)

LiteralOptions
Description
Options that define a literal field in the search index.

Contents
Name: DefaultValue
Description: The default value for a literal field. Optional.
Type: String
Length constraints: Minimum length of 0. Maximum length of 1024.

Name: FacetEnabled
Description: Specifies whether facets are enabled for this field.
Default: False.
Type: Boolean

Name: ResultEnabled
Description: Specifies whether values of this field can be returned in search results and used for ranking.
Default: False.
Type: Boolean

Name: SearchEnabled
Description: Specifies whether search is enabled for this field.
Default: False.
Type: Boolean

NamedRankExpression
Description
A named expression that can be evaluated at search time and used for ranking or thresholding in a search query.


Contents
Name: RankExpression
Description: The expression to evaluate for ranking or thresholding while processing a search request. The RankExpression syntax is based on JavaScript expressions and supports:
Integer, floating point, hex and octal literals
Shortcut evaluation of logical operators such that an expression a || b evaluates to the value a if a is true without evaluating b at all
JavaScript order of precedence for operators
Arithmetic operators: + - * / %
Boolean operators (including the ternary operator)
Bitwise operators
Comparison operators
Common mathematical functions: abs ceil erf exp floor lgamma ln log2 log10 max min sqrt pow
Trigonometric library functions: acosh acos asinh asin atanh atan cosh cos sinh sin tanh tan
Random generation of a number between 0 and 1: rand
Current time in epoch: time
The min max functions that operate on a variable argument list
Intermediate results are calculated as double precision floating point values. The final return value of a RankExpression is automatically converted from floating point to a 32-bit unsigned integer by rounding to the nearest integer, with a natural floor of 0 and a ceiling of max(uint32_t), 4294967295. Mathematical errors such as dividing by 0 will fail during evaluation and return a value of 0.
The source data for a RankExpression can be the name of an IndexField of type uint, another RankExpression or the reserved name text_relevance. The text_relevance source is defined to return an integer from 0 to 1000 (inclusive) to indicate how relevant a document is to the search request, taking into account repetition of search terms in the document and proximity of search terms to each other in each matching IndexField in the document.
For more information about using rank expressions to customize ranking, see the Amazon CloudSearch Developer Guide.
Type: String
Length constraints: Minimum length of 1. Maximum length of 10240.

Name: RankName
Description: The name of a rank expression. Rank expression names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.
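The conversion of a RankExpression result to a 32-bit unsigned integer, as described for the RankExpression element, can be approximated as follows. This is a sketch for reasoning about expression output, not the service's implementation: the function name is hypothetical, and the handling of exact ties (rounded up here) and of NaN (mapped to 0, like other mathematical errors) are assumptions.

```python
import math

UINT32_MAX = 4294967295  # ceiling stated in the RankExpression description


def to_rank_value(result: float) -> int:
    """Convert a floating-point expression result to the documented uint32 range."""
    if math.isnan(result):
        return 0  # assumption: NaN is treated like other mathematical errors
    if result <= 0:
        return 0  # natural floor of 0
    if result >= UINT32_MAX:
        return UINT32_MAX  # ceiling of max(uint32_t)
    # Round half up; how exact ties are resolved is an assumption.
    return int(math.floor(result + 0.5))
```

For instance, an expression whose intermediate result is negative contributes a rank of 0, so subtracting a large penalty from text_relevance cannot produce a negative rank.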


OptionStatus
Description
The status of an option, including when it was last updated and whether it is actively in use for searches.

Contents
Name: CreationDate
Description: A timestamp for when this option was created.
Type: DateTime

Name: State
Description: The state of processing a change to an option. Possible values:
RequiresIndexDocuments: the option's latest value will not be visible in searches until IndexDocuments (p. 143) has been called and indexing is complete.
Processing: the option's latest value is not yet visible in all searches but is in the process of being activated.
Active: the option's latest value is completely visible.
Type: String
Valid Values: RequiresIndexDocuments | Processing | Active

Name: UpdateDate
Description: A timestamp for when this option was last updated.
Type: DateTime

Name: UpdateVersion
Description: A unique integer that indicates when this option was last updated.
Type: Integer

RankExpressionStatus
Description
The value of a RankExpression and its current status.

Contents
Options
    The expression that is evaluated for ranking or thresholding while processing a search request.
    Type: NamedRankExpression (p. 162)

Status
    The status of an option, including when it was last updated and whether it is actively in use for searches.
    Type: OptionStatus (p. 164)


ServiceEndpoint
Description
The endpoint to which service requests can be submitted, including the actual URL prefix for sending requests and the Amazon Resource Name (ARN) so the endpoint can be referenced in other API calls such as UpdateServiceAccessPolicies (p. 146).

Contents
Arn
    An Amazon Resource Name (ARN). See Identifiers for IAM Entities in Using AWS Identity and Access Management for more information.
    Type: String

Endpoint
    The URL (including /version/pathPrefix) to which service requests can be submitted.
    Type: String

SourceAttribute
Description
Identifies the source data for an index field. An optional data transformation can be applied to the source data when populating the index field. By default, the value of the source attribute is copied to the index field.

Contents
SourceDataCopy
    Copies data from a source document attribute to an IndexField.
    Type: SourceData (p. 166)

SourceDataFunction
    Identifies the transformation to apply when copying data from a source attribute.
    Type: String
    Valid Values: Copy | TrimTitle | Map

SourceDataMap
    Maps source document attribute values to new values when populating the IndexField.
    Type: SourceDataMap (p. 166)

SourceDataTrimTitle
    Trims common title words from a source document attribute when populating an IndexField. This can be used to create an IndexField you can use for sorting.
    Type: SourceDataTrimTitle (p. 166)


SourceData
Description
The source attribute name and an optional default value to use if a document doesn't have an attribute of that name.

Contents
DefaultValue
    An optional default value to use if the source attribute is not specified in a document.
    Type: String
    Length constraints: Minimum length of 0. Maximum length of 1024.

SourceName
    The name of the document source field to add to this IndexField.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.

SourceDataMap
Description
Specifies how to map source attribute values to custom values when populating an IndexField.

Contents
Cases
    A map that translates source field values to custom values.
    Type: String to String map

DefaultValue
    An optional default value to use if the source attribute is not specified in a document.
    Type: String
    Length constraints: Minimum length of 0. Maximum length of 1024.

SourceName
    The name of the document source field to add to this IndexField.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.

SourceDataTrimTitle
Description
Specifies how to trim common words from the beginning of a field to enable title sorting by that field.


Contents
DefaultValue
    An optional default value to use if the source attribute is not specified in a document.
    Type: String
    Length constraints: Minimum length of 0. Maximum length of 1024.

Language
    An IETF RFC 4646 language code. Only the primary language is considered. English (en) is currently the only supported language.
    Type: String

Separator
    The separator that follows the text to trim.
    Type: String

SourceName
    The name of the document source field to add to this IndexField.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.

StemmingOptionsStatus
Description
The stemming options configured for this search domain and the current status of those options.

Contents
Options
    Maps terms to their stems, serialized as a JSON document. The document has a single object with one property "stems" whose value is an object mapping terms to their stems. The maximum size of a stemming document is 500KB.
    Example: { "stems": {"people": "person", "walking": "walk"} }
    Type: String

Status
    The status of an option, including when it was last updated and whether it is actively in use for searches.
    Type: OptionStatus (p. 164)

StopwordOptionsStatus
Description
The stopword options configured for this search domain and the current status of those options.


Contents
Options
    Lists stopwords serialized as a JSON document. The document has a single object with one property "stopwords" whose value is an array of strings. The maximum size of a stopwords document is 10KB.
    Example: { "stopwords": ["a", "an", "the", "of"] }
    Type: String

Status
    The status of an option, including when it was last updated and whether it is actively in use for searches.
    Type: OptionStatus (p. 164)

SynonymOptionsStatus
Description
The synonym options configured for this search domain and the current status of those options.

Contents
Options
    Maps terms to their synonyms, serialized as a JSON document. The document has a single object with one property "synonyms" whose value is an object mapping terms to their synonyms. Each synonym is a simple string or an array of strings. The maximum size of a synonyms document is 100KB.
    Example: { "synonyms": {"cat": ["feline", "kitten"], "puppy": "dog"} }
    Type: String

Status
    The status of an option, including when it was last updated and whether it is actively in use for searches.
    Type: OptionStatus (p. 164)
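The stemming, stopword, and synonym options are all JSON documents with a single top-level property and a documented size limit. The following Python sketch serializes such a document and checks its size before it is submitted. The helper name is hypothetical, and two details are assumptions: that "KB" means 1024 bytes and that size is measured on the UTF-8 encoding of the serialized string.

```python
import json

# Documented limits per option type; 1 KB = 1024 bytes is an assumption.
SIZE_LIMITS = {
    "stems": 500 * 1024,
    "stopwords": 10 * 1024,
    "synonyms": 100 * 1024,
}


def build_options_document(kind: str, value) -> str:
    """Serialize {kind: value} and enforce the documented size limit."""
    if kind not in SIZE_LIMITS:
        raise ValueError(f"unknown option type: {kind!r}")
    document = json.dumps({kind: value}, ensure_ascii=False)
    size = len(document.encode("utf-8"))  # assumption: limit applies to UTF-8 bytes
    if size > SIZE_LIMITS[kind]:
        raise ValueError(f"{kind} document is {size} bytes; limit is {SIZE_LIMITS[kind]}")
    return document


stems = build_options_document("stems", {"people": "person", "walking": "walk"})
synonyms = build_options_document("synonyms", {"cat": ["feline", "kitten"], "puppy": "dog"})
```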

TextOptions
Description
Options that define a text field in the search index.

Contents
DefaultValue
    The default value for a text field. Optional.
    Type: String
    Length constraints: Minimum length of 0. Maximum length of 1024.

FacetEnabled
    Specifies whether facets are enabled for this field. Default: False.
    Type: Boolean

ResultEnabled
    Specifies whether values of this field can be returned in search results and used for ranking. Default: False.
    Type: Boolean

UIntOptions
Description
Options that define a uint field in the search index.

Contents
DefaultValue
    The default value for an unsigned integer field. Optional.
    Type: Integer

UpdateDefaultSearchFieldResult
Description
A response message that contains the status of an updated default search field.

Contents
DefaultSearchField
    The value of the DefaultSearchField configured for this search domain and its current status.
    Type: DefaultSearchFieldStatus (p. 155)

UpdateServiceAccessPoliciesResult
Description
A response message that contains the status of updated access policies.


Contents
AccessPolicies
    A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
    Type: AccessPoliciesStatus (p. 154)

UpdateStemmingOptionsResult
Description
A response message that contains the status of updated stemming options.

Contents
Stems
    The stemming options configured for this search domain and the current status of those options.
    Type: StemmingOptionsStatus (p. 167)

UpdateStopwordOptionsResult
Description
A response message that contains the status of updated stopword options.

Contents
Stopwords
    The stopword options configured for this search domain and the current status of those options.
    Type: StopwordOptionsStatus (p. 167)

UpdateSynonymOptionsResult
Description
A response message that contains the status of updated synonym options.


Contents
Synonyms
    The synonym options configured for this search domain and the current status of those options.
    Type: SynonymOptionsStatus (p. 168)

Common Query Parameters


This section lists the request parameters that all actions use. Any action-specific parameters are listed in the topic for the action.

Action
    The action to perform.
    Default: None
    Type: String
    Required: Yes

AuthParams
    The parameters required to authenticate a query request. Contains: AWSAccessKeyId, SignatureVersion, Timestamp, Signature.
    Default: None
    Required: Conditional

AWSAccessKeyId
    The Access Key ID corresponding to the AWS Secret Access Key you used to sign the request.
    Default: None
    Type: String
    Required: Yes

Expires
    The date and time at which the request signature expires, in the format YYYY-MM-DDThh:mm:ssZ, as specified in the ISO 8601 standard.
    Condition: Requests must include either Timestamp or Expires, but not both.
    Default: None
    Type: String
    Required: Conditional

SecurityToken
    The temporary security token obtained through a call to AWS Security Token Service. Only available for actions in the following AWS services: Amazon EC2, Amazon Simple Notification Service, Amazon SQS, and Amazon SimpleDB.
    Default: None
    Type: String

Signature
    The digital signature you created for the request. Refer to the service's developer documentation for information about how to generate the signature.
    Default: None
    Type: String
    Required: Yes

SignatureMethod
    The hash algorithm you used to create the request signature.
    Default: None
    Valid Values: HmacSHA256 | HmacSHA1
    Type: String
    Required: Yes

SignatureVersion
    The signature version you use to sign the request. Set this to the value recommended in your product-specific documentation on security.
    Default: None
    Type: String
    Required: Yes

Timestamp
    The date and time the request was signed, in the format YYYY-MM-DDThh:mm:ssZ, as specified in the ISO 8601 standard.
    Condition: Requests must include either Timestamp or Expires, but not both.
    Default: None
    Type: String
    Required: Conditional

Version
    The API version to use, in the format YYYY-MM-DD.
    Default: None
    Type: String
    Required: Yes

Common Errors
This section lists the common errors that all actions return. Any action-specific errors are listed in the topic for the action.

IncompleteSignature
    The request signature does not conform to AWS standards.
    HTTP Status Code: 400

InternalFailure
    The request processing has failed due to some unknown error, exception, or failure.
    HTTP Status Code: 500

InvalidAction
    The action or operation requested is invalid.
    HTTP Status Code: 400

InvalidClientTokenId
    The X.509 certificate or AWS Access Key ID provided does not exist in our records.
    HTTP Status Code: 403

InvalidParameterCombination
    Parameters that must not be used together were used together.
    HTTP Status Code: 400

InvalidParameterValue
    A bad or out-of-range value was supplied for the input parameter.
    HTTP Status Code: 400

InvalidQueryParameter
    AWS query string is malformed, does not adhere to AWS standards.
    HTTP Status Code: 400

MalformedQueryString
    The query string is malformed.
    HTTP Status Code: 404

MissingAction
    The request is missing an action or operation parameter.
    HTTP Status Code: 400

MissingAuthenticationToken
    Request must contain either a valid (registered) AWS Access Key ID or X.509 certificate.
    HTTP Status Code: 403

MissingParameter
    An input parameter that is mandatory for processing the request is not supplied.
    HTTP Status Code: 400

OptInRequired
    The AWS Access Key ID needs a subscription for the service.
    HTTP Status Code: 403

RequestExpired
    Request is past expires date or the request date (either with 15 minute padding), or the request date occurs more than 15 minutes in the future.
    HTTP Status Code: 400

ServiceUnavailable
    The request has failed due to a temporary failure of the server.
    HTTP Status Code: 503

Throttling
    Request was denied due to request throttling.
    HTTP Status Code: 400


Amazon CloudSearch Document Service API Reference


Topics
- documents/batch (p. 174)

You use the document service API to manage the documents in your Amazon CloudSearch domain. The documents in your domain are automatically indexed and made searchable. You access the document service API through a domain-specific endpoint, http://doc-domainname-domainid.us-east-1.cloudsearch.amazonaws.com.

You use the Amazon CloudSearch document service API to:
- Add new documents to your search domain
- Replace existing documents in your search domain
- Remove existing documents from your search domain

Note
The document service API is a REST-style API that has a single resource, documents/batch. The API version must be specified in all requests. The current Amazon CloudSearch API version is 2011-02-01.

The other APIs you use to interact with Amazon CloudSearch are:
- Amazon CloudSearch Configuration API Reference (p. 126): Set up and manage your search domain.
- Amazon CloudSearch Search API Reference (p. 184): Search your domain.

documents/batch
This section describes the HTTP request and response messages for the documents/batch resource. You use the documents/batch resource to submit data to your search domain for indexing. It is accessed through a domain's document service endpoint at /2011-02-01/documents/batch. All requests must be submitted using HTTP POST.


Requests can only be submitted to your search domain's document service from authorized IP addresses. For information about authorizing IP addresses to submit document service requests, see Configuring Access for an Amazon CloudSearch Domain (p. 32). For more information about submitting data for indexing, see Uploading Data to an Amazon CloudSearch Domain (p. 72).

documents/batch JSON API


JSON documents/batch Requests
The body of a documents/batch request uses SDF to specify the document operations you want to perform. An SDF JSON representation of a batch is a collection of objects that define individual add and delete operations. The type property identifies whether an object represents an add or delete operation. For example, the following JSON SDF batch adds one document and deletes one document:
[
  {
    "type": "add",
    "id": "tt0484562",
    "version": 1,
    "lang": "en",
    "fields": {
      "title": "The Seeker: The Dark Is Rising",
      "director": "Cunningham, David L.",
      "genre": ["Adventure", "Drama", "Fantasy", "Thriller"],
      "actor": ["McShane, Ian", "Eccleston, Christopher", "Conroy, Frances",
                "Crewson, Wendy", "Ludwig, Alexander", "Cosmo, James",
                "Warner, Amelia", "Hickey, John Benjamin", "Piddock, Jim",
                "Lockhart, Emma"]
    }
  },
  {
    "type": "delete",
    "id": "tt0484575",
    "version": 2
  }
]

Note
When specifying SDF in JSON, the value for a field cannot be null.

An add or delete operation is only applied to an existing document if the version number specified in the operation is greater than the existing document's version number. If a batch contains multiple add or delete operations for the same document, the operation with the highest version number is applied. (If multiple operations in a batch specify the same document and version number, the document service arbitrarily picks which one to apply.)

The JSON schema representation of a batch is shown below:
{
  "type": "array",
  "minItems": 1,
  "items": {
    "type": "object",
    "properties": {
      "type": {
        "type": "string",
        "enum": ["add", "delete"],
        "required": true
      },
      "id": {
        "type": "string",
        "pattern": "[a-z0-9][a-z0-9_]{0,127}",
        "minLength": 1,
        "maxLength": 128,
        "required": true
      },
      "version": {
        "type": "number",
        "minimum": 1,
        "maximum": 4294967295,
        "required": true
      },
      "lang": {
        "type": "string",
        "minLength": 2,
        "maxLength": 2
      },
      "fields": {
        "type": "object",
        "patternProperties": {
          "[a-zA-Z0-9][a-zA-Z0-9_]{0,63}": {
            "type": "string"
          }
        }
      }
    }
  }
}
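The version rules for a batch (highest version wins within the batch, and an operation applies to an existing document only if its version is greater than the stored one) can be illustrated with a small client-side simulation in Python. This is a sketch of the documented rules, not the service's implementation: the function name and the `current_versions` mapping are hypothetical, and where the guide says ties are resolved arbitrarily, this sketch simply keeps the first operation seen.

```python
def resolve_batch(operations, current_versions):
    """Return the operations that would take effect under the documented rules.

    operations: list of dicts with "type", "id", and "version" keys.
    current_versions: hypothetical mapping of document id -> stored version.
    """
    winners = {}
    for op in operations:
        best = winners.get(op["id"])
        # Highest version per document wins; ties keep the first seen (arbitrary).
        if best is None or op["version"] > best["version"]:
            winners[op["id"]] = op
    applied = []
    for doc_id, op in winners.items():
        # For an existing document, the new version must be strictly greater.
        if doc_id not in current_versions or op["version"] > current_versions[doc_id]:
            applied.append(op)
    return applied


ops = [
    {"type": "add", "id": "a", "version": 2},
    {"type": "delete", "id": "a", "version": 3},
    {"type": "add", "id": "b", "version": 1},
]
applied = resolve_batch(ops, {"b": 1})
```

In this example only the delete of document a at version 3 takes effect: it outranks the add at version 2, and the add for document b does not exceed the stored version 1.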

documents/batch Request Properties (JSON)


type
    The operation type, add or delete.
    Required: Yes

id
    An alphanumeric string. Allowed characters are: a-z (lower-case letters), 0-9, and _ (underscore). Document IDs cannot begin with an underscore. The max length is 128 characters.
    Required: Yes

version
    Any non-negative number less than 2^32.
    Required: Yes

lang
    An ISO-639-1 two-letter language code. English (en) is currently the only supported language.
    Condition: Required for add operations.
    Required: Conditional

fields
    A collection of one or more field_name properties that define the fields the document contains.
    Condition: Required for add operations. Must contain at least one field_name property.
    Required: Conditional

field_name
    Specifies a field within the document being added. Field names must begin with a letter and can contain the following characters: a-z (lower case), 0-9, and _ (underscore). Field names must be at least 3 and no more than 64 characters. The names "body", "docid", and "text_relevance" are reserved names and cannot be used as field names.
    To specify multiple values for a field, you specify an array of values instead of a single value. For example:
    "genre": ["Adventure","Drama","Fantasy","Thriller"]
    Condition: At least one field must be specified in the fields object.
    Required: Conditional
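As an illustration of the request properties above, the following Python sketch builds add and delete operations and serializes them as a JSON SDF batch. The helper names are hypothetical. The validation mirrors the id pattern and version range from the JSON schema; note that the schema's minimum of 1 is stricter than the "non-negative" wording in the property table, and this sketch follows the schema.

```python
import json
import re

DOC_ID_RE = re.compile(r"[a-z0-9][a-z0-9_]{0,127}")  # pattern taken from the schema
MAX_VERSION = 4294967295


def _check_id_and_version(doc_id, version):
    if not DOC_ID_RE.fullmatch(doc_id):
        raise ValueError(f"invalid document id: {doc_id!r}")
    if not (isinstance(version, int) and 1 <= version <= MAX_VERSION):
        raise ValueError(f"invalid version: {version!r}")


def add_op(doc_id, version, fields, lang="en"):
    """Build an add operation; fields must be non-empty and contain no nulls."""
    _check_id_and_version(doc_id, version)
    if not fields or any(value is None for value in fields.values()):
        raise ValueError("an add needs at least one field and no null values")
    return {"type": "add", "id": doc_id, "version": version, "lang": lang, "fields": fields}


def delete_op(doc_id, version):
    """Build a delete operation."""
    _check_id_and_version(doc_id, version)
    return {"type": "delete", "id": doc_id, "version": version}


def to_batch_body(operations):
    """Serialize a non-empty list of operations as the request body."""
    if not operations:
        raise ValueError("a batch must contain at least one operation")
    return json.dumps(operations).encode("utf-8")


body = to_batch_body([
    add_op("tt0484562", 1, {"title": "The Seeker: The Dark Is Rising",
                            "genre": ["Adventure", "Drama"]}),
    delete_op("tt0484575", 2),
])
```

The resulting bytes would be sent with HTTP POST to the domain's /2011-02-01/documents/batch resource with a Content-Type of application/json.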

documents/batch Response (JSON)


The response body lists the number of adds and deletes that were performed and any errors or warnings that were generated. The JSON schema representation of a document service API response is shown below:
{
  "type": "object",
  "properties": {
    "status": {
      "type": "text",
      "enum": ["success", "error"],
      "required": true
    },
    "adds": {
      "type": "integer",
      "minimum": 0,
      "required": true
    },
    "deletes": {
      "type": "integer",
      "minimum": 0,
      "required": true
    },
    "errors": {
      "type": "array",
      "required": false,
      "items": {
        "type": "object",
        "properties": {
          "message": {
            "type": "string",
            "required": true
          }
        }
      }
    },
    "warnings": {
      "type": "array",
      "required": false,
      "items": {
        "type": "object",
        "properties": {
          "message": {
            "type": "string",
            "required": true
          }
        }
      }
    }
  }
}

documents/batch Response Properties (JSON)


status
    The result status, which is either success or error.

adds
    The number of add document operations that were performed. Always zero when the status is error.

deletes
    The number of delete document operations that were performed. Always zero when the status is error.

error
    Provides information about a parsing or validation error. Specified only if the status is error.

warning
    Provides information about a warning generated during parsing or validation.
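A client might summarize a JSON response along these lines. This is a minimal Python sketch based on the response schema; the function name and the sample response body are hypothetical, not captured service output.

```python
import json


def summarize_response(body: str) -> dict:
    """Extract the counts and messages described by the response schema."""
    response = json.loads(body)
    return {
        "ok": response["status"] == "success",
        "adds": response["adds"],
        "deletes": response["deletes"],
        # errors and warnings are optional arrays of {"message": ...} objects
        "errors": [item["message"] for item in response.get("errors", [])],
        "warnings": [item["message"] for item in response.get("warnings", [])],
    }


# Hypothetical response body, shaped according to the schema.
sample = '{"status": "error", "adds": 0, "deletes": 0, "errors": [{"message": "Invalid id"}]}'
summary = summarize_response(sample)
```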

documents/batch XML API


XML documents/batch Requests
The body of a documents/batch request uses the Search Data Format (SDF) to specify the document operations you want to perform. For example:
<batch>
  <add id="tt0484562" version="1" lang="en">
    <field name="title">The Seeker: The Dark Is Rising</field>
    <field name="director">Cunningham, David L.</field>
    <field name="genre">Adventure</field>
    <field name="genre">Drama</field>
    <field name="genre">Fantasy</field>
    <field name="genre">Thriller</field>
    <field name="actor">McShane, Ian</field>
    <field name="actor">Eccleston, Christopher</field>
    <field name="actor">Conroy, Frances</field>
    <field name="actor">Ludwig, Alexander</field>
    <field name="actor">Crewson, Wendy</field>
    <field name="actor">Warner, Amelia</field>
    <field name="actor">Cosmo, James</field>
    <field name="actor">Hickey, John Benjamin</field>
    <field name="actor">Piddock, Jim</field>
    <field name="actor">Lockhart, Emma</field>
  </add>
  <delete id="tt0301199" version="1" />
</batch>

The Relax NG schema for an XML representation of a batch is shown below:


start = batch

sdf.field_name = xsd:token {
  minLength = "1"
  maxLength = "64"
  pattern = "[a-z0-9][a-z0-9_]{0,63}"
}
sdf.id = xsd:token {
  minLength = "1"
  maxLength = "128"
  pattern = "[a-z0-9][a-z0-9_]{0,127}"
}
sdf.version = xsd:integer {
  minExclusive = "0"
  maxExclusive = "4294967295"
}

batch = element batch {
  (element add {
     attribute id { sdf.id },
     attribute version { sdf.version },
     attribute lang { xsd:language },
     element field {
       attribute name { sdf.field_name },
       text
     }+
   }
   | element delete {
     attribute id { sdf.id },
     attribute version { sdf.version },
     empty
   })+
}


documents/batch Request Elements (XML)


batch
    The collection of add or delete operations that you want to submit to your search domain. A batch must contain at least one add or delete element.
    Required: Yes

add
    Specifies a document that you want to add to your search domain. The id, version, and lang attributes are required and an add element must contain at least one field.
    Attributes:
    - id: An alphanumeric string. Any characters other than A-Z (upper or lower case) and 0-9 are illegal. The max length is 128 characters.
    - version: Any non-negative number less than 2^32.
    - lang: An ISO-639-1 two-letter language code. English (en) is currently the only supported language.
    Required: No

field
    Specifies a field in the document being added. The name attribute and a field value are required. Field names must begin with a letter and can contain the following characters: a-z (lower case), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be used as field names. The field value can be text or CDATA.
    To specify multiple values for a field, you include multiple field elements with the same name. For example:
    <field name="genre">Adventure</field>
    <field name="genre">Drama</field>
    <field name="genre">Fantasy</field>
    <field name="genre">Thriller</field>
    Constraints:
    - name: An alphanumeric string that begins with a letter. Can contain a-z (lower case), 0-9, _ (underscore), - (hyphen), and . (period).
    Condition: At least one field must be specified in an add element.
    Required: Conditional

delete
    Specifies a document that you want to remove from your search domain. The id and version attributes are required. A delete element must be empty.
    Constraints:
    - id: An alphanumeric string. Any characters other than A-Z (upper or lower case) and 0-9 are illegal.
    - version: Any number less than 2^32. The version number specified must be higher than the document's current version number for the document to be deleted.
    Required: No
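The XML form of a batch can be generated with Python's standard xml.etree.ElementTree module, which also takes care of escaping field text. This is a sketch only: the function name and the shape of the input dictionaries are hypothetical, and multi-valued fields are emitted as repeated field elements as described above.

```python
import xml.etree.ElementTree as ET


def build_xml_batch(adds, deletes):
    """Build an SDF XML batch from plain dictionaries (hypothetical input shape)."""
    batch = ET.Element("batch")
    for doc in adds:
        add = ET.SubElement(
            batch, "add",
            id=doc["id"], version=str(doc["version"]), lang=doc.get("lang", "en"),
        )
        for name, value in doc["fields"].items():
            # A list becomes several <field> elements with the same name.
            values = value if isinstance(value, list) else [value]
            for item in values:
                ET.SubElement(add, "field", name=name).text = str(item)
    for doc in deletes:
        ET.SubElement(batch, "delete", id=doc["id"], version=str(doc["version"]))
    return ET.tostring(batch, encoding="unicode")


xml_body = build_xml_batch(
    adds=[{"id": "tt0484562", "version": 1,
           "fields": {"title": "The Seeker: The Dark Is Rising",
                      "genre": ["Adventure", "Drama"]}}],
    deletes=[{"id": "tt0301199", "version": 1}],
)
```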

documents/batch Response (XML)


The response body lists the number of adds and deletes that were performed and any errors or warnings that were generated. The RelaxNG schema of a document service API response is:
start = response

response = element response {
  attribute status { "success" | "error" },
  attribute adds { xsd:integer },
  attribute deletes { xsd:integer },
  element errors {
    element error { text }+
  }? &
  element warnings {
    element warning { text }+
  }?
}


documents/batch Response Elements (XML)


result
    Contains elements that list the errors and warnings generated when parsing and validating the request.
    Attributes:
    - status: The result status, which is either success or error.
    - adds: The number of added documents. If the status is error, this is always zero.
    - deletes: The number of deleted documents. If the status is error, this is always zero.
    Constraints: If the status is error, the results element contains a list of errors. If the status is success, the results element can contain a list of warnings, but no errors.

errors
    Contains a collection of error elements that identify the errors that occurred when parsing and validating the request.

error
    Provides information about a parsing or validation error. The value provides a description of the error.

warnings
    Contains a collection of warning elements that identify the warnings that were generated when parsing and validating the request.

warning
    Provides information about a parsing or validation warning. The value provides a description of the warning.
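An XML response can be read with xml.etree.ElementTree in much the same way. The sketch below follows the RelaxNG schema, where the root element is named response and carries the status, adds, and deletes attributes; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(body: str) -> dict:
    """Collect the attributes and messages described by the response schema."""
    root = ET.fromstring(body)
    return {
        "status": root.get("status"),
        "adds": int(root.get("adds")),
        "deletes": int(root.get("deletes")),
        # iter() matches exact tag names, so "error" does not match "errors".
        "errors": [element.text for element in root.iter("error")],
        "warnings": [element.text for element in root.iter("warning")],
    }


# Hypothetical response body, shaped according to the schema.
sample_xml = (
    '<response status="error" adds="0" deletes="0">'
    "<errors><error>Invalid document id</error></errors>"
    "</response>"
)
parsed_response = parse_xml_response(sample_xml)
```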

documents/batch Status Codes


A document service request can return three types of status codes:
- 5xx status codes indicate that there was an internal server error.
- 4xx status codes indicate that the request was malformed.
- 2xx status codes indicate that the request was processed successfully.

No Content-Type
    The Content-Type header is missing.
    HTTP Status Code: 400

No Content-Length
    The Content-Length header is missing.
    HTTP Status Code: 401

Incorrect Path
    URL path does not match "/YYYY-MM-DD/documents/batch".
    HTTP Status Code: 404

Invalid HTTP Method
    The HTTP method is not POST. Requests must be posted to documents/batch.
    HTTP Status Code: 405

Invalid Accept Type
    Accept header specifies a content type other than "application/xml" or "application/json". Responses can be sent only as XML or JSON.
    HTTP Status Code: 406

Request Too Large
    The length of the request body is larger than the maximum allowed value.
    HTTP Status Code: 413

Invalid Character Set
    The character set is something other than "ASCII", "ISO-8859-1", or "UTF-8".
    HTTP Status Code: 415

Common Request Headers


Content-Type
    A standard MIME type describing the format of the object data. For more information, see W3C RFC 2616 Section 14.
    Default: application/json
    Constraints: application/json or application/xml only
    Required: Yes

Content-Length
    The length in bytes of the body in the request.
    Required: Yes

Accept
    A standard MIME type describing the format of the response data. For more information, see W3C RFC 2616 Section 14.
    Default: the content-type of the request
    Constraints: application/json or application/xml only
    Required: No

Common Response Headers

Content-Type
    A standard MIME type describing the format of the object data. For more information, see W3C RFC 2616 Section 14.
    Default: application/xml
    Constraints: application/xml or application/json only

Content-Length
    The length in bytes of the body in the response.


Amazon CloudSearch Search API Reference


Topics
- search (p. 184)

You use the Search API to submit search requests to your CloudSearch domain. You access the Search API through a domain-specific endpoint, http://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com. The API version must be specified in all requests. The current Amazon CloudSearch API version is 2011-02-01.

The other APIs you use to interact with Amazon CloudSearch are:
- Amazon CloudSearch Configuration API Reference (p. 126): Set up and manage your search domain.
- Amazon CloudSearch Document Service API Reference (p. 174): Submit the data you want to search.

search
You use the search API to search the documents that you've uploaded to your search domain. Search requests are submitted via GET with a set of field-value pairs specified directly in the HTTP query string. The maximum size of a search request is 8190 bytes, including the HTTP method, URI, and protocol version. The response format can be either JSON or XML. (Errors are always returned in JSON.)

Note
Requests can be submitted to your search domain's search service only from authorized IP addresses. For information about authorizing IP addresses to submit search requests, see Configuring Access for an Amazon CloudSearch Domain (p. 32).

Amazon CloudSearch processes search requests in two phases. First, it identifies the complete set of documents that match the terms specified with the q (query) and bq (Boolean query) Search Request Parameters (p. 186). Amazon CloudSearch then processes the match-set of search hits to:
- Filter the hits according to the value of the t-FIELD parameter (if specified).
- Rank the filtered hits using the fields specified in the rank parameter. If the rank parameter is not specified, results are ranked according to their text_relevance scores.
- Compute facet counts for the fields specified in the facet parameter and the constraints specified for each field (if any).
- Return the processed set of hits.

The maximum number of hits returned is controlled by the size parameter. By default, the top ten results are returned. You can specify an offset with the start parameter to retrieve the next set of hits. For more information about searching with Amazon CloudSearch, see Searching Your Data with Amazon CloudSearch (p. 80).

Search Requests
You submit search requests to your domain's search endpoint via HTTP GET. To construct a search request, you append the Amazon CloudSearch API version and the name of the resource you are accessing, 2011-02-01/search, and a query string that specifies the terms and constraints for your search and what you want to get back in the response. The maximum size of a search request is 8190 bytes, including the HTTP method, URI, and protocol version. For example, the following request performs a simple text search of the search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com domain and gets the contents of the title field:
http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars&return-fields=title
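A request URL like the one above can be assembled with Python's standard urllib.parse module, which also handles the required URL encoding. This is a minimal sketch: the function name is hypothetical, and measuring the 8190-byte limit against a "GET <path> HTTP/1.1" request line is an assumption about how the documented limit is counted.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"
MAX_REQUEST_BYTES = 8190


def build_search_url(endpoint: str, params: dict) -> str:
    """Build a search URL for a domain's search endpoint."""
    query = urlencode(params)  # URL-encodes values; spaces become "+"
    path = f"/{API_VERSION}/search?{query}"
    # Assumption: the limit covers the HTTP method, URI, and protocol version.
    request_line = f"GET {path} HTTP/1.1"
    if len(request_line.encode("ascii")) > MAX_REQUEST_BYTES:
        raise ValueError("search request exceeds 8190 bytes")
    return f"http://{endpoint}{path}"


url = build_search_url(
    "search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    {"q": "star wars", "return-fields": "title"},
)
```

Further parameters such as size, start, or results-type can be added to the same dictionary.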

Note
The API version must be specified in all search requests. When there are updates to the Search API, you access them using a new API version.

The query string in a search request must be URL-encoded. You can use any method you want to send GET requests to your domain's search endpoint: you can enter the request URL directly in a Web browser, use cURL to submit the request, or generate an HTTP call using your favorite HTTP library. By default, Amazon CloudSearch returns the response in JSON. You can also get the results formatted in XML by specifying the results-type parameter, results-type=xml.

Note
You can also use the Search Tester in the Amazon CloudSearch console to search your data, browse the results, and view the generated request URLs and JSON and XML responses. For more information, see Searching with the Search Tester (p. 14).

Amazon CloudSearch can return up to 2 KB of data from a text field. If the contents of the field exceed 2 KB, only the first 2 KB is included in the results. (All of the data is searchable; only the result data is truncated.)

Search Syntax
GET /2011-02-01/search


Search Request Headers


Cache-Control
    Forces revalidation of results when a cached result document would otherwise be returned.
    Required: No

HOST
    The search request endpoint for the domain you're querying. You can use DescribeDomains (p. 136) to retrieve your domain's search request endpoint.
    Required: Yes

Search Request Parameters


bq
Description: One or more match expressions (p. 189) that define a Boolean search. Multiple expressions are joined with a top-level AND. If the bq parameter is specified in conjunction with the q parameter, the values are joined with a top-level AND. Within a match expression, you can use the - (NOT), | (OR), and * (wildcard) operators to exclude particular terms, find results that match any of the specified terms, or search for a prefix. To search for a phrase rather than individual terms, you can enclose the phrase in double quotes. For more information, see Searching Your Data with Amazon CloudSearch (p. 80).
Condition: Required if the q parameter is not specified.
Type: String
Required: Conditional

facet
Description: A comma-separated list of the fields for which you want to compute facets. The specified fields must be numeric fields or defined as facet-enabled in the domain configuration. By default, counts are computed for all field values. If you want to specify the field values that you want counted for a particular field, use the facet-FIELD-constraints parameter instead, where FIELD is the name of the field. You can specify the maximum number of constraints to include in the results with the facet-FIELD-top-n parameter. By default, the results include counts for the top 40 constraints.
Type: String
Required: No


facet-FIELD-constraints
Description: The field values (facet constraints) that you want to count for a particular field. FIELD is the name of the field. Constraints are specified as a comma-separated list of ranges or single-quoted strings. For example, facet-year-constraints=2000..2011 calculates facet counts for the years 2000 through 2011, inclusive. You can omit the lower end of a range to count all of the values less than or equal to the specified value. Similarly, you can omit the upper end of a range to count all of the values greater than or equal to the specified value. To specify constraints for a text field, enclose the values in single quotes. For example, facet-color-constraints='red','blue','green'. If you don't specify facet constraints, counts are computed for all field values.
Type: String
Required: No

facet-FIELD-sort
Description: How you want to sort facet values for a particular field. FIELD is the name of the field. There are four sorting options:
  alpha: Sort the facet values alphabetically (in ascending order).
  count: Sort the facet values by their counts (in descending order).
  max: Sort the facet values according to the maximum values in the specified field. This option is specified as max(FIELD). By default, the facet values are sorted in ascending order. To sort in descending order, prefix the sort option with - (minus): -max(FIELD).
  sum: Sort the facet values according to the sum of the values in the specified field (in ascending order). This option is specified as sum(FIELD).
Type: String
Required: No

facet-FIELD-top-n
Description: Set the maximum number of facet constraints to be included for the specified field in the search results. By default, the results include counts for the top 40 constraints.
Type: Integer
Required: No


q
Description: The string to search for. You use the q parameter to perform simple text searches. This searches the default search field for the specified text. If the q parameter is specified in conjunction with the bq parameter, the values are joined with a top-level AND. If you separate search terms with plus (+) or a space, Amazon CloudSearch matches documents that contain all of the specified search terms (they are ANDed together). For example, q=star+wars searches the default field for star and wars. This is equivalent to specifying bq='star wars'. You can use the - (NOT), | (OR), and * (wildcard) operators to exclude particular terms, find results that match any of the specified terms, or search for a prefix. To search for a phrase rather than individual terms, you can enclose the phrase in double quotes. For more information, see Searching Your Data with Amazon CloudSearch (p. 80).
Condition: Required if the bq parameter is not specified.
Type: String
Required: Conditional

rank
Description: A comma-separated list of fields or rank expressions to use for ranking. A maximum of 10 fields and rank expressions can be specified. You can use any uint field to rank results numerically. Any result-enabled text or literal field can be used to rank results alphabetically. To rank results by relevance, you can specify the name of a custom rank expression or text_relevance. Hits are ordered according to the specified rank field(s). By default, hits are ranked in ascending order. You can prefix a field name with a minus (-) to rank in descending order. If no rank parameter is specified, it defaults to rank=-text_relevance, which lists results according to their text_relevance scores with the highest-scoring documents first.
Type: String
Required: No

results-type
Description: Controls the content type of the response, json or xml. The default is json.
Type: String
Required: No

return-fields
Description: The document fields to include in the response, specified as a comma-separated list of field names. Up to 2 KB of data can be returned from a text field. If the field contents exceed 2 KB, only the first 2 KB is included in the results. If no return-fields are specified, only the document ids of the hits are returned.
Type: String
Required: No


size
Description: The maximum number of search hits to return. The default is 10.
Type: Positive Integer
Required: No

start
Description: The offset of the first search hit you want to return. The default is 0 (the first hit).
Type: Positive Integer
Required: No

t-FIELD
Description: Restrict the match set used in subsequent post-processing steps according to the specified rank expression. Only hits that have a score within the specified RANGE are included. Ranges are specified as described in Expression Syntax for Boolean Queries (p. 189).
Type: RANGE
Required: No
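
The parameters above can be assembled programmatically before URL-encoding. The following sketch shows one way to format range and text facet constraints and to translate a page number into start and size. The helper names (format_range, facet_constraints, page_params) and the year and title field names are assumptions for illustration; the sketch does not escape quotes inside text constraint values.

```python
from urllib.parse import urlencode

def format_range(low=None, high=None):
    """Format an inclusive range; omit either end to leave it open."""
    return f"{'' if low is None else low}..{'' if high is None else high}"

def facet_constraints(field, values):
    """Return the (name, value) pair for a facet-FIELD-constraints parameter.

    Each value is either a (low, high) tuple for a numeric range or a string
    for a text value, which is enclosed in single quotes.
    """
    parts = [format_range(*v) if isinstance(v, tuple) else f"'{v}'" for v in values]
    return f"facet-{field}-constraints", ",".join(parts)

def page_params(page, page_size=10):
    """Translate a zero-based page number into start and size parameters."""
    return {"start": page * page_size, "size": page_size}

# Rank by year (descending), then by title (ascending).
params = {"q": "star wars", "return-fields": "title,year", "rank": "-year,title"}
params.update(page_params(page=2))
name, value = facet_constraints("year", [(2000, 2011), (None, 1999)])
params[name] = value
query_string = urlencode(params)
print(query_string)
```

Keeping the range formatting in one helper means the same function can produce the RANGE value for a t-FIELD parameter.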

Expression Syntax for Boolean Queries


FIELD:'search string'
Search for a string in the specified text or literal field. For example, bq=title:'star'. Any single quotation marks or backslashes in the string must be escaped with a backslash.

(field FIELD 'search string')
Search for a string in the specified text or literal field. For example, (field title 'star'). Any single quotation marks or backslashes in the string must be escaped with a backslash. You can use this alternate fielded search syntax when you're specifying multiple fielded search expressions as part of a Boolean expression. For example, bq=(and (field title 'star') (filter year ..2000)).

FIELD:value
Search for an integer value in the specified uint field. For example, bq=year:2000. Matches documents that have at least one value in the field that equals the specified value. You can specify a single value or a range of values. A pair of nonnegative integers separated by two dots matches documents that have at least one value in the field that falls in the specified range. You can omit one value to specify an open-ended upper or lower limit. The range is inclusive on both ends. For example, bq=year:1998..2000.

(filter FIELD value)
Search for an integer in the specified uint field. For example, (filter year 2000). Matches documents that have at least one value in the field that equals the specified value. You can use this alternate fielded search syntax when you're specifying multiple fielded search expressions as part of a Boolean expression. For example, bq=(and (field title 'star') (filter year ..2000)). You can specify a single value or a range of values. A pair of nonnegative integers separated by two dots matches documents that have at least one value in the field that falls in the specified range. You can omit one value to specify an open-ended upper or lower limit. The range is inclusive on both ends. For example, (filter year 1998..2000).

(and expression1 expression2 expressionN)
Include hits only if they match all of the specified expressions. (Boolean AND operator.) For example, bq=(and (field title 'star') (field actor 'Ford, Harrison') (filter year ..2000)).

(not expression1)
Exclude hits that match the specified expression. (Boolean NOT operator.) For example, bq=(not (and (field actor 'Guinness, Alec') (field actor 'Ford, Harrison'))).

(or expression1 expression2 expressionN)
Include hits that match any of the specified expressions. (Boolean OR operator.) For example, bq=(or (field actor 'Guinness, Alec') (field actor 'Ford, Harrison') (field actor 'Jones, James Earl')).
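
Because quotes and backslashes in search strings must be escaped, it can help to build bq expressions with small helper functions rather than by hand. The following is a minimal sketch; the function names are hypothetical and are not part of any Amazon CloudSearch library. The resulting string still needs to be URL-encoded before it is sent.

```python
def quote_string(text):
    """Single-quote a search string, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def field_expr(name, text):
    """Build (field FIELD 'search string') for a text or literal field."""
    return f"(field {name} {quote_string(text)})"

def filter_expr(name, value):
    """Build (filter FIELD value) for a uint field; value may be a range."""
    return f"(filter {name} {value})"

def and_expr(*expressions):
    return "(and " + " ".join(expressions) + ")"

def or_expr(*expressions):
    return "(or " + " ".join(expressions) + ")"

def not_expr(expression):
    return f"(not {expression})"

bq = and_expr(
    field_expr("title", "star"),
    field_expr("actor", "Ford, Harrison"),
    filter_expr("year", "..2000"),
)
print(bq)
```

Backslashes are escaped before quotes so that the backslashes added for quotes are not escaped a second time.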

Search Response
When a request completes successfully, the response body contains the search results. By default, search results are returned in JSON. If the results-type parameter is set to xml, search results are returned in XML.

When a request returns an error code, the body of the response contains information about the error that occurred. Error responses are always returned in JSON. If an error occurs while the request body is parsed and validated, the error code is set to 400 and the response body includes a list of the errors and where they occurred.

The following example shows a JSON response.
{
  "rank": "-text_relevance",
  "match-expr": "(label 'star wars')",
  "hits": {
    "found": 7,
    "start": 0,
    "hit": [
      {
        "id": "tt1185834",
        "data": {
          "actor": ["Abercrombie, Ian", "Baker, Dee", "Burton, Corey"],
          "title": ["Star Wars: The Clone Wars"]
        }
      },
      . . .
      {
        "id": "tt0121766",
        "data": {
          "actor": ["Bai, Ling", "Bryant, Gene", "Castle-Hughes, Keisha"],
          "title": ["Star Wars: Episode III - Revenge of the Sith"]
        }
      }
    ]
  },
  "info": {
    "rid": "b7c167f6c2da6d93531b9a7b314ad030b3a74803b4b7797edb905ba5a6a08",
    "time-ms": 2,
    "cpu-time-ms": 0
  }
}
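
A JSON response like the one above can be read with any JSON parser. The sketch below pulls out the hit statistics and the returned fields for each hit. The summarize_hits name is hypothetical, and treating a missing data property as an empty set of fields is an assumption for responses requested without return-fields.

```python
import json

def summarize_hits(body):
    """Return (found, start, [(id, data), ...]) from a JSON search response."""
    hits = json.loads(body)["hits"]
    # Assumption: "data" may be absent when no return-fields were requested.
    rows = [(hit["id"], hit.get("data", {})) for hit in hits["hit"]]
    return hits["found"], hits["start"], rows

# Abbreviated copy of the example response, with a single hit.
sample = """{"rank":"-text_relevance","match-expr":"(label 'star wars')",
 "hits":{"found":7,"start":0,"hit":[
  {"id":"tt1185834","data":{"title":["Star Wars: The Clone Wars"]}}]},
 "info":{"time-ms":2,"cpu-time-ms":0}}"""

found, start, rows = summarize_hits(sample)
for doc_id, data in rows:
    print(doc_id, data.get("title", []))
```

Note that found is the total number of matching documents, while the hit array holds only the hits returned for the requested start and size.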

The following example shows the equivalent XML response:


<?xml version="1.0" encoding="UTF-8"?>
<results xmlns="http://cloudsearch.amazonaws.com/2011-02-01/results">
  <rank>-text_relevance</rank>
  <match-expr>(label 'star wars')</match-expr>
  <hits found="7" start="0">
    <hit id="tt1185834">
      <d name="actor">Abercrombie, Ian</d>
      <d name="actor">Baker, Dee</d>
      <d name="actor">Burton, Corey</d>
      <d name="title">Star Wars</d>
    </hit>
    . . .
    <hit id="tt0121766">
      <d name="actor">Bai, Ling</d>
      <d name="actor">Bryant, Gene</d>
      <d name="actor">Castle-Hughes, Keisha</d>
      <d name="title">Star Wars: Episode III - Revenge of the Sith</d>
    </hit>
  </hits>
  <facets/>
  <info rid="b7c167f6c2da6d93531b9a7b314ad030a5ddfe34efbdd8959999ac792f37a1f" time-ms="2" cpu-time-ms="0" />
</results>
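
The XML response declares a default namespace, so element lookups must be namespace-qualified. The following sketch uses the Python standard library to collect the d elements for each hit. The parse_hits name and the cs prefix are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# The prefix "cs" is arbitrary; the URI is the namespace from the response.
NS = {"cs": "http://cloudsearch.amazonaws.com/2011-02-01/results"}

def parse_hits(xml_text):
    """Return (found, [(id, {field: [values]}), ...]) from an XML response."""
    root = ET.fromstring(xml_text)
    hits = root.find("cs:hits", NS)
    results = []
    for hit in hits.findall("cs:hit", NS):
        fields = {}
        # A field with several values appears as repeated d elements.
        for d in hit.findall("cs:d", NS):
            fields.setdefault(d.get("name"), []).append(d.text)
        results.append((hit.get("id"), fields))
    return int(hits.get("found")), results

# Abbreviated copy of the example response, with a single hit.
sample = """<results xmlns="http://cloudsearch.amazonaws.com/2011-02-01/results">
  <rank>-text_relevance</rank>
  <hits found="7" start="0">
    <hit id="tt0121766">
      <d name="actor">Bai, Ling</d>
      <d name="actor">Bryant, Gene</d>
      <d name="title">Star Wars: Episode III - Revenge of the Sith</d>
    </hit>
  </hits>
</results>"""

found, results = parse_hits(sample)
print(found, results)
```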


Search Response Headers


Content-Type
Description: A standard MIME type describing the format of the object data. For more information, see W3C RFC 2616 Section 14.
Default: application/json
Constraints: application/json or application/xml only

Content-Length
Description: The length in bytes of the body in the response.

Search Response Properties (JSON)


match-expr
Shows the match expression constructed from the search parameters.

hits
Contains hit statistics (found, start) and a hit array that lists the document ids and data for each hit.

found
The total number of hits that match the search request after Amazon CloudSearch finished processing the match set.

start
The index of the first hit returned in this response.

hit
An array that lists the document ids and data for each hit.

id
The unique identifier for a document.

data
A list of returned fields.

facets
Contains facet information and facet counts.

FacetFieldName
A field for which facets were calculated.

constraints
An array of the facet values and counts.

value
The facet value being counted.

count
The number of hits that contain the facet value in FacetFieldName.

info
Contains information about the request processing.

rank
Lists the fields that were used to rank the search hits.

rid
The encrypted Resource ID.

time-ms
How long it took to process the search request in milliseconds.

cpu-time-ms
The CPU time required to process the search request in milliseconds.

messages
Contains any warning or error messages returned by the search service. The severity, source, host, code, and message are included for each one.

severity
Whether the message is a warning or error.

host
The host from which the message originated.

code
The warning or error code. The search service returns the following warnings and errors:
  WildcardTermLimit: More than 2000 terms matched the wildcard in the search request. The number of terms matched was limited to 2000.
  InvalidFieldOrRankAliasInRankParameter: The specified ranking field could not be found.
  UnknownFieldInMatchExpression: A field specified in the bq parameter could not be found.
  IncorrectFieldTypeInMatchExpression: The type specified in the match expression does not match the field type.
  InvalidMatchExpression: The match expression could not be parsed.
  UndefinedField: An unknown field was specified in the match expression.

message
A description of the warning or error that was returned by the search service.

Search Response Elements (XML)


results
If the request was successful, contains the search results. If an error occurs, the info element lists the warnings or errors that were returned by the search service.

rank
Lists the fields that were used to rank the search hits.

match-expr
Shows the match expression constructed from the search parameters.

hits
Contains hit statistics and a collection of hit elements. The found attribute is the total number of hits that match the search request after Amazon CloudSearch finished processing the results. The contained hit elements are ordered according to their text_relevance scores or the rank option specified in the search request.

hit
A document that matched the search request. The id attribute is the document's unique id. Contains a d (data) element for each returned field.

d
A field returned from a hit. Hit elements contain a d (data) element for each returned field.

facets
Contains a facet element for each facet requested in the search request.

facet
Contains a constraint element for each value of a field for which a facet count was calculated. The facet-FIELD-top-n request parameter can be used to specify how many constraints to return. By default, facet counts are returned for the top 40 constraints. The facet-FIELD-constraints request parameter can be used to explicitly specify which values to count.

constraint
A facet field value and the number of occurrences (count) of that value within the search hits.

info
Information about the request processing. The rid attribute is the encrypted Resource ID. The time-ms attribute is how long it took to process the search request, in milliseconds. The cpu-time-ms attribute is the CPU time required to process the search request, in milliseconds.

message
Information about a warning or error returned by the search service while processing the request. The severity attribute is either warning or error. The code attribute specifies one of the following warning or error codes:
  WildcardTermLimit: More than 2000 terms matched the wildcard in the search request. The number of terms matched was limited to 2000.
  InvalidFieldOrRankAliasInRankParameter: The specified ranking field could not be found.
  UnknownFieldInMatchExpression: A field specified in the bq parameter could not be found.
  IncorrectFieldTypeInMatchExpression: The type specified in the match expression does not match the field type.
  InvalidMatchExpression: The match expression could not be parsed.
  UndefinedField: An unknown field was specified in the match expression.
The host attribute specifies the id of the host from which the message originated.

Search Status Codes


A search request can return three types of status codes:
  5xx status codes indicate that there was an internal server error.
  4xx status codes indicate that the request was malformed.
  2xx status codes indicate that the request was processed successfully.

Not Found
Description: The request path (API version or collection name) was not valid. Consult the body of the response for details and adjust the request before retrying.
HTTP Status Code: 404

Invalid HTTP Method
Description: The HTTP method was not GET, POST, HEAD, or OPTIONS. The search API does not support PUT or DELETE methods.
HTTP Status Code: 405

Request Timeout
Description: The server did not receive a complete request within the time allowed.
HTTP Status Code: 408

Length Required
Description: A POST request did not include a Content-Length header.
HTTP Status Code: 411

Request Entity Too Large
Description: A POST request included a body larger than the search API supports. Use multiple simpler, smaller requests in place of one large request.
HTTP Status Code: 413

Internal Server Error
Description: An internal problem occurred. The request can be retried.
HTTP Status Code: 500

Bandwidth Limit Exceeded
Description: The request was throttled. The request rate or resource consumption should be reduced before retrying the request.
HTTP Status Code: 509
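
A client can use the status code to decide whether to retry. The sketch below classifies a status code and suggests a wait time before retrying. The exponential backoff, the longer wait for throttled requests, and the specific delay values are assumptions chosen for illustration; Amazon CloudSearch does not prescribe them. The helper names are hypothetical.

```python
def classify_status(status):
    """Map an HTTP status code to the three documented result types."""
    if 200 <= status < 300:
        return "success"
    if 400 <= status < 500:
        return "malformed request"
    if 500 <= status < 600:
        return "server error"
    return "unexpected"

def retry_delay(status, attempt, base_seconds=0.5, max_seconds=30.0):
    """Return seconds to wait before retry number `attempt` (0-based),
    or None if the request should be corrected instead of retried as-is."""
    if status == 500:
        # Internal errors can be retried; back off exponentially.
        return min(max_seconds, base_seconds * 2 ** attempt)
    if status == 509:
        # Throttled: slow down more aggressively before retrying.
        return min(max_seconds, 4 * base_seconds * 2 ** attempt)
    return None

for status in (200, 404, 500, 509):
    print(status, classify_status(status), retry_delay(status, attempt=0))
```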


Troubleshooting Amazon CloudSearch


The following topics describe solutions to problems you might encounter when using Amazon CloudSearch.

Topics
  Uploading Documents (p. 196)
  Deleting All Documents in an Amazon CloudSearch Domain (p. 197)
  Document Update Latency (p. 197)
  Retrieving a Document's Version Number (p. 197)

Uploading Documents
If your SDF is not formatted correctly or contains invalid values, you will get errors when you attempt to upload it or use it to configure fields for your domain. Here are some common problems and their solutions:

Invalid JSON: If you are using JSON, the first thing to do is make sure there are no JSON syntax errors in your SDF batch. To do that, run it through a validation tool such as the JSON Validator. This will identify any fundamental issues with the data.

Invalid XML: SDF batches must be well-formed XML. You are especially likely to encounter issues if your fields contain XML data. The data must be XML-encoded or enclosed in CDATA sections. To identify any problems, run your SDF batch through a validation tool such as the W3C Markup Validation Service.

Not Recognized as SDF: If you are configuring your domain from SDF and Amazon CloudSearch doesn't recognize your data as valid SDF, it responds with a list of generic metadata fields:
  content_encoding
  content_language
  content_type
  language
  resourcename
For example, this can happen if there are invalid document IDs or version numbers. Make sure that your SDF data contains all of the required properties for each document.

Document IDs with bad values: Capital letters, hyphens, and other special characters are not allowed in document IDs. Document IDs can only contain the characters a-z (lowercase letters), 0-9, and underscore (_). Document IDs must start with a letter or number; they cannot start with an underscore.

Bad version numbers: Version numbers must fit within a 32-bit unsigned integer. When specifying your SDF in JSON, make sure that the version number is not enclosed in quotes. If it is, the version is treated as a string and Amazon CloudSearch will reject the SDF as invalid.

Multi-valued fields without a value: When specifying SDF in JSON, you cannot specify an empty array as the value of a field. Multi-valued fields must contain at least one value.

Bad characters: One problem that can be difficult to detect if you do not filter your data while generating your SDF batch is that it can contain characters that are invalid in XML. Both JSON and XML batches can contain only UTF-8 characters that are valid in XML. You can use a validation tool such as the JSON Validator or W3C Markup Validation Service to identify invalid characters.
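
Several of these problems can be caught before you upload a batch. The following sketch checks one document from a JSON SDF batch for a bad document ID, a bad version number, and empty multi-valued fields. It assumes that each document is a JSON object with id, version, and fields properties; the check_document name is hypothetical, and the sketch does not check for characters that are invalid in XML.

```python
import re

DOC_ID = re.compile(r"[a-z0-9][a-z0-9_]*")
MAX_VERSION = 2**32 - 1  # largest 32-bit unsigned integer

def check_document(doc):
    """Return a list of problems found in one parsed SDF document."""
    problems = []
    doc_id = doc.get("id")
    if not isinstance(doc_id, str) or not DOC_ID.fullmatch(doc_id):
        problems.append("bad document ID")
    version = doc.get("version")
    # A quoted version parses as a string; bool is excluded because it is
    # a subclass of int in Python.
    if isinstance(version, bool) or not isinstance(version, int) or not 0 <= version <= MAX_VERSION:
        problems.append("bad version number")
    for name, value in doc.get("fields", {}).items():
        if value == []:
            problems.append(f"field {name} has no values")
    return problems

print(check_document({"id": "TT-1", "version": "1", "fields": {"actor": []}}))
```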

Deleting All Documents in an Amazon CloudSearch Domain


Amazon CloudSearch currently does not provide a mechanism for deleting all of the documents in a domain. However, you can clone the domain configuration to start over with an empty domain. For more information, see Cloning an Existing Domain's Indexing Options (p. 59).

Document Update Latency


Sending a large volume of single-document batches can increase the amount of time it takes each document to become searchable. If you have a large amount of update traffic, you need to batch your updates. We recommend using a batch size close to the 5 MB limit.
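
One way to follow this recommendation is to pack documents greedily into batches that stay under the size limit. The sketch below assumes that a JSON SDF batch is a JSON array of document operations and that 5 MB means 5 * 1024 * 1024 bytes; both are assumptions, as is the make_batches name.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed interpretation of the 5 MB limit

def make_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Pack operations into serialized JSON arrays of at most max_bytes each."""
    batches, parts, size = [], [], 2  # 2 bytes for the enclosing brackets
    for op in operations:
        encoded = json.dumps(op, separators=(",", ":")).encode("utf-8")
        needed = len(encoded) + (1 if parts else 0)  # 1 byte for the comma
        if parts and size + needed > max_bytes:
            # The current batch is full: emit it and start a new one.
            batches.append(b"[" + b",".join(parts) + b"]")
            parts, size = [], 2
            needed = len(encoded)
        parts.append(encoded)
        size += needed
    if parts:
        batches.append(b"[" + b",".join(parts) + b"]")
    return batches

ops = [{"id": f"doc{i}", "version": 1} for i in range(5)]
batches = make_batches(ops, max_bytes=60)  # tiny limit to show the splitting
print([len(b) for b in batches])
```

A document that is larger than the limit on its own still ends up in a batch by itself, so check the document size limit separately.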

Retrieving a Document's Version Number


If you want to be able to query the index for your documents' version numbers, create a version field and populate it with the current version each time you update a document.


Limits in Amazon CloudSearch


This table shows naming and size restrictions within Amazon CloudSearch. For information about increasing limits such as max partitions and instances, contact Amazon CloudSearch. The current Amazon CloudSearch limits are summarized in the following table.

Domain name
Allowed characters are a-z (lower-case letters), 0-9, and hyphen (-). Domain names must start with a letter or number and be at least 3 and no more than 28 characters long.

Field name
Allowed characters are a-z (lower-case letters), 0-9, and _ (underscore). Field names must begin with a letter and be at least 1 and no more than 64 characters long. The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as field names.

Rank expression name
Allowed characters are a-z (lower-case letters), 0-9, and _ (underscore). Rank expression names must begin with a letter and be at least 3 and no more than 64 characters long. The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as rank expression names.

Source field name
Source field names must be at least 1 and no more than 64 characters long.

Document ID (docid)
Allowed characters are a-z (lower-case letters), 0-9, and _ (underscore). Document IDs must begin with a letter or numeral and must be at least 1 and no more than 128 characters long.

Document size
The maximum document size is 1 MB.

Batch size
The maximum batch size is 5 MB.

Document version number size
The maximum size of a document's version number is max(uint32_t).

Document language
English (en) is currently the only supported language.

Maximum number of index fields
Up to 200 index fields can be configured for a domain.

Maximum number of sources for an index field
Up to 20 sources can be configured for a field.

Maximum number of field values
Up to 100 values can be specified in a field.

Maximum size of terms in an index field
Individual terms within a text or literal field are truncated if they exceed 256 characters.

Default value size
The maximum size of a default value for a field is 1 KB.

Uint field range
A uint field can contain values in the range 0 - max(uint32_t).

Maximum number of rank expressions
Up to 50 rank expressions can be configured for a domain.

Rank expression size
The maximum size of a rank expression is 10240 bytes. The maximum value that can be returned by a rank expression is max(uint32_t).

text_relevance score
An integer value in the range 0-1000.

Maximum search partitions
10

Maximum search instances
50

Policy document size
The maximum size of an Amazon CloudSearch policy document is 100 KB.

Stemming dictionary size
The maximum size of an Amazon CloudSearch stemming dictionary is 500 KB.

Stopwords dictionary size
The maximum size of an Amazon CloudSearch stopwords dictionary is 10 KB.

Synonym dictionary size
The maximum size of an Amazon CloudSearch synonym dictionary is 100 KB.

Search requests: size parameter
The size parameter can contain values in the range 0 - max(uint32_t).

Search requests: start parameter
The start parameter can contain values in the range 0 - max(uint32_t).

Search requests: rank parameter
Up to 10 uint fields and expressions can be specified in the rank parameter.

Search requests: GET requests
The maximum size of a search request submitted as an HTTP GET request is 8190 bytes.

Search requests: returned data
Up to 2 KB of data can be returned from a field. If the field contents exceed 2 KB, only the first 2 KB is included in the results.
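
The naming limits can be checked on the client before you send a configuration request. The following is a minimal sketch of the domain name, field name, and rank expression name rules as regular expressions. The function names are hypothetical.

```python
import re

RESERVED_NAMES = {"body", "docid", "text_relevance"}

def is_valid_domain_name(name):
    """3 to 28 characters from a-z, 0-9, and hyphen; starts with a letter or number."""
    return re.fullmatch(r"[a-z0-9][a-z0-9-]{2,27}", name) is not None

def is_valid_field_name(name):
    """1 to 64 characters from a-z, 0-9, and underscore; starts with a letter."""
    return (
        re.fullmatch(r"[a-z][a-z0-9_]{0,63}", name) is not None
        and name not in RESERVED_NAMES
    )

def is_valid_rank_expression_name(name):
    """Same rules as field names, but at least 3 characters long."""
    return (
        re.fullmatch(r"[a-z][a-z0-9_]{2,63}", name) is not None
        and name not in RESERVED_NAMES
    )

for candidate in ("movies", "Movies", "ab"):
    print(candidate, is_valid_domain_name(candidate))
```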


Amazon CloudSearch Articles and Tutorials


For additional information about using Amazon CloudSearch, see the following articles and tutorials on the AWS website.

Guide to Formatting Your Data in SDF for Amazon CloudSearch
Search Data Format (SDF) is the structured data format that you use to represent the data that you want to index and search with Amazon CloudSearch. This guide describes how to structure your data to support searching, describe it in SDF, and validate the SDF before uploading it to your search domain.

Guide to Using Elastic IPs to Manage Access to Amazon CloudSearch Domains
When creating and configuring search domains, you use your AWS credentials for authentication. To control access to a particular search domain's document and search endpoints, you need to whitelist the specific IP addresses or address ranges that can submit document updates and search requests. This guide describes how to use elastic IPs to manage access to your document and search endpoints from EC2.


Amazon CloudSearch Glossary


This section provides a summary of Amazon CloudSearch terminology. For the complete Amazon AWS glossary, see the AWS General Reference. Amazon CloudSearch batch A fully-managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website. A collection of add and delete document operations in Search Data Format (SDF). You use the document service API to submit add and delete document operations to update the data in your search domain. The API that you use to create, configure, and manage search domains. A collection of data that you want to search. Represents an item that can be returned as a search result. Each document has a collection of fields that contain the data that can be searched or returned. The value of a field can be either a string or a number. Each document must have a unique ID, a version number, and at least one field. A unique alpha-numeric identifier for a document. This is the id attribute that's specified in an add or delete operation when using the documents service API. The API that you use to submit SDF batches to update the data in your search domain. The URL that you connect to when sending document updates to a search domain. In SDF, each document has a numeric version number that's used to guarantee that a domain always reflects the most recent document updates. Document updates are applied only if the version number specified in the add or delete operation is greater than the existing version number. Index fields that represent categories that you want to use to refine and filter search results.

configuration API corpus document

document ID (docid)

document service API document service endpoint document version

facets

API Version 2011-02-01 201

Amazon CloudSearch Developer Guide

facet constraints facet enabled hits index index field index field name indexing options

Specify the particular facet values that you want to count. An index field option that enables facet information to be calculated for the field. Documents that match the criteria specified in the search request. Also referred to as search results. See search index (p. 202). A name-value pair that is included in a search domain's index. An index field can contain text, literal, or unsigned integer data. The name of a text, literal, or uint field. Configuration settings that define a search domain's index fields, how SDF data is mapped to those index fields, and how the index fields can be used. A numeric expression that you can use to control how search hits are ranked. You can construct rank expressions using uint fields, other rank expressions, a document's default text_relevance score, and standard numeric operators and functions. When you use the rank option to specify a rank expression in a search request, the expression is evaluated for each search hit and the hits are listed according to their rank expression values. An index field option that enables the field's value(s) to be returned in the search results. Search Data Format. The API that you use to submit search requests to a domain. The format that you use to describe the data that you want to add or delete from your search domain. Search Data Format (SDF) can be represented as either JSON or XML. Encapsulates your searchable data and the search instances that handle your search requests. You set up a separate domain for each different collection of data that you want to search. A search domain's indexing options, text options, access policies, and rank expressions. A user-specified name that is used to construct a unique identifier for a domain. An index field option that enables the field data to be searched. A representation of your searchable data that facilitates fast and accurate data retrieval. A search instance is a compute resource that indexes your data and processes search requests. 
A search domain has one or more search instances, each with a finite amount of RAM and CPU resources. As your data volume grows, more search instances or larger search instances are deployed to contain your indexed data. When necessary, your index is automatically partitioned across multiple search instances. As your request volume or complexity increases, each

rank expression

result enabled SDF search API Search Data Format

search domain

search domain configuration search domain name search enabled search index search instances

API Version 2011-02-01 202

Amazon CloudSearch Developer Guide

search partition is automatically replicated to provide additional processing capacity.

search request
    A request that is sent to a search domain to retrieve documents that match particular search criteria.

search result
    A document that matches a search request. Also referred to as a search hit.

search service endpoint
    The URL that you connect to when sending search requests to a search domain.

source
    An SDF document field that is used to populate an index field.

stem
    The common root or substring shared by a set of related words.

stemming
    The process of mapping related words to a common stem. This enables matching on variants of a word. For example, a search for "horse" could return matches for horses, horseback, and horsing, as well as horse.

stemming dictionary
    A domain-specific collection of mappings of words to their stems. Amazon CloudSearch does not define a default stemming dictionary.

stopping
    The process of filtering stopwords from an index or search request.

stopword
    A word that is not indexed and is automatically filtered out of search requests because it is either insignificant or so common that including it would result in too many matches to be useful. Stopwords are language-specific.

stopword dictionary
    A domain-specific collection of stopwords. Amazon CloudSearch defines a default stopword dictionary for English that you can use as-is, or customize to suit your collection of data.

synonym
    A word that is the same or nearly the same as an indexed word and that should produce the same results when specified in a search request. For example, a search for "Rocky Four" or "Rocky 4" should return the fourth Rocky movie. This can be done by designating that four and 4 are synonyms for IV. Synonyms are language-specific.

synonym dictionary
    A domain-specific collection of synonym mappings. Amazon CloudSearch does not define a default synonym dictionary.

text_relevance
    A built-in relevance score that's based on the repetition of search terms in the document and proximity of search terms to each other in each matching index field in the document. A document's text_relevance score is an integer value from 0 to 1000 (inclusive).

text options
    Domain-specific stopword, stemming, and synonym dictionaries used during text processing when building a search index. Stopwords and stems are also used at search time to process the search terms before looking for matching documents in the index.

tokenization
    Part of the text processing that Amazon CloudSearch performs when indexing and processing search requests. During indexing, the contents of each text field are split into a collection of tokens that can be indexed separately. Punctuation is stripped and each word (that isn't in the stopword list) becomes a token. For example, the string "spider-man" would be split into two tokens: spider and man. At search time, the search terms are tokenized using the same rules before being matched against the indexed tokens.

version
    See document version (p. 201).
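The tokenization, stopping, and stemming entries above can be illustrated with a rough Python sketch. This is not how Amazon CloudSearch is implemented; the regular expression, the lowercasing step, the function names, and both dictionaries are assumptions made only for illustration.

```python
import re

# Hypothetical dictionaries for illustration only; these are not the
# CloudSearch defaults (the service defines no default stemming dictionary).
STOPWORDS = {"a", "an", "the", "of", "and"}
STEMS = {"horses": "horse", "horseback": "horse", "horsing": "horse"}


def tokenize(text):
    """Strip punctuation and split the text into word tokens.

    Lowercasing is an assumption of this sketch, not a documented rule.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def process(text):
    """Tokenize, drop stopwords, then map each token to its stem."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return [STEMS.get(t, t) for t in tokens]


print(tokenize("spider-man"))               # ['spider', 'man']
print(process("The Horses of Spider-Man"))  # ['horse', 'spider', 'man']
```

Running the same `process` function over both the indexed text and the search terms mirrors the glossary's point that search terms are processed with the same rules before matching.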


Document History for Amazon CloudSearch


This Document History describes the important changes to the documentation in this release of Amazon CloudSearch.

Relevant Dates to this History:


Current product version: 2011-02-01
Latest product release: 10 April 2012
Last document update: 27 July 2012

Change: Initial product release
Description: Amazon CloudSearch is introduced as a new service in Beta release.
Release Date: 10 April 2012

Change: Added how to clone a domain
Description: You can clone an existing search domain to get an empty domain that has the same indexing options. For more information, see Cloning an Existing Domain's Indexing Options (p. 59).
Release Date: 25 April 2012

Change: Added Getting Started video and the Troubleshooting and Articles sections
Description: A screencast of the Getting Started tutorial is now available on YouTube. The Troubleshooting Amazon CloudSearch (p. 196) provides solutions to common SDF issues, a workaround for deleting all documents from a domain, and tips for reducing document update latency. The Articles section provides a link to the new Guide to Formatting Your Data in SDF for Amazon CloudSearch available from aws.amazon.com/articles.
Release Date: 9 July 2012

Change: Updated the Searching, Articles & Tutorials, and Troubleshooting sections
Description: Reorganized Searching Your Data with Amazon CloudSearch (p. 80), added the Guide to Using Elastic IPs to Manage Access to Amazon CloudSearch Domains to Amazon CloudSearch Articles and Tutorials (p. 200), and added an item about retrieving document versions to Troubleshooting Amazon CloudSearch (p. 196).
Release Date: 27 July 2012
