Cloudsearch DG
What Is Amazon CloudSearch? ... 1
Search Data Format ... 2
Search Domain Configuration ... 3
Search Requests ... 5
Getting Started ... 6
Step 1: Before You Begin ... 6
Step 2: Create a Search Domain ... 7
Step 3: Send Data for Indexing ... 11
Step 4: Search Your Amazon CloudSearch Domain ... 14
Step 5: Delete Your Amazon CloudSearch Movies Domain ... 19
Making API Requests ... 21
Endpoints ... 21
Making Configuration Requests ... 22
    Request Authentication ... 23
Making Document Service Requests ... 24
Making Search Requests ... 25
Creating a Search Domain ... 27
Configuring Access for a Search Domain ... 32
Getting Domain Information ... 37
Deleting a Domain ... 43
Preparing Your Data ... 46
Mapping Document Data to Index Fields ... 46
Creating SDF Batches ... 47
    Document Versions ... 49
    Adding and Updating Documents ... 49
    Deleting Documents ... 50
    Generating SDF ... 51
Configuring Index Fields ... 53
Adding Sources for a Field ... 54
Command Line Tools ... 55
AWS Management Console ... 56
API ... 62
Configuring Text Options ... 63
Configuring Stemming ... 63
Configuring Stopwords ... 66
Configuring Synonyms ... 69
Uploading Data ... 72
Indexing Document Data ... 77
Searching Your Data ... 80
Submitting Search Requests ... 81
Searching Text Fields ... 82
    Using Boolean Operators in Text Searches ... 84
    Using Wildcards in Text Searches ... 85
    Searching for Phrases in Text Fields ... 86
Searching Literal Fields ... 86
Searching Uint Fields ... 87
Constructing Boolean Search Queries ... 88
Controlling Search Results ... 89
    Getting Results as XML ... 89
    Paginating Results ... 90
    Retrieving Data from Index Fields ... 90
    Sorting Results ... 91
Getting and Using Facet Information ... 91
    Getting Facet Information for Text and Literal Fields ... 92
    Getting Facet Information for Uint Fields ... 92
    Getting Facet Information for Particular Values ... 93
    Sorting Facet Information ... 94
    Using Facet Information ... 95
API Version 2011-02-01 3
Customizing Result Ranking ... 98
Configuring Rank Expressions ... 98
Ranking Search Results ... 102
Constraining Search Results ... 102
Command Line Tool Reference ... 103
Using the Command Line Tools ... 103
    Prerequisites ... 104
    Installing the Command Line Tools ... 104
    Running the Amazon CloudSearch Commands ... 106
cs-configure-access-policies ... 106
cs-configure-fields ... 109
cs-configure-ranking ... 111
cs-configure-text-options ... 113
cs-create-domain ... 115
cs-configure-from-sdf ... 116
cs-delete-domain ... 117
cs-describe-domain ... 118
cs-index-documents ... 120
cs-post-sdf ... 121
Experimental Tools ... 122
    cs-generate-sdf ... 122
Configuration API Reference ... 126
Actions ... 126
    CreateDomain ... 128
    DefineIndexField ... 129
    DefineRankExpression ... 131
    DeleteDomain ... 132
    DeleteIndexField ... 133
    DeleteRankExpression ... 134
    DescribeDefaultSearchField ... 135
    DescribeDomains ... 136
    DescribeIndexFields ... 137
    DescribeRankExpressions ... 138
    DescribeServiceAccessPolicies ... 139
    DescribeStemmingOptions ... 140
    DescribeStopwordOptions ... 141
    DescribeSynonymOptions ... 142
    IndexDocuments ... 143
    UpdateDefaultSearchField ... 144
    UpdateServiceAccessPolicies ... 146
    UpdateStemmingOptions ... 148
    UpdateStopwordOptions ... 150
    UpdateSynonymOptions ... 152
Data Types ... 153
    AccessPoliciesStatus ... 154
    CreateDomainResult ... 155
    DefaultSearchFieldStatus ... 155
    DefineIndexFieldResult ... 155
    DefineRankExpressionResult ... 156
    DeleteDomainResult ... 156
    DeleteIndexFieldResult ... 156
    DeleteRankExpressionResult ... 156
    DescribeDefaultSearchFieldResult ... 157
    DescribeDomainsResult ... 157
    DescribeIndexFieldsResult ... 157
    DescribeRankExpressionsResult ... 158
    DescribeServiceAccessPoliciesResult ... 158
    DescribeStemmingOptionsResult ... 158
    DescribeStopwordOptionsResult ... 158
    DescribeSynonymOptionsResult ... 159
    DomainStatus ... 159
    IndexDocumentsResult ... 160
    IndexField ... 161
    IndexFieldStatus ... 162
    LiteralOptions ... 162
    NamedRankExpression ... 162
    OptionStatus ... 164
    RankExpressionStatus ... 164
    ServiceEndpoint ... 165
    SourceAttribute ... 165
    SourceData ... 166
    SourceDataMap ... 166
    SourceDataTrimTitle ... 166
    StemmingOptionsStatus ... 167
    StopwordOptionsStatus ... 167
    SynonymOptionsStatus ... 168
    TextOptions ... 168
    UIntOptions ... 169
    UpdateDefaultSearchFieldResult ... 169
    UpdateServiceAccessPoliciesResult ... 169
    UpdateStemmingOptionsResult ... 170
    UpdateStopwordOptionsResult ... 170
    UpdateSynonymOptionsResult ... 170
Common Query Parameters ... 171
Common Errors ... 172
Document Service API Reference ... 174
documents/batch ... 174
    documents/batch JSON API ... 175
    documents/batch XML API ... 178
Search API Reference ... 184
search ... 184
    Search Requests ... 185
    Search Response ... 190
    Search Status Codes ... 194
Troubleshooting ... 196
Limits ... 198
Articles and Tutorials ... 200
Amazon CloudSearch Glossary ... 201
Document History ... 205
For a high-level overview of Amazon CloudSearch, service highlights, and pricing information, see the Amazon CloudSearch detail page. The rest of this guide describes how to use Amazon CloudSearch and provides detailed information about the APIs and command line tools. If you are new to Amazon CloudSearch, you should begin with Getting Started with Amazon CloudSearch (p. 6). For more information about working with your own data sets, see Preparing Your Data for Amazon CloudSearch (p. 46). For more information about constructing searches with the Amazon CloudSearch query language, see Searching Your Data with Amazon CloudSearch (p. 80).

The following table lets you jump directly to specific task or reference topics.

How Do I?                         Relevant Sections
Set up my first search domain     Getting Started with Amazon CloudSearch (p. 6)
Manage my search domains          Creating an Amazon CloudSearch Domain (p. 27)
                                  Configuring Access for an Amazon CloudSearch Domain (p. 32)
                                  Getting Information about an Amazon CloudSearch Domain (p. 37)
                                  Deleting an Amazon CloudSearch Domain (p. 43)
                                  Configuring Index Fields for an Amazon CloudSearch Domain (p. 53)
                                  Configuring Text Options for an Amazon CloudSearch Domain (p. 63)
                                  Customizing Result Ranking with Amazon CloudSearch (p. 98)
                                  Preparing Your Data for Amazon CloudSearch (p. 46)
                                  Uploading Data to an Amazon CloudSearch Domain (p. 72)
                                  Indexing Document Data with Amazon CloudSearch (p. 77)
Search my domains                 Searching Your Data with Amazon CloudSearch (p. 80)
Install and use the Amazon        Amazon CloudSearch Command Line Tool Reference (p. 103)
CloudSearch command line tools
Get more information about the    Making Amazon CloudSearch API Requests (p. 21)
Amazon CloudSearch APIs           Amazon CloudSearch Configuration API Reference (p. 126)
                                  Amazon CloudSearch Document Service API Reference (p. 174)
                                  Amazon CloudSearch Search API Reference (p. 184)
                                  Limits in Amazon CloudSearch (p. 198)
Amazon CloudSearch generates a search index from your SDF data according to your domain's configuration options. As your data changes, you submit SDF updates to add, change, or delete documents from your index. Updates are applied continuously, so your changes become searchable in near real-time. For information about how to represent your data in SDF, see Preparing Your Data for Amazon CloudSearch (p. 46). To see the JSON schema for SDF, go to JSON documents/batch Requests (p. 175). To see the XML schema for SDF, go to XML documents/batch Requests (p. 178).
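As a rough illustration of submitting SDF updates, the following Python sketch builds a small batch that adds one document and deletes another. The property names used here (type, id, version, lang, fields), the document IDs, and the field values are assumptions made for illustration; the schema in JSON documents/batch Requests (p. 175) is the authoritative definition.

```python
import json


def add_op(doc_id, version, fields, lang="en"):
    """Build an 'add' operation (assumed layout) that adds or replaces a document."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def delete_op(doc_id, version):
    """Build a 'delete' operation (assumed layout) that removes a document."""
    return {"type": "delete", "id": doc_id, "version": version}


# Hypothetical documents: one to add (or update) and one to delete.
batch = [
    add_op("movie_1", 1, {"title": "Example Movie", "year": 2011}),
    delete_op("movie_2", 2),
]

# Serialize the batch; this is the payload you would send for indexing.
sdf_json = json.dumps(batch, indent=2)
print(sdf_json)
```

Keeping adds and deletes in the same batch mirrors the idea that a single SDF update can add, change, or delete documents.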
Indexing Options
A domain's indexing options configure the index fields that will be included in the search index. An index field represents a named field and value pair that you want to store in your index. You configure an index field for each SDF document field that will be searched, used as a facet, or returned in search results.
Index Fields
Every index field has a unique name and a source that specifies one or more SDF document fields. The sources are used to populate the index field. If no source is specified, the source defaults to the SDF document field that has the same name as the index field. An index field definition also includes meta-information such as:
• The index field type.
• Whether a literal field is searchable. (Text and uint fields are always searchable.)
• Whether the value of a text or literal field can be returned in results. (Uint fields are always returnable.)
• Whether facet counts can be calculated for a text or literal field. (Facet counts can always be calculated for uint fields.)
Amazon CloudSearch supports three types of index fields:
• text: contains arbitrary alphanumeric data. For example, a text field might contain a name, description, or the entire body of a document. Text fields are always searchable and Amazon CloudSearch performs text processing on them according to the stopwords, synonyms, and stems you configure in your domain's text options.
• literal: contains an identifier or other data that you want to be able to match exactly. Unlike text fields, Amazon CloudSearch does not perform any text processing on literal fields. Literal fields can be used for fields that have a small set of possible values, as well as for more arbitrary values like email addresses or titles where an exact match is important. Literal fields are frequently used to enable faceted searches where you want to count the number of exact matches for a particular value.
• uint: contains an unsigned integer value. For example, you might use a uint field for a field that contains a quantity or numerical rating, or for a date field that contains a time_t value.
For information about how to configure index fields for Amazon CloudSearch, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
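The defaults described in this section can be modeled with a small Python sketch. The class name IndexFieldSketch and its attribute names are hypothetical and do not correspond to the Amazon CloudSearch API; the sketch only encodes the rules stated above (the source defaults to the field name, text and uint fields are always searchable, and uint fields are always returnable and facet-enabled).

```python
from dataclasses import dataclass, field
from typing import List

FIELD_TYPES = ("text", "literal", "uint")


@dataclass
class IndexFieldSketch:
    """Hypothetical model of an index field definition (not an AWS type)."""
    name: str
    field_type: str
    sources: List[str] = field(default_factory=list)
    search_enabled: bool = False
    result_enabled: bool = False
    facet_enabled: bool = False

    def __post_init__(self):
        if self.field_type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.field_type}")
        # With no explicit source, use the SDF field that shares the name.
        if not self.sources:
            self.sources = [self.name]
        # Text and uint fields are always searchable.
        if self.field_type in ("text", "uint"):
            self.search_enabled = True
        # Uint fields are always returnable and always support facet counts.
        if self.field_type == "uint":
            self.result_enabled = True
            self.facet_enabled = True


title = IndexFieldSketch("title", "text", result_enabled=True)
year = IndexFieldSketch("year", "uint")
genre = IndexFieldSketch("genre", "literal", facet_enabled=True)
print(title, year, genre, sep="\n")
```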
Facets
A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a facet. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.) A facet can be any numeric field or a text or literal field that has faceting enabled in your domain configuration. To request facet information in your search request, you specify:
• One or more facets
• Facet constraints that specify the particular values you want to count (optional)
• How you want the facet values to be sorted in the results (optional)
For each facet, Amazon CloudSearch calculates the number of hits that share the same value. If you specify constraints, the facet counts are calculated only for values that match the constraints. Only constraints that have matches are included in the facet results.
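The counting behavior described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how Amazon CloudSearch computes facets; the function name facet_counts, the sample hits, and the single-valued genre field are all assumptions.

```python
from collections import Counter


def facet_counts(hits, facet, constraints=None):
    """Count how many hits share each value of the given facet field.

    If constraints is given, only those values are counted, and a
    constraint with no matching hits does not appear in the result.
    """
    counts = Counter()
    for doc in hits:
        value = doc.get(facet)
        if value is None:
            continue
        if constraints is not None and value not in constraints:
            continue
        counts[value] += 1
    return dict(counts)


# Hypothetical search hits with a facet-enabled genre field.
hits = [
    {"title": "First", "genre": "Drama"},
    {"title": "Second", "genre": "Comedy"},
    {"title": "Third", "genre": "Drama"},
]

all_counts = facet_counts(hits, "genre")
constrained = facet_counts(hits, "genre", constraints=["Drama", "Horror"])
print(all_counts)
print(constrained)
```

In the constrained call, Horror is requested but matches no hits, so it is omitted from the result.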
Note
Values from a facet-enabled text or literal field cannot be returned in the search results. Text and literal fields can be facet-enabled or result-enabled, but not both. If you want to return the value from an SDF document field as well as use the field as a facet, create two index fields that use the same SDF document field as a source and make one result-enabled, and the other facet-enabled. For information about configuring facets, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53). For information about using facet information to support faceted navigation, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).
Text Options
During indexing, Amazon CloudSearch performs a number of text-processing steps on text fields. First, Amazon CloudSearch tokenizes the field values, stripping punctuation and splitting the text into individual terms that are indexed separately. For example, the string "spider-man" would be split into two terms, spider and man. Text fields are then processed using the domain-specific stopword, stemming, and synonym dictionaries:
• Stopwords configured for the domain are excluded from the index. For example, the stopwords dictionary generally contains insignificant, frequently occurring terms such as "a", "and", and "the" that would result in a massive number of matches if they were included in the index.
• Related words are mapped to a common stem according to the stemming dictionary configured for the domain. For example, the stemming dictionary might map "running" and "ran" to the stem "run".
• Synonyms are mapped according to the synonym dictionary configured for the domain. For example, the synonym dictionary might define "colt" and "filly" as synonyms for "horse".
Amazon CloudSearch defines a default stopword dictionary that you can fine-tune for your application. Stemming and synonym dictionaries are application-specific and are empty by default. For information about how to configure stopwords, stems, and synonyms for your domain, see Configuring Text Options for an Amazon CloudSearch Domain (p. 63).
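The text-processing steps in this section can be approximated with the following Python sketch. It is a simplified model and not the service's implementation: the lowercasing step, the order in which the dictionaries are applied, and the treatment of a synonym as a one-to-one replacement are assumptions, and the dictionaries contain only the examples mentioned above.

```python
import re

# Example dictionaries taken from the examples in this section.
STOPWORDS = {"a", "and", "the"}
STEMS = {"running": "run", "ran": "run"}
SYNONYMS = {"colt": "horse", "filly": "horse"}


def process_text(value):
    """Return the list of terms that would be indexed for a text value."""
    # Tokenize: strip punctuation and split into individual terms.
    # Lowercasing is an assumption of this sketch.
    terms = re.findall(r"[a-z0-9]+", value.lower())
    # Exclude stopwords from the index.
    terms = [t for t in terms if t not in STOPWORDS]
    # Map related words to a common stem.
    terms = [STEMS.get(t, t) for t in terms]
    # Map synonyms to a common term.
    terms = [SYNONYMS.get(t, t) for t in terms]
    return terms


print(process_text("spider-man"))
print(process_text("The colt ran and the filly was running"))
```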
Access Policies
Access to your search domain's endpoints is restricted by IP address so that only authorized hosts can submit documents and send search requests. IP address authorization is used only to control access to the document and search endpoints. All Amazon CloudSearch configuration requests must be authenticated using standard AWS authentication. Amazon CloudSearch access policies are specified using the AWS Identity and Access Management (IAM) Access Policy Language. For information about how to configure access policies for your domain, see Configuring Access for an Amazon CloudSearch Domain (p. 32).
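To make the idea of IP-based authorization concrete, the following Python sketch checks whether a client address falls inside an allowed address range for an endpoint. It does not use the IAM Access Policy Language or any Amazon CloudSearch API; the ALLOWED table, the endpoint labels, and the addresses (taken from a documentation-only range) are hypothetical.

```python
import ipaddress

# Hypothetical allow lists, one per endpoint, expressed as CIDR ranges.
ALLOWED = {
    "search": ["203.0.113.0/24"],
    "doc": ["203.0.113.10/32"],
}


def is_authorized(endpoint, client_ip):
    """Return True if client_ip is within an allowed range for the endpoint."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for cidr in ALLOWED.get(endpoint, [])
    )


print(is_authorized("search", "203.0.113.42"))
print(is_authorized("doc", "203.0.113.42"))
```

Giving the document endpoint a narrower range than the search endpoint reflects a common arrangement in which many hosts may search but only a few may upload documents.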
Rank Expressions
You can customize how search results are ranked by defining your own rank expressions. Rank expressions are numeric expressions that can be used at search time to calculate a score for every document that matches the search. A rank expression uses standard numeric operators and functions and can reference uint fields, other rank expressions, and a document's text_relevance score. When you submit search requests, you specify the rank expression(s) you want to use to rank or constrain the search results. A document's text_relevance score indicates how relevant a particular search hit is to the search request. To calculate the relevance score, Amazon CloudSearch takes into account how many times the search terms appear (term frequency) and how close the search terms are to each other (proximity). For information about how to configure rank expressions for your domain, see Customizing Result Ranking with Amazon CloudSearch (p. 98).
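The following Python sketch illustrates how a rank expression that combines text_relevance with a uint field could be used to rank and constrain matches. The expression itself, the rating field, the threshold, and the sample scores are hypothetical, and the sketch does not use the Amazon CloudSearch rank expression syntax.

```python
def custom_rank(doc):
    """Hypothetical rank expression: text_relevance + 2 * rating."""
    return doc["text_relevance"] + 2 * doc["rating"]


# Hypothetical matching documents with a relevance score and a uint field.
matches = [
    {"id": "a", "text_relevance": 120, "rating": 3},
    {"id": "b", "text_relevance": 100, "rating": 20},
    {"id": "c", "text_relevance": 90, "rating": 1},
]

# Rank: order the matches by the computed score, highest first.
ranked = sorted(matches, key=custom_rank, reverse=True)

# Constrain: keep only matches whose score reaches a chosen threshold.
constrained = [doc for doc in ranked if custom_rank(doc) >= 100]

print([doc["id"] for doc in ranked])
print([doc["id"] for doc in constrained])
```

Document b has the lower text_relevance score but ranks first because its rating contributes more to the combined score.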
You pay only for the Amazon CloudSearch resources you use. There are no sign-up fees, and charges are not incurred until you create a search domain. If you already have an AWS account, you are automatically signed up for Amazon CloudSearch.
When creating a search domain, you specify a unique name for the domain. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. The allowed characters are: a-z, 0-9, and hyphen (-). Currently, all domains are created in the AWS Region us-east-1. To configure the new domain, you need to specify:
• The index fields you want to be able to search, use as facets, and return in search results.
• Access policies for the domain's document service and search service endpoints.
This tutorial shows you how to create and interact with a domain using the Amazon CloudSearch console. For information about how to use the command line tools and APIs, see Creating an Amazon CloudSearch Domain (p. 27).
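The domain naming rules above can be checked before you attempt to create a domain. The following Python sketch encodes them as a regular expression; the function name is_valid_domain_name is hypothetical, and the check covers only the format rules stated here, not whether the name is unique.

```python
import re

# First character: a-z or 0-9. Remaining 2 to 27 characters: a-z, 0-9, or
# hyphen. Total length is therefore 3 to 28 characters.
DOMAIN_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name):
    """Return True if name satisfies the documented format rules."""
    return DOMAIN_NAME_RE.fullmatch(name) is not None


for candidate in ["movies", "ab", "Movies", "my_domain", "-movies"]:
    print(candidate, is_valid_domain_name(candidate))
```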
Important
The domain you're about to create will be live and you will incur the standard Amazon CloudSearch usage fees for the domain until you delete it. For more information about Amazon CloudSearch usage rates, go to the Amazon CloudSearch detail page.
2.
On the Welcome to Amazon CloudSearch page, click Create Your First Search Domain.
3.
On the NAME YOUR DOMAIN step, enter a name for your new domain and click Continue. Domain names must start with a letter or number and be at least 3 and no more than 28 characters. Domain names can contain the following characters: a-z (lower case), 0-9, and - (hyphen). Upper case letters and underscores are not allowed.
4.
On the CONFIGURE INDEX step, click Use a predefined configuration, select IMDB movies (demo), and click Continue. You can also automatically configure a search domain by choosing the predefined configuration for the type of data you want to index, or by uploading a sample of your data.
5.
On the REVIEW INDEX CONFIGURATION step, review the index fields that will be configured. Five fields are configured automatically for the imdb-movie data: actor, director, genre, title, and year. The actor, director, and title fields are text fields and will be searched by default if no search field is specified in a search request. The contents of those fields can also be returned in search results. The genre field is configured as a literal field and is designated as a facet so it can be used to sort and filter the results. Because it's a facet, it cannot be returned in the search results. If you want to retrieve the contents of the genre field when you search, you can configure an additional field with the same source data and make it result-enabled. (For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).) The year field is configured as a uint field. You cannot change the configuration of a uint field: uint fields are always search-enabled, facet-enabled, and result-enabled. When you are finished reviewing the indexing options, click Continue.
6.
On the SET UP ACCESS POLICIES step, click Recommended rules and click Continue. The recommended rules allow access to the search endpoint from all IP addresses, and restrict access to the document service to the IP address you specify.
Important
If you do not configure access rules for your search domain, you will only be able to interact with the domain through the Amazon CloudSearch console. By default, the document service and search service endpoints are configured to block all IP addresses. Keep in mind that if you do not have a static IP address, you must re-authorize your computer whenever your IP address changes. If your IP address is assigned dynamically, it is also likely that you're sharing that address with other computers on your network. This means that when you authorize the IP address, all computers that share it will be able to access your search domain's document service endpoint.
7.
On the CONFIRM step, review the domain configuration and click Confirm to create your domain.
8.
Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.
When you create a new domain, Amazon CloudSearch initializes resources for the domain, which can take around half an hour. During this initialization process, the status of the domain will be LOADING. You can begin uploading the data you want to search as soon as the domain status changes to PROCESSING. Once the status changes to ACTIVE, your domain will be fully functional and available to process search requests.
Note
While you can start uploading documents through the console once the domain status reaches the PROCESSING state, you won't be able to upload data through the command line tools or document service API until the domain status is ACTIVE.
Microsoft PowerPoint (.ppt, .pptx) Microsoft Word (.doc, .docx) Text Documents (.txt) JSON Documents (.json) XML Documents (.xml) For most file types, including JSON and XML, the contents of the file are treated as a single content field. However, CSV files are handled differently. When you upload CSV files that contain a header row, each column is treated as a field, and each row is treated as a separate document. If you upload multiple types of files, any CSV files are parsed row-by-row, and any non-CSV files are treated as individual documents. The sample IMDB movies data is already formatted as SDF and contains add requests for over 5,000 popular movies. Each add request specifies a unique ID for the movie, a document version number, and fields that contain the movie data such as title and genre. This tutorial shows how to submit data through the Amazon CloudSearch console, but you can also convert and post data (p. 51) with the command line tools, and submit SDF batches through the document service API (p. 174).
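To illustrate the CSV behavior described above (header row supplies the field names, each following row becomes one document), here is a minimal Python sketch. It is not the console's actual parser. The function name and the sample data are hypothetical, and the sketch does not assign document IDs or versions, which SDF also requires.

```python
import csv
import io
from typing import Dict, List


def csv_rows_to_documents(csv_text: str) -> List[Dict[str, str]]:
    """Treat the first row as field names and each later row as one document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample: two rows produce two documents with title and year fields.
sample_csv = "title,year\nStar Wars,1977\nStar Trek: The Motion Picture,1979\n"
documents = csv_rows_to_documents(sample_csv)

if __name__ == "__main__":
    for doc in documents:
        print(doc)
```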
Note
The Upload Documents button is available once the domain status is PROCESSING or ACTIVE. You will not be able to search uploaded documents until the domain status is ACTIVE.
4.
On the DOCUMENT SOURCE step, select Predefined data, choose IMDB movies (demo), and click Continue.
5.
On the REVIEW DOCUMENTS step, review the upload summary and click Upload Documents to send the data to your domain for indexing.
Note
If you'd like to see what the SDF data looks like, click Download the generated SDF files. For more information about SDF and preparing your own data, see Preparing Your Data for Amazon CloudSearch (p. 46).
6.
On the DOCUMENT SUMMARY step, click Finish to return to the domain dashboard.
That's it! You now have a fully functional Amazon CloudSearch domain that you can start searching. The data is automatically indexed in near real-time, so you can start searching your domain right away.
Amazon CloudSearch Developer Guide Step 4: Search Your Amazon CloudSearch Domain
4.
Select the field(s) you want to search, enter the text you want to search for, and click Go.
To view the HTTP search request that was sent to your domain's search endpoint and the JSON or XML response returned by Amazon CloudSearch, click the view raw link for the response format you want to see.
API Version 2011-02-01 14
Amazon CloudSearch Developer Guide Submitting Search Requests from a Web Browser
You can copy and paste the request URL to submit the request and view the response from a Web browser. Requests can be sent via HTTP or HTTPS.
Note
Your domain's search endpoint is shown on the domain dashboard. You can also perform a search from the AWS Management Console, view the raw request and response, and copy the request URL from the Search Request field. By default, Amazon CloudSearch returns the response in JSON. You can also get the search results formatted in XML by specifying the results-type parameter, results-type=xml. (Errors are always returned in JSON.) The following image shows the results of the previous query.
Filtering Results
You can use the Boolean query option, bq, to find documents that have particular numeric attributes. You can filter based on an exact value in a field, an inequality, or a range of values, as in these examples:
bq=year:2000 matches documents with the year 2000.
bq=year:2000.. matches documents with a year greater than or equal to 2000.
bq=year:..2000 matches documents with a year less than or equal to 2000.
bq=year:2000..2011 matches documents with a year between 2000 and 2011, inclusive.
For example, the following Boolean query searches for "star", finds all of the matching movies that were released in or before 2000, and returns the title and year of each one:
2011-02-01/search?bq=(and 'star' year:..2000)&return-fields=title,year
The response shows the number of matching documents and the requested fields for each hit.
For more information about constructing search queries, see Searching Your Data with Amazon CloudSearch (p. 80).
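The range syntax shown above can be generated programmatically. The following Python sketch builds uint filter terms and combines them with a search term in an (and ...) expression. The helper names are hypothetical, and the sketch assumes the search term contains no single quotes, because quote escaping is not covered in this section.

```python
from typing import Optional


def uint_filter(field: str, low: Optional[int] = None, high: Optional[int] = None) -> str:
    """Build field:value, field:low.., field:..high, or field:low..high."""
    if low is None and high is None:
        raise ValueError("at least one bound is required")
    if low is not None and low == high:
        return f"{field}:{low}"  # exact match
    low_text = "" if low is None else str(low)
    high_text = "" if high is None else str(high)
    return f"{field}:{low_text}..{high_text}"  # bounds are inclusive


def and_query(term: str, *filters: str) -> str:
    """Combine a quoted search term with one or more filter terms."""
    return "(and '" + term + "' " + " ".join(filters) + ")"


bq = and_query("star", uint_filter("year", high=2000))

if __name__ == "__main__":
    print(bq)
```

The resulting string is unencoded and must still be URL-encoded before it is placed in a request.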
When you rank alphabetically, the results are sorted in ascending order by default. Any values that begin with a numeral are listed before the first A entry:
Similarly, you can specify an integer field with the rank option to sort the results numerically. By default, when you rank alphabetically or numerically, results are returned in ascending order. You can prefix the field name with a minus (-) if you want the results returned in descending order. If you specify multiple rank options, the first option is used as the primary sort field, the second option is used as the secondary sort field, and so on. For more information about ranking results, see Customizing Result Ranking with Amazon CloudSearch (p. 98).
Note
Values from a facet-enabled text or literal field cannot be returned in the search results. Text and literal fields can be facet-enabled or result-enabled, but not both. If you want to return the value from an SDF document field as well as use the field as a facet, create two index fields that use the same SDF document field as a source and make one result-enabled, and the other facet-enabled.
/2011-02-01/search?q=star&return-fields=title&facet=genre
If you want to compute facet counts for selected values of a facet field, you can set facet constraints for the field. Facet constraints do not constrain the results themselves, only the facet counts that are returned. For example, the following request only counts the movies that are in the Sci-Fi, Fantasy, or Thriller genres:
/2011-02-01/search?q=star&return-fields=title&facet=genre&facet-genre-constraints='Sci-Fi','Fantasy','Thriller'
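The facet constraint parameter in the request above follows a simple pattern: the parameter name embeds the facet field, and the value is a comma-separated list of single-quoted values. A minimal Python sketch of that pattern follows. The function name is hypothetical, and the sketch assumes the constraint values contain no single quotes or commas.

```python
from typing import Iterable, Tuple


def facet_constraints_param(field: str, values: Iterable[str]) -> Tuple[str, str]:
    """Return the (name, value) pair for a facet-FIELD-constraints parameter."""
    name = f"facet-{field}-constraints"
    value = ",".join(f"'{v}'" for v in values)
    return name, value


param_name, param_value = facet_constraints_param("genre", ["Sci-Fi", "Fantasy", "Thriller"])

if __name__ == "__main__":
    print(f"{param_name}={param_value}")
```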
Amazon CloudSearch Developer Guide Step 5: Delete Your Amazon CloudSearch Movies Domain
For more information about faceted searches, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).
Important
Deleting a domain deletes the index associated with the domain and takes the domain's document and search endpoints offline permanently.
4.
In the Delete Domain dialog box, select the Delete the domain option and click OK to permanently remove the domain and all of its data.
Note
It can take around 15 minutes to delete the domain and its resources. Until then, the domain status will be BEING DELETED.
Wondering where to go next? What Is Amazon CloudSearch? (p. 1) has a guide to the rest of the Amazon CloudSearch developer documentation. For more information about the Amazon CloudSearch query language, see Searching Your Data with Amazon CloudSearch (p. 80). If you're ready to set up a domain with your own data, see Preparing Your Data for Amazon CloudSearch (p. 46) and Uploading Data to an Amazon CloudSearch Domain (p. 72) for information about formatting and submitting your data to Amazon CloudSearch.
http://search-movies-h2pc7ftfnsdlqh6pqqawbftrhu.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star
Note
Although the GET requests are shown as URLs, the parameter values are shown unencoded to make them easier to read. Keep in mind that you must URL encode parameter values when submitting requests.
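As a sketch of the encoding step the note describes, the following Python code builds a search URL with the standard library's urlencode, which percent-encodes reserved characters and writes spaces as plus signs. The endpoint host below is a hypothetical placeholder, not a real domain endpoint, and the helper name is an assumption for illustration.

```python
from typing import Dict
from urllib.parse import urlencode


def build_search_url(search_endpoint: str, params: Dict[str, str]) -> str:
    """Return a search request URL with URL-encoded parameter values."""
    return f"http://{search_endpoint}/2011-02-01/search?{urlencode(params)}"


# Hypothetical endpoint; use the search endpoint shown on your domain dashboard.
url = build_search_url(
    "search-movies-EXAMPLE.us-east-1.cloudsearch.amazonaws.com",
    {"bq": "(and 'star' year:..2000)", "return-fields": "title,year"},
)

if __name__ == "__main__":
    print(url)
```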
Request Authentication
Requests submitted to the Configuration API are authenticated using your AWS credentials. You must include authorization parameters and a digital signature in every request. Amazon CloudSearch supports AWS Signature Version 4. For detailed signing instructions, see Signature V4 Signing Process in the AWS General Reference. To create a signature for a request, you create a canonicalized version of the query string and compute an RFC 2104-compliant HMAC signature using a signing key derived from your AWS Secret Access key. For example, to construct a CreateDomain request, you need the following information:
Region name: us-east-1
Service name: cloudsearch
API version: 2011-02-01
Date: 2012-07-12T21:41:29.094Z
Access key: AKIAIOSFODNN7EXAMPLE
Secret key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY
Action: CreateDomain
Action Parameters: DomainName=movies
The canonical query string for a CreateDomain request looks like this:
Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120712/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-07-12T21:41:29.094Z
&X-Amz-SignedHeaders=host
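The two building blocks described above, the canonicalized query string and the derived signing key, can be sketched in Python as follows. This is a partial illustration, not a complete signer: it omits the canonical request, the string to sign, and the final signature. The key derivation chain (date, region, service, then the literal aws4_request) follows the Signature Version 4 process in the AWS General Reference; the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def canonical_query_string(params: Dict[str, str]) -> str:
    """URL-encode names and values, sort by name, and join with '&'."""
    encoded = sorted(
        (quote(name, safe="-_.~"), quote(value, safe="-_.~"))
        for name, value in params.items()
    )
    return "&".join(f"{name}={value}" for name, value in encoded)


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the Signature Version 4 signing key from the secret access key."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Values taken from the CreateDomain example in this section.
canonical = canonical_query_string({
    "Action": "CreateDomain",
    "DomainName": "movies",
    "Version": "2011-02-01",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": "AKIAIOSFODNN7EXAMPLE/20120712/us-east-1/cloudsearch/aws4_request",
    "X-Amz-Date": "2012-07-12T21:41:29.094Z",
    "X-Amz-SignedHeaders": "host",
})
signing_key = derive_signing_key(
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY", "20120712", "us-east-1", "cloudsearch"
)

if __name__ == "__main__":
    print(canonical)
    print(signing_key.hex())
```

Note that the encoded form differs from the readable listing above: slashes and colons in the values are percent-encoded.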
Access Control
Access to a search domain's document service is restricted by IP address so that only authorized hosts can submit document changes. By default, your search domain will not accept document service requests from any IP addresses. You must authorize specific IP addresses or address ranges before you can submit documents through the command line tools or APIs. You can configure access policies from the Amazon CloudSearch console, using the cs-configure-access-policies command, or with the UpdateServiceAccessPolicies configuration action.
Request Headers
A documents/batch request must include the following headers:
Content-Length: the length of the request body, in bytes.
Content-Type: the type of data in the request body, application/json or application/xml.
Host: your domain's document service endpoint.
By default, the response to a documents/batch request is returned in JSON. You can set the Accept header to application/xml if you want an XML response.
Request Body
The body of a documents/batch request contains a JSON or XML description of the document operations you want to perform. This description conforms to the Search Data Format (SDF). For more information about SDF, see Uploading Data to an Amazon CloudSearch Domain (p. 72).
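The header and body requirements above can be sketched with Python's standard library. The code below only constructs the request object; it does not send anything. The endpoint host is a hypothetical placeholder, the http scheme is an assumption, and build_batch_request is an illustrative name. The Host header is derived from the URL by urllib when the request is actually sent.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_batch_request(doc_endpoint: str, operations: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a documents/batch POST with a JSON SDF body."""
    body = json.dumps(operations).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{doc_endpoint}/2011-02-01/documents/batch",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Length": str(len(body)),  # length of the body in bytes
            "Accept": "application/json",
        },
    )


# Hypothetical endpoint and a single delete operation as the batch.
request = build_batch_request(
    "doc-movies-EXAMPLE.us-east-1.cloudsearch.amazonaws.com",
    [{"type": "delete", "id": "tt0434409", "version": 1337648735}],
)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

Passing the request to urllib.request.urlopen would submit it, which only succeeds from an IP address that the domain's access policies authorize.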
Example Request
POST /2011-02-01/documents/batch HTTP/1.1
Accept: application/json
Content-Length: 1176
Content-Type: application/json
Host: doc.imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com

[
  { "type": "add",
    "id": "tt0484562",
    "version": 1337648735,
    "lang": "en",
    "fields": {
      "title": "The Seeker: The Dark Is Rising",
      "director": "Cunningham, David L.",
      "genre": ["Adventure","Drama","Fantasy","Thriller"],
      "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances"]
    }
  },
  { "type": "delete",
    "id": "tt0434409",
    "version": 1337648735
  }
]
Access Control
Access to a search domain's search service is restricted by IP address so that only authorized hosts can submit search requests. By default, your domain will not accept search requests from any IP addresses. You have to authorize specific IP addresses or address ranges before you can search the domain. You can do this from the Amazon CloudSearch console, through the cs-configure-access-policies command, or with the UpdateServiceAccessPolicies configuration action.
Request Headers
A search request must include the HOST header, which specifies your domain's search service endpoint. Optionally, you can also specify the following headers:
Cache-Control: forces the revalidation of results when a cached result document would otherwise be returned.
Origin: specifies the domain that wants to use the response data, as described by the W3C Cross-Origin Resource Sharing draft.
By default, the response to a search request is returned in JSON. You can set the Accept header to application/xml if you want an XML response.
Request Parameters
Search parameters are specified in the query string. For more information about constructing searches, see Searching Your Data with Amazon CloudSearch (p. 80).
Example Request
GET /2011-02-01/search?q=star+wars&return-fields=title HTTP/1.1
Host: search-imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com
Important
By default, access to a new domain's document and search endpoints is blocked for all IP addresses. You must configure access policies for the domain to be able to submit search requests to the domain's search endpoint and upload data from the command line or through the domain's document endpoint. You can upload documents and search the domain through the Amazon CloudSearch console without configuring access policies. You can create a search domain using the cs-create-domain (p. 28) command, from the Amazon CloudSearch console (p. 28), or using the CreateDomain (p. 31) configuration action.
To create a domain
Run the cs-create-domain command and specify the name of the domain you want to create with the --domain-name option. For example, to create a domain called movies:
cs-create-domain --domain-name movies
===========================================
Creating domain [movies]
Domain endpoints are currently being created. Use cs-describe-domain to check for endpoints.
It can take around half an hour to create endpoints for a new domain. By default, the cs-create-domain command returns immediately. If you specify the --wait option, the cs-create-domain command returns once your domain's endpoints are active. You can use the cs-describe-domain command to view a summary of the domain's status and configuration. For more information, see Getting Information about an Amazon CloudSearch Domain (p. 37).
To create a domain
1.
Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2.
At the top of the Navigation panel, click Create a New Domain. (If you are creating a domain for the first time, click Create Your First Search Domain on the Welcome page.)
3.
On the NAME YOUR DOMAIN step, enter a name for your new domain and click Continue. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. Domain names can contain the following characters: a-z (lower case), 0-9, and - (hyphen). Upper case letters, underscores (_), and other special characters are not allowed in domain names.
4.
On the CONFIGURE INDEX step, select Manual Configuration and click Continue. You can configure index fields and access policies when you first create the domain, or simply create a domain and configure it later. For more information about using the Amazon CloudSearch console to configure the domain, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53) and Configuring Access for an Amazon CloudSearch Domain (p. 32).
5.
On the REVIEW INDEX CONFIGURATION step, click Continue to configure the index fields later. For more information about configuring index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
6.
On the SET UP ACCESS POLICIES step, click Continue to set up access policies later. For more information about configuring access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).
Note
If you don't configure access policies, you will only be able to upload documents and submit search queries through the console. By default, the Document and Search endpoints are configured to block all IP addresses.
7.
On the CONFIRM step, review the domain configuration and click Confirm to create your domain.
8.
Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.
API
You use the CreateDomain (p. 128) configuration action to create new domains. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=CreateDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120328/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-28T21:54:28.711Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=f5f82e71838707de1f72bfc42cc021e0324e1befa5df7c39c2ac25c61b3c8dcb
Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
Note
IP address authorization is used only to control access to the document and search APIs. The Amazon CloudSearch configuration API uses standard AWS authentication. If you don't know your computer's IP address, you can go to http://www.whatsmyip.org/ to find out what it is. Keep in mind that if you do not have a static IP address, you must re-authorize your computer whenever your IP address changes. If your IP address is assigned dynamically, it is also likely that you're sharing that address with other computers on your network. This means that when you authorize the IP address, all computers that share it will be able to access your search domain's document and search endpoints.
Note
If you have made changes to your domain that require indexing, changes to the domain's access policies will not take effect until it is re-indexed. If re-indexing is needed, it will be indicated in the response to your update access policies request and shown on the domain dashboard in the console.
You can configure your access policies using the cs-configure-access-policies (p. 33) command, from the Amazon CloudSearch console (p. 34), or by uploading an IAM policy document with the UpdateServiceAccessPolicies (p. 35) configuration action.
Note
The Action name in the policy document is always set to the wildcard character (*). There are no specific action names supported at this time. When prompted, enter y to confirm that you want to update the access policies for your domain.
Really update access policies for [movies] y/N: y
Your access policy update may take a few minutes to complete and its state will change to Active when complete. To check the state, use cs-configure-access-policies --retrieve-policy --service all
2.
The --update option merges the specified policy rules with the existing policy document and uploads the revised policy document to the domain.
Note
When you use the shortcuts, your IP address is automatically detected. If it's not correct or not the address you want to authorize, you can modify it before submitting your changes. You might need to work with your IT department to determine which IP addresses to authorize.
3.
On the domain's Access Policies page, choose one of the shortcuts or enter the IP addresses you want to authorize or block. To add additional IP addresses or address ranges to the rule, click the add (+) icon in the IP Ranges column. To remove an address or range from the rule, click its delete (-) icon in the IP Ranges column. To add a new rule to the policy, click the Add a New Rule button. To remove a rule from the policy, click the remove (x) button in the Remove column.
4.
When you are done making changes to your access policy rules, click Submit. To exit without saving your changes, click Revert.
API
You use the UpdateServiceAccessPolicies (p. 146) configuration action to upload an IAM policy document that defines the access policies for your domain's document and search endpoints. For example:
https://cloudsearch.us-east-1.amazonaws.com
?AccessPolicies={"Statement": [
  {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:360924696794:search/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["192.0.2.0/32"] } }},
  {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:360924696794:doc/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["192.0.2.0/32"] } }}
] }
&Action=UpdateServiceAccessPolicies
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T19:27:45.110Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=801de749ab11a669925246f3d9454eee1dbc319f33523a4eb35a36ec93764e7d
Note
For readability, the request is shown without URL-encoding. Keep in mind that Amazon CloudSearch configuration requests must be URL-encoded. Configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23). A policy document for Amazon CloudSearch contains a collection of statements that allow or deny access to the search and document service endpoints based on IP address. Note that the Action name is always set to the wildcard character (*). There are no specific action names supported at this time. You can retrieve your domain's current policy document with the DescribeServiceAccessPolicies (p. 139) action. Access to each endpoint is configured separately. For example:
{ "Statement":[{ "Effect":"Allow", "Action":"*", "Resource":"arn:aws:cs:us-east-1:123456789012:doc/movies", "Condition":{ "IpAddress":{ "aws:SourceIp":"192.0.2.0/24" } } }, { "Effect":"Allow", "Action":"*", "Resource":"arn:aws:cs:us-east-1:123456789012:search/movies", "Condition":{ "IpAddress":{ "aws:SourceIp":"192.0.2.0/24" } } } ] }
The Amazon Resource Name (ARN) for a domain's endpoints is of the form: arn:aws:cs:us-east-1:awsaccountid:service/domain The service can be either doc or search. The domain is the name of the domain for which you are configuring access. You can get a domain's ARNs with the DescribeDomains configuration action or the cs-describe-domains command. For more information, see Getting Information about an Amazon CloudSearch Domain (p. 37).
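A policy document of the shape shown above can be assembled programmatically. The following Python sketch builds the endpoint ARNs from the documented form and produces one Allow statement per endpoint. The function names are hypothetical, and the CIDR validation with the standard ipaddress module is a local convenience, not something Amazon CloudSearch is documented to require.

```python
import ipaddress
import json
from typing import Any, Dict, List


def endpoint_arn(account_id: str, service: str, domain: str, region: str = "us-east-1") -> str:
    """Build an ARN of the form arn:aws:cs:REGION:ACCOUNT:SERVICE/DOMAIN."""
    if service not in ("doc", "search"):
        raise ValueError("service must be 'doc' or 'search'")
    return f"arn:aws:cs:{region}:{account_id}:{service}/{domain}"


def allow_statement(arn: str, cidrs: List[str]) -> Dict[str, Any]:
    """Return an Allow statement restricted to the given IP ranges."""
    for cidr in cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError for a malformed range
    return {
        "Effect": "Allow",
        "Action": "*",  # the source states the action is always the wildcard
        "Resource": arn,
        "Condition": {"IpAddress": {"aws:SourceIp": list(cidrs)}},
    }


def build_policy(account_id: str, domain: str, doc_cidrs: List[str], search_cidrs: List[str]) -> Dict[str, Any]:
    """Configure the doc and search endpoints separately in one document."""
    return {
        "Statement": [
            allow_statement(endpoint_arn(account_id, "doc", domain), doc_cidrs),
            allow_statement(endpoint_arn(account_id, "search", domain), search_cidrs),
        ]
    }


policy = build_policy("123456789012", "movies", ["192.0.2.0/24"], ["192.0.2.0/24"])

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```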
When a domain is first created, the domain status will indicate that the domain is currently being activated and no other information will be available. Once your domain's document and search endpoints are available, the domain status will show the endpoint addresses you use to add data and submit search requests. If you haven't submitted any data for indexing, the number of searchable documents will be zero. You can get domain information using the cs-describe-domain (p. 37) command, from the Amazon CloudSearch console (p. 38), or using the DescribeDomains (p. 41) configuration action. This section also shows how you can view your domain's access policies and text options through the console. For information about accessing them through the command line tools or API, see Configuring Access for an Amazon CloudSearch Domain (p. 32) and Configuring Text Options for an Amazon CloudSearch Domain (p. 63).
3.
To view the index fields configured for the domain, click the domain's Indexing Options link in the Navigation panel. (For more information about index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).)
4.
To view the rank expressions configured for the domain, click the domain's Rank Expressions link in the Navigation panel. (For information about rank expressions, see Customizing Result Ranking with Amazon CloudSearch (p. 98).)
5.
To view the access policies configured for the domain, click the domain's Access Policies link in the Navigation panel. (For information about access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).)
6.
To view the stopwords, synonyms, and stemming options configured for the domain, click the domain's Text Options link in the Navigation panel. (For information about text options, see Configuring Text Options for an Amazon CloudSearch Domain (p. 63).)
API
You use the DescribeDomains (p. 136) configuration action to get information about your domains. To get information about specific domains, specify the DomainNames parameter. For example, to get information about the movies and imdb-movies domains:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DescribeDomains
&DomainNames.member.1=movies
&DomainNames.member.2=imdb-movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T20:28:51.229Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=d8a4b3098bb37b73c48398db57315b272b92cbfcd6b22ad1718c599b47466aea
If you omit the DomainNames parameter, DescribeDomains returns a summary of all your domains.
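The DomainNames.member.N numbering used in the request above starts at 1. A short Python sketch of generating those parameters follows; the function name is hypothetical, and the authentication parameters (the X-Amz-* entries) are deliberately left out.

```python
from typing import Dict, Sequence


def describe_domains_params(domain_names: Sequence[str] = ()) -> Dict[str, str]:
    """Build the action parameters for a DescribeDomains request."""
    params = {"Action": "DescribeDomains", "Version": "2011-02-01"}
    # Members are numbered from 1; an empty list requests all domains.
    for index, name in enumerate(domain_names, start=1):
        params[f"DomainNames.member.{index}"] = name
    return params


params = describe_domains_params(["movies", "imdb-movies"])

if __name__ == "__main__":
    for key, value in params.items():
        print(f"{key}={value}")
```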
Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
To delete a domain
1. Run the cs-delete-domain command and specify the name of the domain you want to delete. For example, to delete the movies domain:
cs-delete-domain --domain-name movies
2.
When prompted, enter y to confirm that you want to delete the domain.
Really delete [movies] (y/N): y
To delete a domain
1.
Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2.
In the Navigation panel, click the name of the domain you want to delete.
3.
4.
In the Delete Domain dialog box, enable the checkbox and click OK to confirm that you want to delete the domain.
API
You use the DeleteDomain (p. 132) configuration action to remove a domain and all of its resources. The domain you want to delete is specified in the DomainName parameter. For example, to delete the movies domain:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DeleteDomain
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120330/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-03-30T20:39:24.716Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=d17fcb306c5466cba0264d911889b4408082767a68afc8635a613d9fc6196a9f
Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
You can map alphanumeric data to either text fields or literal fields. A document field can contain up to 1 MB of data.
A uint field contains a 32-bit unsigned integer. If you're mapping timestamps to a uint field, you have to strip off the milliseconds or the timestamp will overflow the uint field. Uint fields are always searchable and can always be returned in search results and used as facets.
A text field contains arbitrary alphanumeric data such as a name, description, or even the entire body of a document. Text fields are always searchable. They are tokenized during indexing and Amazon CloudSearch performs additional text processing on them according to the stopwords, synonyms, and stems you configure in your domain's text options. The contents of a text field can also be returned in search results or the field can be used as a facet, but not both. Amazon CloudSearch can return up to 2 KB of data from a text field; if the field contents exceed 2 KB, only the first 2 KB is included in the results. If a search request does not specify what field to search, by default Amazon CloudSearch searches all text fields. You can control what fields are searched by default by defining your own default search field. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
A literal field contains an identifier or other data that you want to be able to match exactly. Unlike text fields, literal fields are not tokenized; Amazon CloudSearch does not perform any text processing on literal fields. Literal fields can be used for fields that have a small set of possible values, as well as for more arbitrary values like email addresses or brand names where an exact match is important. Literal fields are frequently used to enable faceted searches where you want to count the number of exact matches for a particular value.
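The timestamp caveat above comes down to simple arithmetic: a 32-bit unsigned integer holds values up to 4,294,967,295, which is enough for current Unix time in seconds but not in milliseconds. A minimal Python sketch, with a hypothetical function name:

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned integer can hold


def millis_to_uint_seconds(timestamp_ms: int) -> int:
    """Drop the milliseconds and confirm the result fits in a uint field."""
    seconds = timestamp_ms // 1000
    if not 0 <= seconds <= UINT32_MAX:
        raise ValueError(f"{seconds} does not fit in a 32-bit unsigned integer")
    return seconds


if __name__ == "__main__":
    raw = 1337648735094  # a millisecond timestamp, too large for a uint field
    print(raw > UINT32_MAX, millis_to_uint_seconds(raw))
```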
[ { "type": "add", "id": "tt0484562", "version": 1, "lang": "en", "fields": { "title": "The Seeker: The Dark Is Rising", "director": "Cunningham, David L.", "genre": ["Adventure","Drama","Fantasy","Thriller"], "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances", "Crewson, Wendy","Ludwig, Alexander","Cosmo, James", "Warner, Amelia","Hickey, John Benjamin","Piddock, Jim", "Lockhart, Emma"] } }, { "type": "delete", "id": "tt0484575", "version": 2 } ]
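A batch like the one above can be generated rather than written by hand. The following Python sketch builds add and delete operations with the same keys as the example and serializes them with the standard json module. The helper names are hypothetical, and choosing version numbers is left to the caller.

```python
import json
from typing import Any, Dict


def add_operation(doc_id: str, version: int, fields: Dict[str, Any], lang: str = "en") -> Dict[str, Any]:
    """Return an SDF add operation for one document."""
    return {"type": "add", "id": doc_id, "version": version, "lang": lang, "fields": fields}


def delete_operation(doc_id: str, version: int) -> Dict[str, Any]:
    """Return an SDF delete operation for one document."""
    return {"type": "delete", "id": doc_id, "version": version}


batch = [
    add_operation(
        "tt0484562",
        1,
        {
            "title": "The Seeker: The Dark Is Rising",
            "director": "Cunningham, David L.",
            "genre": ["Adventure", "Drama", "Fantasy", "Thriller"],
        },
    ),
    delete_operation("tt0484575", 2),
]
sdf_json = json.dumps(batch)

if __name__ == "__main__":
    print(json.dumps(batch, indent=2))
```

Serializing through json.dumps also guarantees that the batch is syntactically valid JSON.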
Uploading SDF batches that contain invalid JSON or XML will produce unpredictable results. Processing stops when an error is encountered, but the preceding add and delete operations are applied to the domain. You can verify the validity of your JSON or XML data using tools such as xmllint and jsonlint. Both JSON and XML batches can only contain UTF-8 characters that are valid in XML. Valid characters are the control characters tab (0009), carriage return (000D), and line feed (000A), and the legal characters of Unicode and ISO/IEC 10646. FFFE, FFFF, and the surrogate blocks D800-DBFF and DC00-DFFF are invalid and will cause errors. (For more information, see Extensible Markup Language (XML) 1.0 (Fifth Edition).) You can use the following regular expression to match invalid characters so you can remove them: /[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/ . When formatting SDF in JSON, quotes (") and backslashes (\) within field values must be escaped with a backslash. For example:
"title":"Where the Wild Things Are"
"isbn":"0-06-025492-0"
"image":"images\\covers\\Where_The_Wild_Things_Are_(book)_cover.jpg"
"comment":"Sendak's \"Where the Wild Things Are\" is a children's classic."
When formatting SDF in XML, ampersands (&) and less-than symbols (<) within field values need to be represented with the corresponding entity references (&amp; and &lt;). For example:
<field name="title">Little Cow &amp; the Turtle</field>
<field name="isbn">0-84466-4774</field>
<field name="image">images\covers\Little_Cow_&amp;_the_Turtle_(book)_cover.jpg</field>
<field name="comment">&lt;insert comment></field>
If you have large blocks of user-generated content, you might want to wrap the entire field in a CDATA section, rather than replacing every occurrence with the entity reference. For example:
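The character and escaping rules in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the behavior of any Amazon CloudSearch tool: the helper names are hypothetical, the regular expression is the one quoted above, and the standard library's json.dumps and xml.sax.saxutils.escape are used to do the escaping.

```python
import json
import re
from xml.sax.saxutils import escape

# Same character class as the regular expression quoted in the text.
INVALID_CHARS = re.compile(r"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")


def strip_invalid(value: str) -> str:
    """Remove characters that the text lists as invalid in an SDF batch."""
    return INVALID_CHARS.sub("", value)


def json_field(name: str, value: str) -> str:
    """Render one JSON name/value pair; json.dumps escapes quotes and backslashes."""
    cleaned = strip_invalid(value)
    return "%s:%s" % (json.dumps(name), json.dumps(cleaned, ensure_ascii=False))


def xml_field(name: str, value: str) -> str:
    """Render an XML field, replacing &, <, and > with entity references."""
    return '<field name="%s">%s</field>' % (name, escape(strip_invalid(value)))


def xml_cdata_field(name: str, value: str) -> str:
    """Render an XML field whose value is wrapped in a CDATA section."""
    # "]]>" would end the CDATA section early, so split it across two sections.
    safe = strip_invalid(value).replace("]]>", "]]]]><![CDATA[>")
    return '<field name="%s"><![CDATA[%s]]></field>' % (name, safe)


print(json_field("image", "images\\covers\\cover.jpg"))
print(xml_field("title", "Little Cow & the Turtle"))
print(xml_cdata_field("comment", "<b>R&D</b> notes"))
```

The CDATA variant leaves markup characters untouched, which is why it can be convenient for large blocks of user-generated content.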
The command line tools and Amazon CloudSearch console include an experimental mechanism for automatically generating SDF from a variety of source documents.
Note
You must specify a new, larger version number every time you add or update a document. For more information, see Document Versions in Amazon CloudSearch (p. 49).
[ { "type": "add", "id": "tt0484562", "version": 1, "lang": "en", "fields": { "title": "The Seeker: The Dark Is Rising", "director": "Cunningham, David L.", "genre": ["Adventure","Drama","Fantasy","Thriller"], "actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances", "Crewson, Wendy","Ludwig, Alexander","Cosmo, James", "Warner, Amelia","Hickey, John Benjamin","Piddock, Jim", "Lockhart, Emma"] } } ]
2. Send the SDF data to your domain. You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint. For more information, see Uploading Data to an Amazon CloudSearch Domain (p. 72).
Note
When posting SDF updates to delete documents, you have to specify each document that you want to delete. If you want to start over with an empty domain that has the same configuration, you can use the console to clone the domain. For more information, see Cloning an Existing Domain's Indexing Options (p. 59).
2. Send the SDF data to your domain. You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint. For more information, see Uploading Data to an Amazon CloudSearch Domain (p. 72).
API Version 2011-02-01 50
Note
Currently, only CSV files are parsed to automatically extract custom field data and generate multiple documents. When processing XML and JSON files, each file is treated as a single document and the contents of the file are used to populate a single text field.
To generate SDF
Run the cs-generate-sdf command. You must specify the --source option and either the --output option or the --domain option. If you are updating documents, you can use the --modified-after option to restrict processing to files or Amazon S3 objects modified after a particular time. You can also specify other options to control how the source data is parsed. For more information, see cs-generate-sdf (p. 122).
cs-generate-sdf --source c:\myAmazingDataSet\* --modified-after 2012-03-28T00:00 --output c:\myAmazingDataSet\SDF
Note
If you are processing multiple files, CSV files are processed as one document per row, and non-CSV files are processed as one document per file.
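To illustrate the one-document-per-row treatment of CSV data, here is a minimal Python sketch that turns CSV rows into SDF add operations. It is not how cs-generate-sdf is implemented; the function name, the id column name, and the fixed version number are assumptions chosen for the example.

```python
import csv
import io
import json


def csv_to_sdf(csv_text: str, id_column: str, version: int, lang: str = "en") -> list:
    """Build one SDF "add" operation per CSV row."""
    batch = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc_id = row.pop(id_column)  # hypothetical: one column holds the document ID
        batch.append({
            "type": "add",
            "id": doc_id,
            "version": version,
            "lang": lang,
            # Skip empty cells so they do not become empty fields.
            "fields": {name: value for name, value in row.items() if value},
        })
    return batch


sample = (
    "id,title,director\n"
    'tt0484562,The Seeker: The Dark Is Rising,"Cunningham, David L."\n'
)
print(json.dumps(csv_to_sdf(sample, id_column="id", version=1), indent=2))
```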
Note
By default, if no search field is specified in a search request, Amazon CloudSearch searches all text fields configured for the domain. You can change this behavior by specifying a default search field for the domain using the UpdateDefaultSearchField (p. 144) configuration action.
Amazon CloudSearch supports three types of index fields:
• text: a text field contains arbitrary alphanumeric data. A text field is always searchable. The value of a text field can either be returned in search results or the field can be used as a facet. By default, text fields are neither result-enabled nor facet-enabled.
• literal: a literal field contains an identifier or other data that you want to be able to match exactly. The value of a literal field can be returned in search results or the field can be used as a facet, but not both. By default, literal fields are not search-enabled, result-enabled, or facet-enabled.
• uint: a uint field contains an unsigned integer value. Uint fields are always searchable, the value of a uint field can always be returned in results, and faceting is always enabled. Uint fields can also be used in rank expressions.
Note
If your document data contains a text or literal field whose value you want to be able to return in results and also use as a facet, you can use the document field as a source for two different index fields, make one returnable, and enable faceting for the other.
When configuring index fields, you can specify:
• Whether literal fields can be searched
• Whether facets can be calculated for text or literal fields to enable filtering
• Whether the contents of a text or literal field can be returned in the search results
• A default value for the field
• Up to 20 data sources for the field
Note
Making text and literal fields result-enabled increases the size of your index, which can increase the cost of running your domain. When possible, it's best to retrieve large amounts of data from an external source, rather than embedding it in your index. Since it can take some time to apply document updates across the domain, critical data such as pricing information should be retrieved from an external source using the returned document IDs instead of returned from the index.
Field names must begin with a letter and be at least 3 and no more than 64 characters long. The allowed characters are: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as field names.
Adding Sources for an Amazon CloudSearch Index Field (p. 54) describes how document fields are used to populate your index fields. You can configure fields using the cs-configure-from-sdf or cs-configure-fields command (p. 55), through the Amazon CloudSearch console (p. 56), or using the DefineIndexField (p. 62) configuration action.
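The field naming rules above can be checked before you send a configuration request. The following Python sketch encodes them as a regular expression plus a reserved-name set; the function name is hypothetical and the check is a client-side convenience, not part of the service.

```python
import re

RESERVED_NAMES = {"body", "docid", "text_relevance"}

# A lower-case letter followed by 2 to 63 lower-case letters, digits, or
# underscores, which gives a total length of 3 to 64 characters.
FIELD_NAME = re.compile(r"[a-z][a-z0-9_]{2,63}")


def is_valid_field_name(name: str) -> bool:
    """Return True if name satisfies the index field naming rules."""
    return FIELD_NAME.fullmatch(name) is not None and name not in RESERVED_NAMES


for candidate in ["title", "yr", "Title", "9lives", "body", "year_released"]:
    print(candidate, is_valid_field_name(candidate))
```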
title "The Catcher in the Rye", the trimmed version stored in the index field would be "Catcher in the Rye". Trim Title is often used to populate an index field you can use for sorting.
Map: takes a key found in the source field and maps it to a value you want to store in the index field. For example, you might map the keys red and yellow to the value warm, and blue and green to the value cool.
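The Trim Title and Map transforms described above can be approximated in Python as follows. This is only a sketch of the idea: the exact list of leading words that Amazon CloudSearch trims is not stated here, so the tuple of articles below is an assumption, and the function names are hypothetical.

```python
LEADING_ARTICLES = ("a ", "an ", "the ")  # assumption: which leading words are trimmed


def trim_title(value: str) -> str:
    """Remove one leading article, as in the Catcher in the Rye example."""
    lowered = value.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return value[len(article):]
    return value


def map_value(value: str, mapping: dict, default=None):
    """Replace a key found in the source field with its mapped value."""
    return mapping.get(value, default)


COLOR_TEMPERATURE = {"red": "warm", "yellow": "warm", "blue": "cool", "green": "cool"}

print(trim_title("The Catcher in the Rye"))
print(map_value("red", COLOR_TEMPERATURE))
print(map_value("blue", COLOR_TEMPERATURE))
```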
2. When prompted, enter y to confirm that you want to configure your domain with the specified fields. (You can easily modify the configuration later through the console or using the cs-configure-fields command.)
Configure [imdb-movies] with analyzed fields y/N: y
Note
When you add or reconfigure index fields, you must rebuild your index for the changes to be visible in search results. For more information, see Indexing Document Data with Amazon CloudSearch (p. 77).
By default, the source for the index field is the source field of the same name. You can specify up to 20 --source options to configure sources for the index field. The values from all of the specified sources are concatenated and copied to the index field.
Note
When you add fields or reconfigure existing fields, you need to explicitly issue a request to re-index your data when you are done making configuration changes. For more information, see Indexing Document Data with Amazon CloudSearch (p. 77).
4. Specify a unique name for the field and select the field type: text, literal, or uint. Field names must begin with a letter and be at least 3 and no more than 64 characters long. The allowed characters are: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be used as custom field names.
5. To make a literal field searchable, enable the Search checkbox.
6. To use a text or literal field as a facet to enable filtering, enable the Facet checkbox. (Note that the field can be facet-enabled, or result-enabled, but not both.)
7. To allow a text or literal field value to be returned in search results, enable the Result checkbox. (Note that the field can be facet-enabled, or result-enabled, but not both.)
8. Specify a default value for the field (optional). This value is used when no value is specified for the field in the document data.
9. To add a source for the field:
a. Click the add link in the Source column.
b. In the Add Source dialog box, enter the name of the source field you want to use as a source for the specified index field.
c. Enter a default value to use for the index field if the specified source field doesn't exist in the document data (optional).
d. Select a Transform Type to specify how the index field should be populated: Copy, Trim Title, Map. For more information about using source fields, see Adding Sources for an Amazon CloudSearch Index Field (p. 54).
e. If you select the Map transform type, enter one or more key-value pairs to specify how you want to map the source data to the index field.
f.
10. To configure additional fields, click Add Index Field and repeat these configuration steps.
11. When you are done configuring fields, click Submit to save your changes. To restore the previous field configurations, click Revert.
3. On the NAME YOUR DOMAIN step, enter a name for the new domain and click Continue.
4. On the CONFIGURE INDEX step, select Copy the configuration from another search domain, choose the domain you want to copy, and click Continue.
Note
This copies only the domain's indexing options; access policies and text options are not copied from the specified domain.
5. On the REVIEW INDEX CONFIGURATION step, make any adjustments you want and click Continue. For more information about configuring index fields, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
6. On the SET UP ACCESS POLICIES step, click Continue to select access policies for the new domain. Access policies are not automatically copied from the cloned domain. For more information about configuring access policies, see Configuring Access for an Amazon CloudSearch Domain (p. 32).
7. On the CONFIRM step, review the domain configuration and click Confirm to create your domain.
8. Once the domain has been created, click OK to exit the Create New Search Domain wizard and go to the domain's dashboard.
API
You use the DefineIndexField (p. 129) configuration action to add field definitions to your domain configuration. If the specified field already exists, DefineIndexField replaces it. The type-specific options enable you to define a default value for a field, and enable or disable specific features for text and literal fields:
• FacetEnabled: controls whether facets can be calculated for this field. Calculating facets determines how many documents contain matching values for the field. Facet counts are not automatically returned for facet-enabled fields; they must be explicitly requested at search time. (Uint fields are always facet-enabled.)
• ResultEnabled: controls whether the contents of a text or literal field can be returned. (Uint fields are always returnable.)
• SearchEnabled: controls whether a literal field is searchable. (Text and uint fields are always searchable.)
For example, to create a uint index field called year and populate it with the data from the yearreleased field in the SDF data:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DefineIndexField
&DomainName=movies
&IndexField.IndexFieldName=year
&IndexField.IndexFieldType=uint
&IndexField.SourceAttributes.member.1.SourceDataCopy.SourceName=yearreleased
&IndexField.SourceAttributes.member.1.SourceDataFunction=Copy
&IndexField.UIntOptions.DefaultValue=0
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120401/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-01T17:00:07.803Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=b291a01dd69a49e04f4a84862b38e0758c53cf93b76dd452cc802886b20724bc
Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
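As a rough illustration of how the unsigned part of a DefineIndexField request like the one above could be assembled, the following Python sketch builds the query string with the standard library. The function name is hypothetical, only parameter names that appear in the example request are used, and request signing (the X-Amz-* parameters) is deliberately left out, so the resulting URL would not be accepted as is.

```python
from urllib.parse import urlencode

CONFIG_ENDPOINT = "https://cloudsearch.us-east-1.amazonaws.com/"


def define_index_field_url(domain: str, field_name: str, field_type: str,
                           source_name: str) -> str:
    """Build the unsigned query string for a DefineIndexField request."""
    params = [
        ("Action", "DefineIndexField"),
        ("DomainName", domain),
        ("IndexField.IndexFieldName", field_name),
        ("IndexField.IndexFieldType", field_type),
        ("IndexField.SourceAttributes.member.1.SourceDataCopy.SourceName", source_name),
        ("IndexField.SourceAttributes.member.1.SourceDataFunction", "Copy"),
        ("Version", "2011-02-01"),
    ]
    # urlencode percent-encodes each value; signing parameters are not added here.
    return CONFIG_ENDPOINT + "?" + urlencode(params)


print(define_index_field_url("movies", "year", "uint", "yearreleased"))
```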
Stems are specified as a collection of term and stem pairs. When you configure stemming options, the existing stemming dictionary is replaced with the mappings you specify. By default, Amazon CloudSearch does not define any stems. However, some basic algorithmic stemming is always performed, such as removing plural suffixes. (This is done whether or not you specify a custom stemming dictionary.) The maximum size of a stemming dictionary is 500 KB. You can configure stems using the cs-configure-text-options (p. 64) command, from the Amazon CloudSearch console (p. 64), or using the UpdateStemmingOptions (p. 65) configuration action.
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
cs-index-documents -d mydomain
5. Click Submit to save your changes.
6. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.
API
Use the UpdateStemmingOptions (p. 148) configuration action to upload a JSON-formatted stemming dictionary to your domain. A stemming dictionary has a single JSON object with one property, stems. The value of the stems property is an object that contains a collection of string: value pairs that map terms to their stems:
{"stems": {"term1": "stem1", "term2": "stem2", "term3": "stem3"}}
For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStemmingOptions
&DomainName=movies
&Stems={"stems": {"mice": "mouse", "people": "person", "running": "run"} }
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:43:50.884Z
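A stemming dictionary in the format shown above can be generated from a plain mapping of terms to stems. The following Python sketch does that and checks the 500 KB size limit; the function name is hypothetical, and treating 500 KB as 500 * 1024 bytes of UTF-8 is an assumption.

```python
import json
from urllib.parse import quote

MAX_STEMMING_DICTIONARY_BYTES = 500 * 1024  # assumption: 500 KB means 500 * 1024 bytes


def build_stemming_dictionary(stems: dict) -> str:
    """Serialize a term-to-stem mapping as a stemming dictionary document."""
    document = json.dumps({"stems": stems})
    if len(document.encode("utf-8")) > MAX_STEMMING_DICTIONARY_BYTES:
        raise ValueError("stemming dictionary exceeds the maximum size")
    return document


dictionary = build_stemming_dictionary(
    {"mice": "mouse", "people": "person", "running": "run"}
)
print(dictionary)
# The value must be URL-encoded before it is used as the Stems parameter.
print("Stems=" + quote(dictionary, safe=""))
```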
To configure stopwords
1. Create a text file that contains your stopword dictionary. In the file, specify one stopword per line. For example:
the
or
and
2. Run the cs-configure-text-options command with the --stopwords option to upload the stopword dictionary to your domain.
cs-configure-text-options -d mydomain --stopwords stopwords.txt
Updating stop words options
Sent 3 stop words.
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
cs-index-documents -d mydomain
To configure stopwords
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
3. On the Stopwords tab, for each word you want to add to the stopword dictionary, enter it in the Add a Stopword field and click Add. You can also edit the list directly or copy and paste the list into a text editor to make changes.
4. Click Submit to save your changes.
5. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.
API
Use the UpdateStopwordOptions (p. 150) configuration action to upload a JSON-formatted stopword dictionary to your domain. A stopword dictionary has a single JSON object with one property, stopwords. The value of the stopwords property is an array of strings:
{"stopwords": ["string1", "string2", "string3"]}
For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStopwordOptions
&DomainName=movies
&Stopwords={"stopwords": ["a", "an", "the", "of"]}
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:47:23.216Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=47bc42cfb11561dffade2779bac6af7f6b53d76d2a729f9e62b2e528d5eac319
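If you already keep stopwords in a one-word-per-line text file for the command line tools, the same list can be converted to the JSON format used by UpdateStopwordOptions. The following Python sketch shows one way to do it; the function name is hypothetical.

```python
import json


def stopwords_to_json(text: str) -> str:
    """Convert one-stopword-per-line text into a stopword dictionary document."""
    words = [line.strip() for line in text.splitlines() if line.strip()]
    return json.dumps({"stopwords": words})


print(stopwords_to_json("the\nor\nand\n"))  # {"stopwords": ["the", "or", "and"]}
```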
To configure synonyms
1. Create a text file that contains your synonym dictionary. Each line in the file should specify a term followed by a comma-separated list of its synonyms. For example:
cat, feline, kitten
dog, canine, puppy
horse, equine, colt
2. Run the cs-configure-text-options command with the --synonyms option to upload the synonym dictionary to your domain.
cs-configure-text-options -d mydomain --synonyms synonyms.txt
3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
cs-index-documents -d mydomain
To configure synonyms
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
3. In the Text Options panel, click the Synonyms tab.
4. For each term and synonym list that you want to add to your synonyms dictionary, enter the term in the Add a Term field and the comma-separated list of synonyms in the Synonyms field. You can also edit the list directly or copy and paste the list into a text editor to make changes.
5. Click Submit to save your changes.
6. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.
API
Call UpdateSynonymOptions (p. 152) to upload a JSON-formatted synonym dictionary to your domain. A synonym dictionary has a single JSON object with one property, synonyms. The value of the synonyms property is a collection of string: value pairs that map each term to one or more synonyms. To map a term to multiple synonyms, specify the synonyms as an array of strings:
{"synonyms": { "term1": ["synonym1", "synonym2"], "term2": ["synonym1"], "term3": ["synonym1", "synonym2", "synonym3"] } }
For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateSynonymOptions
&DomainName=movies
&Synonyms={"synonyms": { "cat": ["feline", "kitten"], "dog": ["canine","puppy"]}}
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:56:03.214Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=cf9a49e63e25b0131da11c9dccb4c648ff243c01ea4282d14d88e2c6ec414523
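The comma-separated synonym file used by the command line tools maps naturally onto the JSON format used by UpdateSynonymOptions. The following Python sketch converts one to the other; the function name is hypothetical, and skipping lines that have no synonyms is a choice made for this example.

```python
import json


def synonyms_to_json(text: str) -> str:
    """Convert "term, synonym, synonym" lines into a synonym dictionary document."""
    synonyms = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",") if part.strip()]
        if len(parts) < 2:
            continue  # a term without synonyms adds nothing to the dictionary
        term, *term_synonyms = parts
        synonyms[term] = term_synonyms
    return json.dumps({"synonyms": synonyms})


print(synonyms_to_json("cat, feline, kitten\ndog, canine, puppy\n"))
```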
Important
To successfully upload SDF data to your domain, it has to be valid JSON or XML and conform to the SDF data conventions. For information about creating SDF batches, see Preparing Your Data for Amazon CloudSearch (p. 46). For information about configuring index fields for a domain, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53). You can submit SDF data to a domain using the cs-post-sdf (p. 72) command, from the Amazon CloudSearch console (p. 73), or by posting it directly (p. 76) to the domain's Document endpoint.
CSV files are parsed row-by-row and a separate document is generated for each row. All other types of files are treated as a single document. For more information about automatically generating SDF, see Preparing Your Data for Amazon CloudSearch (p. 46). You can also upload SDF batches through the Amazon CloudSearch console.
4. Select the location of the data you want to upload to your domain:
• File(s) on my local disk
• Object(s) from Amazon S3
• Predefined data
Note
If you upload data in a format other than SDF, it will automatically be converted to SDF during the upload process.
5. If you are uploading local files, click Browse to choose the file(s) to upload:
6. If you are uploading objects from Amazon S3, select the bucket you want to upload from. To upload the entire contents of the bucket, leave the Prefix field empty and click Add. To upload selected objects, enter a filter in the Prefix field and click Add. (You can add multiple prefixes.)
7. If you are uploading predefined sample data, choose the data set that you want to use:
8. Once you've selected the data you want to upload, click Continue.
9. On the Review Documents step, review the documents to be uploaded and click Upload Documents to continue.
10. On the Document Summary step, if SDF batches have been automatically generated from your data, you can click Download the generated SDF files to get them. Click Finish to return to the domain dashboard.
API
You use the documents/batch (p. 174) document service API to post SDF data to your domain to add, update, or remove documents. For example:
curl -X POST --upload-file data1.sdf doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2011-02-01/documents/batch --header "Content-Type:application/json"
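The same kind of request can be prepared from Python with the standard library. The sketch below only builds the request object so that it can be inspected without network access; the function name is hypothetical, the endpoint is the one from the curl example, and the http:// scheme is an assumption (the curl command does not state one).

```python
import urllib.request


def build_batch_request(doc_endpoint: str, sdf_bytes: bytes) -> urllib.request.Request:
    """Prepare a POST of a JSON SDF batch to the documents/batch resource."""
    url = "http://%s/2011-02-01/documents/batch" % doc_endpoint  # scheme is assumed
    return urllib.request.Request(
        url,
        data=sdf_bytes,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request(
    "doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com",
    b'[{"type": "delete", "id": "tt0484575", "version": 2}]',
)
print(request.get_method(), request.full_url)
# To actually send the batch: urllib.request.urlopen(request)
```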
Note
Depending on the volume of data, building a full index can take a considerable amount of compute power. Amazon CloudSearch automatically manages the resources needed to build the index in a timely fashion. Most data updates and simple domain configuration changes are built and deployed in minutes. Indexing large volumes of data and applying configuration changes that require rebuilding the full index will take longer to complete.
You can initiate indexing using the cs-index-documents (p. 77) command, from the Amazon CloudSearch console (p. 78), or using the IndexDocuments (p. 79) configuration action.
cs-index-documents --domain-name movies
===========================================
Indexing documents for domain [movies]
Now indexing fields:
===========================================
actor
director
genre
title
year
===========================================
To run indexing
1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation panel, click the name of the domain that needs indexing.
3. On the Domain Dashboard, click the Run Indexing button.
4. Click OK in the Starting Indexing dialog box to return to the domain dashboard.
API
You use the IndexDocuments (p. 143) configuration action to initiate indexing. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=IndexDocuments
&DomainName=movies
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T22:41:07.764Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=cf2f7663cc7c80901474f889ab9b1b8e65deea5be1e2c527319bc8e16859d7a4
Note
Amazon CloudSearch configuration requests are authenticated using your AWS credentials. For more information about signing requests, see Request Authentication (p. 23).
You can specify additional constraints to search specific fields and use Boolean logic to construct more complex queries. When you search text fields, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field. If the field being searched is a literal field, the field contents must exactly match the search string to be returned in results. You can search uint fields for a particular value or a range of values. In your search requests, you can also specify how you want Amazon CloudSearch to rank and return the search hits. Instead of using the default text_relevance scores, you can rank hits alphabetically, numerically, or according to your own custom rank expressions.
When you submit a search request, Amazon CloudSearch returns a response that specifies how the results were ranked, the match expression that was derived from your query constraints, and a collection of hits that represents the documents that match the query constraints. For example:
{ "rank":"-text_relevance", "match-expr":"(label 'star wars')", "hits":{ "found":7, "start":0, "hit":[ {"id":"tt1185834"}, {"id":"tt0076759"}, {"id":"tt0121765"}, {"id":"tt0080684"}, {"id":"tt0086190"}, {"id":"tt0120915"}, {"id":"tt0121766"}] }, "info":{ "rid":"b7c167f6c2da6d93ecb53d18230cbc27146c9356f9c643ec9dec53e707b9af87f27b24b2f4b636a9", "time-ms":4, "cpu-time-ms":0 } }
• found: specifies the total number of documents that matched the query.
• start: specifies the offset of the first hit included in the response.
• id: specifies the unique document ID of an individual hit.
By default, a search response contains the IDs of the first 10 ranked hits. You can retrieve additional information for each hit by specifying which result-enabled fields should be included in the response. You can also control how many hits are returned at a time. When you want to page through a large set of matching documents, you can specify the offset of the first hit that you want to retrieve. For more information, see Controlling How Search Results are Returned in Amazon CloudSearch (p. 89).
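As an illustration of how these response properties fit together, the following Python sketch parses a JSON search response and works out the offset of the next page, if there is one. The response text is an abbreviated version of the example above (only two hits are included), and the function name is hypothetical.

```python
import json

# Abbreviated copy of the example response: 7 documents matched, 2 hits shown.
RESPONSE_TEXT = """
{"rank": "-text_relevance",
 "match-expr": "(label 'star wars')",
 "hits": {"found": 7, "start": 0,
          "hit": [{"id": "tt1185834"}, {"id": "tt0076759"}]},
 "info": {"time-ms": 4, "cpu-time-ms": 0}}
"""


def summarize_hits(response_text: str):
    """Return (total matches, IDs on this page, offset of the next page or None)."""
    hits = json.loads(response_text)["hits"]
    ids = [hit["id"] for hit in hits["hit"]]
    next_start = hits["start"] + len(ids)
    if next_start >= hits["found"]:
        next_start = None  # this page already reaches the last match
    return hits["found"], ids, next_start


found, ids, next_start = summarize_hits(RESPONSE_TEXT)
print(found, ids, next_start)
```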
http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars&return-fields=title
Note
The API version must be specified in all search requests. When there are updates to the Search API, you access them using a new API version.
The query string in a search request must be URL-encoded. You can use any method you want to send GET requests to your domain's search endpoint: you can enter the request URL directly in a Web browser, use cURL to submit the request, or generate an HTTP call using your favorite HTTP library.
By default, Amazon CloudSearch returns the response in JSON. You can also get the results formatted in XML by specifying the results-type parameter, results-type=xml.
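One way to make sure the query string is URL-encoded is to let an HTTP library build it. The following Python sketch assembles a search URL with the standard library; the function name is hypothetical, the endpoint is the one from the example request above, and only parameters mentioned in this section (q, return-fields, results-type) are used.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"


def search_url(search_endpoint: str, **params: str) -> str:
    """Build a search request URL; underscores in keyword names become hyphens."""
    query = urlencode({name.replace("_", "-"): value for name, value in params.items()})
    return "http://%s/%s/search?%s" % (search_endpoint, API_VERSION, query)


url = search_url(
    "search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com",
    q="star wars",
    return_fields="title",
    results_type="xml",
)
print(url)
```

Because urlencode encodes the space in star wars as +, the generated query string matches the q=star+wars form used in the examples.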
Note
You can also use the Search Tester in the Amazon CloudSearch console to search your data, browse the results, and view the generated request URLs and JSON and XML responses. For more information, see Searching with the Search Tester (p. 14).
Amazon CloudSearch Developer Guide Searching Text Fields with the Boolean Query Parameter
By default, documents must contain all of the terms you specify to be considered a match. Unlike literal fields, the terms can occur anywhere within the text field, in any order. You can prefix a term with the - (NOT) operator to exclude all results that include that term. Similarly, you can separate terms with the | (OR) operator if you want to match documents that contain any of the specified terms. For more information, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84). To search for a phrase rather than individual terms, enclose the phrase in double quotes. For more information, see Searching for Phrases in Text Fields in Amazon CloudSearch (p. 86).
For example, to search the default search field for star wars, specify q=star+wars in the query string:
https://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars
Searching Text Fields with the Boolean Query Parameter in Amazon CloudSearch
The Boolean query parameter, bq, provides a rich expression language for fine-grained control over document matching. You can search within particular fields and combine expressions with the and, or, and not prefix operators. In addition to searching text fields, you can use the bq parameter to search literal and uint fields. If you don't specify any fields when using the bq parameter, the default search field is used, just like with the q parameter. For example, the following queries produce the same results:
search?bq='star'
search?q=star
When constructing queries with bq, you must enclose the search terms within single quotes. By default, documents must contain all of the terms you specify to be considered a match. When you search text fields, the terms can occur anywhere within the text field, in any order. You can prefix a term with the - (NOT) operator to exclude all results that include that word. Similarly, you can separate terms with the | (OR) operator if you want to match documents that contain any of the specified terms. For more information, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84). To search for a phrase rather than individual terms, enclose the phrase in double quotes. For more information, see Searching for Phrases in Text Fields in Amazon CloudSearch (p. 86). To search a particular text field, prefix the search terms with the name of the field you want to search, followed by a colon. For example:
search?bq=title:'star'
This searches the title field of each document and matches all documents whose titles contain the term star. In addition to searching text fields, the bq parameter can be used to search specific literal (p. 86) and uint (p. 87) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).
Note
You can only search literal fields that are search-enabled in your domain's configuration. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
Note
To retrieve all of the documents in your domain, you can prefix a term that you know doesn't exist in your domain's data with the NOT operator, for example -1234567. However, keep in mind that this is a resource-intensive operation if you have a large dataset and might be subject to timeouts.
For example, when searching the sample movie data:
search?q=star|wars matches movies that contain either star or wars in the default search field.
search?bq=title:'story funny|underdog' matches movies that contain both the terms story and funny or the term underdog in the title field.
search?bq=title:'red|white|blue' matches movies that contain either red, white, or blue in the title field.
search?bq=actor:'"evans, chris"|"Garity, Troy"' matches movies that contain either the phrase evans, chris or the phrase Garity, Troy in the actor field.
search?bq='title:-star+war|world' matches movies whose titles do not contain star, but do contain either war or world.
You can also use the Boolean operators when constructing queries using the full Boolean query syntax. For example, search?bq=(and director:'Lucas|Spielberg' (not actor:'"Ford, Harrison"')) matches movies that either Lucas or Spielberg directed, but that did not star Harrison Ford. For more information about the Boolean query syntax, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).
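Boolean queries like the last example above are easier to assemble from small building blocks than by string concatenation. The following Python sketch composes the same expression and URL-encodes it; the helper names are hypothetical, and the helpers assume that the search terms themselves contain no single quotes.

```python
from urllib.parse import quote


def field(name: str, terms: str) -> str:
    """A field-specific match: the field name, a colon, and single-quoted terms."""
    return "%s:'%s'" % (name, terms)


def and_(*expressions: str) -> str:
    """Combine expressions with the and prefix operator."""
    return "(and %s)" % " ".join(expressions)


def not_(expression: str) -> str:
    """Negate an expression with the not prefix operator."""
    return "(not %s)" % expression


bq = and_(
    field("director", "Lucas|Spielberg"),
    not_(field("actor", '"Ford, Harrison"')),
)
print(bq)
# The expression must be URL-encoded before it is placed in the query string.
print("search?bq=" + quote(bq, safe=""))
```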
Note
When performing wildcard searches on text fields, keep in mind that Amazon CloudSearch tokenizes the text fields during indexing and performs basic text processing such as removing the trailing s from plural terms. Normally, the same text processing is performed on the search query. However, when you use the wildcard operator, no text processing is performed on the prefix. This means that a search for a prefix that ends in "s" won't match the singular version of the term. This can happen for any term that ends in s, not just plurals. For example, if you search the actor field in the sample movie data for "Gillanders", there are three matching movies. If you search for "Gillander*", you get the same three movies. However, if you search for "Gillanders*" there are no matches. This is because the term is stored in the index as "Gillander"; "Gillanders" does not appear in the index.
For example, the following Boolean query searches the title field for the prefix star:
search?bq=title:'star*'&return-fields=title
If you perform this search against the sample movie data, the response will contain movies such as Stargate, Dark Star, and Starsky & Hutch:
{"rank":"-text_relevance",
 "match-expr":"(label title:'star*')",
 "hits":{"found":34,"start":0,
  "hit":[
   {"id":"tt1408101","data":{"title":["Untitled Star Trek Sequel"]}},
   {"id":"tt0111282","data":{"title":["Stargate"]}},
   {"id":"tt0335438","data":{"title":["Starsky & Hutch"]}},
   {"id":"tt0477095","data":{"title":["Starter for 10"]}},
   {"id":"tt1185834","data":{"title":["Star Wars: The Clone Wars"]}},
   {"id":"tt0069945","data":{"title":["Dark Star"]}},
   {"id":"tt0088172","data":{"title":["Starman"]}},
   {"id":"tt0844760","data":{"title":["Starship Troopers 3: Marauder"]}},
   {"id":"tt0092007","data":{"title":["Star Trek IV: The Voyage Home"]}},
   {"id":"tt0098382","data":{"title":["Star Trek V: The Final Frontier"]}}
  ]
 },
 "info":{
  "rid":"8a0620f6c72ff3e73c2a10e59f186fa89ba1fa67e3b160548fb2c7aa91bce7aebdc0b87198cf138a",
  "time-ms":3,
  "cpu-time-ms":0
 }}
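The interaction between wildcard searches and text processing described in the note above can be reproduced with a toy model. The Python sketch below is not how Amazon CloudSearch is implemented: it assumes that text processing amounts to lower-casing and removing one trailing s, and that a wildcard prefix is lower-cased but otherwise left alone.

```python
def normalize(term: str) -> str:
    """Toy stand-in for text processing: lower-case and drop one trailing s."""
    term = term.lower()
    return term[:-1] if term.endswith("s") else term


# Terms are normalized when they are indexed.
INDEX = {normalize(term) for term in ["Gillanders", "Stargate", "Starman"]}


def matches(query: str) -> bool:
    """Return True if the query matches any term in the toy index."""
    if query.endswith("*"):
        prefix = query[:-1].lower()  # the prefix is NOT run through normalize()
        return any(term.startswith(prefix) for term in INDEX)
    return normalize(query) in INDEX  # ordinary terms get the same processing


for query in ["Gillanders", "Gillander*", "Gillanders*", "star*"]:
    print(query, matches(query))
```

In this model "Gillanders*" finds nothing because the index only contains "gillander", which does not start with the unprocessed prefix "gillanders".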
If you perform this search against the sample movie data, you'll notice that the results for the phrase search contain one less hit than a simple search for the terms with love:
{"rank":"-text_relevance", "match-expr":"(label '\"with love\"')", "hits":{ "found":4, "start":0, "hit":[ {"id":"tt0062376"}, {"id":"tt0309530"}, {"id":"tt1179034"}, {"id":"tt0057076"} ] }, "info":{"rid":"7508c2e52f5c3c25eca625c994c1351ed8fed385d15bffaf9dd32aae31644e 939b8656dcd8c96d09","time-ms":2,"cpu-time-ms":0} }
Literal fields are often used in conjunction with faceting to enable users to drill down into the results according to the faceted attributes. For more information about faceting, see Getting and Using Facet Information in Amazon CloudSearch (p. 91).
Searching Literal Fields with the Boolean Query Parameter in Amazon CloudSearch
To search literal fields, you must use the Boolean Query parameter, bq. To search a literal field, prefix the search string with the name of the literal field you want to search, followed by a colon. The search string must be enclosed in single quotes. For example:
search?bq=genre:'sci-fi'
This searches the genre field of each document and matches all documents whose genre field contains the value sci-fi. To be a match, the field value must be an exact match for the search string. For example, documents that contain the value young adult sci-fi in the genre field will not be included in the search results when you search for "sci-fi". In addition to searching literal fields, the bq parameter can be used to search specific text (p. 82) and uint (p. 87) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).
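The literal-field syntax described above can be sketched in Python. This is a minimal illustration, not part of the CloudSearch tools: the helper name literal_query is hypothetical, and it assumes the value contains no single quotes, since this guide does not describe how to escape them.

```python
from urllib.parse import quote


def literal_query(field: str, value: str) -> str:
    """Build a relative search path such as search?bq=genre:'sci-fi'.

    Hypothetical helper: the field name is followed by a colon and the
    search string is enclosed in single quotes, as described above.
    """
    expr = f"{field}:'{value}'"
    # Percent-encode characters that are unsafe in a URL (spaces, for
    # example) while leaving the colon and the single quotes readable.
    return "search?bq=" + quote(expr, safe=":'")


print(literal_query("genre", "sci-fi"))  # search?bq=genre:'sci-fi'
```

Because literal matching is exact, the request built here matches documents whose genre value is sci-fi, but not documents whose value is young adult sci-fi.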
Note
You can only search literal fields that are search-enabled in your domain's configuration. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
Searching Uint Fields with the Boolean Query Parameter in Amazon CloudSearch
To search uint fields, you must use the Boolean Query parameter, bq. To search a uint field, prefix the value or range of values you want to search with the name of the uint field, followed by a colon. The integer value or range is not enclosed in single quotes. In addition to searching uint fields, you can use the bq parameter to search specific text (p. 82) and literal (p. 86) fields. To combine matches against multiple fields, you can use the Boolean operators and, or, and not. For more information about constructing Boolean queries, see Constructing Boolean Search Queries in Amazon CloudSearch (p. 88).
Ranges can be open ended. For example, you could specify year:2002.. to find all matching movies released from 2002 onward, or ..1970 to find all the movies released through 1970:
search?bq=year:2002..
search?bq=year:..1970
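As a rough sketch of the range syntax, the following Python helper formats closed and open-ended uint ranges. The function name uint_range and its parameters are assumptions made for illustration. Note that the value is not wrapped in single quotes, as required for uint fields.

```python
from typing import Optional


def uint_range(field: str, low: Optional[int] = None,
               high: Optional[int] = None) -> str:
    """Build a bq request for a uint range; omit a bound to leave it open."""
    if low is None and high is None:
        raise ValueError("specify at least one bound")
    lower = "" if low is None else str(low)
    upper = "" if high is None else str(high)
    # Uint values and ranges are written without single quotes.
    return f"search?bq={field}:{lower}..{upper}"


print(uint_range("year", low=2002))   # search?bq=year:2002..
print(uint_range("year", high=1970))  # search?bq=year:..1970
```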
You can use and, or, and not at the field level, and still use the - and | operators within the match expressions. For example, the following queries produce the same results:
search?bq=(or title:'star' title:'-wars')
search?bq=(or title:'star' (not title:'wars'))
For more information about using Boolean operators in match expressions, see Using Boolean Operators in Amazon CloudSearch Text Searches (p. 84). You can construct Boolean search queries to combine searches against multiple fields. For example:
Note
If you don't get the results you expect from a search request, check the match-expr in the response to see how Amazon CloudSearch parsed the match expression specified in the bq parameter.
Search responses formatted in XML contain exactly the same information as a JSON response:
<results> <rank>-text_relevance</rank> <match-expr>(label 'star wars')</match-expr> <hits found="7" start="0"> <hit id="tt1185834"/> <hit id="tt0076759"/> <hit id="tt0121765"/> <hit id="tt0080684"/> <hit id="tt0086190"/> <hit id="tt0120915"/> <hit id="tt0121766"/> </hits> <facets/> <info rid="b7c167f6c2da6d93501039ad23f00811361e4acf6ca09ec98ae60af47463dfe4ce2e5565e736aa1f" time-ms="3" cpu-time-ms="0"/> </results>
For detailed information about the JSON and XML response formats for search requests, see Search Response (p. 190).
If you want to retrieve 25 hits at a time, set the size parameter to 25. To get the first set of hits, you don't have to set the start parameter:
search?q=-star&size=25
For subsequent requests, use the start parameter to retrieve the set of hits you want. For example, to get the third batch of 25 hits specify:
search?q=-star&size=25&start=50
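The relationship between size and start can be sketched as follows. The helper page_query and its 1-based page argument are assumptions for illustration. The start value is the number of hits to skip, so page n begins at (n - 1) * size.

```python
def page_query(query: str, size: int, page: int) -> str:
    """Build a search request for the given 1-based page of results."""
    if size < 1 or page < 1:
        raise ValueError("size and page must be positive")
    start = (page - 1) * size  # hits to skip before this page
    path = f"search?q={query}&size={size}"
    if start > 0:
        # The first page needs no start parameter.
        path += f"&start={start}"
    return path


print(page_query("-star", 25, 1))  # search?q=-star&size=25
print(page_query("-star", 25, 3))  # search?q=-star&size=25&start=50
```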
Note
Making fields result-enabled increases the size of your index, which can increase the cost of running your domain. You should only store document data in the search index by making fields result-enabled when it's difficult or costly to retrieve the data using other means. Since it can take some time to apply document updates across the domain, critical data such as pricing information should be retrieved using the returned document IDs instead of returned from the index.
To retrieve source data for result-enabled fields, you specify the return-fields parameter in the query string. You can specify a single return field, or up to 10 fields as a comma-separated list. For example, to include the actor, title, and default text_relevance score in the search results:
search?q=star+wars&return-fields=actor,title,text_relevance
{ "id":"tt1185834", "data":{ "actor":["Abercrombie, Ian","Baker, Dee Bradley","Burton, Corey", "Eckstein, Ashley","Futterman, Nika","Kane, Tom", "Lanter, Matt","Taber, Catherine","Taylor, James Arnold", "Wood, Matthew"], "text_relevance":["308"], "title":["Star Wars: The Clone Wars"] } }
By default, results are listed in ascending order. To sort in descending order, prefix the field name with - (minus sign):
search?q=star+wars&rank=-title
You can use any uint field to sort results numerically. For example, specifying rank=-year will sort the results by year with the most recent year listed first:
search?q=star+wars&return-fields=title,year&rank=-year
Note
If you don't specify the rank option, it is set to -text_relevance by default so the highest-scoring documents are listed first. You can also define custom rank expressions and use them to sort results. For more information about creating and using your own rank expressions, see Customizing Result Ranking with Amazon CloudSearch (p. 98).
Using Facet Information in Amazon CloudSearch (p. 95)
A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a particular field. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)
You can get facet information for any uint field and facet-enabled text and literal fields by specifying the facet parameter in your search request. Amazon CloudSearch also provides search parameters that enable you to control how facet values are returned and sorted. You can select which facets to retrieve, limit the number of facet values returned, and control the sorting of the facet values for each field.
Getting Facet Information for Text and Literal Fields in Amazon CloudSearch
When you request facet information for a text or literal field, Amazon CloudSearch returns facet counts for the top 40 values in the specified field. You can include the facet-FIELD-top-n parameter to limit the number of facet values that are returned for a particular field.
Note
To get facet information for a text or literal field, the field must be configured to enable faceting. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
For example, the following request gets facet counts for the top five most-frequently-occurring values in the genre field:
search?bq=title:'star'&facet=genre&facet-genre-top-n=5
The response returns the facet information after the list of hits.
"facets":{ "genre":{"constraints":[ {"value":"Sci-Fi","count":20}, {"value":"Action","count":18}, {"value":"Adventure","count":16}, {"value":"Thriller","count":10}, {"value":"Fantasy","count":5} ] }
To drill down into particular bins of integers, you use the facet-FIELD-constraints parameter. For more information, see Getting Facet Information for Particular Values in Amazon CloudSearch (p. 93).
Note
If commas occur in a facet value you want to use as a constraint, the comma must be escaped with a backslash. For example, facet-actor-constraints='Bai\, Ling','Bryant\, Gene'.
For example, to find out how many documents have Drama or Sci-Fi in the genre field, you'd set facet-genre-constraints='Drama','Sci-Fi':
search?q=star&facet=genre&facet-genre-constraints='Drama','Sci-Fi'
In the response, the counts are only shown for the specified constraints:
"facets":{"genre": {"constraints":[ {"value":"Sci-Fi","count":20}, {"value":"Drama","count":4} ]} }
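The comma-escaping rule for constraint values is easy to get wrong when you assemble requests by hand. The following Python sketch shows one way to build the parameter. The helper name facet_constraints is hypothetical, and the result is not URL-encoded.

```python
from typing import Iterable


def facet_constraints(field: str, values: Iterable[str]) -> str:
    """Build a facet-FIELD-constraints parameter for text or literal values."""
    quoted = []
    for value in values:
        # A comma inside a value must be escaped with a backslash so it is
        # not read as the separator between constraints.
        quoted.append("'" + value.replace(",", "\\,") + "'")
    return f"facet-{field}-constraints=" + ",".join(quoted)


print(facet_constraints("genre", ["Drama", "Sci-Fi"]))
print(facet_constraints("actor", ["Bai, Ling", "Bryant, Gene"]))
```

The first call produces facet-genre-constraints='Drama','Sci-Fi'. The second produces the escaped form shown in the note above.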
The facet-FIELD-constraints parameter can also be used with uint fields. You can specify individual values, as well as ranges of values, which enables you to do range-based binning. You can use the min and max values returned when you don't specify any constraints to calculate the ranges, and then get facet counts for each of those ranges with a subsequent search. The values and ranges are specified as a comma-separated list. For example, the following request gets facet counts for documents with a year value of 2000, 2001, 2002 through 2004, and all documents with year greater than or equal to 2005:
search?q=star&facet=year&facet-year-constraints=2000,2001,2002..2004,2005..
By default, the response shows the constraints with the highest counts first:
"facets":{ "year":{"min":1970,"max":2012, "constraints":[ {"value":"2005..","count":8}, {"value":"2002..2004","count":2}, {"value":"2001","count":1} ] } }
To sort values for a facet field using the value of a uint field or rank expression
Specify facet-FIELD-sort=max(FIELD). When you use the max option, the score used for sorting is the maximum value in the specified field across all matching documents with that facet value. By default, the values are sorted in ascending order. You can prefix the max option with a - (minus sign) to reverse the order. For example, you could use the default text_relevance score to sort the facet values. In the following request, the facet value that has the matching document with the highest text_relevance score is listed first:
search?bq=title:'star'&facet=genre&facet-genre-sort=-max(text_relevance)
The maximum text_relevance score for each facet value is displayed in the facet information:
"facets": {"genre": {"constraints":[ {"value":"Action","count":18,"score":288}, {"value":"Adventure","count":16,"score":288}, {"value":"Sci-Fi","count":20,"score":288}, {"value":"Animation","count":1,"score":282}, {"value":"Comedy","count":4,"score":282}, {"value":"Thriller","count":10,"score":282}, {"value":"Biography","count":1,"score":276}, {"value":"Drama","count":3,"score":276}, {"value":"Romance","count":1,"score":276},
To sum the values in a field and use the resulting score to sort the facet values
Specify facet-FIELD-sort=sum(FIELD). When you use the sum option, the score used for sorting is the sum of the values in the specified field for all matching documents with that facet value. By default, the values are listed in ascending order. For example:
search?bq='state'&facet=chief&facet-chief-sort=sum(majvotes)
The sum is displayed in the facet information as the score for the facet value:
"facets": { "chief": { "constraints": [ {"value": "Roberts","count": 116,"score": 869}, ... {"value": "Warren","count": 712,"score": 4932} ] } }
Note
You can prefix the sum option with a - (minus sign) to list the values in descending order.
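To make the max and sum sort options concrete, the following Python sketch models the scoring locally on a handful of made-up documents. It is an illustration of the behavior described above, not the service's implementation. The function name, the document layout, and the tie-breaking by facet value are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def facet_scores(docs: List[Dict], facet_field: str, score_field: str,
                 mode: str) -> List[Tuple[str, int, int]]:
    """Return (value, count, score) tuples sorted by score, ascending."""
    grouped = defaultdict(list)
    for doc in docs:
        # A document contributes its score to every facet value it has.
        for value in doc[facet_field]:
            grouped[value].append(doc[score_field])
    combine = {"max": max, "sum": sum}[mode]
    scored = [(value, len(scores), combine(scores))
              for value, scores in grouped.items()]
    # Ascending by score (the default order); ties are broken by value
    # only to keep this sketch deterministic.
    return sorted(scored, key=lambda item: (item[2], item[0]))


sample = [
    {"genre": ["Action", "Sci-Fi"], "score": 288},
    {"genre": ["Comedy"], "score": 282},
    {"genre": ["Sci-Fi"], "score": 100},
]
print(facet_scores(sample, "genre", "score", "max"))
print(facet_scores(sample, "genre", "score", "sum"))
```

With max, Sci-Fi scores 288 because only its best-scoring document counts. With sum, it scores 388 because both documents contribute.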
<hit id="tt0069945"/> <hit id="tt1185834"/> <hit id="tt0092007"/> <hit id="tt0098382"/> </hits> <facets> <facet name="actor"/> <facet name="genre"> <constraint value="Sci-Fi" count="20"/> <constraint value="Action" count="18"/> <constraint value="Adventure" count="17"/> <constraint value="Thriller" count="10"/> <constraint value="Fantasy" count="5"/> </facet> </facets> <info rid="3c5a461d28b76874a756e4d419a38646955da47864afeeef172add882f712bb0b7c9e486627e07e2" time-ms="3" cpu-time-ms="0"/> </results>
Using the document ids, you can retrieve the data you want to display for each hit from a separate system. By displaying the facet information, you can provide a way for the user to zero in on the movie he's looking for. For example, he might click "William Shatner" in the list of actors to see the subset of movies that William Shatner appeared in. To retrieve the subset, you can use the bq search parameter to perform a fielded search against the actor field and find the matches that contain star in any text field and William Shatner in the actor field.
Note
In this example, both the actor and genre fields have been configured as facets. If you want to try out these queries with the sample imdb-movie data, you'll need to modify your movie domain's indexing options to configure the actor field as a facet. For more information, see Configuring Index Fields for an Amazon CloudSearch Domain (p. 53).
search?bq=(and 'star' actor:'William Shatner')&facet=actor,genre&facet-actor-top-n=10&facet-genre-top-n=5&size=5&results-type=xml
This retrieves the subset of hits along with the actor and genre facet information:
<results> <rank>-text_relevance</rank> <match-expr>(and 'star' actor:'William Shatner')</match-expr> <hits found="6" start="0"> <hit id="tt0092007"/> <hit id="tt0098382"/> <hit id="tt0088170"/> <hit id="tt0079945"/> <hit id="tt0084726"/> </hits> <facets> <facet name="actor"> <constraint value="Doohan, James" count="6"/> <constraint value="Kelley, DeForest" count="6"/> <constraint value="Koenig, Walter" count="6"/> <constraint value="Nichols, Nichelle" count="6"/> <constraint value="Nimoy, Leonard" count="6"/>
<constraint value="Shatner, William" count="6"/> <constraint value="Takei, George" count="6"/> <constraint value="Butrick, Merritt" count="2"/> <constraint value="Lenard, Mark" count="2"/> <constraint value="Adamson, Joseph" count="1"/> </facet> <facet name="genre"> <constraint value="Sci-Fi" count="6"/> <constraint value="Action" count="5"/> <constraint value="Adventure" count="5"/> <constraint value="Thriller" count="4"/> <constraint value="Mystery" count="2"/> </facet> </facets> <info rid="ccd66a5219f938d2d27598352059d8c34094e7b0695b7c51dc91631555cb382dc17ef8064dbc9fdd" time-ms="3" cpu-time-ms="0"/> </results>
At this point, the user might remember that the movie he's trying to find also had Joseph Adamson in it and click on Joseph Adamson in the actor list. Again, you would use his selection to further refine the query:
search?bq=(and 'star' actor:'William Shatner' actor:'Adamson, Joseph')&return-fields=title&facet=actor,genre&facet-actor-top-n=10&facet-genre-top-n=5&size=5&results-type=xml
Now there's just a single match that you can display to the user, Star Trek IV: The Voyage Home:
<results> <rank>-text_relevance</rank> <match-expr>(and 'star' actor:'William Shatner' actor:'Adamson, Joseph')</match-expr> <hits found="1" start="0"> <hit id="tt0092007"> <d name="title">Star Trek IV: The Voyage Home</d> </hit> </hits> <facets> ... </facets> </results>
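The refinement steps in this walkthrough follow a simple pattern: keep the original search term and add one fielded clause for each facet value the user selects. A minimal Python sketch of that pattern follows. The helper name drill_down is hypothetical, and the result is not URL-encoded.

```python
from typing import List, Tuple


def drill_down(term: str, selections: List[Tuple[str, str]]) -> str:
    """Combine a search term with the facet values the user has clicked."""
    clauses = [f"'{term}'"]
    for field, value in selections:
        clauses.append(f"{field}:'{value}'")
    if len(clauses) == 1:
        # Nothing selected yet: search for the term alone.
        return f"search?bq={clauses[0]}"
    # Every selected value must match, so the clauses are joined with and.
    return "search?bq=(and " + " ".join(clauses) + ")"


picked = [("actor", "William Shatner"), ("actor", "Adamson, Joseph")]
print(drill_down("star", picked))
```

Each additional selection narrows the result set, which is why the hit count in the examples above falls from 6 to 1.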
Arithmetic operators: + - * / %
Bitwise operators: | & ^ ~ << >> >>>
Boolean operators (including the ternary operator): && || ! ?:
Comparison operators: < <= = >= >
Common mathematic functions: abs ceil erf exp floor lgamma ln log2 log10 max min sqrt pow
Trigonometric library functions: acosh acos asinh asin atanh atan cosh cos sinh sin tanh tan
Miscellaneous functions: rand, time, min, max
JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators: the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression a || b, b is only evaluated if a is not true.
Rank expressions always return an integer value from 0 to the maximum unsigned 32-bit integer value. If the expression is invalid or evaluates to a negative value, it returns 0. If the expression evaluates to a value greater than the maximum, it returns the maximum value. Intermediate results are calculated as double-precision floating point values and the return value is rounded to the nearest integer.
Rank expression names must begin with a letter and be at least 3 and no more than 64 characters long. The following characters are allowed: a-z (lower-case letters), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be specified as rank expression names.
For example, if you define a uint field named popularity for your domain, you could use that field in conjunction with the default text_relevance score to construct a custom rank expression. The following expression bases 30% of a document's rank score on the popularity field, and 70% of its rank score on its default text_relevance score. (The default text_relevance score is in the range 0-1000, the popularity field is assumed to have values in the range 0-10000, and the expression returns a value in the range 0-1000.)
((0.3*popularity)/10.0)+(0.7*text_relevance)
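To check the arithmetic, the expression can be evaluated locally. The following Python sketch applies the rules stated above: double-precision intermediates, rounding to an integer, and clamping to the unsigned 32-bit range. The function name rank_score is an assumption. This guide does not say how exact .5 ties are rounded, so Python's built-in round is used here as a stand-in.

```python
UINT32_MAX = 2**32 - 1  # largest value a rank expression can return


def rank_score(popularity: int, text_relevance: int) -> int:
    """Evaluate ((0.3*popularity)/10.0)+(0.7*text_relevance) locally."""
    raw = ((0.3 * popularity) / 10.0) + (0.7 * text_relevance)
    if raw < 0:
        return 0  # negative results are returned as 0
    # Round to an integer and clamp to the maximum return value.
    return min(round(raw), UINT32_MAX)


# Popularity contributes at most 300 points and text_relevance at most 700.
print(rank_score(10000, 1000))  # 1000
print(rank_score(5000, 500))    # 500
```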
For more information about using rank expressions to sort search results, see Sorting Results in Amazon CloudSearch (p. 91). In addition to specifying how you want to rank results, the Amazon CloudSearch Search API also enables you to specify threshold constraints. A threshold constraint can be based on the value of a uint field, or on the value of a rank expression. For example, if your documents have an available_on field that specifies a date as an epoch uint value, you could define a rank expression to exclude documents whose available_on value is later than the current time:
(time() > available_on)?1:0
For more information about using rank expressions to constrain search results, see Constraining Search Results in Amazon CloudSearch (p. 102). You can configure rank expressions using the cs-configure-ranking (p. 99) command, from the Amazon CloudSearch console (p. 100), or using the DefineRankExpression (p. 101) configuration action.
4.
5.
Enter the numerical expression you want to evaluate at search time in the Expression field. You can use the insert... menu to insert special values and mathematic and trigonometric functions.
6. Click Add a New Expression to configure additional rank expressions.
7. Click Submit to save your changes.
API
You use the DefineRankExpression (p. 131) configuration action to specify rank expressions. The name you specify in the RankExpression.RankName option is how you reference the expression in your search requests. You specify the numeric expression that you want to evaluate for each search result in the RankExpression.RankExpression option. For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=DefineRankExpression
&DomainName=movies
&RankExpression.RankExpression=((0.3*year)/10.0)+((0.7*text_relevance))
&RankExpression.RankName=popularhits
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120403/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-03T00:16:02.684Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=30205ede7907cf8a3fc41172fc63e323136a083b0967f96196bdea53f60d3cf3
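The unsigned part of such a request can be assembled with Python's standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the name check encodes the naming rules given earlier in this guide, and the X-Amz-* signing parameters are deliberately left out because request signing is a separate step.

```python
import re
from urllib.parse import urlencode

RESERVED_NAMES = {"body", "docid", "text_relevance"}


def valid_rank_name(name: str) -> bool:
    """Check the rules: a leading letter, 3-64 characters from a-z, 0-9, _."""
    pattern = r"[a-z][a-z0-9_]{2,63}"
    return bool(re.fullmatch(pattern, name)) and name not in RESERVED_NAMES


def define_rank_expression_url(domain: str, name: str, expression: str) -> str:
    """Build the unsigned DefineRankExpression request URL."""
    if not valid_rank_name(name):
        raise ValueError(f"invalid rank expression name: {name!r}")
    params = {
        "Action": "DefineRankExpression",
        "DomainName": domain,
        "RankExpression.RankExpression": expression,
        "RankExpression.RankName": name,
        "Version": "2011-02-01",
    }
    # urlencode percent-encodes the parentheses and operators.
    return "https://cloudsearch.us-east-1.amazonaws.com/?" + urlencode(params)


print(define_rank_expression_url(
    "movies", "popularhits", "((0.3*year)/10.0)+((0.7*text_relevance))"))
```

Validating the name before sending the request catches reserved names such as text_relevance early, without a round trip to the configuration service.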
How to run the Amazon CloudSearch command line tools. (p. 106)
Note
These examples temporarily set the CS_HOME and PATH variables for the duration of your terminal session. You can also set them permanently. On Linux and Mac OS X, add the export commands to your shell startup file (.profile, .bashrc, .tcshrc, or .zshrc) in your home directory. On Windows, you can do this through the Control Panel: Control Panel > System and Security > System > Advanced > Environment Variables.
Add the CS_HOME bin directory to your PATH. On Linux and UNIX, enter the following command:
export PATH=$PATH:$CS_HOME/bin
5.
6.
Make sure you have the Java 6 (or later) JRE installed and the JAVA_HOME environment variable is set to the full path of the directory that contains the bin directory in which the Java executable resides. For information about checking your Java installation, go to java.com.
Note
On Mac OS X, JAVA_HOME should be set using the /usr/libexec/java_home command. For example: export JAVA_HOME=$(/usr/libexec/java_home). For more information, see QA1170 on developer.apple.com.
7. Configure the command line tools to use your AWS identifiers. The Amazon CloudSearch command line tools look for your AWS identifiers in a text file on your local system in the location specified by the AWS_CREDENTIAL_FILE environment variable. If you have not already configured an AWS credential file:
a. Use a text editor to create a two-line text file that specifies your AWS identifiers. The first line sets the accessKey property and the second line sets the secretKey property. For example:
accessKey=AKIAIOSFODNN7EXAMPLE
secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY
b. Save the file using any name you want (for example, account-key).
c. Limit the file permissions to only the file owner. (For example, use chmod 600 on the file if you're using Linux/UNIX.)
d. Set the AWS_CREDENTIAL_FILE environment variable. On Linux and UNIX, enter the following command:
export AWS_CREDENTIAL_FILE=credential_file_path
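If you prefer to script the credential file setup on Linux or UNIX, the following Python sketch writes the two-line file and restricts it to the file owner, mirroring the manual steps above. The function name write_credential_file is an assumption, and the keys in the usage line are the placeholder values from this guide, not real credentials.

```python
import os


def write_credential_file(path: str, access_key: str, secret_key: str) -> None:
    """Write accessKey/secretKey lines to path with owner-only permissions."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Create the file with mode 600 so it is never world-readable.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"accessKey={access_key}\n")
        handle.write(f"secretKey={secret_key}\n")
    # Enforce the mode even if the file already existed with looser bits.
    os.chmod(path, 0o600)


write_credential_file("account-key", "AKIAIOSFODNN7EXAMPLE",
                      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY")
```

You still need to point AWS_CREDENTIAL_FILE at the resulting file, as shown in the export command above.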
8. To verify that the Amazon CloudSearch tools are configured correctly, run the cs-describe-domain command. (Since you haven't configured any domains yet, the Domain Summary will be empty.)
cs-describe-domain
If you get an error, check the following:
If the system cannot find the specified path, your JAVA_HOME environment variable needs to be set to the location where you have the JRE installed. For example, C:\Program Files\Java\jre6.
If cs-describe-domain is not recognized as a command, check your PATH and make sure it contains the bin directory for the command line tools, for example /Users/username/CloudSearch/tools/bin.
If you get an InvalidClientTokenId error, your AWS credentials are not configured correctly. Make sure that you've configured the AWS_CREDENTIAL_FILE environment variable and that your credential file contains valid AWS identifiers.
cs-configure-access-policies
NAME
cs-configure-access-policies - Configure access to an Amazon CloudSearch domain.
SYNOPSIS
cs-configure-access-policies --service doc|search|all [--allow IP|CIDR|all] [--deny IP|CIDR|all] [--update] [--policy-file FILE] [--delete IP|CIDR] [--force] [--retrieve] COMMON_OPTIONS
DESCRIPTION
Defines access policies for a domain's document and search endpoints. When a domain is first created, it is configured to deny all access. To access the document or search services through the Amazon CloudSearch Command Line Tools or APIs, you must authorize one or more IP addresses. This command provides two ways for you to update your domain's access policies:
--update Add or remove specific permissions from your domain's access policies. Changes are automatically merged with the domain's existing policy document.
--policy-file Upload a policy document to your domain. The uploaded file overwrites the domain's existing policy document.
When using the --update option, you can specify multiple --allow or --deny options to allow or block multiple IP addresses or address ranges. You must specify one or more --service options to indicate which service endpoints you want to apply the access policies to. Address ranges are specified using Classless Inter-Domain Routing (CIDR) notation with the base IP address followed by a / and a network mask that indicates the number of leftmost bits used to identify the network. If you don't specify a network mask it defaults to 32, which authorizes or blocks only the specified IP address.
When using the --policy-file option, the uploaded policy document replaces the domain's existing policy document. The specified file must be a valid AWS Identity and Access Management (IAM) policy document. (You can use the --retrieve mode to get the domain's current policy document.) For information about the IAM Access Policy Language, see http://docs.amazonwebservices.com/IAM/latest/UserGuide/index.html?AccessPolicyLanguage.html.
COMMON OPTIONS
-a, --access-key STRING
Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
The name of the domain that you are querying or configuring. Required.
-e, --endpoint URL
The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
Display this help message.
Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
Display verbose log messages.
Display the version number of the command line tools.
UPDATE ACCESS POLICY OPTIONS
-al, --allow IP|CIDR
Add access privileges for a specific IP address or CIDR block. Specify all to allow access from any IP address. Multiple --allow options can be specified to authorize multiple addresses or address ranges. Used in conjunction with the --update option.
-del, --delete IP|CIDR
Delete the allow or deny rule configured for the specified IP address or CIDR block. Used in conjunction with the --update option.
--deny IP|CIDR
Deny access privileges for a specific IP address or CIDR block. Specify all to block access from all IP addresses. Multiple --deny options can be specified to block multiple addresses or address ranges. Used in conjunction with the --update option.
--service doc|search|all
Specify the service to apply the policy changes to: doc, search, or all. All allow, deny, and delete options will be applied to the specified service. Multiple --service options can be specified to apply the same policies to multiple services. Required when using the --update option.
-u, --update
Update the policy with the specified allow, deny and delete options. When using --update, you must also specify at least one --allow, --deny, or --delete option. You must also specify at least one of the domain's endpoints with the --service option.
POLICY FILE OPTIONS
-pf, --policy-file FILE
Replace the domain's existing policy document with the specified JSON policy document. Can be specified as a path to a local file or an S3 URI.
-r, --retrieve
Retrieve the domain's existing policy document.
MISCELLANEOUS OPTIONS
-f, --force
Apply changes to the domain's access policies without confirmation. Can be used in conjunction with either the --update or --policy-file option.
EXAMPLES
Authorize addresses in the range 192.0.2.0 to 192.0.2.255 to access all services:
cs-configure-access-policies -d mydomain --update --allow 192.0.2.0/24 --service all COMMON_OPTIONS
Block a particular IP address from accessing the search service:
cs-configure-access-policies -d mydomain --update --deny 192.0.2.0 --service search COMMON_OPTIONS
Allow access to all services from any IP address:
cs-configure-access-policies -d mydomain --update --allow all --service all COMMON_OPTIONS
Upload a policy document and overwrite the domain's access policies without having to confirm the change:
cs-configure-access-policies -d mydomain --policy-file c:\mypolicydoc.json --force COMMON_OPTIONS
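Before you authorize or block a CIDR block, it can help to confirm which addresses it actually covers. The following sketch uses Python's standard ipaddress module. The helper name cidr_span is an assumption, and the explicit /32 fallback mirrors the default described for this command.

```python
import ipaddress
from typing import Tuple


def cidr_span(spec: str) -> Tuple[str, str, int]:
    """Return the first address, last address, and size of an IP or CIDR block."""
    if "/" not in spec:
        # No network mask given: treat it as a single address (/32).
        spec += "/32"
    network = ipaddress.ip_network(spec, strict=False)
    return str(network[0]), str(network[-1]), network.num_addresses


print(cidr_span("192.0.2.0/24"))  # ('192.0.2.0', '192.0.2.255', 256)
print(cidr_span("192.0.2.0"))     # ('192.0.2.0', '192.0.2.0', 1)
```

The first result matches the range in the first example above. The second shows that an address without a mask affects only that one address.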
cs-configure-fields
NAME
cs-configure-fields - Define index fields for a domain.
SYNOPSIS
cs-configure-fields --name STRING --type text|literal|uint [--option search|nosearch|facet|nofacet|result|noresult] [--source STRING] [--default-value NUM] [--delete] COMMON_OPTIONS
DESCRIPTION
Defines the fields that will be included in a domain's index and specifies which fields can be searched, included in search results, or used as facets. You can also use this command to delete fields from the domain. The --option values you can specify for a field depend on the field type:
- text
Text fields are always searchable. You can specify the facet, nofacet, result, or noresult options for a text field. A text field can be used as a facet or returned in search results, but not both. By default, text fields are not facet or result enabled.
- literal
You can specify the search, nosearch, facet, nofacet, result, or noresult options for a literal field. A literal field can be used as a facet or returned in search results, but not both. By default, literal fields are not searchable, facet-enabled, or result enabled.
- uint
Uint fields can always be used as facets and returned in results. No --option values are valid for a uint field.
For more information about configuring indexing options, see the Amazon CloudSearch Developer Guide.
COMMON OPTIONS
-a, --access-key STRING
Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
The name of the domain that you are configuring. Required.
-e, --endpoint URL
The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
Display this help message.
Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
Display verbose log messages.
Display the version number of the command line tools.
INDEXING OPTIONS
--default-value NUM
The default value for a uint field. This value will be added to any document that does not contain at least one value for the field.
--delete
Delete the field specified by the --name and --type options.
--name STRING
The name of the field you are configuring or deleting. Field names must begin with a letter and can contain the following characters: a-z (lower-case letters), 0-9, and _ (underscore). Field names must be at least 3 and no more than 28 characters. Required.
--option OPTION
Configures an option for the field specified by the --name and --type options. Valid values: search, nosearch, facet, nofacet, result, noresult. Text and literal fields cannot have both the facet and result options enabled. By default, text and uint fields are always searchable and uint fields are always facet-enabled.
--source FIELD
A source field for a compound field. The value of a compound field is the concatenation of the values of all of its sources.
--type TYPE
The type of the field that you are configuring or deleting: text, literal, uint. Required.
EXAMPLES
Configure index fields:
cs-configure-fields -d mydomain --name title --type text --option result COMMON_OPTIONS
cs-configure-fields -d mydomain --name people --type text --source actor --source director COMMON_OPTIONS
cs-configure-fields -d mydomain --name category --type literal --option facet COMMON_OPTIONS
cs-configure-fields -d mydomain --name value --type uint --default-value 100 COMMON_OPTIONS
Delete an index field:
cs-configure-fields -d mydomain --name obsolete_field --type uint --delete COMMON_OPTIONS
cs-configure-ranking
NAME
cs-configure-ranking - Configure a custom rank expression for a domain.
SYNOPSIS
cs-configure-ranking --name STRING --expression EXPRESSION [--delete] COMMON_OPTIONS
DESCRIPTION
Enables you to specify a rank expression to control how search results are ranked. A rank expression is a numeric expression that can reference uint fields and other rank expressions by name. You can also reference a document's default text_relevance score in a rank expression. A document's text_relevance score is a value from 0 to 1000 (inclusive). To calculate the relevance score, Amazon CloudSearch takes into account how many times the search terms appear (term frequency) and how close the search terms are to each other (proximity).
All of the usual arithmetic, bitwise, boolean, and comparison operators and most common math C library functions can be used in rank expressions. Intermediate results are calculated as double-precision floating point values and the return value is rounded to the nearest integer. If the expression is invalid or evaluates to a negative value, it returns 0.
To use a rank expression to sort search results, you specify &rank=RANKEXPRESSION in your search requests. For more information about constructing and using rank expressions, see the Amazon CloudSearch API Reference and the Amazon CloudSearch Developer Guide.
COMMON OPTIONS
-a, --access-key STRING
Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
The name of the domain that you are configuring. Required.
-e, --endpoint URL
The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
Display this help message.
Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
Display verbose log messages.
Display the version number of the command line tools.
RANKING OPTIONS --delete Delete the rank expression specified in the --name option. The name of the rank expression you are configuring or deleting. Required. The rank expression to be computed when processing a search request. A rank expression is a numeric expression that can reference uint fields and other rank expressions by name, as well as a document's
--name STRING
-ex,--expression EXPRESSION
default text_relevance score. EXAMPLES cs-configure-ranking -d mydomain --name myrankexp --expression ((0.3*myuintfield)/10.0)+((0.7*text_relevance)) COMMON_OPTIONS cs-configure-ranking -d mydomain --name myrankexp --expression text_relevance+myotherankexp/100000 COMMON_OPTIONS
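The rounding and clamping rules in the description can be illustrated with a small Python sketch. This is not how Amazon CloudSearch evaluates expressions internally; the function names are hypothetical, and the tie-breaking rule (round half up) is an assumption, since the text only says "rounded to the nearest integer".

```python
import math


def rank_value(raw: float) -> int:
    """Convert a raw double-precision result into a final rank value."""
    # Invalid (NaN) or negative results collapse to 0, as described above.
    if math.isnan(raw) or raw < 0:
        return 0
    # Assumption: ties are rounded half up.
    return math.floor(raw + 0.5)


def myrankexp(myuintfield: int, text_relevance: int) -> int:
    """Mirror of the first example: ((0.3*myuintfield)/10.0)+((0.7*text_relevance))."""
    raw = ((0.3 * myuintfield) / 10.0) + (0.7 * text_relevance)
    return rank_value(raw)


if __name__ == "__main__":
    # 1.2 + 350.0 = 351.2, which rounds to 351.
    print(myrankexp(myuintfield=40, text_relevance=500))
```

Because text_relevance ranges from 0 to 1000, the weights in an expression like this determine how strongly a uint field can outrank textual relevance.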
cs-configure-text-options
NAME
    cs-configure-text-options - Specify domain-specific stopwords, synonyms, and stems.
SYNOPSIS
    cs-configure-text-options [--stopwords FILE|S3_URI] [--synonyms FILE|S3_URI] [--stems FILE|S3_URI] [--print-stopwords] [--print-synonyms] [--print-stems] COMMON_OPTIONS
DESCRIPTION
    Amazon CloudSearch gives you control over how your content is indexed by enabling you to specify the following language-specific text options:
    - stopwords
        Words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches. The default stopwords for English are: a, an, and, are, as, at, be, but, by, for, in, is, it, of, on, or, the, to, was.
    - synonyms
        Words that have the same or nearly the same meaning as terms that appear in your corpus. When a user searches for a synonym rather than the indexed term, the results will include documents that contain the indexed term. No synonyms are defined by default.
    - stems
        Define mappings between related words and a common stem. This enables matching on variants of a word. No stems are defined by default.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are querying or configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
TEXT OPTIONS
--stems FILE
    The path or S3 URI for a stemming dictionary file. The stemming dictionary file should contain one comma-separated term, stem pair per line. For example:
        mice, mouse
        people, person
        running, run
--stopwords FILE
    The path or S3 URI for a stopwords dictionary file. The stopwords dictionary file should contain one stopword per line. For example:
        the
        or
        and
--synonyms FILE
    The path or S3 URI for a synonyms dictionary file. Each line in the file should specify a term followed by a comma-separated list of its synonyms. For example:
        cat, feline, kitten
        dog, canine, puppy
        horse, equine, colt, filly
-psw, --print-stopwords
    List the domain's stopwords.
-psm, --print-stems
    List the domain's stems.
EXAMPLES
    cs-configure-text-options -d mydomain --stems /home/mystems.txt --stopwords /home/mystopwords.txt --synonyms /home/mysynonyms.txt COMMON_OPTIONS
    cs-configure-text-options -d mydomain --print-stopwords COMMON_OPTIONS
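Before uploading dictionary files, it can help to check that they follow the three line formats described under TEXT OPTIONS. The following Python sketch parses each format; the function names are hypothetical and the parsing rules (whitespace trimming, skipping blank lines) are assumptions rather than documented behavior of the command line tools.

```python
from typing import Dict, List


def parse_stems(text: str) -> Dict[str, str]:
    """Parse 'term, stem' pairs, one pair per line."""
    stems = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # Raises ValueError if a line does not contain a comma.
        term, stem = (part.strip() for part in line.split(",", 1))
        stems[term] = stem
    return stems


def parse_synonyms(text: str) -> Dict[str, List[str]]:
    """Parse 'term, synonym, synonym, ...' lines."""
    synonyms = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",") if part.strip()]
        if parts:
            synonyms[parts[0]] = parts[1:]
    return synonyms


def parse_stopwords(text: str) -> List[str]:
    """Parse one stopword per line."""
    return [word.strip() for word in text.splitlines() if word.strip()]


if __name__ == "__main__":
    print(parse_stems("mice, mouse\npeople, person\nrunning, run"))
    print(parse_synonyms("cat, feline, kitten\ndog, canine, puppy"))
    print(parse_stopwords("the\nor\nand"))
```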
cs-create-domain
NAME
    cs-create-domain - Create a new Amazon CloudSearch domain.
SYNOPSIS
    cs-create-domain --domain-name STRING [--wait] COMMON_OPTIONS
DESCRIPTION
    Creates a search domain with the name specified by the --domain-name option. Domain names must begin with a letter or number and can contain the following characters: a-z, 0-9, and -. Uppercase letters and underscores are not allowed. Domain names must be at least 3 and no more than 28 characters.
    By default, this command returns immediately. If you specify the --wait option, cs-create-domain will return once the domain is created.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are creating. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
DOMAIN OPTIONS
-w, --wait
    Wait for domain creation to complete before returning.
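The domain naming rules above are easy to check locally before calling cs-create-domain. A minimal Python sketch follows; the function name is hypothetical and the regular expression is simply a transcription of the stated rules (starts with a letter or number; only a-z, 0-9, and hyphen; 3 to 28 characters).

```python
import re

# First character: a-z or 0-9. Remaining 2 to 27 characters: a-z, 0-9, or hyphen.
DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name: str) -> bool:
    """Return True if the name satisfies the documented domain name rules."""
    return DOMAIN_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mydomain", "My_Domain", "ab", "-movies"):
        print(candidate, is_valid_domain_name(candidate))
```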
cs-configure-from-sdf
NAME
    cs-configure-from-sdf - Define index fields for a domain based on the contents of one or more SDF batches.
SYNOPSIS
    cs-configure-from-sdf --source PATH|S3_URI+ [--replace] [--force] COMMON_OPTIONS
DESCRIPTION
    Scans SDF batches specified with the --source option and configures index fields for all of the document fields. Prompts for confirmation before making any changes unless you specify the --force option. By default, fields that have already been configured are left as-is. You can use the --replace option to overwrite the existing configuration.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are configuring. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
FIELD OPTIONS
-f, --force
    Apply changes to the domain's configuration without confirmation.
-re, --replace
    Upload configuration information for all identified fields and overwrite the configuration of any fields that were already defined. (Prompts for confirmation unless you also specify --force.)
--source PATH|S3_URI
    The path to a file or an S3 URI that contains the data you want to scan. Required.
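The effect of --replace on which fields get configured can be sketched in Python. This is an illustration only, not the tool's implementation: it assumes a JSON SDF batch is an array of operations in which each add operation carries a "fields" object, and the function names are hypothetical.

```python
import json
from typing import Iterable, List, Set


def fields_in_batch(sdf_json: str) -> Set[str]:
    """Collect the document field names used by add operations in a JSON SDF batch."""
    names: Set[str] = set()
    for operation in json.loads(sdf_json):
        if operation.get("type") == "add":
            names.update(operation.get("fields", {}))
    return names


def fields_to_configure(found: Iterable[str], configured: Iterable[str],
                        replace: bool = False) -> List[str]:
    """Without replace, already configured fields are left as-is."""
    found_set = set(found)
    if replace:
        return sorted(found_set)
    return sorted(found_set - set(configured))


if __name__ == "__main__":
    batch = json.dumps([
        {"type": "add", "id": "m1", "version": 1, "lang": "en",
         "fields": {"title": "Alien", "year": 1979}},
        {"type": "delete", "id": "m2", "version": 2},
    ])
    found = fields_in_batch(batch)
    print(fields_to_configure(found, configured=["title"]))
    print(fields_to_configure(found, configured=["title"], replace=True))
```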
cs-delete-domain
NAME
    cs-delete-domain - Permanently delete the specified domain and all of its data.
SYNOPSIS
    cs-delete-domain --domain-name STRING [--force] COMMON_OPTIONS
DESCRIPTION
    Deletes the search domain specified by the --domain-name option.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are deleting. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
DELETE DOMAIN OPTIONS
-f, --force
    Delete the domain without prompting for confirmation.
EXAMPLES
Delete a domain without prompting for confirmation:
    cs-delete-domain -d mydomain --force COMMON_OPTIONS
cs-describe-domain
NAME
    cs-describe-domain - Display information about a domain, including its status and endpoints.
SYNOPSIS
    cs-describe-domain [--show-all] COMMON_OPTIONS
DESCRIPTION
    Display information about your configured domains. If the --domain-name option is specified, cs-describe-domain only shows information for the specified domain. This command returns a table that contains the following information about the domain(s):
    Domain Name
        The name of the domain.
    Document Service Endpoint
        The endpoint through which you can submit document updates.
    Search Endpoint
        The endpoint through which you can submit search requests.
    Searchable Documents
        The number of documents that have been indexed.
    Index Fields
        The name and type of each configured index field. Only shown when --show-all is specified.
    Ranking Fields
        The name and type of each ranking field. Only shown when --show-all is specified.
    SearchPartitionCount
        The number of partitions being used to hold the search index.
    SearchInstanceCount
        The number of search instances being used to process search requests.
    SearchInstanceType
        The Amazon EC2 instance type being used to process search requests.
    The domain status also indicates whether or not the index needs to be rebuilt to process configuration changes.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are querying. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
DESCRIBE DOMAIN OPTIONS
-all, --show-all
    Display all available information for the domain, including configured fields.
EXAMPLES
Get information about a particular domain:
    cs-describe-domain -d mydomain --show-all COMMON_OPTIONS
cs-index-documents
NAME
    cs-index-documents - Index a domain's documents.
SYNOPSIS
    cs-index-documents COMMON_OPTIONS
DESCRIPTION
    Builds and deploys a complete index for the domain specified by the --domain-name option.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are indexing. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
cs-post-sdf
NAME
    cs-post-sdf - Upload the SDF documents that you want to index and search.
SYNOPSIS
    cs-post-sdf --source PATH|S3_URI+ COMMON_OPTIONS
DESCRIPTION
    Update the contents of the domain specified by the --domain-name option with the documents specified by the --source option. The source documents must be specified in the SDF format, which can be generated from most types of files using the cs-generate-sdf command.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are updating. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
UPDATE DOCUMENTS OPTIONS
-s, --source PATH|S3_URI
    The path to a file or an S3 URI that contains the SDF data you want to upload.
cs-generate-sdf
NAME
    cs-generate-sdf - Experimental tool for analyzing the data you want to index and automatically generating SDF batches for indexing.
SYNOPSIS
    cs-generate-sdf --source PATH|S3_URI --output PATH|S3_URI [--modified-after yyyy-mm-ddTnn:nn] [--exclude-metadata] [--exclude-content] [--single-doc-per-csv] [--sdf-format json|xml] [--docid-prefix STRING] [--doc-version NUM] [--batch-size MB] [--batch-docs NUM] COMMON_OPTIONS
DESCRIPTION
    Analyze your data and generate SDF (Search Data Format) batches that can be submitted to Amazon CloudSearch for indexing using the cs-post-sdf command. The generated SDF batches can be saved to your local file system or to an S3 bucket.
    The cs-generate-sdf command can generate SDF batches from the following content types:
        text/csv
        text/html
        text/plain
        application/json
        application/msword
        application/pdf
        application/vnd.ms-excel
        application/vnd.ms-powerpoint
        application/vnd.openxmlformats-officedocument.presentationml.presentation
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        application/vnd.openxmlformats-officedocument.wordprocessingml.document
        application/xhtml+xml
        application/xml
    Generally, a single add document request is added to the SDF batch for each source file. Where possible, the contents of the source file are parsed into one or more index fields. If metadata is available for the file, an index field is added for each piece of metadata.
    When creating SDF batches from CSV source files, they are automatically parsed to generate a separate document for each row in the CSV file. The contents of the first row are used to define the document fields. If you are processing multiple files, CSV files are parsed row-by-row, and non-CSV files are treated as individual documents. You can specify the --single-doc-per-csv option to override the default behavior and treat each CSV file as a single document. Specifying the --single-doc-per-csv option has no effect on non-CSV files.
    Note: Currently, only CSV files are parsed to automatically extract custom field data and generate multiple documents. When processing XML and JSON files, each file is treated as a separate document and the contents of the file are used to populate a single text field.
COMMON OPTIONS
-a, --access-key STRING
    Your AWS access key. Used in conjunction with --secret-key. Must be specified if you do not use an AWS credential file.
-c, --aws-credential-file FILE
    The path to the file that contains your AWS credentials. Must be specified if you have not set the AWS_CREDENTIAL_FILE environment variable or explicitly set your credentials with --access-key and --secret-key.
-d, --domain-name STRING
    The name of the domain that you are updating. Required.
-e, --endpoint URL
    The endpoint for the Amazon CloudSearch Configuration Service. Defaults to cloudsearch.us-east-1.amazonaws.com.
-h, --help
    Display this help message.
-k, --secret-key STRING
    Your AWS secret key. Used in conjunction with --access-key. Must be specified if you do not use an AWS credential file.
-ve, --verbose
    Display verbose log messages.
-v, --version
    Display the version number of the command line tools.
REQUIRED SDF OPTIONS
-o, --output PATH|S3_URI
    The local directory or S3 bucket where you want to save the generated SDF batches. You must either specify an output location with the --output option, or specify the --domain-name option to upload the generated SDF batches to a search domain.
--source PATH|S3_URI
    The local directory, file, or S3 bucket that contains the data that you want to create SDF batches from. You can process data from multiple locations by specifying multiple --source options. Accepts Apache-ant style wildcards such as */** for files and S3 prefixes. Required.
ADVANCED SDF OPTIONS
-bd, --batch-docs NUM
    The maximum number of documents in a batch.
-bs, --batch-size MB
    The maximum batch size in MB. Defaults to 5MB.
-sdpc, --single-doc-per-csv
    Treat the CSV file as a single document. If this option is specified, the contents of the CSV file will be treated as a single text field. This option has no effect on non-CSV files.
--docid-prefix STRING
    The prefix to prepend to the document ID while processing CSV data. If not specified, the filename is used as the --docid-prefix. The docid column is used as the document ID if it is included in the CSV data; otherwise, the row number is used as the document ID.
--doc-version NUM
    The version number to use for all of the generated SDF documents. Defaults to 1.
-ec, --exclude-content
    Do not include the content of the source files in the generated SDF documents, only process the metadata.
-em, --exclude-metadata
    Do not include the metadata of the source files in the generated SDF documents, only process the content.
--sdf-format json|xml
    The format of the generated SDF documents: json or xml. Defaults to json.
--modified-after yyyy-mm-ddTnn:nn
    Only process files or S3 objects modified after the specified date and time. Specified as yyyy-mm-ddTnn:nn.
EXAMPLES
Generate an SDF batch from a plain text file:
    cs-generate-sdf --source c:\myAmazingDataSet\data1.txt --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate a single document for each CSV file:
    cs-generate-sdf --source c:\myAmazingDataSet\*.csv -sdpc --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate an SDF batch from multiple documents:
    cs-generate-sdf --source c:\myAmazingDataSet\data1.xml --source c:\myAmazingDataSet\data2.xml --source c:\myAmazingDataSet\data3.xml --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate SDF batches from all HTML documents in a directory:
    cs-generate-sdf --source c:\myAmazingDataSet\*.html --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate SDF batches from all Word or PDF documents in a directory:
    cs-generate-sdf --source c:\myAmazingDataSet\*.doc --source c:\myAmazingDataSet\*.pdf --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate SDF batches from all recognized file types:
    cs-generate-sdf --source c:\myAmazingDataSet\* --output c:\myAmazingDataSet\SDF\batch COMMON_OPTIONS
Generate SDF batches and upload them to your domain:
    cs-generate-sdf -d mydomain --source c:\myAmazingDataSet\* COMMON_OPTIONS
SEE ALSO
    cs-post-sdf
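The default CSV handling described for cs-generate-sdf (one document per row, first row defines the fields, docid column or row number as the document ID) can be approximated in Python as follows. This is a rough sketch, not the tool's implementation. The function name, the 1-based row numbering, joining the prefix and ID without a separator, dropping the docid column from the fields, and the fixed "lang" value are all assumptions.

```python
import csv
import io
from typing import Dict, List


def csv_to_sdf(csv_text: str, docid_prefix: str, version: int = 1) -> List[Dict]:
    """Build one SDF-style add operation per CSV data row."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row defines the fields
    batch = []
    for row_number, row in enumerate(reader, start=1):  # assumption: rows count from 1
        # Use the docid column if present; otherwise fall back to the row number.
        doc_id = row.pop("docid", None) or str(row_number)
        batch.append({
            "type": "add",
            "id": f"{docid_prefix}{doc_id}",  # assumption: no separator is inserted
            "version": version,
            "lang": "en",  # assumption: language is fixed for the sketch
            "fields": dict(row),
        })
    return batch


if __name__ == "__main__":
    data = "docid,title,year\nm1,Alien,1979\nm2,Brazil,1985\n"
    for operation in csv_to_sdf(data, docid_prefix="movies_"):
        print(operation)
```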
Actions
The actions described in this guide are called using the AWS Query protocol. The following actions are supported:
    CreateDomain
    DefineIndexField
    DefineRankExpression
    DeleteDomain
    DeleteIndexField
    DeleteRankExpression
    DescribeDefaultSearchField
    DescribeDomains
    DescribeIndexFields
    DescribeRankExpressions
    DescribeServiceAccessPolicies
    DescribeStemmingOptions
    DescribeStopwordOptions
    DescribeSynonymOptions
    IndexDocuments
    UpdateDefaultSearchField
    UpdateServiceAccessPolicies
    UpdateStemmingOptions
    UpdateStopwordOptions
    UpdateSynonymOptions
CreateDomain
Description
Creates a new search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
Response Elements
The following elements come wrapped in a CreateDomainResult structure.
DomainStatus
    The current status of the search domain.
    Type: DomainStatus (p. 159)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409
DefineIndexField
Description
Configures an IndexField for the search domain. Used to create new fields and modify existing ones. If the field exists, the new configuration replaces the old one. You can configure a maximum of 200 index fields.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
IndexField
    Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType.
    Type: IndexField (p. 161)
    Required: Yes
Response Elements
The following elements come wrapped in a DefineIndexFieldResult structure.
IndexField
    The value of an IndexField and its current status.
    Type: IndexFieldStatus (p. 162)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
InvalidType
    The request was rejected because it specified an invalid type definition.
    HTTP Status Code: 409
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
DefineRankExpression
Description
Configures a RankExpression for the search domain. Used to create new rank expressions and modify existing ones. If the expression exists, the new configuration replaces the old one. You can configure a maximum of 50 rank expressions.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
RankExpression
    A named expression that can be evaluated at search time and used for ranking or thresholding in a search query.
    Type: NamedRankExpression (p. 162)
    Required: Yes
Response Elements
The following elements come wrapped in a DefineRankExpressionResult structure.
RankExpression
    The value of a RankExpression and its current status.
    Type: RankExpressionStatus (p. 164)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
InvalidType
    The request was rejected because it specified an invalid type definition.
    HTTP Status Code: 409
LimitExceeded
    The request was rejected because a resource limit has already been met.
    HTTP Status Code: 409
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
DeleteDomain
Description
Permanently deletes a search domain and all of its data.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
Response Elements
The following elements come wrapped in a DeleteDomainResult structure.
DomainStatus
    The current status of the search domain.
    Type: DomainStatus (p. 159)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
DeleteIndexField
Description
Removes an IndexField from the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
IndexFieldName
    A string that represents the name of an index field. Field names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.
    Required: Yes
Response Elements
The following elements come wrapped in a DeleteIndexFieldResult structure.
IndexField
    The value of an IndexField and its current status.
    Type: IndexFieldStatus (p. 162)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
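The IndexFieldName constraints listed for this action (leading letter, a-z, 0-9 and underscore only, 1 to 64 characters, three reserved names) can be checked client-side before sending a request. The following Python sketch is one way to do that; the function name is hypothetical and the check only transcribes the rules stated above.

```python
import re

# Leading letter, then up to 63 more characters from a-z, 0-9, and underscore.
FIELD_NAME = re.compile(r"[a-z][a-z0-9_]{0,63}")
RESERVED_NAMES = {"body", "docid", "text_relevance"}


def is_valid_index_field_name(name: str) -> bool:
    """Return True if the name satisfies the documented IndexFieldName rules."""
    return FIELD_NAME.fullmatch(name) is not None and name not in RESERVED_NAMES


if __name__ == "__main__":
    for candidate in ("title", "release_year", "docid", "Actor", "2nd_title"):
        print(candidate, is_valid_index_field_name(candidate))
```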
DeleteRankExpression
Description
Removes a RankExpression from the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
RankName
    The name of the RankExpression to delete.
    Type: String
    Length constraints: Minimum length of 1. Maximum length of 64.
    Required: Yes
Response Elements
The following elements come wrapped in a DeleteRankExpressionResult structure.
RankExpression
    The value of a RankExpression and its current status.
    Type: RankExpressionStatus (p. 164)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
DescribeDefaultSearchField
Description
Gets the default search field configured for the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
DomainName
    A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed.
    Type: String
    Length constraints: Minimum length of 3. Maximum length of 28.
    Required: Yes
Response Elements
The following elements come wrapped in a DescribeDefaultSearchFieldResult structure.
DefaultSearchField
    The name of the IndexField to use for search requests issued with the q parameter. The default is the empty string, which automatically searches all text fields.
    Type: DefaultSearchFieldStatus (p. 155)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Base
    An error occurred while processing the request.
    HTTP Status Code: 400
Internal
    An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
    HTTP Status Code: 500
ResourceNotFound
    The request was rejected because it attempted to reference a resource that does not exist.
    HTTP Status Code: 409
DescribeDomains
Description
Gets information about the search domains owned by this account. Can be limited to specific domains. Shows all domains by default.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171). Name
DomainNames.member.N
Description
Required
Limits the DescribeDomains response to the specified search No domains. Type: String list
Response Elements
The following elements come wrapped in a DescribeDomainsResult structure. Name
DomainStatusList
Description The current status of all of your search domains. Type: DomainStatus (p. 159) list
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
DescribeIndexFields
Description
Gets information about the index fields configured for the search domain. The response can be limited to specific fields by name; by default, all fields are shown.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
FieldNames.member.N
Description Limits the DescribeIndexFields response to the specified fields. Type: String list
Required No
Response Elements
The following elements come wrapped in a DescribeIndexFieldsResult structure. Name
IndexFields
Description The index fields configured for the domain. Type: IndexFieldStatus (p. 162) list
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
DescribeRankExpressions
Description
Gets the rank expressions configured for the search domain. The response can be limited to specific rank expressions by name; by default, all rank expressions are shown.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
RankNames.member.N
Description Limits the DescribeRankExpressions response to the specified fields. Type: String list
Required No
Response Elements
The following elements come wrapped in a DescribeRankExpressionsResult structure. Name
RankExpressions
Description The rank expressions configured for the domain. Type: RankExpressionStatus (p. 164) list
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
DescribeServiceAccessPolicies
Description
Gets information about the resource-based policies that control access to the domain's document and search services.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in a DescribeServiceAccessPoliciesResult structure. Name
AccessPolicies
Description A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies. Type: AccessPoliciesStatus (p. 154)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
DescribeStemmingOptions
Description
Gets the stemming dictionary configured for the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in a DescribeStemmingOptionsResult structure. Name
Stems
Description The stemming options configured for this search domain and the current status of those options. Type: StemmingOptionsStatus (p. 167)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
DescribeStopwordOptions
Description
Gets the stopwords configured for the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in a DescribeStopwordOptionsResult structure. Name
Stopwords
Description The stopword options configured for this search domain and the current status of those options. Type: StopwordOptionsStatus (p. 167)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
DescribeSynonymOptions
Description
Gets the synonym dictionary configured for the search domain.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in a DescribeSynonymOptionsResult structure. Name
Synonyms
Description The synonym options configured for this search domain and the current status of those options. Type: SynonymOptionsStatus (p. 168)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
IndexDocuments
Description
Tells the search domain to start indexing its documents using the latest text processing options and IndexFields. This operation must be invoked to make options whose OptionStatus (p. 164) has OptionState of RequiresIndexDocuments visible in search results.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in an IndexDocumentsResult structure. Name
FieldNames
Description The names of the fields that are currently being processed due to an IndexDocuments action. Type: String list
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
UpdateDefaultSearchField
Description
Configures the default search field for the search domain. The default search field is used when a search request does not specify which fields to search. By default, it is configured to include the contents of all of the domain's text fields.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171). Name
DefaultSearchField
Description The IndexField to use for search requests issued with the q parameter. The default is an empty string, which automatically searches all text fields. Type: String
Required Yes
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Response Elements
The following elements come wrapped in an UpdateDefaultSearchFieldResult structure. Name
DefaultSearchField
Description The value of the DefaultSearchField configured for this search domain and its current status. Type: DefaultSearchFieldStatus (p. 155)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
InvalidType
Description The request was rejected because it specified an invalid type definition.
HTTP Status Code 409
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
UpdateServiceAccessPolicies
Description
Configures the policies that control access to the domain's document and search services. The maximum size of an access policy document is 100KB.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
AccessPolicies
Description An IAM access policy as described in The Access Policy Language in Using AWS Identity and Access Management. The maximum size of an access policy document is 100KB. Example: {"Statement": [{"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:search/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }}, {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:documents/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }} ] } Type: String
Required Yes
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
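Because AccessPolicies is passed as a string, it can be convenient to build the policy as a data structure and serialize it. The Python sketch below mirrors the shape of the example policy: one Allow statement per service ARN, restricted by source IP. The function name build_access_policies is hypothetical, and the size check assumes that 100KB means 100 * 1024 bytes of UTF-8 text, which this guide does not specify.

```python
import json

# Assumption: "100KB" is interpreted as 100 * 1024 bytes of UTF-8 text.
MAX_POLICY_BYTES = 100 * 1024


def build_access_policies(search_arn, doc_arn, cidr):
    """Serialize a two-statement policy shaped like the documented example."""
    statements = [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": arn,
            "Condition": {"IpAddress": {"aws:SourceIp": [cidr]}},
        }
        for arn in (search_arn, doc_arn)
    ]
    policy = json.dumps({"Statement": statements})
    if len(policy.encode("utf-8")) > MAX_POLICY_BYTES:
        raise ValueError("access policy document exceeds 100KB")
    return policy
```

Serializing with json.dumps guarantees that every key is quoted, which avoids hand-editing mistakes in the policy string.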
Response Elements
The following elements come wrapped in an UpdateServiceAccessPoliciesResult structure. Name
AccessPolicies
Description A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies. Type: AccessPoliciesStatus (p. 154)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
InvalidType
Description The request was rejected because it specified an invalid type definition.
HTTP Status Code 409
LimitExceeded
Description The request was rejected because a resource limit has already been met.
HTTP Status Code 409
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
UpdateStemmingOptions
Description
Configures a stemming dictionary for the search domain. The stemming dictionary is used during indexing and when processing search requests. The maximum size of the stemming dictionary is 500KB.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Stems
Description Maps terms to their stems, serialized as a JSON document. The document has a single object with one property "stems" whose value is an object mapping terms to their stems. The maximum size of a stemming document is 500KB. Example: { "stems": {"people": "person", "walking": "walk"} } Type: String
Required Yes
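The Stems value described above is a JSON document with a single "stems" property. A minimal Python sketch of producing it from a dictionary follows; build_stems_document is a hypothetical helper, and the limit assumes that 500KB means 500 * 1024 bytes of UTF-8 text.

```python
import json

# Assumption: "500KB" is interpreted as 500 * 1024 bytes of UTF-8 text.
MAX_STEMS_BYTES = 500 * 1024


def build_stems_document(stems):
    """Serialize a term-to-stem mapping into the documented Stems layout."""
    document = json.dumps({"stems": dict(stems)})
    if len(document.encode("utf-8")) > MAX_STEMS_BYTES:
        raise ValueError("stemming document exceeds 500KB")
    return document
```

For instance, build_stems_document({"people": "person", "walking": "walk"}) yields a string equivalent to the example in the parameter description.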
Response Elements
The following elements come wrapped in an UpdateStemmingOptionsResult structure. Name
Stems
Description The stemming options configured for this search domain and the current status of those options. Type: StemmingOptionsStatus (p. 167)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
InvalidType
Description The request was rejected because it specified an invalid type definition.
HTTP Status Code 409
LimitExceeded
Description The request was rejected because a resource limit has already been met.
HTTP Status Code 409
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
UpdateStopwordOptions
Description
Configures stopwords for the search domain. Stopwords are used during indexing and when processing search requests. The maximum size of the stopwords dictionary is 10KB.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Stopwords
Description Lists stopwords serialized as a JSON document. The document has a single object with one property "stopwords" whose value is an array of strings. The maximum size of a stopwords document is 10KB. Example: { "stopwords": ["a", "an", "the", "of"] } Type: String
Required Yes
Response Elements
The following elements come wrapped in an UpdateStopwordOptionsResult structure. Name
Stopwords
Description The stopword options configured for this search domain and the current status of those options. Type: StopwordOptionsStatus (p. 167)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
InvalidType
Description The request was rejected because it specified an invalid type definition.
HTTP Status Code 409
LimitExceeded
Description The request was rejected because a resource limit has already been met.
HTTP Status Code 409
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
UpdateSynonymOptions
Description
Configures a synonym dictionary for the search domain. The synonym dictionary is used during indexing to configure mappings for terms that occur in text fields. The maximum size of the synonym dictionary is 100KB.
Request Parameters
For information about the common parameters that all actions use, see Common Query Parameters (p. 171).
Name
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
Required Yes
Synonyms
Description Maps terms to their synonyms, serialized as a JSON document. The document has a single object with one property "synonyms" whose value is an object mapping terms to their synonyms. Each synonym is a simple string or an array of strings. The maximum size of a synonyms document is 100KB. Example: { "synonyms": {"cat": ["feline", "kitten"], "puppy": "dog"} } Type: String
Required Yes
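Since each synonym value may be either a single string or an array of strings, a client may want to check the shape of its mapping before serializing it. The Python sketch below shows one way to do that; build_synonyms_document is a hypothetical helper, and the limit assumes that 100KB means 100 * 1024 bytes of UTF-8 text.

```python
import json

# Assumption: "100KB" is interpreted as 100 * 1024 bytes of UTF-8 text.
MAX_SYNONYMS_BYTES = 100 * 1024


def build_synonyms_document(synonyms):
    """Validate and serialize a term-to-synonyms mapping."""
    for term, value in synonyms.items():
        is_string = isinstance(value, str)
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not (is_string or is_string_list):
            raise TypeError(f"synonyms for {term!r} must be a string or a list of strings")
    document = json.dumps({"synonyms": dict(synonyms)})
    if len(document.encode("utf-8")) > MAX_SYNONYMS_BYTES:
        raise ValueError("synonyms document exceeds 100KB")
    return document
```

Rejecting malformed values locally gives a clearer message than waiting for the service to reject the request.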
Response Elements
The following elements come wrapped in an UpdateSynonymOptionsResult structure. Name
Synonyms
Description The synonym options configured for this search domain and the current status of those options. Type: SynonymOptionsStatus (p. 168)
Errors
For information about the common errors that all actions use, see Common Errors (p. 172).
Error
Base
Description An error occurred while processing the request.
HTTP Status Code 400
Internal
Description An internal error occurred while processing the request. If this problem persists, report an issue from the Service Health Dashboard.
HTTP Status Code 500
InvalidType
Description The request was rejected because it specified an invalid type definition.
HTTP Status Code 409
LimitExceeded
Description The request was rejected because a resource limit has already been met.
HTTP Status Code 409
ResourceNotFound
Description The request was rejected because it attempted to reference a resource that does not exist.
HTTP Status Code 409
Data Types
The Amazon CloudSearch Configuration Service API contains several data types that various actions use. This section describes each data type in detail.
Note
The order of each element in the response is not guaranteed. Applications should not assume a particular order.
The following data types are supported:
AccessPoliciesStatus (p. 154)
CreateDomainResult (p. 155)
DefaultSearchFieldStatus (p. 155)
DefineIndexFieldResult (p. 155)
DefineRankExpressionResult (p. 156)
DeleteDomainResult (p. 156)
DeleteIndexFieldResult (p. 156)
DeleteRankExpressionResult (p. 156)
DescribeDefaultSearchFieldResult (p. 157)
DescribeDomainsResult (p. 157)
DescribeIndexFieldsResult (p. 157)
DescribeRankExpressionsResult (p. 158)
DescribeServiceAccessPoliciesResult (p. 158)
DescribeStemmingOptionsResult (p. 158)
DescribeStopwordOptionsResult (p. 158)
DescribeSynonymOptionsResult (p. 159)
DomainStatus (p. 159)
IndexDocumentsResult (p. 160)
IndexField (p. 161)
IndexFieldStatus (p. 162)
LiteralOptions (p. 162)
NamedRankExpression (p. 162)
OptionStatus (p. 164)
RankExpressionStatus (p. 164)
ServiceEndpoint (p. 165)
SourceAttribute (p. 165)
SourceData (p. 166)
SourceDataMap (p. 166)
SourceDataTrimTitle (p. 166)
StemmingOptionsStatus (p. 167)
StopwordOptionsStatus (p. 167)
SynonymOptionsStatus (p. 168)
TextOptions (p. 168)
UIntOptions (p. 169)
UpdateDefaultSearchFieldResult (p. 169)
UpdateServiceAccessPoliciesResult (p. 169)
UpdateStemmingOptionsResult (p. 170)
UpdateStopwordOptionsResult (p. 170)
UpdateSynonymOptionsResult (p. 170)
AccessPoliciesStatus
Description
A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
Contents
Name
Options
Description An IAM access policy as described in The Access Policy Language in Using AWS Identity and Access Management. The maximum size of an access policy document is 100KB. Example: {"Statement": [{"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:search/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }}, {"Effect":"Allow", "Action": "*", "Resource": "arn:aws:cs:us-east-1:1234567890:documents/movies", "Condition": { "IpAddress": { "aws:SourceIp": ["203.0.113.1/32"] } }} ] } Type: String
Status
Description The status of an option, including when it was last updated and whether it is actively in use for searches. Type: OptionStatus (p. 164)
CreateDomainResult
Description
A response message that contains the status of a newly created domain.
Contents
Name
DomainStatus
Description The current status of the search domain. Type: DomainStatus (p. 159)
DefaultSearchFieldStatus
Description
The value of the DefaultSearchField configured for this search domain and its current status.
Contents
Name
Options
Description The name of the IndexField to use as the default search field. The default is an empty string, which automatically searches all text fields. Type: String Length constraints: Minimum length of 1. Maximum length of 64.
Status
Description The status of an option, including when it was last updated and whether it is actively in use for searches. Type: OptionStatus (p. 164)
DefineIndexFieldResult
Description
A response message that contains the status of an updated index field.
Contents
Name Description
IndexField The value of an IndexField and its current status. Type: IndexFieldStatus (p. 162)
DefineRankExpressionResult
Description
A response message that contains the status of an updated RankExpression.
Contents
Name
RankExpression
Description The value of a RankExpression and its current status. Type: RankExpressionStatus (p. 164)
DeleteDomainResult
Description
A response message that contains the status of a newly deleted domain, or no status if the domain has already been completely deleted.
Contents
Name
DomainStatus
Description The current status of the search domain. Type: DomainStatus (p. 159)
DeleteIndexFieldResult
Description
A response message that contains the status of a deleted index field.
Contents
Name Description
IndexField The value of an IndexField and its current status. Type: IndexFieldStatus (p. 162)
DeleteRankExpressionResult
Description
A response message that contains the status of a deleted RankExpression.
Contents
Name
RankExpression
Description The value of a RankExpression and its current status. Type: RankExpressionStatus (p. 164)
DescribeDefaultSearchFieldResult
Description
A response message that contains the default search field for a search domain.
Contents
Name
DefaultSearchField
Description The name of the IndexField to use for search requests issued with the q parameter. The default is the empty string, which automatically searches all text fields. Type: DefaultSearchFieldStatus (p. 155)
DescribeDomainsResult
Description
A response message that contains the status of one or more domains.
Contents
Name
DomainStatusList
Description The current status of all of your search domains. Type: DomainStatus (p. 159) list
DescribeIndexFieldsResult
Description
A response message that contains the index fields for a search domain.
Contents
Name
IndexFields
Description The index fields configured for the domain. Type: IndexFieldStatus (p. 162) list
DescribeRankExpressionsResult
Description
A response message that contains the rank expressions for a search domain.
Contents
Name
RankExpressions
Description The rank expressions configured for the domain. Type: RankExpressionStatus (p. 164) list
DescribeServiceAccessPoliciesResult
Description
A response message that contains the access policies for a domain.
Contents
Name
AccessPolicies
Description A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies. Type: AccessPoliciesStatus (p. 154)
DescribeStemmingOptionsResult
Description
A response message that contains the stemming options for a search domain.
Contents
Name
Stems
Description The stemming options configured for this search domain and the current status of those options. Type: StemmingOptionsStatus (p. 167)
DescribeStopwordOptionsResult
Description
A response message that contains the stopword options for a search domain.
Contents
Name
Stopwords
Description The stopword options configured for this search domain and the current status of those options. Type: StopwordOptionsStatus (p. 167)
DescribeSynonymOptionsResult
Description
A response message that contains the synonym options for a search domain.
Contents
Name
Synonyms
Description The synonym options configured for this search domain and the current status of those options. Type: SynonymOptionsStatus (p. 168)
DomainStatus
Description
The current status of the search domain.
Contents
Name
Created
Description True if the search domain is created. It can take several minutes to initialize a domain when CreateDomain (p. 128) is called. Newly created search domains are returned from DescribeDomains (p. 136) with a false value for Created until domain creation is complete. Type: Boolean
Deleted
Description True if the search domain has been deleted. The system must clean up resources dedicated to the search domain when DeleteDomain (p. 132) is called. Newly deleted search domains are returned from DescribeDomains (p. 136) with a true value for Deleted for several minutes until resource cleanup is complete. Type: Boolean
DocService
Description The service endpoint for updating documents in a search domain. Type: ServiceEndpoint (p. 165)
DomainId
Description An internally generated unique identifier for a domain. Type: String Length constraints: Minimum length of 1. Maximum length of 64.
DomainName
Description A string that represents the name of a domain. Domain names must be unique across the domains owned by an account within an AWS region. Domain names must start with a letter or number and can contain the following characters: a-z (lowercase), 0-9, and - (hyphen). Uppercase letters and underscores are not allowed. Type: String Length constraints: Minimum length of 3. Maximum length of 28.
NumSearchableDocs
Description The number of documents that have been submitted to the domain and indexed. Type: Integer
Processing
Description True if processing is being done to activate the current domain configuration. Type: Boolean
RequiresIndexDocuments
Description True if IndexDocuments (p. 143) needs to be called to activate the current domain configuration. Type: Boolean
SearchInstanceCount
Description The number of search instances that are available to process search requests. Type: Integer
SearchInstanceType
Description The instance type that is being used to process search requests. Type: String Valid Values: SearchInstance:t1.micro | SearchInstance:m1.small | SearchInstance:m1.large | SearchInstance:m2.xlarge
SearchPartitionCount
Description The number of partitions across which the search index is spread. Type: Integer
SearchService
Description The service endpoint for requesting search results from a search domain. Type: ServiceEndpoint (p. 165)
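The Boolean members of DomainStatus can be combined into a single coarse state when monitoring a domain. The Python sketch below assumes the status has already been parsed into a dictionary keyed by the member names above. The function name summarize_domain_status, the returned labels, and the order in which the flags are checked are all choices made for this illustration rather than service behavior.

```python
def summarize_domain_status(status: dict) -> str:
    """Map the Boolean flags of a DomainStatus-like dict to a coarse label.

    The labels are hypothetical and exist only for this sketch.
    """
    if status.get("Deleted"):
        # Resource cleanup may still be in progress.
        return "being-deleted"
    if not status.get("Created"):
        # Domain initialization has not completed yet.
        return "being-created"
    if status.get("RequiresIndexDocuments"):
        # IndexDocuments must be called to activate the configuration.
        return "needs-index-documents"
    if status.get("Processing"):
        return "processing"
    return "active"
```

A caller might poll DescribeDomains and invoke IndexDocuments only when this helper reports needs-index-documents.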
IndexDocumentsResult
Description
The result of an IndexDocuments action.
Contents
Name Description
FieldNames The names of the fields that are currently being processed due to an IndexDocuments action. Type: String list
IndexField
Description
Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType.
Contents
Name
IndexFieldName
Description The name of a field in the search index. Field names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names. Type: String Length constraints: Minimum length of 1. Maximum length of 64.
IndexFieldType
Description The type of field. Based on this type, exactly one of the UIntOptions (p. 169), LiteralOptions (p. 162) or TextOptions (p. 168) must be present. Type: String Valid Values: uint | literal | text
LiteralOptions
Description Options for literal field. Present if IndexFieldType specifies the field is of type literal. Type: LiteralOptions (p. 162)
SourceAttributes
Description An optional list of source attributes that provide data for this index field. If not specified, the data is pulled from a source attribute with the same name as this IndexField. When one or more source attributes are specified, an optional data transformation can be applied to the source data when populating the index field. You can configure a maximum of 20 sources for an IndexField. Type: SourceAttribute (p. 165) list
TextOptions
Description Options for text field. Present if IndexFieldType specifies the field is of type text. Type: TextOptions (p. 168)
UIntOptions
Description Options for an unsigned integer field. Present if IndexFieldType specifies the field is of type unsigned integer. Type: UIntOptions (p. 169)
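The naming rules and the pairing between IndexFieldType and its options structure lend themselves to a client-side check. The Python sketch below represents an IndexField as a plain dictionary keyed by the member names above. The function validate_index_field and its messages are hypothetical. It only flags options supplied for a type other than the IndexFieldType and does not attempt to reproduce every rule the service enforces.

```python
import re

# Derived from the stated rules: begins with a lowercase letter, then
# lowercase letters, digits, or underscores, 1 to 64 characters in total.
FIELD_NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")
RESERVED_NAMES = {"body", "docid", "text_relevance"}
OPTIONS_KEY_BY_TYPE = {
    "uint": "UIntOptions",
    "literal": "LiteralOptions",
    "text": "TextOptions",
}


def validate_index_field(field: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    name = field.get("IndexFieldName", "")
    if FIELD_NAME_RE.fullmatch(name) is None:
        problems.append("invalid field name")
    elif name in RESERVED_NAMES:
        problems.append("reserved field name")
    field_type = field.get("IndexFieldType")
    if field_type not in OPTIONS_KEY_BY_TYPE:
        problems.append("unknown field type")
    else:
        allowed = OPTIONS_KEY_BY_TYPE[field_type]
        for key in OPTIONS_KEY_BY_TYPE.values():
            # Options for a different type than IndexFieldType are invalid.
            if key != allowed and key in field:
                problems.append(f"{key} not valid for type {field_type}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report all issues with a field definition at once.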
IndexFieldStatus
Description
The value of an IndexField and its current status.
Contents
Name
Options
Description Defines a field in the index, including its name, type, and the source of its data. The IndexFieldType indicates which of the options will be present. It is invalid to specify options for a type other than the IndexFieldType. Type: IndexField (p. 161)
Status
Description The status of an option, including when it was last updated and whether it is actively in use for searches. Type: OptionStatus (p. 164)
LiteralOptions
Description
Options that define a literal field in the search index.
Contents
Name
DefaultValue
Description The default value for a literal field. Optional. Type: String Length constraints: Minimum length of 0. Maximum length of 1024.
FacetEnabled
Description Specifies whether facets are enabled for this field. Default: False. Type: Boolean
ResultEnabled
Description Specifies whether values of this field can be returned in search results and used for ranking. Default: False. Type: Boolean
SearchEnabled
Description Specifies whether search is enabled for this field. Default: False. Type: Boolean
NamedRankExpression
Description
A named expression that can be evaluated at search time and used for ranking or thresholding in a search query.
Contents
Name: RankExpression
Description: The expression to evaluate for ranking or thresholding while processing a search request. The RankExpression syntax is based on JavaScript expressions and supports:
- Integer, floating point, hex, and octal literals
- Shortcut evaluation of logical operators, such that an expression a || b evaluates to the value a if a is true, without evaluating b at all
- JavaScript order of precedence for operators
- Arithmetic operators: + - * / %
- Boolean operators (including the ternary operator)
- Bitwise operators
- Comparison operators
- Common mathematical functions: abs ceil erf exp floor lgamma ln log2 log10 max min sqrt pow
- Trigonometric library functions: acosh acos asinh asin atanh atan cosh cos sinh sin tanh tan
- Random generation of a number between 0 and 1: rand
- Current time in epoch: time
- The min and max functions that operate on a variable argument list
Intermediate results are calculated as double precision floating point values. The final return value of a RankExpression is automatically converted from floating point to a 32-bit unsigned integer by rounding to the nearest integer, with a natural floor of 0 and a ceiling of max(uint32_t), 4294967295. Mathematical errors such as dividing by 0 fail during evaluation and return a value of 0.
The source data for a RankExpression can be the name of an IndexField of type uint, another RankExpression, or the reserved name text_relevance. The text_relevance source is defined to return an integer from 0 to 1000 (inclusive) to indicate how relevant a document is to the search request, taking into account repetition of search terms in the document and proximity of search terms to each other in each matching IndexField in the document.
For more information about using rank expressions to customize ranking, see the Amazon CloudSearch Developer Guide.
Type: String
Length constraints: Minimum length of 1. Maximum length of 10240.

Name: RankName
Description: The name of a rank expression. Rank expression names must begin with a letter and can contain the following characters: a-z (lowercase), 0-9, and _ (underscore). Uppercase letters and hyphens are not allowed. The names "body", "docid", and "text_relevance" are reserved and cannot be specified as field or rank expression names.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.
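The naming rules and the final conversion to a 32-bit unsigned integer described above can be illustrated with a small Python sketch. This is only a local model of the documented behavior, not the service's implementation. The helper names (is_valid_rank_name, to_rank_value) are hypothetical, and two details are assumptions: ties are rounded up (the text only says "nearest integer"), and NaN is treated like a mathematical error.

```python
import math
import re

UINT32_MAX = 4294967295
RESERVED_NAMES = {"body", "docid", "text_relevance"}
# Begins with a letter; then a-z, 0-9, or underscore; 1 to 64 characters total.
RANK_NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")


def is_valid_rank_name(name):
    """Check a rank expression name against the documented constraints."""
    return RANK_NAME_RE.fullmatch(name) is not None and name not in RESERVED_NAMES


def to_rank_value(compute):
    """Evaluate compute() and convert the result as the text describes."""
    try:
        value = float(compute())
    except (ZeroDivisionError, ValueError, OverflowError):
        return 0  # mathematical errors yield 0
    if math.isnan(value):
        return 0  # assumption: NaN is handled like a mathematical error
    # Clamp to the documented floor (0) and ceiling (max uint32) before rounding.
    clamped = min(max(value, 0.0), float(UINT32_MAX))
    # Assumption: ties round up; the source only says "nearest integer".
    return int(math.floor(clamped + 0.5))
```

For example, to_rank_value(lambda: 1 / 0) falls back to 0, and a very large intermediate result is capped at 4294967295.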
OptionStatus
Description
The status of an option, including when it was last updated and whether it is actively in use for searches.
Contents
Name: CreationDate
Description: A timestamp for when this option was created.
Type: DateTime

Name: State
Description: The state of processing a change to an option. Possible values:
- RequiresIndexDocuments: the option's latest value will not be visible in searches until IndexDocuments (p. 143) has been called and indexing is complete.
- Processing: the option's latest value is not yet visible in all searches but is in the process of being activated.
- Active: the option's latest value is completely visible.
Type: String
Valid Values: RequiresIndexDocuments | Processing | Active

Name: UpdateDate
Description: A timestamp for when this option was last updated.
Type: DateTime

Name: UpdateVersion
Description: A unique integer that indicates when this option was last updated.
Type: Integer
RankExpressionStatus
Description
The value of a RankExpression and its current status.
Contents
Name: Options
Description: The expression that is evaluated for ranking or thresholding while processing a search request.
Type: NamedRankExpression (p. 162)

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)
ServiceEndpoint
Description
The endpoint to which service requests can be submitted, including the actual URL prefix for sending requests and the Amazon Resource Name (ARN) so the endpoint can be referenced in other API calls such as UpdateServiceAccessPolicies (p. 146).
Contents
Name: Arn
Description: An Amazon Resource Name (ARN). See Identifiers for IAM Entities in Using AWS Identity and Access Management for more information.
Type: String

Name: Endpoint
Description: The URL (including /version/pathPrefix) to which service requests can be submitted.
Type: String
SourceAttribute
Description
Identifies the source data for an index field. An optional data transformation can be applied to the source data when populating the index field. By default, the value of the source attribute is copied to the index field.
Contents
Name: SourceDataCopy
Description: Copies data from a source document attribute to an IndexField.
Type: SourceData (p. 166)

Name: SourceDataFunction
Description: Identifies the transformation to apply when copying data from a source attribute.
Type: String
Valid Values: Copy | TrimTitle | Map

Name: SourceDataMap
Description: Maps source document attribute values to new values when populating the IndexField.
Type: SourceDataMap (p. 166)

Name: SourceDataTrimTitle
Description: Trims common title words from a source document attribute when populating an IndexField. This can be used to create an IndexField you can use for sorting.
Type: SourceDataTrimTitle (p. 166)
SourceData
Description
The source attribute name and an optional default value to use if a document doesn't have an attribute of that name.
Contents
Name: DefaultValue
Description: An optional default value to use if the source attribute is not specified in a document.
Type: String
Length constraints: Minimum length of 0. Maximum length of 1024.

Name: SourceName
Description: The name of the document source field to add to this IndexField.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.
SourceDataMap
Description
Specifies how to map source attribute values to custom values when populating an IndexField.
Contents
Name: Cases
Description: A map that translates source field values to custom values.
Type: String to String map

Name: DefaultValue
Description: An optional default value to use if the source attribute is not specified in a document.
Type: String
Length constraints: Minimum length of 0. Maximum length of 1024.

Name: SourceName
Description: The name of the document source field to add to this IndexField.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.
SourceDataTrimTitle
Description
Specifies how to trim common words from the beginning of a field to enable title sorting by that field.
Contents
Name: DefaultValue
Description: An optional default value to use if the source attribute is not specified in a document.
Type: String
Length constraints: Minimum length of 0. Maximum length of 1024.

Name: Language
Description: An IETF RFC 4646 language code. Only the primary language is considered. English (en) is currently the only supported language.
Type: String

Name: Separator
Description: The separator that follows the text to trim.
Type: String

Name: SourceName
Description: The name of the document source field to add to this IndexField.
Type: String
Length constraints: Minimum length of 1. Maximum length of 64.
StemmingOptionsStatus
Description
The stemming options configured for this search domain and the current status of those options.
Contents
Name: Options
Description: Maps terms to their stems, serialized as a JSON document. The document has a single object with one property "stems" whose value is an object mapping terms to their stems. The maximum size of a stemming document is 500 KB. Example:
{ "stems": {"people": "person", "walking": "walk"} }
Type: String

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)
StopwordOptionsStatus
Description
The stopword options configured for this search domain and the current status of those options.
Contents
Name: Options
Description: Lists stopwords serialized as a JSON document. The document has a single object with one property "stopwords" whose value is an array of strings. The maximum size of a stopwords document is 10 KB. Example:
{ "stopwords": ["a", "an", "the", "of"] }
Type: String

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)
SynonymOptionsStatus
Description
The synonym options configured for this search domain and the current status of those options.
Contents
Name: Options
Description: Maps terms to their synonyms, serialized as a JSON document. The document has a single object with one property "synonyms" whose value is an object mapping terms to their synonyms. Each synonym is a simple string or an array of strings. The maximum size of a synonyms document is 100 KB. Example:
{ "synonyms": {"cat": ["feline", "kitten"], "puppy": "dog"} }
Type: String

Name: Status
Description: The status of an option, including when it was last updated and whether it is actively in use for searches.
Type: OptionStatus (p. 164)
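The stemming, stopword, and synonym option documents share one shape: a JSON object with a single property. The following Python sketch serializes each kind and checks it against the size limit quoted in its description (500 KB, 10 KB, and 100 KB). The function name build_options is hypothetical, and treating 1 KB as 1024 bytes of UTF-8 is an assumption; the source does not say how the size is measured.

```python
import json

# Limits as quoted in the option descriptions; 1 KB is assumed to be 1024 bytes.
LIMITS = {"stems": 500 * 1024, "stopwords": 10 * 1024, "synonyms": 100 * 1024}


def build_options(kind, value):
    """Serialize an options document and check it against its stated size limit."""
    if kind not in LIMITS:
        raise ValueError("unknown options kind: " + kind)
    document = json.dumps({kind: value})
    size = len(document.encode("utf-8"))
    if size > LIMITS[kind]:
        raise ValueError(f"{kind} document is {size} bytes; limit is {LIMITS[kind]}")
    return document


# The three examples from the descriptions above.
stems = build_options("stems", {"people": "person", "walking": "walk"})
stopwords = build_options("stopwords", ["a", "an", "the", "of"])
synonyms = build_options("synonyms", {"cat": ["feline", "kitten"], "puppy": "dog"})
```

Checking the size before sending the document makes an oversized word list fail locally rather than at the configuration service.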
TextOptions
Description
Options that define a text field in the search index.
Contents
Name: DefaultValue
Description: The default value for a text field. Optional.
Type: String
Length constraints: Minimum length of 0. Maximum length of 1024.

Name: FacetEnabled
Description: Specifies whether facets are enabled for this field. Default: False.
Type: Boolean

Name: ResultEnabled
Description: Specifies whether values of this field can be returned in search results and used for ranking. Default: False.
Type: Boolean
UIntOptions
Description
Options that define a uint field in the search index.
Contents
Name: DefaultValue
Description: The default value for an unsigned integer field. Optional.
Type: Integer
UpdateDefaultSearchFieldResult
Description
A response message that contains the status of an updated default search field.
Contents
Name: DefaultSearchField
Description: The value of the DefaultSearchField configured for this search domain and its current status.
Type: DefaultSearchFieldStatus (p. 155)
UpdateServiceAccessPoliciesResult
Description
A response message that contains the status of updated access policies.
Contents
Name: AccessPolicies
Description: A PolicyDocument that specifies access policies for the search domain's services, and the current status of those policies.
Type: AccessPoliciesStatus (p. 154)
UpdateStemmingOptionsResult
Description
A response message that contains the status of updated stemming options.
Contents
Name: Stems
Description: The stemming options configured for this search domain and the current status of those options.
Type: StemmingOptionsStatus (p. 167)
UpdateStopwordOptionsResult
Description
A response message that contains the status of updated stopword options.
Contents
Name: Stopwords
Description: The stopword options configured for this search domain and the current status of those options.
Type: StopwordOptionsStatus (p. 167)
UpdateSynonymOptionsResult
Description
A response message that contains the status of updated synonym options.
Contents
Name: Synonyms
Description: The synonym options configured for this search domain and the current status of those options.
Type: SynonymOptionsStatus (p. 168)
Parameter Name: Action
Description: The action to perform.
Default: None
Type: String
Required: Yes

Parameter Name: AuthParams
Description: The parameters required to authenticate a query request. Contains: AWSAccessKeyId, SignatureVersion, Timestamp, Signature.
Default: None
Required: Conditional

Parameter Name: AWSAccessKeyId
Description: The Access Key ID corresponding to the AWS Secret Access Key you used to sign the request.
Default: None
Type: String
Required: Yes

Parameter Name: Expires
Description: The date and time at which the request signature expires, in the format YYYY-MM-DDThh:mm:ssZ, as specified in the ISO 8601 standard.
Condition: Requests must include either Timestamp or Expires, but not both.
Default: None
Type: String
Required: Conditional

Parameter Name: SecurityToken
Description: The temporary security token obtained through a call to AWS Security Token Service. Only available for actions in the following AWS services: Amazon EC2, Amazon Simple Notification Service, Amazon SQS, and Amazon SimpleDB.
Default: None
Type: String

Parameter Name: Signature
Description: The digital signature you created for the request. Refer to the service's developer documentation for information about how to generate the signature.
Default: None
Type: String
Required: Yes

Parameter Name: SignatureMethod
Description: The hash algorithm you used to create the request signature.
Default: None
Valid Values: HmacSHA256 | HmacSHA1
Type: String
Required: Yes

Parameter Name: SignatureVersion
Description: The signature version you use to sign the request. Set this to the value recommended in your product-specific documentation on security.
Default: None
Type: String
Required: Yes

Parameter Name: Timestamp
Description: The date and time the request was signed, in the format YYYY-MM-DDThh:mm:ssZ, as specified in the ISO 8601 standard.
Condition: Requests must include either Timestamp or Expires, but not both.
Default: None
Type: String
Required: Conditional

Parameter Name: Version
Description: The API version to use, in the format YYYY-MM-DD.
Default: None
Type: String
Required: Yes
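The Timestamp and Expires parameters use the same ISO 8601 layout and are mutually exclusive. A minimal Python sketch of producing one or the other follows. The helper names (time_params, check_time_params) are hypothetical, and the sketch covers only these two parameters; it does not compute a signature.

```python
from datetime import datetime, timedelta, timezone

# YYYY-MM-DDThh:mm:ssZ, as required for Timestamp and Expires.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def time_params(now=None, expires_in=None):
    """Return either a Timestamp or an Expires parameter, never both."""
    now = now or datetime.now(timezone.utc)
    if expires_in is None:
        return {"Timestamp": now.strftime(ISO_FORMAT)}
    return {"Expires": (now + expires_in).strftime(ISO_FORMAT)}


def check_time_params(params):
    """True when exactly one of Timestamp and Expires is present."""
    return ("Timestamp" in params) != ("Expires" in params)
```

Passing a timedelta as expires_in switches the request from a signing time to an expiry time, which keeps the "either, but not both" condition true by construction.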
Common Errors
This section lists the common errors that all actions return. Any action-specific errors are listed in the topic for the action.

Error: IncompleteSignature
Description: The request signature does not conform to AWS standards.
HTTP Status Code: 400

Error: InternalFailure
Description: The request processing has failed due to some unknown error, exception, or failure.
HTTP Status Code: 500

Error: InvalidAction
Description: The action or operation requested is invalid.
HTTP Status Code: 400

Error: InvalidClientTokenId
Description: The X.509 certificate or AWS Access Key ID provided does not exist in our records.
HTTP Status Code: 403

Error: InvalidParameterCombination
Description: Parameters that must not be used together were used together.
HTTP Status Code: 400

Error: InvalidParameterValue
Description: A bad or out-of-range value was supplied for the input parameter.
HTTP Status Code: 400

Error: InvalidQueryParameter
Description: The AWS query string is malformed or does not adhere to AWS standards.
HTTP Status Code: 400

Error: MalformedQueryString
Description: The query string is malformed.
HTTP Status Code: 404

Error: MissingAction
Description: The request is missing an action or operation parameter.
HTTP Status Code: 400

Error: MissingAuthenticationToken
Description: Request must contain either a valid (registered) AWS Access Key ID or X.509 certificate.
HTTP Status Code: 403

Error: MissingParameter
Description: An input parameter that is mandatory for processing the request is not supplied.
HTTP Status Code: 400

Error: OptInRequired
Description: The AWS Access Key ID needs a subscription for the service.
HTTP Status Code: 403

Error: RequestExpired
Description: Request is past expires date or the request date (either with 15 minute padding), or the request date occurs more than 15 minutes in the future.
HTTP Status Code: 400

Error: ServiceUnavailable
Description: The request has failed due to a temporary failure of the server.
HTTP Status Code: 503

Error: Throttling
Description: Request was denied due to request throttling.
HTTP Status Code: 400
Note
The document service API is a REST-style API that has a single resource, documents/batch. The API version must be specified in all requests. The current Amazon CloudSearch API version is 2011-02-01.
The other APIs you use to interact with Amazon CloudSearch are:
- Amazon CloudSearch Configuration API Reference (p. 126): Set up and manage your search domain.
- Amazon CloudSearch Search API Reference (p. 184): Search your domain.
documents/batch
This section describes the HTTP request and response messages for the documents/batch resource. You use the documents/batch resource to submit data to your search domain for indexing. It is accessed through a domain's document service endpoint at /2011-02-01/documents/batch. All requests must be submitted using HTTP POST.
Requests can only be submitted to your search domain's document service from authorized IP addresses. For information about authorizing IP addresses to submit document service requests, see Configuring Access for an Amazon CloudSearch Domain (p. 32). For more information about submitting data for indexing, see Uploading Data to an Amazon CloudSearch Domain (p. 72).
Note
When specifying SDF in JSON, the value for a field cannot be null.
An add or delete operation is only applied to an existing document if the version number specified in the operation is greater than the existing document's version number. If a batch contains multiple add or delete operations for the same document, the operation with the highest version number is applied. (If multiple operations in a batch specify the same document and version number, the document service arbitrarily picks which one to apply.)
The JSON schema representation of a batch is shown below:
{
  "type": "array",
  "minItems": 1,
  "items": {
    "type": "object",
    "properties": {
      "type": {
        "type": "string",
        "enum": ["add", "delete"],
        "required": true
      },
      "id": {
        "type": "string",
        "pattern": "[a-z0-9][a-z0-9_]{0,127}",
        "minLength": 1,
        "maxLength": 128,
        "required": true
      },
      "version": {
        "type": "number",
        "minimum": 1,
        "maximum": 4294967295,
        "required": true
      },
      "lang": {
        "type": "string",
        "minLength": 2,
        "maxLength": 2
      },
      "fields": {
        "type": "object",
        "patternProperties": {
          "[a-zA-Z0-9][a-zA-Z0-9_]{0,63}": {
            "type": "string"
          }
        }
      }
    }
  }
}
Property: version
Required: Yes

Property: lang
Required: Conditional

Property: field_name
Description: Specifies a field within the document being added. Field names must begin with a letter and can contain the following characters: a-z (lower case), 0-9, and _ (underscore). Field names must be at least 3 and no more than 64 characters. The names "body", "docid", and "text_relevance" are reserved names and cannot be used as field names. To specify multiple values for a field, you specify an array of values instead of a single value. For example:
"genre": ["Adventure","Drama","Fantasy","Thriller"]
Required: Conditional
} } }, "warnings": { "type": "array", "required": false, "items": { "type": "object", "properties": { "message": { "type": "string", "required": true } } } } } }
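As an illustration of the JSON batch rules (the id pattern and version range from the schema, the rule that field values cannot be null, and the highest-version-wins rule for duplicate operations), here is a hedged Python sketch. The function names validate_operation and effective_operations, the document id doc1, and the title value are hypothetical. The tie-breaking in effective_operations (the first operation seen wins) is an arbitrary choice, since the service is documented to pick arbitrarily.

```python
import json
import re

ID_RE = re.compile(r"[a-z0-9][a-z0-9_]{0,127}")           # pattern from the batch schema
FIELD_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_]{0,63}")   # pattern from the batch schema
MAX_VERSION = 4294967295


def validate_operation(op):
    """Return a list of problems found in one add or delete operation."""
    problems = []
    if op.get("type") not in ("add", "delete"):
        problems.append("type must be 'add' or 'delete'")
    if not isinstance(op.get("id"), str) or not ID_RE.fullmatch(op["id"]):
        problems.append("id does not match the schema pattern")
    version = op.get("version")
    if not isinstance(version, int) or not 1 <= version <= MAX_VERSION:
        problems.append("version must be between 1 and 4294967295")
    for name, value in op.get("fields", {}).items():
        if not FIELD_RE.fullmatch(name):
            problems.append(f"field name {name!r} does not match the schema pattern")
        values = value if isinstance(value, list) else [value]
        if any(v is None for v in values):
            problems.append(f"field {name!r} has a null value")
    return problems


def effective_operations(batch):
    """Keep, per document id, the operation with the highest version number."""
    winners = {}
    for op in batch:
        current = winners.get(op["id"])
        # On equal versions the earlier operation is kept (arbitrary, see lead-in).
        if current is None or op["version"] > current["version"]:
            winners[op["id"]] = op
    return list(winners.values())


batch = [
    {"type": "add", "id": "doc1", "version": 1, "lang": "en",
     "fields": {"title": "Example Title", "genre": ["Adventure", "Drama"]}},
    {"type": "delete", "id": "doc1", "version": 2},
]
payload = json.dumps(batch)  # request body for documents/batch
```

In this batch the delete carries the higher version, so it is the operation that would take effect for doc1.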
name="actor">Crewson, Wendy</field> name="actor">Warner, Amelia</field> name="actor">Cosmo, James</field> name="actor">Hickey, John Benjamin</field> name="actor">Piddock, Jim</field> name="actor">Lockhart, Emma</field> id="tt0301199" version="1" />
Element: batch
Description: The collection of add or delete operations that you want to submit to your search domain. A batch must contain at least one add or delete element.
Required: Yes

Element: add
Description: Specifies a document that you want to add to your search domain. The id, version, and lang attributes are required and an add element must contain at least one field.
Attributes:
- id: An alphanumeric string. Any characters other than A-Z (upper or lower case) and 0-9 are illegal. The max length is 128 characters.
- version: Any non-negative number less than 2^32.
- lang: An ISO 639-1 two-letter language code. English (en) is currently the only supported language.
Required: No

Element: field
Description: Specifies a field in the document being added. The name attribute and a field value are required. Field names must begin with a letter and can contain the following characters: a-z (lower case), 0-9, and _ (underscore). The names "body", "docid", and "text_relevance" are reserved names and cannot be used as field names. The field value can be text or CDATA. To specify multiple values for a field, you include multiple field elements with the same name. For example:
<field name="genre">Adventure</field>
<field name="genre">Drama</field>
<field name="genre">Fantasy</field>
<field name="genre">Thriller</field>
Constraints:
- name: An alphanumeric string that begins with a letter. Can contain a-z (lower case), 0-9, _ (underscore), - (hyphen), and . (period).
Condition: At least one field must be specified in an add element.
Required: Conditional

Element: delete
Description: Specifies a document that you want to remove from your search domain. The id and version attributes are required. A delete element must be empty.
Constraints:
- id: An alphanumeric string. Any characters other than A-Z (upper or lower case) and 0-9 are illegal.
- version: Any number less than 2^32. The version number specified must be higher than the document's current version number for the document to be deleted.
Required: No
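The XML form of a batch can be assembled with Python's standard xml.etree.ElementTree module. The sketch below assumes the root element is named batch, which is inferred from the element descriptions rather than shown verbatim here. The function name build_xml_batch and the input dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_xml_batch(operations):
    """Build an SDF XML batch from a list of add/delete operation dicts."""
    batch = ET.Element("batch")  # assumption: root element name is "batch"
    for op in operations:
        if op["type"] == "add":
            add = ET.SubElement(batch, "add", id=op["id"],
                                version=str(op["version"]), lang=op["lang"])
            for name, value in op["fields"].items():
                # Multiple values become repeated field elements with the same name.
                values = value if isinstance(value, list) else [value]
                for item in values:
                    ET.SubElement(add, "field", name=name).text = item
        else:
            # A delete element carries only id and version and must be empty.
            ET.SubElement(batch, "delete", id=op["id"], version=str(op["version"]))
    return ET.tostring(batch, encoding="unicode")


xml_text = build_xml_batch([
    {"type": "add", "id": "doc1", "version": 1, "lang": "en",
     "fields": {"genre": ["Adventure", "Drama", "Fantasy", "Thriller"]}},
    {"type": "delete", "id": "tt0301199", "version": 1},
])
```

ElementTree escapes reserved characters in field text, so values containing & or < do not need manual CDATA wrapping.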
Error: No Content-Type
Description: The Content-Type header is missing.

Error: No Content-Length
Description: The Content-Length header is missing.

Error: Incorrect Path
Description: URL path does not match "/YYYY-MM-DD/documents/batch".

Error: Invalid HTTP Method
Description: The HTTP method is not POST. Requests must be posted to documents/batch.

Error: Invalid Accept Type
Description: Accept header specifies a content type other than "application/xml" or "application/json". Responses can be sent only as XML or JSON.
Description: The length of the request body is larger than the maximum allowed value.
HTTP Status Code: 413

Description: The character set is something other than "ASCII", "ISO-8859-1", or "UTF-8".
HTTP Status Code: 415

Name: Accept
Description: A standard MIME type describing the format of the response data. For more information, see RFC 2616 Section 14.
Default: the content-type of the request
Constraints: application/json or application/xml only
Required: No
Common Response Headers
Name: Content-Type
Description: A standard MIME type describing the format of the object data. For more information, see RFC 2616 Section 14.
Default: application/xml
Constraints: application/xml or application/json only

Name: Content-Length
Description: The length in bytes of the body in the response.
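Putting the request rules together (HTTP POST to /2011-02-01/documents/batch, with Content-Type and Content-Length headers and an optional Accept header), the following Python sketch builds a request object with the standard urllib.request module without sending it. The endpoint host name is a placeholder, not a real domain endpoint, and build_batch_request is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder document service endpoint; substitute your own domain's endpoint.
DOC_ENDPOINT = "http://doc-example.us-east-1.cloudsearch.amazonaws.com"


def build_batch_request(batch, endpoint=DOC_ENDPOINT, api_version="2011-02-01"):
    """Build (but do not send) a documents/batch POST request with a JSON body."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/{api_version}/documents/batch",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Length": str(len(body)),
            "Accept": "application/json",
        },
    )


request = build_batch_request([{"type": "delete", "id": "doc1", "version": 2}])
# Passing the request to urllib.request.urlopen would submit it; that step is omitted.
```

Building the request separately from sending it makes the headers and path easy to inspect before any data reaches the domain.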
search
You use the search API to search the documents that you've uploaded to your search domain. Search requests are submitted via GET with a set of field-value pairs specified directly in the HTTP query string. The maximum size of a search request is 8190 bytes, including the HTTP method, URI, and protocol version. The response format can be either JSON or XML. (Errors are always returned in JSON.)
Note
Requests can be submitted to your search domain's search service only from authorized IP addresses. For information about authorizing IP addresses to submit search requests, see Configuring Access for an Amazon CloudSearch Domain (p. 32).
Amazon CloudSearch processes search requests in two phases. First, it identifies the complete set of documents that match the terms specified with the q (query) and bq (Boolean query) Search Request Parameters (p. 186). Amazon CloudSearch then processes the match-set of search hits to:
1. Filter the hits according to the value of the t-FIELD parameter (if specified).
2. Rank the filtered hits using the fields specified in the rank parameter. If the rank parameter is not specified, results are ranked according to their text_relevance scores.
3. Compute facet counts for the fields specified in the facet parameter and the constraints specified for each field (if any).
4. Return the processed set of hits. The maximum number of hits returned is controlled by the size parameter. By default, the top ten results are returned. You can specify an offset with the start parameter to retrieve the next set of hits.
For more information about searching with Amazon CloudSearch, see Searching Your Data with Amazon CloudSearch (p. 80).
Search Requests
You submit search requests to your domain's search endpoint via HTTP GET. To construct a search request, you append the Amazon CloudSearch API version and the name of the resource you are accessing, 2011-02-01/search, and a query string that specifies the terms and constraints for your search and what you want to get back in the response. The maximum size of a search request is 8190 bytes, including the HTTP method, URI, and protocol version. For example, the following request performs a simple text search of the search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com domain and gets the contents of the title field:
http://search-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=star+wars&return-fields=title
Note
The API version must be specified in all search requests. When there are updates to the Search API, you access them using a new API version.
The query string in a search request must be URL-encoded. You can use any method you want to send GET requests to your domain's search endpoint: you can enter the request URL directly in a Web browser, use cURL to submit the request, or generate an HTTP call using your favorite HTTP library. By default, Amazon CloudSearch returns the response in JSON. You can also get the results formatted in XML by specifying the results-type parameter, results-type=xml.
Note
You can also use the Search Tester in the Amazon CloudSearch console to search your data, browse the results, and view the generated request URLs and JSON and XML responses. For more information, see Searching with the Search Tester (p. 14).
Amazon CloudSearch can return up to 2 KB of data from a text field. If the contents of the field exceed 2 KB, only the first 2 KB is included in the results. (All of the data is searchable; only the result data is truncated.)
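A search URL like the example above can be assembled with Python's urllib.parse.urlencode, which takes care of URL-encoding the query string. The sketch below reuses the example movies endpoint from this section. The function name build_search_url is hypothetical, and measuring the 8190-byte limit against a request line of the form "GET <path> HTTP/1.1" is an assumption about how the limit is applied.

```python
from urllib.parse import urlencode

# Example search endpoint from this section; replace it with your own domain's.
SEARCH_ENDPOINT = (
    "http://search-movies-rr2f34ofg56xneuemujamut52i"
    ".us-east-1.cloudsearch.amazonaws.com"
)
MAX_REQUEST_BYTES = 8190


def build_search_url(params, endpoint=SEARCH_ENDPOINT, api_version="2011-02-01"):
    """Return a URL-encoded search request URL for the given parameters."""
    path = f"/{api_version}/search?{urlencode(params)}"
    # Assumption: the size limit applies to the request line (method, URI, protocol).
    request_line = f"GET {path} HTTP/1.1"
    if len(request_line.encode("ascii")) > MAX_REQUEST_BYTES:
        raise ValueError("search request exceeds 8190 bytes")
    return endpoint + path


url = build_search_url({"q": "star wars", "return-fields": "title"})
```

urlencode encodes the space in "star wars" as a plus sign, which matches the q=star+wars form used in the example request.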
Search Syntax
GET /2011-02-01/search
Name: bq
Description: One or more match expressions (p. 189) that define a Boolean search. Multiple expressions are joined with a top-level AND. If the bq parameter is specified in conjunction with the q parameter, the values are joined with a top-level AND. Within a match expression, you can use the - (NOT), | (OR), and * (wildcard) operators to exclude particular terms, find results that match any of the specified terms, or search for a prefix. To search for a phrase rather than individual terms, you can enclose the phrase in double quotes. For more information, see Searching Your Data with Amazon CloudSearch (p. 80).
Condition: Required if the q parameter is not specified.
Type: String
Required: Conditional

Name: facet
Description: A comma-separated list of the fields for which you want to compute facets. The specified fields must be numeric fields or defined as facet-enabled in the domain configuration. By default, counts are computed for all field values. If you want to specify the field values that you want counted for a particular field, use the facet-FIELD-constraints parameter instead, where FIELD is the name of the field. You can specify the maximum number of constraints to include in the results with the facet-FIELD-top-n parameter. By default, the results include counts for the top 40 constraints.
Type: String
Required: No
Name: facet-FIELD-constraints
Description: The field values (facet constraints) that you want to count for a particular field. FIELD is the name of the field. Constraints are specified as a comma-separated list of ranges or single-quoted strings. For example, facet-year-constraints=2000..2011 calculates facet counts for the years 2000 through 2011, inclusive. You can omit the lower end of a range to count all of the values less than or equal to the specified value. Similarly, you can omit the upper end of a range to count all of the values greater than or equal to the specified value. To specify constraints for a text field, enclose the values in single quotes. For example, facet-color-constraints='red','blue','green'. If you don't specify facet constraints, counts are computed for all field values.
Type: String
Required: No

Name: facet-FIELD-sort
Description: How you want to sort facet values for a particular field. FIELD is the name of the field. There are four sorting options:
- alpha: Sort the facet values alphabetically (in ascending order).
- count: Sort the facet values by their counts (in descending order).
- max: Sort the facet values according to the maximum values in the specified field. This option is specified as max(FIELD). By default, the facet values are sorted in ascending order. To sort in descending order, prefix the sort option with - (minus): -max(FIELD).
- sum: Sort the facet values according to the sum of the values in the specified field (in ascending order). This option is specified as sum(FIELD).
Type: String
Required: No

Name: facet-FIELD-top-n
Description: Set the maximum number of facet constraints to be included for the specified field in the search results. By default, the results include counts for the top 40 constraints.
Type: Integer
Required: No
Name: q
Description: The string to search for. You use the q parameter to perform simple text searches. This searches the default search field for the specified text. If the q parameter is specified in conjunction with the bq parameter, the values are joined with a top-level AND. If you separate search terms with plus (+) or a space, Amazon CloudSearch matches documents that contain all of the specified search terms; they are ANDed together. For example, q=star+wars searches the default field for star and wars. This is equivalent to specifying bq='star wars'. You can use the - (NOT), | (OR), and * (wildcard) operators to exclude particular terms, find results that match any of the specified terms, or search for a prefix. To search for a phrase rather than individual terms, you can enclose the phrase in double quotes. For more information, see Searching Your Data with Amazon CloudSearch (p. 80).
Condition: Required if the bq parameter is not specified.
Type: String
Required: Conditional

Name: rank
Description: A comma-separated list of fields or rank expressions to use for ranking. A maximum of 10 fields and rank expressions can be specified. You can use any uint field to rank results numerically. Any result-enabled text or literal field can be used to rank results alphabetically. To rank results by relevance, you can specify the name of a custom rank expression or text_relevance. Hits are ordered according to the specified rank field(s). By default, hits are ranked in ascending order. You can prefix a field name with a minus (-) to rank in descending order. If no rank parameter is specified, it defaults to rank=-text_relevance, which lists results according to their text_relevance scores with the highest-scoring documents first.
Type: String
Required: No

Name: results-type
Description: Controls the content type of the response, json or xml. The default is json.
Type: String
Required: No

Name: return-fields
Description: The document fields to include in the response. Up to 2 KB of data can be returned from a text field. If the field contents exceed 2 KB, only the first 2 KB is included in the results. Specified as a comma-separated list of field names. If no return-fields are specified, only the document ids of the hits are returned.
Type: String
Required: No
Name: size
Description: The maximum number of search hits to return. The default is 10.
Type: Positive Integer
Required: No

Name: start
Description: The offset of the first search hit you want to return. The default is 0 (the first hit).
Type: Positive Integer
Required: No

Name: t-FIELD
Description: Restrict the match set used in subsequent post-processing steps according to the specified rank expression. Only hits that have a score within the specified RANGE are included. Ranges are specified as described in Expression Syntax for Boolean Queries (p. 189).
Type: RANGE
Required: No
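The facet, rank, size, and start parameters can be combined into a single query string. The Python sketch below shows one way to format them, including the range and single-quoted forms of facet constraints. The helper names (facet_constraints, search_params) and the field names year, genre, and title are illustrative; the text does not describe how to escape quotes inside a text constraint, so this sketch does not attempt it.

```python
from urllib.parse import urlencode


def facet_constraints(values):
    """Format constraints: (low, high) tuples become ranges, strings get single quotes."""
    parts = []
    for value in values:
        if isinstance(value, tuple):
            low, high = value  # None leaves that end of the range open
            low_text = "" if low is None else str(low)
            high_text = "" if high is None else str(high)
            parts.append(f"{low_text}..{high_text}")
        else:
            parts.append(f"'{value}'")
    return ",".join(parts)


def search_params(query, facets=None, rank=None, size=10, start=0):
    """Assemble search parameters; facets maps field name to constraints or None."""
    params = {"q": query, "size": size, "start": start}
    if rank:
        if len(rank) > 10:
            raise ValueError("at most 10 rank fields or expressions can be specified")
        params["rank"] = ",".join(rank)
    if facets:
        params["facet"] = ",".join(facets)
        for field, constraints in facets.items():
            if constraints:
                params[f"facet-{field}-constraints"] = facet_constraints(constraints)
    return params


# Second page of ten hits, newest first, with facet counts for two fields.
params = search_params(
    "star wars",
    facets={"year": [(2000, 2011)], "genre": None},
    rank=["-year", "title"],
    start=10,
)
query_string = urlencode(params)
```

Leaving a field's constraints as None requests counts for all of its values, which is the documented default.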
Description: Search for a string in the specified text or literal field. For example, bq=title:'star'. Any single quotation marks or backslashes in the string must be escaped with a backslash.

Description: Search for a string in the specified text or literal field. For example, (field title 'star'). Any single quotation marks or backslashes in the string must be escaped with a backslash. You can use this alternate fielded search syntax when you're specifying multiple fielded search expressions as part of a Boolean expression. For example, bq=(and (field title 'star') (filter year ..2000)).

Expression Syntax: FIELD:value
Description: Search for an integer value in the specified uint field. For example, bq=year:2000. Matches documents that have at least one value in the field that equals the specified value. You can specify a single value or a range of values. A pair of nonnegative integers separated by two dots matches documents that have at least one attribute in the field that falls in the specified range. You can omit one value to specify an open-ended upper or lower limit. The range is inclusive on both ends. For example, bq=year:1998..2000.

Expression Syntax: (filter FIELD value)
Description: Search for an integer in the specified uint field. For example, (filter year 2000). Matches documents that have at least one value in the field that equals the specified value. You can use this alternate fielded search syntax when you're specifying multiple fielded search expressions as part of a Boolean expression. For example, bq=(and (field title 'star') (filter year ..2000)). You can specify a single value or a range of values. A pair of nonnegative integers separated by two dots matches documents that have at least one attribute in the field that falls in the specified range. You can omit one value to specify an open-ended upper or lower limit. The range is inclusive on both ends. For example, (filter year 1998..2000).

Description: Include hits only if they match all of the specified expressions. (Boolean AND operator.) For example, bq=(and (field title 'star') (field actor 'Ford, Harrison') (filter year ..2000)).

Expression Syntax: (not expression1)
Description: Exclude hits that match the specified expression. (Boolean NOT operator.) For example, bq=(not (and (field actor 'Guinness, Alec') (field actor 'Ford, Harrison'))).

Description: Include hits that match any of the specified expressions. (Boolean OR operator.) For example, bq=(or (field actor 'Guinness, Alec') (field actor 'Ford, Harrison') (field actor 'Jones, James Earl')).
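Boolean query expressions nest, so building them from small helper functions avoids unbalanced parentheses and missed escapes. The following Python sketch composes the forms shown in the examples above and applies the documented escaping rule for single quotes and backslashes. All function names are hypothetical.

```python
def quote(text):
    """Escape backslashes and single quotes with a backslash, then single-quote."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def field(name, text):
    """String match in a text or literal field: (field NAME 'text')."""
    return f"(field {name} {quote(text)})"


def uint_range(low=None, high=None):
    """Inclusive range; omit one end for an open-ended limit."""
    low_text = "" if low is None else str(low)
    high_text = "" if high is None else str(high)
    return f"{low_text}..{high_text}"


def filter_(name, value):
    """Integer or range match in a uint field: (filter NAME value)."""
    return f"(filter {name} {value})"


def and_(*expressions):
    return "(and " + " ".join(expressions) + ")"


def or_(*expressions):
    return "(or " + " ".join(expressions) + ")"


def not_(expression):
    return f"(not {expression})"


bq = and_(field("title", "star"), filter_("year", uint_range(high=2000)))
```

The resulting string is the value of the bq parameter and still needs to be URL-encoded when it is placed in the query string.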
Search Response
When a request completes successfully, the response body contains the search results. By default, search results are returned in JSON. If the results-type parameter is set to XML, search results are returned in XML.
When a request returns an error code, the body of the response contains information about the error that occurred. Error responses are always returned in JSON. If an error occurs while the request body is parsed and validated, the error code is set to 400 and the response body includes a list of the errors and where they occurred.
The following example shows a JSON response.
{
  "rank": "-text_relevance",
  "match-expr": "(label 'star wars')",
  "hits": {
    "found": 7,
    "start": 0,
    "hit": [
      {
        "id": "tt1185834",
        "data": {
          "actor": ["Abercrombie, Ian", "Baker, Dee", "Burton, Corey"],
          "title": ["Star Wars: The Clone Wars"]
        }
      },
      . . .
      {
        "id": "tt0121766",
        "data": {
          "actor": ["Bai, Ling", "Bryant, Gene", "Castle-Hughes, Keisha"],
          "title": ["Star Wars: Episode III - Revenge of the Sith"]
        }
      }
    ]
  },
  "info": {
    "rid": "b7c167f6c2da6d93531b9a7b314ad030b3a74803b4b7797edb905ba5a6a08",
    "time-ms": 2,
    "cpu-time-ms": 0
  }
}
Name: Content-Type
Description: A standard MIME type describing the format of the object data. For more information, see RFC 2616 Section 14.
Default: application/json
Constraints: application/json or application/xml only
Content-Length
Property         Description
match-expr       Shows the match expression constructed from the search parameters.
hits             Contains hit statistics (found, start) and a hit array that lists the document IDs and data for each hit.
found            The total number of hits that match the search request after Amazon CloudSearch finished processing the match set.
start            The index of the first hit returned in this response.
hit              An array that lists the document IDs and data for each hit.
id               The unique identifier for a document.
data             A list of returned fields.
facets           Contains facet information and facet counts.
FacetFieldName   A field for which facets were calculated.
constraints      An array of the facet values and counts.
value            The facet value being counted.
count            The number of hits that contain the facet value in FacetFieldName.
info             Contains information about the request processing.
rank             Lists the fields that were used to rank the search hits.
rid              The encrypted Resource ID.
time-ms          How long it took to process the search request, in milliseconds.
cpu-time-ms      The CPU time required to process the search request, in milliseconds.
messages         Contains any warning or error messages returned by the search service. The severity, source, host, code, and message are included for each one.
severity         Whether the message is a warning or error.
host             The host from which the message originated.
code             The warning or error code. The search service returns the following warnings and errors:
                   WildcardTermLimit: more than 2000 terms matched the wildcard in the search request. The number of terms matched was limited to 2000.
                   InvalidFieldOrRankAliasInRankParameter: the specified ranking field could not be found.
                   UnknownFieldInMatchExpression: a field specified in the bq parameter could not be found.
                   IncorrectFieldTypeInMatchExpression: the type specified in the match expression does not match the field type.
                   InvalidMatchExpression: the match expression could not be parsed.
                   UndefinedField: an unknown field was specified in the match expression.
message          A description of the warning or error that was returned by the search service.
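The severity, code, and message properties lend themselves to simple client-side triage. The sketch below separates warnings from errors in a list of message objects. It assumes the caller has already extracted the messages array from the response. The helper name split_messages and the sample message text are hypothetical, and treating every severity other than warning as an error is a simplifying assumption.

```python
def split_messages(messages):
    """Split service messages into (warnings, errors) lists of (code, text)."""
    warnings, errors = [], []
    for msg in messages:
        entry = (msg.get("code"), msg.get("message"))
        if msg.get("severity") == "warning":
            warnings.append(entry)
        else:
            # Assumption: any severity other than "warning" is an error.
            errors.append(entry)
    return warnings, errors


# Hypothetical messages; the codes come from the list above, but the
# message text is invented for illustration.
sample = [
    {"severity": "warning", "code": "WildcardTermLimit",
     "message": "wildcard matched too many terms"},
    {"severity": "error", "code": "InvalidMatchExpression",
     "message": "could not parse the match expression"},
]

warnings, errors = split_messages(sample)
print("warnings:", warnings)
print("errors:", errors)
```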
Name         Description
results      If the request was successful, contains the search results. If an error occurs, the info element lists the warnings or errors that were returned by the search service.
rank         Lists the fields that were used to rank the search hits.
match-expr   Shows the match expression constructed from the search parameters.
hits         Contains hit statistics and a collection of hit elements. The found attribute is the total number of hits that match the search request after Amazon CloudSearch finished processing the results. The contained hit elements are ordered according to their text_relevance scores or the rank option specified in the search request.
hit          A document that matched the search request. The id attribute is the document's unique ID. Contains a d (data) element for each returned field.
d            A field returned from a hit. Hit elements contain a d (data) element for each returned field.
facets       Contains a facet element for each facet requested in the search request.
facet        Contains a constraint element for each value of a field for which a facet count was calculated. The facet-FIELD-top-n request parameter can be used to specify how many constraints to return. By default, facet counts are returned for the top 40 constraints. The facet-FIELD-constraints request parameter can be used to explicitly specify which values to count.
constraint   A facet field value and the number of occurrences (count) of that value within the search hits.
info         Information about the request processing. The rid attribute is the encrypted Resource ID. The time-ms attribute is how long it took to process the search request, in milliseconds. The cpu-time-ms attribute is the CPU time required to process the search request, in milliseconds.
message      Information about a warning or error returned by the search service while processing the request. The severity attribute is either warning or error. The code attribute specifies one of the following warning or error codes:
               WildcardTermLimit: more than 2000 terms matched the wildcard in the search request. The number of terms matched was limited to 2000.
               InvalidFieldOrRankAliasInRankParameter: the specified ranking field could not be found.
               UnknownFieldInMatchExpression: a field specified in the bq parameter could not be found.
               IncorrectFieldTypeInMatchExpression: the type specified in the match expression does not match the field type.
               InvalidMatchExpression: the match expression could not be parsed.
               UndefinedField: an unknown field was specified in the match expression.
             The host attribute specifies the ID of the host from which the message originated.
Code   Description
404    The request path (API version or collection name) was not valid. Consult the body of the response for details and adjust the request before retrying.
405    The HTTP method was not GET, POST, HEAD, or OPTIONS. The search API does not support PUT or DELETE methods.
408    The server did not receive a complete request within the time allowed.
411    A POST request did not include a Content-Length header.
413    A POST request included a body larger than the search API supports. Use multiple simpler, smaller requests in place of one large request.
500    An internal problem occurred. The request can be retried.
509    The request was throttled. The request rate or resource consumption should be reduced before retrying the request.
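One way a client might act on these status codes is sketched below. It treats only 500 and 509 as retryable, because those are the two codes whose descriptions mention retrying, and it spaces retries out with exponential backoff so that the request rate drops after throttling. The function names, the backoff policy, and the base and cap values are assumptions made for illustration and are not prescribed by Amazon CloudSearch.

```python
# Status codes whose descriptions say the request can be retried.
RETRYABLE_STATUS = {500, 509}


def should_retry(status):
    """Return True if the HTTP status code indicates a retryable failure."""
    return status in RETRYABLE_STATUS


def backoff_seconds(attempt, base=0.5, cap=30.0):
    """Delay before retry number `attempt` (0-based), doubling each time.

    The base and cap values are arbitrary illustrative choices.
    """
    return min(cap, base * (2 ** attempt))


for status in (404, 413, 500, 509):
    print(status, "retry" if should_retry(status) else "fix the request")
print([backoff_seconds(n) for n in range(4)])
```

Client errors such as 404, 405, 411, and 413 are not retried here because resending the same request would fail the same way.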
Uploading Documents
If your SDF is not formatted correctly or contains invalid values, you will get errors when you attempt to upload it or use it to configure fields for your domain. Here are some common problems and their solutions:

- Invalid JSON: If you are using JSON, the first thing to do is make sure there are no JSON syntax errors in your SDF batch. To do that, run it through a validation tool such as the JSON Validator. This will identify any fundamental issues with the data.
- Invalid XML: SDF batches must be well-formed XML. You are especially likely to encounter issues if your fields contain XML data: the data must be XML-encoded or enclosed in CDATA sections. To identify any problems, run your SDF batch through a validation tool such as the W3C Markup Validation Service.
- Not recognized as SDF: If you are configuring your domain from SDF and Amazon CloudSearch doesn't recognize your data as valid SDF, it responds with a list of generic metadata fields: content_encoding, content_language, content_type, language, and resourcename. For example, this can happen if there are invalid document IDs or version numbers. Make sure that your SDF data contains all of the required properties for each document.
- Document IDs with bad values: Capital letters, hyphens, and other special characters are not allowed in document IDs. Document IDs can only contain the characters a-z (lowercase letters), 0-9, and underscore (_). Document IDs must start with a letter or number; they cannot start with an underscore.
- Bad version numbers: Version numbers must fit within a 32-bit unsigned integer. When specifying your SDF in JSON, make sure that the version number is not enclosed in quotes. If it is, the version is treated as a string and Amazon CloudSearch will reject the SDF as invalid.
- Multi-valued fields without a value: When specifying SDF in JSON, you cannot specify an empty array as the value of a field. Multi-valued fields must contain at least one value.
- Bad characters: One problem that can be difficult to detect if you do not filter your data while generating your SDF batch is that it can contain characters that are invalid in XML. Both JSON and XML batches can contain only UTF-8 characters that are valid in XML. You can use a validation tool such as the JSON Validator or W3C Markup Validation Service to identify invalid characters.
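Several of these problems can be caught before a batch is uploaded. The following Python sketch checks one JSON SDF document, already parsed into a dictionary, against the document ID, version number, and empty-array rules described above. It assumes the document uses id, version, and fields keys. The function name check_document and the wording of the problem strings are hypothetical.

```python
import re

# Lowercase letters, digits, and underscores; must not start with "_".
DOC_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_]*")
MAX_VERSION = 2**32 - 1  # largest 32-bit unsigned integer


def check_document(doc):
    """Return a list of problems found in one parsed SDF document."""
    problems = []

    doc_id = doc.get("id")
    if not isinstance(doc_id, str) or not DOC_ID_PATTERN.fullmatch(doc_id):
        problems.append("bad document ID")

    version = doc.get("version")
    # A quoted version arrives as str; bool is excluded explicitly
    # because it is a subclass of int in Python.
    if (isinstance(version, bool) or not isinstance(version, int)
            or not 0 <= version <= MAX_VERSION):
        problems.append("bad version number")

    for name, value in doc.get("fields", {}).items():
        if isinstance(value, list) and not value:
            problems.append(f"empty array in field {name}")

    return problems


good = {"id": "tt0121766", "version": 1, "fields": {"title": ["Star Wars"]}}
bad = {"id": "_Bad-ID", "version": "1", "fields": {"actor": []}}
print(check_document(good))
print(check_document(bad))
```

This check does not cover invalid XML characters or JSON syntax errors, which still call for a dedicated validation tool.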
Field name
Document size Batch size Document version number size Document language Maximum number of index fields
Item                                           Limit
Maximum number of sources for an index field   Up to 20 sources can be configured for a field.
Maximum number of field values                 Up to 100 values can be specified in a field.
Maximum size of terms in an index field        Individual terms within a text or literal field are truncated if they exceed 256 characters.
Default value size                             The maximum size of a default value for a field is 1 KB.
Uint field range                               A uint field can contain values in the range 0 - max(uint32_t).
Maximum number of rank expressions             Up to 50 rank expressions can be configured for a domain.
Rank expression size                           The maximum size of a rank expression is 10240 bytes. The maximum value that can be returned by a rank expression is max(uint32_t).
text_relevance score                           An integer value in the range 0-1000.
Maximum search partitions                      10
Maximum search instances                       50
Policy document size                           The maximum size of an Amazon CloudSearch policy document is 100 KB.
Stemming dictionary size                       The maximum size of an Amazon CloudSearch stemming dictionary is 500 KB.
Stopwords dictionary size                      The maximum size of an Amazon CloudSearch stopwords dictionary is 10 KB.
Synonym dictionary size                        The maximum size of an Amazon CloudSearch synonym dictionary is 100 KB.
Search requests: size parameter                The size parameter can contain values in the range 0 - max(uint32_t).
Search requests: start parameter               The start parameter can contain values in the range 0 - max(uint32_t).
Search requests: rank parameter                Up to 10 uint fields and expressions can be specified in the rank parameter.
Search requests: GET requests                  The maximum size of a search request submitted as an HTTP GET request is 8190 bytes.
Search requests: returned data                 Up to 2 KB of data can be returned from a field. If the field contents exceed 2 KB, only the first 2 KB is included in the results.
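The search request limits can be checked on the client before a request is sent. The sketch below is a minimal example under two assumptions that the limits list does not confirm: that the rank parameter is a comma-separated list, and that the size of the URL-encoded query string is a reasonable stand-in for the size of the GET request. The function name check_search_request is hypothetical.

```python
from urllib.parse import urlencode

MAX_GET_BYTES = 8190     # maximum size of a search request sent as HTTP GET
MAX_RANK_ENTRIES = 10    # uint fields and expressions allowed in rank
MAX_UINT32 = 2**32 - 1   # upper bound for the size and start parameters


def check_search_request(params):
    """Return a list of limit violations for a dict of search parameters."""
    problems = []

    for name in ("size", "start"):
        value = params.get(name)
        if value is not None and not 0 <= int(value) <= MAX_UINT32:
            problems.append(f"{name} is out of range")

    # Assumption: rank entries are separated by commas.
    rank = str(params.get("rank", ""))
    entries = [entry for entry in rank.split(",") if entry.strip()]
    if len(entries) > MAX_RANK_ENTRIES:
        problems.append("too many rank entries")

    # Approximation: measure only the encoded query string.
    if len(urlencode(params).encode("utf-8")) > MAX_GET_BYTES:
        problems.append("request too large for GET")

    return problems


print(check_search_request({"q": "star wars", "size": 10, "rank": "-text_relevance"}))
```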
document ID (docid)
facets
facet constraints
    Specify the particular facet values that you want to count.

facet enabled
    An index field option that enables facet information to be calculated for the field.

hits
    Documents that match the criteria specified in the search request. Also referred to as search results.

index
    See search index (p. 202).

index field
    A name-value pair that is included in a search domain's index. An index field can contain text, literal, or unsigned integer data.

index field name
    The name of a text, literal, or uint field.

indexing options
    Configuration settings that define a search domain's index fields, how SDF data is mapped to those index fields, and how the index fields can be used.

rank expression
    A numeric expression that you can use to control how search hits are ranked. You can construct rank expressions using uint fields, other rank expressions, a document's default text_relevance score, and standard numeric operators and functions. When you use the rank option to specify a rank expression in a search request, the expression is evaluated for each search hit and the hits are listed according to their rank expression values.

result enabled
    An index field option that enables the field's value(s) to be returned in the search results.

SDF
    Search Data Format.

search API
    The API that you use to submit search requests to a domain.

Search Data Format (SDF)
    The format that you use to describe the data that you want to add or delete from your search domain. Search Data Format (SDF) can be represented as either JSON or XML.

search domain
    Encapsulates your searchable data and the search instances that handle your search requests. You set up a separate domain for each different collection of data that you want to search.

search domain configuration
    A search domain's indexing options, text options, access policies, and rank expressions.

search domain name
    A user-specified name that is used to construct a unique identifier for a domain.

search enabled
    An index field option that enables the field data to be searched.

search index
    A representation of your searchable data that facilitates fast and accurate data retrieval.

search instances
    A search instance is a compute resource that indexes your data and processes search requests. A search domain has one or more search instances, each with a finite amount of RAM and CPU resources. As your data volume grows, more search instances or larger search instances are deployed to contain your indexed data. When necessary, your index is automatically partitioned across multiple search instances. As your request volume or complexity increases, each search partition is automatically replicated to provide additional processing capacity.

search requests
    A request that is sent to a search domain to retrieve documents that match particular search criteria.

search result
    A document that matches a search request. Also referred to as a search hit.

search service endpoint
    The URL that you connect to when sending search requests to a search domain.

source
    An SDF document field that is used to populate an index field.

stem
    The common root or substring shared by a set of related words.

stemming
    The process of mapping related words to a common stem. This enables matching on variants of a word. For example, a search for "horse" could return matches for horses, horseback, and horsing, as well as horse.

stemming dictionary
    A domain-specific collection of mappings of words to their stems. Amazon CloudSearch does not define a default stemming dictionary.

stopping
    The process of filtering stop words from an index or search request.

stopword
    A word that is not indexed and is automatically filtered out of search requests because it is either insignificant or so common that including it would result in too many matches to be useful. Stop words are language-specific.

stopword dictionary
    A domain-specific collection of stopwords. Amazon CloudSearch defines a default stopword dictionary for English that you can use as-is, or customize to suit your collection of data.

synonym
    A word that is the same or nearly the same as an indexed word and that should produce the same results when specified in a search request. For example, a search for "Rocky Four" or "Rocky 4" should return the fourth Rocky movie. This can be done by designating that four and 4 are synonyms for IV. Synonyms are language-specific.

synonym dictionary
    A domain-specific collection of synonym mappings. Amazon CloudSearch does not define a default synonym dictionary.

text_relevance
    A built-in relevance score that's based on the repetition of search terms in the document and proximity of search terms to each other in each matching index field in the document. A document's text_relevance score is an integer value from 0 to 1000 (inclusive).

text options
    Domain-specific stopword, stemming, and synonym dictionaries used during text processing when building a search index. Stopwords and stems are also used at search time to process the search terms before looking for matching documents in the index.

tokenization
    Part of the text processing that Amazon CloudSearch performs when indexing and processing search requests. During indexing, the contents of each text field are split into a collection of tokens that can be indexed separately. Punctuation is stripped and each word (that isn't in the stopword list) becomes a token. For example, the string "spider-man" would be split into two tokens: spider and man. At search time, the search terms are tokenized using the same rules before being matched against the indexed tokens.

version
    See document version (p. 201).
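The tokenization behavior described in this glossary can be approximated with a few lines of Python. This is only a rough illustration of the idea (strip punctuation, split into words, drop stopwords) and not the algorithm that Amazon CloudSearch actually uses. Lowercasing the text and treating any run of ASCII letters or digits as a word are assumptions, and the function name tokenize is hypothetical.

```python
import re


def tokenize(text, stopwords=frozenset()):
    """Split text into tokens, dropping punctuation and stopwords."""
    # Assumption: tokens are runs of ASCII letters or digits, lowercased.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in stopwords]


print(tokenize("spider-man"))
print(tokenize("The Amazing Spider-Man", stopwords={"the"}))
```

Applying the same function to both the indexed text and the search terms mirrors the glossary's point that search terms are tokenized with the same rules as the indexed data.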
Change: Initial product release
Description: Amazon CloudSearch is introduced as a new service in Beta release.

Change: Added how to clone a domain
Description: You can clone an existing search domain to get an empty domain that has the same indexing options. For more information, see Cloning an Existing Domain's Indexing Options (p. 59).
Date: 25 April 2012

Change: Added Getting Started video and the Troubleshooting and Articles sections
Description: A screencast of the Getting Started tutorial is now available on YouTube. The Troubleshooting Amazon CloudSearch (p. 196) section provides solutions to common SDF issues, a workaround for deleting all documents from a domain, and tips for reducing document update latency. The Articles section provides a link to the new Guide to Formatting Your Data in SDF for Amazon CloudSearch, available from aws.amazon.com/articles.
Date: 9 July 2012

Description: Reorganized Searching Your Data with Amazon CloudSearch (p. 80), added the Guide to Using Elastic IPs to Manage Access to Amazon CloudSearch Domains to Amazon CloudSearch Articles and Tutorials (p. 200), and added an item about retrieving document versions to Troubleshooting Amazon CloudSearch (p. 196).
Date: 27 July 2012