
Commit 01f9f0e

Sam Hames authored
Merge pull request #554 from DocNow/add_searches_docs
Add higher level examples for the searches command
2 parents 2d2d210 + 79047df commit 01f9f0e


1 file changed (+28, -0)


docs/twarc2_en_us.md

Lines changed: 28 additions & 0 deletions
@@ -133,6 +133,34 @@ leave off the `--start-time`:

    twarc2 search --end-time 2014-07-24 '"eric garner"' tweets.jsonl

## Searches

The `searches` command works like the [search](#search) command, but instead of taking a single query, it reads many queries from a file. It accepts the same limit and time options as `search`, and each option is applied to every query.

The input file must be a plain text file with one query per line. For example, you might have a file called `animals.txt` containing the following lines:

    cat
    dog
    mouse OR mice
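You can create this file from a shell with a heredoc. This is a minimal sketch that assumes a POSIX-compatible shell such as `sh` or `bash`. Quoting the `'EOF'` delimiter stops the shell from expanding `$` or backticks inside the queries, so each line is written exactly as typed.

```shell
# Write one query per line to animals.txt.
# The quoted 'EOF' delimiter disables parameter and command expansion,
# so the queries are written verbatim.
cat > animals.txt <<'EOF'
cat
dog
mouse OR mice
EOF

# Each line is one query, so this reports 3 lines.
wc -l < animals.txt
```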

Note that each line is passed directly to the Twitter API without modification. Quoted strings will therefore be treated as exact phrase searches, which might not be what you intended.
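Lines are sent to the API as-is, so it can be worth checking an input file for double quotes before starting a long collection. The sketch below uses standard `grep`. The file name `queries.txt` and its two lines are hypothetical examples. In the Twitter API v2 query syntax, the unquoted line matches tweets that contain both words in any order. The quoted line matches only the exact phrase.

```shell
# Hypothetical query file: the same words, unquoted and quoted.
cat > queries.txt <<'EOF'
eric garner
"eric garner"
EOF

# Show each line that contains a double quote, prefixed by its line number,
# so phrase searches can be reviewed before running `twarc2 searches`.
# Expected output: 2:"eric garner"
grep -n '"' queries.txt
```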

If you run the following `searches` command, `animals.json` will contain at least 100 tweets for each query in the input file, provided that many matching tweets exist:

    twarc2 searches --limit 100 animals.txt animals.json

The `--archive`, `--start-time`, and `--end-time` flags also work just as they do for a regular search. For example, to search the full archive of all tweets posted on the first day of 2020:

    twarc2 searches --archive --start-time 2020-01-01 --end-time 2020-01-02 animals.txt animals.json

You can also use the `--counts-only` flag to check tweet volumes first. This produces a CSV file in the same format as the [counts](#counts) command with the `--csv` flag, plus an extra column containing the query for each row.

    twarc2 searches --counts-only animals.txt animals_counts.csv

One more thing: if you have a lot of searches to run, consider using the `--combine-queries` flag. This combines consecutive queries in the file into a single longer query. As a result, you issue fewer API calls and may collect fewer duplicate tweets that match more than one query. Using it on `animals.txt` combines the three queries into the single query `(cat) OR (dog) OR (mouse OR mice)`, which is issued as one logical query.

    twarc2 searches --combine-queries animals.txt animals_combined.json
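To preview the combined form described above, you can approximate the transformation with `awk`: each non-empty line is wrapped in parentheses, and the results are joined with `OR`. This is an illustrative sketch, not twarc2's implementation, and the `combine_queries` function name is hypothetical. The sketch assumes that all queries fit into one combined query. The Twitter API limits query length, so twarc2 may need to group a long file into several combined queries.

```shell
# Recreate the example input file so this sketch is self-contained.
printf '%s\n' 'cat' 'dog' 'mouse OR mice' > animals.txt

# Hypothetical helper, for illustration only: wrap each non-empty line in
# parentheses and join the results with " OR ". It does not enforce the
# API's maximum query length.
combine_queries() {
  awk 'NF { printf "%s(%s)", (n++ ? " OR " : ""), $0 } END { print "" }' "$1"
}

# Prints: (cat) OR (dog) OR (mouse OR mice)
combine_queries animals.txt
```

The `NF` pattern skips blank lines, so an empty line in the input file does not produce an empty `()` group.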
## Stream

The `stream` command will use Twitter's API
