A good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5MB and 15MB.
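One way to apply these starting sizes is to group documents into bulk bodies capped by both document count and byte size. The sketch below is a minimal illustration, not a definitive implementation: the function name `chunk_bulk_bodies`, the `tweets` index name, and the default limits (1,000 documents, 5 MB, taken from the low end of the ranges above) are assumptions. It only builds the newline-delimited bodies; sending them to the bulk endpoint is left out.

```python
import json
from typing import Dict, Iterable, Iterator


def chunk_bulk_bodies(
    docs: Iterable[Dict],
    index: str,
    max_docs: int = 1000,              # assumed starting point, tune by measuring
    max_bytes: int = 5 * 1024 * 1024,  # assumed starting point (5 MB)
) -> Iterator[str]:
    """Yield newline-delimited bulk bodies that stay within both limits.

    A single document larger than max_bytes is still emitted, alone,
    in its own batch.
    """
    lines = []
    count = 0
    size = 0
    for doc in docs:
        # Each document becomes an action line followed by a source line.
        action = json.dumps({"index": {"_index": index}})
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))

        # Flush the current batch before it would exceed either limit.
        if count and (count + 1 > max_docs or size + pair_size > max_bytes):
            yield "".join(lines)
            lines, count, size = [], 0, 0

        lines.append(pair)
        count += 1
        size += pair_size

    if lines:
        yield "".join(lines)


if __name__ == "__main__":
    sample = ({"user": "u%d" % i, "text": "hello"} for i in range(2500))
    for n, body in enumerate(chunk_bulk_bodies(sample, "tweets")):
        print(n, len(body.encode("utf-8")), "bytes")
```

Treat both limits as something to measure against your own cluster rather than as fixed values.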
hits._score - the document’s relevance score (not applicable when using match_all)
Each search request is self-contained: Elasticsearch does not maintain any state information across requests.
A must or should clause contributes to the document’s relevance score, while a must_not clause is treated as a filter: it affects whether a document matches but not how it is scored.
- You can also explicitly specify arbitrary filters to include or exclude documents based on structured data
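To make the clause roles concrete, here is a small sketch that assembles a bool query body as a plain Python dictionary. The helper name `build_bool_query` and the field names (`text`, `lang`, `retweets`) are hypothetical and chosen only for illustration. The `must`, `should`, `must_not`, and `filter` keys are the standard bool query clause names.

```python
import json
from typing import Dict, List, Optional


def build_bool_query(
    must: Optional[List[Dict]] = None,
    should: Optional[List[Dict]] = None,
    must_not: Optional[List[Dict]] = None,
    filter_: Optional[List[Dict]] = None,
) -> Dict:
    """Assemble a search body with a bool query, omitting empty clauses."""
    clauses = {}
    if must:
        clauses["must"] = must          # must match, contributes to the score
    if should:
        clauses["should"] = should      # optional, contributes to the score
    if must_not:
        clauses["must_not"] = must_not  # excludes documents, no effect on score
    if filter_:
        clauses["filter"] = filter_     # includes documents, no effect on score
    return {"query": {"bool": clauses}}


if __name__ == "__main__":
    body = build_bool_query(
        must=[{"match": {"text": "elasticsearch"}}],       # hypothetical field
        must_not=[{"term": {"lang": "fr"}}],                # hypothetical field
        filter_=[{"range": {"retweets": {"gte": 10}}}],     # hypothetical field
    )
    print(json.dumps(body, indent=2))
```

Only the `match` clause under `must` influences ranking in this example. The `must_not` and `filter` clauses narrow the result set without changing scores.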
The first alternative is to have an index per document type. Instead of storing tweets and users in a single twitter index, you could store tweets in the tweets index and users in the users index. Indices are completely independent of each other and so there will be no conflict of field types between indices. This approach has two benefits:
- Data is more likely to be dense and so benefit from compression techniques used in Lucene.
- The term statistics used for scoring in full text search are more likely to be accurate because all documents in the same index represent a single entity.
Each index can be sized appropriately for the number of documents it will contain: you can use a smaller number of primary shards for users and a larger number of primary shards for tweets.
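As a rough illustration of sizing each index separately, the sketch below builds the request bodies for creating the two indices with different primary shard counts. The helper name `index_body` and the counts (1 for users, 5 for tweets, 1 replica each) are assumptions picked for the example, not recommendations. The `number_of_shards` and `number_of_replicas` keys are the standard index settings.

```python
import json
from typing import Dict


def index_body(primary_shards: int, replicas: int = 1) -> Dict:
    """Return a create-index request body with the given shard settings."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


# Hypothetical sizing: few users, many tweets.
index_plan = {
    "users": index_body(primary_shards=1),
    "tweets": index_body(primary_shards=5),
}

if __name__ == "__main__":
    for name, body in index_plan.items():
        # Each body would be sent as the payload of a PUT /<index> request.
        print("PUT /" + name)
        print(json.dumps(body, indent=2))
```

Because the number of primary shards is set when an index is created, it helps to settle these values per index up front.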