
Elasticsearch Cheatsheet with Python

This cheatsheet provides essential commands and examples for using the Python Elasticsearch client to manage and query your data.

Key Features and Operations:

Using the Python Elasticsearch Library
Authentication with AWS IAM
Authentication with HTTP Basic Auth
Retrieving Elasticsearch Cluster Info
Ingesting Single Documents
Performing Bulk Data Ingestion
Viewing Available Indices
Searching for Documents
Interacting with Elasticsearch via Python Requests
Creating an Index using Requests

Python Library Usage

Setting up the Elasticsearch Client

Connecting with AWS IAM Authentication

Securely connect to Amazon Elasticsearch Service (renamed Amazon OpenSearch Service in 2021) by signing each request with your IAM credentials.

# Requires elasticsearch-py 7.x or earlier plus requests-aws4auth;
# RequestsHttpConnection and use_ssl were removed in elasticsearch-py 8.x.
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Replace with your actual AWS credentials and region
aws_access_key_id = "YOUR_ACCESS_KEY_ID"
aws_secret_access_key = "YOUR_SECRET_ACCESS_KEY"
aws_region = "YOUR_AWS_REGION"      # e.g. "us-east-1"
es_endpoint = "YOUR_ES_ENDPOINT"    # bare hostname, without "https://"
aws_session_token = None            # set only when using temporary credentials

# Sign every request with AWS Signature Version 4 for the "es" service
aws_auth = AWS4Auth(aws_access_key_id, aws_secret_access_key, aws_region, 'es',
                    session_token=aws_session_token)

es = Elasticsearch(
    hosts=[{'host': es_endpoint, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection  # lets AWS4Auth sign the underlying requests calls
)
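The hosts entry above expects a bare hostname, but the endpoint shown in the AWS console usually includes a scheme (https://...). The hypothetical helper below is a small standard-library sketch that turns either form into the {'host': ..., 'port': ...} dict the 7.x client accepts, defaulting to port 443 when no port is given. The domain name used in the example is made up.

```python
from urllib.parse import urlsplit


def endpoint_to_host(endpoint, default_port=443):
    """Hypothetical helper: normalize an endpoint URL or bare hostname
    into a host dict for the 7.x-style Elasticsearch(hosts=[...]) argument."""
    # urlsplit only fills in the hostname when a scheme is present,
    # so prepend one for bare hostnames
    if "://" not in endpoint:
        endpoint = "https://" + endpoint
    parts = urlsplit(endpoint)
    return {"host": parts.hostname, "port": parts.port or default_port}


print(endpoint_to_host("https://search-mydomain.us-east-1.es.amazonaws.com/"))
# {'host': 'search-mydomain.us-east-1.es.amazonaws.com', 'port': 443}
```

With the client above, this would be used as hosts=[endpoint_to_host(es_endpoint)].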

Connecting with HTTP Basic Authentication

Connect to Elasticsearch using standard username and password authentication.

# elasticsearch-py 7.x or earlier (8.x uses basic_auth=... and a full https:// URL instead)
from elasticsearch import Elasticsearch, RequestsHttpConnection

es_endpoint = "YOUR_ES_ENDPOINT"   # bare hostname, without "https://"
es_username = "YOUR_USERNAME"
es_password = "YOUR_PASSWORD"

es = Elasticsearch(
    hosts=[{'host': es_endpoint, 'port': 443}],
    http_auth=(es_username, es_password),  # sent as an HTTP Basic Authorization header
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
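With http_auth=(username, password), the client sends the credentials as a standard HTTP Basic Authorization header on every request. The sketch below (hypothetical helper name, standard library only) shows what that header value looks like: the string "username:password", base64-encoded and prefixed with "Basic ". Since base64 is an encoding rather than encryption, this is also why use_ssl=True matters here.

```python
import base64


def basic_auth_header(username, password):
    """Hypothetical helper: build the Authorization header value for HTTP Basic Auth."""
    # Join with a colon, encode to bytes, then base64-encode (RFC 7617)
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```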

Elasticsearch Cluster Information

Retrieve detailed information about your Elasticsearch cluster.

>>> es.info()
{'name': 'elasticsearch-02', 'cluster_name': 'es-cluster', 'cluster_uuid': 'EJDqv5VrQyao07ndQuwhCw', 'version': {'number': '6.8.2', 'build_flavor': 'default', 'build_type': 'deb', 'build_hash': 'b506955', 'build_date': '2019-07-24T15:24:41.545295Z', 'build_snapshot': False, 'lucene_version': '7.7.0', 'minimum_wire_compatibility_version': '5.6.0', 'minimum_index_compatibility_version': '5.0.0'}, 'tagline': 'You Know, for Search'}

For more readable output, pretty-print the response with json.dumps:

>>> import json
>>> print(json.dumps(es.info(), indent=2))
{
  "name": "elasticsearch-02",
  "cluster_name": "es-cluster",
  "cluster_uuid": "EJDqv5VrQyao07ndQuwhCw",
  "version": {
    "number": "6.8.2",
    "build_flavor": "default",
    "build_type": "deb",
    "build_hash": "b506955",
    "build_date": "2019-07-24T15:24:41.545295Z",
    "build_snapshot": false,
    "lucene_version": "7.7.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
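The version.number field is handy for deciding which API style to use; for example, the doc_type and _type arguments used elsewhere in this cheatsheet belong to pre-7.x clusters like the 6.8.2 one shown above. A minimal sketch (hypothetical helper, standard library only) that extracts the major version from an es.info() response, applied to a trimmed copy of that output:

```python
def major_version(info):
    """Hypothetical helper: return the major version from an es.info() response dict."""
    # version.number is a string such as "6.8.2"
    return int(info["version"]["number"].split(".")[0])


# Trimmed copy of the es.info() output shown above
info = {
    "name": "elasticsearch-02",
    "cluster_name": "es-cluster",
    "version": {"number": "6.8.2", "lucene_version": "7.7.0"},
}

# Mapping types (doc_type / _type) are only needed before 7.x
needs_doc_type = major_version(info) < 7
print(major_version(info), needs_doc_type)  # 6 True
```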

Ingesting Documents

Ingesting a Single Document

Add a document to an index, specifying a unique document ID.

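doc = {
    'author': 'john_doe',
    'text': 'Elasticsearch is amazing!',
    'timestamp': '2023-10-27T10:00:00Z'
}
# doc_type is required by 6.x clients; it is deprecated in 7.x and removed in 8.x
res = es.index(index="my-test-index", doc_type='tweet', id=1, body=doc)
print(res)  # includes the document's _id, _version and result ("created" or "updated")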
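es.index is a thin wrapper over the REST index API. As a rough sketch of what it sends to a 6.x cluster (hypothetical helper, no network access): with an explicit id the client issues PUT /<index>/<type>/<id>, which creates the document or overwrites an existing one with the same id; without an id it issues POST /<index>/<type> and Elasticsearch generates the id.

```python
from urllib.parse import quote


def index_request_line(index, doc_type, doc_id=None):
    """Hypothetical helper: the HTTP method and path es.index() uses against a 6.x cluster."""
    parts = [index, doc_type] if doc_id is None else [index, doc_type, str(doc_id)]
    # URL-encode each path segment, since document ids may contain reserved characters
    path = "/" + "/".join(quote(p, safe="") for p in parts)
    # Explicit id -> PUT (create or overwrite); no id -> POST (auto-generated id)
    method = "POST" if doc_id is None else "PUT"
    return method, path


print(index_request_line("my-test-index", "tweet", 1))  # ('PUT', '/my-test-index/tweet/1')
print(index_request_line("my-test-index", "tweet"))     # ('POST', '/my-test-index/tweet')
```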

Bulk Data Ingestion

Efficiently ingest multiple documents using the helpers.bulk utility.

from elasticsearch import helpers

# Assumes es is a client created as shown above
list_of_dicts = [
    {"name": "ruan", "age": 32, "city": "New York"},
    {"name": "stefan", "age": 31, "city": "London"},
    {"name": "alice", "age": 29, "city": "Paris"}
]

# One action per document; "_type" is needed on 6.x only (drop it on 7.x and later)
bulk_docs = [
    {
        "_index": 'my-bulk-index',
        "_type": '_doc',
        "_source": doc
    }
    for doc in list_of_dicts
]

# By default helpers.bulk raises BulkIndexError when a chunk contains failures;
# raise_on_error=False makes it return those failures in errors instead.
success, errors = helpers.bulk(es, bulk_docs, raise_on_error=False)
print(f"Successfully indexed: {success} documents")
if errors:
    print(f"Encountered errors: {errors}")
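Under the hood, helpers.bulk turns each action into the newline-delimited JSON (NDJSON) body that the _bulk endpoint expects: an action line naming the target index, followed by the document source on its own line, with a trailing newline at the end of the body. The standard-library sketch below builds that body from action dicts like the ones above; it is for illustration only (the helper name is hypothetical) and ignores the chunking and error handling the real helper adds.

```python
import json

# Action dicts in the same 6.x style as above (with "_type")
bulk_docs = [
    {"_index": "my-bulk-index", "_type": "_doc", "_source": {"name": "ruan", "age": 32, "city": "New York"}},
    {"_index": "my-bulk-index", "_type": "_doc", "_source": {"name": "stefan", "age": 31, "city": "London"}},
]


def to_ndjson(actions):
    """Hypothetical helper: serialize bulk actions into a _bulk request body."""
    lines = []
    for action in actions:
        # Metadata keys (those starting with "_", except _source) go on the action line
        meta = {k: v for k, v in action.items() if k.startswith("_") and k != "_source"}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(action["_source"]))
    # The _bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = to_ndjson(bulk_docs)
print(body)
```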

Managing Indices

View Available Indices

List all indices currently present in your Elasticsearch cluster.

>>> indices = es.indices.get_alias(index="*").keys()
>>> print(list(indices))
['fluentd-2020.02.26', 'metricbeat-2020.02.25', 'filebeat-2020.02.25', '.tasks', 'fluentd-2020.02.24', 'metricbeat-2019', 'telegram-bot', '.kibana_1', 'metricbeat-2020.02.26', 'filebeat-2020.02.26', 'metricbeat-2020.02']
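The list mixes your own indices with internal ones whose names start with a dot (such as .kibana_1 and .tasks). A small sketch (hypothetical helper, standard library only) that drops dot-prefixed names and sorts the rest, using some of the names from the output above:

```python
def user_indices(names):
    """Hypothetical helper: drop dot-prefixed internal indices and sort the rest."""
    # A set removes duplicates if names are gathered from several calls
    return sorted({name for name in names if not name.startswith(".")})


names = ['fluentd-2020.02.26', 'metricbeat-2020.02.25', '.tasks', 'telegram-bot', '.kibana_1']
print(user_indices(names))
# ['fluentd-2020.02.26', 'metricbeat-2020.02.25', 'telegram-bot']
```

Against a live cluster, the input would be es.indices.get_alias(index="*").keys().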

Searching Documents

Perform a search query against a specific index.

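This example runs a match query for "HI" against the text field of the telegram-bot index. The doc_type argument is accepted only by 6.x and 7.x clients; omit it on 8.x.

>>> search_body = {
...     "query": {
...         "match": {
...             "text": "HI"
...         }
...     }
... }
>>> response = es.search(index="telegram-bot", doc_type="_doc", body=search_body)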
>>> print(json.dumps(response, indent=2))
{
  "took": 335,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "telegram-bot",
        "_type": "_doc",
        "_id": "x",
        "_score": 0.6931472,
        "_source": {
          "message_id": "x",
          "date": "x",
          "text": "HI",
          "entities": [],
          "caption_entities": [],
          "photo": [],
          "new_chat_members": [],
          "new_chat_photo": [],
          "delete_chat_photo": false,
          "group_chat_created": false,
          "supergroup_chat_created": false,
          "channel_chat_created": false
        }
      }
    ]
  }
}
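The matching documents live under hits.hits, each carrying the original document in _source. Note that hits.total is a plain integer on 6.x clusters (as above) but an object such as {"value": 1, "relation": "eq"} from 7.0 onwards. A minimal sketch (hypothetical helper names) that handles both shapes, applied to a trimmed copy of the response above:

```python
def total_hits(response):
    """Hypothetical helper: hits.total as an int for 6.x (int) and 7.x+ (object) responses."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def sources(response):
    """Hypothetical helper: return the _source of every hit."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the 6.x response shown above
response = {
    "hits": {
        "total": 1,
        "max_score": 0.6931472,
        "hits": [{"_index": "telegram-bot", "_id": "x", "_score": 0.6931472,
                  "_source": {"message_id": "x", "text": "HI"}}],
    }
}

print(total_hits(response))                        # 1
print([doc["text"] for doc in sources(response)])  # ['HI']
```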

Interacting with Elasticsearch via Python Requests

While the official client is recommended, you can also use the requests library for direct HTTP interactions.

Create an Index using Requests

Define index settings and create a new index.

import requests

es_url = 'https://es.x.x/my-new-index'  # Replace with your Elasticsearch endpoint and index name
auth_user = 'username'
auth_pass = 'pass'

index_settings = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}

# PUT /<index> creates the index; json= serializes the body
# and sets the Content-Type: application/json header
response = requests.put(
    es_url,
    auth=(auth_user, auth_pass),  # HTTP Basic Auth
    json=index_settings
)

print(f"Status Code: {response.status_code}")
print(response.json())
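If the requests package is not available, the standard library's urllib.request can make the same kind of call. The sketch below builds (but does not send) a POST request that runs the earlier match query against the telegram-bot index; the endpoint and credentials are the same placeholders as above, and the helper name is hypothetical. Passing the result to urllib.request.urlopen would actually send it.

```python
import base64
import json
import urllib.request


def build_search_request(base_url, index, query, user, password):
    """Hypothetical helper: prepare a POST /<index>/_search request with Basic Auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),  # request body as JSON bytes
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_search_request("https://es.x.x", "telegram-bot",
                           {"query": {"match": {"text": "HI"}}}, "username", "pass")
print(req.get_method(), req.full_url)  # POST https://es.x.x/telegram-bot/_search

# To send it against a real cluster:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```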

For more advanced operations and comprehensive documentation, refer to the official Python Elasticsearch Client documentation and the Elasticsearch documentation.