Learn to use the Python Elasticsearch client for managing indices, ingesting data, and performing searches. Includes examples for AWS IAM and HTTP Basic Auth.
Python Elasticsearch Client
Elasticsearch Cheatsheet with Python
This cheatsheet provides essential commands and examples for using the Python Elasticsearch client to manage and query your data.
Key Features and Operations:
- Using the Python Elasticsearch Library
- Authentication with AWS IAM
- Authentication with HTTP Basic Auth
- Retrieving Elasticsearch Cluster Info
- Ingesting Single Documents
- Performing Bulk Data Ingestion
- Viewing Available Indices
- Searching for Documents
- Interacting with Elasticsearch via Python Requests
- Creating an Index using Requests
Python Library Usage
Setting up the Elasticsearch Client
Connecting with AWS IAM Authentication
Securely connect to AWS Elasticsearch Service using IAM credentials.
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers
from requests_aws4auth import AWS4Auth

# Replace with your actual AWS credentials and region
aws_access_key_id = "YOUR_ACCESS_KEY_ID"
aws_secret_access_key = "YOUR_SECRET_ACCESS_KEY"
aws_region = "YOUR_AWS_REGION"
es_endpoint = "YOUR_ES_ENDPOINT"
aws_session_token = "YOUR_SESSION_TOKEN"  # Optional, only needed for temporary credentials

aws_auth = AWS4Auth(aws_access_key_id, aws_secret_access_key, aws_region, 'es', session_token=aws_session_token)

es = Elasticsearch(
    hosts=[{'host': es_endpoint, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
Connecting with HTTP Basic Authentication
Connect to Elasticsearch using standard username and password authentication.
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers

es_endpoint = "YOUR_ES_ENDPOINT"
es_username = "YOUR_USERNAME"
es_password = "YOUR_PASSWORD"

es = Elasticsearch(
    hosts=[{'host': es_endpoint, 'port': 443}],
    http_auth=(es_username, es_password),
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
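Whichever connection method you use, it helps to verify that the client can actually reach the cluster before running queries. A minimal sketch using the client's ping() method:

# ping() returns True if the cluster responds, False otherwise
if es.ping():
    print("Connected to Elasticsearch")
else:
    print("Could not reach the cluster - check the endpoint and credentials")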
Elasticsearch Cluster Information
Retrieve detailed information about your Elasticsearch cluster.
>>> es.info()
{'name': 'elasticsearch-02', 'cluster_name': 'es-cluster', 'cluster_uuid': 'EJDqv5VrQyao07ndQuwhCw', 'version': {'number': '6.8.2', 'build_flavor': 'default', 'build_type': 'deb', 'build_hash': 'b506955', 'build_date': '2019-07-24T15:24:41.545295Z', 'build_snapshot': False, 'lucene_version': '7.7.0', 'minimum_wire_compatibility_version': '5.6.0', 'minimum_index_compatibility_version': '5.0.0'}, 'tagline': 'You Know, for Search'}
For more readable output, pretty-print the response with the json module:
>>> import json
>>> print(json.dumps(es.info(), indent=2))
{
  "name": "elasticsearch-02",
  "cluster_name": "es-cluster",
  "cluster_uuid": "EJDqv5VrQyao07ndQuwhCw",
  "version": {
    "number": "6.8.2",
    "build_flavor": "default",
    "build_type": "deb",
    "build_hash": "b506955",
    "build_date": "2019-07-24T15:24:41.545295Z",
    "build_snapshot": false,
    "lucene_version": "7.7.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
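The client also exposes the cluster health API, which is handy as a quick status check alongside es.info(). A short sketch:

# Returns a dict with fields such as 'status' (green/yellow/red) and 'number_of_nodes'
health = es.cluster.health()
print(health['status'], health['number_of_nodes'])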
Ingesting Documents
Ingesting a Single Document
Add a document to an index, specifying a unique document ID.
doc = {
    'author': 'john_doe',
    'text': 'Elasticsearch is amazing!',
    'timestamp': '2023-10-27T10:00:00Z'
}

res = es.index(index="my-test-index", doc_type='tweet', id=1, body=doc)
print(res)
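To confirm the document was stored, you can fetch it back by ID. A minimal sketch against the same index and document type used above:

# Retrieve the document we just indexed; the original body is under '_source'
res = es.get(index="my-test-index", doc_type='tweet', id=1)
print(res['_source'])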
Bulk Data Ingestion
Efficiently ingest multiple documents using the helpers.bulk utility.
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers

bulk_docs = []
list_of_dicts = [
    {"name": "ruan", "age": 32, "city": "New York"},
    {"name": "stefan", "age": 31, "city": "London"},
    {"name": "alice", "age": 29, "city": "Paris"}
]

for doc in list_of_dicts:
    action = {
        "_index": 'my-bulk-index',
        "_type": '_doc',
        "_source": doc
    }
    bulk_docs.append(action)

success, errors = helpers.bulk(es, bulk_docs)
print(f"Successfully indexed: {success} documents")
if errors:
    print(f"Encountered errors: {errors}")
Managing Indices
View Available Indices
List all indices currently present in your Elasticsearch cluster.
>>> indices = es.indices.get_alias("*").keys()
>>> print(list(indices))
['fluentd-2020.02.26', 'metricbeat-2020.02.25', 'filebeat-2020.02.25', '.tasks', 'fluentd-2020.02.24', 'metricbeat-2019', 'telegram-bot', '.kibana_1', 'metricbeat-2020.02.26', 'filebeat-2020.02.26', 'metricbeat-2020.02']
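As an alternative, the cat indices API returns per-index metadata such as document counts and store size; a sketch (field names follow the cat API's column headers):

# format="json" returns a list of dicts, one per index
for idx in es.cat.indices(format="json"):
    print(idx['index'], idx['docs.count'], idx['store.size'])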
Searching Documents
Perform a search query against a specific index.
Example Query: {"query": {"match": {"text": "HI"}}}
>>> search_body = {
...     "query": {
...         "match": {
...             "text": "HI"
...         }
...     }
... }
>>> response = es.search(index="telegram-bot", doc_type="_doc", body=search_body)
>>> print(json.dumps(response, indent=2))
{
  "took": 335,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "telegram-bot",
        "_type": "_doc",
        "_id": "x",
        "_score": 0.6931472,
        "_source": {
          "message_id": "x",
          "date": "x",
          "text": "HI",
          "entities": [],
          "caption_entities": [],
          "photo": [],
          "new_chat_members": [],
          "new_chat_photo": [],
          "delete_chat_photo": false,
          "group_chat_created": false,
          "supergroup_chat_created": false,
          "channel_chat_created": false
        }
      }
    ]
  }
}
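For result sets larger than a single page, helpers.scan wraps the scroll API and yields every matching document as a generator; a sketch reusing the search_body from above:

from elasticsearch import helpers

# Iterates over all matches without manual pagination
for hit in helpers.scan(es, index="telegram-bot", query=search_body):
    print(hit['_id'], hit['_source'].get('text'))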
Interacting with Elasticsearch via Python Requests
While the official client is recommended, you can also use the requests library for direct HTTP interactions.
Create an Index using Requests
Define index settings and create a new index.
import requests
import json

es_url = 'https://es.x.x/my-new-index'  # Replace with your Elasticsearch endpoint and index name
auth_user = 'username'
auth_pass = 'pass'

index_settings = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}

response = requests.put(
    es_url,
    auth=(auth_user, auth_pass),
    headers={'content-type': 'application/json'},
    data=json.dumps(index_settings)
)

print(f"Status Code: {response.status_code}")
print(response.json())
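The same pattern covers other index-level calls: a GET on the index URL returns its settings and mappings, and a DELETE removes it. A sketch with the same endpoint and credentials:

# Inspect the settings and mappings of the index we just created
print(requests.get(es_url, auth=(auth_user, auth_pass)).json())

# Delete the index when it is no longer needed (uncomment to run)
# requests.delete(es_url, auth=(auth_user, auth_pass))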
For more advanced operations and comprehensive documentation, refer to the official Python Elasticsearch Client documentation and the Elasticsearch documentation.