Skip to content

Elasticsearch-Compatible API

IndentiaDB exposes a REST API on port 9200 that is compatible with the Elasticsearch 8.x API surface. Existing Elasticsearch clients, SDKs, and tooling (Kibana, Logstash, Beats, OpenSearch Dashboards) can connect to this endpoint without modification.

Authentication is required. Use Basic auth (-u root:changeme) or an API key header on all requests.


Cluster API

GET / — Cluster Info

Returns basic cluster information.

curl -u root:changeme http://localhost:9200/
{
  "name": "indentiadb-0",
  "cluster_name": "indentiadb",
  "cluster_uuid": "abc123",
  "version": {
    "number": "8.11.0",
    "build_flavor": "default"
  },
  "tagline": "You Know, for Search"
}

GET /_cluster/health

Returns cluster health status.

curl -u root:changeme http://localhost:9200/_cluster/health
{
  "cluster_name": "indentiadb",
  "status": "green",
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 15,
  "active_shards": 30,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}

GET /_cat/indices

curl -u root:changeme "http://localhost:9200/_cat/indices?v"

GET /_cat/nodes

curl -u root:changeme "http://localhost:9200/_cat/nodes?v"

Index Management

PUT /{index} — Create Index with Mapping

Create an index with explicit field mappings:

curl -u root:changeme \
  -X PUT http://localhost:9200/articles \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "1s"
    },
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "english"
        },
        "title_keyword": {
          "type": "keyword"
        },
        "content": {
          "type": "text",
          "analyzer": "english"
        },
        "author": {
          "type": "keyword"
        },
        "category": {
          "type": "keyword"
        },
        "published_at": {
          "type": "date",
          "format": "strict_date_optional_time"
        },
        "view_count": {
          "type": "integer"
        },
        "embedding": {
          "type": "dense_vector",
          "dims": 1536,
          "index": true,
          "similarity": "cosine"
        }
      }
    }
  }'

Dense Vector Field Configuration

The dense_vector type supports three similarity functions:

Similarity Description
cosine Cosine similarity (recommended for normalized embeddings)
l2_norm Euclidean distance (lower = more similar)
dot_product Dot product (requires unit-length vectors)

Full dense vector field example:

"embedding": {
  "type": "dense_vector",
  "dims": 768,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "hnsw",
    "m": 16,
    "ef_construction": 100
  }
}

GET /{index} — Get Mapping

curl -u root:changeme http://localhost:9200/articles

DELETE /{index} — Delete Index

curl -u root:changeme -X DELETE http://localhost:9200/articles

Document Operations

POST /{index}/_doc — Index with Auto ID

curl -u root:changeme \
  -X POST http://localhost:9200/articles/_doc \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Introduction to Graph Databases",
    "content": "Graph databases store data as nodes and edges.",
    "author": "alice",
    "category": "databases",
    "published_at": "2026-03-21T10:00:00Z",
    "view_count": 0
  }'

Response:

{
  "_index": "articles",
  "_id": "abc123xyz",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}

PUT /{index}/_doc/{id} — Index with Explicit ID

curl -u root:changeme \
  -X PUT http://localhost:9200/articles/_doc/article-001 \
  -H "Content-Type: application/json" \
  -d '{
    "title": "SPARQL Query Language",
    "content": "SPARQL is the standard query language for RDF data.",
    "author": "bob",
    "category": "rdf",
    "published_at": "2026-03-15T09:00:00Z"
  }'

GET /{index}/_doc/{id} — Get Document

curl -u root:changeme http://localhost:9200/articles/_doc/article-001

Response:

{
  "_index": "articles",
  "_id": "article-001",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "SPARQL Query Language",
    "author": "bob"
  }
}

DELETE /{index}/_doc/{id} — Delete Document

curl -u root:changeme \
  -X DELETE http://localhost:9200/articles/_doc/article-001

Bulk Operations — POST /_bulk

The bulk API processes multiple operations in a single request. Each operation is two lines: an action line and (optionally) a document line.

curl -u root:changeme \
  -X POST http://localhost:9200/_bulk \
  -H "Content-Type: application/x-ndjson" \
  -d '
{"index": {"_index": "articles", "_id": "1"}}
{"title": "Graph Databases", "author": "alice", "category": "databases"}
{"index": {"_index": "articles", "_id": "2"}}
{"title": "Vector Search", "author": "bob", "category": "search"}
{"update": {"_index": "articles", "_id": "1"}}
{"doc": {"view_count": 100}}
{"delete": {"_index": "articles", "_id": "old-doc"}}
'

Supported operation types:

Operation Description
index Index or replace a document
create Index only if the document does not exist
update Partial update of an existing document
delete Delete a document (no document line required)

Search API — GET / POST /{index}/_search

The _search endpoint supports a rich query DSL. All examples use POST with a JSON body.

1. Match All

Return all documents:

curl -u root:changeme \
  -X POST http://localhost:9200/articles/_search \
  -H "Content-Type: application/json" \
  -d '{ "query": { "match_all": {} } }'

2. Match — Full-Text BM25

{
  "query": {
    "match": {
      "content": {
        "query": "graph databases traversal",
        "operator": "or"
      }
    }
  }
}

With AND operator (all terms must appear):

{
  "query": {
    "match": {
      "content": {
        "query": "graph database",
        "operator": "and"
      }
    }
  }
}

3. Match Phrase

All terms must appear in the specified order:

{
  "query": {
    "match_phrase": {
      "content": "graph database traversal"
    }
  }
}

4. Multi-Match

Search across multiple fields simultaneously:

{
  "query": {
    "multi_match": {
      "query": "sparql rdf knowledge graph",
      "fields": ["title^3", "content", "tags"],
      "type": "best_fields"
    }
  }
}

Multi-match types: best_fields, most_fields, cross_fields, phrase, phrase_prefix.

5. Term — Keyword Exact Match

{
  "query": {
    "term": {
      "author": { "value": "alice" }
    }
  }
}

6. Terms — Match Any of Multiple Values

{
  "query": {
    "terms": {
      "category": ["databases", "rdf", "search"]
    }
  }
}

7. Range

{
  "query": {
    "range": {
      "published_at": {
        "gte": "2026-01-01",
        "lte": "2026-12-31"
      }
    }
  }
}

Numeric range:

{
  "query": {
    "range": {
      "view_count": {
        "gte": 100,
        "lt": 10000
      }
    }
  }
}

8. Bool — Compound Queries

Combine must, filter, should, and must_not clauses:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "sparql query" } }
      ],
      "filter": [
        { "term": { "category": "rdf" } },
        { "range": { "published_at": { "gte": "2025-01-01" } } }
      ],
      "should": [
        { "term": { "author": "alice" } },
        { "term": { "author": "bob" } }
      ],
      "must_not": [
        { "term": { "status": "draft" } }
      ],
      "minimum_should_match": 1
    }
  }
}
Clause Description
must Required — contributes to score
filter Required — does not contribute to score (cached)
should Optional — boosts score; minimum_should_match controls how many must match
must_not Excluded from results

9. Exists

Match documents where a field has a non-null value:

{
  "query": {
    "exists": { "field": "embedding" }
  }
}

10. Prefix

{
  "query": {
    "prefix": {
      "title": { "value": "graph" }
    }
  }
}

11. Wildcard

{
  "query": {
    "wildcard": {
      "title": { "value": "graph*data?" }
    }
  }
}

12. Fuzzy

Match terms within a given edit distance:

{
  "query": {
    "fuzzy": {
      "title": {
        "value": "grph databse",
        "fuzziness": "AUTO",
        "max_expansions": 50
      }
    }
  }
}

KNN Search — Vector / Nearest Neighbor

Use the knn parameter to perform approximate nearest-neighbor search over dense_vector fields.

curl -u root:changeme \
  -X POST http://localhost:9200/articles/_search \
  -H "Content-Type: application/json" \
  -d '{
    "knn": {
      "field": "embedding",
      "query_vector": [0.12, 0.45, 0.78, 0.23, 0.56],
      "k": 10,
      "num_candidates": 100,
      "filter": {
        "term": { "status": "published" }
      }
    },
    "fields": ["title", "author", "published_at"],
    "_source": false
  }'
Parameter Description
field The dense_vector field to search
query_vector The query embedding vector (must match dims)
k Number of nearest neighbors to return
num_candidates Number of candidate vectors to examine per shard (higher = more accurate, slower)
filter Optional pre-filter applied before KNN search

Hybrid Search — BM25 + KNN

Combine full-text BM25 and vector KNN scores in a single request. IndentiaDB merges the result lists using the scorer configured by ES_HYBRID_SCORER (rrf, bayesian, or linear).

curl -u root:changeme \
  -X POST http://localhost:9200/articles/_search \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "bool": {
        "must": [
          { "match": { "content": "graph database knowledge" } }
        ],
        "filter": [
          { "term": { "status": "published" } }
        ]
      }
    },
    "knn": {
      "field": "embedding",
      "query_vector": [0.12, 0.45, 0.78, 0.23],
      "k": 20,
      "num_candidates": 100
    },
    "size": 10,
    "from": 0
  }'

Hybrid Scorer Configuration

Set the ES_HYBRID_SCORER environment variable on the server:

Scorer Description
rrf Reciprocal Rank Fusion — robust default, no tuning needed
bayesian Bayesian score fusion — better calibration for imbalanced corpora
linear Weighted linear combination — requires manual weight tuning

Aggregations

Terms Aggregation

{
  "query": { "match_all": {} },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  },
  "size": 0
}

Histogram Aggregation

{
  "aggs": {
    "view_ranges": {
      "histogram": {
        "field": "view_count",
        "interval": 100
      }
    }
  },
  "size": 0
}

Date Histogram Aggregation

{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "published_at",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      }
    }
  },
  "size": 0
}

Metric Aggregations

{
  "aggs": {
    "avg_views": { "avg": { "field": "view_count" } },
    "total_views": { "sum": { "field": "view_count" } },
    "max_views": { "max": { "field": "view_count" } },
    "min_views": { "min": { "field": "view_count" } },
    "unique_authors": { "cardinality": { "field": "author" } }
  },
  "size": 0
}

Nested Aggregations

{
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_views": { "avg": { "field": "view_count" } }
      }
    }
  },
  "size": 0
}

SPARQL Extensions (_ext)

IndentiaDB extends the standard Elasticsearch-compatible search API with SPARQL-enriched query patterns via the _ext field.

1. SPARQL Enrich

Enrich search results with data from the RDF triple store:

{
  "query": { "match": { "content": "alice" } },
  "_ext": {
    "sparql_enrich": {
      "query": "SELECT ?name ?org WHERE { <{id}> foaf:name ?name ; ex:worksAt ?org }",
      "bind": "id"
    }
  }
}

2. Knowledge Graph Boost

Boost documents whose entities appear prominently in the knowledge graph:

{
  "query": { "match": { "content": "machine learning" } },
  "_ext": {
    "kg_boost": {
      "entity_field": "entities",
      "boost_factor": 1.5,
      "min_pagerank": 0.01
    }
  }
}

3. SPARQL Filter

Pre-filter candidates using a SPARQL ASK query:

{
  "query": { "match_all": {} },
  "_ext": {
    "sparql_filter": {
      "ask": "ASK { <{id}> ex:status 'active' }",
      "bind": "id"
    }
  }
}

4. SPARQL Expand

Expand query terms using ontology synonyms from the triple store:

{
  "query": { "match": { "content": "car" } },
  "_ext": {
    "sparql_expand": {
      "synonym_query": "SELECT ?syn WHERE { ex:car skos:altLabel ?syn }",
      "field": "content"
    }
  }
}

5. Format

Return results in a specific RDF serialization:

{
  "query": { "match": { "content": "rdf" } },
  "_ext": {
    "format": "turtle"
  }
}

Pagination

From / Size

{
  "query": { "match_all": {} },
  "from": 20,
  "size": 10,
  "sort": [{ "published_at": "desc" }]
}

Search After (Deep Pagination)

Use search_after with a sort value from the last result to paginate efficiently without offset overhead:

{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "published_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2026-03-01T00:00:00Z", "article-099"]
}

Scroll API

For bulk data export, use the scroll API to iterate over large result sets:

# Initialize scroll
curl -u root:changeme \
  "http://localhost:9200/articles/_search?scroll=1m" \
  -H "Content-Type: application/json" \
  -d '{ "size": 100, "query": { "match_all": {} } }'

# Continue scrolling using the scroll_id from the previous response
curl -u root:changeme \
  -X POST "http://localhost:9200/_search/scroll" \
  -H "Content-Type: application/json" \
  -d '{
    "scroll": "1m",
    "scroll_id": "<scroll_id_from_previous_response>"
  }'

Field Collapsing

Collapse duplicate results by a field value, returning only the top-ranked document per group:

{
  "query": { "match": { "content": "graph" } },
  "collapse": {
    "field": "author",
    "inner_hits": {
      "name": "other_by_author",
      "size": 3,
      "sort": [{ "published_at": "desc" }]
    }
  }
}

Highlighting

Return highlighted snippets showing where query terms matched:

{
  "query": { "match": { "content": "graph database" } },
  "highlight": {
    "fields": {
      "content": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"]
      }
    }
  }
}

Source Filtering

Control which fields are included in the _source of each result:

{
  "query": { "match_all": {} },
  "_source": {
    "includes": ["title", "author", "published_at"],
    "excludes": ["embedding", "content"]
  }
}

Disable source entirely (useful when using fields):

{
  "query": { "match_all": {} },
  "_source": false,
  "fields": ["title", "author"]
}