Semantic Inference¶
IndentiaDB supports RDFS and OWL forward-chaining materialization. Entailed (inferred) triples are stored in the triple index alongside asserted triples, making inferred knowledge as fast to query as directly inserted data.
How Forward-Chaining Materialization Works¶
Forward-chaining works by applying inference rules exhaustively at write time and storing the results. When you insert a triple, the inference engine immediately computes all triples that can be derived from it (using the current schema), stores those derived triples, and propagates any further inferences they trigger. This continues until no new triples can be derived (a fixed point).
INSERT: ex:poodle rdfs:subClassOf ex:dog
INSERT: ex:rex a ex:poodle
Inferred: ex:rex a ex:dog (subClassOf rule)
Inferred: ex:rex a owl:Thing (top type)
Because inferred triples are stored and indexed normally, a SPARQL query for ?x a ex:dog will return ex:rex immediately — no runtime rule evaluation is required.
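The fixed-point loop described above can be sketched in a few lines of Python. This is an illustrative toy, not IndentiaDB's engine: it rescans a plain in-memory set on every round, applies only the two RDFS subclass rules, and adds a hypothetical `ex:dog rdfs:subClassOf ex:animal` triple so that a second round of inference is visible.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def materialize(asserted):
    """Return the asserted triples plus everything derivable from them."""
    triples = set(asserted)
    while True:
        new = set()
        for (s, p, o) in triples:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in triples:
                # rdfs9: (x type C), (C subClassOf D) => (x type D)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
                # rdfs11: (C subClassOf D), (D subClassOf E) => (C subClassOf E)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
        new -= triples
        if not new:            # fixed point: this round derived nothing new
            return triples
        triples |= new

asserted = {
    ("ex:poodle", SUBCLASS, "ex:dog"),
    ("ex:dog", SUBCLASS, "ex:animal"),    # extra triple, assumed for illustration
    ("ex:rex", RDF_TYPE, "ex:poodle"),
}
closure = materialize(asserted)
inferred = closure - asserted
```

Here `inferred` holds the stored entailments, so a lookup such as `("ex:rex", RDF_TYPE, "ex:dog") in closure` is a plain set membership test with no rule evaluation at read time.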
Trade-off: Write vs. Read¶
| Approach | Write Cost | Query Cost | Suitable For |
|---|---|---|---|
| Forward-chaining (materialization) | Higher — inference runs on every write | Lower — no runtime reasoning | Read-heavy workloads, large corpora |
| Backward-chaining (on-demand) | Lower — no pre-computation | Higher — rules evaluated at query time | Write-heavy workloads, small corpora |
IndentiaDB uses forward-chaining. If your schema changes rarely and queries run frequently (the typical knowledge graph workload), materialization gives the best query performance.
Incremental Updates¶
IndentiaDB uses incremental inference — when a triple is inserted or deleted, only the inferences that depend on that triple are recomputed. Full re-materialization is not required on every write.
When a triple is retracted, the engine identifies which inferred triples have that triple as a dependency, retracts those inferences, and re-evaluates whether they can still be derived from remaining asserted triples.
This is safe because IndentiaDB tracks derivation proofs: each inferred triple records which asserted triples and rules produced it.
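A minimal sketch of proof-based retraction, assuming each inferred triple keeps a set of alternative proofs (each proof being the set of premises that produced it). The `ProofStore` class and its methods are hypothetical names, not IndentiaDB's API, and the sketch records every proof up front instead of re-running rules after a retraction.

```python
from collections import defaultdict

class ProofStore:
    """Toy store that remembers which premises produced each inferred triple."""

    def __init__(self):
        self.asserted = set()
        self.proofs = defaultdict(set)    # inferred triple -> set of frozenset(premises)

    def add_inferred(self, triple, premises):
        self.proofs[triple].add(frozenset(premises))

    def contains(self, triple):
        return triple in self.asserted or bool(self.proofs.get(triple))

    def retract(self, triple):
        """Remove an asserted triple, then drop every inference left without a proof."""
        self.asserted.discard(triple)
        if self.proofs.get(triple):       # still derivable some other way
            return
        pending = [triple]
        while pending:
            gone = pending.pop()
            for inferred, proofs in list(self.proofs.items()):
                broken = {proof for proof in proofs if gone in proof}
                if not broken:
                    continue
                proofs -= broken          # mutates the set held in self.proofs
                if not proofs and inferred not in self.asserted:
                    del self.proofs[inferred]
                    pending.append(inferred)   # cascade to dependent inferences

store = ProofStore()
a = ("ex:rex", "rdf:type", "ex:Poodle")
b = ("ex:rex", "rdf:type", "ex:Terrier")      # hypothetical second assertion
s1 = ("ex:Poodle", "rdfs:subClassOf", "ex:Dog")
s2 = ("ex:Terrier", "rdfs:subClassOf", "ex:Dog")
dog = ("ex:rex", "rdf:type", "ex:Dog")
store.asserted |= {a, b, s1, s2}
store.add_inferred(dog, [a, s1])
store.add_inferred(dog, [b, s2])

store.retract(a)
survives = store.contains(dog)    # the second proof still supports it
store.retract(b)
removed = not store.contains(dog)
```

The sketch ignores cyclic support (two inferences that justify each other), which a production engine has to handle separately.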
Configuration¶
Enable inference in config.toml:
[query]
inference = true # Enable globally for all queries
inference_level = "rdfs" # "rdfs" | "owl_rl" | "owl_rl_full" | "custom"
Inference can also be toggled per query by passing the X-IndentiaDB-Inference header with the SPARQL request:
curl -X POST http://localhost:7001/sparql \
-H "Content-Type: application/sparql-query" \
-H "Accept: application/sparql-results+json" \
-H "X-IndentiaDB-Inference: true" \
-d 'SELECT ?dog WHERE { ?dog a <http://example.org/dog> }'
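The same request can be built from Python with the standard library. The endpoint, content types, and header name are copied from the curl call above; the `"false"` value for switching inference off is an assumption, since only `"true"` appears in this page.

```python
import json
import urllib.request

def build_sparql_request(query, inference=True,
                         endpoint="http://localhost:7001/sparql"):
    """Build (but do not send) a SPARQL query request with the inference header."""
    headers = {
        "Content-Type": "application/sparql-query",
        "Accept": "application/sparql-results+json",
        # "false" is an assumed value; only "true" is shown in the curl example
        "X-IndentiaDB-Inference": "true" if inference else "false",
    }
    return urllib.request.Request(
        endpoint, data=query.encode("utf-8"), headers=headers, method="POST"
    )

def run_query(request):
    """Send the request and decode the JSON result set (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

req = build_sparql_request(
    "SELECT ?dog WHERE { ?dog a <http://example.org/dog> }"
)
```

Calling `run_query(req)` against a running instance would return the decoded SPARQL JSON results.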
Inference Levels¶
| Level | Rules Applied | Overhead |
|---|---|---|
| rdfs | rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdfs:member | Low |
| owl_rl | All RDFS + owl:sameAs, owl:equivalentClass, owl:equivalentProperty, owl:inverseOf, owl:SymmetricProperty, owl:TransitiveProperty | Medium |
| owl_rl_full | All OWL-RL rules including cardinality restrictions | High |
owl_rl_full Write Amplification
owl_rl_full can produce very large numbers of inferred triples for ontologies with broad owl:sameAs or complex cardinality constraints. Benchmark with your ontology before enabling in production. Use rdfs or owl_rl for most knowledge graph applications.
Example: Class Hierarchy Inference with rdfs:subClassOf¶
Assert the Schema¶
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
INSERT DATA {
ex:Poodle rdfs:subClassOf ex:Dog .
ex:Dog rdfs:subClassOf ex:Animal .
ex:Animal rdfs:subClassOf ex:LivingThing .
ex:GoldenRetriever rdfs:subClassOf ex:Dog .
}
Assert an Individual¶
PREFIX ex: <http://example.org/>
INSERT DATA {
ex:rex a ex:Poodle ;
ex:name "Rex" .
ex:buddy a ex:GoldenRetriever ;
ex:name "Buddy" .
}
Inferred Triples (Automatically Added)¶
| Triple | Rule |
|---|---|
| ex:rex a ex:Dog | ex:Poodle rdfs:subClassOf ex:Dog |
| ex:rex a ex:Animal | Transitivity of subClassOf |
| ex:rex a ex:LivingThing | Transitivity of subClassOf |
| ex:buddy a ex:Dog | ex:GoldenRetriever rdfs:subClassOf ex:Dog |
| ex:buddy a ex:Animal | Transitivity of subClassOf |
| ex:buddy a ex:LivingThing | Transitivity of subClassOf |
SPARQL Query with Inference Enabled¶
PREFIX ex: <http://example.org/>
SELECT ?animal ?name WHERE {
?animal a ex:Animal ;
ex:name ?name .
}
Results (inference ON):
| animal | name |
|---|---|
| ex:rex | "Rex" |
| ex:buddy | "Buddy" |
Results (inference OFF):
No results — ex:rex and ex:buddy were only asserted as ex:Poodle and ex:GoldenRetriever, not ex:Animal.
Example: Property Inheritance with rdfs:subPropertyOf¶
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
INSERT DATA {
# Any ex:authoredBy relationship is also a foaf:maker relationship
ex:authoredBy rdfs:subPropertyOf foaf:maker .
ex:article42 ex:authoredBy ex:alice .
}
After this insert, ex:article42 foaf:maker ex:alice is automatically inferred and stored. Queries for ?x foaf:maker ex:alice will return ex:article42.
Example: OWL Property Axioms¶
owl:sameAs¶
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
INSERT DATA {
ex:amsterdam owl:sameAs dbr:Amsterdam .
ex:amsterdam ex:population 921402 .
}
With owl_rl inference, queries for dbr:Amsterdam ex:population ?pop will return 921402 — the sameAs link propagates all properties from ex:amsterdam to dbr:Amsterdam.
owl:equivalentClass¶
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
INSERT DATA {
ex:Person owl:equivalentClass schema:Person .
ex:alice a ex:Person .
}
ex:alice a schema:Person is automatically inferred.
owl:inverseOf¶
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/>
INSERT DATA {
ex:parentOf owl:inverseOf ex:childOf .
ex:alice ex:parentOf ex:bob .
}
ex:bob ex:childOf ex:alice is automatically inferred — no need to assert both directions.
owl:SymmetricProperty¶
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/>
INSERT DATA {
ex:knows a owl:SymmetricProperty .
ex:alice ex:knows ex:bob .
}
ex:bob ex:knows ex:alice is automatically inferred.
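The four axioms in this section can be approximated with a small rule function run to a fixed point. This is a simplified sketch rather than the full OWL RL rule set: for owl:sameAs it covers only symmetry and copying statements from subject to alias, and it omits reflexivity, transitivity, and replacement in predicate or object position.

```python
SAME_AS = "owl:sameAs"
EQUIV_CLASS = "owl:equivalentClass"
INVERSE_OF = "owl:inverseOf"
RDF_TYPE = "rdf:type"
SYMMETRIC = "owl:SymmetricProperty"

def apply_owl_rules(triples):
    """One pass of four simplified rules; returns only the newly derived triples."""
    derived = set()
    for (s, p, o) in triples:
        if p == SAME_AS:
            derived.add((o, SAME_AS, s))
            for (s2, p2, o2) in triples:
                if s2 == s and p2 != SAME_AS:
                    derived.add((o, p2, o2))       # copy statements to the alias
        elif p == EQUIV_CLASS:
            for (s2, p2, o2) in triples:
                if p2 == RDF_TYPE and o2 == s:
                    derived.add((s2, RDF_TYPE, o))
                if p2 == RDF_TYPE and o2 == o:
                    derived.add((s2, RDF_TYPE, s))
        elif p == INVERSE_OF:
            for (s2, p2, o2) in triples:
                if p2 == s:
                    derived.add((o2, o, s2))
                if p2 == o:
                    derived.add((o2, s, s2))
        elif p == RDF_TYPE and o == SYMMETRIC:
            for (s2, p2, o2) in triples:
                if p2 == s:
                    derived.add((o2, s, s2))
    return derived - triples

def closure(asserted):
    triples = set(asserted)
    while True:
        new = apply_owl_rules(triples)
        if not new:
            return triples
        triples |= new

result = closure({
    ("ex:amsterdam", SAME_AS, "dbr:Amsterdam"),
    ("ex:amsterdam", "ex:population", 921402),
    ("ex:Person", EQUIV_CLASS, "schema:Person"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:parentOf", INVERSE_OF, "ex:childOf"),
    ("ex:alice", "ex:parentOf", "ex:bob"),
    ("ex:knows", RDF_TYPE, SYMMETRIC),
    ("ex:alice", "ex:knows", "ex:bob"),
})
```

After the call, `result` contains the four inferences named in the subsections above, plus the reverse sameAs link.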
SPARQL Query Comparison: Inference ON vs. OFF¶
The following shows how inference affects query results for a medical ontology:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX med: <http://example.org/medical/>
# Schema (already inserted)
# med:Ibuprofen rdfs:subClassOf med:NSAID
# med:NSAID rdfs:subClassOf med:AntiInflammatory
# med:NSAID rdfs:subClassOf med:Analgesic
# Data (already inserted)
# med:advil a med:Ibuprofen
# med:motrin a med:Ibuprofen
SELECT ?drug WHERE { ?drug a med:Analgesic }
| Inference | Result |
|---|---|
| OFF | (empty); no triple directly asserts rdf:type med:Analgesic |
| ON (rdfs) | med:advil, med:motrin — via subClassOf chain |
Performance Impact and Trade-offs¶
| Factor | Impact |
|---|---|
| Write latency | Increased by 10–40% for ontology-heavy schemas. Trivial schemas (few subClassOf chains) have negligible overhead. |
| Storage | Inferred triples use the same storage as asserted triples. A deep class hierarchy can increase triple count by 3–10×. |
| Query performance | Same as querying asserted data — no runtime overhead. |
| Schema changes | Adding a new rdfs:subClassOf triple triggers incremental re-materialization across all affected individuals. For large ontology refactoring, use the inference:rematerialize admin command. |
| Deletion | Deleting an asserted triple that was the sole support for an inferred triple retracts that inferred triple. |
Admin: Force Re-Materialization¶
If the inference state becomes inconsistent (e.g., after a crash during materialization), force a full re-materialization with the inference:rematerialize admin command.
This is an offline operation — the server pauses writes during re-materialization. For large corpora (>100M triples), expect this to take several minutes.
Limiting Inference Scope
To avoid inference on raw data graphs and only infer within a specific named graph, set inference_graphs in config:
[query]
inference = true
inference_graphs = [
"http://example.org/ontology",
"http://example.org/schema"
]
Triples in other named graphs are not used as premises for inference. This is useful when you have a large RDF dump (e.g., a DBpedia snapshot) where you want only your own schema's inference rules to apply.
RETE Engine Architecture¶
IndentiaDB's inference engine is built on a RETE-like algorithm — a well-known pattern-matching approach from production rule systems, adapted for RDF triple processing.
Alpha Network¶
The alpha network is the first filtering stage. Each alpha node matches incoming triples against a single triple pattern (e.g., ?x rdf:type ?c). When a triple is inserted, it is passed through all alpha nodes in parallel. If the triple matches a node's pattern, the resulting variable bindings are stored in that node's memory.
Triple: (ex:rex, rdf:type, ex:Poodle)
↓
Alpha Node 1: pattern (?x rdf:type ?c) → match: {?x=ex:rex, ?c=ex:Poodle}
Alpha Node 2: pattern (?c rdfs:subClassOf ?d) → no match
Alpha Node 3: pattern (?x ?p ?y) → match: {?x=ex:rex, ?p=rdf:type, ?y=ex:Poodle}
Alpha nodes act as filters — they discard triples that do not match their pattern. Because alpha nodes are independent, they can be evaluated concurrently across worker threads.
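A minimal sketch of alpha-node matching, reproducing the three nodes from the trace above. `AlphaNode` and `match` are hypothetical names, and the nodes are evaluated sequentially here instead of across worker threads.

```python
def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for pat, val in zip(pattern, triple):
        if pat.startswith("?"):
            # a variable that appears twice must bind to the same value
            if bindings.setdefault(pat, val) != val:
                return None
        elif pat != val:
            return None
    return bindings

class AlphaNode:
    def __init__(self, pattern):
        self.pattern = pattern
        self.memory = []          # bindings of every triple that matched so far

    def activate(self, triple):
        bindings = match(self.pattern, triple)
        if bindings is not None:
            self.memory.append(bindings)
        return bindings

nodes = [
    AlphaNode(("?x", "rdf:type", "?c")),
    AlphaNode(("?c", "rdfs:subClassOf", "?d")),
    AlphaNode(("?x", "?p", "?y")),
]
triple = ("ex:rex", "rdf:type", "ex:Poodle")
results = [node.activate(triple) for node in nodes]
```

`results` holds one entry per node: a bindings dictionary for nodes 1 and 3, and `None` for node 2, whose predicate does not match.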
Beta Network¶
The beta network implements multi-pattern joins. A beta node takes two inputs — either two alpha nodes, or an alpha node and a preceding beta node — and joins their stored matches on shared variables.
For example, the RDFS subClassOf inference rule has two antecedent patterns:
Pattern 1: ?instance rdf:type ?class (Alpha Node A)
Pattern 2: ?class rdfs:subClassOf ?super (Alpha Node B)
The beta node joining A and B will look for compatible bindings where ?class has the same value in both. When a match is found, the merged bindings {?instance, ?class, ?super} are passed to the rule's consequent to derive: ?instance rdf:type ?super.
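The join step can be sketched as a nested loop over the two alpha memories. The nested loop is chosen for readability; an engine would more likely index each memory by the join variable. The `ex:tom` row is made up to show a binding that finds no partner.

```python
def join(left_memory, right_memory):
    """Merge every pair of bindings that agree on all shared variables."""
    merged = []
    for left in left_memory:
        for right in right_memory:
            shared = left.keys() & right.keys()
            if all(left[v] == right[v] for v in shared):
                merged.append({**left, **right})
    return merged

# Memory of Alpha Node A: ?instance rdf:type ?class
alpha_a = [
    {"?instance": "ex:rex", "?class": "ex:Poodle"},
    {"?instance": "ex:tom", "?class": "ex:Cat"},     # hypothetical extra row
]
# Memory of Alpha Node B: ?class rdfs:subClassOf ?super
alpha_b = [{"?class": "ex:Poodle", "?super": "ex:Dog"}]

tokens = join(alpha_a, alpha_b)
# Consequent of the rule: ?instance rdf:type ?super
derived = [(t["?instance"], "rdf:type", t["?super"]) for t in tokens]
```

Only the `ex:rex` binding joins, because `?class` is `ex:Poodle` on both sides, so `derived` contains the single triple `ex:rex rdf:type ex:Dog`.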
Token Flow and Conflict Resolution¶
When a new triple arrives:
- It flows through the alpha network, activating matching alpha nodes.
- Each activated alpha node propagates its new bindings to connected beta nodes.
- Beta nodes join the new bindings against stored matches from their other input.
- Successful joins propagate upward until they reach a terminal node, producing a rule firing.
- Rule firings are collected, deduplicated, and ordered by rule priority (higher priority fires first).
- The consequent triples are generated from the bindings and fed back into the network for further inference.
This process continues until no new triples are derived (the fixed point).
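The steps above can be sketched as a worklist loop. This is a simplification that calls rule functions directly instead of walking an alpha/beta network, and the two rules with priorities 10 and 1 are assumptions chosen to make the ordering visible.

```python
def rdfs9(triple, store):
    """(x type C), (C subClassOf D) => (x type D); the triple may be either premise."""
    s, p, o = triple
    if p == "rdf:type":
        for (c, p2, d) in store:
            if p2 == "rdfs:subClassOf" and c == o:
                yield (s, "rdf:type", d)
    elif p == "rdfs:subClassOf":
        for (x, p2, c) in store:
            if p2 == "rdf:type" and c == s:
                yield (x, "rdf:type", o)

def top_type(triple, store):
    """Every typed resource is also an owl:Thing."""
    s, p, o = triple
    if p == "rdf:type" and o != "owl:Thing":
        yield (s, "rdf:type", "owl:Thing")

def run_to_fixed_point(triples, rules):
    """rules: list of (priority, name, fn). Returns the final store and a firing log."""
    store = set(triples)
    queue = list(triples)
    log = []
    while queue:
        firings = set()                          # the set deduplicates identical firings
        for triple in queue:
            for priority, name, fn in rules:
                for derived in fn(triple, store):
                    if derived not in store:
                        firings.add((priority, name, derived))
        # higher priority first; name and triple break ties deterministically
        ordered = sorted(firings, key=lambda f: (-f[0], f[1], f[2]))
        queue = []
        for _, name, derived in ordered:
            if derived not in store:             # two rules may derive the same triple
                store.add(derived)
                queue.append(derived)            # feed back for further inference
                log.append((name, derived))
    return store, log

asserted = [
    ("ex:Poodle", "rdfs:subClassOf", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Poodle"),
]
store, log = run_to_fixed_point(asserted, [(10, "rdfs9", rdfs9), (1, "top_type", top_type)])
```

The loop ends when a round produces no firings, which is the fixed point; `log` records the order in which rules fired.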
Shared Pattern Prefixes¶
When multiple rules share the same antecedent pattern (e.g., several rules match on ?x rdf:type ?c), they share the same alpha node. This avoids redundant matching work and is one of the key efficiency advantages of the RETE approach over naive rule evaluation.
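One way to picture node sharing is a registry keyed by a canonical form of each pattern, so that patterns differing only in variable names map to the same alpha node. The `canonical` and `build_alpha_nodes` helpers are hypothetical, a sketch of the idea and not the engine's data structures.

```python
def canonical(pattern):
    """Rename variables by order of first appearance so equivalent patterns compare equal."""
    names = {}
    out = []
    for term in pattern:
        if term.startswith("?"):
            term = names.setdefault(term, f"?v{len(names)}")
        out.append(term)
    return tuple(out)

def build_alpha_nodes(rules):
    """Map each distinct canonical pattern to the list of rules that use it."""
    nodes = {}
    for rule_name, antecedents in rules.items():
        for pattern in antecedents:
            users = nodes.setdefault(canonical(pattern), [])
            if rule_name not in users:
                users.append(rule_name)
    return nodes

rules = {
    "rdfs9": [("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
    "rdfs11": [("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
}
alpha_nodes = build_alpha_nodes(rules)
```

The four antecedent patterns collapse into two alpha nodes, and the subClassOf node is shared by both rules, so an incoming subClassOf triple is matched once instead of three times.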
Incremental Materialization¶
Rather than recomputing all inferences from scratch on every write, IndentiaDB uses delta rules — only the rules affected by the changed triple are re-evaluated.
Delta Rules¶
When a triple is inserted, the RETE network identifies which alpha nodes match the new triple and propagates only the new bindings through the network. Existing matches stored in alpha and beta node memories are reused — the engine does not re-scan the entire triple store.
When a triple is retracted, the engine:
- Removes the triple's bindings from the alpha node that stored it.
- Identifies all downstream beta node matches that depended on those bindings.
- Retracts the inferred triples produced by those matches.
- Checks whether the retracted inferred triples can still be derived via alternative rule paths (multi-support check).
This incremental approach meets the design requirement INF-NF-5: incremental update cost is strictly less than full re-materialization.
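A minimal sketch of delta propagation on insert, assuming a single rule (the subClassOf rule) with one memory per antecedent pattern. `SubclassJoin` is a hypothetical name. Each insert joins only the new triple against the opposite memory, so earlier matches are never recomputed; derived triples are not fed back in this sketch.

```python
class SubclassJoin:
    """Memories and join for: (?x type ?c), (?c subClassOf ?d) => (?x type ?d)."""

    def __init__(self):
        self.instances = {}    # class -> instances seen with rdf:type
        self.supers = {}       # class -> direct superclasses seen

    def insert(self, triple):
        """Store the triple and return only the inferences it newly enables."""
        s, p, o = triple
        if p == "rdf:type":
            self.instances.setdefault(o, set()).add(s)
            return {(s, "rdf:type", d) for d in self.supers.get(o, ())}
        if p == "rdfs:subClassOf":
            self.supers.setdefault(s, set()).add(o)
            return {(x, "rdf:type", o) for x in self.instances.get(s, ())}
        return set()

net = SubclassJoin()
d1 = net.insert(("ex:Poodle", "rdfs:subClassOf", "ex:Dog"))
d2 = net.insert(("ex:rex", "rdf:type", "ex:Poodle"))
d3 = net.insert(("ex:fifi", "rdf:type", "ex:Poodle"))    # ex:fifi is a made-up individual
```

The third insert yields only the inference about `ex:fifi`; the earlier inference about `ex:rex` is not derived again.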
Batch Processing¶
For bulk inserts, the materializer processes triples in configurable batches (default: 1000 triples per batch). Within each batch, all rule firings are collected and deduplicated before derived triples are generated. This avoids redundant intermediate inferences when loading large datasets.
[query]
inference = true
[query.materialization]
mode = "forward"
max_depth = 10 # Maximum transitive closure depth
batch_size = 5000 # Triples per inference batch (overrides the default of 1000)
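Batching with per-batch deduplication can be sketched as follows. The `derive` callback stands in for the rule network and is an assumption, as are the `ex:Pet` class and the tiny batch size used to force two batches; derived triples are not fed back in this sketch.

```python
from itertools import islice

def batches(triples, batch_size=1000):
    """Yield successive lists of at most batch_size triples."""
    iterator = iter(triples)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

def process(triples, derive, batch_size=1000):
    """Load triples batch by batch; return the store, raw firings, and unique inferences."""
    store, raw, unique = set(), 0, 0
    for chunk in batches(triples, batch_size):
        store.update(chunk)
        firings = [d for t in chunk for d in derive(t)]   # collect all firings first
        new = set(firings) - store                        # then deduplicate once per batch
        raw += len(firings)
        unique += len(new)
        store |= new
    return store, raw, unique

def derive(triple):
    """Stand-in rule: every typed resource is an owl:Thing."""
    s, p, _ = triple
    return [(s, "rdf:type", "owl:Thing")] if p == "rdf:type" else []

data = [
    ("ex:rex", "rdf:type", "ex:Poodle"),
    ("ex:rex", "rdf:type", "ex:Pet"),             # hypothetical second type
    ("ex:buddy", "rdf:type", "ex:GoldenRetriever"),
]
store, raw, unique = process(data, derive, batch_size=2)
```

In the first batch both `ex:rex` triples fire the same inference, which is written once; the difference between `raw` and `unique` corresponds to the kind of saving a duplicates-avoided counter would report.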
Custom Inference Rules¶
Beyond the built-in RDFS and OWL profiles, IndentiaDB supports user-defined inference rules using the Custom profile.
Defining Rules via Configuration¶
Custom rules follow an antecedent-consequent pattern. Each rule specifies triple patterns for the IF (antecedent) and THEN (consequent) parts, with variables prefixed by ?:
[query]
inference = true
inference_level = "custom"
[[query.custom_rules]]
id = "domain-expertise"
name = "Expert in domain implies knowledge of sub-domain"
priority = 10
[[query.custom_rules.antecedent]]
subject = "?person"
predicate = "http://example.org/expertIn"
object = "?domain"
[[query.custom_rules.antecedent]]
subject = "?subdomain"
predicate = "http://example.org/subDomainOf"
object = "?domain"
[[query.custom_rules.consequent]]
subject = "?person"
predicate = "http://example.org/knowledgeableIn"
object = "?subdomain"
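To make the antecedent-consequent semantics concrete, the sketch below evaluates the rule above against two triples. The `RULE` dictionary mirrors the TOML fields; `unify` and `fire` are hypothetical helpers, and the `alice`, `databases`, and `graph_databases` resources are invented for the example.

```python
EX = "http://example.org/"
FIELDS = ("subject", "predicate", "object")

RULE = {
    "id": "domain-expertise",
    "priority": 10,
    "antecedent": [
        {"subject": "?person", "predicate": EX + "expertIn", "object": "?domain"},
        {"subject": "?subdomain", "predicate": EX + "subDomainOf", "object": "?domain"},
    ],
    "consequent": [
        {"subject": "?person", "predicate": EX + "knowledgeableIn", "object": "?subdomain"},
    ],
}

def unify(pattern, triple, bindings):
    """Extend bindings so that the pattern matches the triple, or return None."""
    result = dict(bindings)
    for field, value in zip(FIELDS, triple):
        term = pattern[field]
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def fire(rule, triples):
    """Return the consequent triples for every consistent match of all antecedents."""
    solutions = [{}]
    for pattern in rule["antecedent"]:
        solutions = [extended
                     for bindings in solutions
                     for triple in triples
                     if (extended := unify(pattern, triple, bindings)) is not None]
    derived = set()
    for bindings in solutions:
        for consequent in rule["consequent"]:
            # substitute bound variables; constants pass through unchanged
            derived.add(tuple(bindings.get(consequent[f], consequent[f]) for f in FIELDS))
    return derived

triples = {
    (EX + "alice", EX + "expertIn", EX + "databases"),
    (EX + "graph_databases", EX + "subDomainOf", EX + "databases"),
}
derived = fire(RULE, triples)
```

Both antecedents share `?domain`, so the rule fires only when the expert's domain and the sub-domain's parent are the same resource.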
Defining Rules via SPARQL INSERT¶
Custom rules can also be loaded as RDF triples using an IndentiaDB-specific vocabulary:
PREFIX rule: <http://indentia.ai/vocab/rule#>
PREFIX ex: <http://example.org/>
INSERT DATA {
rule:expert-knowledge a rule:InferenceRule ;
rule:id "expert-knowledge" ;
rule:name "Expert knowledge propagation" ;
rule:priority 10 ;
rule:antecedent [
rule:subject "?person" ;
rule:predicate ex:expertIn ;
rule:object "?domain"
] ;
rule:antecedent [
rule:subject "?sub" ;
rule:predicate ex:subDomainOf ;
rule:object "?domain"
] ;
rule:consequent [
rule:subject "?person" ;
rule:predicate ex:knowledgeableIn ;
rule:object "?sub"
] .
}
Optional Filters¶
Rules can include filter conditions using SPARQL FILTER syntax to restrict when the rule fires:
[[query.custom_rules]]
id = "active-employee-access"
name = "Active employees have system access"
filter = "?status = 'active'"
[[query.custom_rules.antecedent]]
subject = "?person"
predicate = "http://example.org/employmentStatus"
object = "?status"
[[query.custom_rules.consequent]]
subject = "?person"
predicate = "http://example.org/hasAccess"
object = "http://example.org/InternalSystem"
Monitoring Inference¶
IndentiaDB exposes inference statistics through both the admin CLI and the metrics endpoint.
Inference Statistics¶
Output:
{
"total_inferred": 284731,
"by_rule": {
"rdfs9": 142365,
"rdfs11": 89012,
"rdfs7": 31204,
"rdfs5": 22150
},
"max_depth_reached": 7,
"depth_limit_warnings": 0,
"duplicates_avoided": 18429,
"last_materialization": "2026-03-22T14:30:00Z",
"materialization_duration_ms": 1245
}
Key Metrics¶
| Metric | Description |
|---|---|
| total_inferred | Total number of materialized inferred triples in the store |
| by_rule | Breakdown of how many triples each rule produced |
| max_depth_reached | Deepest transitive chain encountered |
| depth_limit_warnings | Number of times the max_depth limit was hit (potential incompleteness) |
| duplicates_avoided | Inferences that were already present (deduplication savings) |
| materialization_duration_ms | Duration of the last (re-)materialization operation, in milliseconds |
Prometheus Metrics¶
When the metrics endpoint is enabled, inference metrics are exported in Prometheus format:
indentiadb_inference_total_inferred 284731
indentiadb_inference_rule_firings{rule="rdfs9"} 142365
indentiadb_inference_rule_firings{rule="rdfs11"} 89012
indentiadb_inference_max_depth 7
indentiadb_inference_materialization_duration_seconds 1.245
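For ad-hoc checks without a Prometheus server, the exported text can be parsed directly. The sketch below handles only the simple `name{labels} value` lines shown above, not the full exposition format, and `parse_metrics` is a hypothetical helper.

```python
import re

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)

def parse_metrics(text):
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):      # skip blanks and HELP/TYPE comments
            continue
        m = LINE.match(line)
        if not m:
            continue
        labels = tuple(sorted(re.findall(r'(\w+)="([^"]*)"', m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))
    return samples

text = """
indentiadb_inference_total_inferred 284731
indentiadb_inference_rule_firings{rule="rdfs9"} 142365
indentiadb_inference_rule_firings{rule="rdfs11"} 89012
indentiadb_inference_max_depth 7
indentiadb_inference_materialization_duration_seconds 1.245
"""
samples = parse_metrics(text)
total = samples[("indentiadb_inference_total_inferred", ())]
rdfs9 = samples[("indentiadb_inference_rule_firings", (("rule", "rdfs9"),))]
rdfs9_share = rdfs9 / total
```

A share like `rdfs9_share` shows at a glance which rule dominates the materialized triples.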
Inference with Named Graphs¶
By default, inference applies across all named graphs. To scope inference to specific graphs, configure inference_graphs and use GRAPH patterns in queries.
Practical Example: Multi-Tenant Knowledge Graph¶
Consider a multi-tenant setup where each tenant has their own named graph, but all tenants share a common ontology graph:
[query]
inference = true
inference_level = "rdfs"
inference_graphs = [
"http://example.org/ontology"
]
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
# Schema in the ontology graph
INSERT DATA {
GRAPH <http://example.org/ontology> {
ex:Manager rdfs:subClassOf ex:Employee .
ex:Employee rdfs:subClassOf ex:Person .
ex:worksAt rdfs:domain ex:Employee .
ex:worksAt rdfs:range ex:Organization .
}
}
# Tenant data in a separate graph
INSERT DATA {
GRAPH <http://example.org/tenant/acme> {
ex:alice a ex:Manager ;
ex:worksAt ex:acme_corp .
}
}
With inference_graphs limited to the ontology graph, only the ontology triples act as schema premises, and inferred triples are produced only for data that references those ontology classes and properties. You can then query a tenant graph directly:
PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
GRAPH <http://example.org/tenant/acme> {
?person a ex:Person .
}
}
This returns ex:alice — inferred via ex:Manager rdfs:subClassOf ex:Employee rdfs:subClassOf ex:Person — even though ex:alice was only asserted as ex:Manager.
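The scoping behaviour in this example can be sketched over quads (subject, predicate, object, graph). Two assumptions are built in and are not confirmed by this page: schema axioms are read only from the graphs listed in inference_graphs, and each inferred triple is written to the graph of the instance data it was derived from. Only the subClassOf rule is modelled; domain and range are left out, and the `ex:Robot` axiom is a made-up tenant-side triple that should be ignored.

```python
ONTOLOGY = "http://example.org/ontology"
ACME = "http://example.org/tenant/acme"

def infer_types(quads, inference_graphs):
    """Derive rdf:type quads using subClassOf axioms taken only from inference_graphs."""
    supers = {}
    for (s, p, o, g) in quads:
        if p == "rdfs:subClassOf" and g in inference_graphs:
            supers.setdefault(s, set()).add(o)
    inferred = set()
    frontier = [(s, o, g) for (s, p, o, g) in quads if p == "rdf:type"]
    while frontier:
        instance, cls, graph = frontier.pop()
        for sup in supers.get(cls, ()):
            quad = (instance, "rdf:type", sup, graph)   # stays in the data's own graph
            if quad not in inferred:
                inferred.add(quad)
                frontier.append((instance, sup, graph))
    return inferred

quads = {
    ("ex:Manager", "rdfs:subClassOf", "ex:Employee", ONTOLOGY),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person", ONTOLOGY),
    ("ex:Person", "rdfs:subClassOf", "ex:Robot", ACME),   # tenant-side axiom, not a premise
    ("ex:alice", "rdf:type", "ex:Manager", ACME),
}
inferred = infer_types(quads, inference_graphs={ONTOLOGY})
```

Under these assumptions `ex:alice` becomes an `ex:Employee` and an `ex:Person` inside the tenant graph, while the subClassOf axiom stored in the tenant graph has no effect.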