SHACL Validation¶
IndentiaDB includes a native SHACL (Shapes Constraint Language) validation engine that enforces data quality rules on RDF graphs. The engine implements the W3C SHACL Core specification and extends it with SPARQL-based constraints from the SHACL 1.2 draft, allowing you to define shape graphs that describe the expected structure of your data and validate incoming triples against those shapes before they are committed to the store.
How SHACL Works¶
SHACL validation operates on two graphs:
- Data graph -- the RDF triples you want to validate.
- Shapes graph -- a set of shape definitions that describe valid data structures.
The validation engine selects focus nodes from the data graph based on targets declared in each shape, then checks every applicable constraint. The result is a validation report that either confirms conformance or lists individual violations.
┌──────────────────────┐ ┌──────────────────────┐
│ Shapes Graph │ │ Data Graph │
│ ┌────────────────┐ │ │ │
│ │ NodeShape │──┼──────┼──▶ Focus Node │
│ │ PropertyShape │ │ │ ├─ property A │
│ └────────────────┘ │ │ ├─ property B │
│ │ │ └─ property C │
└──────────────────────┘ └──────────────────────┘
│ │
▼ ▼
┌────────────────────────────────────────────────┐
│ SHACL Validation Engine │
│ select focus nodes → apply constraints │
│ → collect results → produce report │
└────────────────────────────────────────────────┘
│
▼
ValidationReport
{ conforms: true/false,
results: [...] }
Supported Constraints¶
IndentiaDB supports the full SHACL Core constraint vocabulary plus SPARQL constraints:
| Category | Constraints | SHACL Properties |
|---|---|---|
| Cardinality | Minimum / maximum value count | sh:minCount, sh:maxCount |
| Datatype | Literal type checking | sh:datatype |
| Class | Instance-of checking | sh:class |
| Node Kind | IRI / BlankNode / Literal | sh:nodeKind |
| Value Range | Numeric and date bounds | sh:minInclusive, sh:maxInclusive, sh:minExclusive, sh:maxExclusive |
| String | Length and pattern | sh:minLength, sh:maxLength, sh:pattern, sh:flags |
| Enumeration | Allowed values list | sh:in, sh:hasValue |
| Language | Language tags | sh:languageIn, sh:uniqueLang |
| Logical | Boolean combinations | sh:and, sh:or, sh:not, sh:xone |
| Closed Shape | Restrict allowed properties | sh:closed, sh:ignoredProperties |
| Comparison | Cross-property checks | sh:equals, sh:disjoint, sh:lessThan, sh:lessThanOrEquals |
| Qualified | Conditional cardinality | sh:qualifiedValueShape, sh:qualifiedMinCount, sh:qualifiedMaxCount |
| SPARQL (1.2) | Custom SPARQL SELECT constraints | sh:sparql, sh:select |
Defining Shapes¶
Node Shapes¶
A node shape defines constraints on a focus node itself. Use sh:targetClass to select all instances of a given class:
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/shapes/> .
ex:PersonShape
a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:property ex:PersonNameShape ;
sh:property ex:PersonEmailShape ;
sh:property ex:PersonAgeShape .
Property Shapes¶
A property shape defines constraints on the values reached through a property path from the focus node:
ex:PersonNameShape
a sh:PropertyShape ;
sh:path schema:name ;
sh:minCount 1 ;
sh:maxCount 1 ;
sh:datatype xsd:string ;
sh:minLength 1 ;
sh:maxLength 200 ;
sh:message "Every person must have exactly one name (1-200 characters)" .
ex:PersonEmailShape
a sh:PropertyShape ;
sh:path schema:email ;
sh:minCount 1 ;
sh:nodeKind sh:Literal ;
sh:pattern "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" ;
sh:message "Email must be a valid email address" .
ex:PersonAgeShape
a sh:PropertyShape ;
sh:path schema:age ;
sh:maxCount 1 ;
sh:datatype xsd:integer ;
sh:minInclusive 0 ;
sh:maxInclusive 150 ;
sh:severity sh:Warning ;
sh:message "Age must be between 0 and 150" .
Target Declarations¶
IndentiaDB supports all four SHACL target types:
# Target all instances of a class
ex:PersonShape sh:targetClass schema:Person .
# Target a specific node
ex:AliceShape sh:targetNode <http://example.org/alice> .
# Target all subjects that use a predicate
ex:HasNameShape sh:targetSubjectsOf schema:name .
# Target all objects of a predicate
ex:NameValueShape sh:targetObjectsOf schema:name .
Shapes Without Targets
Shapes that do not declare any target are treated as value shapes. They are never validated on their own but can be referenced from other shapes via sh:node or sh:qualifiedValueShape.
Property Path Expressions¶
IndentiaDB evaluates the full range of SHACL property paths, enabling validation of values reached through complex graph traversals:
| Path Type | Syntax | Example | Description |
|---|---|---|---|
| Predicate | sh:path schema:name |
Direct property | Single predicate hop |
| Inverse | sh:path [ sh:inversePath schema:parent ] |
^schema:parent |
Reverse direction |
| Sequence | sh:path ( schema:address schema:city ) |
schema:address / schema:city |
Multi-hop path |
| Alternative | sh:path [ sh:alternativePath ( schema:name schema:givenName ) ] |
schema:name \| schema:givenName |
Either path |
| Zero-or-more | sh:path [ sh:zeroOrMorePath schema:parent ] |
schema:parent* |
Transitive closure |
| One-or-more | sh:path [ sh:oneOrMorePath schema:parent ] |
schema:parent+ |
At least one hop |
| Zero-or-one | sh:path [ sh:zeroOrOnePath schema:parent ] |
schema:parent? |
Optional hop |
Example -- validate that every person has a city in their address:
ex:PersonCityShape
a sh:PropertyShape ;
sh:path ( schema:address schema:city ) ;
sh:minCount 1 ;
sh:datatype xsd:string ;
sh:message "Person must have a city in their address" .
Repetition Path Limits
Zero-or-more (*) and one-or-more (+) paths can traverse very large subgraphs. IndentiaDB enforces a configurable max_path_steps limit (default: 50,000) to prevent unbounded expansion. If this limit is reached, validation returns a ResourceLimitExceeded error.
Logical Constraints¶
Combine shapes using boolean logic for complex validation rules:
# AND: value must satisfy all listed shapes
ex:FullNameShape
a sh:PropertyShape ;
sh:path schema:name ;
sh:and (
[ sh:minLength 2 ]
[ sh:maxLength 100 ]
[ sh:pattern "^[A-Z]" ]
) ;
sh:message "Name must be 2-100 chars and start with uppercase" .
# OR: value must satisfy at least one shape
ex:ContactInfoShape
a sh:PropertyShape ;
sh:path schema:contactPoint ;
sh:or (
[ sh:datatype xsd:string ; sh:pattern "^\\+?[0-9\\-\\s]+$" ]
[ sh:datatype xsd:string ; sh:pattern "^[^@]+@[^@]+$" ]
) ;
sh:message "Contact must be a phone number or email" .
# NOT: value must not satisfy the shape
ex:NotRetiredShape
a sh:PropertyShape ;
sh:path schema:status ;
sh:not [ sh:hasValue "retired" ] .
# XONE: value must satisfy exactly one shape
ex:IdentifierShape
a sh:PropertyShape ;
sh:path schema:identifier ;
sh:xone (
[ sh:pattern "^[0-9]{9}$" ]
[ sh:pattern "^[A-Z]{2}[0-9]{6}$" ]
) ;
sh:message "Identifier must be either a 9-digit number or a 2-letter + 6-digit code" .
Closed Shapes¶
Restrict which properties a node may have:
ex:StrictPersonShape
a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:closed true ;
sh:ignoredProperties ( rdf:type ) ;
sh:property [
sh:path schema:name ;
sh:minCount 1 ;
] ;
sh:property [
sh:path schema:email ;
] .
With sh:closed true, any property on a schema:Person node that is not declared as a sh:property path (and not in sh:ignoredProperties) triggers a violation.
Severity Levels¶
Each shape or property shape can declare a severity level that controls how violations are reported:
| Severity | IRI | Meaning |
|---|---|---|
| Violation | sh:Violation |
Data does not conform (default) |
| Warning | sh:Warning |
Non-critical issue |
| Info | sh:Info |
Informational notice |
ex:DeprecatedFieldShape
a sh:PropertyShape ;
sh:path ex:legacyField ;
sh:maxCount 0 ;
sh:severity sh:Warning ;
sh:message "legacyField is deprecated; migrate to ex:newField" .
Conformance and Severity
Per the W3C SHACL specification, only sh:Violation results affect conformance. A report with only warnings and infos still has conforms: true.
Validation Endpoint¶
IndentiaDB exposes a REST endpoint for SHACL validation:
Load Shapes¶
curl -X POST http://localhost:7001/shacl/shapes \
-H "Content-Type: text/turtle" \
-H "Authorization: Bearer $TOKEN" \
-d '@shapes.ttl'
Response:
{
"loaded": 3,
"shape_ids": [
"http://example.org/shapes/PersonShape",
"http://example.org/shapes/PersonNameShape",
"http://example.org/shapes/PersonEmailShape"
]
}
Validate Data¶
curl -X POST http://localhost:7001/shacl/validate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"graph": "http://example.org/data",
"shapes": ["http://example.org/shapes/PersonShape"]
}'
Response (conforming):
{
"conforms": true,
"results": [],
"stats": {
"shapes_evaluated": 1,
"focus_nodes_validated": 42,
"constraints_checked": 126,
"duration_ms": 12
}
}
Response (non-conforming):
{
"conforms": false,
"results": [
{
"focusNode": "http://example.org/person/99",
"resultPath": "http://schema.org/name",
"sourceShape": "http://example.org/shapes/PersonNameShape",
"sourceConstraintComponent": "http://www.w3.org/ns/shacl#MinCountConstraintComponent",
"resultSeverity": "http://www.w3.org/ns/shacl#Violation",
"resultMessage": "Every person must have exactly one name (1-200 characters)",
"value": null
},
{
"focusNode": "http://example.org/person/42",
"resultPath": "http://schema.org/email",
"sourceShape": "http://example.org/shapes/PersonEmailShape",
"sourceConstraintComponent": "http://www.w3.org/ns/shacl#PatternConstraintComponent",
"resultSeverity": "http://www.w3.org/ns/shacl#Violation",
"resultMessage": "Email must be a valid email address",
"value": {
"type": "literal",
"value": "not-an-email"
}
}
],
"stats": {
"shapes_evaluated": 1,
"focus_nodes_validated": 42,
"constraints_checked": 126,
"duration_ms": 18
}
}
Validate Inline Data¶
Send both shapes and data in one request:
curl -X POST http://localhost:7001/shacl/validate-inline \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"shapes_format": "turtle",
"shapes": "@prefix sh: <http://www.w3.org/ns/shacl#> ...",
"data_format": "turtle",
"data": "@prefix schema: <http://schema.org/> ..."
}'
Validate-Before-Write¶
IndentiaDB can enforce SHACL validation on every SPARQL INSERT operation. When enabled, the engine validates the incoming triples against all loaded shapes before committing them to the store. If any violation with severity sh:Violation is found, the INSERT is rejected and the full validation report is returned as an error.
Enable via Configuration¶
[shacl]
enabled = true
validate_on_insert = true # reject INSERTs that violate loaded shapes
strict_mode = true # also reject on sh:Warning (optional)
Enable via REST¶
curl -X PUT http://localhost:7001/shacl/config \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"validate_on_insert": true,
"strict_mode": false
}'
When validate_on_insert is active, a SPARQL INSERT that would create invalid data returns HTTP 422:
curl -X POST http://localhost:7001/sparql \
-H "Content-Type: application/sparql-update" \
-H "Authorization: Bearer $TOKEN" \
-d 'INSERT DATA {
<http://example.org/person/new> a <http://schema.org/Person> .
}'
{
"error": "SHACL validation failed",
"conforms": false,
"results": [
{
"focusNode": "http://example.org/person/new",
"resultPath": "http://schema.org/name",
"sourceConstraintComponent": "http://www.w3.org/ns/shacl#MinCountConstraintComponent",
"resultMessage": "Every person must have exactly one name (1-200 characters)"
}
]
}
Performance Impact
Validate-before-write adds latency to every INSERT. For bulk data loading, consider disabling it temporarily and running a full validation pass after the load completes.
SPARQL Constraints (SHACL 1.2)¶
IndentiaDB supports SPARQL-based constraints from the SHACL 1.2 draft. These let you write arbitrary SPARQL SELECT queries as validation rules:
ex:UniqueEmailShape
a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:sparql [
sh:message "Email address must be unique across all persons" ;
sh:select """
SELECT $this (schema:email AS ?path) ?value
WHERE {
$this schema:email ?value .
?other schema:email ?value .
FILTER(?other != $this)
}
""" ;
] .
Each solution returned by the SELECT query is treated as one constraint violation. The query receives $this bound to the current focus node.
SPARQL Executor Required
SPARQL constraints require the query engine to be available. The SHACL engine calls a SparqlExecutor implementation that routes queries through IndentiaDB's QLever-compatible SPARQL engine.
Validation Reports as RDF¶
Validation reports can be serialized to RDF quads using the standard SHACL vocabulary, making them queryable with SPARQL:
curl -X POST http://localhost:7001/shacl/validate \
-H "Accept: text/turtle" \
-H "Authorization: Bearer $TOKEN" \
-d '{ "graph": "http://example.org/data" }'
_:report0 a sh:ValidationReport ;
sh:conforms "false"^^xsd:boolean ;
sh:result _:result0 .
_:result0 a sh:ValidationResult ;
sh:focusNode <http://example.org/person/99> ;
sh:resultPath schema:name ;
sh:sourceShape <http://example.org/shapes/PersonNameShape> ;
sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
sh:resultSeverity sh:Violation ;
sh:resultMessage "Every person must have exactly one name (1-200 characters)" .
You can also persist reports to the store for audit purposes:
curl -X POST http://localhost:7001/shacl/validate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"graph": "http://example.org/data",
"persist_report": true,
"report_graph": "http://example.org/validation-reports"
}'
SurrealDB Schema for SHACL Storage¶
The SHACL engine persists shape definitions and validation history in three SurrealDB tables:
| Table | Purpose | Key Fields |
|---|---|---|
shacl_shape |
Stored NodeShape definitions | shape_id, shape (flexible object), created_at, updated_at |
shacl_validation |
Validation run metadata | validation_id, conforms, stats, results_count, report, created_at |
shacl_result |
Flattened validation results | validation_id, focus_node, result_path, value, source_shape, severity, message |
Indexes:
| Index | Table | Fields |
|---|---|---|
idx_shacl_shape_id |
shacl_shape |
shape_id (UNIQUE) |
idx_shacl_validation_id |
shacl_validation |
validation_id (UNIQUE) |
idx_shacl_validation_created_at |
shacl_validation |
created_at |
idx_shacl_result_validation |
shacl_result |
validation_id |
idx_shacl_result_focus |
shacl_result |
focus_node |
idx_shacl_result_severity |
shacl_result |
severity |
Query validation history with SurrealQL:
-- List recent validation runs
SELECT validation_id, conforms, results_count, created_at
FROM shacl_validation
ORDER BY created_at DESC
LIMIT 10;
-- Get all violations for a specific node
SELECT *
FROM shacl_result
WHERE focus_node = "http://example.org/person/99"
AND severity = "http://www.w3.org/ns/shacl#Violation"
ORDER BY created_at DESC;
Configuration¶
Server Configuration (TOML)¶
[shacl]
# Enable the SHACL validation engine
enabled = true
# Reject SPARQL INSERTs that violate loaded shapes
validate_on_insert = false
# Also reject on sh:Warning (not just sh:Violation)
strict_mode = false
# Validation timeout per run (seconds)
timeout = 10
# Maximum recursion depth for nested shapes (sh:node, logical constraints)
max_shape_depth = 16
# Maximum focus nodes validated per shape
max_focus_nodes = 100000
# Maximum value nodes returned by a property path evaluation
max_value_nodes = 10000
# Maximum graph expansion steps for repetition paths (*, +)
max_path_steps = 50000
# Maximum validation results per report
max_results = 10000
# Include warnings and infos in the report
include_warnings = true
include_infos = true
# Optional named graph filter (validate only this graph)
# graph_iri = "http://example.org/data"
Configuration Reference¶
| Parameter | Type | Default | Description |
|---|---|---|---|
enabled |
bool | true |
Enable/disable the SHACL engine |
validate_on_insert |
bool | false |
Enforce shape validation on SPARQL INSERT |
strict_mode |
bool | false |
Treat sh:Warning as a conformance failure |
timeout |
duration | 10s |
Maximum duration for a full validation run |
max_shape_depth |
int | 16 |
Maximum nesting depth for recursive shapes |
max_focus_nodes |
int | 100,000 |
Cap on focus nodes per shape |
max_value_nodes |
int | 10,000 |
Cap on values returned by a single path evaluation |
max_path_steps |
int | 50,000 |
Cap on graph traversal steps for repetition paths |
max_results |
int | 10,000 |
Maximum results in a single report |
include_warnings |
bool | true |
Include sh:Warning results |
include_infos |
bool | true |
Include sh:Info results |
graph_iri |
string | (none) | Restrict validation to a specific named graph |
Resource Limits Are Safety Guards
The max_* parameters exist to prevent accidental or malicious denial-of-service via expensive shape definitions. Raising them beyond the defaults should be done carefully and only after profiling your workload.
Python Client Example¶
import requests
BASE_URL = "http://localhost:7001"
HEADERS = {"Authorization": "Bearer <token>"}
# 1. Load shapes
shapes_ttl = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/shapes/> .
ex:PersonShape
a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:property [
sh:path schema:name ;
sh:minCount 1 ;
sh:maxCount 1 ;
sh:datatype xsd:string ;
] ;
sh:property [
sh:path schema:email ;
sh:minCount 1 ;
sh:pattern "^[^@]+@[^@]+$" ;
] .
"""
resp = requests.post(
f"{BASE_URL}/shacl/shapes",
headers={**HEADERS, "Content-Type": "text/turtle"},
data=shapes_ttl,
)
print(f"Loaded shapes: {resp.json()}")
# 2. Insert some data
insert_sparql = """
INSERT DATA {
<http://example.org/person/1> a <http://schema.org/Person> ;
<http://schema.org/name> "Alice" ;
<http://schema.org/email> "alice@example.org" .
<http://example.org/person/2> a <http://schema.org/Person> ;
<http://schema.org/email> "not-an-email" .
}
"""
requests.post(
f"{BASE_URL}/sparql",
headers={**HEADERS, "Content-Type": "application/sparql-update"},
data=insert_sparql,
)
# 3. Validate
resp = requests.post(
f"{BASE_URL}/shacl/validate",
headers={**HEADERS, "Content-Type": "application/json"},
json={"shapes": ["http://example.org/shapes/PersonShape"]},
)
report = resp.json()
if report["conforms"]:
print("All data conforms to shapes")
else:
print(f"Found {len(report['results'])} violation(s):")
for r in report["results"]:
print(f" - {r['focusNode']}: {r.get('resultMessage', 'constraint violated')}")
Shape Format Support¶
The SHACL parser accepts shapes in multiple RDF serialization formats:
| Format | Content-Type | File Extension |
|---|---|---|
| Turtle | text/turtle |
.ttl |
| JSON-LD | application/ld+json |
.jsonld |
| RDF/XML | application/rdf+xml |
.rdf |
Turtle Is Recommended
Turtle is the most readable format for SHACL shape definitions and is used in most W3C examples. Use JSON-LD if you need to integrate with JSON-based tooling.
Interpreting Validation Reports¶
Each ValidationResult in a report contains the following fields:
| Field | Description |
|---|---|
focusNode |
The IRI of the node that was validated |
resultPath |
The property path that was evaluated (if applicable) |
value |
The value that caused the violation (if applicable) |
sourceShape |
The IRI of the shape that defined the constraint |
sourceConstraintComponent |
The IRI identifying the type of constraint (e.g., sh:MinCountConstraintComponent) |
resultSeverity |
sh:Violation, sh:Warning, or sh:Info |
resultMessage |
Human-readable explanation (from sh:message on the shape, or auto-generated) |
Common constraint component IRIs:
| Constraint Component | Meaning |
|---|---|
sh:MinCountConstraintComponent |
Too few values |
sh:MaxCountConstraintComponent |
Too many values |
sh:DatatypeConstraintComponent |
Wrong literal datatype |
sh:ClassConstraintComponent |
Wrong rdf:type |
sh:NodeKindConstraintComponent |
Wrong node kind (IRI/BlankNode/Literal) |
sh:PatternConstraintComponent |
String does not match regex |
sh:MinInclusiveConstraintComponent |
Value below minimum |
sh:MaxInclusiveConstraintComponent |
Value above maximum |
sh:InConstraintComponent |
Value not in allowed set |
sh:ClosedConstraintComponent |
Unexpected property on closed shape |
sh:SPARQLConstraintComponent |
Custom SPARQL constraint violated |
Performance Considerations¶
- Shape complexity: Shapes with deep nesting (
sh:nodereferencing shapes with their ownsh:node) multiply the validation cost. Keep nesting depth below 4-5 levels for interactive workloads. - Broad targets:
sh:targetClasson a class with millions of instances validates every instance. Usemax_focus_nodesto cap this, or prefersh:targetNodefor spot-checks. - Repetition paths:
sh:zeroOrMorePathandsh:oneOrMorePathperform BFS traversal and can visit the entire graph. Themax_path_stepslimit prevents runaway expansion. - SPARQL constraints: Custom SPARQL queries run against the full data graph. Index the predicates used in your constraint queries for best performance.
- Batch validation: When loading large datasets, disable
validate_on_insert, load all data, then run a single validation pass. This is orders of magnitude faster than per-triple validation.