Extend Rhai with Custom Syntax
=============================
{{#include ../links.md}}
For the truly adventurous, there is a built-in facility to _extend_ the Rhai language
with custom-defined _syntax_.
But before going off to define the next weird statement type, heed this warning:
Don't Do It™
------------
Stick with standard language syntax as much as possible.
Having to learn Rhai is bad enough; no sane user would ever want to learn _yet_ another
obscure language syntax just to do something.
Try to use [custom operators] first. Defining a custom syntax should be considered a _last resort_.
Where This Might Be Useful
-------------------------
* Where an operation is used a _LOT_ and a custom syntax saves a lot of typing.
* Where a custom syntax _significantly_ simplifies the code and _significantly_ enhances understanding of the code's intent.
* Where certain logic cannot be easily encapsulated inside a function.
* Where you just want to confuse your user and make their lives miserable, because you can.
Step One - Design The Syntax
---------------------------
A custom syntax is simply a list of symbols.
These symbol types can be used:
* Standard [keywords]({{rootUrl}}/appendix/keywords.md).
* Standard [operators]({{rootUrl}}/appendix/operators.md#operators).
* Reserved [symbols]({{rootUrl}}/appendix/operators.md#symbols).
* Identifiers following the [variable] naming rules.
* `$expr$` - any valid expression, statement or statement block.
* `$block$` - any valid statement block (i.e. must be enclosed by `'{'` .. `'}'`).
* `$ident$` - any [variable] name.
### The First Symbol Must be an Identifier
There is no specific limit on the combination and sequencing of each symbol type,
except for the _first_ symbol, which must be a custom keyword that follows the naming rules
of [variables].
The first symbol also cannot be a normal or reserved [keyword].
In other words, any valid identifier that is not a [keyword] will work fine.
### The First Symbol Must be Unique
Rhai uses the _first_ symbol as a clue to parse custom syntax.
Therefore, at any one time, there can only be _one_ custom syntax starting with each unique symbol.
Any new custom syntax definition using the same first symbol simply _overwrites_ the previous one.
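
The overwrite behaviour can be pictured as a map keyed by the first symbol.
The following is a minimal sketch under that assumption, using only the standard library.
`SyntaxRegistry` and `SyntaxDef` are hypothetical names for illustration; they are not part of
Rhai's API and do not describe its internals.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a registered custom syntax: just its symbol list.
#[derive(Debug, Clone, PartialEq)]
struct SyntaxDef {
    symbols: Vec<String>,
}

/// Hypothetical registry keyed by the first symbol of each definition.
#[derive(Default)]
struct SyntaxRegistry {
    by_first_symbol: HashMap<String, SyntaxDef>,
}

impl SyntaxRegistry {
    /// Registers a definition and returns the one it replaced, if any.
    /// An empty symbol list is ignored and also yields `None`.
    fn register(&mut self, symbols: &[&str]) -> Option<SyntaxDef> {
        let first = symbols.first()?.to_string();
        let def = SyntaxDef {
            symbols: symbols.iter().map(|s| s.to_string()).collect(),
        };
        // `HashMap::insert` returns the previous value stored under the same key.
        self.by_first_symbol.insert(first, def)
    }

    /// Looks up the definition that a statement starting with `first` would use.
    fn lookup(&self, first: &str) -> Option<&SyntaxDef> {
        self.by_first_symbol.get(first)
    }
}
```

In this sketch, registering two definitions that both start with `exec` leaves a single entry under `exec`,
and the second call to `register` hands back the definition that was replaced.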
### Example
```rust
exec $ident$ <- $expr$ : $block$
```
The above syntax is made up of a stream of symbols:
| Position | Input |  Symbol   | Description |
| :------: | :---: | :-------: | ----------- |
|    1     |       |  `exec`   | custom keyword |
|    2     |   1   | `$ident$` | a variable name |
|    3     |       |   `<-`    | the left-arrow symbol (which is a [reserved symbol]({{rootUrl}}/appendix/operators.md#symbols) in Rhai) |
|    4     |   2   | `$expr$`  | an expression, which may be enclosed with `{` .. `}`, or not |
|    5     |       |    `:`    | the colon symbol |
|    6     |   3   | `$block$` | a statement block, which must be enclosed with `{` .. `}` |
This syntax matches the following sample code and generates three inputs (one for each non-keyword):
```rust
// Assuming the 'exec' custom syntax implementation declares the variable 'hello':
let x = exec hello <- foo(1, 2) : {
            hello += bar(hello);
            baz(hello);
        };
print(x); // variable 'x' has a value returned by the custom syntax
print(hello); // variable declared by a custom syntax persists!
```
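
To see which symbols of a definition produce inputs, the following sketch walks a symbol list and
assigns a slot to every `$ident$`, `$expr$` and `$block$` marker. `input_slots` is a hypothetical helper,
not a Rhai function. Its slots are zero-based to match indexing into the `inputs` slice, whereas the
_Input_ column of the table above counts from one.

```rust
/// Returns, for each symbol, the zero-based slot it would fill in `inputs`,
/// or `None` for symbols that are matched literally (keywords and operators).
fn input_slots(symbols: &[&str]) -> Vec<Option<usize>> {
    let mut next = 0;
    symbols
        .iter()
        .map(|sym| match *sym {
            // Only the three markers capture something from the script.
            "$ident$" | "$expr$" | "$block$" => {
                let slot = next;
                next += 1;
                Some(slot)
            }
            // Everything else must simply appear in the script as written.
            _ => None,
        })
        .collect()
}
```

For the `exec` definition, only positions 2, 4 and 6 receive a slot, so the implementation would see three inputs.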
Step Two - Implementation
-------------------------
Any custom syntax must include an _implementation_ of it.
### Function Signature
The function signature of an implementation is:
> `Fn(context: &mut EvalContext, inputs: &[Expression]) -> Result<Dynamic, Box<EvalAltResult>>`
where:
| Parameter | Type | Description |
| ----------------------------- | :-----------------------------: | ------------------------------------------------------------------------------------- |
| `context` | `&mut EvalContext` | mutable reference to the current evaluation _context_ |
| - `context.scope` | `&mut Scope` | mutable reference to the current [`Scope`]; variables can be added to/removed from it |
| - `context.engine()` | `&Engine` | reference to the current [`Engine`] |
| - `context.imports()` | `&Imports` | reference to the current stack of [modules] imported via `import` statements |
| - `context.iter_namespaces()` | `impl Iterator<Item = &Module>` | iterator of the namespaces (as [modules]) containing all script-defined functions |
| - `context.this_ptr()` | `Option<&Dynamic>` | reference to the current bound [`this`] pointer, if any |
| - `context.call_level()` | `usize` | the current nesting level of function calls |
| `inputs` | `&[Expression]` | a list of input expression trees |
### Return Value
The return value is the result of evaluating the custom syntax expression.
### Access Arguments
The most important argument is `inputs` where the matched identifiers (`$ident$`), expressions/statements (`$expr$`)
and statement blocks (`$block$`) are provided.
To access a particular argument, use the following patterns:
| Argument type | Pattern (`n` = slot in `inputs`)         |  Result type  | Description        |
| :-----------: | ---------------------------------------- | :-----------: | ------------------ |
|   `$ident$`   | `inputs[n].get_variable_name().unwrap()` |    `&str`     | name of a variable |
|   `$expr$`    | `inputs.get(n).unwrap()`                 | `&Expression` | an expression tree |
|   `$block$`   | `inputs.get(n).unwrap()`                 | `&Expression` | an expression tree |
### Evaluate an Expression Tree
Use the `EvalContext::eval_expression_tree` method to evaluate an arbitrary expression tree
within the current evaluation context.
```rust
let expression = inputs.get(0).unwrap();
let result = context.eval_expression_tree(expression)?;
```
### Declare Variables
New variables may be declared (usually with a variable name that is passed in via `$ident$`).
They can simply be pushed into the [`Scope`].

However, beware that all new variables must be declared _prior_ to evaluating any expression tree.
In other words, any [`Scope`] calls that change the list of variables must come _before_ any
`EvalContext::eval_expression_tree` calls.
```rust
let var_name = inputs[0].get_variable_name().unwrap();
let expression = inputs.get(1).unwrap();

context.scope.push(var_name, 0 as INT);     // do this BEFORE 'context.eval_expression_tree'!

let result = context.eval_expression_tree(expression)?;
```
Step Three - Register the Custom Syntax
--------------------------------------
Use `Engine::register_custom_syntax` to register a custom syntax.
Again, beware that the _first_ symbol must be unique. If there already exists a custom syntax starting
with that symbol, the previous syntax will be overwritten.
The syntax is passed simply as a slice of `&str`.
```rust
// Custom syntax implementation
fn implementation_func(
    context: &mut EvalContext,
    inputs: &[Expression]
) -> Result<Dynamic, Box<EvalAltResult>> {
    let var_name = inputs[0].get_variable_name().unwrap().to_string();
    let stmt = inputs.get(1).unwrap();
    let condition = inputs.get(2).unwrap();

    // Push one new variable into the scope BEFORE 'context.eval_expression_tree'
    context.scope.push(var_name, 0 as INT);

    loop {
        // Evaluate the statement block
        context.eval_expression_tree(stmt)?;

        // Evaluate the condition expression
        let stop = !context.eval_expression_tree(condition)?
                            .as_bool().map_err(|err| Box::new(
                                EvalAltResult::ErrorMismatchDataType(
                                    "bool".to_string(),
                                    err.to_string(),
                                    condition.position(),
                                )
                            ))?;

        if stop {
            break;
        }
    }

    Ok(().into())
}

// Register the custom syntax (sample): exec |x| -> { x += 1 } while x < 0;
engine.register_custom_syntax(
    &[ "exec", "|", "$ident$", "|", "->", "$block$", "while", "$expr$" ], // the custom syntax
    1,  // the number of new variables declared within this custom syntax
    implementation_func
)?;
```
Step Four - Disable Unneeded Statement Types
-------------------------------------------
When a DSL needs a custom syntax, more likely than not it is extremely specialized.
Therefore, many statement types may not make sense under the same usage scenario.

So, while at it, better [disable][disable keywords and operators] those built-in keywords
and operators that should not be used by the user. This would leave only the bare minimum
language surface exposed, together with the custom syntax that is tailor-designed for
the scenario.
A keyword or operator that is disabled can still be used in a custom syntax.
In an extreme case, it is possible to disable _every_ keyword in the language, leaving only
custom syntax (plus possibly expressions). But again, Don't Do It™ - unless you are certain
of what you're doing.
Step Five - Document
--------------------
For custom syntax, documentation is crucial.
Make sure there are _lots_ of examples for users to follow.
Step Six - Profit!
------------------
Really Advanced - Low Level Custom Syntax API
--------------------------------------------
Sometimes it is desirable to have multiple custom syntaxes starting with the
same symbol. This is especially common for _command-style_ syntax where the
second symbol selects a particular command:
```rust
// The following simulates a command-style syntax, all starting with 'perform'.
perform hello world; // A fixed sequence of symbols
perform action 42; // Perform a system action with a parameter
perform update system; // Update the system
perform check all; // Check all system settings
perform cleanup; // Clean up the system
perform add something; // Add something to the system
perform remove something; // Delete something from the system
```
For even more flexibility, there is a _low level_ API for custom syntax that
allows the registration of an entire mini-parser.
Use `Engine::register_custom_syntax_raw` to register a custom syntax _parser_
together with the implementation function:
```rust
engine.register_custom_syntax_raw(
    "perform",
    |stream| match stream.len() {
        // perform ...
        1 => Ok(Some("$ident$".to_string())),
        // perform command ...
        2 => match stream[1].as_str() {
            "action" => Ok(Some("$expr$".to_string())),
            "hello" => Ok(Some("world".to_string())),
            "update" | "check" | "add" | "remove" => Ok(Some("$ident$".to_string())),
            "cleanup" => Ok(None),
            cmd => Err(ParseError(Box::new(ParseErrorType::BadInput(
                LexError::ImproperSymbol(format!("Improper command: {}", cmd))
            )), NO_POS)),
        },
        // perform command arg ...
        3 => match (stream[1].as_str(), stream[2].as_str()) {
            ("action", _) => Ok(None),
            ("hello", "world") => Ok(None),
            ("update", arg) if arg == "system" => Ok(None),
            ("update", arg) if arg == "client" => Ok(None),
            ("check", arg) => Ok(None),
            ("add", arg) => Ok(None),
            ("remove", arg) => Ok(None),
            (cmd, arg) => Err(ParseError(Box::new(ParseErrorType::BadInput(
                LexError::ImproperSymbol(
                    format!("Invalid argument for command {}: {}", cmd, arg)
                )
            )), NO_POS)),
        },
        _ => unreachable!(),
    },
    0, // the number of new variables declared within this custom syntax
    implementation_func
);
```
### Function Signature
The custom syntax parser has the following signature:
> `Fn(stream: &[String]) -> Result<Option<String>, ParseError>`
where:
| Parameter | Type | Description |
| --------- | :---------: | -------------------------------------------------------------------------------------------------- |
| `stream` | `&[String]` | a slice of symbols that have been parsed so far, possibly containing `"$expr$"` and/or `"$block$"` |
### Return Value
The return value is `Result<Option<String>, ParseError>` where:
| Value              | Description |
| ------------------ | ----------- |
| `Ok(None)`         | parsing complete and there are no more symbols to match |
| `Ok(Some(symbol))` | next symbol to match, which can also be `"$expr$"`, `"$ident$"` or `"$block$"` |
| `Err(ParseError)`  | error that is reflected back to the [`Engine`].<br/>Normally this is `ParseError(Box::new(ParseErrorType::BadInput(LexError::ImproperSymbol(message))), NO_POS)` to indicate that there is a syntax error, but it can be any `ParseError`. |
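
The way a parser function is consulted one symbol at a time can be tried out without Rhai at all.
The sketch below is a simplified, standard-library-only model and not Rhai's actual parsing loop:
`perform_parser` mirrors the callback shape but returns a plain `String` error instead of `ParseError`,
and `drive` is a hypothetical driver that treats each input token as one symbol (a real `$expr$` or
`$block$` may of course span many tokens). It assumes, in line with the parameter description above,
that a matched `$expr$` or `$block$` appears in the stream as the marker itself, while a matched
`$ident$` appears as the identifier's name.

```rust
/// Simplified stand-in for the parser callback, with a `String` error type.
fn perform_parser(stream: &[String]) -> Result<Option<String>, String> {
    match stream.len() {
        // perform ...
        1 => Ok(Some("$ident$".to_string())),
        // perform command ...
        2 => match stream[1].as_str() {
            "action" => Ok(Some("$expr$".to_string())),
            "hello" => Ok(Some("world".to_string())),
            "update" | "check" => Ok(Some("$ident$".to_string())),
            "cleanup" => Ok(None),
            cmd => Err(format!("Improper command: {}", cmd)),
        },
        // perform command arg ...
        3 => match (stream[1].as_str(), stream[2].as_str()) {
            ("update", arg) if arg != "system" && arg != "client" => {
                Err(format!("Invalid argument for command update: {}", arg))
            }
            _ => Ok(None),
        },
        _ => Ok(None),
    }
}

/// Hypothetical driver: asks the parser what to expect next until it returns `Ok(None)`,
/// then returns the stream of symbols that was matched.
fn drive(tokens: &[&str]) -> Result<Vec<String>, String> {
    let first = tokens.first().ok_or("empty input")?;
    let mut stream = vec![first.to_string()];

    while let Some(expected) = perform_parser(&stream)? {
        let token = tokens
            .get(stream.len())
            .ok_or_else(|| format!("expected {}, found end of input", expected))?;

        match expected.as_str() {
            // An identifier is recorded under its own name.
            "$ident$" => stream.push(token.to_string()),
            // Expressions and blocks are recorded as their marker.
            "$expr$" | "$block$" => stream.push(expected.clone()),
            // Any other symbol must match the input literally.
            literal if literal == *token => stream.push(token.to_string()),
            literal => return Err(format!("expected {}, found {}", literal, token)),
        }
    }

    Ok(stream)
}
```

With this model, `drive(&["perform", "action", "42"])` ends with the stream `["perform", "action", "$expr$"]`,
while `drive(&["perform", "dance"])` stops at the second symbol with the error `Improper command: dance`.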