Strings and Characters
=====================
{{#include ../links.md}}

Strings in Rhai contain any text sequence of valid Unicode characters.
Internally, strings are stored in UTF-8 encoding.

Strings can be built up from other strings and types via the `+` operator
(provided by the [`MoreStringPackage`][packages] but excluded if using a [raw `Engine`]).
This is particularly useful when printing output.

Calling [`type_of()`] on a string returns `"string"`.

The maximum allowed length of a string can be controlled via `Engine::set_max_string_size`
(see [maximum length of strings]).

The `ImmutableString` Type
-------------------------
All strings in Rhai are implemented as `ImmutableString` (see [standard types]).

An `ImmutableString` does not change and can be shared.

Modifying an `ImmutableString` causes it first to be cloned, and the modification is then made to the copy.
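The sharing and copy-on-modify behaviour described above can be approximated in plain Rust with an `Rc<String>`. This is only an analogy under stated assumptions, not Rhai's actual `ImmutableString` source: `append_shared` is a hypothetical helper, and `Rc::make_mut` copies the buffer only when other handles still point to it, which is the case in this example.

```rust
use std::rc::Rc;

/// Hypothetical helper: appends `suffix` to a shared string.
/// `Rc::make_mut` clones the inner `String` first if it is shared,
/// so other handles keep seeing the old value.
fn append_shared(s: &mut Rc<String>, suffix: &str) {
    Rc::make_mut(s).push_str(suffix);
}

fn main() {
    let original = Rc::new(String::from("hello"));

    // Cloning the handle is cheap: both point at the same buffer.
    let mut copy = Rc::clone(&original);
    assert!(Rc::ptr_eq(&original, &copy));

    // Modifying the shared copy forces a clone, then edits the clone.
    append_shared(&mut copy, ", world");

    assert_eq!(original.as_str(), "hello");
    assert_eq!(copy.as_str(), "hello, world");
    assert!(!Rc::ptr_eq(&original, &copy));
}
```

Because `original` still holds a reference when `append_shared` runs, the edit lands on a fresh copy and `original` is left untouched.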
### **IMPORTANT** - Avoid `String` Parameters
`ImmutableString` should be used in place of `String` for function parameters because using
`String` is very inefficient (the `String` argument is cloned during every call).
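The cost difference can be sketched in plain Rust, assuming the shared string lives behind an `Rc<String>` (an analogy only; Rhai's function-registration machinery is not shown, and both function names are hypothetical). A `String` parameter can only be satisfied by cloning the whole buffer, while a `&str` parameter borrows it in place.

```rust
use std::rc::Rc;

/// Hypothetical function taking an owned `String`: a caller holding a
/// shared string must clone the whole buffer to call it.
fn count_chars_owned(s: String) -> usize {
    s.chars().count()
}

/// Hypothetical function taking `&str`: it simply borrows the shared buffer.
fn count_chars_borrowed(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let shared: Rc<String> = Rc::new(String::from("Bob C. Davis"));

    // Allocates a brand-new String just to satisfy the parameter type.
    let owned = count_chars_owned((*shared).clone());

    // Deref coercion (&Rc<String> -> &String -> &str): no copy is made.
    let borrowed = count_chars_borrowed(&shared);

    assert_eq!(owned, borrowed);
    assert_eq!(Rc::strong_count(&shared), 1); // still a single shared handle
}
```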
An alternative is to use `&str`, which maps straight to `ImmutableString`.

String and Character Literals
----------------------------
String and character literals follow C-style formatting, with support for Unicode ('`\u`_xxxx_' or '`\U`_xxxxxxxx_')
and hex ('`\x`_xx_') escape sequences.

Hex sequences map to ASCII characters, while '`\u`' maps to 16-bit code points (the Basic Multilingual Plane)
and '`\U`' can express any Unicode code point using a 32-bit hex value.
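The mapping from escape digits to characters can be sketched in Rust as follows. `decode_hex_escape` is a hypothetical helper for illustration, not part of Rhai's API or its actual tokenizer: it checks the digit count required by each escape kind and treats the hex value as a Unicode code point, so `\x58`, `\u2764` and `\U0001F600` become `X`, `❤` and `😀`.

```rust
/// Hypothetical helper: converts the hex digits of a `\x`, `\u` or `\U`
/// escape into a `char`, checking the digit count for each escape kind.
fn decode_hex_escape(kind: char, digits: &str) -> Option<char> {
    let expected_len = match kind {
        'x' => 2, // \xHH
        'u' => 4, // \uHHHH
        'U' => 8, // \UHHHHHHHH
        _ => return None,
    };
    if digits.len() != expected_len || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let code = u32::from_str_radix(digits, 16).ok()?;
    // `char::from_u32` rejects surrogates and values above U+10FFFF.
    char::from_u32(code)
}

fn main() {
    assert_eq!(decode_hex_escape('x', "58"), Some('X'));
    assert_eq!(decode_hex_escape('u', "2764"), Some('\u{2764}'));
    assert_eq!(decode_hex_escape('U', "0001F600"), Some('\u{1F600}'));
    assert_eq!(decode_hex_escape('u', "D800"), None); // lone surrogate
}
```

Eight hex digits can encode values far beyond U+10FFFF, so in this sketch the final `char::from_u32` check is what rejects out-of-range `\U` escapes.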
Standard escape sequences:
| Escape sequence | Meaning |
| --------------- | ------------------------------ |
| `\\` | back-slash `\` |
| `\t` | tab |
| `\r` | carriage-return `CR` |
| `\n` | line-feed `LF` |
| `\"` | double-quote `"` in strings |
| `\'` | single-quote `'` in characters |
| `\x`_xx_ | Unicode in 2-digit hex |
| `\u`_xxxx_ | Unicode in 4-digit hex |
| `\U`_xxxxxxxx_ | Unicode in 8-digit hex |

Differences from Rust Strings
----------------------------
Internally, Rhai strings are stored as UTF-8 just like Rust (they _are_ Rust `String`s!),
but there are nevertheless major differences.

In Rhai, a string is the same as an array of Unicode characters and can be directly indexed (unlike Rust).
This is similar to most other languages, where strings are internally represented not as UTF-8 but as arrays of
multi-byte Unicode characters.
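In Rust terms, indexing a Rhai string by position corresponds roughly to `str::chars().nth()`, which counts characters rather than bytes. The `char_at` helper below is a hypothetical name used only for illustration; because UTF-8 characters vary in width, this sketch has to walk the string from the start, so each lookup is linear in its length.

```rust
/// Hypothetical helper: returns the character at position `index`,
/// counting Unicode scalar values rather than bytes.
fn char_at(s: &str, index: usize) -> Option<char> {
    // Walks from the start, since UTF-8 characters have variable width.
    s.chars().nth(index)
}

fn main() {
    let s = "\u{2764} Bob"; // '❤' followed by " Bob"

    // In Rust, `s[2]` does not compile; slicing uses byte offsets instead.
    assert_eq!(s.len(), 7);               // '❤' takes 3 bytes in UTF-8
    assert_eq!(s.chars().count(), 5);     // but it is a single character
    assert_eq!(char_at(s, 2), Some('B')); // character index 2 ...
    assert_eq!(&s[4..5], "B");            // ... is byte offset 4
}
```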
Individual characters within a Rhai string can also be replaced just as if the string is an array of Unicode characters.
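The following Rust sketch shows what such a replacement involves underneath. Replacing one character can change the number of bytes it occupies and shift everything after it, so the simplest approach rebuilds the string character by character. `replace_char_at` is a hypothetical helper, not Rhai's implementation.

```rust
/// Hypothetical helper: returns a copy of `s` with the character at `index`
/// replaced by `ch`, or `None` if `index` is out of bounds.
/// Indices count characters, not bytes.
fn replace_char_at(s: &str, index: usize, ch: char) -> Option<String> {
    if index >= s.chars().count() {
        return None;
    }
    Some(
        s.chars()
            .enumerate()
            .map(|(i, c)| if i == index { ch } else { c })
            .collect(),
    )
}

fn main() {
    let record = "Bob C. Davis";
    assert_eq!(replace_char_at(record, 4, 'X').unwrap(), "Bob X. Davis");

    // Swapping a 1-byte character for a 3-byte one changes the byte length.
    let hearts = replace_char_at("a-b", 1, '\u{2764}').unwrap();
    assert_eq!(hearts.len(), 5);
    assert_eq!(hearts.chars().count(), 3);
}
```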
In Rhai, there are also no separate concepts of `String` and `&str` as in Rust.

Examples
--------
```rust
let name = "Bob";
let middle_initial = 'C';
let last = "Davis";

let full_name = name + " " + middle_initial + ". " + last;
full_name == "Bob C. Davis";

// String building with different types
let age = 42;
let record = full_name + ": age " + age;
record == "Bob C. Davis: age 42";

// Unlike Rust, Rhai strings can be indexed to get a character
// (disabled with 'no_index')
let c = record[4];
c == 'C';

// Assumes 'ts' is an object of a registered custom type
// that has a string property 's'
ts.s = record;                      // custom type properties can take strings

let c = ts.s[4];
c == 'C';

let c = "foo"[0];                   // indexing also works on string literals...
c == 'f';

let c = ("foo" + "bar")[5];         // ... and expressions returning strings
c == 'r';

// Escape sequences in strings
record += " \u2764\n";              // escape sequence of '❤' in Unicode
record == "Bob C. Davis: age 42 ❤\n";   // '\n' = new-line

// Unlike Rust, Rhai strings can be directly modified character-by-character
// (disabled with 'no_index')
record[4] = '\x58';                 // 0x58 = 'X'
record == "Bob X. Davis: age 42 ❤\n";

// Use 'in' to test if a substring (or character) exists in a string
"Davis" in record == true;
'X' in record == true;
'C' in record == false;

// Strings can be iterated with a 'for' statement, yielding characters
for ch in record {
    print(ch);
}
```