Characters and Bytes
Zig does not have a separate character type for ordinary strings.
A string is a sequence of bytes.
const message = "hello";
The bytes are:
104 101 108 108 111
These are the ASCII byte values for h, e, l, l, and o.
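To confirm that the literal really is just these five bytes, a small sketch can compare it with a hand-written u8 array using std.mem.eql. The constant name `expected` is only illustrative, and the sketch assumes a recent Zig (0.11 or later).

```zig
const std = @import("std");

// The five byte values listed above, written out by hand for comparison.
const expected = [_]u8{ 104, 101, 108, 108, 111 };

pub fn main() void {
    const message = "hello";
    // std.mem.eql compares two slices element by element.
    const same = std.mem.eql(u8, message, &expected);
    const answer: []const u8 = if (same) "yes" else "no";
    std.debug.print("same bytes: {s}\n", .{answer});
}
```

Running it should print `same bytes: yes`.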
A byte has type u8.
const c: u8 = 'A';
The value of c is 65.
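A character literal such as 'A' is an integer: its value is the character's Unicode code point, and its type is comptime_int. The value of an ASCII character fits in a u8, so the literal can be assigned to a u8 directly.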
const a: u8 = 'a';
const newline: u8 = '\n';
const tab: u8 = '\t';
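A short sketch of what that means in practice: the literal on its own is a compile-time integer, it becomes a u8 once assigned, and ordinary integer arithmetic works on it. This assumes a recent Zig (0.11 or later).

```zig
const std = @import("std");

pub fn main() void {
    // On its own, a character literal is a compile-time integer.
    comptime std.debug.assert(@TypeOf('A') == comptime_int);

    // Assigning it to a u8 stores the value 65 as a byte.
    const c: u8 = 'A';
    std.debug.print("{d}\n", .{c});

    // Arithmetic works on the numeric value: 65 + 1 is 66, which is 'B'.
    const next: u8 = c + 1;
    std.debug.print("{c}\n", .{next});
}
```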
Escape sequences are used for bytes that are hard to write directly.
| Escape | Meaning |
|---|---|
| `\n` | newline |
| `\t` | tab |
| `\r` | carriage return |
| `\\` | backslash |
| `\"` | double quote |
| `\'` | single quote |
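Each escape sequence in the table stands for exactly one byte. A minimal sketch that puts all six into one string and prints the resulting byte values (assuming a recent Zig):

```zig
const std = @import("std");

pub fn main() void {
    // Six escape sequences, six bytes: \n \t \r \\ \" \'
    const escapes = "\n\t\r\\\"\'";
    std.debug.print("len = {d}\n", .{escapes.len});
    for (escapes) |b| {
        std.debug.print("{d} ", .{b});
    }
    std.debug.print("\n", .{});
}
```

The expected output is `len = 6` followed by `10 9 13 92 34 39`.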
A string literal may contain escape sequences.
const text = "one\ntwo\n";
This contains two newline bytes.
Printing it gives:
one
two
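Because the escapes become ordinary bytes, the newlines in text can be counted like any other byte. The helper `countNewlines` below is hypothetical, written only for this illustration.

```zig
const std = @import("std");

// Hypothetical helper: counts how many bytes in s equal '\n'.
fn countNewlines(s: []const u8) usize {
    var n: usize = 0;
    for (s) |b| {
        if (b == '\n') n += 1;
    }
    return n;
}

pub fn main() void {
    const text = "one\ntwo\n";
    // "one" (3) + '\n' (1) + "two" (3) + '\n' (1) = 8 bytes.
    std.debug.print("{d} bytes, {d} newlines\n", .{ text.len, countNewlines(text) });
}
```

This should print `8 bytes, 2 newlines`.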
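A string literal is a constant pointer to a sentinel-terminated byte array: "zig" has type *const [3:0]u8, meaning three bytes followed by a terminating 0 byte. In ordinary code it is usually used as a slice of constant bytes, which it coerces to automatically.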
const name: []const u8 = "zig";
Read this as: name is a slice of constant u8 values.
The elements cannot be changed through name.
name[0] = 'Z'; // error
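The literal's exact type and its coercion to a slice can be checked in a few lines. This is a sketch assuming a recent Zig; note that the slice length does not include the 0 sentinel.

```zig
const std = @import("std");

pub fn main() void {
    // The literal itself: pointer to a 3-byte array terminated by a 0 byte.
    comptime std.debug.assert(@TypeOf("zig") == *const [3:0]u8);

    // It coerces to a slice of constant bytes; the sentinel is not counted.
    const name: []const u8 = "zig";
    std.debug.print("{s} has {d} bytes\n", .{ name, name.len });
}
```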
A mutable byte array can be changed.
var name = [_]u8{ 'z', 'i', 'g' };
name[0] = 'Z';
Now the array contains:
Zig
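Writing every byte by hand is not the only way to get a mutable array. One common approach, shown here as a sketch for a recent Zig, is to dereference a string literal with `.*`, which copies its bytes into a new array that the program owns.

```zig
const std = @import("std");

pub fn main() void {
    // Dereferencing the literal copies its bytes into a mutable [3:0]u8 array.
    var name = "zig".*;
    name[0] = 'Z';
    std.debug.print("{s}\n", .{&name});
}
```

The literal itself is left unchanged; only the copy in name is modified, and the program should print `Zig`.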
A string is not the same thing as text in the full human sense. Zig strings are bytes. Text encoding is a separate matter.
Most modern text uses UTF-8. UTF-8 stores some characters in one byte and others in several bytes.
const s = "é";
This looks like one character, but in UTF-8 it uses two bytes.
195 169
So this is not a good way to count human characters:
const s = "é";
const n = s.len; // 2
len counts bytes, not Unicode characters.
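To see which two bytes those are, and how they differ from the character literal 'é', a sketch can print both. It assumes the source file is saved as UTF-8 with é as the single precomposed code point U+00E9 (an é typed as e plus a combining accent would give different numbers).

```zig
const std = @import("std");

pub fn main() void {
    // The string literal stores the UTF-8 encoding of é: two bytes.
    const s = "é";
    for (s) |b| {
        std.debug.print("{d} ", .{b});
    }
    std.debug.print("\n", .{});

    // The character literal is the code point U+00E9 instead.
    const cp: u21 = 'é';
    std.debug.print("{d}\n", .{cp});
}
```

The first line should show `195 169` and the second `233`. Even though 233 happens to fit in a u8, it is not one of the bytes stored in the string.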
For ASCII text, each byte is exactly one character, although some ASCII characters, such as newline and tab, are not visible.
const s = "abc";
Here s.len is 3.
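Whether len can be trusted as a character count therefore depends on the bytes being ASCII, which means every byte is below 128. The helper `isAsciiText` below is a hypothetical sketch of that check.

```zig
const std = @import("std");

// Hypothetical helper: true when every byte is below 0x80 (plain ASCII).
fn isAsciiText(s: []const u8) bool {
    for (s) |b| {
        if (b >= 0x80) return false;
    }
    return true;
}

pub fn main() void {
    const samples = [_][]const u8{ "abc", "é" };
    for (samples) |s| {
        // Only for ASCII text is s.len also the character count.
        const kind: []const u8 = if (isAsciiText(s)) "ASCII" else "not ASCII";
        std.debug.print("{s}: len {d}, {s}\n", .{ s, s.len, kind });
    }
}
```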
For UTF-8 text, byte length and character count may differ.
const s = "hello 世界";
The visible text has 8 characters, but s.len is 12, because 世 and 界 each take three bytes in UTF-8.
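The standard library's std.unicode module can count UTF-8 code points instead of bytes. A minimal sketch, assuming a recent Zig; note that a code point is not always what a reader sees as one character (an é written as e plus a combining accent is two code points), and counting those user-perceived characters is a separate, more involved problem.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "hello 世界";
    // Counts UTF-8 code points; returns an error if s is not valid UTF-8.
    const count = try std.unicode.utf8CountCodepoints(s);
    std.debug.print("bytes: {d}, code points: {d}\n", .{ s.len, count });
}
```

This should print `bytes: 12, code points: 8`.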
This is deliberate. Zig keeps the low-level representation clear. A string is bytes. If the program needs Unicode rules, it must use code that understands Unicode.
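One example of such code is std.unicode.Utf8View, which validates a byte slice as UTF-8 and then walks it one code point at a time. The sketch below assumes a recent Zig; each step yields the bytes of one code point, so the ASCII letters come out as one byte and 世 and 界 as three bytes each.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "hello 世界";
    // Utf8View.init rejects invalid UTF-8; iterator() walks code points.
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    while (it.nextCodepointSlice()) |slice| {
        std.debug.print("{s} ({d} bytes)\n", .{ slice, slice.len });
    }
}
```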
A byte can be printed as a character with {c}.
const std = @import("std");
pub fn main() void {
const c: u8 = 'A';
std.debug.print("{c}\n", .{c});
}
The output is:
A
The same byte can be printed as a number with {d}.
std.debug.print("{d}\n", .{c});
The output is:
65
This is often useful when inspecting data.
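Besides {c} and {d}, the formatter can also show an integer in hexadecimal with {x} and in binary with {b}, which helps when looking at raw data. A small sketch:

```zig
const std = @import("std");

pub fn main() void {
    const c: u8 = 'A';
    // The same byte as a character, in decimal, in hexadecimal, and in binary.
    std.debug.print("{c} {d} {x} {b}\n", .{ c, c, c, c });
}
```

The expected output is `A 65 41 1000001`.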
A simple loop over a string visits bytes.
const std = @import("std");
pub fn main() void {
const s = "abc";
for (s) |b| {
std.debug.print("{c} {d}\n", .{ b, b });
}
}
The output is:
a 97
b 98
c 99
Each value b has type u8.
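A loop can also change the bytes it visits, provided it iterates over mutable storage and captures each element by pointer. This sketch assumes Zig 0.11 or later (for the `for (&buf) |*b|` form) and uses std.ascii.toUpper, which only changes the ASCII letters a to z.

```zig
const std = @import("std");

pub fn main() void {
    var buf = "abc".*; // mutable copy of the literal's bytes
    // Capturing |*b| gives a pointer to each byte, so the loop can overwrite it.
    for (&buf) |*b| {
        b.* = std.ascii.toUpper(b.*);
    }
    std.debug.print("{s}\n", .{&buf});
}
```

This should print `ABC`.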
Use u8 when you mean a byte. Use []const u8 when you mean a read-only byte string. Treat Unicode as an encoding problem, not as a hidden language feature.
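As a sketch of these conventions in a function signature, the hypothetical helper `firstByte` below accepts a read-only byte string and returns a single byte, or null for an empty string. String literals and byte arrays can both be passed, because each coerces to []const u8.

```zig
const std = @import("std");

// Hypothetical helper: takes a read-only byte string and returns one byte,
// or null when the string is empty.
fn firstByte(s: []const u8) ?u8 {
    if (s.len == 0) return null;
    return s[0];
}

pub fn main() void {
    const word = [_]u8{ 'b', 'y', 't', 'e' };
    // A literal, a byte array, and an empty literal all coerce to []const u8.
    const inputs = [_][]const u8{ "zig", &word, "" };
    for (inputs) |s| {
        if (firstByte(s)) |b| {
            std.debug.print("{s} starts with {c}\n", .{ s, b });
        } else {
            std.debug.print("(empty)\n", .{});
        }
    }
}
```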
Exercises:
- Declare a u8 with value 'A' and print it with {c} and {d}.
- Write a string containing a newline and print it.
- Create a mutable array containing the bytes for cat, then change it to bat.
- Loop over "zig" and print each byte as both a character and a number.
- Check the .len of "é" and explain why it is not 1.