Should String Elements and UTF-8 Elements be handled differently?

EBML [differentiates](https://www.rfc-editor.org/rfc/rfc8794.html#name-string-element) between String Elements and UTF-8 Elements.

They are almost identical, except for the allowed content:
- "String" only allows printable ASCII (0x20 to 0x7E) and terminator (null) bytes.
- "UTF-8" only allows UTF-8 and terminator (null) bytes.

Since ASCII is a subset of UTF-8, we can treat both Element types as UTF-8 and every valid String Element will still parse. This is what the library currently does:
https://github.com/rust-av/matroska/blob/7f15b7c9c6e7d2ad0749da053b77f051c5e4e146/src/ebml/parse.rs#L80-L84

However, this is technically not spec-compliant: for a String Element we should reject any value containing bytes other than ASCII or terminator bytes, while those same bytes may be perfectly valid in a UTF-8 Element.
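To make the gap concrete, here is a minimal sketch using only the standard library. `is_ebml_string` and `is_valid_utf8` are hypothetical helper names, not part of the crate, and the byte range check is my reading of the RFC (printable ASCII plus null bytes). It deliberately ignores where the null bytes appear.

```rust
/// Hypothetical helper: true if every byte is printable ASCII (0x20..=0x7E)
/// or a null byte used as terminator/padding.
fn is_ebml_string(data: &[u8]) -> bool {
    data.iter().all(|&b| b == 0x00 || (0x20..=0x7E).contains(&b))
}

/// Hypothetical helper: the check that "treat everything as UTF-8" boils down to.
fn is_valid_utf8(data: &[u8]) -> bool {
    std::str::from_utf8(data).is_ok()
}
```

With these, the bytes of `"é"` pass `is_valid_utf8` but fail `is_ebml_string`, which is exactly the case a String Element should reject but the UTF-8 path accepts.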

A small overview:

| | String = UTF-8 | String != UTF-8 |
| - | - | - |
| Rust types | only `String` | `String` + new type |
| Validation | `from_utf8` (std) | `from_utf8` (std) + `from_ascii` (custom, maybe faster?) |
| Spec compliance | mostly compliant | compliant |
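For reference, the right-hand column could look roughly like the sketch below. All names here (`AsciiString`, `StringError`, `strip_terminator`, `parse_utf8`) are made up for illustration and do not exist in the crate. Truncating at the first null byte is also an assumption on my part about how terminators should be handled, not a description of the current code.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum StringError {
    NotAscii,
    NotUtf8,
}

/// Hypothetical new type for EBML String Elements (ASCII only).
#[derive(Debug, PartialEq, Eq)]
struct AsciiString(String);

/// Drop the first null byte and everything after it (assumed terminator handling).
fn strip_terminator(data: &[u8]) -> &[u8] {
    match data.iter().position(|&b| b == 0) {
        Some(end) => &data[..end],
        None => data,
    }
}

impl AsciiString {
    /// Custom `from_ascii`: accept only printable ASCII before the terminator.
    fn from_ascii(data: &[u8]) -> Result<Self, StringError> {
        let payload = strip_terminator(data);
        if !payload.iter().all(|b| (0x20..=0x7E).contains(b)) {
            return Err(StringError::NotAscii);
        }
        // ASCII is a subset of UTF-8, so this conversion is not expected to fail.
        String::from_utf8(payload.to_vec())
            .map(AsciiString)
            .map_err(|_| StringError::NotAscii)
    }
}

/// UTF-8 Elements keep using the std validation.
fn parse_utf8(data: &[u8]) -> Result<String, StringError> {
    String::from_utf8(strip_terminator(data).to_vec()).map_err(|_| StringError::NotUtf8)
}
```

The new type still wraps a `String`, so the only thing gained is the stricter check at parse time.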

Honestly, I can't think of many advantages to implementing this. We won't even "save" any memory, because Rust strings are always UTF-8 internally (so a `String` made up of only ASCII uses one byte per character regardless). But I wanted to at least put this information somewhere. What do you think?


Current implementation (from the permalink above):

```rust
impl<'a> EbmlParsable<'a> for String {
    fn try_parse(data: &'a [u8]) -> Result<Self, ErrorKind> {
        String::from_utf8(data.to_vec()).map_err(|_| ErrorKind::StringNotUtf8)
    }
}
```
