Add quoted_strings to FormatOptions
#9221
    .unwrap()
    .to_string();

assert!(table.contains("| hello"));
Could we assert the whole table instead of doing these contain calls, to make it easier to see what the full output looks like?
Maybe use insta's inline snapshot tests?
I already checked yesterday: so far Arrow does not use insta, but it could be very helpful for these cases if the maintainers agree to pull it in. For now, I looked into using raw string literals.
The problem is with escaping. The current implementation (that is, prior to this PR) already seems to misrender tables that contain escaped characters, because the length calculation is off (the following is without my changes):
let schema = Arc::new(Schema::new(vec![Field::new(
    "strings",
    DataType::Utf8,
    true,
)]));
let string_array = StringArray::from(vec![
    Some("hello"),
    Some("world"),
    Some(""),
    Some("tab\there"),
    Some("newline\ntest"),
    Some("quote\"test"),
    Some("backslash\\test"),
    None,
]);
let batch = RecordBatch::try_new(schema, vec![Arc::new(string_array)]).unwrap();
let options = FormatOptions::new().with_null("NULL");
let table = pretty_format_batches_with_options(&[batch], &options)
    .unwrap()
    .to_string();
let expected = r#"+-------------------+
| strings |
+-------------------+
| hello |
| world |
| |
| tab\there |
| newline\ntest |
| quote\"test |
| backslash\\test |
| NULL |
+-------------------+"#;
assert_eq!(expected, table, "Actual result:\n{table}");
I hope to be able to look into this later this week.
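A side note on comparing against raw string literals, as in the snippet above: a raw literal does not process escape sequences, so `tab\there` inside `r#"..."#` is not the same string as `"tab\there"` in a regular literal. The sketch below uses only the standard library, and `width_in_chars` is a hypothetical helper name. It only illustrates that difference and does not by itself explain the width discrepancy described above.

```rust
/// Hypothetical helper: number of Unicode scalar values in a string.
/// This is only a rough stand-in for a table's column-width calculation.
fn width_in_chars(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Regular literal: `\t` is processed into a single tab character.
    let cooked = "tab\there";
    // Raw literal: the backslash and the `t` stay two separate characters.
    let raw = r#"tab\there"#;

    assert_ne!(cooked, raw);
    println!("cooked: {} chars", width_in_chars(cooked));
    println!("raw:    {} chars", width_in_chars(raw));

    // Debug formatting escapes the real tab again, so the result can be
    // compared against a raw literal that includes the surrounding quotes.
    assert_eq!(format!("{cooked:?}"), r#""tab\there""#);
}
```

An expected table written as a raw literal therefore matches output whose escapes have already been rendered as backslash sequences, not output that contains the actual control characters.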
It would be nice if we could introduce insta, whether in this PR or in followups; I would be happy to review such PRs. It looks like we do have insta in arrow-schema too: #8424
Rationale for this change
In some cases, it is desirable to print strings with surrounding quotation marks. A typical example that we run into in https://github.com/rerun-io/rerun is a StructArray that contains empty strings:
Current formatting:
Added option in this PR:
What changes are included in this PR?
This PR relies on std::fmt::Debug to do the actual formatting of strings, which means that all escaping is handled out of the box.
Are these changes tested?
This PR contains tests for different types of inputs, including escape sequences. It also tests the StructArray example outlined above.
Are there any user-facing changes?
By default this option is false, making the feature opt-in.
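To illustrate the std::fmt::Debug approach mentioned under "What changes are included in this PR?", here is a minimal standalone sketch using only the standard library. The `render` helper, its parameters, and the choice to leave nulls unquoted are assumptions made for illustration. This is not the code added by the PR, nor the arrow-rs API.

```rust
/// Hypothetical helper (not part of arrow-rs): render an optional string
/// either verbatim or quoted and escaped via `std::fmt::Debug`.
/// Assumption: the null placeholder is never quoted.
fn render(value: Option<&str>, quoted: bool, null: &str) -> String {
    match value {
        // `{:?}` on a `&str` adds surrounding double quotes and escapes
        // tabs, newlines, double quotes, and backslashes.
        Some(s) if quoted => format!("{s:?}"),
        Some(s) => s.to_string(),
        None => null.to_string(),
    }
}

fn main() {
    let values = [
        Some("hello"),
        Some(""),
        Some("tab\there"),
        Some("newline\ntest"),
        Some("quote\"test"),
        Some("backslash\\test"),
        None,
    ];
    for value in values {
        // With quoting enabled, an empty string prints as "" and stays visible.
        println!("{}", render(value, true, "NULL"));
    }
}
```

With quoting enabled, an empty string is rendered as a pair of quotation marks and can no longer be mistaken for a blank cell, which is the motivation given in the rationale.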