@grtlr grtlr commented Jan 19, 2026

Rationale for this change

In some cases, it is desirable to print strings with surrounding quotation marks. A typical example that we run into in https://github.com/rerun-io/rerun is a StructArray that contains empty strings:

Current formatting:

{name: }

Added option in this PR:

{name: ""}

What changes are included in this PR?

This PR relies on std::fmt::Debug to do the actual formatting of strings, which means that all escaping is handled out of the box.
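
As a rough illustration of what relying on std::fmt::Debug buys, here is a minimal standalone sketch. The helper name `quote` is hypothetical and is not taken from this PR; it only shows how the standard `{:?}` formatter quotes and escapes a `&str`.

```rust
/// Hypothetical helper (not the PR's actual code): renders a string
/// through `std::fmt::Debug`, which adds surrounding double quotes and
/// escapes characters such as tabs, newlines, quotes, and backslashes.
fn quote(value: &str) -> String {
    format!("{:?}", value)
}
```

With this, `quote("")` yields `""` (two quote characters) instead of an empty cell, and `quote("tab\there")` yields `"tab\there"` with a literal backslash followed by `t`.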

Are these changes tested?

This PR contains tests for different types of input, including escape sequences. It also tests the StructArray example outlined above.

Are there any user-facing changes?

By default this option is false, making the feature opt-in.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 19, 2026
@grtlr grtlr force-pushed the grtlr/quoted-strings branch from 42b5c25 to 3bba1f3 Compare January 19, 2026 19:05
.unwrap()
.to_string();

assert!(table.contains("| hello"));
Contributor

Could we assert the whole table instead of doing these contain calls, to make it easier to see what the full output looks like?

Contributor

Maybe use insta's inline snapshot tests?

Contributor Author

@grtlr grtlr Jan 28, 2026

I already checked yesterday: so far Arrow does not use insta, but it could be very helpful for cases like this if the maintainers agree to pull it in. For now, I looked into using raw string literals.

The problem is escaping. The current implementation (prior to this PR) already seems to misrender tables that contain escaped characters: the length calculation is off (the following is without my changes):

        let schema = Arc::new(Schema::new(vec![Field::new(
            "strings",
            DataType::Utf8,
            true,
        )]));

        let string_array = StringArray::from(vec![
            Some("hello"),
            Some("world"),
            Some(""),
            Some("tab\there"),
            Some("newline\ntest"),
            Some("quote\"test"),
            Some("backslash\\test"),
            None,
        ]);

        let batch = RecordBatch::try_new(schema, vec![Arc::new(string_array)]).unwrap();

        let options = FormatOptions::new().with_null("NULL");
        let table = pretty_format_batches_with_options(&[batch], &options)
            .unwrap()
            .to_string();

        let expected = r#"+-------------------+
| strings           |
+-------------------+
| hello             |
| world             |
|                   |
| tab\there         |
| newline\ntest     |
| quote\"test      |
| backslash\\test  |
| NULL              |
+-------------------+"#;

        assert_eq!(expected, table, "Actual result:\n{table}");

I hope to be able to look into this later this week.
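
To make the width concern concrete, here is a small standalone sketch. It is not arrow's table-rendering code, and both helper names are hypothetical; it only assumes that a cell is padded by counting characters. A raw tab counts as one character when padding but is escaped to two (`\t`) in a snapshot, so measuring the unescaped string and comparing against the escaped one gives different widths.

```rust
/// Hypothetical helper: left-aligns `content` in a cell `width` characters wide.
/// Rust's formatter pads by counting `char`s, so a raw tab counts as one.
fn pad_cell(content: &str, width: usize) -> String {
    format!("| {:<width$} |", content, width = width)
}

/// Hypothetical helper: escapes control characters before padding, so the
/// string that is measured is the same string that is printed.
fn pad_cell_escaped(content: &str, width: usize) -> String {
    pad_cell(&content.escape_debug().to_string(), width)
}
```

With a width of 9, `pad_cell("tab\there", 9)` measures 8 characters and adds one space of padding, while `pad_cell_escaped("tab\there", 9)` measures the 9-character escaped form and adds none, so only the second lines up with a row such as `hello` when the output is compared as escaped text.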

Contributor

It would be nice if we could introduce insta, whether in this PR or in follow-ups; I would be happy to review such PRs. It looks like we already have insta in arrow-schema: #8424
