
ENH: Include df.attrs in to_csv output #53577

@canthonyscott

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

There are many use cases (especially in the scientific community) where the best or only course of action is to embed configuration parameters and/or other metadata at the beginning of the CSV file itself. These lines are typically prefaced with a comment-indicating prefix such as #. This maintains human readability while attaching the metadata to the generated file itself.

Pandas' read_csv method already implements a feature to read such files and ignore these lines when parsing the data into a DataFrame. This new feature would implement the complement of that behavior: it would allow users to write these metadata and/or comment lines in their CSV outputs as well.

This could be accomplished with file handles (thanks @twoertwein):

with open("test.csv", mode="wt") as handle:
    handle.write(comments)
    dataframe.to_csv(handle)

However, adding a comment parameter to to_csv would better match the read_csv method.
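For reference, the file-handle workaround and the existing read_csv behavior can be combined into a round trip. This is a minimal sketch: the DataFrame contents and the comment text are made up for illustration, and an in-memory io.StringIO buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Illustrative data and metadata; none of these values come from the issue.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
comments = "# instrument: hypothetical-sensor\n# gain: 2\n"

# Write the comment lines first, then let to_csv append the table.
buffer = io.StringIO()
buffer.write(comments)
df.to_csv(buffer, index=False)

# read_csv skips lines that start with the comment character.
buffer.seek(0)
round_tripped = pd.read_csv(buffer, comment="#")
```

The comment lines survive in the written text, while read_csv recovers only the tabular data.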

Feature Description

A new method would be implemented to write comment lines using the csv writer:

def _save_comment_lines(self) -> None:
    if self.comment_lines:
        for line in self.comment_lines:
            self.writer.writerow([f"{self.comment}" + line])

This could then be called in the _save method

def _save(self) -> None:
    if self.comment:  # Addition here
        self._save_comment_lines()  # Addition here
    if self._need_to_save_header:
        self._save_header()
    self._save_body()
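Outside of pandas internals, the proposed behavior can be approximated with a standalone helper. The sketch below is only an illustration: write_with_comments and its parameters are hypothetical names, not part of pandas. It also surfaces one consequence of routing comments through a csv writer: with the default quoting, a comment containing the delimiter is wrapped in quotes, so the line no longer starts with the comment character.

```python
import csv
import io

import pandas as pd


def write_with_comments(df, handle, comment="#", comment_lines=()):
    """Hypothetical helper: write comment rows, then the DataFrame as CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    for line in comment_lines:
        # Each comment is written as a single-field row, as in the proposal.
        writer.writerow([f"{comment}{line}"])
    df.to_csv(handle, index=False)


buf = io.StringIO()
write_with_comments(
    pd.DataFrame({"x": [1]}),
    buf,
    comment_lines=[" units: mm", " note: a,b"],
)
lines = buf.getvalue().splitlines()
# The second comment contains a comma, so the csv writer quotes the field.
```

If quoting of comment lines is undesirable, writing them directly to the handle instead of through the csv writer would avoid it.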

Alternative Solutions

Technically, the file-handle approach mentioned above would satisfy this feature request. However, a comment parameter that mirrors the read_csv API would likely be easier for users to discover.

An alternative, more complex, but perhaps more flexible solution could be to store the comment lines in the DataFrame object itself, with a flag to write those comment lines automatically when to_csv is called. This would guarantee that the comments are written even in systems where the mechanism that writes the DataFrame to disk is abstracted away from the user's code, for example when the pandas/Python code is run by a job submission or scheduling system.
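One way to picture this alternative is with the existing DataFrame.attrs dictionary, which the issue title refers to. The sketch below is an assumption about how such a feature might look from the outside: to_csv_with_attrs is a hypothetical helper, the "# key: value" layout is an arbitrary choice, and pandas itself does not write attrs to CSV.

```python
import io

import pandas as pd


def to_csv_with_attrs(df, handle, comment="#"):
    """Hypothetical helper: emit df.attrs as comment lines, then the CSV."""
    for key, value in df.attrs.items():
        # Assumes keys and values render on a single line each.
        handle.write(f"{comment} {key}: {value}\n")
    df.to_csv(handle, index=False)


df = pd.DataFrame({"t": [0, 1], "v": [0.1, 0.2]})
# Metadata travels with the DataFrame, so code that only sees the
# object at write time can still emit it.
df.attrs["sample_rate_hz"] = 100
df.attrs["operator"] = "hypothetical"

buf = io.StringIO()
to_csv_with_attrs(df, buf)
lines = buf.getvalue().splitlines()
```

Values that contain newlines would break the one-line-per-entry layout, so a real implementation would need to escape or reject them.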

Additional Context

I was a little excited and already created a PR for this feature: #53569

Apologies! I should have started here first. I am happy to close or modify it as needed.
