Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
There are many use cases (especially in the scientific community) where the best or only course of action is to embed configuration parameters and/or other metadata at the beginning of the CSV file itself. These lines are typically prefixed with a comment marker such as #. This keeps the file human-readable while attaching the metadata to the generated file itself.
Pandas' read_csv method already supports reading such files and ignoring these lines when parsing the data into a DataFrame. This new feature would implement the complement: it would allow users to write these metadata and/or comment lines in their CSV output as well.
This can already be accomplished with file handles (thanks @twoertwein):
    with open("test.csv", mode="wt") as handle:
        # comments: the "#"-prefixed lines, each ending with a newline
        handle.write(comments)
        dataframe.to_csv(handle)
However, adding a comment parameter to to_csv would better mirror the read_csv method.
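As a rough illustration of the round trip described above, the sketch below writes comment lines through a handle and reads the result back with read_csv and its existing comment parameter. The sample data, the comment text, and the use of an in-memory io.StringIO buffer instead of a file on disk are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample data and metadata, used only for illustration.
df = pd.DataFrame({"x": [1, 2], "y": [3.5, 4.5]})
comment_lines = ["# instrument: hypothetical-sensor", "# units: x=count y=volt"]

buffer = io.StringIO()
# Write the comment lines first, then let pandas append the CSV body.
buffer.write("\n".join(comment_lines) + "\n")
df.to_csv(buffer, index=False)
text = buffer.getvalue()

# Read it back; lines starting with "#" are skipped by read_csv.
buffer.seek(0)
round_trip = pd.read_csv(buffer, comment="#")
```

Here round_trip should hold the same columns and values as df, while text still carries the metadata lines at the top.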
Feature Description
A new method would be implemented to write comment lines using the csv writer:
    def _save_comment_lines(self) -> None:
        if self.comment_lines:
            for line in self.comment_lines:
                self.writer.writerow([f"{self.comment}{line}"])
This could then be called in the _save method:
    def _save(self) -> None:
        if self.comment:  # Addition here
            self._save_comment_lines()  # Addition here
        if self._need_to_save_header:
            self._save_header()
        self._save_body()
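To see what the writerow call in _save_comment_lines would emit, here is a standalone sketch using only the standard-library csv module. It is not the pandas implementation: write_with_comments and its arguments are hypothetical names, and the writer uses default quoting with a fixed "\n" line terminator.

```python
import csv
import io


def write_with_comments(handle, comment, comment_lines, header, rows):
    """Hypothetical helper mirroring the proposed _save order:
    comment lines first, then the header, then the body."""
    writer = csv.writer(handle, lineterminator="\n")
    if comment:
        for line in comment_lines:
            # Each comment line is written as a single-field row.
            writer.writerow([f"{comment}{line}"])
    writer.writerow(header)
    writer.writerows(rows)


buffer = io.StringIO()
write_with_comments(
    buffer,
    comment="#",
    comment_lines=[" run: 7", " units: m, s"],
    header=["a", "b"],
    rows=[[1, 2]],
)
output = buffer.getvalue()
```

With default quoting, the first comment line comes out as `# run: 7`, but the second contains the delimiter and is therefore written as `"# units: m, s"`, wrapped in quotes. A reader that looks for the comment prefix at the start of the line may not recognise such a line, so writing comment lines directly to the handle, or restricting their content, might be worth considering.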
Alternative Solutions
Technically, the file handle approach mentioned above would satisfy this feature request. However, the feature would be easier for users to discover if it mirrored the read_csv API.
An alternative, more complex, but perhaps more flexible solution would be to store the comment lines in the DataFrame object itself, with a flag to automatically write those comment lines when to_csv is called. This way the comments would be guaranteed to be written whenever to_csv is called, even in systems where the mechanism that writes the DataFrame to disk is abstracted away from the user's code. This occurs, for example, when the pandas/Python code is run by a job submission/scheduling system.
Additional Context
I was a little excited and already created a PR for this feature: #53569
Apologies! I should have started here first. I am happy to close or modify it as needed.