Which Python data structure is best to grow dynamically for my use case?

I am writing a tool to analyse test reports and create a summary report of the data values they contain.
From several reports in this format (could be thousands of those csv files in the end)…

TEST CASE;PARAMETER;MIN;MAX;VALUE;UNIT;RESULT
TestRun;Serial;;;000059;;
TestRun;Date;;;20200220;;
Test Run;Start Time;;;132547;;
TestCase1;Param1;0;100;92;mV;Pass
TestCase1;Param2;0;100;0;mV;Pass

TEST CASE;PARAMETER;MIN;MAX;VALUE;UNIT;RESULT
TestRun;Serial;;;000060;;
TestRun;Date;;;20200220;;
Test Run;Start Time;;;132722;;
TestCase2;Param1;0;100;130;mV;Fail
TestCase2;Param2;0;100;12;mV;Pass

TEST CASE;PARAMETER;MIN;MAX;VALUE;UNIT;RESULT
TestRun;Serial;;;000061;;
TestRun;Date;;;20200220;;
Test Run;Start Time;;;132921;;
TestCase1;Param1;0;100;93;mV;Pass
TestCase1;Param2;0;100;1;mV;Pass
TestCase2;Param1;0;100;131;mV;Fail
TestCase2;Param2;0;100;13;mV;Pass

…my code should create just one summary in this format with one line per processed report:

TestRun_Serial;TestRun_Date;TestRun_StartTime;TestCase1_Param1;TestCase1_Param2;TestCase2_Param1;TestCase2_Param2
000059;20200220;132547;92;0;na;na
000060;20200220;132722;na;na;130;12
000061;20200220;132921;93;1;131;13

One important thing to know is that neither the test case names nor the param names are fixed. This means that when iterating over the csv files I will come across new test case names and new param names that were not part of any previously processed report. So each time I encounter a new test case/param combination, I would like to extend the data structure with an additional ‘column’.
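
To illustrate what I mean by an additional ‘column’, here is a minimal sketch using plain Python only. It assumes each report has already been reduced to (test case, param, value) tuples; the function name `collect_rows` and the sample input are made up for illustration and are not part of my actual code.

```python
def collect_rows(reports):
    """Turn parsed reports into row dicts plus the ordered list of columns seen."""
    columns = []   # column names in first-seen order
    seen = set()   # fast membership test for column names
    rows = []      # one dict per processed report
    for entries in reports:
        row = {}
        for test_case, param, value in entries:
            # Strip spaces so "Test Run" / "Start Time" becomes "TestRun_StartTime".
            key = f"{test_case}_{param}".replace(" ", "")
            row[key] = value
            if key not in seen:
                # A new test case/param combination adds a new 'column'.
                seen.add(key)
                columns.append(key)
        rows.append(row)
    return rows, columns


# Hypothetical, already-parsed input: one list of tuples per report.
reports = [
    [("TestRun", "Serial", "000059"), ("TestCase1", "Param1", "92")],
    [("TestRun", "Serial", "000060"), ("TestCase2", "Param1", "130")],
]
rows, columns = collect_rows(reports)
```

Each row only stores the combinations that actually occurred in its report, so the missing ones would still have to be filled with `na` when writing the summary.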

In my current code I’m reading the report files into pandas DataFrames to access the needed values for each report. My main question now: what is a suitable data structure to collect the data in and write it to a file at the end? I was thinking of another pandas DataFrame, but learned that growing a DataFrame dynamically is a bad idea from a performance perspective.
So what would be the preferable approach here instead? Is there something like a dictionary with several values for the same key?
Here is the relevant snippet of my current code. How should I continue?

import csv
import glob

import pandas as pd

# report_summary_path and pathname_pattern are assumed to be defined elsewhere
with open(report_summary_path, "w", newline='') as summary_file:
    summary_writer = csv.writer(summary_file, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    for report_file in glob.glob(pathname_pattern):
        with open(report_file) as current_report:
            # read from each report file into pandas DataFrame
            df_in = pd.read_csv(current_report, delimiter=';', header=0,
                                names=['TEST CASE', 'PARAMETER', 'MIN', 'MAX', 'VALUE', 'UNIT', 'RESULT'],
                                index_col=['TEST CASE', 'PARAMETER'], usecols=['TEST CASE', 'PARAMETER', 'VALUE'])

            # group by both index levels so that equally named params
            # of different test cases are kept apart
            for idx, data in df_in.groupby(level=['TEST CASE', 'PARAMETER']):
                test_case = f'{data.index.values[0][0]}'
                test_param = f'{data.index.values[0][1]}'
                data_value = data.values[0][0]
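
One idea I am considering for the writing step, as a rough sketch only: collect one dict per report and let `csv.DictWriter` fill the gaps via its `restval` argument. The function name `write_summary`, the sample rows, and the in-memory buffer that stands in for the summary file are assumptions for illustration.

```python
import csv
import io


def write_summary(rows, columns, out_file):
    """Write one line per report; missing test case/param combinations become 'na'."""
    writer = csv.DictWriter(out_file, fieldnames=columns, delimiter=";",
                            restval="na", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows, one dict per processed report.
rows = [
    {"TestRun_Serial": "000059", "TestCase1_Param1": "92"},
    {"TestRun_Serial": "000060", "TestCase2_Param1": "130"},
]
columns = ["TestRun_Serial", "TestCase1_Param1", "TestCase2_Param1"]

buffer = io.StringIO()  # stands in for the real summary file
write_summary(rows, columns, buffer)
summary_text = buffer.getvalue()
```

The full list of columns has to be known before the header is written, so all reports would need to be read first and the summary written only at the end.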

Author: SeBASStian