I have a pandas DataFrame that looks a bit like this:
```
   id         name         date        col1      col2       total
0  123456748  EXAMPLENAME  2020-09-01  SOMEDATA  MOREDATA    5.99
1  123456748  EXAMPLENAME  2020-09-01  SOMEDATA  OTHERDATA  25.99
```
There are 15 columns; the name values are associated with the ID, and the rest is data relevant to that person.
col2 would have about 400 unique values. The database would start at about 300,000,000 rows and then grow by about 500,000 records per week.
The records in the database will never be updated or deleted; only new ones will be added. The final purpose of the database is a web app in which the user selects a person's ID, the database retrieves that person's information, and the website renders a graph and a dataframe. The expected traffic for the website is very low, so I was thinking of using SQLite.
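To make the access pattern concrete, this is roughly the only query the web app would run (a minimal sketch using SQLite's built-in Python driver; the table and column names just mirror the example above and are not final):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # placeholder; the real app would open a file
conn.execute(
    """
    CREATE TABLE records (
        id INTEGER, name TEXT, date TEXT,
        col1 TEXT, col2 TEXT, total REAL
    )
    """
)
# An index on id is what would make point lookups fast on ~300M rows.
conn.execute("CREATE INDEX idx_records_id ON records (id)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
    [
        (123456748, "EXAMPLENAME", "2020-09-01", "SOMEDATA", "MOREDATA", 5.99),
        (123456748, "EXAMPLENAME", "2020-09-01", "SOMEDATA", "OTHERDATA", 25.99),
    ],
)

# The lookup the web app would run for a selected person:
rows = conn.execute(
    "SELECT date, col1, col2, total FROM records WHERE id = ?",
    (123456748,),
).fetchall()
print(len(rows))  # 2
```

The result set would then be fed straight into a DataFrame for the graph.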
Based on that, I have two questions:
- Should I use a relational database, such as PostgreSQL or SQLite, or should I try MongoDB? I'm interested in the performance of the database when selecting and retrieving data; I don't care much about insert time, since inserts will be infrequent (once per week).
- If a relational database, should I keep all the data in one table, or should I split (normalize) it, judging purely by query performance? I've read that normalizing a database whose only purpose is to store and query data can lead to worse performance than keeping everything in one table. However, I don't know much about databases and would appreciate an expert opinion, or resources to learn more about correct implementation and maintenance.
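For reference, the split I had in mind would look something like the sketch below (hypothetical table names; the name is stored once per person instead of on every row, and the original wide rows are recovered with a join):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # placeholder; the real app would open a file
# Normalized layout: person attributes in one table, per-record data in another.
conn.executescript(
    """
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE records (
        person_id INTEGER REFERENCES people (id),
        date TEXT, col1 TEXT, col2 TEXT, total REAL
    );
    CREATE INDEX idx_records_person ON records (person_id);
    """
)
conn.execute("INSERT INTO people VALUES (123456748, 'EXAMPLENAME')")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        (123456748, "2020-09-01", "SOMEDATA", "MOREDATA", 5.99),
        (123456748, "2020-09-01", "SOMEDATA", "OTHERDATA", 25.99),
    ],
)

# The denormalized view is recovered with a join on the selected ID:
rows = conn.execute(
    """
    SELECT p.name, r.date, r.col2, r.total
    FROM records r JOIN people p ON p.id = r.person_id
    WHERE p.id = ?
    """,
    (123456748,),
).fetchall()
print(rows[0][0])  # EXAMPLENAME
```

My worry is whether the join cost at 300M rows outweighs the storage saved by not repeating the name.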
Author: Jose Vega