Read-only database: Normalize or not for best query performance

I have a pandas DataFrame that looks a bit like this:

         id        name       date     col1     col2  total
0 123456748 EXAMPLENAME 2020-09-01 SOMEDATA MOREDATA   5.99
1 123456748 EXAMPLENAME 2020-09-01 SOMEDATA OTHERDATA 25.99

There are 15 columns; the name values are associated with the ID, and the rest is data relevant to that person. col2 has about 400 unique values. The database would start at about 300,000,000 rows and grow by roughly 500,000 records per week.

The records in the database will never be updated or deleted, only new ones will be added. Final purpose of the database is to have a web app in which the user can select the ID of the person. The database would retrieve the information, and the website would render a graph and a dataframe. The expected traffic of the website is very low, so I was thinking about using SQLite.

Based on that, I have two questions:

  1. Should I use a relational database, like PostgreSQL or SQLite, or should I try MongoDB? I’m interested in the performance of the database when selecting and retrieving data; I don’t care too much about insert time, as inserts won’t happen very often (once per week).
  2. If a relational database is the better choice, should I keep all the data in one table or split it up (normalize it) for the best query performance? I have read that normalizing a database whose only purpose is to store and query data can lead to worse performance than keeping everything in one table. However, I do not know much about databases and would appreciate an expert opinion, or resources to learn more about correct implementation and maintenance.
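For what it's worth, both layouts can be sketched in a few lines of SQLite. This is a minimal, hypothetical sketch (table and index names are illustrative, not a recommendation): option A keeps everything in one wide table with an index on `id`; option B normalizes out the `name`, which depends only on the `id`, into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Option A: one wide table, indexed on the lookup column.
cur.execute("""
    CREATE TABLE records_flat (
        id    INTEGER,
        name  TEXT,
        date  TEXT,
        col1  TEXT,
        col2  TEXT,
        total REAL
    )
""")
cur.execute("CREATE INDEX idx_flat_id ON records_flat (id)")

# Option B: normalized -- name depends only on id, so it moves to its
# own table and is joined back at query time.
cur.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""
    CREATE TABLE records (
        person_id INTEGER REFERENCES persons(id),
        date TEXT, col1 TEXT, col2 TEXT, total REAL
    )
""")
cur.execute("CREATE INDEX idx_records_person ON records (person_id)")

cur.execute("INSERT INTO records_flat VALUES "
            "(123456748, 'EXAMPLENAME', '2020-09-01', 'SOMEDATA', 'MOREDATA', 5.99)")
cur.execute("INSERT INTO persons VALUES (123456748, 'EXAMPLENAME')")
cur.execute("INSERT INTO records VALUES "
            "(123456748, '2020-09-01', 'SOMEDATA', 'MOREDATA', 5.99)")

# Either layout answers the web app's query with an index lookup on id.
flat = cur.execute(
    "SELECT * FROM records_flat WHERE id = ?", (123456748,)).fetchall()
joined = cur.execute("""
    SELECT p.id, p.name, r.date, r.col1, r.col2, r.total
    FROM records r JOIN persons p ON p.id = r.person_id
    WHERE p.id = ?
""", (123456748,)).fetchall()
print(flat == joined)  # same rows either way
```

Both layouts serve the per-ID lookup from an index, so the read-path difference mainly comes down to the join cost versus the storage saved by not repeating the name on every row.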

Thanks.

Go to Source
Author: Jose Vega

What is the best database design for storing a survey form with different types of questions and answer formats, where branching is possible?

I would like to store the format of a survey form that can branch into different questions based on previous answers.

Questions can be video, audio, or text, and answers can be text, multiple choice, video, audio, geolocation, etc. Based on the answer to a question, branching to a different question should be possible. It should also be possible for a user to fill in the form over multiple sessions, so some state needs to be stored. As a result, answers can be missing both because of branching and because a response is incomplete. Fast filtering and analysis of the database is needed, and it should be possible to export all the responses for a particular form to a CSV file. What would be the best implementation for this problem?
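One common shape for this kind of problem is to store answers row-wise (entity-attribute-value style) rather than one column per question, so a skipped or not-yet-reached question simply has no row instead of a NULL column. The sketch below is purely illustrative (all table and column names are hypothetical): questions carry a type, branching is a separate table mapping (question, answer) to the next question, and responses carry a completion flag for multi-session fill-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE questions (
        question_id   INTEGER PRIMARY KEY,
        form_id       INTEGER NOT NULL,
        question_type TEXT NOT NULL,   -- 'text', 'video', 'audio', ...
        body          TEXT NOT NULL
    );
    -- Branching: which question follows, given a particular answer.
    CREATE TABLE branches (
        question_id      INTEGER REFERENCES questions(question_id),
        answer_value     TEXT,
        next_question_id INTEGER REFERENCES questions(question_id)
    );
    CREATE TABLE responses (
        response_id INTEGER PRIMARY KEY,
        form_id     INTEGER NOT NULL,
        completed   INTEGER NOT NULL DEFAULT 0  -- resumable sessions
    );
    -- One row per answered question; unanswered = no row.
    CREATE TABLE answers (
        response_id INTEGER REFERENCES responses(response_id),
        question_id INTEGER REFERENCES questions(question_id),
        answer_type TEXT NOT NULL,     -- 'text', 'choice', 'geo', ...
        value       TEXT               -- or a pointer to media storage
    );
    CREATE INDEX idx_answers ON answers (response_id, question_id);
""")

cur.execute("INSERT INTO questions VALUES (1, 10, 'text', 'Do you like surveys?')")
cur.execute("INSERT INTO questions VALUES (2, 10, 'text', 'Why not?')")
cur.execute("INSERT INTO branches VALUES (1, 'no', 2)")
cur.execute("INSERT INTO responses VALUES (100, 10, 0)")  # in progress
cur.execute("INSERT INTO answers VALUES (100, 1, 'choice', 'no')")

# Resume a session: find the next question given the last answer.
nxt = cur.execute("""
    SELECT b.next_question_id
    FROM answers a
    JOIN branches b ON b.question_id = a.question_id
                   AND b.answer_value = a.value
    WHERE a.response_id = ?
""", (100,)).fetchall()
print(nxt)
```

CSV export then becomes a pivot of the `answers` table (one output column per question of the form), which tools like pandas handle directly; whether this layout filters fast enough at scale depends on indexing and is worth benchmarking.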

Go to Source
Author: Shrey Paharia

SQL Server Slowest Query is NULL

I am looking at both the SQL Server expensive queries report and the query below, and both show a mysterious NULL query as the slowest query on my server.

Is there any way I can find out more about this NULL query and why it might be so slow?

Is this some internal query? It doesn’t seem like this should be showing up in the report if so.


This is the query which is also showing NULL as the slowest query on my server:

select 
    r.session_id,
    r.status,
    r.command,
    r.cpu_time,
    r.total_elapsed_time,
    t.text
from 
    sys.dm_exec_requests as r
cross apply 
    sys.dm_exec_sql_text(r.sql_handle) as t


How can I find out what this query is and why it’s so slow?

Go to Source
Author: user1477388

MySQL InnoDB Weird Query Performance

I designed two tables, both using InnoDB.

The first table has columns “Offset”, “Size”, and “ColumnToBeIndexed”, with the BIGINT “Offset” as the primary key. The second has columns “OffsetAndSize” and “ColumnToBeIndexed”, with the BIGINT “OffsetAndSize” as the primary key. Each table has 15 million rows.

I added an index on “ColumnToBeIndexed” to both tables.

My two queries for them are

SELECT Offset, Size 
FROM someTable 
WHERE ColumnToBeIndexed BETWEEN 20 AND 10000 
ORDER BY Offset ASC

and

SELECT OffsetAndSize 
FROM someTable 
WHERE ColumnToBeIndexed BETWEEN 20 AND 10000 
ORDER BY OffsetAndSize ASC

Because the second query can be satisfied from the secondary index alone and does not need to look up the clustered index to find the “Size” information, I expected the second query on the second table to perform better than the first query on the first table. However, in my tests the first query performed better every single time.
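The covering-index expectation itself can be checked on a small scale. The sketch below uses SQLite rather than InnoDB (the table names are illustrative, and `ORDER BY` is omitted to keep the plans minimal), but the storage layout is analogous: rows are clustered on the integer primary key, and secondary index entries carry that key, so `EXPLAIN QUERY PLAN` shows when a query is answered from the secondary index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table 1: Offset is the PK; Size lives only in the clustered row,
# so the query must look it up after the index range scan.
cur.execute("CREATE TABLE t1 (Offset INTEGER PRIMARY KEY, "
            "Size INTEGER, ColumnToBeIndexed INTEGER)")
cur.execute("CREATE INDEX idx1 ON t1 (ColumnToBeIndexed)")

# Table 2: the PK is the only other column the query needs, and the
# secondary index already contains it.
cur.execute("CREATE TABLE t2 (OffsetAndSize INTEGER PRIMARY KEY, "
            "ColumnToBeIndexed INTEGER)")
cur.execute("CREATE INDEX idx2 ON t2 (ColumnToBeIndexed)")

# The plan detail string is the last column of each EXPLAIN row.
plan1 = " ".join(row[-1] for row in cur.execute(
    "EXPLAIN QUERY PLAN SELECT Offset, Size FROM t1 "
    "WHERE ColumnToBeIndexed BETWEEN 20 AND 10000"))
plan2 = " ".join(row[-1] for row in cur.execute(
    "EXPLAIN QUERY PLAN SELECT OffsetAndSize FROM t2 "
    "WHERE ColumnToBeIndexed BETWEEN 20 AND 10000"))

print(plan1)  # index search plus a lookup into the table for Size
print(plan2)  # "COVERING INDEX": answered from idx2 alone
```

So the second query's plan really should be cheaper per row; whether that holds on the real 15-million-row tables also depends on factors the plan does not show, such as the sort for `ORDER BY` and how selective the range is, which is where `EXPLAIN` on the actual MySQL tables would help.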

Does anybody out there know what seems to be the problem?

Go to Source
Author: Bruce