How to model database for system with historic relationships (many-to-many and one-to-many)

I’m working on a system with a main entity, let’s say “Members”. These members have relationships to other entities, and it’s important to keep the history so that we can query the relationships as they were at any given point in time.

We mainly have two kinds of relationships:

  • Company (must exist, and a member can be in only one company at a given time)
  • Tags (optional, multiple tags at any given time)

After looking at this, I was thinking that I could model the relationships with dates on the “relation table”, something like this:

Tag:
- Id
- Text

Company:
- Id 
- Name

Member:
- Id
- Name

Member2Tag:
- Id
- MemberId
- TagId
- Added (DateTime)
- Removed (Nullable DateTime)

Member2Company:
- Id
- MemberId
- CompanyId
- Added (DateTime)
- Removed (Nullable DateTime)

This lets me query the relationships at any given time in history, but I started thinking about it and felt a little unsure whether the “Removed” column is really needed. I guess it is, since I can’t delete the relation row (I need to keep the history).
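For example, a point-in-time lookup of each member’s company could look like this (a sketch against the schema above; @AsOf is a placeholder parameter):

    -- Which company did each member belong to at time @AsOf?
    -- A row is "current" at @AsOf when Added <= @AsOf and it has not been removed yet.
    SELECT m.Id AS MemberId, m.Name AS MemberName, c.Name AS CompanyName
    FROM Member AS m
    JOIN Member2Company AS mc
      ON mc.MemberId = m.Id
     AND mc.Added <= @AsOf
     AND (mc.Removed IS NULL OR mc.Removed > @AsOf)
    JOIN Company AS c
      ON c.Id = mc.CompanyId;

The same Added/Removed filter works for Member2Tag.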

I just wanted some feedback on the design. What do you think? Has anyone else faced a similar problem and used another solution?

All the best!

Go to Source
Author: Markus Knappen Johansson

Read-only database: Normalize or not for best query performance

I have a pandas DataFrame that looks a bit like this:

         id        name       date     col1     col2  total
0 123456748 EXAMPLENAME 2020-09-01 SOMEDATA MOREDATA   5.99
1 123456748 EXAMPLENAME 2020-09-01 SOMEDATA OTHERDATA 25.99

There are 15 columns, the name values are associated with the ID, and the rest is some data relevant for that person. col2 would have about 400 unique values. The database would be about 300,000,000 rows to start with, and then will grow at about 500,000 records per week.

The records in the database will never be updated or deleted, only new ones will be added. Final purpose of the database is to have a web app in which the user can select the ID of the person. The database would retrieve the information, and the website would render a graph and a dataframe. The expected traffic of the website is very low, so I was thinking about using SQLite.

Based on that, I have two questions:

  1. Should I use a relational database, like PostgreSQL or SQLite, or should I try MongoDB? I’m interested in the performance of the database for selecting and retrieving the data; I don’t care too much about insert time, as inserts won’t happen very often (once per week).
  2. Based on query performance, in case you recommend a relational database: should I keep all the data in one table, or should I split it up (normalize it)? I read that normalizing a database whose only purpose is to store and query data could lead to worse performance than keeping everything in one table (the single-table option is sketched below). However, I do not know much about databases and would prefer an expert opinion, or resources to learn more about the correct implementation and maintenance.
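For what it’s worth, a sketch of the single-table option in SQLite (the table name records and the index name are assumptions; the columns come from the DataFrame above, with the remaining ones omitted):

    -- One wide table, never updated, only appended to.
    CREATE TABLE records (
        id    INTEGER NOT NULL,
        name  TEXT,
        date  TEXT,
        col1  TEXT,
        col2  TEXT,
        total REAL
    );

    -- The web app looks rows up by person id, so an index on id matters most here.
    CREATE INDEX idx_records_id ON records (id);

    -- Typical lookup for the web app:
    SELECT * FROM records WHERE id = 123456748;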

Thanks.

Go to Source
Author: Jose Vega

Multiple intersection tables vs multiple joins

I have a hierarchical relationship between my tables, with the children having foreign keys referring back to their parent ids (assuming id is the primary key for each table):

Department has many Category Groups
Category Group has many Category(-ies)
Category has many Sub-Category(-ies)
Sub-Category has many Attributes.

Now, all these entities except Attributes are optional in my hierarchical cascading-dropdown UI. If I don’t select anything, I need to display the Attributes that belong to all Departments; if I only select a Department, I need to display the Attributes that belong to all Category Groups in that Department, and so on.

Obviously, one option to implement it is to do an inner join between all the tables to get to Attribute. For instance, if nothing is selected it will be:
Department inner join Category Group
inner join Category
inner join Sub-Category
inner join Attribute
to show all the attributes belonging to all departments.
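A sketch of that join chain (the foreign-key column names are assumptions; each child table is assumed to carry its parent’s id):

    -- All attributes across all departments, via the full hierarchy.
    SELECT a.*
    FROM Department AS d
    INNER JOIN CategoryGroup AS cg ON cg.DepartmentId   = d.Id
    INNER JOIN Category      AS c  ON c.CategoryGroupId = cg.Id
    INNER JOIN SubCategory   AS sc ON sc.CategoryId     = c.Id
    INNER JOIN Attribute     AS a  ON a.SubCategoryId   = sc.Id;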

The other thought in my head is to have intersection/relation mapping table(s) –
DepartmentAttributeRelation which has foreign keys to Department and Attribute,
CategoryGroupAttributeRelation which has foreign keys to CategoryGroup and Attribute and so on.
This will enable direct search to get to the Attributes given any entity.
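And a sketch of the second approach’s mapping tables (table names as described above, column names assumed):

    CREATE TABLE DepartmentAttributeRelation (
        DepartmentId INT NOT NULL REFERENCES Department(Id),
        AttributeId  INT NOT NULL REFERENCES Attribute(Id),
        PRIMARY KEY (DepartmentId, AttributeId)
    );

    CREATE TABLE CategoryGroupAttributeRelation (
        CategoryGroupId INT NOT NULL REFERENCES CategoryGroup(Id),
        AttributeId     INT NOT NULL REFERENCES Attribute(Id),
        PRIMARY KEY (CategoryGroupId, AttributeId)
    );

    -- ...and similarly for Category and Sub-Category, so any selected level
    -- can be joined straight to Attribute.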

My question is – Are there any downsides to the second approach above or are there any better approaches to solve this?

Go to Source
Author: linuxNoob

Restaurant reservations – Tables combinations

I have 3 tables in a reservation system for a restaurant. So far the software has been used only by the restaurant’s staff, but we now want to accept reservations online as well. We have some small tables for 2 that can easily be pushed together to make room for bigger parties, and I want to accept reservations automatically if all of the tables that can be combined are available.

tables: holds all tables for each area in the restaurant.

| id | min_capacity | max_capacity | name | area   |
|----|--------------|--------------|------|--------|
| 1  | 2            | 4            | #1   | Inside |
| 2  | 6            | 8            | #2   | Inside |

reservations: holds reservation details

| id | datetime            | name     | status   |
|----|---------------------|----------|----------|
| 1  | 2020-09-01 20:00:00 | John Doe | Upcoming |
| 2  | 2020-09-05 13:00:00 | Jane Doe | Upcoming |

And one pivot table that holds reservation <=> table relation:

| id | table_id | reservation_id |
|----|----------|----------------|
| 1  | 1        | 1              |
| 2  | 2        | 2              |
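With the schema above, a check for free tables at a requested time might look like this (a sketch; the pivot table name and the ‘Upcoming’ status value are taken from the examples, and overlap/dining-duration handling is left out):

    -- Tables that have no upcoming reservation at the requested datetime.
    SELECT t.*
    FROM tables AS t
    WHERE t.id NOT IN (
        SELECT rt.table_id
        FROM reservation_table AS rt          -- the pivot table shown above (name assumed)
        JOIN reservations AS r ON r.id = rt.reservation_id
        WHERE r.status = 'Upcoming'
          AND r.datetime = '2020-09-05 13:00:00'
    );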

How can I store different combinations of tables (entered manually) and “attach” reservations to tables/table combinations (so I can check whether tables are available for a specific time) efficiently?

Go to Source
Author: Clarissa

What is the best database design for storing a survey form with different types of questions and answer formats, where branching is possible?

I would like to store the format of a survey form which can branch into different questions based on the answers given.
Questions can be video, audio, or text, and answers can be text, multiple choice, video, audio, geolocation, etc. Branching into different questions based on the answer to a question should be possible. It should also be possible for the user to fill in the form over multiple sessions, so some state has to be kept. As a result, answers can be missing both because of branching and because a response is incomplete. Fast filtering and analysis of the database is needed, and it should be possible to extract all the responses for a particular form into a CSV file. What would be the best implementation for this problem?

Go to Source
Author: Shrey Paharia

SQL Server Unique Constraint on two columns with an exception

Hi all and thanks for your advice.

Expense(SupplierID (foreign key), DocumentID (varchar))

I understand how to add a simple unique constraint on two columns. However, if DocumentID = ‘NA’, I would like to ignore the rules of the constraint.
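For reference, the plain two-column constraint is the first statement below; the second is a sketch of one possible direction, a filtered unique index (which SQL Server supports) instead of the constraint, so that rows with DocumentID = 'NA' are not checked for uniqueness:

    -- Simple unique constraint over both columns (names from the Expense definition above).
    ALTER TABLE Expense
        ADD CONSTRAINT UQ_Expense_Supplier_Document UNIQUE (SupplierID, DocumentID);

    -- Alternative sketch: a filtered unique index in place of the constraint above,
    -- so duplicates are allowed whenever DocumentID = 'NA'.
    CREATE UNIQUE NONCLUSTERED INDEX UX_Expense_Supplier_Document
        ON Expense (SupplierID, DocumentID)
        WHERE DocumentID <> 'NA';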

Some suppliers in our system do not provide an invoice id, for example. Therefore, I leave the field NULL. I would like to remove all NULLs from the ‘DocumentID’ field to avoid having to account for NULLs in my client code.

I am new to SQL Server, but I could figure out how to do this using a trigger. The reason I’m asking here is to see if there is a better way to respond to this scenario or to avoid it by a different design.

Thanks!

Go to Source
Author: Tom Schreiner

Scaling out MySQL & Redundancy-Speed tradeoff?

I’m building an e-commerce service for a group of sellers. They have a common HQ that manufactures their products.

Tables:

  1. order (id, seller_id, timestamp)
  2. order_products (order_id, product_id, seller_id, timestamp, pincode)
  3. transaction (id, seller_id, timestamp)
  4. transaction_products (transaction_id, product_id, seller_id, timestamp, pincode)
  5. seller (id, pincode, name)
  6. product(id, price)

Specifications:

  1. There are 100 sellers
  2. Each seller performs 500 transactions per day
  3. Each transaction has 4 products associated with it
  4. Each seller places two orders per day with HQ
  5. Each order has 50 products

HQ Requirements:

  1. How many products were sold by which seller in a given month
  2. How many products were sold in a given pincode in a given month
  3. Orders placed by all sellers in a given month

Seller Requirements:

  1. View cost of order placed by him/her (the seller)
  2. View his/her sales of a given month

The product is ready and the application works just fine. But I’m concerned about two things.

  1. Scaling: Being really new to this, I don’t know much about scaling out, sharding, or clustering. For how long can I set these concerns aside?
  2. Redundancy: As you can see in transaction_products & order_products, I’ve reused columns from transaction & order, respectively. The redundant columns are: timestamp, seller_id, pincode. My idea was to avoid joins. But I’m not sure whether joins would be more expensive than the current redundancy (both variants are sketched below). Can anyone point me in the right direction?
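For concreteness, HQ requirement 1 (“products sold by each seller in a given month”) written both ways, as a sketch against the tables above (MySQL-style; the exact meaning of the columns is assumed):

    -- Using the redundant seller_id/timestamp copied onto transaction_products (no join):
    SELECT seller_id, COUNT(*) AS products_sold
    FROM transaction_products
    WHERE `timestamp` >= '2020-09-01' AND `timestamp` < '2020-10-01'
    GROUP BY seller_id;

    -- The normalized alternative: take seller_id and timestamp from `transaction` via a join.
    SELECT t.seller_id, COUNT(*) AS products_sold
    FROM transaction_products AS tp
    JOIN `transaction` AS t ON t.id = tp.transaction_id
    WHERE t.`timestamp` >= '2020-09-01' AND t.`timestamp` < '2020-10-01'
    GROUP BY t.seller_id;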

Go to Source
Author: Koushik Shom Choudhury

Common columns in all tables in MySQL

I want to create a table, base_table, with the columns below:
id, created_at, created_by.

For all other tables, I want the created_at and created_by columns to be available through inheritance.
I don’t want to have to define these common columns in every other table.
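MySQL has no PostgreSQL-style table inheritance, but CREATE TABLE ... LIKE copies a table’s column definitions; a sketch of that workaround (the customer table and the column types are hypothetical):

    CREATE TABLE base_table (
        id         BIGINT AUTO_INCREMENT PRIMARY KEY,
        created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
        created_by VARCHAR(64) NOT NULL
    );

    -- Each new table starts from the same definition, then adds its own columns.
    CREATE TABLE customer LIKE base_table;
    ALTER TABLE customer ADD COLUMN name VARCHAR(255) NOT NULL;

This only copies the definition once, though; later changes to base_table do not propagate to tables created from it.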

Go to Source
Author: zeeshank1

Finding Minimal Cover Set

Given a description of a database, I’m asked to find suitable FDs, then find the minimal cover set of all the FDs, and then design a relational schema.

My question is:
Say I came up with these FDs, FD = {FD_1, …, FD_n}.
Now, when finding the minimal cover set of FD, should I consider the (PK, FK) dependencies along with them?
That is to say, I have foreign-key FDs such as FK = {FK_1, …, FK_m}.
When finding the minimal cover set, should I work on (FD ⋃ FK) or just on the FD set, and why?

Go to Source
Author: Moosmir

Best approach for extending a foreign relationship for an existing table

I’m not sure the title accurately reflects my question. I have an existing Rate table that has an identity column key and contains a [Rate] column. col1 + col2 + col3 don’t uniquely identify a row:

+--------+------+------+------+------+
| RateId | col1 | col2 | col3 | Rate |
+--------+------+------+------+------+

I have a new table that needs to match a rate. The new table has 3 of the columns required to make a match, but those columns alone match a number of rows in the Rate table. To uniquely identify a rate for the new table, I also need to match on MaterialTypeId and UnitTypeId, where UnitTypeId can be NULL.

+------------+------+------+------+----------------+------------+
| MaterialId | col1 | col2 | col3 | MaterialTypeId | UnitTypeId |
+------------+------+------+------+----------------+------------+

What is the best approach to resolving this? I could add the two additional columns to the Rate table but that would not be relevant for the current uses of the Rate table, i.e. the two additional columns would be null for all existing rows.

I could introduce an intermediate table and join on MaterialTypeId & UnitTypeId, which would return multiple rows, and then join using the existing columns to uniquely identify the rate:

+--------+----------------+------------+
| RateId | MaterialTypeId | UnitTypeId |
+--------+----------------+------------+
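A sketch of that two-step lookup, assuming the new table is called Material and the intermediate table RateMapping (both names hypothetical), with NULL-safe matching on UnitTypeId:

    -- For each row of the new table, narrow by MaterialTypeId/UnitTypeId via the
    -- intermediate table, then pin down the single rate with the existing columns.
    SELECT m.MaterialId, r.RateId, r.Rate
    FROM Material AS m
    JOIN RateMapping AS rm
      ON rm.MaterialTypeId = m.MaterialTypeId
     AND (rm.UnitTypeId = m.UnitTypeId
          OR (rm.UnitTypeId IS NULL AND m.UnitTypeId IS NULL))
    JOIN Rate AS r
      ON r.RateId = rm.RateId
     AND r.col1 = m.col1
     AND r.col2 = m.col2
     AND r.col3 = m.col3;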

Is that the correct approach?

Go to Source
Author: David Clarke

When placing an order in a shop, should a snapshot be taken of the products in the order as well?

When placing an order in a shop, should a snapshot be taken of the products in the order as well?

…to prevent data updates, such as changes to the prices or titles of the products placed in an order, from messing things up.

Namely, a customer buys 3 items, for $5, $10 and $33, and pays for them. All is well. Then I, as the owner of the shop, step in and, before I’ve delivered those products to the customer, decide to decrease the price of, say, product #2: its new price will be $8 instead of $10. I’ll also rename product #3 a little bit. The customer then goes to the status-tracking page and sees the updated data. Yes, they’ve paid and the order is being delivered, but the data has already changed in the database. They’ll be confused.

And so will I be, a year from now, after I’ve changed the prices again, renamed something, etc… and decide to view the order history for that year.

Question 1: is there a practice of making a snapshot of the current prices, names, total order price and other characteristics of the products that constitute an order, as well as other details (chosen shipping rate and shipping method, taxes, discounts, etc.), at the moment when it’s being placed? Rather than calculating those dynamically when an “order page” is opened?

If yes, does it have a name? And are there recommendations on how to do it properly?

Question 2: where and how should I store a snapshot?

option #1:

I’d have to create multiple tables then:

  • frozen_products
  • frozen_discounts
  • frozen_shipping_method_and_rates

    etc…

that will have the same structure as their corresponding “live” (dynamic) tables.

Laborious. Is there a better way?
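For illustration, option #1’s frozen_products might look something like this (a sketch; every column name other than price and name is an assumption):

    -- One row per product in an order, with values copied at the moment the order is placed.
    CREATE TABLE frozen_products (
        id         BIGINT PRIMARY KEY,
        order_id   BIGINT NOT NULL REFERENCES orders(id),
        product_id BIGINT NOT NULL,           -- link back to the live product, if needed
        name       VARCHAR(255) NOT NULL,     -- copied, not referenced
        price      DECIMAL(10, 2) NOT NULL,   -- copied, not referenced
        quantity   INT NOT NULL
    );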

option #2:

along with an order, in the “orders” table. But how, given that an order is a single row? For instance, the products in an order are a list. How would I store a list, with its characteristics (price, weight, colour, material, what have you), in a single row in such a way that it’ll be more or less easy to retrieve in the future? Not as a plain string.

Go to Source
Author: Dajaku