I'm relatively new to SQL so thank you in advance for any help you can give me. I'm writing a query that needs to produce a table that shows all students' majors, minors, and the dept. I've created multiple columns already: (maj, dept, min, dept, concentration, dept). My problem is that my table is producing multiple rows because I have not included anything in my query that allows for students who have multiple majors. So instead what is happening is that they are showing up in multiple rows. I need every student to occupy only one row. Students can have any combination of maj's, minor's, and concentrations (and multiple for each). I'm dealing with 5,000 students. What do I need to include in my syntax? Thank you.
What you need is
- a "student" table
- a "major" table
- a linking table (also known variously as an "intersection" or "association" or "joining" table). The fancy name for this is "Associative Entity" (if you read nothing else, read this!).
1 row per "student" in the student table - this is critical for good design - otherwise you are going to have to keep track of multiple copies of your student data. This will typically contain their student id, name(s), address, tel: & -
PRIMARY KEY student_id.
1 row per "major" - again, no multiple copies of data. Typically contains, id major name, department, prerequisites.... &c.
PRIMARY KEY - major_id.
The joining table will probably contain only 1 record per student - the brigher/more hardworking ones might have more than 1 record per student and will just simply contain (student_id, major_id) - the
PRIMARY KEY in this case will just simply also be (student_id, major_id). So, a student with 2 majors will have 2 records &c.
You will obviously then have a department table - possibly a joining table for department-major for multi-disciplinary majors? I imagine that most (all) majors will have a primary department?
NEVER have a .csv type arrangement where you have multiple "facts" stored in the same field of a database - there are two reasons for this:
1) it makes SQL programming impossible and
2) it's a breach of Codd's rules (the second one) which states:
Rule 2: The guaranteed access rule:
Each and every data (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
Even if you don't understand them yet, there is a reason why Codd's rules are regarded as fundamental amongst database practicioners.
Start with this - break down the problem, determine your entities (or "objects", or just plain "things"). Have a table for each one. That and joining tables for "things" which are connected in a 1->n or m->n relationship and you'll at least have the basics right. Come back to us if you have more specific requirements after having done that.
I'm sure there are lots of ways to solve this. The first thing that comes to my mind is to create a simple scalar value function that accepts the Student-Id and 'type' of column you are trying to populate and returns a delimited string of values for that student and type of column.
The scalar value function (GetInfo) uses the student id and the 'type' of column (Maj, Dept, Min, etc.) to select and concatenate the results into a varchar string to return.
The main query would simply invoke the scalar function
dbo.GetInfo(STUDENT_ID,'MAJ') AS MAJORS,
dbo.GetInfo(STUDENT_ID,'DEPT') AS DEPARTMENTS