However, I'm finding that some books have mulitple authors, and I'm
wondering if there is a way to add more than one id to the author id
field. Or do I have to have many author id fields? Any way around that?
Thanks,
Bill
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Anith Sen wrote:
> GMann@.dublin.com asked: I'm confused about the practical aspects of declaring a primary key on
> one of the "intersection" tables. (BookAuthors, in this case.) Are there
> any practical benefits or is it strictly a philosophical thing? <<
I'd say that there *is* no benefit of putting a primary key on *one column* of
an intersection table. The semantics would enforce that an Author could only
write one book, or that a book could have no more than one author,
depending on which column you chose as a primary key. For some other
application it might be correct, but then you would only need a parent-child
relationship, and an intersection table wouldn't be necessary.
If you were to define a multi-column primary key, it would prevent mistaken
duplicate entries. If you defined the primary key as on (author, book) it would
also help queries that wanted all books for a given author. If your application did
queries the other way, to find all the authors of a given book, that index would
not help, so I would add a second index (not unique) on book.
> Declaring a column/set of columns in a table as primary key is definitely
> beneficial. In fact, it is quite suicidal not to have one.
Except as it enforces unrealistic semantics and destroys the purpose of an
intersection table (which is to model a many-to-many relationship, as described above...
> >> How does it affect performance? Does it introduce overhead, or does it
> speed performance do to indexing?
Generally, an appropriate index will help query speed, and will potentially hurt
updates and inserts because the index needs to be maintained to reflect data
changes. However, to the extent that row changes do not involve indexed
columns, the index helps to find the row, and needs no mainenance.
> Strictly speaking, having a primary in a table has nothing to do with
> performance. Keys are logical concepts and have no performance connotations.
But realistically speaking, they do, because they are implemented via a unique
index, that definitely provides fast query lookup as well as enforcing unique
values.
> But, note that SQL Server, by default, implements an index on keys. Indexes
> are physical artifacts and have performance implications. However, there is
> no proven and consistent statistics which shows that having a key in a table
> adversely or favorably affect the performance.
You say "by default". Is there a way to have a primary key *without* an index?
I don't think so. Otherwise the DBMS would have to do a table scan to verify an
insert was OK. Can you show me one DBMS that can enforce primary keys
and doesn't create a unique index to do it?
Joe Weinstein at BEA|||>> Except as it enforces unrealistic semantics and destroys the purpose of
an intersection table <<
The DDL in my post has a two-column primary key on the table BookAuthors.
What 'unrealistic' semantics did it enforce? The decomposition I had removes
the transitive dependencies of books on authors and gives a clean 3NF
projection. In what way did it destroy the 'purpose' of that table?
>> ...that definitely provides fast query lookup as well as enforcing unique
values. <<
Not as a rule. Just because you have an index does not mean that the
optimizer uses it all the time. The heuristics used by the optimizer is not
all documented and the empirical algorithms which generate the execution
plan depend on several factors, not solely on an index on the primary key.
>> You say "by default". Is there a way to have a primary key ... <<
Actually it was my mistake, I meant, SQL Server, by default, implements a
clustered index on keys.
However, the point was that the key vs. index concept is simply a logical
vs. physical concept. At the logical level, the user should not be even
concerned about what index will be implemented by the DBMS in the physical
level. Depending on the extensions provided by the DBMS, you can tune your
queries with required indexes, hints, force orders etc. Performance
considerations should not corrupt the key selection decisions. In a
well-crafted RDBMS, key declaration is a logical modeling aspect which
should not be overlapped with indexing, which is a physical implementation
matter. Keys are for logical integrity and indexes are for physical
performance. However, the fact of the matter being, no DBMS provides or
assumes a clean separation of logical and physical levels, it may be
worthwhile not to confuse an index with a key and/or vice-versa.
--
- Anith
( Please reply to newsgroups only )|||
Anith Sen wrote:
> >> Except as it enforces unrealistic semantics and destroys the purpose of
> an intersection table <<
> The DDL in my post has a two-column primary key on the table BookAuthors.
> What 'unrealistic' semantics did it enforce?
None! I was describing the unrealism/destruction caused by defining a primary key
on *one* column of an interscection table.
> The decomposition I had removes
> the transitive dependencies of books on authors and gives a clean 3NF
> projection. In what way did it destroy the 'purpose' of that table?
Again none. Sorry if I was unclear. I said that a one-column primary key would
destroy the purpose of an intersection table.
> >> ...that definitely provides fast query lookup as well as enforcing unique
> values. <<
> Not as a rule. Just because you have an index does not mean that the
> optimizer uses it all the time. The heuristics used by the optimizer is not
> all documented and the empirical algorithms which generate the execution
> plan depend on several factors, not solely on an index on the primary key.
Here I'd disagree with you. As a rule, the unique index does help with queries
that suit a unique index, such as one-row searches and most small-range
queries on the key. If/when the query is unsuitable for a unique index, or
when the undocumented query engine is broken, an index may/will not be
used, but you could say the same for the DBMS itself. What is an "empirical
algorithm"?
> >> You say "by default". Is there a way to have a primary key ... <<
> Actually it was my mistake, I meant, SQL Server, by default, implements a
> clustered index on keys.
Ok, I think I understand. That means that the 'default DBMS' for this
discussion is MS SQLServer. That would imply that you have at least
one alternative DBMS to the default, that does not implement a primary
key constraint with a unique index. Can you name it?
> At the logical level, the user should not be even
> concerned about what index will be implemented by the DBMS in the physical
> level.
I agree.
> Depending on the extensions provided by the DBMS, you can tune your
> queries with required indexes, hints, force orders etc. Performance
> considerations should not corrupt the key selection decisions.
Ok, but they may certainly conflict if you want one sort of index for
performance, but it conflicts with the index the DBMS will make if you
define a primary key as such. I'm still jumping up and down on my limb
of claiming that all DBMSes create a unique index if you specifiy a
primary key, so in the real world you need to know about both, soon
in the design process.
> In a
> well-crafted RDBMS, key declaration is a logical modeling aspect which
> should not be overlapped with indexing, which is a physical implementation
> matter.
I would happily take the fall from that limb I'm jumping on, if you can name
one well-crafted RDBMS as you define it.
> Keys are for logical integrity and indexes are for physical
> performance.
That's true, except for the fact that all DBMSes use indexes to implement
logical consistency constraints (at least unique keys).
> However, the fact of the matter being, no DBMS provides or
> assumes a clean separation of logical and physical levels, it may be
> worthwhile not to confuse an index with a key and/or vice-versa.
Sure. They aren't the same thing. A key need not be enforced by anything
except application code, and an index can be on any column or columns
in any order, independent of operational value. I'm just thinking that it would
be confusing not to realize the ramifications of defining a primary key in
SQL: a unique index. (This is only true for MS, Sybase, Informix, DB2,
Oracle, Ingres, Postgres, Pointbase, Cloudscape, times10, mySQL,
Tandem, so we should definitely document the others that don't)
Joe
> --
> - Anith
> ( Please reply to newsgroups only )|||>> I was describing the unrealism/destruction caused by defining a primary
key on *one* column of an interscection table. <<
I understand now. In that case you wouldn't even need an intersection table
for a 1-to-many relationship. All you need is an author_id column in your
Books table as a foreign key to the Author table.
>> As a rule, the unique index does help with queries that suit a unique
index, such as one-row searches and most small-range queries on the key. <<
Not necessarily all the time, an example in SQL Server is to use a
cross-join with a derived table of numbers with a multi-table join where the
key column of the table is non-clustered. For examples search the archives
of msnews.public.programming. There are instances when the optimizer decides
to scan the table, even when there is an explicit index on the sarg. The
reason for this is the cost-efficiency of scanning a table (depending on
many factors including the magnitude of data) in certain cases may
outperform the usage of an index. Another common phenomenon is the optimizer
disregarding the index due to recompilations, outdated statistics etc.
And FYI http://dictionary.reference.com/search?q=empirical
>> That means that the 'default DBMS' for this discussion is MS SQLServer.
That would imply that you have at least one alternative DBMS to the default,
that does not implement a primary key constraint with a unique index. <<
No, I think you misinterpreted; let me clarify. I meant when you declare a
key in a SQL Server table, the created index by default is a clustered
index. For example,
1)
CREATE TABLE tbl (col1 INT NOT NULL PRIMARY KEY)
2)
CREATE TABLE tbl (col1 INT NOT NULL)
GO
ALTER TABLE tbl ADD CONSTRAINT PK_tbl PRIMARY KEY (col1)
Alternatively, you can override this default making it a non-clustered index
by explicitly using ALTER TABLE statement. I had no other DBMSs in my mind
while stating it.
--
- Anith
( Please reply to newsgroups only )|||Guinness Mann (GMann@.dublin.com) writes:
> I'm confused about the practical aspects of declaring a primary key on
> one of the "intersection" tables. (BookAuthors, in this case.) Are
> there any practical benefits or is it strictly a philosophical thing?
Yes, there are practical benefits. What's the use with having multiple
entries that Isaac Asimov wrote "Foundation"?
> How does it affect performance? Does it introduce overhead, or does it
> speed performance do to indexing?
There is always a trade-off with indexes. Indexes can speed up queries
to access the data enormously, but there is a little price for maintaing
the index at insert, updates and deletes.
For tabels like Bookauthors, I often find myself defining two indexes.
One (Book, Author) and one (Author, Book), as access often are in
both directions.
--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
No comments:
Post a Comment