Refactoring Databases: Evolutionary Database Design

Category: Programming
Author: Scott W. Ambler, Pramod J. Sadalage


by anonymous   2019-07-21

I'm going to add one thing. It is not enough to write an alter table statement to change a table structure. If you are changing a table structure, you had better be sure before you run it that you know exactly what other views, functions, tables, stored procs, triggers, SSIS (DTS) packages (for SQL Server) and dynamic code from the applications will be affected by the change. If you are not completely sure what other objects may be impacted, you are not ready to alter the table. I've seen way too many business-critical functions break because someone haphazardly changed a table structure without considering what else used that structure. If you are considering making database structural changes, I suggest you read up on database refactoring before you do so.
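As a rough illustration of that kind of impact check, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table, view and trigger names are made up for the example, and find_dependents is a hypothetical helper, not part of any library. It only does a text search of the stored schema definitions, so it will not find dynamic SQL built inside applications, which is exactly the part that tends to bite.

```python
import re
import sqlite3


def find_dependents(conn, table_name):
    """Return sorted (type, name) pairs for schema objects whose SQL mentions table_name."""
    # Whole-word match so that "customer" does not match "customer_names".
    pattern = re.compile(r"\b" + re.escape(table_name) + r"\b", re.IGNORECASE)
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name <> ?",
        (table_name,),
    ).fetchall()
    return sorted((obj_type, name) for obj_type, name, sql in rows if pattern.search(sql))


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audit_log (msg TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
    CREATE VIEW customer_names AS SELECT name FROM customer;
    CREATE TRIGGER customer_insert AFTER INSERT ON customer
    BEGIN
        INSERT INTO audit_log (msg) VALUES ('added ' || NEW.name);
    END;
""")

dependents = find_dependents(conn, "customer")
for obj_type, name in dependents:
    print(obj_type, name)
```

On SQL Server or Oracle you would query that product's own dependency catalog instead; the point is only that the list of affected objects should exist before the alter table runs.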

Here is a good book to start with:

by anonymous   2019-07-21

First, stop trying to think of databases in object-oriented terms. Principles of object-oriented programming simply do NOT apply to relational databases.

Shared databases are a very good thing from a business perspective. Storing information in multiple databases and transferring it between them quickly becomes way more complex than your piddly many hundreds of objects. Data that is consistent between enterprise applications is priceless. Trying to reconcile whether GE Corp and General Electric Corporation are really the same entity between two databases can be a nightmare.

Refactoring databases is a nice goal, but it is very complex in reality. Don't do it unless you have a major performance issue that needs to be addressed or unless you are willing to commit to a process of identifying all the code that might be affected by a change. Even then, consider whether you can know all the code that might change (this is one reason why database people hate, hate, hate dynamic code!).

Often the best way to refactor is to add your change and start changing over to using your new field, sp, etc. while leaving the old one in place until a set expiration date. Since you are on an annual cycle, you will need to manage those dates over a long period of time. To see if sps are being used, you can identify the ones you aren't sure of and add some code to them to insert to a table every time they are run. If after your whole year cycle they haven't been run, you can safely eliminate them. The cycle may be shorter depending on the sp.
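The usage-logging idea can be sketched like this. It is only an illustration under assumptions: SQLite has no stored procedures, so plain Python functions stand in for the sps, and the proc_usage_log table, the logged decorator and never_run are all names invented for the example. In a real database the insert would go at the top of each procedure body.

```python
import sqlite3
from functools import wraps

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proc_usage_log ("
    "proc_name TEXT NOT NULL, "
    "run_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)


def logged(proc):
    """Record one row in proc_usage_log every time the wrapped 'procedure' runs."""
    @wraps(proc)
    def wrapper(*args, **kwargs):
        conn.execute("INSERT INTO proc_usage_log (proc_name) VALUES (?)", (proc.__name__,))
        return proc(*args, **kwargs)
    return wrapper


# Stand-ins for stored procedures you are not sure are still used.
@logged
def usp_send_email(recipient):
    return f"sent to {recipient}"


@logged
def usp_attendance_report():
    return "report"


def never_run(candidates):
    """Return the candidate names that have no rows in the usage log."""
    used = {row[0] for row in conn.execute("SELECT DISTINCT proc_name FROM proc_usage_log")}
    return sorted(set(candidates) - used)


usp_send_email("a@example.com")
usp_send_email("b@example.com")
unused = never_run(["usp_send_email", "usp_attendance_report"])
print(unused)
```

Anything still in the unused list after the full cycle has passed is a candidate for removal; the run_at column lets you see how long ago the others were last called.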

If I'm writing something that will only be run annually, I would normally put the word annual in the sp name. That may not be true where you are; however, the function of the sp should give you an idea whether it is something that should only be run periodically. I wouldn't expect a usp_send_email proc to only run once a year, but I might expect that a usp_attendance_report might not be run often. Of course, as I said, I would have named it something more like usp_annual_attendance_report, and you can consider doing that sort of thing moving forward.

But be aware that any refactoring you do will have to take place on a long cycle to ensure that you don't delete something you need. If your code is in a source control system (and all database tables, sps, views, UDFs, triggers, etc. should be), you can probably eliminate some things knowing that if they fail you can pretty instantly put them back. Again, I'd examine each object to determine the possible risk that eliminating it would have.

Of course if you have good automated tests in place, eliminating something on dev and running the tests can help you find out if something is still being referenced.

If you are looking for an easy way to refactor, I don't know of one. Refactoring databases is a time-consuming, risky activity and one which may not show enough improvement for the powers that be to be willing to pay for it.

A good book on refactoring databases is:

by anonymous   2019-07-21

In general it is much harder to fix a poor database design that is causing performance issues after going live because you have to deal with the existing records. Even worse, the poor design may not become apparent until months after going live, when there are many records instead of a few. This is why databases should be designed with performance in mind (no, this is not premature optimization; there are known techniques which generally perform better than other techniques and they should be considered in the design) and databases should be tested against a test set of records that is close to or more than the expected level of records you would have after a couple of years.

As to how long it will take to completely fix a badly designed database: months or years. Often the worst part is something that is central to the design (like, say, an EAV table) and which will require almost every query/sp/view/UDF to be adjusted to move to a better structure. You then have to ensure all records are moved to the new, better structure. The sooner you can fix a mistake like this the better. Far better to move a couple of thousand records to a new structure than 100,000,000.

If your structure is OK but your queries are bad, you are better off, as you can take the top ten worst performing queries (choose based not just on the time a single run takes but on that time multiplied by the number of times it is run), fix them, then rinse and repeat.
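To make the "time × number of runs" point concrete, here is a small sketch. The query names and timings are invented for illustration; in practice the numbers would come from whatever profiler or statistics views your database provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStat:
    name: str
    avg_seconds: float  # average time of a single run
    run_count: int      # how many times it ran in the sampling period

    @property
    def total_seconds(self) -> float:
        # Total cost to the system: time per run multiplied by number of runs.
        return self.avg_seconds * self.run_count


def worst_queries(stats, limit=10):
    """Return up to `limit` queries, most expensive in total time first."""
    return sorted(stats, key=lambda s: s.total_seconds, reverse=True)[:limit]


# Made-up sample figures.
stats = [
    QueryStat("nightly_report", avg_seconds=120.0, run_count=1),
    QueryStat("customer_lookup", avg_seconds=0.5, run_count=10_000),
    QueryStat("order_history", avg_seconds=4.0, run_count=200),
]

ranked = [s.name for s in worst_queries(stats)]
print(ranked)
```

Note how the half-second lookup ends up at the top of the list: it is fast per run but is called so often that it costs far more in total than the two-minute report.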

If you are in the midst of fixing a poor database, this book might come in handy:

by anonymous   2019-07-21

Tackling just the database part of this, here is a book that will help you out:

You have to set things up properly to refactor a database and this book is invaluable.

There are also tools that can help you find the most poorly performing queries; start with those. If you can get a big win in performance on something that is annoying everyone, then it will give you more leverage to keep on fixing the other problems.

You might also look at a book on performance tuning specific to the database backend you have. There are a lot of known performance issues that relate to both database design and query design; knowing how to fix them can help you out immeasurably as you refactor this mess.

I know it's tempting to just throw it out and start over new, but you will be introducing major new bugs that way, taking a huge amount of time while the users see no improvement and possibly missing some very important business rules that must be enforced. While incrementally changing and refactoring appears to be harder in a case this bad, it really is the better choice.

You might talk to the users and find out what they perceive as the worst problems the system has. That's one place to start; after all, making the users happier is part of what you are there for.

Make sure that you document the performance improvements and other changes for your resume. Think how much better it will look to your next potential employer when you can give actual performance improvement figures. Actual accomplishments attached to figures that show how much you accomplished are rare on resumes; this job can really make you stand out in the future as someone who gets things done.

by anonymous   2019-07-21

I would only consider redoing the schema if you have a database professional to help you with the design. Application programmers in general do a poor job of designing performant databases that have all the checks and balances a business-critical system needs.

Changing the schema and successfully moving existing data is a much harder job than you may think. This will be a large effort taking months of full-time work, and it's risky. The larger and more complex the existing database is, the harder the redesign.

One thing I would consider is moving the old data to a data warehouse and then designing the new system for data going forward. It would then send data to the data warehouse periodically too for people to be able to query historical and current records. That way your new system can have constraints that maybe the old data didn't have and you won't have to try to figure out what values to put in the required field for old data which did not have a value.

If you are considering this, you may also want to read up on refactoring databases. Here is an excellent book on the subject:

Also do not consider doing this without reading in depth about performance tuning the database you plan to use as your backend. There is no point in redesigning if you don't develop something that will perform well and scale up. Forget that trash about premature optimization - databases need to be designed from the start with performance as well as data integrity and security in mind. There are a lot of well-known techniques to create better performance that should be considered in any redesign.

by Gilligan   2017-08-20

This book (Refactoring Databases) has been a God-send to me when dealing with legacy database schemas, including when I had to deal with almost the exact same issue for our inventory database.

Also, having a system in place to track changes to the database schema (like a series of alter scripts that is stored in the source control repository) helps immensely in figuring out code-to-database dependencies.
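One possible shape for such a system, sketched with Python's sqlite3 module. Everything here is an assumption for illustration: the script names, the schema_change_log table and the apply_pending helper are invented, and in a real project each script would be a file in the repository rather than a string in a list.

```python
import sqlite3

# Hypothetical alter scripts, in the order they must be applied.
ALTER_SCRIPTS = [
    ("001_create_item", "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_item_quantity", "ALTER TABLE item ADD COLUMN quantity INTEGER DEFAULT 0"),
]


def apply_pending(conn, scripts):
    """Run every script not yet recorded in schema_change_log; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_change_log ("
        "script_name TEXT PRIMARY KEY, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT script_name FROM schema_change_log")}
    newly_applied = []
    for name, sql in scripts:
        if name in applied:
            continue  # already run against this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_change_log (script_name) VALUES (?)", (name,))
        newly_applied.append(name)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_pending(conn, ALTER_SCRIPTS)
second_run = apply_pending(conn, ALTER_SCRIPTS)  # nothing left to apply
print(first_run, second_run)
```

Because each schema change has a name and a place in source control history, you can line it up against the application commits made around the same time, which is where the code-to-database dependencies start to show.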

by anonymous   2017-08-20

What you describe is something that Oracle should be capable of handling very easily if you have the right equipment and database design. It should scale well if you get someone on your team who is a specialist in performance tuning large applications.

Redoing the database from scratch would cost a fortune and would introduce new bugs and the potential for loss of critical information is huge. It almost never is a better idea to rewrite the database at this point. Usually those kinds of projects fail miserably after costing the company thousands or even millions of dollars. Your architects made the right choice. Learn to accept that what you want isn't always the best way. The data is far more important to the company than the app. There are many reasons why people have learned not to try to redesign the database from scratch.

Now there are ways to improve database performance. First thing I would consider with a database this size is partitioning the data. I would also consider archiving old data to a data warehouse and doing most reporting from that. Other things to consider would be improving your servers to higher performing models, profiling to find the slowest running queries and individually fixing them, looking at indexing, and updating statistics and indexes (not sure if this is what you do on Oracle, I'm a SQL Server gal but your DBAs would know). There are some good books on refactoring old legacy databases. The one below is not database specific. There are also some good books on performance tuning (look for ones specific to Oracle; what works for SQL Server or MySQL is not what is best for Oracle). Personally I would get those and read them from cover to cover before designing a plan for how you are going to fix the poor performance. I would also include the DBAs in all your planning; they know things that you do not about the database and why some things are designed the way they are.

by anonymous   2017-08-20

Basic process:

Make sure you have a current backup and do not attempt to do this on prod without a thorough test on development first. This is so extensive a change that you might want to restore prod to a new dev instance, because it will be time-consuming and tricky and other development will be interfered with while you do this.

You add the identity column to the parent table called SysNameID (see @marc_s' answer for details)

You add an int column to each child table also called SysNameId. It is not an autoincrementing column and must allow nulls.

You update this column by using the current sysname columns to find the id associated with that sysname.

Once all the columns are populated you set the column to not allow nulls and create the foreign key to the parent table. The surrogate keys should not change, so you don't really need to cascade update.

Finally you drop the sysname column from the child tables and adjust all the code that uses it to join to the parent table and look it up. Alternatively, you rename each child table and create a view that joins the child table to the parent and gets the sysname column from there. That should ensure existing code doesn't break.
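The steps above can be sketched end to end as follows. This is a toy version under several assumptions: it uses Python's sqlite3 module rather than SQL Server, the parent and child tables and their data are made up, and the parent is assumed to already have its SysNameID identity column. SQLite cannot add NOT NULL or a foreign key to an existing column, so the sketch copies the rows into a new table (child_data, a hypothetical name) and uses the "view under the old name" variant from the last step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Starting point: the identity column on the parent is assumed to exist already.
    CREATE TABLE parent (SysNameID INTEGER PRIMARY KEY, sysname TEXT NOT NULL UNIQUE);
    CREATE TABLE child (sysname TEXT NOT NULL, detail TEXT);
    INSERT INTO parent (sysname) VALUES ('alpha'), ('beta');
    INSERT INTO child (sysname, detail) VALUES ('alpha', 'a1'), ('beta', 'b1'), ('alpha', 'a2');

    -- Add a nullable, non-autoincrementing int column to the child table.
    ALTER TABLE child ADD COLUMN SysNameId INTEGER;

    -- Populate it by looking up the id associated with each sysname.
    UPDATE child
    SET SysNameId = (SELECT p.SysNameID FROM parent AS p WHERE p.sysname = child.sysname);
""")

# Stop here if any child row failed to find its parent.
unmatched = conn.execute("SELECT COUNT(*) FROM child WHERE SysNameId IS NULL").fetchone()[0]
if unmatched:
    raise RuntimeError(f"{unmatched} child rows have no matching parent")

conn.executescript("""
    -- New child table: NOT NULL surrogate key with a foreign key, no sysname column.
    CREATE TABLE child_data (
        SysNameId INTEGER NOT NULL REFERENCES parent (SysNameID),
        detail TEXT
    );
    INSERT INTO child_data (SysNameId, detail) SELECT SysNameId, detail FROM child;
    DROP TABLE child;

    -- A view under the old name keeps existing read queries working.
    CREATE VIEW child AS
    SELECT p.sysname, c.detail, c.SysNameId
    FROM child_data AS c
    JOIN parent AS p ON p.SysNameID = c.SysNameId;
""")

rows = conn.execute("SELECT sysname, detail FROM child ORDER BY detail").fetchall()
print(rows)
```

Even in this toy the view only protects code that reads from the child table; anything that inserts or updates through the old name still has to be found and changed, which is where the months go.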

There is no simple way to meet your requirement. You are changing the very fundamentals of how your database works. It has the potential to affect virtually every query against the child tables. It may affect reports (which are likely sorted by sysname which is now no longer unique). This is a major change and to do it properly could take months.

Suggested reading: