1. http://s3.amazonaws.com/info-mongodb-com/RDBMStoMongoDBMigration.pdf 2. http://zaiste.net/2012/08/importing_json_into_mongodb/
Data Migration Between Oracle And MongoDB
ChihYung(Raymond) Wu 8.17.2015
Table of Contents
• Introduction
• Reasons For Migration
• Differences Between RDBMS and NoSQL
• Demonstration of Data Migration Between Oracle and MongoDB
1. Data Migration From Oracle To MongoDB
2. Data Migration From MongoDB To Oracle
• Conclusion
• References
Introduction
Ever since I started learning database management, I have wondered which DBMS is the best. Only after taking this course did I realize that no DBMS is absolutely better than another; each is ideal for the situations it was designed for. It is therefore important for us to know how to use different DBMSs. Sometimes, however, we need to move data from a database we developed earlier into a different one. Knowing how to migrate data between DBMSs is becoming more important as more and more DBMSs are developed to cater to different needs. As MongoDB rapidly gains popularity in the world of rapid web application development, the dominance of the RDBMS faces a big challenge. As Figure 1 shows, many well-known enterprises have carried out some level of data migration from an RDBMS, such as MySQL or Oracle, to a NoSQL DBMS such as MongoDB.
Figure 1. Enterprises That Migrated Data From RDBMS To MongoDB (footnote 1)
Because of the increasing popularity of MongoDB, I chose data migration between Oracle and MongoDB as my topic, with the goal of helping developers take advantage of both DBMSs so that the two systems can work together to bring the maximum benefit to their organizations.
Reasons For Migration
Figure 1 showed an increasing number of enterprises taking advantage of MongoDB. In my opinion, however, that does not mean an RDBMS such as Oracle is losing its position in the market to MongoDB. The choice of a DBMS very often depends on the operations it must support and the data it will store. It is therefore important to know the reasons for a migration before carrying one out.
With the rapidly increasing capacity and much lower cost of storage devices, the amount of data an enterprise can store, and that users demand, is skyrocketing. As a result, enterprises now value fast data access more than efficient use of storage. MongoDB was developed to meet this requirement. Unlike an RDBMS, which values efficient data storage, MongoDB provides better performance through several mechanisms.
First, MongoDB does not use joins to fetch data from different tables; embedding and referencing take their place, and either can help provide faster access. Embedding means that related documents are placed inside another document to form an embedded document. A document in MongoDB is roughly equivalent to a row in an RDBMS table, and a document can have several properties just as a row can have several attributes. Joins are not needed because all the data a query potentially needs may exist in one document, which can be fetched directly as a whole. This improves performance compared with fetching the data from a large number of rows. The other mechanism that replaces joins is referencing, a concept familiar to anyone who knows object-oriented programming. Referencing avoids joining big tables together and instead uses a reference that points to the target data, which can save a significant amount of overhead.
Flexibility is another factor that helps MongoDB's performance. Unlike an RDBMS such as Oracle or MySQL, which requires a rigid schema, MongoDB supports a dynamic schema: the schema of a MongoDB database can be modified on the fly.
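The difference between embedding and referencing, and what a dynamic schema allows, can be sketched with plain JavaScript objects. This is a minimal illustration rather than code from the project; every field name and value below is hypothetical.

```javascript
// Hypothetical sample data; field names are illustrative only.

// Embedding: the related genre lives inside the DVD document,
// so a single lookup returns everything the query needs.
const embeddedDvd = {
  _id: 1,
  title: "Example Movie",
  genre: { genreId: 10, genreName: "Drama" },
};

// Referencing: the DVD document stores only the key of the related document.
const genres = new Map([[10, { _id: 10, genreName: "Drama" }]]);
const referencedDvd = { _id: 1, title: "Example Movie", genreId: 10 };

// Resolving a reference is a second lookup by key rather than a table join.
function genreNameOf(dvd) {
  if (dvd.genre) return dvd.genre.genreName; // embedded case
  const genre = genres.get(dvd.genreId); // referenced case
  return genre ? genre.genreName : undefined;
}

// Dynamic schema: two documents in one collection may carry different fields.
const collection = [
  embeddedDvd,
  { _id: 2, title: "Another Movie", runtimeMinutes: 95 },
];

console.log(genreNameOf(embeddedDvd));
console.log(genreNameOf(referencedDvd));
console.log(collection.map((doc) => Object.keys(doc)));
```

Both lookups return the same genre name; the embedded form gets it from one document, while the referenced form needs a second lookup by key.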
Because the schema is dynamic, the structure of one document might differ from that of the next. This flexibility avoids running the ALTER TABLE command on a huge table in an RDBMS; as you can imagine, altering the schema of a big table can be daunting. The last factor in the performance improvement is MongoDB's scalability. The basic storage unit in MongoDB is the document, and documents can be stored on different storage disks that are accessed in parallel. This parallel processing, together with MongoDB's storage mechanism, greatly enhances its horizontal scalability and performance.
Another reason MongoDB is gaining popularity quickly is that the data it holds often does not require transactions, unlike the operations normally carried out on Oracle. Most operations on MongoDB call for fast and effective access to data rather than strict consistency. Since speed is the concern and transactions slow operations down, MongoDB fits these situations better. Generally, the following criteria can be considered when deciding whether to use MongoDB:
• The read/write ratio of operations on the database
• The types of queries and updates performed on the database
• The lifecycle of the data and the growth rate of documents
Because the main benefit of MongoDB is faster data fetching, the higher the read/write ratio, the more suitable MongoDB is. Because repeating groups are allowed in MongoDB, data anomalies could occur if there is a large number of write operations. In addition, the amount of data that needs to be accessed can be weighed along with the two criteria just mentioned to decide whether MongoDB is ideal.
Differences Between RDBMS and NoSQL
Before we jump into the demonstration of migration between Oracle and MongoDB, it helps to understand the terminological differences between an RDBMS, such as Oracle, and MongoDB, so that the rest of the project is easier to follow. Figure 2 shows the terminological comparison between an RDBMS and MongoDB.
Figure 2. Terminological Differences Between RDBMS and MongoDB (footnote 1)
As you can see, a table in an RDBMS is referred to as a collection in MongoDB, and we previously mentioned the correspondence between a row and a document. MongoDB also uses indexes to improve performance. The index types it supports include compound indexes, unique indexes, array indexes, TTL (Time-To-Live) indexes, geospatial indexes, sparse indexes, hash indexes, and text search indexes. Last, joins in an RDBMS correspond to embedding and referencing, as we previously mentioned.
With that, we now know enough about the benefits MongoDB can bring. Next we will jump into the demonstration of data migration between Oracle and MongoDB to see how MongoDB works and how these two systems can talk to each other.
Demonstration of Data Migration Between Oracle and MongoDB
1. Data Migration From Oracle To MongoDB
In order to paint a clear and familiar picture for our classmates, I chose to use the Netflix schema from the second homework assignment for my example of data migration. Figure 3 shows the schema of the Oracle database for this project.
However, I made several modifications to this schema. First, I removed the tables related to rental queues and members. In addition, I added one more table named COMMENTS for this project and populated it with the data shown in Figure 4. I added the COMMENTS table because I wanted to demonstrate a migration involving an RDBMS table with a repeating group, which MongoDB supports naturally as embedded documents.
When migrating data between Oracle and MongoDB, mapping the RDBMS schema directly onto MongoDB might not be the best idea, since a direct mapping does not always take full advantage of the benefits MongoDB provides.
Figure 3. Schema For RDBMS
Figure 4. Data in COMMENTS Table
Several things should be kept in mind when redesigning the schema for a MongoDB database:
1. Data with a 1:1 or 1:Many relationship are natural candidates for embedding within a single document.
2. The concept of data ownership and containment can be modeled with embedding.
3. Exceptions happen when:
i. A document is frequently read, but contains an embedded document that is rarely accessed.
ii. One part of a document is frequently updated and constantly growing in size, while the remainder of the document is relatively static.
4. For many-to-many relations and the exceptions mentioned above, referencing should normally be used.
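The rules above can be read as a small decision procedure. The sketch below encodes them as a heuristic; the function name and the shape of its input are hypothetical, and a real design decision would weigh more factors than these three flags.

```javascript
// Encodes the four rules above as a simple heuristic.
// The input shape (relationship, embeddedRarelyRead, embeddedGrowsUnbounded)
// is a hypothetical way of describing a relationship between two tables.
function chooseModeling({
  relationship,
  embeddedRarelyRead = false,
  embeddedGrowsUnbounded = false,
}) {
  // Rule 4: many-to-many relations are normally referenced.
  if (relationship === "many-to-many") return "reference";
  // Rule 3: either exception also favors referencing.
  if (embeddedRarelyRead || embeddedGrowsUnbounded) return "reference";
  // Rules 1 and 2: 1:1, 1:Many, and ownership relations are candidates for embedding.
  return "embed";
}

console.log(chooseModeling({ relationship: "one-to-many" }));
console.log(chooseModeling({ relationship: "one-to-many", embeddedGrowsUnbounded: true }));
console.log(chooseModeling({ relationship: "many-to-many" }));
```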
Following the principles above, I redesigned the schema for the MongoDB database. The GENRE and RATING tables have a 1:Many relationship and an ownership relationship with the DVD table, so I embedded these two tables inside the DVD table. The DVD table and the COMMENTS table also have an ownership relationship, so I embedded the COMMENTS table inside the DVD table as well and obtained the schema in Figure 5.
Figure 5. Schema For MongoDB database
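The redesign described above, folding GENRE, RATING, and COMMENTS into each DVD, can be sketched as a transformation over rows. This is an illustration under assumptions, not the project's code: the genre and rating column names follow the redesigned DVD table, while DVDID, DVDTITLE, COMMENTTEXT, and all sample values are hypothetical.

```javascript
// Sample rows standing in for the Oracle tables. DVDID, DVDTITLE, and
// COMMENTTEXT are hypothetical column names; the values are made up.
const dvdRows = [{ DVDID: 1, DVDTITLE: "Example Movie", GENREID: 10, RATINGID: 3 }];
const genreRows = [{ GENREID: 10, GENRENAME: "Drama" }];
const ratingRows = [
  { RATINGID: 3, RATINGNAME: "PG", RATINGDESCRIPTION: "Parental guidance suggested" },
];
const commentRows = [
  { DVDID: 1, COMMENTTEXT: "Great film" },
  { DVDID: 1, COMMENTTEXT: "Would watch again" },
];

// Build one document per DVD row, embedding its genre, rating, and comments.
function toEmbeddedDocuments(dvds, genres, ratings, comments) {
  const genreById = new Map(genres.map((g) => [g.GENREID, g]));
  const ratingById = new Map(ratings.map((r) => [r.RATINGID, r]));
  return dvds.map((dvd) => ({
    DVDID: dvd.DVDID,
    DVDTITLE: dvd.DVDTITLE,
    GENRE: genreById.get(dvd.GENREID) || null,
    RATING: ratingById.get(dvd.RATINGID) || null,
    // The repeating group becomes an array inside the document.
    COMMENTS: comments
      .filter((c) => c.DVDID === dvd.DVDID)
      .map((c) => c.COMMENTTEXT),
  }));
}

const embeddedDocs = toEmbeddedDocuments(dvdRows, genreRows, ratingRows, commentRows);
console.log(JSON.stringify(embeddedDocs[0]));
```

Building lookup maps once keeps the transformation to a single pass over the DVD rows, which plays the role the joins would have played in SQL.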
In Figure 6, I created a new table containing all the data of the new DVD table in Figure 5 to give an idea of how the data will be represented in the schema of the corresponding MongoDB database. New attributes such as GENREID, GENRENAME, RATINGID, RATINGNAME, RATINGDESCRIPTION, and COMMENTS are now included in the new DVD table. Notably, I used an abstract data type to create a repeating group that stores all the comments for each DVD in the DVD table.
Figure 6. Data in the New DVD Table
Once we know what data the new schema requires, we need to decide how to migrate the data from Oracle to MongoDB. MongoDB uses JavaScript syntax for its data queries, and its documents are represented in a way very similar to the JSON (JavaScript Object Notation) format. As a result, it is much more efficient to put the data in JSON format when migrating it. Figure 7 shows the JSON format for one document in MongoDB, and Figure 8 shows the JSON format for an array of documents in MongoDB. Our next goal is to convert the data in the tables into the JSON format.
Figure 7. Representation Of A Document in MongoDB
Figure 8. Representation Of An Array Of Documents in MongoDB (footnote 2)
Figure 9. An Anonymous Block For Creating JSON For Data In DVD Table
Figure 9 shows the PL/SQL code in Oracle that creates a JSON file for the target data in the Oracle database. I used the SPOOL command at line 1 of this code to create a text file named DVD_NOSQL.txt. Figure 10 shows the content of DVD_NOSQL.txt, and Figure 11 shows the content of DVD_NOSQL.json, which was converted from DVD_NOSQL.txt by removing the PL/SQL part of the file and changing the file extension.
Figure 10. Content Of DVD_NOSQL.txt
Figure 11. DVD_NOSQL.json Converted From DVD_NOSQL.txt
MongoDB has some built-in support for importing JSON or CSV files directly. For example, one can run the mongoimport command in the terminal on a Mac or the Command Prompt on a PC to import files directly into MongoDB, as shown in Figure 12. One thing to note, however, is that mongoimport is very sensitive to unexpected newline characters, which can cause the import to fail with a parsing error. For this reason, the file DVD_NOSQL.json contains no newline characters.
Figure 12. mongoimport command
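One way to avoid the newline problem described above is to generate the file programmatically so that each document occupies exactly one line. The sketch below is an assumption about how that could be done, not the approach used in the project; the sample documents are hypothetical.

```javascript
// Serialize each document on its own line with no embedded line breaks.
// JSON.stringify without an indent argument never emits raw newlines;
// newlines inside string values are escaped as \n.
function toNewlineDelimitedJson(docs) {
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

// Hypothetical sample documents.
const sampleDocs = [
  { DVDID: 1, DVDTITLE: "Example Movie", COMMENTS: ["line one\nline two"] },
  { DVDID: 2, DVDTITLE: "Another Movie", COMMENTS: [] },
];

const ndjson = toNewlineDelimitedJson(sampleDocs);
console.log(ndjson);
```

By default mongoimport reads one JSON document per line, so a file in this form could be loaded with a command along the lines of `mongoimport --db DVD_NOSQL2 --collection DVD --file DVD_NOSQL.json`. If the file instead holds a single JSON array, as in Figure 8, the `--jsonArray` option tells mongoimport to expect that form.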
Although the mongoimport command is a convenient tool for data migration, it is not user friendly when the data in the JSON file needs processing before it is imported into MongoDB. For instance, the date values in DVD_NOSQL.json that we got from our Oracle database are of the String type. If we use mongoimport to import the data directly into MongoDB, the properties that are supposed to have the Date type will not support Date operations, which are often considered very important. As a result, we need a middle layer sitting between MongoDB and the JSON file to do the processing for us. Because I was using Node.js and Express.js, a web application framework for Node.js that can provide an interface and works smoothly with MongoDB, I decided to use a Node.js module to do the data processing for me, and I chose Mongoose. Mongoose is a module built on top of MongoDB rather than a replacement for it, and it is intended to be that middle layer for data processing. Features provided by Mongoose include data type definitions, virtual properties, data validation, and default values.
Next, I installed Express.js on my computer and ran the command express DVD_NOSQL. This created a folder named DVD_NOSQL on my computer, and I moved the DVD_NOSQL.json file into that folder for Mongoose to import the data from. Figure 13 shows the default file structure inside the folder DVD_NOSQL. The file app.js in the folder DVD_NOSQL is where all the processing happens. In this file, I made some modifications to include the module named Mongoose, as in Figure 14. At line 69, I specify the connection to a database named DVD_NOSQL2 on my localhost, listening on port 27017. This line of code connects to the database named DVD_NOSQL2 if it exists, or creates it if it does not. From line 74 to line 88, the code specifies the schema for a new collection (table) named DVD in my MongoDB database. Figure 15 shows a part of the app.js file in which I used a for loop to import the data in the JSON file into MongoDB. Once the for loop has executed, my new collection named DVD holds all the documents with values from the JSON file. What is worth noting is that I use JavaScript's Date constructor, which is available in both Express.js and Mongoose code. As a result, the properties of Date type can now be used with Date operations in MongoDB. This is an example of the processing work Mongoose can do for MongoDB.
Figure 13. File Structure Of Folder DVD_NOSQL
Figure 15. Processing Data For MongoDB
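The date processing step described above can be sketched without a database connection. The code below only shows the string-to-Date conversion inside a for loop; it is not the code from Figure 15. The field name RELEASEDATE and the ISO-style date strings are hypothetical, since the actual names and formats depend on what the Oracle export produced.

```javascript
// Convert string-typed date fields into Date objects before saving.
// RELEASEDATE and the sample values are hypothetical.
const rawRecords = [
  { DVDID: 1, DVDTITLE: "Example Movie", RELEASEDATE: "2015-08-17" },
  { DVDID: 2, DVDTITLE: "Another Movie", RELEASEDATE: "not a date" },
];

function withParsedDate(record, field) {
  const parsed = new Date(record[field]);
  // An unparseable string yields an Invalid Date whose time value is NaN.
  if (Number.isNaN(parsed.getTime())) {
    return { ...record, [field]: null };
  }
  return { ...record, [field]: parsed };
}

const prepared = [];
for (let i = 0; i < rawRecords.length; i++) {
  prepared.push(withParsedDate(rawRecords[i], "RELEASEDATE"));
}

console.log(prepared[0].RELEASEDATE instanceof Date);
console.log(prepared[1].RELEASEDATE);
```

In the project, each prepared record would then be handed to a Mongoose model and saved; that step is omitted here because it needs a running MongoDB instance. Mapping unparseable values to null is one possible policy, chosen here so that bad input is visible rather than silently stored as an invalid date.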
When the command npm start is run in the terminal or command prompt, the Express.js application, along with the Mongoose module, is executed automatically. As a result, the data was loaded into MongoDB as shown in Figure 16.
2. Data Migration From MongoDB To Oracle
In the previous section, we showed how to migrate the data from Oracle to MongoDB. The data that originally existed in the DVD table of an Oracle database now has a new schema and has been copied into MongoDB. Next, we will take a look at how to do the reverse by migrating the data in MongoDB back into Oracle. In any data migration, we should always start by checking whether the schema needs to be redesigned for the target DBMS. When migrating the data from Oracle to MongoDB, we redesigned the schema using techniques such as embedding and referencing. Now that we are migrating data from MongoDB to Oracle, we need to redesign the schema using the RDBMS principle of normalization. As a result, the schema for our Oracle database will look like Figure 17, which is the same as the schema shown in Figure 3.
Figure 16. Data On MongoDB
Figure 17. Schema For RDBMS
A typical RDBMS database requires a rigid implementation of normalization and normally has a fixed schema rather than a dynamic one. For this reason, I decided to use the CSV file format for the migration from MongoDB to Oracle, since a CSV file can capture the rigidity of a normalized RDBMS database. My next goal was therefore to find a way to generate a CSV file from Mongoose. In my research, I found a module named Mongoose-to-csv, which is designed to generate CSV files from Mongoose. I downloaded the module and imported it into the file app.js, as shown in Figure 18, so that I could apply its functionality.
Figure 18. Import of Mongoose-to-csv Module
The module provides several functions, which can be studied further at https://www.npmjs.com/package/mongoose-to-csv. As shown in Figure 19, I applied the module to the schema of the DVD collection in MongoDB so that its functionality is available on that schema. The code between line 92 and line 105 in Figure 19 specifies the headers of the values in my intended CSV file. In order to reconstruct a normalized table, I removed all the embedded documents and included their primary keys as foreign keys in the table. As a result, the headers in my intended CSV file are exactly the same as the columns of the original DVD schema on Oracle. Once the structure of the CSV file is set, the next step is to set the destination for the file. As shown in Figure 20, the code between line 144 and line 147 fetches all the data in the DVD collection and stores it in a file named DVD_NOSQL.csv. Once this code has executed, a CSV file exists in our folder, as shown in Figure 21.
Figure 19. Defining the Structure of the CSV File
Figure 20. Creation of the DVD_NOSQL.csv File
Figure 21. DVD_NOSQL.csv File in Folder DVD_NOSQL
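The flattening step described above, dropping each embedded document and keeping only its key, can be sketched in plain JavaScript. This is not the Mongoose-to-csv API and not the code from Figures 19 and 20; it is a dependency-free illustration. GENREID and RATINGID come from the text, while DVDID, DVDTITLE, and the sample values are hypothetical.

```javascript
// Column order for the CSV. DVDID and DVDTITLE are hypothetical
// stand-ins for the remaining columns of the DVD table.
const csvHeaders = ["DVDID", "DVDTITLE", "GENREID", "RATINGID"];

// Replace each embedded document with its key so the row is flat again.
function flattenDvd(doc) {
  return {
    DVDID: doc.DVDID,
    DVDTITLE: doc.DVDTITLE,
    GENREID: doc.GENRE ? doc.GENRE.GENREID : "",
    RATINGID: doc.RATING ? doc.RATING.RATINGID : "",
  };
}

// Quote a value when it contains a comma, a double quote, or a line break.
function csvEscape(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function toCsv(docs) {
  const lines = [csvHeaders.join(",")];
  for (const doc of docs) {
    const row = flattenDvd(doc);
    lines.push(csvHeaders.map((h) => csvEscape(row[h])).join(","));
  }
  return lines.join("\n");
}

const csvText = toCsv([
  {
    DVDID: 1,
    DVDTITLE: "Movie, The",
    GENRE: { GENREID: 10, GENRENAME: "Drama" },
    RATING: { RATINGID: 3, RATINGNAME: "PG" },
  },
]);
console.log(csvText);
```

The embedded comments are left out of the row on purpose: in a normalized design they belong in their own table, keyed by the DVD's primary key.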
If we open the file DVD_NOSQL.csv, we can see a normalized table with the same fields as the original DVD table in our original Netflix schema. The content of the CSV file is shown in Figure 22.
Figure 22. Content of the DVD_NOSQL.csv
One interesting thing to note is that certain values that were stored as 0 in MongoDB do not appear in the CSV file. In addition, the properties of Date type that were processed previously using Mongoose show their values in a format that Oracle does not support. Issues such as the handling of empty fields and the conversion of Date values therefore need to be addressed before we can successfully import the data into Oracle. In this case, I used the Format Cells function provided by Excel to change the format of the columns containing Date values, and I treated the empty fields as NULL rather than 0. Once these two issues are handled, the next step is to create a new DVD table for importing the data in the CSV file. Figure 23 shows the schema for our new DVD table, named DVD2.
Figure 23. The Schema of A New DVD Table Named DVD2
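The two clean-up steps described above were done by hand in Excel. As an alternative sketch, they could be scripted before the CSV file is written. The DD-MON-YYYY target format below is an assumption; the format that actually works depends on the date format configured for the Oracle import.

```javascript
const MONTHS = [
  "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
  "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

// Format a Date as DD-MON-YYYY using UTC fields so the result does not
// depend on the time zone of the machine running the script.
function toOracleDate(date) {
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${day}-${MONTHS[date.getUTCMonth()]}-${date.getUTCFullYear()}`;
}

// Make the empty-field policy explicit: keep the field empty (imported as NULL)
// or substitute a default such as 0.
function fillEmpty(value, defaultValue = "") {
  return value === "" || value === null || value === undefined
    ? defaultValue
    : value;
}

console.log(toOracleDate(new Date("2015-08-17T00:00:00Z")));
console.log(fillEmpty("", 0));
console.log(fillEmpty(5, 0));
```

Deciding on the default at export time means the NULL-versus-0 question is settled once, before the data reaches Oracle.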
Then I used Oracle's built-in support for importing a CSV file by right-clicking the table item named DVD2 in the left bar of the screen, as shown in Figure 24, and selected the default settings for the import, as shown in Figure 25.
Figure 24. Oracle’s Built-In Support For Importing A CSV File
Figure 25. Oracle’s Process For Creating A Table Using CSV Files
After the Finish button is clicked, the DVD2 table in Oracle is populated with the data in our CSV file, as shown in Figure 26, with the empty fields in the CSV file holding the value NULL. If that is against your business policy, one way to solve the issue is to use an UPDATE query to set all the fields with the value NULL to 0. Here we show the data migration for only one table in order to paint a clear picture. However, the migration of an entire database could be done using the same principles. If we also need to recreate other tables, we could use the same process and then, after all the required tables are created, use ALTER TABLE statements to add the referential integrity constraints that implement the relations between them.
Figure 26. Content of The DVD2 Table On Oracle
Conclusion
While working on this project, I made several findings:
• The migration seems more straightforward if the JSON format is used when migrating from Oracle to MongoDB, because of MongoDB’s ability to accommodate repeating groups.
• The migration seems more straightforward if the CSV format is used when migrating from MongoDB to Oracle, because of the rigid implementation of normalization that an RDBMS database requires.
• Default value settings are especially important during a migration, since different DBMSs might handle unspecified values, such as NULL, differently.
• Last, I found that the amount of work in a data migration depends heavily on the schema redesign. A well-planned and well-organized redesigned schema gives a clear picture of every migration step and the tools potentially needed, and so avoids redundant work; a poorly planned one can waste an enterprise’s time and other valuable resources.
References
1. mongoimport, MongoDB, Inc., http://docs.mongodb.org/manual/reference/program/mongoimport/, August 11, 2015
2. Write Concern, MongoDB, Inc., http://docs.mongodb.org/manual/core/write-concern/, August 11, 2015
3. Perform Two Phase Commits, MongoDB, Inc., http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/, August 11, 2015
4. SQL to Aggregation Mapping Chart, MongoDB, Inc., http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/, August 12, 2015
5. SQL to MongoDB Mapping Chart, MongoDB, Inc., http://docs.mongodb.org/manual/reference/sql-comparison/, August 12, 2015
6. Operators, MongoDB, Inc., http://docs.mongodb.org/manual/reference/operator/, August 12, 2015
7. Importing JSON into MongoDB, zaiste.net, http://zaiste.net/2012/08/importing_json_into_mongodb/, August 12, 2015
8. Export mongoose querys as csv streams, npm, Inc., https://www.npmjs.com/package/mongoose-to-csv, August 13, 2015
9. File Stream in Node.js, Node.js Foundation, https://nodejs.org/api/fs.html, August 13, 2015
10. Stream in Node.js, Node.js Foundation, https://nodejs.org/api/stream.html, August 13, 2015
11. REF Cursor to JSON, Morten Braten, http://ora-00001.blogspot.com/2010/02/ref-cursor-to-json.html, February 11, 2010, August 13, 2015
12. JSON in Oracle Database, Oracle, Inc., https://docs.oracle.com/database/121/ADXDB/json.htm, August 13, 2015
13. How to Import from Excel to Oracle with SQL Developer, Jeff Smith, http://www.thatjeffsmith.com/archive/2012/04/how-to-import-from-excel-to-oracle-with-sql-developer/, April 11, 2012, August 13, 2015
14. Format a date the way you want, Microsoft, Inc., https://support.office.com/en-us/article/Format-a-date-the-way-you-want-8e10019e-d5d8-47a1-ba95-db95123d273e, August 14, 2015
15. mongoose documents, mongoose.com, http://mongoosejs.com/docs/guide.html, August 13, 2015
16. Web Development with MongoDB and Node.js, Chapter 7, Jason Krol, Packt Publishing Ltd., September 2014, August 13, 2015
17. RDBMS to MongoDB Migration Guide – Considerations and Best Practices, August 2015, http://s3.amazonaws.com/info-mongodb-com/RDBMStoMongoDBMigration.pdf, August 12, 2015