Using XML files as real corpora: making an XML database with the dbXML program
Posted on 19-Dec-2015
![Page 1: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/1.jpg)
Using XML files as real corpora
making an XML database with the dbXML program
http://www.dbxml.com
![Page 2: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/2.jpg)
The dbXML program
• The dbXML program is one of a range of programs that let you use a set of XML files as a database.
• The program is free and can be downloaded from the web.
• It is likely that many more programs like this will be springing up over the next couple of years.
![Page 3: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/3.jpg)
Basic concepts
• Using a database involves three basic concepts:
– the set of files you are looking at is called a collection
– a collection of files must be indexed so that the program can find things quickly
– you ask questions by posting queries to the database manager
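The three concepts can be tried out in miniature before touching dbXML. The sketch below is plain Python using only the standard library; the `Collection` class, its `add` and `query` methods, and the two sample documents are our own hypothetical illustration, not part of dbXML, whose real indexes are far more elaborate. The collection holds the documents, the index records which documents contain which elements, and a query consults the index before looking inside any document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class Collection:
    """A hypothetical in-memory stand-in for a dbXML collection."""

    def __init__(self):
        self.docs = {}                 # document name -> parsed root element
        self.index = defaultdict(set)  # element name -> names of documents containing it

    def add(self, name, xml_text):
        root = ET.fromstring(xml_text)
        self.docs[name] = root
        # Index every element of the new document so later queries can skip documents.
        for el in root.iter():
            self.index[el.tag].add(name)

    def query(self, tag):
        """Return (document name, element) pairs for every element called `tag`."""
        hits = []
        # Only documents listed in the index for this tag are searched.
        for name in sorted(self.index.get(tag, ())):
            for el in self.docs[name].iter(tag):
                hits.append((name, el))
        return hits


col = Collection()
col.add("dialogue1.xml", "<dialogue><turn>Hello</turn><turn>Hi there</turn></dialogue>")
col.add("notes.xml", "<notes><note>No turns here</note></notes>")

for doc, el in col.query("turn"):
    print(doc, el.text)
```

Because `notes.xml` contains no `turn` elements, the index lets the query skip it without searching through it, which is what makes indexing worthwhile on a large corpus.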
![Page 4: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/4.jpg)
Using the dbXML program to manage an XML database
• Our starting point is a set of marked-up XML files that we want to manage.
• We first set up these files as a database.
• We then use the dbXML tool to extract information from this database.
![Page 5: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/5.jpg)
Example XML files in our data set
![Page 6: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/6.jpg)
Steps…
• Now we will see:
– how to add a collection of files to a database
– how to index those files
– how to ask queries to get information about the content of those files
![Page 7: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/7.jpg)
Getting started… (1)
• First, we need to start up the dbXML server program. This is the program that does all the actual work.
To do this:
– Make sure you know where the dbxml folder is
– Run the program startup-server.bat in that folder (e.g., by double-clicking on it).
– This should start the dbXML server with a message like:
dbXML 2.0 (Dragonfly)
Logging to E:\junk\logging\dbXML.out
![Page 8: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/8.jpg)
Getting started… (2)
• Next, we turn a set of XML files into an XML database. To do this we must start the dbXML administration program and tell it which files to use.
– Start a DOS command window
– Make sure you know where the dbxml folder is
– Run the command ‘startup-command-line.bat’ that is in the dbxml folder
– This should start the dbXML administration program, and you should see a window like the one on the next slide…
![Page 9: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/9.jpg)
The program when it starts…
![Page 10: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/10.jpg)
The dbXML administration actions
• Now you can tell the program which files you want to include in your database.
– To do this, you first have to log in to the program:
connect user= scott pass= tiger
You must use exactly this name and password for the moment!
– Next, make a collection:
mkcol myXMLfiles
– Finally, go to the collection, say that everyone is allowed to look at it, and exit:
col myXMLfiles
grant admin READ WRITE EXECUTE CREATE
exit
![Page 11: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/11.jpg)
The dbXML program proper
• With the administrative steps out of the way, we can start the main program.
• Find the dbXML item in the Windows Start menu and click on it.
• This should bring up the following window:
If it does not, or if you cannot find it, you will have to ask for help.
![Page 12: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/12.jpg)
Finding your collection
Expand the items in the list under “localhost” until you find the collection that you made in the previous step.
![Page 13: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/13.jpg)
Finding your collection
![Page 14: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/14.jpg)
Adding files to your collection
Expand your collection to find the ‘documents’ folder, and click on it.
Select ‘Documents>Import Documents’ from the menu bar.
You will then be asked which files are to be added to the collection.
![Page 15: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/15.jpg)
When you have added your documents (in the import dialog, select them all in one go if possible)…
… you then have to index them…
![Page 16: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/16.jpg)
Select the indexes folder in your collection…
![Page 17: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/17.jpg)
Define an index as follows…
1. Give the index a name.
2. Then you must type “pattern=*@*” to index all ELEMENTS + ATTRIBUTES.
3. Click on create.
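An index built with “pattern=*@*” covers every element and attribute. As a rough Python sketch of what such an index holds (an assumption about the general idea, not dbXML's actual data structure), we can map each element name, or element-and-attribute pair, to the values found there and the documents they occur in. The function `build_value_index` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_value_index(docs):
    """Map (element, attribute) keys to {value: set of document names}.

    An attribute of None stands for the element's own text content.
    """
    index = defaultdict(lambda: defaultdict(set))
    for name, xml_text in docs.items():
        for el in ET.fromstring(xml_text).iter():
            text = (el.text or "").strip()
            if text:
                index[(el.tag, None)][text].add(name)
            for attr, value in el.attrib.items():
                index[(el.tag, attr)][value].add(name)
    return index


docs = {
    "d1.xml": '<dialogue><turn speaker="A">Hello</turn><turn speaker="B">Hi</turn></dialogue>',
    "d2.xml": '<dialogue><turn speaker="A">Bye</turn></dialogue>',
}
index = build_value_index(docs)

# Which documents contain a turn spoken by A? Answered from the index alone.
print(sorted(index[("turn", "speaker")]["A"]))  # ['d1.xml', 'd2.xml']
print(sorted(index[("turn", None)]["Hi"]))      # ['d1.xml']
```

The point of indexing elements and attributes together is that questions about either kind of markup can then be answered without rereading the files.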
![Page 18: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/18.jpg)
… you can now ask questions about their content
• using XPath
• using XSLT
• using full-text search
Queries are typed into the query window, and the answers appear in the result window.
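Full-text search looks for words inside the text of the documents rather than for particular elements. The following Python sketch imitates the idea with a regular expression over each element's text; the function `full_text_search` is a hypothetical stand-in for dbXML's own full-text facility, and the sample dialogue is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; any element with text content would do.
doc = ET.fromstring(
    "<dialogue>"
    "<turn speaker='A'>Shall we meet on Monday?</turn>"
    "<turn speaker='B'>Monday is fine.</turn>"
    "<turn speaker='A'>Great.</turn>"
    "</dialogue>"
)


def full_text_search(root, word):
    """Return elements whose own text contains `word` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [el for el in root.iter() if el.text and pattern.search(el.text)]


for el in full_text_search(doc, "monday"):
    print(el.get("speaker"), el.text)
```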
![Page 19: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/19.jpg)
Selecting all ‘turns’ in the corpus
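The query itself appears only in the screenshot. Assuming the turns are marked up as `<turn>` elements, the XPath expression `//turn` selects every one of them. The same selection can be tried outside dbXML with Python's `xml.etree.ElementTree`, whose `findall` understands a subset of XPath; the two sample corpus files are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical corpus files, standing in for the documents in the collection.
corpus = {
    "d1.xml": "<dialogue><turn speaker='A'>Hello</turn><turn speaker='B'>Hi</turn></dialogue>",
    "d2.xml": "<dialogue><section><turn speaker='A'>Bye</turn></section></dialogue>",
}

# ".//turn" is ElementTree's spelling of the XPath "//turn": turns at any depth.
all_turns = []
for name, text in corpus.items():
    root = ET.fromstring(text)
    all_turns.extend((name, t.get("speaker"), t.text) for t in root.findall(".//turn"))

for row in all_turns:
    print(row)
```

The turn nested inside `<section>` in `d2.xml` is still found, because `//` searches at any depth rather than only among direct children.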
![Page 20: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/20.jpg)
Selecting all ‘attrib’ in the corpus
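What ‘attrib’ stands for is visible only in the screenshot. If it is an attribute called `attrib` (our assumption), the XPath would be `//@attrib`. Python's `xml.etree.ElementTree` cannot return attribute nodes themselves, but the predicate `[@attrib]` finds every element that carries the attribute, and the values can be read from those. If `attrib` is instead an element name, `.//attrib` selects it directly. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only some words carry an attribute called "attrib".
root = ET.fromstring(
    "<dialogue>"
    "<turn><w attrib='noun'>meeting</w><w>on</w><w attrib='proper-noun'>Monday</w></turn>"
    "</dialogue>"
)

# ".//*[@attrib]" = every element, at any depth, that has an attrib attribute.
carriers = root.findall(".//*[@attrib]")
values = [el.get("attrib") for el in carriers]
print(values)  # ['noun', 'proper-noun']
```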
![Page 21: Using XML files as real corpora making an XML database with the dbXML program](https://reader035.vdocument.in/reader035/viewer/2022062304/56649d2b5503460f94a00863/html5/thumbnails/21.jpg)
The results…
• are presented as XML
• therefore you can pass them straight to a style sheet to look at them…
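Since the results are XML, one simple way to view them is to wrap them in a single root element and attach an `xml-stylesheet` processing instruction pointing at an XSLT file. The Python sketch below does this with the standard library; the sample results, the `results` wrapper element and the stylesheet name `turns.xsl` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical query results: the matching <turn> elements of one document.
source = ET.fromstring(
    "<dialogue><turn speaker='A'>Hello</turn><turn speaker='B'>Hi</turn></dialogue>"
)
hits = source.findall(".//turn")

# Wrap the hits in a single root element so the output is a well-formed document.
results = ET.Element("results", count=str(len(hits)))
results.extend(hits)

# An xml-stylesheet processing instruction tells a browser which XSLT file to apply.
# "turns.xsl" is a hypothetical stylesheet name.
document = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/xsl" href="turns.xsl"?>\n'
    + ET.tostring(results, encoding="unicode")
)
print(document)
```

Saving `document` to a file and opening it in a browser that applies XSLT would render the hits through `turns.xsl`; the stylesheet itself still has to be written.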