Chapter 17 Files and Streams, LINQ*
Data Hierarchy
Files are used for long-term retention of large amounts of data, even after the program that created the data terminates. Data maintained in files is often called persistent data.
The smallest data item that computers support is called a bit (short for "binary digit"), a digit that can assume one of two values: 0 or 1.
Digits, letters and special symbols are referred to as characters. Bytes are composed of 8 bits.
C# uses the Unicode® character set (www.unicode.org); each C# char occupies 2 bytes.
Just as characters are composed of bits, fields are composed of characters. A field is a group of characters that conveys meaning. Typically, a record is composed of several related fields. A file is a group of related records.
To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key, which uniquely identifies a record.
A common file organization is called a sequential file, in which records typically are stored in order by a record-key field.
Files
A file can be seen as
1. A stream of bytes (no structure), or
2. A collection of records with fields
Random access
To access a specific record without having to retrieve all records before it
File structures allowing this: ◦ indexed files ◦ hashed files
To access a record in a file randomly, we need to know the address of the record
Random files: indexed
Can have more than one index, each with a different key. For example, an employee file can be retrieved based on either social security number or last name. This type of indexed file is usually called an inverted file.
Random files: hashed
A hashed file uses a mathematical function to accomplish this mapping. The user gives the key, the function maps the key to the address and passes it to the operating system, and the record is retrieved
Random files: hashing methods
There are several hashing methods.
Example: Direct hashing
The key is the data file address without any algorithmic manipulation. The file must therefore contain a record for every possible key. Although situations suitable for direct hashing are limited, it can be very powerful, because it guarantees that there are no synonyms or collisions, unlike other methods.
Random files: hashing methods cont
Example: Modulo division (%) hashing
Modulo division hashing (also called division remainder hashing) divides the key by the file size and uses the remainder plus 1 for the address. This gives the following simple hashing algorithm, where list_size is the number of elements in the file:
address = key % list_size + 1
The reason for adding 1 to the result of the mod operation is that the list starts with 1 instead of 0.
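The address calculation described above can be sketched in a few lines of C#. This is only an illustration: the class name ModuloHash, the method name Address and the sample keys are hypothetical and do not come from the slides.

```csharp
using System;

public static class ModuloHash
{
    // Maps a non-negative key to an address in the range 1..listSize.
    public static long Address(long key, long listSize)
    {
        if (listSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(listSize));
        if (key < 0)
            throw new ArgumentOutOfRangeException(nameof(key));

        // The remainder is in 0..listSize-1; adding 1 shifts it to 1..listSize.
        return key % listSize + 1;
    }

    public static void Main()
    {
        // Hypothetical keys: 25 and 32 leave the same remainder when divided by 7,
        // so they map to the same address (a collision, or synonym).
        Console.WriteLine(Address(25, 7)); // 5
        Console.WriteLine(Address(32, 7)); // 5
    }
}
```

Unlike direct hashing, two different keys can produce the same address here, so a real hashed file also needs some way to resolve collisions.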
Sequential Files
Records can only be accessed one after another from beginning to end.
Records are stored one after another in auxiliary storage
◦ disk
◦ tape
EOF (end-of-file) marker after the last record.
◦ The operating system has no information about the record addresses; it only knows where the whole file is stored.
◦ The operating system knows that records are sequential.
17.2 Data Hierarchy
Sequential file: Processing records
Pseudo code
While Not EOF
{
    // Read the next record
    // Process the record
}
Record key: identifies a record to facilitate the retrieval of specific records from a file
Sequential file: Applications
Applications that need to access all records from beginning to end
◦ Ex.: Personal information
Because each record is processed, sequential access is more efficient and easier than random access for these applications.
Sequential vs Random (https://technet.microsoft.com/en-us/library/cc938619.aspx)
Comparing random versus sequential operations is one way of assessing application efficiency in terms of disk use.
Accessing data sequentially is much faster than accessing it randomly because of the way in which the disk hardware works. The seek operation, which occurs when the disk head positions itself at the right disk cylinder to access data requested, takes more time than any other part of the I/O process.
Because reading randomly involves a higher number of seek operations than does sequential reading, random reads deliver a lower rate of throughput. The same is true for random writing. You might find it useful to examine your workload to determine whether it accesses data randomly or sequentially. If you find disk access is predominantly random, you might want to pay particular attention to the activities being done and monitor for the emergence of a bottleneck.
For workloads of either random or sequential I/O, use drives with faster rotational speeds. For workloads that are predominantly random I/O, use a drive with faster seek time.
Sequential files: updating
Must be updated periodically to reflect changes in information
All of the records need to be checked and updated (if necessary) sequentially. The update involves:
◦ new master file
◦ old master file
◦ transaction file (contains changes to be applied to the master file: add, delete, change)
◦ report messages
[Figure: the UPDATE PROGRAM reads the OLD MASTER and TRANSACTION files and produces the NEW MASTER file and ERROR MESSAGES.]
Example: Sequential Update with Data Files
OLD MASTER FILE:
111111111  ADAMS            015000  NEW YORK
222222222  BAKER            025000  NEW YORK
333333333  ZIDROW           008000  NEW YORK
444444444  MILGROM          040000  BOSTON
555555555  BENJAMIN         100000  CHICAGO
666666666  SHERRY           007500  CHICAGO
777777777  BOROW            017500  BOSTON
888888888  JAMES            050000  NEW YORK
999999999  RENAZEV          030000  NEW YORK
TRANSACTION FILE (last column: A = add, C = change, D = delete):
222222222                   028000            C
222222222                           BOSTON    C
400000000  NEW EMPLOYEE     016000  BOSTON    A
500000000                   020000            C
610000000  NEW EMPLOYEE II  018000  CHICAGO   A
610000000                           NEW YORK  C
666666666  SHERRY                             D
777777777                   055000            C
888888888  JAMES            017500  NEW YORK  A
NEW MASTER FILE:
111111111  ADAMS            015000  NEW YORK
222222222  BAKER            028000  BOSTON
333333333  ZIDROW           008000  NEW YORK
400000000  NEW EMPLOYEE     016000  BOSTON
444444444  MILGROM          040000  BOSTON
555555555  BENJAMIN         100000  CHICAGO
610000000  NEW EMPLOYEE II  018000  NEW YORK
777777777  BOROW            055000  NEW YORK
888888888  JAMES            050000  NEW YORK
999999999  RENAZEV          030000  NEW YORK
ERROR MESSAGES:
NO MATCH 500000000
DUPLICATE ADDITION 888888888
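The sequential update described above can be sketched as a single pass over two files sorted by key. The sketch below is a simplified illustration under stated assumptions, not the program behind the slides: the types MasterRecord and Transaction, the method Apply, the integer keys and the sample data are all hypothetical, records are held in in-memory lists rather than files, and a change transaction replaces the whole data field instead of individual fields.

```csharp
using System;
using System.Collections.Generic;

public class MasterRecord
{
    public int Key;
    public string Data;
    public MasterRecord(int key, string data) { Key = key; Data = data; }
}

public class Transaction
{
    public int Key;
    public char Code;   // 'A' = add, 'C' = change, 'D' = delete
    public string Data;
    public Transaction(int key, char code, string data) { Key = key; Code = code; Data = data; }
}

public static class SequentialUpdate
{
    // Both lists must already be sorted by key. Unknown codes are ignored.
    public static List<MasterRecord> Apply(
        List<MasterRecord> oldMaster, List<Transaction> transactions, List<string> errors)
    {
        var newMaster = new List<MasterRecord>();
        int m = 0, t = 0;
        while (m < oldMaster.Count || t < transactions.Count)
        {
            // Pick the smallest key not yet handled in either input.
            int key;
            if (t >= transactions.Count) key = oldMaster[m].Key;
            else if (m >= oldMaster.Count) key = transactions[t].Key;
            else key = Math.Min(oldMaster[m].Key, transactions[t].Key);

            // Start from the old master record for this key, if there is one.
            MasterRecord current = null;
            if (m < oldMaster.Count && oldMaster[m].Key == key) current = oldMaster[m++];

            // Apply every transaction for this key, in file order.
            while (t < transactions.Count && transactions[t].Key == key)
            {
                Transaction tx = transactions[t++];
                if (tx.Code == 'A')
                {
                    if (current != null) errors.Add("DUPLICATE ADDITION " + key);
                    else current = new MasterRecord(key, tx.Data);
                }
                else if (current == null) errors.Add("NO MATCH " + key);
                else if (tx.Code == 'C') current = new MasterRecord(key, tx.Data);
                else if (tx.Code == 'D') current = null;
            }
            if (current != null) newMaster.Add(current);
        }
        return newMaster;
    }

    public static void Main()
    {
        // Hypothetical sample data, much smaller than the example files.
        var master = new List<MasterRecord> {
            new MasterRecord(1, "ADAMS"), new MasterRecord(2, "BAKER"),
            new MasterRecord(6, "SHERRY"), new MasterRecord(8, "JAMES") };
        var changes = new List<Transaction> {
            new Transaction(2, 'C', "BAKER (BOSTON)"), new Transaction(4, 'A', "NEW EMPLOYEE"),
            new Transaction(5, 'C', "UNKNOWN"), new Transaction(6, 'D', null),
            new Transaction(8, 'A', "JAMES") };
        var errors = new List<string>();
        foreach (MasterRecord r in Apply(master, changes, errors))
            Console.WriteLine(r.Key + " " + r.Data);
        foreach (string e in errors)
            Console.WriteLine(e);
    }
}
```

Because a record created by an add stays "current" until all transactions for its key are consumed, an add followed by a change for the same key works in one pass, as with the two 610000000 transactions in the example.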
Text files
A text file is a file of characters. It cannot contain integers, floating-point numbers, or any other data structures in their internal memory format. To store these data types, they must be converted to their character equivalent formats.
Some files can only use character data types. Most notable are file streams (input/output objects in some object-oriented languages like C#, C++ and Java) for keyboards, monitors and printers. This is why we need special functions to format data that is input from or output to these devices.
Binary files
A binary file is a collection of data stored in the internal format of the computer. In this definition, data can be an integer (including other data types represented as unsigned integers, such as image, audio, or video), a floating-point number or any other structured data (except a file).
Unlike text files, binary files contain data that is meaningful only if it is properly interpreted by a program. If the data is textual, one byte is used to represent one character (in ASCII encoding). But if the data is numeric, two or more bytes are considered a data item.
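As a small illustration of the difference, the sketch below stores the same integer once as text (one ASCII byte per digit) and once in its internal 4-byte format. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextVersusBinary
{
    // Text form: the number is converted to characters, one ASCII byte per digit.
    public static byte[] AsText(int value)
    {
        return Encoding.ASCII.GetBytes(value.ToString(CultureInfo.InvariantCulture));
    }

    // Binary form: the 4 bytes of the int as held in memory.
    public static byte[] AsBinary(int value)
    {
        return BitConverter.GetBytes(value);
    }

    public static void Main()
    {
        int value = 1234567;
        Console.WriteLine(AsText(value).Length);                     // 7
        Console.WriteLine(AsBinary(value).Length);                   // 4
        // The binary bytes only make sense if they are read back as an int.
        Console.WriteLine(BitConverter.ToInt32(AsBinary(value), 0)); // 1234567
    }
}
```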
17.3 Files and Streams
C# views each file as a sequential stream of bytes
When a console application executes, the runtime environment creates the Console.Out, Console.In and Console.Error streams
◦ Console.In refers to the standard input stream object, which enables a program to input data from the keyboard.
◦ Console.Out refers to the standard output stream object, which enables a program to output data to the screen.
◦ Console.Error refers to the standard error stream object, which enables a program to output error messages to the screen.
Console methods Write and WriteLine use Console.Out to perform output
Console methods Read and ReadLine use Console.In to perform input
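A minimal sketch of the three console streams in use follows. The method Greet and its parameters are hypothetical; it takes the streams as arguments so the same code can be run against Console.In, Console.Out and Console.Error or against in-memory readers and writers.

```csharp
using System;
using System.IO;

public static class ConsoleStreams
{
    // Prompts on the output stream, reads one line from the input stream,
    // and reports a missing name on the error stream.
    public static void Greet(TextReader input, TextWriter output, TextWriter error)
    {
        output.Write("Name: ");
        string name = input.ReadLine();
        if (string.IsNullOrWhiteSpace(name))
            error.WriteLine("No name entered.");
        else
            output.WriteLine("Hello, " + name);
    }

    public static void Main()
    {
        // Console.Write/WriteLine and Console.ReadLine use these same streams.
        Greet(Console.In, Console.Out, Console.Error);
    }
}
```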
17.4 Classes File and Directory
Class Directory provides capabilities for manipulating directories
The DirectoryInfo object returned by method CreateDirectory contains information about a directory
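A short sketch of classes Directory and File in use follows. The class name DirectoryDemo, the method MakeScratchDirectory and the file name note.txt are hypothetical; the example works in a fresh folder under the system temporary directory and deletes it afterwards.

```csharp
using System;
using System.IO;

public static class DirectoryDemo
{
    // Creates a uniquely named directory under the temp folder and
    // returns the DirectoryInfo that CreateDirectory hands back.
    public static DirectoryInfo MakeScratchDirectory()
    {
        string path = Path.Combine(
            Path.GetTempPath(), "DirectoryDemo_" + Guid.NewGuid().ToString("N"));
        return Directory.CreateDirectory(path);
    }

    public static void Main()
    {
        DirectoryInfo info = MakeScratchDirectory();
        Console.WriteLine(info.FullName);      // full path of the new directory
        Console.WriteLine(info.CreationTime);  // one of the details DirectoryInfo exposes

        // Class File offers similar static helpers for files.
        string filePath = Path.Combine(info.FullName, "note.txt");
        File.WriteAllText(filePath, "hello");
        Console.WriteLine(File.Exists(filePath));
        Console.WriteLine(Directory.GetFiles(info.FullName).Length);

        // Clean up: delete the directory and everything in it.
        Directory.Delete(info.FullName, true);
    }
}
```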
Using (from msdn)
Defines a scope, outside of which an object or objects will be disposed.
C#, through the .NET Framework common language runtime (CLR), automatically releases the memory used to store objects that are no longer required. The release of memory is non-deterministic; memory is released whenever the CLR decides to perform garbage collection. However, it is usually best to release limited resources such as file handles and network connections as quickly as possible.
The using statement allows the programmer to specify when objects that use resources should release them. The object provided to the using statement must implement the IDisposable interface. This interface provides the Dispose method, which should release the object's resources.
A using statement can be exited either when the end of the using statement is reached or if an exception is thrown and control leaves the statement block before the end of the statement.
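The two ways of leaving a using statement can be observed with a small sketch. The class TrackedResource is a hypothetical stand-in for a real resource such as a file handle; it only records when it is acquired and disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that logs its own lifetime.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> log;

    public TrackedResource(List<string> log)
    {
        this.log = log;
        log.Add("acquired");
    }

    public void Dispose()
    {
        log.Add("disposed");
    }
}

public static class UsingDemo
{
    // Returns the sequence of events, with or without a simulated failure.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var resource = new TrackedResource(log))
            {
                log.Add("working");
                if (fail)
                    throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(false))); // acquired, working, disposed
        Console.WriteLine(string.Join(", ", Run(true)));  // acquired, working, disposed, caught
    }
}
```

In the failing run, "disposed" is logged before "caught": Dispose runs as control leaves the using block, before the exception reaches the handler.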
Revisit LINQ*
using System;
using System.Linq;

class IntroToLINQ
{
    static void Main()
    {
        // The Three Parts of a LINQ Query:
        // 1. Data source.
        int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };

        // 2. Query creation.
        // numQuery is an IEnumerable<int>
        var numQuery =
            from num in numbers
            where (num % 2) == 0
            select num;

        // 3. Query execution.
        foreach (int num in numQuery)
        {
            Console.Write("{0,1} ", num);
        }
    }
}
Revisit LINQ cont
In LINQ, the execution of the query is distinct from the query itself.
Creating a query variable does not retrieve any data.
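One way to see that a query variable holds no data is to change the data source after the query is created but before it is enumerated. The sketch below uses hypothetical names (DeferredExecution, Run) and a small in-memory array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecution
{
    public static List<int> Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4 };

        // Query creation: nothing is read from the array yet.
        var evens =
            from n in numbers
            where n % 2 == 0
            select n;

        // The source changes after the query was created.
        numbers[1] = 10;

        // Query execution happens here, so the new value is seen.
        return evens.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // 0 10 2 4
    }
}
```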
Create a data source from an XML document
// using System.Xml.Linq;
XElement contacts =
XElement.Load(@"c:\myContactList.xml");
Northwnd db = new Northwnd(@"c:\northwnd.mdf");
// Query for customers in London.
IQueryable<Customer> custQuery =
from cust in db.Customers
where cust.City == "London"
select cust;
Revisit LINQ execution
Queries that perform aggregation functions over a range of source elements must first iterate over those elements. Examples of such queries are Count, Max, Average, and First. These execute without an explicit foreach statement because the query itself must use foreach in order to return a result.
Note also that these types of queries return a single value, not an IEnumerable collection.
var evenNumQuery =
from num in numbers
where (num % 2) == 0
select num;
int evenNumCount = evenNumQuery.Count();
Revisit LINQ forced execution
To force immediate execution of any query and cache its results, call the ToList<TSource> or ToArray<TSource> methods.
List<int> numQuery2 =
(from num in numbers
where (num % 2) == 0
select num).ToList();
or
int[] numQuery3 =
(from num in numbers
where (num % 2) == 0
select num).ToArray();
17.5 Creating a Sequential-Access Text File
Step 1. Build REUSABLE dll in BankLibrary.sln. Record simulator
CreateFileForm class
SaveFileDialog is used for selecting files.
The constant FileMode.OpenOrCreate indicates that the FileStream should open the file if it exists or create the file if it does not.
To preserve the original contents of a file, use FileMode.Append.
The constant FileAccess.Write indicates that the program can perform only write operations with the FileStream object.
There are two other FileAccess constants—FileAccess.Read for read-only access and FileAccess.ReadWrite for both read and write access.
An IOException is thrown if there is a problem opening the file or creating the StreamWriter.
StreamWriter method WriteLine writes a sequence of characters to a file.
The StreamWriter object is constructed with a FileStream argument that specifies the file to which the StreamWriter will output text.
Method Close throws an IOException if the file or stream cannot be closed properly.
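The FileStream and StreamWriter calls described above can be combined as in the sketch below. It is not the CreateFileForm code: there is no GUI or SaveFileDialog, the names CreateTextFile and WriteRecords are hypothetical, and the file is written to a temporary path. A using statement closes the writer and the stream instead of an explicit call to Close.

```csharp
using System;
using System.IO;

public static class CreateTextFile
{
    // Writes each record as one line of text.
    public static void WriteRecords(string path, string[] records)
    {
        // OpenOrCreate starts writing at position 0 and does not truncate,
        // so this sketch assumes a new or empty file.
        using (var output = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        using (var writer = new StreamWriter(output))
        {
            foreach (string record in records)
                writer.WriteLine(record);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "clients-demo.txt");
        File.Delete(path); // start from a clean file; no error if it is missing
        try
        {
            WriteRecords(path, new[] { "100,Nancy,Brown,-25.54", "200,Stacey,Dunn,314.33" });
            Console.WriteLine(File.ReadAllText(path));
        }
        catch (IOException ex)
        {
            Console.Error.WriteLine("Error writing file: " + ex.Message);
        }
    }
}
```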
Same as “”
Must be a Key
Select where to save first, then enter data
Clients.txt
100,Nancy,Brown,-25.54
200,Stacey,Dunn,314.33
300,Doug,Barker,0.00
400,Dave,Smith,258.34
500,Sam,Stone,34.98
17.6 Reading Data from a Sequential-Access Text File
ReadSequentialAccessFileForm
OpenFileDialog is used to open a file.
The behavior and GUI for the Save and Open dialog types are identical, except that Save is replaced by Open.
Specify read-only access to a file by passing constant FileAccess.Read as the third argument to the FileStream constructor.
StreamReader method ReadLine reads the next line from the file.
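Reading the file back follows the sequential pattern "read the next record, process it, stop at end of file". The sketch below is not the ReadSequentialAccessFileForm code: the names ReadTextFile and ReadRecords are hypothetical, there is no OpenFileDialog, and it assumes comma-separated fields like those in Clients.txt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadTextFile
{
    // Reads every line and splits it into its comma-separated fields.
    public static List<string[]> ReadRecords(string path)
    {
        var records = new List<string[]>();
        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(input))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
                records.Add(line.Split(','));
        }
        return records;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "clients-read-demo.txt");
        File.WriteAllLines(path, new[] { "100,Nancy,Brown,-25.54", "300,Doug,Barker,0.00" });
        foreach (string[] fields in ReadRecords(path))
            Console.WriteLine(fields[0] + ": " + fields[1] + " " + fields[2]);
        File.Delete(path);
    }
}
```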
File Position
• A FileStream object can reposition its file-position pointer to any position in the file.
• When a FileStream object is opened, its file-position pointer is set to byte position 0.
• You can use StreamReader property BaseStream to invoke the Seek method of the underlying FileStream to reset the file-position pointer back to the beginning of the file.
• Exercise: Add a Start/Beginning button to the ReadSequentialAccessFileForm program to go to the very first record.
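The repositioning described in the bullets above can be sketched as follows. The names FilePosition and ReadFirstLineTwice are hypothetical, and this is a console-level illustration rather than a solution to the form exercise. One detail is an addition to the slides: StreamReader keeps its own buffer, so the sketch calls DiscardBufferedData before seeking so that the next read really comes from the new position.

```csharp
using System;
using System.IO;

public static class FilePosition
{
    // Reads the first line, rewinds to byte 0, and reads the first line again.
    public static string[] ReadFirstLineTwice(string path)
    {
        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(input))
        {
            string first = reader.ReadLine();

            // Drop what the reader has buffered, then move the underlying
            // FileStream's file-position pointer back to the beginning.
            reader.DiscardBufferedData();
            reader.BaseStream.Seek(0, SeekOrigin.Begin);

            string again = reader.ReadLine();
            return new[] { first, again };
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "seek-demo.txt");
        File.WriteAllLines(path, new[] { "100,Nancy,Brown,-25.54", "200,Stacey,Dunn,314.33" });
        string[] lines = ReadFirstLineTwice(path);
        Console.WriteLine(lines[0]);
        Console.WriteLine(lines[1]);
        File.Delete(path);
    }
}
```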
17.7 Case Study: Credit Inquiry Program
filter
Serialization
Class BinaryFormatter enables entire objects to be written to or read from a stream.
BinaryFormatter method Serialize writes an object’s representation to a file.
BinaryFormatter method Deserialize reads this representation from a file and reconstructs the original object.
Both methods throw a SerializationException if an error occurs during serialization or deserialization.
Serialization (writing to a file)
• Method Serialize takes the FileStream object as the first argument so that the BinaryFormatter can write its second argument to the correct file.
• Remember that we are now using binary files, which are not human readable.
17.9 Creating a Sequential-Access File Using Object Serialization
Deserialization (reading from a file)
• Deserialize returns a reference of type object.
• If an error occurs during deserialization, a SerializationException is thrown.
17.10 Reading and Deserializing Data from a Binary File
Clients.bin
100,Nancy,Brown,-25.54
200,Stacey,Dunn,314.33
300,Doug,Barker,0.00
400,Dave,Smith,258.34
500,Sam,Stone,34.98