File System in Operating System: A Brief Overview
General graph directory structure
Basic Partitioned Access Method
Basic Sequential Access Method
Indexed Sequential Access Method
What is a File System
A file system, often known as file management or FS, is a method of controlling how and where data is stored on a disk. It is a logical disk component that organizes files into groups called directories for the purpose of compression. It controls a disk’s internal activities because it is computer-related and abstract to a human user. The directories may contain files and extra directories. Without file management, it would be impossible for two files with the same name to coexist, it would be impossible to uninstall installed programs and retrieve certain files, and data would not be organized without a file structure. Since files are frequently managed in a hierarchy, the file system enables you to view a file in the current directory.
Regardless of its purpose or nature, a disk (such as a hard drive) has a file system. Additionally, it explains how a user or application may access the data and includes details like file size, file name, file location, and fragment information. The file system is responsible for managing tasks including file naming, metadata, storage management, and directories and folders. Files are stored on a storage device in sectors, while data is kept in blocks, which are collections of sectors. The file system helps to identify which sectors are available for use as well as the size and location of the files. Partitions are occasionally discussed in terms of “file systems.” For instance, stating that “two file systems are available on the hard drive” does not necessarily imply that the drive is split into the NTFS and FAT file systems. However, it indicates that there are two distinct partitions that share the same physical disk.
There are several types of file systems such as FAT, GFS, NTFS, HFS, UDF etc. The file system can hold up to three layers. These layers occasionally work together and occasionally are purposefully divided. Because it is responsible for interacting with the user program, the logical file system offers the API (Application Program Interface) for file operations including OPEN, CLOSE, READ, and more. Additionally, the requested action is sent to the layer behind it for processing. The second optional layer, a virtual file system, provides support for numerous concurrent instances of physical file systems. A file system implementation is what is used to refer to each concurrent instance.
(File System | What Is a File System — Javatpoint, n.d.; )
The Directory
A directory is a type of container used to store files and folders. It sets up folders and files in a hierarchical order.
Single Level Directory
The simplest directory structure is a single-level directory. It is simple to support and understand because all of the files are located in the same directory. However, a single-level directory has major limitations as file sizes grow or as the number of users on the system increases. All of the files must have a distinct name because they are all in the same directory. If two users call their dataset “experiment,” the rule requiring a unique name has been broken.
(Tutorials Point, n.d.)
Two-Level Directory
As we’ve seen, a single-level directory frequently causes confusion among users regarding file names. The answer to this issue is to make a unique directory for every user. Each user’s user files directory is located in the two-level directory hierarchy (UFD). Similar in structure, the UFDs each list the files belonging to a single user. Every time a new user logs in, the system examines the master file directory (MFD). Each entry in the MFD points to the UFD for that user and is indexed by username or account number.
(Tutorials Point, n.d.)
Tree-Structured Directory
The natural generalization is to expand the directory structure to a tree of arbitrary height once we have seen a two-level directory as a tree of height two. The user can make their own subdirectories and organize their files as desired thanks to this generalization.
(Tutorials Point, n.d.)
Acyclic graph directory
Acyclic graphs, which lack cycles and permit the sharing of subdirectories and files, Two distinct directories may contain the same file or subdirectory. It is a logical extension of the directory that is tree-structured. It is used in situations where two programmers need to access files while working on a shared project. Since they are working on a joint project, the relevant files are stored in a subfolder, which keeps them separate from other projects and the files of other programmers. They then want to move the subdirectories into their own folders. Shared subdirectories should exist. This is where acyclic directories are used. The important thing to remember is that shared files are not the same as copied files; if a programmer makes modifications to one subfolder, they will affect the other as well.
(Tutorials Point, n.d.)
General graph directory structure
Cycles are permitted in a directory structure where multiple directories can derive from multiple parent directories in a general graph directory structure. Calculating the overall amount or space that the files and directories have consumed is the main issue with this type of directory layout.
(Tutorials Point, n.d.)
(Operating System | Structures of Directory — Tutorialspoint.dev, n.d.)
File Allocation Methods
How the files are stored in the disk blocks is determined by the allocation methods. There are three primary file or disk space allocation techniques.
Ø Contiguous Allocation
Ø Linked Allocation
Ø Indexed Allocation
The fundamental purpose of these techniques is to offer efficient use of disk space and a quick way to get to the file blocks.
Contiguous Allocation
A traditional memory allocation model is contiguous memory allocation. In this case, a system gives a process access to successive memory blocks or memory blocks with consecutive addresses. One of the earliest types of memory allocation is contiguous memory allocation. Here’s how it operates: Whenever a process needs to execute, it requests memory. The amount of contiguous main memory that is available to run the process is compared to the size of the process. If enough contiguous memory is found, it is allocated, and the process begins running. If not, the process is added to a list of processes that must wait until there is enough free contiguous memory available. A file will be assigned the following blocks if it needs n blocks and is given a block b as its starting point, for instance: b, b+1, b+2,……b+n-1. This means that we can determine the blocks occupied by the file given the starting block address and the length of the file (in terms of necessary blocks).
A file with contiguous allocation’s directory entry includes the address of starting block and the length of the allocated portion. Any memory location that a process references has its address checked to make sure it does not refer to the address of another process. The underlying operating system is in charge of this processing security.
Contiguous memory allocation provides certain unique advantages over non-contiguous memory allocation. It frequently entails less overhead, executes more quickly, and is simpler for the operating system to handle.
Contiguous memory allocation, however, has disadvantages as well. One of the key ones is that, if the smaller quantities of memory are not used because contiguous memory blocks are required, memory may be wasted in this manner. Also, because the contiguous block is not easily available, programs might have to wait longer for execution.
Linked List Allocation
Linked lists could be used to represent blocks in a file. The address of the first block the file occupies is all that needs to be stored. Each block of the file includes data as well as a pointer to the block after it.
In contrast to a method that demands that every file be contiguous, this method allows for the use of every block. No space is lost as a result of outside fragmentation (although there is internal fragmentation within the file, which can lead to performance issues). Only the first block needs to be stored in the directory entry. From there, you can access the rest of the file. There is no requirement that the file size is known in advance (unlike a contiguous file allocation scheme). Any block can be allocated when a file needs more space (e.g. the first block on the free block list).
The drawbacks of this approach are as follows: Random access is extremely slow (as it needs many disc reads to access a random point in the file). In actuality, sequential access is the only usage for the system as described above. The pointer takes up space within each block. The number of bytes cannot be a power of two due to this. Although not lethal, this does affect performance. Reliability can be an issue. One faulty block pointer is all that is necessary for the entire system to be compromised (e.g. writing over a block that belongs to another file).
Indexed Allocation
Each file has its own index block, which is an array of disk-block addresses, and this data structure, known as an i-node (index-node), lists the properties and disk addresses of the file blocks.
The benefits include direct access support, enabling quick access to the file blocks that are occupied by the file. The issue of external fragmentation is solved.
The disadvantage is that linked allocation has less pointer overhead than indexed allocation. The indexed allocation would store a full block (the index block) for the pointers for very small files, say files that extend only a couple of blocks, which is inefficient in terms of memory usage. However, with linked allocation, we only lose one pointer per block’s worth of space.
The following are some mechanisms for this purpose:
Linked scheme: Typically, an index block comprises one disk block. It can therefore be read and written directly on its own. We can link numerous index blocks together to support huge files.
Using a first-level index block to point to a collection of second-level index blocks, which in turn point to the file blocks, is one variation of the linked representation known as a multilevel index.
Combination scheme: The first, let’s say, 15 pointers from the index block are kept in the file’s inode as an additional option in the UFS (UNIX File System).
(What Is Contiguous Memory Allocation? — Definition from Techopedia, n.d.; Ozdogan, 2010; File Allocation Methods — GeeksforGeeks, 2018; )
Access Method
The process for storing and retrieving data is described by an access method. Data set structures, macros, and utility programs are all specific to access techniques and can be used to process data sets. The organization of the data set mostly determines the access methods. For instance, while working with sequential data sets, employ the basic sequential access technique (BSAM) or queued sequential access method (QSAM).
Types of Access Methods
Basic Direct Access Method
Records can be retrieved by actual or relative address, and BDAM can organize records in any order your program specifies. If you are unsure of a record’s specific placement, you can indicate a point in the data set where a search for it should start. Direct data sets are those that are arranged in this manner.
Basic Partitioned Access Method
On DASD, records are organized using the basic partitioned access method (BPAM) as members of a partitioned data set (PDS) or a partitioned data set extended (PDSE). A UNIX directory and its files can be viewed as a PDS using BPAM. With BSAM or QSAM, you can inspect each PDS, PDSE, or UNIX member in turn. A directory that links member names to specific locations within the data set is part of a PDS or PDSE. To obtain specific members, use the PDS, PDSE, or UNIX directory. The directory contains program attributes needed to load and rebind the member for program libraries (load modules and program objects). Despite the fact that UNIX files are capable of containing program objects, program management does not use BPAM to access UNIX files.
Basic Sequential Access Method
Records are organized sequentially by BSAM according to the order in which they are entered. Sequential data sets are ones that have this organization. The user creates blocks of records by combining them with other records. Access is on a basic level. The majority of file operations are read and write. A file pointer that tracks the location of I/O operations is automatically advanced during a read operation called “read next” that reads the file’s next position. In a similar manner, advance to the newly written information with the -write next- command and append to the end of the file.
Indexed Sequential Access Method
It is the alternative approach to file access that is constructed on top of sequential access. These techniques create the file’s index. The pointer to each block is contained in the index, much like an index at the back of a book. We first search the index to discover a record in the file, and then we use the pointer to access the file directly.
Virtual storage access method
By using an index key, a relative record number, or a relative byte addressing, VSAM can arrange records. In order to process fixed-length and variable-length records directly or sequentially on DASD, VSAM is needed. One of five different types of data sets is used to store data that has been organized by VSAM and is cataloged for quick retrieval.
(File Access Methods in Operating System — GeeksforGeeks, 2018; G53OPS : File Systems, 2022; Access Methods, n.d.)
References
Access methods. (n.d.). Www.ibm.com. Retrieved July 17, 2022, from https://www.ibm.com/docs/en/zos/2.4.0?topic=sets-access-methods
File Access Methods in Operating System — GeeksforGeeks. (2018, October 31). GeeksforGeeks. https://www.geeksforgeeks.org/file-access-methods-in-operating-system/
File Allocation Methods — GeeksforGeeks. (2018, September 10). GeeksforGeeks. https://www.geeksforgeeks.org/file-allocation-methods/
File System | What is a File System — javatpoint. (n.d.). Www.javatpoint.com. https://www.javatpoint.com/file-system
G53OPS : File Systems. (2022). Nott.ac.uk. https://www.cs.nott.ac.uk/~pszgxk/courses/g53ops/File%20Systems/File07-linkedlist.html
Operating System | Structures of Directory — Tutorialspoint.dev. (n.d.). Tutorialspoint.Dev. https://tutorialspoint.dev/computer-science/operating-systems/operating-system-structures-of-directory
Ozdogan, C. (2010, May 11). File Allocation. https://boron.physics.metu.edu.tr/ozdogan/OperatingSystems/week12/node13.htm
Tutorials Point. (n.d.). Directory Structure [Online]. In Tutorialspoint. https://tutorialspoint.dev/
What is Contiguous Memory Allocation? — Definition from Techopedia. (n.d.). Techopedia.com. https://www.techopedia.com/definition/3769/contiguous-memory-allocation