File System in Operating System: A Brief Overview

Mohammad M Rahman
10 min readJul 17, 2022

--

What is a File System

The Directory

Single Level Directory

Two-Level Directory

Tree-Structured Directory

Acyclic graph directory

General graph directory structure

File Allocation Methods

Contiguous Allocation

Linked List Allocation

Indexed Allocation

Access Method

Types of Access Methods

Basic Direct Access Method

Basic Partitioned Access Method

Basic Sequential Access Method

Indexed Sequential Access Method

Virtual storage access method

References

What is a File System

A file system, often known as file management or FS, is a method of controlling how and where data is stored on a disk. It is a logical disk component that organizes files into groups called directories for the purpose of compression. It controls a disk’s internal activities because it is computer-related and abstract to a human user. The directories may contain files and extra directories. Without file management, it would be impossible for two files with the same name to coexist, it would be impossible to uninstall installed programs and retrieve certain files, and data would not be organized without a file structure. Since files are frequently managed in a hierarchy, the file system enables you to view a file in the current directory.

Regardless of its purpose or nature, a disk (such as a hard drive) has a file system. Additionally, it explains how a user or application may access the data and includes details like file size, file name, file location, and fragment information. The file system is responsible for managing tasks including file naming, metadata, storage management, and directories and folders. Files are stored on a storage device in sectors, while data is kept in blocks, which are collections of sectors. The file system helps to identify which sectors are available for use as well as the size and location of the files. Partitions are occasionally discussed in terms of “file systems.” For instance, stating that “two file systems are available on the hard drive” does not necessarily imply that the drive is split into the NTFS and FAT file systems. However, it indicates that there are two distinct partitions that share the same physical disk.

There are several types of file systems such as FAT, GFS, NTFS, HFS, UDF etc. The file system can hold up to three layers. These layers occasionally work together and occasionally are purposefully divided. Because it is responsible for interacting with the user program, the logical file system offers the API (Application Program Interface) for file operations including OPEN, CLOSE, READ, and more. Additionally, the requested action is sent to the layer behind it for processing. The second optional layer, a virtual file system, provides support for numerous concurrent instances of physical file systems. A file system implementation is what is used to refer to each concurrent instance.

(File System | What Is a File System — Javatpoint, n.d.; )

The Directory

A directory is a type of container used to store files and folders. It sets up folders and files in a hierarchical order.

Single Level Directory

The simplest directory structure is a single-level directory. It is simple to support and understand because all of the files are located in the same directory. However, a single-level directory has major limitations as file sizes grow or as the number of users on the system increases. All of the files must have a distinct name because they are all in the same directory. If two users call their dataset “experiment,” the rule requiring a unique name has been broken.

(Tutorials Point, n.d.)

Two-Level Directory

As we’ve seen, a single-level directory frequently causes confusion among users regarding file names. The answer to this issue is to make a unique directory for every user. Each user’s user files directory is located in the two-level directory hierarchy (UFD). Similar in structure, the UFDs each list the files belonging to a single user. Every time a new user logs in, the system examines the master file directory (MFD). Each entry in the MFD points to the UFD for that user and is indexed by username or account number.

(Tutorials Point, n.d.)

Tree-Structured Directory

The natural generalization is to expand the directory structure to a tree of arbitrary height once we have seen a two-level directory as a tree of height two. The user can make their own subdirectories and organize their files as desired thanks to this generalization.

(Tutorials Point, n.d.)

Acyclic graph directory

Acyclic graphs, which lack cycles and permit the sharing of subdirectories and files, Two distinct directories may contain the same file or subdirectory. It is a logical extension of the directory that is tree-structured. It is used in situations where two programmers need to access files while working on a shared project. Since they are working on a joint project, the relevant files are stored in a subfolder, which keeps them separate from other projects and the files of other programmers. They then want to move the subdirectories into their own folders. Shared subdirectories should exist. This is where acyclic directories are used. The important thing to remember is that shared files are not the same as copied files; if a programmer makes modifications to one subfolder, they will affect the other as well.

(Tutorials Point, n.d.)

General graph directory structure

Cycles are permitted in a directory structure where multiple directories can derive from multiple parent directories in a general graph directory structure. Calculating the overall amount or space that the files and directories have consumed is the main issue with this type of directory layout.

(Tutorials Point, n.d.)

(Operating System | Structures of Directory — Tutorialspoint.dev, n.d.)

File Allocation Methods

How the files are stored in the disk blocks is determined by the allocation methods. There are three primary file or disk space allocation techniques.

Ø Contiguous Allocation

Ø Linked Allocation

Ø Indexed Allocation

The fundamental purpose of these techniques is to offer efficient use of disk space and a quick way to get to the file blocks.

Contiguous Allocation

A traditional memory allocation model is contiguous memory allocation. In this case, a system gives a process access to successive memory blocks or memory blocks with consecutive addresses. One of the earliest types of memory allocation is contiguous memory allocation. Here’s how it operates: Whenever a process needs to execute, it requests memory. The amount of contiguous main memory that is available to run the process is compared to the size of the process. If enough contiguous memory is found, it is allocated, and the process begins running. If not, the process is added to a list of processes that must wait until there is enough free contiguous memory available. A file will be assigned the following blocks if it needs n blocks and is given a block b as its starting point, for instance: b, b+1, b+2,……b+n-1. This means that we can determine the blocks occupied by the file given the starting block address and the length of the file (in terms of necessary blocks).

A file with contiguous allocation’s directory entry includes the address of starting block and the length of the allocated portion. Any memory location that a process references has its address checked to make sure it does not refer to the address of another process. The underlying operating system is in charge of this processing security.

Contiguous memory allocation provides certain unique advantages over non-contiguous memory allocation. It frequently entails less overhead, executes more quickly, and is simpler for the operating system to handle.

Contiguous memory allocation, however, has disadvantages as well. One of the key ones is that, if the smaller quantities of memory are not used because contiguous memory blocks are required, memory may be wasted in this manner. Also, because the contiguous block is not easily available, programs might have to wait longer for execution.

Linked List Allocation

Linked lists could be used to represent blocks in a file. The address of the first block the file occupies is all that needs to be stored. Each block of the file includes data as well as a pointer to the block after it.

In contrast to a method that demands that every file be contiguous, this method allows for the use of every block. No space is lost as a result of outside fragmentation (although there is internal fragmentation within the file, which can lead to performance issues). Only the first block needs to be stored in the directory entry. From there, you can access the rest of the file. There is no requirement that the file size is known in advance (unlike a contiguous file allocation scheme). Any block can be allocated when a file needs more space (e.g. the first block on the free block list).

The drawbacks of this approach are as follows: Random access is extremely slow (as it needs many disc reads to access a random point in the file). In actuality, sequential access is the only usage for the system as described above. The pointer takes up space within each block. The number of bytes cannot be a power of two due to this. Although not lethal, this does affect performance. Reliability can be an issue. One faulty block pointer is all that is necessary for the entire system to be compromised (e.g. writing over a block that belongs to another file).

Indexed Allocation

Each file has its own index block, which is an array of disk-block addresses, and this data structure, known as an i-node (index-node), lists the properties and disk addresses of the file blocks.

The benefits include direct access support, enabling quick access to the file blocks that are occupied by the file. The issue of external fragmentation is solved.

The disadvantage is that linked allocation has less pointer overhead than indexed allocation. The indexed allocation would store a full block (the index block) for the pointers for very small files, say files that extend only a couple of blocks, which is inefficient in terms of memory usage. However, with linked allocation, we only lose one pointer per block’s worth of space.

The following are some mechanisms for this purpose:

Linked scheme: Typically, an index block comprises one disk block. It can therefore be read and written directly on its own. We can link numerous index blocks together to support huge files.

Using a first-level index block to point to a collection of second-level index blocks, which in turn point to the file blocks, is one variation of the linked representation known as a multilevel index.

Combination scheme: The first, let’s say, 15 pointers from the index block are kept in the file’s inode as an additional option in the UFS (UNIX File System).

(What Is Contiguous Memory Allocation? — Definition from Techopedia, n.d.; Ozdogan, 2010; File Allocation Methods — GeeksforGeeks, 2018; )

Access Method

The process for storing and retrieving data is described by an access method. Data set structures, macros, and utility programs are all specific to access techniques and can be used to process data sets. The organization of the data set mostly determines the access methods. For instance, while working with sequential data sets, employ the basic sequential access technique (BSAM) or queued sequential access method (QSAM).

Types of Access Methods

Basic Direct Access Method

Records can be retrieved by actual or relative address, and BDAM can organize records in any order your program specifies. If you are unsure of a record’s specific placement, you can indicate a point in the data set where a search for it should start. Direct data sets are those that are arranged in this manner.

Basic Partitioned Access Method

On DASD, records are organized using the basic partitioned access method (BPAM) as members of a partitioned data set (PDS) or a partitioned data set extended (PDSE). A UNIX directory and its files can be viewed as a PDS using BPAM. With BSAM or QSAM, you can inspect each PDS, PDSE, or UNIX member in turn. A directory that links member names to specific locations within the data set is part of a PDS or PDSE. To obtain specific members, use the PDS, PDSE, or UNIX directory. The directory contains program attributes needed to load and rebind the member for program libraries (load modules and program objects). Despite the fact that UNIX files are capable of containing program objects, program management does not use BPAM to access UNIX files.

Basic Sequential Access Method

Records are organized sequentially by BSAM according to the order in which they are entered. Sequential data sets are ones that have this organization. The user creates blocks of records by combining them with other records. Access is on a basic level. The majority of file operations are read and write. A file pointer that tracks the location of I/O operations is automatically advanced during a read operation called “read next” that reads the file’s next position. In a similar manner, advance to the newly written information with the -write next- command and append to the end of the file.

Indexed Sequential Access Method

It is the alternative approach to file access that is constructed on top of sequential access. These techniques create the file’s index. The pointer to each block is contained in the index, much like an index at the back of a book. We first search the index to discover a record in the file, and then we use the pointer to access the file directly.

Virtual storage access method

By using an index key, a relative record number, or a relative byte addressing, VSAM can arrange records. In order to process fixed-length and variable-length records directly or sequentially on DASD, VSAM is needed. One of five different types of data sets is used to store data that has been organized by VSAM and is cataloged for quick retrieval.

(File Access Methods in Operating System — GeeksforGeeks, 2018; G53OPS : File Systems, 2022; Access Methods, n.d.)

References

Access methods. (n.d.). Www.ibm.com. Retrieved July 17, 2022, from https://www.ibm.com/docs/en/zos/2.4.0?topic=sets-access-methods

File Access Methods in Operating System — GeeksforGeeks. (2018, October 31). GeeksforGeeks. https://www.geeksforgeeks.org/file-access-methods-in-operating-system/

File Allocation Methods — GeeksforGeeks. (2018, September 10). GeeksforGeeks. https://www.geeksforgeeks.org/file-allocation-methods/

File System | What is a File System — javatpoint. (n.d.). Www.javatpoint.com. https://www.javatpoint.com/file-system

G53OPS : File Systems. (2022). Nott.ac.uk. https://www.cs.nott.ac.uk/~pszgxk/courses/g53ops/File%20Systems/File07-linkedlist.html

Operating System | Structures of Directory — Tutorialspoint.dev. (n.d.). Tutorialspoint.Dev. https://tutorialspoint.dev/computer-science/operating-systems/operating-system-structures-of-directory

Ozdogan, C. (2010, May 11). File Allocation. https://boron.physics.metu.edu.tr/ozdogan/OperatingSystems/week12/node13.htm

Tutorials Point. (n.d.). Directory Structure [Online]. In Tutorialspoint. https://tutorialspoint.dev/

What is Contiguous Memory Allocation? — Definition from Techopedia. (n.d.). Techopedia.com. https://www.techopedia.com/definition/3769/contiguous-memory-allocation

--

--

Mohammad M Rahman
Mohammad M Rahman

Written by Mohammad M Rahman

Research interest: Islam, Computer science, Psychology/Sociology. Please see my profile links for further info.

No responses yet