WebSVN - shark - Blame - Rev 1677 - /sharkdoc/trunk/vol6/vol6.tex

Rev	Author	Line No.	Line
1676	tullio	1	\documentclass[english]{report}
		2	\usepackage[T1]{fontenc}
		3	\usepackage[latin1]{inputenc}
		4	\usepackage{a4wide}
		5	\setcounter{secnumdepth}{3} \setcounter{tocdepth}{3}
		6	\usepackage{graphicx}
		7
		8	\makeatletter
		9
		10	\usepackage{babel}
		11	\makeatother
		12
		13	\begin{document}
		14
		15	\thispagestyle{empty}
		16
		17	\begin{center}{\LARGE S.Ha.R.K. User Manual}\end{center}{\LARGE \par} \vfill{}
		18
		19	\begin{center}{\large Volume VI}\end{center}
		20
		21	\begin{center}S.Ha.R.K. File System\end{center} \vfill{}
		22
		23	\begin{center}Written by\end{center}
		24	\begin{center}Paolo Gai (pj at sssup.it)\end{center}
		25	\begin{center}Tullio Facchinetti (tullio.facchinetti at unipv.it)\end{center}
		26	\begin{center}Original italian version by\end{center}
		27	\begin{center}Massimiliano Giorgi (massy at gandalf.sssup.it)\end{center}
		28	\vfill{}
		29
		30	\begin{center}\includegraphics[width=2cm]{../common/sssup.ps}\end{center}
		31
		32	\begin{center}Scuola Superiore di Studi e Perfezionamento S. Anna\end{center}
		33	\begin{center}RETIS Lab\end{center}
		34	\begin{center}Via Carducci, 40 - 56100 Pisa\end{center}
		35	\begin{center}\pagebreak\end{center}
		36
		37	\tableofcontents{}
		38
		39	\listoffigures
		40
		41	\listoftables
		42
		43	\begin{abstract}
		44	This is the documentation about S.Ha.R.K. File system. The document gives an
		45	insight on many data structures used internally by the file system. This is the
		46	first step toward a \emph{true} file system user manual. If you carefully read
		47	it, you will understand how it works, how can be initialized and used.
		48
		49	In any case, please send any comments to pj@sssup.it.
		50
		51	Enjoy,
		52
		53	Paolo
		54	\end{abstract}
		55
		56	%----------------------------------------------------------------------------
		57	\chapter{File system architecture}
		58	\label{sec:arch}
		59	%----------------------------------------------------------------------------
		60
		61	%----------------------------------------------------------------------------
		62	\section{Features}
		63	%----------------------------------------------------------------------------
		64
		65	\label{sec:arch:gen}The library is composed by two main layers:
		66
		67	\begin{description}
		68	\item [File-system] The file system contains all the data structures used to
		69	handle files.
		70	\item [Blocks~devices]This part handles directly the hardware. They provide a
		71	common interface that unifies the usage of all the block devices (i.e., the hard
		72	disks).
		73	\end{description}
		74
		75	The architecture of the library is modular, and it is described in
		76	Figure \ref{fig:Filesystem-architecture}.
		77
		78	\begin{figure}
		79	\begin{center}
		80	\includegraphics[width=6cm]{images/arch2.eps}
		81	\end{center}
		82	\caption{File system architecture}
		83	\label{fig:Filesystem-architecture}
		84	\end{figure}
		85
		86	The library is connected to the Kernel via a \emph{glue} layer, to
		87	the application via a \emph{wrapper} layer; the block devices and
		88	the file system operations are separated by a cache, that may be
		89	excluded.
		90
		91	\label{sec:arch:req}
		92
		93	The library needs a set of primitives that must be supported by
		94	the underlying kernel, as memory handling, string handling, and
		95	functions with a variable number of parameters. Moreover, the
		96	kernel should support functions that implement:
		97
		98	\begin{itemize}
		99	\item Mutual exclusion.
		100	\item Allocation of low level resources (as I/O spaces and interrupts).
		101	\item Memory protection (not needed in S.Ha.R.K.).
		102	\end{itemize}
		103
		104	%----------------------------------------------------------------------------
		105	\section{Low Level: Device Drivers}
		106	%----------------------------------------------------------------------------
		107
		108	\begin{figure}
		109	\begin{center}
		110	\includegraphics[width=6cm]{images/archlow.eps}
		111	\end{center}
		112	\caption{Device Drivers}
		113	\label{fig:archlow}
		114	\end{figure}
		115
		116	The device drivers are divided in two main parts:
		117
		118	\begin{itemize}
		119	\item A general part that exports the basic low level services. It
		120	also gives a set of internal routines to the internal modules.
		121	\item A set of internal modules that handle one or more
		122	peripherals. The application can choose which module must be
		123	linked, initialized and used at compile time.
		124	\end{itemize}
		125
		126	\begin{figure}
		127	\begin{center}
		128	\includegraphics[width=6cm]{images/archlow2.eps}
		129	\end{center}
		130	\caption{Device Drivers - details}
		131	\label{fig:archlow2}
		132	\end{figure}
		133
		134	The general part is composed by four modules:
		135
		136	\begin{description}
		137	\item [Block~device]The main part; it exports the interface to the upper layers,
		138	inits the device driver subsystem and register all the drivers present in the
		139	system.
		140	\item [Logical~disk]This module provides functions that allow to inspect the
		141	logical structure of an hard disk, like start, end, size and type of each
		142	partition on the Hard Disk.
		143	\item [Physical~disk]This module provide a small data base that contains
		144	informations about all the hard disks in the system; these informations can be
		145	useful for the module that must handle the low level pending request queues.
		146	\item [Queue~policy]This module handles the low level policies for the pending
		147	requests (SCAN, LOOK, EDF, ...).
		148	\end{description}
		149
		150	%----------------------------------------------------------------------------
		151	\section{High Level: File systems}
		152	%----------------------------------------------------------------------------
		153
		154	\begin{figure}
		155	\begin{center}
		156	\includegraphics[width=10cm]{images/archhigh.eps}
		157	\end{center}
		158	\caption{High Level - File system details}
		159	\label{fig:archhigh}
		160	\end{figure}
		161
		162	The high level is divided in three parts:
		163
		164	\begin{itemize}
		165	\item The first part handle the file systems, its initialization and its
		166	interface to the internal modules.
		167	\item A second part composed by a set of modules that implement a particular
		168	file system (i.e., FAT16).
		169	\item A cache module (whose policy can be easily changed).
		170	\end{itemize}
		171
		172	The first part is composed by sub-modules (internal implementation
		173	of a sub-module can be changed without affecting the rest of the
		174	library):
		175
		176	\begin{description}
		177	\item [Filesystems]It is responsible for the initialization and termination of
		178	all the high level layer. It registers all the file systems, and it contains the
		179	routines for mounting/unmounting the partitions.
		180	\item [File]It handles the descriptor tables and the files; it defines the
		181	\texttt{file} structure.
		182	\item [Super]It handles the super blocks (that contains the global informations
		183	about partitions on a disk).
		184	\item [Inodes]It handles the inodes (informations about a file in the file
		185	system).
		186	\item [Dentry]It handles file names in a tree structure.
		187	\item [Open,~read,~umask,~...]They implement the correspondent system
		188	primitives.
		189	\end{description}
		190
		191	%----------------------------------------------------------------------------
		192	\chapter{Device drivers}
		193	\label{sec:dd}
		194	%----------------------------------------------------------------------------
		195
		196	%----------------------------------------------------------------------------
		197	\section{Interface}
		198	\label{sec:dd:interfaccia}
		199	%----------------------------------------------------------------------------
		200
		201	%----------------------------------------------------------------------------
		202	\subsection{Initialization}
		203	\label{sec:dd:interfaccia:init}
		204	%----------------------------------------------------------------------------
		205
		206	\begin{table}
		207	\begin{verbatim}
		208	typedef struct bdev_parms {
		209	__uint32_t showinfo;
		210	char *config;
		211	void *bmutexattr;
		212	__uint16_t dummy;
		213	#ifdef IDE_BLOCK_DEVICE
		214	IDE_PARMS ideparms;
		215	#endif
		216	} BDEV_PARMS;
		217
		218	void bdev_def_showinfo (BDEV_PARMS s, int v);
		219	void bdev_def_configstring(BDEV_PARMS s, char *v);
		220	void bdev_def_mutexattrptr(BDEV_PARMS s,void *v);
		221	\end{verbatim}
		222	\caption{The BDEV\_PARMS structure}
		223	\label{tab:BDEV_PARMS}
		224	\end{table}
		225
		226	The bdev\_init() function (see Table \ref{tab:BDEV_PARMS}) must be
		227	called to initialize a device driver, passing a pointer to a
		228	BDEV\_PARMS structure. Use the provided macros to init that
		229	structure.
		230
		231	The structure is composed by two parts: a generic part and a
		232	specific part. The generic part is composed by the following
		233	fields:
		234
		235	\begin{description}
		236	\item [showinfo]If this flag is set, the initialization of the device will be
		237	``verbose''. To set this flag, use the macro bdev\_def\_showinfo.
		238	\item [config]It is a string that can have a particular meaning for the device
		239	driver. The string is parsed by the device driver, that usually searches for
		240	some initialization parameter (something like ``foo:[number]''). To set this
		241	string, use the macro bdev\_def\_configstring.
		242	\item [bmutexattr]This parameter can be used to tell the system which is the
		243	kind of mutex to use to implement the mutual exclusion. To set this parameter,
		244	use the macro \texttt{bdev\_def\_mutexattrptr}.
		245	\item [dummy]Used to align the data structure (see the macro BASE\_DEV).
		246	\end{description}
		247
		248	The specific part is directly specified by the device driver; the
		249	bdev\_init function can be called can be called passing a pointer
		250	to a BDEV\_PARMS structure initialized with a BASE\_BDEV value. If
		251	NULL is specified, the default values are used.
		252
		253	%----------------------------------------------------------------------------
		254	\subsubsection{How to create a new device driver}
		255	%----------------------------------------------------------------------------
		256
		257	\begin{enumerate}
		258	\item Add a new \emph{major} number in t the file include/fs/major.h (see
		259	\ref{sec:dd:interfaccia:nomi});
		260	\item Add a ``\#define'' into the file include/fs/bdevconf.h (to enable
		261	selective compilation);
		262	\item If needed, add fields to bdev\_parms, and modify the BASE\_BDEV macro with
		263	the new default parameters;
		264	\item Modify the function bdev\_init to call the init function of the new device
		265	driver (into drivers/block/bdev.c).
		266	\end{enumerate}
		267
		268	\begin{figure}
		269	\begin{center}
		270	\includegraphics[width=7cm]{images/ddinit.eps}
		271	\end{center}
		272	\caption{Device Drivers Initialization.}
		273	\label{fig:ddinit}
		274	\end{figure}
		275
		276	The device driver initialization function, after the
		277	initialization of its internal part, does the following actions
		278	(see Figure \ref{fig:ddinit}):
		279
		280	\begin{enumerate}
		281	\item It registers itself calling bdev\_register(); That function must be called
		282	for each device handled by the driver.
		283	\item It registers every block device that it find into its interfaces to the
		284	Physical Disk Module.
		285	\item It registers every logical device found calling bdev\_register\_minor().
		286	\end{enumerate}
		287
		288	Data structures and registration functions used in this phased are
		289	described in Section \ref{sec:dd:interfaccia:det}.
		290
		291	%----------------------------------------------------------------------------
		292	\subsection{Device serial number}
		293	\label{sec:dd:interfaccia:nomi}
		294	%----------------------------------------------------------------------------
		295
		296	\begin{figure}
		297	\begin{center}
		298	\includegraphics[width=7cm]{images/devtype.eps}
		299	\end{center}
		300	\caption{Device Serial Number}
		301	\label{tab:devtype}
		302	\end{figure}
		303
		304	Every logical device (i.e., a logical partition of a hard disk)
		305	has a device serial number of type \_\_dev\_t, that must have a
		306	unique value (see the POSIX standard).
		307
		308	The number has an internal structure (see Figure
		309	\ref{tab:devtype}) composed by a \emph{major} \emph{number}, and
		310	another number (whose number depends on the device implementation)
		311	called \emph{minor} \emph{number}.
		312
		313	Three macro exists (see include/fs/sysmacro.h) to handle the
		314	device numbers:
		315
		316	\begin{description}
		317	\item [makedev]This macro creates a \_\_dev\_t number from the major and the
		318	minor number.
		319	\item [major]Returns the major number of a device.
		320	\item [minor]Returns the minor number of a device.
		321	\end{description}
		322
		323	The major number is used to decide which device driver will handle
		324	a request (see Section \ref{sec:dd:interfaccia:det}). A device
		325	driver can register more than on major number. Minor numbers are
		326	device driver dependent, and the validity of a minor number is
		327	checked only by the driver.
		328
		329	To simplify the search of a device, every device has a symbolic
		330	name (see Section \ref{sec:dd:intercaccia:ext}). Symbolic names
		331	are composed by two parts: a unique name for each major, and a
		332	specific name assigned to each minor. For example, the major
		333	MAJOR\_B\_IDE has a symbolic name ``ide''; a minor name for that
		334	major can be ``hda2'', so the full name is ``ide/hda2'', that
		335	identifies the second partition on the master hard disk on the
		336	first IDE controller interface (see Section \ref{sec:dd:ide}).
		337
		338	%----------------------------------------------------------------------------
		339	\subsection{Implementation details}
		340	\label{sec:dd:interfaccia:det}
		341	%----------------------------------------------------------------------------
		342
		343	\begin{table}
		344	\begin{verbatim}
		345	int bdev_register(__dev_t major, char name, struct block_device );
		346	int bdev_register_minor(__dev_t device, char *name, __uint8_t fsind);
		347	struct phdskinfo phdsk_register( struct phdskinfo disk);
		348	\end{verbatim}
		349	\caption{Block device registration functions}
		350	\label{tab:Block_register}
		351	\end{table}
		352
		353	These are the registration functions for the device drivers (see
		354	Table \ref{tab:Block_register}):
		355
		356	\begin{description}
		357	\item [bdev\_register]This function registers a device type (see Section
		358	\ref{sec:dd:interfaccia:nomi}); the function receives as parameter the major
		359	number, its symbolic name, and a pointer to a structure block\_device. The
		360	function returns 0 in case of success, != 0 otherwise.
		361	\item [bdev\_register\_minor]It registers a minor number, its symbolic name and
		362	the file system type (see include/fs/fsind.h). This information is used by
		363	bdev\_finddevice\_byname (see Section \ref{sec:dd:intercaccia:ext}). The
		364	function returns 0 in case of success, != 0 otherwise.
		365	\item [phdsk\_register]This function is used to register every physical device
		366	found with parameters that can be used by the Disk Scheduling Algorithms (see
		367	Section \ref{sec:dd:sched}). This function receives a pointer to a structure
		368	phdiskinfo that contains the device parameters, and returns NULL in case of
		369	error, or a pointer to the registered structure (that can be != from the
		370	parameter) in case of success.
		371	\end{description}
		372
		373	\begin{table}
		374	\begin{verbatim}
		375	struct dskgeometry {
		376	__uint16_t cyls;
		377	__uint16_t heads;
		378	__uint16_t sectors;
		379	};
		380
		381	struct phdskinfo \{
		382	char pd_name[MAXPHDSKNAME];
		383	__dev_t pd_device;
		384	__blkcnt_t pd_size;
		385	int pd_sectsize;
		386	struct dskgeometry pd_phgeom;
		387	struct dskgeometry pd_logeom;
		388	};
		389	\end{verbatim}
		390	\caption{The phdsk structure.}
		391	\label{tab:structphdsk}
		392	\end{table}
		393
		394	The phdskinfo structure contains the physical parameters of a
		395	device. all the field must be filled before calling
		396	phdsk\_register. The fields are:
		397
		398	\begin{description}
		399	\item [pd\_name]device name, used only for debug purposes;
		400	\item [pd\_device]device identificator;
		401	\item [pd\_size]number of logical sectors of the device;
		402	\item [pd\_sectsize]Dimension (bytes) of a logical sector;
		403	\item [pd\_phgeom~e~pd\_logeom]geometry of the device (logical and phisical);
		404	geometries are described using the struct dskgeometry (see Table
		405	\ref{tab:structphdsk}):
		406
		407	\begin{description}
		408	\item [cyls]number of cylinders (traces) of the device;
		409	\item [heads]number of heads;
		410	\item [sectors]number of sectors for each cylinder.
		411	\end{description}
		412	\end{description}
		413
		414	\begin{table}
		415	\begin{verbatim}
		416	struct block_device {
		417	char *bd_name;
		418	int bd_sectorsize;
		419	__uint32_t bd_flag_used:1;
		420	struct block_device_operations *bd_op;
		421	};
		422	\end{verbatim}
		423	\caption{block\_device Structure}
		424	\label{tab:structbdev}
		425	\end{table}
		426
		427	The struct block\_device described in Table \ref{tab:structbdev}
		428	is used to register the block devices:
		429
		430	\begin{description}
		431	\item [bd\_name]Physical device name; for example, for the IDE device driver,
		432	``ide''.
		433	\item [bd\_sectorsize]logical sector dimension (bytes; redundant info but
		434	useful).
		435	\item [bd\_flag\_used]It should not modified by the device drivers. If set, the
		436	corresponding device driver is present (a table of all the possible devices is
		437	used to access the system in a fast way).
		438	\item [bd\_op]Virtual device operations.
		439	\end{description}
		440
		441	%----------------------------------------------------------------------------
		442	\subsubsection{block\_device\_operation structure.}
		443	\label{sec:dd:interfaccia:det:bdevop}
		444	%----------------------------------------------------------------------------
		445
		446	\begin{table}
		447	\begin{verbatim}
		448	struct block_device_operations {
		449	int (*_trylock) (__dev_t device);
		450	int (*_tryunlock)(__dev_t device);
		451	int (read) (__dev_t device, __blkcnt_t blocknum, __uint8_t buffer);
		452	int (*seek) (__dev_t device, __blkcnt_t blocknum);
		453	int (write) (__dev_t device, __blkcnt_t blocknum, __uint8_t buffer);
		454	};
		455	\end{verbatim}
		456	\caption{block\_device\_operations structure.}
		457	\label{tab:structbdevop}
		458	\end{table}
		459	This structure contains the virtual operation that have to be
		460	supplied by every device driver:
		461
		462	\begin{description}
		463	\item [read]This function should read a logical pblock passed as parameter,
		464	writing the result on the buffer. It returns 0 in case of success, != 0
		465	otherwise.
		466	\item [write]Similar to read, but it writes.
		467	\item [seek]moves the head on the required block.
		468	\item [\_trylock~e~\_tryunlock]Used to block/unblock a device. They return 0 in
		469	case of success, != 0 otherwise; after initialization, and usually, every device
		470	is in the unblocked state.
		471	\end{description}
		472
		473	Why blocking/unblocking a device? The problem, described in
		474	Section \ref{sec:dcache}, is that the data cache have to refer to
		475	a logical block in an unique way. Since there are logical devices
		476	that shares blocks (think at /dev/hda and /dev/hda2 in Linux), the
		477	mount operation must be done in mutual exclusion, blocking the
		478	device.
		479
		480	%----------------------------------------------------------------------------
		481	\subsubsection{The ``lodsk'' module}
		482	%----------------------------------------------------------------------------
		483
		484	\begin{table}
		485	\begin{verbatim}
		486	struct lodskinfo {
		487	__uint8_t fs_ind;
		488	__blkcnt_t start;
		489	__blkcnt_t size;
		490	};
		491	typedef int (lodsk_callback_func)(int, struct lodskinfo, void *);
		492	int lodsk_scan(__dev_t device, lodsk_callback_func func, void *data, int
		493	showinfo, char *lname);
		494	\end{verbatim}
		495	\caption{The lodsk module}
		496	\label{tab:lodsk}
		497	\end{table}
		498
		499	This module (see Table \ref{tab:lodsk}) offer a service to the
		500	device drivers: It recognizes the logical structure (partitions)
		501	of an HD as described into \cite{manual:DOS330} (part of the code
		502	is derived from Linux).
		503
		504	before calling lodsk\_scan, the device driver must be already
		505	registered; the function has five parameters:
		506
		507	\begin{enumerate}
		508	\item The device ID where the search have to be done.
		509	\item A callback function called for each partition found.
		510	\item A generic pointer passed to the callback function.
		511	\item A verbose flag.
		512	\item The name of the logical device (i.e., ``hdb'', used if the verbose option
		513	has been set.
		514	\end{enumerate}
		515
		516	Each callback function is called passing as arguments:
		517
		518	\begin{enumerate}
		519	\item The logical number of the partition (the Linux numbering scheme is used).
		520	\item A structure with informations on the partition found. The structure has
		521	the following fields:
		522
		523	\begin{description}
		524	\item [fs\_ind]The file system number (see include/fs/fsind.h).
		525	\item [start]The partition starting block.
		526	\item [size]The partition size (number of blocks).
		527	\end{description}
		528	\item A the generic pointer passed to lodsk\_scan.
		529	\end{enumerate}
		530
		531	The low\_level routines always check these values to avoid
		532	read/write operations outside the partition.
		533
		534	%----------------------------------------------------------------------------
		535	\subsubsection{Semaphores}
		536	%----------------------------------------------------------------------------
		537
		538	The files include/fs/mutex.h and include/fs/semaph.h contain the
		539	mutex/semaphore interface used by the file system.
		540
		541	\begin{table}
		542	\begin{verbatim}
		543	typedef internal_sem_t __sem_t;
		544
		545	#define __sem_init (ptr,val)
		546	#define __sem_wait (ptr)
		547	#define __sem_trywait (ptr)
		548	#define __sem_signal (ptr)
		549	\end{verbatim}
		550	\caption{The semaphore interface}
		551	\label{tab:semaph}
		552	\end{table}
		553
		554	In particular, the semaphore interface (see Table
		555	\ref{tab:semaph}) is simply a wrapper to the S.Ha.R.K. Internal
		556	Semaphores.
		557
		558	\begin{table}
		559	\begin{verbatim}
		560	typedef internal_sem_t __mutex_t;
		561
		562	#define __mutex_init (ptr,attr)
		563	#define __mutex_lock (ptr)
		564	#define __mutex_trylock (ptr)
		565	#define __mutex_unlock (ptr)
		566
		567	typedef SYS_FLAGS __fastmutex_t;
		568
		569	#define __fastmutex_init (ptr)
		570	#define __fastmutex_lock (ptr)
		571	#define __fastmutex_unlock (ptr)
		572	\end{verbatim}
		573	\caption{The mutex interface}
		574	\label{tab:mutex}
		575	\end{table}
		576
		577	The mutex interface requires two types of mutexes:
		578
		579	\begin{enumerate}
		580	\item Fast mutexes: when only a few lines have to be protected (remapped to
		581	kern\_fsave/kern\_frestore).
		582	\item Normal mutexes: without any restriction (remapped again to the internal
		583	semaphores).
		584	\end{enumerate}
		585
		586	Please note that these functions are not directly used in the
		587	code; instead, functions like \_\_b\_mutex\_lock and
		588	\_\_fs\_mutex\_lock are used. This is done to make possible the
		589	disabling of the mutual exclusion requirement when not needed
		590	(i.e., if the file system runs as a process with its own address
		591	space).
		592
		593	%----------------------------------------------------------------------------
		594	\subsection{Exported interface}
		595	\label{sec:dd:intercaccia:ext}
		596	%----------------------------------------------------------------------------
		597
		598	\begin{table}
		599	\begin{verbatim}
		600	int bdev_read (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer);
		601	int bdev_write (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer);
		602	int bdev_seek (__dev_t dev, __blkcnt_t blocknum);
		603	int bdev_trylock(__dev_t device); int bdev_tryunlock(__dev_t device);
		604	__uint8_t bdev_getdeffs(__dev_t device);
		605	__dev_t bdev_find_byname(char *name);
		606	__dev_t bdev_find_byfs(__uint8_t fsind);
		607	int bdev_findname(__dev_t device, char majorname, char minorname);
		608	int bdev_scan_devices(int(*callback) (__dev_t,__uint8_t));
		609	\end{verbatim}
		610	\caption{Low level functions}
		611	\label{tab:bdev}
		612	\end{table}
		613
		614	The low level exports two kinds of functions: one to use the
		615	devices, one to search/get the logical devices present on a
		616	system.
		617
		618	The three main functions are:
		619
		620	\begin{description}
		621	\item [bdev\_find\_byname]This function accepts a symbolic device name, as
		622	described in Section \ref{sec:dd:interfaccia:nomi}, and returns 0 if the device
		623	is not registered, or its number otherwise.
		624	\item [bdev\_find\_byfs]This function accepts a file system ID (see
		625	include/fs/fsind.h), and returns the first device that contains that file
		626	system, or 0 if no one provides it.
		627	\item [bdev\_scan\_device]This function can be used to get a list of all the
		628	devices in the system; the user have to pass a callback that is called for each
		629	device in the system. The function returns 0 in case of success, -1 if an error
		630	occurred, or a value != 0 if the callback returns a value != 0. The callback is
		631	called with a device serial number and a file system type that should be present
		632	in the device.
		633	\end{description}
		634
		635	These function are useful when at system startup the root file
		636	system have to be mounted (see Section \ref{sec:fs:mount}).
		637
		638	The functions bdev\_getdeffs and bdev\textbackslash{}\_findname
		639	are used for these purposes:
		640
		641	\begin{description}
		642	\item [bdev\_getdeffs]to know a probable file system present on a logical
		643	device.
		644	\item [bdev\_findname]to know the logical name of a device. The function returns
		645	!= 0 in case of error, and 0 in case of success. In the latter case, the two
		646	buffers passed as parameter are filled with the major and minor names.
		647	\end{description}
		648
		649	These functions simply check the parameter values and then they
		650	calls the device drivers functions (see Section
		651	\ref{sec:dd:interfaccia:det:bdevop}):
		652
		653	\begin{description}
		654	\item [bdev\_read]Reads a block on the device.
		655	\item [bdev\_write]Write a block on the device.
		656	\item [bdev\_seek]Move the head in the position passed as parameter.
		657	\item [bdev\_trylock~e~bdev\_tryunlock]Tries to block/unblock a device.
		658	\end{description}
		659
		660	These functions can block the calling thread for an unspecified
		661	amount of time. All the device drivers were thought to be used in
		662	a multithreaded systems.
		663
		664	If a device driver is not multithread, proper mutual exclusion
		665	must be ensured, because the file system high level functions will
		666	call a low level service before the previous request has been
		667	served. Note that if such a mutual exclusion is used, the access
		668	to the device driver will be forced by the mutex policy, and so
		669	all the disk scheduling algorithms will not work properly.
		670
		671	%----------------------------------------------------------------------------
		672	\section{I/O requests scheduling}
		673	\label{sec:dd:sched}
		674	%----------------------------------------------------------------------------
		675
		676	Massy's thesis here inserts a dissertation on
		677	the various disk scheduling algorithms that has been removed.
		678
		679	%----------------------------------------------------------------------------
		680	\section{Disk scheduling algorithms implemented in S.Ha.R.K.}
		681	\label{sec:dd:schedpres}
		682	%----------------------------------------------------------------------------
		683
		684	The system currently support 5 different scheduling algorithms
		685	that can be changed/increased simply modifying system
		686	configuration.
		687
		688	\begin{table}
		689	\begin{verbatim}
		690	int bqueue_init (bqueue_t *);
		691	int bqueue_numelements (bqueue_t *);
		692	int bqueue_removerequest (bqueue_t *);
		693	int bqueue_insertrequest (bqueue_t , struct request_prologue );
		694	struct request_prologue bqueue_getrequest(bqueue_t );
		695	\end{verbatim}
		696	\caption{Queue handling functions}
		697	\label{tab:bqueuefunc}
		698	\end{table}
		699
		700	Only one algorithm can be active at a time. All the device drivers
		701	should use the queuing interface listed in the bqueue.h file (see
		702	Table \ref{tab:bqueuefunc}). The structure bqueue\_t is specific
		703	for each scheduling algorithm. The semantic of each function in
		704	the interface is the following:
		705
		706	\begin{description}
		707	\item [bqueue\_init]Initialize the pending request queue; it returns 0 in case
		708	of success, != 0 otherwise.
		709	\item [bqueue\_numelements]Returns the number of elements in the queue.
		710	\item [bqueue\_insertrequest]This function inserts a request in the queue,
		711	following the specific algorithm behavior. The function returns 0 in case of
		712	success, != 0 otherwise. Please note that requests are passed through a pointer
		713	to the structrequest\_prologue\_t (see tab:structreqprolog). That is, every
		714	algorithm should use a structure with a struct request\_prologue as first field,
		715	passing a casted pointer to that structure.
		716	\item [bqueue\_getrequest]Returns the first request in queue (without removing
		717	it), and returns a pointer to that request, or NULL if there are not any pending
		718	requests.
		719	\item [bqueue\_removerequest]Removes the first request in queue, that is the
		720	request that would have been returned by bqueue\_getrequest; it returns 0 in
		721	case of success, != 0 in case of error.
		722	\end{description}
		723
		724	\begin{table}
		725	\begin{verbatim}
		726	#define REQ_DUMMY 0
		727	#define REQ_SEEK 1
		728	#define REQ_READ 2
		729	#define REQ_WRITE 3
		730
		731	struct request_prologue {
		732	unsigned sector;
		733	unsigned head;
		734	unsigned cylinder;
		735	unsigned nsectors;
		736	unsigned operation;
		737	struct request_specific x;
		738	};
		739
		740	#define request_prologue_t struct request_prologue
		741	\end{verbatim}
		742	\caption{struct request\_prologue\_t}
		743	\label{tab:structreqprolog}
		744	\end{table}
		745
		746	before calling bdev\_insertrequest, all the fields of the struct
		747	request\_prologue\_t must be filled (see Table
		748	\ref{tab:structreqprolog}):
		749
		750	\begin{description}
		751	\item [sector,~head,~cylinder]This is the starting position in the hard disk.
		752	\item [nsectors]This is the number of sectors to transfer.
		753	\item [operation]The operation that must be performed:
		754
		755	\begin{description}
		756	\item [REQ\_DUMMY]request not specified.
		757	\item [REQ\_SEEK]head seek to the specified position.
		758	\item [REQ\_READ]read request.
		759	\item [REQ\_WRITE]write request.
		760	\end{description}
		761	\end{description}
		762
		763	To add algorithm specific fields, you can use the method used for
		764	the deadlines into Section \ref{sec:dd:sched:edf}.
		765
		766	struct request\_specific (see Table \ref{tab:structreqprolog})
		767	must be defined by each algorithm, and contains algorithm-specific
		768	fields.
		769
		770	struct bqueue\_t contains a scheduling queue. Its first field must
		771	be a pointer to a struct phdiskinfo (see Section
		772	\ref{sec:dd:interfaccia:det}, and Table \ref{tab:structphdsk}).
		773
		774	Here is a description about the implementation of various disk
		775	scheduling algorithms.
		776
		777	%----------------------------------------------------------------------------
		778	\subsection{FCFS}
		779	\label{sec:dd:sched:fcfs}
		780	%----------------------------------------------------------------------------
		781
		782	\begin{table}
		783	\begin{verbatim}
		784	typedef struct TAGfcfs_queue_t {
		785	struct phdskinfo *disk;
		786	__b_fastmutex_t mutex;
		787	struct request_prologue *head;
		788	struct request_prologue *tail;
		789	int counter;
		790	} fcfs_queue_t;
		791
		792	struct request_specific {
		793	void *next;
		794	};
		795	\end{verbatim}
		796	\caption{struct fcfs\_queue\_t}
		797	\label{tab:structfcfs}
		798	\end{table}
		799
		800	The FCFS policy is implemented using the structure in table
		801	\ref{tab:structfcfs}:
		802
		803	\begin{description}
		804	\item [mutex]A mutual exclusion semaphore.
		805	\item [head~e~tail]a pointer to head and tail of a request queue.
		806	\item [counter]number of pending requests.
		807	\end{description}
		808
		809	Then, the struct request\_specific contains a pointer that is used
		810	to link the list.
		811
		812	%----------------------------------------------------------------------------
		813	\subsection{SSTF}
		814	\label{sec:dd:sched:sstf}
		815	%----------------------------------------------------------------------------
		816
		817	\begin{table}
		818	\begin{verbatim}
		819	typedef struct TAGsstf_queue_t {
		820	struct phdskinfo *disk;
		821	__b_fastmutex_t mutex;
		822	struct request_prologue *actual;
		823	struct request_prologue *lqueue;
		824	struct request_prologue *hqueue;
		825	int counter;
		826	} sstf_queue_t;
		827
		828	struct request_specific {
		829	void *next;
		830	};
		831	\end{verbatim}
		832	\caption{struct sstf\_queue\_t}
		833	\label{tab:structsstf}
		834	\end{table}
		835
		836	This scheduling policy is implemented using the data structures in
		837	Figure \ref{tab:structsstf}:
		838
		839	\begin{description}
		840	\item [mutex]for mutual exclusion.
		841	\item [actual]the current request (that is returned by bqueue\_getrequest()).
		842	\item [lqueue~e~hqueue]two lists ordered in decreasing and increasing order:
		843	when a new request arrives, the request is inserted on one of the two queues
		844	depending on the value of the current request cylinder. The current request will
		845	become the nearest to the current cylinder. Note: This algorithm can cause
		846	starvation.
		847	\item [counter]The request counter.
		848	\end{description}
		849
		850	Then, the struct request\_specific contains a pointer that is used
		851	to link the list.
		852
		853	%----------------------------------------------------------------------------
		854	\subsection{LOOK}
		855	\label{sec:dd:sched:look}
		856	%----------------------------------------------------------------------------
		857
		858	\begin{table}
		859	\begin{verbatim}
		860	typedef struct TAGlook_queue_t {
		861	struct phdskinfo *disk;
		862	__b_fastmutex_t mutex;
		863	int dir;
		864	struct request_prologue *queue[2];
		865	int counter;
		866	} look_queue_t;
		867
		868	struct request_specific {
		869	void *next;
		870	};
		871	\end{verbatim}
		872	\caption{struct look\_queue\_t}
		873	\label{tab:structlook}
		874	\end{table}
		875
		876	This scheduling policy is implemented using the data structures in
		877	Figure \ref{tab:structlook}:
		878
		879	\begin{description}
		880	\item [mutex]for mutual exclusion.
		881	\item [dir]Is the looking direction: 0 ascending, 1 descending.
		882	\item [queue]two ordered lists (one ascending, one descending). queue[dir] is
		883	the queue used. When the list becomes empty, dir=!dir. a new request is inserted
		884	in the current queue if its position is after/before the actual position
		885	(otherwise, it is inserted in the other list).
		886	\item [counter]the request counter.
		887	\end{description}
		888
		889	This algorithm can be implemented in two variants:
		890
		891	\begin{enumerate}
		892	\item Requests with position = current position are inserted in the current
		893	queue (starvation can happen).
		894	\item Requests with position = current position are inserted in the queue not
		895	currently used. This is the default choice; to modify it, you have to modify the
		896	look.c file.
		897	\end{enumerate}
		898
		899	%----------------------------------------------------------------------------
		900	\subsection{CLOOK}
		901	\label{sec:dd:sched:clook}
		902	%----------------------------------------------------------------------------
		903
		904	\begin{table}
		905	\begin{verbatim}
		906	typedef struct TAGclook_queue_t {
		907	struct phdskinfo *disk;
		908	__b_fastmutex_t mutex;
		909	int act;
		910	struct request_prologue *queue[2];
		911	int counter;
		912	} clook_queue_t;
		913
		914	struct request_specific {
		915	void *next;
		916	};
		917	\end{verbatim}
		918	\caption{struct clook\_queue\_t}
		919	\label{tab:structclook}
		920	\end{table}
		921
		922	This scheduling policy is implemented using the data structures in
		923	Figure \ref{tab:structclook}:
		924
		925	\begin{description}
		926	\item [mutex]for mutual exclusion.
		927	\item [dir]This is the current queue.
		928	\item [queue]Two ascending lists. queue[dir] is used until it is emptied. Then,
		929	dir is changed. A new request is inserted in the current queue if the current
		930	position is greater than the current one.
		931	\item [counter]the request counter.
		932	\end{description}
		933
		934	This implementation can introduce starvation because insertions at
		935	the current position are done on the same queue.
		936
		937	%----------------------------------------------------------------------------
		938	\subsection{EDF}
		939	\label{sec:dd:sched:edf}
		940	%----------------------------------------------------------------------------
		941
		942	\begin{table}
		943	\begin{verbatim}
		944	typedef struct TAGbd_edf_queue_t {
		945	struct phdskinfo *disk;
		946	__b_fastmutex_t mutex;
		947	struct request_prologue *queue;
		948	int inservice;
		949	int counter;
		950	} bd_edf_queue_t;
		951
		952	struct request_specific {
		953	void *next;
		954	long dl;
		955	};
		956	\end{verbatim}
		957	\caption{struct look\_queue\_t}
		958	\label{tab:structedf}
		959	\end{table}
		960
		961	This is the only real-time disk scheduling algorithm implemented
		962	in S.Ha.R.K. This scheduling policy is implemented using the data
		963	structures in Figure \ref{tab:structedf}:
		964
		965	\begin{description}
		966	\item [mutex]For mutual exclusion.
		967	\item [queue]An ordered list, increasing with the deadline.
		968	\item [inservice]This flag is set when a request is in service.
		969	\item [counter]The request counter.
		970	\end{description}
		971
		972	\begin{table}
		973	\begin{verbatim}
		974	typedef struct {
		975	RES_MODEL r;
		976	TIME dl;
		977	} BDEDF_RES_MODEL;
		978
		979	#define BDEDF_res_default_model(res)
		980	#define BDEDF_res_def_level(res,l)
		981	#define BDEDF_res_def_dl(res,dl)
		982	\end{verbatim}
		983	\caption{Resource Module for deadlines}
		984	\label{tab:edfresmodel}
		985	\end{table}
		986
		987	The request's deadline is got during task creation using the
		988	S.Ha.R.K. resource module shown in Figure \ref{tab:edfresmodel}.
		989
		990	If the resource module is not used, the deadline is set to
		991	infinity. That is, in that case the scheduling algorithm become a
		992	FCFS.
		993
		994	%----------------------------------------------------------------------------
		995	\section{Device driver: the IDE interface}
		996	\label{sec:dd:ide}
		997	%----------------------------------------------------------------------------
		998
		999	%----------------------------------------------------------------------------
		1000	\subsection{Description}
		1001	\label{sec:dd:ide:desc}
		1002	%----------------------------------------------------------------------------
		1003
		1004	A description of the IDE standard can be found in\cite{manual:ideintel} and
		1005	\cite{manual:idevia}. This interface allows the connection of up to 2 IDE
		1006	peripherals using the ATA or ATAPI protocol (as described into
		1007	\cite{manual:ata4}). This driver implements the ATA-4 protocol \footnote{The
		1008	original Massy's thesis have a small description of the standard, that I have
		1009	removed.}.
		1010
		1011	%----------------------------------------------------------------------------
		1012	\subsection{Implementation}
		1013	\label{sec:dd:ide:code}
		1014	%----------------------------------------------------------------------------
		1015
		1016	\begin{figure}
		1017	\begin{center}
		1018	\includegraphics[width=7cm]{images/ide.eps}
		1019	\end{center}
		1020	\caption{The IDE subsystem}
		1021	\label{fig:ide}
		1022	\end{figure}
		1023
		1024
		1025	The architecture of the IDE subsystem is shown in Figure
		1026	\ref{fig:ide}: there is a part of the interface responsible for
		1027	the system initialization (for example, for finding the IDE
		1028	interfaces in the system, for finding the devices and their
		1029	logical structure, for transforming the low level requests in
		1030	ATA-4 commands), for command handling and another part for
		1031	interfacing with the disk scheduling queue handling. Finally,
		1032	there is a small part OS-dependent that should be rewritten to
		1033	port it to other OSes.
		1034
		1035	If possible, the driver uses the DMA; in any case, polling can be
		1036	forced. DMA is only supported in bus-master mode
		1037	\cite{manual:ideintel}.
		1038
		1039	The system actually use only some commands of the classes ``PIO
		1040	data in'', ``PIO data out'', ``Non-data'' e ``DMA''; it uses
		1041	the ``read'', ``write'', ``seek'', ``dma read'', ``dma
		1042	write'', ``identify'', ``pidentify'' and ``set feature''
		1043	commands; packet commands are not currently supported, so ATAPI
		1044	devices (like cd-roms) are currently not supported.
		1045
		1046	%----------------------------------------------------------------------------
		1047	\subsubsection{Initialization}
		1048	%----------------------------------------------------------------------------
		1049
		1050	The system is initialized calling the ide\_init() function into
		1051	bdev\_init(). This is automatically done if the IDE support is
		1052	compiled. The user should only initialize the struct bdev\_parms,
		1053	as specified in Section \ref{sec:dd:interfaccia}.
		1054
		1055	\begin{table}
		1056	\begin{verbatim}
		1057	typedef struct ide_parms {
		1058	void *parm_initserver;
		1059	} IDE_PARMS;
		1060
		1061	struct ide_server_model {
		1062	TIME dl;
		1063	TIME wcet;
		1064	TIME mit;
		1065	};
		1066	\end{verbatim}
		1067	\caption{Data structures used to initialize the IDE driver.}
		1068	\label{tab:ideinit}
		1069	\end{table}
		1070
		1071	The IDE\_PARMS structure has only one parameter, and is included
		1072	into the ideparms field in the global init data structure:
		1073
		1074	\begin{description}
		1075	\item [parm\_initserver]This parameter is OS-specific and is
		1076	related to the glue code.
		1077	\end{description}
		1078
		1079	\begin{table}
		1080	\begin{verbatim}
		1081	void ide_glue_send_request (int ideif);
		1082	int ide_glue_activate_interface (int ideif);
		1083	void ide_glue_unactivate_interface (int ideif);
		1084	\end{verbatim}
		1085	\caption{The glue code for the IDE driver.}
		1086	\label{tab:ideglue}
		1087	\end{table}
		1088
		1089	The system dependent part of the IDE driver is contained into the
		1090	file ideglue.c (see Figure \ref{tab:ideglue}):
		1091
		1092	\begin{description}
		1093	\item [ide\_glue\_activate\_interface]It enables the interface passed as
		1094	parameter: It creates an aperiodic task, and assigns to it the interface
		1095	interrupt. The task is activated when the interrupt arrives.
		1096	\item [ide\_glue\_activate\_interface]It disables the interface. The aperiodic
		1097	task is killed.
		1098	\item [ide\_glue\_send\_request]It executes the first available request: it
		1099	explicitly activates the task interface; when the task is activated because of
		1100	an interrupt, the pending request is terminated, and another request (if
		1101	present) is activated.
		1102	\end{description}
		1103
		1104	the parm\_initserver field contains a pointer to the struct
		1105	ide\_server\_model, used to create the aperiodic tasks (see Figure
		1106	\ref{tab:ideinit}):
		1107
		1108	\begin{description}
		1109	\item [dl]task deadline in $\mu$sec.
		1110	\item [wcet]task worst case execution time in $\mu$sec.
		1111	\item [mit]the mean inter arrival time.
		1112	\end{description}
		1113
		1114	If these parameters are not specified, standard parameters are
		1115	used.
		1116
		1117	These defines are used to conditionally compile the driver (see
		1118	the source code):
		1119
		1120	\begin{description}
		1121	\item [FORCE\_SCANNING]this define force the search of the IDE peripherals also
		1122	if the bus reset fails. This is useful if some peripherals does not follow the
		1123	standard timings.
		1124	\item [IDE\_OLDATAPIDELAY]this define forces a little delay that can be useful
		1125	when using old ATAPI interfaces.
		1126	\item [IDE\_FORCERESET]if defined a soft reset operation cannot be aborted
		1127	because of a timeout. It must be specified if the peripheral does not follows
		1128	the soft reset timings or if it cannot go in stand-by mode.
		1129	\item [IDE\_SKIPSTATUSTEST]with this symbol we do not check if the command is
		1130	being executed correctly (that is done reading on a status register); all the
		1131	TX-based motherboard we found needs this symbol.
		1132	\item [IDE\_SKIPALLTEST]when set, all the command are written into the registers
		1133	without looking on the status register. This symbol implies the previous. This
		1134	mode was needed on a TX motherboard where the polled mode for commands was
		1135	unreliable.
		1136	\end{description}
		1137
		1138	%----------------------------------------------------------------------------
		1139	\subsubsection{Policy queues Interface}
		1140	%----------------------------------------------------------------------------
		1141
		1142	\begin{table}
		1143	\begin{verbatim}
		1144	typedef struct TAGidereq_t {
		1145	request_prologue_t info;
		1146	int next;
		1147	__uint8_t resetonerror;
		1148	__uint8_t cmd;
		1149	__b_sem_t wait;
		1150	int result;
		1151	__uint8_t features;
		1152	__uint8_t cyllow;
		1153	__uint8_t cylhig;
		1154	__uint8_t seccou;
		1155	__uint8_t secnum;
		1156	__uint8_t devhead;
		1157	__uint8_t *buffer;
		1158	} idereq_t;
		1159	\end{verbatim}
		1160	\caption{struct idereq\_t}
		1161	\label{tab:idereq}
		1162	\end{table}
		1163
		1164	Table \ref{tab:idereq} shows struct idereq\_t that is used by the
		1165	driver to store the pending requests. It can be noted that the
		1166	info field is of type request\_prologue\_t, as required by the
		1167	modules that handle the scheduling policies (see
		1168	\ref{sec:dd:schedpres}).
		1169
		1170	Note that (since the IDE interface allows only a command at a
		1171	time) there are two queues for each IDE interface (one for the
		1172	master device, one for the slave). These request queues are
		1173	prioritized. That is, the master queue always have priority on the
		1174	slave queue.
		1175
		1176	Here a small description of the fields showed in Table
		1177	\ref{tab:idereq}:
		1178
		1179	\begin{description}
		1180	\item [info]To handle the requests using a module for disk scheduling.
		1181	\item [next]used internally to handle a request pool.
		1182	\item [resetonerror]If set, an error during a command implies a soft reset.
		1183	\item [cmd]The command for the peripheral (see \cite{manual:ata4}).
		1184	\item [wait]Synchronization semaphore: when a task sends a command, it waits the
		1185	end of the execution of the command.
		1186	\item [result]This is the result: 0 in case of success, != 0 in case of error
		1187	(see idelow.c for the error codes).
		1188	\item [features,~cyllow,~cylhigh,~seccou,~secnum~e~devhead]Values to be inserted
		1189	into the I/O registers of the IDE interface. The behavior is different depending
		1190	on the peripheral mode (LBA or C/H/S) (see Section \ref{sec:dd:ide:desc}).
		1191	\item [buffer]read/write buffer (if needed).
		1192	\end{description}
		1193
		1194	%----------------------------------------------------------------------------
		1195	\subsubsection{Minor numbers}
		1196	%----------------------------------------------------------------------------
		1197
		1198	\begin{figure}
		1199	\begin{center}
		1200	\includegraphics[width=10cm]{images/ideminor.eps}
		1201	\end{center}
		1202	\caption{IDE minor numbers}
		1203	\label{fig:ideminor}
		1204	\end{figure}
		1205
		1206	This section describes how minor numbers are handled.
		1207
		1208	As described in Section \ref{sec:dd:intercaccia:ext}, the generic
		1209	low level part uses the major number to pass a request to a
		1210	device. The specific device is then selected using the minor
		1211	number.
		1212
		1213	Figure \ref{fig:ideminor} shows the minor number structure of this
		1214	device:
		1215
		1216	\begin{itemize}
		1217	\item The first 3 bits identifies the interface (8 interfaces maximum).
		1218	\item A bit for the peripheral (master/slave). This is used to enqueue request
		1219	in the right queue.
		1220	\item The last 4 bits says which partition should be used. This info is used to
		1221	convert relative sector numbers into absolute sector numbers.
		1222	\end{itemize}
		1223
		1224	Minor numbers are accessed only via macros. To modify the minor
		1225	number mapping you should only modify these macros.
		1226
		1227	%----------------------------------------------------------------------------
		1228	\chapter{File system}
		1229	\label{sec:fs}
		1230	%----------------------------------------------------------------------------
		1231
		1232	With the word ``file system'' we mean the internal organization
		1233	of a block device; the purpose of such an organization is to store
		1234	a set of files, usually divided in directories.
		1235
		1236	A block device is seen by the file system as an ordered sequence
		1237	of fixed sized blocks. The basic operations of a file system are
		1238	read/write operations on the n-th block.
		1239
		1240	%----------------------------------------------------------------------------
		1241	\section{Interface}
		1242	\label{sec:fs:int}
		1243	%----------------------------------------------------------------------------
		1244
		1245	This section describes the interface between the system and the
		1246	user tasks. It also describes the interface between a specific
		1247	instance of file system and the system (just to make easy the
		1248	expansion of the system with new real-time file systems).
		1249
		1250	%----------------------------------------------------------------------------
		1251	\subsection{Initialization}
		1252	\label{sec:fs:int:init}
		1253	%----------------------------------------------------------------------------
		1254
		1255	To initialize the higher level of the system we use an interface
		1256	similar to the device drivers interface (see Section
		1257	\ref{sec:dd:interfaccia:init}.
		1258
		1259	\begin{table}
		1260	\begin{verbatim}
		1261	typedef struct filesystem_parms {
		1262	__dev_t device;
		1263	__uint8_t fs_ind;
		1264	__uint32_t fs_showinfo:1;
		1265	void *fs_mutexattr;
		1266	struct mount_opts *fs_opts;
		1267	} FILESYSTEM_PARMS;
		1268
		1269	#define filesystem_def_rootdevice(par,rootdev)
		1270	#define filesystem_def_fs(par,fs)
		1271	#define filesystem_def_showinfo(par,show)
		1272	#define filesystem_def_options(par,opts)
		1273
		1274	int filesystem_init(FILESYSTEM_PARMS *);
		1275	\end{verbatim}
		1276	\caption{struct filesystem\_parms}
		1277	\label{tab:fsparms}
		1278	\end{table}
		1279
		1280	The function filesystem\_init() is called to initialize the higher
		1281	layer passing the parameters showed in Figure \ref{tab:fsparms}.
		1282	These parameters must be initialized using the following macros:
		1283
		1284	\begin{description}
		1285	\item [filesystem\_def\_rootdevice]Set the partition that must be used as
		1286	\emph{root device}, that is, the directory associated to ``/''. This field is
		1287	mandatory, and that implies that the higher layer must be initialized
		1288	\emph{after} the device drivers.
		1289	\item [filesystem\_def\_fs]Selects the filesystem that must be used with the
		1290	device. It must be one of the symbols defined into include/fs/fsind.h; if the
		1291	special symbol FS\_DEFAULT is used, the system tries to autodetect the
		1292	filesystem type.
		1293	\item [filesystem\_def\_showinfo]If this flag is set, then initialization
		1294	messages are printed on the screen.
		1295	\item [filesystem\_def\_options]There are the mount options.
		1296	\end{description}
		1297
		1298	The field fs\_mutexattr has the same meaning of the correspondent
		1299	field into bdev\_parms (see Section
		1300	\ref{sec:dd:interfaccia:init}).
		1301
		1302	You can use the macro BASE\_FILESYSTEM to initialize a struct
		1303	filesystem\_parms with some default values. NULL is not allowed as
		1304	a parameter of filesystem\_init.
		1305
		1306	\begin{table}
		1307	\begin{verbatim}
		1308	struct mount_opts {
		1309	__uint32_t flags;
		1310	union {
		1311	struct msdosfs_parms msdos;
		1312	} x;
		1313	};
		1314
		1315	/* Flags */
		1316	#define MOUNT_FLAG_RW 0x01
		1317	\end{verbatim}
		1318	\caption{struct mount\_opts}
		1319	\label{tab:mountopt}
		1320	\end{table}
		1321
		1322	\label{sec:mountopt}The mount options for every device must be
		1323	inserted into a struct mount\_opts showed in Table
		1324	\ref{tab:mountopt}. The structure is composed by two parts: a
		1325	general part and a filesystem-specific part. The general part
		1326	actually contains the following fields:
		1327
		1328	\begin{description}
		1329	\item [flags]This field contains the mount fields. The only flag
		1330	currently defined is MOUNT\_FLAG\_RW, that allows write operations
		1331	on the file system (default is read-only).
		1332	\end{description}
		1333
		1334	\begin{figure}
		1335	\begin{center}
		1336	\includegraphics[width=7cm]{images/fsinit.eps}
		1337	\end{center}
		1338	\caption{Filesystem init}
		1339	\label{fig:fsinit}
		1340	\end{figure}
		1341
		1342	Filesystem initialization proceeds as shown in Figure \ref{fig:fsinit}:
		1343
		1344	\begin{enumerate}
		1345	\item The user calls filesystem\_init();
		1346	\item filesystem\_init() initializes itself and its modules, then all the file
		1347	systems are initialized. filesystem\_init() have to be changed if a new
		1348	filesystem is introduced.
		1349	\item The file systems initialization functions have to register themselves
		1350	calling the function filesystem\_register() with a struct file\_system\_type.
		1351	\end{enumerate}
		1352
		1353	%----------------------------------------------------------------------------
		1354	\subsubsection{Struct file\_system\_type}
		1355	\label{sec:filesystemtype}
		1356	%----------------------------------------------------------------------------
		1357
		1358	\begin{table}
		1359	\begin{verbatim}
		1360	struct file_system_type {
		1361	const char *name;
		1362	int fs_flags;
		1363	__uint8_t fs_ind;
		1364	struct super_block (read_super) (
		1365	__dev_t device,
		1366	__uint8_t fs_ind,
		1367	struct mount_opts *options
		1368	);
		1369	};
		1370	\end{verbatim}
		1371	\caption{struct file\_system\_type}
		1372	\label{src:structfs}
		1373	\end{table}
		1374
		1375	This structure (see Table \ref{src:structfs})stores the
		1376	informations needed for the filesystem registration. Here a short
		1377	description of its fields:
		1378
		1379	\begin{description}
		1380	\item [name]Pointer to the filesystem name (e.g., ``FAT16'').
		1381	\item [fs\_flags]Filesystem flags (currently not used).
		1382	\item [fs\_ind]Filesystem ID (every filesystem has a different ID).
		1383	\item [read\_super]This the filesystem initialization function. It is called
		1384	when a partition is mounted to initialize the correspondent struct super\_block.
		1385	\end{description}
		1386
		1387	%----------------------------------------------------------------------------
		1388	\subsection{Internal interface}
		1389	%----------------------------------------------------------------------------
		1390
		1391	This section describes the interface between the high layer (that
		1392	handles the file systems), and the modules that implements a
		1393	specific filesystem.
		1394
		1395	When a user mount a device into a directory using a filesystem,
		1396	the system uses the file system's specific struct
		1397	file\_system\_type, calling the function read\_super() to read the
		1398	superblock of the specified device.
		1399
		1400	The struct super\_block contains a pointer to a structure that
		1401	contains a set of ``virtual operations'' (see Section
		1402	\ref{sec:fs:intstruct}), that implements the behavior of a
		1403	specific filesystem.
		1404
		1405	%----------------------------------------------------------------------------
		1406	\subsubsection{struct super\_operations}
		1407	\label{sec:superop}
		1408	%----------------------------------------------------------------------------
		1409
		1410	\begin{table}
		1411	\begin{verbatim}
		1412	struct super_operations {
		1413	int (init_inode) (struct inode );
		1414	int (read_inode) (struct inode );
		1415	int (write_inode) (struct inode );
		1416	int (put_super) (struct super_block );
		1417	int (delete_inode) (struct inode );
		1418	};
		1419	\end{verbatim}
		1420	\caption{struct super\_operations}
		1421	\label{src:structsuperop}
		1422	\end{table}
		1423
		1424	The structure in Table \ref{src:structsuperop} is used for super
		1425	block manipulation. All the functions should return 0 in case of
		1426	success, or a value != 0 otherwise.
		1427
		1428	\begin{description}
		1429	\item [init\_inode]This function initializes a struct inode passed as parameter.
		1430	That structure must have the fields i\_sp and i\_st.st\_ino already initialized
		1431	(see Section \ref{sec:inode}).
		1432	\item [read\_inode]This function reads the inode passed as parameter from the
		1433	filesystem. The inode structure must have the fields i\_sp and i\_st.st\_ino
		1434	already initialized.
		1435	\item [write\_inode]This function writes a (valid, full initialized) inode into
		1436	the filesystem.
		1437	\item [delete\_inode]This function removes the inode from the filesystem (the
		1438	inode must already exist on the filesystem).
		1439	\item [put\_super]This function writes the super block passed as parameter into
		1440	the filesystem. This function is called when a filesystem is unmounted. A dual
		1441	function (that is, a function that reads a super\_block from a filesystem)
		1442	(which is only used when mounting the filesystem) can be found into the struct
		1443	file\_system\_type (see Section
		1444	\ref{sec:filesystemtype}).
		1445	\end{description}
		1446
		1447	%----------------------------------------------------------------------------
		1448	\subsubsection{Struct dentry\_operations}
		1449	%----------------------------------------------------------------------------
		1450
		1451	\begin{table}
		1452	\begin{verbatim}
		1453	struct dentry_operations {
		1454	int (d_compare) (struct dentry , struct qstr , struct qstr );
		1455	};
		1456	\end{verbatim}
		1457	\caption{struct dentry\_operations}
		1458	\label{src:structdentryop}
		1459	\end{table}
		1460
		1461	\label{sec:dentryop} This structure contains all the functions
		1462	used to manipulate a struct dentry.
		1463
		1464	\begin{description}
		1465	\item [d\_compare]This function compares the two strings passed as
		1466	parameter. it returns 0 if the two strings are equal, or != 0
		1467	otherwise. The filesystem do not use directly the strcmp()
		1468	function, because filename comparison depends on the filesystem
		1469	(e.g., case sensitive, case insensitive).
		1470	\end{description}
		1471
		1472	%----------------------------------------------------------------------------
		1473	\subsubsection{struct inode\_operations}
		1474	\label{sec:inodeop}
		1475	%----------------------------------------------------------------------------
		1476
		1477	\begin{table}
		1478	\begin{verbatim}
		1479	struct inode_operations {
		1480	struct file_operations *default_file_ops;
		1481	struct inode* (create) (struct inode , struct dentry *);
		1482	struct inode* (lookup) (struct inode , struct dentry *);
		1483	int (unlink) (struct dentry );
		1484	void (destroy) (struct inode );
		1485	int (truncate) (struct inode , __off_t len);
		1486	};
		1487	\end{verbatim}
		1488	\caption{struct inode\_operations}
		1489	\label{src:structinodeop}
		1490	\end{table}
		1491
		1492	This struct contains the virtual operations on the struct inode.
		1493	These functions returns 0 in case of success, !=0 in case of
		1494	error, that will be interpreted as a negate errno() value (e.g.,
		1495	-EPERM instead of EPERM).
		1496
		1497	\begin{description}
		1498	\item [default\_file\_ops]these are the file\_operations described into the
		1499	paragraph \ref{sec:fileop}, that must be used with the inode. By default, the
		1500	field is initialized with the file\_operation of its filesystem. These pointers
		1501	can be modified to implement a kind of virtual filesystem that works on a
		1502	device. For example, we can define an inode ``/dev/ide/hda1'' in a way that all
		1503	read/write operations on it are not mapped in the operations of the filesystem
		1504	where it is stored.
		1505	\item [create]This function creates a new inode (a new file) with the name
		1506	passed as parameter into the struct inode passed as parameter (that must be an
		1507	inode of type ``directory''). It returns the new inode or NULL in case of
		1508	error.
		1509	\item [lookup]This function searches an inode passed as parameter into another
		1510	inode of type directory passed as parameter. It returns the inode just found or
		1511	NULL in case of error.
		1512	\item [unlink]This function unlinks the filename passed as parameter from the
		1513	parameter of the corresponding inode. The inode will be removed (by generic
		1514	filesystem routines) if it remains without links associated to it.
		1515	\item [destroy]This function destroys the file linked to the inode. It frees the
		1516	space occupied by the file. Often, it is sufficient to call the truncate
		1517	function with length 0.
		1518	\item [truncate]This function truncates the file length of the inode passed as
		1519	parameter at the length specified in the second parameter. If the given length
		1520	is greater than the actual file length, the function does nothing.
		1521	\end{description}
		1522
		1523	%----------------------------------------------------------------------------
		1524	\subsubsection{struct file\_operations}
		1525	\label{sec:fileop}
		1526	%----------------------------------------------------------------------------
		1527
		1528	\begin{table}
		1529	\begin{verbatim}
		1530	struct file_operations {
		1531	__off_t (lseek) (struct file , __off_t, int);
		1532	__ssize_t (read) (struct file , char *, __ssize_t);
		1533	__ssize_t (write) (struct file , char *, __ssize_t);
		1534	int (readdir) (struct file , void *);
		1535	int (open) (struct inode , struct file *);
		1536	int (close) (struct inode , struct file *);
		1537	};
		1538	\end{verbatim}
		1539	\caption{struct file\_operations}
		1540	\label{src:structfileop}
		1541	\end{table}
		1542
		1543	This structure contains all the function that must work with
		1544	files. If not otherwise stated, the functions can operate both on
		1545	files and on directory.
		1546
		1547	\begin{description}
		1548	\item [lseek]Moves the input/output position of the file specified as parameter.
		1549	The new position is specified as in the lseek primitive (described in
		1550	\cite{book:posix1003.1} or in all the Unix programming books). That is, the
		1551	position is specified with a delta with respect to a given position (start of
		1552	file, end of file, current position); As described in section \ref{sec:file}
		1553	during the description of the field f\_pos, the new position can be greater than
		1554	the end-of-file. This function must return a position inside the file or an
		1555	error code.
		1556	\item [read]Reads some data and puts it into the buffer. The function is called
		1557	only for regular files. It returns the number of bytes read, that can be less
		1558	than requested, or an error code.
		1559	\item [write]Writes some bytes from the specified buffer. It is called only for
		1560	regular files. It returns the number of bytes written, that can be less than
		1561	requested, or an error code.
		1562	\item [readdir]Reads the next filename of the directory parameter passed through
		1563	a struct file. It inserts it into the struct dirent passed as parameter. Returns
		1564
		1565	\item [open]Opens the file linked to the inode. Fills some internal fields in
		1566	the filesystem-specific struct file. Returns 0 in case of success, an error
		1567	otherwise.
		1568	\item [close]Closes the file passed as parameter. returns 0 in case of success,
		1569	an error in case of failure.
		1570	\end{description}
		1571
		1572	Most of the function described in this section works on buffers
		1573	passed by the user. the general routines checks that these buffer
		1574	are corrects. In case the operating system implement some memory
		1575	protection mechanisms, the functions copyfromuser() and
		1576	copytouser() should be used (their syntax is similar to memcpy()).
		1577	Note that Shark currently do not provide memory protection, so
		1578	memcpy() can be used.
		1579
		1580	%----------------------------------------------------------------------------
		1581	\subsection{External interface}
		1582	%----------------------------------------------------------------------------
		1583
		1584	%----------------------------------------------------------------------------
		1585	\subsubsection{Non standard functions}
		1586	\label{sec:fs:mount}
		1587	%----------------------------------------------------------------------------
		1588
		1589	\begin{table}
		1590	\begin{verbatim}
		1591	int mount (dev_t device, u_int8_t fsind, char *where, struct mount_opts
		1592	*options);
		1593	int umount (dev_t device);
		1594	dev_t fdevice(char *device_name);
		1595	\end{verbatim}
		1596	\caption{mount/umount functions}
		1597	\label{tab:mount}
		1598	\end{table}
		1599
		1600	The mount/umount functions are the non-standard functions provided
		1601	by the filesystem (their prototype can be found in
		1602	include/sys/mount.h). Here is a small description of these
		1603	functions (see Table \ref{tab:mount}):
		1604
		1605	\begin{description}
		1606	\item [fdevice]This function returns the device serial number of a
		1607	given filename. See Section \ref{sec:dd:interfaccia:nomi}.
		1608	\item [mount]Is used to mount a partition. These are the informations that must
		1609	be passed to it:
		1610
		1611	\begin{itemize}
		1612	\item The device serial number that must be mounted
		1613	\item The filesystem identifier (fsind) that should be used (see
		1614	include/fs/fsind.h).
		1615	\item The directory where the device will be mounted.
		1616	\item Other parameters (see Section \ref{sec:mountopt}).
		1617	\end{itemize}
		1618
		1619	The function returns 0 in case of success, != 0 in case of failure
		1620	(errno is modified).
		1621
		1622	\item [umount]This function unmount a device that was previously
		1623	mounted. This function cannot unmount the ``root'' device.
		1624	During shutdown, all the filesystem will be unmounted in the
		1625	proper order.
		1626	\end{description}
		1627
		1628	%----------------------------------------------------------------------------
		1629	\subsection{Initialization: an example}
		1630	%----------------------------------------------------------------------------
		1631
		1632	This is an example of the filesystem initialization in S.Ha.R.K.
		1633	To initialize the system, we must register the kernel modules
		1634	first:
		1635
		1636	\begin{verbatim}
		1637	TIME __kernel_register_levels__(void *arg)} {
		1638	struct multiboot_info mb=(struct multiboot_info)arg;
		1639
		1640	EDF_register_level(EDF_ENABLE_ALL);
		1641	RR_register_level(RRTICK, RR_MAIN_YES, mb);
		1642	CBS_register_level(CBS_ENABLE_ALL, 0);
		1643	dummy_register_level();
		1644
		1645	SEM_register_module();
		1646	CABS_register_module();
		1647	NOPM_register_module();
		1648	return TICK;
		1649	}
		1650	\end{verbatim}
		1651
		1652	The filesystem always require the registration of a module that
		1653	can accept soft aperiodic tasks (such as the CBS, for example).
		1654	Moreover, a mutex protocol must be present (NOPM, in this case).
		1655
		1656	\begin{verbatim}
		1657	TASK __init__(void *arg) {
		1658	extern int __bdev_sub_init(void);
		1659	extern int __fs_sub_init(void);
		1660	struct multiboot_info mb = (struct multiboot_info)arg;
		1661
		1662	/* block devices initialization */
		1663	__bdev_sub_init();
		1664
		1665	/* filesystem initialization */
		1666	__fs_sub_init();
		1667	__call_main__(mb);
		1668	return (void *)0;
		1669	}
		1670	\end{verbatim}
		1671
		1672	This is a typical S.Ha.R.K. \_\_init\_\_ function. In this
		1673	function, block devices and file systems should be initialized.
		1674
		1675	\begin{verbatim}
		1676	int __bdev_subinit(void) {
		1677	extern __dev_t root_device;
		1678	BDEV_PARMS bdev=BASE_BDEV;
		1679
		1680	/* low level initialization */
		1681	bdev_def_showinfo(bdev,TRUE);
		1682	bdev_init(&bdev);
		1683
		1684	/* root device specification */
		1685	root_device = bdev_scan_devices(choose_root_callback);
		1686	if (root_device < 0) {
		1687
		1688	/* in case of error... */
		1689	exit(1);
		1690	}
		1691	return 0;
		1692	}
		1693	\end{verbatim}
		1694
		1695
		1696	This function initializes the block device layer. This function
		1697	sets the parameters of the bdev variable, calls the low level
		1698	initialization functions and searches for the \emph{root device};
		1699	basically, the function must specify in some way which is the root
		1700	device. In the function, the low level layer lists all the
		1701	available devices, calling the function choose\_root\_callback()
		1702	for each of them. That function will choose the root device.
		1703
		1704	\begin{verbatim}
		1705	int choose_root_callback(dev_t dev,u_int8_t fs) {
		1706	if (fs==FS_MSDOS)
		1707	return dev;
		1708	return -1;
		1709	}
		1710	\end{verbatim}
		1711
		1712	In this case, the choice of the root device is quite simple: the
		1713	first FAT16 DOS partition becomes the root device. Usually,
		1714	DOS calls that partition ``C:''.
		1715
		1716	\begin{verbatim}
		1717	int __fs_sub_init(void) {
		1718	extern __dev_t root_device;
		1719	FILESYSTEM_PARMS fs = BASE_FILESYSTEM;
		1720	struct mount_opts opts;
		1721
		1722	/* mount parameters */
		1723	memset(&opts, 0, sizeof(struct mount_opts));
		1724	opts.flags = MOUNT_FLAG_RW;
		1725
		1726	/* filesystem initialization */
		1727	filesystem_def_rootdevice(fs,root_device);
		1728	filesystem_def_fs(fs,FS_MSDOS);
		1729	filesystem_def_showinfo(fs,TRUE);
		1730	filesystem_init(&fs);
		1731
		1732	/* C library initialization */
		1733	libc_initialize();
		1734
		1735	return 0;
		1736	}
		1737	\end{verbatim}
		1738
		1739	This functions initializes the filesystem giving R/W permission on
		1740	the root partition previously found. Then, the C library is
		1741	initialized.
		1742
		1743	%----------------------------------------------------------------------------
		1744	\section{Internal structure}
		1745	\label{sec:fs:intstruct}
		1746	%----------------------------------------------------------------------------
		1747
		1748	Many data structures are composed by two parts: a generic part,
		1749	that is independent from the particular filesystem, and a
		1750	filesystem specific part, that are usually stored into an union.
		1751
		1752	Almost all the data structures have a set of functions that
		1753	modifies them. Pointers to these functions are included in a
		1754	structure that has the name of the structure that is modified by
		1755	the functions with a suffix ``operations''. For example, the
		1756	structure super\_block (see Section \ref{sec:superblock}) has a
		1757	structure super\_operations (see Section \ref{sec:superop}) that
		1758	contains a function create\_inode.
		1759
		1760	Since the high layer functions influences the global system
		1761	throughput, some informations are duplicated in the data
		1762	structures to speed up the execution.
		1763
		1764	The system is basically structured with a Unix-like approach, with
		1765	4 fundamental structures:
		1766
		1767	\begin{description}
		1768	\item [super]contains informations on each device that is mounted (a table is
		1769	used);
		1770	\item [inode]contains informations on the files in a filesystem (double lists
		1771	with hash keys are used);
		1772	\item [dentry]contains the file names (a tree is used);
		1773	\item [file]contains informations on each open file (a descriptor table is
		1774	used).
		1775	\end{description}
		1776
		1777	%----------------------------------------------------------------------------
		1778	\subsection{The data structures}
		1779	%----------------------------------------------------------------------------
		1780
		1781	%----------------------------------------------------------------------------
		1782	\subsubsection{struct super\_block}
		1783	\label{sec:superblock}
		1784	%----------------------------------------------------------------------------
		1785
		1786	From a user point of view, the filesystem exports a unix-like view
		1787	of the system. That is, partitions can be mounted into the
		1788	directory tree. When the system starts, the root directory is
		1789	mounted as ``/''.
		1790
		1791	\begin{table}
		1792	\begin{verbatim}
		1793	struct super_block {
		1794	struct super_block *sb_parent;
		1795	struct super_block *sb_childs;
		1796	struct super_block *sb_next;
		1797	__dev_t sb_dev;
		1798	struct inode *sb_root;
		1799	struct dentry *sb_droot;
		1800	struct file_system_type *sb_fs;
		1801	struct super_operations *sb_op;
		1802	struct dentry_operations *sb_dop;
		1803	struct mount_opts sb_mopts;
		1804	int sb_busycount;
		1805	__uint32_t sb_used:1;
		1806	__uint32_t sb_blocked:1;
		1807	union {
		1808	struct msdos_super_block msdos_sb;
		1809	} u;
		1810	};
		1811	\end{verbatim}
		1812	\caption{struct super\_block}
		1813	\label{src:structsuper}
		1814	\end{table}
		1815
		1816	The struct super\_block contains the fundamental information of
		1817	the filesystem (a unix-like filesystem usually duplicates this
		1818	informations a few times into the Hard disk). Every time a
		1819	partition is mounted a new structure is allocated and initialized.
		1820	These structures are maintained in a tree, that is, if a partition
		1821	is mounted into a directory the new structure becomes a leaf of
		1822	the structure that contains the mount directory.
		1823
		1824	\begin{figure}
		1825	\begin{center}
		1826	\includegraphics[width=7cm]{images/supertree.eps}
		1827	\end{center}
		1828	\caption{Super block tree}
		1829	\label{fig:supertree}
		1830	\end{figure}
		1831
		1832	The root super block is stored into the global pointer
		1833	root\_superblock; Figure \ref{fig:supertree} shows a possible
		1834	super block tree.
		1835
		1836	Here a short description of the struct super\_block fields (see
		1837	Table \ref{src:structsuper}):
		1838
		1839	\begin{description}
		1840	\item [sb\_parent,~sb\_childs,~sb\_next]are used to store the tree structure:
		1841	sb\_parent points to the father of the structure, whereas sb\_childs points to
		1842	the first structure of the childs; sb\_next is used to create the brothers list
		1843	(that is the children list of the father).
		1844	\item [sb\_dev]Contains the device, that is the partition that contains the
		1845	filesystem (the low level part is responsible of the devices, that are typically
		1846	associated to an hard disk partition. It is possible for example to extend the
		1847	low level to associate to a device other entities, such a file on another
		1848	filesystem (this is usually possible on Unix system through loopback devices).
		1849	\item [sb\_root]This is the root directory inode of the filesystem.
		1850	\item [sb\_droot]This is the directory entry (the filename) of the root
		1851	directory.
		1852	\item [sb\_fs]This is the pointer to the filesystem present in the filesystem.
		1853	Every supported file system is registered at init time (see Section
		1854	\ref{sec:fs:int:init}).
		1855	\item [sb\_op]Pointer to a structure that contains the specific filesystem
		1856	functions used to use the struct super\_block.
		1857	\item [sb\_dop]These are the specific function used for the directory entries.
		1858	\item [sb\_mopts]Mount options used during initialization.
		1859	\item [sb\_busycount]This is a counter that counts the number of entities that
		1860	are using the filesystem (e.g., a partition can be unmounted if someone is
		1861	reading it).
		1862	\item [sb\_used]A flag that signals if the structure is used.
		1863	\item [sb\_blocked]Some operations on this structure must be executed in mutual
		1864	exclusion; if this flag is set there is some operation on the structure. This
		1865	flag can not be used/tested/modified directly by the user functions.
		1866	\end{description}
		1867
		1868	%----------------------------------------------------------------------------
		1869	\subsubsection{struct dentry}
		1870	%----------------------------------------------------------------------------
		1871
		1872	\begin{table}
		1873	\begin{verbatim}
		1874	struct dentry {
		1875	struct dentry *d_next;
		1876	struct dentry *d_prev;
		1877	struct dentry *d_parent;
		1878	struct dentry *d_child;
		1879	__time_t d_acc;
		1880	struct qstr d_name;
		1881	short d_lock;
		1882	struct dentry_operations *d_op;
		1883	struct inode *d_inode;
		1884	struct super_block *d_sb;
		1885	};
		1886	\end{verbatim}
		1887	\caption{struct dentry}
		1888	\label{src:structdentry}
		1889	\end{table}
		1890
		1891	\label{sec:dentry} The struct dentry contains the file names independently from
		1892	the file type (regular file, directory, ...) \footnote{``dentry'' means
		1893	\emph{D}irectory \emph{ENTRY}.}. Each file in the filesystem is always stored in
		1894	two structures:
		1895
		1896	\begin{itemize}
		1897	\item struct dentry contains the filename;
		1898	\item struct inode contains the other informations on the file.
		1899	\end{itemize}
		1900
		1901	This separation is made because in most file systems (not in the DOS FAT16)
		1902	more than one name can be linked to the same file (inode).
		1903
		1904	\begin{figure}
		1905	\begin{center}
		1906	\includegraphics[width=10cm]{images/dentrytree.eps}
		1907	\end{center}
		1908	\caption{Directory entry tree.}
		1909	\label{fig:dentrytree}
		1910	\end{figure}
		1911
		1912
		1913	These data structures are stored internally in a tree structure. Please note
		1914	that the inodes are not structured as a tree; every dentry has a pointer to the
		1915	correspondent inode.
		1916
		1917	Figure \ref{fig:dentrytree} shows a directory entry tree.
		1918	Directories are showed with a continuous line, whereas files are
		1919	showed with a dashed line (note that the internal structures for
		1920	files and directories are the same).
		1921
		1922	The system do not load all the file names in the filesystem. Only
		1923	the recent filenames are stored in memory in a partial tree. These
		1924	informations are continuously updated removing and adding dentry
		1925	struct.
		1926
		1927	\begin{table}
		1928	\begin{verbatim}
		1929	struct qstr {
		1930	char name[MAXFILENAMELEN+1];
		1931	char *nameptr;
		1932	};
		1933	\end{verbatim}
		1934	\caption{struct qstr}
		1935	\label{src:structqstr}
		1936	\end{table}
		1937
		1938	Directory names are stored into a struct qstr (see Table
		1939	\ref{src:structqstr}). A string is not used to avoid too much
		1940	sting copies. The structure is composed by two fields:
		1941
		1942	\begin{description}
		1943	\item [nameptr]If != NULL, it is the name to use.
		1944	\item [name]If nameptr==NULL, it is the name to use.
		1945	\end{description}
		1946
		1947	The QSTRNAME can be used on a structure QSTR to get a (char *).
		1948	When filling the structure, one of the two fields can be used. If
		1949	nameptr is used, the pointed string must be statically allocated.
		1950	If name is used, nameptr must be set to NULL.
		1951
		1952	The fields of the struct dentry contains the following
		1953	informations:
		1954
		1955	\begin{description}
		1956	\item [d\_next,~d\_prev,~d\_parent~e~d\_child]can be used to store a dentry tree
		1957	structure: d\_parent is a pointer to the father's dentry, d\_child is a pointer
		1958	to a list of children dentry, d\_next and d\_prev are used to navigate into the
		1959	children dentry list.
		1960	\item [d\_acc]It is the time of the last access to the dentry structure.
		1961	\item [d\_name]Is the name associated to the dentry.
		1962	\item [d\_lock]is a blocking counter: it is incremented every time a routine
		1963	needs to use the dentry, and it is decremented when it is no more needed. In
		1964	that way, it is possible to know which are the structures currently in use. The
		1965	filesystem routines should not directly modify this field.
		1966	\item [d\_op]Pointer to the virtual operations used to handle this structure.
		1967	\item [d\_inode]is the inode associated to the structure.
		1968	\item [d\_sb]It is the super\_block that contains the directory. This a
		1969	redundant information used to speed-up the code.
		1970	\end{description}
		1971
		1972	%----------------------------------------------------------------------------
		1973	\subsubsection{struct inode}
		1974	\label{sec:inode}
		1975	%----------------------------------------------------------------------------
		1976
		1977	\begin{figure}
		1978	\begin{center}
		1979	\includegraphics[width=10cm]{images/inodebuckets.eps}
		1980	\end{center}
		1981	\caption{Inode data structures}
		1982	\label{fig:inodebuckets}
		1983	\end{figure}
		1984
		1985
		1986	\begin{table}
		1987	\begin{verbatim}
		1988	struct inode {
		1989	struct stat i_st;
		1990	struct inode *i_next;
		1991	struct inode *i_prev;
		1992	int i_hash;
		1993	__rwlock_t i_lock;
		1994	__uint16_t i_dlink;
		1995	__uint32_t i_dirty:1;
		1996	struct inode_operations *i_op;
		1997	struct super_block *i_sb;
		1998	union {
		1999	struct msdos_inode_info msdos_i;
		2000	} u;
		2001	};
		2002	\end{verbatim}
		2003	\caption{struct inode}
		2004	\label{src:structinode}
		2005	\end{table}
		2006
		2007	An inode is a file into a filesystem. It contains all the file
		2008	informations except the name that is maintained into a struct
		2009	dentry (see \ref{sec:dentry}; more details can be found into
		2010	\cite{book:unix}).
		2011
		2012	All the inodes temporarily present in memory are stored into an
		2013	hash bucket structure as showed in Figure \ref{fig:inodebuckets}:
		2014	for each device a number is computed using an hash function, the
		2015	inode number (file serial number) and the device number. All the
		2016	inodes with the same hash entry are stored in a list. The first
		2017	inode in the list is stored in a table at the index given by the
		2018	hash key.
		2019
		2020	\begin{table}
		2021	\begin{verbatim}
		2022	struct stat {
		2023	__dev_t st_dev;
		2024	__ino_t st_ino;
		2025	__mode_t st_mode;
		2026	__nlink_t st_nlink;
		2027	__uid_t st_uid;
		2028	__gid_t st_gid;
		2029	__off_t st_size;
		2030	unsigned long st_blksize;
		2031	__time_t st_atime;
		2032	__time_t st_mtime;
		2033	__time_t st_ctime;
		2034	};
		2035	\end{verbatim}
		2036	\caption{struct stat}
		2037	\label{src:structstat}
		2038	\end{table}
		2039
		2040	An inode contains the following fields:
		2041
		2042	\begin{description}
		2043	\item [i\_st]Contains all the standard informations for a file (e.g., the
		2044	length, the type); a task can inspect these informations using the stat()
		2045	primitive. These informations are included into a stat structure that is shared
		2046	between the user libraries and the internal implementation, in a way that a
		2047	memcopy should be enough to pass these informations.
		2048	\item [i\_next,~i\_prev~e~i\_hash]These fields are used to implement the double
		2049	linked list and the hash key: i\_next and i\_prev implement a double list,
		2050	whereas i\_hash is the hash key (redundant information).
		2051	\item [i\_lock]It is a lock for the structure (see Section \ref{sec:rwlock}).
		2052	\item [i\_dlink]Number of directory entries that points to the inode (see
		2053	Section \ref{sec:dentry}).
		2054	\item [i\_dirty]A flag that says if this inode has been modified.
		2055	\item [i\_op]Virtual operations used for working with this inode. Usually they
		2056	are filesystem specific.
		2057	\item [i\_sb]Super block where the inode is stored (see Section
		2058	\ref{sec:superblock}).
		2059	\end{description}
		2060
		2061	The struct stat showed in Table \ref{src:structstat}, contained
		2062	into the struct inode and described in the paragraph 5.6 of
		2063	\cite{book:posix1003.1}, contains the following fields:
		2064
		2065	\begin{description}
		2066	\item [st\_mode]It is the file mode (informations about who can
		2067	read/modify/execute a file).
		2068	\item [st\_ino]An unique number that identifies a file (``file serial number''
		2069	in the Posix terminology).
		2070	\item [st\_dev]It is the device serial number; st\_dev and st\_ino are necessary
		2071	and sufficient to uniquely identify a file in the system.
		2072	\item [st\_nlink]Number of hard links to a file (Always 1 for FAT16 file
		2073	systems).
		2074	\item [st\_uid~e~st\_gid]File group and user identifiers. 0 is used for the
		2075	system administrator. The FAT16 filesystem always set these fields to 0.
		2076	\item [st\_size]File length (bytes).
		2077	\item [st\_atime,~st\_mtime~e~st\_ctime]File times: st\_atime is the time of the
		2078	last access, st\_mtime is the time of the last modification and st\_ctime is the
		2079	time of the last state change (e.g., when the file has been created or modified
		2080	using chmod).
		2081	\end{description}
		2082
		2083	%----------------------------------------------------------------------------
		2084	\subsubsection{struct file}
		2085	\label{sec:file}
		2086	%----------------------------------------------------------------------------
		2087
		2088	\begin{table}
		2089	\begin{verbatim}
		2090	struct file {
		2091	struct file *f_next;
		2092	struct dentry *f_dentry;
		2093	struct file_operations *f_op;
		2094	__loff_t f_pos;
		2095	unsigned int f_count;
		2096	__uint32_t f_flag_isdir:1;
		2097	union {
		2098	struct msdos_file_info msdos_f;
		2099	} u;
		2100	};
		2101	\end{verbatim}
		2102	\caption{struct file}
		2103	\label{src:structfile}
		2104	\end{table}
		2105
		2106	Every open file has a struct file associated to it (see Table
		2107	\ref{src:structfile}).
		2108
		2109	\begin{description}
		2110	\item [f\_next]Pointer used internally to the filesystem modules.
		2111	\item [f\_dentry]contains the filename (see Section \ref{sec:dentry}). From here
		2112	we can get the corresponding inode number.
		2113	\item [f\_op]Contains a pointer to the virtual operation that works on struct
		2114	file, described in Section \ref{sec:fileop}.
		2115	\item [f\_pos]Contains the current position into the file. read/write operations
		2116	are done starting from this point; moreover, Posix specification allow a seek to
		2117	a position beyond file termination: in that case read operation should fail,
		2118	whereas write operation should fill the gap with zeroes.
		2119	\item [f\_count]Contains the number of file descriptors that are using this
		2120	structure for input/output operations on the file.
		2121	\item [f\_flag\_isdir]Flag that says if this is a directory (redundant
		2122	information).
		2123	\item [msdos\_f]FAT16 specific informations.
		2124	\end{description}
		2125
		2126	The applications use a file descriptor to index the file to use. A
		2127	file descriptor is a number meaningful only for the processor that
		2128	obtained it (e.g. through an open operation) that is assigned to a
		2129	struct file through a descriptor table. Every process has a
		2130	different table, contained into the struct file\_descriptors
		2131	described in section \ref{sec:filedes}. The whole S.Ha.R.K. Kernel
		2132	can be considered a monoprocess multithread application, and for
		2133	that reason it has only one descriptor table.
		2134
		2135	%----------------------------------------------------------------------------
		2136	\subsubsection{struct file\_descriptors}
		2137	\label{sec:filedes}
		2138	%----------------------------------------------------------------------------
		2139
		2140	\begin{table}
		2141	\begin{verbatim}
		2142	struct file_descriptors {
		2143	struct file *fd_file[MAXOPENFILES];
		2144	__uint16_t fd_flags[MAXOPENFILES];
		2145	__uint32_t fd_fflags[MAXOPENFILES];
		2146	__mode_t fd_umask;
		2147	char fd_cwd[MAXPATHNAMELEN+1];
		2148	struct dentry *fd_cwden;
		2149	__mutex_t fd_mutex;
		2150	};
		2151	\end{verbatim}
		2152	\caption{struct file\_descriptors}
		2153	\label{src:structfiledes}
		2154	\end{table}
		2155
		2156	This struct is used to store all the general parameters assigned
		2157	to a process (e.g., the current directory). Again, S.Ha.R.K. has
		2158	only one structure because it can be considered like a monoprocess
		2159	multithread system. A mutex is used to guarantee mutual exclusion
		2160	when the struct is used by more than one thread. The structure
		2161	contains the following fields:
		2162
		2163	\begin{description}
		2164	\item [fd\_file]This is the descriptor table; every descriptor
		2165	(with value between 0 and MAXOPENFILES) is associated an element
		2166	that can have the following values:
		2167
		2168	\begin{itemize}
		2169	\item NULL if it is a free descriptor;
		2170	\item RESERVED if it is reserved for special uses (e.g., descriptors assigned to
		2171	the keyboard or to the console)
		2172	\item pointer to the file
		2173	\end{itemize}
		2174
		2175	\item [fd\_flags]Flags associated to the file (e.g., FD\_CLOSEXEC), described in
		2176	detail in \cite{book:posix1003.1}, when the fcntl() primitive is described.
		2177	\item [fd\_fflags]Flag used when the file was opened (e.g., O\_APPEND,
		2178	O\_RDONLY, etc...).
		2179	\item [fd\_umask]Mask used during file creation;can be modified with the
		2180	function umask().
		2181	\item [fd\_cwd]Current working directory used by all the primitives that
		2182	receives a relative pathname (that does not start with ``/'' (redundant
		2183	information).
		2184	\item [fd\_cwden]The current working directory; it can be modified using the
		2185	function chdir().
		2186	\item [fd\_mutex]Mutual exclusion semaphore used for concurrent multithread
		2187	access.
		2188	\end{description}
		2189
		2190	%----------------------------------------------------------------------------
		2191	\subsection{Disk Cache}
		2192	\label{sec:diskcache}
		2193	%----------------------------------------------------------------------------
		2194
		2195	%----------------------------------------------------------------------------
		2196	\subsubsection{How to use the cache}
		2197	%----------------------------------------------------------------------------
		2198
		2199	The filesystem modules use the disk cache to get all the
		2200	informations they need.
		2201
		2202	\begin{table}
		2203	\begin{verbatim}
		2204	dcache_t *dcache_lock (__dev_t dev, __blkcnt_t lsector);
		2205	void dcache_unlock (dcache_t *);
		2206	dcache_t *dcache_acquire (__dev_t dev, __blkcnt_t lsector);
		2207	void dcache_release (dcache_t *);
		2208	void dcache_dirty (dcache_t *d);
		2209	void dcache_skipwt(dcache_t *d);
		2210	\end{verbatim}
		2211	\caption{Disk cache interface}
		2212	\label{tab:cacheint}
		2213	\end{table}
		2214
		2215	The disk cache interface is showed in Table \ref{tab:cacheint}:
		2216
		2217	\begin{description}
		2218	\item [dcache\_lock]This function can be used to require a read only access to
		2219	the cache. More than one thread can lock a block. the parameters are the device
		2220	number and the sector number; it returns a pointer to the struct dcache\_t in
		2221	case of success, or NULL in case of error. This function do not return until the
		2222	access is granted or an error occurred. To avoid critical blocking times in case
		2223	more than one block is needed, you must use a proper access protocol: the
		2224	blocking must be done with increasing block numbers; moreover, if possible, only
		2225	one block should be blocked at a time.
		2226	\item [dcache\_unlock]This function can be used to unblock a cache entry
		2227	previously blocked.
		2228	\item [dcache\_acquire]This function behaves as dcache\_lock(), but it requires
		2229	a read/write access. Only one task at a time can acquire the lock. This access
		2230	do not imply that the cache entry is considered dirty (see the dcache\_dirty()
		2231	function).
		2232	\item [dcache\_release]This function is similar to dcache\_unlock(), and works
		2233	with the locks acquired by dcache\_acquire().
		2234	\item [dcache\_dirty]Signal that the cache entry has been modified. This
		2235	function must be called only if we own a read/write lock and if the cache has
		2236	been modified.
		2237	\item [dcache\_skipwt]This function signals to the system that the cache entry
		2238	contains system informations that are often modified. That means that the
		2239	write\_through flag should not be considered. See section \ref{sec:modicache}
		2240	for a detailed description.
		2241	\end{description}
		2242
		2243	%----------------------------------------------------------------------------
		2244	\subsubsection{struct dcache}
		2245	\label{sec:dcache}
		2246	%----------------------------------------------------------------------------
		2247
		2248	\begin{figure}
		2249	\begin{center}
		2250	\includegraphics[width=10cm]{images/dcachebuckets.eps}
		2251	\end{center}
		2252	\caption{Data structure for disk cache}
		2253	\label{fig:dcachebuckets}
		2254	\end{figure}
		2255
		2256	\begin{table}
		2257	\begin{verbatim}
		2258	typedef struct TAGdcache {
		2259	int prev;
		2260	int next;
		2261	int hash;
		2262	__dev_t device;
		2263	__blkcnt_t lsector;
		2264	__uint8_t buffer[MAXSECTORSIZE];
		2265	int used;
		2266	__time_t time;
		2267	int numblocked;
		2268	__fs_sem_t sync;
		2269	__fs_sem_t esync;
		2270	__rwlock_t rwlock;
		2271	__uint32_t dirty:1;
		2272	__uint32_t ready:1;
		2273	__uint32_t esyncreq:1;
		2274	__uint32_t skipwt:1;
		2275	} dcache_t;
		2276	\end{verbatim}
		2277	\caption{struct dcache}
		2278	\label{str:structdcache}
		2279	\end{table}
		2280
		2281	This data structure, used by the module that implements the data
		2282	cache, represents a cache entry. All the high level functions uses
		2283	the cache module as an interface to the lower layer.
		2284
		2285	These structures are maintained in the system wit a structure
		2286	similar to the inodes, that is a set of double linked lists that I
		2287	can access using an hash key, as showed in Figure
		2288	\ref{fig:dcachebuckets}; the hash key is computed using the device
		2289	(that is the physical peripheral) and the lsector (the logical
		2290	sector).
		2291
		2292	Every struct dcache is an elementary block that can be transferred
		2293	to a device. It is not possible to have 2 data structures that
		2294	refers to the same block. That is, the pair (device,lsector) must
		2295	uniquely identify the data sector on the physical device (e.g.,
		2296	the block 64 in ``ide/hda'' could be the block 0 of
		2297	``ide/hda1'', as explained in section
		2298	\ref{sec:dd:interfaccia:nomi}).
		2299
		2300	The field of the structure (see Table \ref{str:structdcache}) are:
		2301
		2302	\begin{description}
		2303	\item [prev,~next~e~hash]Are used to store the structures into the lists that
		2304	are accessed using the hash key: next and prev are used to implement a double
		2305	linked list, whereas hash is the hash key.
		2306	\item [device,~lsector~e~buffer]These fields contain the physical device, the
		2307	logical sector and the data contained in the corresponding physical block. These
		2308	are the unique fields that a high level routine should use (and that shall be
		2309	provided by a data cache).
		2310	\item [used]The number of tasks that are using the block; -1 means that no one
		2311	is using the block, and that the block is free.
		2312	\item [time]The last block access time.
		2313	\item [numblocked]It is the number of tasks that are waiting the availability of
		2314	the block, (that is, they are waiting the flag ready to be set). That is, if a
		2315	task asks for a block already asked by another task, but not yet available (not
		2316	made available by the lower layer), then the task blocks on a synchronization
		2317	semaphore, and it increments that field.
		2318	\item [sync]This is the synchronization semaphore used by the blocked tasks that
		2319	waits the setting of the ready flag.
		2320	\item [esync]It is a synchronization semaphore on the errors needed during the
		2321	cache ``purging'' phase (see later).
		2322	\item [rwlock]This is a locker used to require the R/W access to the block (see
		2323	Section \ref{sec:rwlock}).
		2324	\item [dirty]This flag is set if the block has been modified.
		2325	\item [ready]This flag is set if the block is available.
		2326	\item [esyncreq]This flag is set if the synchronization on errors is required
		2327	(that is, the field esync is used).
		2328	\item [skipwt]This flag says if the write through handling should be done.
		2329	\end{description}
		2330
		2331	The fields esync and esyncreq are used during the ``purging
		2332	phase'': a task requires a read or write operation on a block, the
		2333	block is not into the data cache, and the cache is full. That is,
		2334	another block must be discarded from the cache. To discard a block
		2335	it can happen that a disk write operation must be done. While this
		2336	write operation is in progress other tasks can require the same
		2337	(discarded, in writing) block. These tasks can block waiting for
		2338	synchronization on the block. In that case the cache behavior
		2339	could be the following: the requests on the block are aborted (the
		2340	failure is only needed if we are able to free the block; the
		2341	blocked tasks are awakened with an error); or, the requests on the
		2342	block are not aborted (the block can not be freed, and the tasks
		2343	blocked on it are woken up). The problem is that after the task
		2344	has wake up all the blocked tasks, it must wait that all the
		2345	awakened tasks check for an error. To do that, the task set the
		2346	esyncreq flag, and it blocks on the esync semaphore; the last task
		2347	that checked for errors, checks the esyncreq flag and signal the
		2348	esync semaphore waking up the waiting task.
		2349
		2350	The disk cache can work in three different modes: \label{sec:modicache}
		2351	``copy-back'', ``write-through'' and ``modified write-thought''. Usually most
		2352	time-sharing OSes handle a copy-back cache. that is, write operations are done
		2353	only in cache; a low priority task is used to synchronize cache and disk data.
		2354	In such a system, if a cache request arrives and the cache is full, a block is
		2355	freed eventually writing it on the disk. On a real-time system such a behavior
		2356	can give some problems, because the time spent freeing the cache can be
		2357	accounted to the running task. For example, a task that only reads can be
		2358	delayed by another task that fills the cache memory writing blocks on the disk.
		2359	To solve this problem there are 2 solutions: the cache is not used (that is, no
		2360	caching on write), or the cache behavior is set to write-through, that is the
		2361	write operations are immediately synchronized on the disk, in a way that who
		2362	dirties the cache is also responsible for its synchronization.
		2363
		2364	If the cache is used in write-through mode, another problem
		2365	arises: sometimes a few bytes are modified frequently (e.g. the
		2366	filesystem information for the allocation of a file on the disk),
		2367	that leads to continuous disk writes. For that reason, the disk
		2368	cache provides the flag skipwt, that forces the copy-back mode for
		2369	a particular block. By default, this flag is not set. When the
		2370	write-through policy must be disabled, the file system should call
		2371	the function dcache\_skipwt() on the cache-entry before releasing
		2372	it.
		2373
		2374	%----------------------------------------------------------------------------
		2375	\subsubsection{struct \_\_rwlock\_t}
		2376	\label{sec:rwlock}
		2377	%----------------------------------------------------------------------------
		2378
		2379	\begin{table}
		2380	\begin{verbatim}
		2381	typedef struct {
		2382	__mutex_t mutex;
		2383	int active_readers;
		2384	int active_writers;
		2385	int blocked_readers;
		2386	int blocked_writers;
		2387	__sem_t readers_semaph;
		2388	__sem_t writers_semaph;
		2389	} __rwlock_t;
		2390	\end{verbatim}
		2391	\caption{struct \_\_rwlock\_t}
		2392	\label{src:structrwlock}
		2393	\end{table}
		2394
		2395	\begin{table}
		2396	\begin{verbatim}
		2397	void __rwlock_init (__rwlock_t *ptr);
		2398	void __rwlock_rdlock (__rwlock_t *ptr);
		2399	void __rwlock_wrlock (__rwlock_t *ptr);
		2400	int __rwlock_tryrdlock (__rwlock_t *ptr);
		2401	int __rwlock_trywrlock (__rwlock_t *ptr);
		2402	void __rwlock_rdunlock (__rwlock_t *ptr);
		2403	void __rwlock_wrunlock (__rwlock_t *ptr);
		2404	\end{verbatim}
		2405	\caption{Lockers functions}
		2406	\label{src:funcrwlock}
		2407	\end{table}
		2408
		2409	The cache module and other modules need a locker protocol, that is
		2410	a protocol that solves the reader/writer problem: basically, there
		2411	is a shared resource where some processes must read or write.
		2412	These rules must be imposed by the protocol:
		2413
		2414	\begin{enumerate}
		2415	\item If someone is writing, no one is authorizes to read or write.
		2416	\item If someone is reading, other tasks are authorized to read, but not to
		2417	write.
		2418	\end{enumerate}
		2419
		2420	In literature, there exist different methods to solve the problem
		2421	with different behaviors. This filesystem implements the protocol
		2422	proposed into paragraph 4.2.3.1 of
		2423	\cite{book:programmazioneconcorrente}. To replace that policy, the
		2424	rwlock.c file should be modified with another algorithm,
		2425	maintaining the same interface.
		2426
		2427	Table \ref{src:structrwlock} contains the internal data structure
		2428	of the implemented protocol. To implement a ``locker'', all the
		2429	needed data must be inserted into a structure \_\_wrlock\_t. The
		2430	following functions have also to be provided (see Table
		2431	\ref{src:funcrwlock}):
		2432
		2433	\begin{description}
		2434	\item [\_\_rwlock\_init]Initializes the structure passed as parameter. That
		2435	structure is passed to all the functions and it is used to protect a shared
		2436	resource.
		2437	\item [\_\_rwlock\_rdlock]Read request (read lock); when this function returns,
		2438	the calling task is authorized to read the shared resource.
		2439	\item [\_\_rwlock\_wrlock]Write request (write lock).
		2440	\item [\_\_rwlock\_tryrdlock]Conditional read request (try read lock); The
		2441	function return 0 if the calling task can read, another number otherwise. At the
		2442	end of the read operation (if the lock has been granted), \_\_rwlock\_rdunlock
		2443	must be called.
		2444	\item [\_\_rwlock\_trywrlock]Conditional write request (try write lock); The
		2445	function return 0 if the calling task can write, another number otherwise. At
		2446	the end of the write operation (if the lock has been granted),
		2447	\_\_rwlock\_wrunlock must be called.
		2448	\item [\_\_rwlock\_rdunlock]Read unlock; the task has finished the use of the
		2449	shared resource.
		2450	\item [\_\_rwlock\_wrunlock]Write unlock; the task has finished the use of the
		2451	shared resource, that is again in a consistent state.
		2452	\end{description}
		2453
		2454	The functions \_\_rwlock\_rdlock and \_\_rwlock\_rdunlock must always be used
		2455	together. Nested calls are not allowed. The same rules apply to
		2456	\_\_rwlock\_wrlock and \_\_rwlock\_wrunlock.
		2457
		2458	Please note that \_\_rwlock\_rdlock and \_\_rwlock\_wrlock can
		2459	cause undefined task blocking. The other functions do not block
		2460	the calling task.
		2461
		2462	%----------------------------------------------------------------------------
		2463	\section{Filesystem: DOS (FAT16)}
		2464	%----------------------------------------------------------------------------
		2465
		2466	%----------------------------------------------------------------------------
		2467	\subsection{Description}
		2468	%----------------------------------------------------------------------------
		2469
		2470	A technical description of the FAT16 filesystem can be found in
		2471	\cite{manual:DOS330} or \cite{manual:dosfat}.
		2472
		2473	%----------------------------------------------------------------------------
		2474	\subsubsection{Internal structure}
		2475	%----------------------------------------------------------------------------
		2476
		2477	\begin{figure}
		2478	\begin{center}
		2479	\includegraphics[width=7cm]{images/fatarch.eps}
		2480	\end{center}
		2481	\caption{Internal structure of the DOS FAT16.}
		2482	\label{src:fatarch}
		2483	\end{figure}
		2484
		2485	As showed in Figure \ref{src:fatarch}, this filesystem is divided
		2486	in areas:
		2487
		2488	\begin{description}
		2489	\item [Boot~sector]Contains a small program that loads the DOS operating
		2490	system, as specified into \cite{manual:DOS330}. It also contains other
		2491	informations like the number of FATS, the kind of support and the dimension of
		2492	the Root Directory; It can be thought as a super-block. The informations
		2493	contained in this sector are showed in Table \ref{tab:fatboot}.
		2494	\item [FAT~tables]This data area contains some copies of the File Allocation
		2495	Table (FAT) These informations are used to know where a file can be found on the
		2496	hard disk.
		2497	\item [Root~directory]This block is structured as all the blocks that contains a
		2498	directory, with the exceptions that it is allocated in a contiguous way and that
		2499	it has a fixed size.
		2500	\item [Data~area]This is the area where the files and directories are stored.
		2501	The minimal allocation unit is a cluster, that is a set of contiguous sectors on
		2502	the hard disk. The boot sector stores the number of sectors that compose a
		2503	cluster.
		2504	\end{description}
		2505
		2506	%----------------------------------------------------------------------------
		2507	\subsubsection{The boot sector}
		2508	%----------------------------------------------------------------------------
		2509
		2510	\begin{table}
		2511	\begin{verbatim}
		2512	struct bootrecord {
		2513	__uint8_t reserved[3];
		2514	char oemname[8];
		2515	__uint16_t bytespersector;
		2516	__uint8_t sectorspercluster;
		2517	__uint16_t hiddensectors;
		2518	__uint8_t fatsnumber;
		2519	__uint16_t rootentry;
		2520	__uint16_t sectors;
		2521	__uint8_t media;
		2522	__uint16_t sectorsperfat;
		2523	__uint16_t sectorpertrak;
		2524	__uint16_t headsperdisk;
		2525	__uint32_t hiddensectors32;
		2526	__uint32_t sectors32;
		2527	__uint16_t physicaldrive;
		2528	__uint8_t signature;
		2529	__uint8_t serialnumber[4];
		2530	char volumelabel[11];
		2531	char fattype[8];
		2532	};
		2533	\end{verbatim}
		2534	\caption{The DOS boot sector.}
		2535	\label{tab:fatboot}
		2536	\end{table}
		2537
		2538	Table \ref{tab:fatboot} shows the boot sector structure. The boot
		2539	sector is always 512 bytes large. The struct is only the first
		2540	part of the sector, the second part is the boot loader. Here is a
		2541	small description of some fields:
		2542
		2543	\begin{description}
		2544	\item [sectorpercluster]stores the number of sectors included into a cluster.
		2545	\item [sectorsperfat]stores the number of sectors in the FAT
		2546	\item [fatsnumber]stores the number of FATs on the disk.
		2547	\end{description}
		2548
		2549	A note: when DOS formats a disk, creating a FAT16 filesystem,
		2550	the informations that comes from a pre-existing filesystem have
		2551	precedence over the informations given by the hardware. That is,
		2552	if an hard disk declares to have 8 heads, and the pre-existing
		2553	filesystem has headsperdisk=13, the new file system will have 13
		2554	heads.
		2555
		2556	%----------------------------------------------------------------------------
		2557	\subsubsection{The File Allocation Table}
		2558	%----------------------------------------------------------------------------
		2559
		2560	\begin{figure}
		2561	\begin{center}
		2562	\includegraphics[width=7cm]{images/fatfat.eps}
		2563	\end{center}
		2564	\caption{An example of File Allocation Table (FAT)}
		2565	\label{fig:fatfat}
		2566	\end{figure}
		2567
		2568	Every FAT is organized as a table of 16 bit elements. Every
		2569	element is a cluster; the element value gives informations about
		2570	the cluster:
		2571
		2572	\begin{itemize}
		2573	\item 0 means the cluster is available.
		2574	\item values between 0xfff0-0xfff7 are reserved; for example, 0xfff7 means bad
		2575	(damaged) cluster.
		2576	\item 0xfff8-0xffff says that the cluster is the last cluster in a file.
		2577	\item Every other value index which is the next cluster of the file.
		2578	\end{itemize}
		2579
		2580	The FAT is used to create a set of chains that make a file. For
		2581	example, Figure \ref{fig:fatfat} shows the information for a file
		2582	stored in 4 clusters (23, 27, 25, 31). The first cluster of the
		2583	chain is contained into the directories. Please note that the
		2584	first cluster of the data area is the number two, because cluster
		2585
		2586	or 16 bit), and the disk format.
		2587
		2588	Directories (except the root directory) are common files with a
		2589	particular structure. The system usually give access to
		2590	directories through other primitives (mkdir, creat, and so on).
		2591
		2592	%----------------------------------------------------------------------------
		2593	\subsubsection{The ``root directory''}
		2594	%----------------------------------------------------------------------------
		2595
		2596	This area has the same structure of a directory file, except that
		2597	it has fixed dimension, it is not allocated into the data area and
		2598	its allocation is contiguous. Common directories have the same
		2599	structure except that they are allocated into the data area, and
		2600	that they can be fragmented.
		2601
		2602	\begin{table}
		2603	\begin{verbatim}
		2604	struct directoryentry {
		2605	__uint8_t name[8];
		2606	__uint8_t ext[3];
		2607	__uint8_t attribute;
		2608	__uint8_t reserved[10];
		2609	__uint16_t time;
		2610	__uint16_t date;
		2611	__uint16_t cluster;
		2612	__uint32_t size;
		2613	};
		2614	\end{verbatim}
		2615	\caption{DOS Directory Entry.}
		2616	\label{tab:fatdentry}
		2617	\end{table}
		2618
		2619	The directory structure is showed in Figure \ref{tab:fatdentry}:
		2620
		2621	\begin{description}
		2622	\item [name~and~ext]Contains the filename in the classic format 8+3. Non used
		2623	chars are filled with spaces.
		2624	\item [attribute]File flags: they specify if it is regular, a directory, a
		2625	volume label, it has been archived or if it is read-only.
		2626	\item [time]Creation time (2 seconds resolution) (all the 3 times specified by
		2627	the POSIX standard are mapped on that date). The time format is DOS specific.
		2628	\item [date]Creation date (until 2099). The date format is DOS specific.
		2629	\item [cluster]The first cluster of the file. A value 0 means empty file (the
		2630	first available cluster is the cluster 2).
		2631	\item [size]File size (32 bit).
		2632	\end{description}
		2633
		2634	Note the difference with the typical Unix file systems, where file
		2635	names and file informations are stored in different areas.
		2636
		2637	Directory entries are not contiguous. If the first character name
		2638	is 0xe5, the file has been deleted, if it is 0x00, it has been
		2639	never used. Any other character means that the entry is valid.
		2640	0x00 is used for efficiency: if it is found, directory scan is
		2641	ended also if there are other sectors in the cluster that contains
		2642	a directory.
		2643
		2644	%----------------------------------------------------------------------------
		2645	\subsubsection{The data area}
		2646	%----------------------------------------------------------------------------
		2647
		2648	The data area is divided in clusters, that are a set of sectors.
		2649	It starts from cluster number 2, and it stores regular files and
		2650	directories.
		2651
		2652	linear addresses are used to identify a sector on a hard disk, and
		2653	its cluster. Hard disk sectors are numbered starting from sector
		2654	0. The filesystem specifies an algorithm that associates to every
		2655	linear address an address C/H/S (Cylinder/Head/Sector).
		2656
		2657	Some DOS implementation supposes a particular structure
		2658	depending on the media used: a 1.44 Mb floppy disk has sectors
		2659	composed by 1 sector, and a root directory of 14 sectors. The best
		2660	way to implement a filesystem is to rely only on the super block
		2661	informations.
		2662
		2663	%----------------------------------------------------------------------------
		2664	\subsection{Implementation notes}
		2665	%----------------------------------------------------------------------------
		2666
		2667	The filesystem maintains 2 structures to store file informations:
		2668	an inode structure and a dentry structure. In the DOS FAT16
		2669	filesystem unfortunately there is an unique correspondence between
		2670	file names and inodes, and that information is contained into the
		2671	filesystem directory entry.
		2672
		2673	\begin{figure}
		2674	\begin{center}
		2675	\includegraphics[width=7cm]{images/fsn.eps}
		2676	\end{center}
		2677	\caption{The FAT16 file serial number.}
		2678	\label{fig:fsn}
		2679	\end{figure}
		2680
		2681	When the system refers to a file, it use the inode number. That
		2682	is, every file has a number that is unique inside the file system.
		2683
		2684	The FAT16 serial number has been specified as shown in Figure
		2685	\ref{fig:fsn}:
		2686
		2687	\begin{itemize}
		2688	\item The most significant part is the cluster number (the cluster where the
		2689	directory is stored).
		2690	\item The less significant part is the number into the directory entry.
		2691	\end{itemize}
		2692
		2693	For example, the inode number 0x01230008, refers to the 8th
		2694	directory entry into the cluster 0x123.
		2695
		2696	The cluster 1 is used to index the directory entries contained
		2697	into the root directory, whereas the cluster 0 is used to index
		2698	the root directory.
		2699
		2700	\bibliographystyle{plain}
		2701	\bibliography{../common/retis}
		2702
		2703	\end{document}

Subversion Repositories shark

(root)/sharkdoc/trunk/vol6/vol6.tex - Rev 1677