Subversion Repositories shark

Rev

Details | Last modification | View Log | RSS feed

Rev Author Line No. Line
1676 tullio 1
\documentclass[english]{report}
2
\usepackage[T1]{fontenc}
3
\usepackage[latin1]{inputenc}
4
\usepackage{a4wide}
5
\setcounter{secnumdepth}{3} \setcounter{tocdepth}{3}
6
\usepackage{graphicx}
7
 
8
\makeatletter
9
 
10
\usepackage{babel}
11
\makeatother
12
 
13
\begin{document}
14
 
15
\thispagestyle{empty}
16
 
17
\begin{center}{\LARGE S.Ha.R.K. User Manual}\end{center}{\LARGE \par} \vfill{}
18
 
19
\begin{center}{\large Volume VI}\end{center}
20
 
21
\begin{center}S.Ha.R.K. File System\end{center} \vfill{}
22
 
23
\begin{center}Written by\end{center}
24
\begin{center}Paolo Gai (pj at sssup.it)\end{center}
25
\begin{center}Tullio Facchinetti (tullio.facchinetti at unipv.it)\end{center}
26
\begin{center}Original italian version by\end{center}
27
\begin{center}Massimiliano Giorgi (massy at gandalf.sssup.it)\end{center}
28
\vfill{}
29
 
30
\begin{center}\includegraphics[width=2cm]{../common/sssup.ps}\end{center}
31
 
32
\begin{center}Scuola Superiore di Studi e Perfezionamento S. Anna\end{center}
33
\begin{center}RETIS Lab\end{center}
34
\begin{center}Via Carducci, 40 - 56100 Pisa\end{center}
35
\begin{center}\pagebreak\end{center}
36
 
37
\tableofcontents{}
38
 
39
\listoffigures
40
 
41
\listoftables
42
 
43
\begin{abstract}
44
This is the documentation about S.Ha.R.K. File system. The document gives an
45
insight on many data structures used internally by the file system. This is the
46
first step toward a \emph{true} file system user manual. If you carefully read
47
it, you will understand how it works, how can be initialized and used.
48
 
49
In any case, please send any comments to pj@sssup.it.
50
 
51
Enjoy,
52
 
53
Paolo
54
\end{abstract}
55
 
56
%----------------------------------------------------------------------------
57
\chapter{File system architecture}
58
\label{sec:arch}
59
%----------------------------------------------------------------------------
60
 
61
%----------------------------------------------------------------------------
62
\section{Features}
63
%----------------------------------------------------------------------------
64
 
65
\label{sec:arch:gen}The library is composed by two main layers:
66
 
67
\begin{description}
68
\item [File-system] The file system contains all the data structures used to
69
handle files.
70
\item [Blocks~devices]This part handles directly the hardware. They provide a
71
common interface that unifies the usage of all the block devices (i.e., the hard
72
disks).
73
\end{description}
74
 
75
The architecture of the library is modular, and it is described in
76
Figure \ref{fig:Filesystem-architecture}.
77
 
78
\begin{figure}
79
\begin{center}
80
\includegraphics[width=6cm]{images/arch2.eps}
81
\end{center}
82
\caption{File system architecture}
83
\label{fig:Filesystem-architecture}
84
\end{figure}
85
 
86
The library is connected to the Kernel via a \emph{glue} layer, to
87
the application via a \emph{wrapper} layer; the block devices and
88
the file system operations are separated by a cache, that may be
89
excluded.
90
 
91
\label{sec:arch:req}
92
 
93
The library needs a set of primitives that must be supported by
94
the underlying kernel, as memory handling, string handling, and
95
functions with a variable number of parameters. Moreover, the
96
kernel should support functions that implement:
97
 
98
\begin{itemize}
99
\item Mutual exclusion.
100
\item Allocation of low level resources (as I/O spaces and interrupts).
101
\item Memory protection (not needed in S.Ha.R.K.).
102
\end{itemize}
103
 
104
%----------------------------------------------------------------------------
105
\section{Low Level: Device Drivers}
106
%----------------------------------------------------------------------------
107
 
108
\begin{figure}
109
\begin{center}
110
\includegraphics[width=6cm]{images/archlow.eps}
111
\end{center}
112
\caption{Device Drivers}
113
\label{fig:archlow}
114
\end{figure}
115
 
116
The device drivers are divided in two main parts:
117
 
118
\begin{itemize}
119
\item A general part that exports the basic low level services. It
120
also gives a set of internal routines to the internal modules.
121
\item A set of internal modules that handle one or more
122
peripherals. The application can choose which module must be
123
linked, initialized and used at compile time.
124
\end{itemize}
125
 
126
\begin{figure}
127
\begin{center}
128
\includegraphics[width=6cm]{images/archlow2.eps}
129
\end{center}
130
\caption{Device Drivers - details}
131
\label{fig:archlow2}
132
\end{figure}
133
 
134
The general part is composed by four modules:
135
 
136
\begin{description}
137
\item [Block~device]The main part; it exports the interface to the upper layers,
138
inits the device driver subsystem and register all the drivers present in the
139
system.
140
\item [Logical~disk]This module provides functions that allow to inspect the
141
logical structure of an hard disk, like start, end, size and type of each
142
partition on the Hard Disk.
143
\item [Physical~disk]This module provide a small data base that contains
144
informations about all the hard disks in the system; these informations can be
145
useful for the module that must handle the low level pending request queues.
146
\item [Queue~policy]This module handles the low level policies for the pending
147
requests (SCAN, LOOK, EDF, ...).
148
\end{description}
149
 
150
%----------------------------------------------------------------------------
151
\section{High Level: File systems}
152
%----------------------------------------------------------------------------
153
 
154
\begin{figure}
155
\begin{center}
156
\includegraphics[width=10cm]{images/archhigh.eps}
157
\end{center}
158
\caption{High Level - File system details}
159
\label{fig:archhigh}
160
\end{figure}
161
 
162
The high level is divided in three parts:
163
 
164
\begin{itemize}
165
\item The first part handle the file systems, its initialization and its
166
interface to the internal modules.
167
\item A second part composed by a set of modules that implement a particular
168
file system (i.e., FAT16).
169
\item A cache module (whose policy can be easily changed).
170
\end{itemize}
171
 
172
The first part is composed by sub-modules (internal implementation
173
of a sub-module can be changed without affecting the rest of the
174
library):
175
 
176
\begin{description}
177
\item [Filesystems]It is responsible for the initialization and termination of
178
all the high level layer. It registers all the file systems, and it contains the
179
routines for mounting/unmounting the partitions.
180
\item [File]It handles the descriptor tables and the files; it defines the
181
\texttt{file} structure.
182
\item [Super]It handles the super blocks (that contains the global informations
183
about partitions on a disk).
184
\item [Inodes]It handles the inodes (informations about a file in the file
185
system).
186
\item [Dentry]It handles file names in a tree structure.
187
\item [Open,~read,~umask,~...]They implement the correspondent system
188
primitives.
189
\end{description}
190
 
191
%----------------------------------------------------------------------------
192
\chapter{Device drivers}
193
\label{sec:dd}
194
%----------------------------------------------------------------------------
195
 
196
%----------------------------------------------------------------------------
197
\section{Interface}
198
\label{sec:dd:interfaccia}
199
%----------------------------------------------------------------------------
200
 
201
%----------------------------------------------------------------------------
202
\subsection{Initialization}
203
\label{sec:dd:interfaccia:init}
204
%----------------------------------------------------------------------------
205
 
206
\begin{table}
207
\begin{verbatim}
208
typedef struct bdev_parms {
209
    __uint32_t showinfo;
210
    char *config;
211
    void *bmutexattr;
212
    __uint16_t dummy;
213
#ifdef IDE_BLOCK_DEVICE
214
    IDE_PARMS ideparms;
215
#endif
216
} BDEV_PARMS;
217
 
218
void bdev_def_showinfo (BDEV_PARMS s, int v);
219
void bdev_def_configstring(BDEV_PARMS s, char *v);
220
void bdev_def_mutexattrptr(BDEV_PARMS s,void *v);
221
\end{verbatim}
222
\caption{The BDEV\_PARMS structure}
223
\label{tab:BDEV_PARMS}
224
\end{table}
225
 
226
The bdev\_init() function (see Table \ref{tab:BDEV_PARMS}) must be
227
called to initialize a device driver, passing a pointer to a
228
BDEV\_PARMS structure. Use the provided macros to init that
229
structure.
230
 
231
The structure is composed by two parts: a generic part and a
232
specific part. The generic part is composed by the following
233
fields:
234
 
235
\begin{description}
236
\item [showinfo]If this flag is set, the initialization of the device will be
237
``verbose''. To set this flag, use the macro bdev\_def\_showinfo.
238
\item [config]It is a string that can have a particular meaning for the device
239
driver. The string is parsed by the device driver, that usually searches for
240
some initialization parameter (something like ``foo:[number]''). To set this
241
string, use the macro bdev\_def\_configstring.
242
\item [bmutexattr]This parameter can be used to tell the system which is the
243
kind of mutex to use to implement the mutual exclusion. To set this parameter,
244
use the macro \texttt{bdev\_def\_mutexattrptr}.
245
\item [dummy]Used to align the data structure (see the macro BASE\_DEV).
246
\end{description}
247
 
248
The specific part is directly specified by the device driver; the
249
bdev\_init function can be called can be called passing a pointer
250
to a BDEV\_PARMS structure initialized with a BASE\_BDEV value. If
251
NULL is specified, the default values are used.
252
 
253
%----------------------------------------------------------------------------
254
\subsubsection{How to create a new device driver}
255
%----------------------------------------------------------------------------
256
 
257
\begin{enumerate}
258
\item Add a new \emph{major} number in t the file include/fs/major.h (see
259
\ref{sec:dd:interfaccia:nomi});
260
\item Add a ``\#define'' into the file include/fs/bdevconf.h (to enable
261
selective compilation);
262
\item If needed, add fields to bdev\_parms, and modify the BASE\_BDEV macro with
263
the new default parameters;
264
\item Modify the function bdev\_init to call the init function of the new device
265
driver (into drivers/block/bdev.c).
266
\end{enumerate}
267
 
268
\begin{figure}
269
\begin{center}
270
\includegraphics[width=7cm]{images/ddinit.eps}
271
\end{center}
272
\caption{Device Drivers Initialization.}
273
\label{fig:ddinit}
274
\end{figure}
275
 
276
The device driver initialization function, after the
277
initialization of its internal part, does the following actions
278
(see Figure \ref{fig:ddinit}):
279
 
280
\begin{enumerate}
281
\item It registers itself calling bdev\_register(); That function must be called
282
for each device handled by the driver.
283
\item It registers every block device that it find into its interfaces to the
284
Physical Disk Module.
285
\item It registers every logical device found calling bdev\_register\_minor().
286
\end{enumerate}
287
 
288
Data structures and registration functions used in this phased are
289
described in Section \ref{sec:dd:interfaccia:det}.
290
 
291
%----------------------------------------------------------------------------
292
\subsection{Device serial number}
293
\label{sec:dd:interfaccia:nomi}
294
%----------------------------------------------------------------------------
295
 
296
\begin{figure}
297
\begin{center}
298
\includegraphics[width=7cm]{images/devtype.eps}
299
\end{center}
300
\caption{Device Serial Number}
301
\label{tab:devtype}
302
\end{figure}
303
 
304
Every logical device (i.e., a logical partition of a hard disk)
305
has a device serial number of type \_\_dev\_t, that must have a
306
unique value (see the POSIX standard).
307
 
308
The number has an internal structure (see Figure
309
\ref{tab:devtype}) composed by a \emph{major} \emph{number}, and
310
another number (whose number depends on the device implementation)
311
called \emph{minor} \emph{number}.
312
 
313
Three macro exists (see include/fs/sysmacro.h) to handle the
314
device numbers:
315
 
316
\begin{description}
317
\item [makedev]This macro creates a \_\_dev\_t number from the major and the
318
minor number.
319
\item [major]Returns the major number of a device.
320
\item [minor]Returns the minor number of a device.
321
\end{description}
322
 
323
The major number is used to decide which device driver will handle
324
a request (see Section \ref{sec:dd:interfaccia:det}). A device
325
driver can register more than on major number. Minor numbers are
326
device driver dependent, and the validity of a minor number is
327
checked only by the driver.
328
 
329
To simplify the search of a device, every device has a symbolic
330
name (see Section \ref{sec:dd:intercaccia:ext}). Symbolic names
331
are composed by two parts: a unique name for each major, and a
332
specific name assigned to each minor. For example, the major
333
MAJOR\_B\_IDE has a symbolic name ``ide''; a minor name for that
334
major can be ``hda2'', so the full name is ``ide/hda2'', that
335
identifies the second partition on the master hard disk on the
336
first IDE controller interface (see Section \ref{sec:dd:ide}).
337
 
338
%----------------------------------------------------------------------------
339
\subsection{Implementation details}
340
\label{sec:dd:interfaccia:det}
341
%----------------------------------------------------------------------------
342
 
343
\begin{table}
344
\begin{verbatim}
345
int bdev_register(__dev_t major, char *name, struct block_device *);
346
int bdev_register_minor(__dev_t device, char *name, __uint8_t fsind);
347
struct phdskinfo *phdsk_register( struct phdskinfo *disk);
348
\end{verbatim}
349
\caption{Block device registration functions}
350
\label{tab:Block_register}
351
\end{table}
352
 
353
These are the registration functions for the device drivers (see
354
Table \ref{tab:Block_register}):
355
 
356
\begin{description}
357
\item [bdev\_register]This function registers a device type (see Section
358
\ref{sec:dd:interfaccia:nomi}); the function receives as parameter the major
359
number, its symbolic name, and a pointer to a structure block\_device. The
360
function returns 0 in case of success, != 0 otherwise.
361
\item [bdev\_register\_minor]It registers a minor number, its symbolic name and
362
the file system type (see include/fs/fsind.h). This information is used by
363
bdev\_finddevice\_byname (see Section \ref{sec:dd:intercaccia:ext}). The
364
function returns 0 in case of success, != 0 otherwise.
365
\item [phdsk\_register]This function is used to register every physical device
366
found with parameters that can be used by the Disk Scheduling Algorithms (see
367
Section \ref{sec:dd:sched}). This function receives a pointer to a structure
368
phdiskinfo that contains the device parameters, and returns NULL in case of
369
error, or a pointer to the registered structure (that can be != from the
370
parameter) in case of success.
371
\end{description}
372
 
373
\begin{table}
374
\begin{verbatim}
375
struct dskgeometry {
376
    __uint16_t cyls;
377
    __uint16_t heads;
378
    __uint16_t sectors;
379
};
380
 
381
struct phdskinfo \{
382
    char pd_name[MAXPHDSKNAME];
383
    __dev_t pd_device;
384
    __blkcnt_t pd_size;
385
    int pd_sectsize;
386
    struct dskgeometry pd_phgeom;
387
    struct dskgeometry pd_logeom;
388
};
389
\end{verbatim}
390
\caption{The phdsk structure.}
391
\label{tab:structphdsk}
392
\end{table}
393
 
394
The phdskinfo structure contains the physical parameters of a
395
device. all the field must be filled before calling
396
phdsk\_register. The fields are:
397
 
398
\begin{description}
399
\item [pd\_name]device name, used only for debug purposes;
400
\item [pd\_device]device identificator;
401
\item [pd\_size]number of logical sectors of the device;
402
\item [pd\_sectsize]Dimension (bytes) of a logical sector;
403
\item [pd\_phgeom~e~pd\_logeom]geometry of the device (logical and phisical);
404
geometries are described using the struct dskgeometry (see Table
405
\ref{tab:structphdsk}):
406
 
407
\begin{description}
408
\item [cyls]number of cylinders (traces) of the device;
409
\item [heads]number of heads;
410
\item [sectors]number of sectors for each cylinder.
411
\end{description}
412
\end{description}
413
 
414
\begin{table}
415
\begin{verbatim}
416
struct block_device {
417
    char *bd_name;
418
    int bd_sectorsize;
419
    __uint32_t bd_flag_used:1;
420
    struct block_device_operations *bd_op;
421
};
422
\end{verbatim}
423
\caption{block\_device Structure}
424
\label{tab:structbdev}
425
\end{table}
426
 
427
The struct block\_device described in Table \ref{tab:structbdev}
428
is used to register the block devices:
429
 
430
\begin{description}
431
\item [bd\_name]Physical device name; for example, for the IDE device driver,
432
``ide''.
433
\item [bd\_sectorsize]logical sector dimension (bytes; redundant info but
434
useful).
435
\item [bd\_flag\_used]It should not modified by the device drivers. If set, the
436
corresponding device driver is present (a table of all the possible devices is
437
used to access the system in a fast way).
438
\item [bd\_op]Virtual device operations.
439
\end{description}
440
 
441
%----------------------------------------------------------------------------
442
\subsubsection{block\_device\_operation structure.}
443
\label{sec:dd:interfaccia:det:bdevop}
444
%----------------------------------------------------------------------------
445
 
446
\begin{table}
447
\begin{verbatim}
448
struct block_device_operations {
449
    int (*_trylock) (__dev_t device);
450
    int (*_tryunlock)(__dev_t device);
451
    int (*read) (__dev_t device, __blkcnt_t blocknum, __uint8_t *buffer);
452
    int (*seek) (__dev_t device, __blkcnt_t blocknum);
453
    int (*write) (__dev_t device, __blkcnt_t blocknum, __uint8_t *buffer);
454
};
455
\end{verbatim}
456
\caption{block\_device\_operations structure.}
457
\label{tab:structbdevop}
458
\end{table}
459
This structure contains the virtual operation that have to be
460
supplied by every device driver:
461
 
462
\begin{description}
463
\item [read]This function should read a logical pblock passed as parameter,
464
writing the result on the buffer. It returns 0 in case of success, != 0
465
otherwise.
466
\item [write]Similar to read, but it writes.
467
\item [seek]moves the head on the required block.
468
\item [\_trylock~e~\_tryunlock]Used to block/unblock a device. They return 0 in
469
case of success, != 0 otherwise; after initialization, and usually, every device
470
is in the unblocked state.
471
\end{description}
472
 
473
Why blocking/unblocking a device? The problem, described in
474
Section \ref{sec:dcache}, is that the data cache have to refer to
475
a logical block in an unique way. Since there are logical devices
476
that shares blocks (think at /dev/hda and /dev/hda2 in Linux), the
477
mount operation must be done in mutual exclusion, blocking the
478
device.
479
 
480
%----------------------------------------------------------------------------
481
\subsubsection{The ``lodsk'' module}
482
%----------------------------------------------------------------------------
483
 
484
\begin{table}
485
\begin{verbatim}
486
struct lodskinfo {
487
    __uint8_t fs_ind;
488
    __blkcnt_t start;
489
    __blkcnt_t size;
490
};
491
typedef int (*lodsk_callback_func)(int, struct lodskinfo*, void *);
492
int lodsk_scan(__dev_t device, lodsk_callback_func func, void *data, int
493
showinfo, char *lname);
494
\end{verbatim}
495
\caption{The lodsk module}
496
\label{tab:lodsk}
497
\end{table}
498
 
499
This module (see Table \ref{tab:lodsk}) offer a service to the
500
device drivers: It recognizes the logical structure (partitions)
501
of an HD as described into \cite{manual:DOS330} (part of the code
502
is derived from Linux).
503
 
504
before calling lodsk\_scan, the device driver must be already
505
registered; the function has five parameters:
506
 
507
\begin{enumerate}
508
\item The device ID where the search have to be done.
509
\item A callback function called for each partition found.
510
\item A generic pointer passed to the callback function.
511
\item A verbose flag.
512
\item The name of the logical device (i.e., ``hdb'', used if the verbose option
513
has been set.
514
\end{enumerate}
515
 
516
Each callback function is called passing as arguments:
517
 
518
\begin{enumerate}
519
\item The logical number of the partition (the Linux numbering scheme is used).
520
\item A structure with informations on the partition found. The structure has
521
the following fields:
522
 
523
\begin{description}
524
\item [fs\_ind]The file system number (see include/fs/fsind.h).
525
\item [start]The partition starting block.
526
\item [size]The partition size (number of blocks).
527
\end{description}
528
\item A the generic pointer passed to lodsk\_scan.
529
\end{enumerate}
530
 
531
The low\_level routines always check these values to avoid
532
read/write operations outside the partition.
533
 
534
%----------------------------------------------------------------------------
535
\subsubsection{Semaphores}
536
%----------------------------------------------------------------------------
537
 
538
The files include/fs/mutex.h and include/fs/semaph.h contain the
539
mutex/semaphore interface used by the file system.
540
 
541
\begin{table}
542
\begin{verbatim}
543
typedef internal_sem_t __sem_t;
544
 
545
#define __sem_init (ptr,val)
546
#define __sem_wait (ptr)
547
#define __sem_trywait (ptr)
548
#define __sem_signal (ptr)
549
\end{verbatim}
550
\caption{The semaphore interface}
551
\label{tab:semaph}
552
\end{table}
553
 
554
In particular, the semaphore interface (see Table
555
\ref{tab:semaph}) is simply a wrapper to the S.Ha.R.K. Internal
556
Semaphores.
557
 
558
\begin{table}
559
\begin{verbatim}
560
typedef internal_sem_t __mutex_t;
561
 
562
#define __mutex_init (ptr,attr)
563
#define __mutex_lock (ptr)
564
#define __mutex_trylock (ptr)
565
#define __mutex_unlock (ptr)
566
 
567
typedef SYS_FLAGS __fastmutex_t;
568
 
569
#define __fastmutex_init (ptr)
570
#define __fastmutex_lock (ptr)
571
#define __fastmutex_unlock (ptr)
572
\end{verbatim}
573
\caption{The mutex interface}
574
\label{tab:mutex}
575
\end{table}
576
 
577
The mutex interface requires two types of mutexes:
578
 
579
\begin{enumerate}
580
\item Fast mutexes: when only a few lines have to be protected (remapped to
581
kern\_fsave/kern\_frestore).
582
\item Normal mutexes: without any restriction (remapped again to the internal
583
semaphores).
584
\end{enumerate}
585
 
586
Please note that these functions are not directly used in the
587
code; instead, functions like \_\_b\_mutex\_lock and
588
\_\_fs\_mutex\_lock are used. This is done to make possible the
589
disabling of the mutual exclusion requirement when not needed
590
(i.e., if the file system runs as a process with its own address
591
space).
592
 
593
%----------------------------------------------------------------------------
594
\subsection{Exported interface}
595
\label{sec:dd:intercaccia:ext}
596
%----------------------------------------------------------------------------
597
 
598
\begin{table}
599
\begin{verbatim}
600
int bdev_read (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer);
601
int bdev_write (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer);
602
int bdev_seek (__dev_t dev, __blkcnt_t blocknum);
603
int bdev_trylock(__dev_t device); int bdev_tryunlock(__dev_t device);
604
__uint8_t bdev_getdeffs(__dev_t device);
605
__dev_t bdev_find_byname(char *name);
606
__dev_t bdev_find_byfs(__uint8_t fsind);
607
int bdev_findname(__dev_t device, char **majorname, char **minorname);
608
int bdev_scan_devices(int(*callback) (__dev_t,__uint8_t));
609
\end{verbatim}
610
\caption{Low level functions}
611
\label{tab:bdev}
612
\end{table}
613
 
614
The low level exports two kinds of functions: one to use the
615
devices, one to search/get the logical devices present on a
616
system.
617
 
618
The three main functions are:
619
 
620
\begin{description}
621
\item [bdev\_find\_byname]This function accepts a symbolic device name, as
622
described in Section \ref{sec:dd:interfaccia:nomi}, and returns 0 if the device
623
is not registered, or its number otherwise.
624
\item [bdev\_find\_byfs]This function accepts a file system ID (see
625
include/fs/fsind.h), and returns the first device that contains that file
626
system, or 0 if no one provides it.
627
\item [bdev\_scan\_device]This function can be used to get a list of all the
628
devices in the system; the user have to pass a callback that is called for each
629
device in the system. The function returns 0 in case of success, -1 if an error
630
occurred, or a value != 0 if the callback returns a value != 0. The callback is
631
called with a device serial number and a file system type that should be present
632
in the device.
633
\end{description}
634
 
635
These function are useful when at system startup the root file
636
system have to be mounted (see Section \ref{sec:fs:mount}).
637
 
638
The functions bdev\_getdeffs and bdev\textbackslash{}\_findname
639
are used for these purposes:
640
 
641
\begin{description}
642
\item [bdev\_getdeffs]to know a probable file system present on a logical
643
device.
644
\item [bdev\_findname]to know the logical name of a device. The function returns
645
!= 0 in case of error, and 0 in case of success. In the latter case, the two
646
buffers passed as parameter are filled with the major and minor names.
647
\end{description}
648
 
649
These functions simply check the parameter values and then they
650
calls the device drivers functions (see Section
651
\ref{sec:dd:interfaccia:det:bdevop}):
652
 
653
\begin{description}
654
\item [bdev\_read]Reads a block on the device.
655
\item [bdev\_write]Write a block on the device.
656
\item [bdev\_seek]Move the head in the position passed as parameter.
657
\item [bdev\_trylock~e~bdev\_tryunlock]Tries to block/unblock a device.
658
\end{description}
659
 
660
These functions can block the calling thread for an unspecified
661
amount of time. All the device drivers were thought to be used in
662
a multithreaded systems.
663
 
664
If a device driver is not multithread, proper mutual exclusion
665
must be ensured, because the file system high level functions will
666
call a low level service before the previous request has been
667
served. Note that if such a mutual exclusion is used, the access
668
to the device driver will be forced by the mutex policy, and so
669
all the disk scheduling algorithms will not work properly.
670
 
671
%----------------------------------------------------------------------------
672
\section{I/O requests scheduling}
673
\label{sec:dd:sched}
674
%----------------------------------------------------------------------------
675
 
676
Massy's thesis here inserts a dissertation on
677
the various disk scheduling algorithms that has been removed.
678
 
679
%----------------------------------------------------------------------------
680
\section{Disk scheduling algorithms implemented in S.Ha.R.K.}
681
\label{sec:dd:schedpres}
682
%----------------------------------------------------------------------------
683
 
684
The system currently support 5 different scheduling algorithms
685
that can be changed/increased simply modifying system
686
configuration.
687
 
688
\begin{table}
689
\begin{verbatim}
690
int bqueue_init (bqueue_t *);
691
int bqueue_numelements (bqueue_t *);
692
int bqueue_removerequest (bqueue_t *);
693
int bqueue_insertrequest (bqueue_t *, struct request_prologue *);
694
struct request_prologue *bqueue_getrequest(bqueue_t *);
695
\end{verbatim}
696
\caption{Queue handling functions}
697
\label{tab:bqueuefunc}
698
\end{table}
699
 
700
Only one algorithm can be active at a time. All the device drivers
701
should use the queuing interface listed in the bqueue.h file (see
702
Table \ref{tab:bqueuefunc}). The structure bqueue\_t is specific
703
for each scheduling algorithm. The semantic of each function in
704
the interface is the following:
705
 
706
\begin{description}
707
\item [bqueue\_init]Initialize the pending request queue; it returns 0 in case
708
of success, != 0 otherwise.
709
\item [bqueue\_numelements]Returns the number of elements in the queue.
710
\item [bqueue\_insertrequest]This function inserts a request in the queue,
711
following the specific algorithm behavior. The function returns 0 in case of
712
success, != 0 otherwise. Please note that requests are passed through a pointer
713
to the structrequest\_prologue\_t (see tab:structreqprolog). That is, every
714
algorithm should use a structure with a struct request\_prologue as first field,
715
passing a casted pointer to that structure.
716
\item [bqueue\_getrequest]Returns the first request in queue (without removing
717
it), and returns a pointer to that request, or NULL if there are not any pending
718
requests.
719
\item [bqueue\_removerequest]Removes the first request in queue, that is the
720
request that would have been returned by bqueue\_getrequest; it returns 0 in
721
case of success, != 0 in case of error.
722
\end{description}
723
 
724
\begin{table}
725
\begin{verbatim}
726
#define REQ_DUMMY 0
727
#define REQ_SEEK 1
728
#define REQ_READ 2
729
#define REQ_WRITE 3
730
 
731
struct request_prologue {
732
    unsigned sector;
733
    unsigned head;
734
    unsigned cylinder;
735
    unsigned nsectors;
736
    unsigned operation;
737
    struct request_specific x;
738
};
739
 
740
#define request_prologue_t struct request_prologue
741
\end{verbatim}
742
\caption{struct request\_prologue\_t}
743
\label{tab:structreqprolog}
744
\end{table}
745
 
746
before calling bdev\_insertrequest, all the fields of the struct
747
request\_prologue\_t must be filled (see Table
748
\ref{tab:structreqprolog}):
749
 
750
\begin{description}
751
\item [sector,~head,~cylinder]This is the starting position in the hard disk.
752
\item [nsectors]This is the number of sectors to transfer.
753
\item [operation]The operation that must be performed:
754
 
755
\begin{description}
756
\item [REQ\_DUMMY]request not specified.
757
\item [REQ\_SEEK]head seek to the specified position.
758
\item [REQ\_READ]read request.
759
\item [REQ\_WRITE]write request.
760
\end{description}
761
\end{description}
762
 
763
To add algorithm specific fields, you can use the method used for
764
the deadlines into Section \ref{sec:dd:sched:edf}.
765
 
766
struct request\_specific (see Table \ref{tab:structreqprolog})
767
must be defined by each algorithm, and contains algorithm-specific
768
fields.
769
 
770
struct bqueue\_t contains a scheduling queue. Its first field must
771
be a pointer to a struct phdiskinfo (see Section
772
\ref{sec:dd:interfaccia:det}, and Table \ref{tab:structphdsk}).
773
 
774
Here is a description about the implementation of various disk
775
scheduling algorithms.
776
 
777
%----------------------------------------------------------------------------
778
\subsection{FCFS}
779
\label{sec:dd:sched:fcfs}
780
%----------------------------------------------------------------------------
781
 
782
\begin{table}
783
\begin{verbatim}
784
typedef struct TAGfcfs_queue_t {
785
    struct phdskinfo *disk;
786
    __b_fastmutex_t mutex;
787
    struct request_prologue *head;
788
    struct request_prologue *tail;
789
    int counter;
790
} fcfs_queue_t;
791
 
792
struct request_specific {
793
    void *next;
794
};
795
\end{verbatim}
796
\caption{struct fcfs\_queue\_t}
797
\label{tab:structfcfs}
798
\end{table}
799
 
800
The FCFS policy is implemented using the structure in table
801
\ref{tab:structfcfs}:
802
 
803
\begin{description}
804
\item [mutex]A mutual exclusion semaphore.
805
\item [head~e~tail]a pointer to head and tail of a request queue.
806
\item [counter]number of pending requests.
807
\end{description}
808
 
809
Then, the struct request\_specific contains a pointer that is used
810
to link the list.
811
 
812
%----------------------------------------------------------------------------
813
\subsection{SSTF}
814
\label{sec:dd:sched:sstf}
815
%----------------------------------------------------------------------------
816
 
817
\begin{table}
818
\begin{verbatim}
819
typedef struct TAGsstf_queue_t {
820
    struct phdskinfo *disk;
821
    __b_fastmutex_t mutex;
822
    struct request_prologue *actual;
823
    struct request_prologue *lqueue;
824
    struct request_prologue *hqueue;
825
    int counter;
826
} sstf_queue_t;
827
 
828
struct request_specific {
829
    void *next;
830
};
831
\end{verbatim}
832
\caption{struct sstf\_queue\_t}
833
\label{tab:structsstf}
834
\end{table}
835
 
836
This scheduling policy is implemented using the data structures in
837
Figure \ref{tab:structsstf}:
838
 
839
\begin{description}
840
\item [mutex]for mutual exclusion.
841
\item [actual]the current request (that is returned by bqueue\_getrequest()).
842
\item [lqueue~e~hqueue]two lists ordered in decreasing and increasing order:
843
when a new request arrives, the request is inserted on one of the two queues
844
depending on the value of the current request cylinder. The current request will
845
become the nearest to the current cylinder. Note: This algorithm can cause
846
starvation.
847
\item [counter]The request counter.
848
\end{description}
849
 
850
Then, the struct request\_specific contains a pointer that is used
851
to link the list.
852
 
853
%----------------------------------------------------------------------------
854
\subsection{LOOK}
855
\label{sec:dd:sched:look}
856
%----------------------------------------------------------------------------
857
 
858
\begin{table}
859
\begin{verbatim}
860
typedef struct TAGlook_queue_t {
861
    struct phdskinfo *disk;
862
    __b_fastmutex_t mutex;
863
    int dir;
864
    struct request_prologue *queue[2];
865
    int counter;
866
} look_queue_t;
867
 
868
struct request_specific {
869
    void *next;
870
};
871
\end{verbatim}
872
\caption{struct look\_queue\_t}
873
\label{tab:structlook}
874
\end{table}
875
 
876
This scheduling policy is implemented using the data structures in
877
Figure \ref{tab:structlook}:
878
 
879
\begin{description}
880
\item [mutex]for mutual exclusion.
881
\item [dir]Is the looking direction: 0 ascending, 1 descending.
882
\item [queue]two ordered lists (one ascending, one descending). queue[dir] is
883
the queue used. When the list becomes empty, dir=!dir. a new request is inserted
884
in the current queue if its position is after/before the actual position
885
(otherwise, it is inserted in the other list).
886
\item [counter]the request counter.
887
\end{description}
888
 
889
This algorithm can be implemented in two variants:
890
 
891
\begin{enumerate}
892
\item Requests with position = current position are inserted in the current
893
queue (starvation can happen).
894
\item Requests with position = current position are inserted in the queue not
895
currently used. This is the default choice; to modify it, you have to modify the
896
look.c file.
897
\end{enumerate}
898
 
899
%----------------------------------------------------------------------------
900
\subsection{CLOOK}
901
\label{sec:dd:sched:clook}
902
%----------------------------------------------------------------------------
903
 
904
\begin{table}
905
\begin{verbatim}
906
typedef struct TAGclook_queue_t {
907
    struct phdskinfo *disk;
908
    __b_fastmutex_t mutex;
909
    int act;
910
    struct request_prologue *queue[2];
911
    int counter;
912
} clook_queue_t;
913
 
914
struct request_specific {
915
    void *next;
916
};
917
\end{verbatim}
918
\caption{struct clook\_queue\_t}
919
\label{tab:structclook}
920
\end{table}
921
 
922
This scheduling policy is implemented using the data structures in
923
Figure \ref{tab:structclook}:
924
 
925
\begin{description}
926
\item [mutex]for mutual exclusion.
927
\item [dir]This is the current queue.
928
\item [queue]Two ascending lists. queue[dir] is used until it is emptied. Then,
929
dir is changed. A new request is inserted in the current queue if the current
930
position is greater than the current one.
931
\item [counter]the request counter.
932
\end{description}
933
 
934
This implementation can introduce starvation because insertions at
935
the current position are done on the same queue.
936
 
937
%----------------------------------------------------------------------------
938
\subsection{EDF}
939
\label{sec:dd:sched:edf}
940
%----------------------------------------------------------------------------
941
 
942
\begin{table}
943
\begin{verbatim}
944
typedef struct TAGbd_edf_queue_t {
945
    struct phdskinfo *disk;
946
    __b_fastmutex_t mutex;
947
    struct request_prologue *queue;
948
    int inservice;
949
    int counter;
950
} bd_edf_queue_t;
951
 
952
struct request_specific {
953
    void *next;
954
    long dl;
955
};
956
\end{verbatim}
957
\caption{struct look\_queue\_t}
958
\label{tab:structedf}
959
\end{table}
960
 
961
This is the only real-time disk scheduling algorithm implemented
962
in S.Ha.R.K. This scheduling policy is implemented using the data
963
structures in Figure \ref{tab:structedf}:
964
 
965
\begin{description}
966
\item [mutex]For mutual exclusion.
967
\item [queue]An ordered list, increasing with the deadline.
968
\item [inservice]This flag is set when a request is in service.
969
\item [counter]The request counter.
970
\end{description}
971
 
972
\begin{table}
973
\begin{verbatim}
974
typedef struct {
975
    RES_MODEL r;
976
    TIME dl;
977
} BDEDF_RES_MODEL;
978
 
979
#define BDEDF_res_default_model(res)
980
#define BDEDF_res_def_level(res,l)
981
#define BDEDF_res_def_dl(res,dl)
982
\end{verbatim}
983
\caption{Resource Module for deadlines}
984
\label{tab:edfresmodel}
985
\end{table}
986
 
987
The request's deadline is got during task creation using the
988
S.Ha.R.K. resource module shown in Figure \ref{tab:edfresmodel}.
989
 
990
If the resource module is not used, the deadline is set to
991
infinity. That is, in that case the scheduling algorithm become a
992
FCFS.
993
 
994
%----------------------------------------------------------------------------
995
\section{Device driver: the IDE interface}
996
\label{sec:dd:ide}
997
%----------------------------------------------------------------------------
998
 
999
%----------------------------------------------------------------------------
1000
\subsection{Description}
1001
\label{sec:dd:ide:desc}
1002
%----------------------------------------------------------------------------
1003
 
1004
A description of the IDE standard can be found in\cite{manual:ideintel} and
1005
\cite{manual:idevia}. This interface allows the connection of up to 2 IDE
1006
peripherals using the ATA or ATAPI protocol (as described into
1007
\cite{manual:ata4}). This driver implements the ATA-4 protocol \footnote{The
1008
original Massy's thesis have a small description of the standard, that I have
1009
removed.}.
1010
 
1011
%----------------------------------------------------------------------------
1012
\subsection{Implementation}
1013
\label{sec:dd:ide:code}
1014
%----------------------------------------------------------------------------
1015
 
1016
\begin{figure}
1017
\begin{center}
1018
\includegraphics[width=7cm]{images/ide.eps}
1019
\end{center}
1020
\caption{The IDE subsystem}
1021
\label{fig:ide}
1022
\end{figure}
1023
 
1024
 
1025
The architecture of the IDE subsystem is shown in Figure
1026
\ref{fig:ide}: there is a part of the interface responsible for
1027
the system initialization (for example, for finding the IDE
1028
interfaces in the system, for finding the devices and their
1029
logical structure, for transforming the low level requests in
1030
ATA-4 commands), for command handling and another part for
1031
interfacing with the disk scheduling queue handling. Finally,
1032
there is a small part OS-dependent that should be rewritten to
1033
port it to other OSes.
1034
 
1035
If possible, the driver uses the DMA; in any case, polling can be
1036
forced. DMA is only supported in bus-master mode
1037
\cite{manual:ideintel}.
1038
 
1039
The system actually use only some commands of the classes ``PIO
1040
data in'', ``PIO data out'', ``Non-data'' e ``DMA''; it uses
1041
the ``read'', ``write'', ``seek'', ``dma read'', ``dma
1042
write'', ``identify'', ``pidentify'' and ``set feature''
1043
commands; packet commands are not currently supported, so ATAPI
1044
devices (like cd-roms) are currently not supported.
1045
 
1046
%----------------------------------------------------------------------------
1047
\subsubsection{Initialization}
1048
%----------------------------------------------------------------------------
1049
 
1050
The system is initialized calling the ide\_init() function into
1051
bdev\_init(). This is automatically done if the IDE support is
1052
compiled. The user should only initialize the struct bdev\_parms,
1053
as specified in Section \ref{sec:dd:interfaccia}.
1054
 
1055
\begin{table}
1056
\begin{verbatim}
1057
typedef struct ide_parms {
1058
    void *parm_initserver;
1059
} IDE_PARMS;
1060
 
1061
struct ide_server_model {
1062
    TIME dl;
1063
    TIME wcet;
1064
    TIME mit;
1065
};
1066
\end{verbatim}
1067
\caption{Data structures used to initialize the IDE driver.}
1068
\label{tab:ideinit}
1069
\end{table}
1070
 
1071
The IDE\_PARMS structure has only one parameter, and is included
1072
into the ideparms field in the global init data structure:
1073
 
1074
\begin{description}
1075
\item [parm\_initserver]This parameter is OS-specific and is
1076
related to the glue code.
1077
\end{description}
1078
 
1079
\begin{table}
1080
\begin{verbatim}
1081
void ide_glue_send_request (int ideif);
1082
int ide_glue_activate_interface (int ideif);
1083
void ide_glue_unactivate_interface (int ideif);
1084
\end{verbatim}
1085
\caption{The glue code for the IDE driver.}
1086
\label{tab:ideglue}
1087
\end{table}
1088
 
1089
The system dependent part of the IDE driver is contained into the
1090
file ideglue.c (see Figure \ref{tab:ideglue}):
1091
 
1092
\begin{description}
1093
\item [ide\_glue\_activate\_interface]It enables the interface passed as
1094
parameter: It creates an aperiodic task, and assigns to it the interface
1095
interrupt. The task is activated when the interrupt arrives.
1096
\item [ide\_glue\_activate\_interface]It disables the interface. The aperiodic
1097
task is killed.
1098
\item [ide\_glue\_send\_request]It executes the first available request: it
1099
explicitly activates the task interface; when the task is activated because of
1100
an interrupt, the pending request is terminated, and another request (if
1101
present) is activated.
1102
\end{description}
1103
 
1104
the parm\_initserver field contains a pointer to the struct
1105
ide\_server\_model, used to create the aperiodic tasks (see Figure
1106
\ref{tab:ideinit}):
1107
 
1108
\begin{description}
1109
\item [dl]task deadline in $\mu$sec.
1110
\item [wcet]task worst case execution time in $\mu$sec.
1111
\item [mit]the mean inter arrival time.
1112
\end{description}
1113
 
1114
If these parameters are not specified, standard parameters are
1115
used.
1116
 
1117
These defines are used to conditionally compile the driver (see
1118
the source code):
1119
 
1120
\begin{description}
1121
\item [FORCE\_SCANNING]this define force the search of the IDE peripherals also
1122
if the bus reset fails. This is useful if some peripherals does not follow the
1123
standard timings.
1124
\item [IDE\_OLDATAPIDELAY]this define forces a little delay that can be useful
1125
when using old ATAPI interfaces.
1126
\item [IDE\_FORCERESET]if defined a soft reset operation cannot be aborted
1127
because of a timeout. It must be specified if the peripheral does not follows
1128
the soft reset timings or if it cannot go in stand-by mode.
1129
\item [IDE\_SKIPSTATUSTEST]with this symbol we do not check if the command is
1130
being executed correctly (that is done reading on a status register); all the
1131
TX-based motherboard we found needs this symbol.
1132
\item [IDE\_SKIPALLTEST]when set, all the command are written into the registers
1133
without looking on the status register. This symbol implies the previous. This
1134
mode was needed on a TX motherboard where the polled mode for commands was
1135
unreliable.
1136
\end{description}
1137
 
1138
%----------------------------------------------------------------------------
1139
\subsubsection{Policy queues Interface}
1140
%----------------------------------------------------------------------------
1141
 
1142
\begin{table}
1143
\begin{verbatim}
1144
typedef struct TAGidereq_t {
1145
    request_prologue_t info;
1146
    int next;
1147
    __uint8_t resetonerror;
1148
    __uint8_t cmd;
1149
    __b_sem_t wait;
1150
    int result;
1151
    __uint8_t features;
1152
    __uint8_t cyllow;
1153
    __uint8_t cylhig;
1154
    __uint8_t seccou;
1155
    __uint8_t secnum;
1156
    __uint8_t devhead;
1157
    __uint8_t *buffer;
1158
} idereq_t;
1159
\end{verbatim}
1160
\caption{struct idereq\_t}
1161
\label{tab:idereq}
1162
\end{table}
1163
 
1164
Table \ref{tab:idereq} shows struct idereq\_t that is used by the
1165
driver to store the pending requests. It can be noted that the
1166
info field is of type request\_prologue\_t, as required by the
1167
modules that handle the scheduling policies (see
1168
\ref{sec:dd:schedpres}).
1169
 
1170
Note that (since the IDE interface allows only a command at a
1171
time) there are two queues for each IDE interface (one for the
1172
master device, one for the slave). These request queues are
1173
prioritized. That is, the master queue always have priority on the
1174
slave queue.
1175
 
1176
Here a small description of the fields showed in Table
1177
\ref{tab:idereq}:
1178
 
1179
\begin{description}
1180
\item [info]To handle the requests using a module for disk scheduling.
1181
\item [next]used internally to handle a request pool.
1182
\item [resetonerror]If set, an error during a command implies a soft reset.
1183
\item [cmd]The command for the peripheral (see \cite{manual:ata4}).
1184
\item [wait]Synchronization semaphore: when a task sends a command, it waits the
1185
end of the execution of the command.
1186
\item [result]This is the result: 0 in case of success, != 0 in case of error
1187
(see idelow.c for the error codes).
1188
\item [features,~cyllow,~cylhigh,~seccou,~secnum~e~devhead]Values to be inserted
1189
into the I/O registers of the IDE interface. The behavior is different depending
1190
on the peripheral mode (LBA or C/H/S) (see Section \ref{sec:dd:ide:desc}).
1191
\item [buffer]read/write buffer (if needed).
1192
\end{description}
1193
 
1194
%----------------------------------------------------------------------------
1195
\subsubsection{Minor numbers}
1196
%----------------------------------------------------------------------------
1197
 
1198
\begin{figure}
1199
\begin{center}
1200
\includegraphics[width=10cm]{images/ideminor.eps}
1201
\end{center}
1202
\caption{IDE minor numbers}
1203
\label{fig:ideminor}
1204
\end{figure}
1205
 
1206
This section describes how minor numbers are handled.
1207
 
1208
As described in Section \ref{sec:dd:intercaccia:ext}, the generic
1209
low level part uses the major number to pass a request to a
1210
device. The specific device is then selected using the minor
1211
number.
1212
 
1213
Figure \ref{fig:ideminor} shows the minor number structure of this
1214
device:
1215
 
1216
\begin{itemize}
1217
\item The first 3 bits identifies the interface (8 interfaces maximum).
1218
\item A bit for the peripheral (master/slave). This is used to enqueue request
1219
in the right queue.
1220
\item The last 4 bits says which partition should be used. This info is used to
1221
convert relative sector numbers into absolute sector numbers.
1222
\end{itemize}
1223
 
1224
Minor numbers are accessed only via macros. To modify the minor
1225
number mapping you should only modify these macros.
1226
 
1227
%----------------------------------------------------------------------------
1228
\chapter{File system}
1229
\label{sec:fs}
1230
%----------------------------------------------------------------------------
1231
 
1232
With the word ``file system'' we mean the internal organization
1233
of a block device; the purpose of such an organization is to store
1234
a set of files, usually divided in directories.
1235
 
1236
A block device is seen by the file system as an ordered sequence
1237
of fixed sized blocks. The basic operations of a file system are
1238
read/write operations on the n-th block.
1239
 
1240
%----------------------------------------------------------------------------
1241
\section{Interface}
1242
\label{sec:fs:int}
1243
%----------------------------------------------------------------------------
1244
 
1245
This section describes the interface between the system and the
1246
user tasks. It also describes the interface between a specific
1247
instance of file system and the system (just to make easy the
1248
expansion of the system with new real-time file systems).
1249
 
1250
%----------------------------------------------------------------------------
1251
\subsection{Initialization}
1252
\label{sec:fs:int:init}
1253
%----------------------------------------------------------------------------
1254
 
1255
To initialize the higher level of the system we use an interface
1256
similar to the device drivers interface (see Section
1257
\ref{sec:dd:interfaccia:init}.
1258
 
1259
\begin{table}
1260
\begin{verbatim}
1261
typedef struct filesystem_parms {
1262
    __dev_t device;
1263
    __uint8_t fs_ind;
1264
    __uint32_t fs_showinfo:1;
1265
    void *fs_mutexattr;
1266
    struct mount_opts *fs_opts;
1267
} FILESYSTEM_PARMS;
1268
 
1269
#define filesystem_def_rootdevice(par,rootdev)
1270
#define filesystem_def_fs(par,fs)
1271
#define filesystem_def_showinfo(par,show)
1272
#define filesystem_def_options(par,opts)
1273
 
1274
int filesystem_init(FILESYSTEM_PARMS *);
1275
\end{verbatim}
1276
\caption{struct filesystem\_parms}
1277
\label{tab:fsparms}
1278
\end{table}
1279
 
1280
The function filesystem\_init() is called to initialize the higher
1281
layer passing the parameters showed in Figure \ref{tab:fsparms}.
1282
These parameters must be initialized using the following macros:
1283
 
1284
\begin{description}
1285
\item [filesystem\_def\_rootdevice]Set the partition that must be used as
1286
\emph{root device}, that is, the directory associated to ``/''. This field is
1287
mandatory, and that implies that the higher layer must be initialized
1288
\emph{after} the device drivers.
1289
\item [filesystem\_def\_fs]Selects the filesystem that must be used with the
1290
device. It must be one of the symbols defined into include/fs/fsind.h; if the
1291
special symbol FS\_DEFAULT is used, the system tries to autodetect the
1292
filesystem type.
1293
\item [filesystem\_def\_showinfo]If this flag is set, then initialization
1294
messages are printed on the screen.
1295
\item [filesystem\_def\_options]There are the mount options.
1296
\end{description}
1297
 
1298
The field fs\_mutexattr has the same meaning of the correspondent
1299
field into bdev\_parms (see Section
1300
\ref{sec:dd:interfaccia:init}).
1301
 
1302
You can use the macro BASE\_FILESYSTEM to initialize a struct
1303
filesystem\_parms with some default values. NULL is not allowed as
1304
a parameter of filesystem\_init.
1305
 
1306
\begin{table}
1307
\begin{verbatim}
1308
struct mount_opts {
1309
    __uint32_t flags;
1310
    union {
1311
        struct msdosfs_parms msdos;
1312
    } x;
1313
};
1314
 
1315
/* Flags */
1316
#define MOUNT_FLAG_RW 0x01
1317
\end{verbatim}
1318
\caption{struct mount\_opts}
1319
\label{tab:mountopt}
1320
\end{table}
1321
 
1322
\label{sec:mountopt}The mount options for every device must be
1323
inserted into a struct mount\_opts showed in Table
1324
\ref{tab:mountopt}. The structure is composed by two parts: a
1325
general part and a filesystem-specific part. The general part
1326
actually contains the following fields:
1327
 
1328
\begin{description}
1329
\item [flags]This field contains the mount fields. The only flag
1330
currently defined is MOUNT\_FLAG\_RW, that allows write operations
1331
on the file system (default is read-only).
1332
\end{description}
1333
 
1334
\begin{figure}
1335
\begin{center}
1336
\includegraphics[width=7cm]{images/fsinit.eps}
1337
\end{center}
1338
\caption{Filesystem init}
1339
\label{fig:fsinit}
1340
\end{figure}
1341
 
1342
Filesystem initialization proceeds as shown in Figure \ref{fig:fsinit}:
1343
 
1344
\begin{enumerate}
1345
\item The user calls filesystem\_init();
1346
\item filesystem\_init() initializes itself and its modules, then all the file
1347
systems are initialized. filesystem\_init() have to be changed if a new
1348
filesystem is introduced.
1349
\item The file systems initialization functions have to register themselves
1350
calling the function filesystem\_register() with a struct file\_system\_type.
1351
\end{enumerate}
1352
 
1353
%----------------------------------------------------------------------------
1354
\subsubsection{Struct file\_system\_type}
1355
\label{sec:filesystemtype}
1356
%----------------------------------------------------------------------------
1357
 
1358
\begin{table}
1359
\begin{verbatim}
1360
struct file_system_type {
1361
    const char *name;
1362
    int fs_flags;
1363
    __uint8_t fs_ind;
1364
    struct super_block *(*read_super) (
1365
        __dev_t device,
1366
        __uint8_t fs_ind,
1367
        struct mount_opts *options
1368
    );
1369
};
1370
\end{verbatim}
1371
\caption{struct file\_system\_type}
1372
\label{src:structfs}
1373
\end{table}
1374
 
1375
This structure (see Table \ref{src:structfs})stores the
1376
informations needed for the filesystem registration. Here a short
1377
description of its fields:
1378
 
1379
\begin{description}
1380
\item [name]Pointer to the filesystem name (e.g., ``FAT16'').
1381
\item [fs\_flags]Filesystem flags (currently not used).
1382
\item [fs\_ind]Filesystem ID (every filesystem has a different ID).
1383
\item [read\_super]This the filesystem initialization function. It is called
1384
when a partition is mounted to initialize the correspondent struct super\_block.
1385
\end{description}
1386
 
1387
%----------------------------------------------------------------------------
1388
\subsection{Internal interface}
1389
%----------------------------------------------------------------------------
1390
 
1391
This section describes the interface between the high layer (that
1392
handles the file systems), and the modules that implements a
1393
specific filesystem.
1394
 
1395
When a user mount a device into a directory using a filesystem,
1396
the system uses the file system's specific struct
1397
file\_system\_type, calling the function read\_super() to read the
1398
superblock of the specified device.
1399
 
1400
The struct super\_block contains a pointer to a structure that
1401
contains a set of ``virtual operations'' (see Section
1402
\ref{sec:fs:intstruct}), that implements the behavior of a
1403
specific filesystem.
1404
 
1405
%----------------------------------------------------------------------------
1406
\subsubsection{struct super\_operations}
1407
\label{sec:superop}
1408
%----------------------------------------------------------------------------
1409
 
1410
\begin{table}
1411
\begin{verbatim}
1412
struct super_operations {
1413
    int (*init_inode) (struct inode *);
1414
    int (*read_inode) (struct inode *);
1415
    int (*write_inode) (struct inode *);
1416
    int (*put_super) (struct super_block *);
1417
    int (*delete_inode) (struct inode *);
1418
};
1419
\end{verbatim}
1420
\caption{struct super\_operations}
1421
\label{src:structsuperop}
1422
\end{table}
1423
 
1424
The structure in Table \ref{src:structsuperop} is used for super
1425
block manipulation. All the functions should return 0 in case of
1426
success, or a value != 0 otherwise.
1427
 
1428
\begin{description}
1429
\item [init\_inode]This function initializes a struct inode passed as parameter.
1430
That structure must have the fields i\_sp and i\_st.st\_ino already initialized
1431
(see Section \ref{sec:inode}).
1432
\item [read\_inode]This function reads the inode passed as parameter from the
1433
filesystem. The inode structure must have the fields i\_sp and i\_st.st\_ino
1434
already initialized.
1435
\item [write\_inode]This function writes a (valid, full initialized) inode into
1436
the filesystem.
1437
\item [delete\_inode]This function removes the inode from the filesystem (the
1438
inode must already exist on the filesystem).
1439
\item [put\_super]This function writes the super block passed as parameter into
1440
the filesystem. This function is called when a filesystem is unmounted. A dual
1441
function (that is, a function that reads a super\_block from a filesystem)
1442
(which is only used when mounting the filesystem) can be found into the struct
1443
file\_system\_type (see Section
1444
\ref{sec:filesystemtype}).
1445
\end{description}
1446
 
1447
%----------------------------------------------------------------------------
1448
\subsubsection{Struct dentry\_operations}
1449
%----------------------------------------------------------------------------
1450
 
1451
\begin{table}
1452
\begin{verbatim}
1453
struct dentry_operations {
1454
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
1455
};
1456
\end{verbatim}
1457
\caption{struct dentry\_operations}
1458
\label{src:structdentryop}
1459
\end{table}
1460
 
1461
\label{sec:dentryop} This structure contains all the functions
1462
used to manipulate a struct dentry.
1463
 
1464
\begin{description}
1465
\item [d\_compare]This function compares the two strings passed as
1466
parameter. it returns 0 if the two strings are equal, or != 0
1467
otherwise. The filesystem do not use directly the strcmp()
1468
function, because filename comparison depends on the filesystem
1469
(e.g., case sensitive, case insensitive).
1470
\end{description}
1471
 
1472
%----------------------------------------------------------------------------
1473
\subsubsection{struct inode\_operations}
1474
\label{sec:inodeop}
1475
%----------------------------------------------------------------------------
1476
 
1477
\begin{table}
1478
\begin{verbatim}
1479
struct inode_operations {
1480
    struct file_operations *default_file_ops;
1481
    struct inode* (*create) (struct inode *, struct dentry *);
1482
    struct inode* (*lookup) (struct inode *, struct dentry *);
1483
    int (*unlink) (struct dentry *);
1484
    void (*destroy) (struct inode *);
1485
    int (*truncate) (struct inode *, __off_t len);
1486
};
1487
\end{verbatim}
1488
\caption{struct inode\_operations}
1489
\label{src:structinodeop}
1490
\end{table}
1491
 
1492
This struct contains the virtual operations on the struct inode.
1493
These functions returns 0 in case of success, !=0 in case of
1494
error, that will be interpreted as a negate errno() value (e.g.,
1495
-EPERM instead of EPERM).
1496
 
1497
\begin{description}
1498
\item [default\_file\_ops]these are the file\_operations described into the
1499
paragraph \ref{sec:fileop}, that must be used with the inode. By default, the
1500
field is initialized with the file\_operation of its filesystem. These pointers
1501
can be modified to implement a kind of virtual filesystem that works on a
1502
device. For example, we can define an inode ``/dev/ide/hda1'' in a way that all
1503
read/write operations on it are not mapped in the operations of the filesystem
1504
where it is stored.
1505
\item [create]This function creates a new inode (a new file) with the name
1506
passed as parameter into the struct inode passed as parameter (that must be an
1507
inode of type ``directory''). It returns the new inode or NULL in case of
1508
error.
1509
\item [lookup]This function searches an inode passed as parameter into another
1510
inode of type directory passed as parameter. It returns the inode just found or
1511
NULL in case of error.
1512
\item [unlink]This function unlinks the filename passed as parameter from the
1513
parameter of the corresponding inode. The inode will be removed (by generic
1514
filesystem routines) if it remains without links associated to it.
1515
\item [destroy]This function destroys the file linked to the inode. It frees the
1516
space occupied by the file. Often, it is sufficient to call the truncate
1517
function with length 0.
1518
\item [truncate]This function truncates the file length of the inode passed as
1519
parameter at the length specified in the second parameter. If the given length
1520
is greater than the actual file length, the function does nothing.
1521
\end{description}
1522
 
1523
%----------------------------------------------------------------------------
1524
\subsubsection{struct file\_operations}
1525
\label{sec:fileop}
1526
%----------------------------------------------------------------------------
1527
 
1528
\begin{table}
1529
\begin{verbatim}
1530
struct file_operations {
1531
    __off_t (*lseek) (struct file *, __off_t, int);
1532
    __ssize_t (*read) (struct file *, char *, __ssize_t);
1533
    __ssize_t (*write) (struct file *, char *, __ssize_t);
1534
    int (*readdir) (struct file *, void *);
1535
    int (*open) (struct inode *, struct file *);
1536
    int (*close) (struct inode *, struct file *);
1537
};
1538
\end{verbatim}
1539
\caption{struct file\_operations}
1540
\label{src:structfileop}
1541
\end{table}
1542
 
1543
This structure contains all the function that must work with
1544
files. If not otherwise stated, the functions can operate both on
1545
files and on directory.
1546
 
1547
\begin{description}
1548
\item [lseek]Moves the input/output position of the file specified as parameter.
1549
The new position is specified as in the lseek primitive (described in
1550
\cite{book:posix1003.1} or in all the Unix programming books). That is, the
1551
position is specified with a delta with respect to a given position (start of
1552
file, end of file, current position); As described in section \ref{sec:file}
1553
during the description of the field f\_pos, the new position can be greater than
1554
the end-of-file. This function must return a position inside the file or an
1555
error code.
1556
\item [read]Reads some data and puts it into the buffer. The function is called
1557
only for regular files. It returns the number of bytes read, that can be less
1558
than requested, or an error code.
1559
\item [write]Writes some bytes from the specified buffer. It is called only for
1560
regular files. It returns the number of bytes written, that can be less than
1561
requested, or an error code.
1562
\item [readdir]Reads the next filename of the directory parameter passed through
1563
a struct file. It inserts it into the struct dirent passed as parameter. Returns
1564
 
1565
\item [open]Opens the file linked to the inode. Fills some internal fields in
1566
the filesystem-specific struct file. Returns 0 in case of success, an error
1567
otherwise.
1568
\item [close]Closes the file passed as parameter. returns 0 in case of success,
1569
an error in case of failure.
1570
\end{description}
1571
 
1572
Most of the function described in this section works on buffers
1573
passed by the user. the general routines checks that these buffer
1574
are corrects. In case the operating system implement some memory
1575
protection mechanisms, the functions copyfromuser() and
1576
copytouser() should be used (their syntax is similar to memcpy()).
1577
Note that Shark currently do not provide memory protection, so
1578
memcpy() can be used.
1579
 
1580
%----------------------------------------------------------------------------
1581
\subsection{External interface}
1582
%----------------------------------------------------------------------------
1583
 
1584
%----------------------------------------------------------------------------
1585
\subsubsection{Non standard functions}
1586
\label{sec:fs:mount}
1587
%----------------------------------------------------------------------------
1588
 
1589
\begin{table}
1590
\begin{verbatim}
1591
int mount (dev_t device, u_int8_t fsind, char *where, struct mount_opts
1592
*options);
1593
int umount (dev_t device);
1594
dev_t fdevice(char *device_name);
1595
\end{verbatim}
1596
\caption{mount/umount functions}
1597
\label{tab:mount}
1598
\end{table}
1599
 
1600
The mount/umount functions are the non-standard functions provided
1601
by the filesystem (their prototype can be found in
1602
include/sys/mount.h). Here is a small description of these
1603
functions (see Table \ref{tab:mount}):
1604
 
1605
\begin{description}
1606
\item [fdevice]This function returns the device serial number of a
1607
given filename. See Section \ref{sec:dd:interfaccia:nomi}.
1608
\item [mount]Is used to mount a partition. These are the informations that must
1609
be passed to it:
1610
 
1611
\begin{itemize}
1612
\item The device serial number that must be mounted
1613
\item The filesystem identifier (fsind) that should be used (see
1614
include/fs/fsind.h).
1615
\item The directory where the device will be mounted.
1616
\item Other parameters (see Section \ref{sec:mountopt}).
1617
\end{itemize}
1618
 
1619
The function returns 0 in case of success, != 0 in case of failure
1620
(errno is modified).
1621
 
1622
\item [umount]This function unmount a device that was previously
1623
mounted. This function cannot unmount the ``root'' device.
1624
During shutdown, all the filesystem will be unmounted in the
1625
proper order.
1626
\end{description}
1627
 
1628
%----------------------------------------------------------------------------
1629
\subsection{Initialization: an example}
1630
%----------------------------------------------------------------------------
1631
 
1632
This is an example of the filesystem initialization in S.Ha.R.K.
1633
To initialize the system, we must register the kernel modules
1634
first:
1635
 
1636
\begin{verbatim}
1637
TIME __kernel_register_levels__(void *arg)} {
1638
    struct multiboot_info *mb=(struct multiboot_info*)arg;
1639
 
1640
    EDF_register_level(EDF_ENABLE_ALL);
1641
    RR_register_level(RRTICK, RR_MAIN_YES, mb);
1642
    CBS_register_level(CBS_ENABLE_ALL, 0);
1643
    dummy_register_level();
1644
 
1645
    SEM_register_module();
1646
    CABS_register_module();
1647
    NOPM_register_module();
1648
    return TICK;
1649
}
1650
\end{verbatim}
1651
 
1652
The filesystem always require the registration of a module that
1653
can accept soft aperiodic tasks (such as the CBS, for example).
1654
Moreover, a mutex protocol must be present (NOPM, in this case).
1655
 
1656
\begin{verbatim}
1657
TASK __init__(void *arg) {
1658
    extern int __bdev_sub_init(void);
1659
    extern int __fs_sub_init(void);
1660
    struct multiboot_info *mb = (struct multiboot_info*)arg;
1661
 
1662
    /* block devices initialization */
1663
    __bdev_sub_init();
1664
 
1665
    /* filesystem initialization */
1666
    __fs_sub_init();
1667
    __call_main__(mb);
1668
    return (void *)0;
1669
}
1670
\end{verbatim}
1671
 
1672
This is a typical S.Ha.R.K. \_\_init\_\_ function. In this
1673
function, block devices and file systems should be initialized.
1674
 
1675
\begin{verbatim}
1676
int __bdev_subinit(void) {
1677
    extern __dev_t root_device;
1678
    BDEV_PARMS bdev=BASE_BDEV;
1679
 
1680
    /* low level initialization */
1681
    bdev_def_showinfo(bdev,TRUE);
1682
    bdev_init(&bdev);
1683
 
1684
    /* root device specification */
1685
    root_device = bdev_scan_devices(choose_root_callback);
1686
    if (root_device < 0) {
1687
 
1688
        /* in case of error... */
1689
        exit(1);
1690
    }
1691
    return 0;
1692
}
1693
\end{verbatim}
1694
 
1695
 
1696
This function initializes the block device layer. This function
1697
sets the parameters of the bdev variable, calls the low level
1698
initialization functions and searches for the \emph{root device};
1699
basically, the function must specify in some way which is the root
1700
device. In the function, the low level layer lists all the
1701
available devices, calling the function choose\_root\_callback()
1702
for each of them. That function will choose the root device.
1703
 
1704
\begin{verbatim}
1705
int choose_root_callback(dev_t dev,u_int8_t fs) {
1706
    if (fs==FS_MSDOS)
1707
        return dev;
1708
    return -1;
1709
}
1710
\end{verbatim}
1711
 
1712
In this case, the choice of the root device is quite simple: the
1713
first FAT16 DOS partition becomes the root device. Usually,
1714
DOS calls that partition ``C:''.
1715
 
1716
\begin{verbatim}
1717
int __fs_sub_init(void) {
1718
    extern __dev_t root_device;
1719
    FILESYSTEM_PARMS fs = BASE_FILESYSTEM;
1720
    struct mount_opts opts;
1721
 
1722
    /* mount parameters */
1723
    memset(&opts, 0, sizeof(struct mount_opts));
1724
    opts.flags = MOUNT_FLAG_RW;
1725
 
1726
    /* filesystem initialization */
1727
    filesystem_def_rootdevice(fs,root_device);
1728
    filesystem_def_fs(fs,FS_MSDOS);
1729
    filesystem_def_showinfo(fs,TRUE);
1730
    filesystem_init(&fs);
1731
 
1732
    /* C library initialization */
1733
    libc_initialize();
1734
 
1735
    return 0;
1736
}
1737
\end{verbatim}
1738
 
1739
This functions initializes the filesystem giving R/W permission on
1740
the root partition previously found. Then, the C library is
1741
initialized.
1742
 
1743
%----------------------------------------------------------------------------
1744
\section{Internal structure}
1745
\label{sec:fs:intstruct}
1746
%----------------------------------------------------------------------------
1747
 
1748
Many data structures are composed by two parts: a generic part,
1749
that is independent from the particular filesystem, and a
1750
filesystem specific part, that are usually stored into an union.
1751
 
1752
Almost all the data structures have a set of functions that
1753
modifies them. Pointers to these functions are included in a
1754
structure that has the name of the structure that is modified by
1755
the functions with a suffix ``operations''. For example, the
1756
structure super\_block (see Section \ref{sec:superblock}) has a
1757
structure super\_operations (see Section \ref{sec:superop}) that
1758
contains a function create\_inode.
1759
 
1760
Since the high layer functions influences the global system
1761
throughput, some informations are duplicated in the data
1762
structures to speed up the execution.
1763
 
1764
The system is basically structured with a Unix-like approach, with
1765
4 fundamental structures:
1766
 
1767
\begin{description}
1768
\item [super]contains informations on each device that is mounted (a table is
1769
used);
1770
\item [inode]contains informations on the files in a filesystem (double lists
1771
with hash keys are used);
1772
\item [dentry]contains the file names (a tree is used);
1773
\item [file]contains informations on each open file (a descriptor table is
1774
used).
1775
\end{description}
1776
 
1777
%----------------------------------------------------------------------------
1778
\subsection{The data structures}
1779
%----------------------------------------------------------------------------
1780
 
1781
%----------------------------------------------------------------------------
1782
\subsubsection{struct super\_block}
1783
\label{sec:superblock}
1784
%----------------------------------------------------------------------------
1785
 
1786
From a user point of view, the filesystem exports a unix-like view
1787
of the system. That is, partitions can be mounted into the
1788
directory tree. When the system starts, the root directory is
1789
mounted as ``/''.
1790
 
1791
\begin{table}
1792
\begin{verbatim}
1793
struct super_block {
1794
    struct super_block *sb_parent;
1795
    struct super_block *sb_childs;
1796
    struct super_block *sb_next;
1797
    __dev_t sb_dev;
1798
    struct inode *sb_root;
1799
    struct dentry *sb_droot;
1800
    struct file_system_type *sb_fs;
1801
    struct super_operations *sb_op;
1802
    struct dentry_operations *sb_dop;
1803
    struct mount_opts sb_mopts;
1804
    int sb_busycount;
1805
    __uint32_t sb_used:1;
1806
    __uint32_t sb_blocked:1;
1807
    union {
1808
        struct msdos_super_block msdos_sb;
1809
    } u;
1810
};
1811
\end{verbatim}
1812
\caption{struct super\_block}
1813
\label{src:structsuper}
1814
\end{table}
1815
 
1816
The struct super\_block contains the fundamental information of
1817
the filesystem (a unix-like filesystem usually duplicates this
1818
informations a few times into the Hard disk). Every time a
1819
partition is mounted a new structure is allocated and initialized.
1820
These structures are maintained in a tree, that is, if a partition
1821
is mounted into a directory the new structure becomes a leaf of
1822
the structure that contains the mount directory.
1823
 
1824
\begin{figure}
1825
\begin{center}
1826
\includegraphics[width=7cm]{images/supertree.eps}
1827
\end{center}
1828
\caption{Super block tree}
1829
\label{fig:supertree}
1830
\end{figure}
1831
 
1832
The root super block is stored into the global pointer
1833
root\_superblock; Figure \ref{fig:supertree} shows a possible
1834
super block tree.
1835
 
1836
Here a short description of the struct super\_block fields (see
1837
Table \ref{src:structsuper}):
1838
 
1839
\begin{description}
1840
\item [sb\_parent,~sb\_childs,~sb\_next]are used to store the tree structure:
1841
sb\_parent points to the father of the structure, whereas sb\_childs points to
1842
the first structure of the childs; sb\_next is used to create the brothers list
1843
(that is the children list of the father).
1844
\item [sb\_dev]Contains the device, that is the partition that contains the
1845
filesystem (the low level part is responsible of the devices, that are typically
1846
associated to an hard disk partition. It is possible for example to extend the
1847
low level to associate to a device other entities, such a file on another
1848
filesystem (this is usually possible on Unix system through loopback devices).
1849
\item [sb\_root]This is the root directory inode of the filesystem.
1850
\item [sb\_droot]This is the directory entry (the filename) of the root
1851
directory.
1852
\item [sb\_fs]This is the pointer to the filesystem present in the filesystem.
1853
Every supported file system is registered at init time (see Section
1854
\ref{sec:fs:int:init}).
1855
\item [sb\_op]Pointer to a structure that contains the specific filesystem
1856
functions used to use the struct super\_block.
1857
\item [sb\_dop]These are the specific function used for the directory entries.
1858
\item [sb\_mopts]Mount options used during initialization.
1859
\item [sb\_busycount]This is a counter that counts the number of entities that
1860
are using the filesystem (e.g., a partition can be unmounted if someone is
1861
reading it).
1862
\item [sb\_used]A flag that signals if the structure is used.
1863
\item [sb\_blocked]Some operations on this structure must be executed in mutual
1864
exclusion; if this flag is set there is some operation on the structure. This
1865
flag can not be used/tested/modified directly by the user functions.
1866
\end{description}
1867
 
1868
%----------------------------------------------------------------------------
1869
\subsubsection{struct dentry}
1870
%----------------------------------------------------------------------------
1871
 
1872
\begin{table}
1873
\begin{verbatim}
1874
struct dentry {
1875
    struct dentry *d_next;
1876
    struct dentry *d_prev;
1877
    struct dentry *d_parent;
1878
    struct dentry *d_child;
1879
    __time_t d_acc;
1880
    struct qstr d_name;
1881
    short d_lock;
1882
    struct dentry_operations *d_op;
1883
    struct inode *d_inode;
1884
    struct super_block *d_sb;
1885
};
1886
\end{verbatim}
1887
\caption{struct dentry}
1888
\label{src:structdentry}
1889
\end{table}
1890
 
1891
\label{sec:dentry} The struct dentry contains the file names independently from
1892
the file type (regular file, directory, ...) \footnote{``dentry'' means
1893
\emph{D}irectory \emph{ENTRY}.}. Each file in the filesystem is always stored in
1894
two structures:
1895
 
1896
\begin{itemize}
1897
\item struct dentry contains the filename;
1898
\item struct inode contains the other informations on the file.
1899
\end{itemize}
1900
 
1901
This separation is made because in most file systems (not in the DOS FAT16)
1902
more than one name can be linked to the same file (inode).
1903
 
1904
\begin{figure}
1905
\begin{center}
1906
\includegraphics[width=10cm]{images/dentrytree.eps}
1907
\end{center}
1908
\caption{Directory entry tree.}
1909
\label{fig:dentrytree}
1910
\end{figure}
1911
 
1912
 
1913
These data structures are stored internally in a tree structure. Please note
1914
that the inodes are not structured as a tree; every dentry has a pointer to the
1915
correspondent inode.
1916
 
1917
Figure \ref{fig:dentrytree} shows a directory entry tree.
1918
Directories are showed with a continuous line, whereas files are
1919
showed with a dashed line (note that the internal structures for
1920
files and directories are the same).
1921
 
1922
The system do not load all the file names in the filesystem. Only
1923
the recent filenames are stored in memory in a partial tree. These
1924
informations are continuously updated removing and adding dentry
1925
struct.
1926
 
1927
\begin{table}
1928
\begin{verbatim}
1929
struct qstr {
1930
    char name[MAXFILENAMELEN+1];
1931
    char *nameptr;
1932
};
1933
\end{verbatim}
1934
\caption{struct qstr}
1935
\label{src:structqstr}
1936
\end{table}
1937
 
1938
Directory names are stored into a struct qstr (see Table
1939
\ref{src:structqstr}). A string is not used to avoid too much
1940
sting copies. The structure is composed by two fields:
1941
 
1942
\begin{description}
1943
\item [nameptr]If != NULL, it is the name to use.
1944
\item [name]If nameptr==NULL, it is the name to use.
1945
\end{description}
1946
 
1947
The QSTRNAME can be used on a structure QSTR to get a (char *).
1948
When filling the structure, one of the two fields can be used. If
1949
nameptr is used, the pointed string must be statically allocated.
1950
If name is used, nameptr must be set to NULL.
1951
 
1952
The fields of the struct dentry contains the following
1953
informations:
1954
 
1955
\begin{description}
1956
\item [d\_next,~d\_prev,~d\_parent~e~d\_child]can be used to store a dentry tree
1957
structure: d\_parent is a pointer to the father's dentry, d\_child is a pointer
1958
to a list of children dentry, d\_next and d\_prev are used to navigate into the
1959
children dentry list.
1960
\item [d\_acc]It is the time of the last access to the dentry structure.
1961
\item [d\_name]Is the name associated to the dentry.
1962
\item [d\_lock]is a blocking counter: it is incremented every time a routine
1963
needs to use the dentry, and it is decremented when it is no more needed. In
1964
that way, it is possible to know which are the structures currently in use. The
1965
filesystem routines should not directly modify this field.
1966
\item [d\_op]Pointer to the virtual operations used to handle this structure.
1967
\item [d\_inode]is the inode associated to the structure.
1968
\item [d\_sb]It is the super\_block that contains the directory. This a
1969
redundant information used to speed-up the code.
1970
\end{description}
1971
 
1972
%----------------------------------------------------------------------------
1973
\subsubsection{struct inode}
1974
\label{sec:inode}
1975
%----------------------------------------------------------------------------
1976
 
1977
\begin{figure}
1978
\begin{center}
1979
\includegraphics[width=10cm]{images/inodebuckets.eps}
1980
\end{center}
1981
\caption{Inode data structures}
1982
\label{fig:inodebuckets}
1983
\end{figure}
1984
 
1985
 
1986
\begin{table}
1987
\begin{verbatim}
1988
struct inode {
1989
    struct stat i_st;
1990
    struct inode *i_next;
1991
    struct inode *i_prev;
1992
    int i_hash;
1993
    __rwlock_t i_lock;
1994
    __uint16_t i_dlink;
1995
    __uint32_t i_dirty:1;
1996
    struct inode_operations *i_op;
1997
    struct super_block *i_sb;
1998
    union {
1999
        struct msdos_inode_info msdos_i;
2000
    } u;
2001
};
2002
\end{verbatim}
2003
\caption{struct inode}
2004
\label{src:structinode}
2005
\end{table}
2006
 
2007
An inode is a file into a filesystem. It contains all the file
2008
informations except the name that is maintained into a struct
2009
dentry (see \ref{sec:dentry}; more details can be found into
2010
\cite{book:unix}).
2011
 
2012
All the inodes temporarily present in memory are stored into an
2013
hash bucket structure as showed in Figure \ref{fig:inodebuckets}:
2014
for each device a number is computed using an hash function, the
2015
inode number (file serial number) and the device number. All the
2016
inodes with the same hash entry are stored in a list. The first
2017
inode in the list is stored in a table at the index given by the
2018
hash key.
2019
 
2020
\begin{table}
2021
\begin{verbatim}
2022
struct stat {
2023
    __dev_t st_dev;
2024
    __ino_t st_ino;
2025
    __mode_t st_mode;
2026
    __nlink_t st_nlink;
2027
    __uid_t st_uid;
2028
    __gid_t st_gid;
2029
    __off_t st_size;
2030
    unsigned long st_blksize;
2031
    __time_t st_atime;
2032
    __time_t st_mtime;
2033
    __time_t st_ctime;
2034
};
2035
\end{verbatim}
2036
\caption{struct stat}
2037
\label{src:structstat}
2038
\end{table}
2039
 
2040
An inode contains the following fields:
2041
 
2042
\begin{description}
2043
\item [i\_st]Contains all the standard informations for a file (e.g., the
2044
length, the type); a task can inspect these informations using the stat()
2045
primitive. These informations are included into a stat structure that is shared
2046
between the user libraries and the internal implementation, in a way that a
2047
memcopy should be enough to pass these informations.
2048
\item [i\_next,~i\_prev~e~i\_hash]These fields are used to implement the double
2049
linked list and the hash key: i\_next and i\_prev implement a double list,
2050
whereas i\_hash is the hash key (redundant information).
2051
\item [i\_lock]It is a lock for the structure (see Section \ref{sec:rwlock}).
2052
\item [i\_dlink]Number of directory entries that points to the inode (see
2053
Section \ref{sec:dentry}).
2054
\item [i\_dirty]A flag that says if this inode has been modified.
2055
\item [i\_op]Virtual operations used for working with this inode. Usually they
2056
are filesystem specific.
2057
\item [i\_sb]Super block where the inode is stored (see Section
2058
\ref{sec:superblock}).
2059
\end{description}
2060
 
2061
The struct stat showed in Table \ref{src:structstat}, contained
2062
into the struct inode and described in the paragraph 5.6 of
2063
\cite{book:posix1003.1}, contains the following fields:
2064
 
2065
\begin{description}
2066
\item [st\_mode]It is the file mode (informations about who can
2067
read/modify/execute a file).
2068
\item [st\_ino]An unique number that identifies a file (``file serial number''
2069
in the Posix terminology).
2070
\item [st\_dev]It is the device serial number; st\_dev and st\_ino are necessary
2071
and sufficient to uniquely identify a file in the system.
2072
\item [st\_nlink]Number of hard links to a file (Always 1 for FAT16 file
2073
systems).
2074
\item [st\_uid~e~st\_gid]File group and user identifiers. 0 is used for the
2075
system administrator. The FAT16 filesystem always set these fields to 0.
2076
\item [st\_size]File length (bytes).
2077
\item [st\_atime,~st\_mtime~e~st\_ctime]File times: st\_atime is the time of the
2078
last access, st\_mtime is the time of the last modification and st\_ctime is the
2079
time of the last state change (e.g., when the file has been created or modified
2080
using chmod).
2081
\end{description}
2082
 
2083
%----------------------------------------------------------------------------
2084
\subsubsection{struct file}
2085
\label{sec:file}
2086
%----------------------------------------------------------------------------
2087
 
2088
\begin{table}
2089
\begin{verbatim}
2090
struct file {
2091
    struct file *f_next;
2092
    struct dentry *f_dentry;
2093
    struct file_operations *f_op;
2094
    __loff_t f_pos;
2095
    unsigned int f_count;
2096
    __uint32_t f_flag_isdir:1;
2097
    union {
2098
        struct msdos_file_info msdos_f;
2099
    } u;
2100
};
2101
\end{verbatim}
2102
\caption{struct file}
2103
\label{src:structfile}
2104
\end{table}
2105
 
2106
Every open file has a struct file associated to it (see Table
2107
\ref{src:structfile}).
2108
 
2109
\begin{description}
2110
\item [f\_next]Pointer used internally to the filesystem modules.
2111
\item [f\_dentry]contains the filename (see Section \ref{sec:dentry}). From here
2112
we can get the corresponding inode number.
2113
\item [f\_op]Contains a pointer to the virtual operation that works on struct
2114
file, described in Section \ref{sec:fileop}.
2115
\item [f\_pos]Contains the current position into the file. read/write operations
2116
are done starting from this point; moreover, Posix specification allow a seek to
2117
a position beyond file termination: in that case read operation should fail,
2118
whereas write operation should fill the gap with zeroes.
2119
\item [f\_count]Contains the number of file descriptors that are using this
2120
structure for input/output operations on the file.
2121
\item [f\_flag\_isdir]Flag that says if this is a directory (redundant
2122
information).
2123
\item [msdos\_f]FAT16 specific informations.
2124
\end{description}
2125
 
2126
The applications use a file descriptor to index the file to use. A
2127
file descriptor is a number meaningful only for the processor that
2128
obtained it (e.g. through an open operation) that is assigned to a
2129
struct file through a descriptor table. Every process has a
2130
different table, contained into the struct file\_descriptors
2131
described in section \ref{sec:filedes}. The whole S.Ha.R.K. Kernel
2132
can be considered a monoprocess multithread application, and for
2133
that reason it has only one descriptor table.
2134
 
2135
%----------------------------------------------------------------------------
2136
\subsubsection{struct file\_descriptors}
2137
\label{sec:filedes}
2138
%----------------------------------------------------------------------------
2139
 
2140
\begin{table}
2141
\begin{verbatim}
2142
struct file_descriptors {
2143
    struct file *fd_file[MAXOPENFILES];
2144
    __uint16_t fd_flags[MAXOPENFILES];
2145
    __uint32_t fd_fflags[MAXOPENFILES];
2146
    __mode_t fd_umask;
2147
    char fd_cwd[MAXPATHNAMELEN+1];
2148
    struct dentry *fd_cwden;
2149
    __mutex_t fd_mutex;
2150
};
2151
\end{verbatim}
2152
\caption{struct file\_descriptors}
2153
\label{src:structfiledes}
2154
\end{table}
2155
 
2156
This struct is used to store all the general parameters assigned
2157
to a process (e.g., the current directory). Again, S.Ha.R.K. has
2158
only one structure because it can be considered like a monoprocess
2159
multithread system. A mutex is used to guarantee mutual exclusion
2160
when the struct is used by more than one thread. The structure
2161
contains the following fields:
2162
 
2163
\begin{description}
2164
\item [fd\_file]This is the descriptor table; every descriptor
2165
(with value between 0 and MAXOPENFILES) is associated an element
2166
that can have the following values:
2167
 
2168
\begin{itemize}
2169
\item NULL if it is a free descriptor;
2170
\item RESERVED if it is reserved for special uses (e.g., descriptors assigned to
2171
the keyboard or to the console)
2172
\item pointer to the file
2173
\end{itemize}
2174
 
2175
\item [fd\_flags]Flags associated to the file (e.g., FD\_CLOSEXEC), described in
2176
detail in \cite{book:posix1003.1}, when the fcntl() primitive is described.
2177
\item [fd\_fflags]Flag used when the file was opened (e.g., O\_APPEND,
2178
O\_RDONLY, etc...).
2179
\item [fd\_umask]Mask used during file creation;can be modified with the
2180
function umask().
2181
\item [fd\_cwd]Current working directory used by all the primitives that
2182
receives a relative pathname (that does not start with ``/'' (redundant
2183
information).
2184
\item [fd\_cwden]The current working directory; it can be modified using the
2185
function chdir().
2186
\item [fd\_mutex]Mutual exclusion semaphore used for concurrent multithread
2187
access.
2188
\end{description}
2189
 
2190
%----------------------------------------------------------------------------
2191
\subsection{Disk Cache}
2192
\label{sec:diskcache}
2193
%----------------------------------------------------------------------------
2194
 
2195
%----------------------------------------------------------------------------
2196
\subsubsection{How to use the cache}
2197
%----------------------------------------------------------------------------
2198
 
2199
The filesystem modules use the disk cache to get all the
2200
informations they need.
2201
 
2202
\begin{table}
2203
\begin{verbatim}
2204
 dcache_t *dcache_lock (__dev_t dev, __blkcnt_t lsector);
2205
void dcache_unlock (dcache_t *);
2206
dcache_t *dcache_acquire (__dev_t dev, __blkcnt_t lsector);
2207
void dcache_release (dcache_t *);
2208
void dcache_dirty (dcache_t *d);
2209
void dcache_skipwt(dcache_t *d);
2210
\end{verbatim}
2211
\caption{Disk cache interface}
2212
\label{tab:cacheint}
2213
\end{table}
2214
 
2215
The disk cache interface is showed in Table \ref{tab:cacheint}:
2216
 
2217
\begin{description}
2218
\item [dcache\_lock]This function can be used to require a read only access to
2219
the cache. More than one thread can lock a block. the parameters are the device
2220
number and the sector number; it returns a pointer to the struct dcache\_t in
2221
case of success, or NULL in case of error. This function do not return until the
2222
access is granted or an error occurred. To avoid critical blocking times in case
2223
more than one block is needed, you must use a proper access protocol: the
2224
blocking must be done with increasing block numbers; moreover, if possible, only
2225
one block should be blocked at a time.
2226
\item [dcache\_unlock]This function can be used to unblock a cache entry
2227
previously blocked.
2228
\item [dcache\_acquire]This function behaves as dcache\_lock(), but it requires
2229
a read/write access. Only one task at a time can acquire the lock. This access
2230
do not imply that the cache entry is considered dirty (see the dcache\_dirty()
2231
function).
2232
\item [dcache\_release]This function is similar to dcache\_unlock(), and works
2233
with the locks acquired by dcache\_acquire().
2234
\item [dcache\_dirty]Signal that the cache entry has been modified. This
2235
function must be called only if we own a read/write lock and if the cache has
2236
been modified.
2237
\item [dcache\_skipwt]This function signals to the system that the cache entry
2238
contains system informations that are often modified. That means that the
2239
write\_through flag should not be considered. See section \ref{sec:modicache}
2240
for a detailed description.
2241
\end{description}
2242
 
2243
%----------------------------------------------------------------------------
2244
\subsubsection{struct dcache}
2245
\label{sec:dcache}
2246
%----------------------------------------------------------------------------
2247
 
2248
\begin{figure}
2249
\begin{center}
2250
\includegraphics[width=10cm]{images/dcachebuckets.eps}
2251
\end{center}
2252
\caption{Data structure for disk cache}
2253
\label{fig:dcachebuckets}
2254
\end{figure}
2255
 
2256
\begin{table}
2257
\begin{verbatim}
2258
typedef struct TAGdcache {
2259
    int prev;
2260
    int next;
2261
    int hash;
2262
    __dev_t device;
2263
    __blkcnt_t lsector;
2264
    __uint8_t buffer[MAXSECTORSIZE];
2265
    int used;
2266
    __time_t time;
2267
    int numblocked;
2268
    __fs_sem_t sync;
2269
    __fs_sem_t esync;
2270
    __rwlock_t rwlock;
2271
    __uint32_t dirty:1;
2272
    __uint32_t ready:1;
2273
    __uint32_t esyncreq:1;
2274
    __uint32_t skipwt:1;
2275
} dcache_t;
2276
\end{verbatim}
2277
\caption{struct dcache}
2278
\label{str:structdcache}
2279
\end{table}
2280
 
2281
This data structure, used by the module that implements the data
2282
cache, represents a cache entry. All the high level functions uses
2283
the cache module as an interface to the lower layer.
2284
 
2285
These structures are maintained in the system wit a structure
2286
similar to the inodes, that is a set of double linked lists that I
2287
can access using an hash key, as showed in Figure
2288
\ref{fig:dcachebuckets}; the hash key is computed using the device
2289
(that is the physical peripheral) and the lsector (the logical
2290
sector).
2291
 
2292
Every struct dcache is an elementary block that can be transferred
2293
to a device. It is not possible to have 2 data structures that
2294
refers to the same block. That is, the pair (device,lsector) must
2295
uniquely identify the data sector on the physical device (e.g.,
2296
the block 64 in ``ide/hda'' could be the block 0 of
2297
``ide/hda1'', as explained in section
2298
\ref{sec:dd:interfaccia:nomi}).
2299
 
2300
The field of the structure (see Table \ref{str:structdcache}) are:
2301
 
2302
\begin{description}
2303
\item [prev,~next~e~hash]Are used to store the structures into the lists that
2304
are accessed using the hash key: next and prev are used to implement a double
2305
linked list, whereas hash is the hash key.
2306
\item [device,~lsector~e~buffer]These fields contain the physical device, the
2307
logical sector and the data contained in the corresponding physical block. These
2308
are the unique fields that a high level routine should use (and that shall be
2309
provided by a data cache).
2310
\item [used]The number of tasks that are using the block; -1 means that no one
2311
is using the block, and that the block is free.
2312
\item [time]The last block access time.
2313
\item [numblocked]It is the number of tasks that are waiting the availability of
2314
the block, (that is, they are waiting the flag ready to be set). That is, if a
2315
task asks for a block already asked by another task, but not yet available (not
2316
made available by the lower layer), then the task blocks on a synchronization
2317
semaphore, and it increments that field.
2318
\item [sync]This is the synchronization semaphore used by the blocked tasks that
2319
waits the setting of the ready flag.
2320
\item [esync]It is a synchronization semaphore on the errors needed during the
2321
cache ``purging'' phase (see later).
2322
\item [rwlock]This is a locker used to require the R/W access to the block (see
2323
Section \ref{sec:rwlock}).
2324
\item [dirty]This flag is set if the block has been modified.
2325
\item [ready]This flag is set if the block is available.
2326
\item [esyncreq]This flag is set if the synchronization on errors is required
2327
(that is, the field esync is used).
2328
\item [skipwt]This flag says if the write through handling should be done.
2329
\end{description}
2330
 
2331
The fields esync and esyncreq are used during the ``purging
2332
phase'': a task requires a read or write operation on a block, the
2333
block is not into the data cache, and the cache is full. That is,
2334
another block must be discarded from the cache. To discard a block
2335
it can happen that a disk write operation must be done. While this
2336
write operation is in progress other tasks can require the same
2337
(discarded, in writing) block. These tasks can block waiting for
2338
synchronization on the block. In that case the cache behavior
2339
could be the following: the requests on the block are aborted (the
2340
failure is only needed if we are able to free the block; the
2341
blocked tasks are awakened with an error); or, the requests on the
2342
block are not aborted (the block can not be freed, and the tasks
2343
blocked on it are woken up). The problem is that after the task
2344
has wake up all the blocked tasks, it must wait that all the
2345
awakened tasks check for an error. To do that, the task set the
2346
esyncreq flag, and it blocks on the esync semaphore; the last task
2347
that checked for errors, checks the esyncreq flag and signal the
2348
esync semaphore waking up the waiting task.
2349
 
2350
The disk cache can work in three different modes: \label{sec:modicache}
2351
``copy-back'', ``write-through'' and ``modified write-thought''. Usually most
2352
time-sharing OSes handle a copy-back cache. that is, write operations are done
2353
only in cache; a low priority task is used to synchronize cache and disk data.
2354
In such a system, if a cache request arrives and the cache is full, a block is
2355
freed eventually writing it on the disk. On a real-time system such a behavior
2356
can give some problems, because the time spent freeing the cache can be
2357
accounted to the running task. For example, a task that only reads can be
2358
delayed by another task that fills the cache memory writing blocks on the disk.
2359
To solve this problem there are 2 solutions: the cache is not used (that is, no
2360
caching on write), or the cache behavior is set to write-through, that is the
2361
write operations are immediately synchronized on the disk, in a way that who
2362
dirties the cache is also responsible for its synchronization.
2363
 
2364
If the cache is used in write-through mode, another problem
2365
arises: sometimes a few bytes are modified frequently (e.g. the
2366
filesystem information for the allocation of a file on the disk),
2367
that leads to continuous disk writes. For that reason, the disk
2368
cache provides the flag skipwt, that forces the copy-back mode for
2369
a particular block. By default, this flag is not set. When the
2370
write-through policy must be disabled, the file system should call
2371
the function dcache\_skipwt() on the cache-entry before releasing
2372
it.
2373
 
2374
%----------------------------------------------------------------------------
2375
\subsubsection{struct \_\_rwlock\_t}
2376
\label{sec:rwlock}
2377
%----------------------------------------------------------------------------
2378
 
2379
\begin{table}
2380
\begin{verbatim}
2381
typedef struct {
2382
    __mutex_t mutex;
2383
    int active_readers;
2384
    int active_writers;
2385
    int blocked_readers;
2386
    int blocked_writers;
2387
    __sem_t readers_semaph;
2388
    __sem_t writers_semaph;
2389
} __rwlock_t;
2390
\end{verbatim}
2391
\caption{struct \_\_rwlock\_t}
2392
\label{src:structrwlock}
2393
\end{table}
2394
 
2395
\begin{table}
2396
\begin{verbatim}
2397
void __rwlock_init (__rwlock_t *ptr);
2398
void __rwlock_rdlock (__rwlock_t *ptr);
2399
void __rwlock_wrlock (__rwlock_t *ptr);
2400
int __rwlock_tryrdlock (__rwlock_t *ptr);
2401
int __rwlock_trywrlock (__rwlock_t *ptr);
2402
void __rwlock_rdunlock (__rwlock_t *ptr);
2403
void __rwlock_wrunlock (__rwlock_t *ptr);
2404
\end{verbatim}
2405
\caption{Lockers functions}
2406
\label{src:funcrwlock}
2407
\end{table}
2408
 
2409
The cache module and other modules need a locker protocol, that is
2410
a protocol that solves the reader/writer problem: basically, there
2411
is a shared resource where some processes must read or write.
2412
These rules must be imposed by the protocol:
2413
 
2414
\begin{enumerate}
2415
\item If someone is writing, no one is authorizes to read or write.
2416
\item If someone is reading, other tasks are authorized to read, but not to
2417
write.
2418
\end{enumerate}
2419
 
2420
In literature, there exist different methods to solve the problem
2421
with different behaviors. This filesystem implements the protocol
2422
proposed into paragraph 4.2.3.1 of
2423
\cite{book:programmazioneconcorrente}. To replace that policy, the
2424
rwlock.c file should be modified with another algorithm,
2425
maintaining the same interface.
2426
 
2427
Table \ref{src:structrwlock} contains the internal data structure
2428
of the implemented protocol. To implement a ``locker'', all the
2429
needed data must be inserted into a structure \_\_wrlock\_t. The
2430
following functions have also to be provided (see Table
2431
\ref{src:funcrwlock}):
2432
 
2433
\begin{description}
2434
\item [\_\_rwlock\_init]Initializes the structure passed as parameter. That
2435
structure is passed to all the functions and it is used to protect a shared
2436
resource.
2437
\item [\_\_rwlock\_rdlock]Read request (read lock); when this function returns,
2438
the calling task is authorized to read the shared resource.
2439
\item [\_\_rwlock\_wrlock]Write request (write lock).
2440
\item [\_\_rwlock\_tryrdlock]Conditional read request (try read lock); The
2441
function return 0 if the calling task can read, another number otherwise. At the
2442
end of the read operation (if the lock has been granted), \_\_rwlock\_rdunlock
2443
must be called.
2444
\item [\_\_rwlock\_trywrlock]Conditional write request (try write lock); The
2445
function return 0 if the calling task can write, another number otherwise. At
2446
the end of the write operation (if the lock has been granted),
2447
\_\_rwlock\_wrunlock must be called.
2448
\item [\_\_rwlock\_rdunlock]Read unlock; the task has finished the use of the
2449
shared resource.
2450
\item [\_\_rwlock\_wrunlock]Write unlock; the task has finished the use of the
2451
shared resource, that is again in a consistent state.
2452
\end{description}
2453
 
2454
The functions \_\_rwlock\_rdlock and \_\_rwlock\_rdunlock must always be used
2455
together. Nested calls are not allowed. The same rules apply to
2456
\_\_rwlock\_wrlock and \_\_rwlock\_wrunlock.
2457
 
2458
Please note that \_\_rwlock\_rdlock and \_\_rwlock\_wrlock can
2459
cause undefined task blocking. The other functions do not block
2460
the calling task.
2461
 
2462
%----------------------------------------------------------------------------
2463
\section{Filesystem: DOS (FAT16)}
2464
%----------------------------------------------------------------------------
2465
 
2466
%----------------------------------------------------------------------------
2467
\subsection{Description}
2468
%----------------------------------------------------------------------------
2469
 
2470
A technical description of the FAT16 filesystem can be found in
2471
\cite{manual:DOS330} or \cite{manual:dosfat}.
2472
 
2473
%----------------------------------------------------------------------------
2474
\subsubsection{Internal structure}
2475
%----------------------------------------------------------------------------
2476
 
2477
\begin{figure}
2478
\begin{center}
2479
\includegraphics[width=7cm]{images/fatarch.eps}
2480
\end{center}
2481
\caption{Internal structure of the DOS FAT16.}
2482
\label{src:fatarch}
2483
\end{figure}
2484
 
2485
As showed in Figure \ref{src:fatarch}, this filesystem is divided
2486
in areas:
2487
 
2488
\begin{description}
2489
\item [Boot~sector]Contains a small program that loads the DOS operating
2490
system, as specified into \cite{manual:DOS330}. It also contains other
2491
informations like the number of FATS, the kind of support and the dimension of
2492
the Root Directory; It can be thought as a super-block. The informations
2493
contained in this sector are showed in Table \ref{tab:fatboot}.
2494
\item [FAT~tables]This data area contains some copies of the File Allocation
2495
Table (FAT) These informations are used to know where a file can be found on the
2496
hard disk.
2497
\item [Root~directory]This block is structured as all the blocks that contains a
2498
directory, with the exceptions that it is allocated in a contiguous way and that
2499
it has a fixed size.
2500
\item [Data~area]This is the area where the files and directories are stored.
2501
The minimal allocation unit is a cluster, that is a set of contiguous sectors on
2502
the hard disk. The boot sector stores the number of sectors that compose a
2503
cluster.
2504
\end{description}
2505
 
2506
%----------------------------------------------------------------------------
2507
\subsubsection{The boot sector}
2508
%----------------------------------------------------------------------------
2509
 
2510
\begin{table}
2511
\begin{verbatim}
2512
struct bootrecord {
2513
    __uint8_t reserved[3];
2514
    char oemname[8];
2515
    __uint16_t bytespersector;
2516
    __uint8_t sectorspercluster;
2517
    __uint16_t hiddensectors;
2518
    __uint8_t fatsnumber;
2519
    __uint16_t rootentry;
2520
    __uint16_t sectors;
2521
    __uint8_t media;
2522
    __uint16_t sectorsperfat;
2523
    __uint16_t sectorpertrak;
2524
    __uint16_t headsperdisk;
2525
    __uint32_t hiddensectors32;
2526
    __uint32_t sectors32;
2527
    __uint16_t physicaldrive;
2528
    __uint8_t signature;
2529
    __uint8_t serialnumber[4];
2530
    char volumelabel[11];
2531
    char fattype[8];
2532
};
2533
\end{verbatim}
2534
\caption{The DOS boot sector.}
2535
\label{tab:fatboot}
2536
\end{table}
2537
 
2538
Table \ref{tab:fatboot} shows the boot sector structure. The boot
2539
sector is always 512 bytes large. The struct is only the first
2540
part of the sector, the second part is the boot loader. Here is a
2541
small description of some fields:
2542
 
2543
\begin{description}
2544
\item [sectorpercluster]stores the number of sectors included into a cluster.
2545
\item [sectorsperfat]stores the number of sectors in the FAT
2546
\item [fatsnumber]stores the number of FATs on the disk.
2547
\end{description}
2548
 
2549
A note: when DOS formats a disk, creating a FAT16 filesystem,
2550
the informations that comes from a pre-existing filesystem have
2551
precedence over the informations given by the hardware. That is,
2552
if an hard disk declares to have 8 heads, and the pre-existing
2553
filesystem has headsperdisk=13, the new file system will have 13
2554
heads.
2555
 
2556
%----------------------------------------------------------------------------
2557
\subsubsection{The File Allocation Table}
2558
%----------------------------------------------------------------------------
2559
 
2560
\begin{figure}
2561
\begin{center}
2562
\includegraphics[width=7cm]{images/fatfat.eps}
2563
\end{center}
2564
\caption{An example of File Allocation Table (FAT)}
2565
\label{fig:fatfat}
2566
\end{figure}
2567
 
2568
Every FAT is organized as a table of 16 bit elements. Every
2569
element is a cluster; the element value gives informations about
2570
the cluster:
2571
 
2572
\begin{itemize}
2573
\item 0 means the cluster is available.
2574
\item values between 0xfff0-0xfff7 are reserved; for example, 0xfff7 means bad
2575
(damaged) cluster.
2576
\item 0xfff8-0xffff says that the cluster is the last cluster in a file.
2577
\item Every other value index which is the next cluster of the file.
2578
\end{itemize}
2579
 
2580
The FAT is used to create a set of chains that make a file. For
2581
example, Figure \ref{fig:fatfat} shows the information for a file
2582
stored in 4 clusters (23, 27, 25, 31). The first cluster of the
2583
chain is contained into the directories. Please note that the
2584
first cluster of the data area is the number two, because cluster
2585
 
2586
or 16 bit), and the disk format.
2587
 
2588
Directories (except the root directory) are common files with a
2589
particular structure. The system usually give access to
2590
directories through other primitives (mkdir, creat, and so on).
2591
 
2592
%----------------------------------------------------------------------------
2593
\subsubsection{The ``root directory''}
2594
%----------------------------------------------------------------------------
2595
 
2596
This area has the same structure of a directory file, except that
2597
it has fixed dimension, it is not allocated into the data area and
2598
its allocation is contiguous. Common directories have the same
2599
structure except that they are allocated into the data area, and
2600
that they can be fragmented.
2601
 
2602
\begin{table}
2603
\begin{verbatim}
2604
struct directoryentry {
2605
    __uint8_t name[8];
2606
    __uint8_t ext[3];
2607
    __uint8_t attribute;
2608
    __uint8_t reserved[10];
2609
    __uint16_t time;
2610
    __uint16_t date;
2611
    __uint16_t cluster;
2612
    __uint32_t size;
2613
};
2614
\end{verbatim}
2615
\caption{DOS Directory Entry.}
2616
\label{tab:fatdentry}
2617
\end{table}
2618
 
2619
The directory structure is showed in Figure \ref{tab:fatdentry}:
2620
 
2621
\begin{description}
2622
\item [name~and~ext]Contains the filename in the classic format 8+3. Non used
2623
chars are filled with spaces.
2624
\item [attribute]File flags: they specify if it is regular, a directory, a
2625
volume label, it has been archived or if it is read-only.
2626
\item [time]Creation time (2 seconds resolution) (all the 3 times specified by
2627
the POSIX standard are mapped on that date). The time format is DOS specific.
2628
\item [date]Creation date (until 2099). The date format is DOS specific.
2629
\item [cluster]The first cluster of the file. A value 0 means empty file (the
2630
first available cluster is the cluster 2).
2631
\item [size]File size (32 bit).
2632
\end{description}
2633
 
2634
Note the difference with the typical Unix file systems, where file
2635
names and file informations are stored in different areas.
2636
 
2637
Directory entries are not contiguous. If the first character name
2638
is 0xe5, the file has been deleted, if it is 0x00, it has been
2639
never used. Any other character means that the entry is valid.
2640
0x00 is used for efficiency: if it is found, directory scan is
2641
ended also if there are other sectors in the cluster that contains
2642
a directory.
2643
 
2644
%----------------------------------------------------------------------------
2645
\subsubsection{The data area}
2646
%----------------------------------------------------------------------------
2647
 
2648
The data area is divided in clusters, that are a set of sectors.
2649
It starts from cluster number 2, and it stores regular files and
2650
directories.
2651
 
2652
linear addresses are used to identify a sector on a hard disk, and
2653
its cluster. Hard disk sectors are numbered starting from sector
2654
0. The filesystem specifies an algorithm that associates to every
2655
linear address an address C/H/S (Cylinder/Head/Sector).
2656
 
2657
Some DOS implementation supposes a particular structure
2658
depending on the media used: a 1.44 Mb floppy disk has sectors
2659
composed by 1 sector, and a root directory of 14 sectors. The best
2660
way to implement a filesystem is to rely only on the super block
2661
informations.
2662
 
2663
%----------------------------------------------------------------------------
2664
\subsection{Implementation notes}
2665
%----------------------------------------------------------------------------
2666
 
2667
The filesystem maintains 2 structures to store file informations:
2668
an inode structure and a dentry structure. In the DOS FAT16
2669
filesystem unfortunately there is an unique correspondence between
2670
file names and inodes, and that information is contained into the
2671
filesystem directory entry.
2672
 
2673
\begin{figure}
2674
\begin{center}
2675
\includegraphics[width=7cm]{images/fsn.eps}
2676
\end{center}
2677
\caption{The FAT16 file serial number.}
2678
\label{fig:fsn}
2679
\end{figure}
2680
 
2681
When the system refers to a file, it use the inode number. That
2682
is, every file has a number that is unique inside the file system.
2683
 
2684
The FAT16 serial number has been specified as shown in Figure
2685
\ref{fig:fsn}:
2686
 
2687
\begin{itemize}
2688
\item The most significant part is the cluster number (the cluster where the
2689
directory is stored).
2690
\item The less significant part is the number into the directory entry.
2691
\end{itemize}
2692
 
2693
For example, the inode number 0x01230008, refers to the 8th
2694
directory entry into the cluster 0x123.
2695
 
2696
The cluster 1 is used to index the directory entries contained
2697
into the root directory, whereas the cluster 0 is used to index
2698
the root directory.
2699
 
2700
\bibliographystyle{plain}
2701
\bibliography{../common/retis}
2702
 
2703
\end{document}