Rev 1676 | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
1676 | tullio | 1 | \documentclass[english]{report} |
2 | \usepackage[T1]{fontenc} |
||
3 | \usepackage[latin1]{inputenc} |
||
4 | \usepackage{a4wide} |
||
5 | \setcounter{secnumdepth}{3} \setcounter{tocdepth}{3} |
||
6 | \usepackage{graphicx} |
||
7 | |||
8 | \makeatletter |
||
9 | |||
10 | \usepackage{babel} |
||
11 | \makeatother |
||
12 | |||
13 | \begin{document} |
||
14 | |||
15 | \thispagestyle{empty} |
||
16 | |||
17 | \begin{center}{\LARGE S.Ha.R.K. User Manual}\end{center}{\LARGE \par} \vfill{} |
||
18 | |||
19 | \begin{center}{\large Volume VI}\end{center} |
||
20 | |||
21 | \begin{center}S.Ha.R.K. File System\end{center} \vfill{} |
||
22 | |||
23 | \begin{center}Written by\end{center} |
||
24 | \begin{center}Paolo Gai (pj at sssup.it)\end{center} |
||
25 | \begin{center}Tullio Facchinetti (tullio.facchinetti at unipv.it)\end{center} |
||
26 | \begin{center}Original italian version by\end{center} |
||
27 | \begin{center}Massimiliano Giorgi (massy at gandalf.sssup.it)\end{center} |
||
28 | \vfill{} |
||
29 | |||
30 | \begin{center}\includegraphics[width=2cm]{../common/sssup.ps}\end{center} |
||
31 | |||
32 | \begin{center}Scuola Superiore di Studi e Perfezionamento S. Anna\end{center} |
||
33 | \begin{center}RETIS Lab\end{center} |
||
34 | \begin{center}Via Carducci, 40 - 56100 Pisa\end{center} |
||
35 | \begin{center}\pagebreak\end{center} |
||
36 | |||
37 | \tableofcontents{} |
||
38 | |||
39 | \listoffigures |
||
40 | |||
41 | \listoftables |
||
42 | |||
43 | \begin{abstract} |
||
44 | This is the documentation about S.Ha.R.K. File system. The document gives an |
||
45 | insight on many data structures used internally by the file system. This is the |
||
46 | first step toward a \emph{true} file system user manual. If you carefully read |
||
47 | it, you will understand how it works, how can be initialized and used. |
||
48 | |||
49 | In any case, please send any comments to pj@sssup.it. |
||
50 | |||
51 | Enjoy, |
||
52 | |||
53 | Paolo |
||
54 | \end{abstract} |
||
55 | |||
56 | %---------------------------------------------------------------------------- |
||
57 | \chapter{File system architecture} |
||
58 | \label{sec:arch} |
||
59 | %---------------------------------------------------------------------------- |
||
60 | |||
61 | %---------------------------------------------------------------------------- |
||
62 | \section{Features} |
||
63 | %---------------------------------------------------------------------------- |
||
64 | |||
65 | \label{sec:arch:gen}The library is composed by two main layers: |
||
66 | |||
67 | \begin{description} |
||
68 | \item [File-system] The file system contains all the data structures used to |
||
69 | handle files. |
||
70 | \item [Blocks~devices]This part handles directly the hardware. They provide a |
||
71 | common interface that unifies the usage of all the block devices (i.e., the hard |
||
72 | disks). |
||
73 | \end{description} |
||
74 | |||
75 | The architecture of the library is modular, and it is described in |
||
76 | Figure \ref{fig:Filesystem-architecture}. |
||
77 | |||
78 | \begin{figure} |
||
79 | \begin{center} |
||
80 | \includegraphics[width=6cm]{images/arch2.eps} |
||
81 | \end{center} |
||
82 | \caption{File system architecture} |
||
83 | \label{fig:Filesystem-architecture} |
||
84 | \end{figure} |
||
85 | |||
86 | The library is connected to the Kernel via a \emph{glue} layer, to |
||
87 | the application via a \emph{wrapper} layer; the block devices and |
||
88 | the file system operations are separated by a cache, that may be |
||
89 | excluded. |
||
90 | |||
91 | \label{sec:arch:req} |
||
92 | |||
93 | The library needs a set of primitives that must be supported by |
||
94 | the underlying kernel, as memory handling, string handling, and |
||
95 | functions with a variable number of parameters. Moreover, the |
||
96 | kernel should support functions that implement: |
||
97 | |||
98 | \begin{itemize} |
||
99 | \item Mutual exclusion. |
||
100 | \item Allocation of low level resources (as I/O spaces and interrupts). |
||
101 | \item Memory protection (not needed in S.Ha.R.K.). |
||
102 | \end{itemize} |
||
103 | |||
104 | %---------------------------------------------------------------------------- |
||
105 | \section{Low Level: Device Drivers} |
||
106 | %---------------------------------------------------------------------------- |
||
107 | |||
108 | \begin{figure} |
||
109 | \begin{center} |
||
110 | \includegraphics[width=6cm]{images/archlow.eps} |
||
111 | \end{center} |
||
112 | \caption{Device Drivers} |
||
113 | \label{fig:archlow} |
||
114 | \end{figure} |
||
115 | |||
116 | The device drivers are divided in two main parts: |
||
117 | |||
118 | \begin{itemize} |
||
119 | \item A general part that exports the basic low level services. It |
||
120 | also gives a set of internal routines to the internal modules. |
||
121 | \item A set of internal modules that handle one or more |
||
122 | peripherals. The application can choose which module must be |
||
123 | linked, initialized and used at compile time. |
||
124 | \end{itemize} |
||
125 | |||
126 | \begin{figure} |
||
127 | \begin{center} |
||
128 | \includegraphics[width=6cm]{images/archlow2.eps} |
||
129 | \end{center} |
||
130 | \caption{Device Drivers - details} |
||
131 | \label{fig:archlow2} |
||
132 | \end{figure} |
||
133 | |||
134 | The general part is composed by four modules: |
||
135 | |||
136 | \begin{description} |
||
137 | \item [Block~device]The main part; it exports the interface to the upper layers, |
||
138 | inits the device driver subsystem and register all the drivers present in the |
||
139 | system. |
||
140 | \item [Logical~disk]This module provides functions that allow to inspect the |
||
141 | logical structure of an hard disk, like start, end, size and type of each |
||
142 | partition on the Hard Disk. |
||
143 | \item [Physical~disk]This module provide a small data base that contains |
||
144 | informations about all the hard disks in the system; these informations can be |
||
145 | useful for the module that must handle the low level pending request queues. |
||
146 | \item [Queue~policy]This module handles the low level policies for the pending |
||
147 | requests (SCAN, LOOK, EDF, ...). |
||
148 | \end{description} |
||
149 | |||
150 | %---------------------------------------------------------------------------- |
||
151 | \section{High Level: File systems} |
||
152 | %---------------------------------------------------------------------------- |
||
153 | |||
154 | \begin{figure} |
||
155 | \begin{center} |
||
156 | \includegraphics[width=10cm]{images/archhigh.eps} |
||
157 | \end{center} |
||
158 | \caption{High Level - File system details} |
||
159 | \label{fig:archhigh} |
||
160 | \end{figure} |
||
161 | |||
162 | The high level is divided in three parts: |
||
163 | |||
164 | \begin{itemize} |
||
165 | \item The first part handle the file systems, its initialization and its |
||
166 | interface to the internal modules. |
||
167 | \item A second part composed by a set of modules that implement a particular |
||
168 | file system (i.e., FAT16). |
||
169 | \item A cache module (whose policy can be easily changed). |
||
170 | \end{itemize} |
||
171 | |||
172 | The first part is composed by sub-modules (internal implementation |
||
173 | of a sub-module can be changed without affecting the rest of the |
||
174 | library): |
||
175 | |||
176 | \begin{description} |
||
177 | \item [Filesystems]It is responsible for the initialization and termination of |
||
178 | all the high level layer. It registers all the file systems, and it contains the |
||
179 | routines for mounting/unmounting the partitions. |
||
180 | \item [File]It handles the descriptor tables and the files; it defines the |
||
181 | \texttt{file} structure. |
||
182 | \item [Super]It handles the super blocks (that contains the global informations |
||
183 | about partitions on a disk). |
||
184 | \item [Inodes]It handles the inodes (informations about a file in the file |
||
185 | system). |
||
186 | \item [Dentry]It handles file names in a tree structure. |
||
187 | \item [Open,~read,~umask,~...]They implement the correspondent system |
||
188 | primitives. |
||
189 | \end{description} |
||
190 | |||
191 | %---------------------------------------------------------------------------- |
||
192 | \chapter{Device drivers} |
||
193 | \label{sec:dd} |
||
194 | %---------------------------------------------------------------------------- |
||
195 | |||
196 | %---------------------------------------------------------------------------- |
||
197 | \section{Interface} |
||
198 | \label{sec:dd:interfaccia} |
||
199 | %---------------------------------------------------------------------------- |
||
200 | |||
201 | %---------------------------------------------------------------------------- |
||
202 | \subsection{Initialization} |
||
203 | \label{sec:dd:interfaccia:init} |
||
204 | %---------------------------------------------------------------------------- |
||
205 | |||
206 | \begin{table} |
||
207 | \begin{verbatim} |
||
208 | typedef struct bdev_parms { |
||
209 | __uint32_t showinfo; |
||
210 | char *config; |
||
211 | void *bmutexattr; |
||
212 | __uint16_t dummy; |
||
213 | #ifdef IDE_BLOCK_DEVICE |
||
214 | IDE_PARMS ideparms; |
||
215 | #endif |
||
216 | } BDEV_PARMS; |
||
217 | |||
218 | void bdev_def_showinfo (BDEV_PARMS s, int v); |
||
219 | void bdev_def_configstring(BDEV_PARMS s, char *v); |
||
220 | void bdev_def_mutexattrptr(BDEV_PARMS s,void *v); |
||
221 | \end{verbatim} |
||
222 | \caption{The BDEV\_PARMS structure} |
||
223 | \label{tab:BDEV_PARMS} |
||
224 | \end{table} |
||
225 | |||
226 | The bdev\_init() function (see Table \ref{tab:BDEV_PARMS}) must be |
||
227 | called to initialize a device driver, passing a pointer to a |
||
228 | BDEV\_PARMS structure. Use the provided macros to init that |
||
229 | structure. |
||
230 | |||
231 | The structure is composed by two parts: a generic part and a |
||
232 | specific part. The generic part is composed by the following |
||
233 | fields: |
||
234 | |||
235 | \begin{description} |
||
236 | \item [showinfo]If this flag is set, the initialization of the device will be |
||
237 | ``verbose''. To set this flag, use the macro bdev\_def\_showinfo. |
||
238 | \item [config]It is a string that can have a particular meaning for the device |
||
239 | driver. The string is parsed by the device driver, that usually searches for |
||
240 | some initialization parameter (something like ``foo:[number]''). To set this |
||
241 | string, use the macro bdev\_def\_configstring. |
||
242 | \item [bmutexattr]This parameter can be used to tell the system which is the |
||
243 | kind of mutex to use to implement the mutual exclusion. To set this parameter, |
||
244 | use the macro \texttt{bdev\_def\_mutexattrptr}. |
||
245 | \item [dummy]Used to align the data structure (see the macro BASE\_DEV). |
||
246 | \end{description} |
||
247 | |||
248 | The specific part is directly specified by the device driver; the |
||
249 | bdev\_init function can be called can be called passing a pointer |
||
250 | to a BDEV\_PARMS structure initialized with a BASE\_BDEV value. If |
||
251 | NULL is specified, the default values are used. |
||
252 | |||
253 | %---------------------------------------------------------------------------- |
||
254 | \subsubsection{How to create a new device driver} |
||
255 | %---------------------------------------------------------------------------- |
||
256 | |||
257 | \begin{enumerate} |
||
258 | \item Add a new \emph{major} number in t the file include/fs/major.h (see |
||
259 | \ref{sec:dd:interfaccia:nomi}); |
||
260 | \item Add a ``\#define'' into the file include/fs/bdevconf.h (to enable |
||
261 | selective compilation); |
||
262 | \item If needed, add fields to bdev\_parms, and modify the BASE\_BDEV macro with |
||
263 | the new default parameters; |
||
264 | \item Modify the function bdev\_init to call the init function of the new device |
||
265 | driver (into drivers/block/bdev.c). |
||
266 | \end{enumerate} |
||
267 | |||
268 | \begin{figure} |
||
269 | \begin{center} |
||
270 | \includegraphics[width=7cm]{images/ddinit.eps} |
||
271 | \end{center} |
||
272 | \caption{Device Drivers Initialization.} |
||
273 | \label{fig:ddinit} |
||
274 | \end{figure} |
||
275 | |||
276 | The device driver initialization function, after the |
||
277 | initialization of its internal part, does the following actions |
||
278 | (see Figure \ref{fig:ddinit}): |
||
279 | |||
280 | \begin{enumerate} |
||
281 | \item It registers itself calling bdev\_register(); That function must be called |
||
282 | for each device handled by the driver. |
||
283 | \item It registers every block device that it find into its interfaces to the |
||
284 | Physical Disk Module. |
||
285 | \item It registers every logical device found calling bdev\_register\_minor(). |
||
286 | \end{enumerate} |
||
287 | |||
288 | Data structures and registration functions used in this phased are |
||
289 | described in Section \ref{sec:dd:interfaccia:det}. |
||
290 | |||
291 | %---------------------------------------------------------------------------- |
||
292 | \subsection{Device serial number} |
||
293 | \label{sec:dd:interfaccia:nomi} |
||
294 | %---------------------------------------------------------------------------- |
||
295 | |||
296 | \begin{figure} |
||
297 | \begin{center} |
||
298 | \includegraphics[width=7cm]{images/devtype.eps} |
||
299 | \end{center} |
||
300 | \caption{Device Serial Number} |
||
301 | \label{tab:devtype} |
||
302 | \end{figure} |
||
303 | |||
304 | Every logical device (i.e., a logical partition of a hard disk) |
||
305 | has a device serial number of type \_\_dev\_t, that must have a |
||
306 | unique value (see the POSIX standard). |
||
307 | |||
308 | The number has an internal structure (see Figure |
||
309 | \ref{tab:devtype}) composed by a \emph{major} \emph{number}, and |
||
310 | another number (whose number depends on the device implementation) |
||
311 | called \emph{minor} \emph{number}. |
||
312 | |||
313 | Three macro exists (see include/fs/sysmacro.h) to handle the |
||
314 | device numbers: |
||
315 | |||
316 | \begin{description} |
||
317 | \item [makedev]This macro creates a \_\_dev\_t number from the major and the |
||
318 | minor number. |
||
319 | \item [major]Returns the major number of a device. |
||
320 | \item [minor]Returns the minor number of a device. |
||
321 | \end{description} |
||
322 | |||
323 | The major number is used to decide which device driver will handle |
||
324 | a request (see Section \ref{sec:dd:interfaccia:det}). A device |
||
325 | driver can register more than on major number. Minor numbers are |
||
326 | device driver dependent, and the validity of a minor number is |
||
327 | checked only by the driver. |
||
328 | |||
329 | To simplify the search of a device, every device has a symbolic |
||
330 | name (see Section \ref{sec:dd:intercaccia:ext}). Symbolic names |
||
331 | are composed by two parts: a unique name for each major, and a |
||
332 | specific name assigned to each minor. For example, the major |
||
333 | MAJOR\_B\_IDE has a symbolic name ``ide''; a minor name for that |
||
334 | major can be ``hda2'', so the full name is ``ide/hda2'', that |
||
335 | identifies the second partition on the master hard disk on the |
||
336 | first IDE controller interface (see Section \ref{sec:dd:ide}). |
||
337 | |||
338 | %---------------------------------------------------------------------------- |
||
339 | \subsection{Implementation details} |
||
340 | \label{sec:dd:interfaccia:det} |
||
341 | %---------------------------------------------------------------------------- |
||
342 | |||
343 | \begin{table} |
||
344 | \begin{verbatim} |
||
345 | int bdev_register(__dev_t major, char *name, struct block_device *); |
||
346 | int bdev_register_minor(__dev_t device, char *name, __uint8_t fsind); |
||
347 | struct phdskinfo *phdsk_register( struct phdskinfo *disk); |
||
348 | \end{verbatim} |
||
349 | \caption{Block device registration functions} |
||
350 | \label{tab:Block_register} |
||
351 | \end{table} |
||
352 | |||
353 | These are the registration functions for the device drivers (see |
||
354 | Table \ref{tab:Block_register}): |
||
355 | |||
356 | \begin{description} |
||
357 | \item [bdev\_register]This function registers a device type (see Section |
||
358 | \ref{sec:dd:interfaccia:nomi}); the function receives as parameter the major |
||
359 | number, its symbolic name, and a pointer to a structure block\_device. The |
||
360 | function returns 0 in case of success, != 0 otherwise. |
||
361 | \item [bdev\_register\_minor]It registers a minor number, its symbolic name and |
||
362 | the file system type (see include/fs/fsind.h). This information is used by |
||
363 | bdev\_finddevice\_byname (see Section \ref{sec:dd:intercaccia:ext}). The |
||
364 | function returns 0 in case of success, != 0 otherwise. |
||
365 | \item [phdsk\_register]This function is used to register every physical device |
||
366 | found with parameters that can be used by the Disk Scheduling Algorithms (see |
||
367 | Section \ref{sec:dd:sched}). This function receives a pointer to a structure |
||
368 | phdiskinfo that contains the device parameters, and returns NULL in case of |
||
369 | error, or a pointer to the registered structure (that can be != from the |
||
370 | parameter) in case of success. |
||
371 | \end{description} |
||
372 | |||
373 | \begin{table} |
||
374 | \begin{verbatim} |
||
375 | struct dskgeometry { |
||
376 | __uint16_t cyls; |
||
377 | __uint16_t heads; |
||
378 | __uint16_t sectors; |
||
379 | }; |
||
380 | |||
381 | struct phdskinfo \{ |
||
382 | char pd_name[MAXPHDSKNAME]; |
||
383 | __dev_t pd_device; |
||
384 | __blkcnt_t pd_size; |
||
385 | int pd_sectsize; |
||
386 | struct dskgeometry pd_phgeom; |
||
387 | struct dskgeometry pd_logeom; |
||
388 | }; |
||
389 | \end{verbatim} |
||
390 | \caption{The phdsk structure.} |
||
391 | \label{tab:structphdsk} |
||
392 | \end{table} |
||
393 | |||
394 | The phdskinfo structure contains the physical parameters of a |
||
395 | device. all the field must be filled before calling |
||
396 | phdsk\_register. The fields are: |
||
397 | |||
398 | \begin{description} |
||
399 | \item [pd\_name]device name, used only for debug purposes; |
||
400 | \item [pd\_device]device identificator; |
||
401 | \item [pd\_size]number of logical sectors of the device; |
||
402 | \item [pd\_sectsize]Dimension (bytes) of a logical sector; |
||
403 | \item [pd\_phgeom~e~pd\_logeom]geometry of the device (logical and phisical); |
||
404 | geometries are described using the struct dskgeometry (see Table |
||
405 | \ref{tab:structphdsk}): |
||
406 | |||
407 | \begin{description} |
||
408 | \item [cyls]number of cylinders (traces) of the device; |
||
409 | \item [heads]number of heads; |
||
410 | \item [sectors]number of sectors for each cylinder. |
||
411 | \end{description} |
||
412 | \end{description} |
||
413 | |||
414 | \begin{table} |
||
415 | \begin{verbatim} |
||
416 | struct block_device { |
||
417 | char *bd_name; |
||
418 | int bd_sectorsize; |
||
419 | __uint32_t bd_flag_used:1; |
||
420 | struct block_device_operations *bd_op; |
||
421 | }; |
||
422 | \end{verbatim} |
||
423 | \caption{block\_device Structure} |
||
424 | \label{tab:structbdev} |
||
425 | \end{table} |
||
426 | |||
427 | The struct block\_device described in Table \ref{tab:structbdev} |
||
428 | is used to register the block devices: |
||
429 | |||
430 | \begin{description} |
||
431 | \item [bd\_name]Physical device name; for example, for the IDE device driver, |
||
432 | ``ide''. |
||
433 | \item [bd\_sectorsize]logical sector dimension (bytes; redundant info but |
||
434 | useful). |
||
435 | \item [bd\_flag\_used]It should not modified by the device drivers. If set, the |
||
436 | corresponding device driver is present (a table of all the possible devices is |
||
437 | used to access the system in a fast way). |
||
438 | \item [bd\_op]Virtual device operations. |
||
439 | \end{description} |
||
440 | |||
441 | %---------------------------------------------------------------------------- |
||
442 | \subsubsection{block\_device\_operation structure.} |
||
443 | \label{sec:dd:interfaccia:det:bdevop} |
||
444 | %---------------------------------------------------------------------------- |
||
445 | |||
446 | \begin{table} |
||
447 | \begin{verbatim} |
||
448 | struct block_device_operations { |
||
449 | int (*_trylock) (__dev_t device); |
||
450 | int (*_tryunlock)(__dev_t device); |
||
451 | int (*read) (__dev_t device, __blkcnt_t blocknum, __uint8_t *buffer); |
||
452 | int (*seek) (__dev_t device, __blkcnt_t blocknum); |
||
453 | int (*write) (__dev_t device, __blkcnt_t blocknum, __uint8_t *buffer); |
||
454 | }; |
||
455 | \end{verbatim} |
||
456 | \caption{block\_device\_operations structure.} |
||
457 | \label{tab:structbdevop} |
||
458 | \end{table} |
||
459 | This structure contains the virtual operation that have to be |
||
460 | supplied by every device driver: |
||
461 | |||
462 | \begin{description} |
||
463 | \item [read]This function should read a logical pblock passed as parameter, |
||
464 | writing the result on the buffer. It returns 0 in case of success, != 0 |
||
465 | otherwise. |
||
466 | \item [write]Similar to read, but it writes. |
||
467 | \item [seek]moves the head on the required block. |
||
468 | \item [\_trylock~e~\_tryunlock]Used to block/unblock a device. They return 0 in |
||
469 | case of success, != 0 otherwise; after initialization, and usually, every device |
||
470 | is in the unblocked state. |
||
471 | \end{description} |
||
472 | |||
473 | Why blocking/unblocking a device? The problem, described in |
||
474 | Section \ref{sec:dcache}, is that the data cache have to refer to |
||
475 | a logical block in an unique way. Since there are logical devices |
||
476 | that shares blocks (think at /dev/hda and /dev/hda2 in Linux), the |
||
477 | mount operation must be done in mutual exclusion, blocking the |
||
478 | device. |
||
479 | |||
480 | %---------------------------------------------------------------------------- |
||
481 | \subsubsection{The ``lodsk'' module} |
||
482 | %---------------------------------------------------------------------------- |
||
483 | |||
484 | \begin{table} |
||
485 | \begin{verbatim} |
||
486 | struct lodskinfo { |
||
487 | __uint8_t fs_ind; |
||
488 | __blkcnt_t start; |
||
489 | __blkcnt_t size; |
||
490 | }; |
||
491 | typedef int (*lodsk_callback_func)(int, struct lodskinfo*, void *); |
||
492 | int lodsk_scan(__dev_t device, lodsk_callback_func func, void *data, int |
||
493 | showinfo, char *lname); |
||
494 | \end{verbatim} |
||
495 | \caption{The lodsk module} |
||
496 | \label{tab:lodsk} |
||
497 | \end{table} |
||
498 | |||
499 | This module (see Table \ref{tab:lodsk}) offer a service to the |
||
500 | device drivers: It recognizes the logical structure (partitions) |
||
501 | of an HD as described into \cite{manual:DOS330} (part of the code |
||
502 | is derived from Linux). |
||
503 | |||
504 | before calling lodsk\_scan, the device driver must be already |
||
505 | registered; the function has five parameters: |
||
506 | |||
507 | \begin{enumerate} |
||
508 | \item The device ID where the search have to be done. |
||
509 | \item A callback function called for each partition found. |
||
510 | \item A generic pointer passed to the callback function. |
||
511 | \item A verbose flag. |
||
512 | \item The name of the logical device (i.e., ``hdb'', used if the verbose option |
||
513 | has been set. |
||
514 | \end{enumerate} |
||
515 | |||
516 | Each callback function is called passing as arguments: |
||
517 | |||
518 | \begin{enumerate} |
||
519 | \item The logical number of the partition (the Linux numbering scheme is used). |
||
520 | \item A structure with informations on the partition found. The structure has |
||
521 | the following fields: |
||
522 | |||
523 | \begin{description} |
||
524 | \item [fs\_ind]The file system number (see include/fs/fsind.h). |
||
525 | \item [start]The partition starting block. |
||
526 | \item [size]The partition size (number of blocks). |
||
527 | \end{description} |
||
528 | \item A the generic pointer passed to lodsk\_scan. |
||
529 | \end{enumerate} |
||
530 | |||
531 | The low\_level routines always check these values to avoid |
||
532 | read/write operations outside the partition. |
||
533 | |||
534 | %---------------------------------------------------------------------------- |
||
535 | \subsubsection{Semaphores} |
||
536 | %---------------------------------------------------------------------------- |
||
537 | |||
538 | The files include/fs/mutex.h and include/fs/semaph.h contain the |
||
539 | mutex/semaphore interface used by the file system. |
||
540 | |||
541 | \begin{table} |
||
542 | \begin{verbatim} |
||
543 | typedef internal_sem_t __sem_t; |
||
544 | |||
545 | #define __sem_init (ptr,val) |
||
546 | #define __sem_wait (ptr) |
||
547 | #define __sem_trywait (ptr) |
||
548 | #define __sem_signal (ptr) |
||
549 | \end{verbatim} |
||
550 | \caption{The semaphore interface} |
||
551 | \label{tab:semaph} |
||
552 | \end{table} |
||
553 | |||
554 | In particular, the semaphore interface (see Table |
||
555 | \ref{tab:semaph}) is simply a wrapper to the S.Ha.R.K. Internal |
||
556 | Semaphores. |
||
557 | |||
558 | \begin{table} |
||
559 | \begin{verbatim} |
||
560 | typedef internal_sem_t __mutex_t; |
||
561 | |||
562 | #define __mutex_init (ptr,attr) |
||
563 | #define __mutex_lock (ptr) |
||
564 | #define __mutex_trylock (ptr) |
||
565 | #define __mutex_unlock (ptr) |
||
566 | |||
567 | typedef SYS_FLAGS __fastmutex_t; |
||
568 | |||
569 | #define __fastmutex_init (ptr) |
||
570 | #define __fastmutex_lock (ptr) |
||
571 | #define __fastmutex_unlock (ptr) |
||
572 | \end{verbatim} |
||
573 | \caption{The mutex interface} |
||
574 | \label{tab:mutex} |
||
575 | \end{table} |
||
576 | |||
577 | The mutex interface requires two types of mutexes: |
||
578 | |||
579 | \begin{enumerate} |
||
580 | \item Fast mutexes: when only a few lines have to be protected (remapped to |
||
581 | kern\_fsave/kern\_frestore). |
||
582 | \item Normal mutexes: without any restriction (remapped again to the internal |
||
583 | semaphores). |
||
584 | \end{enumerate} |
||
585 | |||
586 | Please note that these functions are not directly used in the |
||
587 | code; instead, functions like \_\_b\_mutex\_lock and |
||
588 | \_\_fs\_mutex\_lock are used. This is done to make possible the |
||
589 | disabling of the mutual exclusion requirement when not needed |
||
590 | (i.e., if the file system runs as a process with its own address |
||
591 | space). |
||
592 | |||
593 | %---------------------------------------------------------------------------- |
||
594 | \subsection{Exported interface} |
||
595 | \label{sec:dd:intercaccia:ext} |
||
596 | %---------------------------------------------------------------------------- |
||
597 | |||
598 | \begin{table} |
||
599 | \begin{verbatim} |
||
600 | int bdev_read (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer); |
||
601 | int bdev_write (__dev_t dev, __blkcnt_t blocknum, __uint8_t *buffer); |
||
602 | int bdev_seek (__dev_t dev, __blkcnt_t blocknum); |
||
603 | int bdev_trylock(__dev_t device); int bdev_tryunlock(__dev_t device); |
||
604 | __uint8_t bdev_getdeffs(__dev_t device); |
||
605 | __dev_t bdev_find_byname(char *name); |
||
606 | __dev_t bdev_find_byfs(__uint8_t fsind); |
||
607 | int bdev_findname(__dev_t device, char **majorname, char **minorname); |
||
608 | int bdev_scan_devices(int(*callback) (__dev_t,__uint8_t)); |
||
609 | \end{verbatim} |
||
610 | \caption{Low level functions} |
||
611 | \label{tab:bdev} |
||
612 | \end{table} |
||
613 | |||
614 | The low level exports two kinds of functions: one to use the |
||
615 | devices, one to search/get the logical devices present on a |
||
616 | system. |
||
617 | |||
618 | The three main functions are: |
||
619 | |||
620 | \begin{description} |
||
621 | \item [bdev\_find\_byname]This function accepts a symbolic device name, as |
||
622 | described in Section \ref{sec:dd:interfaccia:nomi}, and returns 0 if the device |
||
623 | is not registered, or its number otherwise. |
||
624 | \item [bdev\_find\_byfs]This function accepts a file system ID (see |
||
625 | include/fs/fsind.h), and returns the first device that contains that file |
||
626 | system, or 0 if no one provides it. |
||
627 | \item [bdev\_scan\_device]This function can be used to get a list of all the |
||
628 | devices in the system; the user have to pass a callback that is called for each |
||
629 | device in the system. The function returns 0 in case of success, -1 if an error |
||
630 | occurred, or a value != 0 if the callback returns a value != 0. The callback is |
||
631 | called with a device serial number and a file system type that should be present |
||
632 | in the device. |
||
633 | \end{description} |
||
634 | |||
635 | These function are useful when at system startup the root file |
||
636 | system have to be mounted (see Section \ref{sec:fs:mount}). |
||
637 | |||
638 | The functions bdev\_getdeffs and bdev\textbackslash{}\_findname |
||
639 | are used for these purposes: |
||
640 | |||
641 | \begin{description} |
||
642 | \item [bdev\_getdeffs]to know a probable file system present on a logical |
||
643 | device. |
||
644 | \item [bdev\_findname]to know the logical name of a device. The function returns |
||
645 | != 0 in case of error, and 0 in case of success. In the latter case, the two |
||
646 | buffers passed as parameter are filled with the major and minor names. |
||
647 | \end{description} |
||
648 | |||
649 | These functions simply check the parameter values and then they |
||
650 | calls the device drivers functions (see Section |
||
651 | \ref{sec:dd:interfaccia:det:bdevop}): |
||
652 | |||
653 | \begin{description} |
||
654 | \item [bdev\_read]Reads a block on the device. |
||
655 | \item [bdev\_write]Write a block on the device. |
||
656 | \item [bdev\_seek]Move the head in the position passed as parameter. |
||
657 | \item [bdev\_trylock~e~bdev\_tryunlock]Tries to block/unblock a device. |
||
658 | \end{description} |
||
659 | |||
660 | These functions can block the calling thread for an unspecified |
||
661 | amount of time. All the device drivers were thought to be used in |
||
662 | a multithreaded systems. |
||
663 | |||
664 | If a device driver is not multithread, proper mutual exclusion |
||
665 | must be ensured, because the file system high level functions will |
||
666 | call a low level service before the previous request has been |
||
667 | served. Note that if such a mutual exclusion is used, the access |
||
668 | to the device driver will be forced by the mutex policy, and so |
||
669 | all the disk scheduling algorithms will not work properly. |
||
670 | |||
671 | %---------------------------------------------------------------------------- |
||
672 | \section{I/O requests scheduling} |
||
673 | \label{sec:dd:sched} |
||
674 | %---------------------------------------------------------------------------- |
||
675 | |||
676 | Massy's thesis here inserts a dissertation on |
||
677 | the various disk scheduling algorithms that has been removed. |
||
678 | |||
679 | %---------------------------------------------------------------------------- |
||
680 | \section{Disk scheduling algorithms implemented in S.Ha.R.K.} |
||
681 | \label{sec:dd:schedpres} |
||
682 | %---------------------------------------------------------------------------- |
||
683 | |||
684 | The system currently support 5 different scheduling algorithms |
||
685 | that can be changed/increased simply modifying system |
||
686 | configuration. |
||
687 | |||
688 | \begin{table} |
||
689 | \begin{verbatim} |
||
690 | int bqueue_init (bqueue_t *); |
||
691 | int bqueue_numelements (bqueue_t *); |
||
692 | int bqueue_removerequest (bqueue_t *); |
||
693 | int bqueue_insertrequest (bqueue_t *, struct request_prologue *); |
||
694 | struct request_prologue *bqueue_getrequest(bqueue_t *); |
||
695 | \end{verbatim} |
||
696 | \caption{Queue handling functions} |
||
697 | \label{tab:bqueuefunc} |
||
698 | \end{table} |
||
699 | |||
700 | Only one algorithm can be active at a time. All the device drivers |
||
701 | should use the queuing interface listed in the bqueue.h file (see |
||
702 | Table \ref{tab:bqueuefunc}). The structure bqueue\_t is specific |
||
703 | for each scheduling algorithm. The semantic of each function in |
||
704 | the interface is the following: |
||
705 | |||
706 | \begin{description} |
||
707 | \item [bqueue\_init]Initialize the pending request queue; it returns 0 in case |
||
708 | of success, != 0 otherwise. |
||
709 | \item [bqueue\_numelements]Returns the number of elements in the queue. |
||
710 | \item [bqueue\_insertrequest]This function inserts a request in the queue, |
||
711 | following the specific algorithm behavior. The function returns 0 in case of |
||
712 | success, != 0 otherwise. Please note that requests are passed through a pointer |
||
713 | to the structrequest\_prologue\_t (see tab:structreqprolog). That is, every |
||
714 | algorithm should use a structure with a struct request\_prologue as first field, |
||
715 | passing a casted pointer to that structure. |
||
716 | \item [bqueue\_getrequest]Returns the first request in queue (without removing |
||
717 | it), and returns a pointer to that request, or NULL if there are not any pending |
||
718 | requests. |
||
719 | \item [bqueue\_removerequest]Removes the first request in queue, that is the |
||
720 | request that would have been returned by bqueue\_getrequest; it returns 0 in |
||
721 | case of success, != 0 in case of error. |
||
722 | \end{description} |
||
723 | |||
724 | \begin{table} |
||
725 | \begin{verbatim} |
||
726 | #define REQ_DUMMY 0 |
||
727 | #define REQ_SEEK 1 |
||
728 | #define REQ_READ 2 |
||
729 | #define REQ_WRITE 3 |
||
730 | |||
731 | struct request_prologue { |
||
732 | unsigned sector; |
||
733 | unsigned head; |
||
734 | unsigned cylinder; |
||
735 | unsigned nsectors; |
||
736 | unsigned operation; |
||
737 | struct request_specific x; |
||
738 | }; |
||
739 | |||
740 | #define request_prologue_t struct request_prologue |
||
741 | \end{verbatim} |
||
742 | \caption{struct request\_prologue\_t} |
||
743 | \label{tab:structreqprolog} |
||
744 | \end{table} |
||
745 | |||
746 | before calling bdev\_insertrequest, all the fields of the struct |
||
747 | request\_prologue\_t must be filled (see Table |
||
748 | \ref{tab:structreqprolog}): |
||
749 | |||
750 | \begin{description} |
||
751 | \item [sector,~head,~cylinder]This is the starting position in the hard disk. |
||
752 | \item [nsectors]This is the number of sectors to transfer. |
||
753 | \item [operation]The operation that must be performed: |
||
754 | |||
755 | \begin{description} |
||
756 | \item [REQ\_DUMMY]request not specified. |
||
757 | \item [REQ\_SEEK]head seek to the specified position. |
||
758 | \item [REQ\_READ]read request. |
||
759 | \item [REQ\_WRITE]write request. |
||
760 | \end{description} |
||
761 | \end{description} |
||
762 | |||
763 | To add algorithm specific fields, you can use the method used for |
||
764 | the deadlines into Section \ref{sec:dd:sched:edf}. |
||
765 | |||
766 | struct request\_specific (see Table \ref{tab:structreqprolog}) |
||
767 | must be defined by each algorithm, and contains algorithm-specific |
||
768 | fields. |
||
769 | |||
770 | struct bqueue\_t contains a scheduling queue. Its first field must |
||
771 | be a pointer to a struct phdiskinfo (see Section |
||
772 | \ref{sec:dd:interfaccia:det}, and Table \ref{tab:structphdsk}). |
||
773 | |||
774 | Here is a description about the implementation of various disk |
||
775 | scheduling algorithms. |
||
776 | |||
777 | %---------------------------------------------------------------------------- |
||
778 | \subsection{FCFS} |
||
779 | \label{sec:dd:sched:fcfs} |
||
780 | %---------------------------------------------------------------------------- |
||
781 | |||
782 | \begin{table} |
||
783 | \begin{verbatim} |
||
784 | typedef struct TAGfcfs_queue_t { |
||
785 | struct phdskinfo *disk; |
||
786 | __b_fastmutex_t mutex; |
||
787 | struct request_prologue *head; |
||
788 | struct request_prologue *tail; |
||
789 | int counter; |
||
790 | } fcfs_queue_t; |
||
791 | |||
792 | struct request_specific { |
||
793 | void *next; |
||
794 | }; |
||
795 | \end{verbatim} |
||
796 | \caption{struct fcfs\_queue\_t} |
||
797 | \label{tab:structfcfs} |
||
798 | \end{table} |
||
799 | |||
800 | The FCFS policy is implemented using the structure in table |
||
801 | \ref{tab:structfcfs}: |
||
802 | |||
803 | \begin{description} |
||
804 | \item [mutex]A mutual exclusion semaphore. |
||
805 | \item [head~e~tail]a pointer to head and tail of a request queue. |
||
806 | \item [counter]number of pending requests. |
||
807 | \end{description} |
||
808 | |||
809 | Then, the struct request\_specific contains a pointer that is used |
||
810 | to link the list. |
||
811 | |||
812 | %---------------------------------------------------------------------------- |
||
813 | \subsection{SSTF} |
||
814 | \label{sec:dd:sched:sstf} |
||
815 | %---------------------------------------------------------------------------- |
||
816 | |||
817 | \begin{table} |
||
818 | \begin{verbatim} |
||
819 | typedef struct TAGsstf_queue_t { |
||
820 | struct phdskinfo *disk; |
||
821 | __b_fastmutex_t mutex; |
||
822 | struct request_prologue *actual; |
||
823 | struct request_prologue *lqueue; |
||
824 | struct request_prologue *hqueue; |
||
825 | int counter; |
||
826 | } sstf_queue_t; |
||
827 | |||
828 | struct request_specific { |
||
829 | void *next; |
||
830 | }; |
||
831 | \end{verbatim} |
||
832 | \caption{struct sstf\_queue\_t} |
||
833 | \label{tab:structsstf} |
||
834 | \end{table} |
||
835 | |||
836 | This scheduling policy is implemented using the data structures in |
||
837 | Figure \ref{tab:structsstf}: |
||
838 | |||
839 | \begin{description} |
||
840 | \item [mutex]for mutual exclusion. |
||
841 | \item [actual]the current request (that is returned by bqueue\_getrequest()). |
||
842 | \item [lqueue~e~hqueue]two lists ordered in decreasing and increasing order: |
||
843 | when a new request arrives, the request is inserted on one of the two queues |
||
844 | depending on the value of the current request cylinder. The current request will |
||
845 | become the nearest to the current cylinder. Note: This algorithm can cause |
||
846 | starvation. |
||
847 | \item [counter]The request counter. |
||
848 | \end{description} |
||
849 | |||
850 | Then, the struct request\_specific contains a pointer that is used |
||
851 | to link the list. |
||
852 | |||
853 | %---------------------------------------------------------------------------- |
||
854 | \subsection{LOOK} |
||
855 | \label{sec:dd:sched:look} |
||
856 | %---------------------------------------------------------------------------- |
||
857 | |||
858 | \begin{table} |
||
859 | \begin{verbatim} |
||
860 | typedef struct TAGlook_queue_t { |
||
861 | struct phdskinfo *disk; |
||
862 | __b_fastmutex_t mutex; |
||
863 | int dir; |
||
864 | struct request_prologue *queue[2]; |
||
865 | int counter; |
||
866 | } look_queue_t; |
||
867 | |||
868 | struct request_specific { |
||
869 | void *next; |
||
870 | }; |
||
871 | \end{verbatim} |
||
872 | \caption{struct look\_queue\_t} |
||
873 | \label{tab:structlook} |
||
874 | \end{table} |
||
875 | |||
876 | This scheduling policy is implemented using the data structures in |
||
877 | Figure \ref{tab:structlook}: |
||
878 | |||
879 | \begin{description} |
||
880 | \item [mutex]for mutual exclusion. |
||
881 | \item [dir]Is the looking direction: 0 ascending, 1 descending. |
||
882 | \item [queue]two ordered lists (one ascending, one descending). queue[dir] is |
||
883 | the queue used. When the list becomes empty, dir=!dir. a new request is inserted |
||
884 | in the current queue if its position is after/before the actual position |
||
885 | (otherwise, it is inserted in the other list). |
||
886 | \item [counter]the request counter. |
||
887 | \end{description} |
||
888 | |||
889 | This algorithm can be implemented in two variants: |
||
890 | |||
891 | \begin{enumerate} |
||
892 | \item Requests with position = current position are inserted in the current |
||
893 | queue (starvation can happen). |
||
894 | \item Requests with position = current position are inserted in the queue not |
||
895 | currently used. This is the default choice; to modify it, you have to modify the |
||
896 | look.c file. |
||
897 | \end{enumerate} |
||
898 | |||
899 | %---------------------------------------------------------------------------- |
||
900 | \subsection{CLOOK} |
||
901 | \label{sec:dd:sched:clook} |
||
902 | %---------------------------------------------------------------------------- |
||
903 | |||
904 | \begin{table} |
||
905 | \begin{verbatim} |
||
906 | typedef struct TAGclook_queue_t { |
||
907 | struct phdskinfo *disk; |
||
908 | __b_fastmutex_t mutex; |
||
909 | int act; |
||
910 | struct request_prologue *queue[2]; |
||
911 | int counter; |
||
912 | } clook_queue_t; |
||
913 | |||
914 | struct request_specific { |
||
915 | void *next; |
||
916 | }; |
||
917 | \end{verbatim} |
||
918 | \caption{struct clook\_queue\_t} |
||
919 | \label{tab:structclook} |
||
920 | \end{table} |
||
921 | |||
922 | This scheduling policy is implemented using the data structures in |
||
923 | Figure \ref{tab:structclook}: |
||
924 | |||
925 | \begin{description} |
||
926 | \item [mutex]for mutual exclusion. |
||
927 | \item [dir]This is the current queue. |
||
928 | \item [queue]Two ascending lists. queue[dir] is used until it is emptied. Then, |
||
929 | dir is changed. A new request is inserted in the current queue if the current |
||
930 | position is greater than the current one. |
||
931 | \item [counter]the request counter. |
||
932 | \end{description} |
||
933 | |||
934 | This implementation can introduce starvation because insertions at |
||
935 | the current position are done on the same queue. |
||
936 | |||
937 | %---------------------------------------------------------------------------- |
||
938 | \subsection{EDF} |
||
939 | \label{sec:dd:sched:edf} |
||
940 | %---------------------------------------------------------------------------- |
||
941 | |||
942 | \begin{table} |
||
943 | \begin{verbatim} |
||
944 | typedef struct TAGbd_edf_queue_t { |
||
945 | struct phdskinfo *disk; |
||
946 | __b_fastmutex_t mutex; |
||
947 | struct request_prologue *queue; |
||
948 | int inservice; |
||
949 | int counter; |
||
950 | } bd_edf_queue_t; |
||
951 | |||
952 | struct request_specific { |
||
953 | void *next; |
||
954 | long dl; |
||
955 | }; |
||
956 | \end{verbatim} |
||
957 | \caption{struct look\_queue\_t} |
||
958 | \label{tab:structedf} |
||
959 | \end{table} |
||
960 | |||
961 | This is the only real-time disk scheduling algorithm implemented |
||
962 | in S.Ha.R.K. This scheduling policy is implemented using the data |
||
963 | structures in Figure \ref{tab:structedf}: |
||
964 | |||
965 | \begin{description} |
||
966 | \item [mutex]For mutual exclusion. |
||
967 | \item [queue]An ordered list, increasing with the deadline. |
||
968 | \item [inservice]This flag is set when a request is in service. |
||
969 | \item [counter]The request counter. |
||
970 | \end{description} |
||
971 | |||
972 | \begin{table} |
||
973 | \begin{verbatim} |
||
974 | typedef struct { |
||
975 | RES_MODEL r; |
||
976 | TIME dl; |
||
977 | } BDEDF_RES_MODEL; |
||
978 | |||
979 | #define BDEDF_res_default_model(res) |
||
980 | #define BDEDF_res_def_level(res,l) |
||
981 | #define BDEDF_res_def_dl(res,dl) |
||
982 | \end{verbatim} |
||
983 | \caption{Resource Module for deadlines} |
||
984 | \label{tab:edfresmodel} |
||
985 | \end{table} |
||
986 | |||
987 | The request's deadline is got during task creation using the |
||
988 | S.Ha.R.K. resource module shown in Figure \ref{tab:edfresmodel}. |
||
989 | |||
990 | If the resource module is not used, the deadline is set to |
||
991 | infinity. That is, in that case the scheduling algorithm become a |
||
992 | FCFS. |
||
993 | |||
994 | %---------------------------------------------------------------------------- |
||
995 | \section{Device driver: the IDE interface} |
||
996 | \label{sec:dd:ide} |
||
997 | %---------------------------------------------------------------------------- |
||
998 | |||
999 | %---------------------------------------------------------------------------- |
||
1000 | \subsection{Description} |
||
1001 | \label{sec:dd:ide:desc} |
||
1002 | %---------------------------------------------------------------------------- |
||
1003 | |||
1004 | A description of the IDE standard can be found in\cite{manual:ideintel} and |
||
1005 | \cite{manual:idevia}. This interface allows the connection of up to 2 IDE |
||
1006 | peripherals using the ATA or ATAPI protocol (as described into |
||
1007 | \cite{manual:ata4}). This driver implements the ATA-4 protocol \footnote{The |
||
1008 | original Massy's thesis have a small description of the standard, that I have |
||
1009 | removed.}. |
||
1010 | |||
1011 | %---------------------------------------------------------------------------- |
||
1012 | \subsection{Implementation} |
||
1013 | \label{sec:dd:ide:code} |
||
1014 | %---------------------------------------------------------------------------- |
||
1015 | |||
1016 | \begin{figure} |
||
1017 | \begin{center} |
||
1018 | \includegraphics[width=7cm]{images/ide.eps} |
||
1019 | \end{center} |
||
1020 | \caption{The IDE subsystem} |
||
1021 | \label{fig:ide} |
||
1022 | \end{figure} |
||
1023 | |||
1024 | |||
1025 | The architecture of the IDE subsystem is shown in Figure |
||
1026 | \ref{fig:ide}: there is a part of the interface responsible for |
||
1027 | the system initialization (for example, for finding the IDE |
||
1028 | interfaces in the system, for finding the devices and their |
||
1029 | logical structure, for transforming the low level requests in |
||
1030 | ATA-4 commands), for command handling and another part for |
||
1031 | interfacing with the disk scheduling queue handling. Finally, |
||
1032 | there is a small part OS-dependent that should be rewritten to |
||
1033 | port it to other OSes. |
||
1034 | |||
1035 | If possible, the driver uses the DMA; in any case, polling can be |
||
1036 | forced. DMA is only supported in bus-master mode |
||
1037 | \cite{manual:ideintel}. |
||
1038 | |||
1039 | The system actually use only some commands of the classes ``PIO |
||
1040 | data in'', ``PIO data out'', ``Non-data'' e ``DMA''; it uses |
||
1041 | the ``read'', ``write'', ``seek'', ``dma read'', ``dma |
||
1042 | write'', ``identify'', ``pidentify'' and ``set feature'' |
||
1043 | commands; packet commands are not currently supported, so ATAPI |
||
1044 | devices (like cd-roms) are currently not supported. |
||
1045 | |||
1046 | %---------------------------------------------------------------------------- |
||
1047 | \subsubsection{Initialization} |
||
1048 | %---------------------------------------------------------------------------- |
||
1049 | |||
1050 | The system is initialized calling the ide\_init() function into |
||
1051 | bdev\_init(). This is automatically done if the IDE support is |
||
1052 | compiled. The user should only initialize the struct bdev\_parms, |
||
1053 | as specified in Section \ref{sec:dd:interfaccia}. |
||
1054 | |||
1055 | \begin{table} |
||
1056 | \begin{verbatim} |
||
1057 | typedef struct ide_parms { |
||
1058 | void *parm_initserver; |
||
1059 | } IDE_PARMS; |
||
1060 | |||
1061 | struct ide_server_model { |
||
1062 | TIME dl; |
||
1063 | TIME wcet; |
||
1064 | TIME mit; |
||
1065 | }; |
||
1066 | \end{verbatim} |
||
1067 | \caption{Data structures used to initialize the IDE driver.} |
||
1068 | \label{tab:ideinit} |
||
1069 | \end{table} |
||
1070 | |||
1071 | The IDE\_PARMS structure has only one parameter, and is included |
||
1072 | into the ideparms field in the global init data structure: |
||
1073 | |||
1074 | \begin{description} |
||
1075 | \item [parm\_initserver]This parameter is OS-specific and is |
||
1076 | related to the glue code. |
||
1077 | \end{description} |
||
1078 | |||
1079 | \begin{table} |
||
1080 | \begin{verbatim} |
||
1081 | void ide_glue_send_request (int ideif); |
||
1082 | int ide_glue_activate_interface (int ideif); |
||
1083 | void ide_glue_unactivate_interface (int ideif); |
||
1084 | \end{verbatim} |
||
1085 | \caption{The glue code for the IDE driver.} |
||
1086 | \label{tab:ideglue} |
||
1087 | \end{table} |
||
1088 | |||
1089 | The system dependent part of the IDE driver is contained into the |
||
1090 | file ideglue.c (see Figure \ref{tab:ideglue}): |
||
1091 | |||
1092 | \begin{description} |
||
1093 | \item [ide\_glue\_activate\_interface]It enables the interface passed as |
||
1094 | parameter: It creates an aperiodic task, and assigns to it the interface |
||
1095 | interrupt. The task is activated when the interrupt arrives. |
||
1096 | \item [ide\_glue\_activate\_interface]It disables the interface. The aperiodic |
||
1097 | task is killed. |
||
1098 | \item [ide\_glue\_send\_request]It executes the first available request: it |
||
1099 | explicitly activates the task interface; when the task is activated because of |
||
1100 | an interrupt, the pending request is terminated, and another request (if |
||
1101 | present) is activated. |
||
1102 | \end{description} |
||
1103 | |||
1104 | the parm\_initserver field contains a pointer to the struct |
||
1105 | ide\_server\_model, used to create the aperiodic tasks (see Figure |
||
1106 | \ref{tab:ideinit}): |
||
1107 | |||
1108 | \begin{description} |
||
1109 | \item [dl]task deadline in $\mu$sec. |
||
1110 | \item [wcet]task worst case execution time in $\mu$sec. |
||
1111 | \item [mit]the mean inter arrival time. |
||
1112 | \end{description} |
||
1113 | |||
1114 | If these parameters are not specified, standard parameters are |
||
1115 | used. |
||
1116 | |||
1117 | These defines are used to conditionally compile the driver (see |
||
1118 | the source code): |
||
1119 | |||
1120 | \begin{description} |
||
1121 | \item [FORCE\_SCANNING]this define force the search of the IDE peripherals also |
||
1122 | if the bus reset fails. This is useful if some peripherals does not follow the |
||
1123 | standard timings. |
||
1124 | \item [IDE\_OLDATAPIDELAY]this define forces a little delay that can be useful |
||
1125 | when using old ATAPI interfaces. |
||
1126 | \item [IDE\_FORCERESET]if defined a soft reset operation cannot be aborted |
||
1127 | because of a timeout. It must be specified if the peripheral does not follows |
||
1128 | the soft reset timings or if it cannot go in stand-by mode. |
||
1129 | \item [IDE\_SKIPSTATUSTEST]with this symbol we do not check if the command is |
||
1130 | being executed correctly (that is done reading on a status register); all the |
||
1131 | TX-based motherboard we found needs this symbol. |
||
1132 | \item [IDE\_SKIPALLTEST]when set, all the command are written into the registers |
||
1133 | without looking on the status register. This symbol implies the previous. This |
||
1134 | mode was needed on a TX motherboard where the polled mode for commands was |
||
1135 | unreliable. |
||
1136 | \end{description} |
||
1137 | |||
1138 | %---------------------------------------------------------------------------- |
||
1139 | \subsubsection{Policy queues Interface} |
||
1140 | %---------------------------------------------------------------------------- |
||
1141 | |||
1142 | \begin{table} |
||
1143 | \begin{verbatim} |
||
1144 | typedef struct TAGidereq_t { |
||
1145 | request_prologue_t info; |
||
1146 | int next; |
||
1147 | __uint8_t resetonerror; |
||
1148 | __uint8_t cmd; |
||
1149 | __b_sem_t wait; |
||
1150 | int result; |
||
1151 | __uint8_t features; |
||
1152 | __uint8_t cyllow; |
||
1153 | __uint8_t cylhig; |
||
1154 | __uint8_t seccou; |
||
1155 | __uint8_t secnum; |
||
1156 | __uint8_t devhead; |
||
1157 | __uint8_t *buffer; |
||
1158 | } idereq_t; |
||
1159 | \end{verbatim} |
||
1160 | \caption{struct idereq\_t} |
||
1161 | \label{tab:idereq} |
||
1162 | \end{table} |
||
1163 | |||
1164 | Table \ref{tab:idereq} shows struct idereq\_t that is used by the |
||
1165 | driver to store the pending requests. It can be noted that the |
||
1166 | info field is of type request\_prologue\_t, as required by the |
||
1167 | modules that handle the scheduling policies (see |
||
1168 | \ref{sec:dd:schedpres}). |
||
1169 | |||
1170 | Note that (since the IDE interface allows only a command at a |
||
1171 | time) there are two queues for each IDE interface (one for the |
||
1172 | master device, one for the slave). These request queues are |
||
1173 | prioritized. That is, the master queue always have priority on the |
||
1174 | slave queue. |
||
1175 | |||
1176 | Here a small description of the fields showed in Table |
||
1177 | \ref{tab:idereq}: |
||
1178 | |||
1179 | \begin{description} |
||
1180 | \item [info]To handle the requests using a module for disk scheduling. |
||
1181 | \item [next]used internally to handle a request pool. |
||
1182 | \item [resetonerror]If set, an error during a command implies a soft reset. |
||
1183 | \item [cmd]The command for the peripheral (see \cite{manual:ata4}). |
||
1184 | \item [wait]Synchronization semaphore: when a task sends a command, it waits the |
||
1185 | end of the execution of the command. |
||
1186 | \item [result]This is the result: 0 in case of success, != 0 in case of error |
||
1187 | (see idelow.c for the error codes). |
||
1188 | \item [features,~cyllow,~cylhigh,~seccou,~secnum~e~devhead]Values to be inserted |
||
1189 | into the I/O registers of the IDE interface. The behavior is different depending |
||
1190 | on the peripheral mode (LBA or C/H/S) (see Section \ref{sec:dd:ide:desc}). |
||
1191 | \item [buffer]read/write buffer (if needed). |
||
1192 | \end{description} |
||
1193 | |||
1194 | %---------------------------------------------------------------------------- |
||
1195 | \subsubsection{Minor numbers} |
||
1196 | %---------------------------------------------------------------------------- |
||
1197 | |||
1198 | \begin{figure} |
||
1199 | \begin{center} |
||
1200 | \includegraphics[width=10cm]{images/ideminor.eps} |
||
1201 | \end{center} |
||
1202 | \caption{IDE minor numbers} |
||
1203 | \label{fig:ideminor} |
||
1204 | \end{figure} |
||
1205 | |||
1206 | This section describes how minor numbers are handled. |
||
1207 | |||
1208 | As described in Section \ref{sec:dd:intercaccia:ext}, the generic |
||
1209 | low level part uses the major number to pass a request to a |
||
1210 | device. The specific device is then selected using the minor |
||
1211 | number. |
||
1212 | |||
1213 | Figure \ref{fig:ideminor} shows the minor number structure of this |
||
1214 | device: |
||
1215 | |||
1216 | \begin{itemize} |
||
1217 | \item The first 3 bits identifies the interface (8 interfaces maximum). |
||
1218 | \item A bit for the peripheral (master/slave). This is used to enqueue request |
||
1219 | in the right queue. |
||
1220 | \item The last 4 bits says which partition should be used. This info is used to |
||
1221 | convert relative sector numbers into absolute sector numbers. |
||
1222 | \end{itemize} |
||
1223 | |||
1224 | Minor numbers are accessed only via macros. To modify the minor |
||
1225 | number mapping you should only modify these macros. |
||
1226 | |||
1227 | %---------------------------------------------------------------------------- |
||
1228 | \chapter{File system} |
||
1229 | \label{sec:fs} |
||
1230 | %---------------------------------------------------------------------------- |
||
1231 | |||
1232 | With the word ``file system'' we mean the internal organization |
||
1233 | of a block device; the purpose of such an organization is to store |
||
1234 | a set of files, usually divided in directories. |
||
1235 | |||
1236 | A block device is seen by the file system as an ordered sequence |
||
1237 | of fixed sized blocks. The basic operations of a file system are |
||
1238 | read/write operations on the n-th block. |
||
1239 | |||
1240 | %---------------------------------------------------------------------------- |
||
1241 | \section{Interface} |
||
1242 | \label{sec:fs:int} |
||
1243 | %---------------------------------------------------------------------------- |
||
1244 | |||
1245 | This section describes the interface between the system and the |
||
1246 | user tasks. It also describes the interface between a specific |
||
1247 | instance of file system and the system (just to make easy the |
||
1248 | expansion of the system with new real-time file systems). |
||
1249 | |||
1250 | %---------------------------------------------------------------------------- |
||
1251 | \subsection{Initialization} |
||
1252 | \label{sec:fs:int:init} |
||
1253 | %---------------------------------------------------------------------------- |
||
1254 | |||
1255 | To initialize the higher level of the system we use an interface |
||
1256 | similar to the device drivers interface (see Section |
||
1257 | \ref{sec:dd:interfaccia:init}. |
||
1258 | |||
1259 | \begin{table} |
||
1260 | \begin{verbatim} |
||
1261 | typedef struct filesystem_parms { |
||
1262 | __dev_t device; |
||
1263 | __uint8_t fs_ind; |
||
1264 | __uint32_t fs_showinfo:1; |
||
1265 | void *fs_mutexattr; |
||
1266 | struct mount_opts *fs_opts; |
||
1267 | } FILESYSTEM_PARMS; |
||
1268 | |||
1269 | #define filesystem_def_rootdevice(par,rootdev) |
||
1270 | #define filesystem_def_fs(par,fs) |
||
1271 | #define filesystem_def_showinfo(par,show) |
||
1272 | #define filesystem_def_options(par,opts) |
||
1273 | |||
1274 | int filesystem_init(FILESYSTEM_PARMS *); |
||
1275 | \end{verbatim} |
||
1276 | \caption{struct filesystem\_parms} |
||
1277 | \label{tab:fsparms} |
||
1278 | \end{table} |
||
1279 | |||
1280 | The function filesystem\_init() is called to initialize the higher |
||
1281 | layer passing the parameters showed in Figure \ref{tab:fsparms}. |
||
1282 | These parameters must be initialized using the following macros: |
||
1283 | |||
1284 | \begin{description} |
||
1285 | \item [filesystem\_def\_rootdevice]Set the partition that must be used as |
||
1286 | \emph{root device}, that is, the directory associated to ``/''. This field is |
||
1287 | mandatory, and that implies that the higher layer must be initialized |
||
1288 | \emph{after} the device drivers. |
||
1289 | \item [filesystem\_def\_fs]Selects the filesystem that must be used with the |
||
1290 | device. It must be one of the symbols defined into include/fs/fsind.h; if the |
||
1291 | special symbol FS\_DEFAULT is used, the system tries to autodetect the |
||
1292 | filesystem type. |
||
1293 | \item [filesystem\_def\_showinfo]If this flag is set, then initialization |
||
1294 | messages are printed on the screen. |
||
1295 | \item [filesystem\_def\_options]There are the mount options. |
||
1296 | \end{description} |
||
1297 | |||
1298 | The field fs\_mutexattr has the same meaning of the correspondent |
||
1299 | field into bdev\_parms (see Section |
||
1300 | \ref{sec:dd:interfaccia:init}). |
||
1301 | |||
1302 | You can use the macro BASE\_FILESYSTEM to initialize a struct |
||
1303 | filesystem\_parms with some default values. NULL is not allowed as |
||
1304 | a parameter of filesystem\_init. |
||
1305 | |||
1306 | \begin{table} |
||
1307 | \begin{verbatim} |
||
1308 | struct mount_opts { |
||
1309 | __uint32_t flags; |
||
1310 | union { |
||
1311 | struct msdosfs_parms msdos; |
||
1312 | } x; |
||
1313 | }; |
||
1314 | |||
1315 | /* Flags */ |
||
1316 | #define MOUNT_FLAG_RW 0x01 |
||
1317 | \end{verbatim} |
||
1318 | \caption{struct mount\_opts} |
||
1319 | \label{tab:mountopt} |
||
1320 | \end{table} |
||
1321 | |||
1322 | \label{sec:mountopt}The mount options for every device must be |
||
1323 | inserted into a struct mount\_opts showed in Table |
||
1324 | \ref{tab:mountopt}. The structure is composed by two parts: a |
||
1325 | general part and a filesystem-specific part. The general part |
||
1326 | actually contains the following fields: |
||
1327 | |||
1328 | \begin{description} |
||
1329 | \item [flags]This field contains the mount fields. The only flag |
||
1330 | currently defined is MOUNT\_FLAG\_RW, that allows write operations |
||
1331 | on the file system (default is read-only). |
||
1332 | \end{description} |
||
1333 | |||
1334 | \begin{figure} |
||
1335 | \begin{center} |
||
1336 | \includegraphics[width=7cm]{images/fsinit.eps} |
||
1337 | \end{center} |
||
1338 | \caption{Filesystem init} |
||
1339 | \label{fig:fsinit} |
||
1340 | \end{figure} |
||
1341 | |||
1342 | Filesystem initialization proceeds as shown in Figure \ref{fig:fsinit}: |
||
1343 | |||
1344 | \begin{enumerate} |
||
1345 | \item The user calls filesystem\_init(); |
||
1346 | \item filesystem\_init() initializes itself and its modules, then all the file |
||
1347 | systems are initialized. filesystem\_init() have to be changed if a new |
||
1348 | filesystem is introduced. |
||
1349 | \item The file systems initialization functions have to register themselves |
||
1350 | calling the function filesystem\_register() with a struct file\_system\_type. |
||
1351 | \end{enumerate} |
||
1352 | |||
1353 | %---------------------------------------------------------------------------- |
||
1354 | \subsubsection{Struct file\_system\_type} |
||
1355 | \label{sec:filesystemtype} |
||
1356 | %---------------------------------------------------------------------------- |
||
1357 | |||
1358 | \begin{table} |
||
1359 | \begin{verbatim} |
||
1360 | struct file_system_type { |
||
1361 | const char *name; |
||
1362 | int fs_flags; |
||
1363 | __uint8_t fs_ind; |
||
1364 | struct super_block *(*read_super) ( |
||
1365 | __dev_t device, |
||
1366 | __uint8_t fs_ind, |
||
1367 | struct mount_opts *options |
||
1368 | ); |
||
1369 | }; |
||
1370 | \end{verbatim} |
||
1371 | \caption{struct file\_system\_type} |
||
1372 | \label{src:structfs} |
||
1373 | \end{table} |
||
1374 | |||
1375 | This structure (see Table \ref{src:structfs})stores the |
||
1376 | informations needed for the filesystem registration. Here a short |
||
1377 | description of its fields: |
||
1378 | |||
1379 | \begin{description} |
||
1380 | \item [name]Pointer to the filesystem name (e.g., ``FAT16''). |
||
1381 | \item [fs\_flags]Filesystem flags (currently not used). |
||
1382 | \item [fs\_ind]Filesystem ID (every filesystem has a different ID). |
||
1383 | \item [read\_super]This the filesystem initialization function. It is called |
||
1384 | when a partition is mounted to initialize the correspondent struct super\_block. |
||
1385 | \end{description} |
||
1386 | |||
1387 | %---------------------------------------------------------------------------- |
||
1388 | \subsection{Internal interface} |
||
1389 | %---------------------------------------------------------------------------- |
||
1390 | |||
1391 | This section describes the interface between the high layer (that |
||
1392 | handles the file systems), and the modules that implements a |
||
1393 | specific filesystem. |
||
1394 | |||
1395 | When a user mount a device into a directory using a filesystem, |
||
1396 | the system uses the file system's specific struct |
||
1397 | file\_system\_type, calling the function read\_super() to read the |
||
1398 | superblock of the specified device. |
||
1399 | |||
1400 | The struct super\_block contains a pointer to a structure that |
||
1401 | contains a set of ``virtual operations'' (see Section |
||
1402 | \ref{sec:fs:intstruct}), that implements the behavior of a |
||
1403 | specific filesystem. |
||
1404 | |||
1405 | %---------------------------------------------------------------------------- |
||
1406 | \subsubsection{struct super\_operations} |
||
1407 | \label{sec:superop} |
||
1408 | %---------------------------------------------------------------------------- |
||
1409 | |||
1410 | \begin{table} |
||
1411 | \begin{verbatim} |
||
1412 | struct super_operations { |
||
1413 | int (*init_inode) (struct inode *); |
||
1414 | int (*read_inode) (struct inode *); |
||
1415 | int (*write_inode) (struct inode *); |
||
1416 | int (*put_super) (struct super_block *); |
||
1417 | int (*delete_inode) (struct inode *); |
||
1418 | }; |
||
1419 | \end{verbatim} |
||
1420 | \caption{struct super\_operations} |
||
1421 | \label{src:structsuperop} |
||
1422 | \end{table} |
||
1423 | |||
1424 | The structure in Table \ref{src:structsuperop} is used for super |
||
1425 | block manipulation. All the functions should return 0 in case of |
||
1426 | success, or a value != 0 otherwise. |
||
1427 | |||
1428 | \begin{description} |
||
1429 | \item [init\_inode]This function initializes a struct inode passed as parameter. |
||
1430 | That structure must have the fields i\_sp and i\_st.st\_ino already initialized |
||
1431 | (see Section \ref{sec:inode}). |
||
1432 | \item [read\_inode]This function reads the inode passed as parameter from the |
||
1433 | filesystem. The inode structure must have the fields i\_sp and i\_st.st\_ino |
||
1434 | already initialized. |
||
1435 | \item [write\_inode]This function writes a (valid, full initialized) inode into |
||
1436 | the filesystem. |
||
1437 | \item [delete\_inode]This function removes the inode from the filesystem (the |
||
1438 | inode must already exist on the filesystem). |
||
1439 | \item [put\_super]This function writes the super block passed as parameter into |
||
1440 | the filesystem. This function is called when a filesystem is unmounted. A dual |
||
1441 | function (that is, a function that reads a super\_block from a filesystem) |
||
1442 | (which is only used when mounting the filesystem) can be found into the struct |
||
1443 | file\_system\_type (see Section |
||
1444 | \ref{sec:filesystemtype}). |
||
1445 | \end{description} |
||
1446 | |||
1447 | %---------------------------------------------------------------------------- |
||
1448 | \subsubsection{Struct dentry\_operations} |
||
1449 | %---------------------------------------------------------------------------- |
||
1450 | |||
1451 | \begin{table} |
||
1452 | \begin{verbatim} |
||
1453 | struct dentry_operations { |
||
1454 | int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); |
||
1455 | }; |
||
1456 | \end{verbatim} |
||
1457 | \caption{struct dentry\_operations} |
||
1458 | \label{src:structdentryop} |
||
1459 | \end{table} |
||
1460 | |||
1461 | \label{sec:dentryop} This structure contains all the functions |
||
1462 | used to manipulate a struct dentry. |
||
1463 | |||
1464 | \begin{description} |
||
1465 | \item [d\_compare]This function compares the two strings passed as |
||
1466 | parameter. it returns 0 if the two strings are equal, or != 0 |
||
1467 | otherwise. The filesystem do not use directly the strcmp() |
||
1468 | function, because filename comparison depends on the filesystem |
||
1469 | (e.g., case sensitive, case insensitive). |
||
1470 | \end{description} |
||
1471 | |||
1472 | %---------------------------------------------------------------------------- |
||
1473 | \subsubsection{struct inode\_operations} |
||
1474 | \label{sec:inodeop} |
||
1475 | %---------------------------------------------------------------------------- |
||
1476 | |||
1477 | \begin{table} |
||
1478 | \begin{verbatim} |
||
1479 | struct inode_operations { |
||
1480 | struct file_operations *default_file_ops; |
||
1481 | struct inode* (*create) (struct inode *, struct dentry *); |
||
1482 | struct inode* (*lookup) (struct inode *, struct dentry *); |
||
1483 | int (*unlink) (struct dentry *); |
||
1484 | void (*destroy) (struct inode *); |
||
1485 | int (*truncate) (struct inode *, __off_t len); |
||
1486 | }; |
||
1487 | \end{verbatim} |
||
1488 | \caption{struct inode\_operations} |
||
1489 | \label{src:structinodeop} |
||
1490 | \end{table} |
||
1491 | |||
1492 | This struct contains the virtual operations on the struct inode. |
||
1493 | These functions returns 0 in case of success, !=0 in case of |
||
1494 | error, that will be interpreted as a negate errno() value (e.g., |
||
1495 | -EPERM instead of EPERM). |
||
1496 | |||
1497 | \begin{description} |
||
1498 | \item [default\_file\_ops]these are the file\_operations described into the |
||
1499 | paragraph \ref{sec:fileop}, that must be used with the inode. By default, the |
||
1500 | field is initialized with the file\_operation of its filesystem. These pointers |
||
1501 | can be modified to implement a kind of virtual filesystem that works on a |
||
1502 | device. For example, we can define an inode ``/dev/ide/hda1'' in a way that all |
||
1503 | read/write operations on it are not mapped in the operations of the filesystem |
||
1504 | where it is stored. |
||
1505 | \item [create]This function creates a new inode (a new file) with the name |
||
1506 | passed as parameter into the struct inode passed as parameter (that must be an |
||
1507 | inode of type ``directory''). It returns the new inode or NULL in case of |
||
1508 | error. |
||
1509 | \item [lookup]This function searches an inode passed as parameter into another |
||
1510 | inode of type directory passed as parameter. It returns the inode just found or |
||
1511 | NULL in case of error. |
||
1512 | \item [unlink]This function unlinks the filename passed as parameter from the |
||
1513 | parameter of the corresponding inode. The inode will be removed (by generic |
||
1514 | filesystem routines) if it remains without links associated to it. |
||
1515 | \item [destroy]This function destroys the file linked to the inode. It frees the |
||
1516 | space occupied by the file. Often, it is sufficient to call the truncate |
||
1517 | function with length 0. |
||
1518 | \item [truncate]This function truncates the file length of the inode passed as |
||
1519 | parameter at the length specified in the second parameter. If the given length |
||
1520 | is greater than the actual file length, the function does nothing. |
||
1521 | \end{description} |
||
1522 | |||
1523 | %---------------------------------------------------------------------------- |
||
1524 | \subsubsection{struct file\_operations} |
||
1525 | \label{sec:fileop} |
||
1526 | %---------------------------------------------------------------------------- |
||
1527 | |||
1528 | \begin{table} |
||
1529 | \begin{verbatim} |
||
1530 | struct file_operations { |
||
1531 | __off_t (*lseek) (struct file *, __off_t, int); |
||
1532 | __ssize_t (*read) (struct file *, char *, __ssize_t); |
||
1533 | __ssize_t (*write) (struct file *, char *, __ssize_t); |
||
1534 | int (*readdir) (struct file *, void *); |
||
1535 | int (*open) (struct inode *, struct file *); |
||
1536 | int (*close) (struct inode *, struct file *); |
||
1537 | }; |
||
1538 | \end{verbatim} |
||
1539 | \caption{struct file\_operations} |
||
1540 | \label{src:structfileop} |
||
1541 | \end{table} |
||
1542 | |||
1543 | This structure contains all the function that must work with |
||
1544 | files. If not otherwise stated, the functions can operate both on |
||
1545 | files and on directory. |
||
1546 | |||
1547 | \begin{description} |
||
1548 | \item [lseek]Moves the input/output position of the file specified as parameter. |
||
1549 | The new position is specified as in the lseek primitive (described in |
||
1550 | \cite{book:posix1003.1} or in all the Unix programming books). That is, the |
||
1551 | position is specified with a delta with respect to a given position (start of |
||
1552 | file, end of file, current position); As described in section \ref{sec:file} |
||
1553 | during the description of the field f\_pos, the new position can be greater than |
||
1554 | the end-of-file. This function must return a position inside the file or an |
||
1555 | error code. |
||
1556 | \item [read]Reads some data and puts it into the buffer. The function is called |
||
1557 | only for regular files. It returns the number of bytes read, that can be less |
||
1558 | than requested, or an error code. |
||
1559 | \item [write]Writes some bytes from the specified buffer. It is called only for |
||
1560 | regular files. It returns the number of bytes written, that can be less than |
||
1561 | requested, or an error code. |
||
1562 | \item [readdir]Reads the next filename of the directory parameter passed through |
||
1563 | a struct file. It inserts it into the struct dirent passed as parameter. Returns |
||
1564 | |||
1565 | \item [open]Opens the file linked to the inode. Fills some internal fields in |
||
1566 | the filesystem-specific struct file. Returns 0 in case of success, an error |
||
1567 | otherwise. |
||
1568 | \item [close]Closes the file passed as parameter. returns 0 in case of success, |
||
1569 | an error in case of failure. |
||
1570 | \end{description} |
||
1571 | |||
1572 | Most of the function described in this section works on buffers |
||
1573 | passed by the user. the general routines checks that these buffer |
||
1574 | are corrects. In case the operating system implement some memory |
||
1575 | protection mechanisms, the functions copyfromuser() and |
||
1576 | copytouser() should be used (their syntax is similar to memcpy()). |
||
1577 | Note that Shark currently do not provide memory protection, so |
||
1578 | memcpy() can be used. |
||
1579 | |||
1580 | %---------------------------------------------------------------------------- |
||
1581 | \subsection{External interface} |
||
1582 | %---------------------------------------------------------------------------- |
||
1583 | |||
1584 | %---------------------------------------------------------------------------- |
||
1585 | \subsubsection{Non standard functions} |
||
1586 | \label{sec:fs:mount} |
||
1587 | %---------------------------------------------------------------------------- |
||
1588 | |||
1589 | \begin{table} |
||
1590 | \begin{verbatim} |
||
1591 | int mount (dev_t device, u_int8_t fsind, char *where, struct mount_opts |
||
1592 | *options); |
||
1593 | int umount (dev_t device); |
||
1594 | dev_t fdevice(char *device_name); |
||
1595 | \end{verbatim} |
||
1596 | \caption{mount/umount functions} |
||
1597 | \label{tab:mount} |
||
1598 | \end{table} |
||
1599 | |||
1600 | The mount/umount functions are the non-standard functions provided |
||
1601 | by the filesystem (their prototype can be found in |
||
1602 | include/sys/mount.h). Here is a small description of these |
||
1603 | functions (see Table \ref{tab:mount}): |
||
1604 | |||
1605 | \begin{description} |
||
1606 | \item [fdevice]This function returns the device serial number of a |
||
1607 | given filename. See Section \ref{sec:dd:interfaccia:nomi}. |
||
1608 | \item [mount]Is used to mount a partition. These are the informations that must |
||
1609 | be passed to it: |
||
1610 | |||
1611 | \begin{itemize} |
||
1612 | \item The device serial number that must be mounted |
||
1613 | \item The filesystem identifier (fsind) that should be used (see |
||
1614 | include/fs/fsind.h). |
||
1615 | \item The directory where the device will be mounted. |
||
1616 | \item Other parameters (see Section \ref{sec:mountopt}). |
||
1617 | \end{itemize} |
||
1618 | |||
1619 | The function returns 0 in case of success, != 0 in case of failure |
||
1620 | (errno is modified). |
||
1621 | |||
1622 | \item [umount]This function unmount a device that was previously |
||
1623 | mounted. This function cannot unmount the ``root'' device. |
||
1624 | During shutdown, all the filesystem will be unmounted in the |
||
1625 | proper order. |
||
1626 | \end{description} |
||
1627 | |||
1628 | %---------------------------------------------------------------------------- |
||
1629 | \subsection{Initialization: an example} |
||
1630 | %---------------------------------------------------------------------------- |
||
1631 | |||
1632 | This is an example of the filesystem initialization in S.Ha.R.K. |
||
1633 | To initialize the system, we must register the kernel modules |
||
1634 | first: |
||
1635 | |||
1636 | \begin{verbatim} |
||
1637 | TIME __kernel_register_levels__(void *arg)} { |
||
1638 | struct multiboot_info *mb=(struct multiboot_info*)arg; |
||
1639 | |||
1640 | EDF_register_level(EDF_ENABLE_ALL); |
||
1641 | RR_register_level(RRTICK, RR_MAIN_YES, mb); |
||
1642 | CBS_register_level(CBS_ENABLE_ALL, 0); |
||
1643 | dummy_register_level(); |
||
1644 | |||
1645 | SEM_register_module(); |
||
1646 | CABS_register_module(); |
||
1647 | NOPM_register_module(); |
||
1648 | return TICK; |
||
1649 | } |
||
1650 | \end{verbatim} |
||
1651 | |||
1652 | The filesystem always require the registration of a module that |
||
1653 | can accept soft aperiodic tasks (such as the CBS, for example). |
||
1654 | Moreover, a mutex protocol must be present (NOPM, in this case). |
||
1655 | |||
1656 | \begin{verbatim} |
||
1657 | TASK __init__(void *arg) { |
||
1658 | extern int __bdev_sub_init(void); |
||
1659 | extern int __fs_sub_init(void); |
||
1660 | struct multiboot_info *mb = (struct multiboot_info*)arg; |
||
1661 | |||
1662 | /* block devices initialization */ |
||
1663 | __bdev_sub_init(); |
||
1664 | |||
1665 | /* filesystem initialization */ |
||
1666 | __fs_sub_init(); |
||
1667 | __call_main__(mb); |
||
1668 | return (void *)0; |
||
1669 | } |
||
1670 | \end{verbatim} |
||
1671 | |||
1672 | This is a typical S.Ha.R.K. \_\_init\_\_ function. In this |
||
1673 | function, block devices and file systems should be initialized. |
||
1674 | |||
1675 | \begin{verbatim} |
||
1676 | int __bdev_subinit(void) { |
||
1677 | extern __dev_t root_device; |
||
1678 | BDEV_PARMS bdev=BASE_BDEV; |
||
1679 | |||
1680 | /* low level initialization */ |
||
1681 | bdev_def_showinfo(bdev,TRUE); |
||
1682 | bdev_init(&bdev); |
||
1683 | |||
1684 | /* root device specification */ |
||
1685 | root_device = bdev_scan_devices(choose_root_callback); |
||
1686 | if (root_device < 0) { |
||
1687 | |||
1688 | /* in case of error... */ |
||
1689 | exit(1); |
||
1690 | } |
||
1691 | return 0; |
||
1692 | } |
||
1693 | \end{verbatim} |
||
1694 | |||
1695 | |||
1696 | This function initializes the block device layer. This function |
||
1697 | sets the parameters of the bdev variable, calls the low level |
||
1698 | initialization functions and searches for the \emph{root device}; |
||
1699 | basically, the function must specify in some way which is the root |
||
1700 | device. In the function, the low level layer lists all the |
||
1701 | available devices, calling the function choose\_root\_callback() |
||
1702 | for each of them. That function will choose the root device. |
||
1703 | |||
1704 | \begin{verbatim} |
||
1705 | int choose_root_callback(dev_t dev,u_int8_t fs) { |
||
1706 | if (fs==FS_MSDOS) |
||
1707 | return dev; |
||
1708 | return -1; |
||
1709 | } |
||
1710 | \end{verbatim} |
||
1711 | |||
1712 | In this case, the choice of the root device is quite simple: the |
||
1713 | first FAT16 DOS partition becomes the root device. Usually, |
||
1714 | DOS calls that partition ``C:''. |
||
1715 | |||
1716 | \begin{verbatim} |
||
1717 | int __fs_sub_init(void) { |
||
1718 | extern __dev_t root_device; |
||
1719 | FILESYSTEM_PARMS fs = BASE_FILESYSTEM; |
||
1720 | struct mount_opts opts; |
||
1721 | |||
1722 | /* mount parameters */ |
||
1723 | memset(&opts, 0, sizeof(struct mount_opts)); |
||
1724 | opts.flags = MOUNT_FLAG_RW; |
||
1725 | |||
1726 | /* filesystem initialization */ |
||
1727 | filesystem_def_rootdevice(fs,root_device); |
||
1728 | filesystem_def_fs(fs,FS_MSDOS); |
||
1729 | filesystem_def_showinfo(fs,TRUE); |
||
1730 | filesystem_init(&fs); |
||
1731 | |||
1732 | /* C library initialization */ |
||
1733 | libc_initialize(); |
||
1734 | |||
1735 | return 0; |
||
1736 | } |
||
1737 | \end{verbatim} |
||
1738 | |||
1739 | This functions initializes the filesystem giving R/W permission on |
||
1740 | the root partition previously found. Then, the C library is |
||
1741 | initialized. |
||
1742 | |||
1743 | %---------------------------------------------------------------------------- |
||
1744 | \section{Internal structure} |
||
1745 | \label{sec:fs:intstruct} |
||
1746 | %---------------------------------------------------------------------------- |
||
1747 | |||
1748 | Many data structures are composed by two parts: a generic part, |
||
1749 | that is independent from the particular filesystem, and a |
||
1750 | filesystem specific part, that are usually stored into an union. |
||
1751 | |||
1752 | Almost all the data structures have a set of functions that |
||
1753 | modifies them. Pointers to these functions are included in a |
||
1754 | structure that has the name of the structure that is modified by |
||
1755 | the functions with a suffix ``operations''. For example, the |
||
1756 | structure super\_block (see Section \ref{sec:superblock}) has a |
||
1757 | structure super\_operations (see Section \ref{sec:superop}) that |
||
1758 | contains a function create\_inode. |
||
1759 | |||
1760 | Since the high layer functions influences the global system |
||
1761 | throughput, some informations are duplicated in the data |
||
1762 | structures to speed up the execution. |
||
1763 | |||
1764 | The system is basically structured with a Unix-like approach, with |
||
1765 | 4 fundamental structures: |
||
1766 | |||
1767 | \begin{description} |
||
1768 | \item [super]contains informations on each device that is mounted (a table is |
||
1769 | used); |
||
1770 | \item [inode]contains informations on the files in a filesystem (double lists |
||
1771 | with hash keys are used); |
||
1772 | \item [dentry]contains the file names (a tree is used); |
||
1773 | \item [file]contains informations on each open file (a descriptor table is |
||
1774 | used). |
||
1775 | \end{description} |
||
1776 | |||
1777 | %---------------------------------------------------------------------------- |
||
1778 | \subsection{The data structures} |
||
1779 | %---------------------------------------------------------------------------- |
||
1780 | |||
1781 | %---------------------------------------------------------------------------- |
||
1782 | \subsubsection{struct super\_block} |
||
1783 | \label{sec:superblock} |
||
1784 | %---------------------------------------------------------------------------- |
||
1785 | |||
1786 | From a user point of view, the filesystem exports a unix-like view |
||
1787 | of the system. That is, partitions can be mounted into the |
||
1788 | directory tree. When the system starts, the root directory is |
||
1789 | mounted as ``/''. |
||
1790 | |||
1791 | \begin{table} |
||
1792 | \begin{verbatim} |
||
1793 | struct super_block { |
||
1794 | struct super_block *sb_parent; |
||
1795 | struct super_block *sb_childs; |
||
1796 | struct super_block *sb_next; |
||
1797 | __dev_t sb_dev; |
||
1798 | struct inode *sb_root; |
||
1799 | struct dentry *sb_droot; |
||
1800 | struct file_system_type *sb_fs; |
||
1801 | struct super_operations *sb_op; |
||
1802 | struct dentry_operations *sb_dop; |
||
1803 | struct mount_opts sb_mopts; |
||
1804 | int sb_busycount; |
||
1805 | __uint32_t sb_used:1; |
||
1806 | __uint32_t sb_blocked:1; |
||
1807 | union { |
||
1808 | struct msdos_super_block msdos_sb; |
||
1809 | } u; |
||
1810 | }; |
||
1811 | \end{verbatim} |
||
1812 | \caption{struct super\_block} |
||
1813 | \label{src:structsuper} |
||
1814 | \end{table} |
||
1815 | |||
1816 | The struct super\_block contains the fundamental information of |
||
1817 | the filesystem (a unix-like filesystem usually duplicates this |
||
1818 | informations a few times into the Hard disk). Every time a |
||
1819 | partition is mounted a new structure is allocated and initialized. |
||
1820 | These structures are maintained in a tree, that is, if a partition |
||
1821 | is mounted into a directory the new structure becomes a leaf of |
||
1822 | the structure that contains the mount directory. |
||
1823 | |||
1824 | \begin{figure} |
||
1825 | \begin{center} |
||
1826 | \includegraphics[width=7cm]{images/supertree.eps} |
||
1827 | \end{center} |
||
1828 | \caption{Super block tree} |
||
1829 | \label{fig:supertree} |
||
1830 | \end{figure} |
||
1831 | |||
1832 | The root super block is stored into the global pointer |
||
1833 | root\_superblock; Figure \ref{fig:supertree} shows a possible |
||
1834 | super block tree. |
||
1835 | |||
1836 | Here a short description of the struct super\_block fields (see |
||
1837 | Table \ref{src:structsuper}): |
||
1838 | |||
1839 | \begin{description} |
||
1840 | \item [sb\_parent,~sb\_childs,~sb\_next]are used to store the tree structure: |
||
1841 | sb\_parent points to the father of the structure, whereas sb\_childs points to |
||
1842 | the first structure of the childs; sb\_next is used to create the brothers list |
||
1843 | (that is the children list of the father). |
||
1844 | \item [sb\_dev]Contains the device, that is the partition that contains the |
||
1845 | filesystem (the low level part is responsible of the devices, that are typically |
||
1846 | associated to an hard disk partition. It is possible for example to extend the |
||
1847 | low level to associate to a device other entities, such a file on another |
||
1848 | filesystem (this is usually possible on Unix system through loopback devices). |
||
1849 | \item [sb\_root]This is the root directory inode of the filesystem. |
||
1850 | \item [sb\_droot]This is the directory entry (the filename) of the root |
||
1851 | directory. |
||
1852 | \item [sb\_fs]This is the pointer to the filesystem present in the filesystem. |
||
1853 | Every supported file system is registered at init time (see Section |
||
1854 | \ref{sec:fs:int:init}). |
||
1855 | \item [sb\_op]Pointer to a structure that contains the specific filesystem |
||
1856 | functions used to use the struct super\_block. |
||
1857 | \item [sb\_dop]These are the specific function used for the directory entries. |
||
1858 | \item [sb\_mopts]Mount options used during initialization. |
||
1859 | \item [sb\_busycount]This is a counter that counts the number of entities that |
||
1860 | are using the filesystem (e.g., a partition can be unmounted if someone is |
||
1861 | reading it). |
||
1862 | \item [sb\_used]A flag that signals if the structure is used. |
||
1863 | \item [sb\_blocked]Some operations on this structure must be executed in mutual |
||
1864 | exclusion; if this flag is set there is some operation on the structure. This |
||
1865 | flag can not be used/tested/modified directly by the user functions. |
||
1866 | \end{description} |
||
1867 | |||
1868 | %---------------------------------------------------------------------------- |
||
1869 | \subsubsection{struct dentry} |
||
1870 | %---------------------------------------------------------------------------- |
||
1871 | |||
1872 | \begin{table} |
||
1873 | \begin{verbatim} |
||
1874 | struct dentry { |
||
1875 | struct dentry *d_next; |
||
1876 | struct dentry *d_prev; |
||
1877 | struct dentry *d_parent; |
||
1878 | struct dentry *d_child; |
||
1879 | __time_t d_acc; |
||
1880 | struct qstr d_name; |
||
1881 | short d_lock; |
||
1882 | struct dentry_operations *d_op; |
||
1883 | struct inode *d_inode; |
||
1884 | struct super_block *d_sb; |
||
1885 | }; |
||
1886 | \end{verbatim} |
||
1887 | \caption{struct dentry} |
||
1888 | \label{src:structdentry} |
||
1889 | \end{table} |
||
1890 | |||
1891 | \label{sec:dentry} The struct dentry contains the file names independently from |
||
1892 | the file type (regular file, directory, ...) \footnote{``dentry'' means |
||
1893 | \emph{D}irectory \emph{ENTRY}.}. Each file in the filesystem is always stored in |
||
1894 | two structures: |
||
1895 | |||
1896 | \begin{itemize} |
||
1897 | \item struct dentry contains the filename; |
||
1898 | \item struct inode contains the other informations on the file. |
||
1899 | \end{itemize} |
||
1900 | |||
1901 | This separation is made because in most file systems (not in the DOS FAT16) |
||
1902 | more than one name can be linked to the same file (inode). |
||
1903 | |||
1904 | \begin{figure} |
||
1905 | \begin{center} |
||
1906 | \includegraphics[width=10cm]{images/dentrytree.eps} |
||
1907 | \end{center} |
||
1908 | \caption{Directory entry tree.} |
||
1909 | \label{fig:dentrytree} |
||
1910 | \end{figure} |
||
1911 | |||
1912 | |||
1913 | These data structures are stored internally in a tree structure. Please note |
||
1914 | that the inodes are not structured as a tree; every dentry has a pointer to the |
||
1915 | correspondent inode. |
||
1916 | |||
1917 | Figure \ref{fig:dentrytree} shows a directory entry tree. |
||
1918 | Directories are showed with a continuous line, whereas files are |
||
1919 | showed with a dashed line (note that the internal structures for |
||
1920 | files and directories are the same). |
||
1921 | |||
1922 | The system do not load all the file names in the filesystem. Only |
||
1923 | the recent filenames are stored in memory in a partial tree. These |
||
1924 | informations are continuously updated removing and adding dentry |
||
1925 | struct. |
||
1926 | |||
1927 | \begin{table} |
||
1928 | \begin{verbatim} |
||
1929 | struct qstr { |
||
1930 | char name[MAXFILENAMELEN+1]; |
||
1931 | char *nameptr; |
||
1932 | }; |
||
1933 | \end{verbatim} |
||
1934 | \caption{struct qstr} |
||
1935 | \label{src:structqstr} |
||
1936 | \end{table} |
||
1937 | |||
1938 | Directory names are stored into a struct qstr (see Table |
||
1939 | \ref{src:structqstr}). A string is not used to avoid too much |
||
1940 | sting copies. The structure is composed by two fields: |
||
1941 | |||
1942 | \begin{description} |
||
1943 | \item [nameptr]If != NULL, it is the name to use. |
||
1944 | \item [name]If nameptr==NULL, it is the name to use. |
||
1945 | \end{description} |
||
1946 | |||
1947 | The QSTRNAME can be used on a structure QSTR to get a (char *). |
||
1948 | When filling the structure, one of the two fields can be used. If |
||
1949 | nameptr is used, the pointed string must be statically allocated. |
||
1950 | If name is used, nameptr must be set to NULL. |
||
1951 | |||
1952 | The fields of the struct dentry contains the following |
||
1953 | informations: |
||
1954 | |||
1955 | \begin{description} |
||
1956 | \item [d\_next,~d\_prev,~d\_parent~e~d\_child]can be used to store a dentry tree |
||
1957 | structure: d\_parent is a pointer to the father's dentry, d\_child is a pointer |
||
1958 | to a list of children dentry, d\_next and d\_prev are used to navigate into the |
||
1959 | children dentry list. |
||
1960 | \item [d\_acc]It is the time of the last access to the dentry structure. |
||
1961 | \item [d\_name]Is the name associated to the dentry. |
||
1962 | \item [d\_lock]is a blocking counter: it is incremented every time a routine |
||
1963 | needs to use the dentry, and it is decremented when it is no more needed. In |
||
1964 | that way, it is possible to know which are the structures currently in use. The |
||
1965 | filesystem routines should not directly modify this field. |
||
1966 | \item [d\_op]Pointer to the virtual operations used to handle this structure. |
||
1967 | \item [d\_inode]is the inode associated to the structure. |
||
1968 | \item [d\_sb]It is the super\_block that contains the directory. This a |
||
1969 | redundant information used to speed-up the code. |
||
1970 | \end{description} |
||
1971 | |||
1972 | %---------------------------------------------------------------------------- |
||
1973 | \subsubsection{struct inode} |
||
1974 | \label{sec:inode} |
||
1975 | %---------------------------------------------------------------------------- |
||
1976 | |||
1977 | \begin{figure} |
||
1978 | \begin{center} |
||
1979 | \includegraphics[width=10cm]{images/inodebuckets.eps} |
||
1980 | \end{center} |
||
1981 | \caption{Inode data structures} |
||
1982 | \label{fig:inodebuckets} |
||
1983 | \end{figure} |
||
1984 | |||
1985 | |||
1986 | \begin{table} |
||
1987 | \begin{verbatim} |
||
1988 | struct inode { |
||
1989 | struct stat i_st; |
||
1990 | struct inode *i_next; |
||
1991 | struct inode *i_prev; |
||
1992 | int i_hash; |
||
1993 | __rwlock_t i_lock; |
||
1994 | __uint16_t i_dlink; |
||
1995 | __uint32_t i_dirty:1; |
||
1996 | struct inode_operations *i_op; |
||
1997 | struct super_block *i_sb; |
||
1998 | union { |
||
1999 | struct msdos_inode_info msdos_i; |
||
2000 | } u; |
||
2001 | }; |
||
2002 | \end{verbatim} |
||
2003 | \caption{struct inode} |
||
2004 | \label{src:structinode} |
||
2005 | \end{table} |
||
2006 | |||
2007 | An inode is a file into a filesystem. It contains all the file |
||
2008 | informations except the name that is maintained into a struct |
||
2009 | dentry (see \ref{sec:dentry}; more details can be found into |
||
2010 | \cite{book:unix}). |
||
2011 | |||
2012 | All the inodes temporarily present in memory are stored into an |
||
2013 | hash bucket structure as showed in Figure \ref{fig:inodebuckets}: |
||
2014 | for each device a number is computed using an hash function, the |
||
2015 | inode number (file serial number) and the device number. All the |
||
2016 | inodes with the same hash entry are stored in a list. The first |
||
2017 | inode in the list is stored in a table at the index given by the |
||
2018 | hash key. |
||
2019 | |||
2020 | \begin{table} |
||
2021 | \begin{verbatim} |
||
2022 | struct stat { |
||
2023 | __dev_t st_dev; |
||
2024 | __ino_t st_ino; |
||
2025 | __mode_t st_mode; |
||
2026 | __nlink_t st_nlink; |
||
2027 | __uid_t st_uid; |
||
2028 | __gid_t st_gid; |
||
2029 | __off_t st_size; |
||
2030 | unsigned long st_blksize; |
||
2031 | __time_t st_atime; |
||
2032 | __time_t st_mtime; |
||
2033 | __time_t st_ctime; |
||
2034 | }; |
||
2035 | \end{verbatim} |
||
2036 | \caption{struct stat} |
||
2037 | \label{src:structstat} |
||
2038 | \end{table} |
||
2039 | |||
2040 | An inode contains the following fields: |
||
2041 | |||
2042 | \begin{description} |
||
2043 | \item [i\_st]Contains all the standard informations for a file (e.g., the |
||
2044 | length, the type); a task can inspect these informations using the stat() |
||
2045 | primitive. These informations are included into a stat structure that is shared |
||
2046 | between the user libraries and the internal implementation, in a way that a |
||
2047 | memcopy should be enough to pass these informations. |
||
2048 | \item [i\_next,~i\_prev~e~i\_hash]These fields are used to implement the double |
||
2049 | linked list and the hash key: i\_next and i\_prev implement a double list, |
||
2050 | whereas i\_hash is the hash key (redundant information). |
||
2051 | \item [i\_lock]It is a lock for the structure (see Section \ref{sec:rwlock}). |
||
2052 | \item [i\_dlink]Number of directory entries that points to the inode (see |
||
2053 | Section \ref{sec:dentry}). |
||
2054 | \item [i\_dirty]A flag that says if this inode has been modified. |
||
2055 | \item [i\_op]Virtual operations used for working with this inode. Usually they |
||
2056 | are filesystem specific. |
||
2057 | \item [i\_sb]Super block where the inode is stored (see Section |
||
2058 | \ref{sec:superblock}). |
||
2059 | \end{description} |
||
2060 | |||
2061 | The struct stat showed in Table \ref{src:structstat}, contained |
||
2062 | into the struct inode and described in the paragraph 5.6 of |
||
2063 | \cite{book:posix1003.1}, contains the following fields: |
||
2064 | |||
2065 | \begin{description} |
||
2066 | \item [st\_mode]It is the file mode (informations about who can |
||
2067 | read/modify/execute a file). |
||
2068 | \item [st\_ino]An unique number that identifies a file (``file serial number'' |
||
2069 | in the Posix terminology). |
||
2070 | \item [st\_dev]It is the device serial number; st\_dev and st\_ino are necessary |
||
2071 | and sufficient to uniquely identify a file in the system. |
||
2072 | \item [st\_nlink]Number of hard links to a file (Always 1 for FAT16 file |
||
2073 | systems). |
||
2074 | \item [st\_uid~e~st\_gid]File group and user identifiers. 0 is used for the |
||
2075 | system administrator. The FAT16 filesystem always set these fields to 0. |
||
2076 | \item [st\_size]File length (bytes). |
||
2077 | \item [st\_atime,~st\_mtime~e~st\_ctime]File times: st\_atime is the time of the |
||
2078 | last access, st\_mtime is the time of the last modification and st\_ctime is the |
||
2079 | time of the last state change (e.g., when the file has been created or modified |
||
2080 | using chmod). |
||
2081 | \end{description} |
||
2082 | |||
2083 | %---------------------------------------------------------------------------- |
||
2084 | \subsubsection{struct file} |
||
2085 | \label{sec:file} |
||
2086 | %---------------------------------------------------------------------------- |
||
2087 | |||
2088 | \begin{table} |
||
2089 | \begin{verbatim} |
||
2090 | struct file { |
||
2091 | struct file *f_next; |
||
2092 | struct dentry *f_dentry; |
||
2093 | struct file_operations *f_op; |
||
2094 | __loff_t f_pos; |
||
2095 | unsigned int f_count; |
||
2096 | __uint32_t f_flag_isdir:1; |
||
2097 | union { |
||
2098 | struct msdos_file_info msdos_f; |
||
2099 | } u; |
||
2100 | }; |
||
2101 | \end{verbatim} |
||
2102 | \caption{struct file} |
||
2103 | \label{src:structfile} |
||
2104 | \end{table} |
||
2105 | |||
2106 | Every open file has a struct file associated to it (see Table |
||
2107 | \ref{src:structfile}). |
||
2108 | |||
2109 | \begin{description} |
||
2110 | \item [f\_next]Pointer used internally to the filesystem modules. |
||
2111 | \item [f\_dentry]contains the filename (see Section \ref{sec:dentry}). From here |
||
2112 | we can get the corresponding inode number. |
||
2113 | \item [f\_op]Contains a pointer to the virtual operation that works on struct |
||
2114 | file, described in Section \ref{sec:fileop}. |
||
2115 | \item [f\_pos]Contains the current position into the file. read/write operations |
||
2116 | are done starting from this point; moreover, Posix specification allow a seek to |
||
2117 | a position beyond file termination: in that case read operation should fail, |
||
2118 | whereas write operation should fill the gap with zeroes. |
||
2119 | \item [f\_count]Contains the number of file descriptors that are using this |
||
2120 | structure for input/output operations on the file. |
||
2121 | \item [f\_flag\_isdir]Flag that says if this is a directory (redundant |
||
2122 | information). |
||
2123 | \item [msdos\_f]FAT16 specific informations. |
||
2124 | \end{description} |
||
2125 | |||
2126 | The applications use a file descriptor to index the file to use. A |
||
2127 | file descriptor is a number meaningful only for the processor that |
||
2128 | obtained it (e.g. through an open operation) that is assigned to a |
||
2129 | struct file through a descriptor table. Every process has a |
||
2130 | different table, contained into the struct file\_descriptors |
||
2131 | described in section \ref{sec:filedes}. The whole S.Ha.R.K. Kernel |
||
2132 | can be considered a monoprocess multithread application, and for |
||
2133 | that reason it has only one descriptor table. |
||
2134 | |||
2135 | %---------------------------------------------------------------------------- |
||
2136 | \subsubsection{struct file\_descriptors} |
||
2137 | \label{sec:filedes} |
||
2138 | %---------------------------------------------------------------------------- |
||
2139 | |||
2140 | \begin{table} |
||
2141 | \begin{verbatim} |
||
2142 | struct file_descriptors { |
||
2143 | struct file *fd_file[MAXOPENFILES]; |
||
2144 | __uint16_t fd_flags[MAXOPENFILES]; |
||
2145 | __uint32_t fd_fflags[MAXOPENFILES]; |
||
2146 | __mode_t fd_umask; |
||
2147 | char fd_cwd[MAXPATHNAMELEN+1]; |
||
2148 | struct dentry *fd_cwden; |
||
2149 | __mutex_t fd_mutex; |
||
2150 | }; |
||
2151 | \end{verbatim} |
||
2152 | \caption{struct file\_descriptors} |
||
2153 | \label{src:structfiledes} |
||
2154 | \end{table} |
||
2155 | |||
2156 | This struct is used to store all the general parameters assigned |
||
2157 | to a process (e.g., the current directory). Again, S.Ha.R.K. has |
||
2158 | only one structure because it can be considered like a monoprocess |
||
2159 | multithread system. A mutex is used to guarantee mutual exclusion |
||
2160 | when the struct is used by more than one thread. The structure |
||
2161 | contains the following fields: |
||
2162 | |||
2163 | \begin{description} |
||
2164 | \item [fd\_file]This is the descriptor table; every descriptor |
||
2165 | (with value between 0 and MAXOPENFILES) is associated an element |
||
2166 | that can have the following values: |
||
2167 | |||
2168 | \begin{itemize} |
||
2169 | \item NULL if it is a free descriptor; |
||
2170 | \item RESERVED if it is reserved for special uses (e.g., descriptors assigned to |
||
2171 | the keyboard or to the console) |
||
2172 | \item pointer to the file |
||
2173 | \end{itemize} |
||
2174 | |||
2175 | \item [fd\_flags]Flags associated to the file (e.g., FD\_CLOSEXEC), described in |
||
2176 | detail in \cite{book:posix1003.1}, when the fcntl() primitive is described. |
||
2177 | \item [fd\_fflags]Flag used when the file was opened (e.g., O\_APPEND, |
||
2178 | O\_RDONLY, etc...). |
||
2179 | \item [fd\_umask]Mask used during file creation;can be modified with the |
||
2180 | function umask(). |
||
2181 | \item [fd\_cwd]Current working directory used by all the primitives that |
||
2182 | receives a relative pathname (that does not start with ``/'' (redundant |
||
2183 | information). |
||
2184 | \item [fd\_cwden]The current working directory; it can be modified using the |
||
2185 | function chdir(). |
||
2186 | \item [fd\_mutex]Mutual exclusion semaphore used for concurrent multithread |
||
2187 | access. |
||
2188 | \end{description} |
||
2189 | |||
2190 | %---------------------------------------------------------------------------- |
||
2191 | \subsection{Disk Cache} |
||
2192 | \label{sec:diskcache} |
||
2193 | %---------------------------------------------------------------------------- |
||
2194 | |||
2195 | %---------------------------------------------------------------------------- |
||
2196 | \subsubsection{How to use the cache} |
||
2197 | %---------------------------------------------------------------------------- |
||
2198 | |||
2199 | The filesystem modules use the disk cache to get all the |
||
2200 | informations they need. |
||
2201 | |||
2202 | \begin{table} |
||
2203 | \begin{verbatim} |
||
2204 | dcache_t *dcache_lock (__dev_t dev, __blkcnt_t lsector); |
||
2205 | void dcache_unlock (dcache_t *); |
||
2206 | dcache_t *dcache_acquire (__dev_t dev, __blkcnt_t lsector); |
||
2207 | void dcache_release (dcache_t *); |
||
2208 | void dcache_dirty (dcache_t *d); |
||
2209 | void dcache_skipwt(dcache_t *d); |
||
2210 | \end{verbatim} |
||
2211 | \caption{Disk cache interface} |
||
2212 | \label{tab:cacheint} |
||
2213 | \end{table} |
||
2214 | |||
2215 | The disk cache interface is showed in Table \ref{tab:cacheint}: |
||
2216 | |||
2217 | \begin{description} |
||
2218 | \item [dcache\_lock]This function can be used to require a read only access to |
||
2219 | the cache. More than one thread can lock a block. the parameters are the device |
||
2220 | number and the sector number; it returns a pointer to the struct dcache\_t in |
||
2221 | case of success, or NULL in case of error. This function do not return until the |
||
2222 | access is granted or an error occurred. To avoid critical blocking times in case |
||
2223 | more than one block is needed, you must use a proper access protocol: the |
||
2224 | blocking must be done with increasing block numbers; moreover, if possible, only |
||
2225 | one block should be blocked at a time. |
||
2226 | \item [dcache\_unlock]This function can be used to unblock a cache entry |
||
2227 | previously blocked. |
||
2228 | \item [dcache\_acquire]This function behaves as dcache\_lock(), but it requires |
||
2229 | a read/write access. Only one task at a time can acquire the lock. This access |
||
2230 | do not imply that the cache entry is considered dirty (see the dcache\_dirty() |
||
2231 | function). |
||
2232 | \item [dcache\_release]This function is similar to dcache\_unlock(), and works |
||
2233 | with the locks acquired by dcache\_acquire(). |
||
2234 | \item [dcache\_dirty]Signal that the cache entry has been modified. This |
||
2235 | function must be called only if we own a read/write lock and if the cache has |
||
2236 | been modified. |
||
2237 | \item [dcache\_skipwt]This function signals to the system that the cache entry |
||
2238 | contains system informations that are often modified. That means that the |
||
2239 | write\_through flag should not be considered. See section \ref{sec:modicache} |
||
2240 | for a detailed description. |
||
2241 | \end{description} |
||
2242 | |||
2243 | %---------------------------------------------------------------------------- |
||
2244 | \subsubsection{struct dcache} |
||
2245 | \label{sec:dcache} |
||
2246 | %---------------------------------------------------------------------------- |
||
2247 | |||
2248 | \begin{figure} |
||
2249 | \begin{center} |
||
2250 | \includegraphics[width=10cm]{images/dcachebuckets.eps} |
||
2251 | \end{center} |
||
2252 | \caption{Data structure for disk cache} |
||
2253 | \label{fig:dcachebuckets} |
||
2254 | \end{figure} |
||
2255 | |||
2256 | \begin{table} |
||
2257 | \begin{verbatim} |
||
2258 | typedef struct TAGdcache { |
||
2259 | int prev; |
||
2260 | int next; |
||
2261 | int hash; |
||
2262 | __dev_t device; |
||
2263 | __blkcnt_t lsector; |
||
2264 | __uint8_t buffer[MAXSECTORSIZE]; |
||
2265 | int used; |
||
2266 | __time_t time; |
||
2267 | int numblocked; |
||
2268 | __fs_sem_t sync; |
||
2269 | __fs_sem_t esync; |
||
2270 | __rwlock_t rwlock; |
||
2271 | __uint32_t dirty:1; |
||
2272 | __uint32_t ready:1; |
||
2273 | __uint32_t esyncreq:1; |
||
2274 | __uint32_t skipwt:1; |
||
2275 | } dcache_t; |
||
2276 | \end{verbatim} |
||
2277 | \caption{struct dcache} |
||
2278 | \label{str:structdcache} |
||
2279 | \end{table} |
||
2280 | |||
2281 | This data structure, used by the module that implements the data |
||
2282 | cache, represents a cache entry. All the high level functions uses |
||
2283 | the cache module as an interface to the lower layer. |
||
2284 | |||
2285 | These structures are maintained in the system wit a structure |
||
2286 | similar to the inodes, that is a set of double linked lists that I |
||
2287 | can access using an hash key, as showed in Figure |
||
2288 | \ref{fig:dcachebuckets}; the hash key is computed using the device |
||
2289 | (that is the physical peripheral) and the lsector (the logical |
||
2290 | sector). |
||
2291 | |||
2292 | Every struct dcache is an elementary block that can be transferred |
||
2293 | to a device. It is not possible to have 2 data structures that |
||
2294 | refers to the same block. That is, the pair (device,lsector) must |
||
2295 | uniquely identify the data sector on the physical device (e.g., |
||
2296 | the block 64 in ``ide/hda'' could be the block 0 of |
||
2297 | ``ide/hda1'', as explained in section |
||
2298 | \ref{sec:dd:interfaccia:nomi}). |
||
2299 | |||
2300 | The field of the structure (see Table \ref{str:structdcache}) are: |
||
2301 | |||
2302 | \begin{description} |
||
2303 | \item [prev,~next~e~hash]Are used to store the structures into the lists that |
||
2304 | are accessed using the hash key: next and prev are used to implement a double |
||
2305 | linked list, whereas hash is the hash key. |
||
2306 | \item [device,~lsector~e~buffer]These fields contain the physical device, the |
||
2307 | logical sector and the data contained in the corresponding physical block. These |
||
2308 | are the unique fields that a high level routine should use (and that shall be |
||
2309 | provided by a data cache). |
||
2310 | \item [used]The number of tasks that are using the block; -1 means that no one |
||
2311 | is using the block, and that the block is free. |
||
2312 | \item [time]The last block access time. |
||
2313 | \item [numblocked]It is the number of tasks that are waiting the availability of |
||
2314 | the block, (that is, they are waiting the flag ready to be set). That is, if a |
||
2315 | task asks for a block already asked by another task, but not yet available (not |
||
2316 | made available by the lower layer), then the task blocks on a synchronization |
||
2317 | semaphore, and it increments that field. |
||
2318 | \item [sync]This is the synchronization semaphore used by the blocked tasks that |
||
2319 | waits the setting of the ready flag. |
||
2320 | \item [esync]It is a synchronization semaphore on the errors needed during the |
||
2321 | cache ``purging'' phase (see later). |
||
2322 | \item [rwlock]This is a locker used to require the R/W access to the block (see |
||
2323 | Section \ref{sec:rwlock}). |
||
2324 | \item [dirty]This flag is set if the block has been modified. |
||
2325 | \item [ready]This flag is set if the block is available. |
||
2326 | \item [esyncreq]This flag is set if the synchronization on errors is required |
||
2327 | (that is, the field esync is used). |
||
2328 | \item [skipwt]This flag says if the write through handling should be done. |
||
2329 | \end{description} |
||
2330 | |||
2331 | The fields esync and esyncreq are used during the ``purging |
||
2332 | phase'': a task requires a read or write operation on a block, the |
||
2333 | block is not into the data cache, and the cache is full. That is, |
||
2334 | another block must be discarded from the cache. To discard a block |
||
2335 | it can happen that a disk write operation must be done. While this |
||
2336 | write operation is in progress other tasks can require the same |
||
2337 | (discarded, in writing) block. These tasks can block waiting for |
||
2338 | synchronization on the block. In that case the cache behavior |
||
2339 | could be the following: the requests on the block are aborted (the |
||
2340 | failure is only needed if we are able to free the block; the |
||
2341 | blocked tasks are awakened with an error); or, the requests on the |
||
2342 | block are not aborted (the block can not be freed, and the tasks |
||
2343 | blocked on it are woken up). The problem is that after the task |
||
2344 | has wake up all the blocked tasks, it must wait that all the |
||
2345 | awakened tasks check for an error. To do that, the task set the |
||
2346 | esyncreq flag, and it blocks on the esync semaphore; the last task |
||
2347 | that checked for errors, checks the esyncreq flag and signal the |
||
2348 | esync semaphore waking up the waiting task. |
||
2349 | |||
2350 | The disk cache can work in three different modes: \label{sec:modicache} |
||
2351 | ``copy-back'', ``write-through'' and ``modified write-thought''. Usually most |
||
2352 | time-sharing OSes handle a copy-back cache. that is, write operations are done |
||
2353 | only in cache; a low priority task is used to synchronize cache and disk data. |
||
2354 | In such a system, if a cache request arrives and the cache is full, a block is |
||
2355 | freed eventually writing it on the disk. On a real-time system such a behavior |
||
2356 | can give some problems, because the time spent freeing the cache can be |
||
2357 | accounted to the running task. For example, a task that only reads can be |
||
2358 | delayed by another task that fills the cache memory writing blocks on the disk. |
||
2359 | To solve this problem there are 2 solutions: the cache is not used (that is, no |
||
2360 | caching on write), or the cache behavior is set to write-through, that is the |
||
2361 | write operations are immediately synchronized on the disk, in a way that who |
||
2362 | dirties the cache is also responsible for its synchronization. |
||
2363 | |||
2364 | If the cache is used in write-through mode, another problem |
||
2365 | arises: sometimes a few bytes are modified frequently (e.g. the |
||
2366 | filesystem information for the allocation of a file on the disk), |
||
2367 | that leads to continuous disk writes. For that reason, the disk |
||
2368 | cache provides the flag skipwt, that forces the copy-back mode for |
||
2369 | a particular block. By default, this flag is not set. When the |
||
2370 | write-through policy must be disabled, the file system should call |
||
2371 | the function dcache\_skipwt() on the cache-entry before releasing |
||
2372 | it. |
||
2373 | |||
2374 | %---------------------------------------------------------------------------- |
||
2375 | \subsubsection{struct \_\_rwlock\_t} |
||
2376 | \label{sec:rwlock} |
||
2377 | %---------------------------------------------------------------------------- |
||
2378 | |||
2379 | \begin{table} |
||
2380 | \begin{verbatim} |
||
2381 | typedef struct { |
||
2382 | __mutex_t mutex; |
||
2383 | int active_readers; |
||
2384 | int active_writers; |
||
2385 | int blocked_readers; |
||
2386 | int blocked_writers; |
||
2387 | __sem_t readers_semaph; |
||
2388 | __sem_t writers_semaph; |
||
2389 | } __rwlock_t; |
||
2390 | \end{verbatim} |
||
2391 | \caption{struct \_\_rwlock\_t} |
||
2392 | \label{src:structrwlock} |
||
2393 | \end{table} |
||
2394 | |||
2395 | \begin{table} |
||
2396 | \begin{verbatim} |
||
2397 | void __rwlock_init (__rwlock_t *ptr); |
||
2398 | void __rwlock_rdlock (__rwlock_t *ptr); |
||
2399 | void __rwlock_wrlock (__rwlock_t *ptr); |
||
2400 | int __rwlock_tryrdlock (__rwlock_t *ptr); |
||
2401 | int __rwlock_trywrlock (__rwlock_t *ptr); |
||
2402 | void __rwlock_rdunlock (__rwlock_t *ptr); |
||
2403 | void __rwlock_wrunlock (__rwlock_t *ptr); |
||
2404 | \end{verbatim} |
||
2405 | \caption{Lockers functions} |
||
2406 | \label{src:funcrwlock} |
||
2407 | \end{table} |
||
2408 | |||
2409 | The cache module and other modules need a locker protocol, that is |
||
2410 | a protocol that solves the reader/writer problem: basically, there |
||
2411 | is a shared resource where some processes must read or write. |
||
2412 | These rules must be imposed by the protocol: |
||
2413 | |||
2414 | \begin{enumerate} |
||
2415 | \item If someone is writing, no one is authorizes to read or write. |
||
2416 | \item If someone is reading, other tasks are authorized to read, but not to |
||
2417 | write. |
||
2418 | \end{enumerate} |
||
2419 | |||
2420 | In literature, there exist different methods to solve the problem |
||
2421 | with different behaviors. This filesystem implements the protocol |
||
2422 | proposed into paragraph 4.2.3.1 of |
||
2423 | \cite{book:programmazioneconcorrente}. To replace that policy, the |
||
2424 | rwlock.c file should be modified with another algorithm, |
||
2425 | maintaining the same interface. |
||
2426 | |||
2427 | Table \ref{src:structrwlock} contains the internal data structure |
||
2428 | of the implemented protocol. To implement a ``locker'', all the |
||
2429 | needed data must be inserted into a structure \_\_wrlock\_t. The |
||
2430 | following functions have also to be provided (see Table |
||
2431 | \ref{src:funcrwlock}): |
||
2432 | |||
2433 | \begin{description} |
||
2434 | \item [\_\_rwlock\_init]Initializes the structure passed as parameter. That |
||
2435 | structure is passed to all the functions and it is used to protect a shared |
||
2436 | resource. |
||
2437 | \item [\_\_rwlock\_rdlock]Read request (read lock); when this function returns, |
||
2438 | the calling task is authorized to read the shared resource. |
||
2439 | \item [\_\_rwlock\_wrlock]Write request (write lock). |
||
2440 | \item [\_\_rwlock\_tryrdlock]Conditional read request (try read lock); The |
||
2441 | function return 0 if the calling task can read, another number otherwise. At the |
||
2442 | end of the read operation (if the lock has been granted), \_\_rwlock\_rdunlock |
||
2443 | must be called. |
||
2444 | \item [\_\_rwlock\_trywrlock]Conditional write request (try write lock); The |
||
2445 | function return 0 if the calling task can write, another number otherwise. At |
||
2446 | the end of the write operation (if the lock has been granted), |
||
2447 | \_\_rwlock\_wrunlock must be called. |
||
2448 | \item [\_\_rwlock\_rdunlock]Read unlock; the task has finished the use of the |
||
2449 | shared resource. |
||
2450 | \item [\_\_rwlock\_wrunlock]Write unlock; the task has finished the use of the |
||
2451 | shared resource, that is again in a consistent state. |
||
2452 | \end{description} |
||
2453 | |||
2454 | The functions \_\_rwlock\_rdlock and \_\_rwlock\_rdunlock must always be used |
||
2455 | together. Nested calls are not allowed. The same rules apply to |
||
2456 | \_\_rwlock\_wrlock and \_\_rwlock\_wrunlock. |
||
2457 | |||
2458 | Please note that \_\_rwlock\_rdlock and \_\_rwlock\_wrlock can |
||
2459 | cause undefined task blocking. The other functions do not block |
||
2460 | the calling task. |
||
2461 | |||
2462 | %---------------------------------------------------------------------------- |
||
2463 | \section{Filesystem: DOS (FAT16)} |
||
2464 | %---------------------------------------------------------------------------- |
||
2465 | |||
2466 | %---------------------------------------------------------------------------- |
||
2467 | \subsection{Description} |
||
2468 | %---------------------------------------------------------------------------- |
||
2469 | |||
2470 | A technical description of the FAT16 filesystem can be found in |
||
2471 | \cite{manual:DOS330} or \cite{manual:dosfat}. |
||
2472 | |||
2473 | %---------------------------------------------------------------------------- |
||
2474 | \subsubsection{Internal structure} |
||
2475 | %---------------------------------------------------------------------------- |
||
2476 | |||
2477 | \begin{figure} |
||
2478 | \begin{center} |
||
2479 | \includegraphics[width=7cm]{images/fatarch.eps} |
||
2480 | \end{center} |
||
2481 | \caption{Internal structure of the DOS FAT16.} |
||
2482 | \label{src:fatarch} |
||
2483 | \end{figure} |
||
2484 | |||
2485 | As showed in Figure \ref{src:fatarch}, this filesystem is divided |
||
2486 | in areas: |
||
2487 | |||
2488 | \begin{description} |
||
2489 | \item [Boot~sector]Contains a small program that loads the DOS operating |
||
2490 | system, as specified into \cite{manual:DOS330}. It also contains other |
||
2491 | informations like the number of FATS, the kind of support and the dimension of |
||
2492 | the Root Directory; It can be thought as a super-block. The informations |
||
2493 | contained in this sector are showed in Table \ref{tab:fatboot}. |
||
2494 | \item [FAT~tables]This data area contains some copies of the File Allocation |
||
2495 | Table (FAT) These informations are used to know where a file can be found on the |
||
2496 | hard disk. |
||
2497 | \item [Root~directory]This block is structured as all the blocks that contains a |
||
2498 | directory, with the exceptions that it is allocated in a contiguous way and that |
||
2499 | it has a fixed size. |
||
2500 | \item [Data~area]This is the area where the files and directories are stored. |
||
2501 | The minimal allocation unit is a cluster, that is a set of contiguous sectors on |
||
2502 | the hard disk. The boot sector stores the number of sectors that compose a |
||
2503 | cluster. |
||
2504 | \end{description} |
||
2505 | |||
2506 | %---------------------------------------------------------------------------- |
||
2507 | \subsubsection{The boot sector} |
||
2508 | %---------------------------------------------------------------------------- |
||
2509 | |||
2510 | \begin{table} |
||
2511 | \begin{verbatim} |
||
2512 | struct bootrecord { |
||
2513 | __uint8_t reserved[3]; |
||
2514 | char oemname[8]; |
||
2515 | __uint16_t bytespersector; |
||
2516 | __uint8_t sectorspercluster; |
||
2517 | __uint16_t hiddensectors; |
||
2518 | __uint8_t fatsnumber; |
||
2519 | __uint16_t rootentry; |
||
2520 | __uint16_t sectors; |
||
2521 | __uint8_t media; |
||
2522 | __uint16_t sectorsperfat; |
||
2523 | __uint16_t sectorpertrak; |
||
2524 | __uint16_t headsperdisk; |
||
2525 | __uint32_t hiddensectors32; |
||
2526 | __uint32_t sectors32; |
||
2527 | __uint16_t physicaldrive; |
||
2528 | __uint8_t signature; |
||
2529 | __uint8_t serialnumber[4]; |
||
2530 | char volumelabel[11]; |
||
2531 | char fattype[8]; |
||
2532 | }; |
||
2533 | \end{verbatim} |
||
2534 | \caption{The DOS boot sector.} |
||
2535 | \label{tab:fatboot} |
||
2536 | \end{table} |
||
2537 | |||
2538 | Table \ref{tab:fatboot} shows the boot sector structure. The boot |
||
2539 | sector is always 512 bytes large. The struct is only the first |
||
2540 | part of the sector, the second part is the boot loader. Here is a |
||
2541 | small description of some fields: |
||
2542 | |||
2543 | \begin{description} |
||
2544 | \item [sectorpercluster]stores the number of sectors included into a cluster. |
||
2545 | \item [sectorsperfat]stores the number of sectors in the FAT |
||
2546 | \item [fatsnumber]stores the number of FATs on the disk. |
||
2547 | \end{description} |
||
2548 | |||
2549 | A note: when DOS formats a disk, creating a FAT16 filesystem, |
||
2550 | the informations that comes from a pre-existing filesystem have |
||
2551 | precedence over the informations given by the hardware. That is, |
||
2552 | if an hard disk declares to have 8 heads, and the pre-existing |
||
2553 | filesystem has headsperdisk=13, the new file system will have 13 |
||
2554 | heads. |
||
2555 | |||
2556 | %---------------------------------------------------------------------------- |
||
2557 | \subsubsection{The File Allocation Table} |
||
2558 | %---------------------------------------------------------------------------- |
||
2559 | |||
2560 | \begin{figure} |
||
2561 | \begin{center} |
||
2562 | \includegraphics[width=7cm]{images/fatfat.eps} |
||
2563 | \end{center} |
||
2564 | \caption{An example of File Allocation Table (FAT)} |
||
2565 | \label{fig:fatfat} |
||
2566 | \end{figure} |
||
2567 | |||
2568 | Every FAT is organized as a table of 16 bit elements. Every |
||
2569 | element is a cluster; the element value gives informations about |
||
2570 | the cluster: |
||
2571 | |||
2572 | \begin{itemize} |
||
2573 | \item 0 means the cluster is available. |
||
2574 | \item values between 0xfff0-0xfff7 are reserved; for example, 0xfff7 means bad |
||
2575 | (damaged) cluster. |
||
2576 | \item 0xfff8-0xffff says that the cluster is the last cluster in a file. |
||
2577 | \item Every other value index which is the next cluster of the file. |
||
2578 | \end{itemize} |
||
2579 | |||
2580 | The FAT is used to create a set of chains that make a file. For |
||
2581 | example, Figure \ref{fig:fatfat} shows the information for a file |
||
2582 | stored in 4 clusters (23, 27, 25, 31). The first cluster of the |
||
2583 | chain is contained into the directories. Please note that the |
||
2584 | first cluster of the data area is the number two, because cluster |
||
2585 | |||
2586 | or 16 bit), and the disk format. |
||
2587 | |||
2588 | Directories (except the root directory) are common files with a |
||
2589 | particular structure. The system usually give access to |
||
2590 | directories through other primitives (mkdir, creat, and so on). |
||
2591 | |||
2592 | %---------------------------------------------------------------------------- |
||
2593 | \subsubsection{The ``root directory''} |
||
2594 | %---------------------------------------------------------------------------- |
||
2595 | |||
2596 | This area has the same structure of a directory file, except that |
||
2597 | it has fixed dimension, it is not allocated into the data area and |
||
2598 | its allocation is contiguous. Common directories have the same |
||
2599 | structure except that they are allocated into the data area, and |
||
2600 | that they can be fragmented. |
||
2601 | |||
2602 | \begin{table} |
||
2603 | \begin{verbatim} |
||
2604 | struct directoryentry { |
||
2605 | __uint8_t name[8]; |
||
2606 | __uint8_t ext[3]; |
||
2607 | __uint8_t attribute; |
||
2608 | __uint8_t reserved[10]; |
||
2609 | __uint16_t time; |
||
2610 | __uint16_t date; |
||
2611 | __uint16_t cluster; |
||
2612 | __uint32_t size; |
||
2613 | }; |
||
2614 | \end{verbatim} |
||
2615 | \caption{DOS Directory Entry.} |
||
2616 | \label{tab:fatdentry} |
||
2617 | \end{table} |
||
2618 | |||
2619 | The directory structure is showed in Figure \ref{tab:fatdentry}: |
||
2620 | |||
2621 | \begin{description} |
||
2622 | \item [name~and~ext]Contains the filename in the classic format 8+3. Non used |
||
2623 | chars are filled with spaces. |
||
2624 | \item [attribute]File flags: they specify if it is regular, a directory, a |
||
2625 | volume label, it has been archived or if it is read-only. |
||
2626 | \item [time]Creation time (2 seconds resolution) (all the 3 times specified by |
||
2627 | the POSIX standard are mapped on that date). The time format is DOS specific. |
||
2628 | \item [date]Creation date (until 2099). The date format is DOS specific. |
||
2629 | \item [cluster]The first cluster of the file. A value 0 means empty file (the |
||
2630 | first available cluster is the cluster 2). |
||
2631 | \item [size]File size (32 bit). |
||
2632 | \end{description} |
||
2633 | |||
2634 | Note the difference with the typical Unix file systems, where file |
||
2635 | names and file informations are stored in different areas. |
||
2636 | |||
2637 | Directory entries are not contiguous. If the first character name |
||
2638 | is 0xe5, the file has been deleted, if it is 0x00, it has been |
||
2639 | never used. Any other character means that the entry is valid. |
||
2640 | 0x00 is used for efficiency: if it is found, directory scan is |
||
2641 | ended also if there are other sectors in the cluster that contains |
||
2642 | a directory. |
||
2643 | |||
2644 | %---------------------------------------------------------------------------- |
||
2645 | \subsubsection{The data area} |
||
2646 | %---------------------------------------------------------------------------- |
||
2647 | |||
2648 | The data area is divided in clusters, that are a set of sectors. |
||
2649 | It starts from cluster number 2, and it stores regular files and |
||
2650 | directories. |
||
2651 | |||
2652 | linear addresses are used to identify a sector on a hard disk, and |
||
2653 | its cluster. Hard disk sectors are numbered starting from sector |
||
2654 | 0. The filesystem specifies an algorithm that associates to every |
||
2655 | linear address an address C/H/S (Cylinder/Head/Sector). |
||
2656 | |||
2657 | Some DOS implementation supposes a particular structure |
||
2658 | depending on the media used: a 1.44 Mb floppy disk has sectors |
||
2659 | composed by 1 sector, and a root directory of 14 sectors. The best |
||
2660 | way to implement a filesystem is to rely only on the super block |
||
2661 | informations. |
||
2662 | |||
2663 | %---------------------------------------------------------------------------- |
||
2664 | \subsection{Implementation notes} |
||
2665 | %---------------------------------------------------------------------------- |
||
2666 | |||
2667 | The filesystem maintains 2 structures to store file informations: |
||
2668 | an inode structure and a dentry structure. In the DOS FAT16 |
||
2669 | filesystem unfortunately there is an unique correspondence between |
||
2670 | file names and inodes, and that information is contained into the |
||
2671 | filesystem directory entry. |
||
2672 | |||
2673 | \begin{figure} |
||
2674 | \begin{center} |
||
2675 | \includegraphics[width=7cm]{images/fsn.eps} |
||
2676 | \end{center} |
||
2677 | \caption{The FAT16 file serial number.} |
||
2678 | \label{fig:fsn} |
||
2679 | \end{figure} |
||
2680 | |||
2681 | When the system refers to a file, it use the inode number. That |
||
2682 | is, every file has a number that is unique inside the file system. |
||
2683 | |||
2684 | The FAT16 serial number has been specified as shown in Figure |
||
2685 | \ref{fig:fsn}: |
||
2686 | |||
2687 | \begin{itemize} |
||
2688 | \item The most significant part is the cluster number (the cluster where the |
||
2689 | directory is stored). |
||
2690 | \item The less significant part is the number into the directory entry. |
||
2691 | \end{itemize} |
||
2692 | |||
2693 | For example, the inode number 0x01230008, refers to the 8th |
||
2694 | directory entry into the cluster 0x123. |
||
2695 | |||
2696 | The cluster 1 is used to index the directory entries contained |
||
2697 | into the root directory, whereas the cluster 0 is used to index |
||
2698 | the root directory. |
||
2699 | |||
2700 | \bibliographystyle{plain} |
||
2701 | \bibliography{../common/retis} |
||
2702 | |||
2703 | \end{document} |