|
19 | 19 | "cell_type": "markdown", |
20 | 20 | "metadata": {}, |
21 | 21 | "source": [ |
22 | | - "You have learned about lists as well as how to write your own functions with loops and conditional statements. As such, you can already write programs performing a variety of tasks. \n", |
| 22 | + "You have learned about lists, as well as how to write your own functions with loops and conditional statements. This allows you to write programs that perform a variety of tasks. \n", |
23 | 23 | "\n", |
24 | | - "However, something that you are currently missing is a mechanism to access the data that you want to analyze. A very common way to access these data is through (local or remotely-stored) [files](https://en.wikipedia.org/wiki/Computer_file)." |
| 24 | + "However, a convenient mechanism to access the data that you want to analyze is currently missing. In this notebook, we will explore the use of [files](https://en.wikipedia.org/wiki/Computer_file) since they are a common way to access stored data." |
25 | 25 | ] |
26 | 26 | }, |
27 | 27 | { |
|
30 | 30 | "source": [ |
31 | 31 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
32 | 32 | "\n", |
33 | | - "A **file** provides a mechanism for **permanently storing information** so that they can be retrieved when your program and/or your machine are restarted." |
| 33 | + "A **file** provides a mechanism for **permanently storing information**. Thus, the file content is not lost in the event of a [crash](https://en.wikipedia.org/wiki/Crash_(computing)) or [reboot](https://en.wikipedia.org/wiki/Reboot)." |
34 | 34 | ] |
35 | 35 | }, |
36 | 36 | { |
|
55 | 55 | "source": [ |
56 | 56 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
57 | 57 | "\n", |
58 | | - "A **binary file** is any other type of file that does not fit the previous definition of text file." |
| 58 | + "A **binary file** is any other type of file that does not fit the previous definition of a text file." |
59 | 59 | ] |
60 | 60 | }, |
61 | 61 | { |
62 | 62 | "cell_type": "markdown", |
63 | 63 | "metadata": {}, |
64 | 64 | "source": [ |
65 | | - "You can often recognize a text files by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Extensions commonly in use for text files are: `.txt`, `.asc`, `.xyz`." |
| 65 | + "You can often recognize a text file by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Extensions commonly in use for text files are: `.txt`, `.asc`, `.xyz`." |
66 | 66 | ] |
67 | 67 | }, |
68 | 68 | { |
|
71 | 71 | "source": [ |
72 | 72 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
73 | 73 | "\n", |
74 | | - "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can understand the visualized content of an opened file, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
| 74 | + "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can recognize the visualized content of an opened file as text, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
75 | 75 | ] |
76 | 76 | }, |
77 | 77 | { |
78 | 78 | "cell_type": "markdown", |
79 | 79 | "metadata": {}, |
80 | 80 | "source": [ |
81 | | - "We will first introduce some file managing capability of the `os` [Python module](https://docs.python.org/3.6/tutorial/modules.html#modules), then we will describe the use of the functions that Python provides for [reading and writing the content of a text file](https://docs.python.org/3.6/tutorial/inputoutput.html)." |
| 81 | + "We will first introduce some file-management capabilities of the `os.path` [Python module](https://docs.python.org/3.6/tutorial/modules.html#modules). Then, we will use the functions that Python provides for [reading and writing the content of a text file](https://docs.python.org/3.6/tutorial/inputoutput.html)." |
82 | 82 | ] |
83 | 83 | }, |
84 | 84 | { |
|
87 | 87 | "source": [ |
88 | 88 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
89 | 89 | "\n", |
90 | | - "In Python, a **module** is a file containing definitions and statements. The module name is given by the file name without the suffix `.py`." |
| 90 | + "In Python, a **module** is a file containing definitions and statements. " |
| 91 | + ] |
| 92 | + }, |
| 93 | + { |
| 94 | + "cell_type": "markdown", |
| 95 | + "metadata": {}, |
| 96 | + "source": [ |
| 97 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 98 | + "\n", |
| 99 | + "The module name is given by the file name without the [file extension](https://en.wikipedia.org/wiki/Filename_extension). For example, a file `example.py` may identify the module `example`." |
91 | 100 | ] |
92 | 101 | }, |
93 | 102 | { |
|
103 | 112 | "cell_type": "markdown", |
104 | 113 | "metadata": {}, |
105 | 114 | "source": [ |
106 | | - "## The `os` module" |
| 115 | + "## The `os.path` module" |
107 | 116 | ] |
108 | 117 | }, |
109 | 118 | { |
110 | 119 | "cell_type": "markdown", |
111 | 120 | "metadata": {}, |
112 | 121 | "source": [ |
113 | | - "The `os` module provides a **portable** way of using several functionalities [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software) (i.e., the same code can run on [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10))." |
| 122 | + "We will explore the `os.path` module to retrieve some data files that are stored on the server's hard disk." |
114 | 123 | ] |
115 | 124 | }, |
116 | 125 | { |
117 | 126 | "cell_type": "markdown", |
118 | 127 | "metadata": {}, |
119 | 128 | "source": [ |
120 | | - "In particular, we will explore the `os.path` sub-module to retrieve some data files that are stored on the server's hard disk.\n", |
| 129 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
121 | 130 | "\n", |
122 | | - "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the folder where this notebook is located:\n", |
| 131 | + "Several functionalities in the `os.path` module are **portable**. This means that they can be used [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software). For example, you can use its functionalities in code that runs on [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10)." |
| 132 | + ] |
| 133 | + }, |
| 134 | + { |
| 135 | + "cell_type": "markdown", |
| 136 | + "metadata": {}, |
| 137 | + "source": [ |
| 138 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
| 139 | + "\n", |
| 140 | + "To use the `os.path` module, you first need to **import** it. Once imported, its functions and variables are available to your code." |
| 141 | + ] |
| 142 | + }, |
| 143 | + { |
| 144 | + "cell_type": "markdown", |
| 145 | + "metadata": {}, |
| 146 | + "source": [ |
| 147 | + "In the example below, we write a `get_current_folder()` function that returns the path of the folder where this notebook is located." |
| 148 | + ] |
| 149 | + }, |
| 150 | + { |
| 151 | + "cell_type": "markdown", |
| 152 | + "metadata": {}, |
| 153 | + "source": [ |
| 154 | + "To achieve this task, we will use two of the `os.path` functionalities and variables:\n", |
| 155 | + "\n", |
| 156 | + "- `curdir`: The string used by the [operating system](https://en.wikipedia.org/wiki/Operating_system) to refer to the current directory.\n", |
| 157 | + "- `abspath()`: A function that returns the [absolute path](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths)." |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "attachments": {}, |
| 162 | + "cell_type": "markdown", |
| 163 | + "metadata": {}, |
| 164 | + "source": [ |
| 165 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
123 | 166 | "\n", |
124 | | - "- `curdir`: The constant string used by the operating system to refer to the current directory. E.g., `.` for Windows and Linux.\n", |
125 | | - "- `abspath()`: A function that returns the full, absolute version of a path." |
| 167 | + "An [**absolute path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) points to the same location in a file system, regardless of the current working directory. In contrast, a [**relative path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) starts from a given working directory." |
126 | 168 | ] |
127 | 169 | }, |
128 | 170 | { |
|
131 | 173 | "metadata": {}, |
132 | 174 | "outputs": [], |
133 | 175 | "source": [ |
134 | | - "import os\n", |
| 176 | + "import os.path\n", |
135 | 177 | "\n", |
136 | 178 | "def get_current_folder():\n", |
137 | 179 | " cur_folder = os.path.abspath(os.path.curdir)\n", |
|
153 | 195 | "cell_type": "markdown", |
154 | 196 | "metadata": {}, |
155 | 197 | "source": [ |
156 | | - "As such, we extend the previous code using `os.path.join()` and `os.path.exist()` functions to:\n", |
| 198 | + "To access the `data` sub-folder, we extend the previous code using the `os.path.join()` and `os.path.exists()` functions to:\n", |
157 | 199 | "\n", |
158 | | - "- Create the full path to the `data` sub-folder.\n", |
| 200 | + "- Create the absolute path to the `data` sub-folder.\n", |
159 | 201 | "- Check whether the resulting path actually exists." |
160 | 202 | ] |
161 | 203 | }, |
|
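A minimal sketch of the two steps listed above, assuming a sub-folder named `data` next to the notebook. The function name `get_data_folder` is hypothetical and may differ from the one used in the notebook itself:

```python
import os.path

def get_data_folder():
    # Hypothetical helper: build the absolute path to a "data" sub-folder
    cur_folder = os.path.abspath(os.path.curdir)
    data_folder = os.path.join(cur_folder, "data")
    # Report whether the folder is actually present on disk
    return data_folder, os.path.exists(data_folder)

data_folder, found = get_data_folder()
print(data_folder, found)
```

Note that `os.path.join()` only builds the path string; it is `os.path.exists()` that checks whether the path is actually present on disk.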
189 | 231 | "source": [ |
190 | 232 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
191 | 233 | "\n", |
192 | | - "We did not import the `os` module since it was already imported in the previous cell. Re-importing a module does not break your code, but makes it more verbose. " |
| 234 | + "We did not import the `os.path` module since it was already imported in the previous code cell. Re-importing a module does not break your code, but makes it more verbose. " |
| 235 | + ] |
| 236 | + }, |
| 237 | + { |
| 238 | + "cell_type": "markdown", |
| 239 | + "metadata": {}, |
| 240 | + "source": [ |
| 241 | + "However, if you decide to [clear the results of this notebook](000_Welcome_on_Board.ipynb#How-to-Clear-the-Results-of-a-Notebook?), you will need to re-execute the code cell with the `import` statement." |
193 | 242 | ] |
194 | 243 | }, |
195 | 244 | { |
|
298 | 347 | "cell_type": "markdown", |
299 | 348 | "metadata": {}, |
300 | 349 | "source": [ |
301 | | - "As discussed above, a text file is a sequence of characters stored on a permanent medium (e.g., a flash memory)." |
| 350 | + "As discussed above, a text file is a sequence of characters stored on a permanent medium (e.g., a [USB flash drive](https://en.wikipedia.org/wiki/USB_flash_drive))." |
302 | 351 | ] |
303 | 352 | }, |
304 | 353 | { |
|
388 | 437 | "cell_type": "markdown", |
389 | 438 | "metadata": {}, |
390 | 439 | "source": [ |
391 | | - "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 440 | + "Why are there 100 characters? There are 20 rows in the file. Each row has 4 visible characters (e.g., `30.8`), but there is also an invisible [newline character](https://en.wikipedia.org/wiki/Newline) (i.e., `\\n`) that text editors interpret as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 441 | + ] |
| 442 | + }, |
| 443 | + { |
| 444 | + "cell_type": "markdown", |
| 445 | + "metadata": {}, |
| 446 | + "source": [ |
| 447 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
392 | 448 | "\n", |
393 | | - "Why the characters are 100? Each row has 4 visible characters (e.g., `30.8`) but there is also an invisible character (i.e., `\\n`) that the text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 449 | + "The **newline character** is used to control the end of a line of text and the start of a new one." |
| 450 | + ] |
| 451 | + }, |
| 452 | + { |
| 453 | + "cell_type": "markdown", |
| 454 | + "metadata": {}, |
| 455 | + "source": [ |
| 456 | + "In the code above, the `sal_content` variable holds the content of the file as a single sequence of characters." |
394 | 457 | ] |
395 | 458 | }, |
396 | 459 | { |
397 | 460 | "cell_type": "markdown", |
398 | 461 | "metadata": {}, |
399 | 462 | "source": [ |
400 | | - "We will now write a function that not only reads the sequence of characters, but also splits them by line (using the `str` method named `splitlines()`) and converts the result in the corresponding `float` value." |
| 463 | + "We will now write a function that not only reads the sequence of characters, but also splits it into multiple lines based on the **newline character** (using the `str` method named `splitlines()`). Finally, we convert each line to the corresponding `float` value and append this value to `sal_list`." |
401 | 464 | ] |
402 | 465 | }, |
403 | 466 | { |
|
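The splitting and conversion described above can be sketched on a small in-memory string. The three sample values are an assumption made for illustration and are not read from the notebook's data file:

```python
# Assumed sample content: three values, one per line
sal_content = "30.8\n31.2\n29.9\n"

sal_list = []
for line in sal_content.splitlines():  # split on the newline character
    sal_list.append(float(line))       # convert each line to a float

print(len(sal_content))  # 15 characters: (4 + 1) * 3
print(sal_list)
```

The character count follows the same arithmetic as for the full file: 4 visible characters plus 1 newline per row.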
503 | 566 | "cell_type": "markdown", |
504 | 567 | "metadata": {}, |
505 | 568 | "source": [ |
506 | | - "If you want to write a text file, the first decision to take is the location on where to store the text file. For this collection of notebook, we will use the `output` sub-folder that can be retrieved running the following code:" |
| 569 | + "If you want to write a text file, you need to decide where to store it. For this collection of notebooks, we will use the `output` sub-folder that can be retrieved by running the following code:" |
507 | 570 | ] |
508 | 571 | }, |
509 | 572 | { |
|
513 | 576 | "outputs": [], |
514 | 577 | "source": [ |
515 | 578 | "def get_output_folder():\n", |
516 | | - " cur_folder = os.path.abspath(os.path.curdir)\n", |
517 | | - " output_folder = os.path.join(cur_folder, \"output\")\n", |
| 579 | + " cur_folder = os.path.abspath(os.path.curdir) # The absolute path to the current directory\n", |
| 580 | + " output_folder = os.path.join(cur_folder, \"output\") # The absolute path to the output folder (may or may not exist)\n", |
518 | 581 | " if os.path.exists(output_folder):\n", |
519 | 582 | " return output_folder\n", |
520 | 583 | " else: # if the output folder does not exist, we raise a meaningful error\n", |
|
545 | 608 | "cell_type": "markdown", |
546 | 609 | "metadata": {}, |
547 | 610 | "source": [ |
548 | | - "To write a file, you have to use the `open()` function and pass the `w` mode (`w` is for *write*) as second parameter. We put this function within a function that take a list as a second parameter and write the content into the text file." |
| 611 | + "In the code below, the `write_list_to_disk` function takes:\n", |
| 612 | + "\n", |
| 613 | + "* An `output_path` where the output file is to be written. \n", |
| 614 | + "* An `input_list` containing the data to be written in the output file." |
| 615 | + ] |
| 616 | + }, |
| 617 | + { |
| 618 | + "cell_type": "markdown", |
| 619 | + "metadata": {}, |
| 620 | + "source": [ |
| 621 | + "The function below calls `open()` and passes the `w` mode (`w` is for *write*) as the second parameter. " |
| 622 | + ] |
| 623 | + }, |
| 624 | + { |
| 625 | + "cell_type": "markdown", |
| 626 | + "metadata": {}, |
| 627 | + "source": [ |
| 628 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 629 | + "\n", |
| 630 | + "You may learn about other modes for opening a file from the official [Python documentation](https://docs.python.org/3.6/library/functions.html?#open)." |
549 | 631 | ] |
550 | 632 | }, |
551 | 633 | { |
|
556 | 638 | "source": [ |
557 | 639 | "def write_list_to_disk(output_path, input_list):\n", |
558 | 640 | " \n", |
559 | | - " output_file = open(output_path, mode=\"w\")\n", |
| 641 | + " output_file = open(output_path, mode=\"w\") # mode=\"w\" to open the file in writing mode\n", |
560 | 642 | " \n", |
561 | 643 | " for value in input_list:\n", |
562 | | - " line_content = str(value) + \"\\n\" # the \"\\n\" is the 'escaped' character for the new line\n", |
| 644 | + " line_content = str(value) + \"\\n\" # the \"\\n\" is the newline character\n", |
563 | 645 | " output_file.write(line_content)\n", |
564 | 646 | " \n", |
565 | 647 | " output_file.close()\n", |
|
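As a possible alternative to calling `close()` explicitly, the same logic can be sketched with a `with` statement, which closes the file even if an error occurs while writing. The function name `write_list_with_context` and the temporary demo file are assumptions made for illustration; they are not part of the notebook:

```python
import os.path
import tempfile

def write_list_with_context(output_path, input_list):
    # Hypothetical variant: the `with` block closes the file automatically
    with open(output_path, mode="w") as output_file:
        for value in input_list:
            output_file.write(str(value) + "\n")  # one value per line

# Write two sample values to a temporary folder and read them back
tmp_folder = tempfile.mkdtemp()
demo_path = os.path.join(tmp_folder, "demo.txt")
write_list_with_context(demo_path, [30.8, 31.2])

with open(demo_path) as demo_file:
    demo_content = demo_file.read()
print(repr(demo_content))
```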
669 | 751 | "* [Computer file](https://en.wikipedia.org/wiki/Computer_file)\n", |
670 | 752 | " * [Text file](https://en.wikipedia.org/wiki/Text_file)\n", |
671 | 753 | " * [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n", |
672 | | - " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
| 754 | + " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)\n", |
| 755 | + "* [Absolute and relative paths](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths)" |
673 | 756 | ] |
674 | 757 | }, |
675 | 758 | { |
|