|
19 | 19 | "cell_type": "markdown", |
20 | 20 | "metadata": {}, |
21 | 21 | "source": [ |
22 | | - "Now that you understand about `list` and `dict` as well as how to write your own functions with loops and conditional statements, you can already write simply programs that perform quite useful operations. \n", |
| 22 | + "You have learned about lists as well as how to write your own functions with loops and conditional statements. As such, you can already write programs performing a variety of tasks. \n", |
23 | 23 | "\n", |
24 | | - "However, as a research assistant, you will likely need to access data that are (locally or remotely) stored in files. A file provides a mechanism for **permanently store** information so that they can be retrieved when your program and/or your machine are restarted.\n", |
| 24 | + "However, something that is currently missing is a mechanism to access the data that you want to analyze. A very common way to access these data is through (locally or remotely stored) [files](https://en.wikipedia.org/wiki/Computer_file)."
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "markdown", |
| 29 | + "metadata": {}, |
| 30 | + "source": [ |
| 31 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
25 | 32 | "\n", |
26 | | - "There are two main types of files:" |
| 33 | + "A **file** provides a mechanism for **permanently storing information** so that it can be retrieved after your program and/or your machine is restarted."
| 34 | + ] |
| 35 | + }, |
| 36 | + { |
| 37 | + "cell_type": "markdown", |
| 38 | + "metadata": {}, |
| 39 | + "source": [ |
| 40 | + "There are two main types of files: text files and binary files." |
27 | 41 | ] |
28 | 42 | }, |
29 | 43 | { |
|
57 | 71 | "source": [ |
58 | 72 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
59 | 73 | "\n", |
60 | | - "A very simple test to evaluate whether a given file is or not a text file is to try opening it using a text editor. If you can understand the visualized content, then that is file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
| 74 | + "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can read and understand the displayed content, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*"
61 | 75 | ] |
62 | 76 | }, |
63 | 77 | { |
|
105 | 119 | "source": [ |
106 | 120 | "In particular, we will explore the `os.path` sub-module to retrieve some data files that are stored on the server's hard disk.\n", |
107 | 121 | "\n", |
108 | | - "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the server's folder where this notebook is located:\n", |
| 122 | + "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the folder where this notebook is located:\n", |
109 | 123 | "\n", |
110 | 124 | "- `curdir`: The constant string used by the operating system to refer to the current directory. E.g., `.` for Windows and Linux.\n", |
111 | 125 | "- `abspath()`: A function that returns the full, absolute version of a path." |
112 | 126 | ] |
113 | 127 | }, |
114 | 128 | { |
115 | 129 | "cell_type": "code", |
116 | | - "execution_count": null, |
117 | | - "metadata": {}, |
118 | | - "outputs": [], |
| 130 | + "execution_count": 1, |
| 131 | + "metadata": {}, |
| 132 | + "outputs": [ |
| 133 | + { |
| 134 | + "name": "stdout", |
| 135 | + "output_type": "stream", |
| 136 | + "text": [ |
| 137 | + "The current folder is: C:\\code\\hyo2\\epom\\python_basics\n" |
| 138 | + ] |
| 139 | + } |
| 140 | + ], |
119 | 141 | "source": [ |
120 | 142 | "import os\n", |
121 | 143 | "\n", |
|
130 | 152 | "cell_type": "markdown", |
131 | 153 | "metadata": {}, |
132 | 154 | "source": [ |
133 | | - "The data are inside a `data` sub-folder. We now extend the previous code using `os.path.join()` and `os.path.exist()` functions to:\n", |
| 155 | + "As shown in the figure below, the data are inside a `data` sub-folder: \n",
| 156 | + "\n", |
| 157 | + "" |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "markdown", |
| 162 | + "metadata": {}, |
| 163 | + "source": [ |
| 164 | + "As such, we extend the previous code using the `os.path.join()` and `os.path.exists()` functions to:\n",
134 | 165 | "\n", |
135 | 166 | "- Create the full path to the `data` sub-folder.\n", |
136 | 167 | "- Check whether the resulting path actually exists." |
|
140 | 171 | "cell_type": "markdown", |
141 | 172 | "metadata": {}, |
142 | 173 | "source": [ |
143 | | - "In case that the `data` sub-folder does not exist, we raise an error using the `raise` keyword." |
| 174 | + "If the `data` sub-folder does not exist, we raise an error using the [`raise`](https://docs.python.org/3.6/tutorial/errors.html#raising-exceptions) keyword."
144 | 175 | ] |
145 | 176 | }, |
146 | 177 | { |
147 | 178 | "cell_type": "code", |
148 | | - "execution_count": null, |
149 | | - "metadata": {}, |
150 | | - "outputs": [], |
| 179 | + "execution_count": 2, |
| 180 | + "metadata": {}, |
| 181 | + "outputs": [ |
| 182 | + { |
| 183 | + "name": "stdout", |
| 184 | + "output_type": "stream", |
| 185 | + "text": [ |
| 186 | + "The data folder is: C:\\code\\hyo2\\epom\\python_basics\\data\n" |
| 187 | + ] |
| 188 | + } |
| 189 | + ], |
151 | 190 | "source": [ |
152 | 191 | "def get_data_folder():\n", |
153 | 192 | " cur_folder = os.path.abspath(os.path.curdir)\n", |
|
182 | 221 | "cell_type": "markdown", |
183 | 222 | "metadata": {}, |
184 | 223 | "source": [ |
185 | | - "We will now retrieve all the paths to the files in the `data` sub-folder. Specifically, we will create a function `get_data_paths()` that will returns a list containing all the files in that folder, using the `os.listdir()` function." |
| 224 | + "We will now retrieve the paths to all the files in the `data` sub-folder. Specifically, we will create a function `get_data_paths()` that uses the `os.listdir()` function to return a list containing the full path of each file in that folder."
186 | 225 | ] |
187 | 226 | }, |
188 | 227 | { |
189 | 228 | "cell_type": "code", |
190 | | - "execution_count": null, |
191 | | - "metadata": {}, |
192 | | - "outputs": [], |
| 229 | + "execution_count": 3, |
| 230 | + "metadata": {}, |
| 231 | + "outputs": [ |
| 232 | + { |
| 233 | + "name": "stdout", |
| 234 | + "output_type": "stream", |
| 235 | + "text": [ |
| 236 | + "The data paths are: ['C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\ctd.txt', 'C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\sal.txt', 'C:\\\\code\\\\hyo2\\\\epom\\\\python_basics\\\\data\\\\temp.txt']\n" |
| 237 | + ] |
| 238 | + } |
| 239 | + ], |
193 | 240 | "source": [ |
194 | 241 | "def get_data_paths():\n", |
195 | 242 | " data_paths = list() # create a empty list that will be populate and returned\n", |
|
215 | 262 | "source": [ |
216 | 263 | "In the above code, we wrote a function in which:\n", |
217 | 264 | "\n", |
218 | | - "- We created and populated a list: `data_paths`\n", |
219 | | - "- We reused a function that we previously created: `get_data_folder()`.\n", |
220 | | - "- We used several Python functions from the `os` module: e.g., `listdir()`, `join()`.\n", |
221 | | - "- We executed a `for` loop to populate the `data_paths` list.\n", |
222 | | - "- We returned the populated list." |
| 265 | + "- We create and populate a list: `data_paths`.\n",
| 266 | + "- We reuse a function that was previously created: `get_data_folder()`.\n", |
| 267 | + "- We use several Python functions from the `os` module: e.g., `listdir()`, `join()`.\n", |
| 268 | + "- We execute a `for` loop to populate the `data_paths` list.\n", |
| 269 | + "- We return the populated list." |
223 | 270 | ] |
224 | 271 | }, |
225 | 272 | { |
|
228 | 275 | "source": [ |
229 | 276 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
230 | 277 | "\n", |
231 | | - "You don't need to remember all the names of the available Python functions. But you need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.6/index.html) is usually a good place to start with." |
| 278 | + "You don't need to remember the names of all the available Python functions, but you do need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.6/index.html) is a good place to start."
232 | 279 | ] |
233 | 280 | }, |
234 | 281 | { |
235 | 282 | "cell_type": "markdown", |
236 | 283 | "metadata": {}, |
237 | 284 | "source": [ |
238 | | - "From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you should remember how to access a value in a list by its index. \n", |
| 285 | + "From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you know how to access a value in a list by its index. \n", |
239 | 286 | "\n", |
240 | 287 | "Thus, to access the file named `sal.txt`, we can use `1` as index since it is the **second** element in the list." |
241 | 288 | ] |
242 | 289 | }, |
243 | 290 | { |
244 | 291 | "cell_type": "code", |
245 | | - "execution_count": null, |
246 | | - "metadata": {}, |
247 | | - "outputs": [], |
| 292 | + "execution_count": 4, |
| 293 | + "metadata": {}, |
| 294 | + "outputs": [ |
| 295 | + { |
| 296 | + "name": "stdout", |
| 297 | + "output_type": "stream", |
| 298 | + "text": [ |
| 299 | + "The file path with index 1 is: C:\\code\\hyo2\\epom\\python_basics\\data\\sal.txt\n" |
| 300 | + ] |
| 301 | + } |
| 302 | + ], |
248 | 303 | "source": [ |
249 | 304 | "sal_path = retrieved_paths[1]\n", |
250 | 305 | "print(\"The file path with index 1 is: \" + sal_path)" |
|
254 | 309 | "cell_type": "markdown", |
255 | 310 | "metadata": {}, |
256 | 311 | "source": [ |
257 | | - "In the next section, you will learn how to open and read the content of these text files." |
| 312 | + "In the next section, you will learn how to open and read the content of the file that `sal_path` points to."
258 | 313 | ] |
259 | 314 | }, |
260 | 315 | { |
|
282 | 337 | "cell_type": "markdown", |
283 | 338 | "metadata": {}, |
284 | 339 | "source": [ |
285 | | - "The Python `open()` function takes the name of the file (as a parameter) and returns a file object. \n", |
| 340 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
286 | 341 | "\n", |
287 | | - "This object can be used to read the sequence of characters in a few ways:" |
| 342 | + "The Python `open()` function takes the name of the file (as a parameter) and returns a [file object](https://docs.python.org/3.6/glossary.html#term-file-object). " |
| 343 | + ] |
| 344 | + }, |
| 345 | + { |
| 346 | + "cell_type": "markdown", |
| 347 | + "metadata": {}, |
| 348 | + "source": [ |
| 349 | + "This file object can be used to read the sequence of characters in a few ways:" |
288 | 350 | ] |
289 | 351 | }, |
290 | 352 | { |
|
317 | 379 | }, |
318 | 380 | { |
319 | 381 | "cell_type": "code", |
320 | | - "execution_count": null, |
321 | | - "metadata": {}, |
322 | | - "outputs": [], |
| 382 | + "execution_count": 5, |
| 383 | + "metadata": {}, |
| 384 | + "outputs": [ |
| 385 | + { |
| 386 | + "name": "stdout", |
| 387 | + "output_type": "stream", |
| 388 | + "text": [ |
| 389 | + "31.4\n", |
| 390 | + "31.6\n", |
| 391 | + "30.5\n", |
| 392 | + "30.8\n", |
| 393 | + "30.4\n", |
| 394 | + "31.4\n", |
| 395 | + "31.6\n", |
| 396 | + "30.5\n", |
| 397 | + "30.3\n", |
| 398 | + "30.2\n", |
| 399 | + "31.4\n", |
| 400 | + "31.6\n", |
| 401 | + "32.5\n", |
| 402 | + "30.8\n", |
| 403 | + "31.4\n", |
| 404 | + "31.7\n", |
| 405 | + "31.6\n", |
| 406 | + "31.5\n", |
| 407 | + "30.2\n", |
| 408 | + "30.4\n", |
| 409 | + "\n" |
| 410 | + ] |
| 411 | + } |
| 412 | + ], |
323 | 413 | "source": [ |
324 | 414 | "sal_file = open(sal_path)\n", |
325 | 415 | "\n", |
|
333 | 423 | "cell_type": "markdown", |
334 | 424 | "metadata": {}, |
335 | 425 | "source": [ |
336 | | - "The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are actually just a single `str` of 100 characters!" |
| 426 | + "The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are **actually** a single `str` of 100 characters!" |
337 | 427 | ] |
338 | 428 | }, |
339 | 429 | { |
|
360 | 450 | "source": [ |
361 | 451 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
362 | 452 | "\n", |
363 | | - "Why the characters are 100? Each row has 4 visible characters (e.g., `30.8`) but there is also an invisible character that the text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 453 | + "Why are there 100 characters? Each row has 4 visible characters (e.g., `30.8`), but there is also an invisible character (i.e., `\\n`) that the text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters."
364 | 454 | ] |
365 | 455 | }, |
366 | 456 | { |
367 | 457 | "cell_type": "markdown", |
368 | 458 | "metadata": {}, |
369 | 459 | "source": [ |
370 | | - "We will now write a function that reads the sequence of characters, but also split them by line (using the `str` method named `splitlines()`) and convert the result in the corresponding `float` value." |
| 460 | + "We will now write a function that not only reads the sequence of characters, but also splits it by line (using the `str` method named `splitlines()`) and converts each line into the corresponding `float` value."
371 | 461 | ] |
372 | 462 | }, |
373 | 463 | { |
|
400 | 490 | "source": [ |
401 | 491 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
402 | 492 | "\n", |
403 | | - "There are more efficient ways to read a text file. We adopted an approach that is simple to understand for a first learner." |
| 493 | + "There are more efficient ways to read a text file. We adopted an approach that is simple to understand for a first-time learner." |
404 | 494 | ] |
405 | 495 | }, |
406 | 496 | { |
|
473 | 563 | "cell_type": "markdown", |
474 | 564 | "metadata": {}, |
475 | 565 | "source": [ |
476 | | - "The first required decision is the location on where to store the text file. For this collection of notebook, we will use the `output` sub-folder that can be retrieved running the following code:" |
| 566 | + "If you want to write a text file, the first decision to make is where to store it. For this collection of notebooks, we will use the `output` sub-folder, which can be retrieved by running the following code:"
477 | 567 | ] |
478 | 568 | }, |
479 | 569 | { |
480 | 570 | "cell_type": "code", |
481 | | - "execution_count": null, |
482 | | - "metadata": {}, |
483 | | - "outputs": [], |
| 571 | + "execution_count": 6, |
| 572 | + "metadata": {}, |
| 573 | + "outputs": [ |
| 574 | + { |
| 575 | + "name": "stdout", |
| 576 | + "output_type": "stream", |
| 577 | + "text": [ |
| 578 | + "The output folder is: C:\\code\\hyo2\\epom\\python_basics\\output\n" |
| 579 | + ] |
| 580 | + } |
| 581 | + ], |
484 | 582 | "source": [ |
485 | 583 | "def get_output_folder():\n", |
486 | 584 | " cur_folder = os.path.abspath(os.path.curdir)\n", |
|
498 | 596 | "cell_type": "markdown", |
499 | 597 | "metadata": {}, |
500 | 598 | "source": [ |
501 | | - "We then use `join()` function to store the output file: e.g., `depths.txt`." |
| 599 | + "We then use the `join()` function to build the path of the output file: e.g., `depths.txt`."
502 | 600 | ] |
503 | 601 | }, |
504 | 602 | { |
|
515 | 613 | "cell_type": "markdown", |
516 | 614 | "metadata": {}, |
517 | 615 | "source": [ |
518 | | - "To write a file, you have to use the `open()` passing the mode `w` as second parameter. We put this function within a function that take a list as a second parameter and write the content into the text file." |
| 616 | + "To write a file, you have to use the `open()` function and pass the `w` mode (`w` is for *write*) as the second parameter. We wrap this call in a function that takes a list as its second parameter and writes the content of that list into the text file."
519 | 617 | ] |
520 | 618 | }, |
521 | 619 | { |
522 | 620 | "cell_type": "code", |
523 | | - "execution_count": null, |
| 621 | + "execution_count": 7, |
524 | 622 | "metadata": {}, |
525 | 623 | "outputs": [], |
526 | 624 | "source": [ |
|
636 | 734 | " * [The os module](https://docs.python.org/3.6/library/os.html)\n", |
637 | 735 | " * [Input and Output](https://docs.python.org/3.6/tutorial/inputoutput.html)\n", |
638 | 736 | "* [Cross-platform software](https://en.wikipedia.org/wiki/Cross-platform_software)\n", |
639 | | - "* [Text file](https://en.wikipedia.org/wiki/Text_file)\n", |
640 | | - "* [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n", |
641 | | - "* [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
| 737 | + "* [Computer file](https://en.wikipedia.org/wiki/Computer_file)\n", |
| 738 | + " * [Text file](https://en.wikipedia.org/wiki/Text_file)\n", |
| 739 | + " * [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n", |
| 740 | + " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
642 | 741 | ] |
643 | 742 | }, |
644 | 743 | { |
|
655 | 754 | "metadata": {}, |
656 | 755 | "source": [ |
657 | 756 | "<!--NAVIGATION-->\n", |
658 | | - "[< Dictionaries and Metadata](006_Dictionaries_and_Metadata.ipynb) | [Contents](index.ipynb) | [A Class as a Data Container>](008_A_Class_as_a_Data_Container.ipynb)" |
| 757 | + "[< Write Your Own Functions](005_Write_Your_Own_Functions.ipynb) | [Contents](index.ipynb) | [Dictionaries and Metadata >](007_Dictionaries_and_Metadata.ipynb)" |
659 | 758 | ] |
660 | 759 | } |
661 | 760 | ], |
|