
Text Based Browser


About this project

Sometimes you need to read online documentation or find something on the Internet from the command line or terminal. So, let's use Python to create a text-based browser! Of course, making a real, full-blown browser is a very difficult task. In this project, you'll create a very simple browser that will ignore JavaScript and CSS, won't have cookies, and will only process a limited set of tags.

Run

Requirements: the requests, beautifulsoup4 and colorama libraries (used in stages 4 to 6).

python browser.py dir-for-files

Project

1. Address line

Description

Every browser accepts a string from the user and then shows a web page. This string is a URL (Uniform Resource Locator) and looks something like this: https://www.google.com. After that, the browser has a lot of work to do. In a nutshell, this work can be described as finding a web page: the page is located somewhere on the Internet, and the browser has to retrieve it. Since the https://www. part is usually the same, it is often omitted, and the shortened link looks like this: google.com.

In our first stage, we'll try to imitate this behavior.

Objectives

  1. You should write a program that takes a string from the user (URL) and outputs a "hard-coded" website with news (just a header and some text below).
    The websites are presented as two variables in the source code; you can see them in the template. These are mock bloomberg.com and nytimes.com sites. You just need to output the corresponding one in response to the input URL.
  2. Also, add the possibility to quit the browser by typing exit. Real browsers don’t stop after showing a single web page: they are ready to accept a new URL at any moment, and your program should behave the same way. An endless loop can help you with that part.
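A minimal sketch of this stage, assuming the template's two page variables are replaced here by a hypothetical PAGES dictionary with shortened texts (only the headlines are kept). The names get_page and main are illustrative, not part of the template.

```python
# Hypothetical, shortened stand-ins for the template's two mock pages:
# only the headlines are kept here.
PAGES = {
    "bloomberg.com": "The Space Race: From Apollo 11 to Elon Musk",
    "nytimes.com": "This New Liquid Is Magnetic, and Mesmerizing",
}


def get_page(url):
    """Return the hard-coded page for url, or None if it is unknown."""
    return PAGES.get(url)


def main():
    # Endless loop: keep accepting URLs until the user types "exit".
    while True:
        url = input().strip()  # no prompt: "> " in the examples only marks input
        if url == "exit":
            break
        page = get_page(url)
        if page is not None:
            print(page)


if __name__ == "__main__":
    main()
```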

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> bloomberg.com
The Space Race: From Apollo 11 to Elon Musk

It's 50 years since the world was gripped by historic images
of Apollo 11, and Neil Armstrong -- the first man to walk
on the moon. It was the height of the Cold War, and the charts
were filled with David Bowie's Space Oddity, and Creedence's
Bad Moon Rising. The world is a very different place than
it was 5 decades ago. But how has the space race changed since
the summer of '69? (Source: Bloomberg)


Twitter CEO Jack Dorsey Gives Talk at Apple Headquarters

Twitter and Square Chief Executive Officer Jack Dorsey
addressed Apple Inc. employees at the iPhone maker’s headquarters
Tuesday, a signal of the strong ties between the Silicon Valley giants.
> exit

2. Tabs

Description

Let's make our browser store web pages in files and show a saved page when the user types a shortened request (for example, wikipedia instead of wikipedia.org). You can store each page as a separate file or find another way to do this. However, your program must accept one command-line argument, the directory in which to store the files, and your web pages should be saved inside this directory.

Objectives

At this stage, your program should:

  1. Check if the user has entered a valid URL. It must contain at least one dot, for example, bloomberg.com. If the URL is incorrect, the browser should output an error message (it should contain the word error) and wait for another URL.
  2. Accept a command-line argument which is a directory for saved tabs. For example, if the argument is dir, then you need to create a folder with the name dir and save all web pages that the user downloads in this folder.
  3. Save each downloaded web page in a file. After that, the user needs a simple way to see the saved page, for example by typing "bloomberg". The rule is simple: remove the last dot and everything that comes after it. bloomberg.com becomes bloomberg, and en.wikipedia.org becomes en.wikipedia.

Check out a tutorial to learn how to work with files and create folders in Python.
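The three objectives can be sketched with the standard library alone. The helper names (is_valid_url, tab_name, save_tab, load_tab) are hypothetical; in the full program the directory would come from sys.argv[1].

```python
import os


def is_valid_url(url):
    # The stage's rule: a valid URL contains at least one dot.
    return "." in url


def tab_name(url):
    # Drop the last dot and everything after it:
    # "bloomberg.com" -> "bloomberg", "en.wikipedia.org" -> "en.wikipedia".
    return url.rsplit(".", 1)[0]


def save_tab(directory, url, content):
    """Write content to <directory>/<tab name> and return the file path."""
    os.makedirs(directory, exist_ok=True)  # create the folder on first use
    path = os.path.join(directory, tab_name(url))
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


def load_tab(directory, name):
    """Return the saved page for a shortened name, or None if there is none."""
    path = os.path.join(directory, name)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read()
```

In the example, bloomberg (no dot) opens a saved tab, so the main loop would try load_tab first and only then apply the is_valid_url check.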

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> python browser.py dir-for-files
> bloomberg.com
The Space Race: From Apollo 11 to Elon Musk

It's 50 years since the world was gripped by historic images
of Apollo 11, and Neil Armstrong -- the first man to walk
on the moon. It was the height of the Cold War, and the charts
were filled with David Bowie's Space Oddity, and Creedence's
Bad Moon Rising. The world is a very different place than
it was 5 decades ago. But how has the space race changed since
the summer of '69? (Source: Bloomberg)


Twitter CEO Jack Dorsey Gives Talk at Apple Headquarters

Twitter and Square Chief Executive Officer Jack Dorsey
addressed Apple Inc. employees at the iPhone maker’s headquarters
Tuesday, a signal of the strong ties between the Silicon Valley giants.

> bloomberg
The Space Race: From Apollo 11 to Elon Musk

It's 50 years since the world was gripped by historic images
of Apollo 11, and Neil Armstrong -- the first man to walk
on the moon. It was the height of the Cold War, and the charts
were filled with David Bowie's Space Oddity, and Creedence's
Bad Moon Rising. The world is a very different place than
it was 5 decades ago. But how has the space race changed since
the summer of '69? (Source: Bloomberg)


Twitter CEO Jack Dorsey Gives Talk at Apple Headquarters

Twitter and Square Chief Executive Officer Jack Dorsey
addressed Apple Inc. employees at the iPhone maker’s headquarters
Tuesday, a signal of the strong ties between the Silicon Valley giants.

> nytimes
Error: Incorrect URL

> exit

3. Hotkeys

Description

Every browser has a “back” button. If the user presses this button, the browser shows the previous web page. This feature can be implemented with a stack. You save the pages visited by the user: google, wikipedia, bloomberg, ..., and when the user types back, the pages are shown in reverse order: ..., bloomberg, wikipedia, google.

Objectives

The result of this task is the same as in the previous task, but now the program has a new feature:

  1. The program should show the previous page if the user types back. You can implement a stack to do this.
  2. If there are no more pages in the browser history, just don’t output anything.
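One way to model this is a Python list used as a stack, sketched below with a hypothetical History class: opening a new page pushes the current one, and back pops it.

```python
class History:
    """Back-button history, kept as a stack (a Python list)."""

    def __init__(self):
        self._stack = []     # previously shown pages, most recent last
        self.current = None  # page currently on screen

    def visit(self, page):
        # Opening a new page pushes the current one onto the stack.
        if self.current is not None:
            self._stack.append(self.current)
        self.current = page

    def back(self):
        """Return the previous page, or None if the history is empty."""
        if not self._stack:
            return None
        self.current = self._stack.pop()
        return self.current
```

This mirrors the example: after bloomberg.com and nytimes.com, back returns the Bloomberg page; another back returns None, so the browser prints nothing.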

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> python browser.py dir-for-files
> bloomberg.com
The Space Race: From Apollo 11 to Elon Musk

It's 50 years since the world was gripped by historic images
of Apollo 11, and Neil Armstrong -- the first man to walk
on the moon. It was the height of the Cold War, and the charts
were filled with David Bowie's Space Oddity, and Creedence's
Bad Moon Rising. The world is a very different place than
it was 5 decades ago. But how has the space race changed since
the summer of '69? (Source: Bloomberg)


Twitter CEO Jack Dorsey Gives Talk at Apple Headquarters

Twitter and Square Chief Executive Officer Jack Dorsey
addressed Apple Inc. employees at the iPhone maker’s headquarters
Tuesday, a signal of the strong ties between the Silicon Valley giants.

> nytimes.com
This New Liquid Is Magnetic, and Mesmerizing

Scientists have created “soft” magnets that can flow
and change shape, and that could be a boon to medicine
and robotics. (Source: New York Times)


Most Wikipedia Profiles Are of Men. This Scientist Is Changing That.

Jessica Wade has added nearly 700 Wikipedia biographies for
important female and minority scientists in less than two
years.

> back
The Space Race: From Apollo 11 to Elon Musk

It's 50 years since the world was gripped by historic images
of Apollo 11, and Neil Armstrong -- the first man to walk
on the moon. It was the height of the Cold War, and the charts
were filled with David Bowie's Space Oddity, and Creedence's
Bad Moon Rising. The world is a very different place than
it was 5 decades ago. But how has the space race changed since
the summer of '69? (Source: Bloomberg)


Twitter CEO Jack Dorsey Gives Talk at Apple Headquarters

Twitter and Square Chief Executive Officer Jack Dorsey
addressed Apple Inc. employees at the iPhone maker’s headquarters
Tuesday, a signal of the strong ties between the Silicon Valley giants.

> exit

4. Requesting

Description

Now it's time to make the address bar work for real. At this stage, forget about the hard-coded variables with sites and show your user real pages: the browser should request the URL the user entered and show the result.

One of the simplest ways to do this is the requests library. It is already installed in your project, so you can use it. This library lets you fetch any web page by its URL with a single line of code. You can find that line in the requests documentation, though it’s better to read the whole quickstart guide to understand more.

Sometimes this is going to be a challenge: you might find that you suddenly don't have permission to visit certain websites. That’s often because of the User-Agent, a request header that browsers use to identify themselves; every browser sends a different one. In fact, browsers add a lot of additional information to their requests, and all of it can be set with the requests library. For this task, it's optional, but feel free to experiment.
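If a site refuses the request, one experiment is to send a custom User-Agent header (by default, requests identifies itself as python-requests/<version>). The sketch below builds the request with requests.Request(...).prepare(), so the headers can be inspected before anything is sent. The USER_AGENT value and the build_request and fetch names are made up for illustration.

```python
import requests

# Hypothetical browser-like identification string; any value can be tried.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) TextBasedBrowser/0.1"


def build_request(url):
    """Prepare (but do not send) a GET request with a custom User-Agent."""
    request = requests.Request("GET", url, headers={"User-Agent": USER_AGENT})
    return request.prepare()


def fetch(url):
    """Send the prepared request and return the response body as text."""
    with requests.Session() as session:
        # Without stream=True the body is downloaded before send() returns.
        response = session.send(build_request(url), timeout=10)
    return response.text
```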

Objectives

Add new features to the browser:

  1. Your program should read the URL from input as before, but now show the real web page using the requests library.
  2. Since the user can enter the URL without https:// at the beginning, you need to prepend this string if it is not there.
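The two objectives might be sketched as follows. ensure_https and fetch_page are hypothetical helper names. Treating an explicit http:// prefix as already present is a design choice of this sketch, since prepending blindly would produce a broken https://http:// URL.

```python
import requests


def ensure_https(url):
    # Prepend the scheme only when the user left it out.
    if url.startswith(("https://", "http://")):
        return url
    return "https://" + url


def fetch_page(url):
    """Download the page and return its raw HTML text."""
    response = requests.get(ensure_https(url), timeout=10)
    return response.text
```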

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> python browser.py dir-for-files
> docs.python.org

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" /><title>3.7.4 Documentation</title>
    <link rel="stylesheet" target="_blank" href="_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" target="_blank" href="_static/pygments.css" type="text/css" />

    <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <script type="text/javascript" src="_static/language_data.js"></script>

    <script type="text/javascript" src="_static/sidebar.js"></script>

    <link rel="search" type="application/opensearchdescription+xml"
          title="Search within Python 3.7.4 documentation"
          target="_blank" href="_static/opensearch.xml"/>
    <link rel="author" title="About these documents" target="_blank" href="about.html" />
    <link rel="index" title="Index" target="_blank" href="genindex.html" />
    <link rel="search" title="Search" target="_blank" href="search.html" />
    <link rel="copyright" title="Copyright" target="_blank" href="copyright.html" />
    <link rel="shortcut icon" type="image/png" target="_blank" href="_static/py.png" />
    <link rel="canonical" target="_blank" href="https://docs.python.org/3/index.html" />

    <script type="text/javascript" src="_static/copybutton.js"></script>
    <script type="text/javascript" src="_static/switchers.js"></script>

   …  (More than 200 such terrifying strings)
> exit

5. Parsing

Description

Now it is important for us to bring the resulting "text" to a form that is understandable to the user.

If you don’t know what HTML is, here's a short explanation. When working on the previous task, you could see a lot of <div>, <script> or <p> “words” on the displayed web page. These are called tags. Browsers need tags to know exactly how to show the page. For example, there could be headers that look different from the rest of the text. There could also be links: they could be blue, and the cursor could turn into a pointing finger when it's over a link. Tags tell the browser where the links are, where an image should go, and so on.

Tags are necessary for the browser but aren’t useful for users. Most tags are paired. For example: <p>Some text</p>, where <p> is an opening tag and </p> is a closing tag. You need to show only “Some text” without <p> and </p> on a web page.

Each tag has its own purpose: <p> for paragraphs of text, <h1> to <h6> for headers, <a> for links, and <ul>, <ol>, <li> for lists.

Objectives

At this stage, you need to cut all content outside of these tags and output what remains. No more <div>, <script>, <p> and so on, just text! You need to show only the content of a limited list of tags (<p>, headers, <a> and <ul>, <ol>, <li>) without the tags themselves.

Use the beautifulsoup4 library to solve this; it is already installed in your project. Feel free to get curious and browse through more information about parsing!
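A sketch of the parsing step with BeautifulSoup, assuming the built-in "html.parser" backend. The hypothetical visible_lines function walks every text fragment in document order and keeps it only if it sits inside one of the listed tags and not inside <script> or <style>; HTML comments are skipped.

```python
from bs4 import BeautifulSoup, Comment

# Tags whose text the browser shows; everything else is dropped.
KEPT_TAGS = {"p", "a", "ul", "ol", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}


def visible_lines(html):
    """Return the text of kept tags, one fragment per line, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for string in soup.find_all(string=True):
        if isinstance(string, Comment):
            continue  # <!-- comments --> are not page text
        ancestors = {parent.name for parent in string.parents}
        if ancestors & SKIPPED_TAGS or not ancestors & KEPT_TAGS:
            continue
        text = string.strip()
        if text:
            lines.append(text)
    return lines
```

Each fragment goes on its own line, so text that mixes a link with plain text, such as "Tutorial start here" in the example, may come out as two lines; joining inline fragments is left as a refinement.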

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> python browser.py dir-for-files
> docs.python.org
index
modules
Python
Documentation
Python 3.7.4 documentation
Welcome! This is the documentation for Python 3.7.4.
Parts of the documentation:
What's new in Python 3.7? or all "What's new" documents since 2.0
Tutorial start here
Library Reference keep this under your pillow
Language Reference describes syntax and language elements
Python Setup and Usage how to use Python on different platforms
Python HOWTOs in-depth documents on specific topics
Installing Python Modules installing from the Python Package Index & other sources
Distributing Python Modules publishing modules for installation by others
Extending and Embedding tutorial for C/C++ programmers
Python/C API reference for C/C++ programmers
FAQs frequently asked questions (with answers!)
Indices and tables:
Global Module Index quick access to all modules
General Index all functions, classes, terms
Glossary the most important terms explained
Search page search this documentation
Complete Table of Contents lists all sections and subsections
Meta information:
Reporting bugs
About the documentation
History and License of Python
Copyright
> exit

6. Formatted output

Description

It’s not enough to just drop the tags. You should make your output “readable”. After all, we would like to have a user-friendly browser, right? At this stage, try to make your browser look more like a browser.

Almost every page contains links. Have you ever wondered why blue was chosen to highlight them?

One of the reasons lies in the physiology of the human eye. Red and green are detected by two closely related types of cone cells, and one of the most common forms of colorblindness is red-green colorblindness. It affects about 7% of men and only 0.4% of women, which is still roughly one person in 25 overall. But almost no one has a blue deficiency. Accordingly, nearly everyone can see blue or, more accurately, almost everyone can distinguish blue from other colors.

Also, blue is the darkest color that does not reduce the readability of the text.

Objectives

Make all links in your browser blue! Take a look at the Colorama library. It is already installed in the project, so you can use it. With this library, you can easily solve this task right after reading the documentation!
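A sketch of blue links with Colorama, reusing the text-walking idea from the parsing stage: fragments inside an <a> tag are wrapped in Fore.BLUE and Style.RESET_ALL. The render name and the KEPT_TAGS set are illustrative. Note that colorama.init() by default strips the codes when output is redirected rather than sent to a terminal.

```python
from bs4 import BeautifulSoup
from colorama import Fore, Style, init

# Tags whose text is shown; everything else is dropped.
KEPT_TAGS = {"p", "a", "ul", "ol", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


def render(html):
    """Return the page text, one fragment per line, with link text in blue."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for string in soup.find_all(string=True):
        names = {parent.name for parent in string.parents}
        text = string.strip()
        # Skip empty fragments, script/style code and text outside kept tags.
        if not text or names & {"script", "style"} or not names & KEPT_TAGS:
            continue
        if "a" in names:
            # ANSI escape codes: switch to blue, then reset all styling.
            text = Fore.BLUE + text + Style.RESET_ALL
        lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    init()  # lets Windows consoles interpret the ANSI codes
    print(render("<p>Read the <a href='/doc'>docs</a></p>"))
```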

Example

The greater-than symbol followed by a space (> ) represents user input. Note that it is not part of the input.

> https://docs.python.org
index
modules
Python
Documentation
Python 3.7.4 documentation
Welcome! This is the documentation for Python 3.7.4.
Parts of the documentation:
What's new in Python 3.7? or all "What's new" documents since 2.0
Tutorial start here
Library Reference keep this under your pillow
Language Reference describes syntax and language elements
Python Setup and Usage how to use Python on different platforms
Python HOWTOs in-depth documents on specific topics
Installing Python Modules installing from the Python Package Index & other sources
Distributing Python Modules publishing modules for installation by others
Extending and Embedding tutorial for C/C++ programmers
Python/C API reference for C/C++ programmers
FAQs frequently asked questions (with answers!)
Indices and tables:
Global Module Index quick access to all modules
General Index all functions, classes, terms
Glossary the most important terms explained
Search page search this documentation
Complete Table of Contents lists all sections and subsections
Meta information:
Reporting bugs
About the documentation
History and License of Python
Copyright

> exit