Python Tutorial
Learn Python for business analysis using real-world data. No coding experience necessary.
Python Basics: Lists, Dictionaries, & Booleans
Welcome to the Python Tutorial for Data Analysis
Python is an extremely versatile programming language. It's popular for software development, but it's also one of the go-to tools in analytics, data analysis, and data science. It's a readable, friendly language with a large community that maintains open source libraries (toolkits for performing specific functions).
This lesson is the first in a full tutorial on using Python for data analysis. You'll learn about counting categorical data, selecting subsets, grouping data, deriving columns, and visualizing distributions.
Who is this Python tutorial for?
This tutorial is written for beginners—people who are new to Python, and to programming in general. It's specifically focused on data analysis. SQL and Excel users looking to save time on routine data manipulation tasks—like pivots—or do more advanced work—like regression analysis, clustering, or text analysis—will find this very useful. The goal of this tutorial is to cover the basics thoroughly in plain English, preparing you to get real work done and to learn more on your own.
These tutorials are by no means a comprehensive introduction to Python, but a practical way to start using Python for data analysis. For more examples, check out this gallery of Python data analysis.
Goals of this lesson
This lesson covers basic objects in Python including strings, lists, and dictionaries. In subsequent lessons, you’ll learn how to manipulate objects for some serious analytical power.
In this lesson, you’ll learn about:
- Writing and running Python code
- Basic Python objects, including lists and dictionaries
- Changing values and combining objects
- Comparing values with booleans
Writing and running Python code
What are Python notebooks?
Notebooks are a popular tool for performing data analysis in Python and other programming languages. They're useful because they let you write and run code and see the results, including charts, inline; many other ways of writing and running Python don't show results this way. Each input block is called a cell. As you can see below, a notebook’s cells make it easy to read through the steps in an analytical process:
a = 6
b = 2
After assigning 6 to "a" and 2 to "b", you should be able to divide "a" by "b" and get 3. In Python 3, the / operator always returns a float, so the result prints as 3.0:
print(a / b)
3.0
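In Python 3, / always performs true division and returns a float (6 / 2 gives 3.0), while // performs floor division and returns a whole number when both operands are integers. A minimal sketch comparing the two, reusing the same values:

```python
a = 6
b = 2

# "/" is true division: in Python 3 it always returns a float
quotient = a / b
# "//" is floor division: with two ints it returns an int, rounded down
whole_quotient = a // b

print(quotient)        # 3.0
print(whole_quotient)  # 3
```

Use / when you want the exact result and // when you only need the whole-number part.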
Jupyter is the most popular notebook tool, but in this tutorial, we’ll be using the notebooks built into Mode (which are similar in many ways) so that you won’t have to go through hours of setup.
Who uses Python notebooks?
People like atmospheric scientists, anthropologists, data scientists, marketers, microbiologists, econometricians, Python hobbyists, and data journalists use notebooks to develop and communicate their Python code, and even present notebooks alongside published studies. Check out these great examples.
Using Mode Python notebooks
Mode is an analytics platform that brings together a SQL editor, Python notebook, and data visualization builder. Throughout this tutorial, you can use Mode for free to practice writing and running Python code.
- Log into Mode or create an account.
- Navigate to this report and click Duplicate. This will take you to the SQL Query Editor.
- Click Python Notebook under Notebook in the left navigation panel. This will open a new notebook.
Now you’re all ready to go.
To run Python code in a notebook
- Write code in an input "cell"
- Run it with [shift] + [return] or the "Run" button
- The code runs and shows output in a new cell
It’s time to try it yourself: type a sentence in quotes and press [shift] + [return]:
'this is a string'
'this is a string'
The code runs and prints output below. This object is a string, or a text object. Strings can include any combination of letters, numbers, and special characters. You'll learn more about other data types later.
Output
Once the cell is done running, you will usually see output or an error below it (more on this in a minute). If you hit an error (which will happen), try to fix the code in that cell, then run the cell again.
Variables
So that you can reference this string later, give it a variable name, “first_string”. Note that the variable name in the example below is entirely lowercase; this is a convention in Python. Use underscores between words, because a variable name can't contain spaces (Python treats a space as the end of the name):
first_string = 'hello from mode'
You'll notice that this did not generate an output cell. That's because an assignment doesn't produce a value to display; all you did was assign the first_string variable. To confirm that the variable was stored, print it. Get used to this: you'll use it often to confirm that the contents of variables match your expectations:
print(first_string)
hello from mode
You can also run the variable name to see its raw value as output:
first_string
'hello from mode'
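The two outputs above differ only in the quotes. In IPython-style notebooks, running a bare variable name typically displays its raw representation (the same text Python's built-in repr() returns), while print() shows just the contents. A minimal sketch of the difference:

```python
first_string = 'hello from mode'

# print() shows the text itself, without quotes
print(first_string)        # hello from mode
# repr() gives the raw representation, including the quotes,
# which is what the notebook shows when a cell ends with a bare variable name
print(repr(first_string))  # 'hello from mode'
```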
Finally, you can use either double quotes or single quotes to define a string:
print("one bacon double cheeseburger")
one bacon double cheeseburger
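Single and double quotes produce the same kind of string; the choice mostly matters when the text itself contains a quote character. A short sketch (the "Mode's notebooks" phrase is just an illustrative example):

```python
single = 'one bacon double cheeseburger'
double = "one bacon double cheeseburger"
# Both quote styles create equal strings
print(single == double)  # True

# Double quotes let a string contain an apostrophe without extra work
phrase = "Mode's notebooks"
# Inside single quotes, the apostrophe must be escaped with a backslash
escaped = 'Mode\'s notebooks'
print(phrase == escaped)  # True
```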
How Python notebooks run
A Python notebook is more than just a list of all the things you did. As you can see, it's actually running code. When you store a variable, the notebook remembers that variable... for a limited time. Here's the simplified version:
When you open a notebook in Mode, there's a computer somewhere that reserves a space for you to work. When you run a cell, it uses that computer's processing power to complete the operation. When you assign a variable (as above), that variable gets stored in the computer's memory. Memory is where actively running programs temporarily hold data, and where all of the Python objects you're creating are saved. As long as a variable is in the computer's memory, you'll be able to access it (as you did with print above). When the memory is cleared, you'll lose the objects you've been working with in your Python code.
Saving the notebook
If you leave Mode for more than 15 minutes, your session will end. It's not a big deal; it just means that Mode will purge your assigned variables from memory. If you want to use them again, you'll have to run the notebook from the top, step by step. You can also hit the "Run Notebook" button to speed things up.
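If you run a cell that uses a variable which is no longer in memory (or was never assigned in the current session), Python raises a NameError. A minimal sketch of what that looks like; purged_variable is a hypothetical name that is never assigned here:

```python
message = None
try:
    # 'purged_variable' is a hypothetical name that was never assigned in this session,
    # the same situation as a variable lost when the session ended
    purged_variable
except NameError as err:
    message = str(err)

print(message)  # name 'purged_variable' is not defined
```

If you see this error after returning to a notebook, re-running the cells from the top should restore the variable.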
As you work, you may delete or edit code. Mode will automatically save your work each time you run a cell. During this process, it's important to make sure that your code stays clean and can be run from top to bottom. Keep this in mind when modifying or re-running cells. A good rule of thumb: If someone else copies the code, they should see the same results.
Basic Python objects
In Python, everything is an object. This means that for any code you run, the result is some type of object.
There are many types of objects in Python, but the types you'll learn about in this tutorial are data objects.
- List
- Dictionary
- Series (covered in the next lesson)
- DataFrame (covered in the next lesson)
Lists
A list is exactly what it sounds like: an ordered collection of items. A list can contain any kind of object, and you create one by placing comma-separated items between square brackets. That's right: a list is both an object AND a container of other objects. This might be confusing at first, but you'll come to learn that it's very convenient to be able to treat one item in a list similarly to how you might treat the list itself.
In lists:
- the order stays the same
- you can get an item by referring to its position in the list, a number called the index
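Because a list can hold any kind of object, one list can mix strings, numbers, and booleans, or even contain other lists. A small sketch (the list contents are illustrative only), using the built-in len() to count items:

```python
mixed = ['Tokyo', 13350000, True]                      # a string, an integer, and a boolean
nested = [['Tokyo', 'Osaka'], ['New York', 'Boston']]  # a list of lists

print(len(mixed))   # 3
print(len(nested))  # 2 (each inner list counts as one item)
print(nested)       # [['Tokyo', 'Osaka'], ['New York', 'Boston']]
```

Notice that printing the list shows the items in the same order they were written.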
To try it out, create a list named “cities” and give it a few string items:
cities = ['Tokyo','Los Angeles','New York','San Francisco']
Variable names
You gave the list above a variable name of "cities". This means that "cities" is this specific list. You can refer back to it by name:
print(cities) # see the specific list object this variable refers to
['Tokyo', 'Los Angeles', 'New York', 'San Francisco']
Again, it’s good to get in the habit of printing the variable name immediately after you create it to make sure that it matches your expectations. The above case is simple but as you start writing more complex functions, this can be tremendously helpful.
Accessing list items
To get an item in the list, put the item's position in the list, called its index, inside square brackets. In Python, as in many languages, lists are zero-indexed, meaning the first item has an index of 0, the second item has an index of 1, and so forth.
Run this bit of code to get the second item in the list:
cities[1]
'Los Angeles'
Since the list is zero-indexed, 'Los Angeles' has an index of 1, even though it’s the second item in the list. That means using an index of 0 should output 'Tokyo'.
cities[0]
'Tokyo'
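Zero-indexing also means the last valid index is one less than the number of items. A short sketch using the same cities list, including Python's negative indexes (which count back from the end) and the error you get for an index past the end:

```python
cities = ['Tokyo', 'Los Angeles', 'New York', 'San Francisco']

# Valid indexes run from 0 up to len(cities) - 1
last_index = len(cities) - 1
print(last_index)          # 3
print(cities[last_index])  # San Francisco

# Negative indexes count back from the end: -1 is the last item
print(cities[-1])          # San Francisco

# An index past the end raises an IndexError
try:
    cities[4]
except IndexError as err:
    print('IndexError:', err)  # IndexError: list index out of range
```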
You'll get used to zero-indexing after a while. If you're curious, here are a few explanations of why this choice was made.
Dictionaries
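Dictionaries are also what they sound like: a collection of definitions (values) that correspond to unique terms (keys).
- Dictionaries are not indexed by position (since Python 3.7, they do keep keys in the order they were added)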
- Dictionary values are accessed by keys
Keys and values are to a dictionary what words and their definitions are to an English dictionary. Each entry in a dictionary is called a key-value pair. To create a dictionary, use curly braces:
city_population = {
'Tokyo': 13350000, # a key-value pair
'Los Angeles': 18550000,
'New York City': 8400000,
'San Francisco': 1837442,
}
Note that the numbers have no quotation marks around them: they are stored as integers, not strings.
city_population # Python 3.7+ keeps the order the keys were added; older setups may display them sorted
{'Tokyo': 13350000,
 'Los Angeles': 18550000,
 'New York City': 8400000,
 'San Francisco': 1837442}
Say you want to know the population of New York. Similarly to a list, you can use brackets to access the value, but this time with the key (instead of the index):
city_population['New York City']
8400000
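Square-bracket lookup only works for keys that exist. A minimal sketch of checking for a key first with in, of the KeyError you get otherwise, and of the dictionary method .get(), which returns None (or a default you pass) instead of raising; 'Paris' is just a hypothetical missing key:

```python
city_population = {
    'Tokyo': 13350000,
    'Los Angeles': 18550000,
    'New York City': 8400000,
    'San Francisco': 1837442,
}

# 'in' checks whether a key exists and returns True or False
print('Tokyo' in city_population)  # True
print('Paris' in city_population)  # False

# Square brackets raise a KeyError for a missing key
try:
    city_population['Paris']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'Paris'

# .get() returns None, or the default you supply, instead of raising
print(city_population.get('Paris'))     # None
print(city_population.get('Paris', 0))  # 0
```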
Changing values and combining objects
Changing values
Say you wanted to change the population number for San Francisco, because it turns out 837,442 people live there, not 1,837,442 (maybe San Francisco actually belongs in a different dictionary called "villages").
Name the key of the item you want to change, and use = to set the new population value:
city_population['San Francisco'] = 837442
city_population # change made!
{'Tokyo': 13350000,
 'Los Angeles': 18550000,
 'New York City': 8400000,
 'San Francisco': 837442}
In dictionaries, keys are unique. This means that there can only be one "San Francisco" in the city_population dictionary, just as there can only be one "supercalifragilisticexpialidocious" entry in an English dictionary.
Let's add another big city to the dictionary. To do this, you'll need to assign a value to a key that doesn't yet exist:
city_population['Mumbai'] = 11980000
city_population
{'Tokyo': 13350000,
 'Los Angeles': 18550000,
 'New York City': 8400000,
 'San Francisco': 837442,
 'Mumbai': 11980000}
Bingo, Mumbai is in the mix.
You can see that “=” will assign a given value to a specific key, regardless of whether the key already exists in the dictionary. If it already exists, the new value will overwrite the previous one.
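One way to see the difference between overwriting and adding is to count the key-value pairs with len() before and after each assignment. A small sketch using a two-city version of the dictionary:

```python
city_population = {'Tokyo': 13350000, 'San Francisco': 1837442}
print(len(city_population))  # 2

# Existing key: the old value is overwritten, no new pair is created
city_population['San Francisco'] = 837442
print(len(city_population))  # 2

# New key: a new key-value pair is added
city_population['Mumbai'] = 11980000
print(len(city_population))  # 3

print(city_population)  # {'Tokyo': 13350000, 'San Francisco': 837442, 'Mumbai': 11980000}
```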
Combining objects
Python makes it easy to combine objects, for example by creating a dictionary of lists. If you wanted to list the municipalities of two cities, you could create this dictionary:
municipalities dictionary (inside {}):
- key: string of city name (inside '')
- value: list of municipalities (inside [])
- list items: strings, each a single municipality (inside '')
Copy the following cell into your notebook and run it, so that you can explore the dictionary yourself:
municipalities = {
'New York City': [
'Manhattan',
'The Bronx',
'Brooklyn',
'Queens',
'Staten Island'
],
'Tokyo': [
'Akihabara',
'Harajuku',
'Shimokitazawa',
'Nakameguro',
'Shibuya',
'Ebisu/Daikanyama',
'Shibuya District',
'Aoyama',
'Asakusa/Ueno',
'Bunkyo District',
'Ginza',
'Ikebukuro',
'Koto District',
'Meguro District',
'Minato District',
'Roppongi',
'Shinagawa District',
'Shinjuku',
'Shinjuku District',
'Sumida District',
'Tsukiji',
'Tsukishima']
}
To better understand how this structure works, retrieve the municipalities in Tokyo:
municipalities['Tokyo']
['Akihabara',
'Harajuku',
'Shimokitazawa',
'Nakameguro',
'Shibuya',
'Ebisu/Daikanyama',
'Shibuya District',
'Aoyama',
'Asakusa/Ueno',
'Bunkyo District',
'Ginza',
'Ikebukuro',
'Koto District',
'Meguro District',
'Minato District',
'Roppongi',
'Shinagawa District',
'Shinjuku',
'Shinjuku District',
'Sumida District',
'Tsukiji',
'Tsukishima']
So to get a single municipality, you enter the dictionary variable name, then the key, then the list index. To show Tokyo’s Nakameguro municipality (the fourth value in the list), you would write the following (remember that Python is zero-indexed):
municipalities['Tokyo'][3]
'Nakameguro'
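The chained lookup works in two steps: the key lookup returns a list, and the index then picks one item from that list. A sketch of the same steps written out separately, using a shortened copy of the dictionary for brevity:

```python
# Shortened copy of the municipalities dictionary, for this sketch only
municipalities = {
    'Tokyo': ['Akihabara', 'Harajuku', 'Shimokitazawa', 'Nakameguro', 'Shibuya'],
}

# Step 1: the key lookup returns the whole list of Tokyo municipalities
tokyo = municipalities['Tokyo']
# Step 2: indexing that list returns one string (index 3 is the fourth item)
fourth = tokyo[3]

print(fourth)                                # Nakameguro
print(fourth == municipalities['Tokyo'][3])  # True
```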
Practice Problem
What is the fourth listed municipality (okay, borough) in New York City?
Boolean objects
Just as some values are stored as the string data type, some values can be stored as the boolean data type. More specifically, booleans have two possible values: True or False. This will come in handy when filtering data in later lessons.
The type() function can be used to check the data type (string, boolean, etc.; you will learn more of these later) of an object. Let's check the object type of the value True:
type(True)
bool
As a point of comparison, you can see that the collection of Tokyo's municipalities in the above example is stored as a list:
type(municipalities['Tokyo'])
list
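type() works on any object. A short sketch checking the kinds of objects used in this lesson; note that print() shows types in the form <class 'bool'>, while running type(True) as the last line of a notebook cell displays just bool:

```python
print(type('hello from mode'))    # <class 'str'>
print(type(13350000))             # <class 'int'>
print(type(True))                 # <class 'bool'>
print(type(['Tokyo', 'Ginza']))   # <class 'list'>
print(type({'Tokyo': 13350000}))  # <class 'dict'>

# type() returns the type object itself, so you can compare it directly
print(type(True) == bool)         # True
```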
Comparing values using booleans
Until now, you've been using a single equals sign to assign values. Python also uses equals signs to compare values: the double equals sign "==" checks whether the values on its left and right are equal to one another.
For example, is the population of Tokyo equal to 13,350,000? You can use this statement to evaluate the equality (resulting in a boolean):
city_population['Tokyo'] == 13350000
True
city_population['Tokyo'] == 50000
False
Be careful: two equals signs test for equality, but one equals sign assigns a value. If you removed one of the equals signs, you would change the value of Tokyo's population in the dictionary (just as you did above with San Francisco)!
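A comparison produces a boolean that you can store in a variable like any other object, while a single = silently overwrites the stored value. A minimal sketch with a one-city dictionary:

```python
city_population = {'Tokyo': 13350000}

# == compares and produces a boolean; the dictionary is not changed
matches = city_population['Tokyo'] == 13350000
print(matches)        # True
print(type(matches))  # <class 'bool'>

# A single = assigns instead: it overwrites the stored value
city_population['Tokyo'] = 50000
print(city_population['Tokyo'] == 13350000)  # False
```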
Just as you can test whether values are equal, you can test whether they are not equal to one another with !=:
city_population['Tokyo'] != 50000
True
The "not equal to" operator works like "<>" in SQL or Excel.
Python code execution and objects
Let's break down one of the examples in the boolean section above to understand how Python evaluates code.
type(municipalities['Tokyo'])
list
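Python evaluates an expression like type(municipalities['Tokyo']) from the inside out: each step produces an object that becomes the input to the next. A sketch that splits the expression into those steps, using a shortened copy of the dictionary; the step_ variable names are just illustrative:

```python
# Shortened copy of the municipalities dictionary, for this sketch only
municipalities = {'Tokyo': ['Akihabara', 'Harajuku', 'Nakameguro']}

step_1 = municipalities   # 1. the name is looked up: a dict object
step_2 = step_1['Tokyo']  # 2. the key lookup returns a list object
step_3 = type(step_2)     # 3. type() receives that list and returns its type

print(type(step_1))  # <class 'dict'>
print(step_3)        # <class 'list'>
print(step_3 == type(municipalities['Tokyo']))  # True
```

Writing the steps out like this can help when a long expression returns something unexpected: print each intermediate object to see where it diverges from what you expected.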
Lesson summary
In this lesson, you learned about:
- Writing and running Python code
- Basic Python objects, including lists and dictionaries
- Changing values and combining objects
- Comparing values with boolean objects
In the next lesson you'll learn about Python methods and functions, and how to import libraries.
Next Lesson
Python Methods, Functions, & Libraries