Python Basics: Lists, Dictionaries, & Booleans

Welcome to the Python Tutorial for Data Analysis

Python is an extremely versatile programming language. It's popular for software development, but it's also one of the go-to tools in analytics, data analysis, and data science. It's a readable, friendly language with a large community that maintains open source libraries (toolkits for performing specific functions).

This lesson is the first in a full tutorial on using Python for data analysis. Over the course of the tutorial, you'll learn about counting categorical data, selecting subsets, grouping data, deriving columns, and visualizing distributions.

Who is this Python tutorial for?

This tutorial is written for beginners—people who are new to Python, and to programming in general. It's specifically focused on data analysis. SQL and Excel users looking to save time on routine data manipulation tasks—like pivots—or do more advanced work—like regression analysis, clustering, or text analysis—will find this very useful. The goal of this tutorial is to cover the basics thoroughly in plain English, preparing you to get real work done and to learn more on your own.

These tutorials are by no means a comprehensive introduction to Python, but a practical way to start using Python for data analysis. For more examples, check out this gallery of Python data analysis.

Goals of this lesson

This lesson covers basic objects in Python including strings, lists, and dictionaries. In subsequent lessons, you’ll learn how to manipulate objects for some serious analytical power.

In this lesson, you’ll learn about:

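  • Writing and running Python code
  • Basic Python objects, including lists and dictionaries
  • Changing values and combining objects
  • Comparing values with boolean objects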

What are Python notebooks?

Notebooks are a popular tool for performing data analysis in Python and other programming languages. They are useful because they let you write and run code and see the results inline, including visualizations, which many other ways of authoring and running Python do not offer. Each input block is referred to as a cell. As you can see below, the notebook’s cells make it easy to read through the steps in an analytical process:

Input

a = 6
b = 2
    

After assigning 6 to "a" and 2 to "b", we should be able to divide "a" by "b" and get 3 (Python 3 prints the result as 3.0, because the / operator always returns a decimal number):

Input

print(a / b)
    
Output

3.0
    

Jupyter is the most popular notebook tool, but in this tutorial, we’ll be using the notebooks built into Mode (which are similar in many ways) so that you won’t have to go through hours of setup.

Who uses Python notebooks?

People like atmospheric scientists, anthropologists, data scientists, marketers, microbiologists, econometricians, Python hobbyists, and data journalists use notebooks to develop and communicate their Python code, and even present notebooks alongside published studies. Check out these great examples.

Using Mode Python notebooks

Mode is an analytics platform that brings together a SQL editor, Python notebook, and data visualization builder. Throughout this tutorial, you can use Mode for free to practice writing and running Python code.

  1. Log into Mode or create an account.
  2. Navigate to this report and click Duplicate. This will take you to the SQL Query Editor.
  3. Click Python Notebook under Notebook in the left navigation panel. This will open a new notebook.

Now you’re all ready to go.

To run Python code in a notebook

  • Write code in an input "cell"
  • Run with keys [shift] + [return] or the "Run" button
  • Code runs and shows output in a new cell

It’s time to try it yourself: type a sentence in quotes and press [shift] + [return] (labeled [enter] on some keyboards):

Input

'this is a string'
    
Output

'this is a string'
    

The code runs and prints output below. This object is a string, or a text object. Strings can include any combination of letters, numbers, and special characters. You'll learn more about other data types later.

Output

Once the cell is done running, you will usually see output or an error below it (more on this in a minute). If you hit an error (which will happen), try to fix the code in that cell, then run the cell again.

Variables

So that you can reference this string later, give it a variable name, “first_string”. Note that the variable name in the example below is entirely lowercase, which is a convention in Python. We use underscores between words because a variable name cannot contain spaces:

Input

first_string = 'hello from mode'
    

You'll notice that this did not generate an output cell. That's because you didn't issue an explicit instruction to do so—all you did was assign the first_string variable. In order to see that the variable was stored, you can print the name. Get used to this—you'll use it often to confirm that the contents of variables match your expectations:

Input

print(first_string)
    
Output

hello from mode
    

You can also run the variable name to see its raw value as output:

Input

first_string
    
Output

'hello from mode'
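
The quotes in that output are the notebook's way of showing that the value is a string. Here is a minimal sketch of the difference, using Python's built-in repr() (not covered in this tutorial) to produce the same quoted form the notebook displays. The names printed_form and raw_form are made up for this example:

```python
first_string = 'hello from mode'

printed_form = str(first_string)   # what print() shows: the text itself
raw_form = repr(first_string)      # what a bare variable name shows: the text plus quotes

print(printed_form)
print(raw_form)
```

The first line printed has no quotes and the second one does, even though both come from the same variable.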
    

Finally, you can use either double quotes or single quotes to define a string:

Input

print("one bacon double cheeseburger")
    
Output

one bacon double cheeseburger
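
To illustrate the point about quotes, the sketch below defines the same text both ways and checks that Python treats the two as the same string. The variable names are made up for this example, and the == comparison is explained later in this lesson. One practical reason to have both styles: a string that contains an apostrophe is easiest to write inside double quotes.

```python
single_quoted = 'one bacon double cheeseburger'
double_quoted = "one bacon double cheeseburger"

# Both spellings produce the same string
same_string = (single_quoted == double_quoted)
print(same_string)

# Double quotes on the outside let you use an apostrophe inside
with_apostrophe = "it's a string"
print(with_apostrophe)
```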
    

How Python notebooks run

A Python notebook is more than just a list of all the things you did. As you can see, it's actually running code. When you store a variable, the notebook remembers that variable... for a limited time. Here's the simplified version:

When you open a notebook in Mode, there's a computer somewhere that reserves a space for you to work. When you run a cell, it uses that computer's processing power to complete the operation. When you assign a variable (as above), that variable gets stored in the computer's memory. Memory is where actively operating programs temporarily hold data, and where all of the Python objects you're creating are saved. As long as a variable is in the computer's memory, you'll be able to access it (like you did with that print statement above). When the memory is cleared, you'll lose the objects you've been working with in your Python code.

Saving the notebook

If you leave Mode for more than 15 minutes, your session will end. It's not a big deal; it just means that Mode will purge your assigned variables from memory. If you want to use them again, you'll have to run the notebook from the top, step by step. You can also hit the "Run Notebook" button to speed things up.

As you work, you may delete or edit code. Mode will automatically save your work each time you run a cell. During this process, it's important to make sure that your code stays clean and can be run from top to bottom. Keep this in mind when modifying or re-running cells. A good rule of thumb: If someone else copies the code, they should see the same results.

Basic Python objects

In Python, everything is an object. This means that for any code you run, the result is some type of object.

There are many types of objects in Python, but the types you'll learn about in this tutorial are data objects:

  1. List
  2. Dictionary
  3. Series (covered in the next lesson)
  4. DataFrame (covered in the next lesson)

Lists

A list is exactly what it sounds like. Lists can contain any kind of object, as long as the items are between square brackets and separated by commas. That's right: a list is both an object AND a container of other objects. This might be confusing at first, but you'll come to learn that it's very convenient to be able to treat one item in a list similarly to how you might treat the list itself.

In lists:

  • the order stays the same
  • you can get an item by referring to its position in the list, a number called the index
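
As a small illustration of a list acting as a container for other objects, the sketch below builds a list that mixes a string, a number, and another list. The names inner_list and mixed_list are made up for this example:

```python
inner_list = ['a', 'b']
mixed_list = ['Tokyo', 42, inner_list]   # a string, a number, and a list

# Items stay in the order you wrote them
print(mixed_list)
```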

To try it out, create a list named “cities” and give it a few string items:

Input

cities = ['Tokyo','Los Angeles','New York','San Francisco']
    

Variable names

You gave the list above a variable name of "cities". This means that "cities" is this specific list. You can refer back to it by name:

Input

print(cities)  # see the specific list object this variable refers to
    
Output

['Tokyo', 'Los Angeles', 'New York', 'San Francisco']
    

Again, it’s good to get in the habit of printing the variable name immediately after you create it to make sure that it matches your expectations. The above case is simple but as you start writing more complex functions, this can be tremendously helpful.

Accessing list items

To get an item in the list, you'll use the position in the list, the index, of the item inside square brackets. In Python, as in many languages, lists are zero-indexed, meaning the first item has an index of 0, the second item has an index of 1, and so forth.

Run this bit of code to get the second item in the list:

Input

cities[1]
    
Output

'Los Angeles'
    

Since the list is zero-indexed, 'Los Angeles' has an index of 1, even though it’s the second item in the list. That means using an index of 0 should output 'Tokyo'.

Input

cities[0]
    
Output

'Tokyo'
    

Practice Problem

Get the third city in the list cities.


You'll get used to zero-indexing after a while. If you're curious, here are a few explanations of why this choice was made.
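
One way to see zero-indexing in action is to print every index next to its item. This sketch assumes Python 3 and uses a for loop and the built-in enumerate() and len() functions, none of which have been covered yet, so treat it as a preview and not something you need to understand now. The name last_index is made up for this example:

```python
cities = ['Tokyo', 'Los Angeles', 'New York', 'San Francisco']

# enumerate() pairs each item with its index, starting from 0
for index, city in enumerate(cities):
    print(index, city)

# With 4 items in the list, the last index is 3, not 4
last_index = len(cities) - 1
print(cities[last_index])
```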

Dictionaries

Dictionaries are also what they sound like: a list of definitions that correspond to unique terms.

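  • Dictionaries are not indexed by position (and in Python versions before 3.7, they do not keep their items in a guaranteed order)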
  • Dictionary values are accessed by keys

Keys and values are to a dictionary what words and their definitions are to an English dictionary. Each entry in a dictionary is called a key-value pair. To create a dictionary, use curly braces:

Input

city_population = {
    'Tokyo': 13350000, # a key-value pair
    'Los Angeles': 18550000,
    'New York City': 8400000,
    'San Francisco': 1837442,
}
    

Note that numbers do not have quotation marks around them.

Input

city_population # a different order than we defined when we created it - ordered alphabetically by key
    
Output

{'Los Angeles': 18550000,
 'New York City': 8400000,
 'San Francisco': 1837442,
 'Tokyo': 13350000}
    

Say you want to know the population of New York. Similarly to a list, you can use brackets to access the value, but this time with the key (instead of the index):

Input

city_population['New York City']
    
Output

8400000
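
A dictionary lookup only works when the key matches exactly, and asking for a key that doesn't exist raises a KeyError. Here is a minimal sketch that checks for a key before using it. The in keyword is standard Python but is not otherwise covered in this lesson, and the has_ names are made up for this example:

```python
city_population = {
    'Tokyo': 13350000,
    'Los Angeles': 18550000,
    'New York City': 8400000,
    'San Francisco': 1837442,
}

# 'New York' is not the same key as 'New York City'
has_new_york = 'New York' in city_population
has_new_york_city = 'New York City' in city_population

print(has_new_york)        # False
print(has_new_york_city)   # True
```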
    

Practice Problem

Get the population of Tokyo.


Changing values and combining objects

Changing values

Say you wanted to change the population numbers for San Francisco, because it turns out 837,442 people live there, not 1,837,442 (Maybe San Francisco actually belongs in a different dictionary called "villages").

You can name the key of the item you want to change, using = to set the new population value:

Input

city_population['San Francisco'] = 837442
    
Input

city_population # change made!
    
Output

{'Los Angeles': 18550000,
 'New York City': 8400000,
 'San Francisco': 837442,
 'Tokyo': 13350000}
    

In dictionaries, keys are unique. This means that there can only be one "San Francisco" in the city_population dictionary, just as there can only be one Supercalifragilisticexpialidocious entry in the English dictionary.

Let's add another big city to the dictionary. To do this, you'll need to assign a value to a key that doesn't yet exist:

Input

city_population['Mumbai'] = 11980000
    
Input

city_population
    
Output

{'Los Angeles': 18550000,
 'Mumbai': 11980000,
 'New York City': 8400000,
 'San Francisco': 837442,
 'Tokyo': 13350000}
    

Bingo, Mumbai is in the mix.

You can see that “=” will assign a given value to a specific key, regardless of whether the key already exists in the dictionary. If it already exists, the new value will overwrite the previous one.
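
The two cases described above can be sketched side by side. This example rebuilds a small version of the dictionary so that it runs on its own. It uses len(), a built-in function not otherwise covered in this lesson, to count the key-value pairs, and the count_ names are made up for this example:

```python
city_population = {'Tokyo': 13350000, 'San Francisco': 1837442}

# Existing key: the value is overwritten and the number of entries stays the same
city_population['San Francisco'] = 837442
count_after_update = len(city_population)

# New key: a new key-value pair is added
city_population['Mumbai'] = 11980000
count_after_insert = len(city_population)

print(count_after_update, count_after_insert)
```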

Combining objects

Python makes it easy to combine objects, for example by creating a dictionary of lists. If you wanted to list the municipalities of two cities, you could create this dictionary:

  • municipalities dictionary (inside {}):
  • key: string of city name (inside '')
  • value: list of municipalities (inside [])
  • list items: strings, each a single municipality (inside '')

Copy the following cell into your notebook and run it, so that you can explore the dictionary yourself:

Input

municipalities = {
    'New York City': [
        'Manhattan',
        'The Bronx',
        'Brooklyn',
        'Queens',
        'Staten Island'
    ],
    'Tokyo': [
        'Akihabara',
        'Harajuku',
        'Shimokitazawa',
        'Nakameguro',
        'Shibuya',
        'Ebisu/Daikanyama',
        'Shibuya District',
        'Aoyama',
        'Asakusa/Ueno',
        'Bunkyo District',
        'Ginza',
        'Ikebukuro',
        'Koto District',
        'Meguro District',
        'Minato District',
        'Roppongi',
        'Shinagawa District',
        'Shinjuku',
        'Shinjuku District',
        'Sumida District',
        'Tsukiji',
        'Tsukishima']
}
    

To better understand how this structure works, retrieve the municipalities in Tokyo:

Input

municipalities['Tokyo']
    
Output

['Akihabara',
 'Harajuku',
 'Shimokitazawa',
 'Nakameguro',
 'Shibuya',
 'Ebisu/Daikanyama',
 'Shibuya District',
 'Aoyama',
 'Asakusa/Ueno',
 'Bunkyo District',
 'Ginza',
 'Ikebukuro',
 'Koto District',
 'Meguro District',
 'Minato District',
 'Roppongi',
 'Shinagawa District',
 'Shinjuku',
 'Shinjuku District',
 'Sumida District',
 'Tsukiji',
 'Tsukishima']
    

So to get a single municipality, you would enter the dictionary variable name, then the key, then the list index. If you wanted to show Tokyo’s Nakameguro municipality (which is the fourth value in the list), you would write the following (remember that Python lists are zero-indexed):

Input

municipalities['Tokyo'][3]
    
Output

'Nakameguro'
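
The chained brackets are read left to right: the first pair looks up a key in the dictionary and returns a list, and the second pair picks an item from that list. The sketch below splits the lookup into two steps. It uses a shortened version of the dictionary, and the names tokyo_list, fourth_item, and same_result are made up for this example:

```python
municipalities = {
    'Tokyo': ['Akihabara', 'Harajuku', 'Shimokitazawa', 'Nakameguro'],
}

tokyo_list = municipalities['Tokyo']   # step 1: the key lookup returns a list
fourth_item = tokyo_list[3]            # step 2: index 3 is the fourth item

# The one-line form gives the same result
same_result = (fourth_item == municipalities['Tokyo'][3])
print(fourth_item, same_result)
```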
    

Practice Problem

What is the fourth listed municipality (okay, borough) in New York City?


Boolean objects

Just as some values are stored as the string data type, some values can be stored as the boolean data type. More specifically, booleans have two possible values: True or False. This will come in handy when filtering data in later lessons.

type() can be used to check the data type (string, boolean, etc.—you will learn more of these later) of an object. Let's check the object type of the value True:

Input

type(True)
    
Output

bool
    

As a point of comparison, you can see that the set of municipalities of Tokyo in the above example is stored as a list:

Input

type(municipalities['Tokyo'])
    
Output

list
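
The same check works for the other kinds of objects used in this lesson. Here is a quick sketch, with variable names made up for this example. Note that print() in Python 3 shows a type as <class 'bool'>, while a notebook shows just bool when type() is the last line of a cell:

```python
string_type = type('hello from mode')           # a string
number_type = type(13350000)                    # a whole number (integer)
list_type = type(['Tokyo', 'New York'])         # a list
dictionary_type = type({'Tokyo': 13350000})     # a dictionary
boolean_type = type(True)                       # a boolean

print(string_type, number_type, list_type, dictionary_type, boolean_type)
```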
    

Comparing values using booleans

Until now, you've been using a single equal sign to set values. To compare values, Python uses a double equals sign: "==" checks whether the values to its right and left are equal to one another.

For example, is the population of Tokyo equal to 13,350,000? You can use this statement to evaluate the equality (resulting in a boolean):

Input

city_population['Tokyo'] == 13350000
    
Output

True
    
Input

city_population['Tokyo'] == 50000
    
Output

False
    

Two equal signs will test for equality, but one equal sign will assign a value. If you removed one of the equal signs, you would change the value of Tokyo's population in the dictionary (just as you did above with San Francisco)!

Just as you can compare values for equality, you can check that they are not equal to one another with "!=":

Input

city_population['Tokyo'] != 50000
    
Output

True
    

The "not equal to" operator is equivalent to "<>" in SQL or Excel.
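
Because a comparison produces a boolean object, its result can be stored in a variable like any other value. Here is a minimal sketch using a one-entry version of the dictionary. The names matches_expected and differs_from_guess are made up for this example:

```python
city_population = {'Tokyo': 13350000}

# One equal sign assigns; two equal signs compare
matches_expected = (city_population['Tokyo'] == 13350000)
differs_from_guess = (city_population['Tokyo'] != 50000)

print(matches_expected)          # True
print(differs_from_guess)        # True
print(type(matches_expected))    # the stored result is a boolean
```

The parentheses around each comparison are optional. They are there only to make it easier to tell the assignment apart from the comparison.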

Python code execution and objects

Let's break down one of the examples in the boolean section above to properly understand how Python works.

Input

type(municipalities['Tokyo'])
    
Output

list
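
Roughly speaking, Python evaluates this line from the inside out: it finds the object that the variable municipalities refers to, retrieves the value stored under the key 'Tokyo', and only then hands that value to type(). The sketch below writes those steps out one at a time, with a shortened dictionary and made-up intermediate names (step_1, step_2, step_3):

```python
municipalities = {'Tokyo': ['Akihabara', 'Harajuku', 'Shimokitazawa']}

step_1 = municipalities     # the variable name refers to the dictionary
step_2 = step_1['Tokyo']    # the key lookup returns the list stored under 'Tokyo'
step_3 = type(step_2)       # type() receives that list and reports its type

# The step-by-step result matches the one-line version
print(step_3 == type(municipalities['Tokyo']))
```

Each step produces an object that the next step works on, which is why the result of the whole line is the type of the list and not the type of the dictionary.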
    

Lesson summary

In this lesson, you learned about:

  • Writing and running Python code
  • Basic Python objects, including lists and dictionaries
  • Changing values and combining objects
  • Comparing values with boolean objects

In the next lesson you'll learn about Python methods and functions, and how to import libraries.

Next Lesson

Python Methods, Functions, & Libraries
