Javascript required
Skip to content Skip to sidebar Skip to footer

Write a Program That Reads a Wordlist and Finds All the Rotate Pairs.

Purchase this book at Amazon.com

Chapter 11  Dictionaries

This chapter presents some other built-in type called a dictionary.
Dictionaries are 1 of Python's best features; they are the
edifice blocks of many efficient and elegant algorithms.

xi.1  A dictionary is a mapping







A dictionary is like a listing, but more than general. In a list,
the indices have to be integers; in a dictionary they can
be (almost) whatsoever type.

A dictionary contains a drove of indices, which are called keys, and a drove of values. Each key is associated with a
single value. The association of a key and a value is called a cardinal-value pair or sometimes an item.

In mathematical language, a dictionary represents a mapping
from keys to values, so you can also say that each key
"maps to" a value.
Equally an example, we'll build a lexicon that maps from English
to Spanish words, and then the keys and the values are all strings.

The role dict creates a new dictionary with no items.
Because dict is the proper name of a built-in office, you
should avoid using information technology as a variable proper name.

>>> eng2sp = dict() >>> eng2sp {}        

The squiggly-brackets, {}, represent an empty lexicon.
To add items to the dictionary, you can use foursquare brackets:

>>> eng2sp['one'] = 'uno'        

This line creates an item that maps from the key
'one' to the value 'uno'. If nosotros impress the
dictionary again, we run into a key-value pair with a colon
between the central and value:

>>> eng2sp {'one': 'uno'}        

This output format is also an input format. For case,
you can create a new dictionary with 3 items:

>>> eng2sp = {'one': 'uno', '2': 'dos', '3': 'tres'}        

But if y'all print eng2sp, you might be surprised:

>>> eng2sp {'one': 'uno', 'iii': 'tres', 'two': 'dos'}        

The lodge of the cardinal-value pairs might non exist the same. If
yous blazon the aforementioned case on your computer, y'all might get a
dissimilar result. In general, the order of items in
a dictionary is unpredictable.

Merely that'southward not a problem considering
the elements of a lexicon are never indexed with integer indices.
Instead, you utilise the keys to await upwardly the corresponding values:

>>> eng2sp['2'] 'dos'        

The primal '2' always maps to the value 'dos' so the order
of the items doesn't matter.

If the primal isn't in the dictionary, y'all get an exception:

>>> eng2sp['4'] KeyError: '4'        

The len function works on dictionaries; information technology returns the
number of fundamental-value pairs:

>>> len(eng2sp) 3        

The in operator works on dictionaries, too; it tells you whether
something appears as a central in the dictionary (appearing
as a value is not good enough).


>>> 'one' in eng2sp True >>> 'uno' in eng2sp Simulated        

To come across whether something appears as a value in a lexicon, you
can use the method values, which returns a collection of
values, and then use the in operator:

>>> vals = eng2sp.values() >>> 'uno' in vals True        

The in operator uses different algorithms for lists and
dictionaries. For lists, it searches the elements of the list in
order, as in Section 8.vi. As the list gets longer, the search
time gets longer in direct proportion.

Python dictionaries use a data construction
called a hashtable that has a remarkable property: the
in operator takes about the same corporeality of time no matter how
many items are in the lexicon. I explain how that's possible
in Section B.4, but the explanation might not make
sense until you lot've read a few more chapters.

eleven.2  Lexicon as a collection of counters

Suppose y'all are given a string and yous desire to count how many
times each letter of the alphabet appears. There are several ways you could do it:

  1. You could create 26 variables, one for each letter of the
    alphabet. And so you could traverse the string and, for each
    graphic symbol, increment the corresponding counter, probably using
    a chained conditional.
  2. You could create a list with 26 elements. Then you could
    convert each character to a number (using the born function
    ord), use the number as an index into the list, and increase
    the appropriate counter.
  3. You could create a lexicon with characters equally keys
    and counters as the corresponding values. The first time you
    see a character, you would add together an item to the dictionary. Later
    that you would increase the value of an existing item.

Each of these options performs the same computation, just each
of them implements that computation in a different fashion.

An implementation is a way of performing a computation;
some implementations are better than others. For example,
an advantage of the dictionary implementation is that we don't
accept to know ahead of time which letters announced in the string
and we only have to brand room for the messages that exercise appear.

Here is what the code might look like:

def histogram(due south):     d = dict()     for c in s:         if c not in d:             d[c] = 1         else:             d[c] += 1     return d        

The proper name of the function is histogram, which is a statistical
term for a collection of counters (or frequencies).


The first line of the
function creates an empty dictionary. The for loop traverses
the string. Each time through the loop, if the grapheme c is
not in the dictionary, we create a new item with key c and the
initial value one (since we have seen this letter once). If c is
already in the dictionary nosotros increment d[c].

Hither's how information technology works:

>>> h = histogram('brontosaurus') >>> h {'a': i, 'b': 1, 'o': ii, 'n': 1, 'southward': ii, 'r': ii, 'u': ii, 't': one}        

The histogram indicates that the letters 'a' and 'b'
appear once; 'o' appears twice, and so on.



Dictionaries take a method called get that takes a key
and a default value. If the key appears in the lexicon,
get returns the corresponding value; otherwise it returns
the default value. For case:

>>> h = histogram('a') >>> h {'a': 1} >>> h.get('a', 0) 1 >>> h.become('c', 0) 0        

As an exercise, utilise go to write histogram more concisely. You
should be able to eliminate the if statement.

11.3  Looping and dictionaries

If you use a dictionary in a for argument, it traverses
the keys of the dictionary. For example, print_hist
prints each central and the corresponding value:

def print_hist(h):     for c in h:         print(c, h[c])        

Here's what the output looks like:

>>> h = histogram('parrot') >>> print_hist(h) a ane p 1 r 2 t one o one        

Once more, the keys are in no particular order. To traverse the keys
in sorted gild, you can employ the built-in function sorted:

>>> for key in sorted(h): ...     print(key, h[key]) a 1 o 1 p i r ii t 1        

11.four  Reverse lookup

Given a dictionary d and a key k, it is easy to
observe the corresponding value v = d[k]. This operation
is called a lookup.

Just what if y'all accept 5 and y'all want to discover one thousand?
You have ii issues: first, there might be more 1
key that maps to the value v. Depending on the application,
y'all might exist able to choice one, or you might have to make
a list that contains all of them. 2d, there is no
simple syntax to do a reverse lookup; y'all have to search.

Here is a office that takes a value and returns the first
primal that maps to that value:

def reverse_lookup(d, v):     for k in d:         if d[k] == five:             return k     raise LookupError()        

This part is yet some other example of the search pattern, merely it
uses a characteristic we haven't seen before, raise. The
enhance statement causes an exception; in this case it causes a
LookupError, which is a built-in exception used to indicate
that a lookup operation failed.


If nosotros get to the end of the loop, that means v
doesn't appear in the dictionary as a value, so we heighten an
exception.

Hither is an case of a successful reverse lookup:

>>> h = histogram('parrot') >>> central = reverse_lookup(h, ii) >>> primal 'r'        

And an unsuccessful one:

>>> key = reverse_lookup(h, 3) Traceback (most recent call final):   File "<stdin>", line 1, in <module>   File "<stdin>", line 5, in reverse_lookup LookupError        

The effect when yous heighten an exception is the same equally when
Python raises one: it prints a traceback and an mistake message.


When you raise an exception, y'all can provide a detailed error message equally an optional argument. For example:

>>> raise LookupError('value does non appear in the dictionary') Traceback (nearly contempo phone call concluding):   File "<stdin>", line one, in ? LookupError: value does not appear in the dictionary        

A reverse lookup is much slower than a forrard lookup; if you
accept to exercise it often, or if the lexicon gets large, the performance
of your plan volition endure.

xi.v  Dictionaries and lists

Lists can appear as values in a dictionary. For example, if yous
are given a dictionary that maps from messages to frequencies, you
might want to capsize it; that is, create a dictionary that maps
from frequencies to letters. Since there might be several messages
with the same frequency, each value in the inverted dictionary
should be a listing of messages.

Hither is a function that inverts a dictionary:

def invert_dict(d):     inverse = dict()     for key in d:         val = d[fundamental]         if val not in inverse:             inverse[val] = [key]         else:             inverse[val].suspend(central)     return inverse        

Each fourth dimension through the loop, key gets a central from d and
val gets the corresponding value. If val is not in inverse, that ways we haven't seen it before, so we create a new
particular and initialize information technology with a singleton (a list that contains a
unmarried element). Otherwise we accept seen this value before, then nosotros
append the respective fundamental to the list.

Here is an example:

>>> hist = histogram('parrot') >>> hist {'a': 1, 'p': 1, 'r': ii, 't': ane, 'o': 1} >>> inverse = invert_dict(hist) >>> changed {1: ['a', 'p', 't', 'o'], 2: ['r']}        

image

Effigy eleven.1: State diagram.

Figure 11.1 is a country diagram showing hist and inverse.
A lexicon is represented as a box with the blazon dict above information technology
and the fundamental-value pairs inside. If the values are integers, floats or
strings, I draw them inside the box, merely I usually draw lists
exterior the box, merely to keep the diagram simple.

Lists can be values in a dictionary, every bit this case shows, but they
cannot be keys. Here'due south what happens if y'all try:

>>> t = [one, ii, 3] >>> d = dict() >>> d[t] = 'oops' Traceback (most recent call last):   File "<stdin>", line 1, in ? TypeError: listing objects are unhashable        

I mentioned earlier that a dictionary is implemented using
a hashtable and that means that the keys have to be hashable.

A hash is a function that takes a value (of any kind)
and returns an integer. Dictionaries use these integers,
called hash values, to shop and look up key-value pairs.

This system works fine if the keys are immutable. Simply if the
keys are mutable, like lists, bad things happen. For example,
when you lot create a fundamental-value pair, Python hashes the key and
stores it in the corresponding location. If you lot modify the
key and and so hash information technology once more, it would get to a different location.
In that case yous might take two entries for the same primal,
or you might non be able to detect a key. Either fashion, the
dictionary wouldn't work correctly.

That'due south why keys have to exist hashable, and why mutable types similar
lists aren't. The simplest manner to get effectually this limitation is to
use tuples, which we will see in the adjacent chapter.

Since dictionaries are mutable, they can't be used as keys,
just they tin can be used equally values.

11.vi  Memos

If you played with the fibonacci part from
Section 6.7, you might take noticed that the bigger
the argument you provide, the longer the role takes to run.
Furthermore, the run fourth dimension increases quickly.

To empathise why, consider Figure 11.2, which shows
the call graph for fibonacci with n=four:

image

A call graph shows a set of function frames, with lines connecting each
frame to the frames of the functions it calls. At the acme of the
graph, fibonacci with n=iv calls fibonacci with north=3 and n=ii. In plow, fibonacci with n=iii calls
fibonacci with n=2 and n=ane. And then on.


Count how many times fibonacci(0) and fibonacci(1) are
called. This is an inefficient solution to the problem, and it gets
worse as the statement gets bigger.

One solution is to keep track of values that have already been
computed past storing them in a lexicon. A previously computed value
that is stored for later use is chosen a memo. Here is a
"memoized" version of fibonacci:

known = {0:0, i:i}  def fibonacci(due north):     if n in known:         return known[n]      res = fibonacci(n-1) + fibonacci(north-2)     known[due north] = res     return res        

known is a dictionary that keeps track of the Fibonacci
numbers nosotros already know. Information technology starts with
2 items: 0 maps to 0 and 1 maps to i.

Whenever fibonacci is called, it checks known.
If the result is already there, information technology can render
immediately. Otherwise it has to
compute the new value, add it to the dictionary, and return it.

If you run this version of fibonacci and compare it with
the original, yous volition observe that information technology is much faster.

11.7  Global variables

In the previous example, known is created outside the part,
so it belongs to the special frame called __main__.
Variables in __main__ are sometimes called global
considering they tin exist accessed from whatsoever office. Unlike local
variables, which disappear when their office ends, global variables
persist from one function telephone call to the next.

It is mutual to apply global variables for flags; that is,
boolean variables that indicate ("flag") whether a condition
is true. For example, some programs use
a flag named verbose to control the level of detail in the
output:

verbose = True  def example1():     if verbose:         impress('Running example1')        

If you attempt to reassign a global variable, you might be surprised.
The post-obit case is supposed to keep rail of whether the
part has been chosen:

been_called = Faux  def example2():     been_called = Truthful         # WRONG        

Only if you run it you volition encounter that the value of been_called
doesn't change. The problem is that example2 creates a new local
variable named been_called. The local variable goes away when
the function ends, and has no upshot on the global variable.


To reassign a global variable inside a office you accept to
declare the global variable before y'all utilise it:

been_called = Imitation  def example2():     global been_called      been_called = True        

The global argument tells the interpreter
something like, "In this function, when I say been_called, I
mean the global variable; don't create a local i."

Hither'south an example that tries to update a global variable:

count = 0  def example3():     count = count + i          # Wrong        

If you run information technology y'all go:

UnboundLocalError: local variable 'count' referenced before consignment        

Python assumes that count is local, and nether that assumption
you are reading it before writing it. The solution, again,
is to declare count global.

def example3():     global count     count += 1        

If a global variable refers to a mutable value, you tin change
the value without declaring the variable:

known = {0:0, 1:i}  def example4():     known[2] = 1        

So you lot tin can add, remove and replace elements of a global list or
lexicon, but if you want to reassign the variable, y'all
accept to declare it:

def example5():     global known     known = dict()        

Global variables can be useful, only if you have a lot of them,
and you lot modify them frequently, they tin can brand programs
hard to debug.

11.viii  Debugging

As you piece of work with bigger datasets information technology can get unwieldy to
debug by printing and checking the output by hand. Here are some
suggestions for debugging large datasets:

Calibration down the input:
If possible, reduce the size of the
dataset. For example if the program reads a text file, start with
just the starting time 10 lines, or with the smallest example you can observe.
You can either edit the files themselves, or (improve) modify the
plan and then it reads only the first n lines.

If in that location is an error, you can reduce n to the smallest
value that manifests the fault, and and so increment information technology gradually
as you find and right errors.

Check summaries and types:
Instead of printing and checking the
unabridged dataset, consider press summaries of the data: for example,
the number of items in a lexicon or the total of a list of numbers.

A common cause of runtime errors is a value that is non the right
blazon. For debugging this kind of error, it is often plenty to print
the type of a value.

Write self-checks:
Sometimes you tin can write code to check
for errors automatically. For case, if you lot are computing the
average of a list of numbers, you could check that the result is
non greater than the largest chemical element in the list or less than
the smallest. This is called a "sanity bank check" considering it detects
results that are "insane".

Another kind of check compares the results of two dissimilar
computations to meet if they are consequent. This is called a
"consistency check".

Format the output:
Formatting debugging output
can make it easier to spot an error. We saw an example in
Section half-dozen.9. Another tool you might find useful is the pprint module, which provides
a pprint office that displays congenital-in types in
a more human-readable format (pprint stands for
"pretty print").


Again, time you spend edifice scaffolding can reduce
the time y'all spend debugging.

11.nine  Glossary

mapping:
A human relationship in which each chemical element of one set
corresponds to an element of another set.
dictionary:
A mapping from keys to their
corresponding values.
primal-value pair:
The representation of the mapping from
a key to a value.
item:
In a lexicon, another proper noun for a fundamental-value
pair.
key:
An object that appears in a dictionary as the
offset role of a key-value pair.
value:
An object that appears in a dictionary every bit the
second function of a cardinal-value pair. This is more specific than
our previous use of the word "value".
implementation:
A mode of performing a computation.
hashtable:
The algorithm used to implement Python
dictionaries.
hash office:
A part used past a hashtable to compute the
location for a primal.
hashable:
A type that has a hash function. Immutable
types like integers,
floats and strings are hashable; mutable types like lists and
dictionaries are not.
lookup:
A lexicon performance that takes a key and finds
the corresponding value.
reverse lookup:
A dictionary operation that takes a value and finds
1 or more keys that map to information technology.
raise statement:
A argument that (deliberately) raises an exception.

singleton:
A list (or other sequence) with a unmarried chemical element.
telephone call graph:
A diagram that shows every frame created during
the execution of a plan, with an arrow from each caller to
each callee.

memo:
A computed value stored to avoid unnecessary future
computation.
global variable:
A variable defined outside a function. Global
variables can be accessed from any role.
global statement:
A statement that declares a variable proper noun
global.

flag:
A boolean variable used to indicate whether a condition
is true.
annunciation:
A statement like global that tells the
interpreter something about a variable.

11.10  Exercises

Exercise 1


Write a part that reads the words in words.txt and
stores them equally keys in a dictionary. It doesn't thing what the
values are. Then you tin can employ the in operator
equally a fast fashion to check whether a string is in
the dictionary.

If you did Do 10 , yous can compare the speed
of this implementation with the list in operator and the
bisection search.

Exercise 3
Memoize the Ackermann function from Practice
ii and see if
memoization makes it possible to evaluate the role with bigger
arguments. Hint: no.
Solution:
http://thinkpython2.com/code/ackermann_memo.py .

Exercise four

If you did Exercise 7 , y'all already have
a function named has_duplicates that takes a listing
every bit a parameter and returns True if in that location is any object
that appears more than once in the list.

Use a dictionary to write a faster, simpler version of
has_duplicates.
Solution:
http://thinkpython2.com/code/has_duplicates.py .

Exercise v


Two words are "rotate pairs" if you lot can rotate ane of them
and get the other (see rotate_word in Exercise
5 ).

Write a program that reads a wordlist and finds all the rotate
pairs. Solution:
http://thinkpython2.com/lawmaking/rotate_pairs.py .

Exercise 6

Here'due south another Puzzler from Automobile Talk
(
http://www.cartalk.com/content/puzzlers ):


This was sent in by a fellow named Dan O'Leary. He came upon a common
one-syllable, 5-alphabetic character word recently that has the following unique
property. When you remove the first letter, the remaining letters course
a homophone of the original give-and-take, that is a word that sounds exactly
the same. Replace the first letter, that is, put it back and remove
the second letter of the alphabet and the result is yet another homophone of the
original word. And the question is, what'south the word?

Now I'm going to give y'all an example that doesn't work. Allow's look at
the five-letter word, 'wrack.' W-R-A-C-Yard, you lot know like to 'wrack with
pain.' If I remove the first letter, I am left with a four-letter
discussion, 'R-A-C-K.' Equally in, 'Holy moo-cow, did you lot come across the rack on that cadet!
It must accept been a 9-pointer!' It's a perfect homophone. If you lot
put the 'westward' back, and remove the 'r,' instead, yous're left with the
give-and-take, 'wack,' which is a real word, it'due south just not a homophone of the
other 2 words.

But at that place is, however, at least one word that Dan and nosotros know of,
which will yield two homophones if you remove either of the offset ii
letters to make two, new iv-letter words. The question is, what's
the word?

You can use the dictionary from Practice one to check
whether a string is in the word listing.

To check whether ii words are homophones, yous can use the CMU
Pronouncing Lexicon. You can download it from
http://www.speech communication.cs.cmu.edu/cgi-bin/cmudict or from
http://thinkpython2.com/code/c06d and you can also download
http://thinkpython2.com/lawmaking/pronounce.py , which provides a function
named read_dictionary that reads the pronouncing lexicon and
returns a Python dictionary that maps from each word to a string that
describes its chief pronunciation.

Write a program that lists all the words that solve the Puzzler.
Solution:
http://thinkpython2.com/code/homophone.py .

Buy this book at Amazon.com

gentileconstainey.blogspot.com

Source: https://uhlibraries.pressbooks.pub/buildingskillsfordatascience/chapter/dictionaries/