Python3 Notes
This is just a collection of notes I’ve made over a period of time to remind me of certain commands or syntax. I will continue adding to this over time. I’m also going to add my Natural Language Processing notes and Machine Learning Notes in a couple of other articles.
Package / Dependency Management
pip
The Basics
installing a particular version: pip3 install requests==6.3.26
installing the latest minor version of my current major version: pip3 install "requests>=6,<7" (quote the specifier, otherwise the shell treats > and < as redirections)
install any 6.3.X version >= 6.3.22: pip3 install requests~=6.3.22
Installing From a Repository
installing from a git repository: pip install git+https://github.com/user/repo.git@branch
for master: @master
for a commit hash: @3cdf42b
for a tag: @1.1.1
Checking for New Versions and Upgrading
Checking for new versions: pip3 list --outdated
Upgrading a specific package: pip3 install --upgrade requests
Sharing Packages
PyPI (the Python Package Index) is a public package repository where you can make your packages available; accounts are free.
pypi.org (the older address pypi.python.org now redirects there)
VirtualEnv
To create a virtual environment in a subfolder called venv: python3 -m venv ./venv
To activate a virtual environment in a subfolder called venv: source ./venv/bin/activate
To deactivate the virtual environment and return to the default environment: deactivate
If you want to use IDLE while you are using virtual environments, it’s better to launch it like this: python -m idlelib.idle
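Once a venv is activated, a quick way to confirm from inside Python that you really are in it is to compare sys.prefix with sys.base_prefix, which only differ inside a virtual environment. A minimal sketch (the helper name in_virtualenv is just for illustration):

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("running interpreter:", sys.executable)
```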
pipenv
This is actually the preferred method for Python 3.6 onward for creating your virtual environment and doing package management.
You can install pipenv by running the following command: pip install --user pipenv
You should then add pipenv to your path in the .bash_profile file and run: source ~/.bash_profile
Instead of using activate, run this from the project folder: pipenv shell
Instead of deactivate use: exit
When creating your environment, if you have multiple versions of Python installed, you can run pipenv --python 3.9 to create the environment with Python 3.9.
Requirements.txt
This is really useful for packaging your project because it allows for automatically installing all the necessary dependencies.
You can capture the list from pip using: pip3 freeze > requirements.txt
Likewise you can defrost by using: pip3 install -r requirements.txt
To split up dev and prod dependencies you can have two separate files, for example requirements-dev.txt and requirements.txt, and simply include the second in the first by adding this line to requirements-dev.txt: -r requirements.txt
Then you just call it as before, for dev like this: pip3 install -r requirements-dev.txt
and for prod like this: pip3 install -r requirements.txt
Core Language Syntax
Notes About Strings
Substrings in Python can be taken with slicing, like this: 'pippo'[1:4] == 'ipp'
You can also use negative values, for example: 'pippo'[-4:-1] == 'ipp'
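A few more slicing forms, as a small sketch using the same 'pippo' string (the variable name word is just for illustration):

```python
word = "pippo"

# [start:stop] includes start and excludes stop
assert word[1:4] == "ipp"
# negative indices count from the end: here -4 is index 1 and -1 is index 4
assert word[-4:-1] == "ipp"
# omitting a bound means "from the start" or "to the end"
assert word[:2] == "pi"
assert word[2:] == "ppo"
# an optional third value is the step; a step of -1 reverses the string
assert word[::-1] == "oppip"

print(word[1:4], word[-4:-1], word[::-1])  # ipp ipp oppip
```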
Formatting strings previously could be done like this: "%s %s" %("Hello", "World")
However, it is now possible to use this syntax, which is perhaps preferable: "{0} {1}".format("Hello", "World")
This formatting can be very rich indeed. There is, however, another alternative called f-strings (Python 3.6 onward), which is an extremely interesting and elegant solution. With these you can include the values of variables directly in the string, a bit like you would do in Ant if you have ever used that. Note that the values are read when the f-string is evaluated, so the variable must already exist. For example, if we first set myvar = "that displays the content"
and then define example = f"this is an example {myvar} of myvar"
then the command print(example)
would print: this is an example that displays the content of myvar
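The three formatting styles side by side, as a small sketch (the variable names old, newer and price are just for illustration):

```python
myvar = "that displays the content"

# old-style %-formatting
old = "%s %s" % ("Hello", "World")
# str.format with positional indexes
newer = "{0} {1}".format("Hello", "World")
# f-string: the expression in braces is evaluated when this line runs,
# so myvar must already be defined
example = f"this is an example {myvar} of myvar"

print(old)      # Hello World
print(newer)    # Hello World
print(example)  # this is an example that displays the content of myvar

# format specs work in both .format() and f-strings
price = 3.14159
print(f"{price:.2f}")            # 3.14
print("{0:.2f}".format(price))   # 3.14
```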
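Adding a * before a list (or any other iterable) in a function call does what is called unpacking: the elements are passed as separate arguments, so printing a list of strings this way prints them separated by spaces, without the square brackets.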
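A short sketch of * unpacking with print and with an ordinary function (names and greet are hypothetical, just for illustration):

```python
names = ["alpha", "beta", "gamma"]

print(names)             # ['alpha', 'beta', 'gamma']
print(*names)            # alpha beta gamma
print(*names, sep=", ")  # alpha, beta, gamma


# the same * unpacking works for any function call
def greet(first, second, third):
    return f"{first}-{second}-{third}"


print(greet(*names))     # alpha-beta-gamma
```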
Notes About Typing of Variables
Annotations can be put on parameters, return values and variables to describe their types.
tuples are annotated like this: Tuple[str, int]
lists are annotated like this: List[str]
dicts are annotated like this: Dict[str, int]
From Python 3.9 onward you can also use the built-in types directly: tuple[str, int], list[str], dict[str, int]
Shorthands such as (str, int), [str] or {str: int} are legal Python expressions, but type checkers such as mypy do not accept them as types.
You can also use these in comments after variable declarations, like this: # type: Dict[str, int]
You may need to import the classes Any, Union, Callable, TypeVar, Generic, Dict, List and Tuple from the typing module.
The Optional return type is a useful one to remember because it allows you to also return None: Optional[int] means either an int or None.
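A small sketch putting these annotations together (the functions parse_pair and find_age are hypothetical). Annotations are not enforced at runtime; they are checked by tools such as mypy:

```python
from typing import Dict, List, Optional, Tuple


def parse_pair(text: str) -> Tuple[str, int]:
    # "bob=27" -> ("bob", 27)
    name, value = text.split("=")
    return name, int(value)


def find_age(ages: Dict[str, int], name: str) -> Optional[int]:
    # Optional[int]: returns an int, or None when the name is missing
    return ages.get(name)


names: List[str] = ["ann", "bob"]
ages = {"ann": 31}  # type: Dict[str, int]

print(parse_pair("bob=27"))   # ('bob', 27)
print(find_age(ages, "ann"))  # 31
print(find_age(ages, "bob"))  # None
```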
Variables that belong to each instance of a class (rather than being shared, like statics in Java) should be assigned in the constructor (__init__) using self.varname, and not in the body of the class: variables assigned in the class body are class attributes, shared by every instance.
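A minimal sketch of the difference (the Basket class is hypothetical). The shared class attribute is most surprising when it is mutable, such as a list:

```python
class Basket:
    items = []  # class attribute: one list shared by every instance

    def __init__(self):
        self.own_items = []  # instance attribute: a new list per instance


a = Basket()
b = Basket()
a.items.append("apple")
a.own_items.append("pear")

print(b.items)      # ['apple']  (shared with a)
print(b.own_items)  # []         (b has its own list)
```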
Notes About Decorators
Apart from the fact that you should never trust them, just like you should never trust plumbers :D
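A minimal decorator sketch, which also shows why a little distrust is healthy: the wrapper runs extra code around every call, and without functools.wraps the decorated function would report the wrapper's name and docstring instead of its own. The names log_calls and add are hypothetical:

```python
import functools


def log_calls(func):
    # functools.wraps copies __name__, __doc__ etc. from func onto wrapper
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result
    return wrapper


@log_calls  # equivalent to: add = log_calls(add)
def add(x, y):
    """Add two numbers."""
    return x + y


add(2, 3)
print(add.__name__, "-", add.__doc__)  # add - Add two numbers.
```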
asyncio and Green Threads
This introduces async and await, which let a coroutine hand control back to the event loop while it is waiting on, for example, the network, so that other coroutines can run on the same thread in the meantime.
This also introduces the concept of futures to Python.
This is an alternative to using callbacks.
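A small sketch of async/await and futures (fetch, wait_on_future and main are hypothetical; asyncio.sleep stands in for real network I/O). Because the coroutines wait concurrently on one thread, the whole run takes about 0.2s rather than the 0.35s the delays add up to:

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # while this coroutine awaits, the event loop is free to run the others
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"


async def wait_on_future() -> str:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # instead of passing a callback around, something sets the result later...
    loop.call_later(0.05, fut.set_result, "ready")
    # ...and we simply await the future
    return await fut


async def main() -> list:
    # gather runs the coroutines concurrently and returns their results
    # in the order they were passed in
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1), wait_on_future())


results = asyncio.run(main())
print(results)  # ['a done after 0.2s', 'b done after 0.1s', 'ready']
```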
Notes about logging, sysout and print
Matt Harrison’s suggestion from one of his courses on Python was to use sys.stdout.write() instead of print for explicitly writing to the screen; for debug logging, however, he suggested using the logging module.
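A minimal sketch of that split, assuming a plain script where nothing else has configured logging yet (basicConfig does nothing if the root logger already has handlers). The message texts are just placeholders:

```python
import logging
import sys

# configure the root logger once; DEBUG lets debug messages through
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger(__name__)

# explicit output for the user: write() does not add a newline for you
written = sys.stdout.write("Processing 3 files\n")

# diagnostics go through logging (to stderr by default); the %s arguments
# are only formatted if the message is actually emitted
log.debug("file list: %s", ["a.txt", "b.txt", "c.txt"])
log.info("done")
```

Raising the level later (for example to logging.WARNING) silences the debug output without touching the calls.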
Notes Related to Particular Libraries
Numpy
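It’s pretty neat that you can sum two arrays element-wise like this: np.array([1,2,3]) + np.array([3,2,1]) would result in [4,4,4]
You can also filter a numpy array with a boolean mask. For example, to get all values of a = np.array([1,2,3]) greater than 1: a > 1 returns [False, True, True], and a[a > 1] returns [2,3]
Comparison operators such as > work element-wise, but Python’s boolean operators and, or and not cannot be used directly on numpy arrays (they raise a ValueError because the truth value of an array is ambiguous). The solution is to use np.logical_and(), np.logical_or(), np.logical_not() etc., or the element-wise operators & and | with parentheses around each comparison.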
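A short sketch of element-wise arithmetic, boolean masks and logical_and/logical_or (the array names are just for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

print(a + b)    # [4 4 4]  (element-wise sum)
mask = a > 1
print(mask)     # [False  True  True]  (a boolean array, not a filter)
print(a[mask])  # [2 3]  (indexing with the mask does the filtering)

# `a > 1 and a < 3` would raise ValueError: the truth value is ambiguous
both = np.logical_and(a > 1, a < 3)
print(a[both])                           # [2]
print(a[(a > 1) & (a < 3)])              # [2]  (& needs the parentheses)
print(a[np.logical_or(a == 1, a == 3)])  # [1 3]
```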
Pandas
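Delete a column from a DataFrame using: del df['id'] or df.drop(columns='id') (equivalently df.drop('id', axis=1); recent pandas versions no longer accept the axis as a plain positional argument)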
Pandas pivot on a DataFrame converts an integer column to float if the result contains None/NaN values (for example, when some index/column combinations are missing), so make sure you have already done adequate cleaning prior to using pivot.
You can transpose a dataframe with: df.T
pd.get_dummies(df.field) is useful for converting qualitative/categorical data into numerical indicator (one-hot) columns, essentially bit fields.
A useful pandas feature for slicing and dicing a DataFrame: df.iloc[:, :2] means get all rows but only the first two columns.
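A small sketch of the pandas notes above on a made-up DataFrame (column names and values are hypothetical). get_dummies returns bool columns in recent pandas and uint8 in older versions, hence the astype(int):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "colour": ["red", "blue", "red", "green"],
    "size": [10, 20, 30, 40],
})

# drop a column, passing the axis by keyword
without_id = df.drop(columns="id")  # same as df.drop("id", axis=1)
print(list(without_id.columns))     # ['colour', 'size']

# one indicator column per distinct value, in sorted order
dummies = pd.get_dummies(df["colour"])
print(dummies.astype(int))

# all rows, but only the first two columns
first_two = df.iloc[:, :2]
print(list(first_two.columns))      # ['id', 'colour']

# transpose: the 4x3 frame becomes 3x4
print(df.T.shape)                   # (3, 4)

# pivot fills missing combinations with NaN, which turns ints into floats
long = pd.DataFrame({"day": ["mon", "mon", "tue"],
                     "item": ["a", "b", "a"],
                     "qty": [1, 2, 3]})
wide = long.pivot(index="day", columns="item", values="qty")
print(wide["b"].dtype)              # float64: "tue" has no "b"
```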
Jupyter Notes
Interestingly, you can run a system command from Jupyter using the bang (!) character:
!head ../data/data.csv
Jupyter commands:
<shift>-<tab> : context help
<shift>-<tab> * 2 : more detail
<shift>-<tab> * 4 : documentation
<tab> : autocompletion
Other Notes
timeit
The timeit module
is pretty neat for getting metrics. It’s a bit like calling System.nanoTime() before and after in Java, but it does all the work for you.
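A minimal sketch of timeit (the statements being timed are arbitrary examples):

```python
import timeit

# run the statement 10,000 times and return the total time in seconds
elapsed = timeit.timeit("sum(range(100))", number=10_000)
print(f"sum(range(100)) x 10,000: {elapsed:.4f}s")

# setup is not included in the timing; repeat() performs the whole
# measurement several times, and the minimum is usually the most
# stable figure to compare
timings = timeit.repeat("sorted(data)",
                        setup="data = list(range(1000, 0, -1))",
                        repeat=5, number=1_000)
print(f"best of 5: {min(timings):.4f}s")
```

From the command line, python3 -m timeit "sum(range(100))" does the same and picks a suitable number of loops automatically.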
On macOS, when having problems with SSL certificates in Python it’s worth trying this: /Applications/Python\ 3.6/Install\ Certificates.command
Bandit: a static analysis tool for finding common security issues in Python code
Process for choosing the right 3rd party packages:
- Compile a list of candidate packages.
  For doing this there are a number of sites containing curated lists of Python packages, such as awesome-python.com, python.libhunt.com, python-guide.org, pymotw.com and wiki.python.org.
  Other possible sources are Stack Overflow, Google, Reddit, Hacker News (hn.algolia.com) and Twitter.
- Check out the popularity (downloads, Google results for the package, etc.); popularity is usually a good indication of quality.
- Check out the project’s home page to get an understanding of how well maintained it is: whether there are API docs, stats about test coverage, coding guidelines for contributors, etc.
- Check out the README: what the project does, what license it is under and who the author is.
- Check whether the project is actively maintained: changelog, release notes, bug tracker, commit history.
- Check the source code: verify whether it is formatted according to best practices, whether there are unit tests, and whether it looks like it was developed by a seasoned Python developer or by a developer coming from another language.
- Try out the library.
For comparing licenses: choosealicense.com/licenses