This is just a collection of notes I’ve made over a period of time to remind me of certain commands or syntax. I will continue adding to this over time. I’m also going to add my Natural Language Processing notes and Machine Learning Notes in a couple of other articles.

Package / Dependency Management

pip

The Basics

installing a particular version: pip3 install requests==6.3.26

installing the latest release within major version 6: pip3 install "requests>=6,<7" (the quotes stop the shell from treating > and < as redirections)

install any 6.3.X version >= 6.3.22: pip3 install requests~=6.3.22

Installing From a Repository

installing from a git repository: pip install git+https://github.com/user/repo.git@branch

for master: @master

for a commit hash: @3cdf42b

for a tag: @1.1.1

Checking for New Versions and Upgrading

Checking for new versions: pip3 list --outdated

Upgrading a specific package: pip3 install --upgrade requests

Sharing Packages

PyPI, the Python Package Index, is a public package repository where you can make your packages available. Accounts are free.

pypi.python.org or the newer pypi.org

VirtualEnv

To create a virtual environment in a subfolder called venv: python3 -m venv ./venv

To activate a virtual environment in a subfolder called venv: source ./venv/bin/activate

To deactivate the virtual environment and return to the default environment: deactivate

If you want to use IDLE while working in a virtual environment, it’s better to launch it like this: python -m idlelib.idle

pipenv

When I made these notes this was being recommended as the preferred method, from Python 3.6 onward, for creating your virtual environment and doing package management. It wraps both jobs in a single tool.

You can install pipenv by running the following command: pip install --user pipenv

You should then add pipenv to your path in the .bash_profile file and run: source ~/.bash_profile

Instead of using activate you use from the project folder: pipenv shell

Instead of deactivate use: exit

When creating your environment, if you have multiple versions of Python installed you can use pipenv --python 3.9 to build the environment on the Python 3.9 interpreter.

Requirements.txt

This is really useful for packaging your project because it allows for automatically installing all the necessary dependencies.

You can capture the list from pip using: pip3 freeze > requirements.txt

Likewise you can defrost by using: pip3 install -r requirements.txt

To split up dev and prod dependencies you can have two separate files, for example requirements-dev.txt and requirements.txt, and include the second in the first by adding a line to requirements-dev.txt like this: -r requirements.txt

Then you just call it as before, for dev like this: pip3 install -r requirements-dev.txt and for prod like this: pip3 install -r requirements.txt

Core Language Syntax

Notes About Strings

substring in python can be done like this 'pippo'[1:4] == 'ipp'

you can also use negative values for example 'pippo'[-4:-1] == 'ipp'
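A quick sketch of the two slices above. The variable names are just ones I picked for the example.

```python
word = "pippo"

# Slice from index 1 up to, but not including, index 4.
middle = word[1:4]

# Negative indices count back from the end of the string.
middle_from_end = word[-4:-1]

print(middle, middle_from_end)
```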

Formatting strings previously could be done like this: "%s %s" % ("Hello", "World"). It is now also possible to use this syntax, which is perhaps preferable: "{0} {1}".format("Hello", "World")

This formatting can be very rich indeed. There is however another alternative called the f-string, which is an extremely interesting and elegant solution. With this you can include the values of variables directly in the string, a bit like you would do in Ant if you have ever used that. For example: example = f"this is an example {myvar} of myvar" so if we had myvar = "that displays the content" then the command print(example) would print: this is an example that displays the content of myvar
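Here are the three formatting styles side by side, as a small sketch. The names percent_style and format_style are mine, used only to hold the results.

```python
myvar = "that displays the content"

# Old %-style formatting.
percent_style = "%s %s" % ("Hello", "World")

# str.format with positional placeholders.
format_style = "{0} {1}".format("Hello", "World")

# f-string: the expression inside the braces is evaluated in place.
example = f"this is an example {myvar} of myvar"

print(percent_style)
print(format_style)
print(example)
```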

Adding a * before a list in a function call does what is called unpacking: each element is passed as a separate argument. So print(*names) prints a list of strings without the square brackets, quotes and commas.
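A minimal sketch of the difference, using a made-up list called names:

```python
names = ["alpha", "beta", "gamma"]  # hypothetical sample data

# Passing the list itself prints its repr, brackets and quotes included.
print(names)

# Unpacking passes each element as its own argument to print().
print(*names)

# The usual print() keyword arguments still apply after unpacking.
print(*names, sep=", ")
```

The second call behaves the same as print("alpha", "beta", "gamma").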

Notes About Typing of Variables

Annotations can be put on parameters, return values and variables to describe their types

tuples are annotated like this: Tuple[str, int] or, from Python 3.9 onward, with the built-in type: tuple[str, int]

lists are annotated like this: List[str] or alternatively like this: list[str]

dicts are annotated like this: Dict[str, int] or alternatively like this: dict[str, int]

you can also use these in comments after variable declarations like this # type: Dict[str, int]

You may need to import the classes Any, Union, Callable, TypeVar, Generic, Dict, List and Tuple from the typing module.

The Optional type is a useful one to remember: a return type of Optional[int], for example, allows you to return either an int or None.
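To make the annotation notes concrete, here is a small sketch. The functions find_age and first_and_count are invented for the example and are not from any library.

```python
from typing import Dict, List, Optional, Tuple


def find_age(ages: Dict[str, int], name: str) -> Optional[int]:
    """Return the age stored for name, or None if the name is unknown."""
    return ages.get(name)


def first_and_count(words: List[str]) -> Tuple[str, int]:
    """Return the first word together with the number of words."""
    return words[0], len(words)


# A variable annotation on an ordinary assignment.
ages: Dict[str, int] = {"anna": 31}

print(find_age(ages, "anna"))
print(find_age(ages, "bob"))
print(first_and_count(["a", "b"]))
```

Annotations are not enforced at runtime; a checker such as mypy is what reads them.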

Variables that belong to each instance of a class should be assigned in the constructor (__init__) using self.varname and not in the body of the class, as that would make them class attributes shared by every instance (much like statics).
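The difference is easiest to see with a mutable value. This is only a sketch, with a made-up Basket class:

```python
class Basket:
    # Class attribute: a single list shared by every Basket instance.
    shared_items = []

    def __init__(self):
        # Instance attribute: each Basket gets its own list.
        self.own_items = []


first = Basket()
second = Basket()

first.shared_items.append("apple")
first.own_items.append("pear")

# second sees the apple, because the list lives on the class, but not the pear.
print(second.shared_items, second.own_items)
```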

Notes About Decorators

Apart from you should never trust them like you should never trust plumbers :D

asyncio and Green Threads

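This introduces async and await, which let a coroutine hand control back to the event loop while it is waiting on, for example, the network, so that other coroutines can run in the meantime. This all happens on a single operating system thread, which is why it gets compared to green threads.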

This also introduces the concept of futures to python

This is an alternative to using callbacks
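A minimal sketch of the idea, assuming asyncio.sleep as a stand-in for a real network call. The fetch coroutine and its arguments are invented for the example.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # While this coroutine sleeps, the event loop is free to run others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather() runs both coroutines concurrently and returns their results
    # in the order the coroutines were passed in, not in completion order.
    return await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01))


results = asyncio.run(main())
print(results)
```

No callbacks are needed: the code reads top to bottom even though the two waits overlap.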

Notes about logging, sysout and print

Matt Harrison’s suggestion from one of his courses on Python was to use sys.stdout.write() instead of print for explicitly writing to the screen, however for debug logging he suggested using the logging module.
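Roughly how I read that advice, as a sketch. The logger name, the format string and the report function are my own choices and do not come from the course.

```python
import logging
import sys

# Send debug output through the logging module (stderr by default).
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("notes_example")


def report(total: int) -> None:
    # Diagnostic detail goes to the log and can be silenced via the log level.
    logger.debug("computed total=%d", total)
    # Output intended for the user is written explicitly to stdout.
    sys.stdout.write(f"Total: {total}\n")


report(42)
```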

Notes Related to Particular Libraries

Numpy

It’s pretty neat that you can sum two arrays like this np.array([1,2,3]) + np.array([3,2,1]) would result in [4,4,4]

You can also filter a numpy array. Give me all values of the array greater than 1: with a = np.array([1,2,3]), a[a > 1] returns [2,3]. The comparison a > 1 on its own returns the boolean mask [False, True, True].

Although comparison operators work element by element, it is not possible to use the boolean keywords and, or and not directly on two numpy arrays. There is however a solution: use np.logical_and(), np.logical_or() etc., or the & and | operators.
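The array notes above, put together in one sketch. It assumes numpy is installed, and the variable names are mine.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

# Arithmetic is applied element by element.
total = a + b

# A comparison gives a boolean mask with one entry per element.
mask = a > 1

# Indexing with the mask keeps only the elements where it is True.
filtered = a[mask]

# Combine two masks with np.logical_and, or with the & and | operators.
both = np.logical_and(a > 1, b > 1)
either = (a > 1) | (b > 1)

print(total, mask, filtered, both, either)
```

The brackets around each comparison matter when using & and |, because those operators bind more tightly than > does.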

Pandas

Delete a column from a DataFrame using: del df['id'], which changes df in place, or df.drop('id', axis=1), which returns a new DataFrame and leaves df untouched

Pandas pivot on a dataframe converts an integer column to float if it contains missing values (None/NaN), so make sure you have already done adequate cleaning prior to using pivot.

You can transpose a dataframe with: df.T

pd.get_dummies(df.field) is useful for converting qualitative/categorical values into numerical indicator columns, essentially bit fields

useful pandas feature for slicing and dicing a data frame

df.iloc[:,:2] means get all rows but only the first two columns
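A small sketch of the pandas notes above, assuming pandas is installed. The DataFrame and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2], "colour": ["red", "blue"], "size": [10, 20]})

# drop() returns a new DataFrame; df itself keeps its id column.
without_id = df.drop(columns="id")

# All rows, but only the first two columns.
first_two = df.iloc[:, :2]

# One indicator column per distinct value in the colour column.
dummies = pd.get_dummies(df["colour"])

# Rows become columns and columns become rows.
transposed = df.T

print(without_id)
print(first_two)
print(dummies)
print(transposed)
```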

Jupyter Notes

Interesting: you can run a system command from Jupyter using the bang character: !head ../data/data.csv

Jupyter commands:

<shift>-<tab> : context help

<shift>-<tab> * 2 : more detail

<shift>-<tab> * 4 : documentation

<tab> : autocompletion

Other Notes

timeit from the timeit module is pretty neat for getting timing metrics. A bit like calling System.nanoTime() before and after in Java, but it does all the work.

When having problems with SSL certificates in Python on macOS it’s worth trying this: /Applications/Python\ 3.6/Install\ Certificates.command

Bandit is worth a look for security testing: it statically scans Python source code for common security issues.
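A minimal timeit sketch. The function build_squares is a throwaway workload I made up so there is something to measure.

```python
import timeit


def build_squares():
    # Small workload to time: build a list of the first 1000 squares.
    return [n * n for n in range(1000)]


# Call the function 200 times and return the total elapsed seconds.
elapsed = timeit.timeit(build_squares, number=200)

print(f"200 runs took {elapsed:.4f} seconds")
```

The same thing is available from the command line with python -m timeit.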

Process for choosing the right 3rd party packages:

  1. Compile a list of candidate packages.

    For doing this there are a number of sites containing curated lists of python packages such as: awesome-python.com, python.libhunt.com, python-guide.org, pymotw.com, wiki.python.org

    other possible sources are, stackoverflow, google, reddit, hacker news (hn.algolia.com) and twitter

  2. Check out the popularity (downloads, results on Google for the package, etc.). Higher popularity is usually a good indication of quality.

  3. Check out the project's home page to get an understanding of how well maintained it is: whether there are API documents, stats about test coverage, coding guidelines for contributors, etc.

  4. Check out the readme, what the project does, what license it is under, who is the author

  5. Check if the project is actively maintained, changelog, release notes, bug tracker, commit history

  6. Check the source code: verify whether it is formatted using best practices, whether there are unit tests, and whether it looks like it was developed by a seasoned Python developer or by a developer coming from another language.

  7. Try out the library

choosealicense.com/licenses