Reframe Documentation

Reframe is a simple relational algebra implementation. It allows you to experiment with relational operators in Python. The relational operators provided in Reframe use an object-oriented notation, that is, relation.operator(). In keeping with relational algebra, all operators return a relation. This allows you to chain operators together in a pipeline: relation.operator().operator().operator()
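Chaining works because every operator hands back a new relation rather than a plain value. The following is a minimal pure-Python sketch of that property. `TinyRelation` and its two methods are hypothetical stand-ins written for illustration; they are not Reframe's implementation.

```python
class TinyRelation:
    """Hypothetical stand-in for a relation: column names plus a set of row tuples."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = set(map(tuple, rows))  # a set, so duplicate rows collapse

    def select(self, predicate):
        # Keep rows for which predicate(row_as_dict) is true; returns a TinyRelation.
        keep = {r for r in self.rows if predicate(dict(zip(self.columns, r)))}
        return TinyRelation(self.columns, keep)

    def project(self, cols):
        # Keep only the named columns; returns a TinyRelation.
        idx = [self.columns.index(c) for c in cols]
        return TinyRelation(cols, {tuple(r[i] for i in idx) for r in self.rows})


countries = TinyRelation(
    ["name", "continent"],
    [("Benin", "Africa"), ("Ghana", "Africa"), ("Samoa", "Oceania")],
)

# Each call returns a TinyRelation, so calls can be strung together.
result = countries.select(lambda row: row["continent"] == "Africa").project(["continent"])
print(result.columns, sorted(result.rows))
```

Because `project` stores rows in a set, the two African rows collapse into a single `('Africa',)` row.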

class reframe.Relation(filepath=None, sep='|')

Create a Relation from a CSV file of data for use with relational operators.

This module is designed for educational purposes, specifically for teaching and experimenting with relational algebra. To that end, you can create a relation from a file of data, typically a CSV file. You can also specify a separator; for example, a vertical bar may work better in some cases.

This module is built on top of pandas and in many cases is just a thin shell around it.

Parameters:	filepath – a string specifying a path to a CSV file, OR a pandas DataFrame to convert to a Relation
	sep – the separator used in the data file. Default is '|'
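To illustrate what the separator controls, here is a small sketch that parses vertical-bar-separated text. It uses the standard library `csv` module rather than pandas so that it runs without extra dependencies, and the inline data is invented for illustration; it is not how Reframe itself loads files.

```python
import csv
import io

# Hypothetical file contents; a real call would pass a path such as 'country.csv'.
raw = io.StringIO("code|name|continent\nBEN|Benin|Africa\nWSM|Samoa|Oceania\n")

# `delimiter` plays the role that `sep` plays in the Relation constructor.
reader = csv.DictReader(raw, delimiter="|")
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["name"])
```

With the wrong separator, each line would be read as a single field, so the column names would not be recognized.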

cartesian_product(other)
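No description is given for this method. In relational algebra, a Cartesian product pairs every row of one relation with every row of another. The sketch below shows that general idea in plain Python with invented data; how Reframe handles details such as clashing column names is not covered here.

```python
from itertools import product

# Toy relations with no column names in common.
colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]

# Pair every row on the left with every row on the right and merge the attributes.
pairs = [{**left, **right} for left, right in product(colors, sizes)]

print(len(pairs))
print(pairs[0])
```

The result has 2 × 3 = 6 rows, one for each combination.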

extend(newcol, series)

Create a new attribute by combining or modifying one or more existing attributes.

Parameters:	newcol – name of the new column to create
	series – an expression involving one or more other attributes
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.extend('gnpdiff',country.gnp - country.gnpold).project(['name','gnpdiff']).head(10)
                   name  gnpdiff
0           Afghanistan      NaN
1           Netherlands    10884
2  Netherlands Antilles      NaN
3               Albania      705
4               Algeria     3016
5        American Samoa      NaN
6               Andorra      NaN
7                Angola    -1336
8              Anguilla      NaN
9   Antigua and Barbuda       28
>>>
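The idea behind extend can be sketched in plain Python without Reframe or pandas. The helper `extend_rows`, the `diff` function, and the figures are hypothetical and chosen only for illustration. Missing inputs yield `None` here, loosely mirroring the NaN entries in the output above.

```python
def extend_rows(rows, newcol, func):
    """Return copies of `rows` with `newcol` computed by `func(row)`."""
    return [{**row, newcol: func(row)} for row in rows]


def diff(row):
    # If either input is missing, the derived value is missing as well.
    if row["gnp"] is None or row["gnpold"] is None:
        return None
    return row["gnp"] - row["gnpold"]


# Made-up figures, not taken from country.csv.
rows = [
    {"name": "A", "gnp": 120, "gnpold": 100},
    {"name": "B", "gnp": 50, "gnpold": None},
]

extended = extend_rows(rows, "gnpdiff", diff)
for row in extended:
    print(row["name"], row["gnpdiff"])
```

The original rows are left unchanged; the new attribute exists only in the returned copies.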

groupby(cols)

Collapse a relation into one row per unique combination of values in the given group-by attributes.

The groupby operator is always used in conjunction with one of the following aggregate operators:

count
sum
mean
median
min
max

Parameters:	cols – A list of columns to group on
Returns:	A GroupWrap object for one of the aggregate operators to work on.
Example:

How many countries are in each continent?

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).count('name')
       continent  count_name
0         Africa          58
1     Antarctica           5
2           Asia          51
3         Europe          46
5        Oceania          28
6  South America          14
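Grouping and then counting can be sketched in plain Python as a two-step process: bucket the rows by the group-by attribute, then aggregate each bucket. The toy rows below are invented for illustration, and the `count_name` key simply mimics the column label in the output above.

```python
from collections import defaultdict

# Toy rows, not the contents of country.csv.
rows = [
    {"name": "Benin", "continent": "Africa"},
    {"name": "Ghana", "continent": "Africa"},
    {"name": "Samoa", "continent": "Oceania"},
]

# Step 1: bucket rows by the group-by attribute.
groups = defaultdict(list)
for row in rows:
    groups[row["continent"]].append(row)

# Step 2: aggregate each bucket, yielding one output row per distinct continent.
counted = [
    {"continent": continent, "count_name": len(members)}
    for continent, members in sorted(groups.items())
]
print(counted)
```

Swapping `len` for another reduction over a numeric attribute would give the analogue of sum, mean, min, or max.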

intersect(other)

Create a new relation that is the intersection of the two given relations.

In order to compute the intersection, the relations must be union compatible; that is, they must have exactly the same columns. This may require some projecting and renaming.

Parameters:	other – the relation to compute the intersection with
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').project(['name', 'region']).intersect(country.query('region == "Western Africa"').project(['name', 'region']))
             name          region
         Benin  Western Africa
  Burkina Faso  Western Africa
        Gambia  Western Africa
         Ghana  Western Africa
        Guinea  Western Africa
 Guinea-Bissau  Western Africa
    Cape Verde  Western Africa
       Liberia  Western Africa
          Mali  Western Africa
    Mauritania  Western Africa
        Niger  Western Africa
      Nigeria  Western Africa
Côte d'Ivoire  Western Africa
 Saint Helena  Western Africa
      Senegal  Western Africa
 Sierra Leone  Western Africa
         Togo  Western Africa
>>>
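The union-compatibility requirement can be made concrete with a small plain-Python sketch. The `intersect` function below is a hypothetical helper over column lists and row tuples, not Reframe's method, and the rows are toy data.

```python
def intersect(cols_a, rows_a, cols_b, rows_b):
    """Intersect two relations given as (column list, iterable of row tuples)."""
    if list(cols_a) != list(cols_b):
        # Not union compatible: project or rename first.
        raise ValueError("relations must have exactly the same columns")
    return set(rows_a) & set(rows_b)


cols = ["name", "region"]
african = [("Benin", "Western Africa"), ("Kenya", "Eastern Africa")]
western = [("Benin", "Western Africa"), ("Ghana", "Western Africa")]

common = intersect(cols, african, cols, western)
print(common)
```

Only rows that appear in both inputs survive, and mismatched column lists are rejected before any comparison takes place.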

minus(other)

Return a relation containing the rows that are in self but not in other.

Parameters:	other – the relation whose rows are excluded from self
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').minus(country.query('region == "Western Africa"')).project(['name','region','continent'])
                                      name           region continent
                                Algeria  Northern Africa    Africa
                                 Angola   Central Africa    Africa
                              Botswana  Southern Africa    Africa
                               Burundi   Eastern Africa    Africa
                              Djibouti   Eastern Africa    Africa
                                 Egypt  Northern Africa    Africa
                               Eritrea   Eastern Africa    Africa
                          South Africa  Southern Africa    Africa
                              Ethiopia   Eastern Africa    Africa
                                 Gabon   Central Africa    Africa
                              Cameroon   Central Africa    Africa
                                 Kenya   Eastern Africa    Africa
              Central African Republic   Central Africa    Africa
                               Comoros   Eastern Africa    Africa
                                 Congo   Central Africa    Africa
 Congo, The Democratic Republic of the   Central Africa    Africa
                              Lesotho  Southern Africa    Africa
                              Liberia   Western Africa    Africa
               Libyan Arab Jamahiriya  Northern Africa    Africa
                       Western Sahara  Northern Africa    Africa
                           Madagascar   Eastern Africa    Africa
                               Malawi   Eastern Africa    Africa
                              Morocco  Northern Africa    Africa
                            Mauritius   Eastern Africa    Africa
                              Mayotte   Eastern Africa    Africa
                           Mozambique   Eastern Africa    Africa
                              Namibia  Southern Africa    Africa
                    Equatorial Guinea   Central Africa    Africa
                              Réunion   Eastern Africa    Africa
                               Rwanda   Eastern Africa    Africa
                         Saint Helena   Western Africa    Africa
                               Zambia   Eastern Africa    Africa
                Sao Tome and Principe   Central Africa    Africa
                           Seychelles   Eastern Africa    Africa
                              Somalia   Eastern Africa    Africa
                                Sudan  Northern Africa    Africa
                            Swaziland  Southern Africa    Africa
                             Tanzania   Eastern Africa    Africa
                                 Chad   Central Africa    Africa
                              Tunisia  Northern Africa    Africa
                               Uganda   Eastern Africa    Africa
                             Zimbabwe   Eastern Africa    Africa
       British Indian Ocean Territory   Eastern Africa    Africa
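Set difference is not symmetric, so the order of the operands matters. A minimal sketch with Python sets of row tuples, using placeholder values rather than real data:

```python
# Each relation is modelled as a set of one-column row tuples (placeholder values).
left = {("a",), ("b",), ("c",)}
right = {("b",), ("c",), ("d",)}

left_minus_right = left - right   # rows in left but not in right
right_minus_left = right - left   # rows in right but not in left

print(left_minus_right)
print(right_minus_left)
```

Swapping the operands yields a different relation, which is why self and other are not interchangeable.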

njoin(other)

Create a new relation that is the intersection of the two given relations.

In order to compute the intersection, the relations must be union compatible; that is, they must have exactly the same columns. This may require some projecting and renaming.

Parameters:	other – the relation to compute the intersection with
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').project(['name', 'region']).njoin(country.query('region == "Western Africa"').project(['name', 'region']))
             name          region
         Benin  Western Africa
  Burkina Faso  Western Africa
        Gambia  Western Africa
         Ghana  Western Africa
        Guinea  Western Africa
 Guinea-Bissau  Western Africa
    Cape Verde  Western Africa
       Liberia  Western Africa
          Mali  Western Africa
    Mauritania  Western Africa
        Niger  Western Africa
      Nigeria  Western Africa
Côte d'Ivoire  Western Africa
 Saint Helena  Western Africa
      Senegal  Western Africa
 Sierra Leone  Western Africa
         Togo  Western Africa
>>>

project(cols)

Return a new Relation with only the specified columns.

Parameters:	cols – a list of columns to project
Returns:	a Relation with duplicate rows dropped
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.project(['region','continent','name']).head(10)
                      region      continent                  name
0  Southern and Central Asia           Asia           Afghanistan
1             Western Europe         Europe           Netherlands
2                  Caribbean  North America  Netherlands Antilles
3            Southern Europe         Europe               Albania
4            Northern Africa         Africa               Algeria
5                  Polynesia        Oceania        American Samoa
6            Southern Europe         Europe               Andorra
7             Central Africa         Africa                Angola
8                  Caribbean  North America              Anguilla
9                  Caribbean  North America   Antigua and Barbuda
>>>

Note

Relations have no duplicate rows, so projecting a single column creates a relation with all of the distinct values for that column.
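The note above can be sketched in plain Python: projecting onto one column and then dropping repeated rows leaves only the distinct values. The data is invented for illustration, and `dict.fromkeys` is used here simply as an order-preserving way to remove duplicates.

```python
columns = ["name", "continent"]
rows = [("Benin", "Africa"), ("Ghana", "Africa"), ("Samoa", "Oceania")]

idx = columns.index("continent")

# dict.fromkeys keeps the first occurrence of each row and preserves order.
distinct = list(dict.fromkeys((row[idx],) for row in rows))
print(distinct)
```

Three input rows become two output rows because "Africa" appears twice in the projected column.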

query(q)

Return a new relation with the tuples matching the query condition.

Parameters:	q – a query string
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Antarctica"').project(['code','name'])
    code                                          name
232  ATA                                    Antarctica
233  BVT                                 Bouvet Island
235  SGS  South Georgia and the South Sandwich Islands
236  HMD             Heard Island and McDonald Islands
237  ATF                   French Southern territories
>>>
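The row labels in the output above (232, 233, 235, ...) appear to be carried over from the original relation rather than renumbered. A plain-Python sketch of a filter that behaves the same way, with toy rows and positions standing in for row labels:

```python
rows = [
    {"code": "BEN", "continent": "Africa"},
    {"code": "ATA", "continent": "Antarctica"},
    {"code": "WSM", "continent": "Oceania"},
    {"code": "BVT", "continent": "Antarctica"},
]

# Keep (original position, row) pairs so matches retain their original labels.
matches = [(i, row) for i, row in enumerate(rows) if row["continent"] == "Antarctica"]

for i, row in matches:
    print(i, row["code"])
```

The matching rows keep positions 1 and 3, so gaps in the labels show where non-matching rows were removed.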

rename(old, new)

Rename the old attribute to new.

Parameters:	old – string, name of the old attribute
	new – string, name to change old to
Returns:	Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.project(['name']).rename('name','countryname').head()
            countryname
0           Afghanistan
1           Netherlands
2  Netherlands Antilles
3               Albania
4               Algeria
>>>
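Renaming matters mostly when two relations need matching column names, for instance before an intersect or union. A minimal plain-Python sketch follows; `rename_rows` is a hypothetical helper over dictionaries, not Reframe's method.

```python
def rename_rows(rows, old, new):
    """Return copies of `rows` with key `old` replaced by `new`."""
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


rows = [{"name": "Albania", "code": "ALB"}]
renamed = rename_rows(rows, "name", "countryname")
print(renamed)
```

Only the attribute name changes; the values and the other attributes are untouched.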

sort(*args, **kwargs)

Sort the relation on the given columns.

Parameters:	cols – a list of columns to sort on
	ascending – Boolean; ascending=False sorts in reverse order
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.sort(['indepyear'], ascending=False).query('indepyear < 1200').project(['name','indepyear'])
               name  indepyear
159        Portugal       1143
29   United Kingdom       1066
180      San Marino        885
164          France        843
170          Sweden        836
200         Denmark        800
81            Japan       -660
48         Ethiopia      -1000
93            China      -1523
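The example above sorts first and filters afterwards, and the filtered rows stay in sorted order. The same pattern in plain Python, with placeholder names and years:

```python
rows = [
    {"name": "A", "indepyear": 843},
    {"name": "B", "indepyear": 1991},
    {"name": "C", "indepyear": -660},
    {"name": "D", "indepyear": 1066},
]

# Sort descending, the counterpart of ascending=False.
by_year_desc = sorted(rows, key=lambda row: row["indepyear"], reverse=True)

# Filtering a sorted list preserves its order.
early = [row["name"] for row in by_year_desc if row["indepyear"] < 1200]
print(early)
```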

union(other)

Take two Relations with the same columns and put them together top to bottom.

Parameters:	other – the Relation to place below self
Returns:	a Relation
Example:

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('region == "Western Africa"').union(country.query('region == "Polynesia"')).project(['name','region'])
                  name          region
             Benin  Western Africa
      Burkina Faso  Western Africa
            Gambia  Western Africa
             Ghana  Western Africa
            Guinea  Western Africa
     Guinea-Bissau  Western Africa
        Cape Verde  Western Africa
          Liberia  Western Africa
             Mali  Western Africa
       Mauritania  Western Africa
            Niger  Western Africa
          Nigeria  Western Africa
    Côte d'Ivoire  Western Africa
     Saint Helena  Western Africa
          Senegal  Western Africa
     Sierra Leone  Western Africa
             Togo  Western Africa
     American Samoa       Polynesia
      Cook Islands       Polynesia
             Niue       Polynesia
         Pitcairn       Polynesia
 French Polynesia       Polynesia
            Samoa       Polynesia
          Tokelau       Polynesia
            Tonga       Polynesia
           Tuvalu       Polynesia
Wallis and Futuna       Polynesia
>>>
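A plain-Python sketch of stacking two relations top to bottom follows. It assumes that rows present in both inputs appear only once in the result, in line with the note under project that relations have no duplicate rows; whether Reframe's union behaves exactly this way is not confirmed here. The `union` helper and the rows are hypothetical.

```python
def union(cols_a, rows_a, cols_b, rows_b):
    """Stack two relations given as (column list, list of row tuples)."""
    if list(cols_a) != list(cols_b):
        raise ValueError("relations must have exactly the same columns")
    # dict.fromkeys drops repeated rows while keeping first-seen order (top, then bottom).
    return list(dict.fromkeys([*rows_a, *rows_b]))


cols = ["name", "region"]
top = [("Benin", "Western Africa"), ("Togo", "Western Africa")]
bottom = [("Samoa", "Polynesia"), ("Togo", "Western Africa")]

combined = union(cols, top, cols, bottom)
print(combined)
```

The shared row appears once, so four input rows yield three output rows.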

class reframe.GroupWrap(gbo, cols)

Wrapper for a DataFrameGroupBy object – invisible to the end user

count(col)