Usage

All of the functions below return JSON objects retrieved from the API. Most also take a run parameter that switches the output between the retrieved JSON object (run = True) and just the API call URL (run = False, the default).


Installation and Setup

The package can be installed from PyPI:

$ pip install histlabapi

It can then be imported as follows:

from histlabapi import histlabapi as hl

Overview Functions

This package provides several functions that allow users to get a brief overview of the entities and collections that can be found in the History Lab’s database.

List Collections

Lists all the collections and the number of documents in each.

list_collections()

Here is an example:

>>> hl.list_collections()
[{'corpus': 'frus', 'doc_cnt': 209046},
 {'corpus': 'cia', 'doc_cnt': 935716},
 {'corpus': 'clinton', 'doc_cnt': 54149},
 {'corpus': 'pdb', 'doc_cnt': 5011},
 {'corpus': 'cfpf', 'doc_cnt': 3214293},
 {'corpus': 'kissinger', 'doc_cnt': 4552},
 {'corpus': 'nato', 'doc_cnt': 46002}]
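
Because the result is a plain list of dictionaries, it can be summarized with ordinary Python. The sketch below works on a copy of the sample output above rather than on a live call, so the counts are only as current as that sample; the variable names are illustrative.

```python
# Sample output of hl.list_collections(), copied from the example above.
collections = [
    {'corpus': 'frus', 'doc_cnt': 209046},
    {'corpus': 'cia', 'doc_cnt': 935716},
    {'corpus': 'clinton', 'doc_cnt': 54149},
    {'corpus': 'pdb', 'doc_cnt': 5011},
    {'corpus': 'cfpf', 'doc_cnt': 3214293},
    {'corpus': 'kissinger', 'doc_cnt': 4552},
    {'corpus': 'nato', 'doc_cnt': 46002},
]

# Sort collections from largest to smallest.
by_size = sorted(collections, key=lambda c: c['doc_cnt'], reverse=True)

# Total number of documents across all collections.
total_docs = sum(c['doc_cnt'] for c in collections)

# Print each collection with its document count and share of the total.
for c in by_size:
    share = c['doc_cnt'] / total_docs
    print(f"{c['corpus']:<10} {c['doc_cnt']:>9,} {share:6.1%}")
```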

Entity Overview

Lists all the entities of a certain type that appear in the History Lab’s collection of texts, as well as the number of times they appear.

hlab_overview(
	collection = None, 
	sort = None, 
	entity_type = None, 
	start_date = None, 
	end_date = None, 
	limit = 25, 
	run = False
)
  • collection: A string that can be used to filter results to a specific document collection. A list of available collections can be found here.

  • sort: None, ‘asc’ or ‘desc’. Specifies how you want your results sorted. If a None object is given, results will not be sorted.

  • entity_type: ‘person’, ‘topic’ or ‘country’. Specifies the type of entities you want the function to display.

  • start_date: A string (in the YYYY-MM-DD format) specifying the start of the date range of the documents from which the specified entity types will be displayed. If used, end_date must be provided as well.

  • end_date: A string (in the YYYY-MM-DD format) specifying the end of the date range of the documents from which the specified entity types will be displayed. If used, start_date must be provided as well.

  • limit: An integer specifying the maximum number of entities this function will display. The default limit is 25.

  • run: If False, function will only return the API URL. If True, function will return the JSON object that is generated by the full API query.

In the following example, I generate a list of three persons that appear in the FRUS collection:

>>> hl.hlab_overview(collection = 'frus', sort = None, entity_type = 'person', limit = 3, run = False)
'http://api.foiarchive.org/entities?entity_type=eq.person&corpus=eq.frus&limit=3'

>>> hl.hlab_overview(collection = 'frus', sort = None, entity_type = 'person', limit = 3, run = True)
[{'entity_type': 'person',
  'entity_id': '100001',
  'entity_name': 'Aaron, David Laurence',
  'corpus': 'frus',
  'ref_cnt': 155},
 {'entity_type': 'person',
  'entity_id': '100004',
  'entity_name': 'Abbas, Ferhat',
  'corpus': 'frus',
  'ref_cnt': 5},
 {'entity_type': 'person',
  'entity_id': '100007',
  'entity_name': 'Abbas Hilmi Pasha',
  'corpus': 'frus',
  'ref_cnt': 1}]
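
To make the link between the arguments and the URL returned with run = False more concrete, here is a minimal sketch that reproduces the URL from the example above. The helper sketch_overview_url is hypothetical and is not the package's own code; it covers only the three arguments used in that example and ignores sort, start_date and end_date.

```python
# Host taken from the example URLs in this document.
BASE_URL = "http://api.foiarchive.org"


def sketch_overview_url(collection=None, entity_type=None, limit=25):
    """Hypothetical helper: assemble an /entities URL like the one shown above."""
    parts = []
    if entity_type is not None:
        parts.append(f"entity_type=eq.{entity_type}")
    if collection is not None:
        # The API calls a collection a "corpus" in its query string.
        parts.append(f"corpus=eq.{collection}")
    parts.append(f"limit={limit}")
    return f"{BASE_URL}/entities?" + "&".join(parts)


url = sketch_overview_url(collection="frus", entity_type="person", limit=3)
print(url)
```

Inspecting the URL first is a cheap way to check a query before running it with run = True.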

Search Functions

The bulk of this wrapper package focuses on a series of search functions that allow users to search for and retrieve documents in various ways.

Search by text

Search for documents that contain specific search term(s). Users can also string various search terms together with AND/OR connectors!

hlab_search(
	text, 
	fields = None, 
	join_or = False, 
	start_date = None, 
	end_date = None, 
	collection = None, 
	limit = 25, 
	run = False
)
  • text: The search term that the user wants to look up. Can be input as a list if the user wishes to have multiple search terms.

  • fields: Series of fields that the user wants to display for each document that the search function finds. Can be input as a list if the user wishes to have multiple fields displayed. If no field is provided, the function defaults to displaying the doc_id, authored, and title fields. Guide of available fields can be found here.

  • join_or: If True, the search terms will be joined by an OR connector. If False, the search terms will be joined by an AND connector. To illustrate, if the user inputs [‘A’, ‘B’] in the text parameter and True in the join_or parameter, this function will return all the documents that contain either ‘A’ or ‘B’.

  • start_date: A string (in the YYYY-MM-DD format) specifying the start of the date range of the documents to search. If used, end_date must be provided as well.

  • end_date: A string (in the YYYY-MM-DD format) specifying the end of the date range of the documents to search. If used, start_date must be provided as well.

  • collection: A string that can be used to filter results to a specific document collection. A list of available collections can be found here.

  • limit: An integer specifying the maximum number of documents this function will return. The default limit is 25.

  • run: If False, function will only return the API URL. If True, function will return the JSON object that is generated by the full API query.

In the following example, I generate a list of documents from all the collections where the search terms ‘league of nations’ or ‘trade’ appear in the body of text:

>>> hl.hlab_search('league of nations', run = False)
'http://api.foiarchive.org/documents?and=(full_text.phfts.league%20of%20nations)&select=doc_id,authored,title&limit=25'

>>> hl.hlab_search(['league of nations', 'trade'], fields = ['doc_id', 'title', 'countries'], join_or = True, limit = 3, run = True)
[{'doc_id': '0000BA89',
  'title': 'TELECON WITH DAVID BINDER AT 3:21 P.M.',
  'countries': ['Argentina', 'Denmark', 'Soviet Union']},
 {'doc_id': '0000BB01',
  'title': 'TELECON WITH SONNENFELDT AT 8:10 P.M.',
  'countries': ['Soviet Union']},
 {'doc_id': '0000BB02',
  'title': 'TELECON WITH WILLIAM ROGERS AT 8:21 P.M.',
  'countries': ['Ecuador']}]
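
The difference between the AND and OR connectors can be illustrated locally. The sketch below is only an analogy for what join_or controls: it uses plain substring matching on made-up toy strings, not the API's full-text search, and the matches helper is hypothetical.

```python
def matches(text, terms, join_or=False):
    """Return True if `text` contains all terms (AND) or any term (OR)."""
    if isinstance(terms, str):
        terms = [terms]
    lowered = text.lower()
    hits = [term.lower() in lowered for term in terms]
    return any(hits) if join_or else all(hits)


# Toy documents invented for this illustration only.
docs = {
    'a': 'The League of Nations discussed trade policy.',
    'b': 'A note on trade tariffs.',
    'c': 'Minutes of the League of Nations council.',
    'd': 'Unrelated telegram.',
}
terms = ['league of nations', 'trade']

and_hits = [key for key, text in docs.items() if matches(text, terms)]
or_hits = [key for key, text in docs.items() if matches(text, terms, join_or=True)]

print(and_hits)  # ['a']
print(or_hits)   # ['a', 'b', 'c']
```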

Search by entity

Search for documents that contain the entities that users are looking for. Functionality for looking up multiple entities at the same time is already built in. However, users can only string search terms together with AND/OR connectors within the same entity type.

hlab_entity(
	country = None, 
	topic = None, 
	person = None, 
	country_or = False, 
	topic_or = False, 
	person_or = False, 
	fields = None, 
	collection = None, 
	date = None, 
	start_date = None, 
	end_date = None, 
	summary = False, 
	limit = 25, 
	run = False
)
  • country: Specifies the countries that the user wants to use to search documents with. Can be input as a list if the user wishes to search for multiple countries.

  • topic: Specifies the topics that the user wants to use to search documents with. Can be input as a list if the user wishes to search for multiple topics.

  • person: Specifies the persons that the user wants to use to search documents with. Can be input as a list if the user wishes to search for multiple persons. Since a name can match multiple people (e.g., there is more than one “Kennedy”), users need to input the specific person ID in this parameter. These IDs can be retrieved using the find_entity_id function.

  • country_or: If True, joins the countries users are using as their search entities with an OR connector. If False, joins them with an AND connector. To illustrate, if the user provides [‘A’, ‘B’] in the parameter country and True in the country_or parameter, the function will return documents that contain either the country ‘A’ or ‘B’.

  • topic_or: Same as country_or, but for topics.

  • person_or: Same as country_or, but for persons.

  • fields: Series of fields that the user wants to display for each document that the search function finds. Can be input as a list if the user wishes to have multiple fields displayed. If no field is provided, the function defaults to displaying the doc_id, authored, and title fields. Guide of available fields can be found here.

  • collection: A string that can be used to filter results to a specific document collection. A list of available collections can be found here.

  • start_date: A string (in the YYYY-MM-DD format) specifying the start of the date range of the documents to search. If used, end_date must be provided as well.

  • end_date: A string (in the YYYY-MM-DD format) specifying the end of the date range of the documents to search. If used, start_date must be provided as well.

  • summary: If True, the result is a summary that counts the number of documents that contain the specific entity searched for. When using summary, only one entity can be searched for at a time, and date filters do not work.

  • limit: An integer specifying the maximum number of documents this function will return. The default limit is 25.

  • run: If False, function will only return the API URL. If True, function will return the JSON object that is generated by the full API query.

In the following example, I generate a list of documents that mention the country Belize in the FRUS collection:

>>> hl.hlab_entity(collection = 'frus', country = 'Belize', run = False, limit = 3)
'http://api.foiarchive.org/documents?countries=cs.{Belize}&select=doc_id,authored,title&corpus=eq.frus&limit=3'

>>> hl.hlab_entity(collection = 'frus', country = 'Belize', run = True, limit = 3)
[{'doc_id': 'frus1865p4d471',
  'authored': '1865-04-28T00:00:00+00:00',
  'title': 'British Honduras Company'},
 {'doc_id': 'frus1914Suppd474',
  'authored': '1914-09-03T19:00:00+00:00',
  'title': '\nThe Consul General at London (\nSkinner\n) to the Secretary of\n                              State\n\n'},
 {'doc_id': 'frus1914Suppd849',
  'authored': '1914-12-18T00:00:00+00:00',
  'title': '\nThe Ambassador in Great Britain (\nPage\n) to the Secretary of\n                                    State\n\n'}]

The following example re-runs the above line of code but with the summary parameter set to True:

>>> hl.hlab_entity(collection = 'frus', country = 'Belize', summary = True, run = True)
[{'entity_type': 'country',
  'entity_id': '084',
  'entity_name': 'Belize',
  'corpus': 'frus',
  'ref_cnt': 247}]

To illustrate searching for a specific person, the following call looks up documents in which Richard Nixon is mentioned. Note that his unique ID, rather than the string ‘Richard Nixon’, is entered here:

>>> hl.hlab_entity(collection = 'frus', person = '109882', summary = False, run = True, limit = 3)
[{'doc_id': 'frus1952-54v02p1d140',
  'authored': None,
  'title': 'Notes by the Assistant Staff\n                                Secretary to the President (Minnich) on the\n                            Legislative Leadership Meeting, December 13, 1954'},
 {'doc_id': 'frus1952-54v02p1d141',
  'authored': None,
  'title': 'Notes by the Assistant Staff\n                                Secretary to the President (Minnich) on the\n                            Legislative Leadership Meeting, December 14, 1954'},
 {'doc_id': 'frus1952-54v02p2d108',
  'authored': None,
  'title': 'Memorandum of Discussion at the 162d Meeting of the\n                            National Security Council, Thursday, September 17, 1953'}]
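
As the outputs above show, titles can contain embedded newlines and runs of spaces, and authored can be either an ISO 8601 string or None. A minimal post-processing sketch, using two records copied from the examples above, might look like the following; clean_title and parse_authored are hypothetical helpers and are not part of the package.

```python
import re
from datetime import datetime


def clean_title(title):
    """Collapse runs of whitespace and tidy spaces just inside parentheses."""
    text = " ".join(title.split())
    text = re.sub(r"\(\s+", "(", text)
    text = re.sub(r"\s+\)", ")", text)
    return text


def parse_authored(value):
    """Convert an ISO 8601 'authored' string to a datetime; pass None through."""
    return None if value is None else datetime.fromisoformat(value)


# Two records copied from the example outputs above.
records = [
    {'doc_id': 'frus1914Suppd474',
     'authored': '1914-09-03T19:00:00+00:00',
     'title': '\nThe Consul General at London (\nSkinner\n) to the Secretary of\n                              State\n\n'},
    {'doc_id': 'frus1952-54v02p2d108',
     'authored': None,
     'title': 'Memorandum of Discussion at the 162d Meeting of the\n                            National Security Council, Thursday, September 17, 1953'},
]

# Build tidied copies without modifying the original records.
tidy = [
    {**r, 'title': clean_title(r['title']), 'authored': parse_authored(r['authored'])}
    for r in records
]

for r in tidy:
    print(r['doc_id'], r['authored'], r['title'])
```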

Find Entity ID

As mentioned above, since there may be multiple persons with similar or the same names, this package also includes a function for looking up specific entity IDs.

find_entity_id(
	entity_type, 
	value = None
)
  • entity_type: ‘country’, ‘topic’, or ‘person’. Specifies the type of entities you want the function to display.

  • value: The actual entity the user wants to search for

In the following example, the function returns the unique IDs of every person that has the name ‘Nixon’:

>>> hl.find_entity_id('person', 'Nixon')
[{'full_name': 'Nixon, Patricia', 'person_id': '109881'},
 {'full_name': 'Nixon, Richard M.', 'person_id': '109882'},
 {'full_name': 'Nixon, Robert', 'person_id': '109883'},
 {'full_name': 'Nixon, Patricia', 'person_id': 'hrc5592'},
 {'full_name': 'Nixon, Richard; Nixon, Richard M.; Nixon, Richard Milhouse',
  'person_id': 'hrc5593'},
 {'full_name': 'Cox, Tricia Nixon', 'person_id': 'hrc10032'},
 {'full_name': 'Cox, Tricia Nixon', 'person_id': 'kiss10032'},
 {'full_name': 'Nixon, Richard; Nixon, Richard M.; Nixon, Richard Milhouse',
  'person_id': 'kiss109882'}]

Note that there are multiple instances of ‘Richard Nixon’ here. This is because a different unique ID is used for Richard Nixon in each collection. For example, ‘109882’ refers to all the instances of Richard Nixon in the FRUS collection, while ‘hrc5593’ refers to all the instances of Richard Nixon in the Clinton emails collection.
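
Since IDs differ by collection, it can help to group the results by the letters at the start of each ID. The sketch below does this for the sample output above. Per the text, purely numeric IDs belong to FRUS and the ‘hrc’ prefix to the Clinton emails; reading ‘kiss’ as the Kissinger collection is an assumption, and id_prefix is a hypothetical helper.

```python
import re
from collections import defaultdict

# Sample output of find_entity_id('person', 'Nixon'), copied from above.
results = [
    {'full_name': 'Nixon, Patricia', 'person_id': '109881'},
    {'full_name': 'Nixon, Richard M.', 'person_id': '109882'},
    {'full_name': 'Nixon, Robert', 'person_id': '109883'},
    {'full_name': 'Nixon, Patricia', 'person_id': 'hrc5592'},
    {'full_name': 'Nixon, Richard; Nixon, Richard M.; Nixon, Richard Milhouse',
     'person_id': 'hrc5593'},
    {'full_name': 'Cox, Tricia Nixon', 'person_id': 'hrc10032'},
    {'full_name': 'Cox, Tricia Nixon', 'person_id': 'kiss10032'},
    {'full_name': 'Nixon, Richard; Nixon, Richard M.; Nixon, Richard Milhouse',
     'person_id': 'kiss109882'},
]


def id_prefix(person_id):
    """Return the leading letters of an ID ('' for purely numeric IDs)."""
    return re.match(r"[A-Za-z]*", person_id).group()


# Map each prefix to the list of IDs that carry it.
by_prefix = defaultdict(list)
for row in results:
    by_prefix[id_prefix(row['person_id'])].append(row['person_id'])

for prefix, ids in by_prefix.items():
    print(repr(prefix), ids)
```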

Search by date

Search for documents that were authored at a specific date or date range.

hlab_date(
	date = None, 
	start_date = None, 
	end_date = None, 
	fields = None, 
	collection = None, 
	limit = 25, 
	run = False
)
  • date: A string (in the YYYY-MM-DD format) specifying the exact date of the documents this function will search for. Mutually exclusive with the start_date and end_date parameters.

  • start_date: A string (in the YYYY-MM-DD format) specifying the start of the date range of the documents to search. If used, end_date must be provided as well.

  • end_date: A string (in the YYYY-MM-DD format) specifying the end of the date range of the documents to search. If used, start_date must be provided as well.

  • fields: Series of fields that the user wants to display for each document that the search function finds. Can be input as a list if the user wishes to have multiple fields displayed. If no field is provided, the function defaults to displaying the doc_id, authored, and title fields. Guide of available fields can be found here.

  • collection: A string that can be used to filter results to a specific document collection. A list of available collections can be found here.

  • limit: An integer specifying the maximum number of documents this function will return. The default limit is 25.

  • run: If False, function will only return the API URL. If True, function will return the JSON object that is generated by the full API query.

The following example retrieves documents that were authored on the date 1955-01-20:

>>> hl.hlab_date(date = '1955-01-20', collection = 'frus', limit = 3, run = False)
'http://api.foiarchive.org/documents?&authored=eq.1955-01-20&select=doc_id,authored,title&corpus=eq.frus&limit=3'
    
>>> hl.hlab_date(date = '1955-01-20', collection = 'frus', limit = 3, run = True)
[{'doc_id': 'frus1955-57v01d24',
  'authored': '1955-01-20T00:00:00+00:00',
  'title': '24. Memorandum From the Special Representative in Vietnam (Collins) to the Secretary of State'},
 {'doc_id': 'frus1955-57v02d24',
  'authored': '1955-01-20T00:00:00+00:00',
  'title': '24. Draft Message From the President to the\n                                Congress'},
 {'doc_id': 'frus1955-57v01d23',
  'authored': '1955-01-20T00:00:00+00:00',
  'title': '23. Letter From the Counselor of the Department of State (MacArthur) to the Chargé in France (Achilles)'}] 
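
The date rules described in the parameter list (date is mutually exclusive with the range, both ends of a range are required, and every value is in YYYY-MM-DD format) can be checked before a query is built. The following is a sketch of such a pre-flight check; check_date_args is a hypothetical helper, and whether the package performs similar validation itself is not stated here.

```python
import re
from datetime import date as Date

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_date_args(date=None, start_date=None, end_date=None):
    """Hypothetical check; raises ValueError on a bad combination or format."""
    if date is not None and (start_date is not None or end_date is not None):
        raise ValueError("use either date or start_date/end_date, not both")
    if (start_date is None) != (end_date is None):
        raise ValueError("start_date and end_date must be given together")
    for value in (date, start_date, end_date):
        if value is None:
            continue
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{value!r} is not in YYYY-MM-DD format")
        # Rejects impossible calendar dates such as month 13.
        Date.fromisoformat(value)


# Both calls pass silently because they follow the documented rules.
check_date_args(date='1955-01-20')
check_date_args(start_date='1955-01-01', end_date='1955-01-31')
```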

Search by document ID

Searches for documents by their specific IDs.

hlab_id(
	ids, 
	fields = None, 
	run = False
)
  • ids: The specific IDs of the documents users want to search. Can be input as a list if the user wishes to search for multiple IDs.

  • fields: Series of fields that the user wants to display for each document that the search function finds. Can be input as a list if the user wishes to have multiple fields displayed. If no field is provided, the function defaults to displaying the doc_id, authored, and title fields. Guide of available fields can be found here.

  • run: If False, function will only return the API URL. If True, function will return the JSON object that is generated by the full API query.

The following example looks up the document with the ID ‘frus1969-76ve05p1d11’ and modifies the fields parameter such that only the document’s ID, title, as well as the persons that appear in the document are displayed:

>>> hl.hlab_id(ids = 'frus1969-76ve05p1d11', fields = ['doc_id', 'title', 'persons'], run = True)
[{'doc_id': 'frus1969-76ve05p1d11',
  'title': '11. Letter From Secretary of State Rogers to President Nixon, Washington, March 26,\n                                1970\n\n',
  'persons': ['Nixon, Richard M.', 'Rogers, William Pierce']}]