- Connect to the Elasticsearch server.
>>> from elasticsearch import Elasticsearch
>>> conn = Elasticsearch(hosts='127.0.0.1:8001')   # connect to localhost (127.0.0.1) on port 8001
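- Optional sanity check (a small sketch, assuming the server above is reachable): ping the cluster and fetch its basic info.
>>> conn.ping()   # True if the cluster answers, False otherwise
>>> conn.info()   # dict with cluster name, version, and so on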
- Construct a query that matches all documents and sorts them by created_at in ascending order.
>>> the_query = { 'query': { 'match_all': { } },
                  'sort' : { 'created_at': { 'order': 'asc' } } }
- Construct a query that looks for documents containing 'elasticsearch' in the 'tweet' field.
>>> the_query = { 'query': { 'match': { 'tweet': 'elasticsearch' } } }
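- A small sketch of actually running one of the queries above; the index name 'tweets' and document type 'tweet-type' are taken from the scan-and-scroll example further down, so adjust them to your own index.
>>> result = conn.search( index='tweets', doc_type='tweet-type', body=the_query )
>>> hits = result['hits']['hits']                          # list of matching documents
>>> tweets = [ hit['_source']['tweet'] for hit in hits ]   # the matched tweet texts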
- Construct a query that returns only the _id of every document (no stored fields).
>>> the_query = { 'query' : { 'match_all': { } },
                  'fields': [] }
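- Pulling the _id values out of the response for that query (same assumed index and type names as above):
>>> result = conn.search( index='tweets', doc_type='tweet-type', body=the_query )
>>> ids = [ hit['_id'] for hit in result['hits']['hits'] ]   # only the document ids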
- Construct a query to obtain all documents created on or after 1 January 1970.
>>> import datetime
>>> the_query = { 'query': { 'filtered': {
                      'query' : { 'match_all': { } },
                      'filter': { 'range': { 'created_at': { 'gte': datetime.datetime(1970,1,1) } } }
                  } } }
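- A hedged sketch of counting how many documents match that range query (again assuming the 'tweets' index and 'tweet-type' type); the client's JSON serializer normally handles the datetime object, and an ISO string such as '1970-01-01T00:00:00' works as well.
>>> res = conn.count( index='tweets', doc_type='tweet-type', body=the_query )
>>> res['count']   # number of documents created on or after 1970-01-01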
- Construct a query to obtain documents containing 'politician' in the user_category field and 'lobbyist' in the retweet_user_category field, sorted by created_at in ascending order.
>>> the_query = { 'query': { 'bool': { 'must': [
                      { 'match': { 'user_category': 'politician' } },
                      { 'match': { 'retweet_user_category': 'lobbyist' } }
                  ] } },
                  'sort': { 'created_at': { 'order': 'asc' } } }
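- If only a page of results is needed at a time, the search call also accepts size and from_ parameters (a sketch, with the same assumed index and type names):
>>> page = conn.search( index='tweets', doc_type='tweet-type',
                        body=the_query, from_=0, size=100 )   # first page of up to 100 hits
>>> page['hits']['total']                                     # total number of matching documents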
- Convert a date given as a unicode string (u'2014-08-11T11:02:58', i.e. 11 August 2014 at 11:02:58) into a datetime object.
>>> import unicodedata
>>> import datetime
>>> the_date_unicode = u'2014-08-11T11:02:58'
>>> the_date_str = unicodedata.normalize( 'NFKD', the_date_unicode ).encode( 'ascii', 'ignore' )   # unicode -> plain byte string
>>> the_date_time = datetime.datetime.strptime( the_date_str, '%Y-%m-%dT%H:%M:%S' )
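- For ASCII-only timestamps like this one, strptime usually accepts the unicode string directly (and always does on Python 3), so the normalization step can be skipped:
>>> datetime.datetime.strptime( u'2014-08-11T11:02:58', '%Y-%m-%dT%H:%M:%S' )
datetime.datetime(2014, 8, 11, 11, 2, 58)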
- Utilize 'scan and scroll' to process a huge number of documents. Each scroll request pulls one batch (size=1000, applied per shard when search_type='scan') and the scroll context is kept alive for two minutes ('2m') between requests.
>>> scanResp = conn.search( index='tweets', doc_type='tweet-type',
                            body=the_query, search_type='scan', scroll='2m', size=1000 )
>>> scrollId = scanResp['_scroll_id']   # the initial 'scan' response carries no hits, only a scroll id
>>> doc_num = 0
>>> response = conn.scroll( scroll_id=scrollId, scroll='2m' )
>>> while len( response['hits']['hits'] ) > 0:
...     for item in response['hits']['hits']:
...         # process the document (item) as you wish
...         doc_num += 1
...     scrollId = response['_scroll_id']   # each response returns the scroll id for the next request
...     response = conn.scroll( scroll_id=scrollId, scroll='2m' )
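- When the loop finishes (or you stop early), it is polite to release the scroll context rather than wait for it to expire; a small sketch using the client's clear_scroll call:
>>> conn.clear_scroll( scroll_id=scrollId )
>>> print( 'processed %d documents' % doc_num )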
- MySQL: create two related InnoDB tables (customers and accounts, linked by a foreign key) in the test database and insert one sample row into each.
USE test;
DROP TABLE IF EXISTS accounts, customers;
CREATE TABLE customers(
customer_id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(20) NOT NULL,
address VARCHAR(20) NOT NULL,
city VARCHAR(20) NOT NULL,
state VARCHAR(20) NOT NULL,
PRIMARY KEY( customer_id )
) ENGINE=INNODB;
CREATE TABLE accounts(
account_id INT NOT NULL AUTO_INCREMENT,
customer_id INT NOT NULL,
account_type ENUM( 'savings', 'credit' ) NOT NULL,
balance FLOAT( 9 ) NOT NULL,
PRIMARY KEY ( account_id ),
FOREIGN KEY ( customer_id ) REFERENCES customers( customer_id )
) ENGINE=INNODB;
INSERT INTO customers( customer_id, name, address, city, state ) VALUES ( 1, 'Hendra', 'Carolina Mc', 'Amsterdam', '1098XK' );
INSERT INTO accounts( account_id, customer_id, account_type, balance ) VALUES ( 1, 1, 'savings', 10.5 );
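- A small sketch of reading the joined data back from Python, assuming the mysql-connector-python package is installed and that user 'root' with an empty password can reach the test database (adjust the credentials to your setup):
>>> import mysql.connector
>>> cnx = mysql.connector.connect( host='127.0.0.1', user='root', password='', database='test' )
>>> cur = cnx.cursor()
>>> cur.execute( 'SELECT c.name, a.account_type, a.balance FROM customers c JOIN accounts a ON a.customer_id = c.customer_id' )
>>> cur.fetchall()   # expected: [(u'Hendra', u'savings', 10.5)]
>>> cur.close()
>>> cnx.close()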
- All the 4000 words essential for an educated vocabulary (anki)
- Introduction (42 MB) (rar)
- Bayesian Network Fundamentals (114 MB) (rar)
Each dataset has its own query list and relevance judgments (rar)
- adi dataset (language: English, less than 1 MB)
- eng dataset (language: English, about 8.5 MB)
- ina dataset (language: Indonesian, about 1 MB)
- med dataset (language: English, about 1 MB)
- npl dataset (language: English, about 3 MB)