news aggregator
News Aggregator
A news aggregator is a system (a software application, webpage, or service) that collects syndicated content using RSS and other XML feeds from weblogs and mainstream media sites. Aggregators reduce the time and effort needed to regularly check websites of interest for updates, creating a unique information space or "personal newspaper." An aggregator can subscribe to a feed, check for new content at user-determined intervals, and retrieve the content. The content is sometimes described as being "pulled" to the subscriber, as opposed to "pushed" with email or other channels. Unlike recipients of some "pushed" information, the aggregator user can easily unsubscribe from a feed.
Software which allows syndicated news content (such as RSS feeds) to be brought together and displayed.
Introduction to Python
For Engr 101 5-Week News Aggregator Module, Fall 2010
Instructor: Tao Wang
What is a computer?
Computer Organization
Software / Programs
Computer programs instruct the CPU which operations it should perform/execute
Programmers write the code for the tasks a computer program performs
But computers only understand binary (1’s and 0’s)
Programmers need a language to write computer code
Types of Programming Languages
Low-Level Languages
Machine Languages
◦ CPU instructions are binary strings of 1’s and 0’s [10010000]
◦ Each kind of CPU has a different instruction set
◦ Programs are not portable across CPU architectures
◦ Difficult for programmers to read/write
Assembly Languages
◦ Use English-like abbreviations to represent CPU instructions
◦ A bit easier to understand [MOV AL, 42h]
◦ Converted to machine language using an assembler
◦ Still not portable
Types of Programming Languages
High-Level Languages
C/C++, Java/C#, Python, Ruby, many more...
These languages abstract hardware implementation details
They provide programmers a logical computer model
◦ Allows the programmer to focus on solving problems instead of low-level hardware details
Use English-like keywords and statements to write code
Use a compiler or interpreter that translates code to machine language
◦ Makes code portable across different CPUs and platforms
◦ Programmer does not have to learn each CPU’s instructions
Compiled vs. Interpreted
Both compilers and interpreters translate source code to machine language
Compiled
◦ Must compile the program for each target CPU architecture prior to execution
Interpreted
◦ Code can be run directly on any platform that has an available interpreter
About Python
Designed by Guido van Rossum in the late 1980’s
Interpreted programming language
Imperative, Dynamic, and Object-Oriented
Python Programs
◦ are a sequence of instructions written in the Python language
◦ interpreter executes the instructions sequentially, in order
◦ programs can take inputs and send data to outputs
◦ programs can process and manipulate data
◦ programs may read and write data to RAM, Hard Drive, ...
First Program
◦ print "Welcome to learning Python."
PYTHON: LET'S BEGIN
The Python Shell (2.6)
You type commands
It does stuff
It converts Python to machine instructions and runs them right now
Python as Calculator
>> 2 + 2 # add
4
>> 2 * 3 # multiply
6
>> 3**2 # powers
9
>> 12 % 11 # modulo (remainder)
1
Division
Integer division
◦ 1/4 # integer division in Python 2, no decimals
◦ 0
Float division
◦ 1.0/4.0 # float number division
◦ 0.25
LITERALS, VARIABLES, DATA TYPES, STATEMENTS AND EXPRESSIONS
Literals, Data Types
Numbers
Integers are whole numbers: ..., -2, -1, 0, 1, 2, ... (32 bits)
Floats contain decimals: 4.5, 67.444443335, 7E2...
Booleans: True, False
Long ints are integers that exceed 32-bit capacity
Complex numbers: 4.5, 1j * 1j = -1 + 0j
Strings
Strings are used to represent words, text, and characters
examples (can use single, double or triple quotes):
◦ "I am learning python."
◦ 'hello.'
Variables
Literals are data values that our program uses
Data is stored in memory, and we need a way to access it
We use variables to name our data in memory
We can retrieve data by calling the name that refers to it
◦ student_name = "Ben"
◦ print student_name
= is the assignment operator; it assigns a data value to a variable
Variables Syntax Rules
Variable names must begin with a letter (uppercase or lowercase) or underscore (_)
◦ * good programming convention: variables should not start with uppercase letters, which are commonly used for something else
The remainder of the name can contain letters, underscores (_), and numbers
Names may not contain spaces or special characters
Names are case sensitive and must not be a reserved Python keyword
◦ myVariable and myvariable refer to different data
Statements and Expressions
Statements perform a task; they do not return a value
◦ x = 2
◦ y = 3
◦ print y
Expressions return a value
◦ >> x + y
◦ 5
Expressions (evaluate to values)
Math expressions
◦ >> 10 * 2 + 3
◦ >> 10 * (2.0 + 3)
Boolean expressions
◦ >> 10 < 2 # False
◦ >> 10 >= 10 # True
Combined with logic operators
◦ >> (10<2) or (10==10)
Can combine
◦ >> (a*c+d) > (d*a-c)
Expressions (evaluate to values)
String expressions
◦ >> "hel" + "lo" # 'hello'
◦ >> "Hi" * 3 # 'HiHiHi'
Operator Precedence
◦ Parentheses
◦ Exponentiation
◦ Multiplication and Division
◦ Addition and Subtraction
Operator Precedence (top-to-bottom)
Data Types
Finding out a data type
Data Types
What if data types don’t match?
STRONG TYPES: no automatic conversion (for non-number types)
Data Types
Explicit conversion
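The three Data Types slides name their topics without the code that went with them. A minimal sketch of what they describe, written in Python 3 syntax (the slides use the Python 2.6 shell, where print is a statement); the sample values are assumptions chosen for illustration:

```python
# type() reports the data type of a value.
kinds = [type(42).__name__, type(4.5).__name__,
         type("hello").__name__, type(True).__name__]
print(kinds)

# A string and a number are not converted automatically.
try:
    result = "Total: " + 3
except TypeError:
    result = "TypeError"
print(result)

# Explicit conversion with str(), int(), and float().
message = "Total: " + str(3)
count = int("12") + 1
ratio = float("0.25") * 2

# Number types do mix: int + float gives a float.
mixed = 1 + 2.5
print(message, count, ratio, mixed)
```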
PYTHON KEYWORDS, USER INPUT
Python Keywords
RESERVED: do not use as variable names
User Input
Create interactive programs by requesting user input
CONTROL STRUCTURES
Branching / Conditional Statements
Decision making
if - statement
if a < 0:
    print "a is negative"
if - else
if - elif - else
If one test fails, perform next test
Nested if statements
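The if, if-else, if-elif-else, and nested-if slides can be combined in one small sketch. It is written in Python 3 syntax (the slides use Python 2.6), and the function name describe_number is a hypothetical one chosen for illustration:

```python
def describe_number(a):
    """Return a short description of the number a."""
    if a < 0:
        return "a is negative"
    elif a == 0:
        # Reached only when the first test fails.
        return "a is zero"
    else:
        # Nested if: only reached when a is positive.
        if a % 2 == 0:
            return "a is positive and even"
        else:
            return "a is positive and odd"


for value in [-5, 0, 4, 7]:
    print(value, describe_number(value))
```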
MODULES
Modules
Python Strength: large collection of open source modules
Modules are collections (packages) of useful (tested) code you can reuse
Common modules: random, math
The modules we use for the project:
◦ urllib, xml
Modules
Python Standard Library (packages included with most python distributions)
◦ http://docs.python.org/library/index.html
PyPI (Python Package Index)
◦ http://pypi.python.org/pypi
◦ repository of optional modules available (11,000+)
Using Modules
Math module contains many useful functions and values: math.sin, math.cos, math.pow, math.sqrt, ...
Using modules:
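The code for this slide did not survive in the transcript. A minimal sketch of importing and calling the math module, in Python 3 syntax; the variable names are assumptions:

```python
import math                # import the whole module
from math import sqrt      # or import a single name directly

hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)   # square root of 25
power = math.pow(2, 10)                   # math.pow always returns a float
cosine = math.cos(0)
same_root = sqrt(16)                      # no "math." prefix needed here

print(hypotenuse, power, cosine, same_root)
```

In the interpreter, help(math) or help(math.sqrt) shows the documentation for the module or for one function.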
Getting help
In the Python interpreter you can get documentation
CONTROL STRUCTURES: REPETITION, ITERATION
Repetition
Selection statements (if-elif-else) let us make simple decisions
Repeating statements and decisions lets us build more complex programs
while
Testing Primeness
break statement
break immediately ends loops
DO NOT overuse; it can make the logic difficult to read/understand
Testing Primeness
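The primeness code itself is missing from the transcript. One possible version using a while loop and break is sketched below in Python 3 syntax; the function name is_prime and the trial-division approach are assumptions, not necessarily what the slide showed:

```python
def is_prime(n):
    """Return True if n is prime, using trial division in a while loop."""
    if n < 2:
        return False
    prime = True
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            prime = False
            break          # a divisor was found, so stop looping
        divisor += 1
    return prime


print(is_prime(7), is_prime(9), is_prime(97))
```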
range(...)
Built-in function; produces a sequence (list)
range(0, 3) --> [0, 1, 2]
range(3) --> [0, 1, 2]
range(1, 3) --> [1, 2]
range(1,7,2) --> [1, 3, 5]
for
The for loop is used for iteration
continue statement
break and continue work in both while and for loops
Find all primes
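A sketch of the "find all primes" idea using for, range, continue, and break, in Python 3 syntax (where range produces a lazy sequence rather than a list). The function name primes_below is hypothetical, and the else clause on the inner loop behaves like the while-else form: it runs only when the loop ends without break.

```python
def primes_below(limit):
    """Return a list of all primes smaller than limit."""
    primes = []
    for candidate in range(2, limit):
        if candidate > 2 and candidate % 2 == 0:
            continue       # skip even numbers other than 2
        for divisor in range(3, candidate, 2):
            if candidate % divisor == 0:
                break      # not prime, stop testing this candidate
        else:
            # The inner loop finished without break: candidate is prime.
            primes.append(candidate)
    return primes


print(primes_below(30))
```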
while - else
Nesting Control Structures
Counter-Controlled Loops
Sentinel-Controlled Loops
Accumulating
Swapping
x = 2
y = 3
Swap (WRONG)
◦ x = y
◦ y = x
Result: x = 3, y = 3
Swap (CORRECT)
◦ z = x
◦ x = y
◦ y = z
Result: x = 3, y = 2
Multiple Assignments
aInt, bInt, cInt = 1, 2, 3
Swapping with multiple assignment
◦ aInt, bInt = bInt, aInt
◦ Why does this work? (the entire right side is evaluated before the assignments)
Everything is an object
DEBUGGING
Debugging
Syntax Errors: Python gives us an alert; code crashes
Runtime Errors: How do we fix incorrect results (logic) in our programs?
◦ We need to trace the code’s execution flow.
◦ Tracing: Keep track of variable values as each line is executed
◦ Print Statements: strategically add print to view results at each step; don't overdo it or it will be difficult to keep track
◦ Can help us detect Infinite Loops
MORE DATA TYPES... (LISTS)
Collection Types
Lists:
◦ Sequential and mutable
>> k = [1, 3, 5]
>> m = ["hel", 3]
Tuples:
◦ Sequential and immutable
>> (1,2,3)
Dictionaries:
◦ map collection
>> d = {'name': 'Alice', 'grade': '100'}
>> print d['name']
Alice
Sets:
◦ Have unique elements
>> aSet = set(['a', 'b'])
Lists (also called arrays)
Lists are sequences of objects
Mutable (unlike strings and tuples)
Lists are defined with square brackets [ ]; elements are comma (,) separated
List elements can have different types
List indexing starts at 0
If the index is negative, count from the end of the list
If the index is past the last element: ERROR
List access
Indexing and Slicing just like strings (same rules apply)
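The indexing rules above can be tried out with a short sketch in Python 3 syntax; the list contents are made up for illustration:

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]          # indexing starts at 0
last = letters[-1]          # a negative index counts from the end
middle = letters[1:4]       # slice: start included, stop excluded
every_other = letters[::2]  # slice with a step of 2

try:
    letters[10]             # index past the last element
    out_of_range = "no error"
except IndexError:
    out_of_range = "IndexError"

print(first, last, middle, every_other, out_of_range)
```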
Working with lists
Can convert other collections to lists
Lists can contain mixed types, including other collections
Indexing lists of lists
Lists can be nested
List operators
+ concatenates two lists (list1 + list2)
* repeats the list a number of times (list1 * Integer)
in tests membership
List comparisons
>, <, ==, <=, >=, !=
Similar rules to strings, compares elements in order
Ordered elements being compared should have the same type
Collection functions
len(C) returns the length of the collection C
min(C) returns the minimum element in C; for a list of lists, sublists are compared starting with their first elements
max(C) returns the maximum element in C; for a list of lists, sublists are compared starting with their first elements
sum(L) returns the sum of elements in list L; elements must all be numbers
Lists can change
Lists can be modified, MUTABLE
Strings cannot be changed, IMMUTABLE
List methods
There are non-modifying methods (don't change the list)
◦ index(x)
◦ count(x)
and modifying methods (will change the list without need for assignment)
◦ append(x)
◦ extend(C)
◦ insert(i, x)
◦ remove(x)
◦ pop()
◦ sort()
◦ reverse()
Appending and Extending
append(...) adds a single element to a list
extend(...) joins two lists
◦ can also use '+'
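The difference between append and extend is easiest to see side by side. A small sketch in Python 3 syntax, with made-up list contents:

```python
numbers = [1, 2, 3]
numbers.append(4)            # adds one element
numbers.extend([5, 6])       # adds each element of another collection

nested = [1, 2, 3]
nested.append([4, 5])        # the whole list becomes a single element

joined = [1, 2, 3] + [4, 5]  # '+' builds a new list, originals unchanged

print(numbers, nested, joined)
```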
List methods
sort()
count(x)
pop()
The del keyword also removes an element
split() and join()
Going from strings to lists and back again
These are string methods
join(C) takes as an argument any iterable collection type (such as lists, strings, tuples)
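A short sketch of the round trip from string to list and back, in Python 3 syntax; the sample strings are assumptions:

```python
sentence = "news aggregators collect syndicated content"

words = sentence.split()                  # split on whitespace into a list
rejoined = " ".join(words)                # join the list back into one string
dashed = "-".join(("a", "b", "c"))        # join accepts any iterable of strings
fields = "title,pubDate,guid".split(",")  # split on a chosen separator

print(words, rejoined, dashed, fields)
```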
List Comprehension
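Only the title of this slide survives. As a hedged illustration in Python 3 syntax, a list comprehension builds a list from a loop and an optional condition in one expression; the names below are made up:

```python
numbers = range(1, 11)

squares = [n * n for n in numbers]
even_squares = [n * n for n in numbers if n % 2 == 0]

# The equivalent for loop, for comparison.
loop_squares = []
for n in numbers:
    loop_squares.append(n * n)

print(squares, even_squares)
```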
STRINGS
Quote Use
Single Quotes
◦ ' These strings must fit on a single line of source '
Double Quotes
◦ " Also has to fit on a single line of source "
Triple (single or double) Quotes
◦ """ These quotes are very useful when you need to span multiple lines. They are also often used for long code comments """
Quotes inside strings
To use apostrophes
◦ " Let's use double quotes "
To use double quotes in our strings
◦ ' They say, "use single quotes" '
Triple Quotes can take care of both cases
◦ """ With 3 quotes it's "easy" to use apostrophes & quotes. """
◦ ''' With 3 quotes it's "easy" to use apostrophes & quotes. '''
Backslash \
We can use the \ to span multiple lines
Works with strings or expressions
No character can follow the \
Character escaping
Since some characters have special meanings, we have to escape them to use them in our strings
◦ "We can \"escape\" characters like this"
◦ 'Or let\'s escape them like this'
◦ 'and this \\ is how we get a backslash in our string'
Whitespace
'' This is an empty string, not a character
' ' This is a space
'\t' This is a tab (a single character)
'\n' This is a new line (in Unix/Mac OS X)
'\r\n' This is a new line (in Windows)
'\r' This is a new line (in old Mac <= 9)
Strings are sequences
Simple string usage
Can access with indexing like lists
Strings do not have append(...) and extend(...) functions
Adding (+) and Repeating (*)
We can add (concatenate) strings with +
We can also repeat them with *
Compare strings
Test equality using ==
What about <, >, <=, >=
Strings
Strings are sequences like lists
Each element in a string is a character
Characters we can print: letters ('a', 'b', 'c', ...), numbers ('1', '3', '4', ...), and symbols ('@', '$', '&', ...)
Non-printing characters
◦ Whitespace: '\t', '\n', '\r\n'
◦ try printing this: '\a'
Characters are really numbers
ASCII table
Character numerical values
Print the ABCs
Using numbers...
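The code for "Print the ABCs" is not in the transcript. One way to do it with the built-in ord() and chr() functions is sketched below in Python 3 syntax; the variable names are assumptions:

```python
# ord() gives the numerical value of a character, chr() goes the other way.
code_of_upper_a = ord("A")
code_of_lower_a = ord("a")
letter = chr(66)

# Build the alphabet from numbers, from the code of 'A' to the code of 'Z'.
alphabet = ""
for code in range(ord("A"), ord("Z") + 1):
    alphabet += chr(code)

print(code_of_upper_a, code_of_lower_a, letter)
print(alphabet)
```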
String Comparisons
Characters are compared by their numerical value
If first characters are equal, compare the next one
If all compared characters are equal, the shorter string is smaller
String Comparisons
These are characters, not numbers
Capital letters are smaller (refer to the ASCII table)
Testing membership
import string
String is an Object
Objects contain
Data
◦ x = 'Hello' # data is a sequence of characters
Actions (methods)
◦ things the object can do (often on self)
Upper/Lower Case methods
These methods are available to all string objects
Strings are IMMUTABLE
◦ this means that characters in the sequence cannot change
◦ methods return a new string
◦ original data is unchanged
What kind of character
These methods are available to all string objects
Tests that return boolean types:
◦ isalnum() - does the string contain only letters and digits
◦ isalpha() - does the string contain only letters
◦ isdigit() - does the string contain only digits
Formatting strings with strip(...)
Formatting strings
String Formatting
Formatting Floats
Creating Forms/Templates
Using replace
Output
find(...); rfind(...)
Go through all matches and capitalize
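The code behind "Go through all matches and capitalize" is missing. A possible sketch using find(...) in a loop is shown below in Python 3 syntax; the function name capitalize_matches and the sample headline are assumptions. Because strings are immutable, each step builds a new string.

```python
def capitalize_matches(text, word):
    """Return text with every occurrence of word replaced by its upper-case form."""
    if not word:
        return text                        # nothing to search for
    result = text
    position = result.find(word)           # find returns -1 when nothing matches
    while position != -1:
        end = position + len(word)
        # Build a new string around the match.
        result = result[:position] + word.upper() + result[end:]
        position = result.find(word, end)  # keep searching after this match
    return result


headline = "the game ended; the next game starts at noon"
print(capitalize_matches(headline, "game"))
print(headline.find("game"), headline.rfind("game"))
```

find(...) reports the first match and rfind(...) the last one, which is why the two printed positions differ.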
DICTIONARIES
Dictionaries
Another collection type, but NOT a sequence (order doesn't matter)
Also referred to as an associative array, or map
Dictionaries are a collection of keys that point to values
Key --> Value
About Dictionaries
Define dictionaries using curly braces {}
Each key is separated from its value by a colon :, and pairs are separated by commas
Dictionaries are MUTABLE (can add or remove entries)
Keys:
◦ Can only be IMMUTABLE objects (int, float, string, tuples)
Values:
◦ Can be anything
Idea: easier to find data based on a simple key, like the English Language Webster Dictionary
Indexing and Assignment
Index using square brackets and keys; returns the associated value
◦ Numbered indices are not defined
Can modify the dictionary by assigning new key-value pairs, or by changing the value a key points to
Dictionaries with Different Key Types
Cannot index or search based on values, only through keys
Numbers, Strings, Tuples can be keys (anything IMMUTABLE)
Operators
[ ]: for indexing using key inside square brackets
len(): "length" is the number of key-value pairs in the dictionary
in: boolean test of membership (is key in dictionary?)
for: iterates through keys in the dictionary
Operators in use
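The example for this slide is missing from the transcript. A sketch of the four operators in use, in Python 3 syntax; the dictionary reuses the name/grade example from the Collection Types slide, and the "major" entry is made up:

```python
student = {"name": "Alice", "grade": "100"}

name = student["name"]            # [ ] indexes by key
student["major"] = "Engineering"  # assigning a new key adds an entry
student["grade"] = "95"           # assigning an existing key changes its value
size = len(student)               # number of key-value pairs
has_name = "name" in student      # membership tests look at keys...
has_alice = "Alice" in student    # ...not at values

keys_seen = []
for key in student:               # a for loop iterates through the keys
    keys_seen.append(key)

print(name, size, has_name, has_alice, sorted(keys_seen))
```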
Dictionary Methods
items(): returns all the key-value pairs as a list of tuples
keys(): returns a list of all the keys
values(): returns a list of all the values
copy(): returns a shallow copy of the dictionary
Methods
zip( ) - dict( )
zip(): creates a list of tuples from two lists
dict(): creates a dictionary from a mapping object or a sequence of key-value pairs, or an empty dictionary
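zip() and dict() combine naturally to build a dictionary from a list of keys and a list of values. A sketch in Python 3 syntax, where zip() and the dictionary methods return lazy views, so list() is used to get plain lists as on the slides; the sample values are made up:

```python
keys = ["title", "pubDate", "guid"]
values = ["First headline", "2010-10-01", "item-1"]

pairs = list(zip(keys, values))   # list of (key, value) tuples
item = dict(pairs)                # dictionary built from the pairs
empty = dict()                    # no arguments gives an empty dictionary
backup = item.copy()              # shallow copy, independent of item
backup["title"] = "Changed"

print(pairs)
print(sorted(item.keys()), len(list(item.items())), empty)
```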
FUNCTIONS
Why Use Functions?
Functions provide encapsulation, making code better and more readable
Divide and Conquer Problem Solving
◦ Break large complicated problems into smaller sub-problems
◦ Solutions to sub-problems are written in functions
◦ Sub-problem solutions can be reused, shared
Simplification/Readability
◦ Replaces duplicate code with function calls
Why Use Functions?
Abstraction
◦ Provides a high-level interface to the program
◦ You know WHAT it does, not HOW it does it
Security
◦ A small well-defined piece of code is easier to prove correct
◦ Code can be fixed in one place, and will be corrected everywhere the function is called
Function Definition
Function Calls
Functions that do things...
no parameters
no return statement
affect the environment
Functions that have parameters...
definition has parameters
call takes arguments
no return statement
affect the environment
Functions that return results...
return keyword
Function performs processing
Returns value/object
Functions with default parameters...
Parameters can have default values
We can call this function with:
◦ print_message("Hello class.")
◦ print_message("Hello class.", 3)
Can explicitly name arguments:
◦ print_message(times=3, msg="Hello class.")
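The definition of print_message is not in the transcript; only the calls are. A definition consistent with those calls is sketched below in Python 3 syntax. The parameter names msg and times come from the slide, while the default value of 1 and the helper build_message are assumptions:

```python
def build_message(msg, times=1):
    """Return msg repeated on `times` separate lines (a function that returns a result)."""
    return "\n".join([msg] * times)


def print_message(msg, times=1):
    """Print msg `times` times; the default of 1 is an assumed value."""
    print(build_message(msg, times))


print_message("Hello class.")               # uses the default times=1
print_message("Hello class.", 3)            # positional arguments
print_message(times=3, msg="Hello class.")  # named arguments, any order
```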
Variable Scope (local variables)
Variable Scope (global variables)
INTRODUCTION TO NEWS AGGREGATOR
First Module - urllib
Usage: import urllib
This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames. Some restrictions apply: it can only open URLs for reading, and no seek operations are available.
urlopen()
urllib.urlopen(url)
Example:
◦ import urllib
◦ f = urllib.urlopen("http://www.python.org")
◦ text = f.read()
Second Module - xml.dom.minidom
Usage
◦ from xml.dom.minidom import parse, parseString
xml.dom.minidom is a light-weight implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also significantly smaller
DOM applications typically start by parsing some XML into a DOM
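A minimal sketch of parsing XML into a DOM with parseString and reading element text, in Python 3 syntax. SAMPLE_RSS is a tiny hand-written document used only for illustration; it is not a real feed:

```python
from xml.dom.minidom import parseString

# A made-up RSS document, small enough to read at a glance.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First story</title><guid>1</guid></item>
    <item><title>Second story</title><guid>2</guid></item>
  </channel>
</rss>"""

dom = parseString(SAMPLE_RSS)                  # parse XML text into a DOM
item_nodes = dom.getElementsByTagName("item")  # all <item> elements

titles = []
for node in item_nodes:
    title_node = node.getElementsByTagName("title")[0]
    titles.append(title_node.firstChild.data)  # the text inside <title>

print(titles)
```

parse() works the same way but takes a filename or an open file object instead of a string.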
RSS feed program
def get_latest_feed_items():
◦ return item_list
def search_latest_feed_items(item_list, searchterm):
◦ return filtered_item_list
◦ Example: search item description
Function usage
◦ latests = get_latest_feed_items()
◦ search = search_latest_feed_items(latests, "game")
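The slides give only the outlines of these two functions. The sketch below, in Python 3 syntax, shows one way they could work; it is not the course's actual code. parse_feed_items is a hypothetical stand-in for get_latest_feed_items that takes the feed text as an argument instead of downloading it, each item is assumed to be a dictionary keyed by XML tag name, and the case-insensitive match on the description is a design choice made here:

```python
from xml.dom.minidom import parseString


def parse_feed_items(xml_text):
    """Hypothetical helper: turn RSS text into a list of {tagname: text} dicts."""
    item_list = []
    for node in parseString(xml_text).getElementsByTagName("item"):
        item = {}
        for child in node.childNodes:
            # Keep element children whose first child is text; skip whitespace nodes.
            if child.nodeType != child.ELEMENT_NODE or child.firstChild is None:
                continue
            if child.firstChild.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                item[child.tagName] = child.firstChild.data
        item_list.append(item)
    return item_list


def search_latest_feed_items(item_list, searchterm):
    """Return the items whose description contains searchterm ("" matches all)."""
    filtered_item_list = []
    for item in item_list:
        if searchterm.lower() in item.get("description", "").lower():
            filtered_item_list.append(item)
    return filtered_item_list


# Made-up feed text for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Big game tonight</title>
    <description>The championship game starts at 8.</description>
  </item>
  <item>
    <title>Weather update</title>
    <description>Rain expected tomorrow.</description>
  </item>
</channel></rss>"""

latests = parse_feed_items(SAMPLE_FEED)
search = search_latest_feed_items(latests, "game")
print(len(latests), [item["title"] for item in search])
```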
Example of Modification
Retrieve latest feed item list
◦ item_list = get_latest_feed_items()
Define a search term, "" means all
◦ searchterm = ""
Obtain the filtered item list
◦ filtered_item_list = search_latest_feed_items(item_list, searchterm)
Example of Modification
Remember, keys = tagnames in the XML!
If you want to modify useful_keys, make sure you attach the "u".
For example, if you want to add author, add u'author' to the list
Define your useful keys
◦ useful_keys = [u'title', u'pubDate', u'description', u'guid']
Example of Modification
Display all items and keys
◦ for item in filtered_item_list:
◦     for key in useful_keys:
◦         # print "%s: %s" % (key,item[key])
◦         print key + " " + item[key]
◦     print " "
Some Modification Ideas (I)
Read in an RSS feed and find MULTIPLE keywords (as many as the user wants),
Return the corresponding articles.
You may want to think about the readability of the results.
Note that articles MAY be repeated if different keywords occur in their titles and/or description (hint: Useful keys).
Some Modification Ideas (II)
Filter articles from an RSS feed based on multiple keywords.
(hint: Nested loops, filtering by one keyword in each loop).
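The nested-loop hint can be sketched as follows, in Python 3 syntax. This is one possible reading of the idea, not a reference solution: the function name filter_by_keywords is hypothetical, items are assumed to be dictionaries keyed by tag name, and matching is case-insensitive by choice:

```python
def filter_by_keywords(item_list, keywords, useful_keys):
    """Keep only the items whose useful fields mention every keyword."""
    filtered_item_list = item_list
    for keyword in keywords:                 # outer loop: one keyword per pass
        still_matching = []
        for item in filtered_item_list:      # inner loop: test each remaining item
            text = " ".join(item.get(key, "") for key in useful_keys)
            if keyword.lower() in text.lower():
                still_matching.append(item)
        filtered_item_list = still_matching  # the next keyword filters this result
    return filtered_item_list


# Made-up items for illustration only.
items = [
    {"title": "Big game tonight", "description": "Championship game at 8"},
    {"title": "Game delayed", "description": "Rain expected"},
    {"title": "Weather", "description": "Rain tomorrow"},
]
matches = filter_by_keywords(items, ["game", "rain"], ["title", "description"])
print([item["title"] for item in matches])
```

Each pass narrows the list further, so an article survives only if it matches all keywords; collecting matches per keyword instead would give the "articles MAY be repeated" behaviour described in idea (I).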
Some Modification Ideas (III)
Count how many times certain interesting words appear in an RSS feed
Plot Excel charts (bar, pie, or line graphs).
Some Modification Ideas (IV)
Read an RSS feed and allow the user to specify how many news items he/she wants to see at one time.
You may want to display the total number of news items first, THEN ask the user how many news items they want to see.
Some Modification Ideas (V)
The ability to take MULTIPLE RSS feeds, then go through them ALL and look for articles with a certain keyword.
You can either give the user a limit on the maximum number of feeds, or allow as many feeds as the user wants.
Note: Probably the hardest. This one simulates a mini search engine / web crawler.
Your Works
Specify roles
Come up with some ideas, or use those ideas but explain them in your own words
How much progress you can make
Team work, coordinate with each other (Project manager)
Try to answer all listed questions
Prepare your presentation and all other works
Grade is based on creativity and complexity as well as the role you performed
Discussion