
Looking for the book? They have it at the Engineer's Bookstore at 748 Marietta St NW. Here is their website: http://www.engrbookstore.com/ - Monica


Fall 2005 Homework 5 Questions Page


does the homework need to have any inputs?

Yes. There has been an adjustment to the assignment so that input is required. Sorry for the confusion. Kelly Lyons

when is it due? next friday, right?
you got it. Amanda Bennett

the last test avg was about a 70 something, will there be a curve or is there some way to earn points back on the test?
There will be no curve on the test or on your final grade. If you have questions about your grade on specific questions (once you get the exam back), go see your TA. Amanda Bennett

I am slightly confused with this assignment. Can it be links to any articles on the news sites or specifically the first 5? If I submitted the assignment early the news articles obviously wouldn't be the same as they would be on Friday Nov. 11th.

Nevermind I just read the last line about the headlines being different. Sorry

Just for clarification: The assignment should grab the top 5 headlines from the site at the time the function is run in JES. Therefore when you run it before you turn it in, you'll get one set of headlines. When your TA grades it, he/she'll get a different set of headlines. The headlines must be the most recent top 5 from the site at the time it is run. DO NOT hard code headlines into your assignment. This is not what we are looking for and will result in a very poor grade. Kelly Lyons

I just finished my code. Just so you know, on the CNN website, there's some funky html with a little video link that says "Watch" after the 4th headline. At least right now there is. My code won't look for headlines beyond this little "watch" button for some reason. The rest of the websites work fine.
Yes, there is the "Watch" button, but I think you should skip over it and go to the next text headline. -Albert d'Heurle

  economist = getEconomistNews()
  cnn = getCNNNews()
  wwn = getWeeklyWorldNews()
  for num in range(5):
    code = code + '<tr><td><b>' + economist[num] + '</b></td><td><i>' + cnn[num] + '</i></td><td><u>' + wwn[num] + '</u></td></tr>'

There's part of my code (I hope it's not too much). It keeps giving me this error for the "code = ..." line:
An attempt was made to call a function with a parameter of an invalid type. This means that you did something such as trying to pass a string to a method that is expecting an integer.

Can anyone tell me what I'm doing wrong? Thanks!
Parameter of an invalid type means that something in that line is not a string. Go back and check what the values in your 3 lists are. Most likely one of them is not a string. Also make sure when you defined code, you defined it to be a string. Kelly Lyons
You also need to make sure that code is declared (code = ?????) before the for loop; otherwise you run into scope issues. Blake Israel
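
A minimal sketch of the two suggestions above, using made-up lists in place of the real results of getEconomistNews(), getCNNNews(), and getWeeklyWorldNews(). The sample values and the name build_rows are assumptions for illustration only. It starts code as an empty string before the loop and wraps every list item in str(), so a stray non-string value (here a -1, the kind of thing a failed find would give back) cannot break the concatenation:

```python
def build_rows(economist, cnn, wwn):
    # Start with a string so that "code + ..." is string concatenation.
    code = ""
    for num in range(5):
        # str() leaves strings unchanged and converts anything else
        # (for example an integer returned by find) to text.
        code = code + "<tr><td><b>" + str(economist[num]) + "</b></td>"
        code = code + "<td><i>" + str(cnn[num]) + "</i></td>"
        code = code + "<td><u>" + str(wwn[num]) + "</u></td></tr>"
    return code

# Hypothetical sample data; one list holds an integer by mistake.
economist = ["E1", "E2", "E3", "E4", "E5"]
cnn = ["C1", "C2", -1, "C4", "C5"]
wwn = ["W1", "W2", "W3", "W4", "W5"]
rows = build_rows(economist, cnn, wwn)
```

If a -1 shows up in the finished page, the search inside that list's function probably failed; printing each list before the loop shows which one.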

The values for economist, cnn, and wwn are supposed to go to another function. I'm not really sure if they're a string or not...Do you think my functions are defined wrong and that's why it isn't reading them right? Also, I did define code earlier with html, etc. so that shouldn't be the problem. I really think it's something with the way I put in the other functions. Any advice or should I just email you my code? Thanks!

I have a problem with getting the href link off of the cnn website when there is extra stuff in front of the link. For instance, at 6:22 on Sunday, there is "SI.com:" and there is "Watch:" in front of 2 of the links. Are we to write our code to account for these specific anomalies or can we just assume the general format? –Thanks
There is an even more general format than what you are probably trying. Remember that every one of those links is its own tag. Try looking for what encloses the whole item in that list. Blake Israel

In order to find each successive headline can we look for the previous headline found as a starting point to look for the next one? Like can I do:
end=cnn.find("",cnnhead1)
where cnnhead1 is the first headline? I'm not sure if this will work right.
Nope, you can't use the headline itself but you may want to try using something around the headline that won't change to base the next search off of. David Baxter
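
One way to read that advice, sketched on a made-up page fragment rather than the real CNN source: search for the closing link tag, which stays the same from headline to headline, and carry the position just past it into the next search. The name next_link_text and the sample HTML are assumptions for illustration.

```python
# Made-up fragment standing in for a downloaded page.
page = ('MORE NEWS <li><a href="/a.html">First story</a></li>'
        '<li><a href="/b.html">Second story</a></li>')

def next_link_text(text, start):
    # Find the next closing link tag at or after start.
    end = text.find("</a>", start)
    if end == -1:
        return None, -1
    # The headline text begins after the last ">" before that closing tag.
    begin = text.rfind(">", 0, end) + 1
    # Hand back the text and the position to resume searching from.
    return text[begin:end], end + len("</a>")

pos = page.find("MORE NEWS")
first, pos = next_link_text(page, pos)
second, pos = next_link_text(page, pos)
```

Each call starts where the previous one stopped, so the search never depends on what the headline itself says.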


this is kinda in the beginning of my code:

def hw5(directory):
  file=open(directory,"wt")
  file.write(doctype())
  file.write('<head><title>"News Headlines"</titel></head>')

but jes keeps telling me that "hw5(directory)
I tried to read a file, and couldn't. Are you sure that file exists? If it does exist, did you specify the correct directory/folder?
Please check line 5 of C:\Documents and Settings\sunshine\My Documents\hw5"
and i did specify the correct folder. i have newfolder on my desktop called newsheadlines.html
and in the command area, i did directory=r'Cblah blah' then hw5(directory)
what did i do wrong?
You aren't giving it a file to open. It should be file=open("aFileName.extension","wt"). Look at your last homework to see exactly how to write it. poof #10

The economist website won't allow us to use the html for this assignment. It has some blocking stuff at the end. We discovered this in Blake Israel's and Jonny's recitation....so, what should we do?

Do we need to have a Meta tag or can we do with out it and get no points reduced?

You do not need Meta tags in your assignment. Kelly Lyons

are any other sites being blocked as well? My cnn headlines work and thats about it...If they are being blocked...is there an alternative?

You can still pull headlines from Weekly World News without problems... I don't see any problem pulling headlines from the Economist either; the headlines under 'Global Agenda' are just as easy to parse as those on CNN. If you are running into trouble, make sure you get the URL http://www.economist.com/index.html rather than the root directory. The latter page contains some site setup stuff and a JavaScript redirect to the former, which is the one you want. – Matt Britt

After running my program and viewing my page source, it shows that the headlines are in my table. However, when I view the page the only headlines that show up are the world weekly news headlines. Could this be due to the "blocking" that people are talking about?

Nevermind, I just fixed it by adding a
before each of the headlines. I don't know why that worked.

Can we just skip the first headline on the economist site since it has that image block?
No. You can do it very easily without having to worry about the image. poof #10

So can we use the Economist or not? Supposedly, it will not let us use it since we are not on a browser. That's what Blake and Jonny said.
Use the three websites that it says to use in the instructions. poof #10

ahh! last night i wrote code that worked fine, and now when i try to run it to get the headlines, i keep getting the message" A local name was used before it was created. You need to define the method or variable before you try to use it."- why is that????
We can't help you unless we see your code AND exactly what you typed in to get that error message. poof #10

for the economist, use the address http://www.economist.com/index.html and it will work just fine
Correct. I think that is posted as the website to use. poof #10


on weeklyworldnews.com after we search "breaking_news.gif"
do we start and end at a and because this is the only one that is not putting the headlines into the table whereas the others are working and have the same basic code

Look at the source code for Weekly World News and look for the headlines. Then look for something that you could use. Amanda Bennett

I'm having trouble figuring out how to make the links functional... i know that i need to add the first part...for example "http://cnn.com/" + "the rest"
can someone steer me in the right direction please

when i typed the code i get this respond from JES:
You are trying to access a part of the object that doesn't exist.
in file D:\CS1315\hw5\hw5.py, on line 3, in function hw5
in file C:\JES-SA1a\jython\Lib\urllib.py, on line 71, in function urlopen
in file C:\JES-SA1a\jython\Lib\urllib.py, on line 176, in function open
in file C:\JES-SA1a\jython\Lib\urllib.py, on line 283, in function open_http
in file C:\JES-SA1a\jython\Lib\httplib.py, on line 440, in function putrequest
AttributeError: __getitem__
Please check line 3 of D:\CS1315\hw5\hw5.py
i think that it can't recognize the url, what should i do?

I'm very confused about something. So my input for my function is a directory: hw5(directory)
so inside that directory, do i make another folder named newsheadlines.html? and once i do that do i, file=open(newsheadlines.html,"wt")?

What does "& n b s p ;" mean? (I had to put spaces between the characters otherwise it wouldn't appear)
I find it in headline tags on The Economist.

i have typed up my code but when i run it i get the same headline repeated 5 times. what do i need to change in order for there to be 5 different headlines this is what i have:

news.write(" CNN The Economist Weekly World News ")
for number in range(0,5):
news.write("" +findCnnNews()+ ""+econ+ ""+wwn+ "")
news.write('')
i also use the livetemp example to write the def for findCnnNews

For CNN.com, do we include the main headline as the first headline or do we just use the first five headlines under 'MORE NEWS'?

Some answers:
For links: yes, you need to add the right prefix to the html "a" tag to get to the page ... look at the HTML source to the sample output on the assignment page for an example!
if it's failing down inside "urlopen", you are probably not passing a string, or you are not passing a valid URL. Try printing out the url you are passing in and looking at it to make sure it is valid
you don't "make another folder" inside "directory", you simply create the file as you show (with the open) ... just make sure you pass the right string to open so that it opens the file in that directory
& n b s p ; means "non-breaking space" ... it allows you to put spaces between words without the browser breaking those words across a line
for the repeating headlines, where are you changing the values of "econ" and "wwn"? Also, findCnnNews() is probably finding the same first headline over and over, since it probably has no way of knowing which headline you want ... perhaps you should pass in "number" and have it find that headline
For CNN.com, I intended that you use the first five under "MORE NEWS" and not the main headline. Either is fine.
Blair MacIntyre
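
For the point above about creating the file inside directory, here is a small sketch. It assumes the output file is called newsheadlines.html, as in the question; the helper names headline_path and write_page are made up for illustration.

```python
import os

def headline_path(directory, filename="newsheadlines.html"):
    # os.path.join puts the right separator between folder and file name.
    return os.path.join(directory, filename)

def write_page(directory, html):
    # "wt" opens the file for writing text, creating it if needed.
    page = open(headline_path(directory), "wt")
    page.write(html)
    page.close()
```

The directory passed in has to be an existing folder; the file itself does not need to exist beforehand.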
how can we exactly distinguish between the headlines? like, when we use .find to find specific strings in the whole string after "MORE NEWS", what should we use to search for each headline?
what i mean is that for example: for the CNN website, the headline is "Blair suffers..." and before that is an opening <a> tag and it ends with </a>, but this is not true for every single case. Is there any general format we may use so the .find function can work more easily?
Each headline is a link, so you can search for the <a> and </a> tags. The text of the headline should be in between those two tags. Kelly Lyons
well, that's all well and good except when there is some other kind of link between those two tags...for example an image link...how do we make our functions only find the text and when it's an image, skip to the next iteration of the loop determining our headline search?
If there is an image tag in the link tag, then it isn't a headline and you don't want to get it. You can check to see if the image tag is there using string methods like find, startswith, etc. If it is, then skip to the next link by searching for the next link tag. If it isn't there, then it's a headline and you get it.
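
A rough sketch of that check, run on a made-up fragment where the middle link wraps an image. The sample HTML and the name text_headlines are assumptions, and it assumes the opening link tag contains no ">" before it closes.

```python
# Made-up fragment: the middle link wraps an image, not a headline.
page = ('<a href="/one.html">Story one</a>'
        '<a href="/watch.html"><img src="watch.gif"></a>'
        '<a href="/two.html">Story two</a>')

def text_headlines(text, count):
    headlines = []
    pos = 0
    while len(headlines) < count:
        start = text.find("<a ", pos)
        if start == -1:
            break  # no more links on the page
        end = text.find("</a>", start)
        if end == -1:
            break  # unclosed link tag; stop rather than guess
        link = text[start:end]
        # Keep the link only if there is no image tag inside it.
        if link.find("<img") == -1:
            headlines.append(link[link.find(">") + 1:])
        pos = end + len("</a>")
    return headlines

found = text_headlines(page, 2)
```

The loop counts headlines kept, not links visited, so skipping an image link does not cost a headline.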

I was wondering if I can use the rss feed for all the websites to pull down the headlines?
No. You can't do this. You must use the site. Kelly Lyons


do you write the code to find the headlines separately or together?, because i can't get it to work at all either way; here is a part of my code:



when i run it it just print out the headlines for cnn, and does nothing to the second one
When you type return it will return and stop running the function. So everything you've written after your first return is not ever happening. I deleted your code from the page. Please in the future if you have a question that requires one to see so much of your code, email a TA with it or visit office hours. Kelly Lyons
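
A tiny illustration of that behavior, with placeholder strings instead of real headlines (the function names are made up): the first function never reaches its second return, while the second collects everything and returns once at the end.

```python
def first_only():
    return "CNN headlines"
    return "Economist headlines"  # never reached

def both():
    results = []
    results.append("CNN headlines")
    results.append("Economist headlines")
    return results  # single return, after all the work is done
```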

here is my stuff for trying to search for a headline...how could I change it so that it doesn't get stopped by image tags, video links, etc.?
def getCnn(num):
  import urllib
  connection=urllib.urlopen("http://www.cnn.com")
  head = connection.read()
  connection.close()
  curloc=head.find("MORE NEWS")
  for i in range (0,num):
    headloc=head.find("</a>",curloc)
    headstart=head.rfind(">",0,headloc)
    cnn1=head[headstart+1:headloc]
    curloc=headloc+10
  return cnn1+"</a>"

Try an if statement that checks to see if the img tag is after headstart. Kelly Lyons

Quick question. The page and links are created fine. I go to view source and the a hrefs look normal. However, when I click on a link in the page jes creates, it always puts the root directory source before the webpage. I can't figure out what to do to fix this, because the webpage source looks fine to me, but when i click on the link it always goes to try and find the link in the jes directory where the webpage is created.
Show your code to a TA and they can help you. This is a difficult question to answer on the questions page. Kelly Lyons

When I try to run my program, I get this message: "You are trying to access a part of the object that doesn't exist."

This is the line it tells me to check: htmlfile.write=("table border='0' cellspacing='2' cellpadding='3' width='100%' bgcolor='#FFFFFF'")

Can you tell me what's wrong with this line?

Nevermind I got it!


So this is what I have:

for x in range(0,5):
  htmlfile.write('td bgcolor= "#aaaaff" + divs1[x]+ ''\n')
  htmlfile.write('td bgcolor= "#ffaaaa" + divs2[x]+ ''\n')
  htmlfile.write('td bgcolor= "#aaaaaa" + divs3[x]+ ''\n')

when I try to run the program, it tells me: "An error occurred attempting to pass an argument to a function." and it directs me to the first htmlfile.write line. What am I doing wrong?

Your quotes don't line up. Maybe you mean:
htmlfile.write('td bgcolor= "#aaaaff"' + divs1[x]+ '\n')

for our hw, what name do we save our code in? is it hw5.py? i need to know this cuz i got points taken off on one of the hw, cuz i didn't save it in the right name.

Hey, some of the headlines on cnn.com link to videos using javascript tags. I don't know how to edit these tags so that it will link to the video from my news.html page. And I suppose we are not expected to do so, thus can we just leave these headlines as broken links and still get the extra credit?

I wouldn't... You set an if statement to skip gif images, right? Just add an or that looks for the word 'javascript.' Works like a charm.

Yes. We only want the text headlines. Anything that is a video you don't want. Kelly Lyons

ok, i'm trying to write an if statement that overlooks or removes items with "javascript" from the html. i've been stuck for like 2 hours; here's what i have so far:
I created an empty list named "cnn", then

pos= text.find("MORE NEWS")
for num in range(0, 5):
  href = text.find("href", pos)
  video= text.find("javascript", href)
  firstanglebracket = text.find(">", href)
  secondanglebracket = text.find("", firstanglebracket)
  pos= firstanglebracket
  if text.find("video", firstanglebracket):
    cnn.remove(something)
I can't figure out how to get jes to remove something from the list if it has "javascript". Anybody have any ideas? (hope i didn't post too much)
As long as the "headline" portion is text, then it is ok to keep it as a headline. That way you can avoid having to edit out the javascript issues. However, you do need to make sure that you skip over any of the image links. For example, the watch button should not count as a headline. If you would like to skip over the javascript anyways, feel free. It is not necessary. Kelly Lyons

Hmmm... My question still isn't answered. Some of the headlines on cnn.com link to videos using javascript tags. We haven't learned how to edit these tags so that it will link to the video from the news.html page. Thus can we just leave these headlines as broken links and still get the extra credit? I realize that the directions call for text headlines. However, some headlines simply don't link to text at all and only to videos. What should we do with those headlines if we want to get the extra credit?
To get the extra credit, you only need to make the links that link to actual articles working. The "headlines" that link to videos, you can have as broken links and you will still get the extra credit as long as all the other ones work. Kelly Lyons


I can't figure out what is wrong with this piece of code:
def hw5():
  import urllib 
  def cnnpart():
    code = urllib.open("http://www.cnn.com") 

It's telling me that I haven't defined a variable in that last line, and to my knowledge, I have. What is wrong?
You should not be defining a function within another function. Also, the function you want to be using is urlopen not open. Kelly Lyons

Well one problem may be because you have to add a slash after com. So it should be www.cnn.com/
Although true, this is not what is causing the error. Kelly Lyons

do we turn it in from a saved JES file? or notepad?
Turn in a .py file. Kelly Lyons

This is the beginning of my code:
def hw5(directory):
 file=open(directory + "//headlines.html","wt")


And according to JES, this is not legal Jython. I can't even load the function. What is wrong with this line?

I'm having a problem with this part of my code:
 connectionCNN=urllib.urlopen('http://www.cnn.com')
 CNNsite=connectionCNN.read()
 connectionCNN.close()
 CNNheadline=CNNsite.find('MORE NEWS')
 for num in range(1,6):
  CNNp1=CNNsite.find('</a>', CNNheadline)
  CNNp2=CNNsite.rfind('>', 0, CNNp1)
  if CNNheadline <> -1:
   'text' + num + 'CNN'=CNNsite[CNNp2 + 1, CNNp1]
    CNNp1=CNNp1 + 5
  if CNNheadline == -1:
   'text'+ num +'CNN'='not found'


There is something wrong with the first line of the 1st "if" statement. I think it's because of the way I tried to name the file. How can I rename the files correctly within my for loop? I want the filenames to match the article number, so "text1CNN" is the 1st headline, "text2CNN" is the second...without having to make a separate variable for each. Can that be done?

So, I changed my "if" loop a little...
connectionCNN=urllib.urlopen('http://www.cnn.com')
 CNNsite=connectionCNN.read()
 connectionCNN.close()
 CNNheadline=CNNsite.find('MORE NEWS')
 for num in range(1,6):
  CNNp1=CNNsite.find('</a>', CNNheadline)
  CNNp2=CNNsite.rfind('>', 0, CNNp1)
  if CNNheadline <> -1:
   text1CNN=CNNsite[CNNp2 + 1, CNNp1]
   CNNp1=CNNp1 + 5
   text1CNN='text'+num+1+'CNN'

And now get the error message:"An attempt was made to call a function with a parameter of an invalid type. This means that you did something such as trying to pass a string to a method that is expecting an integer." for the 1st line of the 1st "if" statement. I thought it was finding a string and accepting a string. I'm confused now. Please help.
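
A hedged sketch of one way around this, not an official answer. Python cannot assign to a name built from 'text' + num + 'CNN', so a dictionary (or a list) is the usual stand-in for numbered variables. Two details in the code above would also raise errors: a slice needs a colon, as in CNNsite[a:b] rather than CNNsite[a, b], and a number has to go through str() before it is joined to a string. The page fragment below is made up.

```python
# Made-up fragment standing in for CNNsite.
CNNsite = 'MORE NEWS <a href="/1">Alpha</a> <a href="/2">Beta</a>'

headlines = {}
pos = CNNsite.find("MORE NEWS")
for num in range(1, 3):
    p1 = CNNsite.find("</a>", pos)
    if p1 == -1:
        headlines["text" + str(num) + "CNN"] = "not found"
        continue
    p2 = CNNsite.rfind(">", 0, p1)
    # Slices use a colon; str(num) turns the counter into text.
    headlines["text" + str(num) + "CNN"] = CNNsite[p2 + 1:p1]
    pos = p1 + len("</a>")  # move past this link for the next pass
```

Afterwards headlines["text1CNN"] holds the first headline, headlines["text2CNN"] the second, and so on.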

I can't get this part to work! It loads, but it's only bringing up the last headline and there is a dot in each of the other cells in the column

webwwn = urllib.urlopen("http://www.weeklyworldnews.com/")
wwn = webwwn.read()
webwwn.close()
iloc = wwn.find("breaking_news.gif")
headwwn = []
if iloc <> -1:
  for x in range (0,5):
    start = wwn.find('•', iloc)
    end = econ.find('', start)
    start2 = wwn.rfind('1">', 0, end)
    headwwn = headwwn +[wwn[start2+3:end]]
    iloc = start + 1

NEVERMIND

concerning extra credit: for the cnn headlines, with video links, the links, when copied to our pages, won't create valid links due to the javascript pop-ups. should I link them anyway or just not link these cases? will i still receive the extra credit if those particular ones aren't valid links?

I'm having problems with the Economist site. If I try to use '' in the find function, it stops on a '' tag before it even reaches the headlines. How can I skip over the first ' tag?


I noticed something on the CNN site: The tag that has the javascript is not causing me a problem, but the lack of stuff between the ">" is what is causing a problem. My code won't look at the javascript part because it's not within the area I search, but closing the javascript tag makes for a weird situation. What can you do if the ">" has nothing between it at all? That means I lose a headline automatically.

would anyone know why, after using the merge function for sounds on page 171, the sounds become high pitched?


