# imdbscrapper
Scraper that collects movie and show information from IMDB and indexes it into separate movie and show lists, with rating, release date, and a few other details.
### Situation
Finding movies and shows to watch based on rating and release date means searching and taking notes by hand.
### Task
Create a way to automatically index movie and show entries from IMDB so they can be searched and filtered afterwards with common software such as a spreadsheet.
### Action
- Just the scraped data: [IMDBScrap](https://github.com/zebrajr/imdbscrap)
- With Docker
```sh
docker build -t yourUser/yourPackage:yourVersion .
```
- Directly
> Install the requirements listed in `requirements.txt` (`pip3 install -r requirements.txt`).
> Create the folder structure, or edit the settings in the main script.
```sh
python3 scrapper.py
```
### Result
| File | Content |
| ------ | ------ |
| movies.csv | CSV file with all movies indexed |
| series.csv | CSV file with all shows indexed |
| info.log | Any errors that occurred. Change the debug level if you also want to log info messages |
| counter.txt | The last indexed URL. Needed to resume if the script is interrupted |
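
Besides a spreadsheet, the CSV output can also be filtered with a few lines of Python. The following is a minimal sketch, not part of the project: the column names `title`, `rating`, and `year`, the helper `filter_rows`, and the sample rows are all assumptions for illustration, so check the header row of your own `movies.csv` and adjust them.

```python
import csv
import io


def filter_rows(csv_file, min_rating, min_year,
                rating_col="rating", year_col="year"):
    """Return rows whose rating and year meet the given minimums.

    The column names are hypothetical; adjust them to the real header.
    """
    matches = []
    for row in csv.DictReader(csv_file):
        try:
            rating = float(row[rating_col])
            year = int(row[year_col])
        except (KeyError, TypeError, ValueError):
            # Skip rows with missing or malformed values.
            continue
        if rating >= min_rating and year >= min_year:
            matches.append(row)
    return matches


# Made-up sample data standing in for movies.csv.
sample = io.StringIO(
    "title,rating,year\n"
    "Example A,8.1,2015\n"
    "Example B,6.4,2019\n"
    "Example C,N/A,2020\n"
)
good = filter_rows(sample, min_rating=7.5, min_year=2010)
print([row["title"] for row in good])
```

To try it on real output, pass an opened file (for example `open("movies.csv", newline="")`) instead of `sample`.
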
### Note
### ToDo
- Don't insert duplicates into dataTable
- Add Error Handling in case Internet is not available
- Add the possibility to re-index failed entries (to go through the indexer faster when a new movie/show is added)
- Add Multithreading
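
Two of the items above, skipping duplicates and surviving a lost connection, could be approached roughly as follows. This is only a sketch under assumptions: `add_unique`, `fetch_with_retry`, and the injected `fetch` callable are hypothetical names, not functions from the scraper, and `dataTable` is modelled here as a plain list of dicts with an assumed `url` key.

```python
import time


def add_unique(data_table, seen_urls, entry):
    """Append entry to data_table unless its URL was already indexed."""
    url = entry["url"]  # assumed key; the real entry layout may differ
    if url in seen_urls:
        return False
    seen_urls.add(url)
    data_table.append(entry)
    return True


def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url), retrying when the network is unavailable.

    fetch is any callable that raises OSError on connection problems
    (ConnectionError and TimeoutError are subclasses of OSError).
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # wait a little longer each time


# Made-up entries: the second one repeats the first URL and is skipped.
table, seen = [], set()
add_unique(table, seen, {"url": "https://example.invalid/a", "title": "A"})
add_unique(table, seen, {"url": "https://example.invalid/a", "title": "A"})
print(len(table))
```

Passing the download function in as `fetch` keeps the retry logic independent of whichever HTTP library the script uses.
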

P.S.: Feel free to improve :)
## Some Statistics
<img src="https://img.shields.io/github/license/zebrajr/imdbscrapper?logo=github"><img src="https://img.shields.io/github/forks/zebrajr/imdbscrapper?logo=github"><img src="https://img.shields.io/github/stars/zebrajr/imdbscrapper?logo=github">
<br>
<img src="https://img.shields.io/github/last-commit/zebrajr/imdbscrapper?logo=gitfs"><img src="https://img.shields.io/maintenance/yes/2021">
<br>
<img src="https://img.shields.io/github/repo-size/zebrajr/imdbscrapper?logo=files"><img src="https://img.shields.io/tokei/lines/github/zebrajr/imdbscrapper?logo=files">
<br>
<img src="https://img.shields.io/github/issues-raw/zebrajr/imdbscrapper?logo=gitbook"><img src="https://img.shields.io/github/issues-closed-raw/zebrajr/imdbscrapper?logo=gitbook">
<br>
<img src="https://img.shields.io/github/issues-pr-raw/zebrajr/imdbscrapper?logo=git"><img src="https://img.shields.io/github/issues-pr-closed-raw/zebrajr/imdbscrapper?logo=git">