2021-04-30 13:11:32 +02:00
# imdbscrapper
Scraper that collects movie and show information from IMDB and indexes it into movies and shows, with rating, release date, and a few other details.
### Situation
Finding movies and shows to watch, based on rating and release date.
Without a tool, the searching and note-taking has to be done manually.
### Task
Create a way to automatically index entries from IMDB so that they can be searched and filtered afterwards with common software such as a spreadsheet.
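
As an illustration of the kind of filtering meant here, below is a minimal Python sketch using only the standard library. The column names (`title`, `rating`, `year`), the sample rows, and the helper `filter_entries` are assumptions made for the example; the CSV layout actually written by the scraper may differ.

```python
import csv
import io

# Hypothetical sample data; the real column names produced by the
# scraper may be different.
SAMPLE = """title,rating,year
Example Movie A,8.1,2019
Example Movie B,6.4,2020
Example Movie C,7.9,2015
"""


def filter_entries(csv_file, min_rating, min_year):
    """Return the rows with rating >= min_rating and year >= min_year."""
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        if float(row["rating"]) >= min_rating and int(row["year"]) >= min_year
    ]


# Keep only well-rated entries released in 2016 or later.
matches = filter_entries(io.StringIO(SAMPLE), min_rating=7.5, min_year=2016)
titles = [row["title"] for row in matches]
print(titles)
```

To run the same filter on a real file, pass an open file object (for example `open("movies.csv", newline="", encoding="utf-8")`) instead of the `io.StringIO` sample, after adjusting the column names.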
### Action
- Just the scraped data: [IMDBScrap](https://github.com/zebrajr/imdbscrap)
- With Docker

```sh
docker build -t yourUser/yourPackage:yourVersion .
```
- Directly
> Install the requirements listed in requirements.txt (`pip3 install -r requirements.txt`)

> Create the folder structure or edit the settings in the main script

```sh
python3 scrapper.yml
```
### Result
| File | Content |
| ------ | ------ |
| movies.csv | CSV file with all indexed movies |
| series.csv | CSV file with all indexed shows |
| info.log | Any errors that occurred. Change the debug level if you also want to log info messages |
| counter.txt | The last indexed URL. Needed to resume if the script is interrupted |
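
The role of `counter.txt` can be pictured with a small checkpoint sketch. This is not the scraper's actual code: the helper names (`load_counter`, `save_counter`, `run`) are hypothetical, and storing a plain numeric entry ID instead of a full URL is an assumption made to keep the example short.

```python
import os
import tempfile


def load_counter(path, default=0):
    """Return the last indexed entry ID, or `default` if no checkpoint exists."""
    try:
        with open(path, encoding="utf-8") as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return default


def save_counter(path, value):
    """Write to a temp file and rename it, so an interruption in the middle
    of a write does not leave a truncated checkpoint behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write(str(value))
    os.replace(tmp_path, path)


def run(path, last_id, process):
    """Process every entry after the checkpoint, up to last_id (inclusive)."""
    for entry_id in range(load_counter(path) + 1, last_id + 1):
        process(entry_id)
        save_counter(path, entry_id)  # record the entry just indexed


# Demo in a temporary directory: the second run continues where the
# first one stopped instead of starting again from entry 1.
counter_path = os.path.join(tempfile.mkdtemp(), "counter.txt")
processed = []
run(counter_path, 2, processed.append)  # first run indexes entries 1 and 2
run(counter_path, 5, processed.append)  # second run resumes at entry 3
print(processed)
```

Saving the checkpoint after every entry costs one small file write per item, but it means at most one entry has to be repeated after an interruption.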
### Note
### ToDo
- Don't insert duplicates into dataTable
- Add error handling for when the internet connection is not available
- Add the possibility to re-index failed entries (to go through the indexer faster when a new movie/show is added)
- Add multithreading

PS: Feel free to improve :)
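
For anyone picking up the ToDo items, here is one possible starting point for two of them: skipping duplicates and coping with a missing internet connection. It is only a sketch under assumptions. The helpers `fetch_with_retry` and `add_unique`, the `id` field, and the example URL are all hypothetical and not taken from the existing script.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url); on a network error wait and try again.

    Returns None when every attempt fails, so the caller can note the
    URL as failed and re-index it later. Network failures raised by the
    standard library (for example urllib.error.URLError) are subclasses
    of OSError.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt < retries - 1:
                time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    return None


def add_unique(rows, seen_ids, row):
    """Append row only if its id was not stored before; return True if added."""
    if row["id"] in seen_ids:
        return False
    seen_ids.add(row["id"])
    rows.append(row)
    return True


# Demo with a fake fetcher that fails twice before succeeding.
calls = {"n": 0}


def fake_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network unreachable")
    return {"id": "tt0000001", "title": "Example"}


rows, seen_ids = [], set()
row = fetch_with_retry(fake_fetch, "https://example.invalid/title/tt0000001/")
added_first = add_unique(rows, seen_ids, row)
added_again = add_unique(rows, seen_ids, row)  # same id, so it is skipped
print(calls["n"], added_first, added_again, len(rows))
```

Passing the fetch function in as a parameter keeps the retry logic testable without network access; a real fetcher could be swapped in without changing the helper.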
## Some Statistics
<img src="https://img.shields.io/github/license/zebrajr/imdbscrapper?logo=github"><img src="https://img.shields.io/github/forks/zebrajr/imdbscrapper?logo=github"><img src="https://img.shields.io/github/stars/zebrajr/imdbscrapper?logo=github">
<br>
<img src="https://img.shields.io/github/last-commit/zebrajr/imdbscrapper?logo=gitfs"><img src="https://img.shields.io/maintenance/yes/2021">
<br>
<img src="https://img.shields.io/github/repo-size/zebrajr/imdbscrapper?logo=files"><img src="https://img.shields.io/tokei/lines/github/zebrajr/imdbscrapper?logo=files">
<br>
<img src="https://img.shields.io/github/issues-raw/zebrajr/imdbscrapper?logo=gitbook"><img src="https://img.shields.io/github/issues-closed-raw/zebrajr/imdbscrapper?logo=gitbook">
<br>
<img src="https://img.shields.io/github/issues-pr-raw/zebrajr/imdbscrapper?logo=git"><img src="https://img.shields.io/github/issues-pr-closed-raw/zebrajr/imdbscrapper?logo=git">